“No matter what demand users place on the system, we’re incredibly confident it will scale. It’s time to retire our Oracle servers.” - Sean Mare at GIA
GIA Puts Diamonds in the AWS Cloud
GIA invented the 4 Cs of diamond quality—Cut, Color, Clarity, and Carat—and they remain the leaders in assessing gem quality all over the world.
Their expert staff grades about twenty thousand gemstones a day. Each gemstone that comes through their labs gets a detailed quality report, which is then used by jewelers, wholesalers, and consumers worldwide to verify the quality of their diamonds and other precious gems.
Until very recently, this system relied on a monolithic application that was over a decade old. GIA wanted to move a critical piece of the application into the cloud and monetize it as a highly available API that their external partners could access on demand.
However, this system was used by almost every team at GIA, so rearchitecting it would require a careful strategy.
“Making changes to this application would be like changing the engines on a 747 while it's in flight.” - Sean Mare, Application Architect at GIA
To help them maneuver such a delicate shift, GIA reached out to Trek10.
Thinking for the future
GIA’s initial plans for their serverless API were solid: an endpoint powered by AWS Lambda and API Gateway, GraphQL to keep the API simple to maintain, and a DynamoDB backend.
But when building a serverless service against a monolith, interesting challenges can sometimes surface.
There were tens of millions of existing gem reports and terabytes of multimedia data (images of gems and report print-outs). Data needed validation on the way in to make sure that any media referenced by reports was present in the cloud, and the API had to be performant for users accessing the data around the globe.
Trek10 suggested that GIA’s data ingestion should be its own microservice. Separating out the data ingestion and validation from the client-facing API was a great way to not only decrease risk on deployments, but to make it easier and faster to iterate on the two systems.
A Kinesis event stream architecture would also make the system extensible, so it could absorb future needs and changes more easily. Plus, the data could be archived in S3 as a general backup.
“We really appreciated being able to break the project into smaller phases. We took things one chunk at a time, spent a couple months, then took on the next phase. Trek10, it amazes me how fast you can knock things out—especially compared to other vendors we’ve worked with.” - Sean Mare
The stream architecture
The on-prem GIA monolith would publish report data onto a RawReports stream, where a Lambda function would validate each report’s schema and data. That RawReports consumer would then place all of the validated report events onto another generalized stream component.
This decoupled ingestion on the stream architecture kept things flexible and performant. Valid reports could go into a data store for the client-facing API, or direct to other subscribers.
GIA could easily extend this out for interfacing with other partners via webhooks, without disturbing any existing components.
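As a rough illustration of the consumer described in this section, here is a minimal Python sketch. It is not GIA’s or Trek10’s actual code. The field names in REQUIRED_FIELDS, the media_exists check, and the publish callback are all hypothetical stand-ins; the only detail taken from AWS itself is that Lambda receives Kinesis records with a base64-encoded payload under Records[].kinesis.data. In a real deployment, a Lambda handler would wrap process_batch, media_exists would likely query S3, and publish would write to the downstream stream.

```python
import base64
import json

# Hypothetical report schema: the real GIA report fields are not given in the article.
REQUIRED_FIELDS = {
    "report_id": str,
    "carat": (int, float),
    "media_keys": list,
}


def validate_report(report, media_exists):
    """Return a list of problems; an empty list means the report is valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in report:
            problems.append(f"missing field: {field}")
        elif not isinstance(report[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    # Every media object the report references must already be in the cloud.
    media_keys = report.get("media_keys")
    if isinstance(media_keys, list):
        for key in media_keys:
            if not media_exists(key):
                problems.append(f"missing media: {key}")
    return problems


def process_batch(event, media_exists, publish):
    """Validate a batch of Kinesis records and forward the valid reports.

    media_exists(key) -> bool and publish(report) are injected so the
    sketch runs without any AWS dependency (both are assumptions).
    """
    valid, rejected = 0, 0
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            report = json.loads(payload)
        except ValueError:  # covers invalid JSON and undecodable bytes
            rejected += 1
            continue
        if not isinstance(report, dict) or validate_report(report, media_exists):
            rejected += 1
            continue
        publish(report)  # hand off to the downstream, validated stream
        valid += 1
    return {"valid": valid, "rejected": rejected}
```

Keeping validation in its own consumer means a bad record is counted and skipped before it ever reaches the client-facing data store, which matches the article’s point about decoupling ingestion from the API.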
Designing for global performance
For high availability and performance, Trek10 and GIA decided to implement an active-active multi-region setup for the client-facing API.
Clients are routed to one of two regions, depending on geolocation or latency. For this to work, data had to be synchronized across both regions, including some quota metrics and checks.
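The routing decision itself can be pictured with a small Python sketch. This only illustrates the idea of latency-based selection with failover. The article does not say how GIA implements routing (a managed DNS service with latency or geolocation routing policies is a common choice), and the region names and latency figures below are made up.

```python
def choose_region(latency_ms, healthy):
    """Pick the healthy region with the lowest measured latency.

    latency_ms: dict of region name -> latency in milliseconds (hypothetical data)
    healthy: set of region names currently passing health checks
    """
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Example with invented measurements for a single client.
latencies = {"us-east-1": 40, "eu-west-1": 120}
primary = choose_region(latencies, {"us-east-1", "eu-west-1"})
failover = choose_region(latencies, {"eu-west-1"})  # us-east-1 marked unhealthy
```

Because both regions serve traffic at all times, losing one simply removes it from the candidate set, which is why the data and quota checks have to be kept in sync on both sides.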
In the final design, there are only two client-facing Lambda functions involved, which leaves very little room for something to break.
All of this comes together as a highly available, low-management system that is incredibly cost-effective. GIA’s biggest costs for the system are bandwidth and S3 storage; everything else is almost negligible.
“During load testing, we hammered the system. We brought an AWS region down and saw it failover in a matter of seconds.” - Sean Mare
The cloud means planning for surprises
Just as Trek10 and GIA were finalizing their updated architecture designs, AWS AppSync went into beta. AppSync, a managed GraphQL service, would let GIA further reduce the business logic complexity involved in running the API.
They decided to change course and put AppSync into their plans.
The lesson here is: the cloud moves fast. What wasn’t possible when you started designing an architecture might be possible by the time you finish it. Don’t be afraid to lean on the latest and best managed services.
The reporting system today
GIA’s gem quality reporting system is going strong, accessed daily by jewelers all over the world.
It currently has about 40M records in DynamoDB and 56TB of data stored in S3. In terms of performance and availability, it has a 0.3s response time and has had 100% uptime so far, and GIA employs 0 dedicated support engineers to manage it.
“We couldn’t be happier with how things have turned out. No matter what demand users place on the system, we’re incredibly confident it will scale. It’s time to retire our Oracle servers.” - Sean Mare