From relational DB to single DynamoDB table: a step-by-step exploration

Just because it's NoSQL, doesn't mean it's non-relational

Forrest Brazeal | Jan 02 2019

Of all the sessions I’ve seen from AWS re:Invent 2018, my favorite is certainly this bewildering drop-kick of NoSQL expertise from AWS Principal Technologist and certified outer space wizard Rick Houlihan.

So, for the first 45 minutes of this #reInvent session I was nodding my head like "Yup, that's how I think about DynamoDB." Then Rick morphed into some kind of NoSQL wizard from outer space and my mind exploded. Absolute must watch: https://t.co/YNr6TcMEJI
Forrest Brazeal (@forrestbrazeal), December 4, 2018

Seriously, watch that video, then come back to this article. You won't be disappointed.

Rick cracks the lid on a can of worms that many of us who design DynamoDB tables try to avoid: the fact that DynamoDB is not just a key-value store for simple item lookups. If you design it properly, a single DynamoDB table can handle the access patterns of a legitimate multi-table relational database without breaking a sweat.

That little phrase, “if you design it properly,” is the caveat, of course. Rick’s video, and the related documentation that I suspect he had a hand in, are densely packed with advice on how to construct a DynamoDB table that will match your relational DB’s query performance at arbitrary horizontal scale.

Not gonna lie though, it’s heavy stuff, especially for us non-certified outer space wizards.

So in this post, I want to work through some DynamoDB single-table design considerations in step-by-step detail. We won’t cover every possible design pattern, but hopefully you’ll start to get a feel for the possible use cases and the inevitable tradeoffs. We’ll conclude with the ultimate question: is any of this a good idea when relational databases are still, like, right over there?


From RDB to DynamoDB: a practical example

Fair warning: I dive in deep! If you just want the high-level conclusion, skip ahead to the final section.

So what relational database should we, er, Dynamize? I decided to go with the most SQL-y example I could think of: Northwind, the classic relational database used to teach the Microsoft Access product back in the ’90s.

Here’s the full Northwind ERD. It’s not huge, but it’s at least as complex as the data requirements of many modern microservices you might want to back with DynamoDB.

Lo and behold, the sample data for the Northwind schema is available in cleaned-up CSV form on GitHub. We’ll ignore a couple of the ancillary tables to focus on the “big eight”: Categories, Customers, Employees, Orders/Order Details, Products, Shippers, and Suppliers.

I’ve included all the code necessary to create the DynamoDB table and load the data as shown throughout this post in this GitHub repo. Feel free to check it out and play along!

Step by step

Now, how do we turn our ERD and CSV tables into a DynamoDB table?

Step 1: Define the access patterns you think you’ll need

Right away, we come up against a huge difference between DynamoDB and a relational database: our data model is going to be wholly pragmatic, rather than theoretically self-consistent. We’re going to mold our table specifically around the things we need to do with the data, kind of like spraying insulation foam into a roof.

In the real world, we’d gather these requirements from the app team, prospective users, etc. This isn’t a real use case, though, so we’ll have to invent some access patterns by looking at the ERD. Here are some arbitrary query requirements I came up with:

Get employee by employee ID
Get direct reports for an employee
Get discontinued products
List all orders of a given product
Get the most recent 25 orders
Get shippers by name
Get customers by contact name
List all products included in an order
Get suppliers by country and region

All these would be simple SQL queries involving at most a couple of joins. (We’ll save write patterns for a future post.) But remember, we don’t have JOIN or GROUP BY in DynamoDB. Instead, we’ve got to structure our data in such a way that it’s “pre-joined” right in the table.

Step 2: Create a DynamoDB table with three generic attributes: “partition key”, “sort key”, and “data”

This brings us to one of the most important precepts in DynamoDB single-table design:

Attribute names have no relationship to attribute values.

Not only is our “key-value store” schema-less; in a way, it’s also keyless. We need to get used to thinking of the attribute names on a DynamoDB item as arbitrary. Our “partition key” attribute on the table may contain a different type of value depending on whether it’s an Order, a Product, an Employee, or whatever:

Storing different types of data in the same attribute feels weird and squicky, I know. But it’s actually super powerful. This technique is called index overloading, and it will enable us to squash tons of access patterns into a very small number of indexes.

The three generic attributes will be used to support two indexes: the main table index, which uses pk as the partition key and sk as the sort key, and a global secondary index (GSI), which uses sk as the partition key and data as the sort key.
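Here is a minimal sketch of that table definition as keyword arguments for the boto3 DynamoDB client's `create_table` call. The table name `Northwind`, the index name `gsi_1`, string types for all three keys, and on-demand billing are assumptions for illustration, not necessarily what the accompanying repo uses.

```python
def northwind_table_definition(table_name="Northwind", gsi_name="gsi_1"):
    """Build create_table kwargs for a single table with generic pk/sk/data keys.

    Hypothetical names: table_name and gsi_name are illustrative defaults.
    """
    return {
        "TableName": table_name,
        # Only key attributes are declared; everything else stays schema-less.
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
            {"AttributeName": "data", "AttributeType": "S"},
        ],
        # Main table index: pk is the partition (HASH) key, sk the sort (RANGE) key.
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "sk", "KeyType": "RANGE"},
        ],
        # The GSI "flips" the table: sk becomes the partition key, data the sort key.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": gsi_name,
                "KeySchema": [
                    {"AttributeName": "sk", "KeyType": "HASH"},
                    {"AttributeName": "data", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand capacity, so no ProvisionedThroughput is needed on table or GSI.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With credentials configured, `boto3.client("dynamodb").create_table(**northwind_table_definition())` would submit it. Note that no non-key attribute appears anywhere in the definition; that is what leaves us free to overload pk, sk, and data.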

What's the big deal about indexes, anyway? In general, your DynamoDB cost and performance will be best if you restrict yourself to "gets" (key/value lookups on single items) and "queries" (conditional lookups on items that share the same partition key but have different range/sort keys). Scanning, where you indiscriminately gobble all items from a table, is a slow, expensive antipattern. Useful gets and queries require ... useful indexes. So here we are.
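To make the difference concrete, here is a sketch of the request shapes for the three operations, written as kwargs for the low-level boto3 client (`get_item`, `query`, `scan`). The key values used in the usage line are hypothetical.

```python
def get_item_request(table_name, pk, sk):
    """A "get": exactly one item, addressed by its full primary key (pk plus sk)."""
    return {"TableName": table_name, "Key": {"pk": {"S": pk}, "sk": {"S": sk}}}


def query_partition_request(table_name, pk):
    """A "query": every item sharing one partition key, returned in sk order."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": pk}},
    }


def scan_request(table_name):
    """A "scan": no key condition at all, so DynamoDB reads (and bills) every item."""
    return {"TableName": table_name}
```

For example, `client.query(**query_partition_request("Northwind", "10248"))` would fetch one whole partition, while the scan request gives DynamoDB nothing to narrow the read with.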

These two indexes, as we’ll see, will open up a huge number of access patterns. The other attributes in the table can be named whatever you want; they don’t have to be consistent between items. In fact, you could give every non-key attribute of every item a random name without changing the behavior of the table at all. (It would just make the table layout harder for humans to read and understand … as we’ll discuss further below.)

Step 3: Create an item in the DynamoDB table for each record in each entity (non-join) table

Each Customer, each Order, each Shipper record gets an item in our new table. In each of our cases, we’ll make the pk attribute correspond to the primary key of the relational record. The sk and data attributes, though, we’ll vary based on the kinds of queries we need to write. See the breakdown below:

We’ve left out the “OrderDetails” join table for now; it’ll get special treatment in the next step.
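As a sketch of this step, here is one way CSV rows could become items. The pk comes straight from the relational primary key, as described above. Everything else is an assumption: the CSV column names, the `#` separator, and choosing contact name as the Customer sk (one layout that would serve "get customers by contact name"). The Supplier's static sk and the address-based data value follow the tricks described next.

```python
def address_key(row):
    """Hierarchical sort key: broadest to narrowest, so begins_with() matches prefixes."""
    # Assumes missing regions arrive blank; an empty segment keeps positions aligned.
    return "#".join([row["country"], row.get("region") or "", row["city"]])


def customer_item(row):
    return {
        "pk": row["customerID"],            # relational primary key
        "sk": row["contactName"],           # assumption: enables lookup by contact name
        "data": address_key(row),           # hierarchical sort key on the GSI
        "companyName": row["companyName"],  # non-key attributes can be named freely
    }


def supplier_item(row):
    return {
        "pk": row["supplierID"],            # relational primary key
        "sk": "SUPPLIER",                   # static GSI partition key for all suppliers
        "data": address_key(row),
        "companyName": row["companyName"],
    }
```

Loading would then be a loop over `csv.DictReader` rows, writing each returned dict with a batch writer. Check the column names against the actual CSV headers first.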

Let’s note a couple of tricks here:

The Order, Product, and Supplier records use a static value as the partition key for the GSI. This lets us look up all items of a particular type (such as all orders that match a date range) without resorting to an expensive scan operation. You can think of this as a workaround for the loss of our precious attribute keys: we’re using a value as a key instead.
We’ve used a composite value called a hierarchical sort key as the data field for the Customer and Supplier records. By combining all the address details into one field, we can get country, region and city lookups for the price of a single GSI.
We’ve used the “discontinued” value as a sort key on the GSI for Product items. Assuming we only populate that value for discontinued products (which isn’t true in the raw Northwind data), we can search for discontinued items without having to scan the entire “PRODUCT” partition. This technique is called a sparse index.
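All three tricks can be seen in a small in-memory emulation of a query against a GSI keyed on (sk, data). This is a sketch, not DynamoDB itself. It models one real rule: an item missing either GSI key attribute is never written to the index, which is what makes an index sparse. The sample items are illustrative, loosely shaped like Northwind rows.

```python
def query_gsi(items, partition_value, prefix=None, newest_first=False):
    """Emulate a GSI query: match sk exactly, optionally begins_with on data."""
    hits = [
        item for item in items
        # Items without "data" never reach the index (sparse index behavior).
        if item.get("sk") == partition_value and "data" in item
        and (prefix is None or item["data"].startswith(prefix))
    ]
    # A GSI returns items ordered by its sort key (data), ascending by default.
    return sorted(hits, key=lambda item: item["data"], reverse=newest_first)


ITEMS = [
    {"pk": "1", "sk": "SUPPLIER", "data": "UK##London"},
    {"pk": "2", "sk": "SUPPLIER", "data": "USA#LA#New Orleans"},
    {"pk": "3", "sk": "SUPPLIER", "data": "USA#MI#Ann Arbor"},
    {"pk": "5", "sk": "PRODUCT", "data": "1"},  # discontinued: value populated
    {"pk": "6", "sk": "PRODUCT"},               # not discontinued: absent from the GSI
]
```

`query_gsi(ITEMS, "SUPPLIER", prefix="USA#")` narrows by country and `prefix="USA#MI#"` by country and region, all from one index, while `query_gsi(ITEMS, "PRODUCT")` returns only the discontinued product. Ending prefixes with the separator keeps "USA" from also matching a hypothetical "USAX".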

We're basically playing Tetris with our data at this point, sliding different values in and out of our limited GSI slots to get the maximum utility. And we're not done, because we still have to...

Why are we so obsessed with minimizing global secondary indexes? Wouldn't it be easier just to slap a ton of indexes on this table? For a long time, the answer was no; DynamoDB tables had a hard limit of 5 GSIs. DynamoDB recently raised that to a soft limit of 20, meaning you can ask AWS for more if you genuinely need them.

But every GSI makes writes more expensive: each time you write an item that appears in an index, that index consumes write capacity units of its own, so write cost climbs with every index you add. We'll win on cost and performance if we can squash our lookups down into the smallest possible index footprint.
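A back-of-the-envelope sketch of that cost, for putting one new item. Assumptions: standard (non-transactional) writes, every GSI projects the whole item (ProjectionType ALL), and the item carries every index's key attributes. One write capacity unit covers up to 1 KB, rounded up. Per the DynamoDB docs, an update that changes an indexed key attribute costs two index writes (delete old entry, put new), which this sketch does not model.

```python
import math


def write_units_for_put(item_size_bytes, gsi_count):
    """Approximate WCUs to put one new item into a table with gsi_count GSIs."""
    units_per_copy = math.ceil(item_size_bytes / 1024)  # 1 WCU per KB, rounded up
    # The base table and each GSI store a copy, so cost adds one copy per index.
    return units_per_copy * (1 + gsi_count)
```

A 1.5 KB item costs 2 WCUs with no GSIs and 8 with three; each extra index adds one more copy's worth of write cost.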

Step 4: Represent many-to-many relationships with adjacency lists

DynamoDB best practices borrow from graph theory the concept of adjacency lists, which are … a bit of a slippery concept. To hang onto the graph idea for a moment, you can think of all the items we’ve placed in our table so far as “node” records. They correspond to entities, like customers and orders. We’re now going to create some additional “edge” records that represent the many-to-many relationships between nodes.

In the Northwind dataset, the many-to-many relationship we’ll focus on is expressed in the OrderDetails join table. An order can have many products, one product can appear in many orders, and the attributes of that relationship are expressed in OrderDetails. We’ll model this relationship by placing the OrderDetails records in the Order partition of our table.
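A sketch of turning one OrderDetails row into an "edge" item. Placing the edge in the order's partition (pk = order ID) with the product ID as sk follows the description above. The CSV column names are assumptions, and so is setting data to the order ID: some value is needed there, because an item without a GSI key attribute is left out of the GSI, and the reverse lookup would silently miss it.

```python
def order_detail_item(row):
    """Turn one OrderDetails join-table row into an adjacency-list edge item."""
    return {
        "pk": row["orderID"],    # edge lives in the order's partition
        "sk": row["productID"],  # GSI partition key, so edges group by product
        "data": row["orderID"],  # assumption: any value works; it must exist for the GSI
        # Attributes of the relationship itself ride along on the edge.
        "unitPrice": row["unitPrice"],
        "quantity": int(row["quantity"]),
        "discount": row["discount"],
    }
```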

Why are we putting all this stuff in one table again? The DynamoDB documentation emphatically recommends using as few tables as possible, usually one per app/service unless you have hugely divergent access patterns. Locating your related data close together will give you DynamoDB's performance and scale benefits without the latency and frustration of querying multiple tables via HTTP and trying to "join" them client-side.

That said, I see lots of relational databases that should be split into separate DynamoDB tables, because the same database is used as a dumping ground for all kinds of unrelated data. That 70-GB table of access logs in your Postgres database doesn't need to go in the same DynamoDB table with your product and order data.

What does this get us? We now have the ability to query the primary table partition to get all products in an order. We can also query the GSI, whose partition key is sk, to do a reverse lookup on all the orders of a given product. This is the adjacency list pattern. You can try it yourself with the "EmployeeTerritories" join table in the Northwind data, which we haven't included here. You may need to break this access pattern out into its own GSI if you take it much further.
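Both directions of the adjacency list can be sketched in memory by grouping items the way the table (by pk) and the GSI (by sk) would. The items are illustrative. The layout is an assumption: an order node with sk "ORDER", and edges whose data attribute is populated so they land in the GSI.

```python
from collections import defaultdict


def build_views(items):
    """Group items as the main table (by pk) and the GSI (by sk) would see them."""
    table, gsi = defaultdict(list), defaultdict(list)
    for item in items:
        table[item["pk"]].append(item)
        if "data" in item:  # only items with both GSI key attributes reach the GSI
            gsi[item["sk"]].append(item)
    return table, gsi


def products_in_order(table, order_id):
    # The order partition holds the order node (sk "ORDER") plus one edge per product.
    return sorted(i["sk"] for i in table[order_id] if i["sk"] != "ORDER")


def orders_of_product(gsi, product_id):
    # Reverse lookup: on the GSI, edges are grouped under the product ID.
    return sorted(i["pk"] for i in gsi[product_id])


ITEMS = [
    {"pk": "10248", "sk": "ORDER", "data": "1996-07-04"},
    {"pk": "10248", "sk": "11", "data": "10248"},
    {"pk": "10248", "sk": "42", "data": "10248"},
    {"pk": "10249", "sk": "ORDER", "data": "1996-07-05"},
    {"pk": "10249", "sk": "51", "data": "10249"},
    {"pk": "10250", "sk": "51", "data": "10250"},
]
```

Here `products_in_order(table, "10248")` walks one edge direction and `orders_of_product(gsi, "51")` walks the other, with no join table in sight.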

Step 5 (optional): Create more GSIs to support additional access patterns

Believe it or not, even with all the tricks we used back in steps 2 and 3, a single GSI may not be enough to support every possible query! (Shocking, I know.) The good news is that you can add additional GSIs, if needed, without totally disrupting your carefully pieced-together Tetris board. The DynamoDB docs have a good example of adding a second GSI with specially-constructed partition and sort keys to handle certain types of range queries.
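Adding a GSI to an existing table goes through `UpdateTable` with a `GlobalSecondaryIndexUpdates` "Create" entry. DynamoDB backfills the new index from existing items, and items lacking the new key attributes simply stay out of it. Here is a sketch of the boto3 `update_table` kwargs. The index and attribute names are hypothetical generic slots, and an on-demand table is assumed, so no ProvisionedThroughput is given.

```python
def add_gsi_request(table_name, index_name, partition_attr, sort_attr):
    """Build update_table kwargs that create one new GSI on generic attributes."""
    return {
        "TableName": table_name,
        # Key attributes used by the new index must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": partition_attr, "AttributeType": "S"},
            {"AttributeName": sort_attr, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": partition_attr, "KeyType": "HASH"},
                        {"AttributeName": sort_attr, "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }
```

Something like `add_gsi_request("Northwind", "gsi_2", "gsi2_pk", "gsi2_sk")` keeps the second index as generic as the first, so it can be overloaded the same way.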

In our case, though, the main table partition plus one GSI are more than enough to handle all the use cases we defined in step 1. Let’s break down the queries:

What about sharding? We've been thinking a lot about how to make our single-table queries easy, but not necessarily about how to make them fast. Even with DynamoDB's new adaptive capacity functionality, you want to keep your access patterns smoothed out so you don't have disproportionate load on a single partition. This often involves creating an index with randomized keys. Alex DeBrie has a marvelous breakdown in his DynamoDB guide of how this works, and when you might need it. (In particular, sharding would matter for static GSI partition keys like "ORDER"; right now that's a lot of records packed into a single partition.)
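A sketch of the random-suffix flavor of write sharding applied to a static key like "ORDER". Writes pick one of N suffixed keys at random. Reads must query every shard and merge the results client-side. The shard count is a hypothetical value you would size from expected throughput, and merged results assume each item's data holds a sortable date string.

```python
import random

SHARD_COUNT = 4  # hypothetical; size it from expected write throughput


def sharded_partition_value(base, shard_count=SHARD_COUNT, rng=random):
    """Spread writes for a hot static key across shard_count suffixed keys."""
    return f"{base}#{rng.randrange(shard_count)}"


def all_shard_values(base, shard_count=SHARD_COUNT):
    """Every suffixed key a reader has to query to see the whole logical partition."""
    return [f"{base}#{i}" for i in range(shard_count)]


def merge_newest_first(shard_results, limit):
    """Merge per-shard query results and keep the `limit` items with the latest data."""
    merged = [item for shard in shard_results for item in shard]
    merged.sort(key=lambda item: item["data"], reverse=True)
    return merged[:limit]
```

The trade is more read requests (one query per shard) in exchange for write traffic that no longer piles onto a single partition.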

You can see working examples of all these queries using the AWS Python SDK in the accompanying repo. Plus, we've preserved individual key-value lookup of every entity in the table, so we haven't strayed too far from DynamoDB's roots.
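As a taste of what such queries look like (a sketch, not the repo's code), here are low-level boto3 `query` kwargs for two of the step 1 access patterns against the GSI. Assumptions: the index is named `gsi_1`, order items carry a static "ORDER" sk with an ISO-style date string in data, and suppliers use the `country#region#city` hierarchical key. One real gotcha: `data` is a DynamoDB reserved word, so expressions must alias it.

```python
def recent_orders_request(limit=25, table_name="Northwind", index_name="gsi_1"):
    """Most recent orders: one GSI partition, read in descending data (date) order."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "sk = :kind",
        "ExpressionAttributeValues": {":kind": {"S": "ORDER"}},
        "ScanIndexForward": False,  # descending sort key order: newest first
        "Limit": limit,
    }


def suppliers_by_location_request(country, region, table_name="Northwind", index_name="gsi_1"):
    """Suppliers by country and region via begins_with on the hierarchical key."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # "data" is a reserved word, so it is referenced through the #d alias.
        "KeyConditionExpression": "sk = :kind AND begins_with(#d, :prefix)",
        "ExpressionAttributeNames": {"#d": "data"},
        "ExpressionAttributeValues": {
            ":kind": {"S": "SUPPLIER"},
            ":prefix": {"S": f"{country}#{region}#"},
        },
    }
```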

What can’t we do?

We now have a basic blueprint to convert a relational database into a single DynamoDB table. But remember, this is a spray-foam approach to data. Like insulation hardened in the contours of a ceiling, our DynamoDB single-table data model is both informal and inflexible. It’s not necessarily going to accommodate new access patterns.

For example, suppose we need to see all products in a given category. The “Product” records have a CategoryID, but it’s not included in any of our indexes at the moment. Our options are:

Query all products, filter by category ID (inefficient, since we pay to read every product), or
Break out new items in one of our existing partitions that index product data by category ID (creates more duplicate data, which is potentially harder to manage), or
Create a new GSI with Category ID as partition key and Product ID as sort key (increases table cost)
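To see why the first option is "not the most optimal," here is a sketch of its request shape. It assumes every product item sits under the static "PRODUCT" partition of the GSI (true only if the data value is populated for every product, as in the raw Northwind data) and carries a `categoryID` string attribute; that attribute name is hypothetical. A FilterExpression is applied after items are read, so read capacity is consumed for every product, not just the matches. Each Query call also reads at most 1 MB before filtering, so a large partition needs pagination through `LastEvaluatedKey`.

```python
def products_in_category_request(category_id, table_name="Northwind", index_name="gsi_1"):
    """Option 1: read the whole PRODUCT partition of the GSI, then filter by category."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # The key condition decides what gets read (and billed): every product.
        "KeyConditionExpression": "sk = :kind",
        # The filter only decides what gets returned, after the read has happened.
        "FilterExpression": "categoryID = :cat",
        "ExpressionAttributeValues": {
            ":kind": {"S": "PRODUCT"},
            ":cat": {"S": category_id},
        },
    }
```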

As you can see, tradeoffs abound! Only you can decide which option makes the most sense for the long-term health of your app and the sanity of your developers. The more GSIs with generic attributes you add, the harder this table will be to read and understand without a load of supporting documentation.

In fact, a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet — despite all the bespoke, human finagling it took to create it.

Which leads to the most important question of all:

Is modeling my relational database in a single DynamoDB table really a good idea?

About a year ago, I wrote a fairly popular article called “Why DynamoDB isn’t for everyone”. Many of the technical criticisms of DynamoDB I put forth at that time (lack of operational controls such as backup/restore; a persistent problem with hot keys) have since been partially or fully resolved due to a truly awe-inspiring run of feature releases from the DynamoDB team.

However, the central argument of that article remains valid: DynamoDB is a powerful tool when used properly, but if you don’t know what you’re doing it’s a deceptively user-friendly guide into madness. And the further you stray into esoteric applications like relational modeling, the more sure you’d better be that you know what you’re getting into. Especially with SQL-friendly “serverless” databases like Amazon Aurora hitting their stride, you have a lot of fully-managed options with a smaller learning curve.

That said, remember that Amazon’s original Dynamo paper was predicated on the observation that most interactions with their vast Oracle databases were simple key-value reads, no JOINs or other relational magic required.

In the same way, a lot of superficially relational datasets boil down into a relatively small number of usage patterns. If you can work through the steps in this post to identify and implement those patterns for your data, DynamoDB’s scale, performance and low operational overhead may seem more compelling than ever.

Unless, you know, you’re still a big fan of Microsoft Access.

Thanks to Alex DeBrie, Jared Short, and Andy Warzon for providing technical review on this post.

Need DynamoDB expertise? Trek10 has been there, done that. If we can help you, feel free to let us know.
