Leveraging ULIDs to create order in unordered datastores

Universally Unique Lexicographically Sortable Identifiers (ULIDs) can be leveraged to query S3 objects by time without a backing metadata store. Here's how!

Ryan Scott Brown | Apr 14 2020

The rise of distributed data stores and the general decomposition of systems into smaller pieces means that coordination between each server, service, or function is less available. In my first applications, unique ID generation meant setting auto_increment=True on a column in the SQL database. Easy, done, no problem. Today, each microservice has its own data source(s) and NoSQL stores are common. Every NoSQL DB is "NoSQL" in its own way, but they usually eschew coordinated and single-writer solutions in the name of reliability/performance/both. You can't have an auto-increment column without implementing the coordination client-side.

Using numbers as identifiers also creates problems. Auto-incrementing can lead to enumeration-based attacks. Fields can have fixed sizes. These issues can go unnoticed until you overflow the uint32 field, and now your logs are a pile of ID conflict errors. Instead of integers, we can use a different kind of fixed-length field and make it non-sequential so that different hosts can generate IDs without a central coordinating point.

UUIDs are an improvement and avoid collisions in distributed settings, but because they are strictly random, you have no easy way to sort them or determine rough order. Segment blogged a while ago about one replacement for UUIDs, the KSUID (K-Sortable Unique Identifier), but it has limitations and uses a strange 14e8-second epoch offset to avoid running out of timestamp space in the next 100 years.

Enter the Universally Unique Lexicographically Sortable Identifier (ULID). These are sortable, high-entropy identifiers that we can generate anywhere in our pipeline without coordination and have confidence that there won't be collisions. A ULID looks like 01E5TZRCM5WZYPB2BH7KMYR5HT: the first 10 characters are a timestamp, and the remaining 16 characters are random.
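To make that layout concrete, here is a minimal standard-library sketch of a ULID generator. It is an illustration rather than the ulid-py implementation: the helper names encode_base32 and new_ulid are hypothetical, and it assumes the spec's Crockford Base32 alphabet, a 48-bit millisecond timestamp in the first 10 characters, and 80 random bits (16 characters at 5 bits each) in the rest.

```python
import os
import time

# Crockford Base32 alphabet (no I, L, O, U), as used by the ULID spec.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_base32(value: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-width Crockford Base32 string."""
    chars = []
    for _ in range(length):
        value, index = divmod(value, 32)
        chars.append(ALPHABET[index])
    return "".join(reversed(chars))


def new_ulid(timestamp_ms=None) -> str:
    """Return a 26-character ULID: 10 timestamp characters + 16 random characters."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return encode_base32(timestamp_ms, 10) + encode_base32(randomness, 16)


earlier = new_ulid(1586872590191)
later = new_ulid(1586872590192)
print(earlier[:10])     # timestamp part: 01E5WFM7VF
print(earlier < later)  # True: a later millisecond always sorts after
```

Because the timestamp occupies the most significant characters, plain string comparison orders IDs by creation millisecond; within one millisecond the order is decided by the random part.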

What About UUID?

I came across the need for ULID / KSUID when working with S3 objects that needed to be named, but I also wanted to be able to query for recent objects. Typically, when I need a random identifier, I reach for UUID-v4. Why v4?

UUID v1 and v2 contain MAC addresses based on the host that generates them. This isn't really a security issue since an L2 address won't help you much on the public internet. However, it does mean if my UUIDs are generated in Lambdas, the MAC addresses have no semantic value. I can't SSH into my Lambda and look up the MAC or otherwise use that information.
UUID v3 hashes a namespace and a name, which together act as a seed, and I would just be using random.randint() or the equivalent to pick my seed value. Any system that requires a seed means that I have to think about what to use as a seed, how that impacts the randomness, and how that might impact security or collisions.
UUID v4 is random, but because it is entirely random, it leaves no room for semantic overloading.

Why would I want to semantically overload the UUID in my system? I took a cue from the Wizard of Semantic Overloading himself, Rick Houlihan. I've spent time on single-table DynamoDB designs and that way of thinking spilled over into the design of my S3 storage system.

ULIDs to Enable S3 Time Queries

Index-based thinking can be illuminating, especially since IT is full of intrinsically sorted storage systems. S3 sorts your object keys and prefixes when returning them, no matter what order they were added in.

Keys are selected for listing by bucket and prefix. For example, consider a bucket named "dictionary" that contains a key for every English word. You might make a call to list all the keys in that bucket that start with the letter "q". List results are always returned in UTF-8 binary order.

— AWS S3 documentation

What does this mean for our application? It means if we provide sortable keys to S3 and sort them in the order we actually want to receive items in, then we will be able to get our objects in order without having to do any sort client-side. Using a ULID in an object name (or better, splitting a ULID with a prefix) lets us avoid collisions and also prevent enumeration-related attacks on our objects.

Using ULIDs in Python is simple. First, you need to install the ulid-py library, then you can import ulid and start generating identifiers:
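The upload step might look like the following minimal sketch. It assumes boto3 is installed with AWS credentials configured and reuses the t10-blog-ulids bucket from the listings in this post; the helper name upload_with_ulid_key is hypothetical.

```python
def upload_with_ulid_key(s3_client, bucket, body=b"abc"):
    """Store `body` under a freshly generated ULID and return the key."""
    # Third-party dependency: pip install ulid-py.
    # Imported lazily so this sketch loads even where ulid-py is absent.
    import ulid

    key = ulid.new().str  # 26-character, time-sortable object name
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   upload_with_ulid_key(boto3.client("s3"), "t10-blog-ulids")
```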

This would upload an object with just a ULID as the name, with the contents abc. Then when we list objects in the CLI or in any other app, they're sorted by the millisecond they were created in, and the names stay unique even if there were several new objects in a single millisecond.

$ aws --profile personal s3 ls s3://t10-blog-ulids
2020-04-13 21:17:53          3 01E5V474WE4DE0N63ZWT7P6YWH
2020-04-13 21:17:54          3 01E5V475QFRCEHXKJAS3BRS6BV
2020-04-13 21:24:51          3 01E5V4KXFTP52C9M5DVPQ2XR8T
2020-04-13 21:48:33          3 01E5V5Z9J0GX72VFSENBCKMHF0

Automatic sorting is helpful, and of course, ULIDs can be formatted in different ways depending on your needs.

>>> import ulid
>>> u = ulid.new()
>>> u.str
'01E5V7GWA9CHP337PB8SR18ZP4'
>>> u.bytes
b'\x01qvxqIdl1\x9e\xcbFp\x14~\xc4'
>>> u.int
1918360407572615930874316424782053060
>>> u.uuid
UUID('01717678-7149-646c-319e-cb4670147ec4')
>>> u.float
1.918360407572616e+36
>>> bin(u.int)
'0b1011100010111011001111000011100010100100101100100011011000011000110011110110010110100011001110000000101000111111011000100'

Especially useful is the u.uuid type that allows you to replace existing UUIDs in your system with ULIDs without changing the value format. This means you can start benefitting from the ordering properties of ULIDs in existing systems.
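The conversion is lossless because both formats carry the same 128 bits. The following standard-library sketch shows the round trip without ulid-py; the function names ulid_to_uuid and uuid_to_ulid are hypothetical, and the input is the ULID string from the session above.

```python
import uuid

# Crockford Base32 alphabet used by the ULID text format.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_uuid(ulid_str: str) -> uuid.UUID:
    """Reinterpret the 128 bits of a ULID string as a UUID."""
    value = 0
    for char in ulid_str.upper():
        value = value * 32 + ALPHABET.index(char)
    return uuid.UUID(int=value)


def uuid_to_ulid(value: uuid.UUID) -> str:
    """Render the 128 bits of a UUID as a 26-character ULID string."""
    number = value.int
    chars = []
    for _ in range(26):
        number, index = divmod(number, 32)
        chars.append(ALPHABET[index])
    return "".join(reversed(chars))


as_uuid = ulid_to_uuid("01E5V7GWA9CHP337PB8SR18ZP4")
print(as_uuid)                # 01717678-7149-646c-319e-cb4670147ec4
print(uuid_to_ulid(as_uuid))  # 01E5V7GWA9CHP337PB8SR18ZP4
```

Note that the result is not a valid RFC 4122 version 4 UUID (the version bits are whatever the timestamp and randomness happen to be), so this suits columns that only require the UUID shape.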

Decentralized Generation

The ULID format of a 48-bit timestamp plus 80 bits of randomness means that we get 80 random bits per millisecond, which almost eliminates the chance of collisions*. Contrast this with our auto-incrementing numeric column. The incrementation forces us to centralize management of that number in the database to avoid ID conflicts. With ULIDs, we can generate IDs in any of our Lambdas, containers, or EC2 instances.

Because the IDs are timestamped natively, we can tolerate partitions and delays. Inserting late data doesn't cause ordering problems because the items are timestamped when the ID is generated, and we can always add another timestamp field at ingestion if needed. The IDs allow us to maintain order and insert data late without having to add a separate ingestion process.

Distributed generation does mean that there is no "true clock" that allows us to perfectly order the items we put ULIDs on. This trade-off between a central synchronization point (for ordering) and greater reliability/resiliency is common in systems of any size and becomes nearly-required at scale.

Further, you may choose to go off-spec and use the 2 most-significant bits of the ULID that our encoding gives us. This is possible because there are 130 bits available in the text representation (26 characters at 5 bits each), minus the 128 used by the timestamp and randomness in the spec. You can get 4 sub-types of ULID in the same spirit as descriptive IDs like i-0123456789 and AKIAXNMVN by making the ID itself contain an encoded type.
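One way to picture this off-spec trick: a spec-compliant ULID never has a first character above 7, so the two high bits of that character are free. The sketch below is hypothetical (tag_ulid and read_subtype are illustrative names, not part of any ULID library) and stores a sub-type of 0 to 3 there.

```python
# Crockford Base32 alphabet used by the ULID text format.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def tag_ulid(ulid_str: str, subtype: int) -> str:
    """Store a 2-bit subtype (0-3) in the two spare high bits of the first character."""
    if not 0 <= subtype <= 3:
        raise ValueError("subtype must fit in 2 bits")
    first = ALPHABET.index(ulid_str[0])
    if first > 7:
        raise ValueError("first character already uses the spare bits")
    return ALPHABET[(subtype << 3) | first] + ulid_str[1:]


def read_subtype(tagged: str) -> int:
    """Recover the subtype from the first character of a tagged ID."""
    return ALPHABET.index(tagged[0]) >> 3


tagged = tag_ulid("01E5V474WE4DE0N63ZWT7P6YWH", 2)
print(tagged)                # G1E5V474WE4DE0N63ZWT7P6YWH
print(read_subtype(tagged))  # 2
```

Two consequences are worth keeping in mind: tagged IDs sort by sub-type first and by time second, and standard ULID parsers will likely reject them because they overflow 128 bits.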

* If you are Amazon Retail, don't follow this advice: one-in-a-million things happen a couple of times an hour at sufficient scale.

ULIDs in DynamoDB

The new trend in DynamoDB is single-table designs: a single table whose design lets a handful of GSIs serve many queries. Rick tweeted this real-world example from the Kindle Collection Rights service that serves 9 queries with 4 GSIs.

Kindle DynamoDB Table Design

These single-table designs rely on using properties that are sortable to enable queries, typically by combining Hash&Range in novel ways for each type of object. For example, you might create a key like Hash=Org#Trek10 Range=Post#2020-04-03#ca21477c-5693-4f2d-92e5-068102b24be9 which is composed of a type, org name, create time, and UUIDv4. Instead, with a ULID, you would be able to obviate the timestamp and ID combination and use a range key of Range=Post#01E5WF8AERWH9F8PDTQ5K4GW7R. This is a more efficient representation which also lets you use that same ID as a foreign key.

ULIDs can also be used to associate similar items that are created at the same time by manipulating the randomness values to be monotonic.

Take this NodeJS example that creates a ULID then uses the randomness from that ULID to create a series of related items that will lexically sort together:

> const monotonicFactory = require('ulid').monotonicFactory;
> const ulid = monotonicFactory()
> ulid(1586872590191)
'01E5WFM7VFPWCNF4DM76ADV80W'
> ulid(1586872590191)
'01E5WFM7VFPWCNF4DM76ADV80X'
> ulid(1586872590191)
'01E5WFM7VFPWCNF4DM76ADV80Y'
> ulid(1586872590191)
'01E5WFM7VFPWCNF4DM76ADV80Z'
> ulid(1586872590191)
'01E5WFM7VFPWCNF4DM76ADV810'

These ULIDs can be used to associate actions and events or to group activity from a particular task or host.
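The same idea can be sketched in Python with the standard library alone: keep the timestamp characters and add one to the 80-bit random part. The function name next_in_series is hypothetical, and this is a sketch of the increment step only, not a port of the Node library's monotonicFactory.

```python
# Crockford Base32 alphabet used by the ULID text format.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def next_in_series(ulid_str: str) -> str:
    """Return the ULID with the same timestamp and the randomness incremented by one."""
    timestamp, randomness = ulid_str[:10], ulid_str[10:]
    value = 0
    for char in randomness:
        value = value * 32 + ALPHABET.index(char)
    value += 1
    if value >= 1 << 80:
        raise OverflowError("randomness exhausted for this millisecond")
    chars = []
    for _ in range(16):
        value, index = divmod(value, 32)
        chars.append(ALPHABET[index])
    return timestamp + "".join(reversed(chars))


print(next_in_series("01E5WFM7VFPWCNF4DM76ADV80Z"))  # 01E5WFM7VFPWCNF4DM76ADV810
```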

Going Plaid in S3

Let's return to our earlier S3 example for a moment. When looking for data in a specific time range, you can reduce the number of objects returned by ListObjects significantly. The Prefix argument lets you narrow the range of your search in 5-bit increments, one character at a time. A ULID has 10 leading characters that represent a 48-bit timestamp with millisecond precision, with each character encoding 5 bits of the number.

48-bit millisecond epoch timestamps will run out of space in 10889 AD, mark your calendar. The astute reader will also note that a 48-bit timestamp value doesn't evenly encode to 50 bits available in a Crockford Base32 string, so the highest timestamp that will ever be representable is actually 7ZZZZZZZZZ not ZZZZZZZZZZ.
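Both figures can be checked with a few lines of arithmetic. This sketch assumes a Gregorian average year of 365.2425 days; encode_time is a hypothetical helper that writes a millisecond timestamp as 10 Crockford Base32 characters.

```python
# Crockford Base32 alphabet used by the ULID text format.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
MAX_TIMESTAMP_MS = 2**48 - 1  # largest value a 48-bit timestamp can hold


def encode_time(timestamp_ms: int) -> str:
    """Encode a millisecond timestamp as the 10 leading ULID characters."""
    chars = []
    for _ in range(10):
        timestamp_ms, index = divmod(timestamp_ms, 32)
        chars.append(ALPHABET[index])
    return "".join(reversed(chars))


years_after_epoch = MAX_TIMESTAMP_MS / 1000 / 86400 / 365.2425
print(1970 + int(years_after_epoch))  # 10889
print(encode_time(MAX_TIMESTAMP_MS))  # 7ZZZZZZZZZ
```

The leading 7 appears because 10 characters hold 50 bits while the timestamp uses only 48, leaving 3 usable bits in the first character.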

t = time character
r = randomness character
ttttttttttrrrrrrrrrrrrrrrr

How long is the range per character? Here is the approximate time span covered by a single step of each timestamp character.

1st character: 407226 days
2nd character: 12725 days
3rd character: 397 days
4th character: 12 days, 10 hours
5th character: 9 hours, 19 minutes
6th character: 17 minutes, 28 seconds
7th character: 32 seconds
8th character: 1 second
9th character: 32 milliseconds
10th character: 1 millisecond
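These spans follow from each character being worth 32 times the one to its right, with the 10th character counting single milliseconds. A small sketch that reproduces the list (char_span_ms is a hypothetical helper name):

```python
def char_span_ms(position: int) -> int:
    """Milliseconds covered by one step of the timestamp character at `position` (1-10)."""
    if not 1 <= position <= 10:
        raise ValueError("timestamp characters are numbered 1 to 10")
    return 32 ** (10 - position)


for position in range(1, 11):
    seconds = char_span_ms(position) / 1000
    print(f"character {position}: {seconds:,.3f} s ({seconds / 86400:,.2f} days)")
```

For example, the 6th character steps every 1,048,576 ms, which is the 17 minutes and 28 seconds shown above.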

This means that with S3's ListObjectsV2 API and the Prefix parameter, you can grab 17-minute spans of your written data by using the first 6 characters of the ULID as your Prefix. Take these objects:

2020-04-13 21:17:54          3 01E5V475QFRCEHXKJAS3BRS6BV
2020-04-13 21:24:51          3 01E5V4KXFTP52C9M5DVPQ2XR8T
2020-04-13 21:48:33          3 01E5V5Z9J0GX72VFSENBCKMHF0

We can select the 01E5V4... and 01E5V5... spans with the following code. No Delimiter is needed: S3 rolls any key that contains the delimiter after the prefix into CommonPrefixes and leaves it out of Contents, so a delimiter would hide some objects.

>>> import boto3
>>> s3 = boto3.client('s3')
>>> # Prefix alone selects the 17-minute window
>>> [k['Key'] for k in s3.list_objects_v2(
...     Bucket='t10-blog-ulids',
...     Prefix='01E5V4'
... )['Contents']]
['01E5V475QFRCEHXKJAS3BRS6BV', '01E5V4KXFTP52C9M5DVPQ2XR8T']


>>> [k['Key'] for k in s3.list_objects_v2(
...     Bucket='t10-blog-ulids',
...     Prefix='01E5V5'
... )['Contents']]
['01E5V5Z9J0GX72VFSENBCKMHF0']

As expected, the keys are ordered when they are returned and we can use bitwise operators (a.k.a. magic) to shift whatever time stamp or range we want into an S3 prefixed query. This lets us do time-range based filters without listing every object in the bucket or using an external job like S3 Inventory to list all the object names and timestamps.
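As a rough sketch of that timestamp-to-prefix step, the code below turns a datetime into the leading ULID characters and pages through the matching keys. The names encode_time, window_prefix, and keys_in_window are hypothetical; the S3 client is passed in and is assumed to be a boto3 client, whose list_objects_v2 paginator handles result sets larger than one page.

```python
from datetime import datetime, timezone

# Crockford Base32 alphabet used by the ULID text format.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_time(timestamp_ms: int) -> str:
    """Encode a millisecond timestamp as the 10 leading ULID characters."""
    chars = []
    for _ in range(10):
        timestamp_ms, index = divmod(timestamp_ms, 32)
        chars.append(ALPHABET[index])
    return "".join(reversed(chars))


def window_prefix(moment: datetime, chars: int = 6) -> str:
    """First `chars` ULID characters of the time window containing `moment`."""
    return encode_time(int(moment.timestamp() * 1000))[:chars]


def keys_in_window(s3_client, bucket: str, moment: datetime, chars: int = 6):
    """Yield every key whose ULID falls in the same window as `moment`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=window_prefix(moment, chars))
    for page in pages:
        # "Contents" is absent when a page has no matching keys.
        for item in page.get("Contents", []):
            yield item["Key"]


print(window_prefix(datetime(2020, 4, 14, 13, 56, 30, tzinfo=timezone.utc)))  # 01E5WF
```

With chars=6 each call covers one window of about 17 minutes; a longer range means iterating over consecutive prefixes, or using a shorter prefix and filtering the extra keys client-side.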

Wrapping Up

In this post, we've touched on a couple of ways that semantically-charged identifiers can be useful in your storage layer. Overall, ULIDs and similar specs for sortable identifiers are an improvement on the standard fully-random UUID. They can make your app faster while still avoiding collisions and enumeration attacks, and also can be stored more efficiently (26 characters vs. 36).
