Data and Analytics

Exploring the Depths of Kinesis Data Streams - Part 1: Partitioning

A deep exploration of how Kinesis works and some of the more complicated challenges.

Ryan Farina | Feb 02 2023
5 min read

At Trek10, we are heavy users of Amazon Kinesis Data Streams for building scalable, reliable, and cost-effective cloud native data architectures. In a blog post from a couple of years ago, we discussed several high-level concepts to keep in mind when working with Kinesis Data Streams. But to really build enterprise-scale solutions, you need to go much deeper. In this blog series, we will expand upon that baseline knowledge, dive a bit deeper into how Kinesis works, and explore some more complicated Kinesis concepts. At a quick glance, some of the advanced topics we will cover over three parts include:

Shard Scaling
Parallelization Factor
Enhanced Fan Out

Before we jump into any of these, it makes sense to talk a bit further about what exactly Kinesis is and how it works. It helps to consider Kinesis Data Streams as actual storage. This storage has special properties such as:

Each stream has a well-defined Time-to-Live (known as RetentionPeriod)
Each record has a timestamp for when it was inserted
Each record has a partition key provided by the producer
The data is partitioned into logical groups called shards based on partition key
Within a shard, the records are ordered based on when they were inserted
There is a mechanism called an ‘Iterator’ which can be pointed at any timestamp and will then return all records in order from that point

Some liberty is taken with the concept of a timestamp here: there is no explicit timestamp value that you set or order by. Instead, ordering is handled behind the scenes by tracking sequence IDs (called sequence numbers in the Kinesis API). Every record within Kinesis has a Sequence ID, a Partition Key, and a Data Blob. Kinesis assigns the Sequence ID; the application producer provides the partition key and data blob. Understanding that Kinesis is a store of data and not a simple queue helps greatly in figuring out how it differs from other services that get used in similar scenarios.
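To make the storage-versus-queue distinction concrete, here is a minimal in-memory sketch. It is a toy model, not how Kinesis is implemented: `ToyShard`, `Record`, `put`, and `iterate_from` are hypothetical names, and the small consecutive integers stand in for the much larger sequence values Kinesis assigns.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List


@dataclass(frozen=True)
class Record:
    sequence_id: int    # assigned by the store, never by the producer
    partition_key: str  # supplied by the producer
    data: bytes         # supplied by the producer


@dataclass
class ToyShard:
    """Hypothetical append-only, ordered log. Reading never removes records."""
    records: List[Record] = field(default_factory=list)
    _next_id: Iterator[int] = field(default_factory=lambda: count(1), repr=False)

    def put(self, partition_key: str, data: bytes) -> Record:
        # The store, not the caller, decides the position of the record.
        record = Record(next(self._next_id), partition_key, data)
        self.records.append(record)
        return record

    def iterate_from(self, sequence_id: int) -> Iterator[Record]:
        """Yield every record at or after sequence_id, in insertion order."""
        return (r for r in self.records if r.sequence_id >= sequence_id)


shard = ToyShard()
for payload in (b"a", b"b", b"c"):
    shard.put("device-1", payload)

# Two independent readers starting from the same position see the same data.
first_read = [r.data for r in shard.iterate_from(2)]
second_read = [r.data for r in shard.iterate_from(2)]
```

Because reading does not delete anything, both reads return the same records. That is the property that separates this model from a simple queue, where a message is typically gone once it has been consumed.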

Partition Keys

Partition keys were discussed in the previous post, but they are such a crucial part of what makes Kinesis work that they deserve additional detail here. Implementations that do not require messages to be read in order can treat partition keys as nothing more than random values. However, for implementations where order is paramount to effective delivery, you must carefully consider the choice of key.

Each shard in a given stream is assigned a hash range. The shard contains all records whose hashed partition key values fall within that range. To discover the hash ranges of your shards, you can call the DescribeStream API, which returns a payload that includes the minimum and maximum hash values (StartingHashKey and EndingHashKey) for every shard currently provisioned. When you provide a partition key value, that value is passed through a hash function, and the resulting hash is mapped to the shard whose range contains it.
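The mapping described above can be sketched in a few lines. This is an illustration under stated assumptions, not the service's code: it relies on the documented behavior that Kinesis maps partition keys to 128-bit integers with MD5, it assumes the digest is read as a big-endian unsigned integer, and it assumes the shards split the key space evenly (real ranges should be read from DescribeStream). The helper names `hash_partition_key`, `even_hash_ranges`, and `shard_for_key` are hypothetical.

```python
import hashlib
from typing import List, Tuple

MAX_HASH = 2**128 - 1  # MD5 yields a 128-bit value


def hash_partition_key(partition_key: str) -> int:
    """Hash a partition key into the 128-bit key space (byte order is an assumption)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_hash_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split [0, MAX_HASH] into contiguous, inclusive ranges (hypothetical helper)."""
    size = (MAX_HASH + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the hashed key."""
    hashed = hash_partition_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash ranges do not cover the key space")


ranges = even_hash_ranges(2)
assignments = {key: shard_for_key(key, ranges) for key in ("foo", "bar", "biz", "baz")}
```

The same key always produces the same hash, so it always lands in the same shard. That determinism is what gives per-key ordering, and it is also why a poor choice of key cannot be corrected by adding shards alone.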

This level of detail helps explain why it is so important to consider your choice of partition key carefully. If you make the uninformed choice of providing a single value as the partition key for all records, then every record will have the identical hash value and land in the same shard. This means you can never efficiently scale your workload, and any choice of shard count or parallelization factor would be irrelevant. It also creates interesting possibilities for hot shards, described in the other post, to occur in unexpected ways. Let's use an example to showcase this possibility.
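A quick simulation shows the effect of a single partition key. This is a rough sketch under assumptions: four evenly split shards, MD5 read as a big-endian integer, and random UUIDs standing in for a high-cardinality key. `shard_index` is a hypothetical helper, not a Kinesis API.

```python
import hashlib
import uuid
from collections import Counter

SHARD_COUNT = 4      # assumed shard count for the simulation
HASH_SPACE = 2**128  # size of the MD5 key space


def shard_index(partition_key: str) -> int:
    """Map a key to one of SHARD_COUNT evenly sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * SHARD_COUNT // HASH_SPACE


# Every record uses the same key, so every record lands in the same shard.
constant_key_load = Counter(shard_index("same-key") for _ in range(1000))

# Every record uses a fresh random key, so records spread across shards.
random_key_load = Counter(shard_index(str(uuid.uuid4())) for _ in range(1000))
```

With the constant key, `constant_key_load` has exactly one entry no matter how large `SHARD_COUNT` is, so adding shards adds cost without adding throughput. The random keys spread across the shards, at the price of giving up any ordering between related records.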

We will start with two shards and a simplified hash range of [0,9]. That is, all records will be hashed to values between 0 and 9 inclusive. Our first shard will be given the values [0,4] and our second shard the values [5,9]. If we choose a partition key with a small, discrete set of values, an interesting problem can occur. Suppose the partition key for each record is one of the following four values, each appearing equally often: foo, bar, biz, and baz. Each value will have its own consistent hash value. A good hash function distributes values evenly across its valid hash range; that is, every number in the range is equally likely to be the hash of a given input, regardless of the characteristics of that input. Let’s go ahead and perform our hash function on these values and see where we end up:

At this point, the issue should start to come into focus. Three of the four values have been given a hash value in the range 0 to 4. An even distribution only emerges over many distinct keys; with just four, a lopsided split like this is entirely plausible. This means that, assuming all partition keys have identical velocities, the first shard will receive three times as much traffic as the second shard.
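The arithmetic behind that skew can be checked with a small sketch. The hash values below are hypothetical: they are chosen only so that three keys fall in [0,4] and one falls in [5,9], matching the outcome described above, and they are not the output of any real hash function. The names `SHARD_RANGES`, `shard_for_hash`, and `traffic_share` are likewise illustrative.

```python
from collections import Counter
from typing import Dict

# Simplified hash ranges from the example: two shards covering [0, 9].
SHARD_RANGES = {"shard-1": (0, 4), "shard-2": (5, 9)}

# HYPOTHETICAL hash values, picked to reproduce the 3-to-1 split described
# in the text. They are not computed by a real hash function.
HYPOTHETICAL_HASHES = {"foo": 1, "bar": 3, "biz": 4, "baz": 7}


def shard_for_hash(hashed: int) -> str:
    """Return the shard whose inclusive range contains the hash value."""
    for name, (low, high) in SHARD_RANGES.items():
        if low <= hashed <= high:
            return name
    raise ValueError(f"hash {hashed} is outside every shard range")


def traffic_share(records_per_key: int) -> Dict[str, float]:
    """Fraction of total traffic per shard when every key sends equally."""
    load: Counter = Counter()
    for hashed in HYPOTHETICAL_HASHES.values():
        load[shard_for_hash(hashed)] += records_per_key
    total = sum(load.values())
    return {name: records / total for name, records in load.items()}


share = traffic_share(records_per_key=100)
```

Here `share` comes out to 0.75 for the first shard and 0.25 for the second, regardless of `records_per_key`, because the skew comes from where the keys hash and not from how much each key sends.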

Understanding how records are distributed between shards is crucial for performance, but most importantly it informs how you can scale efficiently. Join us in the next post, where we dive deeper still into the ways we can scale our solution.
