Data and Analytics

Kinesis Data Streams and its Intricacies

Some of the common challenges and "gotchas" that will help you smoothly implement KDS.

Fenil Patel | Aug 11 2020

Kinesis Data Streams (KDS) is used to collect and process large streams of data records in real time. With a large number of Kinesis shards, a stream can collect gigabytes of data per second and make it available to multiple consumers for real-time processing and analysis.

The classic use cases for Data Streams are:

Collecting real-time metrics and reporting
Real-time data analytics
Accelerated log and data feed intake and processing

For any of the above use cases, a few factors need to be kept under control to keep the stream running smoothly and to avoid data loss when using multiple shards in KDS. In this post, I’m going to review some of the common challenges and "gotchas" so that you can implement KDS smoothly. If you’re looking for more of a background tutorial on what KDS is and how to use it, check out this in-depth video provided by AWS.

Common Challenges

Monitoring Performance

The best way to monitor KDS performance at the shard level is to turn on enhanced monitoring for the stream, which publishes additional shard-level metrics. One of the most helpful of these is IteratorAgeMilliseconds, explained below.

IteratorAgeMilliseconds is the age of the last record in all GetRecords calls made against a KDS, measured over the specified time period and reported in milliseconds. Age is the difference between the current time and the time when the last record of the GetRecords call was received by the stream. The Minimum and Maximum statistics can be used to track the progress of Kinesis consumer applications. A value of zero indicates that the records being read are completely caught up with the stream.
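To make the definition concrete, here is a minimal sketch of the same arithmetic. The function name `iterator_age_ms` and the sample timestamps are made up for illustration; in a real consumer the arrival time would come from the last record returned by GetRecords.

```python
from datetime import datetime, timedelta, timezone


def iterator_age_ms(last_record_arrival: datetime, now: datetime) -> int:
    """Age of the last record read, in whole milliseconds (hypothetical helper)."""
    age = now - last_record_arrival
    # Integer division by a 1 ms timedelta avoids floating-point rounding.
    # Clamp at zero so a small clock skew never produces a negative age.
    return max(0, age // timedelta(milliseconds=1))


# Illustrative values only: the last record arrived 2.5 seconds ago.
now = datetime(2020, 8, 11, 12, 0, 0, tzinfo=timezone.utc)
arrived = now - timedelta(seconds=2, milliseconds=500)
print(iterator_age_ms(arrived, now))  # 2500
```

A consumer that is fully caught up reads records that arrived "just now", which is why the metric tends toward zero in that case.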

Common UpdateShardCount "Gotchas"

Shard count is the key scaling dimension for Data Streams: the number of shards determines the throughput capacity of a data stream, and it can be scaled up or down depending on usage. However, there are a few default limits that are easy to miss if you have little experience with Kinesis.

When scaling Data Streams, you cannot:

Scale more than ten times per rolling 24-hour period per stream
Scale up to more than double your current shard count for a stream
Scale down below half your current shard count for a stream

These easily overlooked limits can leave you wondering late into the night why no data is being pushed to your data streams. Make sure you account for them when designing your architecture to be elastic and available.
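One way to account for the doubling and halving limits is to plan a resize as a series of intermediate shard counts. The sketch below is only an illustration: `plan_resharding` and `MAX_SCALING_OPS_PER_DAY` are hypothetical names, and the code does not call AWS at all. It only works out which targets would stay within the limits listed above.

```python
import math
from typing import List

# Default limit quoted in the list above (per stream, per rolling 24 hours).
MAX_SCALING_OPS_PER_DAY = 10


def plan_resharding(current: int, target: int) -> List[int]:
    """Return the intermediate shard counts needed to go from current to target.

    Each step at most doubles the shard count or cuts it to no less than half.
    """
    if current < 1 or target < 1:
        raise ValueError("shard counts must be positive")
    steps = []
    while current != target:
        if target > current:
            nxt = min(target, current * 2)  # cannot exceed double
        else:
            nxt = max(target, math.ceil(current / 2))  # cannot drop below half
        steps.append(nxt)
        current = nxt
    return steps


up = plan_resharding(4, 20)
down = plan_resharding(20, 3)
print(up)    # [8, 16, 20]
print(down)  # [10, 5, 3]
print(len(up) <= MAX_SCALING_OPS_PER_DAY)  # True
```

Going from 4 to 20 shards takes three scaling operations rather than one, and each of them counts toward the ten-per-day budget, so a stream that resizes frequently can run out of operations well before the day is over.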

Intricacies with Partition Keys

Partition keys are Unicode strings with a maximum length of 256 characters that are used to group data by shard within a stream. A hash function maps each partition key to a 128-bit integer value, and the record is assigned to the shard whose hash key range contains that value. Partition keys help you scale beyond one shard: records that share a partition key are written to the same shard, so even when a stream is spread across many shards, partition keys can help maintain order for the records that need it. Including some information that traces back to the producer, such as a UUID, is a suggested best practice when using multiple shards. Partition keys also count against the total write size and are stored in the stream, so keep them in mind when payload size limits are tight.
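The mapping from partition key to shard can be sketched in a few lines. Kinesis uses MD5 as the hash function; everything else here is an assumption for illustration: the helper names are made up, and the hash key ranges are assumed to be an even split of the 128-bit space, which is not guaranteed for a stream whose shards have been split or merged unevenly.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Assumed layout: the hash space split evenly into inclusive ranges."""
    return [
        (HASH_SPACE * i // shard_count, HASH_SPACE * (i + 1) // shard_count - 1)
        for i in range(shard_count)
    ]


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all ranges")


ranges = even_shard_ranges(4)
first = shard_for_key("producer-1234", ranges)
second = shard_for_key("producer-1234", ranges)
print(first == second)  # True: the same key always maps to the same shard
```

Because the mapping is deterministic, every record with a given key follows the same path, which is what keeps records for that key together.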

Using partition keys in the most effective manner:

If order does not matter, it is most efficient to choose a well-distributed attribute for the partition key, such as a constant that indicates the category combined with some randomness (a timestamp or a random value).
If order does matter, make sure that related records are put sequentially to the same shard to ensure continuity and linearity in streaming.

Following the above advice can help avoid jumbling up the data delivery, but how do we avoid the problem of pushing too much data to a single shard and leaving the rest of the shards cold?
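The two approaches in the list above might look like this in practice. This is a sketch with hypothetical helper names; the `category` and `producer_id` values are placeholders, and a random UUID is just one possible source of randomness.

```python
import uuid

MAX_PARTITION_KEY_LENGTH = 256  # maximum length mentioned earlier in this post


def unordered_partition_key(category: str) -> str:
    """Order does not matter: a constant category plus a random suffix."""
    key = f"{category}-{uuid.uuid4()}"
    if len(key) > MAX_PARTITION_KEY_LENGTH:
        raise ValueError("partition key too long")
    return key


def ordered_partition_key(producer_id: str) -> str:
    """Order matters: reuse one stable key so related records stay together."""
    if not 1 <= len(producer_id) <= MAX_PARTITION_KEY_LENGTH:
        raise ValueError("partition key must be 1 to 256 characters")
    return producer_id


key_a = unordered_partition_key("metrics")
key_b = unordered_partition_key("metrics")
print(key_a != key_b)                       # True: each record gets its own key
print(ordered_partition_key("sensor-42"))   # sensor-42
```

The trade-off is that the stable key keeps related records together but concentrates that producer's traffic, while the random suffix spreads the load and gives up ordering.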

Avoiding Hot and Cold Shards

Choosing an identifier such as a client ID as the partition key can backfire when a few clients produce far more data than the others. This can leave a few shards over-utilized while the other shards are underutilized.

Aim for at least ten times as many distinct partition keys as there are shards. As mentioned earlier, if ordering is not important, randomized partition keys can be used to eliminate hot and cold shards.
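A quick way to check a key scheme against this guideline is to look at a sample of the partition keys your producers actually emit. The sketch below uses made-up sample data and a hypothetical `key_distribution_report` helper; it reports how many distinct keys there are per shard and how much of the traffic the busiest key carries.

```python
from collections import Counter
from typing import Dict, Sequence


def key_distribution_report(partition_keys: Sequence[str], shard_count: int) -> Dict:
    """Summarize a sample of partition keys against the 10x guideline."""
    if not partition_keys or shard_count < 1:
        raise ValueError("need at least one key and one shard")
    counts = Counter(partition_keys)
    top_key, top_count = counts.most_common(1)[0]
    return {
        "distinct_keys": len(counts),
        "meets_10x_guideline": len(counts) >= 10 * shard_count,
        "top_key": top_key,
        "top_key_share": top_count / len(partition_keys),
    }


# Made-up sample: one overactive client and twenty quiet ones, four shards.
sample = ["client-a"] * 80 + [f"client-{i}" for i in range(20)]
report = key_distribution_report(sample, shard_count=4)
print(report["distinct_keys"])        # 21
print(report["meets_10x_guideline"])  # False
print(report["top_key_share"])        # 0.8
```

In this sample a single client accounts for 80% of the records, so whichever shard its key hashes to will run hot no matter how many shards are added.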

With enhanced monitoring, the IteratorAgeMilliseconds metric shows the oldest message age on each shard, making it easier to find hot shards. Enhanced monitoring provides shard-level metrics that would otherwise be invisible with regular monitoring. There is a cost associated with turning it on, and it can be enabled using the AWS CLI, console, or API.
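Once per-shard iterator ages are available, picking out the shards that are falling behind is a simple filter. The sketch below assumes the maximum iterator age per shard has already been fetched from the monitoring system; the `lagging_shards` helper, the shard IDs, the values, and the 60-second threshold are all illustrative.

```python
from typing import Dict, List, Tuple


def lagging_shards(
    max_iterator_age_ms: Dict[str, int], threshold_ms: int
) -> List[Tuple[str, int]]:
    """Return (shard_id, age) pairs above the threshold, worst first."""
    lagging = [
        (shard_id, age)
        for shard_id, age in max_iterator_age_ms.items()
        if age > threshold_ms
    ]
    return sorted(lagging, key=lambda item: item[1], reverse=True)


# Illustrative values only.
ages = {
    "shardId-000000000000": 120,
    "shardId-000000000001": 95_000,
    "shardId-000000000002": 640_000,
}
result = lagging_shards(ages, threshold_ms=60_000)
print(result)
# [('shardId-000000000002', 640000), ('shardId-000000000001', 95000)]
```

A shard that stays at the top of this list while the others sit near zero is a good candidate for a closer look at which partition keys are landing on it.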

These are some of the common challenges faced when using KDS, and paying attention to them during development can help you avoid the usual pitfalls.
