If you have been following along with part 1 and part 2 of this series, you may be wondering what else you can do to further optimize your Kinesis workflows. Fortunately, in this blog post, we will explore some of the more niche features and help you understand when you may (or may not) want to use them.
The next two sections depend on a piece of infrastructure called an Event Source Mapping. An Event Source Mapping handles the compute needed to route messages to Lambda functions. Within the AWS console, these are created simply by creating a Trigger for a Lambda function. This resource contains all the configuration needed to manage how AWS Lambda will interact with Amazon Kinesis. By default, this resource will run one concurrent Lambda invocation per shard.
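For reference, the same mapping can be created through the Lambda API. Below is a minimal sketch of the request in Python; the ARN and function name are placeholders rather than real resources, and the boto3 call itself is left commented out since it requires live AWS credentials:

```python
# Request body for Lambda's CreateEventSourceMapping API. The stream ARN
# and function name below are placeholders, not resources from this series.
create_mapping_request = {
    "EventSourceArn": "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "FunctionName": "example-consumer",
    "StartingPosition": "LATEST",  # or TRIM_HORIZON to start at the oldest record
    "BatchSize": 100,              # records handed to each invocation
}

# With credentials configured, this would be sent as:
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(**create_mapping_request)
print(create_mapping_request["FunctionName"])
```

This is the same resource the console builds for you when you add a Kinesis trigger; every setting discussed below lives on this one object.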
Another way to scale up our streams is to provide a parallelization factor. This value does not technically pertain to the Kinesis stream itself; rather, it is a setting on the Event Source Mapping. That means this solution can only work for problems where the consumer is a Lambda function. The goal of this mechanism is to provide a way to scale up your Lambda consumers without needing to increase the number of shards. Depending on your choice of Parallelization Factor, it can run up to 10 concurrent Lambda invocations per shard.
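As a sketch, and assuming an existing mapping whose UUID you have on hand (the value below is a placeholder), setting the factor is a one-field update to the Event Source Mapping:

```python
# Request body for Lambda's UpdateEventSourceMapping API.
# The UUID is a placeholder for a real mapping's identifier.
update_mapping_request = {
    "UUID": "mapping-uuid-placeholder",
    "ParallelizationFactor": 3,  # valid range is 1-10 concurrent batches per shard
}

# With credentials configured, this would be sent as:
#   import boto3
#   boto3.client("lambda").update_event_source_mapping(**update_mapping_request)
print(update_mapping_request["ParallelizationFactor"])
```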
The immediate reaction to this information should be an understandable concern for the ordering of your messages. However, AWS has implemented this in such a way that order is still guaranteed per partition key. How does AWS achieve this?
Looking back to our previous example with our manually split shards, we notice that each shard contains two unique partition keys. During each new poll of records, the Event Source Mapping will batch records by their unique partition keys and then distribute those batches evenly across the concurrent Lambda invocations. The implication here is that, given our example of 2 unique values per shard, a factor greater than 2 would not improve performance: the Event Source Mapping would never create more than 2 unique sets of batches. This ensures that order is maintained within a partition key; interestingly, however, you no longer have a guarantee of order along a shard. Below is a picture to help illustrate what is happening, with a related example and a parallelization factor of 3:
Notice in this diagram that you cannot guarantee the payload with sequence value 3 is processed before 5; however, you can guarantee that the payload with sequence value 1 will be processed before 5. This is another example of why picking an appropriate partition key for ordered messages is a critical choice in your Kinesis implementation.
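To make the batching behavior concrete, here is a toy Python model of the mechanism described above; the `hash()` call merely stands in for whatever internal key-to-batch assignment AWS actually uses:

```python
from collections import defaultdict

def fan_out(records, parallelization_factor):
    """Toy model of how an Event Source Mapping splits one shard's poll
    into per-invocation batches: records are grouped by partition key,
    and every record for a given key always lands in the same batch,
    preserving per-key order. Illustration only, not AWS's actual code."""
    batches = defaultdict(list)
    for record in records:
        # hash() stands in for the internal key-to-batch mapping.
        slot = hash(record["partition_key"]) % parallelization_factor
        batches[slot].append(record)
    return list(batches.values())

# One shard's poll: two interleaved partition keys, sequence numbers ascending.
poll = [
    {"partition_key": "user-a", "seq": 1},
    {"partition_key": "user-b", "seq": 2},
    {"partition_key": "user-a", "seq": 3},
    {"partition_key": "user-b", "seq": 4},
]

# With only 2 unique keys, a factor of 3 still yields at most 2 batches.
for batch in fan_out(poll, parallelization_factor=3):
    print([r["seq"] for r in batch])
```

Notice that within each printed batch the sequence numbers for any single key stay in order, but no single invocation sees the shard's full sequence, which is exactly the shard-level ordering trade-off described above.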
This solution is ideal for cases where processing the data takes long enough to cause a backlog. For example, suppose the records discussed above fall below the maximum write threshold for a Kinesis shard (1 MB/s or 1,000 records/s), but processing the shard's records takes longer than a second. More concretely, say you receive 500 records per second on a shard, but your Lambda function takes 1.2 seconds to process those 500 records. This means your Lambda will not be able to keep up with your throughput. You could resolve this by increasing the shards, but resharding can only be done so often and results in higher cost. Splitting your 500 records into two batches of 250 (once again, assuming evenly distributed records along your partitions) could roughly halve the time each Lambda invocation takes, meaning you can handle your throughput without having to scale up shards.
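The arithmetic in this example can be sketched as a quick back-of-the-envelope check (the function name is ours, and the model assumes records split evenly across concurrent invocations):

```python
def keeps_up(batch_seconds, parallelization_factor=1):
    """Can the consumer drain one second's worth of records in under a
    second? Splitting the work across more concurrent invocations
    divides the per-invocation processing time (assuming records are
    spread evenly across partition keys)."""
    effective_seconds = batch_seconds / parallelization_factor
    return effective_seconds <= 1.0

# 500 records/s processed in 1.2 s: the consumer falls behind...
print(keeps_up(1.2))                            # False
# ...but a parallelization factor of 2 halves each batch: 0.6 s per invocation.
print(keeps_up(1.2, parallelization_factor=2))  # True
```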
Enhanced Fan Out
Enhanced Fan Out (EFO) is a solution to a different kind of scaling problem. One of the advantages of Kinesis is the ability to have multiple consumers read from a single Kinesis stream. However, on streams with sufficiently high throughput and a sufficient number of consumers, we can run into a couple of scaling problems strictly on the consumer side. The GetRecords API call has some important limits to keep in mind. First, each shard supports only 5 GetRecords transactions per second. A Lambda Event Source Mapping performs 1 call every second, which means that once you add a 6th unique Lambda function, you have exceeded your limit. Kinesis Firehose, a favorite pairing with Kinesis, also performs the call once per second, so a Firehose reduces the room for Lambda event source mappings down to 4. In addition, if a shard returns more than 10 MB within a 5-second window, further reads will be throttled for the remainder of that window. Concretely, if a single call returns 10 MB, then all other calls for the 5-second window will be throttled.
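The shared-throughput budget described above boils down to simple arithmetic; here is a small sketch (the function is ours for illustration, not an AWS API):

```python
GET_RECORDS_TPS_PER_SHARD = 5  # GetRecords calls per second, per shard

def remaining_poll_budget(polling_consumers):
    """How many once-per-second pollers a shard can still support.
    Each standard (shared-throughput) consumer that polls once per
    second -- a Lambda Event Source Mapping or a Firehose delivery
    stream -- uses one of the shard's five GetRecords calls per second."""
    return GET_RECORDS_TPS_PER_SHARD - polling_consumers

print(remaining_poll_budget(5))  # 0: a 6th polling consumer exceeds the limit
print(remaining_poll_budget(2))  # 3: e.g. one Firehose + one Lambda leaves 3
```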
The old workaround for this issue was to duplicate your Kinesis stream, either by having a Lambda function simply forward your records to another stream or by having your producer write to multiple streams. AWS has since implemented another solution for us and created a resource called a ‘consumer’. A consumer acts as a dedicated read channel on your Kinesis stream, with its own throughput and an HTTP/2 endpoint. This means that EFO will actually push records to a subscriber. Each consumer may have exactly one subscriber, and each stream can have up to 20 consumers. Calling the SubscribeToShard API returns an event stream object, which produces real-time values from Kinesis pushed to you over the HTTP/2 endpoint.
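As a sketch, the EFO flow is two API calls: register a consumer on the stream, then subscribe to a shard. All identifiers below are placeholders, and the boto3 calls are commented out because they require live AWS resources:

```python
# Request bodies for the two Kinesis EFO calls (placeholder identifiers).
register_request = {
    "StreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "ConsumerName": "example-efo-consumer",
}
subscribe_request = {
    # ConsumerARN is returned by RegisterStreamConsumer; placeholder shown here.
    "ConsumerARN": "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
                   "/consumer/example-efo-consumer:1",
    "ShardId": "shardId-000000000000",
    "StartingPosition": {"Type": "LATEST"},
}

# With credentials and a real stream, this would run as:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   kinesis.register_stream_consumer(**register_request)
#   sub = kinesis.subscribe_to_shard(**subscribe_request)
#   for event in sub["EventStream"]:  # records are pushed over HTTP/2
#       handle(event["SubscribeToShardEvent"]["Records"])
print(subscribe_request["ShardId"])
```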
EFO is therefore very powerful for two reasons: it allows you to provide dedicated throughput to a single service, and it reduces the effective latency, since records are pushed as they arrive rather than waiting on a poll.
Kinesis is a very powerful AWS tool that provides the ability to interact with data in real time. When dealing with data that must be ordered, Kinesis can present many unexpected issues that are difficult to resolve without a deeper understanding of the service itself. With Kinesis, you must be very aware of how the partition key values you’ve chosen affect the performance of the service.