Serverless Architectures: S3 Data Loading

Andy Warzon | Sep 13 2016


Serverless on AWS represents a new way to architect highly performant, highly resilient, and very low-maintenance cloud-native systems. In this occasional series, we’re going to review some interesting design patterns for Serverless systems on AWS. Hopefully this will give you an idea of what you can build and how to apply these concepts to your own use case. Some of the most interesting Serverless architectures combine several of these patterns into a single system.

If you’d like a more high-level introduction to Serverless, check out this post.

Today, we’ll focus on a relatively simple but very powerful pattern for a wide variety of ETL applications…

S3 Data Loading

There are many ways to pipeline data from a producer to a final destination. S3 and Lambda represent a very simple and low-cost approach that can also be extended to be very powerful. It is NOT appropriate if you need truly real-time visibility, but “real-time” is a relative term. If you can accept delays of perhaps 15-60 seconds, this is a good solution, and it is far simpler than a truly real-time approach that uses Kinesis.

In its most basic form, your data producer (e.g. a web app or client system) periodically pushes data to S3 in some flat file format. The resulting S3 event triggers a Lambda function, which processes the data and pushes it into the final destination, such as Redshift, another SQL database, or perhaps back into S3 in some archived form.
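As a minimal sketch of the trigger side of this pattern, the handler below pulls the bucket and key out of each record in an S3 event notification. The names `extract_s3_objects` and `handler` are illustrative, and the actual load step is left as a comment because it depends on your destination.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point: visit each newly created file."""
    seen = []
    for bucket, key in extract_s3_objects(event):
        # A real function would load the file into the destination here
        # (that step is an assumption and is not shown in this sketch).
        seen.append(f"s3://{bucket}/{key}")
    return {"files": seen}
```

Keeping the event parsing in its own function makes it easy to unit test with a hand-written event, without touching AWS.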

The Redshift COPY command, invoked in the Lambda function, is a straightforward and efficient way to load the data into Redshift. AWS outlined this in more detail in this great blog post.
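For illustration, here is one way the Lambda function might assemble the COPY statement before sending it to Redshift over a regular SQL connection. This is a sketch under assumptions: the table name, role ARN, and CSV format are placeholders, and `build_copy_statement` is a hypothetical helper rather than part of any AWS library.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, manifest=False):
    """Build a Redshift COPY statement that loads CSV data from S3.

    The inputs are interpolated directly into SQL, so they must come from
    trusted configuration, never from user-supplied data.
    """
    options = ["FORMAT AS CSV"]
    if manifest:
        # With MANIFEST, s3_uri points at a manifest file listing many objects.
        options.append("MANIFEST")
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        + " ".join(options)
        + ";"
    )


if __name__ == "__main__":
    print(build_copy_statement(
        "events",
        "s3://my-bucket/incoming/batch-001.csv",
        "arn:aws:iam::123456789012:role/RedshiftLoadRole",
    ))
```

The resulting string would then be executed with whatever database driver your function bundles; that choice is outside the scope of this sketch.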

To make this approach more robust, we always make sure to move the files from some /incoming folder in S3 to a /processed folder after a successful load. This makes it easy to identify and reprocess failures, and you can add an S3 Lifecycle rule to the /processed folder to archive or delete the raw data after some period.
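The move itself can be sketched as a copy followed by a delete, since S3 has no native rename. The function below assumes `s3` is a boto3 S3 client (for example `boto3.client("s3")`) passed in by the caller, and that the prefixes are `incoming/` and `processed/`; both helper names are illustrative.

```python
def processed_key(key, incoming_prefix="incoming/", processed_prefix="processed/"):
    """Map an incoming/ key to its processed/ counterpart."""
    if not key.startswith(incoming_prefix):
        raise ValueError(f"key {key!r} is not under {incoming_prefix!r}")
    return processed_prefix + key[len(incoming_prefix):]


def move_to_processed(s3, bucket, key):
    """Copy the object to processed/ and delete the original.

    Call this only after the load has succeeded, so that anything still
    sitting in incoming/ is known to be unprocessed or failed.
    """
    new_key = processed_key(key)
    s3.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3.delete_object(Bucket=bucket, Key=key)
    return new_key
```

Passing the client in as an argument keeps the function easy to test with a stub and avoids creating a new client on every call.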

Of course, this simple version has some limitations:

1. The data producer has to be responsible for buffering and batching the data so it can send intermittent files to S3. This is more effort on the application side, not the “fire and forget” scenario that may be preferable from the producer’s (i.e. the app’s) perspective.
2. If there are many data producers and they do not send data in a predictable and consistent manner, a flood of data could overload the database, since Lambda will fire with very high concurrency.

A more robust design

We can evolve this basic pattern to solve both of these problems. To solve #1, we simply put Kinesis Firehose in the middle. Kinesis Firehose is a fully managed streaming data queue that handles buffering and batching your data and pushing it into S3. You can push data in via the API, or you can use the Linux agent to continuously monitor a set of files and let the agent handle sending new data to Kinesis.
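From the producer’s side, pushing data in via the API might look like the sketch below. It assumes `firehose` is a boto3 Firehose client (for example `boto3.client("firehose")`) and that events are sent as newline-delimited JSON; the function names and the stream name are hypothetical.

```python
import json


def to_firehose_records(events):
    """Serialize events as newline-delimited JSON records."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]


def send_events(firehose, stream_name, events, batch_size=500):
    """Send events to a delivery stream and return how many records failed.

    put_record_batch accepts at most 500 records per call, so larger
    inputs are split into chunks.
    """
    records = to_firehose_records(events)
    failed = 0
    for start in range(0, len(records), batch_size):
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + batch_size],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

A production producer would also retry the individual records reported as failed; that is omitted here to keep the sketch short.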

To solve #2, we need to introduce a centralized queue. There are a few ways to do this. One is the Kinesis Firehose approach discussed above: because a single system does all the buffering and batching, the “many loaders” problem goes away. But this approach may not be ideal. If you need FIFO, you could use an actual queue like SQS (loose FIFO), or a store such as DynamoDB or Redis. But in most cases, FIFO is not critical, and you can simply use S3 as your queue. The process works like this:

1. Data producers drop files into the S3 /incoming folder.
2. A scheduled Lambda function runs once a minute (or less often if appropriate) and lists the files in S3 /incoming.
3. The Lambda function loads up to X files, where X is chosen so that your destination is not overloaded. For typical destinations this load should be iterative, but for Redshift you can load the files in parallel with a single COPY command to maximize performance.
4. The Lambda function sends a custom CloudWatch metric with the number of unprocessed files (the count of files listed in step 2 minus the number of files processed in step 3).
5. A CloudWatch alarm can warn you if this metric grows too high or does not drop, indicating you may need to increase X and your destination’s write capacity.
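The scheduled loader described in these steps can be sketched roughly as follows. The sketch assumes `s3` and `cloudwatch` are boto3 clients passed in by the caller, `load_files` is your own callable that performs the load, and the namespace and metric name (`DataLoader`, `UnprocessedFiles`) are made-up placeholders.

```python
def list_incoming(s3, bucket, prefix="incoming/"):
    """List every key under the prefix, following pagination."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def run_scheduled_load(s3, cloudwatch, bucket, load_files, max_files):
    """Load at most max_files files and publish the remaining backlog."""
    keys = list_incoming(s3, bucket)
    batch = keys[:max_files]          # cap the batch to protect the destination
    if batch:
        load_files(batch)
    backlog = len(keys) - len(batch)  # files listed minus files processed
    cloudwatch.put_metric_data(
        Namespace="DataLoader",
        MetricData=[{
            "MetricName": "UnprocessedFiles",
            "Value": backlog,
            "Unit": "Count",
        }],
    )
    return backlog
```

An alarm on the published metric then covers the final step; the cap `max_files` plays the role of X and is the main knob to tune.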

If load time is a concern (Lambda had a maximum timeout of 5 minutes when this was written; the limit has since been raised to 15 minutes), you could use an initial Lambda function to list the files and then the Lambda Fanout pattern to have individual Lambda functions perform each load in parallel, again tuning for your particular data loading characteristics so you don’t overload your destination.
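A minimal sketch of that fan-out step is shown below, assuming `lambda_client` is a boto3 Lambda client (for example `boto3.client("lambda")`) and that a separate worker function exists to load a single file. The worker’s name and its payload shape are hypothetical.

```python
import json


def fan_out(lambda_client, worker_function, bucket, keys, max_workers):
    """Asynchronously invoke one worker Lambda per file, up to max_workers.

    Returns the keys that were dispatched; the rest wait for the next run.
    """
    dispatched = keys[:max_workers]
    for key in dispatched:
        lambda_client.invoke(
            FunctionName=worker_function,
            InvocationType="Event",  # asynchronous: do not wait for the result
            Payload=json.dumps({"bucket": bucket, "key": key}).encode("utf-8"),
        )
    return dispatched
```

Here `max_workers` caps how many loads run at once, which is the tuning point for protecting the destination.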

Stepping back, the bottom line to keep in mind here is that S3 → Lambda → destination is a simple and powerful method for managing a near-real-time data pipeline. There are several flavors of this approach to fit your use case. But at the end of the day, you have almost unlimited scalability, no computing or storage infrastructure to manage, no complex ETL tools, and very low total cost. So give it a try, or hit us up if you’d like some help!

Author

Andy Warzon

Founder & CTO, Andy has been building on AWS for over a decade and is an AWS Certified Solutions Architect - Professional.
