CloudWatch: Deep Dive

Andy Warzon | Jul 10 2018

This is the second in a series of posts about monitoring your production workloads in AWS. In the first post, we gave a high-level overview of cloud monitoring and broke it down into six types of metrics you should be monitoring. Here we’ll dive deeper into one of those areas, CloudWatch metrics, and give you a few tips for getting the most out of CloudWatch in production.

CloudWatch Metrics is a fairly well-known and straightforward AWS service. If you’re monitoring a production environment in AWS, it should be at the top of your list of services to dive into and get comfortable with. This is especially true if, like Trek10, you are building and running apps that increasingly live outside EC2 (aka platform services or serverless): there, CloudWatch is the new Linux top command, the most fundamental and basic insight into your running environment. For the uninitiated, we’ll start with a quick overview. Feel free to skip ahead to the tips if this is old hat for you.

CloudWatch Metrics Overview

CloudWatch is actually made up of three (only loosely related) services: CloudWatch Metrics, CloudWatch Logs, and CloudWatch Events. We will focus only on Metrics here. Check out our post from last year about CloudWatch Scheduled Events, and look for an upcoming one on CloudWatch Logs.

Here is a short summary of CloudWatch Metrics:

- CloudWatch metrics are simply time series data points emitted by AWS services or published to CloudWatch through the API.
- With EC2, CloudWatch gives you metrics from “outside the VM”, i.e. at the hypervisor level. For services that expose no VM, CloudWatch data is your only insight into the service’s operation.
- You can access metrics from the interactive console explorer, console dashboards, or the API, or you can pull them into your own monitoring tool.
- Just about every monitoring tool on the planet now supports importing CloudWatch metrics. If yours doesn’t, try our favorite, Datadog.
- One-minute data is retained for 15 days, five-minute data for 63 days, and one-hour data for 15 months (455 days).
- Most services deliver metrics at one-minute resolution, but some publish less frequently.
- You can push custom metrics into CloudWatch at up to one-second (high) resolution; sub-minute data points are retained for 3 hours before being aggregated.
- You can trigger CloudWatch Alarms off metrics or import them into your own tool for alerting.
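As a minimal sketch of publishing a high-resolution custom metric, the Python below builds the keyword arguments for the CloudWatch PutMetricData call (as exposed by boto3’s put_metric_data). The namespace “MyApp”, the metric “OrdersPlaced”, and the “Environment” dimension are hypothetical placeholders. Only the payload is built here, so the sketch runs without AWS credentials.

```python
from datetime import datetime, timezone


def high_res_metric_data(namespace, name, value, dimensions, unit="Count",
                         timestamp=None):
    """Build keyword arguments for CloudWatch PutMetricData.

    StorageResolution=1 marks the data point as high resolution (one second);
    omitting it (or using 60) stores it at standard one-minute resolution.
    """
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": timestamp or datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
                "StorageResolution": 1,
            }
        ],
    }


# Hypothetical metric; with boto3 installed and credentials configured you could send it:
#   boto3.client("cloudwatch").put_metric_data(**kwargs)
kwargs = high_res_metric_data("MyApp", "OrdersPlaced", 3, {"Environment": "prod"})
```

Keep in mind that, per the retention rules above, the one-second detail is only available for 3 hours; after that you are looking at aggregated data.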

If you’re interested in diving deeper, ACloudGuru has a great set of lessons on CloudWatch metrics as part of its Certified SysOps Administrator Associate course.

Some Tips For Getting More out of CloudWatch

Enough with the basics. Let’s get into a few more interesting notes and tricks that come from Trek10’s experience with CloudWatch.

Metric Visibility Delays

We often get questions about this from people who are used to seeing their VM metrics in near real time. We find that CloudWatch metrics typically take about 2 minutes to show up in AWS (in the console and API); in other words, the data point for 10:15 becomes visible at roughly 10:17.
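If you query one-minute metrics yourself, one way to account for this lag is to stop the query window at the newest data point you expect to be visible. The sketch below assumes the roughly 2-minute lag described above (a tunable parameter, not an AWS guarantee) and relies on GetMetricData treating EndTime as exclusive; the function name settled_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(minutes=1)  # one-minute CloudWatch data points


def settled_window(now, lookback_minutes=15, lag_minutes=2):
    """Return (start, end) for a one-minute metric query that stops at the
    newest data point expected to be visible, given the publishing lag.

    lag_minutes is the delay between a data point's timestamp and the moment
    it shows up (about 2 minutes in our experience).
    """
    # Newest visible data point: now minus the lag, floored to the whole minute.
    newest = (now - timedelta(minutes=lag_minutes)).replace(second=0, microsecond=0)
    # GetMetricData treats EndTime as exclusive, so end one period later.
    end = newest + PERIOD
    start = end - timedelta(minutes=lookback_minutes)
    return start, end


now = datetime(2018, 7, 10, 10, 17, 30, tzinfo=timezone.utc)
start, end = settled_window(now)
print(start.time(), end.time())  # 10:01:00 10:16:00
```

At 10:17:30 the window ends just after the 10:15 data point, matching the example above.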

If you use an external monitoring tool to import your CloudWatch metrics, its polling adds further delay. We believe that having a tool that can aggregate all of your metrics is well worth this downside, as long as the delay stays small. With Trek10’s monitoring platform of choice, Datadog, the total delay from metric origination to availability in Datadog is about 10-12 minutes. Crucially (and we salute Datadog for developing this awesome feature), they can speed up your polling behind the scenes so that the total delay is only about 4-5 minutes, or roughly 2-3 minutes longer than accessing the data natively in CloudWatch. We find this to be just fine for almost all use cases. Contact Datadog Support if you’d like this feature enabled. One key warning: this will increase your AWS CloudWatch costs. Keep reading.

Watch GetMetricData Costs

If you are using an external monitoring tool, watch out for the cost of the API calls it makes to read your metrics. GetMetricData is billed at $0.01 per 1,000 metrics requested (GetMetricStatistics at $0.01 per 1,000 requests). There are some details about how much you can get out of one request, but the bottom line is that your costs grow multiplicatively with the number of AWS services you use, the metric dimensions within those services, and the frequency of polling. For example, a typical Lambda function emits four CloudWatch metrics: invocations, duration, errors, and throttles. If you have 50 Lambda functions in your account, your monitoring tool has to retrieve 50 × 4 = 200 metric/dimension combinations on every polling cycle. The same math applies to any dimension used by CloudWatch: Auto Scaling groups, S3 buckets, SNS topics, and on and on. It is worth a brief browse of the CloudWatch console to understand which metrics can affect this cost.

If you’re polling AWS every couple of minutes for hundreds or even thousands of metric/dimension combinations, you can see how this cost can quickly add up to hundreds of dollars per month.
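To make the arithmetic concrete, here is a rough estimator. It assumes every poll requests every metric exactly once, a 30-day month, and GetMetricData’s list price of $0.01 per 1,000 metrics requested; check current CloudWatch pricing for your region, and note that real tools may batch or skip metrics differently. The function name is hypothetical.

```python
def monthly_polling_cost(metric_count, poll_interval_minutes, days=30,
                         price_per_1000_metrics=0.01):
    """Estimate the monthly GetMetricData cost for a tool that requests every
    metric once per polling cycle (a simplifying assumption)."""
    polls = days * 24 * 60 / poll_interval_minutes
    metrics_requested = metric_count * polls
    return metrics_requested / 1000 * price_per_1000_metrics


lambda_metrics = 50 * 4  # 50 functions x 4 metrics each = 200 combinations
print(round(monthly_polling_cost(lambda_metrics, 2), 2))  # 43.2
print(round(monthly_polling_cost(1000, 2), 2))            # 216.0
```

Polling every 2 minutes means 21,600 polling cycles a month, so even the 50-function example costs about $43, and 1,000 combinations reaches about $216.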

AWSWishList: Polling for CloudWatch metrics is remarkably inefficient: AWS really needs to create a better system for bulk export of metrics at high frequency.

Be Thorough

The key to a good CloudWatch monitoring plan is depth. If you monitor just a few obvious things like RDS CPU and Lambda errors, you will likely miss some critical warning signs of production problems. Every AWS service thoroughly documents the CloudWatch metrics it publishes; to give you an idea, look at the metric lists for IoT Core and Step Functions. For every service you work with, dive deep into this list and understand what is available and why it matters.

Some metrics are obvious candidates for alerting, like DynamoDB throttles: these are a critical production issue if they happen. But even metrics you don’t alert on can feed incredibly insightful dashboards for analyzing problems when they arise. For example, imagine a simple serverless REST API built on API Gateway, Lambda, and DynamoDB. Your critical metric might be the rate of HTTP 5XX errors on API Gateway, but when this rate hits a concerning threshold you need to be able to dig deeper quickly. Your dashboard might contain API Gateway error rates and request volume, as well as Lambda error rates, Lambda throttles, and a variety of DynamoDB error metrics such as ReadThrottleEvents, WriteThrottleEvents, and SystemErrors. Seeing all of these CloudWatch metrics on a single screen lets you quickly drill in on the source of the problem.
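A dashboard like the one just described can be kept in code. The sketch below builds a CloudWatch dashboard body (JSON with a “widgets” list of metric widgets) for the hypothetical API “my-api”, function “my-function”, and table “my-table”; the region and the DynamoDB operations listed are assumptions to adjust. It uses the Sum statistic throughout; for API Gateway’s 5XXError, the Average statistic would show the error rate instead.

```python
import json


def triage_dashboard(api_name, function_name, table_name, region="us-east-1",
                     operations=("GetItem", "PutItem")):
    """Build a dashboard body for an API Gateway -> Lambda -> DynamoDB API."""
    def widget(title, metrics, y):
        # One full-width (24-column) metric graph per layer, stacked vertically.
        return {
            "type": "metric", "x": 0, "y": y, "width": 24, "height": 6,
            "properties": {"title": title, "region": region, "stat": "Sum",
                           "period": 60, "metrics": metrics},
        }

    api = [["AWS/ApiGateway", m, "ApiName", api_name]
           for m in ("5XXError", "4XXError", "Count")]
    fn = [["AWS/Lambda", m, "FunctionName", function_name]
          for m in ("Errors", "Throttles", "Invocations")]
    ddb = [["AWS/DynamoDB", m, "TableName", table_name]
           for m in ("ReadThrottleEvents", "WriteThrottleEvents")]
    # SystemErrors is reported per Operation, so add one series per operation.
    ddb += [["AWS/DynamoDB", "SystemErrors", "TableName", table_name, "Operation", op]
            for op in operations]
    return json.dumps({"widgets": [
        widget("API Gateway", api, 0),
        widget("Lambda", fn, 6),
        widget("DynamoDB", ddb, 12),
    ]})


body = triage_dashboard("my-api", "my-function", "my-table")
# With boto3: boto3.client("cloudwatch").put_dashboard(
#     DashboardName="api-triage", DashboardBody=body)
```

Keeping the body in code means the same triage view can be stamped out for every API you run, rather than rebuilt by hand in the console.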

Trusted Advisor Metrics

One of our favorite hidden CloudWatch features came out relatively recently: Trusted Advisor metrics. Trusted Advisor is the AWS service, available with Business or Enterprise Support, that checks a wide variety of usage details across your AWS account and delivers insights into cost optimization, performance, security, and fault tolerance.

There are two groups of CloudWatch Trusted Advisor metrics. The green/yellow/red metrics simply count the number of checks, or of resources checked, at each alert level, so you can easily set up an alarm that fires, for example, if you have at least one red check. More interesting, though, is the second group: service limit metrics. There are a wide variety of service limits across AWS, and hitting one of them in production is a surprisingly common cause of outages. These metrics report the percentage of utilization against each service limit, giving you a simple one-stop shop for warnings about these issues. Just set your warning threshold at, say, 75% for each combination of the ServiceLimit, ServiceName, and Region dimensions and you’re all set.
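The 75% service-limit alarm described above can be expressed as keyword arguments for CloudWatch PutMetricAlarm (boto3’s put_metric_alarm). This is a sketch under several assumptions: ServiceLimitUsage is reported on a 0-100 Percent scale, a one-hour period suits how often Trusted Advisor refreshes, and the limit/service/region values below are hypothetical placeholders (copy the real dimension values from the CloudWatch console). We understand Trusted Advisor publishes these metrics in US East (N. Virginia), so the client should target us-east-1; verify this for your account.

```python
def service_limit_alarm(service_limit, service_name, region, threshold=75.0,
                        sns_topic_arn=None):
    """Build put_metric_alarm kwargs that fire when Trusted Advisor reports a
    service limit at or above `threshold` percent utilization."""
    kwargs = {
        "AlarmName": f"ta-limit-{service_name}-{region}-{service_limit}"[:255],
        "Namespace": "AWS/TrustedAdvisor",
        "MetricName": "ServiceLimitUsage",
        "Dimensions": [
            {"Name": "ServiceLimit", "Value": service_limit},
            {"Name": "ServiceName", "Value": service_name},
            {"Name": "Region", "Value": region},
        ],
        "Statistic": "Maximum",
        "Period": 3600,           # assumption: Trusted Advisor refreshes infrequently
        "EvaluationPeriods": 1,
        "Threshold": threshold,   # assumes a 0-100 Percent scale
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "ignore",  # keep the last state between refreshes
    }
    if sns_topic_arn:
        kwargs["AlarmActions"] = [sns_topic_arn]
    return kwargs


# Hypothetical limits to watch: (ServiceLimit, ServiceName, Region)
limits = [("Running On-Demand instances", "EC2", "us-east-1")]
alarms = [service_limit_alarm(*limit) for limit in limits]
# With boto3: cw = boto3.client("cloudwatch", region_name="us-east-1")
#             for a in alarms: cw.put_metric_alarm(**a)
```

Generating one alarm per dimension combination keeps the threshold consistent across every limit you care about.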

That’s all for now. Look for more posts on other aspects of monitoring in the coming weeks, and in the meantime follow us @Trek10Inc, and let us know if we can help you with your cloud monitoring.

This is the second in a series of posts about monitoring production workloads in AWS. Posts in the series include:

All The Metrics - A Cloud Monitoring Blueprint
CloudWatch: Deep Dive (this post)
Custom Metrics Deep Dive

Author

Andy Warzon

Founder & CTO, Andy has been building on AWS for over a decade and is an AWS Certified Solutions Architect - Professional.
