
CloudWatch: Custom Metrics Deep Dive

Andy Warzon | Sep 24, 2018

This is the third in a series of posts about monitoring your production workloads in AWS. In the first post, we did a high-level overview of cloud monitoring and broke it down into six types of metrics you should be monitoring, and in the second we dove deep into CloudWatch. Today, we’ll do another deep dive, this time into custom metrics.

While custom metrics are an afterthought for many teams when they first operationalize their systems, we view them as one of the things that separate “good” operations from “great” ones.

  • Doing it right upfront will make your system more reliable and performant and your operational analysis more efficient.
  • A few well-placed custom metrics may identify problems that are otherwise missed (or caught too late) by system metrics.
  • Custom metrics add clarity to system behaviors that are hard to tease out of system metrics.

Let’s take a look at the two key questions for implementing custom metrics: What metrics should you be generating, and how can you generate them? As with past posts, the “how” will consider both AWS-centric approaches and bring in some options from Datadog, Trek10’s favorite tool for operational insights.

Metrics? What Metrics?

First things first… what custom metrics should you be creating? Here are four areas you might want to consider:

Business Key Performance Indicators (KPIs)

This gets to the heart of “what matters?” in your application and your business. Identify the activities on your platform that generate business value: user signups, transactions completed, data rows ingested, and so on. Keeping an eye on that high-level goal is a safety net: if there is some gremlin in your system that none of your system metrics are catching, the KPI will show the truth.

Tracking KPIs has the nice side-benefit of making it very easy to build a dashboard for your business leaders. Great fodder for an office monitor!

Points of concern or focus

Assuming you have thorough system metrics (CloudWatch and VM-level, if applicable) and APM metrics, you can ultimately see just about any underlying behavior in the system. However, surfacing that behavior is not always simple or obvious, and setting appropriate alerts on it may be impossible.

To simplify matters, add a custom metric to track performance in a specific area of focus. For example, it may be well known that when your application calls an external service provider with a certain set of parameters, the provider’s response time is problematic. Add a custom metric in your code for just that scenario and you can immediately understand the behavior and alert on it.
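As a minimal sketch in Python with boto3, you might wrap just that call and publish a latency metric (the provider call, namespace, and metric name here are hypothetical, not part of any standard):

```python
import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def call_provider_with_timing(params):
    """Call the external provider and emit a latency metric for this one scenario."""
    start = time.time()
    response = make_provider_request(params)  # hypothetical external call
    elapsed_ms = (time.time() - start) * 1000

    # Record latency only for the specific scenario we care about
    cloudwatch.put_metric_data(
        Namespace="MyApp/ExternalProvider",
        MetricData=[{
            "MetricName": "ProviderResponseTime",
            "Value": elapsed_ms,
            "Unit": "Milliseconds",
        }],
    )
    return response
```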

Background Jobs

This one is an easy win. Whenever you have background jobs running, have them submit a custom metric like JobSuccess=1 after they complete. Almost every monitoring system has an option to alert on missing data, so you simply create alerts for when the metric is zero or absent, and voila: you have a simple and highly reliable job monitoring system. Add in job duration and other metrics and you can easily build a complete background-job dashboard.
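A sketch of what the end of such a job might look like with boto3 (the namespace, metric names, and dimension are our own illustrative choices):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def report_job_result(job_name, success, duration_seconds):
    """Emit JobSuccess (1 or 0) plus duration when a background job finishes."""
    cloudwatch.put_metric_data(
        Namespace="MyApp/BackgroundJobs",
        MetricData=[
            {
                "MetricName": "JobSuccess",
                "Dimensions": [{"Name": "JobName", "Value": job_name}],
                "Value": 1 if success else 0,
            },
            {
                "MetricName": "JobDuration",
                "Dimensions": [{"Name": "JobName", "Value": job_name}],
                "Value": duration_seconds,
                "Unit": "Seconds",
            },
        ],
    )
```

Alert on JobSuccess below 1, or on no data at all, and you have job monitoring in a handful of lines.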

Events

Don’t focus just on metrics: Consider what discrete events in your system might be valuable for alerting or overlaying on charts for correlations. Add custom code in your app or your deployment automation to drop those events into your monitoring platform.

An obvious one is deployment events: Datadog and many other monitoring platforms offer direct integration with tools like Jenkins and GitHub. If that is not available to you, add a script in your deployment pipeline to post a custom event. Other meaningful events might be when an ETL job starts and finishes or when regular patching happens.
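As a sketch, a pipeline step could post a deployment event with the datadog Python library (the title, text, and tags are placeholders):

```python
import os
from datadog import initialize, api

# Assumes the Datadog API key is available in the environment
initialize(api_key=os.environ["DD_API_KEY"])

api.Event.create(
    title="Deployed my-service v1.2.3",
    text="Deployed by CI pipeline build #42",
    tags=["service:my-service", "env:production", "event_type:deployment"],
)
```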

How To Create Custom Metrics and Events

It’s important that your custom metrics be as lightweight as possible: The last thing you want is large new code bases or new infrastructure to manage. That’s why we strongly recommend either using AWS CloudWatch or a SaaS, our favorite being Datadog. At Trek10 we tend to put most things into Datadog; we find the code to be simpler and like having everything in one place. But if that’s not an option for you, CloudWatch will do the job just fine. We’ll also discuss a third interesting option: logging your custom metrics.

CloudWatch

You can use CloudWatch to log both metrics and events, though currently events cannot be overlaid on charts & dashboards. (Sidebar… CloudWatch Events is a very powerful service for many other reasons. This post has gotten us thinking about the possibilities with it.)

The approach is straightforward… just use your AWS SDK of choice and the API calls PutMetricData and PutEvents. In our favorite SDK, boto3, that would be put_metric_data and put_events. Make sure of course that your IAM user/role has the permission for cloudwatch:PutMetricData or events:PutEvents.
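A minimal sketch of both calls (the namespace, metric name, event source, and detail payload are illustrative):

```python
import json
import boto3

cloudwatch = boto3.client("cloudwatch")
events = boto3.client("events")

# Custom metric: one data point in a namespace of our choosing
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{"MetricName": "SignupCount", "Value": 1, "Unit": "Count"}],
)

# Custom event: a JSON detail payload on the default event bus
events.put_events(
    Entries=[{
        "Source": "my.app",
        "DetailType": "deployment",
        "Detail": json.dumps({"version": "1.2.3", "status": "finished"}),
    }],
)
```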

When putting metric data, you can include up to 150 values per metric in a single call. And note that if you have high-volume metrics, you might want to look at publishing statistic sets instead of every individual data point.
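A statistic set replaces many individual data points with one pre-aggregated submission; a sketch, with made-up numbers:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Instead of 500 individual data points, submit one pre-aggregated statistic set
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "RequestLatency",
        "Unit": "Milliseconds",
        "StatisticValues": {
            "SampleCount": 500,
            "Sum": 61250.0,
            "Minimum": 42.0,
            "Maximum": 980.0,
        },
    }],
)
```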

Datadog

There are two main options for custom metrics in Datadog (not including the logging option we discuss below):

If your code is running on a VM with the Datadog agent installed, the simplest option will be to use dogstatsd, Datadog’s adaptation of the popular statsd. This relays metrics to the agent, which then ships them off to Datadog alongside the typical system metrics. This is a very efficient and high-throughput approach, and it has the added benefit of not needing additional security configuration, since the Datadog agent already has your API key. If you are running in the Linux shell, there is even a convenient bash one-liner to send metrics to the agent.
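A sketch with the datadog package’s DogStatsD client (the metric names and tags are placeholders):

```python
from datadog import statsd

# Metrics go to the local agent over UDP (default 127.0.0.1:8125); the agent
# forwards them to Datadog, so no API key is needed in application code.
statsd.increment("myapp.signups", tags=["env:production"])
statsd.gauge("myapp.queue_depth", 42, tags=["queue:ingest"])
statsd.histogram("myapp.request_latency_ms", 123.4)
```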

If you’re running without a Datadog agent (i.e. in a Lambda function), you will need to use the Datadog API to POST a metric. There is a convenient Python library as well as support for other languages, or try going sans SDK with a raw HTTP post.

If you want to dive deeper, here is a great summary from Datadog.

Logging Your Custom Metrics

Structured logging is an interesting third option for pushing custom metrics. Just drop your metrics into your logs in some structured form like METRIC|{NAME}|{VALUE} and then configure your logging platform to extract them. Datadog has a slick feature to automatically parse these metrics from AWS Lambda logs if they come in the form defined here. Perhaps the biggest upside of this is the simplicity of the code: no extra packages at all, just one simple line of logging code for each metric. However, the configuration on the logging-platform side may be more complex. The other downside to consider is cost: depending on your platform, this may incur additional charges.
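In code this is a single log line per metric; a sketch using Python’s logging module and the illustrative METRIC|{NAME}|{VALUE} form above (not a platform standard):

```python
import logging

logger = logging.getLogger(__name__)

def log_metric(name, value):
    """Emit a structured metric line for the logging platform to parse."""
    logger.info("METRIC|%s|%s", name, value)

log_metric("rows_ingested", 250)
```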

Most logging platforms, including Datadog Logs, Splunk, and Sumo Logic, will support this, as will the open-source option of Elasticsearch. However, if you’re trying the low-cost AWS-native option of CloudWatch Logs, you won’t be able to do this.

That’s a quick summary of a few options for you to consider. We’d love to hear your thoughts on this… what platform are you using for custom metrics and what kind of events are you sending metrics for? Start a conversation at @Trek10Inc.

This is the third in a series of posts about monitoring production workloads in AWS. Related posts include:

  1. All The Metrics - A Cloud Monitoring Blueprint
  2. CloudWatch Deep Dive
  3. Current post…
Author
Andy Warzon

Founder & CTO, Andy has been building on AWS for over a decade and is an AWS Certified Solutions Architect - Professional.