Detecting AWS Cost Anomalies

Forrest Brazeal | Jun 25 2018

Cloud Spend: Everybody’s Problem

If you’ve spent any time at all in the cloud, you know the pain of mysterious bills and ever-escalating spend. When you’re paying as you go and scaling on demand, exercising control over your infrastructure costs takes more vigilance, not less. As Expedia’s Subbu Allamaraju has correctly pointed out, in the cloud, cost awareness must be part of the engineering culture, not just something assigned to a governance team on the side. For that reason, Trek10’s CloudOps team spends a lot of time thinking about cloud spend and developing tools to help catch problems before they happen.

Detecting Cost Anomalies

One thing we’ve learned is that the first step in limiting the size of your AWS bill is to limit the surprise on the bill. That means keeping a close eye out for cost anomalies: changes in the historical consumption pattern of your cloud resources. AWS Budgets, with its running total of monthly spend, is the obvious first place to look, but we’ve found that static thresholds like these become less useful across many accounts and highly dynamic environments. We want to know not just how much we’ve spent, but also what the underlying trends are.

Two Kinds of Changes

Trek10’s CloudOps team looks for cost anomalies in two ways: big spikes and slow changes. We use standard deviation to determine when a spike is “big”, and we calculate the cost change from 7, 21, and 35 days ago to determine if there is a slower-growing change that adds up to a big difference over a longer window of time. (We use multiples of 7 to avoid comparing different days of the week and getting false alarms from systems with large intra-week fluctuations.)
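The week-aligned comparison can be sketched in a few lines of Python. This is a minimal illustration, not Trek10's actual code: the function name `lookback_changes` and the `daily_costs` mapping (date to dollars) are assumptions made for the example.

```python
from datetime import date, timedelta

# Lookbacks are multiples of 7 so each comparison lands on the same weekday.
LOOKBACK_DAYS = (7, 21, 35)


def lookback_changes(daily_costs, today, lookbacks=LOOKBACK_DAYS):
    """Compare today's cost with the cost N days ago for each lookback.

    daily_costs: dict mapping datetime.date -> cost in dollars (hypothetical shape).
    Returns {lookback: (absolute_change, fractional_change)}; lookbacks with
    no recorded cost are skipped.
    """
    current = daily_costs[today]
    changes = {}
    for days in lookbacks:
        past = daily_costs.get(today - timedelta(days=days))
        if past is None:
            continue  # not enough history for this lookback
        absolute = current - past
        # A fractional change is undefined when the past cost was zero.
        fractional = absolute / past if past else None
        changes[days] = (absolute, fractional)
    return changes
```

Because every lookback is a whole number of weeks, a Monday is only ever compared with earlier Mondays, which is what keeps weekday-versus-weekend swings from showing up as anomalies.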

We use the AWS Cost Explorer API (we previously relied on CloudWatch billing metrics) to calculate these alerts for our own AWS accounts and for our CloudOps clients.
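As a rough sketch of what pulling daily spend from Cost Explorer can look like, the function below calls `get_cost_and_usage` on a boto3 Cost Explorer client passed in by the caller. The function name, the choice of the `UnblendedCost` metric, and the 45-day default are assumptions for illustration; the post does not say which metric or query shape the team uses.

```python
from datetime import date, timedelta


def fetch_daily_costs(ce_client, end, days=45):
    """Return {date: cost} for the `days` days before `end`.

    ce_client: a Cost Explorer client, e.g. boto3.client("ce").
    Cost Explorer treats the End date as exclusive, so `end` itself
    is not included in the result.
    """
    start = end - timedelta(days=days)
    response = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],  # assumed metric; others are available
    )
    costs = {}
    for result in response["ResultsByTime"]:
        day = date.fromisoformat(result["TimePeriod"]["Start"])
        # Amounts come back as strings, so convert before doing arithmetic.
        costs[day] = float(result["Total"]["UnblendedCost"]["Amount"])
    return costs
```

A call such as `fetch_daily_costs(boto3.client("ce"), date.today())` would return one entry per day. This sketch reads a single page of results; a production version would also follow `NextPageToken` if the response includes one.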

Thresholds

We define anomalies based on the following thresholds:

“Absolute meaningful change” greater than $20 (so we ignore any change that is less than $20 per day, no matter what other thresholds it breaches)
More than 2 standard deviations above or below the 45-day mean (that is, a z-score beyond ±2)
Absolute increase or decrease of 50% over the past 7, 21, or 35-day window
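One way to read these thresholds is sketched below: the $20 floor acts as a gate, and either the z-score rule or the percentage rule can then flag the day. That combination, the use of population standard deviation, and the names `z_score` and `breaches_thresholds` are assumptions made for this example, not a description of Trek10's implementation.

```python
from statistics import mean, pstdev

MIN_ABSOLUTE_CHANGE = 20.0   # dollars per day
Z_SCORE_LIMIT = 2.0          # standard deviations from the trailing mean
PERCENT_CHANGE_LIMIT = 0.5   # 50% increase or decrease


def z_score(value, history):
    """Number of standard deviations `value` sits from the mean of `history`."""
    sigma = pstdev(history)  # population std dev; a sample std dev also works
    if sigma == 0:
        return 0.0  # perfectly flat history: no meaningful z-score
    return (value - mean(history)) / sigma


def breaches_thresholds(today_cost, history, past_costs):
    """Return True if today's cost looks anomalous.

    history: trailing daily costs used for the mean and standard deviation.
    past_costs: costs from the earlier days that today is compared against.
    """
    baseline = mean(history)
    spike = (
        abs(z_score(today_cost, history)) > Z_SCORE_LIMIT
        and abs(today_cost - baseline) > MIN_ABSOLUTE_CHANGE
    )
    slow_change = any(
        past > 0
        and abs(today_cost - past) > MIN_ABSOLUTE_CHANGE
        and abs(today_cost - past) / past >= PERCENT_CHANGE_LIMIT
        for past in past_costs
    )
    return spike or slow_change
```

Applying the dollar floor to both rules keeps tiny accounts quiet: a jump from $2 to $6 per day is a 200% increase and a large z-score, but it is not worth a ticket.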

If one of these thresholds is breached for more than two days in a row, we create an alert message and send it to Datadog, where we maintain dashboards tracking all the metrics captured by our system. The alerts also integrate automatically with our SLA-enforced ticketing system, so our clients can rest assured that their bills aren’t mushrooming while they sleep.
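The persistence rule can be expressed as a small check over a list of daily breach flags. The helper names below are hypothetical, and the sketch stops short of the Datadog and ticketing integration, which the post does not detail.

```python
def trailing_breach_days(breach_flags):
    """Count how many of the most recent days in a row were breaches.

    breach_flags: list of booleans ordered oldest to newest, one per day.
    """
    count = 0
    for flagged in reversed(breach_flags):
        if not flagged:
            break
        count += 1
    return count


def should_alert(breach_flags, max_tolerated_days=2):
    """Alert only when the current run of breaches exceeds the tolerated length."""
    return trailing_breach_days(breach_flags) > max_tolerated_days
```

With the default of two tolerated days, a third consecutive breach is what raises the alert, so a one- or two-day spike that subsides on its own never reaches the ticket queue.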

The graph above shows the z-score for one of our internal accounts over the last few weeks. You’ll notice there’s no red line indicating an alert for that big spike in the graph in early June. That’s because the cost increase did not last long enough to be of concern. The red line later in the month is actually catching a slow-growing change: in this case, we were accumulating some resources in the account that weren’t getting cleaned up.

Oh, and given our serverless proclivities, you won’t be surprised to learn that all of these checks run in Lambda and incur little to no overhead cost.

Cost Anomaly Detection In Action: Expiring Reserved Instances

Our billing alert system often catches legitimate spend changes that require no further action — for example, when one of our clients starts using a new AWS service. But recently we noticed an alert that seemed a little more concerning.

In the graphs above, taken from our Datadog dashboard, you’ll notice a large increase in daily cost around the middle of June. Once the increase had breached our thresholds, our alert system automatically registered an alert and filed a support ticket for our CloudOps team. We followed up with the client to confirm the source of the problem (in this case, expiring reserved instances), and got the reservations renewed. No cloud bills were harmed in the detection of this issue!

The Adventure Continues

Optimizing cloud costs isn’t a one-solution-fits-all problem. It takes attention and discipline from every part of your organization. And taking proactive steps to monitor your environment may expose slower-growing issues that have a big impact over time. If you’re interested in leveraging the Trek10 CloudOps team’s expertise to level up your spend management game, we’d love to hear from you.

Thanks to Trek10’s James Bowyer for contributing to this post.