Using AWS X-Ray for ECS Observability

Learn how AWS X-Ray is a vital tool for enhancing the observability of containerized applications on ECS.

Michele (Mike) Hjorleifsson | Sep 13 2023
10 min read

AWS X-Ray is a powerful tool that gives you insight into the performance and behavior of your containerized applications. In plain language, this tool, combined with the methodologies I describe in this post, can help you identify and monitor performance problems, potential bottlenecks, and breaking points in applications running on several AWS services. For the purposes of this article, we will focus on an Amazon ECS example. X-Ray is particularly useful for system observability and for Site Reliability Engineering (SRE). SRE is an approach to software development that focuses on ensuring the reliability and availability of large-scale, complex software systems; it combines aspects of software engineering and operations into a holistic approach to building and maintaining software systems. In this post, we will explore how to use X-Ray to improve observability and SRE when deploying ECS containers. Several tools are used to gather the data an SRE initiative needs for observability; let us define and investigate them now.

Tracing requests: As your application runs and makes requests to different services, AWS X-Ray captures information about each step of the journey. It records things like how long each step took, what service it was communicating with, and any errors that occurred.
- With this information, developers can use AWS X-Ray to see exactly how their application is working, and identify any issues or bottlenecks. They can use the trace to visualize the path that requests take through the application, and see where things might be slowing down or going wrong.
- A trace in AWS X-Ray is a record of the journey that a request takes through an application, which developers can use to analyze and debug their code. X-Ray allows you to trace incoming requests to your ECS services and see the details of how they are handled by your application. You can view the latency of each request and see how long it takes for each service to process it. This information can help you identify bottlenecks and optimize your application's performance.
Analyzing errors: X-Ray also allows you to see any errors that occur in your ECS services. You can see the error message, the stack trace, and the request that caused the error. This information can help you quickly identify and fix issues in your application.
Monitoring resource usage: Using the X-Ray SDK and annotations, X-Ray can also be extended to monitor the resource usage of your ECS cluster. You can see how much CPU, memory, and network traffic each cluster uses, and use this information to optimize your application's resource usage. This functionality requires some additional expertise around SDK integration, which our Trek10 team would be happy to facilitate.
Service Map: X-Ray provides a visual representation of the dependencies of your services, called the Service Map. This helps in identifying the services that are causing delays, errors, or high resource usage.
Anomaly Detection: X-Ray provides an anomaly detection feature (X-Ray Insights) that automatically detects changes in the normal behavior of your services and alerts you to them.
Root Cause Analysis: X-Ray provides a Root Cause Analysis feature that helps in identifying the root cause of an issue by analyzing the traces of requests and providing possible causes of the problem.
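To make the first few items concrete, here is a minimal sketch of what one traced request might look like, assuming a simplified X-Ray segment document: epoch-second start and end times, hex IDs, a `fault` flag and `cause` on a failed downstream call, and `annotations` as the indexed key-value pairs mentioned above. The service names, IDs, and timings are made up, and the helpers (`duration_ms`, `slowest_step`, `failed_steps`, `encode_for_daemon`) are hypothetical, not part of any AWS SDK.

```python
import json

# Hypothetical trace for one request, shaped like a simplified X-Ray segment
# document. Times are epoch seconds; IDs are hex strings. All values are made up.
segment = {
    "name": "checkout-service",
    "id": "70de5b6f19ff9a0a",
    "trace_id": "1-5f84c7a5-e6d4d8a8f5c0a1b2c3d4e5f6",
    "start_time": 1602537381.000,
    "end_time": 1602537381.520,
    # Annotations are indexed key-value pairs you can filter traces on.
    "annotations": {"ecs_cluster": "prod", "customer_tier": "gold"},
    "subsegments": [
        {"name": "inventory-api", "id": "0a1b2c3d4e5f6071",
         "start_time": 1602537381.010, "end_time": 1602537381.090},
        {"name": "DynamoDB", "id": "1b2c3d4e5f607182",
         "start_time": 1602537381.095, "end_time": 1602537381.450},
        # "fault" marks a server-side (5xx) failure; "error" would mark a 4xx.
        {"name": "payments-api", "id": "2c3d4e5f60718293",
         "start_time": 1602537381.455, "end_time": 1602537381.515,
         "fault": True,
         "cause": {"exceptions": [{"message": "upstream timeout"}]}},
    ],
}


def duration_ms(node):
    """Elapsed time of a segment or subsegment, in milliseconds."""
    return round((node["end_time"] - node["start_time"]) * 1000, 1)


def slowest_step(seg):
    """Return (name, duration_ms) of the longest-running subsegment."""
    sub = max(seg["subsegments"], key=duration_ms)
    return sub["name"], duration_ms(sub)


def failed_steps(seg):
    """Names of subsegments flagged as an error (4xx) or a fault (5xx)."""
    return [s["name"] for s in seg["subsegments"]
            if s.get("error") or s.get("fault")]


def encode_for_daemon(seg):
    """Frame a segment the way the X-Ray daemon expects on its UDP listener:
    a JSON header line, a newline, then the segment document."""
    header = json.dumps({"format": "json", "version": 1})
    return (header + "\n" + json.dumps(seg)).encode("utf-8")


if __name__ == "__main__":
    print(slowest_step(segment))  # ('DynamoDB', 355.0)
    print(failed_steps(segment))  # ['payments-api']
```

The daemon framing (a `{"format": "json", "version": 1}` header line followed by the document, sent over UDP to port 2000 by default) is what the X-Ray SDKs do under the hood; in practice you would let an SDK or the AWS Distro for OpenTelemetry build and send segments rather than doing it by hand.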

In the following images, you can see how X-Ray traces a request, identifies errors, and shows the service map and resource usage.

Service Log(s)

The Discover panel allows you to view and query logs, and it offers several benefits over simply reading log files.

First, the Discover panel provides a visual representation of the flow of requests through your application, which can be much easier to understand than raw log files. You can see the entire path of a request and how the different components of your application interact with each other. This is particularly useful when troubleshooting, because it helps you identify which part of the application is causing problems.

Second, the Discover panel provides detailed information on the performance of your application. You can see how long each request took to process and which parts of your application are causing delays or bottlenecks, and use this information to optimize your application's performance.

Third, the Discover panel helps you identify dependencies between the components of your application. This is particularly important in complex systems where many services or microservices interact with each other. By visualizing these dependencies, you gain a better understanding of the overall architecture of your application and can identify potential areas of risk.

Finally, the Discover panel helps you track changes to your application over time. You can see how the flow of requests through your application has changed and identify trends or patterns in how your application is being used.
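As a rough illustration of spotting trends over time, the sketch below buckets per-request summaries into one-minute windows and reports the request count, fault rate, and worst response time in each. The `summaries` data and the `per_minute_trend` helper are hypothetical; the fields simply mirror the kind of per-request information (start time, response time, fault flag) described above.

```python
from collections import defaultdict

# Hypothetical per-request summaries:
# (start time in epoch seconds, response time in seconds, faulted?)
summaries = [
    (1700000000, 0.120, False),
    (1700000030, 0.135, False),
    (1700000065, 0.410, True),
    (1700000080, 0.390, False),
    (1700000125, 0.150, False),
]


def per_minute_trend(rows):
    """Bucket requests into one-minute windows (keyed by the window's start
    time) and report request count, fault rate, and worst response time."""
    buckets = defaultdict(list)
    for start, response_time, faulted in rows:
        buckets[start // 60 * 60].append((response_time, faulted))
    trend = {}
    for minute in sorted(buckets):
        items = buckets[minute]
        trend[minute] = {
            "requests": len(items),
            "fault_rate": sum(faulted for _, faulted in items) / len(items),
            "max_response_s": max(rt for rt, _ in items),
        }
    return trend


if __name__ == "__main__":
    for minute, stats in per_minute_trend(summaries).items():
        print(minute, stats)
```

Windows where the fault rate or worst response time jumps relative to earlier windows are the kind of pattern worth drilling into with the individual traces and logs.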

Overall, the Discover panel provides a powerful set of tools for understanding the behavior and performance of your application and can help you optimize it for reliability, scalability, and performance. Use it to check the log activity of the microservice application(s) running in your ECS containers.

Service Map

AWS X-Ray provides an end-to-end view of requests as they travel through your application. The service map is an interactive map that shows each service as a node and the calls between services as connections, so you can trace requests from start to finish, identify performance bottlenecks, and troubleshoot issues. X-Ray also provides detailed performance metrics, such as latency and error rates, so you can quickly identify and resolve issues and optimize the performance of your application.
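The idea behind a service map can be sketched as an aggregation over trace data: every call from one service to another becomes an edge, and each edge carries a call count, a fault count, and an average latency. This is a simplified, hypothetical model (each subsegment carries a precomputed `ms` duration and names the service it called), not how X-Ray builds its map internally.

```python
from collections import defaultdict

# Hypothetical, simplified segments from three ECS services. Each subsegment
# names the downstream service it called, its duration, and whether it faulted.
segments = [
    {"name": "frontend", "subsegments": [
        {"name": "orders", "ms": 42, "fault": False},
        {"name": "catalog", "ms": 18, "fault": False}]},
    {"name": "frontend", "subsegments": [
        {"name": "orders", "ms": 950, "fault": True}]},
    {"name": "orders", "subsegments": [
        {"name": "DynamoDB", "ms": 12, "fault": False}]},
]


def build_service_graph(segs):
    """Aggregate caller -> callee edges with call count, fault count,
    and average latency in milliseconds."""
    totals = defaultdict(lambda: {"calls": 0, "faults": 0, "total_ms": 0})
    for seg in segs:
        for sub in seg["subsegments"]:
            edge = totals[(seg["name"], sub["name"])]
            edge["calls"] += 1
            edge["faults"] += int(sub["fault"])
            edge["total_ms"] += sub["ms"]
    return {
        edge: {"calls": t["calls"], "faults": t["faults"],
               "avg_ms": t["total_ms"] / t["calls"]}
        for edge, t in totals.items()
    }


def faulty_edges(graph):
    """Edges with at least one faulted call: the ones a map would highlight."""
    return sorted(edge for edge, stats in graph.items() if stats["faults"])


if __name__ == "__main__":
    for (caller, callee), stats in build_service_graph(segments).items():
        print(f"{caller} -> {callee}: {stats}")
```

Here the `frontend -> orders` edge stands out: half of its calls faulted, and its average latency is dominated by the slow failing call.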

Traces

In AWS X-Ray, a span is a way to track a specific unit of work that is performed as part of a larger operation (X-Ray's own terms for these units are segments and subsegments; span is the OpenTelemetry term). Think of it as a "breadcrumb" that helps you trace the path of a request through your application.

For example, let's say you have a web application that allows users to search for products. When a user enters a search query and hits the "search" button, that request is broken down into a series of smaller tasks, or spans. One span might be responsible for fetching data from a database, while another might be responsible for rendering the search results on the user's screen.

Each span in AWS X-Ray captures information about the task it represents, such as how long it took to complete, which service it was interacting with, and any errors that occurred. By looking at the spans for a given request, you can gain a detailed understanding of how that request was processed and where any issues might have occurred.

Spans are useful for developers because they allow you to break down complex operations into smaller, more manageable pieces. They can help you identify bottlenecks or areas of your application that are causing performance issues. And because they are linked together into a larger trace, you can use them to visualize the path of a request through your application and understand how different services are interacting with each other.

In short, spans help you understand how your application processes requests and identify areas for optimization or improvement.
Applications need to be instrumented to generate and send trace data downstream. There are two types of instrumentation:

Automatic – In automatic instrumentation, no application code change is required. An agent captures trace data from the running application using the language-specific API and SDK, which read their configuration from code or environment variables and provide good coverage of endpoints and operations. The agent determines span start and end times automatically.
Manual – In manual instrumentation, developers need to add trace capture code to the application. This provides customization in terms of capturing traces for a custom code block, naming various components in OpenTelemetry like traces and spans, adding attributes and events, and handling specific exceptions within the code.

These instrumentations give developers and site reliability engineers trace data that helps them identify errors, bottlenecks, and well-behaved microservices, as well as both internal and external API transactions.
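The difference between the two approaches can be sketched in plain Python. `traced` is manual instrumentation: the developer decorates the code to be traced. `auto_instrument` imitates automatic instrumentation by wrapping an existing library function once at startup, so the calling code is untouched; real agents, such as OpenTelemetry auto-instrumentation or the X-Ray SDK's `patch_all()`, patch libraries in a similar spirit but are far more complete. The `recorded` list and the `inventory_client` module are hypothetical stand-ins.

```python
import functools
import time
import types

# Hypothetical collector; a real agent or SDK would export these spans.
recorded = []


def traced(name=None):
    """Manual instrumentation: a decorator the developer adds to code."""
    def decorator(func):
        span_name = name or func.__qualname__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                recorded.append({"name": span_name, "status": status,
                                 "ms": (time.perf_counter() - start) * 1000})
        return wrapper
    return decorator


def auto_instrument(module, attr):
    """Automatic-style instrumentation: wrap an existing library function
    once at startup, so code that calls it needs no changes."""
    original = getattr(module, attr)
    setattr(module, attr, traced(f"{module.__name__}.{attr}")(original))


# Hypothetical third-party client library.
inventory_client = types.ModuleType("inventory_client")
inventory_client.get_stock = lambda sku: {"sku": sku, "qty": 3}


@traced("checkout")  # manual: the developer opted this function in
def checkout(sku):
    # Unchanged application code that calls the library as usual.
    return inventory_client.get_stock(sku)["qty"] > 0


auto_instrument(inventory_client, "get_stock")  # automatic: applied at startup

if __name__ == "__main__":
    checkout("A1")
    print([r["name"] for r in recorded])
    # ['inventory_client.get_stock', 'checkout']
```

The library call is traced even though `checkout` never mentions tracing for it, while `checkout` itself appears only because the developer chose to decorate it.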

Log and Trace Correlation

Although log and trace data each provide valuable information individually, the real advantage comes from relating trace data to log data to capture more details about what went wrong. There are three ways to correlate traces with logs:

Time of execution (runtime) – Logs, traces, and metrics can record the moment or the range of time in which the execution took place.
Run context – This is also known as the request context. It is standard practice to record the run context (trace and span IDs, as well as user-defined context) in spans. OpenTelemetry extends this practice to logs where possible by including the TraceID and SpanID in the log records. This allows us to directly correlate logs and traces that correspond to the same run context, and to correlate logs from the different components of a distributed system that participated in a particular request.
Origin of the telemetry – This is also known as the resource context. OpenTelemetry traces and metrics contain information about the resource they come from, and OpenTelemetry extends this practice to logs by including the resource in the log records.

Logs are a valuable source of information about your application and can provide detailed insights into how it is functioning. However, logs from different services or components of your application can be difficult to analyze in isolation. Correlating logs in AWS X-Ray means connecting them to specific spans or traces so that you can see how they relate to other events in your application.

For example, suppose you are experiencing a performance issue with your web application. By correlating logs with spans in AWS X-Ray, you might be able to identify which service or component is causing the issue. You could see which database queries were executed during a particular request and whether they took longer than expected, as well as any errors that were encountered and which service or component they originated from.

Correlating logs in AWS X-Ray can also reveal issues that are not apparent from individual logs or spans. For example, you might notice that errors occur more frequently during periods of high traffic, or that certain services are consistently slower than others. By correlating logs with spans, you gain a more holistic view of your application and can identify areas for improvement.

Overall, correlating logs in AWS X-Ray can be a powerful tool for understanding the behavior of your application and diagnosing issues, because it connects different sources of information into a more complete picture of how your application is functioning.
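Run-context correlation can be sketched with Python's standard `logging` module: a filter copies the active trace ID onto every log record, so each log line can be matched to its trace. The sketch assumes the trace context arrives in the `X-Amzn-Trace-Id` header format used by X-Ray (`Root=...;Parent=...;Sampled=...`) and is stored in a context variable. It also shows how a 32-hex-character OpenTelemetry trace ID maps to the X-Ray form, `1-` + 8 hex digits + `-` + 24 hex digits. The names `trace_ctx`, `TraceContextFilter`, and the `orders` logger are hypothetical.

```python
import contextvars
import logging

# Trace context of the request currently being handled (None outside a request).
trace_ctx = contextvars.ContextVar("trace_ctx", default=None)


def parse_xray_header(value):
    """Parse an X-Amzn-Trace-Id header value, e.g.
    'Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1'."""
    fields = dict(part.split("=", 1) for part in value.split(";") if "=" in part)
    return {"trace_id": fields.get("Root"), "parent_id": fields.get("Parent")}


def otel_to_xray_trace_id(otel_trace_id):
    """Map a 32-hex-character OpenTelemetry trace ID to X-Ray's format:
    '1-' + first 8 hex digits (epoch seconds) + '-' + remaining 24."""
    return f"1-{otel_trace_id[:8]}-{otel_trace_id[8:]}"


class TraceContextFilter(logging.Filter):
    """Copy the active trace and parent IDs onto every log record."""

    def filter(self, record):
        ctx = trace_ctx.get() or {}
        record.trace_id = ctx.get("trace_id") or "-"
        record.parent_id = ctx.get("parent_id") or "-"
        return True  # never drop records; only annotate them


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s parent_id=%(parent_id)s %(message)s"))
    logger = logging.getLogger("orders")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # In a web handler this would come from the incoming request's headers.
    trace_ctx.set(parse_xray_header(
        "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"))
    logger.info("inventory query took 840 ms")
    # INFO trace_id=1-5759e988-bd862e3fe1be46a994272793 parent_id=53995c3f42cd8ad8 inventory query took 840 ms
```

With the trace ID in every log line, searching the logs for a trace ID found in X-Ray (or the reverse) links a slow or failed request to the exact log entries it produced.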

In conclusion, AWS X-Ray is a great tool to have in your observability and SRE toolbelt when deploying ECS containers. By using X-Ray, you can gain a deeper understanding of how your applications behave, identify and fix issues quickly, and optimize your applications' performance. For a workshop or to set up a proof of concept, see Microservice observability with Amazon OpenSearch Service part 1: Trace and log correlation | AWS Big Data Blog.