It's no secret that we love AWS Lambda for building serverless applications. Development costs can be dramatically reduced. Because you pay per invocation, operating costs will in most cases be lower than running your application on even inexpensive t2 instances. And support costs decrease because you don’t have to manage and monitor servers. In short, serverless applications are awesome!
Except, of course, when a Lambda function hits an error during execution. At that point, emotions can pivot fast. A Lambda function can fail for many reasons:
- Maybe you have an incorrectly handled edge case
- Maybe S3 is invoking your function multiple times for the same event
- Maybe the database connection times out
- Or maybe an API call returns bad data, or never returns at all, and your Lambda function runs out of time
At Trek10, we have a Lambda function that processes logs from our log aggregation service, Sumo Logic. Sumo Logic drops aggregated log files into an S3 bucket, which in turn triggers the Lambda function. The function separates out different clients' logs for long-term archiving, and we monitor the function for errors. When an error occurs, we know we need to debug.
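To make that setup concrete, here is a rough Python sketch of the two pieces such a function needs: pulling the bucket and key out of the S3 notification, and grouping log lines per client. It is not our production code. The function names are made up for illustration, and the assumption that every log line is a JSON object with a `client` field is hypothetical.

```python
import json
from collections import defaultdict
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def split_by_client(lines):
    """Group raw log lines by a hypothetical "client" JSON field."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        try:
            client = json.loads(line).get("client", "unknown")
        except (ValueError, AttributeError):
            # Not JSON, or JSON that is not an object: keep it, but flag it.
            client = "unparsed"
        grouped[client].append(line)
    return dict(grouped)
```

Keeping unparseable lines in their own bucket, rather than raising, is one way to avoid the "incorrectly handled edge case" failure from the list above.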
This is our process for debugging the functions:
- First, we find the time the error occurred via the CloudWatch metrics console:
- When ready, we click the arrow to start the search.
OK, now we have results (I was kind of lucky this time, as errors don’t always have the word "error" in them, e.g. "Lambda function timed out"):
Here are all the results minimized.
Unfortunately, we don’t directly have the context of the error, but we do have the log stream name and the time. So now we can go back, find the specific log stream, go to the date/time in the stream, and then see the context.
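That last step, seeing the context around a known time, is easy to express in code. The sketch below assumes you have already fetched the stream's events as dictionaries with a millisecond `timestamp` and a `message`, which is the shape CloudWatch Logs returns them in. The function name and the default 60-second window are our own choices for illustration.

```python
def context_around(events, error_ms, before_s=60, after_s=60):
    """Return log events within a window around error_ms, oldest first.

    events   -- dicts with "timestamp" (ms since epoch) and "message"
    error_ms -- time of the error, in ms since epoch
    """
    start = error_ms - before_s * 1000
    end = error_ms + after_s * 1000
    window = [e for e in events if start <= e["timestamp"] <= end]
    return sorted(window, key=lambda e: e["timestamp"])
```

In practice you would probably pass the same start and end times to the CloudWatch Logs API so that only the relevant slice is downloaded, instead of filtering everything locally.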
After doing this a couple of times, we realized that this entire process could be automated. Thus, the Lambda Error Hound was born. Lambda Error Hound is a command-line utility, written in Node.js and available on npm, that fetches the CloudWatch logs around Lambda error metrics, or around a specific time in any CloudWatch log group.
Check it out on GitHub or npm.
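To give a feel for what automating this involves, here is a small Python sketch of the first half: turning error metric datapoints into time windows worth searching. It is not Error Hound's source code (that is written in Node), just an illustration. It assumes datapoints shaped like those CloudWatch returns for the `Errors` metric in the `AWS/Lambda` namespace with the `Sum` statistic, and that each timestamp marks the start of its period. The function names and the padding default are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_ms(dt):
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    return int(dt.timestamp() * 1000)


def error_windows(datapoints, period_s=60, padding_s=30):
    """Turn error-metric datapoints into (start_ms, end_ms) search windows.

    datapoints -- dicts with "Timestamp" (aware datetime) and "Sum"
    period_s   -- metric period in seconds
    padding_s  -- extra seconds of log context on each side
    """
    windows = []
    for point in sorted(datapoints, key=lambda p: p["Timestamp"]):
        if point.get("Sum", 0) <= 0:
            continue  # no errors recorded in this period
        start = point["Timestamp"] - timedelta(seconds=padding_s)
        end = point["Timestamp"] + timedelta(seconds=period_s + padding_s)
        windows.append((to_ms(start), to_ms(end)))
    return windows
```

Each window can then be used as the start and end time of a CloudWatch Logs query against the function's log group, which replaces the manual clicking described above.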