‘Performance Considerations of Serverless’ - Think FaaS Podcast
Welcome back, I’m Jared Short at Trek10, and this is ‘Think FaaS’, where we learn about the world of serverless computing in less time than it takes to run an AWS Lambda function. So put five minutes on the clock - it’s time to ‘Think FaaS’.
This week we are tackling performance, and what that looks like in the current Serverless ecosystem. Whether it’s about cold starts, concurrency limits, traffic spikes, or database connection pooling, everyone at some point has to at least consider performance in serverless systems at scale.
First, let’s understand “cold starts”
If you’ve listened to some of our previous podcasts, or been anywhere near the serverless space, you’ve probably heard about these “cold starts”. Some folks disregard them, some vilify them (however, nobody praises them), but what are they? Long story short, cold starts are a by-product of how most FaaS providers are able to provide compute nearly instantly for you at nearly any scale. When requests come in to your function, containers are typically spun up to serve those requests. Spinning up a container and initializing your code takes time; this incurred overhead is the “cold start” penalty.
However, to improve performance, if a second, third, or subsequent request arrives within a reasonable amount of time, those containers can be reused. These are called “warm” containers, and they do not incur the same overhead.
There is a very important caveat to all of this that is not typically addressed, however: keeping a single warmed container alive at all times does not protect you from concurrent requests starting additional new containers. A single instance, or “container”, of your function can serve only one request at a time, so a second concurrent request either has to wait for that container to become available, or the platform starts a new container (incurring the cold start penalty).
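To make that caveat concrete, here is a minimal sketch of the idea in Python. This is a toy model, not any provider’s real scheduler: it just assumes each container serves one request at a time, and any simultaneous request without a free warm container triggers a cold start.

```python
# Toy model of FaaS container reuse: one container serves one request at a
# time; a concurrent request with no free warm container incurs a cold start.

class ToyFaaSPlatform:
    def __init__(self):
        self.free_containers = 0  # warm containers not currently serving a request
        self.cold_starts = 0

    def dispatch(self, concurrent_requests):
        """Dispatch a batch of simultaneous requests; return cold starts incurred."""
        cold = max(0, concurrent_requests - self.free_containers)
        self.cold_starts += cold
        # Once the batch finishes, every container that ran is warm again.
        self.free_containers = max(self.free_containers, concurrent_requests)
        return cold

platform = ToyFaaSPlatform()
print(platform.dispatch(1))    # first ever request: 1 cold start
print(platform.dispatch(1))    # warm reuse: 0 cold starts
print(platform.dispatch(100))  # game-day spike: 99 new containers cold start
```

Note how one warm container helps a single sequential caller, but a spike of 100 concurrent requests still pays 99 cold starts.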
So for example, if you have a sports app that has low usage most of the time, but on a big game day many people start using it right before the game starts, you would have hundreds or thousands of concurrent requests coming in at once. Chances are that many of those first visitors would get cold starts as hundreds of new containers are brought online to serve the many concurrent requests. Yan Cui has a great post on HackerNoon, “I’m afraid you’re thinking about AWS Lambda cold starts all wrong”, that dives even further into the details, and I encourage you to check it out.
Let’s put the cold starts on ice, and take a trip down memory lane
When it comes to declaring how much “power” you want a function execution to have, particularly in AWS Lambda, there is only a single option: “Memory”. Behind the scenes, this single memory setting actually determines a whole slew of other things, including network bandwidth, I/O, and CPU power.
If you are ever experiencing high latency or otherwise poor performance on a Lambda function, try cranking up the memory setting and see where that gets you. If you are flying blind right now, a good baseline is usually 512 MB of memory, then jump up to 1024 MB if you think you can still squeeze out more. You typically don’t need much beyond that unless you are doing heavy data processing or ETL-type jobs.
When in doubt and starting a new service in production, and this feels really weird to say, “over-provision” your function memory a bit. Pick a tier or two higher than you tested with, and then, after you have a reasonable amount of metrics, pull it back in.
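Why is over-provisioning less scary than it sounds? Because Lambda billing scales with memory times duration, more memory that also makes your function finish faster can cost roughly the same. A quick back-of-the-envelope sketch (the rate below is illustrative, not current pricing; check your region):

```python
# Back-of-the-envelope Lambda cost math: billing is proportional to
# GB-seconds, i.e. memory * duration, so doubling memory that halves
# duration costs about the same while returning results sooner.

PRICE_PER_GB_SECOND = 0.0000166667  # example on-demand rate; verify for your region

def invocation_cost(memory_mb, duration_ms):
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND

# 512 MB running 800 ms vs 1024 MB running 400 ms: the same GB-seconds.
print(invocation_cost(512, 800))
print(invocation_cost(1024, 400))
```

So if the extra memory buys you a proportional speedup, the bigger tier is close to free; the metrics you collect afterwards tell you whether it actually did.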
If you are interested, Alex Casalboni has a great blog post, “AWS Lambda Power Tuning with AWS Step Functions”, on the Serverless.com blog, along with an accompanying open source library. He has made it easy to quickly test what power level a particular function should be running at!
Finally, we cannot forget about upstream and downstream resources and their performance considerations and limitations
Serverless performance neither starts nor ends inside your container; you must consider all of your integration points with other systems.
When it comes to upstream systems, there may be limitations put in place by your platform in its native integrations that could impact what you are trying to build. Take, for instance, the Amazon Kinesis to AWS Lambda integration: fantastic for event streaming architectures, there’s no doubt. However, Kinesis records are only sent to a Lambda function about once a second per shard. In a highly latency-sensitive scenario, that simply may be a non-starter. You are at the mercy of those native integrations and their limitations, and while in most scenarios this is perfectly acceptable, sometimes you just need a little bit more control.
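For a sense of what that integration looks like from the function’s side, here is a minimal handler sketch. The event shape (Records, each with a base64-encoded `kinesis.data` payload) follows the documented Kinesis-to-Lambda format; the processing itself is a placeholder.

```python
import base64
import json

# Minimal AWS Lambda handler sketch for a Kinesis event. Records arrive in
# per-shard batches, polled roughly once per second per shard -- budget your
# end-to-end latency accordingly.

def handler(event, context):
    messages = []
    for record in event["Records"]:
        # Kinesis payloads are delivered base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        messages.append(json.loads(payload))
    return messages
```

Because records are batched per shard, your per-record work also multiplies inside a single invocation, which is worth keeping in mind alongside the polling interval.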
Now, downstream resources are something you really have to think about. When you wield the great power that is serverless and its scalability, you have to be a good citizen toward your downstream services and infrastructure. Consider writes to databases: while your FaaS executions may be able to scale painlessly to a surprise spike of 10,000 concurrent requests… your database is probably going to come to a painful, screeching halt. In traditional server environments, the various request threads can share a database connection pool. You simply don’t get that in FaaS, since each execution is entirely isolated from its concurrent neighbors. So be considerate, and realize you may need to implement queues and caching to protect those systems that aren’t as awesome as your shiny new serverless service!
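In a real serverless deployment the buffer would usually be a managed queue such as SQS sitting between your functions and the database, but the core idea fits in a few lines. This in-process sketch (all names here are made up, and `db_write` is a stand-in for a real client) shows how a fixed-size worker pool, rather than your function concurrency, sets the write rate:

```python
import queue
import threading
import time

# Buffer a burst of writes through a queue so only a fixed number of
# workers ever talk to the "database" at once, no matter how many
# producers enqueue work.

def db_write(item):
    time.sleep(0.001)  # stand-in for a real database write

work = queue.Queue()
results = []

def worker():
    while True:
        item = work.get()
        if item is None:  # shutdown sentinel
            break
        db_write(item)
        results.append(item)
        work.task_done()

# Only 4 workers hold "connections", even for a burst of 100 writes.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for i in range(100):
    work.put(i)
work.join()
for _ in threads:
    work.put(None)
for t in threads:
    t.join()
print(len(results))  # all 100 writes landed, at the workers' pace
```

The spike is absorbed by the queue instead of the database, which is exactly the shape a queue-plus-consumer architecture gives you at infrastructure scale.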
That’s all I’ve got time for this week! Do you have a fun serverless performance gain or horror story? Tell me on Twitter @shortjared. You can also follow Trek10 there @Trek10inc. See you again soon on the next ‘Think FaaS’.