Ops in a Serverless World
New technology appears and old job functions become obsolete. The practitioners retrain for higher-value tasks, and in the end the whole system is more productive. It happened when cars replaced blacksmiths, when computers replaced armies of human calculators, and it is happening now as cloud services replace low-level hardware babysitting at scale.
And now we are on the cusp of yet another obsolescence. The trend is “serverless”: in the AWS world, this means running a complete application on AWS platform services, including API Gateway, Lambda, S3, and DynamoDB, and using the Serverless Framework to manage it. In a broader sense, this is just the next phase in the evolution of Platform-as-a-Service (PaaS); this particular combination of AWS services happens to enable perhaps the most powerful and flexible platform we’ve ever seen.
The nirvana of PaaS is to let developers build arbitrarily complex applications without any need for server or cluster management. This frees the business from non-differentiating tasks such as patching, backups, and dependency management. The “Lambda Stack”, we believe, gets much closer to this vision than any platform yet built.
So when serverless becomes a reality, what happens to IT Ops? Isn’t there, theoretically, “nothing to do”? Of course, nothing could be further from the truth. But the rise of serverless does present Ops with an opportunity to refocus on higher-value tasks.
So what goes away?
- OS-level Configuration Management: Whether you point and click your installs or use Chef, Ansible, or the like, this has been a core function of Ops for ages. With no servers to configure, all configuration now lives in the function code and in the Serverless project (if you’re managing it with that framework), so developers can own this configuration completely.
- Backups: No servers, nothing to back up, right? Well, not quite, but at least you’re not managing backup agents in the traditional way. You can focus on things like copying data to other regions for disaster recovery (DR) or to other accounts for security. Backups become a much smaller and simpler responsibility.
- Patching: Happily, this one definitely goes away in a platform world, since the provider patches the operating system and runtime underneath your functions.
- Managing Scaling: Identifying bottlenecks, optimizing software, adding hardware... these are the typical tasks of scaling a service. On the Lambda Stack, there is almost nothing to manage besides DynamoDB provisioned throughput, and raising that is a point-and-click change.
What are some higher-value tasks that Ops teams can now refocus on to make their products more robust and reduce the overall cost footprint of their environment?
- Automation: Not that you weren’t doing this before, but the serverless world makes it even more imperative. No more manual configuration is allowed! Your infrastructure should be completely code-defined and source-controlled, which in the long run reduces both management cost and downtime caused by mistakes. Deployment should also be 100% automated. The job of Ops is to deliver this automation to the dev team so developers can focus on the product and deploy without friction.
- Disaster Recovery: This is a great example of moving Ops towards higher-value tasks. The Lambda Stack, like any good PaaS, comes with server-level and even datacenter-level redundancy out of the box. It does NOT, however, come with region-level redundancy. If you want to protect against region-level failures (possibly natural disasters, but more likely platform failures), the Ops team can focus on building fast and easy failover to another region.
- Load Testing: No platform, whatever the marketing hype says, responds perfectly to arbitrarily high load, and of course it’s always easy to write software with fundamental inefficiencies that limit its scalability. So an important role of Ops is to understand the platform’s behavior through load testing. The Ops team needs to own this: build realistic usage profiles, run them, collaborate with the development team and the platform provider (AWS in our case) to find and fix bottlenecks, and build automation so that load testing is a frictionless part of the release cycle.
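As a starting point for that automation, here is a minimal Python sketch of a load-test harness. It assumes the request under test is wrapped in a callable you supply (for example, one HTTP call to an API Gateway endpoint), fires a fixed number of calls at a chosen concurrency, and reports latency percentiles and an error count. `run_load_profile` and its parameters are hypothetical names for illustration, not part of any AWS or Serverless Framework API, and a realistic usage profile would also model ramp-up and a mix of request types.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_load_profile(
    call: Callable[[], object],
    total_requests: int,
    concurrency: int,
) -> Dict[str, float]:
    """Issue `total_requests` calls with at most `concurrency` in flight and
    summarize latency (milliseconds) and error count. Hypothetical helper."""
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be positive")

    def timed_call(_: int) -> Tuple[float, bool]:
        start = time.perf_counter()
        try:
            call()  # e.g. one HTTP request to the endpoint under test
            ok = True
        except Exception:  # count failures (e.g. throttling) instead of aborting the run
            ok = False
        return (time.perf_counter() - start) * 1000.0, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results: List[Tuple[float, bool]] = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(ms for ms, _ in results)

    def percentile(p: float) -> float:
        # Nearest-rank percentile over the sorted latencies.
        rank = max(1, math.ceil(p / 100.0 * len(latencies)))
        return latencies[rank - 1]

    return {
        "requests": float(len(results)),
        "errors": float(sum(1 for _, ok in results if not ok)),
        "mean_ms": statistics.mean(latencies),
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    # Stand-in workload: a 10 ms sleep in place of a real request.
    summary = run_load_profile(lambda: time.sleep(0.01), total_requests=200, concurrency=20)
    for key, value in summary.items():
        print(f"{key}: {value:.1f}")
```

Counting errors matters as much as measuring latency: when a serverless stack hits a limit, it typically shows up as failed requests (for example, HTTP 429 throttling responses from API Gateway) rather than as slow ones.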
One last point... ultimately, how is this any different from container-based architectures? In a way, not really. Essentially, all of the benefits listed above apply both to container-based architectures and to “serverless” platforms like the Lambda Stack. The big differences are in other areas:
- Cost: For an application that is not heavily loaded 24/7, paying only for the compute you actually use with Lambda is usually much cheaper than paying for EC2 instances, even if those instances are in an Auto Scaling group, because the group still keeps its minimum number of instances running around the clock.
- Complexity: Containers are great, but they do add some non-trivial complexity. At Trek10, we use AWS’s EC2 Container Service to reduce that complexity, and at the end of the day we think the benefits are well worth it... but the complexity is there. A serverless platform promises to reduce that complexity dramatically while giving up very little in terms of flexibility.
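The cost argument is easy to check with back-of-the-envelope arithmetic. The Python sketch below compares a month of Lambda charges (a per-request fee plus compute billed in GB-seconds) with the always-on floor of an Auto Scaling group. All prices, the workload, and the 730-hour month are illustrative assumptions rather than current AWS list prices, and the function names are hypothetical; plug in the numbers for your own region, memory size, and instance type.

```python
# Illustrative prices only (assumed); check current AWS pricing for your region.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_MILLION_REQUESTS = 0.20
HOURS_PER_MONTH = 730  # assumed average month length


def lambda_monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Request charge plus compute charge in GB-seconds.
    Ignores the free tier and per-invocation billing rounding."""
    gb_seconds = requests * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    request_charge = requests / 1_000_000 * LAMBDA_PRICE_PER_MILLION_REQUESTS
    return gb_seconds * LAMBDA_PRICE_PER_GB_SECOND + request_charge


def ec2_monthly_floor(min_instances: int, hourly_rate: float) -> float:
    """An Auto Scaling group bills its minimum instance count every hour,
    whether or not any traffic arrives."""
    return min_instances * hourly_rate * HOURS_PER_MONTH


if __name__ == "__main__":
    # Assumed workload: 1M requests/month, 200 ms average duration, 512 MB functions.
    print(f"Lambda:    ${lambda_monthly_cost(1_000_000, 200, 512):.2f}")
    # Assumed floor: 2 instances at a hypothetical $0.0208/hour.
    print(f"EC2 floor: ${ec2_monthly_floor(2, 0.0208):.2f}")
```

Under these assumptions the sketch prints $1.87 for Lambda against a $30.37 EC2 floor. Multiply the traffic by 100 and the same Lambda profile comes to about $186.67 a month, which is why the “not heavily loaded 24/7” qualifier matters (although the EC2 side would likely need more instances at that load as well).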
With a great platform like the Lambda Stack on AWS, the IT Operations team can refocus its efforts on higher-value tasks. The result should be a more robust application with frictionless deploys and lower overall Ops costs, thanks to less management overhead and less downtime.