Hi, I’m Forrest Brazeal at Trek10, and this is ‘Think FaaS’, where we learn about the world of serverless computing in less time than it takes to run a Lambda function. So put five minutes on the clock - it’s time to ‘Think FaaS’.
Today I want to talk about service discovery for serverless applications. This is obviously a huge subject for five minutes, but there’s a real vacuum out there when it comes to best practices in this area, so let’s try to clear a few things up in the time we have.
Service discovery, which is the process of telling your various back-end services how to find each other in the vast reaches of the cloud, is not a problem unique to serverless applications; it’s inherent in any distributed system that has multiple components talking to each other.
But as architectures get more fragmented, with microservices turning into individual functions that are deployed separately, each with its own endpoint, it can be a real challenge to determine which version of function A should be calling which version of function B.
We see a lot of bad practice around this, so let me start by reviewing how NOT to do service discovery for a serverless app. Let’s say you have a Lambda function managed in a Serverless Framework project that gets triggered whenever some event happens. We’ll call this the Local Service. This function needs to do some processing, and then it calls the HTTP endpoint of another service that we’ll call the Remote Service.
But that Remote Service is managed in a different Serverless Framework project that’s deployed in a different environment, maybe even by different people. So you pass in the Remote Service’s endpoint and API key to your Lambda function as environment variables when you deploy the Local Service.
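To make the problem concrete, here’s a minimal Python sketch of what the Local Service’s code tends to look like in this setup. The environment variable names REMOTE_SERVICE_ENDPOINT and REMOTE_SERVICE_API_KEY and the x-api-key header are hypothetical, and the function only builds the HTTP request rather than sending it.

```python
import json
import os
import urllib.request

# Anti-pattern sketch: the Remote Service's location and credentials are
# baked into the Local Service's environment at deploy time.
# REMOTE_SERVICE_ENDPOINT and REMOTE_SERVICE_API_KEY are hypothetical names.


def build_remote_request(payload: dict) -> urllib.request.Request:
    """Build a POST request to the Remote Service from deploy-time env vars."""
    endpoint = os.environ["REMOTE_SERVICE_ENDPOINT"]  # stale if Remote moves
    api_key = os.environ["REMOTE_SERVICE_API_KEY"]    # readable in function config
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Every value the function needs about the Remote Service is fixed when the Local Service is deployed, which is exactly the coupling described next.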
This is a problem for several reasons. Obviously, if the Remote Service changes its endpoint, you’ll have to update your deployment of the Local Service. So you’ve just tightly coupled two things that should be decoupled.
Not only that, whoever deploys the Local Service has to know where to find the latest configuration information for the Remote Service, so you have to introduce some other process for sharing that data, maybe even give the Local Service’s team access to the remote environment. Multiply that across many services and you get delays, security issues, and room for human error.
And of course, worst of all, you’re putting the API key for the Remote Service in an environment variable. Lambda encrypts environment variables at rest, but anybody who can view the Local Service’s function configuration in the Lambda console can read that key in plaintext.
By the way, we’ve touched on the limitations of Lambda environment variables in this podcast before, but I’ll reiterate here. Don’t put any data into a Lambda environment variable that is…
a) sensitive and unencrypted
b) dynamic, or
c) tightly coupled to a different service
…unless you want an insecure and tangled mess. So clearly we need some other way to share configuration information between different services.
The best practice here is to have a centralized service registry, some well-known place where all your shared configuration values are stored. There are a few ways to do this in AWS. They have something called Service Catalog that can maintain a list of products and deploy them on your behalf, but that’s not really a way to keep track of live config data between running services.
Instead, you could use a DynamoDB table in a central AWS account or, even better, the SSM Parameter Store. This is a place you can drop parameter keys and values, encrypt them at rest with KMS using the SecureString parameter type, version them, and tag them however you want.
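As a rough sketch of the publishing side, here’s how a service might write a value into Parameter Store with boto3’s put_parameter call. The client is passed in (in practice boto3.client("ssm")) so the function can be exercised without AWS credentials; the parameter names and the KMS key alias are placeholders, not a prescribed layout.

```python
from typing import Any, Optional

# Sketch: publish a config value to SSM Parameter Store.
# `ssm_client` is assumed to be a boto3 SSM client, e.g. boto3.client("ssm").


def publish_parameter(
    ssm_client: Any,
    name: str,
    value: str,
    secure: bool = False,
    kms_key_id: Optional[str] = None,
) -> int:
    """Write `value` under `name` and return the new parameter version."""
    kwargs = {
        "Name": name,
        "Value": value,
        # SecureString values are encrypted at rest with KMS; String is not.
        "Type": "SecureString" if secure else "String",
        # Overwriting an existing parameter creates a new version of it.
        "Overwrite": True,
    }
    if secure and kms_key_id is not None:
        # Omit KeyId to fall back to the account's default aws/ssm key.
        kwargs["KeyId"] = kms_key_id
    response = ssm_client.put_parameter(**kwargs)
    return response["Version"]
```

A non-secret endpoint can go in as a plain String, while something like an API key would be published with secure=True.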
So, going back to our Local Service example, you could deploy that service with an environment variable telling it to look for the Remote Service in the parameter store at the location “/remote_service/prod/endpoint”. Now it’s the Remote Service’s responsibility to update the value of that parameter as necessary, and all the Local Service needs to know is how to access the parameter store.
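Here’s a minimal sketch of the Local Service’s side of that arrangement, assuming the parameter name arrives in a hypothetical REMOTE_ENDPOINT_PARAM environment variable. Only the name of the parameter is deployed with the function; the value is looked up at runtime with boto3’s get_parameter.

```python
import os
from typing import Any


def get_remote_endpoint(ssm_client: Any) -> str:
    """Resolve the Remote Service endpoint via SSM Parameter Store.

    `ssm_client` is assumed to be boto3.client("ssm"). The Local Service
    only knows the parameter's name (REMOTE_ENDPOINT_PARAM is a
    hypothetical env var); the Remote Service owns and updates the value.
    """
    param_name = os.environ["REMOTE_ENDPOINT_PARAM"]
    # WithDecryption=True is harmless for String parameters and required
    # to read SecureString values in plaintext.
    response = ssm_client.get_parameter(Name=param_name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

If the Remote Service moves, it overwrites the parameter and the Local Service picks up the new value without a redeploy.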
But obviously, a centralized registry alone isn’t enough. You’ll also need some kind of caching solution on the Local Service. You don’t want to incur a performance penalty on every execution of your function while you go look up remote parameters. Fortunately, there are some open source tools out there that implement Lambda caching for SSM Parameter Store: Yan Cui has one in Node.js, and Alex Casalboni has one in Python. These implementations should mostly limit your parameter lookup latency to cold starts of your function, plus the occasional refresh when a cached value expires.
Finally, you’ll need some way to protect secrets, especially once you start using services across environments. Ben Kehoe at iRobot, who was a great source of ideas for this episode, has a project called ssm-ctl, which has a pretty nifty approach to managing secret parameters in SSM, and we’ve done some work on that ourselves at Trek10 with the Serverless Secrets plugin for the Serverless Framework. But, as we’ve said on this podcast before, AWS has a new service out now, called Secrets Manager, that lets you securely rotate and retrieve secrets in a centralized fashion. We haven’t worked with it much yet, but I’d recommend checking it out for new projects.
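If you go the Secrets Manager route, retrieving the Remote Service’s API key might look roughly like this sketch using boto3’s get_secret_value. The secret ID and the assumption that the secret is stored as a JSON string with an "api_key" field are hypothetical; binary secrets aren’t handled here.

```python
import json
from typing import Any


def get_remote_api_key(secrets_client: Any, secret_id: str, field: str = "api_key") -> str:
    """Fetch one field of a JSON secret from AWS Secrets Manager.

    `secrets_client` is assumed to be boto3.client("secretsmanager").
    The JSON layout {"api_key": "..."} is a hypothetical convention.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret_string = response.get("SecretString")
    if secret_string is None:
        # Secrets stored as SecretBinary are out of scope for this sketch.
        raise ValueError(f"secret {secret_id!r} has no SecretString")
    return json.loads(secret_string)[field]
```

The same TTL caching pattern applies here, and a short TTL helps the function pick up a rotated key without a redeploy.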
The tricky part, whichever option you choose, is managing the cross-account IAM roles that let you access the secrets in other environments. But we never said serverless service discovery was painless. The good news is, if you make deliberate decisions to abstract your config into a central store, pay attention to caching, and keep security in mind, you may “discover” that your services are even more powerful than you thought.
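For the cross-account piece, one common shape is to have the Local Service assume a read-only role in the central account with STS and build an SSM client from the temporary credentials. This is a sketch under assumptions: the role ARN and session name are placeholders, the target role’s trust policy must allow the function’s execution role to assume it, and the execution role needs sts:AssumeRole permission on that ARN.

```python
from typing import Any, Callable


def ssm_client_for_account(
    sts_client: Any,
    role_arn: str,
    make_client: Callable[..., Any],
    session_name: str = "local-service-discovery",
) -> Any:
    """Assume `role_arn` and return an SSM client using its temporary credentials.

    In Lambda you would pass sts_client=boto3.client("sts") and
    make_client=boto3.client. The session name is a hypothetical default.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return make_client(
        "ssm",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

The temporary credentials expire, so a long-lived execution environment would need to re-assume the role periodically, ideally on the same schedule as your parameter cache.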
If you’re ready to take the next step with best practices for your serverless environment, Trek10 would be glad to help. You can find us on Twitter @Trek10inc, or hit me up @forrestbrazeal, and I’ll see you on the next episode of Think FaaS.