You can use Step Functions to orchestrate workflows in AWS, and those workflows often invoke Lambda functions to do the work. By using tag-based permissions, your Step Function can execute Lambda functions that are defined independently of it. A new Lambda function can be added for use without touching the definition of the Step Function. Our goal is to avoid managing a list of resources while continuing to follow Least Privilege best practices. That is exactly what we did with a service that manages some of our internal business processes. If your ears are perking up, read on.
Tagging Use Cases
Tagging is often associated with other use-cases as well. The primary ones I have seen are:
Security Risk Management
To learn more about those go here. Our use-case is Access Control.
Before we dive into the Step Functions pattern, it’s important to understand tagging for IAM and how it can help us here. ABAC (Attribute-Based Access Control) is a wonderful use case for tags, although condition statements are not limited to tags! So let's level set on these conditions, then we can dive into why this is needed and how it works.
IAM Statement Condition
In a programming environment you would write your own conditional statements, but for IAM you have a fixed set of condition operators to work with; these are akin to ==, !=, <, and > in a programming language. This might seem limiting, but AWS has definitely thought through many of the use cases you’ll need! For example, if you are working with a string, as we are with tags, you might use StringEquals and then give the condition key as aws:PrincipalTag/TagName and the condition value as tag-value. The statement then applies only when the principal making the request has a tag with key TagName and value tag-value. We’ll see some more examples below. The key thing to remember is that you won’t be able to keep all of these in your head, but if you can remember a few key ones you’ll look like an IAM wizard. You might also want to take a look at the IAM policy grammar to better understand how policies can be written.
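To make the shape of a condition concrete, here is a minimal sketch in Python that builds the StringEquals example as a plain dictionary. The action and resource are placeholders chosen for illustration, and `string_equals_matches` is a hypothetical toy helper: it only mimics the idea of the comparison and is not how IAM actually evaluates policies.

```python
import json

# Hypothetical statement: the action and resource are placeholders.
statement = {
    "Effect": "Allow",
    "Action": "lambda:InvokeFunction",
    "Resource": "*",
    "Condition": {
        # operator -> {condition key: condition value}
        "StringEquals": {"aws:PrincipalTag/TagName": "tag-value"}
    },
}


def string_equals_matches(condition, principal_tags):
    """Toy check of a StringEquals block against a principal's tags.

    Only understands aws:PrincipalTag/<key> condition keys. Real IAM
    evaluation handles many more keys, operators, and edge cases.
    """
    prefix = "aws:PrincipalTag/"
    for key, expected in condition.get("StringEquals", {}).items():
        if not key.startswith(prefix):
            return False
        tag_key = key[len(prefix):]
        allowed = expected if isinstance(expected, list) else [expected]
        if principal_tags.get(tag_key) not in allowed:
            return False
    return True


print(json.dumps(statement, indent=2))
print(string_equals_matches(statement["Condition"], {"TagName": "tag-value"}))  # True
print(string_equals_matches(statement["Condition"], {"TagName": "other"}))      # False
```

The nesting is the part worth remembering: operator first, then the condition key, then the value or list of values.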
Types of Tags
A tag is a key and value pair that can be set on a resource. However, you don’t have to have a value. You can use tags in IAM conditions in several ways:
RequestTag - Use RequestTag to compare the tag key-value pair that was passed in the request with the tag pair that you specify in the policy.
ResourceTag - Use ResourceTag for limiting what resource can have an action taken on it.
PrincipalTag - Use PrincipalTag for limiting which principal can take the action. This is usually the tag on the role you have assumed and are making requests from.
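The three tag types map onto three global condition keys: aws:RequestTag/key, aws:ResourceTag/key, and aws:PrincipalTag/key. As a small sketch, the hypothetical helper `tag_condition` below builds a StringEquals condition for each one; the tag key `Project` and value `blue` are made up for illustration.

```python
import json

VALID_KINDS = ("RequestTag", "ResourceTag", "PrincipalTag")


def tag_condition(kind, tag_key, tag_value):
    """Build a StringEquals condition on aws:<kind>/<tag_key>."""
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown tag condition kind: {kind}")
    return {"StringEquals": {f"aws:{kind}/{tag_key}": tag_value}}


# Hypothetical tag: key "Project", value "blue".
conditions = {kind: tag_condition(kind, "Project", "blue") for kind in VALID_KINDS}

for kind, condition in conditions.items():
    print(kind, "->", json.dumps(condition))
```

The only thing that changes between the three is which side of the request the tag is read from: the request itself, the resource being acted on, or the principal making the call.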
Let’s look at an example without ABAC to explain why it is truly a necessity for doing IAM at scale. Let’s say we have a user that is an IAM admin and several users that are developers that need to be able to create their own resources. Without ABAC through tagging that IAM admin would have to edit policies to specify exact resources for the groups or roles each of these developers is in. This is time-consuming work that has a lot of manual intervention. The admin might need to make a group that allows the creation of an EC2 instance but then they might need another one that allows management of those instances. Then they would have to add to the policy each time a new instance is created, which would entail updating the resource ARNs for each set of actions. The developers in this example are most likely blocked after a resource is created and are waiting for the IAM admin to update their policy. This will kill their productivity and agility which was the point of moving to the cloud in the first place!
With ABAC, the need to update the policy for each instance can be eliminated by using tags. You can create a policy that requires certain tags to be set when a resource is created, with the tag values restricted to what that user, group, or role is allowed. Alongside that, you have a policy that allows management of the resource based upon the presence of these same tags and limits which tags can be modified or even unset. Check out Brigid Johnson’s talk from re:Inforce 2019 to get the full rundown:
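As a rough sketch of that idea, the policy below assumes a hypothetical tag key `team` that is set on both the developer's principal and the instances they create. It uses the `${aws:PrincipalTag/team}` policy variable so the same document works for every team. It is deliberately incomplete: a working tag-on-create policy for `ec2:RunInstances` also needs statements for the other resources a launch touches (images, subnets, volumes, and so on) and for `ec2:CreateTags`.

```python
import json

# IAM policy variable, resolved per request to the caller's "team" tag value.
TEAM = "${aws:PrincipalTag/team}"
INSTANCES = "arn:aws:ec2:*:*:instance/*"

abac_policy = {
    "Version": "2012-10-17",  # policy variables require this version
    "Statement": [
        {
            # New instances must be tagged with the caller's own team.
            "Sid": "CreateOnlyWithOwnTeamTag",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": INSTANCES,
            "Condition": {"StringEquals": {"aws:RequestTag/team": TEAM}},
        },
        {
            # Existing instances are manageable only when their tag matches.
            "Sid": "ManageOnlyOwnTeamInstances",
            "Effect": "Allow",
            "Action": [
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
            ],
            "Resource": INSTANCES,
            "Condition": {"StringEquals": {"aws:ResourceTag/team": TEAM}},
        },
        {
            # Nobody covered by this policy may remove the team tag.
            "Sid": "DoNotRemoveTeamTag",
            "Effect": "Deny",
            "Action": "ec2:DeleteTags",
            "Resource": INSTANCES,
            "Condition": {"ForAnyValue:StringEquals": {"aws:TagKeys": ["team"]}},
        },
    ],
}

print(json.dumps(abac_policy, indent=2))
```

Notice that no instance ARN appears anywhere, so the admin does not need to touch the policy when a new instance is launched.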
Though the examples above use IAM users & groups, these principles of IAM ABAC apply to IAM Roles as well. At Trek10 we build serverless & cloud native, and properly scoped IAM roles are key to any cloud native architecture. It is important to remember that IAM Roles and the policies that are attached are how we give AWS services their ability to take actions on our behalf; Quint Van Deman gave us a lot of great insight into IAM when he compared it to different layers of a cake - check it out here:
Our specific use-case involves Step Functions, Lambda, and KMS, so let’s get into it.
In our problem, instead of an IAM admin, we have a centralized Step Function that could kill our productivity. We need that Step Function to be able to invoke any Lambda function or Step Function it is meant to have access to, but we don’t want to manage that list manually. We need to do this in a way that is secure and doesn’t widen our attack surface. Here come tags and ABAC to save the day. Let’s see how.
We settled on Step Functions for the orchestration (shameless plug) because it could handle calling out to whatever we might need to invoke: Lambda, another Step Function, even AWS Batch if we wanted. Now we need a way to allow this orchestration step function to invoke these other services without just giving it the whole run of the place. Here are some snippets of IAM policies:
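As an illustration of the caller's side, here is a sketch of an identity policy that could be attached to the orchestration Step Function's role. The tag key `InvokableByOrchestrator` and its value are assumptions made up for this example, not the names we actually used, and a real role would likely need more permissions than these two statements (for example, for synchronous nested executions).

```python
import json

# Hypothetical tag that marks a resource as callable by the orchestrator.
INVOKABLE_TAG_KEY = "InvokableByOrchestrator"
INVOKABLE_TAG_VALUE = "true"

tag_match = {
    "StringEquals": {f"aws:ResourceTag/{INVOKABLE_TAG_KEY}": INVOKABLE_TAG_VALUE}
}

orchestrator_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Any Lambda function may be invoked, but only if it carries the tag.
            "Sid": "InvokeTaggedLambdaFunctions",
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "*",
            "Condition": tag_match,
        },
        {
            # Same idea for nested Step Functions state machines.
            "Sid": "StartTaggedStateMachines",
            "Effect": "Allow",
            "Action": "states:StartExecution",
            "Resource": "*",
            "Condition": tag_match,
        },
    ],
}

print(json.dumps(orchestrator_policy, indent=2))
```

The resource is a wildcard, but the condition narrows it to tagged resources only, so adding a new function is a matter of tagging it rather than editing this policy.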
Note: The role itself needs the tag because the role, not the Lambda function or Step Function, is the caller to KMS.
So we have tags on both sides of this equation (the Step Function caller and the Lambda function being called) and are using them to make sure only approved resources can be called. We are also making sure that those resources can only be called by principals that have the expected tags.
We have successfully created a way to invoke any resource from Step Functions without needing to be concerned with a list of resources that are allowed! The KMS key allows the Encrypt action to be taken only by the caller, and Decrypt is available only to the functions being called because of their tag. Also, we have the functions limiting themselves to using KMS Decrypt only with a key that has the CentralizedPlatform tag. This is awesome! Those with a keen eye for detail might have noticed that I mentioned the tag on the Role for the Product, and you might be wondering “Why doesn't that go on the Lambda?”. In this case, we also need it on the execution role for the product because that role is technically the Principal for the Decrypt calls to KMS. That nuance is what can make working with IAM so difficult at times.
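A sketch of the KMS side might look like the following. Only the `CentralizedPlatform` tag key comes from the description above; its value, the `PlatformRole` principal tag, and the account ID `111122223333` are placeholders invented for illustration. A real key policy would also need a statement that lets administrators manage the key.

```python
import json

ACCOUNT_ROOT = "arn:aws:iam::111122223333:root"  # placeholder account ID


def principal_tag_equals(value):
    """Condition on a hypothetical PlatformRole tag carried by the calling role."""
    return {"StringEquals": {"aws:PrincipalTag/PlatformRole": value}}


# Statements for the KMS key policy (resource-based, so "*" means this key).
key_policy_statements = [
    {
        "Sid": "OrchestratorMayEncrypt",
        "Effect": "Allow",
        "Principal": {"AWS": ACCOUNT_ROOT},
        "Action": "kms:Encrypt",
        "Resource": "*",
        "Condition": principal_tag_equals("orchestrator"),
    },
    {
        "Sid": "ProductRolesMayDecrypt",
        "Effect": "Allow",
        "Principal": {"AWS": ACCOUNT_ROOT},
        "Action": "kms:Decrypt",
        "Resource": "*",
        "Condition": principal_tag_equals("product"),
    },
]

# Identity policy for a product's Lambda execution role: it may decrypt,
# but only with keys that carry the CentralizedPlatform tag.
product_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DecryptOnlyWithPlatformKeys",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:ResourceTag/CentralizedPlatform": "true"}
            },
        }
    ],
}

print(json.dumps(key_policy_statements, indent=2))
print(json.dumps(product_role_policy, indent=2))
```

The principal tag is read from the execution role, which is why the tag has to live on the role rather than on the Lambda function.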
One last thing to mention: we also need to manage controls for who can tag and how, which is an entire blog post of its own. If you're multi-account and need to enforce this across your entire organization, you'll want to check out Service Control Policies (SCPs).