At Trek10, one of the great things about running our 24/7 CloudOps support service is that we often have to solve problems on AWS for ourselves and then use our learning to implement the same solution for our customers. In this case, it was a security challenge: how do we balance the principle of least privilege with a reasonable level of productivity in a wide-ranging set of AWS accounts? We think we’ve come up with an interesting approach; I’ll lay out the principles of that approach in this post.
If you’re not already familiar with basic concepts of AWS IAM, including users, groups, policies, and cross-account roles, read up on those first. Here is a great overview. A well-established best practice for IAM is to centralize your users in either a special “IAM only” AWS account or through some user store and SSO service outside of AWS (e.g. Active Directory or Okta) and then use cross-account roles with various access levels to access your organization’s AWS accounts. The scheme looks something like this:
This system gives you clearly segmented access levels, limits the attack surface to a single login, and leaves a nice audit trail of access which you can use for notifications or alerts.
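To make the cross-account piece concrete, here is a minimal sketch of the trust policy that a role in a target account might carry so that principals in the central “IAM only” account can assume it. The function name and the account ID are placeholders for illustration, not part of our actual setup.

```python
import json


def cross_account_trust_policy(central_account_id):
    """Build a trust policy letting the central IAM account assume this role.

    Trusting the account root delegates the decision to IAM policies in the
    central account: a user there still needs an explicit sts:AssumeRole
    permission on this role's ARN.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{central_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID, for illustration only.
print(json.dumps(cross_account_trust_policy("111111111111"), indent=2))
```

Because the trust policy only names the central account, the question of which user may assume which role is decided entirely by the policies attached to users in that account.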
However, one important point is glossed over in this best practice… how do you define which users can assume which roles? For a small team with a small number of AWS accounts, this is probably not too critical… just pick access levels based on job function and adjust as necessary. But for larger teams with many AWS accounts, this is a critical problem. It may be very hard to define upfront who needs what access with true least privilege principles.
The default response becomes to give users more and more blanket access to assume many roles across many accounts, and the result is a poor security footprint, far from least privilege.
This is where our solution comes in: temporary elevation. Users are given access to assume roles at various levels in various accounts, in an ad-hoc manner, by an approver. This approval is tightly secured and logged with a description of the approval. Most importantly, every role assumption permission is temporary with an expiration period attached. We allow expirations to be up to a few weeks out, but if you only need access for the day, that is all you request.
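One way to express an expiring grant is an IAM policy whose `sts:AssumeRole` permission carries a date condition. The sketch below is an assumption about how such a policy could be built, not a description of our implementation; the function name, role ARN, and `days_valid` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def temporary_assume_role_policy(role_arn, days_valid, now=None):
    """Build a policy allowing sts:AssumeRole on role_arn until an expiry time.

    `now` can be passed in for testing; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=days_valid)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn,
                # Once aws:CurrentTime passes this timestamp, the condition
                # fails and the statement no longer grants anything.
                "Condition": {
                    "DateLessThan": {
                        "aws:CurrentTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ")
                    }
                },
            }
        ],
    }


# Hypothetical one-day grant on a placeholder role.
grant = temporary_assume_role_policy(
    "arn:aws:iam::222222222222:role/ReadOnly", days_valid=1
)
```

Note that a condition like this only blocks new role assumptions after the expiry; a session started just before the deadline runs until its own session duration ends.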
Approval authority is central here, so we’ve put a lot of thought into how to secure this. Approvers have an IAM policy attached with exactly one permission: execute a single Lambda function that processes the approval. Those same users also have IAM policies that enforce an MFA restriction and an IP restriction. Together with the usual access key / secret key, that gives us a total of three completely distinct factors of authentication. To ensure that approvers cannot escalate themselves, we’ve implemented a true-quorum system: two users are required for every escalation to succeed.
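As a rough illustration, the approver restrictions could be expressed as a single policy statement, and the quorum rule as a small check inside the approval function. Everything named here is hypothetical: the function ARN, the CIDR range, folding MFA and IP into conditions on one statement (the post describes them as separate policies), and the reading of the quorum rule as “at least one approver other than the person being elevated”.

```python
def approver_policy(function_arn, allowed_cidrs):
    """Allow invoking exactly one Lambda function, only with MFA and from known IPs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
                # Both conditions must hold for the Allow to apply.
                "Condition": {
                    "Bool": {"aws:MultiFactorAuthPresent": "true"},
                    "IpAddress": {"aws:SourceIp": list(allowed_cidrs)},
                },
            }
        ],
    }


def quorum_met(beneficiary, approvers):
    """Return True if at least one approver is someone other than the beneficiary.

    This is one reading of "two users are required": the person being
    elevated plus an independent approver.
    """
    independent = {a for a in approvers if a != beneficiary}
    return len(independent) >= 1


# Placeholder ARN and documentation-range CIDR, for illustration only.
policy = approver_policy(
    "arn:aws:lambda:us-east-1:111111111111:function:approve-elevation",
    ["203.0.113.0/24"],
)
```

With this check, `quorum_met("alice", ["alice"])` is rejected while `quorum_met("alice", ["bob"])` passes, so an approver acting alone cannot elevate their own account.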
Additionally, since AWS Lambda has no in-process context of who executed it (ahem ahem, Lambda product team), we developed an additional system that confirms the identity of the approver inside the function so that we can be confident the approver’s identity is being accurately logged.
One more important point… of course, because we’re Trek10, the whole system is serverless. Operational cost is trivial, all infrastructure is code-defined and repeatable, and the attack surface is reduced because there are no long-running processes that could be hacked.
If you might be interested in having Trek10 implement this solution for you, or have any questions or comments, hit us up at @trek10inc on Twitter or firstname.lastname@example.org.