Cloud Security Series - Part 2: IAM in the Cloud: Making IAM in the Cloud more manageable

Cloud Security is hard

When talking to clients and partners about cloud security, one common theme always emerges: securing the cloud is hard. And it’s true. Building and maintaining a secure cloud environment is a challenging task. But why is that?

When designing and building a robust and resilient cloud environment, there are many security aspects to consider: mature identity and access management, strong data protection and encryption, and reliable backup and disaster recovery strategies, to name just a few.

However, those challenges also exist in on-premises environments. Why does it seem like the cloud adds another layer of difficulty on top?

The answer is simple: complexity and scale.

Complexity and Scale are inherent

When moving into the cloud, we start to operate within large and complex ecosystems, regardless of how small or big our own cloud environment is. Even with a single AWS account, we have instant access to hundreds of services, thousands of API calls to consider, global reach from day one, and virtually unlimited scalability. We don’t gradually grow into this complexity over time as we get more familiar with the cloud. Instead, we are exposed to it immediately. But neither complexity nor scale is the real problem. They are among the core strengths of the cloud and one of the main reasons many of us moved into the cloud in the first place. So, if complexity and scale are not the core issue, what is?

The answer is humans. More precisely, human error.

The Real Problem: Human Error

Humans are great at building large and complex systems over time. But we are terrible at managing them once they have reached a certain size. Most cloud security issues do not arise due to the cloud’s complexity and scale; they arise because humans struggle to safely manage and interact with systems of this magnitude. So, what’s the solution? Accepting insecure cloud environments or artificially reducing cloud complexity and scale? No.

The only viable solution to creating more robust and secure cloud environments is to reduce the amount of human error in the system as much as possible. I deliberately wrote “reduce” and not “eliminate.” Human error is unavoidable. What we can do, however, is design systems that tolerate those mistakes and limit their impact.

Cloud Security through the lens of Human Error

I have to admit, this is easier said than done, and it’s the reason I decided to write this blog. This post is part of a multipart series in which I discuss the different aspects of cloud security through the lens of complexity, scale, and human error. I will talk about how complexity and scale manifest in each domain, and how we can reduce the human error introduced into these systems.

Let’s start this series with one of the core pillars of cloud security and one of my personal favorites: Identity and Access Management.

Why is IAM in the cloud such a challenge?

First, cloud environments come with different access types. Besides standard user access, there is root or other privileged access, as well as service access. Each access type comes with its own credential model: temporary or long-lived credentials, secrets, and access keys. Long-lived credentials and secrets in particular are a common attack vector and require careful handling.

Adding to this, non-human identities vastly outnumber human ones in most cloud setups. Roles, service accounts, third-party integrations, automation tooling, and other components require identities and permissions we need to manage and govern.

Permissions themselves are defined through policies that we either write and maintain ourselves or adopt from the provider as managed policies. Managed policies rarely follow the principle of least privilege and should therefore serve as a starting point rather than the final solution. Writing and maintaining fine-grained permission policies for a large number of identities quickly becomes a challenging task.

The challenge is compounded by the fact that effective permissions rarely result from a single policy. Instead, they emerge from multiple permission policies applied at different layers throughout the permission chain, for example organization-wide guardrails, identity policies, and resource policies. Understanding what an identity can actually do at any given point in time is therefore far from easy.
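To make this layering more tangible, here is a minimal Python sketch of how several policy layers might combine into one effective decision. It is a deliberately simplified model and not any provider's real evaluation engine: the `Statement` class, the `is_allowed` function, and the layer names (`boundary`, `guardrail`) are hypothetical, and resources, conditions, resource policies, and session policies are ignored. The only rules modeled are that an explicit deny in any layer wins and that every layer present must allow the action.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Statement:
    effect: str      # "Allow" or "Deny"
    actions: tuple   # action patterns, e.g. ("s3:*",)


def _matches(statement, action):
    # Compare case-insensitively; "*" and "?" act as wildcards.
    return any(fnmatchcase(action.lower(), p.lower()) for p in statement.actions)


def _has(policy, effect, action):
    # True if the policy contains a statement with this effect for the action.
    return any(s.effect == effect and _matches(s, action) for s in policy)


def is_allowed(action, identity_policy, boundary=None, guardrail=None):
    """Simplified model: explicit deny wins, then every layer must allow."""
    layers = [p for p in (identity_policy, boundary, guardrail) if p is not None]
    if any(_has(p, "Deny", action) for p in layers):
        return False
    return all(_has(p, "Allow", action) for p in layers)


# Hypothetical example layers (assumptions for illustration only).
identity = [Statement("Allow", ("s3:*",))]
boundary = [Statement("Allow", ("s3:GetObject", "s3:ListBucket"))]
guardrail = [Statement("Allow", ("*",)), Statement("Deny", ("s3:DeleteBucket",))]

for action in ("s3:GetObject", "s3:PutObject", "s3:DeleteBucket"):
    print(action, is_allowed(action, identity, boundary, guardrail))
```

Although the identity policy grants every S3 action, only `s3:GetObject` survives all three layers in this example. The boundary does not allow `s3:PutObject`, and the guardrail explicitly denies `s3:DeleteBucket`. Looking at any single policy in isolation would give the wrong answer.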

The cloud service landscape itself is also constantly evolving. New services and features are introduced regularly, while others are deprecated or replaced. This creates an ever-shifting foundation for permission management.

Cross-account access and multi-cloud or hybrid-cloud setups add yet another layer. Cross-account access includes granting permissions to accounts within your own organization as well as to external environments. Multi-cloud and hybrid cloud setups introduce trust relationships between different cloud providers and on-premises infrastructure. In all these cases, we need to manage and maintain identities, permissions, and trust relationships that sometimes span organizational and external boundaries.

Finally, scale acts as a powerful complexity multiplier in such vast ecosystems. Cloud environments often consist of hundreds or even thousands of accounts across different providers. Hundreds of services expose thousands of API calls. Thousands of identities, permission policies, and credentials exist across multiple domains. At this scale, even a small misconfiguration can have a significant impact on the security posture.

Making IAM in the cloud more manageable

Simply pointing out how large and complex everything is won’t do us any good. I will now discuss three core principles that help streamline identity and access management in the cloud, reduce the human error introduced, and lead to a more robust and secure setup.

Zero trust and least privilege are foundational principles I already discussed in a previous blog. I will not repeat them here. They are two of the main pillars of a secure cloud environment and contribute substantially to effective IAM in the cloud. They should be implemented wherever possible.

Central identity and permission management

Many IAM-related failures can be traced back to organizational weaknesses rather than technical limitations. The two most common issues I regularly encounter are lack of central ownership of cloud IAM and poor or even nonexistent integration with existing enterprise IAM.

Instead of consolidating IAM and reducing the maintenance and governance overhead, identities and permissions are frequently managed independently in each cloud environment and on-premises. As a result, a parallel shadow IAM for the cloud is born.

This not only creates unclear security boundaries but also makes it extremely difficult to determine which permissions a given identity holds across the whole organization at any given time, or whether those permissions adhere to internal standards at all. That is a dangerous combination in large and complex cloud environments, where a few overly permissive policies and a hijacked identity can quickly spiral out of control.

The solution is to consolidate your cloud IAM and enterprise IAM. Identities and permissions should be managed centrally and be subject to the same standards, controls, and governance processes. As a result, maintaining a clear overview of the organization’s entire IAM landscape, including the cloud, and managing access and permissions across it becomes far more manageable.

Preventive and detective security controls in the cloud

Even with centralized IAM, excessive or improperly defined permissions are still possible and can never be prevented completely. Furthermore, some aspects of identity and access management in the cloud, like service-level IAM or fine-grained resource policies, simply cannot be centralized.

What we need are controls and mechanisms in place that help detect and prevent IAM misconfigurations.

Implementing detective controls should be the bare minimum, especially in the cloud, where enabling such controls has a very low barrier to entry. Services like AWS CloudTrail, AWS Config, and Amazon CloudWatch immediately come to mind in the AWS ecosystem. These services help maintain visibility into the cloud environment and can send notifications when violations or suspicious activity are detected.
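As a rough illustration of what such a detective check does under the hood, the following Python sketch scans audit records for IAM changes that usually deserve a second look. It assumes records shaped like CloudTrail events (`eventSource`, `eventName`, `userIdentity.arn`, `eventTime`). The watchlist in `SENSITIVE_IAM_EVENTS`, the function name, and the sample records are my own assumptions, not an official rule set.

```python
# Hypothetical watchlist of IAM API calls worth alerting on.
SENSITIVE_IAM_EVENTS = {
    "CreateUser",
    "CreateAccessKey",
    "AttachUserPolicy",
    "PutUserPolicy",
    "UpdateAssumeRolePolicy",
}


def find_sensitive_iam_events(records):
    """Return a short finding for every watched IAM event in the records."""
    findings = []
    for record in records:
        if record.get("eventSource") != "iam.amazonaws.com":
            continue
        if record.get("eventName") in SENSITIVE_IAM_EVENTS:
            findings.append({
                "who": record.get("userIdentity", {}).get("arn", "unknown"),
                "what": record["eventName"],
                "when": record.get("eventTime"),
            })
    return findings


# Made-up sample records for illustration.
sample_records = [
    {
        "eventSource": "iam.amazonaws.com",
        "eventName": "CreateAccessKey",
        "eventTime": "2025-01-01T10:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
    },
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "eventTime": "2025-01-01T10:01:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
    },
]

for finding in find_sensitive_iam_events(sample_records):
    print(finding)
```

In practice, the managed services would do the collecting and filtering. The sketch only shows how small the actual decision logic can be once the telemetry exists.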

One common issue with detective controls, however, is the sheer amount of data that is collected, combined with a lack of follow-through once an issue has been detected. This often creates a false sense of security. Humans are easily overwhelmed by the amount of data, and collecting telemetry without acting on it is a pointless endeavor.

Cloud-native automation and self-remediation pipelines can be a great starting point to help with some of those challenges. When possible, issues are detected and automatically remediated without human intervention.
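A self-remediation pipeline can be thought of as a lookup from finding type to an automated fix, with everything else escalated to a human. The Python sketch below illustrates only that shape. The finding types, handler functions, and the in-memory `state` dictionary are hypothetical stand-ins for real cloud API calls.

```python
def deactivate_access_key(state, finding):
    # Stand-in for a real API call that disables the credential.
    state["access_keys"][finding["resource"]] = "Inactive"
    return "remediated"


def detach_admin_policy(state, finding):
    # Stand-in for a real API call that detaches the policy.
    state["attached_policies"][finding["resource"]].discard("AdministratorAccess")
    return "remediated"


# Hypothetical mapping of finding types to automated fixes.
REMEDIATIONS = {
    "STALE_ACCESS_KEY": deactivate_access_key,
    "ADMIN_POLICY_ATTACHED": detach_admin_policy,
}


def handle_finding(state, finding):
    """Apply the known fix, or escalate when no safe automation exists."""
    handler = REMEDIATIONS.get(finding["type"])
    if handler is None:
        return "escalated"
    return handler(state, finding)


# Made-up environment state for illustration.
state = {
    "access_keys": {"key-1": "Active"},
    "attached_policies": {"ci-role": {"AdministratorAccess", "ReadOnlyAccess"}},
}

print(handle_finding(state, {"type": "STALE_ACCESS_KEY", "resource": "key-1"}))
print(handle_finding(state, {"type": "UNEXPECTED_TRUST", "resource": "role-x"}))
```

Keeping the mapping explicit is a deliberate choice in this sketch: only findings with a well-understood, low-risk fix are automated, and unknown cases still reach a person instead of being silently dropped.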

We also need preventive controls to complete the picture. Identities and policies should be tested and reviewed before they are deployed. Methodologies such as DevSecOps and practices like Infrastructure as Code allow cloud delivery pipelines to integrate security testing early in the delivery process. The term shift left comes to mind: catching and fixing issues early is cheaper, faster, and more efficient than fixing them later.

Identity and policy standards and design patterns can be defined once and reused as a default baseline for further configurations. These standards and baselines should also be centrally managed and integrated into overall IAM maintenance and governance. The result is fewer incorrectly defined or overly permissive identities and policies reaching the cloud environment and a substantially reduced blast radius in the event of an incident.
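A preventive check of this kind can be very small. The following Python sketch lints a policy document before deployment and reports wildcard actions and wildcard resources in `Allow` statements. It assumes the JSON structure of an AWS IAM policy (`Statement`, `Effect`, `Action`, `Resource`). The function name `lint_policy` and the two rules are illustrative assumptions, and elements such as `NotAction` or `Condition` are not considered.

```python
def _as_list(value):
    # Policy elements may be a single value or a list of values.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def lint_policy(policy):
    """Return human-readable problems found in Allow statements."""
    problems = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action")):
            if action == "*" or action.endswith(":*"):
                problems.append(f"statement {index}: wildcard action {action!r}")
        if "*" in _as_list(statement.get("Resource")):
            problems.append(f"statement {index}: wildcard resource")
    return problems


# Made-up policies for illustration.
too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
scoped = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    },
}

for name, policy in (("too_broad", too_broad), ("scoped", scoped)):
    print(name, lint_policy(policy))
```

Run as a pipeline step, a check like this could fail the build whenever the returned list is not empty, so that overly permissive policies are stopped before they reach the cloud environment.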

Identity and permission lifecycle management

Even with detective and preventive controls and with centralized IAM, some misconfigurations will still be overlooked. This is where the third principle, identity and permission lifecycle management, becomes essential.

I often see IAM in the cloud treated as a one-time setup and not an ongoing operational responsibility. Identities and policies are created and never reviewed again. Credentials and secrets are never rotated. Identity and permission lifecycle management is simply absent. The result is a weakened cloud security posture. The cloud is not a fixed entity. It changes constantly. New services, new features, new integrations, and new actors are introduced all the time. We need to adjust our own internal processes to handle this continuous change.

One solution is robust identity and permission lifecycle management. It means treating IAM in the cloud as a dynamic system rather than a static state. Everything is evaluated continuously. Lifecycle management includes in-depth reviews of the current setup, covering not only user identities and permissions but all security-relevant IAM components, such as inline and resource policies, security controls, and configuration standards.

Reviews serve as a mechanism to gain insights into the current state of the environment. They can be further enhanced by conducting security gamedays or penetration tests to uncover hidden weaknesses. The information gathered should then be used to adjust controls, standards, and baselines, to reconfigure insufficient or weak identities and policies, and to conduct a thorough clean-up of the whole environment to get rid of outdated and excessive components. Tools like AWS IAM Access Analyzer or open-source solutions like Steampipe are a great starting point for IAM analysis and access-related assessments.
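One recurring review that is easy to automate is checking credentials for age and inactivity. The Python sketch below works on a plain inventory of access keys. The field names (`id`, `created`, `last_used`), the function `review_access_keys`, and the 90-day and 45-day thresholds are assumptions chosen for illustration, not recommended values.

```python
from datetime import datetime, timedelta, timezone


def review_access_keys(keys, now,
                       max_age=timedelta(days=90),
                       max_idle=timedelta(days=45)):
    """Flag keys that are due for rotation or appear to be unused."""
    findings = []
    for key in keys:
        reasons = []
        if now - key["created"] > max_age:
            reasons.append("rotate")
        last_used = key.get("last_used")
        if last_used is None or now - last_used > max_idle:
            reasons.append("unused")
        if reasons:
            findings.append((key["id"], reasons))
    return findings


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up inventory for illustration.
inventory = [
    {"id": "key-old", "created": utc(2024, 6, 1), "last_used": utc(2024, 12, 20)},
    {"id": "key-idle", "created": utc(2024, 11, 15), "last_used": None},
    {"id": "key-ok", "created": utc(2024, 12, 1), "last_used": utc(2024, 12, 30)},
]

for key_id, reasons in review_access_keys(inventory, now=utc(2025, 1, 1)):
    print(key_id, reasons)
```

Passing `now` in explicitly keeps the review reproducible, which makes it easier to test and to run on a schedule. The output would then feed the clean-up and reconfiguration steps described above.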

Final Thoughts

Together, these three principles substantially reduce both the amount and the impact of human error in cloud environments. The result is a more stable, resilient, and secure cloud foundation.

That said, even with these principles in place, human error can never be eliminated completely. Cloud security is an ongoing process that must be continuously evaluated and adapted as environments evolve.

I hope this blog provided useful insights and practical ways to improve your own cloud IAM setup and overall security posture.

— Hendrik


Title Photo by Parsoa Khorsand on Unsplash
