
Cloud Security Series - Part 3: Managing Risk Across the Data Lifecycle
- Hendrik Hagen
- AWS, Cloud, Security, Data
- May 8, 2026
Data Security in the Cloud
In this next installment of the blog series, I want to dive into the domain of data security.
Data security has never been more important than in today’s data-driven world. Businesses create, store, and analyze data at an ever-increasing rate, deriving insights to support critical business decisions. The knowledge hidden within data is no longer a neglected afterthought. It is a valuable asset that allows businesses to gain an important competitive advantage.
But assets need to be protected. Without protection, companies risk severe consequences, including massive financial losses from fines, irreparable reputational damage, and loss of customer trust.
The Data Lifecycle
Why is data security such a complex task? First, there has never been a time in human history when more data has been created and stored than today. In 2025, an estimated 402.74 million terabytes of data were generated every day, which adds up to roughly 181 zettabytes over the course of the year. That is a lot of data, enough on its own to justify the complexity discussion.
But data creation is only a small part of what is called the data lifecycle. The full lifecycle consists of six stages that every piece of data goes through:
- Creation
- Storage
- Usage
- Sharing
- Archival
- Destruction
While moving through the stages of the data lifecycle, data needs to be handled securely and appropriately at all times. Mishandling data, especially sensitive data like personal information or business secrets, can lead to irreversible damage to the company, as described above.
Creation
Beyond the sheer volume being created, data comes into existence with different sensitivity levels. At creation, it needs to be categorized and classified to ensure proper handling. The underlying classification policies need to be created, maintained, and reviewed. This is of particular importance when sharing data with other entities across security boundaries. A data inventory needs to be kept, providing an overview of all the sensitive data created within the organization and thus preventing shadow data: unknown, hidden, or overlooked copies of sensitive information that exist outside the purview of an organization's IT security measures.
Storage
Data is usually stored where it is created, distributed across different locations and storage types. Central storage architectures like data lakes are often introduced for centralized data management and governance, leading to duplication of sensitive information and forcing organizations to protect the same data across different locations. Different sensitivity levels and regulatory requirements require adjustments to storage architectures and encryption strategies. Key management, including robust key creation, storage, rotation, and monitoring, becomes an essential component of securing data at rest.
Usage
Data is accessed by a large variety of entities, internal as well as external. Strong security boundaries need to be established and paired with robust identity and access management as well as data classification controls. Data is often transferred across security boundaries during usage and processed in different locations. This requires robust encryption controls at rest, in transit, and in use. With sensitive data moving across boundaries and changing hands so often, defining clear data ownership becomes challenging.
Sharing
Third-party integrations within an organization's own cloud environment have become the norm rather than the exception. Data is regularly shared with other entities across the organization's own security boundaries. Businesses need to ensure that the security and data handling practices of those entities meet internal standards so that data protection remains consistent across the whole supply chain. Proper data mapping is a key component in allowing sensitivity labels to be exchanged. With so much data being shared, data sprawl is a common occurrence, with organizations no longer knowing what data they own, where it resides, and who has access to it.
Archival
Once data becomes inactive or is rarely accessed but needs to be retained for historical reference or long-term compliance, it needs to be archived. Long-term storage strategies, including regular audits and long-term key management processes, need to be defined. Retention periods based on regulatory requirements need to be designed, reviewed, and implemented. Classification and compliance standards might change over time, requiring flexible archive solutions. Fast retrieval and restoration of data need to be considered when building archival solutions.
Destruction
At the end of the data lifecycle, data needs to be deleted. All copies across all storage locations need to be securely removed without residue. Data deletion should be complete and non-reversible. Robust identity and access management becomes important in deciding who is allowed to delete what kind of data. Ephemeral storage in the cloud makes proper destruction a challenge. Even with proper destruction mechanisms in place, legal holds need to be honored, sometimes forcing companies to retain data past its defined retention period.
Monitoring
All stages of the data lifecycle need to be audited and monitored. Remediation actions need to be initiated in case of breaches or deviations from defined standards or regulations. Proper monitoring and auditing require even more data. This data also needs to run through the whole data lifecycle, adding to the overall complexity.
Remove Human Error from Data Security
It should be clear by now that data security in general, and especially in the cloud, is far from trivial. In the next section, I will discuss four key principles that will help reduce human error introduced into the data security processes and data lifecycle, thus leading to more robust and secure data protection and management within the organization.
Automate and Enforce Classification
Many data security and management issues can be traced back to neglect within the first stage of the data lifecycle: data creation. The most common security issues during creation are weakly defined or non-existent data ingestion, categorization, and classification standards, combined with a lack of enforcement. Every security weakness introduced at this stage has long-lasting consequences throughout the whole lifecycle, consequences that are extremely difficult to correct after the fact.
To create a solid foundation, we need to design and implement a robust and automated data ingestion and classification strategy. Concrete data categorization and classification policies need to be centrally defined and enforced at data creation across the whole organization. Categorization and classification should be automated as much as possible using specialized tools, removing the need for manual human intervention. Data not adhering to these standards should be prevented from being ingested into the organization’s data landscape. Ingestion should be handled via predefined and secure pipelines. Ingesting data into the system without passing these checkpoints should be blocked. Furthermore, data should only be allowed to be consumed from pre-approved and centrally governed sources.
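As a rough illustration of such a checkpoint, the following Python sketch rejects any incoming object that lacks a valid classification label. The label set, the metadata shape, and the function name are purely illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of a classification checkpoint in an ingestion pipeline.
# The allowed labels and metadata layout are illustrative assumptions.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}

def validate_ingestion(object_metadata: dict) -> None:
    """Block any object that arrives without a valid classification label."""
    label = object_metadata.get("classification")
    if label not in ALLOWED_CLASSIFICATIONS:
        raise ValueError(
            f"Ingestion rejected: missing or invalid classification label ({label!r})"
        )

# Example: only properly labeled data passes the checkpoint.
validate_ingestion({"source": "crm-export", "classification": "confidential"})
```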
Data Security Posture Management (DSPM) solutions are a great starting point for building a solid data foundation. They give companies full visibility into their data, simplifying classification and data tagging through pre-defined policies, automation, and enforcement. Automated risk detection and remediation, as well as proactive risk assessments, ensure faster incident response and compliance reporting. The result is a holistic, data-centric security approach and improved visibility into the business's data landscape.
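On AWS, Amazon Macie is one example of a service in this space. As a small sketch of what automated classification can look like, the following boto3 call starts a one-time Macie classification job against a bucket; the region, account ID, and bucket name are placeholders.

```python
import boto3

# Sketch: trigger a one-time Amazon Macie classification job for a bucket.
# Region, account ID, and bucket name are placeholders for illustration only.
macie = boto3.client("macie2", region_name="eu-central-1")

macie.create_classification_job(
    jobType="ONE_TIME",
    name="classify-raw-ingest-bucket",
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": "111122223333", "buckets": ["raw-ingest-bucket"]}
        ]
    },
)
```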
Strong Encryption as a Default Baseline
With data being stored, moved, and processed across so many security boundaries, strong encryption at rest, in transit, and in use becomes mandatory throughout the whole data lifecycle.
Infrastructure standards should be defined centrally and enforced early in the development process by leveraging the DevSecOps methodology in combination with Infrastructure as Code and robust deployment pipelines. This prevents unencrypted resources and endpoints from ever reaching the cloud environment, making use of the shift-left principle. Policy engines like Open Policy Agent (OPA) can be seamlessly integrated into existing workflows to define and enforce infrastructure standards using policy as code. These policies can also be managed centrally to allow a consistent standard across the whole organization. Default encryption enforcement offered by many cloud providers should be enabled.
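To make the default-encryption point concrete, here is a minimal boto3 sketch that sets SSE-KMS as the default encryption on an S3 bucket. In practice this would typically be baked into Infrastructure as Code rather than run as an ad-hoc script, and the bucket name and key alias are placeholders.

```python
import boto3

# Sketch: enforce default SSE-KMS encryption on an S3 bucket.
# Bucket name and key alias are placeholders.
s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="analytics-data-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/data-platform-key",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```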
Robust key management, including secure creation, rotation, storage, and monitoring of keys and key material, needs to be ensured. In such a vast and distributed ecosystem as the cloud, reducing the operational overhead of key management while ensuring flexibility and secure handling of key material is essential. Managed services like AWS KMS allow streamlined key management that is fully integrated with AWS storage services, AWS Identity and Access Management, and services like AWS CloudTrail and AWS Config for monitoring, drift detection, alerting, and automatic remediation. Furthermore, leveraging services like AWS KMS supports the implementation of hybrid key management approaches, enforcing centralized creation and monitoring of keys while allowing decentralized key storage and rotation to improve flexibility and reduce internal dependencies.
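A minimal sketch of such a key setup with boto3, assuming a customer managed symmetric key; the description and alias are placeholders:

```python
import boto3

# Sketch: create a customer managed KMS key and enable automatic rotation.
kms = boto3.client("kms")

key = kms.create_key(
    Description="Data platform encryption key",
    KeyUsage="ENCRYPT_DECRYPT",
    KeySpec="SYMMETRIC_DEFAULT",
)
key_id = key["KeyMetadata"]["KeyId"]

# Enable automatic rotation of the key material.
kms.enable_key_rotation(KeyId=key_id)

# Attach a human-readable alias for use in other services and policies.
kms.create_alias(AliasName="alias/data-platform-key", TargetKeyId=key_id)
```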
TLS should be mandated for all services and connections, combined with enforcement of secure protocols like HTTPS. Clear network segmentation and traffic isolation should be performed while prohibiting the usage of weak legacy protocols. Automated certificate management services like AWS Certificate Manager or AWS Private CA support certificate management throughout the entire lifecycle, from creation through automated renewal and rotation to monitoring.
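As a small example, a DNS-validated public certificate can be requested through ACM as shown below; the domain names are placeholders, and ACM handles renewal automatically for DNS-validated certificates.

```python
import boto3

# Sketch: request a public certificate via ACM with DNS validation.
# Domain names are placeholders.
acm = boto3.client("acm")

response = acm.request_certificate(
    DomainName="api.example.com",
    ValidationMethod="DNS",
    SubjectAlternativeNames=["*.api.example.com"],
)
print(response["CertificateArn"])
```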
Confidential computing should be used to protect sensitive data from outside access while it is in use. The AWS Nitro System, the underlying platform for modern Amazon EC2 instances, is an example of built-in confidential computing, offering protection from cloud operators, AWS system software, and the organization's own operators and software tools.
Attribute-Based Access Control
When dealing with hundreds of identities trying to access and process data, identity and permission management becomes a challenge. This is compounded by the fact that data access needs to be handled differently depending on sensitivity levels and compliance requirements.
In such environments, permission assignment needs to be automated and streamlined. One approach is attribute-based access control, or ABAC for short. Instead of defining access based on static, pre-defined roles as in role-based access control (RBAC), ABAC evaluates access dynamically using policies and attributes assigned to both the identity and the data itself, offering superior flexibility and granularity for data access management.
First, attributes are defined and assigned to identities and data. Attributes can include the user role, file sensitivity, time, location, and much more. Second, data access policies are created. Access is then evaluated and enforced at runtime through attribute matching, enabling complex and context-aware access management. As attributes and policies are easily adjustable, ABAC remains highly flexible and scalable, allowing hundreds of policies to be enforced without the need to create new roles. This is especially beneficial in large and highly regulated environments, as it allows the enforcement of need-to-know access, thus reducing the overall risk and simplifying compliance audits.
In AWS, ABAC can be implemented using tags, and many AWS services already support it. The foundation remains the robust and tightly monitored data categorization and classification strategy discussed earlier: attributes need to be defined, reviewed, and assigned before they can be used for policy evaluation.
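As a hedged sketch of the tag-based approach, the following IAM policy grants S3 object access only when the object's classification tag matches the calling principal's clearance tag. Bucket name, tag keys, and policy name are illustrative placeholders, not a recommended naming scheme.

```python
import json
import boto3

# Sketch of an ABAC-style IAM policy: S3 object access is only granted when
# the object's classification tag matches the principal's clearance tag.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::analytics-data-bucket/*",
            "Condition": {
                "StringEquals": {
                    "s3:ExistingObjectTag/classification": "${aws:PrincipalTag/clearance}"
                }
            },
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="abac-data-access",
    PolicyDocument=json.dumps(policy_document),
)
```

Because the policy references the caller's tag via a variable, new users and new data only need the correct attributes assigned; no additional roles or policies have to be created.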
Data Lifecycle Automation
The data lifecycle is a multi-stage process that is often not respected in its entirety. Data is created, stored, used, and shared. Rarely is data properly archived or deleted though. But without data archival and deletion, the data lifecycle is not complete. The result is old and stale data adding noise to already complex data management processes.
Data lifecycle management, especially toward the end of the lifecycle, should be mandatory and automated. Policies for data handling, archival, and deletion need to be defined based on sensitivity levels as well as regulatory and compliance requirements. The lifecycle metadata should be assigned to the data via categorization upon creation. As stated before, the foundation remains robust data categorization and classification automation and enforcement at the start of the data lifecycle.
Once the lifecycle metadata has been added, automated archival and deletion processes should be implemented and enforced across the whole cloud environment. The enablement of lifecycle automation within cloud services can be ensured using robust deployment pipelines, Infrastructure as Code, and Policy as Code. Every storage medium and service should have lifecycle rules attached. Based on the pre-defined rules and the attributes attached to the data itself, data should be automatically and securely archived or deleted, thus completing the data lifecycle.
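A minimal boto3 sketch of such a rule on S3, assuming an illustrative tag-based filter and retention periods that would in practice be derived from your own retention policies:

```python
import boto3

# Sketch: lifecycle rule that archives tagged objects after 90 days and
# deletes them after 365 days. Bucket, tag values, and retention periods
# are illustrative placeholders.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="analytics-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire-internal-data",
                "Filter": {"Tag": {"Key": "classification", "Value": "internal"}},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```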
Summary
As we can see, complexity and scale are present throughout the whole data lifecycle. Introducing human error when dealing with large quantities of sensitive data can cause irreversible damage to organizations and individuals alike. With these consequences in mind and the ever-increasing amount of data being created, it is understandable why designing robust and secure data management processes in the cloud is so important.
By implementing strong security foundations and reducing human error through automation and policy enforcement, organizations can significantly improve their overall security posture while maintaining scalability and compliance in the cloud.
I hope this blog provided useful insights and practical ways to improve your own cloud data security approach and overall security posture.
— Hendrik
Title Photo by Jeremy Perkins on Unsplash


