Gabriel Olivieri

Budget Friendly ISO27001/SOC2 Compliant Environments for AWS

The Need

Compliance frameworks are transitioning from a "nice to have" to an absolute necessity for doing business with corporate clients.

Fortunately, compliance evidence gathering has been automated by countless tools and services. Open-source tools such as Prowler and CISO Assistant, and vendors such as Scrut Automation, Sprinto, and Vanta, have made evidence collection highly accessible.

However, as stated by Databank, achieving compliance in public cloud environments requires substantially more effort, tools, and ongoing management than many organizations initially anticipate. This explains why 83% of enterprise CIOs surveyed by Barclays plan to move at least some public cloud workloads back to private cloud or on-premises infrastructure.

The Problem

While AWS itself is both ISO 27001 and SOC 2 certified, building the logging, access reviews, and data governance required on top of it becomes operationally expensive without documented procedures and tools. Many companies rely on manual checks and configurations to stay compliant; however, this creates technical debt that increases remediation costs as the company and its team grow.

Furthermore, for a small company like ours, it is unfeasible to pay for enterprise offerings that guarantee AWS compliance. To get both ISO 27001 and SOC 2 certified, and to align with our value proposition of offering safe and compliant cloud platforms to our clients, we have developed a budget-friendly compliant AWS environment using the following methods and strategies.

Isolating or Removing Default VPCs, Subnets and Security Groups on all Regions

Controls & Requirements Met

ISO 27001:2022

  • A.8.20 - Network security (NET-04, NET-04.1)

  • A.5.14 - Information transfer (NET-04.1)

SOC2

  • CC6.6 - The entity implements logical access security measures to protect against threats from sources outside its system boundaries. (NET-04, NET-04.1)

Motivation

Unused VPCs and regions are where attackers usually hide in an AWS account, since they are not as rigorously controlled as production ones. This creates the need to isolate or remove unused resources, so they can't be used as an attack vector.

Approach

If the company is able to do so, the easiest approach is to remove the default VPCs altogether.

If for some reason this is not possible in the account, it can also be achieved by configuring all default subnets to block all traffic via their NACLs and removing all rules from the default Security Groups.

There are three ways to implement these changes:

  • Quick and dirty with bash or boto3 scripts: This is the easiest option, but it lacks traceability and therefore carries risk when applied to existing accounts. A simple script can iterate through all subnets and security groups to implement the change.

  • CloudFormation StackSets: This is the most AWS-native approach: create a CloudFormation StackSet that deploys the stack to all regions, enforcing the change. If the company is using AWS Organizations, StackSets integrate with that service, so the change can be enforced across the different accounts in the organization.

  • AWS Control Tower: If the company is big enough to need this service, it is the easiest way to deploy landing zones across all accounts.
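For the script route, a boto3 sketch along these lines can walk every enabled region, strip the default Security Group's rules, and flip the default NACL's allow-all rules to deny-all. This is an illustrative sketch, assuming credentials with EC2 admin access; test it in a sandbox account first, since it will cut traffic in any default VPC still in use:

```python
def deny_entry(nacl_id: str, egress: bool) -> dict:
    """Kwargs for replace_network_acl_entry: overwrite the default NACL's
    'allow all' rule (rule number 100) with an explicit deny-all rule."""
    return {
        "NetworkAclId": nacl_id,
        "RuleNumber": 100,
        "Protocol": "-1",          # all protocols
        "RuleAction": "deny",
        "Egress": egress,
        "CidrBlock": "0.0.0.0/0",
    }

def lock_down_defaults() -> None:
    import boto3  # imported lazily so the helper above stays dependency-free
    regions = [r["RegionName"] for r in
               boto3.client("ec2").describe_regions()["Regions"]]
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        vpcs = ec2.describe_vpcs(
            Filters=[{"Name": "isDefault", "Values": ["true"]}])["Vpcs"]
        for vpc in vpcs:
            vpc_id = vpc["VpcId"]
            # Strip all rules from the default security group.
            for sg in ec2.describe_security_groups(
                    Filters=[{"Name": "vpc-id", "Values": [vpc_id]},
                             {"Name": "group-name", "Values": ["default"]}]
            )["SecurityGroups"]:
                if sg["IpPermissions"]:
                    ec2.revoke_security_group_ingress(
                        GroupId=sg["GroupId"],
                        IpPermissions=sg["IpPermissions"])
                if sg["IpPermissionsEgress"]:
                    ec2.revoke_security_group_egress(
                        GroupId=sg["GroupId"],
                        IpPermissions=sg["IpPermissionsEgress"])
            # Turn the default NACL's allow-all rules into deny-all,
            # in both directions.
            for nacl in ec2.describe_network_acls(
                    Filters=[{"Name": "vpc-id", "Values": [vpc_id]},
                             {"Name": "default", "Values": ["true"]}]
            )["NetworkAcls"]:
                for egress in (False, True):
                    ec2.replace_network_acl_entry(
                        **deny_entry(nacl["NetworkAclId"], egress))

if __name__ == "__main__":
    lock_down_defaults()
```

The same loop can be extended to delete the default VPCs outright once their dependencies (subnets, internet gateways) are removed.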

AWS StackSets

Image credit: AWS

Enable Flow Logs on VPCs in all Regions, Send Them to an S3 Bucket, and Query with Athena

Controls & Requirements Met

ISO 27001:2022

  • A.8.15 - Logging (MON-02.2, MON-08, MON-03.3, MON-03.7)

  • A.8.16 - Monitoring activities (MON-02.2)

SOC2

  • CC6.6 - The entity implements logical access security measures to protect against threats from sources outside its system boundaries. (MON-08)

  • CC7.2 - The entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity's ability to meet its objectives; anomalies are analyzed to determine whether they represent security events. (MON-01.8)

Motivation

All network events have to be auditable in the case of a security incident, either to address the incident itself or to avoid a follow-up by determining the root cause and enforcing the required changes.

Approach

VPC flow logs are quite easy to set up, either manually or through Infrastructure as Code (Terraform, CDK, CloudFormation, etc.).

The real challenge is meeting the requirement cost-effectively, since sending all network logs to CloudWatch could increase the account's bill exponentially.

As a budget-friendly alternative that is also compliant, it is possible to send the logs to an S3 bucket and query them using an AWS Glue crawler and Amazon Athena. Javier Carrera wrote an excellent blog on implementing this architecture.
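Once a Glue crawler has built a table over the flow-log bucket, queries can also be fired from code. A sketch, assuming the crawler produced a table named vpc_flow_logs with the default v2 column names (adjust to whatever schema the crawler inferred):

```python
def rejected_traffic_query(table: str, limit: int = 50) -> str:
    """Athena SQL for the top rejected flows; column names assume the
    default v2 flow-log format as inferred by the Glue crawler."""
    return (
        "SELECT srcaddr, dstaddr, dstport, count(*) AS hits "
        f"FROM {table} "
        "WHERE action = 'REJECT' "
        "GROUP BY srcaddr, dstaddr, dstport "
        f"ORDER BY hits DESC LIMIT {limit}"
    )

def run_query(database: str, output_s3: str) -> str:
    import boto3  # lazy import keeps the query builder testable offline
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=rejected_traffic_query("vpc_flow_logs"),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```

This kind of "top rejected flows" query doubles as audit evidence that network monitoring is actually in use.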

VPC Flow logs architecture

Image credit: Javier Carrera

Enable CloudTrail on all Accounts, Sending to CloudWatch or to S3 to Be Queried by Athena

Controls & Requirements Met

ISO 27001:2022

  • A.8.15 - Logging (MON-02.2, MON-08, MON-03.3, MON-03.7)

SOC2

  • CC7.2 - The entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity's ability to meet its objectives; anomalies are analyzed to determine whether they represent security events. (MON-01.8)

Motivation

If someone compromises the account, or a cloud user performs an unintended action, CloudTrail is the service that can give hints about which user or role was involved and the impact this has had on the cloud environment.

Approach

This is a fairly easy change: it consists of creating a trail for the company's accounts and exporting it into either S3 or CloudWatch.

If the company is using AWS Organizations, it is as simple as creating one organization trail for all accounts and regions.

If AWS Organizations can't be used, the landing zone pattern should be used to deploy a multi-region trail in every account.

For the export, either the S3-Glue-Athena pattern mentioned before or CloudWatch can be used to read the trails.

I don't consider the S3 pattern a must here for saving costs, since trail logs are not as big as network logs, but that also depends on the number of users the organization has in its cloud accounts. CloudWatch also has the advantage of anomaly detection on these trails, which allows security incidents to be detected sooner.
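As a sketch of the Organizations route with boto3 (run from the management or delegated administrator account; the trail and bucket names are placeholders, and the bucket needs a policy that allows CloudTrail to write to it):

```python
def org_trail_kwargs(bucket: str) -> dict:
    """Arguments for cloudtrail.create_trail: one multi-region,
    organization-wide trail with log file integrity validation."""
    return {
        "Name": "org-trail",
        "S3BucketName": bucket,
        "IsMultiRegionTrail": True,
        "IsOrganizationTrail": True,
        "EnableLogFileValidation": True,
    }

def create_org_trail(bucket: str) -> None:
    import boto3  # lazy import keeps the kwargs builder testable offline
    ct = boto3.client("cloudtrail")
    ct.create_trail(**org_trail_kwargs(bucket))
    ct.start_logging(Name="org-trail")  # trails are created stopped
```

Log file validation gives auditors a cryptographic digest chain proving the trail hasn't been tampered with.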

AWS CloudTrail

Image credit: AWS

Enforce KMS Encryption with Key Rotation on all EBS Volumes and S3 Buckets

Controls & Requirements Met

ISO 27001:2022

  • A.8.24 - Use of cryptography (CRY-09)

SOC2

  • CC6.7 - The entity restricts the transmission, movement, and removal of information to authorized internal and external users and processes, and protects it during transmission, movement, or removal to meet the entity's objectives.

Motivation

All data at rest should be encrypted, both for security and for compliance. It acts as a fail-safe against data exposure: if an attacker exfiltrates the data without the key, it is useless to them.

AWS offers a fairly easy way to manage keys and their rotation for this purpose with the KMS service, which can be paired with the main data storage services, such as EBS and S3.

Approach

To comply, all data in EBS should be encrypted. This means not only volumes but also snapshots, which should also be taken into account when creating backups or AMIs.

The easiest way to enforce EBS encryption at the account level is enabling the "EBS encryption by default" setting in the account settings.

If using AWS Organizations, one way to comply is using Service Control Policies (SCPs) as guardrails.

This consists of attaching a JSON policy to the Organizational Unit (OU) that explicitly denies the ec2:CreateVolume and ec2:RunInstances API actions if the ec2:Encrypted condition is set to false.
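Such a guardrail SCP might look like the following sketch; scoping the Resource to volume ARNs keeps RunInstances usable while denying the creation of unencrypted volumes:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedVolumes",
      "Effect": "Deny",
      "Action": ["ec2:CreateVolume", "ec2:RunInstances"],
      "Resource": "arn:aws:ec2:*:*:volume/*",
      "Condition": {
        "Bool": {"ec2:Encrypted": "false"}
      }
    }
  ]
}
```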

Private S3 buckets should also be encrypted, either with KMS keys (SSE-KMS) or with S3-managed keys (SSE-S3). While SSE-S3 is easier to manage, it lacks auditability and does not provide a clear separation of duties for data access.

AWS KMS

Image credit: AWS

Remove all Access and Secret Keys from IAM

Controls & Requirements Met

ISO 27001:2022

  • A.5.18 - Access rights (IAC-20, IAC-21)

  • A.5.15 - Access control (IAC-21)

SOC2

  • CC6.1 - The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events to meet the entity's objectives. (IAC-20, IAC-21)

  • CC6.2 - Prior to issuing system credentials and granting system access, the entity registers and authorizes new internal and external users whose access is administered by the entity. For those users whose access is administered by the entity, user system credentials are removed when user access is no longer authorized. (IAC-21.3)

Motivation

IAM user secret and access keys can be easily compromised, since they are available in engineers' local environments and in applications.

A compromised key could doom the cloud environment, since the attacker would have access to the cloud services and could exploit them to their advantage, also driving up the bill.

Approach

Here it is useful to differentiate the types of users organizations have in AWS, since the approach varies for each of them.

In the case of users for developers/engineers, to get access to the AWS CLI or API they should be using either AWS IAM Identity Center (previously AWS SSO) or the new aws login command to verify access in the browser.

AWS IAM Identity Center is the go-to for managing users when using AWS Organizations. In this service, companies can create users and their permission sets for the different accounts they hold. It also allows integration with an existing SAML SSO provider (Google Workspace or Okta, for example) and corporate identity providers (Keycloak, for example).

Once the SSO provider is configured using aws configure sso, developers can log in by calling aws sso login in their shell. Recently, however, AWS introduced a simpler option: just calling aws login.
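For reference, the profile that aws configure sso writes to ~/.aws/config looks roughly like this (the org name, account ID, and role name are placeholders):

```ini
[profile dev]
sso_session = my-org
sso_account_id = 123456789012
sso_role_name = DeveloperAccess
region = us-east-1

[sso-session my-org]
sso_start_url = https://my-org.awsapps.com/start
sso_region = us-east-1
sso_registration_scopes = sso:account:access
```

With this in place, aws sso login --profile dev opens the browser verification flow and grants short-lived credentials with no long-lived keys on disk.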

For system users employed by backend applications, Kubernetes controllers, and the like, the approaches are different.

One option is to use instance profiles with the policies needed by the underlying applications. However, this has downsides in big environments: since one server hosts many applications, some of them would get unwanted access to the profile's permissions. It also creates an evident attack vector: if the server is compromised, the attacker gains access to the cloud environment.

The best option, with the smallest attack surface, is using role assumption through OIDC. In the case of Kubernetes, the following would need to be configured:

  • An IAM OIDC Identity Provider
  • Configuration of the OIDC provider in the K8s cluster
  • IAM Role with Trust Policy trusting the OIDC provider
  • Service Account referring the role
  • Service Account attachment on the Pod with the app that requires permissions

The flow would be the following:

1- The Kubernetes service account injects a signed JWT token into the pod
2- The app inside the pod uses the token to request AWS access
3- AWS STS validates the token by contacting the OIDC provider to retrieve the public keys and verify the token's authenticity
4- After successful validation, STS issues temporary AWS credentials in exchange for the pod's token
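As a sketch of the IAM role's trust policy from the list above (the account ID, OIDC issuer ID, namespace, and service account name are all placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234:sub": "system:serviceaccount:default:my-app",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

The sub condition is what pins the role to a single service account, so no other pod in the cluster can assume it.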

Sanjal S Eralil has made a nice tutorial on Medium to implement this in EKS.

OIDC Provider Architecture
Image credit: Sanjal S Eralil

Remove SSH access from instances

Controls & Requirements Met

ISO 27001:2022

  • A.8.20 - Network security (NET-04, NET-04.1)

  • A.5.14 - Information transfer (NET-04.1)

SOC2

  • CC6.6 - The entity implements logical access security measures to protect against threats from sources outside its system boundaries. (NET-04, NET-04.1)

Motivation

Even if SSH is a secure protocol in theory, enabling SSH access on servers creates a bigger attack surface through brute-force attacks, key sprawl, and more.

Rather than handling the complexity of SSH configuration and key rotation to fend off these attacks, modern cloud-native technologies allow connecting to servers without using SSH at all.

Approach

The AWS-native way to disable SSH is to connect to VMs through Session Manager.

Session Manager enables managing EC2 instances through an interactive browser-based shell or through the AWS CLI. After a session is started, interactive commands can be run on the instance just as they would be in an SSH session. The SSM agent needs to be installed in the AMI, either at build time or via user data, to enable Session Manager access; more implementation details can be found in the AWS docs.
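As a sketch of the user-data route on an Ubuntu AMI (the instance also needs an instance profile carrying the AmazonSSMManagedInstanceCore managed policy; Amazon Linux AMIs ship the agent preinstalled):

```shell
#!/bin/bash
# EC2 user data: install and start the SSM agent so Session Manager
# can reach the instance without any SSH daemon or open port 22.
snap install amazon-ssm-agent --classic
snap start amazon-ssm-agent
```

Once the agent registers, a shell can be opened from the CLI with aws ssm start-session --target <instance-id>, with port 22 closed entirely.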

If using self-managed Kubernetes, Talos Linux is the best distribution for getting rid of SSH access entirely. With its gRPC API, engineers can manage nodes without having to log into the instance, albeit with a steep learning curve to operate the API.

Some cloud-native tools have appeared to replace both SSH access and VPNs; among them, Netbird and Teleport are worth mentioning.

Netbird is a tool built on top of WireGuard that works as a mesh network connecting servers and cloud environments. It allows easily connecting to servers through the network without needing to enable SSH.

Teleport serves a similar function, but does so through an identity-aware reverse proxy. It also records and audits all sessions and logs every query executed.

Both of them have open source versions that can be self-hosted and Cloud versions with enterprise support.

AWS Systems Manager Session Manager
Image credit: AWS

Restrict Security Groups Inbounds

Controls & Requirements Met

ISO 27001:2022

  • A.8.20 - Network security (NET-04, NET-04.1)

  • A.5.14 - Information transfer (NET-04.1)

SOC2

  • CC6.6 - The entity implements logical access security measures to protect against threats from sources outside its system boundaries. (NET-04, NET-04.1)

Motivation

To minimize the attack surface, preventing automated scanning and lateral movement, the least-privilege principle has to be applied to Security Group inbound rules.

This means ports can't be open to all traffic or to unreasonably large IP CIDR ranges; instead, only the minimum required ports should be open.

Approach

This is a situation where the SRE or incident team should be involved. It is hard to predict what will happen in the system when traffic is restricted: some APIs could fail, some applications might lose required network access, and Kubernetes and database clusters could stop working.

To monitor which ports are being accessed, VPC flow logs and probes using tcpdump or Wireshark can be used to gain visibility into the ports that should remain open. This helps minimize the risk of causing an outage.
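As a small illustration, the sketch below tallies accepted destination ports from default-format (version 2) flow-log records; the field positions assume the default log format and would shift with a custom format:

```python
from collections import Counter

def accepted_dst_ports(lines):
    """Count destination ports seen in ACCEPTed flows.

    Assumes the default v2 flow-log format, whose space-separated fields
    are: version, account-id, interface-id, srcaddr, dstaddr, srcport,
    dstport, protocol, packets, bytes, start, end, action, log-status.
    """
    ports = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 14 or fields[0] != "2":
            continue  # skip headers or custom formats
        dstport, action = fields[6], fields[12]
        if action == "ACCEPT" and dstport.isdigit():
            ports[int(dstport)] += 1
    return ports
```

Feeding a day or two of logs through this gives a shortlist of ports that genuinely need to stay open before tightening the Security Groups.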

Creating systems with minimal network access can be especially challenging if Kubernetes is in use. To successfully build a cluster with minimal access, good references are the Kubernetes port requirements and the port requirements of the CNI being used (Cilium or Calico, for example). If the cluster runs a service mesh such as Istio, its port requirements should also be taken into account.

Kubernetes Required Port Specs
Image credit: Bibin Wilson

Restrict network access with NACLS in VPCs

Controls & Requirements Met

ISO 27001:2022

  • A.8.20 - Network security (NET-04, NET-04.1)

  • A.5.14 - Information transfer (NET-04.1)

SOC2

  • CC6.6 - The entity implements logical access security measures to protect against threats from sources outside its system boundaries. (NET-04, NET-04.1)

Motivation

The least-privilege principle for networks is not exclusive to Security Groups: Network ACLs on subnets should also be restricted so as to keep unwanted traffic from entering the VPC.

Approach

This change is even more sensitive and unpredictable than restricting Security Group inbound rules. Therefore, the entire organization should be made aware of the high risk of the change, and all SRE teams alerted.

To minimize the risk, VPC flow logs can be used to detect outbound and inbound traffic in the VPCs.

Enterprise-grade tools like Palo Alto egress NAT and Kong can also help in this endeavor, as can more modest egress gateway setups, such as an Istio egress gateway or a Squid forward proxy, to control which external internet resources are accessed.

Take into account that, unlike Security Groups, NACLs are stateless. This means that both the inbound and outbound rules must allow a connection's ports for the handshake to succeed. Also, ephemeral ports (49152 to 65535) should be allowed so instances and services can reach the internet (unless it is an air-gapped environment).
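To make the statelessness concrete, the helper below builds the pair of allow entries for the TCP ephemeral range in the shape expected by boto3's create_network_acl_entry; the rule number and CIDR are illustrative:

```python
def ephemeral_port_entries(nacl_id: str) -> list[dict]:
    """Allow entries for TCP ephemeral ports (49152-65535) in both
    directions: because NACLs are stateless, return traffic for
    outbound requests needs an inbound rule, and vice versa."""
    base = {
        "NetworkAclId": nacl_id,
        "Protocol": "6",  # TCP
        "RuleAction": "allow",
        "CidrBlock": "0.0.0.0/0",
        "PortRange": {"From": 49152, "To": 65535},
    }
    return [dict(base, RuleNumber=110, Egress=False),
            dict(base, RuleNumber=110, Egress=True)]

def apply_entries(nacl_id: str) -> None:
    import boto3  # lazy import keeps the builder above testable offline
    ec2 = boto3.client("ec2")
    for entry in ephemeral_port_entries(nacl_id):
        ec2.create_network_acl_entry(**entry)
```

Narrowing the CidrBlock to known peer ranges tightens this further once flow logs confirm who actually talks to the subnet.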

NACLs vs Security Groups
Image credit: Saurabh Batray

Set Object Locks and Protect Keys from Deletion

Controls & Requirements Met

ISO 27001:2022

  • A.5.15 - Access control (IAC-21)

SOC2

  • CC6.1 - The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events to meet the entity's objectives. (IAC-06)

  • CC6.6 - The entity implements logical access security measures to protect against threats from sources outside its system boundaries. (IAC-06.1)

Motivation

To avoid unwanted deletion of important data and encryption keys, it is important to have guardrails in place so these assets are protected. Key resources where this should be done are EBS snapshots, S3 buckets, and KMS encryption keys.

Approach

To enforce compliance for important, sensitive S3 data that should be immutable, it is possible to set Object Lock in compliance mode on the bucket. However, take into account that neither the bucket nor its data can then be deleted until the retention period expires, so this shouldn't be applied to data that is meant to be temporary, such as development or test assets, or to static websites.

The same applies to EBS snapshots. If the snapshots are not meant to be temporary, they should have snapshot locks in compliance mode in order to meet the standard. Also take into account that snapshots of critical data used for disaster recovery are typically mandatory for compliance, unless an alternative method is used (e.g., Velero File System Backups).

KMS keys should also be protected, since their deletion could render encrypted data unrecoverable. For this, key deletion should require a pending window of up to a month, which doesn't affect the billing of the keys unless they are reactivated.
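On the S3 side, a boto3 sketch could look like the following; note that Object Lock must have been enabled when the bucket was created, and the retention period shown is illustrative:

```python
def object_lock_config(days: int) -> dict:
    """Bucket-default Object Lock retention in COMPLIANCE mode: object
    versions cannot be deleted by anyone until the retention expires."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": days}},
    }

def protect_bucket(bucket: str) -> None:
    import boto3  # lazy import keeps the builder above testable offline
    s3 = boto3.client("s3")
    s3.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=object_lock_config(365))
    # KMS keys cannot be deleted immediately; if deletion is ever needed,
    # scheduling it with the maximum pending window leaves a month to react:
    # boto3.client("kms").schedule_key_deletion(
    #     KeyId="<key-id>", PendingWindowInDays=30)
```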

S3 Object Lock
Image credit: Lisa Halbert

Remove any Policy or Role with Unbounded Permissions

Controls & Requirements Met

ISO 27001:2022

  • A.5.18 - Access rights (IAC-20, IAC-21)

  • A.5.15 - Access control (IAC-21)

SOC2

  • CC6.1 - The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events to meet the entity's objectives. (IAC-20, IAC-21)

Motivation

Roles and policies should not have unbounded permissions, in case they are compromised. To this end, every policy should define fine-grained permissions scoped to the services and resources that will actually be used.

Approach

One of the most common ways to break this rule is using the CDK's bootstrap defaults, which create a role with administrator access. Instead, a specific pre-defined IAM policy containing only the permissions the stacks actually need should be created and passed at bootstrap time with the --cloudformation-execution-policies flag.
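The bootstrap call might then look like this (account ID, region, and policy ARN are placeholders; the policy must exist beforehand):

```shell
cdk bootstrap aws://123456789012/us-east-1 \
  --cloudformation-execution-policies arn:aws:iam::123456789012:policy/cdk-exec-policy
```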

Additionally, when using other kinds of infrastructure automation, such as Terraform or Crossplane, the assumed role should limit permissions to only the services that will be used.

If using AWS Organizations, SCP guardrails and permission boundaries can also be useful to limit the permissions usable within all accounts, avoiding surprises from unbounded policies. Andrei Stefanie from Cyscale wrote a great blog explaining how to implement them.

IAM Least Privilege
Image credit: Andrei Stefanie

Backup all sensitive data with AWS Backups

Controls & Requirements Met

ISO 27001:2022

  • A.8.13 - Information backup (BCP-01, BCP-02)

  • A.5.29 - Information security during disruption (BCP-03)

  • A.5.30 - ICT readiness for business continuity (BCP-03, BCP-04)

SOC2

  • A1.2 - The entity optimizes capacity, availability, and performance – The entity maintains, monitors, and evaluates current processing capacity and utilization of software, data back-up and recovery infrastructure, and the communication software.

  • CC7.5 - The entity identifies, selects, and develops risk mitigation activities – The entity identifies and selects risk mitigation activities that help minimize the impact of identified security events and restore the entity's ability to meet its objectives.

  • CC9.1 - The entity identifies, selects, and develops control activities – The entity assesses and manages risks associated with the custody and accounting for assets, including data restoration and recovery procedures.

Motivation

In case of data loss due to any unexpected incident, the company should have backups in place to be able to recover the lost data and resume normal operations.

Approach

AWS Backup is a relatively easy-to-configure and economical service for performing backups when the account is of a modest size.

The steps to be done are:

1- Set up AWS Backup

2- Create a Backup Plan

3- Assign Resources to the Backup Plan

4- Set up S3 Cross-Region Replication

5- Create a restore procedure
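Steps 2 and 3 of the list above can be sketched with boto3 (the plan name, schedule, and retention values are illustrative, and the target vault must already exist):

```python
def daily_backup_plan(vault: str) -> dict:
    """BackupPlan document for backup.create_backup_plan: one daily
    rule at 05:00 UTC with 35-day retention."""
    return {
        "BackupPlanName": "daily-compliance-backups",
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": vault,
            "ScheduleExpression": "cron(0 5 * * ? *)",
            "Lifecycle": {"DeleteAfterDays": 35},
        }],
    }

def create_plan(vault: str) -> str:
    import boto3  # lazy import keeps the plan builder testable offline
    backup = boto3.client("backup")
    resp = backup.create_backup_plan(BackupPlan=daily_backup_plan(vault))
    # Resources are then assigned to the plan, typically by tag,
    # with backup.create_backup_selection(...).
    return resp["BackupPlanId"]
```

Tag-based selections keep the plan maintenance-free: any new resource carrying the backup tag is picked up automatically.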

Nawazdhandala wrote an excellent blog explaining this procedure in detail.

If using a cloud-agnostic Kubernetes setup, Velero is a great alternative for backing up clusters and their volumes.

Conclusions

The methods covered in this blog will help make an AWS account compliant. However, remember that other controls outside the scope of this article also need to be enforced to be fully compliant, such as application and infrastructure alerting and monitoring, static code analysis, and application security, along with further organizational controls.

The aforementioned methods require about a month of work for an empty account, but they take far longer to implement the more resources are already deployed in the account. Therefore, the sooner the AWS account becomes compliant, the fewer issues the organization will face in successfully passing an audit.

If you want to read more about cloud security and compliance check us out at thevenin.io

Disclaimer: No AI tool has been used to write any of the contents of this blog. AI has only been used for formatting and research purposes, to find references to sustain the claims exposed in this article.

