You write the Terraform code. The plan looks clean. You run apply, everything provisions successfully, and you move on. Three weeks later someone flags an S3 bucket with public read access sitting quietly in your account. The Terraform code was perfect. The security was not.
This is the problem with learning Terraform from deployment documentation. It teaches you how to provision infrastructure, not how to provision it safely. As a security engineer working with AWS, I started asking different questions when I read Terraform blocks. Not just "will this deploy" but "what does this expose, what does this trust, and what happens if this state file ends up in the wrong hands."
This article is what I wish someone had handed me earlier.
As A Security Engineer You Need To Think Differently About Terraform
Terraform helps you ship infrastructure faster, and that is a valid goal. But speed without security awareness is just automating your attack surface at scale.
The traditional security review happens after infrastructure is built. A ticket gets raised, someone does a manual audit, findings come back, and by that point the team has already built three more environments on top of the same misconfigured foundation. Terraform breaks that cycle if you let it. Because with IaC, the infrastructure exists as code before it exists in reality. That means security review can happen at the code stage, in a pull request, before a single resource is created.
That shift, from reactive to preventive, is why Terraform is one of the most powerful tools in a security engineer's hands. But only if you know what to look for.
The questions I now ask when reviewing any Terraform block are simple: What identity does this resource assume? What can it access? What does it expose to the internet? And where are the secrets?
If I cannot answer all four from reading the code alone, the code is not done yet.
AWS Service Security Considerations in Terraform
Most Terraform tutorials stop at "here is how to create the resource" and move on. As a security engineer, creating the resource is only half the job. Here is what I look for in the most commonly provisioned AWS services.
IAM Roles and Policies
IAM is the front door to everything in AWS. Get this wrong and everything else you secure becomes irrelevant because an attacker with the right permissions does not need to break anything. They just walk in.
The most common mistake I see in Terraform IAM blocks is the wildcard: Action set to "*", Resource set to "*". It deploys cleanly, the service works, and nobody questions it. But what you have just done is handed that resource a master key to your entire AWS account. If that Lambda function, EC2 instance, or ECS task is ever compromised, the attacker inherits every permission you gave it. With a wildcard that means they can read your secrets, exfiltrate your data, create new users, and cover their tracks, all using legitimate AWS API calls that look like normal activity in your logs.
# What gets written when speed matters more than security
resource "aws_iam_role_policy" "bad_example" {
  role = aws_iam_role.example.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "*"
      Resource = "*"
    }]
  })
}
This policy says: this identity can do anything to anything in AWS. There is no legitimate production use case that requires this. When you see this in a codebase it means someone prioritized getting it working over getting it right.
The principle of least privilege means every identity gets exactly the permissions it needs for its specific job and nothing beyond that. A Lambda function that reads from one S3 bucket should only be able to read from that one S3 bucket. Not write. Not delete. Not access any other bucket. Not touch IAM or EC2 or anything else.
# Scoped to exactly what this function needs
resource "aws_iam_role_policy" "good_example" {
  role = aws_iam_role.example.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject", # Read only
        "s3:ListBucket" # List contents of this bucket only
      ]
      Resource = [
        "arn:aws:s3:::my-specific-bucket",
        "arn:aws:s3:::my-specific-bucket/*"
      ]
    }]
  })
}
The difference between these two policies is the difference between a compromised function that reads one bucket and a compromised function that owns your entire AWS account. The attacker's reach is directly proportional to the permissions you granted. Narrow the permissions and you narrow the blast radius.
When writing IAM in Terraform, ask yourself: if this resource were compromised right now, what could an attacker do with these permissions? If the answer makes you uncomfortable, the policy needs to be tighter.
S3 Buckets
S3 misconfigurations have been behind some of the most public and damaging cloud breaches in history. Millions of records exposed. Not because of sophisticated attacks. Because someone created a bucket and left the door open.
Terraform makes it trivially easy to create an S3 bucket. It does not stop you from making it public and it does not enforce encryption by default. That responsibility sits entirely with the engineer writing the code.
There are two non-negotiable blocks that must accompany every S3 bucket you provision.
The first is public access blocking. AWS provides four settings that together form a complete shield against public exposure. Blocking public ACLs prevents anyone from granting public access through object ACLs. Blocking public policies prevents bucket policies that allow public access. Ignoring public ACLs means even if a public ACL somehow exists it is ignored. Restricting public buckets means no public access is possible regardless of any other setting.
resource "aws_s3_bucket_public_access_block" "example" {
  bucket = aws_s3_bucket.example.id

  block_public_acls       = true # Reject any request that grants public access via ACL
  block_public_policy     = true # Reject bucket policies that allow public access
  ignore_public_acls      = true # Ignore public ACLs even if they exist
  restrict_public_buckets = true # Deny all public access regardless of other settings
}
All four must be true. Setting three out of four is not good enough. Each setting closes a different attack vector and an attacker only needs one open door.
The second non-negotiable is encryption at rest. Data sitting in an unencrypted S3 bucket is readable by anyone who gains access to it. Encryption ensures that even if someone gets to the data they cannot read it without the key.
resource "aws_s3_bucket_server_side_encryption_configuration" "example" {
  bucket = aws_s3_bucket.example.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
    bucket_key_enabled = true
  }
}
Using aws:kms instead of AES256 matters because KMS gives you control over the encryption key. You can rotate it, restrict who can use it, audit every time it is used, and revoke access instantly if needed. With AES256 you have encryption but no control over the key. In a security incident that distinction is critical.
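That control starts with how the key itself is defined. A minimal sketch of the KMS key referenced above (resource and alias names are illustrative), with automatic rotation enabled:

```hcl
# Customer-managed key for S3 server-side encryption (illustrative names)
resource "aws_kms_key" "s3" {
  description             = "CMK for S3 server-side encryption"
  enable_key_rotation     = true # AWS rotates the key material automatically every year
  deletion_window_in_days = 30   # Recovery window before the key is permanently deleted
}

resource "aws_kms_alias" "s3" {
  name          = "alias/s3-encryption"
  target_key_id = aws_kms_key.s3.key_id
}
```

From here you can attach a key policy that restricts kms:Decrypt to the specific roles that genuinely need to read the data, which is exactly the control AES256 cannot give you.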
Security Groups
Security groups are your network perimeter inside AWS. Every rule you add is a decision about who gets access and from where.
The most dangerous rule in any security group is port 22 or port 3389 open to 0.0.0.0/0. Port 22 is SSH. Port 3389 is RDP. These are direct remote access protocols. Opening them to the entire internet means every automated scanner, every botnet, and every attacker probing AWS IP ranges can attempt to authenticate to your instance.
# This exposes your instance to the entire internet
ingress {
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"] # Every IP address on earth
}
The fix is not just narrowing the CIDR. It is questioning whether SSH needs to be open at all. AWS Systems Manager Session Manager gives you shell access to EC2 instances without any open inbound ports. If you are using that, port 22 should not exist in your security group at all.
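If you go the Session Manager route, the instance needs the SSM agent (preinstalled on recent Amazon Linux and Ubuntu AMIs) and an instance profile carrying the AWS-managed SSM policy. A minimal sketch, with illustrative resource names:

```hcl
resource "aws_iam_role" "ssm" {
  name = "ec2-ssm-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# AWS-managed policy scoped to what Session Manager needs
resource "aws_iam_role_policy_attachment" "ssm" {
  role       = aws_iam_role.ssm.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "ssm" {
  name = "ec2-ssm-profile"
  role = aws_iam_role.ssm.name
}
```

Attach the instance profile to the EC2 instance and you get shell access through the AWS API, authenticated by IAM and logged in CloudTrail, with zero inbound ports.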
If SSH must be open, restrict it to a specific known IP range and nothing broader.
ingress {
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["10.0.0.0/8"]
  description = "SSH from internal network only"
}
Always add a description to every security group rule. When someone reviews this code six months later the description is the difference between understanding why the rule exists and being afraid to delete it in case something breaks.
Apply the same thinking to database ports. Port 5432 for PostgreSQL should never be open to 0.0.0.0/0. It should be restricted to the specific security group of the application that needs database access and nothing else.
ingress {
  from_port       = 5432
  to_port         = 5432
  protocol        = "tcp"
  security_groups = [aws_security_group.app.id]
  description     = "PostgreSQL access from application tier only"
}
This means even if an attacker gets into your VPC they cannot reach your database directly. They have to compromise the application layer first which gives you another layer of detection opportunity.
CloudTrail
CloudTrail is your audit log for everything that happens in your AWS account. Without it you are operating blind. With a misconfigured one you only think you can see.
The default CloudTrail configuration captures management events in a single region. That sounds sufficient until an attacker creates resources in eu-west-2 while you are watching us-east-1. Or until they use a global service like IAM and your trail misses it because global service events are disabled.
resource "aws_cloudtrail" "main" {
  name           = "main-trail"
  s3_bucket_name = aws_s3_bucket.cloudtrail.id

  include_global_service_events = true # Captures IAM, STS, and other global services
  is_multi_region_trail         = true # Captures activity in every region, not just one
  enable_log_file_validation    = true # Detects if log files are tampered with

  event_selector {
    read_write_type           = "All"
    include_management_events = true

    data_resource {
      type   = "AWS::S3::Object"
      values = ["arn:aws:s3:::"]
    }
  }
}
Multi-region means an attacker cannot hide activity by operating in a region you are not watching. Global service events means IAM changes are captured. Log file validation means if someone tampers with your logs after the fact you will know because the validation hash will not match.
Also protect the CloudTrail bucket itself. An attacker who can delete your logs can erase evidence of everything they did.
resource "aws_s3_bucket_policy" "cloudtrail" {
  bucket = aws_s3_bucket.cloudtrail.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Deny"
        Principal = "*"
        Action    = ["s3:DeleteObject", "s3:DeleteBucket"]
        Resource = [
          aws_s3_bucket.cloudtrail.arn,
          "${aws_s3_bucket.cloudtrail.arn}/*"
        ]
      }
    ]
  })
}
Evidence preservation is not an afterthought. It is part of your security architecture.
Terraform State - The Hidden Security Risk
Let me ask you something. Where is your Terraform state file right now?
Your Terraform state file is one of the most sensitive files in your entire infrastructure. It contains the current state of every resource Terraform manages. Resource IDs, ARNs, IP addresses, database connection strings, and in many cases plaintext secrets. Everything Terraform needs to know about your infrastructure is in that file. Which means everything an attacker needs to understand, map, and move through your infrastructure is also in that file.
Most tutorials show you how to write Terraform. Very few tell you that the state file it produces needs to be treated like a secret itself.
The Local State Problem
By default Terraform stores state locally in a terraform.tfstate file. This is fine for learning. It is a serious security problem in any real environment because it means your infrastructure map lives on whoever's laptop last ran terraform apply. If that laptop is lost, stolen, or compromised, the attacker has a complete blueprint of your AWS environment.
Remote State on S3 - The Right Way
The solution is remote state stored in S3 with encryption, versioning, and strict access controls.
terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:..."
    dynamodb_table = "terraform-state-lock"
  }
}
Encrypt true means the state file is encrypted at rest using KMS. Without this anyone with S3 bucket access reads your entire infrastructure in plaintext.
The KMS key gives you control over who can decrypt the state. You can restrict KMS key usage to specific IAM roles meaning only your CI/CD pipeline and specific engineers can read or write state.
The DynamoDB lock table prevents two engineers or two pipeline runs from applying changes simultaneously. Without this two concurrent applies can corrupt your state file and recovering from state corruption in production is one of the most stressful experiences in infrastructure engineering.
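The lock table itself is a one-time setup. A minimal sketch, assuming the table name matches the backend block above; the S3 backend expects a string attribute named LockID as the hash key:

```hcl
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST" # No capacity planning needed for a lock table
  hash_key     = "LockID"          # Attribute name Terraform's S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```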
# Enable versioning so you can recover previous state
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
# Deny unencrypted uploads
resource "aws_s3_bucket_policy" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:PutObject"
        Resource  = "${aws_s3_bucket.terraform_state.arn}/*"
        Condition = {
          StringNotEquals = {
            "s3:x-amz-server-side-encryption" = "aws:kms"
          }
        }
      }
    ]
  })
}
Secrets in State
Even if you use AWS Secrets Manager to manage your secrets, if Terraform creates a resource that has a secret as an attribute, that secret ends up in the state file in plaintext.
A database password passed as a variable, an API key set on a Lambda environment variable, a certificate private key. These all appear in your state file regardless of how carefully you managed them during provisioning.
This is not a bug. It is how Terraform works. Your job as a security engineer is to know this and design around it. Remote encrypted state with strict KMS access controls is your primary defence. The question to ask about any sensitive value in your Terraform code is: am I comfortable with this appearing in the state file and who has access to read it?
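One related gotcha: marking a variable as sensitive hides it from plan and apply output, but it does not keep it out of state. A quick illustration (resource arguments trimmed for brevity):

```hcl
variable "db_password" {
  type      = string
  sensitive = true # Redacts the value in plan/apply output only
}

resource "aws_db_instance" "example" {
  # (other required arguments omitted)
  password = var.db_password # Still written to the state file in plaintext
}
```

Treat sensitive as an output-hygiene feature, not a storage protection. The storage protection is your encrypted remote state.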
Where Terraform Actually Makes You More Secure
As security engineers we can be quick to point out what tools do wrong. But Terraform has genuine strengths that, when used intentionally, make your security posture significantly stronger than manual provisioning ever could.
Your Security Baseline Lives in Version Control
When infrastructure is provisioned manually through the console, the security configuration exists only in the current state of the resource. Nobody knows who changed what, when, or why. A security group rule gets added during an incident and never removed.
With Terraform every infrastructure decision is a line of code committed to a repository with a timestamp, an author, and a commit message. Your security baseline is auditable. In a security incident that audit trail is invaluable.
Security Review Happens Before Deployment
Because infrastructure is defined as code before it is created, security review can happen at the pull request stage. A security engineer can review a Terraform PR the same way they review application code, before anything exists in the real world. Earlier review means cheaper fixes.
Drift Detection Exposes Unauthorised Changes
If someone goes into the AWS console and manually changes a security group rule, adds an IAM policy, or modifies an S3 bucket setting, Terraform will detect that drift the next time you run terraform plan.
terraform plan -detailed-exitcode
Exit code 2 means drift was detected. From a security perspective this is a detection mechanism. Unauthorised changes to security controls show up as drift. Drift is not always malicious but it always needs to be investigated.
Consistent Security Controls Across Every Environment
With Terraform you define security controls once and apply them consistently across every environment. The same encryption settings, the same security group rules, the same IAM boundaries apply everywhere. Consistency is a security property.
Where Terraform Falls Short On Security
Terraform is a provisioning tool. It is exceptionally good at creating, updating, and destroying infrastructure. It is not a security tool and it does not pretend to be. So do not treat it as one. Understand that it has real limitations and design around them.
No Native Secrets Management
Terraform has no built-in mechanism for handling secrets safely. The common mistake is passing secrets as variables directly in code.
# Never do this
variable "db_password" {
  default = "MySecretPassword123"
}
Use AWS Secrets Manager or Parameter Store and reference it using a data source. The value never appears in your code.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/database/password"
}
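You then reference the retrieved value through the data source attribute, for example (illustrative resource, other arguments omitted):

```hcl
resource "aws_db_instance" "example" {
  # (other required arguments omitted)
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

Keep the earlier caveat in mind: Terraform still records this value in state, so encrypted remote state with tight access controls remains essential.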
Terraform Does Not Know If What You Are Building Is Dangerous
This is the most important weakness to understand. Terraform validates syntax. It does not validate that what you are building is secure. A security group open to the world is valid Terraform. A bucket with public access is valid Terraform. An IAM policy with wildcard permissions is valid Terraform. The tool will plan it, apply it, and report success.
The security intelligence has to come from you or from additional tooling layered on top. Tools like tfsec and Checkov scan your Terraform code before apply and flag known security misconfigurations. Run them in your CI pipeline on every pull request.
Unvetted Public Modules
Community modules are written by individuals and organisations with varying security standards. A module that provisions an RDS instance might default to no encryption. When you use a module you inherit every default it sets.
Always pin modules to a specific version and always read the source before using any public module in production. Version pinning means a module author cannot push a malicious update that gets pulled into your next apply.
Provider Credentials Are a High Value Target
Long-lived AWS credentials that can provision and destroy infrastructure are one of the highest value targets in your environment. Never use long-lived access keys for Terraform in CI/CD. Use IAM roles with OIDC federation so your pipeline assumes a role dynamically with short-lived credentials that expire after each run.
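For GitHub Actions, as one example, the pattern looks roughly like this. The thumbprint, organisation, and repository names are illustrative assumptions; verify the current thumbprint against GitHub's documentation:

```hcl
# Trust GitHub's OIDC token issuer
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # Verify current value
}

resource "aws_iam_role" "terraform_ci" {
  name = "terraform-ci"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only this repo, only the main branch, can assume the role
          "token.actions.githubusercontent.com:sub" = "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }]
  })
}
```

The pipeline exchanges its short-lived OIDC token for temporary AWS credentials on every run. There is no stored access key to leak.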
Practical Recommendations For Security Engineers Working With Terraform
Reading about security risks is useful. Knowing exactly what to do about them is what strengthens you as a security engineer. Here are practices to implement on every Terraform project.
Run Security Scanning Before Every Apply
Add tfsec and Checkov to your CI pipeline so every pull request is scanned automatically.
- name: Run tfsec
  uses: aquasecurity/tfsec-action@v1.0.0

- name: Run Checkov
  uses: bridgecrewio/checkov-action@master
  with:
    directory: .
    framework: terraform
If the scan fails the pipeline fails. Misconfigured infrastructure never reaches production.
Pin Every Module and Provider Version
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "= 5.31.0"
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "= 5.4.0"
}
Every change to a version is a deliberate decision reviewed in a pull request.
Separate Plan and Apply Permissions
Developers can run terraform plan and see proposed changes. Only the CI/CD pipeline can run terraform apply and only after a pull request has been reviewed and approved. No human runs apply directly against production.
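One way to sketch the plan-only side is a role developers can assume that carries the AWS-managed ReadOnlyAccess policy. Names are illustrative, and the assume-role policy document is assumed to be defined elsewhere:

```hcl
resource "aws_iam_role" "terraform_plan" {
  name               = "terraform-plan-only"
  assume_role_policy = data.aws_iam_policy_document.dev_assume.json # Defined elsewhere
}

# A read-only view of the account is enough to refresh and compute a plan
resource "aws_iam_role_policy_attachment" "plan_readonly" {
  role       = aws_iam_role.terraform_plan.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```

Note that plan still needs to read the state object and briefly write to the lock table, so grant those two narrow permissions separately. The apply role, with write access, belongs to the pipeline alone.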
Enable Audit Logging on Your State Bucket
resource "aws_s3_bucket_logging" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  target_bucket = aws_s3_bucket.access_logs.id
  target_prefix = "terraform-state-access/"
}
Unusual state access is a potential indicator of reconnaissance activity.
Tag Every Resource for Security Visibility
locals {
  common_tags = {
    Environment = var.environment
    Owner       = var.team
    Project     = var.project
    ManagedBy   = "terraform"
  }
}
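Applying the tags is then a merge in each resource (or, alternatively, set them once through the AWS provider's default_tags block). The bucket name and extra tag here are illustrative:

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "my-example-bucket"

  tags = merge(local.common_tags, {
    DataClassification = "internal" # Per-resource additions layered on the baseline
  })
}
```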
Resources without tags are invisible to security monitoring. A resource you cannot identify is a resource you cannot protect.
Terraform does not make your infrastructure secure or insecure. It is a force multiplier. In the hands of a security engineer who understands the risks, it automates your security baseline across every environment, consistently and auditably. In the hands of someone who does not, it automates your vulnerabilities at the same scale.
The difference is asking the right questions before you run apply.
Written by Obidiegwu Onyedikachi Henry - Cloud Security Engineer