DEV Community

Udoh Deborah
Managing Terraform State: Best Practices for DevOps

Introduction

If Day 5 was about building scaled infrastructure, Day 6 was about understanding what holds it all together: Terraform state.

Today I migrated from local state to a fully remote S3 backend with state locking, and the difference between the two is not a small thing. It is the difference between infrastructure you can trust and infrastructure that is one concurrent run away from disaster.

What is Terraform State?

Every time you run terraform apply, Terraform writes a file called terraform.tfstate. This JSON file is Terraform's complete record of everything it manages — every resource, every attribute, every dependency. It is not a log. It is the source of truth.

When you run terraform plan, Terraform does three things:

  1. Reads your configuration code
  2. Reads the state file
  3. Queries real AWS infrastructure

It then calculates the difference between what your code says should exist and what actually exists. Without state, none of this is possible. Terraform would have no way to know what it already created.
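To make that concrete, here is a heavily simplified sketch of what a state file looks like inside (abridged and with placeholder values — a real `terraform.tfstate` records far more):

```json
{
  "version": 4,
  "terraform_version": "1.10.0",
  "serial": 12,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_lb",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "arn": "arn:aws:elasticloadbalancing:...",
            "idle_timeout": 60
          }
        }
      ]
    }
  ]
}
```

The `serial` counter increments on every write, which is how Terraform detects that one copy of state is newer than another.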

What the state file actually stores

After applying my Day 6 infrastructure, I ran terraform state show aws_lb.web and was surprised by how much detail was recorded. Every attribute AWS returns for the load balancer is stored — not just the ones I configured. Fields like desync_mitigation_mode, idle_timeout, preserve_host_header, and xff_header_processing were all there, even though I never set them in my config.

Running terraform state list showed every resource Terraform was tracking:

data.aws_ami.amazon_linux
data.aws_subnets.default
data.aws_vpc.default
aws_autoscaling_group.web
aws_launch_template.web
aws_lb.web
aws_lb_listener.http
aws_lb_target_group.web
aws_security_group.alb
aws_security_group.instance
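Inspecting a single resource shows the full attribute detail mentioned above. An abridged sketch of the output (attribute values here are illustrative placeholders, not my actual output):

```shell
$ terraform state show aws_lb.web
# aws_lb.web:
resource "aws_lb" "web" {
    desync_mitigation_mode = "defensive"
    idle_timeout           = 60
    preserve_host_header   = false
    # ... dozens more attributes AWS returned at creation time
}
```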

Why Local State Breaks Down

Local state works fine when you are the only person touching the infrastructure. The moment a second person gets involved, everything breaks:

Concurrent runs — Two engineers run terraform apply at the same time. Both read the same local state, make different changes, and write back conflicting versions. State is now corrupted.

Lost state — An engineer runs apply on their laptop and the laptop dies. The state file is gone. Terraform no longer knows what it created.

No locking — Local state has no locking mechanism. Nothing stops two operations from running simultaneously.

Secrets in plaintext — The state file stores sensitive values like passwords and access keys in plaintext JSON. Committing it to Git exposes those secrets to everyone with repo access — and to anyone who ever had access, since Git history is permanent.
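Because of that last point, state files should be excluded from version control from the very first commit. A minimal `.gitignore` guard (a common convention — adjust to your repo layout):

```
# Never commit Terraform state, its backups, or the local working directory
*.tfstate
*.tfstate.*
.terraform/
```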


The Solution: Remote State with S3 and DynamoDB

The fix is to store state remotely in AWS S3, with DynamoDB handling state locking. Every engineer and every CI/CD pipeline reads and writes to the same state file, and only one operation can hold the lock at a time.

The Bootstrap Problem

Here is the challenge: you cannot use Terraform to create the S3 bucket that Terraform itself needs as a backend. The bucket has to exist before terraform init can use it.

The solution is to split the setup into two separate configurations. First, a backend-setup folder creates the S3 bucket and DynamoDB table using local state. Once those exist, the main configuration can use them as its backend.
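Under that layout, the workflow looks roughly like this (folder names follow this post's setup; yours may differ):

```shell
# Phase 1: create the S3 bucket and DynamoDB table using local state
cd backend-setup
terraform init
terraform apply

# Phase 2: initialise the main config, which now finds its s3 backend
cd ..
terraform init
```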

Creating the S3 Bucket and DynamoDB Table

resource "aws_s3_bucket" "terraform_state" {
  bucket = "terraform-state-585706661633"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "enabled" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "default" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "public_access" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Key decisions here:

  • prevent_destroy = true — stops anyone from accidentally deleting the state bucket with terraform destroy
  • Versioning enabled — every version of the state file is kept, so you can roll back if something goes wrong
  • Server-side encryption — state is encrypted at rest using AES256
  • Public access blocked — the state bucket is never accessible from the internet
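Each of those settings can be verified from the AWS CLI after the apply (bucket name from the config above):

```shell
# Confirm versioning, encryption, and the public access block on the state bucket
aws s3api get-bucket-versioning --bucket terraform-state-585706661633
aws s3api get-bucket-encryption --bucket terraform-state-585706661633
aws s3api get-public-access-block --bucket terraform-state-585706661633
```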

Configuring the Backend

terraform {
  backend "s3" {
    bucket       = "terraform-state-585706661633"
    key          = "day6/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true
    encrypt      = true
  }
}

Every argument matters here. bucket is where state lives. key is the path inside the bucket — using a path like day6/terraform.tfstate means multiple projects can share one bucket without overwriting each other. use_lockfile enables S3-native state locking. encrypt ensures the state object is encrypted at rest in S3 (transport to S3 is already over TLS).

Note: The older dynamodb_table argument is deprecated in recent Terraform releases — the S3 backend gained native locking in Terraform 1.10+. Use use_lockfile = true instead; it achieves the same locking behaviour using S3 itself, with no DynamoDB table required.
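If you already have local state when you add the backend block, Terraform can copy it into S3 as a one-off migration:

```shell
# Re-initialise and offer to move existing local state into the new backend
terraform init -migrate-state
```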

Proof It Worked

After running terraform apply, the infrastructure came up successfully with state stored remotely in S3.

Terminal output showing successful apply with the state lock releasing:


The terminal shows all 7 resources created and "Releasing state lock" confirming the lock was acquired and released correctly.

ALB response in browser confirming Day 6 infrastructure is live:


The page explicitly confirms state is stored in the S3 remote backend.

Checking the S3 bucket confirmed the state file was there:

aws s3 ls s3://terraform-state-585706661633/day6/
2026-03-31 08:29:18      28315 terraform.tfstate

28KB, versioned, encrypted, and safely stored in S3.
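Versioning is what makes rollback possible. A sketch of recovering an older state version (the version ID placeholder comes from the list command's output):

```shell
# List every stored version of the state file
aws s3api list-object-versions \
  --bucket terraform-state-585706661633 \
  --prefix day6/terraform.tfstate

# Download a specific older version for inspection or restore
aws s3api get-object \
  --bucket terraform-state-585706661633 \
  --key day6/terraform.tfstate \
  --version-id VERSION_ID_FROM_LIST \
  terraform.tfstate.backup
```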

Testing State Locking

To prove locking works, I opened two terminals pointing at the same configuration. Terminal 1 ran terraform apply. Immediately, Terminal 2 ran terraform plan.

Terminal 2 was blocked with a lock error:

Error: Error acquiring the state lock

Error message: ConditionalRequestFailed: The conditional request failed
Lock Info:
  Path:      terraform-state-585706661633/day6/terraform.tfstate.tflock
  Operation: OperationTypeApply

This is exactly the behaviour you want in a team environment. No two operations can run simultaneously. The second one waits or fails until the first releases the lock.


Errors I Hit and How I Fixed Them

S3 bucket does not exist on terraform init — I ran terraform init in the root folder before the S3 bucket existed. Fix: run the backend-setup config first to create the bucket, then init the main config.

Deprecated dynamodb_table parameter — recent Terraform releases (1.10+) replaced DynamoDB-based locking with use_lockfile = true. Updated the backend block accordingly.

Stuck state lock after DNS failure — A DNS drop mid-apply left a .tflock file in S3. Fix: aws s3 rm s3://terraform-state-585706661633/day6/terraform.tfstate.tflock
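An alternative to deleting the lock object by hand is Terraform's built-in command, which can release a stale lock for backends that support it (the lock ID appears in the "Error acquiring the state lock" output):

```shell
# LOCK_ID is printed in the state lock error message
terraform force-unlock LOCK_ID
```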

Intermittent DNS failures — An unstable internet connection caused repeated no such host errors. Fix: wait for connection to stabilise and retry — the infrastructure and state were always fine, just the network dropping temporarily.


Key Takeaways

  • Terraform state is the source of truth — treat it with the same care as your database
  • Never commit terraform.tfstate to Git — use remote state from day one
  • The bootstrap problem is real — always create your backend infrastructure in a separate config
  • State locking is not optional in a team environment — it is what prevents catastrophic corruption
  • S3 versioning is your safety net — it lets you recover from a bad apply by rolling back to a previous state version
