Mukami

Posted on Mar 22

# Managing Terraform State: Best Practices for DevOps

## How to Stop Fighting State Files and Start Collaborating

#beginners #terraform #tutorial #tfstate

Day 6 of the 30-Day Terraform Challenge — and today I learned something that every DevOps engineer eventually discovers the hard way: Terraform state is like your infrastructure's diary, and if you don't protect it, your team will pay the price.

Remember when I thought storing terraform.tfstate locally was fine? That was Day 1 me. Naive. Innocent. About to learn a valuable lesson.

Part 1: What's Actually in That State File?

Before today, I treated terraform.tfstate like a mysterious black box. "It's there, it works, don't touch it." But today, I opened it. And what I found surprised me.

Here's a trimmed-down view of what a state file contains:

{
  "version": 4,
  "resources": [
    {
      "type": "aws_s3_bucket",
      "name": "demo",
      "instances": [{
        "attributes": {
          "arn": "arn:aws:s3:::my-demo-bucket",
          "bucket": "my-demo-bucket",
          "region": "eu-north-1",
          "tags": {
            "Environment": "Learning"
          }
        }
      }]
    }
  ]
}

What I found inside:

Resource ARNs — the unique identifiers AWS assigns
IP addresses — public and private IPs of every instance
Tags — all the metadata I thought was just for organization
Dependencies — Terraform knows which resources depend on which
Sensitive data — secrets, keys, and credentials (in plaintext!)

The scary part: If I committed this to Git (which I was doing before Day 6), anyone with access to my repo would have seen everything about my infrastructure. Every IP. Every ARN. Every. Single. Detail.
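To make the plaintext point concrete, here is a minimal sketch. The database resource, its settings, and the variable name are illustrative assumptions, not part of my challenge code. Marking a variable as sensitive only redacts it from CLI output; the value is still written to terraform.tfstate as plain text.

```hcl
# Hypothetical example: a secret passed into a resource.
variable "db_password" {
  type      = string
  sensitive = true # redacts the value in plan/apply output only
}

resource "aws_db_instance" "example" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password # stored unencrypted inside the state file
  skip_final_snapshot = true
}
```

After an apply, the password attribute of this resource would be readable by anyone who can open the state file, which is why where that file lives matters so much.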

Part 2: The Bootstrap Problem — Terraform's Chicken-and-Egg

Here's a fun paradox: You want Terraform to create the infrastructure that stores Terraform's state. But Terraform can't store state in that backend until it exists.

The Bootstrap Problem: How do you create the S3 bucket and DynamoDB table for remote state without already having remote state?

The Solution: Create them manually (or with a separate, simpler Terraform configuration) first.

I created a bootstrap configuration that deployed just the S3 bucket and DynamoDB table with local state. Once those were up, I could reconfigure my main infrastructure to use them as a remote backend.

It's like building a ladder to build a house. You need something to stand on while you construct the real thing.
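A bootstrap configuration along these lines might look like the sketch below. It assumes AWS provider v4 or later (where versioning, encryption, and public access settings are separate resources), AES256 server-side encryption, and on-demand DynamoDB billing; those choices are assumptions for illustration. The bucket and table names match the backend block used later in this post. The one hard requirement is the table's partition key: the S3 backend expects a string attribute named LockID.

```hcl
# bootstrap/main.tf: applied once, with plain local state.
provider "aws" {
  region = "eu-north-1"
}

resource "aws_s3_bucket" "state" {
  bucket = "my-team-terraform-state"

  lifecycle {
    prevent_destroy = true # terraform destroy errors out instead of deleting the bucket
  }
}

# Keep every previous version of the state object.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt objects at rest by default.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# State should never be publicly readable.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the partition key must be a string named "LockID".
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

This config has no backend block on purpose: its own small state file stays local, and only the main infrastructure points at the bucket it creates.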

Part 3: Remote State — Your Infrastructure's Safe Haven

Before (Local State):

State lived on my laptop
Team members couldn't see it
Concurrent runs = corruption
If my laptop died, state died
Secrets in plaintext on my hard drive

After (Remote State with S3 + DynamoDB):

State lives in S3 (versioned, encrypted)
Team members share the same state
Locking prevents concurrent runs
Versioning means I can recover from mistakes
Encryption at rest protects the stored file (access to the bucket still needs to be restricted)

My Remote Backend Configuration:

terraform {
  backend "s3" {
    bucket         = "my-team-terraform-state"
    key            = "terraform.tfstate"
    region         = "eu-north-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}

What each argument does:

| Argument | Purpose |
|----------|---------|
| `bucket` | Where the state file lives (S3) |
| `key` | The path/filename in the bucket |
| `region` | Where the bucket lives |
| `dynamodb_table` | Locking mechanism (critical for teams) |
| `encrypt` | Server-side encryption at rest |
Part 4: State Locking — The Team Player's Best Friend 🔒

Remember the school project where two people edited the same Google Doc at the same time? Chaos, right?

State locking prevents that exact scenario.

I tested it with two terminals:

Terminal 1:

$ terraform apply
# Running...

Terminal 2 (while apply was running):

$ terraform plan

╷
│ Error: Error acquiring the state lock
│ 
│ Lock Info:
│   ID:        abc123-def456-ghi789
│   Path:      my-team-terraform-state/terraform.tfstate
│   Operation: OperationTypeApply
│   Who:       user@computer
│   Created:   2026-03-22 10:30:45 UTC
│ 
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time.
╵

What this means:

Terraform knows when someone else is already working
It refuses to run until the lock is released
No two people can corrupt the state simultaneously

In a team environment, this is non-negotiable. Without locking, you're one simultaneous terraform apply away from infrastructure chaos.

Part 5: Why State Files Should NEVER Go in Git 🚫

I used to commit terraform.tfstate to Git. I was wrong. Here's why:

| Problem | Explanation |
|---------|-------------|
| Secrets in Plaintext | State files contain passwords, access keys, and database credentials in plaintext |
| Merge Conflicts | Two engineers committing state = unresolvable conflicts |
| No Locking | Git doesn't prevent concurrent writes |
| Large Files | State files grow huge and bloat the repository |
| Audit Issues | Git history doesn't reflect actual infrastructure changes |

The correct approach:

Store state in S3 (encrypted, versioned)
Use DynamoDB for locking
Commit only your code to Git
Let the state live safely in the cloud

Part 6: The Migration Experience 🚀

When I added the backend configuration and ran terraform init, something magical happened:

Initializing the backend...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous "local" backend to the
  newly configured "s3" backend. Enter "yes" to copy and "no" to start empty.

  Enter a value: yes

Successfully configured the backend "s3"!

Terraform detected my local state and offered to migrate it to S3. I said yes, and seconds later, my state file was safely in the cloud, encrypted, versioned, and ready for team collaboration.

No manual copying. No complex scripts. Just Terraform being Terraform.

Part 7: What I Learned About State That Changed Everything 💡

State is Terraform's source of truth: Not your code, not the AWS console. The state file is what Terraform believes exists.
Drift detection is built in: terraform plan refreshes state against the real infrastructure, so a manual change to a managed resource shows up as a diff.
S3 versioning is your safety net: Accidentally corrupted your state? Restore the previous object version. It's like git revert for your state file.
Encryption isn't optional: State files contain secrets. encrypt = true should be the default, always.
The bootstrap problem is solvable: Create your backend infrastructure first (manually or with a separate config), then migrate.

Best Practices Checklist

| Practice | Why It Matters |
|----------|----------------|
| Store state remotely (S3) | Team access, disaster recovery |
| Enable versioning | Roll back from mistakes |
| Enable encryption | Protect secrets |
| Use DynamoDB for locking | Prevent concurrent corruption |
| Never commit state to Git | Avoid secrets exposure and merge conflicts |
| Protect the bucket with `prevent_destroy` | Accidental bucket deletion = lost state |
| Block public access | State should never be publicly readable |

The Bottom Line

Day 6 taught me that Terraform isn't just about writing code — it's about managing the state that code creates.

Local state is fine for learning. But the moment you work with a team (or even just a second laptop), you need remote state with locking.

I started today thinking "state is just a file." I'm ending today with a full S3 + DynamoDB backend that:

✅ Stores state securely
✅ Prevents concurrent corruption
✅ Encrypts everything
✅ Keeps version history
✅ Never touches Git

If you're still storing state locally in a team environment, you are one concurrent run away from disaster. Fix it today.

P.S. If you're wondering why I didn't just point Terraform at the S3 backend from the very first apply: that's the bootstrap problem! Terraform can't store state in a bucket that doesn't exist yet. I had to create the bucket and table first (with a small local-state config; doing it manually works too), then migrate. Mind-bending, but it works. 🧠

DEV Community