In This Article:
- Why the default terraform.tfstate is a production-killer.
- Setting up an S3 backend with DynamoDB locking.
- Using Terragrunt to keep your environment config DRY.
Hey everyone, and welcome back! In Part 1 of this series, we tackled the first major challenge in writing professional IaC: modularity. We took a complex EKS cluster, broke it down into a reusable module, and learned how to use .tfvars and locals to create clean, declarative environment configurations.
Our dev environment's main.tf looked great. But we left off with a critical, unanswered question:
What happens when you actually run terraform apply?
If you followed along, you now have a terraform.tfstate file sitting in your environments/dev directory. This single file is the "source of truth" that maps your code to your real-world AWS resources.
And right now, it's a real production-killer!
The Danger of Local State Files
If you're working alone, on a single project, a local state file is fine. The second you add a teammate or a CI/CD pipeline, that local terraform.tfstate file becomes your biggest liability.
Here are the scenarios that keep me up at night:
- The "Who Has the Latest?" Problem: You run apply, then your co-worker (who doesn't have your state file) also runs apply. Because their Terraform doesn't know the first cluster exists, they create a second EKS cluster, or worse, their run fails halfway through.
- The "It's on My Laptop" Problem: You go on vacation. A production-down incident happens. The only copy of the production state file is on your encrypted laptop, which is 10,000 miles away. The team is completely blocked.
- The "Race Condition" Problem: You and a colleague run apply at the exact same time. You both read the same state file, and you both try to modify the same resource. This corrupts your state file, and now Terraform has no idea what's real and what's not.
- The "Leaked Secrets" Problem: State files often contain sensitive data in plain text. If you accidentally git commit your state file, you've just pushed secrets to your repository.
The solution to all of this is Remote State.
Solution: Remote State with S3 and DynamoDB
A remote state backend moves the state file off your laptop and into a shared, centralized, and secured location. For our AWS stack, the standard pattern is a combination of two services:
- Amazon S3: Used to store the state file itself.
- Amazon DynamoDB: Used for state locking.
Wait, locking? What's that?
When you run terraform apply, Terraform will first place a "lock" in the DynamoDB table. If your co-worker tries to run apply at the same time, their command will fail, stating that the state is already locked by you. This simple mechanism completely prevents race conditions and state corruption.
How to Implement It
First, you need to create the S3 bucket and DynamoDB table. (You only do this once).
Pro Tip: Since these resources are the foundation for all your Terraform projects, I recommend creating them manually or with a simple, separate Terraform setup that you run and then "forget" about.
- S3 Bucket: Create an S3 bucket. Let's call it my-awesome-app-tfstate. Enable bucket versioning (so you can roll back a bad state) and block all public access.
- DynamoDB Table: Create a DynamoDB table. Let's call it terraform-state-lock. It only needs one attribute: a partition key named LockID (with a type of String). A minimal Terraform sketch for both resources follows this list.
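If you'd rather bootstrap these two resources with code, here is a minimal sketch of that one-off setup. The bucket and table names match the examples above; the file name, region, and pay-per-request billing mode are just illustrative choices, so adjust them to your account.

# bootstrap/main.tf -- applied once, then left alone
provider "aws" {
  region = "eu-west-1"
}

resource "aws_s3_bucket" "tfstate" {
  bucket = "my-awesome-app-tfstate"
}

# Versioning lets you roll back to a previous state file if something goes wrong
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Block all public access to the state bucket
resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The lock table only needs the LockID partition key
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}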
Configuring Your Environment
Now, in each of your environment directories (environments/dev, environments/prod), you add a new file. Let's call it backend.tf.
environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-awesome-app-tfstate"
    key            = "eks-cluster/dev/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-awesome-app-tfstate"
    key            = "eks-cluster/prod/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
Look closely at the key property. This is the magic.
We are storing both environment state files in the same bucket, but we're giving them unique paths (or "keys"). This provides perfect isolation. Your dev apply will read/write to the dev state file, and your prod apply will only touch the prod state file.
Now, when you cd environments/dev and run terraform init, Terraform will detect the backend block. It will ask if you want to copy your existing local state to the new S3 backend. Say "yes," and you're officially running on remote state!
Introducing S3 Native State Locking (Terraform v1.11+)
The core of our Part 2 solution, using a dedicated DynamoDB table for state locking, is the battle-tested, standard pattern. However, the world of Terraform is constantly evolving, and a major simplification has arrived that we must address: S3 Native State Locking.
While effective, relying on a DynamoDB table added cost and complexity. It forced us to manage an extra resource and grant additional IAM permissions for every environment, violating our goal of minimal overhead.
With Terraform v1.11.0 and later, the S3 backend includes a built-in locking mechanism that works without DynamoDB, leveraging S3's conditional write capabilities to ensure safety. (The feature shipped as an experimental option in v1.10 and became generally available in v1.11.0.)
Why Switch from DynamoDB?
The motivation is simple: simplification and cost reduction.
- Fewer Resources: You eliminate the need to provision and maintain a dedicated DynamoDB table.
- Reduced Overhead: Less IAM policy management and fewer resources to monitor.
- Lower Cost: Eliminates the small but constant cost associated with DynamoDB table usage.
While DynamoDB locking is robust, Terraform's long-term roadmap signals a shift towards this simplified, native locking model, with DynamoDB support slated for future deprecation.
How to Enable S3 Native Locking
The process is incredibly straightforward, requiring only the addition of the use_lockfile argument to your backend configuration.
Before: Using DynamoDB for Locking (The Classic Pattern)
terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "path/to/your/statefile.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock" # Goodbye, complexity!
    encrypt        = true
  }
}
After: Switching to S3 Native Locking
Ensure you are on Terraform v1.11.0 or newer. Simply remove the dynamodb_table line and add the use_lockfile argument:
terraform {
  backend "s3" {
    bucket       = "your-terraform-state-bucket"
    key          = "path/to/your/statefile.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # S3 native locking enabled
  }
}
With S3 locking enabled, Terraform creates a temporary .tflock file in the same location as the state file during any operation. You may need to update your S3 bucket policies and IAM permissions to accommodate the new lock file. You can also temporarily use both dynamodb_table and use_lockfile = true during your migration for maximum safety.
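For example, a belt-and-braces migration config could look like the sketch below (it reuses the dev backend from earlier in this article); Terraform will then acquire both locks on every run. Remember to re-run terraform init after any backend change.

terraform {
  backend "s3" {
    bucket         = "my-awesome-app-tfstate"
    key            = "eks-cluster/dev/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock" # kept only during the migration window
    use_lockfile   = true                   # S3 native lock, acquired alongside the DynamoDB lock
  }
}

Once every collaborator and pipeline is on v1.11+, drop the dynamodb_table line, and eventually the table itself.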
The New Problem: We're Not DRY
This is a huge improvement. Our state is secure, locked, and versioned. But as a senior engineer, something about this should bother you...
We're repeating ourselves.
That backend.tf block is identical in dev and prod, except for one line: the key. And what about our provider.tf? We're probably copying that into every environment too.
If we have 50 microservices, that's 50 (or 100, or 150) copies of the same backend.tf and provider.tf files. What happens when we need to update our provider version? We have to find and replace it in 150 places.
This is a violation of the DRY (Don't Repeat Yourself) principle.
The "Next Level" Solution: Terragrunt
This is where a tool like Terragrunt comes in. Terragrunt is a thin wrapper for Terraform that provides extra tools to manage multiple environments.
Its main superpower is keeping your environment configurations DRY.
With Terragrunt, your file structure changes. You get rid of backend.tf, provider.tf, etc., in your environment directories. Instead, you create a terragrunt.hcl file.
New Project Structure:
terraform-project/
├── modules/
│   └── aws-eks-cluster/
│       ├── main.tf
│       └── ...
└── environments/
    ├── dev/
    │   ├── terragrunt.hcl
    │   └── dev.tfvars
    ├── prod/
    │   ├── terragrunt.hcl
    │   └── prod.tfvars
    └── terragrunt.hcl   # <--- A NEW ROOT FILE
1. The Root terragrunt.hcl
This file defines the configuration you want to share across all environments.
# environments/terragrunt.hcl

# Configure the remote state backend ONCE
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "my-awesome-app-tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# Define the inputs we want to pass to our Terraform modules
inputs = {
  # Shared, cross-environment inputs go here.
  # Environment-specific values come from each environment's .tfvars file
  # (dev.tfvars in dev, prod.tfvars in prod, etc. -- see the wiring note below).
}
That key = "${path_relative_to_include()}/terraform.tfstate" is the magic. Terragrunt will automatically generate a unique key for each environment based on its directory path (e.g., dev/terraform.tfstate).
2. The Environment terragrunt.hcl
Now, your environment-specific files become incredibly simple.
environments/dev/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
# Tell Terragrunt where our actual Terraform module is
terraform {
source = "../../modules/aws-eks-cluster"
}
# All inputs are automatically loaded from dev.tfvars!
That's it. This file tells Terragrunt to:
- Go find the root terragrunt.hcl and inherit all its settings (like the S3 backend).
- Use the aws-eks-cluster module as its source code.
- Automatically find and use all the variables defined in dev.tfvars (see the wiring note just below).
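One wiring note: on its own, Terraform only auto-loads terraform.tfvars and *.auto.tfvars files, so depending on your Terragrunt version and layout you may need to point it at the environment's var file explicitly. Here is a minimal sketch of one way to do that from the root terragrunt.hcl, assuming you keep the convention that each environment's var file is named after its directory:

# environments/terragrunt.hcl (root) -- one possible approach, not the only one
terraform {
  extra_arguments "env_tfvars" {
    # Only pass -var-file to commands that accept variables (plan, apply, ...)
    commands = get_terraform_commands_that_need_vars()

    # Resolves to environments/dev/dev.tfvars, environments/prod/prod.tfvars, etc.
    arguments = [
      "-var-file=${get_terragrunt_dir()}/${basename(get_terragrunt_dir())}.tfvars"
    ]
  }
}

Alternatively, you can skip .tfvars files entirely and declare the environment-specific values in each environment's inputs block.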
Now, to deploy dev, you cd environments/dev and run:
terragrunt apply
Terragrunt will, in the background, generate the backend.tf file for you, pull down the module, and run terraform apply with all your variables from dev.tfvars.
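For reference, the backend.tf that Terragrunt drops into the dev working directory looks roughly like the one we wrote by hand earlier, just with the key computed from the directory path:

# backend.tf -- generated by Terragrunt, never edited by hand
terraform {
  backend "s3" {
    bucket         = "my-awesome-app-tfstate"
    key            = "dev/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}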
We have achieved the ultimate goal:
- Modules are DRY (Part 1).
- State Management is robust and safe (Part 2).
- Environment Configuration is DRY (Part 2).
What's Next in Part 3?
We've come a long way. We've defined our infrastructure as reusable modules, and we've built a scalable, DRY structure to manage state and configuration for multiple environments.
But how do we run this? So far, we've been running terragrunt apply from our laptops. That's not a real-world workflow.
In Part 3, we'll tie this all together in a CI/CD Pipeline. We'll explore:
- How to set up GitHub Actions (or your tool of choice) to run plan on every pull request.
- The "human in the loop": using tools like Atlantis or GitHub Actions approval steps to safely run apply.
- A full, "PR-to-Prod" automated workflow.
Stay tuned, and happy building! Feel free to leave your questions in the comments, and I will be glad to connect on LinkedIn.
Disclaimer: Parts of this article were drafted with the help of an AI assistant. The technical concepts, code examples, and overall structure were directed, curated, and verified by the author to ensure technical accuracy and reflect real-world experience.
Top comments (2)
Great tips! :) Just a quick nit about state locking: as of v1.10, both OpenTofu and Terraform support native S3 state locking, so DynamoDB should be avoided (and it's technically deprecated already)! Everyone I know already switched to Tofu. Thanks for your post!
opentofu.org/docs/intro/whats-new/...
developer.hashicorp.com/terraform/...
Thanks for commenting @jessefarinacci
Seems I missed that section :)
I wrote the core of this article around "using a dedicated DynamoDB table for state locking"; my plan was to cover that classic approach first and then introduce S3 Native State Locking. Thanks again.