David Lu

Two Terraform Traps That Burned Me: Hidden Defaults & Circular Dependencies

Bringing unmanaged AWS infrastructure under Terraform control—the classic 'Brownfield Migration'—is one of the most deceptive challenges in DevOps.

On the surface, it looks like a simple scripting task: just wrap terraform import and loop through resources. However, based on my experience navigating these migrations in complex environments, this naive approach almost always fails.

The impedance mismatch between AWS and Terraform creates two distinct classes of problems that standard tools miss: Hidden API Defaults and Graph Cycles.

Here is a technical breakdown of what went wrong and how we solved it using graph theory and strict schema mapping.

Trap 1: The root_block_device & The "Silent" Replacement

The task was simple: Import 34 production EC2 instances.
After generating the HCL and running terraform plan, I expected a clean, no-op plan.

Instead, I got this:

Plan: 34 to add, 0 to change, 34 to destroy.

# aws_instance.prod_web_01 must be replaced


Every single production instance was flagged for replacement.

The Investigation

Rule #1 of IaC is "Always read the plan." But when the plan says "replace," you need to know why.

The diff looked like this:

-/+ resource "aws_instance" "prod_web_01" {
      ~ id = "i-0abc123def456" -> (known after apply)

      - root_block_device {
          - volume_size = 100    # Actual Prod State
          - volume_type = "gp3"
          - device_name = "/dev/xvda"
        }
      + root_block_device {
          + volume_size = 8      # AMI Default
          + volume_type = "gp2"
        }
    }


The Root Cause: Read-Only vs. Writable Attributes

The issue wasn't just "I forgot to declare values." It was a conflict between the AWS API and the Terraform Schema.

  1. AWS API Reality: When you query an instance, AWS returns everything, including the DeviceName (e.g., /dev/xvda).
  2. Terraform Schema: The device_name inside root_block_device is a Computed (Read-Only) attribute. You cannot set it.

If you blindly map the API response to HCL, Terraform errors out because you're trying to set a read-only field.
If you omit the block entirely (thinking "it already exists"), Terraform assumes you want the AMI defaults (often 8GB gp2).

Because AWS cannot shrink a 100GB volume to 8GB in-place, Terraform's only option is to destroy and recreate the instance.

The Fix: Surgical Mapping

You can't just dump the API response. You have to filter it through a logic layer that understands the Terraform provider's quirks:

# Pseudo-code for the fix
def transform_root_block_device(api_response):
    """Map an EC2 block-device API payload onto writable Terraform attributes."""
    ebs = api_response.get('Ebs', {})

    result = {
        # Keep writable attributes
        'volume_size': ebs.get('VolumeSize'),
        'volume_type': ebs.get('VolumeType', 'gp2'),
        'delete_on_termination': ebs.get('DeleteOnTermination'),
        'encrypted': ebs.get('Encrypted'),
    }

    # Read-only attributes such as device_name and volume_id are deliberately
    # never copied: setting them is a schema violation.
    # Drop anything the API didn't return so we don't emit null in HCL.
    return {key: value for key, value in result.items() if value is not None}


This ensures the generated code matches the actual state of the disk without triggering schema violations.
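
For example, fed a hypothetical merged payload (the exact response shape is assumed here for illustration), the mapper keeps only what the provider will accept:

api_response = {
    'DeviceName': '/dev/xvda',        # read-only in root_block_device
    'Ebs': {
        'VolumeId': 'vol-0abc123',    # read-only
        'VolumeSize': 100,
        'VolumeType': 'gp3',
        'DeleteOnTermination': True,
        'Encrypted': False,
    },
}

print(transform_root_block_device(api_response))
# {'volume_size': 100, 'volume_type': 'gp3', 'delete_on_termination': True, 'encrypted': False}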


Trap 2: The Cycle (Graph Theory vs. AWS Reality)

If the first trap was a configuration error, the second was a fundamental structural conflict.

Terraform requires a Directed Acyclic Graph (DAG). AWS allows cycles.

The Deadlock

The most common culprit is Security Groups. Imagine two microservices:

  • SG-App allows outbound traffic to SG-DB
  • SG-DB allows inbound traffic from SG-App

If you write this with inline rules (which is what auto-generated import configurations typically contain), you create a cycle:

resource "aws_security_group" "app" {
  egress {
    security_groups = [aws_security_group.db.id]  # Needs DB's ID
  }
}

resource "aws_security_group" "db" {
  ingress {
    security_groups = [aws_security_group.app.id]  # Needs App's ID
  }
}


Terraform cannot apply this. It can't create app without db's ID, and vice versa.

Visualizing the Problem

In a healthy Terraform config, dependencies flow one way:

[VPC] --> [Subnet] --> [EC2]


But Security Groups often form cycles (Strongly Connected Components):

     ┌──────────────┐
     ▼              │
  [SG-App]       [SG-DB]
     │              ▲
     └──────────────┘


The Solution: Tarjan's Algorithm & "Shell & Fill"

When building RepliMap (the tool I wrote to automate this), I realized we couldn't just export resources one by one. We had to model the entire AWS account as a graph using NetworkX.

We use Tarjan's algorithm to detect Strongly Connected Components (SCCs)—the "knots" in the graph.
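
Here is a minimal sketch of that detection step, assuming the dependency graph has already been loaded into NetworkX (the node names are made up and this is not RepliMap's actual code):

import networkx as nx

g = nx.DiGraph()
# Edge A -> B means "A's config needs B's ID before A can be created".
g.add_edge("sg_app", "sg_db")        # app egress rule references db
g.add_edge("sg_db", "sg_app")        # db ingress rule references app
g.add_edge("web_instance", "sg_app")

# NetworkX's strongly_connected_components() is a non-recursive Tarjan variant.
# Any SCC with more than one node is a knot Terraform cannot apply as-is.
knots = [scc for scc in nx.strongly_connected_components(g) if len(scc) > 1]
print(knots)  # e.g. [{'sg_app', 'sg_db'}]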

Once a cycle is detected, we use a "Shell & Fill" strategy to break it:

  1. Create Empty Shells: Generate the Security Groups with no rules. Terraform can create these instantly because they have no dependencies.
  2. Fill with Rules: Extract the rules into separate aws_security_group_rule resources that reference the IDs of the shells created in step 1 (a code sketch of this decomposition follows the diagram below).

Step 1: Create Shells (No Dependencies)
  [SG-App (empty)]      [SG-DB (empty)]

Step 2: Create Rules (Reference Shells)
        ▲                     ▲
        │                     │
  [Rule: egress->DB]    [Rule: ingress<-App]


The graph is now acyclic, and Terraform is happy.
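
In code, the decomposition can look roughly like this (the input/output shapes and helper names are illustrative, not RepliMap's real data model):

def shell_and_fill(security_groups):
    """Split one SCC of security groups into empty shells plus standalone rules."""
    shells, rules = [], []
    for sg in security_groups:
        # 1. The group itself, with no inline ingress/egress blocks -> no dependencies.
        shells.append({
            "type": "aws_security_group",
            "name": sg["name"],
            "body": {"description": sg.get("description", "managed by terraform")},
        })
        # 2. Each rule becomes a separate aws_security_group_rule resource that
        #    only references IDs, so no creation-order cycle remains.
        for rule in sg.get("rules", []):
            rules.append({
                "type": "aws_security_group_rule",
                "name": f'{sg["name"]}_{rule["direction"]}_{rule["peer"]}',
                "body": {
                    "type": rule["direction"],  # "ingress" or "egress"
                    "security_group_id": f'aws_security_group.{sg["name"]}.id',
                    "source_security_group_id": f'aws_security_group.{rule["peer"]}.id',
                    "from_port": rule["port"],
                    "to_port": rule["port"],
                    "protocol": rule.get("protocol", "tcp"),
                },
            })
    # Shells are emitted first, then rules: the resulting graph is acyclic.
    return shells + rules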


Conclusion

Tools like terraform import or Terraformer are great starting points, but they often act as simple API-to-HCL dumpers. They don't always account for:

  1. Implicit Defaults: Where missing config != existing state.
  2. Graph Topology: Where valid AWS states are invalid Terraform states.

For small projects, you can fix these manually. For brownfield migrations with 2,000+ resources, you need a deterministic engine to handle the translation.

I've open-sourced the documentation and the read-only IAM policies for the engine we built to solve this. If you're interested in the edge cases of AWS imports, check it out:

GitHub: RepliMap / replimap-community
Reverse-engineer AWS infrastructure into production-ready Terraform. Visualize dependencies, detect drift, estimate costs.

💬 Join the discussion

Interested in the graph theory aspect? We're discussing the Tarjan implementation and edge cases over on Hacker News.
