Arbythecoder

Posted on Oct 8

My First Week with Terraform: The Struggles Nobody Talks About (And How I Overcame Them)

#terraform #webdev #beginners #tutorial

I still remember my first job application that required Terraform experience. I thought I had it figured out—after all, Infrastructure as Code sounded straightforward enough: write code, deploy infrastructure, done.

The reality? I couldn't complete the take-home assignment. The state file got corrupted. My resources kept getting destroyed accidentally. The error messages made no sense. I didn't get that job.

But that failure became my best teacher. During my #90DaysOfDevOps challenge, I forced myself to work with Terraform daily. I broke things. I fixed them. I broke them again. Today, after managing infrastructure for multiple projects and writing extensively about DevOps, I want to share the real struggles beginners face with Terraform—the ones tutorials gloss over—and the practical solutions that actually work.

The State File Mystery (And Why 30% of Developers Get It Wrong)

What I Thought vs. Reality

My naive assumption: "State file? Must be like a capital of a state, right? Just a log of what happened." (Yes, I actually thought this—I even made jokes about it.)

The reality: The state file is Terraform's memory—it tracks the current status of your infrastructure, mapping what exists in the cloud to what's in your code. Without it, Terraform is completely blind.

The Painful Lesson

I was working on that job assignment, confidently running terraform apply from different machines. Suddenly, Terraform wanted to recreate everything. My heart sank. I had ignored the state file, which by default lives on the local disk of whichever machine ran the last apply, so on the second machine Terraform had amnesia about what it had already built.

According to a 2024 Stack Overflow survey, 30% of developers reported state file breaches when configurations weren't properly secured. I was firmly in that 30%.

What Actually Causes State File Problems

The usual culprits behind state file problems are network failures while the state is being written, manual changes to infrastructure that Terraform is managing (which leave the state out of sync with reality even when the file itself is intact), and software bugs. I learned this the hard way when I manually edited resources in the AWS console while Terraform was managing them. Big mistake.

During my 90 days of DevOps, I probably corrupted my state file at least 5 times in the first two weeks. Each time taught me something new about what NOT to do.

The Fix That Saved Me

Always use remote state (S3 + DynamoDB for AWS, Azure Storage for Azure, GCS for Google Cloud)
Never manually edit infrastructure that Terraform is managing—not even "just this once"
Enable state file versioning so you can roll back when things break
Use state locking to prevent concurrent modifications (this saved me during team projects)

# This configuration saved my sanity
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "project/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
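
The backend block above assumes the bucket and the lock table already exist. Here is a minimal sketch of those supporting resources, normally created once from a separate bootstrap configuration. The resource labels (tf_state, tf_lock) are illustrative, and the bucket and table names are simply copied from the backend block. The S3 backend expects the lock table's partition key to be a string attribute named LockID.

```hcl
# Bootstrap resources for remote state (a sketch; apply from a separate configuration).
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state" # must match the backend "bucket"; S3 names are globally unique

  lifecycle {
    prevent_destroy = true # guard the bucket that holds every state file
  }
}

# Keep old versions of the state object so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table used by the S3 backend; the hash key must be a string named "LockID".
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock" # must match the backend "dynamodb_table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases also offer S3-native locking through the backend's use_lockfile argument, so check the S3 backend documentation for the version you run before adding the DynamoDB table.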

Running `apply` Without `plan` (The Lesson That Cost Me Sleep)

The Anxiety is Real

Every beginner feels that pit in their stomach the first time they see:

Plan: 0 to add, 0 to change, 8 to destroy

Your hands shake. You double-check. You triple-check. Is this right? What if I press yes?

I remember the first time I saw terraform destroy in a tutorial. My immediate thought was: "This command should require a password, a fingerprint scan, AND a written letter from your mother." The fear was REAL.

What Actually Breaks

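When you skip terraform plan, you're essentially deploying blind. (terraform apply does print a plan and ask for confirmation, but at the prompt it's far too easy to skim and type yes, and with -auto-approve there's no review at all.) Here's what typically goes wrong: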

Resource conflicts: Dependencies aren't met, causing cascading failures
Accidental deletions: That one typo destroys your test database (happened to me on day 23 of my challenge)
Network misconfigurations: Servers get created without proper network access
Cost explosions: Wrong instance types get deployed

During my DevOps challenge, I once accidentally deployed resources in the wrong AWS region because I didn't carefully read the plan output. It took me 3 hours to figure out why nothing was working.
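
One way to make the wrong-region (or wrong-account) mistake harder is to pin both in the provider block, so a misconfigured shell fails fast instead of quietly deploying somewhere else. This is a sketch under assumptions: the variable name aws_region, the list of permitted regions, and the account ID 123456789012 are placeholders, not values from my projects.

```hcl
variable "aws_region" {
  description = "Region every resource in this configuration is deployed to"
  type        = string
  default     = "us-east-1"

  # Reject typos and unexpected regions at plan time.
  validation {
    condition     = contains(["us-east-1", "us-west-2"], var.aws_region)
    error_message = "aws_region must be one of the regions this project is allowed to use."
  }
}

provider "aws" {
  region = var.aws_region

  # Placeholder account ID: the provider refuses to run against any other account.
  allowed_account_ids = ["123456789012"]
}
```

With this in place, the region shows up explicitly in code review instead of depending on whatever AWS_REGION happens to be set in your terminal.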

The Safe Workflow (That I Now Use Religiously)

# ALWAYS this sequence—no exceptions
terraform plan -out=tfplan

# Read EVERY line of the output
# Ask yourself: "Does this make sense?"
# Check: What's being added? Changed? DESTROYED?

terraform apply tfplan

Running apply blindly can cause unintended deletions or recreations. The 30 seconds you "save" by skipping plan can cost hours—sometimes days—in recovery.

Resource Dependencies (When Terraform Isn't Psychic)

The Error Message That Confused Me for Hours

Error: Error creating EC2 instance: VPCIdNotSpecified

"But I DEFINED the VPC! It's right there in my code! Why can't Terraform see it?"

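This was during week 2 of my 90-day challenge. I spent 4 hours debugging before I understood that just because resources are in the same file doesn't mean Terraform knows how they relate. Terraform works out creation order from the references between resources; if your instance never references the subnet or VPC you defined, Terraform has no idea they belong together.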

What Beginners Try to Create (And What Goes Wrong)

Most beginners start with:

A virtual machine (EC2, Azure VM, GCP Compute)
A database (RDS, Cloud SQL)
Some networking (VPC, subnets, security groups)

The problem? They forget the creation order matters in the real cloud, even if it doesn't matter in your .tf file.

The Solution: Explicit Dependencies

Sometimes Terraform can't infer relationships from your code. You need to be explicit:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.main.id  # Implicit dependency - Terraform sees this

  depends_on = [aws_internet_gateway.gw]  # Explicit dependency - you tell Terraform
}
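
For completeness, here is a sketch of the networking resources that the instance above refers to. The CIDR ranges are assumptions, and the main and gw labels are chosen to match the references in the example. Each vpc_id = aws_vpc.main.id line is an implicit dependency, so Terraform creates the VPC first without being told.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# References aws_vpc.main, so Terraform orders it after the VPC.
resource "aws_subnet" "main" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# Nothing on the instance references the gateway, which is why
# the instance uses depends_on to wait for it.
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}
```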

Pro Tip That Changed Everything for Me

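Use terraform graph | dot -Tsvg > graph.svg to visualize your dependencies (the dot command comes from Graphviz, which you need to install separately). terraform graph emits the dependency graph in DOT format, and dot renders it as an image showing how your resources relate to each other.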

When I discovered this during my challenge, it was like turning on the lights in a dark room. Suddenly I could SEE why things were failing.

For more interactive visualizations, tools like Blast Radius, Inframap, Rover, or Terraform Visual can provide better clarity than the raw graph command.

The Fear of `terraform destroy` (It's Healthy Fear)

The Nightmare Scenario

You're tired. It's late. You're on day 47 of your DevOps challenge. You type terraform destroy in what you THINK is your dev environment.

It's production.

You type "yes."

🚨💀🔥

I never did this (thank God), but the FEAR of doing it kept me up at night during my learning phase. That fear is actually good—it keeps you careful.

How to Protect Yourself

Use workspaces to separate environments (terraform workspace new dev)
Use the -target flag when you only mean to destroy specific resources
Implement destroy protection for critical resources
Always verify with terraform workspace show before running destroy
Check your cloud console to confirm which account/region you're in

resource "aws_db_instance" "production" {
  # ... other configuration ...

  lifecycle {
    prevent_destroy = true  # Can't destroy without removing this first
  }
}

This simple setting has saved countless production databases from accidental destruction.
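
One caveat: prevent_destroy only works while the block is still in your code. If the whole resource block is deleted from the configuration, the guard goes with it. As a second layer, some resources offer protection on the provider side. Below is a sketch for RDS using the aws_db_instance arguments deletion_protection, skip_final_snapshot, and final_snapshot_identifier. The identifier, engine, sizing, and the db_password variable are placeholders I picked for illustration.

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "production" {
  identifier        = "production-db" # placeholder name
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"
  password          = var.db_password

  deletion_protection       = true  # AWS itself rejects delete calls until this is turned off
  skip_final_snapshot       = false # take a snapshot if the instance is ever deleted
  final_snapshot_identifier = "production-db-final"

  lifecycle {
    prevent_destroy = true # Terraform refuses any plan that would destroy this resource
  }
}
```

Deleting this database now takes two deliberate changes (turning off deletion_protection and removing prevent_destroy), which is exactly the kind of friction you want at 2 a.m.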

What the Tutorials Don't Tell You (The Real Learning)

1. Start Ridiculously Small

Don't try to Terraform your entire infrastructure in week one. During my challenge, I started with:

Day 1-3: One EC2 instance
Day 4-7: One S3 bucket with versioning
Day 8-14: One VPC with subnets
Day 15+: Started combining them

Master basics before complexity. Your future self will thank you.
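
For reference, a "Day 1" configuration can be this small. This is a sketch rather than my exact starter file: the AMI name filter assumes Amazon Linux 2023 on x86_64, the Name tag is a placeholder, and the instance lands in your account's default VPC, so one must exist in the region. Looking the AMI up with a data source avoids hard-coding an ID that only exists in one region.

```hcl
provider "aws" {
  region = "us-east-1"
}

# Find the most recent Amazon Linux 2023 image in the configured region.
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

resource "aws_instance" "first" {
  ami           = data.aws_ami.al2023.id
  instance_type = "t3.micro"

  tags = {
    Name = "my-first-terraform-instance"
  }
}
```

Run terraform init, terraform plan, and terraform apply on just this file until the cycle feels boring, then move on to the next resource.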

2. Error Messages Are Cryptic (And That's Normal)

Terraform errors often come from the underlying cloud provider, not Terraform itself. That confusing AWS error about IAM permissions? That's AWS talking, not Terraform.

Your debugging workflow:

Copy the entire error message
Google: [error message] + [cloud provider] + terraform
Someone on Stack Overflow has probably had your exact problem
Read the provider documentation carefully

I probably Googled 200+ different Terraform errors during my 90-day challenge. It gets easier.

3. Version EVERYTHING

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # Pin this!
    }
  }
}

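Letting Terraform pull the latest provider version on every init can cause sudden failures when a provider update changes behavior. Pin your versions, and commit the .terraform.lock.hcl file that terraform init generates so everyone on the team resolves the same provider builds. Trust me on this.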

4. The Documentation Is Your Best Friend

I spent week 1 fighting Terraform. Week 2, I spent half my time reading the AWS provider documentation. Week 2 was infinitely more productive.

The HashiCorp documentation is actually excellent—you just need to give it time.

5. Modules Come Later

Beginners think they should use modules immediately. Don't. Copy-paste is okay when you're learning. I didn't touch modules until day 40 of my challenge, and that timing was perfect.

Understand what the code does before you abstract it.

Real Companies Using Terraform Successfully

Let me share some examples of how real companies are using Terraform, based on publicly available case studies and documentation:

Example 1: GitLab's Infrastructure Approach

GitLab uses Terraform for infrastructure automation, with GitLab itself serving as the single source of truth for version control. The same infrastructure environment deploys each time, which eliminates inconsistent configurations.

Key lesson for beginners: Treat your infrastructure code like application code—version control, code reviews, and CI/CD pipelines apply here too.

Example 2: Financial Services at Scale

A top-5 financial services company standardized on HashiCorp Terraform for all self-service provisioning needs, allowing developers to deploy infrastructure on whatever cloud vendor they need.

Key lesson for beginners: Terraform's multi-cloud capability isn't just marketing—companies actually use it to avoid vendor lock-in.

Example 3: Startups Moving Fast

Many early-stage startups use Terraform to:

Spin up entire environments in 15 minutes instead of 2 days
Clone production to staging with a single command
Tear down dev environments overnight to save costs
Onboard new developers faster with documented infrastructure

Key lesson for beginners: Infrastructure as Code means Infrastructure as an ON/OFF switch. This is especially valuable when you're watching every dollar.

My Honest Assessment After the Struggle

Three Months (90 Days) Later

I'm not a Terraform expert. I still Google things daily. I still make mistakes. But I can now:

Deploy multi-tier infrastructure in minutes instead of hours
Collaborate with teams without stepping on each other's toes
Reproduce environments consistently across AWS, Azure, and GCP
Sleep at night knowing I can roll back any change
Confidently take on DevOps projects that require IaC

Would I Recommend Terraform?

Yes, but with realistic expectations:

✅ Do use Terraform if:

You deploy infrastructure more than once
You work with a team (or plan to)
You want reproducible environments
You're tired of clicking through cloud consoles
You need to document your infrastructure decisions

❌ Don't start with Terraform if:

You're deploying a single static website (overkill)
You have zero command-line experience (learn bash basics first)
You're in a rush (the learning curve is real—budget 2-4 weeks)
You expect it to be "easy" (it's not, but it's worth it)

The Transformation

That job I didn't get because of Terraform? It pushed me to start my 90 Days of DevOps challenge. I studied. I broke things in dev environments. I fixed them. I wrote about my experiences.

Six months later, I landed a better role specifically because of my documented Terraform journey on Dev.to and my GitHub projects. The failure was the beginning, not the end.

Your struggles with Terraform right now? They're not signs you should quit. They're signs you're learning something valuable that most people give up on.

🎁 FREE RESOURCE: Don't Get Destroyed by Cloud Costs

Speaking of Infrastructure as Code and cloud infrastructure—one of the biggest shocks for beginners isn't just learning Terraform, it's the AWS bill at the end of the month.

I created a free checklist based on mistakes I've seen (and made myself): "Don't Get Financially Destroyed by the Cloud" Checklist

It covers:

✅ Cost optimization settings you should enable on Day 1
✅ The 5 cloud resources that silently drain your budget
✅ Monitoring alerts that actually matter

Download it free here — it's the resource I wish I had when I started.

Practical Resources That Actually Helped Me

HashiCorp's Official Tutorials - Start here, not YouTube (seriously)
Terraform Registry - Real module examples from real companies
Your Cloud Provider's Terraform Documentation - AWS/Azure/GCP have excellent Terraform docs
Dev.to DevOps Community - Where I share my journey and get amazing feedback
r/Terraform on Reddit - Quick answers to specific questions
The Terraform Discord - Real-time help when you're stuck

Your Turn: Let's Learn Together

If you're learning Terraform right now, I want to hear from you:

What's your biggest confusion right now?
What error message is driving you crazy?
What concept do the tutorials keep skipping over?
Are you doing your own 90-day challenge?

Drop it in the comments. I probably made the same mistake two months ago. Let's figure it out together.

And if you're past the beginner stage: What do you wish someone had told you in week one? Share your wisdom. Help the next person who's struggling with their first terraform apply.

Key Takeaways (TL;DR)

State files are Terraform's memory - protect them like production data, use remote state from day one
Always run plan before apply - 30 seconds of review saves hours of recovery
Dependencies aren't always obvious - use terraform graph to visualize, use explicit depends_on when needed
Fear of destroy is healthy - implement safeguards, use workspaces, verify before confirming
Start small, master basics - don't Terraform everything on day one
Error messages come from providers - Google the cloud provider name with the error
Documentation beats tutorials - spend time in HashiCorp and provider docs
Real companies use this successfully - from startups to Fortune 500 financial institutions

Top comments (7)

Richard Quadling • Oct 10 • Edited

By the time you've got Terraform under control, along comes terraform-docs, tflint, tfenv, and terragrunt. And then OpenTofu. And then public and private modules. That's before you've even started patching the providers and utilities!

I've been in the Infrastructure team for about 6 years. From being #1 in a field of 1, to a 5 man team. The unofficial job title of "The Site Must Not Go Down" is pretty much baked in.

I find Terraform easy to use now. Have pipelines doing drift detection, automated PR generation, automated version upgrades, and a LOT of reporting and tracking.

Once you've got the flow, adding a new project that's like one of the other projects, one quick make command and off to go configuring the services. If it takes more than an hour for a new application or service to be deployed, then, well, we give the newbies a couple of hours.

But it is ALL automation. And monitoring. And that is the place where the real work is.

We learnt to go from these projects, applications, services, utilities being "pets" to being a "herd".

Everything fails all the time.

We just need to be able to respond to that effectively and efficiently. That is what we are really there for. And to have fun trying to break things. Often without knowing about it!!

Arbythecoder • Oct 14

Richard, this is the roadmap I didn't know I needed! 6 years in and you've built what looks like production-grade IaC at scale: drift detection, automated PRs, sub-hour deployments. That's impressive.
I'm definitely past the "will this destroy production?" panic stage, but you've just shown me how much of the advanced tooling ecosystem I haven't explored yet. Terraform-docs, tflint, terragrunt, OpenTofu: I've heard the names but haven't committed to learning them systematically.
The "pets to herd" mindset is spot-on. I'm in that transition phase where I understand it intellectually but I'm still building the muscle memory for it practically.
Curious, when you were building out your automation stack, what was your first major win? What's the one tool or practice that gave you the biggest leverage early on?
And "The Site Must Not Go Down" absolutely needs to be official. I respect teams that own chaos engineering with that level of confidence 😄

Dima Altukhov • Oct 11

The next step is to start using a Terraform orchestrator to avoid code duplication when you need to create the exact same resources with different parameters, like a VPC with CIDR 10.0.0.0/16 for prod and 10.1.0.0/16 for staging.

Take a look at my pet project: github.com/alt-dima/tofugu

Arbythecoder • Oct 14

Dima, this hits exactly where I am right now. I've been managing environment configs manually and yeah, the duplication is real.
Tofugu's approach looks solid. What was the specific gap you found with Terragrunt that pushed you to build this? I'm evaluating orchestration tools and always interested in the "why build vs extend" story.
Quick question - how's the transition if you're coming from standard tfvars workflows?

Dima Altukhov • Oct 14

To be honest, I have never worked with any other public orchestration tool, only CDKTF and a private tool that inspired me to write Tofugu.

From my quick look and understanding, Terragrunt still requires you to touch the folders containing Terraform code if you need to create another set of resources (like staging2).

With Tofugu, Toaster (UI, DB, API), and Jenkins it is much easier, because you write the Terraform code only once and then just manage the staging1, staging2, and production configurations and execute Tofugu in Jenkins.

The transition from standard tfvars to Tofugu is very fast and easy, because under the hood Tofugu just automates populating the tfvars and attaching them to the Terraform code.
