
David Gamez for MobilityData


When Murphy Meets Terraform: The Tale of a Simple Guard That Saved My Friday

This is a story about how simple precautionary measures can save you hours of work and a fair bit of your sanity.

Context

At MobilityData, we created the MobilityDatabase to host public transit and shared mobility feeds.

Our infrastructure lives in Google Cloud Platform (GCP), and everything is deployed using Infrastructure as Code (IaC) powered by HashiCorp Terraform.

In theory, Terraform keeps everything tidy and predictable. In practice... well, let's just say Murphy's Law also lives in the cloud.


Terraform (in a nutshell)

At a high level, Terraform revolves around three main ingredients:

  1. Configuration files (.tf): These define what your infrastructure should look like, which services to create, their properties, and dependencies.
  2. State file (terraform.tfstate): A JSON file that stores the current reality of your deployed resources, including IDs, metadata, and versions.
  3. The actual cloud resources: The real stuff living in GCP, such as Cloud Run services, Pub/Sub topics, buckets, and databases.
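
As a rough illustration of the first ingredient, a configuration file for a Cloud Run service could look like the sketch below. The resource type is the Google provider's `google_cloud_run_v2_service`; the service name, region, image, and environment variable are placeholders, not our actual setup.

```hcl
# Hypothetical example: names, region, and image are placeholders.
resource "google_cloud_run_v2_service" "example_api" {
  name     = "example-api"
  location = "us-central1"

  template {
    containers {
      # Public sample image, used here only for illustration.
      image = "us-docker.pkg.dev/cloudrun/container/hello"

      # An environment variable exposed to the running container.
      env {
        name  = "LOG_LEVEL"
        value = "info"
      }
    }
  }
}
```

Adding or changing an `env` block like this is the kind of small edit that shows up as a proposed change in terraform plan.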

Here is the magic: every time you run a plan or an apply, Terraform compares your configuration with the state file (refreshed against the real resources), then works out the changes needed to align reality with your desired setup.

When you run terraform plan, it says:

"Hey, your config says there should be a new Cloud Run variable, but the state doesn't know about it. Should I fix that for you?"

Then, when you approve (terraform apply), Terraform talks to the cloud provider APIs to make those updates happen.

It is a beautiful system until versions get out of sync.

The Terraform documentation explains this workflow in more detail.


Terraform State: The Single Source of Truth

That humble JSON file, the state, is Terraform's brain.

It knows everything: what you created, how it is configured, and who is responsible for it. Without it, Terraform becomes forgetful and might try to rebuild your infrastructure from scratch.

Because of that, the state file must be handled like a sacred artifact: secure, backed up, and shared safely among your team. (In our case, it lives in a Google Cloud Storage bucket, because no one wants to accidentally delete production with terraform apply from their laptop.)
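
For readers who have not set this up before, a remote backend on Cloud Storage is usually declared with a `backend "gcs"` block. This is a minimal sketch, assuming a bucket that already exists; the bucket name and prefix are made up for illustration.

```hcl
terraform {
  # Store the state in a Cloud Storage bucket instead of on a local disk.
  backend "gcs" {
    bucket = "example-tf-state" # hypothetical bucket name
    prefix = "terraform/state"  # hypothetical path prefix inside the bucket
  }
}
```

After adding or changing a backend block, terraform init needs to be run again so Terraform can connect to the new state location.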


Different Clouds, Different Flavours

Terraform is cloud-agnostic. It does not care whether you are deploying to AWS, Azure, GCP, or something more exotic like Cloudflare or GitHub.

Each cloud has its own Terraform provider, which translates Terraform's "desired state" into API calls specific to that platform.
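
A provider is declared, and optionally pinned, in a `required_providers` block. Here is a minimal sketch, assuming the Google provider; the version constraints are illustrative only and are not the versions involved in this story.

```hcl
terraform {
  # Illustrative constraint on the Terraform CLI version.
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source = "hashicorp/google"
      # "~> 5.0" allows any 5.x release but blocks a jump to 6.0.
      version = "~> 5.0"
    }
  }
}
```

With a constraint like this, a provider upgrade becomes a deliberate edit to the configuration rather than something that happens on its own. Committing the generated `.terraform.lock.hcl` file also keeps the whole team on the same exact provider release.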

That flexibility is excellent until provider versions change.

That is when the fun begins.


Breaking Changes Actually Break Stuff

Without going too deep into our setup, we needed to add a new property to one of our Cloud Run services.

The configuration looked simple: "Just one new line, what could go wrong?"

Turns out, a lot.

We discovered that our Terraform provider was one version behind, and the new property looked slightly different in the latest one. "No problem," we thought. "Let's just upgrade the provider."

Famous last words.

The moment we tried to deploy, Terraform refused to play along:

Error: Resource instance managed by newer provider version

This was the "mic drop" moment, on a Friday, of course.

After diving through GitHub issues, Terraform docs, and consulting my AI army, the top suggestion was terrifyingly simple:

"You can just modify the Terraform state manually."


Editing the State File (a.k.a. Playing with Fire)

Now, JSON may look friendly, but manually editing your Terraform state is like performing surgery with a chainsaw.

You can do it, but you really should not.

Even the official docs practically scream in all caps:

"You should not manually change information in your state file to avoid unnecessary drift between your configuration, state, and infrastructure."

And they are right. One wrong edit, and Terraform might decide that parts of your setup no longer exist or no longer match, and "fix" that by recreating or destroying them.

So yeah, not exactly my idea of a relaxing Friday morning.


Redo My State, Perhaps?

Here is where our fire extinguisher came in.

When we first set up Terraform, we decided to store our state file in a GCP bucket, and we also turned on object versioning, a Cloud Storage feature that keeps earlier versions of an object whenever it is overwritten or deleted.

We didn't think much of it at the time; it was more of a "just in case" safety measure. But that small checkbox turned out to be our hero.
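
If the state bucket is itself managed with Terraform, the same setting can be expressed in configuration. This is a sketch under that assumption, with a placeholder bucket name and location; in our case it was simply a setting on the bucket.

```hcl
# Hypothetical bucket for Terraform state; name and location are placeholders.
resource "google_storage_bucket" "tf_state" {
  name     = "example-tf-state"
  location = "US"

  uniform_bucket_level_access = true

  # Keep previous versions of every object, including earlier state files.
  versioning {
    enabled = true
  }
}
```

A bucket like this is typically created in a separate bootstrap configuration, since the main configuration needs it to exist before it can store any state there.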

We simply rolled back the state file to the previous version, re-ran terraform apply, and voilà, everything was back to normal. My Friday was saved. Coffee tasted better again. The weekend was back on track.


Conclusion

Terraform is powerful, but like any tool that touches production, it deserves respect and a few safety nets.

So, before you dive into fancy refactors or provider upgrades, make sure you have your "fire extinguishers" in place:

  • Versioned state file
  • Remote backend
  • Provider version pinning
  • And maybe a reminder not to terraform apply on Fridays

Because when Murphy shows up, you will want more than luck on your side.

Happy coding!

About MobilityData
MobilityData is a global non-profit that maintains and develops the open data standards powering transit and shared mobility apps worldwide. We support specs like GTFS and GBFS, working with agencies, companies, and developers to make mobility data more usable and consistent.
