John Munn

Posted on Jun 3

Infrastructure as Code Isn’t About Speed. It’s About Trust

#devops #softwareengineering #infrastructure

I’ve never had a disaster caused by someone clicking the wrong button in a cloud console.

But I’ve had plenty of conversations that took 30 minutes longer than they should have, just to figure out:

What changed?
Who changed it?
And whether it was safe to roll back.

That’s what Infrastructure as Code (IaC) solves for me.

Not just automation. Not just speed. Trust.

Speed Is a Side Effect

Most articles will tell you that IaC makes things faster

Sure, it does. Spinning up environments, codifying pipelines, standardizing infra.

But speed isn’t the reason. It’s a byproduct of something more important: systemic trust.

Teams don’t move fast because of automation.
They move fast because they trust what happens when they do.

Trust in Repeatability

When infrastructure is defined in code:

New hires don’t need tribal knowledge to ship.
Rollbacks aren’t just possible, they’re predictable.
Every environment can be rebuilt from source. Even in disaster recovery.

And rollback strategies matter.

Here’s how I frame them:

Feature flags: Ideal for application-level toggles with fast reversion and minimal infra impact.
Canary deployments: Great for distributed systems where early signals are visible at scale.
Blue-green deployments: My go-to for controlled, clean-cut rollbacks, especially for critical systems with predictable traffic.

Testing IaC is also critical. Unit tests can verify module behavior (e.g., naming conventions, tags, required outputs), while integration tests can validate full-stack environments and ensure resources are created as expected. IaC that isn’t tested is just scripted guesswork.

Concrete Scenario: IaC During a Platform Shift

In one platform evolution I oversaw, we moved from a tightly coupled architecture toward microservices. This shift surfaced a key realization: our infrastructure definitions couldn’t scale linearly with the services.

The team adjusted by:

Breaking IaC into service-aligned modules
Creating shared network and security templates
Managing environments through workspace-based isolation

The payoff wasn’t just more reuse, it was that each team could now own their service’s infra config safely, without stepping on others.

As a result, deployment times dropped from multi-day coordination to same-day rollouts. Changes became smaller, safer, and easier to understand.

That’s trust at scale.

Deep Dive: Modularity Matters

If you’re working with Terraform, modularization isn’t optional. It’s the difference between sustainable growth and “who touched main.tf?”

Here’s what strong modularity looks like in practice:

Separate state files per service or domain: This minimizes blast radius and simplifies parallel deployment.
Re-usable core modules (e.g. VPC, ECS cluster, S3 buckets) imported by each service.
Environment overlays via Terragrunt or wrapper scripts for dev/stage/prod parity.

Even if you’re not the one writing every line, understanding why this structure reduces risk is a strategic skill. It enables org-wide alignment and shared ownership without bottlenecks.

Trust in Visibility

Without IaC, infra lives in someone’s AWS console, or worse, their memory.

With IaC, it lives in Git:

Reviewed like any other code
Auditable for compliance
Reproducible for every environment

GitOps is a natural extension of this, and something I recommend when:

You want full auditability
You have multiple clusters or environments
You need convergence over push-based deployment

GitOps shines in larger orgs. In smaller teams, a push-based pipeline might be simpler and faster. Both are valid, the key is intentionality.

Tooling Reflects Strategy

Tooling isn’t preference. It’s architecture. Your choice shapes team workflow, onboarding speed, and failure recovery.

Here’s how I think about it:

Tool Comparison for IaC Strategy

Terraform

When to use it: Ideal for multi-cloud or standardized infra across teams. Strong ecosystem and provider support.
Trade-offs: Proven DSL, broad community support. Manual state file management unless using backends like S3/DynamoDB. Limited type safety.

Pulumi
When to use it: Great for teams that prefer using general-purpose languages (TypeScript, Python, Go).
Trade-offs: Type-safe and testable. Easier to build abstractions. Smaller ecosystem. Requires more language discipline.

AWS CDK
When to use it: Best for AWS-native teams that want higher-level abstractions in familiar programming languages.
Trade-offs: Strong AWS integration. Limited multi-cloud support.️ Evolving quickly, which may introduce churn.

CloudFormation
When to use it: Useful for regulated environments or legacy systems requiring AWS-native tooling.
Trade-offs: No external dependencies. Verbose syntax. Difficult to modularize or scale in complex environments.

Each has its place, but the key is consistency and clarity across your teams.

IaC and Broader Technical Strategy

IaC doesn’t live in isolation. It touches your entire stack:

Observability: IaC lets you consistently provision monitoring across environments.
Security: You can codify least privilege, secrets rotation, and policy-as-code.
Cost optimization: IaC makes resources visible, and deletable.
Multi-cloud: IaC is the only way to avoid drift and chaos across providers.

And increasingly, it’s foundational to platform-as-a-product thinking. Developer experience matters. The best infra setups I’ve seen make creating a service feel like using a well-designed SDK: opinionated, repeatable, and low-friction.

Looking ahead:

I expect IaC to integrate more tightly with platform engineering and internal developer portals.
IaC will evolve to support AI/ML workflows where reproducibility, GPU scheduling, and hybrid (cloud/edge) environments are non-negotiable.
Long-term, event-driven infra provisioning and policy-as-code baked into platform layers will drive the next leap in governance and agility.

Strategic Decision Framework

I use a few mental models when making infra decisions:

Paved Path vs. Permissive Architecture: What’s the default experience vs. the escape hatch?
Debt-to-Velocity Ratio: If we’re borrowing against the future, is the velocity gain worth the repayment plan? For example, hardcoding secrets into a dev environment might unlock faster POCs, but it builds habits that create security debt later. Smart teams plan for the cleanup, or at least acknowledge it.
Reversibility: Can we undo this? If not, we need more consensus.
Scope of Blast Radius: Who gets affected when this goes wrong?

Good infra strategy isn’t about saying no to change. It’s about helping teams take intentional, informed risks.

Org Structure and Infrastructure Strategy

Infrastructure mirrors team shape. Always.

If you’re scaling fast, service-aligned infra modules help teams stay independent.
In smaller orgs, a central platform team can drive consistency, but risks becoming a bottleneck without strong enablement.
I’ve found Conway’s Law useful as a reality check: if the team structure doesn’t support infra boundaries, the architecture won’t either.

Ultimately, your IaC strategy should evolve alongside your org chart. Not after it.

How I Think About It as a Leader

If I were starting from scratch as a CTO, I wouldn’t ask,

Do we use Infrastructure as Code?

I’d ask:

“Can we rebuild prod without anyone sweating?”
“Do infra decisions happen in code, or in chat?”
“Is our platform built for speed and safety?”

I’ve noticed that the most confident teams I’ve worked with. The ones who ship without fear, treat their infrastructure the same way they treat their application code: versioned, reviewed, and testable.

If your team had to rebuild prod from scratch today, how long would it take? That answer has less to do with tooling and everything to do with trust.

DEV Community