
우병수

Originally published at techdigestor.com

I Ditched Terraform for These Free IaC Tools — Here's What Actually Held Up in Production

TL;DR: HashiCorp relicensed Terraform from MPL 2.0 to the Business Source License starting with v1.6. I spent three months running real workloads through the free alternatives, and the short version is this: OpenTofu is the drop-in answer for existing HCL, Pulumi wins when your team writes real code, Crossplane only makes sense if you already live in Kubernetes, and Ansible still earns its keep for provisioning plus day-2 work.

📖 Reading time: ~33 min

The Problem: Terraform's BSL License Change Broke Our Pipeline Planning

The BSL switch blindsided a lot of teams because it wasn't a "you can't use this anymore" announcement — it was more subtle than that. HashiCorp relicensed Terraform from MPL 2.0 to the Business Source License starting with v1.6, and the specific clause that made our legal team nervous was this: you can't use Terraform to build a product that competes with HashiCorp. That sounds fine until your company builds internal developer platforms, sells managed cloud services, or runs a CI/CD product as part of a larger offering. Suddenly "are we competing?" becomes a real legal question, not a rhetorical one.

The friction for my team wasn't day-one usage — we kept shipping without interruption. The problem showed up in planning meetings. We had three open initiatives that touched Terraform directly: a self-service infrastructure portal for internal teams, a plan to package our deployment pipeline as a reusable module library for a sister company, and a quarterly audit checklist that now had a new line item: "verify IaC toolchain license compliance." That last one is the hidden tax nobody talks about. Every six months someone has to research whether our usage still qualifies as non-competitive under BSL terms, which is not a trivial question and not free legal time.

I've heard the "just pin to 1.5.x and move on" argument a dozen times. It doesn't hold up past about 18 months for a few concrete reasons. Providers keep moving — the AWS provider, Azure provider, and GCP provider all release breaking changes on a rolling basis, and they test against current Terraform versions. I've already hit situations where a provider version I needed for a new resource type required Terraform 1.6+. Beyond that, 1.5.x will eventually fall out of the community's practical support window: no new security patches, no fixes for edge cases that get discovered, and increasingly outdated examples in the docs. You're not frozen in amber — you're accumulating drift.

Before the switch, my setup was about as standard as it gets: Terraform 1.5.4, S3 remote backend with DynamoDB state locking, roughly 40 modules split across three workspaces (AWS for core infra, GCP for data pipelines, Azure for a legacy app we can't fully migrate). The CI/CD side ran plan on PR and apply on merge to main inside GitHub Actions. Nothing exotic. That's exactly the profile where you feel most comfortable staying put — and also where migration is most predictable if you plan it right. For a complete list of tools that keep your infrastructure workflow sane, check out our guide on Productivity Workflows.

# What our backend config looked like — straightforward S3 setup
terraform {
  required_version = "~> 1.5.0"

  backend "s3" {
    bucket         = "company-tfstate-prod"
    key            = "infra/core/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"
    encrypt        = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

The thing that finally pushed us to actually evaluate alternatives wasn't a legal mandate — it was a new hire asking "why are we still on Terraform?" in an architecture review. He'd come from a team already running OpenTofu and had opinions. That question forced us to articulate our real reason for staying, and "inertia" isn't a good enough answer when you have 40 modules, three providers, and a compliance checklist that now has a new line every cycle. The evaluation process that followed is what this whole article is actually about.

The Contenders I Actually Tested (and the One I Skipped)

I spent about three months running real workloads through these tools after HashiCorp flipped Terraform to BSL. Not toy examples — actual EKS clusters, RDS instances, S3 bucket policies, and the kind of messy multi-account AWS setups that expose every rough edge a tool has. Here's what I actually found.

OpenTofu

The drop-in replacement story is mostly true. I migrated a 14,000-line Terraform codebase by changing one binary path and running tofu init. That's it. No HCL rewrites, no provider version drama. The Linux Foundation fork moved fast after the BSL announcement, and as of 1.7.x they've already shipped features Terraform hasn't — native state encryption being the most practically useful one. The thing that caught me off guard was how good the community momentum is: providers from the OpenTofu registry are just the Terraform registry under the hood for now, so your existing required_providers blocks work unchanged. The honest trade-off is that OpenTofu is still playing catch-up on first-party tooling — the cloud IDE integrations and Sentinel policy-as-code equivalent aren't as polished yet.

# Migration is genuinely this simple
wget https://github.com/opentofu/opentofu/releases/download/v1.7.2/tofu_1.7.2_linux_amd64.zip
unzip tofu_1.7.2_linux_amd64.zip
./tofu init    # reads your existing .terraform.lock.hcl fine
./tofu plan    # output is identical to terraform plan
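And the state encryption mentioned above is configuration, not a separate tool. Here's a sketch of the 1.7 encryption block, assuming a passphrase supplied via a variable; the labels and variable name are illustrative:

# Sketch of OpenTofu 1.7 native state encryption (no Terraform equivalent at 1.5.x).
# The key provider, method, and variable names here are illustrative.
terraform {
  encryption {
    key_provider "pbkdf2" "statekey" {
      passphrase = var.state_passphrase
    }
    method "aes_gcm" "secure" {
      keys = key_provider.pbkdf2.statekey
    }
    state {
      method = method.aes_gcm.secure
    }
  }
}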

Pulumi

Pulumi is the tool I reach for when the team writing the infra is stronger in TypeScript or Python than HCL. You write actual loops, conditionals, and abstractions — not count hacks and for_each gymnastics. The free tier gives you 200,000 resource updates per month against their managed state backend, which is plenty for most teams. I've seen the ceiling hit exactly once, on a shop doing 15+ deploys a day across many stacks. Past that, you switch to self-hosted state (S3 + DynamoDB, same pattern as Terraform) at zero cost. The gotcha nobody warns you about: Pulumi's execution model is fundamentally different — your program runs top-to-bottom and the dependency graph is inferred from Output<T> references, which means async reasoning errors during the first week or two are inevitable.

// Real Pulumi TypeScript — creates an EKS cluster with typed config
import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";

// The cluster needs a VPC; awsx builds subnets across multiple AZs by default
const vpc = new awsx.ec2.Vpc("prod-vpc");

const cluster = new eks.Cluster("prod", {
    instanceType: "t3.medium",
    desiredCapacity: 3,
    minSize: 1,
    maxSize: 5,
    // outputs are strongly typed — your IDE catches mismatches at write time
    vpcId: vpc.vpcId,
    subnetIds: vpc.privateSubnetIds,
});
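The gotcha shows up the first time you treat one of those outputs as a plain value. A minimal sketch against the cluster above, using apply and pulumi.interpolate, the two idioms you'll actually need:

import * as pulumi from "@pulumi/pulumi";

// An Output<string> is not a string: naive interpolation like
// `https://${cluster.eksCluster.endpoint}` logs a placeholder, not the URL.

// apply() transforms the value once it resolves; the result is another Output.
const healthUrl = cluster.eksCluster.endpoint.apply(ep => `https://${ep}/healthz`);

// pulumi.interpolate stitches several Outputs into one string Output.
export const clusterInfo = pulumi.interpolate`${cluster.eksCluster.name} @ ${cluster.eksCluster.endpoint}`;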

Crossplane

Crossplane requires a mindset shift that most teams underestimate. You're not writing a plan and applying it — you're declaring desired state as Kubernetes CRDs and letting controllers reconcile continuously. I wrote about this in depth in my Crossplane + CI/CD post, but the short version: once you internalize the operator model, config drift disappears. Something changes your RDS instance out-of-band? Crossplane corrects it on the next reconciliation loop. The downside is the learning curve is steep and the error messages from failed compositions are some of the worst I've seen in any infrastructure tool. Crossplane makes sense when your team already lives in Kubernetes and you want infra to feel like the rest of your cluster config. It makes zero sense if you're managing infra from a CI pipeline with no K8s footprint.

Ansible

Yes, some IaC purists will tell you Ansible doesn't count because it's procedural and doesn't manage state. Those people are right, and also I've shipped production infrastructure with Ansible for six years. The honest framing: Ansible is declarative enough for cloud resource provisioning when you use the right modules (amazon.aws.ec2_instance, amazon.aws.rds_instance, etc.), it's genuinely free under the GPL, and every ops person already knows YAML. Where it breaks down is at scale — running 200 tasks against 80 EC2 instances with --limit filters and trying to understand what actually changed is painful. I use it for day-2 operations and bootstrapping, not as my primary cloud provisioning tool.

# Provision an RDS instance — this is declarative enough to be useful
- name: Create production database
  amazon.aws.rds_instance:
    state: present
    db_instance_identifier: prod-postgres-16
    engine: postgres
    engine_version: "16.2"
    db_instance_class: db.t3.medium
    allocated_storage: 100
    master_username: admin
    master_user_password: "{{ vault_db_password }}"
    multi_az: true
  register: rds_result

AWS CDK and CDKtf

CDK is genuinely excellent if you're AWS-only and your team thinks in constructs. The L2 and L3 construct library saves enormous amounts of boilerplate — a single new ApplicationLoadBalancedFargateService() creates the ALB, target group, ECS service, IAM roles, and security groups. CDK for Terraform (CDKtf) applies the same idea to Terraform providers, meaning you get full provider coverage beyond just AWS. Both are free. I reach for CDK specifically when I'm building reusable internal platform components for AWS — it's overkill for a single-repo infra setup and the synthesized CloudFormation output is nearly unreadable for debugging.
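To see what that buys you, here's roughly what the one-liner looks like in a complete app. A minimal sketch, not production code; the image, sizing, and names are illustrative:

import * as cdk from "aws-cdk-lib";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as ecs_patterns from "aws-cdk-lib/aws-ecs-patterns";

const app = new cdk.App();
const stack = new cdk.Stack(app, "WebStack");

// One L3 construct: CDK synthesizes the ALB, target group, ECS cluster,
// Fargate service, task definition, IAM roles, and security groups behind it.
new ecs_patterns.ApplicationLoadBalancedFargateService(stack, "Web", {
    cpu: 256,
    memoryLimitMiB: 512,
    desiredCount: 2,
    taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry("nginx:1.27"),
        containerPort: 80,
    },
});

app.synth();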

What I skipped and why

Terragrunt is a wrapper around Terraform, not a replacement. It solves real DRY problems with module composition, but if you're evaluating alternatives because of the BSL license, Terragrunt doesn't help you — it still calls terraform under the hood. Pair it with OpenTofu if you want it. I skipped Farmer entirely because it generates ARM templates through an F# DSL. If your team runs .NET, it's genuinely clever. My team doesn't, and forcing a .NET toolchain into a Go/TypeScript shop to manage Azure infra is a hard sell I didn't want to make.
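One practical note if you do pair Terragrunt with OpenTofu: the switch is a single setting in your root config. A sketch, assuming a Terragrunt version that supports the terraform_binary attribute:

# terragrunt.hcl — run Terragrunt against the OpenTofu binary instead of terraform
terraform_binary = "tofu"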

OpenTofu: The Path of Least Resistance

The thing that surprised my team most after the BSL license change wasn't how long the migration took — it was how short. OpenTofu is a hard fork of Terraform at the 1.5.x branch, maintained under the Linux Foundation with an MPL 2.0 license. That license matters because MPL 2.0 is genuinely open — you can use it in commercial products, fork it, wrap it, whatever. No "community edition" gotcha hiding two pages into the ToS. Installation takes about two minutes and you probably already have the package manager:

# macOS
brew install opentofu

# Linux (snap)
snap install opentofu --classic

# or grab the binary directly
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh | sh

The "drop-in replacement" claim is actually true for most setups. In your CI config, swap `terraform` for `tofu` and you're done with 90% of the work. GitHub Actions, GitLab CI, whatever — it's just a binary rename. The one real gotcha I hit: the provider lock file. If you copy over your existing `.terraform.lock.hcl`, OpenTofu may refuse to use it or produce checksum mismatches. Regenerate it explicitly:

# Run this after switching — don't just copy your old lock file
tofu providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64

# Commit the result
git add .terraform.lock.hcl

State compatibility is real and not just marketing copy. OpenTofu reads S3-backed state, GCS state, Terraform Cloud state files — whatever you're storing — without any migration step. I pointed it at an existing S3 bucket with a terraform.tfstate key and ran tofu plan. It just worked. The state format hasn't diverged because they deliberately kept it compatible at the storage layer.

Where OpenTofu is actually pulling ahead of where Terraform was at 1.5.x: the 1.8 release added early evaluation for variables, meaning you can now use variables in places like backend configurations that previously required workarounds or wrapper scripts. The built-in testing framework (the .tftest.hcl format) has also gotten meaningful improvements — mock provider support being the one I actually use, since it lets you test module logic without spinning up real cloud resources:

# example.tftest.hcl
mock_provider "aws" {
  mock_resource "aws_s3_bucket" {
    defaults = {
      id  = "my-test-bucket"
      arn = "arn:aws:s3:::my-test-bucket"
    }
  }
}

run "bucket_has_correct_tags" {
  command = plan

  assert {
    condition     = aws_s3_bucket.main.tags["env"] == "prod"
    error_message = "Missing env tag on bucket"
  }
}
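Running the suite is one command; tofu discovers the .tftest.hcl files in the module directory:

# Discovers and runs *.tftest.hcl files against the configuration
tofu test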

My honest take: if you have existing HCL, a working state backend, and a team that already knows Terraform — just use OpenTofu. It's the boring answer, and boring is right here. You don't get to brag about migrating to a new paradigm, but you also don't spend three weeks rewriting modules in Pulumi TypeScript while your deploys are blocked. Save the exotic tooling for greenfield. For anything with existing infrastructure, OpenTofu is the answer that lets you ship by Friday.

Pulumi: When You're Tired of Writing YAML That Pretends to Be Code

The thing that actually surprised me about Pulumi wasn't the "real programming languages" pitch — every IaC tool claims to be developer-friendly. It was opening a Python file and writing a for loop to spin up 12 S3 buckets with different retention policies without copy-pasting 200 lines of HCL. That moment was genuinely different. HCL can fake loops and conditionals, but it's torturous in practice. Pulumi just… isn't. Getting started is one command:

# Install the CLI
curl -fsSL https://get.pulumi.com | sh

# Scaffold a new AWS stack with TypeScript
pulumi new aws-typescript

# Or Python if that's your team's language
pulumi new aws-python

That second command drops you into an interactive prompt asking for your stack name, AWS region, and project description. Then it generates a real requirements.txt or package.json, a Pulumi.yaml, and an entrypoint file you can actually read. No wrestling with provider blocks and version constraints as step one. Here's what an S3 bucket with lifecycle rules looks like in Python vs what you'd write in HCL. The Python version:

import pulumi
import pulumi_aws as aws

# Lifecycle config as a real Python dict — no HCL template syntax
lifecycle_rule = aws.s3.BucketLifecycleConfigurationV2RuleArgs(
    id="expire-old-logs",
    status="Enabled",
    filter=aws.s3.BucketLifecycleConfigurationV2RuleFilterArgs(prefix="logs/"),
    expiration=aws.s3.BucketLifecycleConfigurationV2RuleExpirationArgs(days=90),
)

bucket = aws.s3.BucketV2("app-logs", tags={"env": pulumi.get_stack()})

aws.s3.BucketLifecycleConfigurationV2(
    "app-logs-lifecycle",
    bucket=bucket.id,
    rules=[lifecycle_rule],
)

pulumi.export("bucket_name", bucket.id)

That's under 20 lines including whitespace. The equivalent Terraform HCL with the aws_s3_bucket_lifecycle_configuration resource, proper backend config, required providers block, and variable declarations is closer to 50-60 lines — and the moment you want to parameterize the prefix or retention days per environment, you're reaching for count or for_each and fighting with HCL's awkward iteration syntax. In Pulumi Python you just pass a variable.

The state backend question is where I see teams get confused. By default, pulumi up stores state in Pulumi Cloud, which is free for individual use with unlimited resources. For teams, check https://www.pulumi.com/pricing/ because the limits kick in around collaborative features, not resource count. If you want to self-host — which I do for anything touching production credentials — it's one command:

# Point Pulumi at your own S3 bucket for state storage
pulumi login s3://your-state-bucket

# Or an Azure blob container
pulumi login azblob://your-container

# Verify what backend you're currently using
pulumi about

Speaking of pulumi about — run this constantly. The rough edge I hit hardest was SDK version drift. The pulumi CLI, the pulumi_aws Python package, and the underlying Pulumi resource plugin are three separate versioned things that can quietly get out of sync. I once had a pulumi up fail with a cryptic gRPC error that turned out to be a mismatch between CLI 3.94 and an AWS provider plugin that hadn't updated. pulumi about dumps every version in one shot and usually points you right at the conflict. Keep your requirements.txt pinned and run pulumi plugin install after any upgrade.

Pulumi genuinely wins when your infrastructure has logic in it — multi-region deployments where each region gets slightly different config, component libraries that your platform team publishes as pip packages that app teams consume, or environments where you're generating resource names programmatically. I've also seen it win hard for teams building internal developer platforms where the IaC is itself an API. Where it loses is onboarding. If a teammate knows neither TypeScript nor Python, the first week of Pulumi is slower than the first week of Terraform — they're learning the tool and a language simultaneously. For a five-person team where everyone codes, that's fine. For an ops-heavy team where half the people live in YAML, you'll feel that friction immediately.

Crossplane: IaC for Teams Already Living in Kubernetes

The thing that caught me off guard when I first tried Crossplane wasn't the complexity — it was the complete mental model flip. You're not writing HCL files and running a CLI. You're installing operators into a running cluster and letting Kubernetes reconciliation loops manage your cloud resources. Your AWS RDS instance becomes a CRD. Your GCS bucket is just another object in etcd. If that sounds alien, you're probably not the target audience. If that sounds right, keep reading. Install is clean if you already have Helm set up:

# requires Kubernetes 1.26+; Crossplane 1.15 as of this writing
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update

helm install crossplane crossplane-stable/crossplane \
  -n crossplane-system \
  --create-namespace \
  --set args='{"--enable-composition-functions"}'

# verify the core controllers are running
kubectl get pods -n crossplane-system

Then you install a provider separately — Crossplane itself does nothing without one. The big three are upbound/provider-aws, upbound/provider-gcp, and upbound/provider-azure. Pin your versions aggressively here. I got burned chasing a provider-aws update from 0.46 to 1.x that changed CRD structure mid-sprint. Treat provider upgrades like you'd treat a database migration: planned, tested in staging first.

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  # pin the version — don't use 'latest' here, ever
  package: xpkg.upbound.io/upbound/provider-aws:v1.7.0

Where Crossplane genuinely earns its complexity is platform engineering. The pattern that works: your platform team writes a CompositeResourceDefinition (XRD) and a Composition that describes how an RDS instance gets built — subnet groups, parameter groups, security groups, the whole thing. Then a developer just does this:

apiVersion: database.myplatform.io/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: my-app-db
  namespace: team-payments
spec:
  parameters:
    storageGB: 20
    version: "15"
  writeConnectionSecretToRef:
    name: my-app-db-creds
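For context, here's the other half: the XRD the platform team writes to expose that claim type. A trimmed sketch; the group and field names mirror the claim above, and a real setup pairs this with a Composition that maps the parameters onto actual AWS resources:

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresqlinstances.database.myplatform.io
spec:
  group: database.myplatform.io
  names:
    kind: XPostgreSQLInstance
    plural: xpostgresqlinstances
  # claimNames is what lets a namespaced PostgreSQLInstance claim exist
  claimNames:
    kind: PostgreSQLInstance
    plural: postgresqlinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    storageGB:
                      type: integer
                    version:
                      type: string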

And that's the entire developer-facing surface. The dev never touches the AWS console, never deals with VPC config, never creates IAM roles. The platform team owns all that complexity inside the Composition. The connection string lands in a Kubernetes Secret automatically. I've seen this pattern completely eliminate the "I need an AWS IAM user to test this" ticket that used to clog platform queues every week.

The gotcha nobody warned me about: when a provision fails, the error is buried three layers deep in Kubernetes event chains. You'll run kubectl describe composite my-db and see a vague "cannot resolve references" message. Then you'll chase down the individual managed resources:

# find the managed resources the composite created
kubectl get managed

# drill into the specific failing one
kubectl describe rdsinstance my-app-db-xr-abc12

# if that's still vague, check the provider controller logs
kubectl logs -n crossplane-system \
  -l pkg.crossplane.io/revision=provider-aws-1234 \
  --tail=100 | grep ERROR

Budget time for this debugging loop. It's not intuitive and the error messages are not good. The Crossplane community is active on Slack and GitHub issues are usually answered, but expect to spend an afternoon the first time something goes wrong in production.

Honest take: if you're not already running Kubernetes as your primary control plane — meaning your apps, your CI runners, your tooling all live there — Crossplane is genuinely not worth it. You'd be adopting a full Kubernetes cluster just to manage cloud resources, which is absurd when Pulumi or OpenTofu give you the same result with a pulumi up. But if Kubernetes is your operating system and you have a platform engineering function that wants to hand developers self-service infrastructure? Crossplane is the most coherent answer I've found. Nothing else integrates that cleanly into the "everything is a Kubernetes resource" mental model that cloud-native shops have already bought into.

Ansible: The Tool People Dismiss Until They're Debugging at 2am

The tool that genuinely surprised me after years of reaching for Terraform first: Ansible handles a huge percentage of real provisioning work, and the only reason people underestimate it is that it doesn't look like "proper" IaC. No state file, no plan output, no provider ecosystem. But that's also why it's still running on half the servers I've ever touched at 2am when something is actually broken.

What keeps pulling people back is the architecture. No agents to install, no daemon to keep alive, no enrollment process. Ansible SSHes directly into your targets and runs. For cloud provisioning specifically, that means your control plane is just a Python process on your laptop or CI runner. Install is genuinely two commands:

# Python 3.9+ recommended
pip install ansible
pip install boto3 botocore   # required for all AWS modules

# Install the AWS collections
ansible-galaxy collection install amazon.aws community.aws

The actual IaC usage looks more normal than people expect. You're not shelling out to the AWS CLI — you're using proper module interfaces with real return values you can register and use downstream in the same playbook:

- name: Provision app server
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Launch EC2 instance
      community.aws.ec2_instance:
        name: "app-server-prod"
        instance_type: t3.medium
        image_id: ami-0c02fb55956c7d316
        region: us-east-1
        vpc_subnet_id: subnet-0abc123
        security_groups: ["sg-0def456"]
        tags:
          Environment: production
        state: running
      register: ec2_result

    - name: Create S3 bucket for assets
      amazon.aws.s3_bucket:
        name: myapp-assets-prod
        region: us-east-1
        versioning: true
        state: present

Where Ansible genuinely wins against Terraform is the configuration management + provisioning in one pass. With Terraform you provision the instance, then you need something else — Chef, cloud-init, a separate Ansible run — to actually configure it. With Ansible, you provision the EC2 instance in task 1, grab its IP from ec2_result, add it to the in-memory inventory, and then run your entire app setup in tasks 2 through 50 (there's a sketch of that handoff at the end of this section). One tool, one run, one log file to read when it breaks. That reduction in moving parts matters more than any feature comparison matrix.

The state problem is real though, and you need to go in with eyes open. Terraform tracks every resource it created in .tfstate. Ansible has no equivalent. If someone manually resizes that EC2 instance in the console, your playbook has no idea. The --check flag gives you a dry-run that shows what would change, and I run it before every production play as a sanity check:

# Always do this before touching prod
ansible-playbook provision.yml --check --diff

# --diff shows you line-level changes on files/templates
# --check won't actually change anything

But --check only tells you what the playbook would do — it doesn't scan your entire cloud account for drift the way terraform plan does. To actually detect drift you'd need to integrate something like AWS Config or write explicit assertion tasks. For teams that need strict drift detection, this is a genuine gap. For teams that mostly want reproducible provisioning and are disciplined about not clicking around in the console, it's fine.

I reach for Ansible now in two specific scenarios: bootstrapping bare metal or VMs where Terraform would need a provider with spotty support and I'd spend more time fighting HCL than solving the problem, and when I need to run post-provisioning commands on resources that already exist. If I've already got 20 servers from a previous Terraform run and I need to rotate credentials, update a config file, and restart a service across all of them, Ansible is the right tool. Terraform never had a real answer for that. Writing a null_resource with a bunch of remote-exec provisioners is one of the worst experiences in modern infrastructure tooling, and I'll die on that hill.
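Here's a minimal sketch of that in-memory inventory handoff, assuming the ec2_result registered above; the login user and the nginx task are illustrative:

    # Continues the tasks list of the provisioning play above
    - name: Add the new instance to the in-memory inventory
      ansible.builtin.add_host:
        name: "{{ ec2_result.instances[0].public_ip_address }}"
        groups: just_provisioned
        ansible_user: ubuntu

# A second play in the same file then configures what's inside the box
- name: Configure the app server
  hosts: just_provisioned
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present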
Side-by-Side Comparison: What Actually Matters When Picking One

The most useful thing I can tell you before this comparison: the "best" tool here is almost entirely determined by two constraints — what your team already knows, and whether you're running Kubernetes. Everything else is secondary. I've seen teams pick Pulumi because it "supports real languages" and then spend weeks fighting state backend decisions they weren't ready for. Pick wrong and you're not just rewriting HCL, you're migrating state files under production pressure.

| Tool       | License    | State Management                 | Learning Curve               | Cloud Provider Support      | Drift Detection                           | Best For                                   |
|------------|------------|----------------------------------|------------------------------|-----------------------------|-------------------------------------------|--------------------------------------------|
| OpenTofu   | MPL-2.0    | S3, GCS, Azurerm backends        | Low (HCL)                    | All major providers         | Yes — tofu plan                           | HCL teams migrating off Terraform          |
| Pulumi     | Apache 2.0 | Pulumi Cloud or self-hosted      | Medium (depends on language) | All major providers         | Yes — pulumi preview                      | Teams that want real programming languages |
| Crossplane | Apache 2.0 | Kubernetes etcd                  | High                         | AWS/GCP/Azure via providers | Yes — controller reconciles continuously  | Platform engineering / GitOps-native teams |
| Ansible    | GPL-3.0    | Stateless (no tracking built-in) | Low–Medium                   | Very broad via modules      | No native drift detection                 | Mixed provisioning + config management     |

The drift detection column looks similar across three of these tools, but the mechanism is completely different. tofu plan and pulumi preview are point-in-time checks — you run them, you see drift, you act. Crossplane is continuous reconciliation: if someone manually deletes an RDS instance, the controller notices within seconds and recreates it. That's not just "drift detection", that's enforced desired state. If you care about someone fat-fingering a console change at 2am and having it silently persist, Crossplane is the only one on this list that actually prevents it rather than reports it.

The dealbreakers are where I'd spend the most time before committing:

- OpenTofu — Genuinely no dealbreaker for existing HCL users. If you're on Terraform today, tofu init just works against your existing configs. The migration cost is basically a find-replace of the binary name in your CI pipelines.
- Pulumi — You must decide on a state backend before you write a single line of real infrastructure. Pulumi Cloud is free up to 200k resources/month but that ceiling is easier to hit than you'd think across multiple stacks. Self-hosting with S3 + a custom passphrase works, but the setup is not as clean as the S3 backend in Terraform. I've seen teams defer this decision and then scramble to migrate state three months later.
- Crossplane — There is no path to running Crossplane without Kubernetes. Not "it's harder without k8s" — it literally requires a cluster because it is a set of Kubernetes controllers. If you don't already have a cluster, you're now also in the business of running one just to provision infrastructure. That's a serious ops tax.
- Ansible — No state file means Ansible has no memory. It'll run your playbook, provision an EC2 instance, and if that instance gets deleted or modified, Ansible has zero idea. You're flying blind unless you build your own inventory reconciliation. For pure config management it's fine; for cloud provisioning where drift tracking matters, this is a real gap.

My honest recommendation based on team type: if your team writes Python or TypeScript daily and you're building non-trivial infrastructure with loops, conditionals, and shared logic, Pulumi's language support genuinely pays off over HCL. I switched a side project to Pulumi TypeScript after hitting the limits of HCL's for_each and the difference in readability was immediate. But if you're a solo ops engineer or a small team that just wants to stop paying for Terraform Cloud, OpenTofu is the correct answer — drop it in, nothing breaks, you move on with your life. The Crossplane path only makes sense if your org is already Kubernetes-native and you want infrastructure to live in the same GitOps loop as your app manifests.
When to Pick What: Matching Tools to Actual Situations

The tool that burned me the most was choosing Ansible for a multi-cloud setup where I actually needed real dependency graphs. I spent two weeks writing wait_for tasks and polling loops that Pulumi would have handled in 15 lines of Python. So here's the honest breakdown by situation, not by feature list.

You have an existing Terraform codebase and just want off the BSL — OpenTofu, no question, migrate this afternoon. Drop in the tofu binary, update your CI to call tofu plan instead of terraform plan, and you're done. Your .tf files, your state files, your provider registry calls — all compatible. The OpenTofu 1.7+ fork has already diverged enough to add things Terraform hasn't (client-side encryption for state, for one), but nothing breaks. The migration looks like this:

# Replace the Terraform binary in your pipeline
brew install opentofu   # or use the installer script

# Your existing state works as-is
tofu init
tofu plan   # exact same output you're used to

# If using Terraform Cloud, switch the backend block
# to S3/GCS/Azure Blob — Tofu doesn't talk to TFC

Greenfield project, your team writes Python daily — Pulumi with the Python SDK and an S3 backend. Don't let anyone convince you to learn HCL when your engineers already think in Python. The thing that caught me off guard here was how much better Pulumi handles secrets — pulumi config set --secret db_password encrypts it at rest automatically, and you can swap in AWS KMS or Vault as the secrets provider with one config line. Set the backend in Pulumi.yaml:

name: my-infra
runtime: python
backend:
  url: s3://my-state-bucket/pulumi   # works with any S3-compatible storage

You're building an internal developer platform where teams request infrastructure the way they request Kubernetes namespaces — Crossplane is the right answer, but budget 2–3 sprints for setup, minimum. You'll spend real time writing Compositions, figuring out how XRDs (Composite Resource Definitions) map to what your teams actually want to expose, and wiring up RBAC so a dev team can create a PostgreSQLInstance custom resource without getting access to your AWS credentials. The payoff is huge — devs get a self-service API backed by real cloud resources, no tickets, fully GitOps-able. But if you think you'll have this running in a week, I've got bad news.

Small team, mixed cloud + on-prem, and you need config management too — Ansible. Accept the state limitation upfront and design around it. Use tags aggressively, write idempotent tasks that check before they act, and use the --check flag in CI to catch drift before applying. Ansible's real strength here is that it talks to cloud APIs and SSH targets with the same playbook runner, which nothing else on this list does cleanly. Just don't use it to manage resources with complex dependencies — the ordering becomes your problem, not the tool's.

Multi-cloud with complex dependencies and conditional logic — Pulumi, and the for-loops alone will save you. I've written HCL dynamic blocks and Terraform count hacks that made reviewers visibly uncomfortable.
Pulumi lets you express "create one subnet per AZ, tag it based on whether it's private, then attach different route tables conditionally" in code that a Python developer reads without squinting:

import pulumi_aws as aws

# vpc is assumed to be created earlier in the program
azs = aws.get_availability_zones()

subnets = []
for i, az in enumerate(azs.names[:3]):
    subnet = aws.ec2.Subnet(f"subnet-{i}",
        vpc_id=vpc.id,
        cidr_block=f"10.0.{i}.0/24",
        availability_zone=az,
        # conditional tagging based on index — try this in HCL
        tags={"Type": "private" if i > 0 else "public"},
    )
    subnets.append(subnet)

Regulated environment needing audit trails and GitOps — Crossplane with Flux or ArgoCD watching your Compositions. Every infrastructure change is a Git commit. ArgoCD sees the drift between desired state in Git and actual state in Kubernetes, Crossplane reconciles that down to AWS/GCP/Azure. Your auditors get a complete commit history for every resource change, timestamps, author, PR approval chain — all from standard Git tooling. The one gotcha: make sure you configure deletionPolicy: Orphan on your Crossplane Managed Resources during initial setup, or a botched Composition update can cascade-delete real cloud resources. I learned that one in staging, thankfully not production.

Migration Path: Moving Off Terraform Without Breaking Everything

The thing that catches most teams off guard isn't the tool swap — it's discovering their Terraform state is messier than they thought. Before you run a single command for any new tool, do this:

# Get a complete picture of what you're managing
terraform state list > resources.txt

# Also export the full state for backup
terraform state pull > terraform.tfstate.backup

# Count what you're dealing with
wc -l resources.txt

I've seen teams skip this step and end up in a split-brain situation where their new tool doesn't know about half the cloud resources it's supposed to manage. The resources.txt file becomes your migration checklist. Cross things off as you verify them in your new tool's state. If you have 200+ resources, block out a week — this is not an afternoon task.

OpenTofu: The Easiest Path by Far

OpenTofu is a near-identical fork of Terraform 1.5.x, so the migration is mostly mechanical. Copy your state file, update your backend config, and run three commands:

# Copy the existing state (never work directly on prod state)
cp terraform.tfstate terraform.tfstate.tofu-migration-backup

# Update your backend block — example for S3 backend
# In your main.tf, change nothing except which binary reads it.
# OpenTofu reads Terraform state files natively.
tofu init -reconfigure
tofu plan

If tofu plan shows zero diff, you're done with the migration itself. If it shows changes, stop and figure out why before touching anything. In my experience the most common surprise is provider version constraints — OpenTofu has its own registry at registry.opentofu.org, and some third-party providers need their source updated from registry.terraform.io. Your .terraform.lock.hcl will need regenerating too. Run tofu providers lock after init.

Pulumi: The Import Rabbit Hole

Pulumi doesn't read Terraform state, so every existing resource needs to be imported.
The command pattern looks clean in the docs:

# Import an existing S3 bucket into Pulumi state
pulumi import aws:s3/bucket:Bucket my-bucket my-actual-bucket-name

# For something like an RDS instance
pulumi import aws:rds/instance:Instance prod-db db-identifier-in-aws

Here's the honest problem: pulumi import generates boilerplate code, but it's rarely production-ready. The generated TypeScript or Python will have every single property set to its current value — including things like tags, minor version numbers, and ARN references that will immediately drift or conflict. For a stack with 50+ resources, plan to spend several days cleaning this up. The generated code is a starting point, not a finish line. I'd recommend importing resources in logical groups (all S3 first, then IAM, then compute) rather than trying to do everything at once.

Crossplane: No Magic, Just Work

Crossplane is the hardest migration target because there's no crossplane import command. Existing AWS/GCP/Azure resources need to be represented as Managed Resource CRDs, and you have to create those manifests manually and then tell Crossplane to adopt the existing resource rather than create a new one. The pattern for adoption looks like this:

apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: my-existing-bucket
  annotations:
    # This tells Crossplane to adopt, not create
    crossplane.io/external-name: my-actual-bucket-name-in-aws
spec:
  forProvider:
    region: us-east-1
  providerConfigRef:
    name: aws-provider

Apply that manifest and Crossplane will observe the real resource and import it into its state. But you need to get the spec correct first — if it doesn't match reality, Crossplane will try to reconcile the diff, meaning it might modify or break your live resource. Test every import in a staging environment. Crossplane migration from Terraform is a multi-sprint effort for anything beyond a toy stack.

State Locking and CI/CD: Verify Before You Ship

State locking is non-negotiable before going to production with any of these tools. Both OpenTofu and Pulumi support the S3 + DynamoDB locking pattern that Terraform teams are already familiar with:

# OpenTofu backend config — identical to Terraform
terraform {
  backend "s3" {
    bucket         = "my-tfstate"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"
    encrypt        = true
  }
}

# Pulumi uses the same S3 bucket pattern for state
# Set via env or pulumi login
pulumi login s3://my-tfstate

Pulumi's S3 backend doesn't use DynamoDB for locking — it uses a separate lock file in S3 itself. That works, but it's not the same guarantee as DynamoDB's conditional writes. For CI/CD, the swap is mostly mechanical: install the new binary in your pipeline, update the command names, and keep plan output posting to PR comments. The pattern is identical regardless of tool — run plan on PR, apply on merge to main. Here's the OpenTofu swap for a GitHub Actions workflow:
- name: Install OpenTofu
  run: |
    curl -fsSL https://get.opentofu.org/install-opentofu.sh | bash -s -- --install-method deb
    tofu version

- name: Plan
  run: tofu plan -out=plan.tfplan 2>&1 | tee plan-output.txt

- name: Post plan to PR
  uses: actions/github-script@v7
  with:
    script: |
      const output = require('fs').readFileSync('plan-output.txt', 'utf8');
      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: "```\n" + output.slice(0, 60000) + "\n```"
      });

One thing I'd add: don't migrate CI/CD and state in the same PR. Do the state migration first, verify it manually, then update the pipeline. Doing both at once means when something breaks — and something will — you won't know which change caused it.

What I Actually Run Today

My stack isn't a single tool, and I stopped pretending it should be. Here's what's actually running in production today across a team of four engineers managing maybe 15 environments across two cloud accounts.

OpenTofu handles everything we inherited. We had about 80k lines of HCL across 30-odd modules when HashiCorp flipped the BSL switch, and OpenTofu was a straight drop-in — I ran the migration in an afternoon, changed the binary in our CI config, and nothing exploded. State lives in S3 with DynamoDB locking, same as it always did. GitHub Actions calls it with OIDC authentication so we're not storing long-lived AWS credentials anywhere. The CI job looks roughly like this:

- name: OpenTofu Plan
  uses: opentofu/setup-opentofu@v1
  with:
    tofu_version: 1.7.2

- run: |
    tofu init \
      -backend-config="bucket=${{ vars.TF_STATE_BUCKET }}" \
      -backend-config="dynamodb_table=${{ vars.TF_LOCK_TABLE }}" \
      -backend-config="region=us-east-1"
    tofu plan -out=plan.tfplan
    tofu apply plan.tfplan

Zero drama after four months. The OpenTofu registry is a solved problem now, provider compatibility is solid, and the community has been backporting fixes faster than I expected. The honest trade-off: you're still writing HCL, and HCL still has the same for_each limitations that made you want to flip a table in 2022. But if you have existing modules, the migration cost is low enough that it's the obvious choice.

Pulumi with TypeScript is what I reach for on greenfield work. The thing that sold me wasn't the "real language" pitch — it was debugging a loop that dynamically created 12 S3 buckets with different lifecycle policies and not once having to google "how do you do a conditional in HCL." Self-hosted state goes to GCS with a bucket I created manually (yeah, I know) and protected with a lifecycle rule that retains the last 30 state versions. The state backend config is just a URL: gs://my-pulumi-state/infra. Pulumi's free SaaS backend would work fine here too, but I didn't want another vendor dependency after the Terraform situation. The gotcha I hit early: Pulumi's resource graph is implicit from your code, so if you accidentally create a circular dependency with a careless apply() call, the error message is confusing as hell. Budget an hour for that the first time it bites you.

Ansible still lives in this stack and I'm not embarrassed about it. Post-provisioning configuration, bare metal workers, anything where I need to SSH into a box and actually check that a service is running — that's Ansible. Trying to do all of that in Pulumi or OpenTofu with remote-exec provisioners is a lesson in pain I learned the hard way in 2023. The right mental model: IaC tools provision the resource, Ansible configures what's inside it.
My Ansible runs trigger from the same GitHub Actions workflow, after the OpenTofu/Pulumi apply step finishes, using dynamic inventory pulled from AWS tags.

Crossplane was a serious evaluation, not a dismissal. I spent three weeks building a proof-of-concept with Crossplane 1.15, running XRDs for our RDS and VPC patterns. The abstractions were genuinely good — platform teams at larger orgs clearly benefit from Kubernetes-native Compositions. But the operational reality for a four-person team is that you now own a control plane that needs upgrades, provider pods that OOM if you're not careful, and a debugging experience that requires knowing both Kubernetes and the cloud provider API when something goes wrong. That's a lot of yak-shaving when I can just run tofu apply. I'll revisit this if we ever hire dedicated platform engineers or hit the point where application teams need self-service infra without involving me directly.

The rule I enforce now: mix tools across environments freely, never mix them within a single environment's state boundary. We had a brief experiment where one environment was half-managed by Pulumi and half by OpenTofu, sharing the same VPC. The drift detection conflicts were maddening — both tools thought they owned certain tags and kept overwriting each other on every apply. Hard boundary now: one tool owns one environment, end of story. If you're migrating an environment from one tool to another, you do a clean handoff — import the state into the new tool, delete it from the old one, don't run them in parallel against the same resources.

Disclaimer: This article is for informational purposes only. The views and opinions expressed are those of the author(s) and do not necessarily reflect the official policy or position of Sonic Rocket or its affiliates. Always consult with a certified professional before making any financial or technical decisions based on this content.

Originally published on techdigestor.com. Follow for more developer-focused tooling reviews and productivity guides.
