DEV Community: James Joyner

Migrating from Terraform to OpenTofu: A Low-Risk Playbook

James Joyner — Sun, 19 Jul 2026 19:30:09 +0000

I've migrated a few real environments from Terraform to OpenTofu now, and the good news is that a careful migration is almost boring. The state format is compatible, the CLI is a near drop-in, and the whole thing can be done with a rollback path at every step. The bad news is that "almost boring" still has a couple of sharp edges, and the teams that get hurt are the ones who skip the parity check and go straight to apply. Here's the calm, low-risk playbook I actually follow.

Step 0: Know what "low-risk" means here

The core insight that makes this safe: OpenTofu reads the same HCL and the same state file that Terraform does. A migration is not a rewrite — it's swapping which binary talks to your existing state. That means at almost every step, your rollback is just "keep using the terraform binary." As long as you don't trigger a one-way-door feature (more on those later), you can walk back.

So the whole strategy is: prove parity before you change anything real, change one thing at a time, and keep the old binary installed until you're confident.

Step 1: Check and pin your Terraform version first

Before you touch OpenTofu, get your current setup deterministic. OpenTofu forked from the last MPL-licensed Terraform, so configs that depend on very old behavior, or on features Terraform added after the fork, can hit rough edges. Find out exactly what you're running:

terraform version

Pin it. If you're not already using a version manager or a pinned CI image, do that now — you want a fixed, known-good Terraform baseline to compare against and to fall back to. Also pin your provider versions in a lockfile:

terraform providers lock

A migration where both the tool version and the provider versions are floating is a migration where you can't tell what caused a diff. Lock everything down first.
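A pin is easier to trust when CI enforces it. Here's a minimal sketch of a guard that fails when the binary drifts. It is not part of either tool: PINNED_VERSION and both function names are hypothetical, and it assumes the first line of the version output looks like "Terraform v1.5.7" or "OpenTofu v1.8.0".

```shell
# Hypothetical pin; replace with the version you actually verified.
PINNED_VERSION="1.5.7"

# Pull the bare version number out of the first line of `<tool> version`.
# Assumes that line looks like "Terraform v1.5.7" or "OpenTofu v1.8.0".
extract_version() {
  sed -n '1s/^[A-Za-z]* v\([0-9][0-9.]*\).*/\1/p'
}

# Succeeds only when the reported version equals the pin.
check_pin() {
  actual="$(extract_version)"
  if [ "$actual" = "$PINNED_VERSION" ]; then
    echo "ok: $actual"
  else
    echo "drift: got '$actual', want '$PINNED_VERSION'"
    return 1
  fi
}

# Usage in CI:
#   terraform version | check_pin
```

Piping the version output in, rather than calling the binary inside the function, means the same guard works for terraform and tofu.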

Step 2: Install `tofu` alongside, not instead

Install OpenTofu without removing Terraform. On a workstation or a scratch CI runner:

# Verify the binary is there and note the version
tofu version

Keep both terraform and tofu on PATH during the migration. You'll be running them back to back to compare, and having both is what makes rollback trivial. As of 2026 the install methods and current versions are in the OpenTofu docs — check them rather than trusting a version number from a blog post (including this one).

Step 3: The parity check — init and plan, no apply

This is the heart of the whole exercise. Work on a copy or a non-production workspace first. Point OpenTofu at your existing configuration and existing state, initialize, and produce a plan — but do not apply.

# Fresh working directory; same backend and state as before
tofu init

# The critical test: does OpenTofu see zero changes?
tofu plan -out tofu.plan

What you want to see is a clean, no-changes plan. If OpenTofu reads your Terraform-written state and reports that nothing needs to change, you have parity. That's the green light.
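If you'd rather not eyeball the plan output, the -detailed-exitcode flag on plan makes the check scriptable: exit status 0 means no changes, 1 means an error, and 2 means changes are present. The sketch below wraps that; the function names and the parity/drift/error labels are my own, not anything the CLI prints.

```shell
# Map the exit status of `tofu plan -detailed-exitcode` to a verdict.
# 0 = no changes, 1 = error, 2 = changes present.
plan_verdict() {
  case "$1" in
    0) echo "parity" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run the plan and print the verdict on stdout.
# Returns non-zero unless the plan is empty.
parity_check() {
  rc=0
  # Send the human-readable plan to stderr so stdout carries only the verdict.
  tofu plan -detailed-exitcode -out tofu.plan >&2 || rc=$?
  verdict="$(plan_verdict "$rc")"
  echo "$verdict"
  [ "$verdict" = "parity" ]
}
```

Treating "drift" and "error" as separate failures matters in CI: drift means stop and read the diff, while an error usually means init or credentials are wrong.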

If the plan shows drift, stop and read it carefully before doing anything. Common causes I've hit:

Provider version differences. OpenTofu resolved a slightly different provider than your pinned Terraform lockfile. Reconcile the versions.
Registry source differences. OpenTofu uses its own registry; a provider or module might resolve from a different source. Verify the provider actually publishes where OpenTofu looks.
A genuinely different interpretation of some config. Rare, but read the diff — do not apply your way past it.

Do this parity check per module/workspace, not once globally. State lives per-workspace, and a clean plan in one doesn't guarantee a clean plan in another.

Step 4: Apply once, deliberately, in a safe place

Once you've got a clean plan in a non-prod workspace, run the apply there so OpenTofu writes state at least once:

tofu apply tofu.plan

Even a no-op apply may rewrite state metadata. That's fine and expected — but it's the moment worth noting, because after OpenTofu writes state, that workspace's state has been touched by tofu. Terraform can generally still read it, but this is the point where you start treating that workspace as "OpenTofu-managed." Do it somewhere you can afford to be wrong before you do it in production.

Step 5: Swap CI, one pipeline at a time

Now change the automation. In your CI config, this is usually as small as swapping the binary and the command name:

# Before
terraform init && terraform plan -out plan.tfout

# After
tofu init && tofu plan -out plan.tfout

Roll it out per-pipeline, lowest-stakes environment first. Keep the plan-review gate in your pipeline — a human or a required approval looking at the plan output — for the first few runs on each environment. The whole point of a slow rollout is that if OpenTofu ever produces a plan you didn't expect, you catch it at plan time, not after apply.

I also recommend keeping a terraform-based fallback job available (even if disabled) during the transition, so reverting CI is a one-line change rather than an archaeology project.
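One way to get that one-line revert is to route every call through a single variable. This is a sketch of the idea, not a feature of any CI system: IAC_BIN and the iac wrapper are hypothetical names.

```shell
# IAC_BIN is a hypothetical pipeline variable: "tofu" normally,
# "terraform" to fall back. Everything else in the job stays the same.
IAC_BIN="${IAC_BIN:-tofu}"

# Forward all arguments to whichever binary is selected.
iac() {
  "$IAC_BIN" "$@"
}

# Pipeline steps then read:
#   iac init && iac plan -out plan.tfout
```

Reverting a pipeline becomes a change to one variable, and the plan-review gate doesn't need to know which binary produced the plan.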

Step 6: Watch for the one-way doors

Everything above is reversible as long as your config stays compatible with both tools. The way you lose your rollback is by adopting an OpenTofu-only feature. The big ones to be aware of:

Native state/plan encryption. Once OpenTofu encrypts your state, stock Terraform can't read it. This is a feature you may want — but adopt it as a deliberate, post-migration decision, not mid-migration.
Early variable evaluation in backend blocks or module sources. Configs that rely on it won't parse under Terraform.
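.tofu / .tofu.json override files. These are OpenTofu-specific; Terraform ignores them, so anything that lives only there is invisible to your fallback.
Provider-defined functions via the provider:: namespace. A Terraform baseline pinned near the fork point will not parse these calls, so treat them as a one-way door for as long as that baseline is your rollback path.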

My rule during the migration window: change the tool, not the config. Keep your HCL dual-compatible until every environment is on OpenTofu and stable. Only then start adopting the divergent features — and when you do, understand you're closing the door behind you. If you want more detail on the specific compatibility gotchas and error messages these features throw, I keep a running set of OpenTofu troubleshooting notes from real migrations.
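To keep that rule honest, I like a cheap scan that flags OpenTofu-only constructs before they get merged. The sketch below is a heuristic under my own assumptions, not a real compatibility checker: the function name is hypothetical, and it only looks for .tofu files and encryption blocks, so early variable evaluation will slip past it.

```shell
# Heuristic scan for OpenTofu-only constructs under a directory.
# Prints one line per finding; prints nothing when none are found.
find_one_way_doors() {
  dir="${1:-.}"
  # .tofu / .tofu.json files are only read by OpenTofu.
  find "$dir" -type f \( -name '*.tofu' -o -name '*.tofu.json' \) \
    ! -path '*/.terraform/*' | sed 's/^/tofu-only file: /'
  # An encryption block inside a shared .tf file.
  { grep -rnE --include='*.tf' '^[[:space:]]*encryption[[:space:]]*\{' "$dir" || true; } \
    | sed 's/^/tofu-only syntax: /'
  return 0
}

# Usage: fail a CI step if anything is reported.
#   [ -z "$(find_one_way_doors .)" ]
```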

Step 7: How to actually roll back

If something goes wrong before you've crossed a one-way door, rollback is genuinely simple:

Switch the binary back. In CI and locally, tofu becomes terraform again.
Re-init with Terraform so its lockfile and provider selections are in place: terraform init.
Run a plan and confirm a clean, no-change result: terraform plan.
Restore state from backup only if you actually corrupted or encrypted it. This is why you keep versioned state — an S3 bucket with versioning, or whatever your backend offers, so you can retrieve the pre-migration state object.

Back up your state before you start. A cheap tofu state pull > backup.tfstate (or the Terraform equivalent) before the first apply gives you a plain escape hatch. I've never had to use it on a careful migration, but the whole reason the migration feels calm is that the backup exists.
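A backup you never looked at is only a hope, so here's a sketch that pulls state into a timestamped file and checks that it looks like a state snapshot. The function names and the file naming scheme are hypothetical, and the check assumes the pulled file is the usual state JSON carrying "lineage" and "serial" keys.

```shell
# Cheap sanity check that a pulled file is a real state snapshot:
# non-empty and carrying the "lineage" and "serial" keys.
looks_like_state() {
  [ -s "$1" ] && grep -q '"lineage"' "$1" && grep -q '"serial"' "$1"
}

# Pull state into a timestamped file; refuse to continue if it looks wrong.
# Prints the backup file name on success.
backup_state() {
  out="backup-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
  tofu state pull > "$out" || return 1
  looks_like_state "$out" || { echo "backup looks invalid: $out" >&2; return 1; }
  echo "$out"
}
```

Run it once per workspace before the first apply, and store the file somewhere other than the working directory you're about to change.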

Takeaway

Migrating from Terraform to OpenTofu is mostly a swap, not a rewrite, and the parity check is what makes it safe: prove OpenTofu reads your existing state with a clean plan before you change anything real. Pin your versions, keep both binaries installed, roll CI out one environment at a time, and don't adopt one-way-door features until you're fully migrated and stable. Do it in that order and the scariest part of the whole thing will be how uneventful it is.

OpenTofu vs Terraform in 2026: What Actually Changed

James Joyner — Sat, 18 Jul 2026 14:30:45 +0000

I've been running both Terraform and OpenTofu across production infra for a while now, and the number one question I still get is some version of "wait, aren't they the same thing?" The honest answer in 2026 is: they share a common ancestor and a lot of DNA, but they are no longer the same tool. Here's what actually changed, from someone who has to keep both green in CI.

A very short history of the fork

If you missed the drama: HashiCorp relicensed Terraform from the MPL open-source license to the Business Source License (BSL) in 2023. The BSL is source-available but not OSI-approved open source, and it carries a use restriction aimed at competitors. A chunk of the community, backed by a group of vendors and users, forked the last MPL-licensed Terraform codebase. That fork landed under the Linux Foundation as OpenTofu.

So the core distinction is governance, not features: OpenTofu is a Linux Foundation project with open governance and an MPL-2.0 license, and Terraform is a HashiCorp product under the BSL. That licensing split is the reason a lot of teams looked at OpenTofu at all — if your legal team is nervous about the BSL's use restriction, an actual open-source license is the whole ballgame.

The CLI and state are (mostly) drop-in compatible

The thing that makes OpenTofu practical to adopt is that it started as a literal fork. The binary is tofu instead of terraform, and for a lot of everyday work it behaves identically:

tofu init
tofu plan -out plan.tfout
tofu apply plan.tfout
tofu state list

It reads your existing .tf files, understands the same HCL, and uses the same state file format. On real infra I've pointed tofu at a state file that was last touched by Terraform and had it produce a clean, no-change plan. That parity is not an accident — keeping the state format compatible is what makes migration a low-drama exercise rather than a rewrite.

But "mostly compatible" is doing real work in that sentence. The two projects have been diverging since the fork, and the gap widens with every release. Treating them as interchangeable is where teams get burned, so let's talk about the divergences that actually matter.

The real divergences in 2026

State and plan encryption, natively

This is the feature I care about most. OpenTofu ships native state and plan encryption. You configure it directly in your OpenTofu configuration, pick a key provider (PBKDF2 with a passphrase, a cloud KMS, and so on) and a method, and OpenTofu encrypts state at rest — including the plan file, which can leak secrets just as badly as state.

terraform {
  encryption {
    key_provider "pbkdf2" "mykey" {
      passphrase = var.encryption_passphrase
    }

    method "aes_gcm" "default" {
      keys = key_provider.pbkdf2.mykey
    }

    state {
      method = method.aes_gcm.default
    }

    plan {
      method = method.aes_gcm.default
    }
  }
}

In the Terraform world you traditionally solved this at the backend level: a KMS-encrypted S3 bucket, restrictive IAM, and hoping nobody pipes terraform show output for a plan file into a CI log. OpenTofu moves encryption into the tool itself. If you handle regulated data, this alone can justify the switch.

Early variable evaluation

For years the answer to "why can't I use a variable in my backend block?" was a shrug. OpenTofu added early variable evaluation, which lets you use variables (and locals) in places that used to demand static literals — most notably backend configuration and module sources. That means you can drive your backend bucket or key by variable instead of maintaining a wall of -backend-config flags or partial-backend hacks.

It's genuinely useful, and it's also a one-way door: a config that relies on early eval in a backend block won't parse cleanly under stock Terraform. Keep that in mind before you sprinkle it everywhere.

Provider-defined functions

Both ecosystems moved toward letting providers ship their own functions rather than waiting for the core team to add every string-munging helper. In OpenTofu you call them through the provider:: namespace:

locals {
  parsed = provider::aws::arn_parse(var.role_arn)
}

The exact set of available functions depends on the provider version you're pinning, so I won't quote a catalog — check the current provider docs. The point is that the language surface is no longer frozen to whatever core ships.

`.tofu` and `.tofu.json` override files

This is a small feature with big ergonomic payoff. OpenTofu recognizes .tofu and .tofu.json files, and it prefers them over the equivalent .tf/.tf.json when both exist. That gives you a clean way to keep a shared codebase that runs under both tools: keep the common config in .tf, and drop OpenTofu-specific overrides in .tofu files that Terraform simply ignores.

main.tf          # shared, runs under both
backend.tofu     # OpenTofu-only overrides, invisible to terraform

If you're maintaining a module that needs to support both tools during a transition, this is the mechanism that keeps you sane.

The registry

OpenTofu runs its own provider and module registry rather than depending on HashiCorp's. In practice most of the popular providers are mirrored and resolve fine, but the source of truth is different, and provider/module availability is something to actually verify rather than assume. If you have a niche or internal provider, confirm it publishes where OpenTofu looks for it.

A note on version numbers

I'm deliberately not quoting exact version numbers or "OpenTofu is X% faster" benchmarks, because those age badly and many of the ones you'll see online come with no reproducible source. As of 2026 both projects are shipping regularly and the feature sets keep moving, so treat any specific version claim (including mine) as something to verify against the current docs before you build a decision on it.

So which do you pick, and should you switch?

Here's my honest take.

If you're starting greenfield today, I'd default to OpenTofu. You get a real open-source license and native state/plan encryption, and the divergent features are mostly additive quality-of-life wins. The compatibility story means you lose almost nothing by choosing it.

If you're an existing Terraform shop, the calculus is about your actual pain. Switch if the BSL license is a genuine legal or procurement problem, or if native state encryption solves a compliance requirement you're currently duct-taping. Don't switch just to be on the trendy side of a fork — a migration is still real work and real risk, even when it's low-risk.

If you depend on Terraform Cloud / HCP-specific features or a paid workflow tightly coupled to HashiCorp's platform, weigh that integration honestly. OpenTofu is the engine, not the whole platform, and you'll be assembling backend, state, and workflow pieces yourself or via third-party platforms.

Whatever you choose, the one thing I'd avoid is drifting into using divergent features by accident and then being surprised you can't go back. If you want a cross-referenced dive into the specific errors and edge cases I hit while running OpenTofu on real clusters, I keep my OpenTofu troubleshooting guides updated as I trip over new ones.

Takeaway

OpenTofu and Terraform are close cousins, not twins. The compatibility is real enough that adoption is cheap, but the divergences — state/plan encryption, early evaluation, provider-defined functions, .tofu overrides, and a separate registry — are real enough that you should choose deliberately and know which one-way doors you're walking through. Pick based on your license posture and your compliance needs, pin your versions, and read the current docs before betting on any specific feature.

The OpenTofu Errors You'll Actually Hit (and How to Fix Them Fast)

James Joyner — Fri, 17 Jul 2026 23:14:11 +0000

Every OpenTofu user hits the same wall of errors eventually, usually at the worst possible moment — mid-deploy, in CI, with a teammate waiting. After enough tofu apply runs on real infra I've learned that most of these have a fast, deterministic fix once you recognize the message. This is the field guide I wish I'd had: the errors you'll actually see and the shortest path out of each.

The CLI is tofu (OpenTofu is the Linux Foundation fork of Terraform), but almost all of these apply identically to terraform if you're still on it.

1. State lock: "Error acquiring the state lock"

You'll see a ConditionalCheckFailedException or a lock ID dump. It means a previous run died without releasing the lock, or someone is genuinely running apply right now.

First, make sure nobody is actually applying. Then force-unlock with the ID from the error:

tofu force-unlock 1a2b3c4d-5e6f-7890-abcd-ef1234567890

Do not reach for -force flags or delete the DynamoDB lock item by hand unless force-unlock refuses. Ninety percent of the time a stale lock from a crashed CI job is the cause.
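Copying the lock ID by hand out of a CI log is error-prone, so a small filter can pull it out of the failure text. This sketch assumes the error includes a "Lock Info:" section with an "ID:" line, which is what I've seen; the function name is hypothetical, and you should still confirm nobody is applying before you unlock.

```shell
# Read error output on stdin and print the first lock ID found.
# Assumes a line of the form "ID:        <lock-id>", optionally
# preceded by box-drawing characters in newer CLI output.
lock_id_from_error() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }'
}

# Usage (after confirming no apply is in progress):
#   id="$(tofu plan 2>&1 | lock_id_from_error)"
#   tofu force-unlock "$id"
```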

2. Provider checksum mismatch / dependency lock

Error: registered checksum for provider ... does not match

Your .terraform.lock.hcl was generated on one platform (say, macOS arm64) and CI runs on linux amd64, so the recorded hashes don't cover the platform being used. The fix is to record hashes for all the platforms your team and CI use:

tofu providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64 \
  -platform=linux_arm64

Commit the updated .terraform.lock.hcl. If you legitimately upgraded a provider and want to accept the new hash, tofu init -upgrade regenerates it. Never delete the lock file to "fix" this — that just moves the problem to the next person.

3. "Backend initialization required, please run tofu init"

The backend config changed, a new module was added, or you're in a fresh checkout. This is not a real error, just an uninitialized working directory:

tofu init

If you changed backends (e.g. local to S3), you'll need to migrate state:

tofu init -migrate-state

And if init complains the backend config differs from what's cached and you want to blow away the cached backend, tofu init -reconfigure. Use -migrate-state when you want to keep state, -reconfigure when you want to point at a fresh one.

4. "Invalid for_each argument" — unknown values

Error: Invalid for_each argument
The "for_each" value depends on resource attributes that cannot be
determined until apply.

This is the single most common structural error I see. for_each needs to know its keys at plan time, but you fed it a value that only exists after another resource is created — an ARN, a generated ID, a computed name.

The fix is to key the map on something static. Use the input you control, not the computed output:

# BAD: keys depend on a created resource's attribute
resource "aws_route53_record" "r" {
  for_each = { for s in aws_subnet.this : s.id => s }
}

# GOOD: key on the static input you already know
resource "aws_route53_record" "r" {
  for_each = var.subnet_defs   # a map you defined
}

If you truly can't avoid it, the escape hatch is a targeted apply (tofu apply -target=<address>) to create the upstream resource first, followed by a normal apply. Restructuring the keys is the real fix.

5. "Unsupported argument" / "Unsupported attribute"

Error: Unsupported argument
An argument named "foo" is not expected here.

Two usual causes. Either the provider version changed and renamed/removed the argument, or you're referencing an attribute that doesn't exist on that resource. Check the exact version you have and its docs:

tofu version
tofu providers

If a provider upgrade renamed things, pin the version you were on while you migrate:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}

For "Unsupported attribute," run tofu state show <address> on the resource to see exactly which attributes it actually exposes — I've wasted real time guessing at attribute names that the provider simply renamed between versions.

6. Dependency cycle

Error: Cycle: module.a.aws_x.foo, module.b.aws_y.bar

Two resources (or modules) reference each other, directly or through a chain, so OpenTofu can't order them. Visualize it instead of squinting at HCL:

tofu graph | dot -Tsvg > graph.svg

The fix is to break the loop. Usually one of the two references can be replaced with a static value, moved into a separate resource (like an aws_security_group_rule broken out of the group), or resolved by passing a value in as a variable rather than reading it back. Self-referential security groups are the classic offender — split the ingress rule into its own resource.

7. "Provider produced inconsistent final plan"

Error: Provider produced an inconsistent final plan
... produced an invalid new value for .some_attr ...

This is a provider bug, not your HCL — the value the provider promised at plan time didn't match apply time. It's rarely something you can fix in config directly. Fastest mitigations, in order:

First, upgrade the provider — these are frequently patched:

tofu init -upgrade

If that doesn't help, tell OpenTofu to stop tracking the flapping computed attribute by adding lifecycle { ignore_changes = [some_attr] } to the resource. As a last resort, tofu apply -replace=<address> forces a clean recreate so the provider computes the attribute fresh. If it persists, it's worth filing an upstream issue with the provider.

8. Registry service discovery failure

Error: Failed to query available provider packages
could not connect to registry.opentofu.org

Network, proxy, or registry outage. First confirm it's reachable:

curl -sSf https://registry.opentofu.org/.well-known/terraform.json

If you're behind a corporate proxy, set HTTPS_PROXY and NO_PROXY before init. If the public registry is flaky or you need reproducible CI, configure a provider mirror and point OpenTofu at local packages:

tofu providers mirror ./tofu-mirror

Then reference that directory with a provider_installation { filesystem_mirror { ... } } block in your CLI config. A mirror also inoculates you against the next registry hiccup, which is why every serious CI pipeline I run has one.
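For reference, here's a sketch of generating that CLI config from a script. The function name, the tofu.rc file name, and the include/exclude patterns are my own assumptions, and it relies on the CLI honoring the TF_CLI_CONFIG_FILE environment variable, so check the current CLI configuration docs before copying it.

```shell
# Write a CLI config that serves registry.opentofu.org providers from a
# local mirror directory and installs everything else directly.
write_mirror_config() {
  mirror_dir="$1"   # absolute path produced by `tofu providers mirror`
  config_file="$2"  # file you will point TF_CLI_CONFIG_FILE at
  cat > "$config_file" <<EOF
provider_installation {
  filesystem_mirror {
    path    = "$mirror_dir"
    include = ["registry.opentofu.org/*/*"]
  }
  direct {
    exclude = ["registry.opentofu.org/*/*"]
  }
}
EOF
}

# Usage (hypothetical paths):
#   write_mirror_config "$PWD/tofu-mirror" "$PWD/tofu.rc"
#   export TF_CLI_CONFIG_FILE="$PWD/tofu.rc"
```

The direct block with a matching exclude keeps providers from other registries installable while forcing the mirrored ones to come from disk.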

When the message isn't in this list

Some errors are stack- or provider-specific and don't have a one-line fix. When I hit one that isn't obvious, I check my OpenTofu error library for the exact message before I start guessing, because reading the actual failure text carefully almost always beats trial-and-error re-applies.

Takeaway

Most tofu errors fall into a handful of buckets: locking (force-unlock), lock-file/checksum drift (providers lock with all platforms), uninitialized dirs (init / -migrate-state), unknown-value for_each (key on static inputs), version-renamed arguments (pin and read state show), cycles (tofu graph, then break the loop), provider plan bugs (init -upgrade / -replace), and registry outages (mirror). Learn to recognize the message and the fix is usually one command away. Keep a mirror and a pinned lock file and you'll pre-empt half of these before they ever fire.

The 12 DevOps Errors That Page Teams Most (And the First Thing to Check)

James Joyner — Wed, 15 Jul 2026 14:58:06 +0000

Over the last while I've been cataloguing production DevOps errors — the exact strings that show up in logs at 2 a.m. — and writing a fix for each one. A pattern jumps out fast: a small number of errors account for a huge share of the pages. Here are the twelve that come up most, with the one-thing-to-check-first for each.

None of these are exotic. That's the point. The stuff that actually pages you is rarely exotic — it's the same dozen failure modes wearing different hats.

1. `CrashLoopBackOff`

The pod started, died, and Kubernetes is now backing off between restarts. CrashLoopBackOff is a symptom, never a cause. Go straight to kubectl logs <pod> --previous — the logs from the crashed container are where the real error lives. Nine times out of ten it's a bad config value, a missing env var, or a failed migration on startup.

2. `ImagePullBackOff` / `ErrImagePull`

Kubernetes can't pull the image. Don't guess — kubectl describe pod spells it out in Events. It's almost always a typo in the tag, a missing imagePullSecret, or a registry rate limit.

3. `OOMKilled` (exit code 137)

This is not "the node ran out of memory." It's "this container hit its own cgroup memory limit and the kernel killed it." Different problem, different fix. Compare the pod's resources.limits.memory against what it actually uses (kubectl top pod) before you touch anything at the node level.

4. `No space left on device`

The classic — and the trap is when df -h shows free space anyway. Then it's one of two things: you're out of inodes (df -i), or a process is holding a deleted-but-still-open file (lsof +L1). rm won't reclaim that space until you restart the process holding the file descriptor.

5. DNS timeouts inside pods

An external lookup that works from the node but intermittently times out inside a pod is almost always the ndots:5 search-domain cascade colliding with a conntrack UDP race — you get a flat 5-second stall that blows your client timeout. Overriding ndots on the pod spec and running NodeLocal DNSCache is the fix.

6. `FATAL: sorry, too many clients already` (Postgres)

Bumping max_connections is the trap, not the fix — each connection costs real memory. You need a pooler (PgBouncer), not 500 backend processes.

7. `Connection refused`

Something reached the host and nothing was listening on that port. It's rarely DNS or the network — it's the service being down, bound to 127.0.0.1 instead of 0.0.0.0, or a firewall. ss -tlnp on the target tells you in one line.

8. `TLS handshake timeout`

Usually not a cert problem at all — it's a network path problem (MTU, a proxy, or a firewall silently dropping the handshake) masquerading as TLS. Test raw connectivity first with openssl s_client -connect host:443.

9. `Read-only file system`

A filesystem that was mounted read-write and is suddenly read-only almost always means the kernel remounted it ro after detecting I/O errors. Check dmesg — you may be looking at a failing disk, not a permissions issue.

10. `Multi-Attach error for volume`

A ReadWriteOnce volume can attach to exactly one node at a time — not one pod, one node. If a node goes NotReady with the volume still attached, a pod rescheduled elsewhere gets this error. Kubernetes waits ~6 minutes before force-detaching on purpose — to protect your data from being written by two hosts at once.

11. `502 Bad Gateway`

502 means your proxy reached the upstream and the upstream said no (or died). It's rarely the proxy. connect() failed (111: Connection refused) in the NGINX error log → your app isn't listening where the proxy thinks it is.

12. `exec format error`

You built an image for one CPU architecture and ran it on another (hello, Apple Silicon → x86 clusters). Build multi-arch, or match your --platform.

The pattern

Every one of these has the same shape: the error message describes the symptom the system noticed, not the cause you need to fix. CrashLoopBackOff isn't why your pod is dying. OOMKilled isn't the node. The skill isn't memorizing fixes — it's knowing which single command turns the symptom back into a cause.

I keep a full, searchable library of these — every error above has a complete guide with the diagnostic workflow, an example root-cause analysis, and the prevention checklist. If you want the deeper version of any of them:

The Kubernetes ones live in the Kubernetes troubleshooting toolkit
Everything else is in the full error-guide library (Linux, Postgres, Docker, NGINX, and more)

What's the error that pages your team most? Curious whether it's on this list or something I should go write up next.

I Built Free Browser-Based Validators for YAML, Kubernetes and Terraform (No Upload, No Signup)

James Joyner — Tue, 14 Jul 2026 15:32:55 +0000

Every DevOps engineer has done this dance: you've got a chunk of YAML or a Terraform file that looks right, something's rejecting it, and you want a fast sanity check. So you paste it into some random online validator — and a small voice asks, wait, where did that config just go?

That config often has structure, comments, sometimes internal hostnames or resource names in it. Pasting infrastructure definitions into an unknown server is a habit worth breaking. So I built a set of validators that never send your config anywhere — they run entirely in your browser.

What they are

Free, browser-based validators for the formats DevOps folks paste-and-pray most:

YAML — catches the indentation and structure errors that make Kubernetes and CI configs fail with cryptic messages
Kubernetes manifests — schema-aware checks beyond "is it valid YAML," so you catch the wrong apiVersion or a misplaced field before kubectl apply does
Terraform / HCL — structural validation for the syntax slips that terraform validate flags only after you've context-switched away

The one design decision that matters

100% client-side. No upload, no signup, no server round-trip. Your config is parsed by JavaScript running in your own tab — it never leaves your machine. You can literally open dev-tools, watch the network panel, and see nothing go out. Turn off your wifi and they still work.

This isn't a privacy gimmick — it's the correct architecture for a tool that handles infrastructure definitions. A validator has no business seeing your config on a server it doesn't need to.

Why I bother

Two reasons, honestly.

One: I kept wanting this exact thing and kept not trusting the options. The nth time I hesitated before pasting a manifest into a stranger's website, I decided to just build the version I'd trust.

Two: fast feedback loops are the whole game in this job. The gap between "save the file" and "find out it's malformed" is pure friction — and the tighter that loop, the less of your working memory it burns. A validator that's one tab away and gives an answer in milliseconds is a small thing that compounds.

Try them

The validator workbench — YAML, Kubernetes, and Terraform, all client-side

If you're the kind of person who'd rather script it, a lot of the underlying tooling is open source — CLIs and a small read-only API for the prompt and error-guide data — over on the developer page and the GitHub org.

Client-side tools have real limits — they can't know your cluster's live state, and schema validation isn't the same as a policy check. But for the "did I just fat-finger the indentation" question, having the answer without a network request is exactly the trade I want.

What config format do you most wish had a trustworthy, offline, no-signup validator? That's genuinely how I decide what to build next.

Fix Docker Exit Code 137 (OOMKilled): Why It Happens and How to Stop It

James Joyner — Tue, 14 Jul 2026 03:21:09 +0000

Your container died and docker ps -a shows something like Exited (137) 4 minutes ago. Nine times out of ten that's the kernel's OOM killer, not your app crashing on its own. Here's what exit code 137 actually means and how I go about stopping it from happening again.

What exit code 137 actually means

Exit code 137 is 128 + 9. The 128 + part is the shell convention for "terminated by a signal," and 9 is SIGKILL. So 137 means your process was hard-killed — no chance to clean up, no graceful shutdown.
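The arithmetic is simple enough to script, which helps when you're triaging a pile of exited containers. A minimal sketch, with a function name of my own choosing:

```shell
# Print the signal number encoded in an exit status, or fail if the
# status is outside the "terminated by a signal" range (129-255).
exit_signal() {
  code="$1"
  if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
    echo $((code - 128))
  else
    return 1
  fi
}

# Usage:
#   exit_signal 137    # signal 9
#   exit_signal 143    # signal 15
```

In bash, kill -l with that number prints the signal's name, so kill -l "$(exit_signal 137)" gives KILL.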

Two things commonly send that SIGKILL:

The kernel OOM killer. Your container hit its memory cgroup limit, or the whole host ran out of RAM, and the kernel picked a process to kill to stay alive.
A docker stop that timed out. Docker sends SIGTERM, waits (10s by default), and if the process is still alive it escalates to SIGKILL. That also produces 137.

Both look identical in docker ps -a. The rest of this is about the first case — OOMKilled — because that's the one that quietly recurs.

Confirm it was OOM, not something else

Before changing anything, confirm the cause. Docker records whether the OOM killer was involved:

docker inspect my-service --format '{{.State.OOMKilled}} {{.State.ExitCode}}'

If that prints true 137, you're done guessing — it was OOM. If it prints false 137, the SIGKILL came from somewhere else (most often a docker stop timeout).
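If you check this across many containers, a tiny classifier over that two-field output saves some squinting. This is a sketch; the function name and the labels it prints are hypothetical.

```shell
# Classify the "<OOMKilled> <ExitCode>" pair printed by the inspect
# command above.
classify_exit() {
  oom="$1"
  code="$2"
  if [ "$oom" = "true" ]; then
    echo "oom-kill"
  elif [ "$code" = "137" ]; then
    echo "sigkill-not-oom"   # most often a docker stop that timed out
  elif [ "$code" = "0" ]; then
    echo "clean-exit"
  else
    echo "app-exit-$code"
  fi
}

# Usage:
#   classify_exit $(docker inspect my-service \
#     --format '{{.State.OOMKilled}} {{.State.ExitCode}}')
```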

For the fuller picture:

docker inspect my-service --format '{{json .State}}' | python3 -m json.tool

You can also confirm from the kernel side. The OOM killer logs every kill:

sudo dmesg -T | grep -i -E 'out of memory|oom-kill|killed process'
# or, on a systemd host:
journalctl -k | grep -i -E 'out of memory|oom-kill'

You're looking for a line like Out of memory: Killed process 12345 (java).

One distinction that changes your fix: did you hit the container's own --memory limit, or did the whole host run out of RAM? If OOMKilled is true but the host has plenty of free memory, the container hit its own cgroup limit. If the host itself was starved, the kernel may kill the biggest process regardless of which container it's in — sometimes an innocent bystander. dmesg shows the cgroup and total-vm in the kill line, which tells you which case you're in.
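One way to tell the two cases apart from the log itself: on the kernels I've checked, a kill caused by a cgroup limit is logged as "Memory cgroup out of memory: Killed process ...", while a host-wide kill starts with plain "Out of memory: Killed process ...". The sketch below relies on that wording, which is an assumption that can vary between kernel versions; classify_oom_line and the sample line are illustrative, not real output from my hosts.

```python
import re

# Matches both the cgroup-scoped and the host-wide form of the kill line.
KILL_LINE = re.compile(
    r"(?P<scope>Memory cgroup out of memory|Out of memory): "
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
)


def classify_oom_line(line: str):
    """Return (scope, pid, command) for an OOM kill line, or None if it is not one."""
    match = KILL_LINE.search(line)
    if match is None:
        return None
    if match["scope"].startswith("Memory cgroup"):
        scope = "container-limit"
    else:
        scope = "host-wide"
    return scope, int(match["pid"]), match["comm"]


# Hand-written example line for illustration only.
line = "Memory cgroup out of memory: Killed process 12345 (java) total-vm:2893412kB"
print(classify_oom_line(line))
```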

Why it happens

A few root causes cover most of what I see:

No --memory limit at all. The container can grow until the host is exhausted. This is the one that takes down neighbours.
A limit that's just too low for the real working set. The app was always going to need ~400 MB and you capped it at 256 MB.
A real leak. Memory climbs steadily under load and never comes back down. A limit only changes when it dies, not whether.
A runtime that ignores the cgroup limit. This is the classic. An old JVM sees the host's total RAM, sizes its heap for that, and blows past the container limit. Node has a similar story — its old-space heap defaults to roughly 1.5–2 GB regardless of the container limit unless you tell it otherwise with --max-old-space-size.
A batch job that spikes. Steady-state memory is fine, but one large request or a big file load briefly doubles it and trips the limit.

The runtime-unaware case is worth dwelling on because it surprises people: the container limit and the runtime's idea of "how much memory exists" are two different numbers, and if the runtime's number is bigger, it will happily allocate its way into an OOM kill.

How I'd diagnose it (in order)

Cheapest, least invasive first. I don't reach for a profiler until the simple checks rule things out.

1. Watch live usage against the limit. docker stats shows current memory and the limit side by side:

   docker stats --no-stream my-service

   If MEM USAGE / LIMIT reads 254MiB / 256MiB right before it dies, you're pegged at the limit, and that's your answer.

2. Reproduce and watch it climb. Leave docker stats streaming (drop --no-stream) while you drive load. A steady climb that never recedes points at a leak; a sharp spike on one operation points at a batch/request problem.
3. Check the configured limit. Confirm what the container was actually given:

   docker inspect my-service --format 'mem={{.HostConfig.Memory}} memswap={{.HostConfig.MemorySwap}}'

   0 means no limit. Otherwise it's bytes.

4. Check the app-level heap settings. Look at how the runtime was told to size itself: JVM flags, NODE_OPTIONS, whatever applies. A mismatch between this and the container limit is a common culprit.
5. Only now, a profiler. If usage is legitimately high and you need to know what's holding memory, attach the language's heap profiler. This is the expensive step, so I earn my way to it.
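For the first check, it can help to turn the MEM USAGE / LIMIT column into a single ratio you can log or alert on. This is a rough sketch: memory_pressure and to_bytes are my own helper names, and it assumes the column uses the binary units (B, KiB, MiB, GiB, TiB) that docker stats prints, for example via docker stats --no-stream --format '{{.MemUsage}}' my-service.

```python
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE = re.compile(r"^\s*([0-9.]+)\s*([KMGT]iB|B)\s*$")


def to_bytes(text: str) -> float:
    """Convert a size such as '254MiB' to bytes."""
    match = _SIZE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def memory_pressure(mem_column: str) -> float:
    """Return usage / limit for a 'USAGE / LIMIT' string such as '254MiB / 256MiB'."""
    usage_text, limit_text = mem_column.split("/")
    return to_bytes(usage_text) / to_bytes(limit_text)


print(f"{memory_pressure('254MiB / 256MiB'):.1%}")
```

A ratio that sits near 1.0 just before each death is the "pegged at the limit" signal; where exactly you draw the alert line is a judgment call.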

A worked example

Say I've got a JVM service running with a tight limit:

docker run -d --name my-service --memory=256m my-registry/my-service:1.4.2

It OOMs under load. docker inspect confirms it:

docker inspect my-service --format '{{.State.OOMKilled}} {{.State.ExitCode}}'
# true 137

docker stats shows it pinned at the limit before each death. The problem is twofold: the limit is a bit low for the real working set, and the JVM isn't sizing its heap to the container.

First, give it a limit that reflects reality. I measured the steady-state working set at around 350 MB, so I'll allow headroom for the JVM's non-heap overhead (metaspace, thread stacks, off-heap buffers) on top of the heap:

docker run -d --name my-service \
  --memory=512m \
  -e JAVA_OPTS="-XX:MaxRAMPercentage=70.0" \
  my-registry/my-service:1.4.2

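-XX:MaxRAMPercentage=70.0 tells the JVM to cap its heap at 70% of the container limit, leaving the other 30% for non-heap memory so the process total has room to stay under 512 MB. On JDK 10+ container support (-XX:+UseContainerSupport) is on by default, so the JVM reads the cgroup limit rather than the host's RAM. On older JVMs you'd set an explicit -Xmx instead, but percentage-based sizing is more robust across environments. One caveat: JAVA_OPTS is a convention, not something the JVM reads by itself, so it only takes effect if the image's entrypoint passes it to java. JAVA_TOOL_OPTIONS is the variable the JVM picks up on its own.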
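The split is simple enough to check by hand, but here it is as a sketch so you can plug in your own numbers. jvm_memory_budget is a name I made up; it only does the percentage arithmetic and does not model how the JVM rounds or aligns the heap internally.

```python
def jvm_memory_budget(container_limit_mib: float, max_ram_percentage: float):
    """Split a container limit into (max heap, remainder for non-heap), both in MiB."""
    if not 0 < max_ram_percentage <= 100:
        raise ValueError("max_ram_percentage must be in (0, 100]")
    heap = container_limit_mib * max_ram_percentage / 100
    return heap, container_limit_mib - heap


heap, non_heap = jvm_memory_budget(512, 70.0)
print(f"heap cap ~ {heap:.1f} MiB, left for non-heap ~ {non_heap:.1f} MiB")
```

For the 512 MB limit in this example that works out to a heap cap of about 358 MiB with roughly 154 MiB left for metaspace, thread stacks, and off-heap buffers.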

For a Node service the equivalent is bounding the old-space heap under the limit:

docker run -d --name my-worker \
  --memory=512m \
  -e NODE_OPTIONS="--max-old-space-size=384" \
  my-registry/my-worker:2.1.0

384 (MB) sits comfortably under the 512 MB container limit, leaving room for Node's other allocations.

The pattern in both cases: pick the container limit from measured usage plus headroom, then tell the runtime to keep its heap under that limit.

What to watch out for

Setting the limit too low just to "cap" it. If the app has a leak, a tight limit doesn't fix the leak — it converts a slow degradation into a fast crash loop. You've made it more visible, not healthier. Fine as a deliberate blast-radius guard; not fine as a substitute for fixing the leak.
No limit at all. One unbounded container can consume the host and get other containers OOM-killed. The victim in dmesg may be a service that did nothing wrong.
Swap accounting. --memory and --memory-swap are different knobs. If you set --memory=512m and leave --memory-swap unset, Docker lets the container use as much swap again as its memory limit (so up to 1 GB of memory plus swap in total) when the host has swap configured. That masks the real usage: the container limps along swapping instead of failing cleanly. Set --memory-swap equal to --memory to disable swap for that container when you want hard, predictable behaviour.
Measuring the wrong number. MEM USAGE in docker stats can still include some page cache (the CLI subtracts only inactive file cache), which can make usage look scarier than the anonymous (unreclaimable) memory that drives OOM decisions. Watch the trend and the kill line in dmesg rather than a single snapshot.
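The swap item above is easier to reason about as arithmetic. This sketch follows the rules as I understand them from Docker's documentation: --memory-swap is the total of memory plus swap, leaving it unset (or 0) allows swap equal to --memory when the host has swap configured, and -1 means unlimited. allowed_swap_bytes is an illustrative helper, not a Docker API.

```python
from typing import Optional


def allowed_swap_bytes(memory: int, memory_swap: Optional[int] = None) -> Optional[int]:
    """Swap a container may use under the --memory / --memory-swap rules.

    Returns None when swap is unlimited (bounded only by the host).
    """
    if memory <= 0:
        raise ValueError("this sketch only covers containers with --memory set")
    if memory_swap is None or memory_swap == 0:
        return memory  # unset, or 0 which is treated as unset: swap equal to --memory
    if memory_swap == -1:
        return None  # unlimited swap
    if memory_swap < memory:
        raise ValueError("--memory-swap must not be smaller than --memory")
    return memory_swap - memory  # --memory-swap is the memory + swap total


MIB = 1024**2
print(allowed_swap_bytes(512 * MIB) // MIB)             # --memory-swap unset
print(allowed_swap_bytes(512 * MIB, 512 * MIB) // MIB)  # swap disabled
```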

Making it repeatable

To stop this being a recurring surprise:

Right-size from data. Use docker stats or your metrics stack to find the real working set under load, then set --memory to that plus honest headroom.
Pin the limit and the runtime heap together. Set --memory and a matching runtime flag (-XX:MaxRAMPercentage, --max-old-space-size, etc.) so the two numbers can't drift apart.

  # docker-compose.yml
  services:
    my-service:
      image: my-registry/my-service:1.4.2
      mem_limit: 512m
      restart: on-failure
      environment:
        JAVA_OPTS: "-XX:MaxRAMPercentage=70.0"

Add a restart policy and an alert. restart: on-failure keeps you online through a transient spike, and an alert on OOMKilled events or restart count means you hear about it before your users do. Be honest about the restart policy though: it buys time, it does not fix a leak. A container that restarts every ten minutes is telling you something you shouldn't silence.

If you'd rather not reassemble all of this under pressure the next time a container flaps, I keep the reusable Docker patterns — limits, healthchecks, restart policies — as a reference set of Docker runbook patterns so it's a lookup, not an investigation.

Wrapping up

Exit code 137 is almost always a memory conversation, not a crash. The durable fix isn't a bigger number — it's making the container limit and the runtime's own idea of "available memory" agree, sized from what the app actually uses. Get those two to match and 137 stops being a mystery.

Hardening Docker Containers: The Security Habits That Actually Matter

James Joyner — Mon, 13 Jul 2026 14:38:47 +0000

Most of what people call "container security" is a short list of defaults you can set in an afternoon. The exotic tooling — admission controllers, runtime sensors, policy engines — matters at scale, but it's the polish, not the foundation. What actually moves the needle is a handful of flags and one or two Dockerfile lines, ranked here by leverage: highest-impact, lowest-effort first.

I'm going to skip the compliance framing. None of this is about passing an audit. It's about making the difference between "an attacker got code execution in the container" and "an attacker got code execution on the host" as wide as I can with defaults.

Don't run as root

This is the single highest-leverage change, so it goes first. By default a process in a container runs as UID 0. It's namespaced root, not host root — but it's still root inside, and the day someone chains a container escape (a kernel bug, a misconfigured mount, a runc CVE) with root-in-container, that's the bad day. Root inside plus an escape is root on the box. A non-root UID turns the same escape into a much smaller problem.

Set it in the Dockerfile. Create a real user rather than putting a bare USER 1000 on top of a root-owned filesystem:

FROM node:20-slim

# Create an unprivileged user and group
RUN groupadd --gid 10001 app \
 && useradd --uid 10001 --gid app --home-dir /app --no-create-home app

WORKDIR /app
COPY --chown=app:app . .
RUN npm ci --omit=dev

USER app
EXPOSE 8080
CMD ["node", "server.js"]

Two things people trip on. First, order matters: anything that needs to write to the image (installing packages, chowning files) has to happen before the USER line, because after it you're unprivileged. Second, listening on a port below 1024 as non-root fails unless you grant a capability — more on that in a second.

If you can't rebuild the image (third-party base you don't control), force it at runtime:

docker run --user 10001:10001 REDACTED-image:tag

That won't fix files inside the image owned by root that the app expects to write, which is exactly the kind of thing you find by testing. But a numeric UID with no matching entry in the container's /etc/passwd is fine for most apps and is a clean default.
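A quick way to catch a missing USER line before it ships is a tiny check over the Dockerfile text. This is a deliberately naive sketch: final_user and runs_as_root are names I invented, it only looks at the last build stage, and it ignores line continuations, ARG substitution, and any USER already set by the base image.

```python
def final_user(dockerfile_text: str):
    """Return the argument of the last USER instruction in the final stage, or None."""
    user = None
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split(None, 1)
        if not parts:
            continue
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = None  # a new stage starts from its base image's user
        elif instruction == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text: str) -> bool:
    """True when the Dockerfile sets no USER, or sets it to root / UID 0."""
    user = final_user(dockerfile_text)
    if user is None:
        return True
    return user.split(":", 1)[0] in ("root", "0")


example = "FROM node:20-slim\nWORKDIR /app\nUSER app\nCMD [\"node\", \"server.js\"]\n"
print(runs_as_root(example))
```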

Drop capabilities, add back only what you need

Even non-root containers get a default set of Linux capabilities — roughly fourteen of them, including things like CHOWN, SETUID, MKNOD, NET_RAW. Most applications use approximately none of these. The default set is far broader than a typical web service needs, and every capability you don't need is attack surface you're carrying for free.

Drop everything, then add back the specific ones the app actually uses:

docker run \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  REDACTED-image:tag

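NET_BIND_SERVICE is the common one: it is the capability that governs binding a port below 1024, so you can run as UID 10001 and still listen on 80 or 443. Whether you strictly need it depends on the engine, because recent Docker releases (20.10 and later, as far as I know) set net.ipv4.ip_unprivileged_port_start=0 inside containers, which already lets a non-root process bind low ports. Test on your own version rather than assuming either way. If your app does something lower-level (raw sockets, for instance) you'll discover the missing capability as a permission error and add it back deliberately.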

The honest tradeoff: --cap-drop ALL needs testing. Some images shell out to tools that quietly expect a capability, and you won't know until an unusual code path runs. Test the real workload, not just startup.

Read-only root filesystem

If an attacker gets code execution, a writable filesystem lets them drop tools, modify binaries, or persist. Make the root filesystem read-only and the whole class of "write a webshell to disk" goes away:

docker run \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  REDACTED-image:tag

Almost every app needs some writable path — usually /tmp, sometimes a cache or a PID directory. The --tmpfs mount gives you a small, in-memory writable spot without making the whole filesystem writable, and noexec means nothing dropped there can be run.

How do I find what actually needs to be writable? Run it read-only and watch it fail. The app will throw EROFS / "read-only file system" errors pointing at exact paths. Add a tmpfs or a named volume for each real one and re-test. It's iterative, and yes, some apps write all over the place and fight you — those you either fix or grant a narrow volume. But most well-behaved services need /tmp and nothing else.

no-new-privileges

Cheap, no downside I've ever hit, so it's always on:

docker run --security-opt no-new-privileges REDACTED-image:tag

This sets the kernel flag that stops a process from gaining privileges via setuid binaries. If there's a setuid-root binary lurking in the image, this prevents it from being used to escalate. Combined with running as non-root, it closes the "call sudo, become root" path. There's no reason not to set it.

Never `--privileged` (and mind the Docker socket)

--privileged is not "a bit more access." It disables most of the isolation that makes a container a container: it grants all capabilities, drops the seccomp and AppArmor profiles, and gives access to host devices. A privileged container is, for practical purposes, running on the host. If you've reached for it to make one thing work, you almost certainly wanted a single --cap-add or a specific --device instead.

The sharper edge of the same knife is mounting the Docker socket:

# Do not do this unless you fully mean it
-v /var/run/docker.sock:/var/run/docker.sock

Anything that can talk to docker.sock can start a new container mounting the host's root filesystem, and that is root on the host — full stop. CI runners and "container that manages containers" tooling do this constantly, and it's the footgun I see most often. If a workload needs it, treat that workload as if it already has host root, because effectively it does.

Secrets: not in ENV, not in layers

A sibling article covers image slimming in depth, so I'll keep this tight, but it belongs on the list. Two rules.

Don't put secrets in ENV — they're visible to anything that can run docker inspect and to every child process. And don't COPY a secret file into the image expecting a later RUN rm to erase it; the secret persists in the earlier layer forever, and anyone with the image can pull it back out.

For build-time secrets, use BuildKit's secret mount. The secret is available during that one RUN and never lands in a layer:

# syntax=docker/dockerfile:1
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci

docker build --secret id=npm_token,env=NPM_TOKEN .

For runtime secrets, inject them as environment variables or mounted files from your orchestrator or a secrets manager at run time, not baked into the image. The image should be safe to push to a registry with nothing sensitive inside it.

Pin base images by digest, and scan them

Here's the unglamorous truth: unpatched base images account for most of your real CVE exposure. Your application code is a small target; the OS packages and language runtime underneath it are where the published, exploitable CVEs actually accumulate.

Pin the base by digest so the build is reproducible and can't drift under you:

FROM node:20-slim@sha256:REDACTEDdigest0000000000000000000000000000000000000000000000000000000000

A tag like node:20-slim moves; a digest doesn't. Then scan on a schedule and treat a bumped digest as routine maintenance, not an event:

docker scout cves REDACTED-image:tag
# or
trivy image REDACTED-image:tag

Pinning without scanning just means you've frozen your vulnerabilities in place, so the two go together: pin for reproducibility, scan to know when to move the pin.
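To keep the pinning half honest, a small check can list every FROM line that still uses a floating tag. This is a sketch under simplifying assumptions: unpinned_bases is a hypothetical helper, it treats a reference as pinned only if it ends in @sha256: followed by 64 hex characters, and it skips scratch and references to earlier build stages.

```python
import re

_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_bases(dockerfile_text: str):
    """List FROM image references that are not pinned to a sha256 digest."""
    stage_names = set()
    unpinned = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=... that may precede the image.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_earlier_stage = image.lower() in stage_names
        if image != "scratch" and not is_earlier_stage and not _DIGEST.search(image):
            unpinned.append(image)
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())  # "FROM image AS name" declares a stage
    return unpinned


print(unpinned_bases("FROM node:20-slim AS build\nFROM build\n"))
```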

Limit the blast radius

Resource limits aren't confidentiality controls, but they cap what a compromised or buggy container can do to its neighbors — a fork bomb or memory balloon shouldn't take out the host:

docker run \
  --memory 512m \
  --pids-limit 256 \
  REDACTED-image:tag

--pids-limit in particular is a cheap defense against fork bombs, and --memory stops one container from starving everything else on the node. Noisy-neighbor insurance, basically.

Putting it together

Here's a single docker run with the high-leverage flags stacked:

docker run \
  --user 10001:10001 \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --security-opt no-new-privileges \
  --memory 512m \
  --pids-limit 256 \
  REDACTED-image:tag

And the same thing in Compose, which is where most of this ends up living anyway:

services:
  api:
    image: REDACTED-image:tag
    user: "10001:10001"
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    mem_limit: 512m
    pids_limit: 256

Start with these on your least-critical service, watch it run, and fix what breaks before you roll it out wider. Read-only and cap-drop are the two that need testing; the rest are close to free.
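If you launch containers from a script rather than Compose, it can be worth building that flag list in one place so nobody forgets half of it. A minimal sketch, assuming Python 3.8+ for shlex.join: hardened_run_args and the image name my-image:1.0 are placeholders of mine, while every flag is one already shown in this article.

```python
import shlex


def hardened_run_args(image, uid=10001, gid=10001, add_caps=("NET_BIND_SERVICE",),
                      tmpfs="/tmp:rw,noexec,nosuid,size=64m",
                      memory="512m", pids_limit=256):
    """Build the argv for a docker run that stacks the hardening flags above."""
    args = ["docker", "run", "--user", f"{uid}:{gid}", "--cap-drop", "ALL"]
    for cap in add_caps:
        args += ["--cap-add", cap]  # add back only what the app needs
    args += [
        "--read-only",
        "--tmpfs", tmpfs,
        "--security-opt", "no-new-privileges",
        "--memory", memory,
        "--pids-limit", str(pids_limit),
        image,
    ]
    return args


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(hardened_run_args("my-image:1.0")))
```

Returning a list rather than a string means it can be handed straight to subprocess.run without going through a shell.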

One habit worth building alongside this: before an image ships, run the Dockerfile through the free Dockerfile validator — it flags secrets in ENV, a missing USER, and unpinned base images, which are exactly the mistakes that survive code review because they look normal.

Wrapping up

The durable takeaway, in priority order:

1. Run as non-root: USER in the Dockerfile, or --user at runtime. Highest leverage, lowest cost.
2. --cap-drop ALL, add back only what the app needs (usually just NET_BIND_SERVICE).
3. --read-only root filesystem plus a small --tmpfs for the paths that genuinely need writing.
4. --security-opt no-new-privileges: always on, no downside.
5. Never --privileged, and never mount docker.sock unless that workload already effectively owns the host.
6. Keep secrets out of ENV and out of layers: BuildKit --mount=type=secret at build, injected at runtime.
7. Pin base images by digest and scan them: unpatched base images are most of your real CVE exposure.
8. Set --memory and --pids-limit to cap the blast radius.

None of this is exotic. It's a short list of defaults, and most of the security you'll ever get from a container comes from just turning them on.

Docker Networking, Demystified: Bridge, Host, and Container DNS

James Joyner — Sun, 12 Jul 2026 00:58:54 +0000

Most people meet Docker networking the same way I did: two containers that should be able to talk to each other, and one of them stubbornly refusing to connect. You did nothing wrong on the application side, the code works locally, and yet curl inside one container gives you connection refused reaching the other. The good news is that Docker networking is small once you understand three or four moving parts, and almost every real-world problem comes down to the same handful of causes.

I want to walk through how containers actually find each other and the outside world, with commands you can paste and run. No magic, just the model.

The pieces: bridges, DNS, and published ports

Every container attaches to one or more networks. On Linux, the default network is a bridge: a virtual switch (docker0) that hands each container a private IP and NATs its outbound traffic through the host. Containers on the same bridge can reach each other by IP; clients elsewhere (another machine, or localhost on the host) reach them only through a port you publish.

There are three network modes worth knowing well: the default bridge, a user-defined bridge, and host. There's also none for full isolation. The single most useful thing I can tell you is: don't use the default bridge for anything real. Create your own.

Default bridge vs. user-defined bridge

Here's the difference that trips everyone up. On the default bridge, containers can talk by IP but there is no automatic name resolution. If you want app to reach db by name, you'd have to use the legacy --link flag, which is deprecated and awkward.

On a user-defined bridge, Docker runs an embedded DNS server, and every container is resolvable by its name (and by --network-alias if you set one). That one feature — automatic DNS by container name — is why you almost always want a user-defined network. You also get better isolation: only containers you explicitly attach can see each other.

Create one and look at it:

docker network create appnet
docker network ls

NETWORK ID     NAME      DRIVER    SCOPE
0a1b2c3d4e5f   appnet    bridge    local
1122aabbccdd   bridge    bridge    local
99887766ffee   host      host      local
ab12cd34ef56   none      null      local

appnet is a bridge just like the default one, but with DNS turned on for its members.

A worked example: an app talking to Redis by name

Let me make this concrete with two containers on appnet — a Redis instance and a small client — resolving each other by name.

# Redis, named so DNS has something to resolve
docker run -d --name redis --network appnet redis:7.2-alpine

# An app container on the same network
docker run -d --name app --network appnet alpine:3.19 sleep infinity

Now check that app can resolve and reach redis purely by name:

docker exec app getent hosts redis

10.0.0.5         redis

That IP (10.0.0.5 here, yours will differ) came from Docker's embedded DNS, not from anything you configured. Test the actual TCP path:

docker exec app sh -c "apk add --no-cache redis >/dev/null && redis-cli -h redis ping"

PONG

No IPs hardcoded, no --link, no /etc/hosts editing. The name redis resolves because both containers share a user-defined network. Restart Redis and it may get a new IP, but the name keeps working — which is exactly why you reference services by name and never by IP.

For contrast, run the same thing on the default bridge and the name lookup fails:

docker run -d --name redis2 redis:7.2-alpine
docker run --rm alpine:3.19 getent hosts redis2   # no --network, so default bridge

(no output — lookup fails)

Same image, same host, no DNS. That's the whole argument for user-defined networks in one command.

Publishing ports: EXPOSE is not publish

This is the other big source of confusion. Two different concepts get used interchangeably and they are not the same thing.

EXPOSE 80 in a Dockerfile is documentation. It records that the app listens on port 80 inside the container. It does not open anything to the host. You can EXPOSE a port and still be unable to reach it from your laptop.

Publishing with -p is what actually maps a host port to a container port:

docker run -d --name web -p 8080:80 nginx:1.27-alpine

The syntax is -p HOST:CONTAINER. So 8080:80 means "traffic to the host's port 8080 goes to the container's port 80." Now you can hit it:

curl -s http://localhost:8080 | head -n 1

<!DOCTYPE html>

A subtlety worth knowing: by default -p 8080:80 binds to all host interfaces. If you only want it reachable from the host itself, bind to loopback explicitly:

docker run -d -p 127.0.0.1:8080:80 nginx:1.27-alpine

Two things that surprise people. First, containers on the same user-defined network reach each other on the container's internal port (80 in these examples) with no -p at all — publishing is only for host-to-container traffic. Second, "it works inside the container but I can't reach it from outside" almost always means you never published the port, or you published a different one than the app listens on.
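Because the HOST:CONTAINER order is the part people flip, here is a small sketch that takes a publish string apart and names each piece. parse_publish and the Publish tuple are my own illustrative names, and the parser deliberately covers only the two forms used in this section (single TCP ports, optional IPv4 bind address), not port ranges, IPv6, or /udp suffixes.

```python
from typing import NamedTuple, Optional


class Publish(NamedTuple):
    host_ip: Optional[str]  # None means "all host interfaces"
    host_port: int
    container_port: int


def parse_publish(spec: str) -> Publish:
    """Parse 'HOST:CONTAINER' or 'IP:HOST:CONTAINER' as passed to -p."""
    parts = spec.split(":")
    if len(parts) == 2:
        host_ip, host_port, container_port = None, parts[0], parts[1]
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return Publish(host_ip, int(host_port), int(container_port))


print(parse_publish("8080:80"))
print(parse_publish("127.0.0.1:8080:80"))
```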

host and none modes

--network host removes network isolation entirely: the container shares the host's network stack. No separate IP, no NAT, and -p is ignored because there's nothing to map — the container's port 80 is the host's port 80.

docker run -d --network host nginx:1.27-alpine
# nginx is now on the host's :80 directly, no -p needed

I reach for host mode in two situations: performance-sensitive workloads where NAT overhead matters, and tools that need to see the host's real interfaces (some monitoring agents, or anything doing raw/multicast networking). The tradeoffs are real, though: you lose port-mapping flexibility, you can collide with ports already in use on the host, and it only behaves the way people expect on Linux (on Docker Desktop the semantics differ). Use it deliberately, not as a shortcut around a publish problem.

--network none gives the container a loopback interface and nothing else — no external connectivity at all. It's for batch jobs that process local data and should never touch the network, or as a security boundary. Handy, rarely needed.

Troubleshooting: the four usual suspects

When a connection fails, I check these in order. Almost every case is one of them.

1. Wrong network. The two containers aren't actually on the same network, so DNS never resolves. Confirm who's attached:

docker network inspect appnet --format '{{range .Containers}}{{.Name}} {{end}}'

app redis

If a container you expected isn't listed, that's your bug. Attach a running one with docker network connect appnet <container>.

2. App bound to 127.0.0.1 inside the container. This is the sneakiest one. If your app listens on 127.0.0.1:5000 instead of 0.0.0.0:5000, it only accepts connections from inside its own container — not from other containers, not from a published port. You'll get connection refused even though the process is clearly running. Check what it's actually bound to:

docker exec app sh -c "netstat -tlnp 2>/dev/null || ss -tlnp"

State   Recv-Q  Send-Q  Local Address:Port
LISTEN  0       128     127.0.0.1:5000       <-- the problem

The fix is in the app config, not Docker: bind to 0.0.0.0. Flask's --host=0.0.0.0, a HOST=0.0.0.0 env var, server.address=0.0.0.0 — whatever your framework calls it.
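When I'm checking several containers, I'd rather have the loopback listeners picked out for me than eyeball the table. A rough sketch that scans netstat or ss style output: loopback_listeners is a hypothetical helper, and it assumes each listening socket is on one line containing the word LISTEN and an address:port field, as in the output above.

```python
def loopback_listeners(listing: str):
    """Return the ports that have a listener bound to a loopback address."""
    ports = set()
    for line in listing.splitlines():
        fields = line.split()
        if "LISTEN" not in fields:
            continue  # header lines and non-listening sockets
        for field in fields:
            address, sep, port = field.rpartition(":")
            if sep and port.isdigit() and address in ("127.0.0.1", "::1", "[::1]"):
                ports.add(int(port))
                break  # the local address comes before the peer column
    return sorted(ports)


sample = (
    "State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port\n"
    "LISTEN  0       128     127.0.0.1:5000      0.0.0.0:*\n"
    "LISTEN  0       511     0.0.0.0:80          0.0.0.0:*\n"
)
print(loopback_listeners(sample))
```

Any port it reports is one that other containers and published ports cannot reach until the app binds to 0.0.0.0.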

3. Port not published (or the wrong one). You're trying to reach the container from the host but never mapped a port, or mapped one the app doesn't listen on. Check the mapping:

docker port web

80/tcp -> 0.0.0.0:8080

No output means nothing is published. A mapping to a port your app isn't listening on means you'll connect to the host port and get refused at the container.

4. Right network, wrong port number. Container-to-container traffic uses the internal port. If Redis listens on 6379, you connect to redis:6379 regardless of any -p flag. Published ports are irrelevant between containers on the same network.

docker network inspect and docker exec ... getent hosts <name> resolve most of these in under a minute. If the name resolves and the port is right, the problem is inside the app.

Compose does the network for you

Everything above is what Docker Compose automates. When you docker compose up, Compose creates a user-defined bridge for the project and attaches every service to it, so services resolve each other by service name out of the box.

services:
  app:
    build: .
    ports:
      - "8080:5000"      # host:container, published to your machine
    environment:
      REDIS_URL: "redis://redis:6379"   # "redis" is the service name
  redis:
    image: redis:7.2-alpine

app reaches Redis at the hostname redis because Compose put both on the same network and named the DNS entry after the service. It's the exact same embedded-DNS mechanism from the manual example — you just don't type the docker network create. Note the ports entry is only needed for the service you want to reach from your host; app and redis talk internally with no published port.

If you want a set of worked reference setups to copy from — user-defined networks, published ports, and Compose files already wired up — I keep a collection of Docker stack patterns at devopsaitoolkit.com/stacks/docker that mirror the examples here.

Wrapping up

If you remember one thing: put your containers on a user-defined network and reference them by name, never by IP. That single habit gives you working DNS, clean isolation, and containers that keep talking to each other even as IPs churn underneath them — and it turns most "connection refused" mysteries into a quick check of network membership, bind address, and published port.

Docker Volumes vs Bind Mounts: Where Your Data Actually Lives

James Joyner — Sat, 11 Jul 2026 09:38:12 +0000

A container's writable layer feels like a filesystem, and that's exactly the trap. Write a database into it, remove the container, and the data is gone — no warning, no recovery. If you want anything to survive docker rm, it has to live outside the container, and Docker gives you three ways to do that: named volumes, bind mounts, and tmpfs. Knowing which one to reach for is most of the battle.

Why the writable layer betrays you

Every running container gets a thin read-write layer stacked on top of its image layers. It looks persistent because you can docker exec in and see your files. But that layer is bound to the container's lifecycle.

docker run --name scratch alpine sh -c 'echo hello > /data.txt; cat /data.txt'
# hello
docker rm scratch
# the layer — and /data.txt — no longer exists

There's no "oops." The writable layer is discarded with the container. Persistence is not a default you get; it's a decision you make. That decision is a volume, a bind mount, or tmpfs.

Named volumes: the default for state

A named volume is storage that Docker creates and manages for you. You give it a name, Docker keeps the actual bytes under its own directory, and you never have to care where that is.

docker volume create pgdata
docker run -d --name db \
  --mount type=volume,source=pgdata,target=/var/lib/postgresql/data \
  postgres:16

The container writes to /var/lib/postgresql/data, but those bytes land in a Docker-managed location on the host. Remove and recreate the container against the same volume and the data is still there.

docker rm -f db
docker run -d --name db \
  --mount type=volume,source=pgdata,target=/var/lib/postgresql/data \
  postgres:16
# same data, new container

Where do the bytes actually live? Under Docker's data root, typically /var/lib/docker/volumes/<name>/_data:

docker volume inspect pgdata --format '{{ .Mountpoint }}'
# /var/lib/docker/volumes/pgdata/_data

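The point is that you're not supposed to reach into that path directly, because Docker owns it. You address the data by volume name, not host path, so the same run command or Compose file works on any host without assuming a directory layout. (The data itself stays on the host where the volume was created.) That makes volumes the right default for databases and app state.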

Bind mounts: your host directory, mapped in

A bind mount points a container path straight at a directory you control on the host. No Docker management, no abstraction — it's your filesystem, exposed inside the container.

docker run -d --name web \
  --workdir /app \
  --mount type=bind,source=/srv/appdata,target=/app \
  node:20 npm run dev

This shines for local development. Mount your source tree in and edits on the host show up live inside the container, so a file-watcher reloads without a rebuild. It's also the natural way to feed in a config file.

The tradeoff is coupling. A bind mount hard-wires the container to your host's layout — /srv/appdata has to exist, with the right contents and permissions, on every machine that runs this. That portability cost is the whole reason volumes exist. My rule: bind mounts for dev convenience and config, named volumes for anything that's real state.

tmpfs: in-memory and gone on stop

Sometimes you want scratch space that never touches disk — a secret you don't want persisted, or a hot temp directory. A tmpfs mount lives in RAM and vanishes when the container stops.

docker run -d --name cache \
  --mount type=tmpfs,target=/tmp/scratch \
  alpine sleep 3600

Nothing is written to the host filesystem. Use it for sensitive or throwaway data, not for anything you expect to find later.

`-v` shorthand vs `--mount` long form

You'll see two syntaxes. The old -v shorthand packs everything into one colon-separated string:

# named volume
docker run -v pgdata:/var/lib/postgresql/data postgres:16
# bind mount
docker run -v /srv/appdata:/app node:20

The newer --mount form is explicit key=value:

docker run --mount type=volume,source=pgdata,target=/var/lib/postgresql/data postgres:16
docker run --mount type=bind,source=/srv/appdata,target=/app node:20

They do the same thing, but -v has a sharp edge: with a bind mount, if the host path doesn't exist, -v silently creates it as an empty directory owned by root. --mount errors out instead. I recommend --mount for anything non-trivial — the verbosity buys you clarity and a loud failure when you get a path wrong.
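If you're migrating old scripts from -v to --mount, the translation is mechanical enough to sketch. This is a simplified illustration for Linux-style paths: volume_flag_to_mount is my own name, it relies on the rule that a source starting with / is a bind mount and anything else is a volume name, and it only carries over the ro option (others such as z or Z are ignored).

```python
def volume_flag_to_mount(spec: str) -> str:
    """Translate a -v 'SOURCE:TARGET[:OPTIONS]' string into --mount syntax."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported -v spec: {spec!r}")
    source, target = parts[0], parts[1]
    options = parts[2].split(",") if len(parts) == 3 else []
    # An absolute source path means a bind mount; a bare name means a named volume.
    mount_type = "bind" if source.startswith("/") else "volume"
    fields = [f"type={mount_type}", f"source={source}", f"target={target}"]
    if "ro" in options:
        fields.append("readonly")
    return ",".join(fields)


print(volume_flag_to_mount("pgdata:/var/lib/postgresql/data"))
print(volume_flag_to_mount("/srv/appdata:/app:ro"))
```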

The permissions gotcha everyone hits

This is the one that eats an afternoon. The container process runs as some UID, and files on a bind mount are owned by whatever UID owns them on the host. Those two numbers don't have to agree, and when they don't, you get denied.

Say the container runs as UID 1000 but /srv/appdata is owned by root (UID 0):

docker run --rm \
  --user 1000:1000 \
  --mount type=bind,source=/srv/appdata,target=/app \
  alpine sh -c 'echo test > /app/out.txt'
# sh: can't create /app/out.txt: Permission denied

The container isn't confused. The kernel is enforcing ownership by number, and by default Docker doesn't translate UIDs across the boundary. The fix is to make the numbers line up. Either match the host directory's ownership to the container's UID:

sudo chown -R 1000:1000 /srv/appdata

Or run the container as the UID that already owns the files:

id -u        # say this prints 1000
docker run --rm --user "$(id -u):$(id -g)" \
  --mount type=bind,source=/srv/appdata,target=/app \
  alpine sh -c 'echo test > /app/out.txt && echo ok'
# ok

The mental model that saves you: by default, a UID inside the container is the same number as a UID on the host. There's no name mapping, only integers. (User-namespace remapping and rootless mode change the arithmetic, but not the principle.) Named volumes dodge much of this because, when an empty volume is first mounted, Docker copies in whatever the image has at that path, ownership included. That's another reason they're the calmer default for stateful services.
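A quick way to catch the mismatch before the container does is to compare the numbers yourself. This is a rough Python sketch, not something Docker provides: can_write_as is a hypothetical helper that looks only at the classic owner/group/other mode bits and ignores ACLs, SELinux labels, supplementary groups and user-namespace remapping.

```python
import os
import tempfile


def can_write_as(path: str, uid: int, gid: int) -> bool:
    """Would a process with this UID/GID be allowed to create files in the directory?"""
    if uid == 0:
        return True  # root bypasses the mode bits
    st = os.stat(path)
    if st.st_uid == uid:
        bits = (st.st_mode >> 6) & 0o7   # owner bits
    elif st.st_gid == gid:
        bits = (st.st_mode >> 3) & 0o7   # group bits
    else:
        bits = st.st_mode & 0o7          # "other" bits
    # creating a file needs both write (2) and search/execute (1) on the directory
    return (bits & 0o3) == 0o3


# Demo on a throwaway directory with mode 0755
with tempfile.TemporaryDirectory() as d:
    os.chmod(d, 0o755)
    st = os.stat(d)
    print(can_write_as(d, st.st_uid, st.st_gid))          # the owner: True
    print(can_write_as(d, st.st_uid + 1, st.st_gid + 1))  # someone else: False
```

Pointed at /srv/appdata with the UID and GID you pass to --user, it gives a rough yes or no before you start the container.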

Read-only mounts for config

If a mount only needs to be read (config files, certs, static assets), say so. Append :ro to a -v string, or add readonly to a --mount, and the container can't modify it, which closes off a whole class of accidents.

docker run -d --name web \
  --mount type=bind,source=/srv/appdata/config.yaml,target=/app/config.yaml,readonly \
  node:20

Read-only by default for config is a good habit. If the app tries to write where it shouldn't, you find out immediately instead of silently corrupting a shared file.

Inspecting and lifecycle

List and inspect volumes:

docker volume ls
docker volume inspect pgdata

To see what a specific container has mounted, docker inspect its Mounts:

docker inspect db --format '{{ json .Mounts }}'
# [{"Type":"volume","Name":"pgdata","Source":"/var/lib/docker/volumes/pgdata/_data","Destination":"/var/lib/postgresql/data",...}]

Now the danger. Every time you run a container that declares a volume without naming it, Docker creates an anonymous volume — a random-hash name you'll never recognize. These pile up quietly.

docker volume ls -qf dangling=true
# 8f3c...REDACTED
# a91d...REDACTED

docker volume prune cleans up unused volumes, and that's genuinely useful for the anonymous cruft. On Docker Engine 23.0 and later it only removes anonymous volumes by default; with --all, or on older engines, it does not discriminate between "junk I forgot about" and "the volume holding data I care about but isn't attached right now." A stopped-and-removed database whose named volume is momentarily unreferenced can be swept away. Read the prompt, and never wire prune into an unattended script.
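If you do want to script any cleanup, one cautious pattern is to separate the obviously anonymous names from everything else first. The sketch below is an assumption-laden illustration: it relies on anonymous volumes having Docker's usual 64-character hex names, split_volumes is a made-up helper, and the sample list stands in for real docker volume ls output.

```python
import re

# Assumption: anonymous volumes are named with 64 lowercase hex characters.
ANONYMOUS_NAME = re.compile(r"[0-9a-f]{64}")


def split_volumes(names):
    """Split volume names into (anonymous, named).

    A volume you deliberately gave a 64-hex-character name would be misfiled.
    """
    anonymous, named = [], []
    for name in names:
        (anonymous if ANONYMOUS_NAME.fullmatch(name) else named).append(name)
    return anonymous, named


# Stand-in for the output of `docker volume ls -qf dangling=true`
dangling = ["pgdata", "a" * 64, "uploads", "0123456789abcdef" * 4]
anonymous, named = split_volumes(dangling)
print("probably junk:", len(anonymous))
print("look before deleting:", named)
```

Anything in the second list deserves a human decision, not a prune.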

Backing up a named volume

Because a named volume is just a directory Docker manages, you back it up by mounting it into a throwaway container alongside a backup target and tarring it up. No special tooling.

# back up pgdata to ./backup/pgdata.tar.gz
mkdir -p backup   # --mount refuses to bind a host path that doesn't exist
docker run --rm \
  --mount type=volume,source=pgdata,target=/data,readonly \
  --mount type=bind,source="$(pwd)/backup",target=/backup \
  alpine tar czf /backup/pgdata.tar.gz -C /data .

Restore is the mirror image — mount the (empty) target volume and unpack into it:

docker run --rm \
  --mount type=volume,source=pgdata,target=/data \
  --mount type=bind,source="$(pwd)/backup",target=/backup \
  alpine sh -c 'cd /data && tar xzf /backup/pgdata.tar.gz'

For a live database, quiesce or dump it first rather than tarring hot files — but as a pattern for volume data at rest, this is honest and dependency-free.
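Since a backup you haven't read back is only a hope, here's a small Python sketch that reads every file in the archive end to end and reports what it found. archive_summary is a hypothetical helper, and the demo builds a tiny stand-in archive rather than touching a real volume.

```python
import io
import os
import tarfile
import tempfile


def archive_summary(path: str):
    """Read every regular file in a .tar.gz completely; return (file_count, total_bytes).

    Reading the data, not just the member list, is what surfaces a truncated
    or corrupt archive (tarfile raises an exception in that case).
    """
    files = 0
    total = 0
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            with tar.extractfile(member) as fh:
                while chunk := fh.read(1 << 20):
                    total += len(chunk)
            files += 1
    return files, total


# Demo: build a one-file archive in a temp directory, then read it back
with tempfile.TemporaryDirectory() as d:
    archive = os.path.join(d, "pgdata.tar.gz")
    payload = b"hello volume\n"
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo("./PG_VERSION")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    print(archive_summary(archive))  # (1, 13)
```

This only proves the archive is readable. Restoring into a scratch volume and starting the service against it is still the real test.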

The same rules in Compose

Nothing changes conceptually in Compose; the syntax just moves into YAML. A top-level volumes: key declares named volumes, and a service can also bind-mount a host path — both resolve exactly as they do on the CLI.

services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data          # named volume
      - /srv/appdata/config.yaml:/app/config.yaml:ro  # bind mount, read-only

volumes:
  pgdata:

If you want more end-to-end setups like this — services wired to the right kind of storage, with the Compose files laid out — there are worked Docker stack patterns to copy from.

Wrapping up

The one thing to carry with you: match the storage to the job. Named volumes for state you care about — databases, uploads, anything you'd cry over losing — because Docker manages them and they stay portable. Bind mounts for development and config, where being coupled to a host path is a feature, not a bug. tmpfs for secrets and scratch. And whatever holds your real data, prove you can restore it before you need to.

It Works on My Machine: A Docker War Story About exec format error

James Joyner — Fri, 10 Jul 2026 14:45:41 +0000

"It works on my machine" is the oldest joke in software, and containers were supposed to kill it. Same image everywhere, same behavior everywhere — that's the whole pitch. So there's a special kind of betrayal when a container that runs perfectly on your laptop lands in the cluster and dies instantly with four unhelpful words:

exec /app/server: exec format error

Here's the afternoon that error cost me, and the thing it turned out to be teaching.

The setup

Built the image locally on a shiny new laptop. Ran it locally — perfect. Pushed it, the deploy rolled out, and every pod went straight into CrashLoopBackOff. kubectl logs showed the line above and nothing else. No stack trace, no panic, no hint. The binary that ran fine thirty seconds ago on my machine refused to execute at all in prod.

The maddening part, same as it always is: the exact same image. That's the container promise. How can the same bytes run in one place and be unrunnable in another?

The tell I walked right past

exec format error is the kernel's way of saying "I tried to execute this file and I don't recognize the format." Not "permission denied," not "not found" — I literally cannot run this shape of binary.

And the shape of a binary that a kernel can or can't run is its CPU architecture. My shiny new laptop was Apple Silicon — arm64. The cluster nodes were amd64. I'd built an arm64 binary, wrapped it in an image, and shipped it to machines that speak a different instruction set. Locally it ran because I was running it on the architecture I built it for. The moment it hit an amd64 node, the kernel looked at my arm64 executable and said, correctly, "I don't know how to run this."

Nothing was broken. Docker did exactly what I asked — it built an image for the platform I was on and faithfully shipped it. I just never told it that "the platform I'm on" and "the platform this runs on" were different.

Confirming it

Two commands make it obvious:

# what architecture is this image built for?
docker image inspect myimage:tag --format '{{.Architecture}}'
# arm64   ← there's the problem

# what do the target nodes run?
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# amd64 amd64 amd64

arm64 image, amd64 nodes. Mystery over.
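For a binary sitting outside an image, the same answer is recorded in the file's ELF header. A minimal Python sketch, assuming a Linux ELF executable and covering only the two architectures in this story (elf_arch and fake_header are names invented here):

```python
import struct

# e_machine values from the ELF specification
MACHINES = {0x3E: "amd64", 0xB7: "arm64"}


def elf_arch(header: bytes) -> str:
    """Return a Docker-style architecture name from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF binary")
    byte_order = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINES.get(machine, f"unknown (e_machine={machine})")


def fake_header(machine: int) -> bytes:
    """Build a minimal little-endian 64-bit ELF header prefix for the demo."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)   # 16-byte e_ident
    return ident + struct.pack("<HH", 2, machine)          # e_type, e_machine


print(elf_arch(fake_header(0xB7)))  # arm64
print(elf_arch(fake_header(0x3E)))  # amd64
```

For a real file, open it in binary mode and pass the first 20 bytes, for example elf_arch(open(path, "rb").read(20)).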

The fix

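Stop building for "wherever I happen to be" and start building for where it runs. docker buildx builds multi-arch images from a single command, provided the active builder supports multi-platform output (docker buildx create --use sets one up if the default doesn't):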

docker buildx build --platform linux/amd64,linux/arm64 \
  -t registry/myimage:tag --push .

Now the registry holds both architectures under one tag, and every node pulls the variant it can actually run. If you only ever deploy to amd64, you can just pin that: --platform linux/amd64. Either way, the key is that the build platform is now a decision, not an accident of what laptop you bought.
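One way to keep that decision from drifting back into an accident is a small guard in the build script. This is a hypothetical sketch: the mapping covers only the common machine names, and missing_platforms is not a Docker feature, just an illustration of comparing the host against the targets.

```python
# Common values of `uname -m` / platform.machine(), mapped to Docker's names.
DOCKER_ARCH = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def native_platform(machine: str) -> str:
    """Platform string a plain `docker build` targets on a host with this machine type."""
    try:
        return "linux/" + DOCKER_ARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"unrecognised machine type: {machine!r}") from None


def missing_platforms(machine: str, targets: list[str]) -> list[str]:
    """Targets that a default build on this host would not cover."""
    native = native_platform(machine)
    return [t for t in targets if t != native]


# An Apple Silicon laptop building for an amd64 cluster
print(missing_platforms("arm64", ["linux/amd64"]))    # ['linux/amd64']
# An x86 CI runner building for the same cluster
print(missing_platforms("x86_64", ["linux/amd64"]))   # []
```

A non-empty result is the cue to pass --platform explicitly instead of trusting the default.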

What it was actually teaching

The container promise isn't "the same image runs everywhere." It's "the same image runs everywhere that shares the contract it was built against" — and CPU architecture is part of that contract, an invisible part that used to be uniform and quietly stopped being uniform the day ARM laptops got good.

That's the pattern behind almost every "works on my machine" that survives containerization: some assumption from your environment rode along inside the image without you noticing — an architecture, a mounted file that only exists locally, an env var your shell sets and prod doesn't. The container didn't lie. It faithfully packaged your assumptions and carried them somewhere the assumptions weren't true.

The fix is always the same discipline: make the invisible contract explicit. Build for the target, not the desk you're sitting at.

I keep the full library of Docker gotchas like this one — the diagnostic commands, the root cause, the prevention — for the next time one of them eats an afternoon:

The Docker troubleshooting toolkit, and the executable file not found in $PATH guide for its close cousin (the other "your binary won't run" error).

What's your favorite "same image, different result" story? The ARM-laptop-to-x86-cluster one has bitten a lot of people since about 2021 — I doubt I'm the last.

The 10 Docker Errors That Waste the Most Time (and the One-Line Fix)

James Joyner — Fri, 10 Jul 2026 01:54:11 +0000

Docker is fantastic right up until it throws one of its greasy, context-free error messages at you and you lose twenty minutes to a thing that has a one-line fix. I've been collecting these — the exact strings, and the first thing to check for each. Here are the ten that eat the most time.

1. `Cannot connect to the Docker daemon at unix:///var/run/docker.sock`

The engine isn't reachable. In order of likelihood: the daemon isn't running (systemctl status docker), you're not in the docker group (sudo usermod -aG docker $USER, then log out and back in), or you're pointing at the wrong DOCKER_HOST. It's almost never Docker being broken — it's Docker not being up or you not being allowed.

2. `no space left on device`

Docker hoards. Dangling images, stopped containers, unused volumes and build cache pile up on the Docker root disk. docker system df shows you where it went; docker system prune -a --volumes reclaims it (read what it'll delete first). If df -h says you have space but Docker disagrees, you may be out of inodes; df -i will tell you.
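The block-versus-inode distinction is easy to check from a script as well. A minimal sketch using Python's os.statvfs (Unix only); the function name is illustrative, and you'd point it at your Docker root directory rather than /:

```python
import os


def usage(path: str = "/") -> dict:
    """Percent of blocks and of inodes in use on the filesystem that holds path."""
    st = os.statvfs(path)

    def pct(total: int, free: int) -> float:
        # some filesystems report 0 total inodes; treat that as "not applicable"
        return 0.0 if total == 0 else round(100 * (total - free) / total, 1)

    return {
        "blocks": pct(st.f_blocks, st.f_bfree),
        "inodes": pct(st.f_files, st.f_ffree),
    }


print(usage("/"))
```

If inodes is near 100 while blocks is not, freeing large files won't help; you need to delete many small ones.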

3. `Bind for 0.0.0.0:8080 failed: port is already allocated`

Something already owns that port — often a container you forgot was running. docker ps to find it, or ss -tlnp | grep 8080 for a non-Docker process. Stop the holder or map to a different host port.
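If you'd rather test the port than hunt for its owner, a rough Python check is to try binding it yourself. port_is_free is a hypothetical helper; it can report false alarms for ports under 1024 when run without privileges, since that bind fails for a different reason.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Try to bind the port ourselves; failure means something else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


# Demo: occupy a port the OS picks for us, then check it before and after release
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen()
taken = holder.getsockname()[1]
print(port_is_free(taken, "127.0.0.1"))   # False while holder is open
holder.close()
print(port_is_free(taken, "127.0.0.1"))   # True once it is released
```

It tells you whether the port is taken, not by whom; docker ps and ss are still how you find the holder.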

4. `pull access denied ... repository does not exist or may require 'docker login'`

Three flavors: the image name/tag is wrong, it's a private registry and you're not authenticated (docker login), or you've hit Docker Hub's anonymous pull rate limit. The error says "does not exist OR requires login" for a reason — check both.

5. `exec format error`

You built the image for one CPU architecture and ran it on another — the classic Apple Silicon (arm64) build landing on an amd64 server. Build multi-arch with docker buildx, or pin --platform to match your target.

6. `OCI runtime create failed`

A low-level container-start failure. The useful part is always after the colon — a missing binary, a bad mount, a permissions problem. Read the full message; OCI runtime create failed itself tells you nothing.

7. `executable file not found in $PATH`

Your CMD or ENTRYPOINT points at a binary the image doesn't have — often because a slim/distroless base doesn't ship a shell, or you assumed a tool was installed. Check exec-form vs shell-form and confirm the binary actually exists in the final layer.

8. `TLS handshake timeout`

Usually not a cert problem — it's a network path issue (a proxy, MTU, or firewall) between you and the registry, masquerading as TLS. Test raw connectivity before you touch certificates.

9. `failed to compute cache key: ... not found` (COPY/ADD)

Your Dockerfile is trying to COPY a file that isn't in the build context — either the path is wrong, or .dockerignore is excluding it. Remember paths are relative to the context root, not the Dockerfile.

10. `Conflict. The container name "/x" is already in use`

A container with that name already exists (running or stopped). docker rm x removes a stopped one (docker rm -f x if it's still running), or use --rm / a fresh name. Common in CI where a previous run didn't clean up.

The pattern

Nearly every Docker error puts the useful information after the colon and a generic category before it. OCI runtime create failed is the category; the cause is the clause you skimmed past. Train yourself to read to the end of the line before you start googling.
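As a toy illustration of the habit, this Python sketch splits a message on its colon-space boundaries so the category and the cause land in separate fields. The helper name is made up, and the sample strings are ones quoted in this post; real messages can nest several clauses, which is why the details come back as a list.

```python
def split_error(message: str):
    """Split an error line into (category, details) on ': ' boundaries."""
    parts = [p.strip() for p in message.split(": ") if p.strip()]
    if not parts:
        return "", []
    return parts[0], parts[1:]


category, details = split_error("Bind for 0.0.0.0:8080 failed: port is already allocated")
print(category)   # Bind for 0.0.0.0:8080 failed
print(details)    # ['port is already allocated']

# A message with no colon-space has no detail clause to read
print(split_error("exec format error"))   # ('exec format error', [])
```

The last element of details is usually the one worth searching for.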

I keep complete guides for all of these — and about eighty more Docker errors — each with the diagnostic workflow, a worked root-cause example, and the prevention checklist:

The Docker troubleshooting toolkit — the top errors, launcher, and runbooks in one place

Which Docker error has personally cost you the most hours? Genuinely curious which of these tops the list for other people.

How I Cut a Docker Image From 1.2GB to 180MB

James Joyner — Thu, 09 Jul 2026 02:32:56 +0000

A while back I inherited a service whose Docker image was 1.2GB. Pulls were slow, the CI cache was useless, and the deploy step took long enough that people context-switched away and forgot about it. I got it down to about 180MB without changing a line of application code. Here's exactly what moved the needle, roughly in order of impact.

1. Multi-stage builds (the big one)

The single biggest win. The original Dockerfile built the app and shipped the entire build toolchain along with it — compilers, dev headers, the full package cache. None of that is needed at runtime.

# build stage — has all the heavy tooling
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
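RUN npm run build
# drop devDependencies so only runtime packages get copied into the final stage
RUN npm prune --omit=dev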

# runtime stage — starts clean, copies only the artifact
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]

The runtime image never contains the build tools. That alone took roughly 500MB off.

2. Pick a smaller base

node:20 is Debian with everything. node:20-slim drops a couple hundred MB. If your app is a static binary (Go, Rust) you can go all the way to distroless or scratch and ship just the binary — no shell, no package manager, no OS to speak of. Smaller base = smaller image and a smaller attack surface, which your security team will also thank you for.

The trade-off: distroless has no shell, so docker exec ... sh won't work for debugging. Know that going in.

3. Order layers by how often they change

Docker caches layers top-down and invalidates everything after the first change. If you COPY . . before installing dependencies, every code change busts your dependency cache and reinstalls everything.

Copy your lockfile and install deps first, then copy the rest of the source:

COPY package*.json ./
RUN npm ci          # cached until dependencies actually change
COPY . .            # changes every commit, but deps stay cached

This didn't shrink the final image much, but it turned a 4-minute rebuild into a 20-second one.
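The invalidation rule is simple enough to model in a few lines. This Python sketch is a toy, not how BuildKit actually computes cache keys: it just returns the first changed step and everything after it.

```python
def rebuilt_steps(steps, changed):
    """Steps that must re-run: the first one whose inputs changed, plus all that follow."""
    for i, step in enumerate(steps):
        if step in changed:
            return steps[i:]
    return []


slow = ["COPY . .", "RUN npm ci"]
fast = ["COPY package*.json ./", "RUN npm ci", "COPY . ."]

# A code-only commit changes what `COPY . .` sees, but not the package files
print(rebuilt_steps(slow, {"COPY . ."}))   # ['COPY . .', 'RUN npm ci']
print(rebuilt_steps(fast, {"COPY . ."}))   # ['COPY . .']
```

Same commit, same change; the only difference is whether the expensive step sits above or below it.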

4. Add a real `.dockerignore`

Without it, COPY . . drags your entire .git history, node_modules, local .env files, test fixtures, and CI logs into the build context — bloating the image and leaking things you don't want baked into a layer.

.git
node_modules
*.log
.env*
dist
coverage
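Before writing the ignore file, it helps to know what's actually heavy. A small Python sketch that totals bytes under each top-level entry of a directory; context_size_by_entry is a hypothetical helper and it does not evaluate .dockerignore patterns, it only measures what's on disk.

```python
import os
import tempfile


def context_size_by_entry(root: str):
    """Bytes under each top-level entry of root, largest first (symlinks not followed)."""
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = 0
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    full = os.path.join(dirpath, name)
                    if not os.path.islink(full):
                        size += os.path.getsize(full)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        totals[entry.name] = size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Demo on a throwaway directory standing in for a project root
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "node_modules"))
    with open(os.path.join(root, "node_modules", "big.bin"), "wb") as f:
        f.write(b"\0" * 5000)
    with open(os.path.join(root, "app.js"), "wb") as f:
        f.write(b"x" * 10)
    for name, size in context_size_by_entry(root):
        print(f"{size:>8}  {name}")
```

Run against a real project root, the entries at the top of the list are the first candidates for .dockerignore.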

5. Collapse and clean up `RUN` layers

Every RUN is a layer, and deleting files in a later layer doesn't shrink the earlier one. Install, use, and clean up in a single RUN:

RUN apt-get update \
 && apt-get install -y --no-install-recommends some-tool \
 && rm -rf /var/lib/apt/lists/*

The rm has to be in the same RUN as the apt-get, or the cache still ships in the layer beneath it.
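To see why the order matters, here's a toy Python model of layer accounting. The sizes are invented for illustration; the only point is that what ships is the sum of what each layer added, while deletions only affect what's visible in the final filesystem.

```python
def sizes(layers):
    """layers is a list of (added_mb, removed_mb) pairs, one per RUN.

    Returns (shipped_mb, visible_mb): bytes stored in the image versus
    bytes you would see inside a running container.
    """
    shipped = sum(added for added, _removed in layers)
    visible = sum(added - removed for added, removed in layers)
    return shipped, visible


# Hypothetical numbers: install adds 120 MB, of which 40 MB is apt's list cache
separate = [(120, 0), (0, 40)]   # RUN install ... then a second RUN rm -rf ...
combined = [(80, 0)]             # install and clean up in a single RUN

print(sizes(separate))   # (120, 80)
print(sizes(combined))   # (80, 80)
```

Both variants look identical from inside the container; only the combined one is actually smaller to pull.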

The results

Metric	Before	After
Image size	1.2 GB	~180 MB
Cold pull	~90s	~12s
Cached rebuild	~4 min	~20s

None of this is exotic — it's multi-stage, a slimmer base, layer order, .dockerignore, and cleaning up in place. But together they turn a deploy you dread into one you don't think about.

If you want the deeper reference — including the Docker errors these optimizations sometimes surface (no space left on device, cache-key failures, and friends) — I keep a full set:

The Docker troubleshooting toolkit and the no space left on device guide for when the build disk fills up mid-optimization

What's the smallest you've gotten a real production image (not a hello-world)? Always looking for tricks I haven't tried.
