Janne Sinivirta

Posted on Jan 3

Stop Writing Shell Scripts for Container Health Checks

#docker #containers #devops #cli

It started as one of those “this will take 30 seconds” moments.

We ship a container that includes a tiny helper binary — something we compile in a builder stage and COPY into the runtime image. Think: config-render, migrate-db, probe, whatever your service depends on at startup.

I just wanted to make sure the image really contained the helper I thought it contained, so I added:

RUN config-render --version

Green build. Ship it. Done.

...until a pipeline failed later with an error that told me basically nothing useful.

Was it missing from PATH?
Did we copy it into the wrong directory?
Did it lose the executable bit?
Wrong architecture (hello exec format error)?
Or did it run but print something unexpected?

So I did what we all do: I started “hardening” the check.

First: “Is it on PATH?”
Then: “Can it execute?”
Then: “What version is it?”
Then: “Is that version acceptable?”

And suddenly my “simple validation” became a little bash pipeline that tried to be clever:

command -v config-render >/dev/null 2>&1 && \
  config-render --version 2>/dev/null | \
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' && \
  # ...some version comparison logic... \
  true || \
  (echo "config-render missing or wrong version"; exit 1)

It worked... for now.

But this is where validation quietly becomes a maintenance trap:

Output parsing breaks when formatting changes (“v1.2.3”, “1.2.3 (build abc)”, extra lines, etc.)
Redirect chains swallow the exact error you needed (permission denied vs exec format error vs missing)
Version comparisons drift into “good enough” (a plain string comparison, for instance, ranks 1.10.0 below 1.9.0)
Every repo grows its own flavor of the same fragile scripts
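The swallowed-error trap is easy to reproduce. The sketch below is illustrative only: `fragile_version` and `no-such-helper-xyz` are made-up names, and the function simply mirrors the shape of the pipeline above. A missing command and a file that cannot be executed look identical to the caller, because stderr went to /dev/null and the pipeline reports grep's exit status rather than the command's.

```shell
#!/bin/sh
# Hypothetical helper mirroring the fragile pipeline: run "<cmd> --version",
# discard stderr, and extract anything that looks like x.y.z.
fragile_version() {
    "$1" --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+'
}

# A command that does not exist. Run on its own, the shell would report
# status 127, but the pipeline returns grep's status (no match) instead.
status=0
fragile_version no-such-helper-xyz || status=$?
echo "missing command     -> status $status"

# A file without the executable bit. Run on its own, the shell would report
# status 126, yet the caller sees exactly the same result as above.
not_executable=$(mktemp)
status=0
fragile_version "$not_executable" || status=$?
echo "not executable file -> status $status"
rm -f "$not_executable"
```

Under a POSIX sh without `pipefail`, both lines report status 1 with no output, so the check cannot say which of the two problems it hit.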

At some point I realized: I don’t want “a shell script that hopefully detects the problem.”
I want a single check that tells me exactly why it failed.

One command, clear failure reasons

That’s what Preflight is for.

Instead of assembling checks out of grep + awk, you run:

preflight cmd config-render --min 1.4.0

When it passes:

[OK] cmd: config-render
     path: /usr/local/bin/config-render
     version: 1.6.2

When it fails, you get the reason:

not found in PATH
failed to execute (permission denied / exec format error / etc.)
version too old (with an explicit comparison)

Example:

[FAIL] cmd: config-render
       failed to execute: exec format error

or:

[FAIL] cmd: config-render
       version 1.2.0 < minimum 1.4.0

No guessing. No “why did this randomly break today”.
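For a sense of what those three failure reasons involve, here is a rough hand-rolled equivalent in POSIX sh. It is a sketch under stated assumptions, not Preflight's implementation: `check_cmd` and `version_ge` are hypothetical names, the tool is assumed to answer `--version` with a plain x.y.z string somewhere in its output, and the 126/127 exit statuses are the shell's conventions for "found but not executable" and "not found" (a tool that itself exits with one of those codes would be misclassified).

```shell
#!/bin/sh
# Hypothetical sketch, not Preflight's code.

# Succeeds when version $1 >= version $2. Both must be plain x.y.z numbers.
version_ge() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2    # split on dots: major1 minor1 patch1 major2 minor2 patch2
    IFS=$old_ifs
    if [ "$1" -gt "$4" ]; then return 0; fi
    if [ "$1" -lt "$4" ]; then return 1; fi
    if [ "$2" -gt "$5" ]; then return 0; fi
    if [ "$2" -lt "$5" ]; then return 1; fi
    [ "$3" -ge "$6" ]
}

# Runs "<cmd> --version" and reports why it failed, if it did.
check_cmd() {
    cmd=$1
    min=$2
    status=0
    # Keep stderr: it carries the reason (permission denied, exec format error).
    output=$("$cmd" --version 2>&1) || status=$?
    case $status in
        0)   ;;
        127) echo "[FAIL] $cmd: not found in PATH"; return 1 ;;
        126) echo "[FAIL] $cmd: failed to execute: $output"; return 1 ;;
        *)   echo "[FAIL] $cmd: exited with status $status"; return 1 ;;
    esac
    version=$(printf '%s\n' "$output" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if [ -z "$version" ]; then
        echo "[FAIL] $cmd: no x.y.z version in output"
        return 1
    fi
    if ! version_ge "$version" "$min"; then
        echo "[FAIL] $cmd: version $version < minimum $min"
        return 1
    fi
    echo "[OK] $cmd: version $version"
}
```

Calling `check_cmd config-render 1.4.0` covers the same ground as the one-liner, but it is roughly forty lines that every repo would have to carry and keep correct, and it still ignores pre-release tags and two-part versions.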

Why predictable checks matter (containers, CI, everywhere)

This isn’t just a “containers are minimal” problem. It’s a reliability problem.

Shell-based checks are notoriously sensitive to their environment:

bash vs sh differences (and whatever /bin/sh happens to be today)
GNU vs BSD tool differences (grep, sed, awk behave just differently enough)
Linux vs macOS quirks in CI runners
inconsistent error messaging when commands fail inside pipelines

A small binary with a narrow job tends to behave the same way everywhere.

That consistency is what you actually want from validation: predictable pass/fail and predictable output.

Multi-stage builds bonus: executable documentation

There’s another benefit I didn’t appreciate until later: in a multi-stage Dockerfile, these checks become documentation that runs.

When you see:

RUN preflight cmd config-render --min 1.4.0
RUN preflight env DATABASE_URL
RUN preflight file /app/config.yaml --not-empty

you’re not just “testing stuff”. You’re encoding expectations:

“This image must contain this helper”
“This version is the minimum supported”
“This env var must be present”
“This config file must exist and be non-empty”

It reads like a contract, and it fails like one too.

What Preflight checks

The above command check was just the beginning. Preflight now supports:

cmd — exists on PATH, executes, extracts version, compares semver
env — required env vars, allowed values / patterns
file — existence, permissions, “not empty”, basic content checks
http / tcp — connectivity with retry + timeout
hash — checksum verification

Three places it immediately helps

1) Build-time validation (fail fast)

COPY --from=ghcr.io/vertti/preflight:latest /preflight /usr/local/bin/preflight

RUN preflight cmd config-render --min 1.4.0
RUN preflight env DATABASE_URL
RUN preflight file /app/config.yaml --not-empty

2) CI validation (same checks outside Docker)

preflight cmd config-render --min 1.4.0
preflight env DATABASE_URL
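In a CI step it can be handy to run every check and report all failures at once, rather than stopping at the first one. A minimal sketch: `run_checks` is a hypothetical wrapper name, and the only assumption about `preflight` is that it exits non-zero when a check fails, which is what makes a Dockerfile RUN step fail.

```shell
#!/bin/sh
# Hypothetical CI wrapper around the two checks shown above.
# Assumes `preflight` is on PATH and exits non-zero on a failed check.
run_checks() {
    failures=0
    preflight cmd config-render --min 1.4.0 || failures=$((failures + 1))
    preflight env DATABASE_URL              || failures=$((failures + 1))
    if [ "$failures" -gt 0 ]; then
        echo "$failures preflight check(s) failed" >&2
        return 1
    fi
}

# In the CI job itself:
#   run_checks || exit 1
```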

3) Health checks without `curl` (distroless-friendly)

HEALTHCHECK CMD ["/preflight", "http", "http://localhost:8080/health"]

When shell is fine — and when it bites you

Shell is totally fine when:

the check is genuinely trivial
you don’t care about consistent error reporting
you’re happy maintaining the script

Preflight pays off when:

you’re copying/downloading helper binaries into images
you want the same behavior in CI and in containers
you need real version constraints
you want consistent output across checks
you want checks that double as executable documentation

Repo: https://github.com/vertti/preflight
