DEV Community

Cover image for Stop Writing Shell Scripts for Container Health Checks
Janne Sinivirta
Janne Sinivirta

Posted on

Stop Writing Shell Scripts for Container Health Checks

It started as one of those “this will take 30 seconds” moments.

We ship a container that includes a tiny helper binary — something we compile in a builder stage and COPY into the runtime image. Think: config-render, migrate-db, probe, whatever your service depends on at startup.

I just wanted to make sure the image really contained the helper I thought it contained, so I added:

RUN config-render --version
Enter fullscreen mode Exit fullscreen mode

Green build. Ship it. Done.

..until a pipeline failed later with an error that told me basically nothing useful.

  • Was it missing from PATH?
  • Did we copy it into the wrong directory?
  • Did it lose the executable bit?
  • Wrong architecture (hello exec format error)?
  • Or did it run but print something unexpected?

So I did what we all do: I started “hardening” the check.

  • First: “Is it on PATH?”
  • Then: “Can it execute?”
  • Then: “What version is it?”
  • Then: “Is that version acceptable?”

And suddenly my “simple validation” became a little bash pipeline that tried to be clever:

command -v config-render >/dev/null 2>&1 && \
  config-render --version 2>/dev/null | \
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' && \
  # ...some version comparison logic... \
  true || \
  (echo "config-render missing or wrong version"; exit 1)
Enter fullscreen mode Exit fullscreen mode

It worked... for now.

But this is where validation quietly becomes a maintenance trap:

  • Output parsing breaks when formatting changes (“v1.2.3”, “1.2.3 (build abc)”, extra lines, etc.)
  • Redirect chains swallow the exact error you needed (permission denied vs exec format error vs missing)
  • Version comparisons drift into “good enough”
  • Every repo grows its own flavor of the same fragile scripts

At some point I realized: I don’t want “a shell script that hopefully detects the problem.”
I want a single check that tells me exactly why it failed.

One command, clear failure reasons

That’s what Preflight is for.

Instead of assembling checks out of grep + awk, you run:

preflight cmd config-render --min 1.4.0
Enter fullscreen mode Exit fullscreen mode

When it passes:

[OK] cmd: config-render
     path: /usr/local/bin/config-render
     version: 1.6.2
Enter fullscreen mode Exit fullscreen mode

When it fails, you get the reason:

  • not found in PATH
  • failed to execute (permission denied / exec format error / etc.)
  • version too old (with an explicit comparison)

Example:

[FAIL] cmd: config-render
       failed to execute: exec format error
Enter fullscreen mode Exit fullscreen mode

or:

[FAIL] cmd: config-render
       version 1.2.0 < minimum 1.4.0
Enter fullscreen mode Exit fullscreen mode

No guessing. No “why did this randomly break today”.

Why predictable checks matter (containers, CI, everywhere)

This isn’t just a “containers are minimal” problem. It’s a reliability problem.

Shell-based checks are notoriously sensitive to environment:

  • bash vs sh differences (and whatever /bin/sh happens to be today)
  • GNU vs BSD tool differences (grep, sed, awk behave just differently enough)
  • Linux vs macOS quirks in CI runners
  • inconsistent error messaging when commands fail inside pipelines

A small binary with a narrow job tends to behave the same way everywhere.

That consistency is what you actually want from validation: predictable pass/fail and predictable output.

Multi-stage builds bonus: executable documentation

There’s another benefit I didn’t appreciate until later: in a multi-stage Dockerfile, these checks become documentation that runs.

When you see:

RUN preflight cmd config-render --min 1.4.0
RUN preflight env DATABASE_URL
RUN preflight file /app/config.yaml --not-empty
Enter fullscreen mode Exit fullscreen mode

you’re not just “testing stuff”. You’re encoding expectations:

  • “This image must contain this helper”
  • “This version is the minimum supported”
  • “This env var must be present”
  • “This config file must exist and be non-empty”

It reads like a contract - and it fails like one too.

What Preflight checks

The above command check was just the beginning. Preflight now supports:

  • cmd — exists on PATH, executes, extracts version, compares semver
  • env — required env vars, allowed values / patterns
  • file — existence, permissions, “not empty”, basic content checks
  • http / tcp — connectivity with retry + timeout
  • hash — checksum verification

Three places it immediately helps

1) Build-time validation (fail fast)

COPY --from=ghcr.io/vertti/preflight:latest /preflight /usr/local/bin/preflight

RUN preflight cmd config-render --min 1.4.0
RUN preflight env DATABASE_URL
RUN preflight file /app/config.yaml --not-empty
Enter fullscreen mode Exit fullscreen mode

2) CI validation (same checks outside Docker)

preflight cmd config-render --min 1.4.0
preflight env DATABASE_URL
Enter fullscreen mode Exit fullscreen mode

3) Healthchecks without curl (distroless-friendly)

HEALTHCHECK CMD ["/preflight", "http", "http://localhost:8080/health"]
Enter fullscreen mode Exit fullscreen mode

When shell is fine — and when it bites you

Shell is totally fine when:

  • the check is genuinely trivial
  • you don’t care about consistent error reporting
  • you’re happy maintaining the script

Preflight pays off when:

  • you’re copying/downloading helper binaries into images
  • you want the same behavior in CI and in containers
  • you need real version constraints
  • you want consistent output across checks
  • you want checks that double as executable documentation

Repo: https://github.com/vertti/preflight

Top comments (0)