SEN LLC

Your .env.example Is Not a Schema. Here's a 400-Line Python CLI That Fixes That

envcheck validates a .env file against a declarative YAML schema. Types, enums, regex patterns, length bounds, required/optional, with line-numbered grep-friendly errors. About 400 lines of Python, one runtime dependency (PyYAML), ships as a 62 MB Docker image that works from any language ecosystem.

📦 GitHub: https://github.com/sen-ltd/envcheck

The 2am deploy that started this

A staging deploy went out on a Friday evening. The container came up, tried to bind port 99999, and crashed. Rolled back. Took fifteen minutes to find because the crash looked like "the app doesn't start" — nothing about the cause. The real problem was a copy-paste error in the .env file: someone had typed PORT=99999 instead of 8080, the app didn't bounds-check it, and the OS refused the bind. A linter would have caught it. A type checker would have caught it. The thing we had — a .env.example file in the repo — was a list of keys with some hand-wavy placeholder values. It wasn't a schema. It couldn't express "port must be an integer in 1..65535." It couldn't express anything at all.

This is a surprisingly common class of bug. The .env.example convention is almost universal — but .env.example is a convention, not a contract. It can't tell you:

  • DATABASE_URL must be a valid URL (not localhost:5432, which isn't one).
  • NODE_ENV must be one of development|staging|production (not dev, which is a reasonable-looking typo).
  • JWT_SECRET must be at least 32 characters long (not the empty string that sneaks in when a line gets truncated to JWT_SECRET= during an edit).
  • PORT must be a valid TCP port.
  • ADMIN_EMAIL must be an email-ish string.

Every team ends up writing three or four of these checks inline in their startup code, in whichever language their service is written in. That code isn't shared, can't run in CI before deploy, and drifts between services.
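The inline version usually looks something like this — a hypothetical service's startup code, with names chosen for illustration, not taken from any real repo:

```python
import os
import sys


# Ad-hoc startup validation: the kind of code every service grows its
# own copy of. It runs too late (after deploy) and is never shared.
def check_env() -> None:
    port = os.environ.get("PORT", "")
    if not port.isdigit() or not 1 <= int(port) <= 65535:
        sys.exit(f"bad PORT: {port!r}")
    if os.environ.get("NODE_ENV") not in {"development", "staging", "production"}:
        sys.exit(f"bad NODE_ENV: {os.environ.get('NODE_ENV')!r}")
    if len(os.environ.get("JWT_SECRET", "")) < 32:
        sys.exit("JWT_SECRET too short")
```

Three checks, one language, zero reuse — and the Go service next door has its own slightly different copy.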

I looked at what was already out there:

  • dotenv-linter (Rust) — fast and well-maintained, but it's a style linter. It catches duplicated keys and spaces around =. It doesn't do types or required-ness.
  • envalid (JavaScript) — type-safe, very nice, but the schema is code. You can't run it in CI without running your Node.js app; you can't share the schema with your Go service.
  • Framework-specific tools (Rails's dotenv, Python's environs, Spring Boot's @ConfigurationProperties) — each does its own thing, each is code-based, none are declarative, none are language-neutral.

None of them solve the "lint my .env against a declarative schema file in CI" problem for polyglot teams. So I wrote one. It's called envcheck, it's about 400 lines of Python, it has one runtime dependency (PyYAML), and it ships as a Docker image so you never have to install Python into your CI runner.

Design decisions

Why YAML for the schema

The schema file is what the team edits and reviews. It's not a program, it doesn't run, it just expresses constraints. That means we want:

  • Reviewability: a diff should be obvious in a GitHub PR. required: true → required: false needs to stand out.
  • Language-neutral: the same schema file should be editable by the Python team, the Go team, the TypeScript team, and the SRE reading it to understand what the service needs.
  • In-ecosystem: nobody installs a new parser just for this.

YAML hits all three. JSON is noisier. TOML is fine but less familiar for nested maps. A custom DSL would be the worst of all worlds — a thing to learn, a thing to parse, a thing to document. YAML is the boring right answer.

I did briefly consider JSON Schema (the real thing, with $ref and everything), but .env values are always strings, and JSON Schema's ergonomics are tuned for JSON documents with rich types. Forcing everything through "type": "string" with "pattern": "..." would be verbose and hide intent. A purpose-built DSL with type: port as a first-class option is much friendlier.
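To make that concrete, here's the same PORT constraint both ways. The JSON Schema version is a sketch of the shape it would take, not anything envcheck reads:

```yaml
# envcheck's DSL -- the type carries the intent:
PORT:
  type: port
  required: true

# The JSON Schema equivalent buries that intent in a string pattern
# (sketch -- a loose pattern shown here; a correct 1..65535 regex is a
# multi-branch alternation, and this one would still admit 99999):
#
#   "PORT": { "type": "string", "pattern": "^[0-9]{1,5}$" }
```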

Why write a custom .env parser

The natural instinct is to pip install python-dotenv and move on. I did not do this, and the reason is one word: line numbers.

python-dotenv gives you a Dict[str, str]. That's all you need when you're loading env vars into your process. It's exactly the wrong shape when you want to produce an error like db-prod.env:14: DATABASE_URL: expected url, got "localhost:5432". The line number is what makes the error grep-friendly and editor-jumpable. Dropping it is a non-starter.

So envcheck has its own tiny parser that returns a list of (key, value, line_no) triples. It's maybe 50 lines of actual logic, and it's trivially testable because it takes a string and returns a list — no file I/O, no globals, no environment side effects:

from dataclasses import dataclass
from typing import List


class EnvParseError(Exception):
    def __init__(self, message: str, line_no: int) -> None:
        super().__init__(message)
        self.line_no = line_no


@dataclass(frozen=True)
class EnvEntry:
    key: str
    value: str
    line_no: int  # 1-indexed, points at the source line


# _is_valid_key and _unquote are small helpers defined in the same module.
def parse_env(text: str) -> List[EnvEntry]:
    entries: List[EnvEntry] = []

    for idx, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export ") :].lstrip()
        if "=" not in line:
            raise EnvParseError(f"expected KEY=VALUE, got {raw_line!r}", idx)

        key, _, raw_value = line.partition("=")
        key = key.strip()
        if not _is_valid_key(key):
            raise EnvParseError(f"invalid key {key!r}", idx)

        entries.append(EnvEntry(key=key, value=_unquote(raw_value), line_no=idx))

    return entries

The enumerate(..., start=1) is load-bearing: it's what lets the error message pin blame on a specific line of a specific file. A test asserts this directly — if you ever refactor the parser and silently introduce an off-by-one, it fails:

def test_line_numbers_survive_trailing_content():
    text = "\n".join([
        "# header",        # 1
        "DATABASE_URL=x",  # 2
        "# note",          # 3
        "PORT=8080",       # 4
    ])
    entries = parse_env(text)
    assert entries[0].line_no == 2
    assert entries[1].line_no == 4

One .env quirk caught me during testing: trailing comments. FOO=bar #baz — is the value bar or bar #baz? I went with bar #baz: everything after the first = is the value. It's also the safer default: if a user's secret happens to contain #, we don't silently truncate it. A linter that corrupts secret values is worse than no linter.

Why no dependencies for color output

rich is lovely. colorama is fine. I used neither, because the whole selling point of this tool is "drop it into CI and forget about it." Every dependency is a thing that can break, a thing that pins a version, a thing that bloats the Docker image. Color output is six ANSI escape codes and a function that decides whether to emit them:

import os
from typing import TextIO

_RED = "\x1b[31m"
_YELLOW = "\x1b[33m"
_GREEN = "\x1b[32m"
_DIM = "\x1b[2m"
_RESET = "\x1b[0m"


def _paint(text: str, code: str, color: bool) -> str:
    return f"{code}{text}{_RESET}" if color else text


def _color_enabled(ci_flag: bool, stream: TextIO) -> bool:
    if ci_flag:
        return False
    if os.environ.get("NO_COLOR"):
        return False
    return hasattr(stream, "isatty") and stream.isatty()


def format_error(err: ValidationError, env_path: str, *, color: bool) -> str:
    location = f"{env_path}:{err.line_no}" if err.line_no else env_path
    location_str = _paint(location, _DIM, color)
    message_str = _paint(err.message, _RED, color)
    return f"{location_str}: {message_str}"

Three rules for color: --ci forces it off, NO_COLOR (the no-color.org convention) forces it off, and otherwise we check if stdout is a real TTY. Piping to a file or into grep gets plain text automatically. This matters: CI logs with \x1b[31m junk all over them are awful to read.

Why validators are plain functions

Type validation dispatches through a dict literal:

from typing import Callable, Dict, Optional

VALIDATORS: Dict[str, Callable[[str], Optional[str]]] = {
    "string": validate_string,
    "int": validate_int,
    "bool": validate_bool,
    "url": validate_url,
    "email": validate_email,
    "port": validate_port,
    "path": validate_path,
}

Each validator is (value: str) -> Optional[str]: return None if the value is fine, return a short human-readable reason if not. That's it. No classes, no inheritance, no registration decorators. Adding a new type is: write a function, add it to the dict, add a test. I considered a plugin architecture. I don't need a plugin architecture. YAGNI, honestly. You can always refactor into one later if a real use case shows up — nobody has ever regretted deleting unnecessary abstraction.
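Adding a type really is that small. A hypothetical semver validator — the name and pattern are mine, not part of envcheck today:

```python
import re
from typing import Optional

# Hypothetical new validator, following the (value) -> Optional[str]
# contract: None means valid, a string is the human-readable reason.
_SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")


def validate_semver(value: str) -> Optional[str]:
    """Accept MAJOR.MINOR.PATCH; reject anything else with a reason."""
    if _SEMVER_RE.match(value):
        return None
    return f'expected semver (MAJOR.MINOR.PATCH), got "{value}"'

# Registration is one dict entry:
# VALIDATORS["semver"] = validate_semver
```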

The port validator is the one I'd have written three different wrong ways without a test:

from typing import Optional


def validate_port(value: str) -> Optional[str]:
    try:
        n = int(value)
    except ValueError:
        return f'expected port (1-65535), got "{value}"'
    if 1 <= n <= 65535:
        return None
    return f'expected port (1-65535), got "{value}"'

Boundary cases: 0 is invalid (binding to port 0 asks the OS for an ephemeral port, which is never what a config file means), 65535 is valid, 65536 is invalid, 99999 is invalid, "abc" is invalid. All five are tests. The original 2am deploy would have tripped the 99999 case.
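Those boundaries as a test, pytest-style, matching the suite described above (validate_port is repeated here so the snippet stands alone):

```python
from typing import Optional


def validate_port(value: str) -> Optional[str]:  # as defined above
    try:
        n = int(value)
    except ValueError:
        return f'expected port (1-65535), got "{value}"'
    return None if 1 <= n <= 65535 else f'expected port (1-65535), got "{value}"'


def test_port_boundaries():
    assert validate_port("0") is not None      # below range
    assert validate_port("1") is None          # lower bound, inclusive
    assert validate_port("65535") is None      # upper bound, inclusive
    assert validate_port("65536") is not None  # just past the top
    assert validate_port("99999") is not None  # the 2am deploy
    assert validate_port("abc") is not None    # not a number at all
```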

What you get at runtime

Given a .env with a handful of problems and a matching schema, envcheck produces the kind of output you can skim at 2am and immediately know what to fix:

$ envcheck --schema envcheck.yml --env .env
.env: missing required variable: NODE_ENV
.env:3: DATABASE_URL: expected url, got "localhost:5432"
.env:7: PORT: expected port (1-65535), got "99999"
.env:12: LOG_LEVEL: expected one of [debug, info, warn, error], got "trace"
.env:15: ADMIN_EMAIL: expected email, got "ops-team"
.env:18: JWT_SECRET: length 12 < min_length 32
6 errors found in .env

Each error line follows path:line: message, which is what grep -n, every compiler worth its salt, and vim's quickfix list (:cnext) expect. That's not an accident — I wanted any existing editor or CI tool that knows how to parse compiler errors to be able to parse envcheck output without teaching it anything new.

Exit codes are the other half of CI integration:

  • 0 — .env is valid.
  • 1 — validation errors (missing required, wrong type, etc).
  • 2 — config error (schema file missing, YAML is malformed, .env file doesn't exist). This is distinct from 1 because "my schema is broken" is a different incident class from "my env vars are wrong" — you want to alert different people.
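A sketch of that dispatch — illustrative only, not envcheck's actual main(); the names run, validate, and ConfigError are mine:

```python
import sys
from typing import Callable, List

EXIT_OK, EXIT_VALIDATION, EXIT_CONFIG = 0, 1, 2


class ConfigError(Exception):
    """Broken schema, malformed YAML, missing .env -- operator error."""


def run(validate: Callable[[], List[str]]) -> int:
    # Config problems short-circuit to 2; validation errors are printed
    # one per line and collapse to a single exit code of 1.
    try:
        errors = validate()  # returns grep-friendly error strings
    except ConfigError as exc:
        print(f"config error: {exc}", file=sys.stderr)
        return EXIT_CONFIG
    for err in errors:
        print(err, file=sys.stderr)
    return EXIT_VALIDATION if errors else EXIT_OK
```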

Tradeoffs and limitations (honest version)

envcheck does not do:

  • ${OTHER_VAR} interpolation. .env files in the wild sometimes reference other variables: DATABASE_URL=postgres://${DB_USER}:${DB_PASS}@host/db. Supporting this correctly means implementing a small expression language and worrying about cycles. envcheck treats the literal string ${DB_USER} as the value and will validate that against your schema. This is a deliberate choice: interpolation is usually done by the shell or by a runtime library, not by the file itself, and the moment you add interpolation you're writing a programming language interpreter. I'm not writing a programming language interpreter for this.
  • Multi-line values. The parser is single-line-only. If you have CERT="-----BEGIN CERTIFICATE-----\n...", you need to base64-encode it or put it in a file. This is already best practice.
  • Filesystem checks on path values. validate_path checks that the value is a non-empty string without NUL bytes. It does not check that the path exists — doing so would make your CI lint pass or fail depending on where it was run, and that's a nightmare. If you want existence checks, you want a different tool.
  • Cross-field constraints. You can't say "if FEATURE_X is true, then FEATURE_X_API_KEY must be set." That's a conditional, and conditionals are how schemas turn into code. If you need cross-field logic, reach for a code-based validator like envalid or pydantic.
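For contrast, the cross-field case in plain Python — the kind of conditional that can't live in a declarative schema. This is a sketch with illustrative names, not envcheck code:

```python
from typing import Dict, List


def check_feature_x(env: Dict[str, str]) -> List[str]:
    """Conditional requirement: FEATURE_X=true implies FEATURE_X_API_KEY.
    This is ordinary code with branching, which is exactly why the
    declarative schema doesn't try to express it."""
    errors: List[str] = []
    if env.get("FEATURE_X", "").lower() == "true" and not env.get("FEATURE_X_API_KEY"):
        errors.append("FEATURE_X is true but FEATURE_X_API_KEY is not set")
    return errors
```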

When would a code-based validator be better? When you already have a single-language service and you'd rather express your env contract in the same language as everything else. pydantic-settings in a Python service is genuinely fine. envcheck's niche is the pre-deploy check — the one that runs in CI, in a container, before any app code starts, for a repo that might contain Python, Go, and TypeScript services sharing the same .env conventions. If you're only running one language, you probably don't need it.

On the "should this be a pre-commit hook plugin" question: probably both. A pre-commit config for local development, a docker run step in CI for the deploy gate. They're not the same check — local is developer convenience, CI is policy.
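The local half could look something like this — a hypothetical .pre-commit-config.yaml; the hook id and entry are assumptions, since envcheck doesn't ship pre-commit metadata today:

```yaml
# .pre-commit-config.yaml -- hypothetical local hook, not shipped
# with envcheck. Lints the checked-in example against the schema.
repos:
  - repo: local
    hooks:
      - id: envcheck
        name: envcheck
        language: docker_image
        entry: ghcr.io/sen-ltd/envcheck --schema envcheck.yml --env .env.example
        files: ^(\.env\.example|envcheck\.yml)$
        pass_filenames: false
```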

Try it in 30 seconds

git clone https://github.com/sen-ltd/envcheck
cd envcheck
docker build -t envcheck .

# 1. Generate a .env.example from the included schema:
docker run --rm -v "$PWD:/work" envcheck example \
  --schema envcheck.example.yml

# 2. Validate a good .env:
docker run --rm -v "$PWD/tests/fixtures:/work" envcheck \
  --schema schema.yml --env valid.env
# → OK  valid.env: 9 variables, all valid

# 3. Validate a broken one:
docker run --rm -v "$PWD/tests/fixtures:/work" envcheck \
  --schema schema.yml --env invalid.env
# → exits 1, grep-friendly colored errors

# 4. Run the test suite inside the image:
docker run --rm --entrypoint pytest envcheck -q
# → 51 passed

A minimal schema to copy into your own repo as envcheck.yml:

DATABASE_URL:
  type: url
  required: true
  description: Primary Postgres connection string

PORT:
  type: port
  required: true

NODE_ENV:
  type: string
  required: true
  enum: [development, staging, production]

JWT_SECRET:
  type: string
  required: true
  min_length: 32

LOG_LEVEL:
  type: string
  required: false
  enum: [debug, info, warn, error]

Then in your .github/workflows/ci.yml:

- name: Validate .env
  run: |
    docker run --rm -v "$PWD:/work" ghcr.io/sen-ltd/envcheck \
      --schema envcheck.yml --env .env.production --ci

Exit code 1 fails the build; your deploy gate is now enforced by a file in the repo.

Closing

Entry #103 in a 100+ portfolio series by SEN LLC. If this is useful, fork it and add the types your team needs — cron, aws-region, semver, whatever. The validator table is one file and five lines per new type.

Feedback welcome.
