Niraj Kumar

Originally published at amanerp.com

Schema-Aware Gates: Validate, Don't Coerce

TLDR:

Schema-aware gates should validate and fail closed. Defaulting missing fields to zero turns a correctness gate into false evidence.

  • A default on a missing required field turns a correctness gate into a completeness gate.
  • Ban get("field", 0) and "field or 0" defaults in schema parsers; reject with the field name and source file.
  • Commit a test that feeds bad input and proves the gate says no.

A disaster-recovery drill that restored nothing passed as the fastest recovery in the project's history. The receipt said recovery time: zero seconds, outcome: pass. Nothing failed. No alert fired. The gate that was supposed to catch exactly this waved it through — because one line read a missing field with a default and turned absence into success.

I now use a blunt rule for it: a schema-aware gate validates; it does not coerce. Defaulting a missing field does not make the check safer; it swaps a correctness question ("is this document right?") for a completeness question ("does this document exist?"). The damage is quiet, which is what makes it dangerous.

Q: Does defaulting a missing field really turn a correctness gate into a completeness gate?

A: Yes — a default invents a reading for every absent field, so the gate keeps approving documents it was built to refuse.


The gate that approved a disaster that didn't recover

A disaster-recovery drill is supposed to prove one thing: that you can destroy your infrastructure and rebuild it.

The proof is a committed receipt — a structured file recording the outcome, the recovery time, and the scope of what was validated. A freshness gate reads that receipt and asserts that the drill actually happened and actually passed.

The gate had a single innocent-looking line. It read the recovery-time field with a default: if the field was missing, treat it as zero. One parameter, one fallback value, the kind of defensive default you write without thinking.

# Banned: turns absence into a pass
rto = receipt.get("rto_seconds", 0)

# Required: fail loud, name the field (reject() is assumed to raise or exit)
if not receipt.get("rto_seconds"):  # absent, None, and 0 are all refused
    reject("missing or zero rto_seconds in drill receipt")
if receipt.get("scope") not in QUALIFYING_SCOPES:  # an absent scope is refused too
    reject(f"scope {receipt.get('scope')!r} not in qualifying set")

Then a different kind of receipt arrived — one from a drift-detection-only run, which checks for configuration drift but never destroys or restores anything. It had no recovery-time field because no recovery happened. The gate read the missing field, substituted zero, and signed off: zero-second recovery, pass.

The gate did not fail. It approved a non-event as a success — and filed it as the best recovery number on record.
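To make the failure reproducible, here is a minimal sketch of the two behaviors side by side. Everything except rto_seconds and scope is an assumption for illustration: the scope values, the receipt contents, and the names coercing_gate, strict_gate, and GateRejected are hypothetical, not the project's actual code.

```python
QUALIFYING_SCOPES = {"full_restore"}  # hypothetical scope value


class GateRejected(Exception):
    """Raised when a receipt fails validation."""


def coercing_gate(receipt: dict) -> str:
    # The banned shape: a missing field silently becomes 0.
    rto = receipt.get("rto_seconds", 0)
    return f"pass (rto={rto}s)"


def strict_gate(receipt: dict) -> str:
    # Presence first, then a qualifying value; every rejection names the field.
    if "rto_seconds" not in receipt:
        raise GateRejected("missing rto_seconds in drill receipt")
    rto = receipt["rto_seconds"]
    if not isinstance(rto, (int, float)) or rto <= 0:
        raise GateRejected(f"rto_seconds must be a positive number, got {rto!r}")
    if receipt.get("scope") not in QUALIFYING_SCOPES:
        raise GateRejected(f"scope {receipt.get('scope')!r} not in qualifying set")
    return f"pass (rto={rto}s)"


# A drift-detection-only receipt: nothing was restored, so there is no RTO.
drift_receipt = {"outcome": "pass", "scope": "drift_detection"}

print(coercing_gate(drift_receipt))  # reports a zero-second recovery
try:
    strict_gate(drift_receipt)
except GateRejected as err:
    print(f"rejected: {err}")
```

The same input produces a record-setting pass from one function and a named refusal from the other. Nothing about the receipt changed, only the question the gate asked.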


The category error underneath it

The bug looks like a typo. It is actually a category error, and naming it is what makes it fixable.

A correctness gate asks: is this document right?
A completeness gate asks: does this document exist?

The moment a gate defaults a missing field, the question it answers changes without anyone noticing. It stops asking whether the document is right and starts asking only whether a document showed up. Once that swap happens, the gate will accept present-but-wrong inputs, because it just invented a reading for the absent parts.

No gate at all is at least an honest admission that you don't know.

The coercing version hands you a green check that says the thing you cared about was verified when the exact failure case sailed through. You will trust the check. That is the trap.

The reason the bug is everywhere is that the language makes it easy. get("field", 0) reads as defensive. It feels safe. It never throws.

But in a validation context, failing loudly is the entire feature.

The fallback that protects you in application code is the thing that blinds you at the gate. The same shape hides everywhere a schema gets parsed: coercing an enum to "unknown", substituting "default" for a missing mode, filling an absent status. Each one converts a failure into a silent pass.
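The enum variant of the same shape might look like the sketch below. The Mode enum, its values, and both parser names are hypothetical; the point is only the contrast between a parser that always returns something and one that can refuse.

```python
from enum import Enum


class Mode(Enum):  # hypothetical enum for illustration
    FULL = "full"
    INCREMENTAL = "incremental"


def parse_mode_coercing(doc: dict) -> str:
    # Banned: an absent or unrecognized mode quietly becomes "unknown".
    value = doc.get("mode", "unknown")
    return value if value in {m.value for m in Mode} else "unknown"


def parse_mode_strict(doc: dict) -> Mode:
    # Required: absence and unrecognized values are both rejections.
    if "mode" not in doc:
        raise ValueError("missing required field 'mode'")
    try:
        return Mode(doc["mode"])
    except ValueError:
        allowed = sorted(m.value for m in Mode)
        raise ValueError(
            f"field 'mode' has value {doc['mode']!r}; allowed: {allowed}"
        ) from None
```

The coercing parser cannot fail, which means every caller downstream has to remember that "unknown" is secretly an error. The strict parser puts the failure where the bad input entered.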


Two gates, one mistake

If this were a single careless line, it would not be worth a name. It is worth a name because it recurred.

The same family of bugs appeared in a second gate in the same codebase: a backup-verification path where the reader's expected format silently diverged from the writer's actual format. The verification step matched a manifest it would never receive, found nothing to check, and reported success. It ran as a silent no-op for thirteen days while reporting pass the entire time.

Two different gates, two different authors, the same underlying error: a check that resolves an absent or mismatched input into a passing result instead of refusing it.
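One guard against the silent no-op is to make "checked nothing" a failure in its own right. The sketch below assumes a manifest that is a list of entries with path and sha256 keys; that format, the function name, and the arguments are invented for illustration and are not the format the backup path actually used.

```python
def verify_backup(manifest_entries: list, actual_hashes: dict) -> int:
    """Return how many entries were verified; raise if the check proved nothing."""
    checked = 0
    for entry in manifest_entries:
        # An entry in a shape the reader does not recognize is an error, not a skip.
        if "path" not in entry or "sha256" not in entry:
            raise ValueError(f"unrecognized manifest entry with keys {sorted(entry)}")
        path = entry["path"]
        if path not in actual_hashes:
            raise ValueError(f"manifest lists {path!r} but the backup does not contain it")
        if actual_hashes[path] != entry["sha256"]:
            raise ValueError(f"hash mismatch for {path!r}")
        checked += 1
    if checked == 0:
        # An empty loop proves nothing; refuse to call it a pass.
        raise ValueError("verified 0 entries; refusing to report success")
    return checked
```

Returning the count rather than a bare boolean also gives whoever reads the log a number to be suspicious of.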

When the same mistake shows up twice, it isn't carelessness — it's a pattern the language and your instincts actively push you toward.

That's when you name the rule.


The rule, in one line

Validate explicitly, never coerce to defaults.

Concretely: when a gate parses a versioned schema, every required field must be checked for presence and for a qualifying value before anything else runs. If the field is missing, or its value falls outside the allowed set, the gate rejects — and the rejection message names the offending field and the file it came from, so the failure is diagnosable in one read.

The banned shapes are specific: reading a numeric field with a zero default, coercing an enum to "unknown," substituting any string for an absent required value. The required shape is equally specific: if the field is absent or its scope isn't in the qualifying set, reject with a clear message. The discipline is narrower than "validate more" — never let an absence resolve into a pass.
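As a sketch of the required shape, a single helper can carry the whole rule: check presence, check the value against the allowed set, and put the field name and source file in the message. The helper name require, the GateRejected exception, and the file path in the usage lines are assumptions for illustration.

```python
class GateRejected(Exception):
    """Raised when a document fails validation."""


def require(doc: dict, field: str, source: str, allowed=None):
    """Return doc[field], or raise naming the field and the source file."""
    if field not in doc:
        raise GateRejected(f"{source}: missing required field {field!r}")
    value = doc[field]
    if allowed is not None and value not in tuple(allowed):
        raise GateRejected(
            f"{source}: field {field!r} has value {value!r}, "
            f"expected one of {sorted(allowed)}"
        )
    return value


# Usage with a hypothetical receipt and path.
receipt = {"rto_seconds": 840, "scope": "full_restore"}
rto = require(receipt, "rto_seconds", "receipts/drill.json")
scope = require(receipt, "scope", "receipts/drill.json", allowed=("full_restore",))
```

There is deliberately no default parameter on require. If the helper cannot express a fallback, nobody can add one by accident.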


Where this rule breaks

"Validate, don't coerce" is not free, and there is a place it stops being worth it: a coercing gate beats a missing gate, even if it loses to a strict one.

If the alternative to a defaulting check is no check at all — because writing the strict version blocks a release you need today — a default that catches the common case is a defensible interim, as long as you name it honestly. It is a completeness gate wearing a correctness gate's badge.

The danger is not the fallback you can see; it is the fallback you have forgotten is there. Treat every fallback in a schema-parsing gate as debt with a name, not as finished work.

There is also a real cost to strictness: a gate that rejects on every absent optional field becomes brittle in a different direction, failing loudly on documents that were fine.

The rule applies to required fields and qualifying values — the ones a pass actually depends on. Defaulting a genuinely optional field is not the bug; defaulting a field the verdict hinges on is. Match the strictness to what the pass is asserting, not to every key in the schema.
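One way to keep that split honest is to write it down next to the parser, so the required set and the defaultable set are two separate, visible lists. In this sketch the field names notes and operator, and the function load_receipt, are hypothetical.

```python
REQUIRED = ("outcome", "rto_seconds", "scope")       # the verdict hinges on these
OPTIONAL_DEFAULTS = {"notes": "", "operator": None}  # no verdict depends on these


def load_receipt(raw: dict, source: str) -> dict:
    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise ValueError(f"{source}: missing required field(s): {', '.join(missing)}")
    # Defaults are applied only to fields listed as optional.
    return {**OPTIONAL_DEFAULTS, **raw}
```

Moving a field from one list to the other then becomes a reviewable diff instead of a default buried in a call site.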


What to actually change on Monday

Grep your gate code for the coercion shape (get("field", 0), or "", or "unknown") wherever you parse a schema you trust to make a pass/fail call. Each one is a candidate completeness gate in disguise. For every required field, replace the fallback with an explicit presence-and-validity check that fails loudly and names the field and source file.
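If plain grep is too noisy, a small script over Python's ast module can do the same search structurally. This is a heuristic sketch, not a linter: it flags any .get(key, literal) call and any "x or literal" expression, so it will also hit code that is not a gate, and it misses defaults that are variables or negative numbers. The function name is made up for this example.

```python
import ast


def find_coercion_shapes(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, snippet) pairs for candidate coercing defaults."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        is_get_with_literal = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and len(node.args) == 2
            and isinstance(node.args[1], ast.Constant)
        )
        is_or_literal = (
            isinstance(node, ast.BoolOp)
            and isinstance(node.op, ast.Or)
            and isinstance(node.values[-1], ast.Constant)
        )
        if is_get_with_literal or is_or_literal:
            hits.append((node.lineno, ast.unparse(node)))
    return sorted(hits)


sample = '''
rto = receipt.get("rto_seconds", 0)
mode = receipt.get("mode") or "unknown"
scope = receipt["scope"]
'''

for line, snippet in find_coercion_shapes(sample):
    print(f"line {line}: {snippet}")
```

Treat the output as a review list. Each hit still needs a human to decide whether the field is one the verdict hinges on.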

Then add the test that proves it: feed the gate a document with the field missing and confirm it rejects. A gate with no committed test proving it rejects bad input is a green light you haven't earned. The whole value of a gate is that it says no to the right things.
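A committed test for that can be very small. The sketch below uses the standard unittest module; freshness_gate here is a stand-in for whatever your real gate is, and the receipt contents and file paths are hypothetical.

```python
import unittest


class GateRejected(Exception):
    """Raised when a receipt fails validation."""


def freshness_gate(receipt: dict, source: str) -> None:
    # Stand-in for the real gate; import yours instead.
    if "rto_seconds" not in receipt:
        raise GateRejected(f"{source}: missing rto_seconds in drill receipt")


class FreshnessGateRejectsBadInput(unittest.TestCase):
    def test_missing_rto_is_rejected(self):
        with self.assertRaises(GateRejected) as ctx:
            freshness_gate({"outcome": "pass"}, "receipts/drift-only.json")
        # The message must name both the field and the file.
        self.assertIn("rto_seconds", str(ctx.exception))
        self.assertIn("receipts/drift-only.json", str(ctx.exception))

    def test_complete_receipt_is_accepted(self):
        # Should not raise.
        freshness_gate({"outcome": "pass", "rto_seconds": 840}, "receipts/drill.json")
```

Run it with python -m unittest. The first test is the one that matters: if someone reintroduces a default, it goes red.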

A gate that can't say no is decoration.


About AmanERP

AmanERP is an ERP for SMBs built in public. We ship gates that fail loud — green checks are supposed to mean something, not decorate a pipeline. www.amanerp.com
