A durable artifact is any rule file, CLAUDE.md entry, architecture doc, or checked-in spec that outlives the session that created it. The fastest way to corrupt one is to let an unfalsifiable phrase into the prompt that wrote it.
"Ship a production-ready version." "Make it 99% reliable." "Better than Claude Code does by default." Each one sounds like a specification and is actually a vibe. The model executes something plausible. It gets committed. Six weeks later the phrase is load-bearing — other rules reference it, other prompts inherit it — and nobody can say what it means.
Gate Zero is the blocking check that stops this at entry. It fires before Requirements Analysis, before Architecture, before any file read or tool call. If the prompt contains a falsifiability hazard, the agent halts, names the hazard verbatim, and refuses to proceed until the phrasing is replaced with something measurable.
I run this gate in my own global ~/.claude/CLAUDE.md. It applies to me as strictly as it applies to anyone else using the framework. Public framework: claude-code-agent-skills-framework.
Why catch it at the prompt, not at review
The cost of catching an unfalsifiable phrase grows by orders of magnitude at each later stage it survives.
- At the prompt: 30 seconds. Rewrite the sentence. Move on.
- In a PR review: 20 minutes. You read the diff, realize the spec was vague, rewrite, re-run.
- After context compaction: hours. The phrase has canonicalized into a rule file, another prompt inherited it, another agent generated code against it. You trace the chain of inheritance and unwind it.
- After ship: the one that ends careers. The vague internal spec became a vague user promise.
Researchers call the failure Semantic Displacement — the model substitutes its own probabilistic priors when the prompt does not constrain the target. The output looks plausible, passes superficial review, and silently corrupts everything downstream.
The trigger list
Gate Zero fires on any of these patterns:
- Unfalsifiable benchmarks. "N% better," "top N%," "99% of X output," "best-in-class," "state-of-the-art."
- Tool-name-as-yardstick. "Better than Claude Code / GPT / Copilot does by default." Self-contradictory when the executing agent is that tool. No objective delta available.
- Adjective-driven standards. "Robust," "seamless," "world-class," "production-ready," "production-grade," "enterprise-grade," "scalable," "performant" — used without a measurable replacement.
- Missing success definition. A task with no testable "done" state. No acceptance criteria, no failure condition.
- Implicit shared-context assumptions. "You know what I mean." Unqualified comparatives ("make it cleaner," "make it faster") with no baseline.
The gate explicitly does NOT fire on:
- Trivial asks ("rename X to Y," "what does this function do").
- Explicit well-scoped technical tasks ("add a type hint here," "run the test suite").
- Direct quotes from external sources — reporting a third-party claim is reporting, not a spec.
- Aesthetic preferences that do not touch durable artifacts.
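The trigger list above is a discipline enforced in prose, not a program, but it can be sketched as one. The patterns and trivial-ask exemptions below are illustrative regexes I made up for this sketch, not the canonical gate:

```python
import re

# Hypothetical encoding of the trigger list. Patterns are illustrative,
# not exhaustive; the real gate is a rule in CLAUDE.md, not code.
HAZARD_PATTERNS = {
    "unfalsifiable benchmark": r"\b(\d+% (?:better|of)|top \d+%|best-in-class|state-of-the-art)\b",
    "tool-name-as-yardstick": r"\bbetter than (?:claude code|gpt|copilot)\b",
    "adjective-driven standard": r"\b(robust|seamless|world-class|production-(?:ready|grade)|enterprise-grade|scalable|performant)\b",
    "unqualified comparative": r"\bmake it (?:cleaner|faster|better)\b",
}

# Prompts the gate explicitly ignores: trivial, well-scoped asks.
TRIVIAL_PATTERNS = r"\b(rename \w+ to \w+|what does this function do|add a type hint|run the test suite)\b"

def scan_prompt(prompt: str) -> list[tuple[str, str]]:
    """Return (hazard-name, verbatim-phrase) pairs, or [] if the gate passes."""
    text = prompt.lower()
    if re.search(TRIVIAL_PATTERNS, text):
        return []  # trivial asks never fire the gate
    hits = []
    for name, pattern in HAZARD_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            hits.append((name, match.group(0)))  # exact phrase, no paraphrase
    return hits
```

Note that the scanner returns the matched phrase verbatim, which matters for step 1 of the procedure below: the user has to see the specific words that fired the gate.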
The four-step procedure (blocking)
When a hazard is detected, halt all tool execution and execute in strict order:
1. Intercept and name the hazard verbatim. Quote the exact phrase from the prompt. No paraphrasing. The user must see the specific linguistic failure — otherwise they will write it again in the next prompt.
2. Explain the epistemological failure. "This is a vibe, not a falsifiable standard. I cannot tell when it is met or violated. Proceeding would be Semantic Displacement: I would substitute my own priors for your actual requirement."
3. Propose two or three measurable replacements. Draw from the operational-definition reference table below. Offer at least one named engineering gate and one concrete boolean failure condition.
4. Enforce the hold. Do not proceed. Wait for the user to choose a replacement or propose a new operational definition. Explicitly refuse to execute the vague version.
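The four steps compose into a single halt message. A minimal sketch, with wording that is illustrative rather than canonical:

```python
# Hedged sketch of steps 1-4 as a message template. The function name and
# exact wording are assumptions; the actual gate lives in CLAUDE.md prose.
def compose_halt(phrase: str, replacements: list[str]) -> str:
    lines = [
        f'HALT: Gate Zero fired on "{phrase}"',  # step 1: name it verbatim
        # step 2: explain the epistemological failure
        "This is a vibe, not a falsifiable standard. I cannot tell when it "
        "is met or violated. Proceeding would be Semantic Displacement: "
        "I would substitute my own priors for your actual requirement.",
        # step 3: propose measurable replacements
        "Measurable replacements (choose one or propose your own):",
    ]
    lines += [f"  {i}. {r}" for i, r in enumerate(replacements, 1)]
    # step 4: enforce the hold
    lines.append("Holding: I will not execute the vague version.")
    return "\n".join(lines)
```

The ordering is deliberate: naming the phrase before explaining the failure keeps the challenge concrete instead of lecturing in the abstract.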
The operational-definition reference table
Starting points. Adapt to the domain.
| Hazard phrase | Operational replacement |
|---|---|
| Production-ready | CI green, ≥80% unit-test coverage, zero critical static-analysis findings, Definition-of-Done criteria met |
| Robust | Fallback/retry paths defined, p99 latency target, exponential backoff on failing calls, named error taxonomy |
| Seamless integration | Real-time bidirectional data flow, zero manual reconciliation steps, specific latency ceiling |
| World-class / best-in-class | Concrete measurable target (Lighthouse 90+, p99 <200ms, test coverage ≥80%) |
| Better than $TOOL | ESLint/ruff/mypy clean, cyclomatic complexity <10, named design patterns, specific code-quality gates |
| Clean / clear code | Named style guide enforced, no functions >50 lines, no files >500 lines, all public APIs documented |
The specific numbers are not the point. The point is that the prompt now names a number or a boolean a test can measure.
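The table translates naturally into a lookup for step 3 of the procedure. The keys, thresholds, and helper name below are assumptions for illustration; per the note above, the numbers are starting points, not the point:

```python
# The reference table as a lookup. Keys are normalized hazard phrases;
# thresholds are the table's starting points, meant to be adapted.
OPERATIONAL_DEFINITIONS = {
    "production-ready": [
        "CI green",
        "unit-test coverage >= 80%",
        "zero critical static-analysis findings",
    ],
    "robust": [
        "fallback/retry paths defined",
        "p99 latency target stated",
        "exponential backoff on failing calls",
    ],
    "seamless": [
        "zero manual reconciliation steps",
        "specific latency ceiling stated",
    ],
}

def replacements_for(hazard: str) -> list[str]:
    """Return candidate operational definitions, or [] for unknown hazards."""
    return OPERATIONAL_DEFINITIONS.get(hazard.lower(), [])
```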
The "under protest" exception
If the user insists on the vague phrasing after the challenge is issued, the gate permits proceeding — but on record.
- Execute the best-effort interpretation.
- Log the protest explicitly in the response: "Proceeding under protest — original prompt contained unfalsifiable phrase '[quoted phrase]'. Substituting interpretation: [stated interpretation]. This interpretation is not canonical; do not treat as a specification."
- The protest is what prevents the vague standard from silently becoming canonical in durable artifacts.
Sometimes the user genuinely knows what they mean and the interpretation will be thrown away. The protest log preserves the falsifiability property: a future reader sees the phrase was flagged, not endorsed.
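The protest record can be as simple as one append-only log line. A sketch under assumptions of my own (the function name, date prefix, and exact format are not part of the framework):

```python
from datetime import date

# Hypothetical protest-line formatter. The point is only that the flag,
# the verbatim phrase, and the non-canonical marker all survive in one place.
def protest_line(phrase: str, interpretation: str) -> str:
    return (
        f"[{date.today().isoformat()}] UNDER PROTEST - "
        f'unfalsifiable phrase: "{phrase}"; '
        f'substituted interpretation: "{interpretation}"; '
        "NOT CANONICAL - do not treat as a specification."
    )
```

A future reader grepping the log sees the phrase was flagged, not endorsed, which is exactly the falsifiability property the protest exists to preserve.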
The gate applies to me
The same prompt author who benefits from the gate is the one most likely to write a vague prompt at 11 PM after a long session. Cognitive fatigue is not a workaround. The gate is protection, not insubordination.
I have watched myself, on record, write "make this robust" into a draft and get stopped by the gate. The friction is the point. If the gate ever stops firing on my own prompts, either (a) my prompt hygiene has genuinely improved — measurable by the drop in trigger rate over time — or (b) the gate has decayed and needs re-auditing against the current frontier model. Both are worth knowing.
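The "drop in trigger rate over time" claim is itself measurable. A minimal sketch, assuming a made-up event format of (week label, did-the-gate-fire) pairs:

```python
from collections import Counter

# Hedged sketch: is prompt hygiene improving, or has the gate decayed?
# A falling rate with the gate still firing occasionally suggests (a);
# a rate pinned at zero is ambiguous and warrants a re-audit, per (b).
def trigger_rate_by_week(events: list[tuple[str, bool]]) -> dict[str, float]:
    prompts, fires = Counter(), Counter()
    for week, fired in events:
        prompts[week] += 1
        if fired:
            fires[week] += 1
    return {week: fires[week] / prompts[week] for week in prompts}
```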
Gate Zero pairs with a second discipline I wrote about earlier: every rule in the framework carries a WHY tag and a Retire-when clause, so rules decay cleanly as models improve. A gate that never retires is itself scaffolding debt.
Why this earns its presence
Gate Zero is the prompt-engineering equivalent of static typing. It does not make you smarter; it makes the same specific mistake visible every time you make it, so you stop making it. The specific mistake it catches — canonicalizing a vibe as a spec — is the one that most quietly compounds across a session, a project, and a career.
If you run any framework of durable artifacts (CLAUDE.md, rules, prompts-as-code), the gate's cost is one habit and a short trigger list. The payoff is that six months from now, no rule in your directory still says "production-ready" while nobody remembers what it means.
Pick one prompt you are about to send. Scan it for the trigger list above. If any trigger fires, rewrite before sending. That is the whole practice.
Aman Bhandari. Operator of an AI-engineering research lab running Claude Opus as the coaching partner, plus a QA-automation surface shipping against a real sprint workload. Public artifacts: claude-code-agent-skills-framework and claude-code-mcp-qa-automation. github.com/aman-bhandari.