Nate Voss

Pre-Build Existence Audit Rule: looking for the failure modes I'm still missing

I wrote a rule for AI coding agents two days ago. It is untested in production sessions. I am posting it here to find its failure modes faster than I would by waiting for my own future mistakes to surface them.

The rule is below first. Story and reasoning after.

Pre-Build Existence Audit Rule (v1, structural verification)

Status: Untested in production sessions. Test on a new project for 2-3 weeks
before considering global rollout.

Before claiming "feature X is not built / not implemented / missing":

1. Map
   rg -li "<keyword>" .                            # project repo
   rg -li "<keyword>" ~/.claude/projects/*/memory/ # agent memory
   If either matches >5 files, use the file list to scope which ones to read.

2. Structural footprint scan (NOT just synonyms)
   Identify architectural invariants this feature class would require:
   - Integration/API → router definitions, endpoint registrations,
     plugin tool lists
   - Data → schema files, migrations, type definitions, persisted-entity fields
   - Background → cron entries, queue handlers, scheduled job registrations
   - Cross-service → service registry, infra config, IPC handlers
   - Memory/decisions → project_*.md files documenting prior shipment

   Stack discipline: footprints must be stack-appropriate. If unsure which
   architectural pattern applies, list 2-3 plausible alternatives
   (REST/GraphQL/RPC; cron/queue/webhook) and search each. Wrong-ontology
   audits feel rigorous but miss truth.

   Grep each invariant. If ANY of them return matches, "not built" is
   contradicted until you've read those matches.

3. Epistemic categorization. Label each match as ONE of:
   - Direct Proof: read the exact logic for the dimension being asked
   - Infrastructure Hint: schema/hooks/types only, not the specific logic
   - Partial Implementation: some footprints present, others missing
   - Global Absence: searched ALL relevant invariants across ENTIRE repo,
     found no footprint

4. Cite without fabricating
   Quote 3-5 lines of actual matched code. Include path + line range IF
   the tool provided them. Never invent line numbers.

5. Conclusion leads with epistemic status:
   "For the [dimension], evidence = [Direct Proof / Infrastructure Hint /
   Partial Implementation / Global Absence]; matches in [files] show [what];
   structural footprint scan of [invariants] returned [result]."

Fallback (Safe Mode):
Answer is "let me check first", NOT "X isn't built", if any of:
- Unable to name the dimension precisely
- Footprint scan returned matches you haven't read
- Unsure which architectural pattern applies AND haven't searched alternatives
- The user pushed back on a similar claim recently

Self-check triggers:
- "I'd remember if we built this"
- "BACKLOG looks confident"
- "I just need to check one file"
- "My mental model of this system feels obvious" (← especially this one)

Honest limits:
- Wrong mental model of the architecture can still produce structurally
  rigorous wrong audits. The stack-discipline sub-step is a hedge, not a fix.
- Generated code, external services, dynamic dispatch, and indirection can
  evade footprint scans even when the feature exists.
- "Global" means global-within-visible-code, not global-within-system.
- Discipline is in the practice, not the prose. A 700-token rule
  half-followed is worse than a 200-token rule actually followed.
- This rule reduces but does not eliminate misclaims.
- When the architectural ontology is unclear, ask the user before concluding.
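
Not part of the rule text, but to make the steps concrete: one pass might look like this at the shell. The feature name, paths, and footprint patterns below are hypothetical.

    # 1. Map: name-based sweep of repo and agent memory
    rg -li "invoice export" .
    rg -li "invoice export" ~/.claude/projects/*/memory/

    # 2. Structural footprint scan: what artifacts would the feature require?
    rg -n "invoice" src/routes/ src/api/         # endpoint registration
    rg -n "invoice" db/migrations/               # persisted-entity fields
    rg -n "invoice" config/cron* src/jobs/       # scheduled job registration

    # 3-5. Read every match, label the evidence, then lead with it:
    # "For the export dimension, evidence = Infrastructure Hint; matches in
    #  db/migrations/ show an invoices table but no export logic; structural
    #  footprint scan of routes and jobs returned no export endpoint or job."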

That is the rule. Now the reasoning.

I was running Claude Code as my coding agent on a personal automation project. By 11 AM one morning, the agent had confidently claimed "feature X is not built" four times in a row, each time wrong, each time caught only by my pushback. The pattern was identical: trust the project's BACKLOG framing, do a narrow grep, miss adjacent layers, declare absence.

The standard hallucination story does not fit. The agent searched. It searched fine. It just searched for the wrong shapes.

What I observed: the agent was searching by name when it should have been searching by shape. A feature can be called anything. A feature cannot exist without leaving structural residue. There has to be a route, a schema, a registered tool, a scheduled job, a documented decision. When the agent searches by name, it is asking "what string would this feature use?" (a question about vocabulary). When it searches by shape, it is asking "what artifact would this feature require?" (a question about architecture).

The rule above forces the second question.
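
As a sketch of the difference, with a hypothetical feature name and paths, assuming something like an Express-style layout:

    # Searching by name: depends on guessing the right vocabulary
    rg -li "pdf export|export_pdf|exportPdf" .

    # Searching by shape: depends on knowing what residue the feature class leaves
    rg -n "router\.(get|post)\(.*export" src/    # a route would have to exist
    rg -n "export" db/migrations/                # persisted state would have to exist
    rg -n "export" config/schedule* src/jobs/    # a job would have to be registered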

I ran the rule through eight critiques across four rounds before settling on this version. The biggest substantive shift was structural-footprint-vs-synonyms. The earlier draft had the agent generate better synonyms when stuck. That just relocates the dependency on the agent's imagination. The structural-footprint version asks a different question: what artifact would prove the feature exists? Then grep for that artifact. The dependency moves from imagination to architectural knowledge, which is more reliable.
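
One way to keep that architectural knowledge explicit instead of improvised is to hold the footprint patterns as data and loop over them. A minimal bash sketch; the feature classes and patterns are hypothetical and would need tuning per stack:

    #!/usr/bin/env bash
    # Hypothetical footprint table: feature class -> artifact pattern (bash 4+)
    declare -A footprints=(
      [endpoint]='router\.(get|post|put|delete)'
      [schema]='CREATE TABLE|ALTER TABLE|add_column'
      [job]='cron|schedule|enqueue'
    )
    for class in "${!footprints[@]}"; do
      echo "== $class footprint =="
      # rg exits nonzero on zero matches, so record the absence explicitly
      rg -n "${footprints[$class]}" . || echo "(no matches)"
    done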

The other major addition was the absence-scope distinction: "I searched module X and found nothing" is a scope claim, not a fact claim about the system. The fix is requiring absence claims to be global across the architectural invariant, not local to whichever module happened to get searched.
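
A small sketch of that distinction, with hypothetical names and paths:

    # Scope claim: "nothing in src/billing/" is only a fact about src/billing/
    rg -n "export_invoices" src/billing/

    # Global claim on the invariant: the route, job, or migration could live anywhere
    rg -n "export_invoices" .
    rg -n "invoice" src/routes/ src/jobs/ db/migrations/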

The rule has known limits. They are listed in the rule itself. The biggest one is wrong-ontology rigor: an agent could generate a structurally rigorous footprint search against the wrong architectural pattern (e.g., search GraphQL patterns on a REST system) and confidently confirm absence. The stack-discipline sub-step is a hedge, not a fix.
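
In practice that hedge looks like searching every plausible ontology rather than assuming one; the patterns here are illustrative only:

    # Unsure whether the API surface is REST, GraphQL, or RPC? Search all three.
    rg -n "router\.(get|post)|@app\.route" src/    # REST footprint
    rg -n "type Query|type Mutation" src/          # GraphQL footprint
    rg -n "rpc " --glob "*.proto" .                # gRPC/protobuf footprint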

What I want from you:

  1. Try it. Run it as a system rule on a project where you use an AI coding agent for a few sessions.
  2. Tell me what breaks. Specifically: hallucination shapes the structural footprint search would NOT catch, audit-theater patterns where the form is satisfied without the substance, over-triggering on questions that were not actually absence claims.
  3. Tell me what you have written. If you have rules in your own CLAUDE.md or system prompt that solve adjacent problems, I want to read them.

I am running this on a separate project for two to three weeks before deciding whether to graduate it to my global agent configuration. After that I will know whether to keep it, refine it, or archive it. Your test reports compress that timeline.

Reply or DM on whichever platform you found this.

The misspelling stays.
