The Problem Is Not AI Hallucination - It's Missing Evidence

#agilev #aiagents #aiengineering #traceability

The problem is not AI hallucination. The problem is missing evidence.

When AI makes a mistake, we often call it a hallucination.

But in engineering, the deeper issue is usually simpler:

There was no controlled process around the output.

No clear requirement.
No independent verification.
No traceability.
No risk record.
No approval gate.
No evidence bundle.

That means the problem is not only that AI can be wrong.

The problem is that teams often have no reliable way to prove when AI is right.
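One way to make that gap concrete is to treat the items listed above as fields of a single record and ask which ones are empty. This is a minimal sketch, not an Agile V schema: `EvidenceBundle`, its field names, and `missing_evidence` are hypothetical names chosen to mirror the list.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvidenceBundle:
    """Hypothetical record; each field holds a reference (file, ticket, URL) or None."""
    requirement: Optional[str] = None
    independent_verification: Optional[str] = None
    traceability: Optional[str] = None
    risk_record: Optional[str] = None
    approval: Optional[str] = None


def missing_evidence(bundle: EvidenceBundle) -> List[str]:
    """Return the names of all fields that are unset or empty."""
    return [f.name for f in fields(bundle) if not getattr(bundle, f.name)]


# Example: an AI-generated change with a requirement and a sign-off, nothing else.
bundle = EvidenceBundle(requirement="REQ-001", approval="reviewed by a human")
print(missing_evidence(bundle))
# → ['independent_verification', 'traceability', 'risk_record']
```

An output can be correct and still fail this check. That is the point: the check measures whether correctness can be demonstrated, not whether it happens to hold.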

Agile V is designed around that gap

Instead of letting an AI agent jump directly from prompt to code, Agile V introduces a verified engineering loop:

Clarify intent.
Create traceable requirements.
Build only against approved requirements.
Generate tests independently.
Red-team the result.
Capture decisions and risks.
Accept only with evidence.
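The loop above can be sketched as an ordered set of stages where each stage must leave an evidence reference before the next one may run. This is an illustrative sketch under assumptions, not the implementation in the Agile V repositories: the stage identifiers, `WorkItem`, and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stage identifiers, in the order the loop lists them.
STAGES = (
    "clarify_intent",
    "trace_requirements",
    "build",
    "independent_tests",
    "red_team",
    "record_decisions_and_risks",
    "accept",
)


@dataclass
class WorkItem:
    name: str
    evidence: Dict[str, str] = field(default_factory=dict)  # stage -> evidence reference

    def next_stage(self) -> Optional[str]:
        """Return the first stage without evidence, or None when all are done."""
        for stage in STAGES:
            if stage not in self.evidence:
                return stage
        return None

    def complete(self, stage: str, evidence_ref: str) -> None:
        """Record a stage as done; reject skipped stages and empty evidence."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"cannot run {stage!r}: next stage is {expected!r}")
        if not evidence_ref:
            raise ValueError(f"stage {stage!r} needs an evidence reference")
        self.evidence[stage] = evidence_ref

    @property
    def accepted(self) -> bool:
        return self.next_stage() is None


item = WorkItem("login-rate-limit")
item.complete("clarify_intent", "brief.md")
try:
    # Jumping from prompt to code skips the requirements stage.
    item.complete("build", "diff.patch")
except ValueError as err:
    print(err)  # → cannot run 'build': next stage is 'trace_requirements'
```

The design choice worth noting is that the ordering lives in the data structure rather than in a prompt, so an agent cannot reach `build` or `accept` by simply asserting that earlier steps happened.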

This changes the role of AI

AI is no longer just a code generator.

It becomes part of a controlled engineering workflow.

That matters especially for regulated, safety-relevant, long-lived, or high-risk systems where correctness alone is not enough. You also need to explain how correctness was established.

Explore the projects

agile-v-skills

Agent skills for traceable requirements, independent Red Team verification, human gates, and compliance-ready evidence.

→ github.com/Agile-V/agile_v_skills

agentic-agile-v

A practical scaffold for running AI engineering with structured briefs, evidence bundles, validation gates, and risk-based workflows.

→ github.com/Agile-V/agentic_agile_v

From vibe coding to verified engineering.

Top comments (1)

Harjot Singh • May 31

This reframe is exactly right, and it changes what you build. "Hallucination" frames it as a model defect to be cured; "missing evidence" frames it as a system design problem you can actually solve. The model fills gaps with plausible fabrication because you gave it a gap to fill. So the fix isn't a better model; it's making sure every claim is anchored to retrievable evidence and the system can say "I don't have support for this" instead of inventing. Once you treat it as an evidence problem, the engineering becomes concrete: provenance per claim, an abstain path when evidence is thin, and verification against the source rather than against the model's confidence.

This is the core of how I build: don't try to make the model stop hallucinating, make the system refuse to act on unsupported output. It's the spine of Moonshift, the thing I work on: a multi-agent pipeline that takes a prompt to a deployed SaaS, where a verify layer checks each step against evidence/expected behavior before it propagates, so an unsupported claim gets caught instead of shipped. Multi-model routing keeps a build at roughly $3 flat, and the first run is free with no card required. Genuinely good framing; this should be the default mental model. Do you enforce the evidence requirement structurally (the system can't answer without a citation), or is it a prompt-level instruction? The structural version is the only one that actually holds under pressure.