0% vs 50%: Making a RAG Agent Refuse to Hallucinate

#ai #machinelearning #rag #llm

0 % vs 50 %: making a RAG agent refuse to hallucinate

2026-05-31 · LLM / RAG

A retrieval-augmented agent is only as trustworthy as its behaviour on questions whose answer
isn't in the corpus. The failure mode is quiet: instead of saying "I don't know," the model
invents a confident, well-formed, wrong answer. This post shows a single guardrail that takes
that from common to never — and, crucially, measures it.

Reference architecture:
nim-agent-blueprint — agentic RAG on
the NVIDIA NIM stack with a built-in eval harness.

The ablation

The agent loop is plan → retrieve → generate → validate. The interesting variable is the
generation prompt's contract with the retrieved context:

Configuration	Out-of-corpus hallucination rate
Generate freely from context	~50 %
Guarded prompt (answer only from context; otherwise abstain)	0 %

Same model, same retriever, same questions. The only change is a prompt that makes "I can't
answer that from the provided sources" a first-class, rewarded output — plus a validate
step that checks the answer is grounded in retrieved spans before returning it. On in-corpus
questions, retrieval recall@3 stayed at 94–100 %, so the guardrail buys safety without
costing coverage.

Why "just prompt better" isn't the lesson

The lesson isn't the prompt — it's that the difference between 50 % and 0 % is invisible
without an eval harness. A demo that only asks in-corpus questions looks perfect in both
configurations. You only see the 50 % when you deliberately ask things the corpus can't
answer and score groundedness. So the blueprint ships with:

retrieval hit-rate (is the answer even retrievable?),
answer groundedness via LLM-as-judge (is the answer supported by what was retrieved?),
latency, and OpenTelemetry traces per agent step.

That's the difference between "it works on my five questions" and "here is the number a
partner can hold me to."

Takeaway

For enterprise RAG, abstention is a feature, not a failure. Make "I don't know" a rewarded
output, validate groundedness before returning, and measure the out-of-corpus rate — it's
the number that separates a demo from something you'd put in front of a customer.

→ Runnable blueprint + eval harness:
github.com/waynehacking8/nim-agent-blueprint

Top comments (1)

Tae Kim • Jun 2

On a RAG I shipped, LLM-as-judge groundedness still leaked a few fabricated quotes. What pinned it at hard zero was a commit-time substring check: the evidence text has to appear verbatim in a retrieved chunk, otherwise drop the claim.