DEV Community

member_2e5ba30f
member_2e5ba30f

Posted on • Originally published at waynehacking8.github.io

0% vs 50%: Making a RAG Agent Refuse to Hallucinate

0 % vs 50 %: making a RAG agent refuse to hallucinate

2026-05-31 · LLM / RAG

A retrieval-augmented agent is only as trustworthy as its behaviour on questions whose answer
isn't in the corpus. The failure mode is quiet: instead of saying "I don't know," the model
invents a confident, well-formed, wrong answer. This post shows a single guardrail that takes
that from common to never — and, crucially, measures it.

Reference architecture:
nim-agent-blueprint — agentic RAG on
the NVIDIA NIM stack with a built-in eval harness.

The ablation

The agent loop is plan → retrieve → generate → validate. The interesting variable is the
generation prompt's contract with the retrieved context:

Configuration Out-of-corpus hallucination rate
Generate freely from context ~50 %
Guarded prompt (answer only from context; otherwise abstain) 0 %

Same model, same retriever, same questions. The only change is a prompt that makes "I can't
answer that from the provided sources" a first-class, rewarded output — plus a validate
step that checks the answer is grounded in retrieved spans before returning it. On in-corpus
questions, retrieval recall@3 stayed at 94–100 %, so the guardrail buys safety without
costing coverage.

Why "just prompt better" isn't the lesson

The lesson isn't the prompt — it's that the difference between 50 % and 0 % is invisible
without an eval harness
. A demo that only asks in-corpus questions looks perfect in both
configurations. You only see the 50 % when you deliberately ask things the corpus can't
answer and score groundedness. So the blueprint ships with:

  • retrieval hit-rate (is the answer even retrievable?),
  • answer groundedness via LLM-as-judge (is the answer supported by what was retrieved?),
  • latency, and OpenTelemetry traces per agent step.

That's the difference between "it works on my five questions" and "here is the number a
partner can hold me to."

Takeaway

For enterprise RAG, abstention is a feature, not a failure. Make "I don't know" a rewarded
output, validate groundedness before returning, and measure the out-of-corpus rate — it's
the number that separates a demo from something you'd put in front of a customer.

→ Runnable blueprint + eval harness:
github.com/waynehacking8/nim-agent-blueprint

Top comments (0)