A RAG (retrieval-augmented generation) assistant is supposed to answer from your documents, not from the model's imagination. But teams keep shipping bots that confidently invent prices, policies, and features that don't exist. The hallucination usually isn't the model "being creative" — it's a retrieval and prompting problem you can fix. Here's the checklist I actually use.
Hallucination is usually a retrieval failure, not a model failure
If the right chunk never makes it into the context, the model fills the gap by guessing. So before blaming the LLM, ask: did retrieval even return the correct passage? Most "the AI lied" bugs are really "we fed it the wrong context" bugs. Fix retrieval first; it's where the biggest wins are.
Chunk for meaning, not for byte count
Splitting documents every N characters cuts sentences in half and scatters one answer across chunks.
- Split on semantic boundaries — headings, sections, list items — not arbitrary lengths.
- Keep chunks focused: one idea per chunk retrieves far more accurately than a wall of mixed topics.
- Add a little overlap so context isn't lost at the seams.
- Attach metadata (source, title, date) to every chunk — you'll need it for filtering and citations.
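To make the bullets above concrete, here is a minimal sketch of heading-based chunking. It assumes your documents use Markdown-style `#` headings; the `Chunk` class, the `chunk_by_heading` function, and the one-sentence overlap are illustrative choices of mine, not a standard API. A `date` field could be attached the same way as `source` and `title`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_by_heading(doc: str, source: str, overlap_sentences: int = 1) -> list[Chunk]:
    """Split a Markdown-style document on headings, carrying a small overlap."""
    # Split just before every line that starts with 1-6 '#' characters.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6}\s)", doc) if s.strip()]
    chunks: list[Chunk] = []
    previous_tail = ""
    for section in sections:
        first_line, _, rest = section.partition("\n")
        has_heading = first_line.startswith("#")
        title = first_line.lstrip("#").strip() if has_heading else ""
        body = rest.strip() if has_heading else section
        # Prepend the tail of the previous section so context survives the seam.
        text = f"{previous_tail} {body}".strip() if previous_tail else body
        chunks.append(Chunk(text=text, metadata={"source": source, "title": title}))
        # Keep the last sentence(s) of this section as overlap for the next one.
        sentences = re.split(r"(?<=[.!?])\s+", body)
        previous_tail = " ".join(sentences[-overlap_sentences:]) if overlap_sentences else ""
    return chunks


doc = (
    "# Pricing\nThe Pro plan costs $20. Billing is monthly.\n"
    "# Refunds\nRefunds are issued within 14 days."
)
for chunk in chunk_by_heading(doc, source="faq.md"):
    print(chunk.metadata["title"], "->", chunk.text)
```

Sentence-level overlap is only one option. If your sections are long, you would likely also want to split inside a section on paragraphs or list items.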
Retrieve better than plain vector search
Pure embedding search often misses exact terms (product codes, names, numbers), because embeddings capture overall meaning rather than literal strings.
- Use hybrid search: combine semantic (vector) with keyword (BM25) so both meaning and exact matches are covered.
- Add a reranker to reorder the top candidates before they hit the prompt. It scores each candidate against the query directly, so the most relevant chunks tend to land first.
- Tune how many chunks you pass. Too few starves the model; too many buries the answer in noise.
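One common way to combine the vector and keyword results is reciprocal rank fusion (RRF). The sketch below assumes you already have two ranked lists of chunk IDs, one from each retriever; the function name, the `k=60` constant (a frequently used default), and the `top_n` cutoff are my assumptions rather than anything a particular library requires.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge several ranked lists of chunk IDs into one fused ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more when it ranks high, and again for each list it appears in.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


vector_hits = ["a", "b", "c"]   # hypothetical IDs from semantic search
keyword_hits = ["c", "a", "d"]  # hypothetical IDs from BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits], top_n=3))
```

`top_n` is the knob from the last bullet: it caps how many chunks go on to the reranker or the prompt. RRF only needs ranks, so you don't have to normalize vector and BM25 scores against each other.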
Make "I don't know" a first-class answer
This single instruction prevents most embarrassing hallucinations: tell the model to answer only from the provided context, and to say it doesn't know when the context doesn't contain the answer.
If the answer is not in the provided context, say you don't have that information — do not guess.
A bot that admits a gap is trustworthy. A bot that confidently makes something up costs you a customer.
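Here is a rough sketch of how that instruction might be wired into a prompt. The `build_grounded_prompt` name, the `source` and `text` keys, and the exact layout are assumptions for illustration; adapt them to whatever your retriever returns and your model expects.

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that restricts the model to the retrieved chunks."""
    if chunks:
        # Number each chunk so the model has something to cite.
        context = "\n\n".join(
            f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
        )
    else:
        # Make an empty retrieval explicit instead of leaving a blank.
        context = "(no context retrieved)"
    return (
        "Answer the question using only the context below.\n"
        "If the answer is not in the provided context, say you don't have "
        "that information. Do not guess.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_grounded_prompt(
    "How much is Pro?",
    [{"source": "pricing.md", "text": "Pro costs $20 per month."}],
))
```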
Demand citations and ground every claim
Have the model cite the source chunk for each statement. Two benefits: users can verify, and you can automatically flag answers where the cited text doesn't actually support the claim. If it can't cite, it shouldn't assert.
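As a very crude illustration of the automatic flagging, the sketch below assumes answers cite chunks with bracketed numbers like `[1]` placed before the sentence's final punctuation. It uses plain word overlap as a stand-in for "the cited text supports the claim"; that is a weak proxy I picked to keep the example dependency-free, and the `flag_unsupported` name and `min_overlap` threshold are hypothetical. In practice you would probably swap in an entailment model or an LLM judge.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
WORD = re.compile(r"[a-z0-9$]+")


def _words(text: str) -> set[str]:
    return set(WORD.findall(text.lower()))


def flag_unsupported(answer: str, chunks: list[str], min_overlap: float = 0.5) -> list[str]:
    """Return sentences with no citation, or whose cited chunks share too few words."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = [int(n) for n in CITATION.findall(sentence)]
        claim = _words(CITATION.sub("", sentence))
        # Collect the words of every validly cited chunk (citations are 1-based).
        support: set[str] = set()
        for n in cited:
            if 1 <= n <= len(chunks):
                support |= _words(chunks[n - 1])
        overlap = len(claim & support) / len(claim) if claim else 0.0
        if not cited or overlap < min_overlap:
            flagged.append(sentence)
    return flagged


chunks = [
    "The Pro plan costs $20 per month.",
    "Refunds are issued within 14 days of purchase.",
]
answer = (
    "The Pro plan costs $20 per month [1]. "
    "Support is available by phone around the clock [2]. "
    "We also ship worldwide."
)
print(flag_unsupported(answer, chunks))
```

In this example the first sentence passes, the second is flagged because chunk 2 says nothing about phone support, and the third is flagged because it has no citation at all.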
Evaluate, don't vibe-check
You can't improve what you don't measure.
- Build a small test set of real questions with known correct answers.
- Track groundedness (is the answer supported by retrieved context?) and retrieval hit rate (did the right chunk show up?).
- Re-run it on every change to chunking, embeddings, or prompts so you catch regressions before users do.
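Retrieval hit rate is the easier of the two metrics to automate. A minimal sketch, assuming each test case records the ID of the chunk that should be retrieved and that `retrieve` is your own function returning ranked chunk IDs (both `EvalCase` and `retrieve` are hypothetical names here):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected_chunk_id: str


def retrieval_hit_rate(
    cases: list[EvalCase],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Fraction of cases where the expected chunk appears in the top-k results."""
    if not cases:
        return 0.0
    hits = sum(1 for case in cases if case.expected_chunk_id in retrieve(case.question)[:k])
    return hits / len(cases)


# A stand-in retriever backed by a dict, just so the example runs on its own.
fake_index = {
    "How much is Pro?": ["pricing-1", "faq-3"],
    "What is the refund window?": ["shipping-2", "faq-3"],
}
cases = [
    EvalCase("How much is Pro?", "pricing-1"),
    EvalCase("What is the refund window?", "refunds-1"),
]
print(retrieval_hit_rate(cases, lambda q: fake_index.get(q, [])))
```

Groundedness is harder to score mechanically and usually needs a judge (a human, an entailment model, or an LLM), so it is left out of this sketch. Running a check like this in CI is one way to get the "re-run on every change" habit for free.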
A reliable default stack
Clean, semantically chunked docs with metadata → hybrid retrieval + reranker → a strict "answer only from context, cite sources, admit unknowns" prompt → an eval set guarding the whole thing. Boring, measurable, and it stops the confident nonsense.
Done right, a RAG assistant becomes the thing customers trust for accurate answers 24/7 instead of a liability. As a freelancer, I build RAG assistants and AI automation that stay grounded in real data; you can see examples at vengstudio.online. Questions about retrieval or grounding? Drop them in the comments.