Stop Sending the Raw User Prompt Straight to Your Retriever

#ai #llm #rag #machinelearning

A user types a question into your RAG system. Before anything is retrieved, something decides what string to actually search with. In a lot of systems, nobody decided that on purpose — the raw prompt goes straight to the retriever and whether it works is left to luck.

What everyone tries

Query rewriting: have an LLM expand or rephrase the question into better search strings. It's standard now, and it genuinely helps with vocabulary gaps — user says "login broken," docs say "authentication failure," a rewrite bridges them.

Why it doesn't fully work

A rewrite that improves retrieval can also quietly change the question. The user asked one thing; the rewritten query retrieves great documents for a slightly different thing; the answer is fluent, grounded, and not what they asked. Recall went up, fidelity went down, and nothing noticed because retrieval metrics improved.

The real issue: there are two languages in the room — the user's task language and the corpus's author language — and the retriever sits on the boundary. Treating it as "just rewrite the query" hides that you're making a translation decision with consequences.

The one shift

Treat the gap between user question and search query as an explicit interface boundary, not a hidden preprocessing step. Decide on purpose: what does the retriever search for, how is user language reconciled with document language, and does the rewritten query still answer what was asked? Make the translation visible and checkable, so "we rewrote the query" doesn't silently become "we answered a different question."

This is the query boundary — one of three places production RAG dies. Full version on my blog; the diagnostic questions for all three boundaries are in a RAG Failure Diagnosis Kit for teams debugging production RAG. Link in the canonical post.

Top comments (3)

Alex Shev • Jun 23

This is a useful framing. Query rewriting is often treated as a magic cleanup step, but the real job is deciding what evidence the system should search for. I like the idea of making retrieval intent explicit before the retriever sees anything.

mofuteq • Jun 23

Exactly — once you frame it as “what evidence should we search for” rather than “clean up the query,” the whole step changes shape.
The part that bit me in production: making the intent explicit and making it faithful are two different jobs. A rewrite/expansion step will happily turn the user’s question into a sharper, well-formed question that isn’t the one they asked — and then the retriever does its job perfectly against the wrong target. Everything downstream looks healthy; the answer just quietly addresses a question nobody asked.

Alex Shev • Jun 23

Yes, that is the dangerous failure mode: every component looks healthy while the system has quietly aimed at the wrong question. I like separating rewriting into two checks: did we make the search target clearer, and did we preserve the user's actual intent?