Why “Please Don’t Make Recommendations” Is Not a Guardrail for RAG

#ai #llm #rag #machinelearning

You built a system to surface information so a person could decide. Somewhere it started deciding for them — the output stopped saying "here's what the documents show" and started saying "you should do X." Nobody designed that drift. An LLM, when asked a question, produces an answer-shaped thing, and an answer easily becomes a verdict.

What everyone tries

The usual fix is a prompt instruction: "Don't make recommendations." or "Only state what's in the documents." People add the line and assume the boundary is enforced.

Why it doesn't work

A prompt instruction is a request, not a guardrail. The model follows it most of the time, then on the input that matters produces a confident recommendation anyway, because nothing structurally prevents it. "Please don't make recommendations" is to a guardrail what a sticky note saying "please don't enter" is to a locked door.

And the stakes are higher than they look. When output drifts from evidence to verdict, accountability moves. As long as the system returns evidence and a human decides, the human owns the decision. The moment the system returns a verdict and the human defers, the system is deciding things it was never validated to decide, and when one of those decisions is wrong, no one is accountable for it. High-stakes fields separate evidence extraction from judgment on purpose; most RAG systems erase that line by default.

The one shift

Decide what the output is and enforce it structurally. An output should declare itself: answer, evidence, missing facts, or out-of-scope. "Return decision material, not a decision" has to live in the output contract and in gates — not in a polite request to the model. The system supplies frames; the human supplies verdicts.
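
To make "output contract and gates" concrete, here is a minimal sketch in standard-library Python. Everything named in it is an assumption for illustration: the `OutputKind` values mirror the four kinds listed above, while `DecisionMaterial`, `gate`, `BoundaryViolation`, and the short verdict-phrase list are hypothetical and not taken from any particular framework. The point is only that the boundary is checked by code after generation, not requested in the prompt.

```python
import re
from dataclasses import dataclass
from enum import Enum


class OutputKind(Enum):
    """The four things an output may declare itself to be."""
    ANSWER = "answer"
    EVIDENCE = "evidence"
    MISSING_FACTS = "missing_facts"
    OUT_OF_SCOPE = "out_of_scope"


@dataclass(frozen=True)
class DecisionMaterial:
    """Hypothetical output contract: named slots, no free-form verdict field."""
    kind: OutputKind
    evidence_quotes: tuple[str, ...] = ()
    missing_facts: tuple[str, ...] = ()


class BoundaryViolation(ValueError):
    """Raised when an output fails the contract and must not reach the user."""


# Assumption: a deliberately small, illustrative list of verdict phrasings.
# A real deployment would need its own list or a stronger classifier.
VERDICT_PATTERN = re.compile(
    r"\b(you should|you must|we recommend|i recommend|the best option)\b",
    re.IGNORECASE,
)


def gate(output: DecisionMaterial, source_text: str) -> DecisionMaterial:
    """Return the output unchanged if it passes every check, else raise."""
    needs_quotes = output.kind in (OutputKind.ANSWER, OutputKind.EVIDENCE)
    if needs_quotes and not output.evidence_quotes:
        raise BoundaryViolation(f"{output.kind.value} output carries no evidence")
    if output.kind is OutputKind.MISSING_FACTS and not output.missing_facts:
        raise BoundaryViolation("missing_facts output lists no missing facts")

    for quote in output.evidence_quotes:
        # Evidence must be verbatim from the retrieved text, not paraphrase.
        if quote not in source_text:
            raise BoundaryViolation(f"quote not found in source: {quote!r}")
        # Crude check that no verdict language sits in the evidence slot.
        if VERDICT_PATTERN.search(quote):
            raise BoundaryViolation(f"verdict language in evidence: {quote!r}")
    return output
```

A caller would parse the model's structured response into `DecisionMaterial` and pass it through `gate` together with the retrieved text; anything that raises `BoundaryViolation` is retried or dropped instead of being shown. The phrase check is intentionally blunt and will flag source documents that themselves contain wording like "you should", so treat it as a starting point.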

This is the output boundary — one of three places production RAG dies.

Read the full version on my blog, where this connects to the RAG Failure Diagnosis Kit for teams debugging production RAG.

Top comments (2)

Tae Kim • Jul 3

The sticky-note-vs-locked-door framing is exactly right. The structural fix we landed on was defining the output schema before writing any prompts: each field was named for what it represents — "evidence_quotes," "coverage_gaps," "out_of_scope_flag" — so the model had to populate those slots rather than produce free-form text where a verdict could hide. The output boundary lives in the Pydantic model, not in a polite instruction, and a post-generation check validates that no verdict language appears in the evidence slot.

mofuteq • Jul 3

Exactly. Once the output contract has named slots like evidence_quotes, coverage_gaps, and out_of_scope_flag, there are fewer places for a conclusion to hide.

Prompts can still guide behavior, but the schema and post-generation checks are where the boundary becomes enforceable. That is the difference between asking a model to return evidence and requiring the system to return it.