Self-RAG: Let the Model Decide When to Retrieve, Then Grade Itself

#ai #llm #rag #beginners

Plain RAG runs a retrieval step for every query, even one like "what's 17×23?" that needs no documents. Self-RAG has the model decide when to retrieve, grade the documents it gets back, and grade its own answer, looping if the answer falls short.

🪞 Interactive demo: https://dev48v.infy.uk/prompt/day11-self-rag.html

The reflection tokens

Self-RAG trains the model to emit short self-assessments, called reflection tokens, alongside its output, and your pipeline branches on them:

1. Retrieve? — does this even need external facts?

if (decide(q) === "NO_RETRIEVE") return llm(q);  // skip the search entirely

Math/reasoning → skip. Private facts, recent events → retrieve. Adaptive, not blanket.
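To make the routing concrete, here is a minimal sketch of the retrieve-or-skip branch. In Self-RAG the decision comes from the model itself; the keyword heuristic inside `decide` is only a stand-in so the snippet runs without a model, and `answer`, `llm`, and `search` are hypothetical names for this illustration.

```javascript
// Stand-in for the model's "Retrieve?" judgment (assumption: a real
// system would ask the LLM instead of matching keywords).
function decide(q) {
  const looksLikeMath = /\d+\s*[-+*\/×]\s*\d+/.test(q);
  const needsPrivateOrRecentFacts =
    /\b(latest|today|yesterday|this week|our|my)\b/i.test(q);
  if (needsPrivateOrRecentFacts) return "RETRIEVE";
  if (looksLikeMath) return "NO_RETRIEVE";
  return "RETRIEVE"; // when unsure, fall back to retrieving
}

// `llm` and `search` are injected so the routing can be tried with stubs.
function answer(q, { llm, search }) {
  if (decide(q) === "NO_RETRIEVE") return llm(q); // skip the search entirely
  const docs = search(q);
  return llm(q, docs);
}
```

Defaulting to `RETRIEVE` when neither rule fires is a design choice in this sketch: an unnecessary search costs some latency, while a skipped one risks an ungrounded answer.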

2. IsRelevant? — grade each retrieved doc, drop the off-topic ones:

const useful = docs.filter(d => isRelevant(q, d) === "yes");
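A runnable sketch of that filter follows. The `isRelevant` here is a hypothetical stand-in that checks word overlap; a real pipeline would ask the model to grade each document. The two sample documents are made up for the illustration.

```javascript
// Stand-in grader (assumption: a real system asks the model
// "is this doc relevant to q?" instead of comparing words).
function isRelevant(q, doc) {
  const terms = q.toLowerCase().match(/[a-z0-9]{4,}/g) || [];
  const text = doc.toLowerCase();
  return terms.some(t => text.includes(t)) ? "yes" : "no";
}

// Keep only the documents graded "yes" for this question.
function filterRelevant(q, docs) {
  return docs.filter(d => isRelevant(q, d) === "yes");
}

// Made-up documents, purely for illustration.
const sampleDocs = [
  "Refunds are issued within 14 days of purchase.",
  "Our office dog is named Biscuit.",
];
const keptDocs = filterRelevant("How long do refunds take?", sampleDocs);
console.log(keptDocs.length); // 1: only the refund document survives
```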

3. IsSupported? — after generating, check the answer against the docs:

const supported = grade(answer, docs);  // FULLY | PARTIAL | NO

This catches hallucinations — confident claims with no source.
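One way to picture the three grades is to split the answer into claims and count how many the documents back up. The sketch below does that with a crude word-containment check; `isBacked` is a hypothetical stand-in for a model call, and the claim splitting on sentence punctuation is an assumption made for this example, not how Self-RAG itself segments output.

```javascript
// Stand-in check (assumption: a real grader would ask the model whether
// the docs support the claim; this only tests that every word appears).
function isBacked(claim, docs) {
  const words = claim.toLowerCase().match(/[a-z0-9]+/g) || [];
  if (words.length === 0) return false;
  return docs.some(d => {
    const text = d.toLowerCase();
    return words.every(w => text.includes(w));
  });
}

// FULLY: every claim is backed. PARTIAL: some are. NO: none are.
function grade(answer, docs) {
  const claims = (answer.match(/[^.!?]+/g) || []).filter(c => c.trim() !== "");
  const backed = claims.filter(c => isBacked(c, docs)).length;
  if (claims.length > 0 && backed === claims.length) return "FULLY";
  return backed > 0 ? "PARTIAL" : "NO";
}
```

With a single made-up source such as "Refunds are issued within 14 days of purchase.", an answer that adds "They are paid in cash." grades as `PARTIAL`, because the second sentence has nothing behind it.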

4. Loop: if the answer is not fully supported, or does not actually answer the question, retrieve more documents, rewrite the query, or regenerate instead of shipping it.
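Put together, the four steps above can be sketched as a single loop. Everything passed in through the second argument (`decide`, `search`, `isRelevant`, `generate`, `grade`, `rewrite`) is a hypothetical stand-in for a model or search call, and the `maxRounds` cap is an assumption added here so the loop always terminates.

```javascript
// A minimal Self-RAG-style control loop over injected, hypothetical helpers.
function selfRag(q, deps, maxRounds = 3) {
  const { decide, search, isRelevant, generate, grade, rewrite } = deps;

  // Step 1: skip retrieval when the question does not need it.
  if (decide(q) === "NO_RETRIEVE") {
    return { answer: generate(q, []), support: "N/A", rounds: 0 };
  }

  let query = q;
  let last = { answer: null, support: "NO", rounds: 0 };
  for (let round = 1; round <= maxRounds; round++) {
    // Step 2: retrieve, then drop off-topic documents.
    const docs = search(query).filter(d => isRelevant(q, d) === "yes");
    // Step 3: generate, then grade the answer against the kept documents.
    const answer = generate(q, docs);
    const support = grade(answer, docs);
    last = { answer, support, rounds: round };
    if (support === "FULLY") return last;
    // Step 4: not good enough, so try a rewritten query next round.
    query = rewrite(query, answer);
  }
  // Budget exhausted: return the grade so the caller can refuse to ship it.
  return last;
}
```

Returning the `support` grade alongside the answer, rather than throwing, leaves the refusal policy to the caller, for example showing "I couldn't find a source for that" whenever the grade is not `FULLY`.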

Why it matters

The result is a RAG pipeline that adapts retrieval to the question and refuses to hand you a confident, unsupported answer. Play with the demo — pick a question and watch the reflection tokens decide.
