RAG Explained: Retrieve, Then Answer (the Prompt That Kills Hallucinations)

#ai #beginners #llm #tutorial

An LLM only knows what it saw in training. It doesn't know your company wiki, last week's news, or the PDF you just uploaded. Ask it anyway and it either refuses or, worse, confidently makes something up.

RAG (Retrieval-Augmented Generation) fixes that, and it's far simpler than the name suggests. This is Day 5 of my PromptFromZero series.

RAG in one sentence

Fetch the relevant facts at question time, and hand them to the model to read.

You're not asking the model to remember. You're giving it the page to read.

The three moves

1. Retrieve

Embed the question, find the closest document chunks (vector search), grab the top few:

const hits = await search(question, { k: 3 }); // the 3 most relevant chunks

(The retrieval half is its own topic: embeddings plus a vector database. I built exactly that in TechFromZero Day 45 with Postgres + pgvector.)
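
To make "find the closest chunks" concrete, here is a minimal in-memory sketch. It is not the pgvector setup mentioned above: `embed` is a toy word-count stand-in for a real embedding model, and `searchChunks`, `cosine`, and the sample chunks are names and data I made up for illustration. The ranking idea (score every chunk against the question, keep the top k) is the same.

```javascript
// Toy stand-in for an embedding model: a bag of word counts.
// ASSUMPTION: a real system would call an embedding model here instead.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a, b) {
  let dot = 0;
  for (const [word, n] of a) dot += n * (b.get(word) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, n) => s + n * n, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the question and keep the k best.
function searchChunks(question, chunks, k = 3) {
  const q = embed(question);
  return chunks
    .map((text) => ({ text, score: cosine(q, embed(text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical sample data.
const chunks = [
  "You can request a refund within 30 days of purchase.",
  "Our office is closed on public holidays.",
  "Shipping takes 3 to 5 business days.",
];
const hits = searchChunks("How do I request a refund?", chunks, 2);
console.log(hits[0].text); // the refund chunk ranks first
```

Word counts only match exact words, so "refund" and "refunds" would not match each other. That gap is what real embeddings are for.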

2. Augment — the prompt that does the heavy lifting

This template does most of the work for RAG answer quality (call it 80% as a rule of thumb):

const prompt = `Answer using ONLY the context below.
If the answer isn't there, say "I don't know."

Context:
${hits.map(h => "- " + h.text).join("\n")}

Question: ${question}`;

The words "ONLY the context" matter. Without them, the model tends to blend its own (possibly wrong) memory back in. With them, it is far more likely to stick to the source you gave it.

3. Generate

Send that prompt to the LLM. Done. The answer is now grounded in your documents.
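
Put together, the three moves fit in one small function. This is a sketch under assumptions: `search` and `callLLM` are hypothetical functions you pass in (your vector search and whatever LLM client you use), and `answer` and `buildPrompt` are names I picked for illustration. The prompt is the same template as in step 2.

```javascript
// Build the grounded prompt from the retrieved chunks.
function buildPrompt(question, hits) {
  return `Answer using ONLY the context below.
If the answer isn't there, say "I don't know."

Context:
${hits.map((h) => "- " + h.text).join("\n")}

Question: ${question}`;
}

// search and callLLM are injected, hypothetical functions:
//   search(question, { k }) -> Promise<Array<{ text: string }>>
//   callLLM(prompt)         -> Promise<string>
async function answer(question, { search, callLLM, k = 3 }) {
  const hits = await search(question, { k });  // 1. Retrieve
  const prompt = buildPrompt(question, hits);  // 2. Augment
  const text = await callLLM(prompt);          // 3. Generate
  return { text, hits, prompt };
}

// Usage with stubs, so the flow runs without a database or an API key.
const fakeSearch = async () => [{ text: "Refunds are accepted within 30 days." }];
const fakeLLM = async (prompt) => `(stub) received ${prompt.length} characters`;

answer("What is the refund window?", { search: fakeSearch, callLLM: fakeLLM })
  .then((result) => console.log(result.text));
```

Returning `hits` and `prompt` alongside the answer makes it easy to log what the model actually saw when an answer looks wrong.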

The two knobs

top-k: too small (k=1) and you miss the answer; too big (k=20) and you bury it in noise and pay for the extra tokens. Start at k=3.
chunk size: too big and irrelevant text rides along; too small and meaning is lost. ~300 tokens is a reasonable starting point.
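
For the chunk-size knob, here is a rough sketch of a splitter. It counts whitespace-separated words as a stand-in for tokens, which is an assumption: a real tokenizer will give different counts, so treat 300 here as "about 300". `chunkText` is a name I made up.

```javascript
// Split text into chunks of at most maxWords whitespace-separated words.
// ASSUMPTION: words approximate tokens; a real tokenizer counts differently.
function chunkText(text, maxWords = 300) {
  if (!Number.isInteger(maxWords) || maxWords < 1) {
    throw new RangeError("maxWords must be a positive integer");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// A made-up 700-word document splits into chunks of 300, 300, and 100 words.
const doc = Array.from({ length: 700 }, (_, i) => `word${i}`).join(" ");
const pieces = chunkText(doc, 300);
console.log(pieces.map((p) => p.split(" ").length)); // [ 300, 300, 100 ]
```

This cuts blindly at word counts, so a sentence can end up split across two chunks. Splitting on paragraph or sentence boundaries first is a common refinement.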

Make it refuse and cite

Hallucinations mostly happen when the context doesn't contain the answer but the model answers anyway. Two instructions turn a guesser into a librarian:

"If the context lacks the answer, reply exactly: I don't know."
"Quote the chunk you used."

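Here is one way to wire those two instructions into the prompt and then check the reply in code. It is a sketch, not a guarantee: `buildCitingPrompt` and `checkReply` are hypothetical helpers, and asking for the quote "in double quotes" is my own addition so the check has something to parse. A reply counts as grounded only if every quoted span appears verbatim in a retrieved chunk; anything else is flagged as unverified so you can decide what to do with it.

```javascript
const REFUSAL = "I don't know.";

// Prompt with both instructions: refuse exactly, and quote the source chunk.
function buildCitingPrompt(question, hits) {
  return `Answer using ONLY the context below.
If the context lacks the answer, reply exactly: ${REFUSAL}
Quote the chunk you used, in double quotes.

Context:
${hits.map((h) => "- " + h.text).join("\n")}

Question: ${question}`;
}

// Classify a model reply as "refused", "grounded", or "unverified".
function checkReply(reply, hits) {
  if (reply.trim() === REFUSAL) return "refused";
  // Collect every span the model put in straight double quotes.
  const quotes = [...reply.matchAll(/"([^"]+)"/g)].map((m) => m[1]);
  const grounded =
    quotes.length > 0 &&
    quotes.every((q) => hits.some((h) => h.text.includes(q)));
  return grounded ? "grounded" : "unverified";
}

// Hypothetical retrieved chunk and model replies.
const retrieved = [{ text: "Refunds are accepted within 30 days of purchase." }];
console.log(checkReply("I don't know.", retrieved)); // refused
console.log(
  checkReply('You have 30 days. "Refunds are accepted within 30 days of purchase."', retrieved)
); // grounded
console.log(checkReply("You have 90 days.", retrieved)); // unverified
```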
That's it. Retrieve → Augment → Generate. Pair this prompt half with a vector store (pgvector, Pinecone, Chroma...) and you've built "chat with your docs."

📎 Try the interactive RAG playground — watch retrieval + the prompt + the answer: https://dev48v.infy.uk/prompt/day5-rag-basic.html

Day 5 of PromptFromZero. One prompting technique a day, explained for beginners.
