“RAG” (retrieval-augmented generation) sounds fancy, but in practice it’s just this:
- You have a pile of docs.
- You want the model to answer questions using those docs.
- You want it to stop hallucinating.
Most “ask your docs” projects fail for boring reasons:
- bad chunking
- missing citations
- unclear answer policy
- no fallback when docs don’t cover it
Here’s the checklist I use when building small internal doc assistants.
## The mental model: search first, answer second
A basic RAG loop is:
- Retrieve: find relevant chunks (search)
- Compose: assemble a context pack (top-k snippets)
- Answer: prompt the model with those snippets
- Guardrails: require citations + “I don’t know” behavior
The model is the last step, not the first.
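Here's what that looks like as a minimal sketch. `embed()` and `llm()` are placeholders for whatever embedding and chat APIs you use; `top_k()` and `build_context_pack()` are defined in the implementation section further down.

```python
# Sketch of the full loop. embed() and llm() are placeholders for your
# embedding and chat APIs; top_k() and build_context_pack() are defined
# in the implementation section below.
def answer(question, chunks):
    query_emb = embed(question)                # 1. Retrieve: embed the query...
    retrieved = top_k(query_emb, chunks, k=5)  #    ...and find the closest chunks
    context = build_context_pack(retrieved)    # 2. Compose: assemble the context pack
    prompt = (                                 # 3. Answer + 4. Guardrails in the prompt
        "You are answering using ONLY the sources below.\n"
        f"{context}\n"
        "RULES:\n"
        "- If the answer is not in SOURCES, say: \"I don't know based on the provided docs.\"\n"
        f"QUESTION:\n{question}"
    )
    return llm(prompt)
```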
## Chunking checklist (aka: where quality is won)
Chunking is not a theoretical exercise. It’s an engineering tradeoff.
My rules of thumb:
- Chunk by meaning, not by token count (headings > arbitrary splits)
- Target 200–500 tokens per chunk for dense docs
- Add overlap (10–20%) for continuity
- Store metadata: doc title, section heading, URL, last updated
- Prefer stable IDs so you can re-index safely
If you chunk badly, retrieval returns nonsense and the model does what it always does: improvises.
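Here's a rough sketch of heading-aware chunking with overlap. Everything in it is illustrative: the regex, the field names, and the word count standing in for a real token count.

```python
import re

def chunk_markdown(doc_text, title, url, last_updated=None,
                   max_tokens=400, overlap=0.15):
    # Split on markdown headings first, then window long sections with
    # overlap. Word count is a crude stand-in for token count here; swap
    # in a real tokenizer for anything serious.
    chunks = []
    step = max(1, int(max_tokens * (1 - overlap)))
    for section in re.split(r"\n(?=#{1,3} )", doc_text):
        lines = section.splitlines()
        heading = lines[0].lstrip("# ") if lines and lines[0].startswith("#") else ""
        words = section.split()
        for start in range(0, len(words), step):
            chunks.append({
                "id": f"{url}#{heading}:{start}",  # stable-ish ID for safe re-indexing
                "title": title,
                "heading": heading,
                "url": url,
                "last_updated": last_updated,
                "text": " ".join(words[start:start + max_tokens]),
            })
    return chunks
```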
## The “context pack” pattern (copy/paste)
When you feed chunks to the model, don’t just dump them.
Create a context pack that looks like a mini knowledge base.
```
You are answering using ONLY the sources below.

SOURCES:
[1] (title: <...>, url: <...>)
<chunk text>

[2] (title: <...>, url: <...>)
<chunk text>

RULES:
- If the answer is not in SOURCES, say: "I don't know based on the provided docs."
- Cite sources like [1], [2] next to the relevant sentences.
- Be concise and practical.

QUESTION:
<user question>
```
This single change to the prompt format eliminates a surprising amount of hallucination: the model has explicit sources to cite and an explicit out when they don’t cover the question.
## Answer policy: force “I don’t know” when needed
If you don’t explicitly allow the model to say “I don’t know”, it will try to be helpful by making things up.
Add a hard rule:
- “If not in sources: say you don’t know.”
And (important) test it with questions you know aren’t covered.
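One cheap way to do that: keep a small list of questions you know the docs don't cover and assert the refusal fires. The sketch below reuses the hypothetical `answer()` loop from earlier; the questions are examples you'd replace with gaps in *your* docs.

```python
# Sketch: regression-test the refusal behavior. answer() is the
# hypothetical RAG loop from the sketch above.
UNCOVERED = [
    "What is the refund policy for enterprise contracts?",
    "How do I deploy this to a mainframe?",
]

def test_refusals(chunks):
    for q in UNCOVERED:
        response = answer(q, chunks)
        assert "I don't know" in response, f"guardrail failed for: {q!r}"
```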
## A tiny implementation (Python) you can build on
Here’s a minimal Python sketch using embeddings + cosine similarity. Swap in your favorite vector DB later.
```python
# rag.py
import numpy as np

def cosine(a, b):
    # Cosine similarity, with a small epsilon to avoid dividing by zero.
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    return float(np.dot(a, b))

def top_k(query_emb, chunks, k=5):
    # Score every chunk against the query embedding, highest first.
    scored = []
    for c in chunks:
        scored.append((cosine(query_emb, c["emb"]), c))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_context_pack(retrieved):
    # Number each source so the model can cite [1], [2], ...
    out = ["SOURCES:"]
    for i, c in enumerate(retrieved, 1):
        out.append(f"[{i}] (title: {c['title']}, url: {c['url']})\n{c['text']}\n")
    return "\n".join(out)
```
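Wiring it together looks roughly like this. Again, `embed()` is a stand-in for your embedding API, and the example chunk is made up:

```python
# Usage sketch: embed chunks at index time, query at answer time.
text = "Rotate API keys every 90 days from the dashboard."
chunks = [
    {"title": "Auth guide", "url": "https://docs.example.com/auth",
     "text": text, "emb": embed(text)},
    # ...one entry per chunk
]

retrieved = top_k(embed("How often should I rotate API keys?"), chunks)
print(build_context_pack(retrieved))
```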
The secret sauce isn’t the math. It’s the prompt policy + chunk quality.
## Debugging retrieval: 3 quick sanity checks
When answers are bad, I run these checks:
1. Print retrieved chunk titles for each query. Are you getting the right doc? If not, your embeddings/search are off.
2. Show the query + retrieved text. Are the chunks semantically related or just keyword matches?
3. Log the “no answer” rate. If the bot never says “I don’t know”, your guardrails aren’t working.
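All three checks are easy to automate. Here's a sketch that covers them, reusing the hypothetical `embed()` and `answer()` helpers from the earlier sketches:

```python
# Sketch: run a fixed query set, log what retrieval returns, and track
# how often the bot refuses. Reuses embed()/answer() from earlier sketches.
def sanity_check(questions, chunks):
    refusals = 0
    for q in questions:
        retrieved = top_k(embed(q), chunks, k=5)
        print(f"Q: {q}")
        for c in retrieved:
            # Checks 1 + 2: right doc, and semantically related text?
            print(f"  -> {c['title']}: {c['text'][:80]}")
        if "I don't know" in answer(q, chunks):
            refusals += 1
    # Check 3: a 0% no-answer rate usually means guardrails aren't firing.
    print(f"no-answer rate: {refusals / max(len(questions), 1):.0%}")
```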
## A prompt for “doc-aware” answers (copy/paste)
```
You are a documentation assistant for developers.

You MUST follow these rules:
- Use ONLY the SOURCES provided.
- If SOURCES do not contain the answer, say: "I don't know based on the provided docs."
- Cite sources as [1], [2] after each relevant sentence.

Write the answer as:
- TL;DR (2–3 bullets)
- Details (short paragraphs)
- Next steps (checklist)

SOURCES:
<insert context pack>

QUESTION:
<insert question>
```
This format yields answers people can actually act on.
## Common pitfalls (seen in the wild)
- Too many chunks: the model drowns and starts ignoring sources.
- No citations: you can’t trust or debug the system.
- Stale docs: retrieval works, but answers are wrong.
- Mixing policies: “use only sources” + “also answer from general knowledge” = confusion.
Pick a policy and enforce it.
## Want more workflow templates?
I share the prompt patterns I use for real dev workflows in my Prompt Engineering Cheatsheet.
- Free sample: https://getnovapress.gumroad.com/l/prompt-sample
- Full shop (Cheatsheet $9+): https://getnovapress.gumroad.com