“RAG” (retrieval-augmented generation) sounds fancy, but in practice it’s just this:
- You have a pile of docs.
- You want the model to answer questions using those docs.
- You want it to stop hallucinating.
Most “ask your docs” projects fail for boring reasons:
- bad chunking
- missing citations
- unclear answer policy
- no fallback when docs don’t cover it
Here’s the checklist I use when building small internal doc assistants.
## The mental model: search first, answer second
A basic RAG loop is:
- Retrieve: find relevant chunks (search)
- Compose: assemble a context pack (top-k snippets)
- Answer: prompt the model with those snippets
- Guardrails: require citations + “I don’t know” behavior
The model is the last step, not the first.
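Here's what that looks like as a minimal sketch. `embed()` and `llm()` are placeholders for whatever embedding and chat APIs you use; `top_k()` and `build_context_pack()` are defined in the implementation section further down.

```python
# Sketch of the full loop. embed() and llm() are placeholders for your
# embedding and chat APIs; top_k() and build_context_pack() are defined
# in the implementation section below.
def answer(question, chunks):
    query_emb = embed(question)                # 1. Retrieve: embed the query...
    retrieved = top_k(query_emb, chunks, k=5)  #    ...and find the closest chunks
    context = build_context_pack(retrieved)    # 2. Compose: assemble the context pack
    prompt = (                                 # 3. Answer + 4. Guardrails in the prompt
        "You are answering using ONLY the sources below.\n"
        f"{context}\n"
        "RULES:\n"
        "- If the answer is not in SOURCES, say: \"I don't know based on the provided docs.\"\n"
        f"QUESTION:\n{question}"
    )
    return llm(prompt)
```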
## Chunking checklist (aka: where quality is won)
Chunking is not a theoretical exercise. It’s an engineering tradeoff.
My rules of thumb:
- Chunk by meaning, not by token count (headings > arbitrary splits)
- Target 200–500 tokens per chunk for dense docs
- Add overlap (10–20%) for continuity
- Store metadata: doc title, section heading, URL, last updated
- Prefer stable IDs so you can re-index safely
If you chunk badly, retrieval returns nonsense and the model does what it always does: improvises.
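Here's a rough sketch of heading-aware chunking with overlap. Everything in it is illustrative: the regex, the field names, and the word count standing in for a real token count.

```python
import re

def chunk_markdown(doc_text, title, url, last_updated=None,
                   max_tokens=400, overlap=0.15):
    # Split on markdown headings first, then window long sections with
    # overlap. Word count is a crude stand-in for token count here; swap
    # in a real tokenizer for anything serious.
    chunks = []
    step = max(1, int(max_tokens * (1 - overlap)))
    for section in re.split(r"\n(?=#{1,3} )", doc_text):
        lines = section.splitlines()
        heading = lines[0].lstrip("# ") if lines and lines[0].startswith("#") else ""
        words = section.split()
        for start in range(0, len(words), step):
            chunks.append({
                "id": f"{url}#{heading}:{start}",  # stable-ish ID for safe re-indexing
                "title": title,
                "heading": heading,
                "url": url,
                "last_updated": last_updated,
                "text": " ".join(words[start:start + max_tokens]),
            })
    return chunks
```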
## The “context pack” pattern (copy/paste)
When you feed chunks to the model, don’t just dump them.
Create a context pack that looks like a mini knowledge base.
```
You are answering using ONLY the sources below.

SOURCES:
[1] (title: <...>, url: <...>)
<chunk text>

[2] (title: <...>, url: <...>)
<chunk text>

RULES:
- If the answer is not in SOURCES, say: "I don't know based on the provided docs."
- Cite sources like [1], [2] next to the relevant sentences.
- Be concise and practical.

QUESTION:
<user question>
```
This single change to the prompt format eliminates a surprising amount of hallucination: the model has explicit sources to cite and an explicit out when they don’t cover the question.
## Answer policy: force “I don’t know” when needed
If you don’t explicitly allow the model to say “I don’t know”, it will try to be helpful by making things up.
Add a hard rule:
- “If not in sources: say you don’t know.”
And (important) test it with questions you know aren’t covered.
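One cheap way to do that: keep a small list of questions you know the docs don't cover and assert the refusal fires. The sketch below reuses the hypothetical `answer()` loop from earlier; the questions are examples you'd replace with gaps in *your* docs.

```python
# Sketch: regression-test the refusal behavior. answer() is the
# hypothetical RAG loop from the sketch above.
UNCOVERED = [
    "What is the refund policy for enterprise contracts?",
    "How do I deploy this to a mainframe?",
]

def test_refusals(chunks):
    for q in UNCOVERED:
        response = answer(q, chunks)
        assert "I don't know" in response, f"guardrail failed for: {q!r}"
```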
## A tiny implementation (Python) you can build on
Here’s a minimal Python sketch using embeddings + cosine similarity. Swap in your favorite vector DB later.
```python
# rag.py
import numpy as np

def cosine(a, b):
    # Cosine similarity, with a small epsilon to avoid dividing by zero.
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    return float(np.dot(a, b))

def top_k(query_emb, chunks, k=5):
    # Score every chunk against the query embedding, highest first.
    scored = []
    for c in chunks:
        scored.append((cosine(query_emb, c["emb"]), c))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_context_pack(retrieved):
    # Number each source so the model can cite [1], [2], ...
    out = ["SOURCES:"]
    for i, c in enumerate(retrieved, 1):
        out.append(f"[{i}] (title: {c['title']}, url: {c['url']})\n{c['text']}\n")
    return "\n".join(out)
```
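Wiring it together looks roughly like this. Again, `embed()` is a stand-in for your embedding API, and the example chunk is made up:

```python
# Usage sketch: embed chunks at index time, query at answer time.
text = "Rotate API keys every 90 days from the dashboard."
chunks = [
    {"title": "Auth guide", "url": "https://docs.example.com/auth",
     "text": text, "emb": embed(text)},
    # ...one entry per chunk
]

retrieved = top_k(embed("How often should I rotate API keys?"), chunks)
print(build_context_pack(retrieved))
```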
The secret sauce isn’t the math. It’s the prompt policy + chunk quality.
## Debugging retrieval: 3 quick sanity checks
When answers are bad, I run these checks:
1. Print retrieved chunk titles for each query. Are you getting the right doc? If not, your embeddings/search are off.
2. Show the query + retrieved text. Are the chunks semantically related or just keyword matches?
3. Log the “no answer” rate. If the bot never says “I don’t know”, your guardrails aren’t working.
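All three checks are easy to automate. Here's a sketch that covers them, reusing the hypothetical `embed()` and `answer()` helpers from the earlier sketches:

```python
# Sketch: run a fixed query set, log what retrieval returns, and track
# how often the bot refuses. Reuses embed()/answer() from earlier sketches.
def sanity_check(questions, chunks):
    refusals = 0
    for q in questions:
        retrieved = top_k(embed(q), chunks, k=5)
        print(f"Q: {q}")
        for c in retrieved:
            # Checks 1 + 2: right doc, and semantically related text?
            print(f"  -> {c['title']}: {c['text'][:80]}")
        if "I don't know" in answer(q, chunks):
            refusals += 1
    # Check 3: a 0% no-answer rate usually means guardrails aren't firing.
    print(f"no-answer rate: {refusals / max(len(questions), 1):.0%}")
```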
## A prompt for “doc-aware” answers (copy/paste)
```
You are a documentation assistant for developers.

You MUST follow these rules:
- Use ONLY the SOURCES provided.
- If SOURCES do not contain the answer, say: "I don't know based on the provided docs."
- Cite sources as [1], [2] after each relevant sentence.

Write the answer as:
- TL;DR (2–3 bullets)
- Details (short paragraphs)
- Next steps (checklist)

SOURCES:
<insert context pack>

QUESTION:
<insert question>
```
This format yields answers people can actually act on.
## Common pitfalls (seen in the wild)
- Too many chunks: the model drowns and starts ignoring sources.
- No citations: you can’t trust or debug the system.
- Stale docs: retrieval works, but answers are wrong.
- Mixing policies: “use only sources” + “also answer from general knowledge” = confusion.
Pick a policy and enforce it.
## Want more workflow templates?
I share the prompt patterns I use for real dev workflows in my Prompt Engineering Cheatsheet.
- Free sample: https://getnovapress.gumroad.com/l/prompt-sample
- Full shop (Cheatsheet $9+): https://getnovapress.gumroad.com