
Nova

RAG Without the Buzzwords: A Practical “Ask Your Docs” Checklist

“RAG” (retrieval-augmented generation) sounds fancy, but in practice it’s just this:

  • You have a pile of docs.
  • You want the model to answer questions using those docs.
  • You want it to stop hallucinating.

Most “ask your docs” projects fail for boring reasons:

  • bad chunking
  • missing citations
  • unclear answer policy
  • no fallback when docs don’t cover it

Here’s the checklist I use when building small internal doc assistants.


The mental model: search first, answer second

A basic RAG loop is:

  1. Retrieve: find relevant chunks (search)
  2. Compose: assemble a context pack (top-k snippets)
  3. Answer: prompt the model with those snippets
  4. Guardrails: require citations + “I don’t know” behavior

The model is the last step, not the first.
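The four steps above can be sketched as one function. Here `embed`, `search`, and `llm` are hypothetical stand-ins for your embedding model, vector index, and chat model; this is a skeleton, not a production pipeline.

```python
def answer_question(question, embed, search, llm, k=5):
    # 1. Retrieve: find relevant chunks via embedding search.
    query_emb = embed(question)
    chunks = search(query_emb, k=k)

    # 4. Guardrail (applied early): refuse if retrieval found nothing.
    if not chunks:
        return "I don't know based on the provided docs."

    # 2. Compose: assemble a context pack from the top-k snippets.
    sources = "\n".join(
        f"[{i}] {c['title']}\n{c['text']}" for i, c in enumerate(chunks, 1)
    )

    # 3. Answer: prompt the model with those snippets plus the rules.
    prompt = (
        "Answer using ONLY the SOURCES below. "
        "If the answer is not there, say you don't know.\n\n"
        f"SOURCES:\n{sources}\n\nQUESTION:\n{question}"
    )
    return llm(prompt)
```

Note the guardrail runs before the model is ever called: the model is the last step, not the first.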


Chunking checklist (aka: where quality is won)

Chunking is not a theoretical exercise. It’s an engineering tradeoff.

My rules of thumb:

  • Chunk by meaning, not by token count (headings > arbitrary splits)
  • Target 200–500 tokens per chunk for dense docs
  • Add overlap (10–20%) for continuity
  • Store metadata: doc title, section heading, URL, last updated
  • Prefer stable IDs so you can re-index safely

If you chunk badly, retrieval returns nonsense and the model does what it always does: improvises.
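To make "chunk by meaning" concrete, here's a minimal heading-based splitter, assuming your docs are markdown. It's a sketch: real docs also need the size caps and overlap from the rules above, but it shows the metadata each chunk should carry.

```python
import re

def chunk_by_headings(markdown_text, doc_title, url):
    """Split a markdown doc on headings; each chunk keeps its metadata.

    Sketch only: long sections still need size caps and overlap."""
    chunks, buf = [], []
    heading = doc_title  # text before the first heading falls under the doc title

    def flush():
        text = "\n".join(buf).strip()
        if text:
            chunks.append({"title": doc_title, "heading": heading,
                           "url": url, "text": text})

    for line in markdown_text.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            flush()                 # close out the previous section
            heading = m.group(1)    # start a new chunk under this heading
            buf = []
        else:
            buf.append(line)
    flush()
    return chunks
```

Each returned chunk has `title`, `heading`, `url`, and `text`, so citations and re-indexing stay cheap later.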


The “context pack” pattern (copy/paste)

When you feed chunks to the model, don’t just dump them.

Create a context pack that looks like a mini knowledge base.

You are answering using ONLY the sources below.

SOURCES:
[1] (title: <...>, url: <...>)
<chunk text>

[2] (title: <...>, url: <...>)
<chunk text>

RULES:
- If the answer is not in SOURCES, say: "I don't know based on the provided docs." 
- Cite sources like [1], [2] next to the relevant sentences.
- Be concise and practical.

QUESTION:
<user question>
Enter fullscreen mode Exit fullscreen mode

This single prompt format change eliminates a shocking amount of hallucination.


Answer policy: force “I don’t know” when needed

If you don’t explicitly allow the model to say “I don’t know”, it will try to be helpful by making things up.

Add a hard rule:

  • “If not in sources: say you don’t know.”

And (important) test it with questions you know aren’t covered.
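One way to automate that test: keep a small list of questions you know the docs don't cover and measure how often the bot refuses. This assumes an `ask(question)` function wrapping your pipeline (a hypothetical name), and the questions below are made-up examples.

```python
REFUSAL = "I don't know based on the provided docs."

# Questions deliberately NOT covered by the docs (hypothetical examples).
UNCOVERED = [
    "What is the CEO's favorite color?",
    "How do I deploy this to a Commodore 64?",
]

def guardrail_rate(ask):
    """Fraction of uncovered questions the bot correctly refuses (1.0 = all)."""
    refused = sum(1 for q in UNCOVERED if REFUSAL in ask(q))
    return refused / len(UNCOVERED)
```

If this rate ever drops below 1.0, the model is improvising and your answer policy needs tightening.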


A tiny implementation (Python) you can build on

Here’s a minimal Python sketch using embeddings + cosine similarity. Swap in your favorite vector DB later.

# rag.py
import numpy as np

def cosine(a, b):
    # Normalize both vectors (the epsilon guards against zero-length
    # vectors), then the dot product is the cosine similarity.
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    return float(np.dot(a, b))

def top_k(query_emb, chunks, k=5):
    # Score every chunk against the query embedding, highest first.
    scored = [(cosine(query_emb, c["emb"]), c) for c in chunks]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_context_pack(retrieved):
    # Number each chunk so the model can cite it as [1], [2], ...
    out = ["SOURCES:"]
    for i, c in enumerate(retrieved, 1):
        out.append(f"[{i}] (title: {c['title']}, url: {c['url']})\n{c['text']}\n")
    return "\n".join(out)
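Wiring the pieces together looks like this. The rag.py functions are repeated so the snippet runs standalone, and the hand-written 2-D "embeddings" stand in for a real embedding model.

```python
import numpy as np

# Definitions from rag.py, repeated so this snippet runs standalone.
def cosine(a, b):
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    return float(np.dot(a, b))

def top_k(query_emb, chunks, k=5):
    scored = [(cosine(query_emb, c["emb"]), c) for c in chunks]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_context_pack(retrieved):
    out = ["SOURCES:"]
    for i, c in enumerate(retrieved, 1):
        out.append(f"[{i}] (title: {c['title']}, url: {c['url']})\n{c['text']}\n")
    return "\n".join(out)

# Toy corpus; the 2-D vectors stand in for real embeddings.
chunks = [
    {"title": "Auth guide", "url": "https://example.com/auth",
     "text": "Tokens expire after 24 hours.", "emb": np.array([1.0, 0.0])},
    {"title": "Billing", "url": "https://example.com/billing",
     "text": "Invoices are sent monthly.", "emb": np.array([0.0, 1.0])},
]

query_emb = np.array([0.9, 0.1])  # pretend this came from embedding the query
pack = build_context_pack(top_k(query_emb, chunks, k=1))
print(pack)
```

The printed context pack drops straight into the prompt template above, between `SOURCES:` and `RULES:`.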

The secret sauce isn’t the math. It’s the prompt policy + chunk quality.


Debugging retrieval: 3 quick sanity checks

When answers are bad, I run these checks:

  1. Print retrieved chunk titles for each query
    • Are you getting the right doc? If not, your embeddings/search are off.
  2. Show the query + retrieved text
    • Are the chunks semantically related or just keyword matches?
  3. Log “no answer” rate
    • If the bot never says “I don’t know”, your guardrails aren’t working.
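The three checks translate into a few lines of logging. This sketch assumes `retrieve(query)` and `ask(query)` are your own pipeline functions (hypothetical names):

```python
def debug_query(query, retrieve, ask, refusal="I don't know"):
    """Run the three sanity checks for one query and return a small report."""
    retrieved = retrieve(query)
    report = {
        "query": query,
        # Check 1: are the right docs coming back?
        "titles": [c["title"] for c in retrieved],
        # Check 2: eyeball query vs retrieved text for semantic relatedness.
        "previews": [c["text"][:80] for c in retrieved],
        # Check 3: did the bot refuse? Track this rate over many queries.
        "refused": refusal in ask(query),
    }
    return report
```

Log these reports over a day of real queries; the "no answer" rate in check 3 should be nonzero if your guardrails work at all.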

A prompt for “doc-aware” answers (copy/paste)

You are a documentation assistant for developers.

You MUST follow these rules:
- Use ONLY the SOURCES provided.
- If SOURCES do not contain the answer, say: "I don't know based on the provided docs." 
- Cite sources as [1], [2] after each relevant sentence.

Write the answer as:
- TL;DR (2–3 bullets)
- Details (short paragraphs)
- Next steps (checklist)

SOURCES:
<insert context pack>

QUESTION:
<insert question>

This format yields answers people can actually act on.


Common pitfalls (seen in the wild)

  • Too many chunks: the model drowns and ignores sources.
  • No citations: you can’t trust or debug the system.
  • Stale docs: retrieval works, but answers are wrong.
  • Mixing policies: “use only sources” + “also answer from general knowledge” = confusion.

Pick a policy and enforce it.


Want more workflow templates?

I share the prompt patterns I use for real dev workflows in my Prompt Engineering Cheatsheet.
