If you've built an AI-powered documentation assistant, you've probably hit the same wall I did about six months ago. Your RAG pipeline works fine for simple questions, but the moment someone asks something that spans multiple pages — like "how do I set up authentication with custom middleware?" — the answers start falling apart.
I spent weeks tuning chunk sizes, overlap parameters, and reranking models before realizing the problem wasn't my implementation. It was the paradigm.
The Core Problem With RAG for Docs
RAG (Retrieval-Augmented Generation) treats your documentation like a bag of text chunks. You split everything into pieces, embed them into vectors, and retrieve the most semantically similar ones when a question comes in. For a lot of use cases, this works great.
But documentation isn't a bag of text. It's a tree.
Think about how docs are actually structured:
- Pages live in sections
- Sections have a deliberate ordering
- Pages reference other pages
- A "Getting Started" guide assumes you'll read pages in sequence
- API references are organized by resource, not by semantic similarity
When you chunk all of this into 512-token blocks and toss them into a vector database, you lose the structural relationships that make documentation navigable. The LLM gets fragments without context — like ripping pages out of a manual and shuffling them.
Here's what goes wrong concretely:
- Lost hierarchy: A chunk about `config.auth.provider` loses its relationship to the parent "Configuration" section
- Broken cross-references: "See the section above" means nothing when there's no "above"
- Missed multi-page answers: Questions that require synthesizing info from related pages only get fragments from one
- Redundant retrieval: You pull the same content repeatedly because multiple chunks score similarly
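The "lost hierarchy" failure is easy to demonstrate. Here's a minimal sketch (the document text and heading are invented for illustration): a naive fixed-size chunker, standing in for a 512-token splitter, drops the parent heading from every chunk after the first.

```python
def chunk_by_words(text: str, chunk_size: int = 30) -> list[str]:
    """Naive fixed-size chunker, standing in for a 512-token splitter."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

doc = (
    "# Configuration\n"
    "General settings live in config.yaml. "
    + "Filler sentence about unrelated options. " * 10
    + "The config.auth.provider key selects the OAuth backend."
)

chunks = chunk_by_words(doc, chunk_size=30)

# The chunk mentioning config.auth.provider no longer carries its
# parent "# Configuration" heading: the hierarchy is gone.
hit = [c for c in chunks if "config.auth.provider" in c][0]
assert "# Configuration" not in hit
```

The retriever can still find that chunk by similarity, but the LLM reading it has no way to know it belongs under "Configuration".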
The Virtual Filesystem Approach
The idea is deceptively simple: instead of embedding chunks, represent your documentation as a filesystem that the LLM can navigate. Give the model a directory listing and let it decide which files to read — just like a developer would browse docs.
# Instead of: query -> vector search -> chunks -> LLM
# Try: query -> filesystem map -> LLM picks files -> LLM reads files -> answer
doc_tree = {
    "getting-started/": {
        "_meta": {"title": "Getting Started", "order": 1},
        "installation.md": {"summary": "Install via npm/pip, system requirements"},
        "quickstart.md": {"summary": "Build your first app in 5 minutes"},
        "configuration.md": {"summary": "Config file options, env vars, auth setup"},
    },
    "api-reference/": {
        "_meta": {"title": "API Reference", "order": 2},
        "authentication.md": {"summary": "API keys, OAuth, token refresh"},
        "endpoints/": {
            "users.md": {"summary": "CRUD operations for user resources"},
            "projects.md": {"summary": "Project management endpoints"},
        },
    },
}
The key insight: you're trading vector similarity search for the LLM's own judgment about what's relevant. And honestly? LLMs are surprisingly good at navigating file trees when you give them decent summaries.
Building It Step by Step
Step 1: Generate the File Map
First, parse your documentation into a tree structure with short summaries for each node. These summaries are critical — they're what the LLM uses to decide which files to open.
import os
from pathlib import Path

def build_doc_tree(docs_dir: str) -> dict:
    """Walk the docs directory and build a navigable tree."""
    tree = {}
    for root, dirs, files in os.walk(docs_dir):
        rel_path = os.path.relpath(root, docs_dir)
        current = tree
        if rel_path != ".":
            for part in Path(rel_path).parts:
                current = current.setdefault(part + "/", {})
        for fname in sorted(files):
            if not fname.endswith(".md"):
                continue
            filepath = os.path.join(root, fname)
            with open(filepath, encoding="utf-8") as f:
                content = f.read()
            # Generate a 1-2 sentence summary
            # (use an LLM for this at build time; it's a one-time cost)
            summary = generate_summary(content)
            current[fname] = {
                "summary": summary,
                "path": filepath,
                "tokens": len(content.split()),  # rough token estimate
            }
    return tree
The summaries are the secret sauce here. Spend time making them good. I typically run each page through a smaller model with a prompt like: "Summarize this documentation page in one sentence, focusing on what specific topics and features it covers."
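If you want a zero-cost placeholder to get the pipeline running before wiring up that model call, here's a crude heuristic `generate_summary` (first heading plus the first body sentence). This is an assumption-laden stand-in, not a substitute for LLM-written summaries:

```python
import re

def generate_summary(content: str, max_len: int = 160) -> str:
    """Heuristic stand-in for an LLM-written summary:
    first markdown heading plus the first body sentence."""
    lines = [l.strip() for l in content.splitlines() if l.strip()]
    title = next((l.lstrip("# ") for l in lines if l.startswith("#")), "")
    body = " ".join(l for l in lines if not l.startswith("#"))
    first_sentence = re.split(r"(?<=[.!?])\s+", body)[0] if body else ""
    summary = f"{title}: {first_sentence}" if title else first_sentence
    return summary[:max_len]
```

Swap in the real summarizer once the navigation loop works end to end; the tree-building code doesn't care where summaries come from.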
Step 2: Let the LLM Navigate
This is where it gets interesting. Instead of one retrieval step, you give the LLM the tree and let it request files in a tool-use loop.
def answer_question(question: str, doc_tree: dict, llm_client) -> str:
    system_prompt = """You are a documentation assistant. You have access to
    a virtual filesystem of documentation. Use the available tools to:
    1. List directories to see what's available
    2. Read specific files to find answers
    3. Answer the user's question based on what you've read

    Always check the most relevant directories first. You can read
    multiple files if needed."""

    tools = [
        {
            "name": "list_directory",
            "description": "List contents of a directory with summaries",
            "parameters": {"path": "string"},
        },
        {
            "name": "read_file",
            "description": "Read the full contents of a documentation file",
            "parameters": {"path": "string"},
        },
    ]

    # Seed the conversation with the root directory listing
    initial_context = format_directory(doc_tree, "/")

    # Run the tool-use loop — the LLM decides what to read
    return llm_client.run_agent_loop(
        system=system_prompt,
        user_message=f"Directory listing:\n{initial_context}\n\nQuestion: {question}",
        tools=tools,
        tool_handlers={
            "list_directory": lambda p: format_directory(doc_tree, p),
            "read_file": lambda p: read_doc_file(doc_tree, p),
        },
    )
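The loop above leans on two helpers the snippet doesn't define. Here's one possible sketch, assuming the tree shape from Step 1 (directory keys end in `/`, leaves carry `summary` and `path`):

```python
from pathlib import Path

def resolve(tree: dict, path: str) -> dict:
    """Walk a '/'-separated path down the nested doc_tree."""
    node = tree
    for part in [p for p in path.strip("/").split("/") if p]:
        node = node.get(part + "/") or node[part]
    return node

def format_directory(tree: dict, path: str) -> str:
    """Render one directory level as a listing the LLM can read."""
    node = resolve(tree, path)
    lines = []
    for name, child in node.items():
        if name == "_meta":
            continue
        if name.endswith("/"):
            title = child.get("_meta", {}).get("title", name)
            lines.append(f"[dir]  {name}  ({title})")
        else:
            lines.append(f"[file] {name}  - {child.get('summary', '')}")
    return "\n".join(lines)

def read_doc_file(tree: dict, path: str) -> str:
    """Read the on-disk file behind a tree leaf."""
    node = resolve(tree, path)
    return Path(node["path"]).read_text(encoding="utf-8")
```

Listing one level at a time matters: it keeps each tool result small, and it forces the model to narrow down by section before spending tokens on file contents.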
Step 3: Add a Hybrid Fallback
Pure filesystem navigation can miss things. If the LLM's answer seems uncertain or the question is very specific (like searching for an exact config key), fall back to a traditional search.
def hybrid_answer(question: str, doc_tree: dict, search_index, llm_client):
    # Try filesystem navigation first. (This assumes the agent loop returns
    # a response object exposing a .confidence score, not a bare string.)
    result = answer_question(question, doc_tree, llm_client)

    # If confidence is low, supplement with keyword search
    if result.confidence < 0.7:
        # Simple BM25 or even grep works here — you don't need vectors
        search_hits = search_index.search(question, top_k=3)
        supplemental = "\n".join(hit.content for hit in search_hits)

        # Re-answer with additional context
        result = llm_client.complete(
            f"Based on the docs you browsed AND this additional context:"
            f"\n{supplemental}\n\nRevise your answer to: {question}"
        )
    return result
Notice I used BM25 (keyword search) for the fallback, not vector search. For documentation — where users often search for exact function names, config keys, or error messages — keyword matching frequently outperforms semantic similarity.
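If you'd rather not pull in a search dependency at all, a workable Okapi BM25 ranker fits in a few lines. This is a textbook sketch (whitespace tokenization, standard `k1`/`b` defaults), not a tuned engine:

```python
import math
from collections import Counter

def bm25_rank(query: str, docs: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Rank document indices by Okapi BM25 score for a keyword query."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)

    # document frequency per term
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            )
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

Exact-token matching is exactly what you want here: a query for `config.auth.provider` scores the one page that contains that literal key, which embedding similarity can easily dilute.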
When This Works (and When It Doesn't)
This approach shines when:
- Your docs have clear hierarchical structure
- Questions often require context from multiple related pages
- The documentation set is moderate-sized (under ~1000 pages)
- Users ask conceptual "how do I" questions
It struggles when:
- Your docs are flat with no meaningful structure
- The doc set is enormous (the file map itself becomes too large for context)
- Questions are hyper-specific keyword lookups (use search for these)
- You need sub-second response times (the multi-turn navigation adds latency)
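One cheap way to check the "file map too large" failure mode before committing to the approach: estimate the rendered map's token cost with the same rough words-as-tokens heuristic used in Step 1 (a real tokenizer will give somewhat different numbers):

```python
def estimate_map_tokens(tree: dict) -> int:
    """Rough token estimate for a rendered file map: count words in
    every file name, directory name, title, and summary."""
    total = 0
    for name, child in tree.items():
        total += len(name.split())
        if name == "_meta":
            total += len(str(child.get("title", "")).split())
        elif name.endswith("/"):
            total += estimate_map_tokens(child)
        else:
            total += len(child.get("summary", "").split())
    return total
```

If the estimate approaches a meaningful fraction of your model's context window, either shard the map by top-level section or fall back to search-first retrieval.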
Performance Observations
After running both approaches side-by-side on the same doc set for a few weeks, here's what I noticed:
- Multi-page questions: The filesystem approach produced noticeably better answers — it could pull from 3-4 related pages naturally
- Latency: Slower on average (2-4 LLM calls vs 1), but the hybrid approach kept simple questions fast
- Token usage: Higher per query, but fewer retries from users asking follow-ups because the first answer was incomplete
- Maintenance: Way simpler than tuning chunking strategies and reranking pipelines
Prevention Tips
If you're still early in building a doc assistant, save yourself some pain:
- Start with your doc structure, not your embedding model. If your docs are poorly organized, neither RAG nor filesystem navigation will save you. Fix the information architecture first.
- Generate summaries at build time. Don't try to summarize on the fly. Pre-compute good summaries for every page and directory, and refresh them when content changes.
- Keep a search index as backup. BM25 is trivially cheap to maintain alongside the filesystem approach. Use it for exact-match queries.
- Measure what users actually ask. I was surprised how many questions were multi-page — if yours aren't, vanilla RAG might be fine.
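"Refresh them when content changes" is easy to get wrong by hand; a content-hash cache makes it mechanical. A minimal sketch (the `summarize` callable is whatever summarizer you use, and the JSON cache file is an illustrative choice):

```python
import hashlib
import json
from pathlib import Path

def cached_summary(filepath: str, cache_path: str, summarize) -> str:
    """Return a cached summary, regenerating only when the
    file's content hash changes."""
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    content = Path(filepath).read_text(encoding="utf-8")
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()

    entry = cache.get(filepath)
    if entry and entry["hash"] == digest:
        return entry["summary"]   # unchanged: reuse the cached summary
    summary = summarize(content)  # new or changed: regenerate
    cache[filepath] = {"hash": digest, "summary": summary}
    cache_file.write_text(json.dumps(cache))
    return summary
```

Run this in the same build step that produces the tree, and a docs edit only costs you one summarization call per changed page.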
The filesystem approach isn't a silver bullet. But if you've been fighting chunking strategies and your answers still feel like they're missing context, it's worth trying a fundamentally different retrieval model. Sometimes the best search is no search at all — just let the LLM browse.