When working on integrations (Stripe, APIs, SDKs, etc.), I kept running into the same problem.
You search something simple like:
“how to generate an API key”
And then:
- you open 3–5 documentation pages
- each page explains a different piece
- code examples are there… but not clearly tied to what you need
- you end up stitching everything together manually
Even with AI tools, things aren't much better:
- answers are often generic
- sometimes not grounded in the actual docs
- or missing key implementation details
After going through this over and over again, I decided to build something for it.
🚀 Introducing DocsRAG
DocsRAG is an open-source, local-first platform that turns public documentation into a grounded reasoning layer.
Instead of treating docs as just text to search, it tries to understand their structure and answer questions like an engineer would:
- explanation first
- then relevant code examples
- backed by actual documentation
- with citations
👉 Repo: https://github.com/Ando22/rag-docs
💡 The idea
Most documentation is written for browsing, not for implementation-time reasoning.
But when you’re coding, you don’t want:
- long explanations
- full pages
- or unrelated examples
You want:
👉 “What exactly do I need to do?”
DocsRAG tries to bridge that gap.
😵 The problem
From my experience (and probably yours too):
- Docs are large and fragmented
- Useful answers are spread across multiple pages
- Code examples and explanations are not tightly connected
- AI tools hallucinate when context is weak
- Docs chatbots return “related” answers, not precise ones
And most RAG systems treat everything the same:
- explanation
- reference
- examples
Which leads to… mediocre answers.
🛠️ The approach
DocsRAG is built as a multi-stage pipeline, not just “retrieve and prompt”.
Ingestion
- Crawl public documentation
- Extract structured sections
- Separate explanation chunks from code examples
- Keep them linked by section/page
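The chunk-separation step above can be sketched in a few lines. This is a minimal illustration (not DocsRAG's actual code, and the `(kind, text)` block format is an assumption for the sketch): explanation chunks and code chunks are stored separately but stay linked through the section they came from.

```python
# Minimal sketch of the ingestion idea (not DocsRAG's actual code):
# explanation chunks and code chunks live in separate lists, linked
# by the section they belong to.
from dataclasses import dataclass, field

@dataclass
class SectionChunks:
    section: str
    explanations: list[str] = field(default_factory=list)
    code_examples: list[str] = field(default_factory=list)

def split_blocks(blocks: list[tuple[str, str]]) -> dict[str, SectionChunks]:
    """blocks: (kind, text) pairs, where kind is 'heading', 'text', or 'code'."""
    sections: dict[str, SectionChunks] = {}
    current = "intro"  # chunks before the first heading
    for kind, text in blocks:
        if kind == "heading":
            current = text
            continue
        chunk = sections.setdefault(current, SectionChunks(section=current))
        if kind == "code":
            chunk.code_examples.append(text)
        else:
            chunk.explanations.append(text)
    return sections
```

Because each chunk keeps its section, a retrieved explanation can always pull in the code examples that sit next to it in the docs.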
Ask flow
- Analyze intent
- Retrieve explanation-first
- Rerank results
- Attach only relevant code examples
- Validate whether the docs actually support the answer
- Generate grounded response with citations
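The ask flow above can be sketched as a short pipeline. Everything here is a simplified stand-in (word overlap instead of vector retrieval and reranking, and no LLM generation step), just to show the shape:

```python
# Toy sketch of the ask flow's shape; every helper is a simplified
# stand-in, not a real DocsRAG component.
def _overlap(question: str, text: str) -> int:
    return len(set(question.lower().split()) & set(text.lower().split()))

def retrieve(question: str, chunks: list[dict], k: int = 3) -> list[dict]:
    # stand-in for vector retrieval + reranking: lexical overlap score
    return sorted(chunks, key=lambda c: -_overlap(question, c["text"]))[:k]

def supported(question: str, hits: list[dict]) -> bool:
    # validation gate: refuse to answer when nothing actually matches
    return any(_overlap(question, h["text"]) > 0 for h in hits)

def answer(question: str, chunks: list[dict]) -> dict:
    hits = retrieve(question, chunks)
    if not supported(question, hits):
        return {"answer": None, "code": [], "citations": []}
    return {
        "answer": hits[0]["text"],                           # explanation first
        "code": [h["code"] for h in hits if h.get("code")],  # linked examples only
        "citations": [h["source"] for h in hits],            # ground the answer
    }
```

The key detail is the validation gate: when the retrieved docs don't actually support the question, the pipeline declines instead of generating a confident-sounding guess.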
🧠 High-level architecture
Roughly: crawler → extraction → chunking (explanation vs code) → vector store + metadata store → retrieval → reranking → validation → grounded answer.
⚙️ Tech stack
- FastAPI (backend)
- Next.js + React (frontend)
- Chroma (vector DB)
- SQLite (metadata)
- Trafilatura + BeautifulSoup (extraction)
- OpenAI-compatible models (BYO provider)
- Typer CLI
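The storage split in the stack is worth a note: the vector DB only needs chunk ids and embeddings for retrieval, while SQLite keeps the chunk ↔ page/section links. A minimal sketch of that division (sqlite3 from the stdlib; a plain dict stands in for Chroma, and the schema is an illustration, not the project's actual one):

```python
import sqlite3

# Metadata side: SQLite stores which page/section each chunk belongs to,
# so a retrieved explanation can be joined back to its code examples.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        id      TEXT PRIMARY KEY,
        page    TEXT NOT NULL,
        section TEXT NOT NULL,
        kind    TEXT CHECK (kind IN ('explanation', 'code'))
    )
""")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?)",
    [
        ("c1", "auth.html", "API keys", "explanation"),
        ("c2", "auth.html", "API keys", "code"),
    ],
)

# Vector side: only ids + embeddings (a dict stands in for Chroma here).
vector_db = {"c1": [0.1, 0.9]}

def code_for(chunk_id: str) -> list[str]:
    """Given a retrieved explanation chunk, find its linked code chunks."""
    page_section = conn.execute(
        "SELECT page, section FROM chunks WHERE id = ?", (chunk_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT id FROM chunks WHERE page = ? AND section = ? AND kind = 'code'",
        page_section,
    ).fetchall()
    return [r[0] for r in rows]
```

Keeping linkage in SQLite means the vector index stays small and the "attach only relevant code examples" step is a cheap relational lookup.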
🔥 Why I think this matters (especially now)
With AI-assisted coding (or what people call “vibe coding”), we move faster.
But:
- docs are still the source of truth
- AI answers are not always reliable
DocsRAG tries to combine both:
👉 fast iteration + grounded answers
🧪 Current state
This is still early, but the core loop already works:
- ingest public docs
- ask questions
- get explanation-first answers
- see citations
- attach relevant code examples
🤝 Looking for contributors
If this resonates with you, I’d love contributions.
Interesting areas:
- better parsing (docs are messy 😅)
- retrieval improvements
- code example ranking
- UI/UX improvements
- evaluation / benchmarking
- MCP / agent integrations
Even small contributions (docs, testing, feedback) are super helpful.
🎯 Goal
The long-term goal is simple:
👉 Make documentation actually usable during coding
Not just readable — but actionable.
🙌 Closing
This started from a very personal frustration: constantly jumping between docs while coding.
If you've experienced the same thing, I'd love to hear your thoughts.