Aman Pandey
I got tired of writing 30 lines of LangChain boilerplate every time. So I published a fix.

Every time I started a new project that needed RAG, I wrote the same 30 lines.

Load documents. Split them. Embed them. Store them. Build a retriever. Wire up a prompt template. Build a chain. Handle the response format. Add reranking later when results were bad. Add GraphRAG even later when cross-document queries failed. Add a watchdog when the index went stale.

Every single project. From scratch. Every time.

I got tired of it. So I built ragbox-core and published it to PyPI.

pip install ragbox-core
from ragbox import RAGBox

rag = RAGBox("./docs")
print(rag.query("What is the vacation policy?"))

3 lines. Everything else runs automatically.


What "automatically" actually means

When you point RAGBox at a folder, here's what runs without you touching it:

Document parsing — PDFs, text files, PowerPoints, Python files with AST parsing. It figures out the file type and routes accordingly.
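RAGBox's actual dispatch code isn't shown in this post, but extension-based routing is a small pattern worth seeing. A minimal sketch with stdlib only; the parser functions here are placeholders, not the library's real handlers:

```python
from pathlib import Path

# Placeholder parsers -- stand-ins for real PDF/PPTX/AST handlers.
def parse_pdf(p: Path) -> str: return f"pdf:{p.name}"
def parse_text(p: Path) -> str: return f"text:{p.name}"
def parse_pptx(p: Path) -> str: return f"pptx:{p.name}"
def parse_python(p: Path) -> str: return f"ast:{p.name}"

PARSERS = {
    ".pdf": parse_pdf,
    ".txt": parse_text,
    ".md": parse_text,
    ".pptx": parse_pptx,
    ".py": parse_python,
}

def route(path: str) -> str:
    """Pick a parser by file extension; fall back to plain text."""
    p = Path(path)
    return PARSERS.get(p.suffix.lower(), parse_text)(p)
```

The fallback matters: an unknown extension degrades to plain-text parsing instead of raising.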

Chunking — late chunking with context awareness, not naive 1000-token splits. The chunk boundary problem is real and most tutorials ignore it.
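I won't reproduce RAGBox's chunker here, but the boundary problem itself is easy to demonstrate: a fixed-size split cuts mid-sentence, while even a simple boundary-respecting chunker keeps whole units. A toy sketch, with character budgets standing in for token budgets:

```python
import re

def naive_chunks(text: str, size: int) -> list[str]:
    """Fixed-size split: happily cuts sentences in half."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def boundary_chunks(text: str, size: int) -> list[str]:
    """Greedy packing of whole sentences up to a size budget."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > size:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Run both on the same paragraph and the naive version's first chunk ends mid-word; the boundary version never does.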

Embedding + FAISS indexing — Sentence-BERT embeddings, FAISS ANN index, TTL-cached so repeat queries hit cache instead of re-embedding.

Knowledge graph construction — the non-obvious one. RAGBox runs entity extraction on every document using an LLM, builds a Leiden-clustered knowledge graph, and persists it. This is what makes cross-document queries work.
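RAGBox uses an LLM for entity extraction and Leiden for clustering; neither is reproduced here. But the data structure underneath — entities as nodes, co-occurrence within a document as edges — can be sketched with the stdlib (the entity lists are assumed to come from the LLM step):

```python
from itertools import combinations
from collections import defaultdict

def build_graph(doc_entities: dict[str, list[str]]) -> dict[str, set[str]]:
    """Adjacency map: entities that co-occur in a document get an edge."""
    graph = defaultdict(set)
    for entities in doc_entities.values():
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Entities per document, as an LLM extraction step might return them.
docs = {
    "org_chart.md": ["Maria Santos", "James Wu"],
    "q4_report.md": ["James Wu", "Q4 revenue"],
}
graph = build_graph(docs)
```

Note that "Maria Santos" and "Q4 revenue" never share a document, so cross-document questions about them require traversing through "James Wu" — exactly what vector search alone can't do.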

Dual-mode routing — a simple factual query takes the fast path: skip the graph, answer in ~12ms. A complex relationship or multi-hop query takes the deep path: graph traversal, cross-encoder reranking, multi-query expansion.
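The post doesn't document RAGBox's routing heuristic, but the shape of a keyword-based router is simple to sketch; the trigger phrases below are my guesses, not the library's:

```python
RELATIONAL_CUES = (
    "relationship", "report to", "responsible for",
    "caused", "relate", "both", "who does",
)

def pick_path(query: str) -> str:
    """Route simple lookups to the fast path, relational/multi-hop to deep."""
    q = query.lower()
    if any(cue in q for cue in RELATIONAL_CUES):
        return "deep"   # graph traversal + reranking + query expansion
    return "fast"       # vector search only, skip the graph
```

A production router could also use query length, an entity count, or a small classifier; the point is only that the decision gates an expensive pipeline.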

Self-healing watchdog — background process watches the source folder. File changes? Re-chunks, re-embeds, updates the graph. Index never goes stale.


The thing that actually makes cross-document reasoning work

Most RAG tutorials give you vector search. Vector search is great for factual lookups. It fails on questions like:

  • "Who does Maria Santos report to?" — requires connecting two documents
  • "What caused the Q4 revenue miss and who was responsible?" — requires 3+ documents
  • "How did the infrastructure outage relate to the deployment decision?" — requires causal reasoning across docs

Vector search retrieves the most semantically similar chunks. It doesn't reason about relationships between entities across documents. GraphRAG does.
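The difference is concrete: answering "who does X report to?" via a graph is a path search, not a similarity ranking. A toy breadth-first traversal over an entity graph (hand-built here; RAGBox builds its graph from LLM extraction):

```python
from collections import deque

def find_path(graph: dict[str, list[str]], start: str, goal: str):
    """BFS: shortest chain of entities connecting start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hand-built toy graph spanning two "documents".
graph = {
    "Maria Santos": ["Engineering"],             # from the org chart doc
    "Engineering": ["Maria Santos", "James Wu"],
    "James Wu": ["Engineering", "Q4 revenue"],   # from the Q4 report doc
}
```

No single chunk contains both endpoints, so no similarity score can surface the answer — but a three-hop path through the graph can.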

Here's the honest benchmark result:

Relationship Questions (Cross-Document)

"Who does Maria Santos report to?"
  RAGBox:  0.767
  Vanilla: 0.959   ← vanilla wins here

"Which executive is responsible for both security and compliance?"
  RAGBox:  0.836
  Vanilla: 0.819   ← RAGBox wins here

Multi-Hop Questions (3+ Documents)

"Relationship between deployment strategy and the SEV1 incident?"
  RAGBox:  0.000
  Vanilla: 0.802   ← vanilla wins badly

"Plan to grow from $185M to $250M ARR?"
  RAGBox:  0.614
  Vanilla: 0.609   ← effectively tied

I published these results, including the ones where RAGBox loses badly, because if you're deciding whether to use a library, you need real numbers, not cherry-picked wins.

Honest summary: vanilla ChromaDB beats RAGBox on simple factual lookups and some multi-hop queries where graph extraction fails. RAGBox wins when the answer genuinely requires connecting entities across documents. Know what you're optimizing for.


The decisions that weren't obvious

Why Cross-Encoder reranking?

Bi-encoder similarity is fast but blunt — it scores query-document similarity in embedding space. Cross-encoders read the query and document together and produce a fine-grained relevance score. Slower, but dramatically more precise.

RAGBox uses bi-encoder for retrieval speed and ms-marco Cross-Encoder for reranking the top-k results. Wrong results at 5ms are worse than right results at 12ms.
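The two-stage shape is retrieve-wide-then-rerank-narrow. A structural sketch with fake scorers — a real system would use a bi-encoder for stage 1 and an ms-marco cross-encoder from sentence-transformers for stage 2:

```python
def retrieve(query: str, docs: list[str], k: int, fast_score) -> list[str]:
    """Stage 1: cheap bi-encoder-style scoring over the whole corpus."""
    return sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:k]

def rerank(query: str, candidates: list[str], slow_score) -> list[str]:
    """Stage 2: expensive cross-encoder-style scoring over top-k only."""
    return sorted(candidates, key=lambda d: slow_score(query, d), reverse=True)

# Fake scorers: word overlap for stage 1, exact-phrase bonus for stage 2.
def fast_score(q: str, d: str) -> int:
    return len(set(q.lower().split()) & set(d.lower().split()))

def slow_score(q: str, d: str) -> int:
    return fast_score(q, d) + (10 if q.lower() in d.lower() else 0)

docs = [
    "Vacation policy: 20 days per year.",
    "The vacation policy changed in 2024: now 25 days.",
    "Parking policy: permits required.",
]
top = rerank(
    "vacation policy",
    retrieve("vacation policy", docs, k=2, fast_score=fast_score),
    slow_score,
)
```

The expensive scorer only ever sees k candidates, which is why the pattern scales: cost of stage 2 is bounded regardless of corpus size.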

Why Leiden instead of Louvain?

Leiden guarantees well-connected communities. Louvain can generate disconnected communities in practice. For document knowledge graphs, this shows up in multi-hop queries where the traversal path matters.

Why not just wrap LangChain?

I tried. When something goes wrong in a LangChain chain, the traceback is useless. RAGBox is a direct implementation — every component is inspectable, every failure has a clear source.

Why publish the comparison table that includes where you lose?

Because I'm a library user too. The COMPARISON.md in the repo has the full side-by-side including where LlamaIndex or LangChain is the right call. Use the right tool.


When to use this vs. when not to

Use RAGBox if:

  • You want a working RAG system today, not after three days of wiring LangChain
  • You need cross-document reasoning without building GraphRAG from scratch
  • You're building internal tools, prototypes, or MVPs
  • You want honest benchmarks you can reproduce yourself

Don't use RAGBox if:

  • You need custom retrieval pipelines with specific SLAs
  • You're building a commercial product and need to control every component
  • Your queries are purely simple factual lookups — vanilla vector search will be faster

Reproduce the benchmarks yourself

git clone https://github.com/ixchio/ragbox-core
cd ragbox-core
export GROQ_API_KEY="gsk_..."   # free tier works
python benchmarks/run_benchmark.py

15 questions across 8 interconnected documents. 5 factual, 5 relationship, 5 multi-hop. Scored with sentence-transformer cosine similarity. Real LLM calls, no mocks.
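The benchmark's scoring metric is cosine similarity between the answer embedding and a reference embedding. The formula itself fits in a few lines; the vectors below are toys, where the real benchmark uses sentence-transformer embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """cos(theta) = dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A score of 1.0 means the answer embedding points the same direction as the reference; the 0.000 multi-hop result above means the generated answer was essentially orthogonal to the expected one.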

If you get different results, open an issue. I want to know.


pip install ragbox-core

github.com/ixchio/ragbox-core

pypi.org/project/ragbox-core

MIT license. PRs welcome. If it saves you the boilerplate, give it a star.

Top comments (2)

klement Gunndu

The dual-mode routing between simple factual queries and multi-hop is clever — that 12ms fast path makes a real difference in UX. Worth noting the late chunking approach pairs well with reranking when your corpus has inconsistent document lengths.

Aman Pandey

Exactly right. The fast path exists specifically because GraphRAG adds ~1.5s on simple queries, where it's overkill. And you nailed the chunking-reranking relationship: inconsistent doc lengths are where naive fixed-size chunking breaks hardest, and the reranker catches what slips through.
What corpus are you working with?