RAG SOTA: I Tested 7 Pipelines and Built SEQUOIA (Open Source)
After 20+ hours of compute time on local hardware, I benchmarked 7 RAG configurations against real-world tasks. The results surprised me — and changed how I think about retrieval architecture.
Why This Matters
RAG is everywhere in 2026. Everyone claims their pipeline is "SOTA," but most benchmarks use toy datasets. I wanted to see what actually works when you have:
- Messy real documents (not clean academic corpora)
- A local LLM (slightly weaker than GPT-4)
- Production constraints (latency, cost, accuracy tradeoffs)
The 7 Configurations Tested
| Method | Approach | Verdict |
|---|---|---|
| No-RAG | Direct LLM generation | Baseline |
| Classical RAG | Dense retrieval (BGE-small + FAISS) | Poor |
| Hybrid RAG | BM25 + Dense + RRF fusion + cross-encoder reranker | Moderate |
| LightRAG | Key-value extraction graph + dense hybrid | Disappointing |
| PageIndex | Two-stage hierarchical retrieval | Okay |
| GraphRAG | Entity graph + dense fallback | Complex |
| Agentic RAG | Multi-step reasoning pipeline | Slow, expensive |
| SEQUOIA | RAPTOR tree + step-back prompting | Best |
| SEQUOIA Pro | Multi-query + rerank + compression | SOTA |
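The "RRF fusion" step in the Hybrid RAG row is reciprocal rank fusion, which merges the BM25 and dense rankings using only each document's rank position. Below is a minimal sketch of it, not the code from the repo: the function name, the document ids, and the `k=60` constant (a commonly used default) are my assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a BM25 retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers rank highly float to the top, and no score normalization between BM25 and cosine similarity is needed. A cross-encoder reranker would then rescore the fused list.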
What Surprised Me
LightRAG underperformed
The Twitter-hyped "graph RAG revolution" didn't hold up on real data. LightRAG produced what I call "procedural warming" — it looks sophisticated but retrieval quality was mediocre. Academic benchmarks ≠ production reality.
Step-back prompting is underrated
Most RAG systems fail because they retrieve on the literal query. Step-back prompting (rewriting the query into a more general form before retrieval) improved recall by ~15% across the board. Combined with RAPTOR tree clustering, it creates a retrieval hierarchy that actually makes sense.
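Here is a rough sketch of how step-back retrieval could be wired up. The prompt wording, the `llm` and `retrieve` callables, and the choice to retrieve on both the original and the generalized query are my assumptions for illustration, not necessarily how SEQUOIA does it.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt; tune the wording for your own model.
STEP_BACK_TEMPLATE = (
    "Rewrite the question below as a more general question about the "
    "underlying principle or concept. Return only the rewritten question.\n\n"
    "Question: {question}"
)

def step_back_retrieve(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str, int], List[str]],
    top_k: int = 5,
) -> Tuple[str, List[str]]:
    """Generalize the question, then retrieve on both forms and merge."""
    general = llm(STEP_BACK_TEMPLATE.format(question=question)).strip()
    seen, merged = set(), []
    for query in (question, general):
        for doc in retrieve(query, top_k):
            if doc not in seen:  # keep first occurrence, drop duplicates
                seen.add(doc)
                merged.append(doc)
    return general, merged

# Stubs standing in for a real LLM and a real retriever.
def fake_llm(prompt: str) -> str:
    return "How are wire transfer fees determined?"

index = {
    "What is the fee for a SWIFT transfer over 10,000 EUR?": ["fee_table"],
    "How are wire transfer fees determined?": ["fee_policy", "fee_table"],
}

def fake_retrieve(query: str, top_k: int) -> List[str]:
    return index.get(query, [])[:top_k]

general, docs = step_back_retrieve(
    "What is the fee for a SWIFT transfer over 10,000 EUR?",
    fake_llm,
    fake_retrieve,
)
print(general)
print(docs)  # ['fee_table', 'fee_policy']
```

The generalized query pulls in the policy document that the literal query misses, which is the recall gain step-back is after.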
Local LLMs can evaluate
I used a local model for summarization and judging. Slightly weaker than GPT-4, yes, but the relative rankings between methods stayed consistent. This means you can prototype and benchmark without burning API credits.
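One way to check that "relative rankings stayed consistent" is a rank correlation between the scores two judges assign to the same methods. The sketch below computes Kendall's tau (the simple tau-a variant, which ignores ties). The method names and scores are made-up placeholders, not numbers from my benchmark.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two judges scoring the same set of methods.

    +1 means identical ordering, -1 means fully reversed ordering.
    """
    methods = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        sign = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / pairs

# Placeholder scores for illustration only.
local_judge = {"method_a": 0.5, "method_b": 0.6, "method_c": 0.7, "method_d": 0.8}
other_judge = {"method_a": 0.4, "method_b": 0.7, "method_c": 0.6, "method_d": 0.9}

tau = kendall_tau(local_judge, other_judge)
print(round(tau, 3))  # 0.667
```

Here the two judges disagree on one pair out of six (method_b vs method_c), so tau is 4/6. A value close to 1 on your own data suggests the cheaper judge is good enough for ranking methods.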
SEQUOIA Architecture
User Query
↓
Step-back Prompting (generalize)
↓
RAPTOR Tree Retrieval (hierarchical clusters)
↓
Rerank + Context Compression
↓
Local LLM Generation
RAPTOR = Recursive Abstractive Processing for Tree-Organized Retrieval. Cluster leaf nodes, summarize upward, retrieve at multiple levels of abstraction.
Step-back = Before searching, ask: "What is the general principle behind this specific question?"
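To make "cluster leaf nodes, summarize upward, retrieve at multiple levels" concrete, here is a toy sketch. It swaps in two simplifications that are mine, not RAPTOR's: adjacent chunks are grouped in fixed-size batches instead of clustered by embedding, and nodes are scored by word overlap instead of vector similarity. The `summarize` callable stands in for an LLM summarizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    text: str
    level: int  # 0 = leaf chunk, higher = more abstract summary
    children: List["Node"] = field(default_factory=list)

def build_tree(chunks: List[str],
               summarize: Callable[[List[str]], str],
               group_size: int = 2) -> List[Node]:
    """Return every node of the tree: leaves first, then each summary level."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    current = [Node(text=c, level=0) for c in chunks]
    all_nodes = list(current)
    while len(current) > 1:
        parents = []
        for i in range(0, len(current), group_size):
            group = current[i:i + group_size]
            summary = summarize([n.text for n in group])
            parents.append(Node(summary, group[0].level + 1, group))
        all_nodes.extend(parents)
        current = parents
    return all_nodes

def retrieve(nodes: List[Node], query: str, top_k: int = 3) -> List[Node]:
    """Score all nodes from all levels together by word overlap."""
    q = set(query.lower().split())
    ranked = sorted(nodes,
                    key=lambda n: len(q & set(n.text.lower().split())),
                    reverse=True)
    return ranked[:top_k]

chunks = [
    "card fees apply to foreign payments",
    "wire transfers settle in two days",
    "reset the router before updating firmware",
    "firmware updates need a stable connection",
]
# Toy summarizer: concatenation. A real one would call an LLM.
nodes = build_tree(chunks, summarize=lambda texts: " / ".join(texts))
hits = retrieve(nodes, "firmware updates", top_k=2)
print([(n.level, n.text) for n in hits])
```

Four leaves produce two level-1 summaries and one root, and the query returns one leaf plus one summary node, so the generator sees both the specific chunk and its broader context.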
Results
On my test set (banking documents, technical manuals, internal wikis):
| Method | Precision | Recall | Latency |
|---|---|---|---|
| Classical RAG | 0.62 | 0.58 | 120ms |
| Hybrid RAG | 0.71 | 0.65 | 340ms |
| LightRAG | 0.59 | 0.61 | 890ms |
| SEQUOIA | 0.84 | 0.79 | 450ms |
| SEQUOIA Pro | 0.87 | 0.82 | 680ms |
SEQUOIA Pro trades latency for accuracy: it gains 0.03 precision and 0.03 recall over SEQUOIA but adds 230 ms per query (680 ms vs. 450 ms). SEQUOIA (basic) is the sweet spot for production.
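For reference, here is a small sketch of how precision and recall can be computed for a single query, plus F1 as one way to fold the two table columns into a single number. It assumes set-based retrieval metrics over document ids; the ids in the example are hypothetical, and the F1 values are just arithmetic on the aggregate numbers in the table above.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Hypothetical single query: 2 of 4 retrieved docs are relevant,
# and 2 of the 3 relevant docs were found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, round(r, 3))  # 0.5 0.667

# (precision, recall) pairs copied from the results table.
table = {
    "Classical RAG": (0.62, 0.58),
    "Hybrid RAG": (0.71, 0.65),
    "LightRAG": (0.59, 0.61),
    "SEQUOIA": (0.84, 0.79),
    "SEQUOIA Pro": (0.87, 0.82),
}
f1_scores = {name: round(f1(pr, rc), 3) for name, (pr, rc) in table.items()}
for name, score in f1_scores.items():
    print(f"{name}: F1 = {score}")
```

Ranking by F1 keeps the same order as ranking by precision, except that LightRAG edges slightly ahead of Classical RAG because of its higher recall.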
Code & Reproducibility
Everything is open source:
🔗 github.com/Diyago/rag-benchmark
- All 7 implementations
- Evaluation dataset (anonymized)
- Configs for local LLM setup
- Notebooks for analysis
Lessons for Production
- Don't trust academic benchmarks blindly. Test on YOUR data.
- Hierarchical retrieval beats flat. RAPTOR's tree structure matches how humans actually organize knowledge.
- Query rewriting is cheap performance. Step-back prompting costs one extra short LLM call before retrieval, and in my tests it improved recall by ~15%.
- Local evaluation is viable. You don't need GPT-4 to compare methods relatively.
What's Next
I'm extending SEQUOIA with:
- Multi-modal retrieval (images + text)
- Streaming context compression
- Adaptive depth (shallow for simple queries, deep for complex)
More AI Engineering Notes
I write about practical AI/ML from inside a bank — RAG systems, LLM deployment, team management, and what actually works vs. what's just hype.
Telegram channel (Russian, technical): AI.Insaf
Have you benchmarked RAG on real data? What surprised you? Drop a comment or reach out on Telegram.
This article is also published in the AI.Insaf Telegram channel, which covers AI/ML in banking practice, benchmarks, and managing DS teams.
Subscribe to the channel for timely breakdowns and practical case studies: https://t.me/ai_tablet