DEV Community

Gunjan Tailor

Posted on May 21 • Edited on Jun 8

I was embarrassed by my RAG demo. Turns out the bug was never in my code.

#rag #llm #python #ai

I showed my RAG app to a friend.

He asked: "which region grew the most last quarter?"

It said Europe. The answer was Asia. By a lot.

I spent two days debugging embeddings, chunk sizes, temperature settings.
The bug was none of those things.

The table had been turned into this:

"45.2% Q3 Europe 38.1% Q2 Asia 41.7%..."

Numbers with no headers. No caption. No context.
The LLM wasn't hallucinating. It was working with garbage.

🛠️ So I built the thing I wished existed
Meet DocNest — not another chunker.
A document normalization engine that reads structure before touching content.

Every heading → a navigable §section with its own ID
Every table → preserved as { caption, headers, rows[] } JSON
Every section → one-sentence LLM summary + BM25 keyword index
All of it → packed into a portable .udf file
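To make the table point concrete, here is a minimal sketch of why the `{ caption, headers, rows[] }` shape matters. The figures, the column names, and the helper functions are all invented for illustration, and the dict layout is only my reading of the shape above, not necessarily DocNest's actual schema.

```python
# Hypothetical table in the { caption, headers, rows[] } shape.
# Every value here is made up for illustration.
table = {
    "caption": "Quarterly growth by region",
    "headers": ["Region", "Q2 growth (%)", "Q3 growth (%)"],
    "rows": [
        ["Europe", 3.1, 4.0],
        ["Asia", 5.2, 9.8],
        ["Americas", 2.4, 2.9],
    ],
}


def flatten(table: dict) -> str:
    """What blind text extraction leaves behind: cell values only."""
    return " ".join(str(cell) for row in table["rows"] for cell in row)


def top_row(table: dict, column: str) -> list:
    """Return the row with the largest value in the named column."""
    col = table["headers"].index(column)
    return max(table["rows"], key=lambda row: row[col])


flat = flatten(table)                    # headers and caption are gone
best = top_row(table, "Q3 growth (%)")   # headers make the lookup trivial
print(flat)
print(best[0])
```

The flattened string still contains every number, but nothing says which quarter a number belongs to. With the headers kept, "which region grew the most" is a column lookup rather than a guess.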

```python
from docnest.pipeline import DocNestPipeline
from docnest.reader import UDFIndex

# Convert: runs once, costs a few LLM calls
pipeline = DocNestPipeline(
    llm_provider="groq",           # free tier works perfectly
    llm_api_key="gsk_...",
    emb_provider="huggingface",    # local, no API key needed
)
pipeline.convert("report.pdf")    # → report.udf ✓

# Query
idx = UDFIndex.load("report.udf")
result = idx.query("Which region had the highest Q3 growth?")

print(result.answer)       # "Asia grew the most, up +12.4pp"
print(result.layer_used)   # 1
print(result.tokens_used)  # 0  ← yes, really. zero.
```

✅ Zero tokens. Correct answer. 18ms.
That's not a cherry-picked example. Here's why it's possible.

⚡ The 5-layer query engine
Instead of dumping the full document into an LLM, queries escalate through layers — stopping the moment one can answer confidently.
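
| Layer | What it does | Tokens | Speed |
| --- | --- | --- | --- |
| 0 | Pre-computed summary + key numbers | 0 | < 1ms |
| 1 | BM25 + cosine → lands on exact §section | 0 | < 20ms |
| 2 | Section-scoped LLM call | ~300 | 1–3s |
| 3 | Multi-section synthesis | ~900 | 2–5s |
| 4 | Full document fallback | ~4000+ | 5–15s |
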
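The escalation idea can be sketched in a few lines. This is not DocNest's internal code: `LayerResult`, `escalate`, the `0.8` confidence threshold, and the toy layers are hypothetical names and values I picked to show the control flow of "try the cheapest layer first, stop at the first confident answer".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayerResult:
    answer: str
    confidence: float  # 0.0 to 1.0
    tokens_used: int


# A layer takes a question and returns a result, or None if it has nothing.
Layer = Callable[[str], Optional[LayerResult]]


def escalate(question: str, layers: list[Layer], threshold: float = 0.8):
    """Try layers in order; return (index, result) for the first confident one.

    If no layer reaches the threshold, return the last result produced,
    or None if every layer returned None.
    """
    last = None
    for index, layer in enumerate(layers):
        result = layer(question)
        if result is None:
            continue
        last = (index, result)
        if result.confidence >= threshold:
            return last
    return last


# Toy stand-ins for real layers (hypothetical behaviour).
def summary_layer(question: str) -> Optional[LayerResult]:
    return None  # the precomputed summary has nothing relevant


def index_layer(question: str) -> Optional[LayerResult]:
    return LayerResult("Asia", confidence=0.93, tokens_used=0)


def llm_layer(question: str) -> Optional[LayerResult]:
    raise AssertionError("not reached when an earlier layer is confident")


layer_used, result = escalate(
    "Which region grew the most?", [summary_layer, index_layer, llm_layer]
)
print(layer_used, result.answer, result.tokens_used)
```

Because the loop returns as soon as a layer is confident, the expensive layers are never called for questions the index can already answer.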
I expected layers 2–4 to do most of the work.

🤯 Layers 0 and 1 handle roughly 70% of real-world questions — at zero token cost.
Seven out of ten queries answered from a structured index. You pay for LLM compute only when genuine reasoning is needed.
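For a feel of how a keyword index can land on the right section without any LLM call, here is a small standard-library sketch of Okapi BM25 ranking over sections. It covers only the BM25 half of layer 1 (the cosine part is left out), and the section IDs, section text, and the `k1`/`b` defaults are assumptions of mine, not values taken from DocNest.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, sections: dict[str, str], k1: float = 1.5, b: float = 0.75):
    """Return (section_id, score) pairs sorted best first, using Okapi BM25."""
    docs = {sid: tokenize(text) for sid, text in sections.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: in how many sections each term appears.
    df = Counter(term for toks in docs.values() for term in set(toks))

    scores = {}
    for sid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[sid] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical section IDs and text.
sections = {
    "§1": "Overview of the company and its strategy for the year",
    "§2": "Regional growth: Asia revenue growth outpaced Europe in Q3",
    "§3": "Risk factors including currency exposure and supply chain",
}
ranking = bm25_rank("Q3 growth by region Asia", sections)
print(ranking[0][0])
```

The whole ranking is arithmetic over token counts, which is why this kind of lookup costs no tokens at all.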

📊 Real numbers. Not vibes.
25 questions. 500-page open-source nutrition textbook. PyMuPDF + Groq free tier.

| Question type | Score |
| --- | --- |
| Basic facts (calories, macros) | ✅ 5/5 |
| Detailed nutrition (fiber, glycemic index) | ✅ 5/5 |
| Micronutrients (vitamins, minerals) | ✅ 4/5 |
| Hard synthesis (BMR, omega-3, antioxidants) | ✅ 5/5 |
| Edge cases + hallucination traps | ✅ 5/5 |
| Total | 24/25 (96%) |

The one failure: a table-only page where the text parser extracted nothing.
Fix: use DoclingPDFParser for image-heavy or scanned PDFs.
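One way to catch this kind of failure before querying is to check which pages came back with almost no text. The sketch below is a hypothetical helper, not part of DocNest: `pages_needing_ml_parser` and the `min_chars` threshold are my own names, and the sample page texts are invented. With PyMuPDF, the per-page strings could come from `page.get_text()`.

```python
def pages_needing_ml_parser(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is empty or near-empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


# Hypothetical extraction output: page 2 is a table rendered as an image,
# so the text parser returned only whitespace for it.
page_texts = [
    "Chapter 4. Vitamins are organic compounds needed in small amounts.",
    "   \n",
    "Table 4.1 lists recommended daily intakes for adults.",
]
suspect = pages_needing_ml_parser(page_texts)
print(suspect)
```

Any page number this returns is a candidate for the slower ML parser instead of plain text extraction.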

🧠 Handles 600-page PDFs without exploding your RAM
Standard Docling loads the full document into memory. 600 pages on a normal laptop = 💀 out of memory.
DocNest splits the PDF into page chunks automatically, processes each chunk at full ML quality, and merges the output. Peak RAM stays constant regardless of document size.
```python
from docnest.parsers.pdf import DoclingPDFParser

# Just works: auto-detects large PDFs
raw = DoclingPDFParser().parse("600-page-annual-report.pdf")

# Or tune for your hardware
raw = DoclingPDFParser(chunk_pages=10).parse("report.pdf")  # 💻 low RAM
raw = DoclingPDFParser(chunk_pages=50).parse("report.pdf")  # 🚀 speed mode
```
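The chunk-then-merge idea itself is simple to sketch. This is not how `DoclingPDFParser` is implemented internally (I am only illustrating the general technique), and `page_ranges`, `parse_in_chunks`, and `fake_parse` are hypothetical names. The point is that the parser only ever sees `chunk_pages` pages at a time.

```python
from typing import Callable, Iterator


def page_ranges(total_pages: int, chunk_pages: int) -> Iterator[tuple[int, int]]:
    """Yield half-open (start, end) page ranges that cover the document."""
    if chunk_pages < 1:
        raise ValueError("chunk_pages must be at least 1")
    for start in range(0, total_pages, chunk_pages):
        yield start, min(start + chunk_pages, total_pages)


def parse_in_chunks(
    total_pages: int,
    chunk_pages: int,
    parse_range: Callable[[int, int], list[str]],
) -> list[str]:
    """Parse one page range at a time and merge the results in order."""
    merged: list[str] = []
    for start, end in page_ranges(total_pages, chunk_pages):
        merged.extend(parse_range(start, end))
    return merged


# Stand-in parser that returns one record per page (hypothetical).
def fake_parse(start: int, end: int) -> list[str]:
    return [f"page {n + 1}" for n in range(start, end)]


ranges = list(page_ranges(600, 50))
pages = parse_in_chunks(600, 50, fake_parse)
print(len(ranges), ranges[-1], len(pages))  # 12 (550, 600) 600
```

A smaller `chunk_pages` means more passes but a smaller working set per pass, which is the trade-off behind the low-RAM and speed settings.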

🚀 Try it

```bash
pip install docnest-ai
```

Formats: PDF (ML + fast) · DOCX · XLSX · HTML · Markdown
LLM providers: Groq (free) · OpenAI · Ollama (local) · Anthropic · Mistral · Google · Cohere
Vector backends: numpy (zero deps) · FAISS · ChromaDB

```bash
# CLI, because boilerplate is boring
docnest convert report.pdf --llm-provider groq --llm-model llama-3.3-70b-versatile
docnest query report.udf "What are the key financial risks?"
docnest view report.udf  # structured HTML viewer in browser
```

GitHub repo — star it if this solved a problem you've had:

tailorgunjan93 / docnest

The document normalization engine RAG has always needed. Parse any document, understand its structure, build RAG that actually works.

DOCNEST

Secure · Fast · Reliable · Cost-Effective

The Problem with RAG Today

Every RAG pipeline ingests documents the same broken way:

PDF → extract text → split every 512 chars → embed → store → hope

What gets silently destroyed:

| Source | What blind chunking loses |
| --- | --- |
| Financial report | Table row `45.2% \| Q3 \| Europe` has no column headers |
| Legal contract | Clause split mid-sentence across two chunks |
| API documentation | Code example separated from its description |
| Research paper | Figure caption disconnected from its analysis |
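
The contract row is easy to reproduce. The sketch below compares fixed-size splitting with a naive structure-aware split on blank lines. The contract text is invented, the helper names are hypothetical, and I use a 40-character window instead of 512 only so the effect shows up on a short sample.

```python
def blind_chunks(text: str, size: int) -> list[str]:
    """Split text every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def section_chunks(text: str) -> list[str]:
    """Split on blank lines so each heading stays with its body."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


# Hypothetical contract text.
doc = (
    "Clause 7. Termination\n"
    "Either party may terminate with 30 days written notice.\n"
    "\n"
    "Clause 8. Liability\n"
    "Liability is capped at the fees paid in the prior 12 months."
)

blind = blind_chunks(doc, 40)
structured = section_chunks(doc)
print(repr(blind[0]))       # ends in the middle of a word
print(repr(structured[0]))  # the whole of Clause 7, heading included
```

The blind split loses no characters, but the first chunk stops partway through the termination clause, so whichever chunk gets retrieved carries only half the rule.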