DEV Community

Gunjan Tailor

I was embarrassed by my RAG demo. Turns out the bug was never in my code.

I showed my RAG app to a friend.

He asked: "which region grew the most last quarter?"

It said Europe. The answer was Asia. By a lot.

I spent two days debugging embeddings, chunk sizes, temperature settings.
The bug was none of those things.

The table had been turned into this:

"45.2% Q3 Europe 38.1% Q2 Asia 41.7%..."

Numbers with no headers. No caption. No context.
The LLM wasn't hallucinating. It was working with garbage.

🛠️ So I built the thing I wished existed
Meet DocNest: not another chunker.
A document normalization engine that reads structure before touching content.

Every heading → a navigable §section with its own ID
Every table → preserved as { caption, headers, rows[] } JSON
Every section → one-sentence LLM summary + BM25 keyword index
All of it → packed into a portable .udf file
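
To make the table shape from the list above concrete, here is a minimal sketch in plain Python. The caption, the Europe figure, and the helper names `column` and `top_row` are assumptions for illustration. This is not DocNest's actual output or API, only the { caption, headers, rows[] } shape described above.

```python
import json

# Illustrative values only: the caption and the Europe figure are made up.
table = {
    "caption": "Growth by region (hypothetical example)",
    "headers": ["Region", "Quarter", "Growth (pp)"],
    "rows": [
        ["Europe", "Q3", 3.0],
        ["Asia", "Q3", 12.4],
    ],
}


def column(table, header):
    """Return every value in the column named by `header`."""
    i = table["headers"].index(header)
    return [row[i] for row in table["rows"]]


def top_row(table, header):
    """Return the row with the largest value in the column named by `header`."""
    i = table["headers"].index(header)
    return max(table["rows"], key=lambda row: row[i])


# With headers attached, "which region grew the most?" is a lookup, not a guess.
best = top_row(table, "Growth (pp)")
print(best[0])
print(json.dumps(table, indent=2))
```

Because every cell stays tied to its header, a question about the largest value never has to be inferred from a run of bare numbers.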

python

from docnest.pipeline import DocNestPipeline
from docnest.reader import UDFIndex

# Convert: runs once, costs a few LLM calls
pipeline = DocNestPipeline(
    llm_provider="groq",           # free tier works perfectly
    llm_api_key="gsk_...",
    emb_provider="huggingface",    # local, no API key needed
)
pipeline.convert("report.pdf")    # → report.udf ✓

# Query
idx = UDFIndex.load("report.udf")
result = idx.query("Which region had the highest Q3 growth?")

print(result.answer)       # "Asia grew the most, up +12.4pp"
print(result.layer_used)   # 1
print(result.tokens_used)  # 0  ← yes, really. zero.



✅ Zero tokens. Correct answer. 18ms.
That's not a cherry-picked example. Here's why it's possible.

⚡ The 5-layer query engine
Instead of dumping the full document into an LLM, queries escalate through layers, stopping the moment one can answer confidently.

| Layer | What it does | Tokens | Speed |
|---|---|---|---|
| 0 | Pre-computed summary + key numbers | 0 | < 1ms |
| 1 | BM25 + cosine → lands on exact §section | 0 | < 20ms |
| 2 | Section-scoped LLM call | ~300 | 1–3s |
| 3 | Multi-section synthesis | ~900 | 2–5s |
| 4 | Full document fallback | ~4000+ | 5–15s |

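
The "escalate until confident" idea can be sketched in a few lines. This is a minimal illustration, not DocNest's implementation: the `LayerResult` type, the `escalate` function, the 0.8 threshold, and both toy layers are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LayerResult:
    answer: str
    confidence: float  # 0.0 to 1.0
    tokens_used: int


# A layer takes a question and returns a result, or None if it cannot answer.
Layer = Callable[[str], Optional[LayerResult]]


def escalate(
    question: str, layers: List[Layer], threshold: float = 0.8
) -> Optional[Tuple[int, LayerResult]]:
    """Try layers in order and stop at the first confident answer."""
    last = None
    for index, layer in enumerate(layers):
        result = layer(question)
        if result is None:
            continue
        last = (index, result)
        if result.confidence >= threshold:
            return last
    return last  # best effort: the last layer that produced anything


def summary_layer(question: str) -> Optional[LayerResult]:
    # Hypothetical layer 0: answers only from a precomputed fact table.
    facts = {"how many pages": "500"}
    for key, value in facts.items():
        if key in question.lower():
            return LayerResult(value, 0.95, 0)
    return None


def fallback_layer(question: str) -> Optional[LayerResult]:
    # Hypothetical stand-in for an LLM call; returns a canned string.
    return LayerResult("(LLM answer would go here)", 0.9, 300)


index, result = escalate("How many pages is the book?", [summary_layer, fallback_layer])
print(index, result.tokens_used)
```

The cheap layers run first, so a question they can answer never reaches the layer that costs tokens.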
I expected layers 2–4 to do most of the work.

🤯 Layers 0 and 1 handle roughly 70% of real-world questions, at zero token cost.
Seven out of ten queries answered from a structured index. You pay for LLM compute only when genuine reasoning is needed.
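
For a feel of how a keyword index can land on the right section without an LLM, here is a small pure-Python BM25 sketch. It covers only the BM25 half of layer 1 (no cosine similarity), and the section IDs, section texts, and the `BM25Index` class are assumptions made up for the example, not DocNest's internals.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    def __init__(self, sections, k1=1.5, b=0.75):
        self.ids = list(sections)
        self.docs = [Counter(tokenize(sections[s])) for s in self.ids]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        self.k1, self.b = k1, b
        # Document frequency: in how many sections each term appears.
        self.df = Counter()
        for d in self.docs:
            self.df.update(d.keys())

    def idf(self, term):
        # Non-negative IDF variant (the log(1 + ...) form).
        n = self.df.get(term, 0)
        return math.log(1 + (len(self.docs) - n + 0.5) / (n + 0.5))

    def score(self, query, i):
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def best_section(self, query):
        scores = [self.score(query, i) for i in range(len(self.docs))]
        top = max(range(len(scores)), key=scores.__getitem__)
        return self.ids[top], scores[top]


# Hypothetical sections; real section IDs and text would come from the document.
sections = {
    "sec-1.1": "Regional revenue growth by quarter for Europe and Asia",
    "sec-2.3": "Key financial risks including currency exposure and supply chain",
    "sec-3.2": "Headcount and hiring plans for the next fiscal year",
}
index = BM25Index(sections)
print(index.best_section("What are the key financial risks?")[0])
```

Scoring is plain arithmetic over term counts, which is why a lookup like this costs zero tokens.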

📊 Real numbers. Not vibes.
25 questions. 500-page open-source nutrition textbook. PyMuPDF + Groq free tier.

| Question type | Score |
|---|---|
| Basic facts (calories, macros) | ✅ 5/5 |
| Detailed nutrition (fiber, glycemic index) | ✅ 5/5 |
| Micronutrients (vitamins, minerals) | ✅ 4/5 |
| Hard synthesis (BMR, omega-3, antioxidants) | ✅ 5/5 |
| Edge cases + hallucination traps | ✅ 5/5 |
| Total | 24/25 (96%) |

The one failure: a table-only page where the text parser extracted nothing.
Fix: use DoclingPDFParser for image-heavy or scanned PDFs.

🧠 Handles 600-page PDFs without exploding your RAM
Standard Docling loads the full document into memory. 600 pages on a normal laptop = 💀 out of memory.
DocNest splits the PDF into page chunks automatically, processes each chunk at full ML quality, and merges the output. Peak RAM stays constant regardless of document size.
python

from docnest.parsers.pdf import DoclingPDFParser

# Just works: auto-detects large PDFs
raw = DoclingPDFParser().parse("600-page-annual-report.pdf")

# Or tune for your hardware
raw = DoclingPDFParser(chunk_pages=10).parse("report.pdf")  # 💻 low RAM
raw = DoclingPDFParser(chunk_pages=50).parse("report.pdf")  # 🚀 speed mode
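
The general idea behind page-chunked parsing can be sketched without any PDF library. This is a rough illustration under stated assumptions, not how DoclingPDFParser works internally: `page_ranges`, `parse_in_chunks`, and `fake_parse_range` are hypothetical names, and the fake parser just returns page labels.

```python
from typing import Callable, Iterator, List, Tuple


def page_ranges(total_pages: int, chunk_pages: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) page ranges that cover the whole document."""
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    for start in range(0, total_pages, chunk_pages):
        yield start, min(start + chunk_pages, total_pages)


def parse_in_chunks(
    total_pages: int,
    chunk_pages: int,
    parse_range: Callable[[int, int], List[str]],
) -> List[str]:
    """Parse one page range at a time and merge the results in order."""
    merged: List[str] = []
    for start, end in page_ranges(total_pages, chunk_pages):
        # Only one chunk's working set is held by the parser at a time.
        merged.extend(parse_range(start, end))
    return merged


def fake_parse_range(start: int, end: int) -> List[str]:
    # Stand-in for a real parser call on pages [start, end).
    return [f"page {n}" for n in range(start, end)]


print(list(page_ranges(25, 10)))
print(len(parse_in_chunks(25, 10, fake_parse_range)))
```

In this sketch a smaller `chunk_pages` bounds how much the parser holds at once, at the cost of more parse calls. The merged output itself still grows with the document.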


🚀 Try it

bash

pip install docnest-ai

Formats: PDF (ML + fast) · DOCX · XLSX · HTML · Markdown
LLM providers: Groq (free) · OpenAI · Ollama (local) · Anthropic · Mistral · Google · Cohere
Vector backends: numpy (zero deps) · FAISS · ChromaDB

bash

# CLI, because boilerplate is boring
docnest convert report.pdf --llm-provider groq --llm-model llama-3.3-70b-versatile
docnest query report.udf "What are the key financial risks?"
docnest view report.udf # structured HTML viewer in browser

GitHub repo: star it if this solved a problem you've had.

tailorgunjan93 / docnest

The document normalization engine RAG has always needed. Parse any document, understand its structure, build RAG that actually works.


The Problem with RAG Today

Every RAG pipeline ingests documents the same broken way:

PDF → extract text → split every 512 chars → embed → store → hope

What gets silently destroyed:

| Source | What blind chunking loses |
|---|---|
| Financial report | Table row 45.2% \| Q3 \| Europe has no column headers |
| Legal contract | Clause split mid-sentence across two chunks |
| API documentation | Code example separated from its description |
| Research paper | Figure caption disconnected from its analysis |

The LLM receives noise and returns approximate answers. This is not a retrieval problem. It is an ingestion problem.
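
A tiny reproduction of the header-loss problem, as a sketch: fixed-size splitting is done here with plain string slicing, and the table text, the 32-character chunk size, and the `chunk_fixed` helper are illustrative assumptions (the Europe figure is made up).

```python
def chunk_fixed(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# A small table serialized as text; values are for illustration only.
text = (
    "Region | Quarter | Growth\n"
    "Europe | Q3 | 3.0pp\n"
    "Asia | Q3 | 12.4pp\n"
)

chunks = chunk_fixed(text, 32)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))

# Chunks that contain the Asia row together with the header row.
with_headers = [c for c in chunks if "Region" in c and "Asia" in c]
print(len(with_headers))
```

No chunk holds both the header row and the Asia row, so a retriever that returns the Asia chunk hands the LLM numbers with no column names.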

See the difference

Take a financial report with a revenue table…






PyPI: https://pypi.org/project/docnest-ai

Format spec: https://github.com/tailorgunjan93/udf-spec


GitHub → github.com/tailorgunjan93/docnest
PyPI → pypi.org/project/docnest-ai
Spec → github.com/tailorgunjan93/udf-spec

The .udf format is an open spec: build on it,
extend it, contribute to it.

Stars and contributions genuinely appreciated ⭐

#rag #llm #python #ai #opensource #machinelearning #nlp #documentai #vectorsearch #buildinpublic