Gunjan Tailor

My RAG app confidently told my client the wrong answer. I spent 3 days debugging the wrong thing.

Picture this.

It's a client demo. They're watching. I type:

"Which region had the highest revenue growth last quarter?"

My RAG app (three weeks of work, carefully tuned embeddings, clever prompts) responds instantly.

The client nods. Writes it down.

The answer was wrong. By almost double.

I spent three days debugging the wrong things.

Chunk size? Tried 256, 512, 1024. Nothing.
Temperature? 0.0, 0.3, 0.7. Still wrong.
Embedding model? Swapped three of them. Nope.
Prompt engineering? Added "think step by step", "be precise", "do not hallucinate". 😭

The LLM wasn't hallucinating. It was doing its best with this:

"45.2%  Q3  Europe  38.1%  Q2  Europe  41.7%  Q3  Asia   29.3%"

Orphaned numbers. No column headers. No caption. No context.

The original table had all of that. My chunker ate it silently.

⚠️ The bug was never in retrieval. It was in ingestion. And I never thought to look there.


🔥 The dirty secret of RAG tutorials

Every tutorial shows you this pipeline:

PDF → extract text → chunk at 512 tokens → embed → store → retrieve → answer

Clean. Simple. Completely wrong for structured documents.

Here's what blind chunking silently destroys:

| Document | What you had | What the LLM gets |
| --- | --- | --- |
| Financial report | Revenue table with headers | Orphaned numbers, zero context |
| Legal contract | 3-page clause | Split mid-sentence, both halves useless |
| API docs | Function + code example | Code separated from its description |
| Research paper | Figure with caption | Caption on chunk 7, analysis on chunk 12 |
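
The failure is easy to reproduce without any RAG stack at all. Here is a minimal sketch of blind fixed-size chunking. Everything in it is an assumption for illustration: the caption, the column names, the row labels, and the 60-character window. Only the four percentages are reused from the orphaned chunk earlier in the post. It is not DocNest code or any library's chunker.

```python
# Illustrative table text. Caption and column names are assumed, not from a real report.
TABLE_TEXT = (
    "Table 3: Revenue growth by region\n"
    "Value  Quarter  Region\n"
    "45.2%  Q3  Europe\n"
    "38.1%  Q2  Europe\n"
    "41.7%  Q3  Asia\n"
    "29.3%  Q2  Asia\n"
)


def chunk_fixed(text: str, size: int) -> list[str]:
    """Cut text into consecutive windows of `size` characters, ignoring structure."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = chunk_fixed(TABLE_TEXT, size=60)

# A chunk is "orphaned" if it carries data values but not the header row.
orphaned = [c for c in chunks if "%" in c and "Region" not in c]

for number, chunk in enumerate(chunks):
    status = "orphaned" if chunk in orphaned else "ok"
    print(number, status, repr(chunk))
```

With this window size, the caption and header row stay in the first chunk while most of the values land in a later chunk with no header at all. Whichever chunk the retriever picks, the model sees only half the picture.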

🗑️ You're feeding the LLM garbage and expecting gold. The model isn't dumb: it's working with broken input.


🛠️ So I built the thing I wished existed

Meet DocNest: not another chunker.

A document normalization engine that reads structure before touching content.

  • Every heading → a navigable §section with its own ID
  • Every table → preserved as { caption, headers, rows[] } JSON
  • Every section → one-sentence LLM summary + BM25 keyword index
  • All of it → packed into a portable .udf file
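
As a rough illustration of the table bullet: the { caption, headers, rows[] } shape comes from the list above, but the exact field layout inside a real .udf file isn't shown in this post, so treat the dict below as an assumed stand-in. The values reuse the numbers from the orphaned chunk; the labels on the last row (Asia, Q2) and the column names are guesses, since that chunk was cut off.

```python
import json

# Assumed stand-in for one preserved table; not the actual .udf schema.
table = {
    "caption": "Revenue growth by region (illustrative)",
    "headers": ["Region", "Quarter", "Value"],
    "rows": [
        ["Europe", "Q3", "45.2%"],
        ["Europe", "Q2", "38.1%"],
        ["Asia", "Q3", "41.7%"],
        ["Asia", "Q2", "29.3%"],
    ],
}


def to_records(table: dict) -> list[dict]:
    """Attach the header names to every row so no value is orphaned."""
    return [dict(zip(table["headers"], row)) for row in table["rows"]]


def change_by_region(records: list[dict]) -> dict[str, float]:
    """Q3 minus Q2 per region, in percentage points."""
    values: dict[str, dict[str, float]] = {}
    for rec in records:
        values.setdefault(rec["Region"], {})[rec["Quarter"]] = float(rec["Value"].rstrip("%"))
    return {region: round(q["Q3"] - q["Q2"], 1) for region, q in values.items()}


records = to_records(table)
deltas = change_by_region(records)
best = max(deltas, key=deltas.get)

print(json.dumps(records[0]))
print(best, deltas[best])
```

With headers attached to every row, the Q2 to Q3 change per region is plain arithmetic and needs no model call. How DocNest itself computes its zero-token answers may well differ from this.
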
from docnest.pipeline import DocNestPipeline
from docnest.reader import UDFIndex

# Convert: runs once, costs a few LLM calls
pipeline = DocNestPipeline(
    llm_provider="groq",           # free tier works perfectly
    llm_api_key="gsk_...",
    emb_provider="huggingface",    # local, no API key needed
)
pipeline.convert("report.pdf")    # → report.udf ✓

# Query
idx = UDFIndex.load("report.udf")
result = idx.query("Which region had the highest Q3 growth?")

print(result.answer)       # "Asia grew the most, up +12.4pp"
print(result.layer_used)   # 1
print(result.tokens_used)  # 0  ← yes, really. zero.

✅ Zero tokens. Correct answer. 18ms.
That's not a cherry-picked example. Here's why it's possible.


⚡ The 5-layer query engine

Instead of dumping the full document into an LLM, queries escalate through layers, stopping the moment one can answer confidently.

| Layer | What it does | Tokens | Speed |
| --- | --- | --- | --- |
| 0 | Pre-computed summary + key numbers | 0 | < 1ms |
| 1 | BM25 + cosine → lands on exact §section | 0 | < 20ms |
| 2 | Section-scoped LLM call | ~300 | 1–3s |
| 3 | Multi-section synthesis | ~900 | 2–5s |
| 4 | Full document fallback | ~4000+ | 5–15s |
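
The escalation idea can be sketched as a loop over layers that stops at the first confident answer. This is a toy under stated assumptions, not DocNest's internals: `LayerResult`, `escalate`, the 0.75 threshold, the made-up index data, and the three stub layers (including the fake 300-token "LLM") are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayerResult:
    answer: Optional[str]
    confidence: float   # 0.0 to 1.0, however the layer chooses to estimate it
    tokens_used: int


Layer = Callable[[str], LayerResult]

# Made-up data standing in for a pre-built index.
SUMMARY_FACTS = {"how many pages is the report?": "42 pages"}
SECTIONS = {"§3.2": "asia grew the most in q3 revenue by region"}


def summary_layer(query: str) -> LayerResult:
    """Layer 0 stand-in: exact lookup in pre-computed facts."""
    answer = SUMMARY_FACTS.get(query.strip().lower())
    return LayerResult(answer, 1.0 if answer else 0.0, 0)


def keyword_layer(query: str) -> LayerResult:
    """Layer 1 stand-in: word overlap instead of real BM25 + cosine scoring."""
    words = set(query.lower().replace("?", "").split())
    best_id, best_score = None, 0.0
    for section_id, text in SECTIONS.items():
        score = len(words & set(text.split())) / len(words) if words else 0.0
        if score > best_score:
            best_id, best_score = section_id, score
    return LayerResult(best_id, best_score, 0)


def scoped_llm_layer(query: str) -> LayerResult:
    """Layer 2 stand-in: pretends to call an LLM and spend about 300 tokens."""
    return LayerResult(f"[model answer for: {query}]", 0.9, 300)


def escalate(query: str, layers: list[Layer], threshold: float = 0.75):
    """Try layers in order; return (layer index, result, tokens spent so far)."""
    if not layers:
        raise ValueError("at least one layer is required")
    spent = 0
    for index, layer in enumerate(layers):
        result = layer(query)
        spent += result.tokens_used
        # Stop at the first confident layer; the last layer is the fallback.
        if result.confidence >= threshold or index == len(layers) - 1:
            return index, result, spent


LAYERS = [summary_layer, keyword_layer, scoped_llm_layer]
for question in [
    "How many pages is the report?",
    "Which region grew in Q3?",
    "Summarize the key financial risks",
]:
    index, result, spent = escalate(question, LAYERS)
    print(f"layer {index}, {spent} tokens: {result.answer}")
```

The first question stops at the lookup, the second lands on a section by keyword overlap, and only the third reaches the paid layer. The interesting design choice is the confidence estimate: set the threshold too low and cheap layers answer questions they shouldn't.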

I expected layers 2–4 to do most of the work.

🤯 Layers 0 and 1 handle roughly 70% of real-world questions at zero token cost.

Seven out of ten queries answered from a structured index. You pay for LLM compute only when genuine reasoning is needed.
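
A quick back-of-the-envelope check on what that means for cost. The per-layer token figures and the 70% share come from this post; how the remaining 30% splits across layers 2 to 4 is not stated, so the sketch below only brackets it with two assumed extremes. It also assumes each table figure is the total cost of a query that ends at that layer, and treats "~4000+" as 4000.

```python
# Per-query token costs copied from the layer table ("~4000+" is treated as 4000, a floor).
LAYER_TOKENS = {0: 0, 1: 0, 2: 300, 3: 900, 4: 4000}


def expected_tokens(layer_mix: dict[int, float]) -> float:
    """Average tokens per query for a given share of queries ending at each layer."""
    if abs(sum(layer_mix.values()) - 1.0) > 1e-9:
        raise ValueError("layer shares must sum to 1")
    return sum(share * LAYER_TOKENS[layer] for layer, share in layer_mix.items())


# 70% of queries stop at layers 0-1 (from the post); the split of the other 30% is assumed.
cheapest_mix = {1: 0.7, 2: 0.3}   # every escalated query stops at layer 2
priciest_mix = {1: 0.7, 4: 0.3}   # every escalated query falls through to layer 4
always_full = {4: 1.0}            # baseline: send the whole document every time

for name, mix in [("cheapest", cheapest_mix), ("priciest", priciest_mix), ("always full", always_full)]:
    print(f"{name}: {expected_tokens(mix):.0f} tokens/query on average")
```

Under these assumptions the average lands somewhere between roughly 90 and 1,200 tokens per query, against about 4,000 for always sending the full document. Since layer 4 is listed as "~4000+", the upper figures are floors rather than exact values.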


📊 Real numbers. Not vibes.

25 questions. 500-page open-source nutrition textbook. PyMuPDF + Groq free tier.

| Question type | Score |
| --- | --- |
| Basic facts (calories, macros) | ✅ 5/5 |
| Detailed nutrition (fiber, glycemic index) | ✅ 5/5 |
| Micronutrients (vitamins, minerals) | ✅ 4/5 |
| Hard synthesis (BMR, omega-3, antioxidants) | ✅ 5/5 |
| Edge cases + hallucination traps | ✅ 5/5 |
| Total | 24/25 (96%) |

The one failure: a table-only page where the text parser extracted nothing.
Fix: use DoclingPDFParser for image-heavy or scanned PDFs.


🧠 Handles 600-page PDFs without exploding your RAM

Standard Docling loads the full document into memory. 600 pages on a normal laptop = 💀 out of memory.

DocNest splits the PDF into page chunks automatically, processes each chunk at full ML quality, and merges the output. Peak RAM stays constant regardless of document size.

from docnest.parsers.pdf import DoclingPDFParser

# Just works: auto-detects large PDFs
raw = DoclingPDFParser().parse("600-page-annual-report.pdf")

# Or tune for your hardware
raw = DoclingPDFParser(chunk_pages=10).parse("report.pdf")  # 💻 low RAM
raw = DoclingPDFParser(chunk_pages=50).parse("report.pdf")  # 🚀 speed mode
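
The post doesn't show how the chunked parse works internally, so this is only a sketch of the general page-window pattern that the `chunk_pages` parameter suggests. `page_windows`, `parse_in_windows`, and the `parse_window` callback are hypothetical names; the callback stands in for whatever heavy per-window parsing a real parser does.

```python
from typing import Callable, Iterator


def page_windows(total_pages: int, chunk_pages: int) -> Iterator[range]:
    """Yield consecutive zero-based page ranges of at most `chunk_pages` pages."""
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    for start in range(0, total_pages, chunk_pages):
        yield range(start, min(start + chunk_pages, total_pages))


def parse_in_windows(
    total_pages: int,
    chunk_pages: int,
    parse_window: Callable[[range], list[str]],
) -> list[str]:
    """Parse one window at a time and merge results in page order."""
    merged: list[str] = []
    for window in page_windows(total_pages, chunk_pages):
        # Only this window's intermediate state exists while it is being parsed.
        merged.extend(parse_window(window))
    return merged


def fake_parse(window: range) -> list[str]:
    """Stand-in parser: returns one placeholder string per page."""
    return [f"page {page + 1}" for page in window]


pages = parse_in_windows(total_pages=25, chunk_pages=10, parse_window=fake_parse)
print(len(pages), pages[0], pages[-1])
print(len(list(page_windows(600, 50))), "windows for 600 pages at chunk_pages=50")
```

Only one window's working state is alive at a time, which is what keeps the parser's footprint bounded. The merged output itself still grows with the document, so a smaller `chunk_pages` trades speed for a lower peak.
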

🚀 Try it

pip install docnest-ai

Formats: PDF (ML + fast) · DOCX · XLSX · HTML · Markdown

LLM providers: Groq (free) · OpenAI · Ollama (local) · Anthropic · Mistral · Google · Cohere

Vector backends: numpy (zero deps) · FAISS · ChromaDB

# CLI, because boilerplate is boring
docnest convert report.pdf --llm-provider groq --llm-model llama-3.3-70b-versatile
docnest query report.udf "What are the key financial risks?"
docnest view report.udf     # structured HTML viewer in browser

GitHub repo (star it if this solved a problem you've had): https://github.com/tailorgunjan93/docnest

PyPI: https://pypi.org/project/docnest-ai
Format spec: https://github.com/tailorgunjan93/udf-spec


🔨 Honesty tax

🚧 This is 0.4.0a2, an alpha release. It works on real documents, but the PPTX parser isn't built yet, Qdrant/Weaviate backends are on the roadmap, and SharePoint/Confluence connectors are planned.

If any of those sound like something you want to build, good first issues are labeled and waiting.


💬 One question for you

Most RAG infrastructure assumes text extraction is a solved problem.

It isn't. Not for tables. Not for anything where position and relationship carry meaning.

💬 What document type has caused you the most RAG pain?

For me it was financial tables. Drop yours in the comments. If it's a format DocNest doesn't handle yet, that's probably the next parser I build.


Building in the open at github.com/tailorgunjan93/docnest. Stars, issues, and brutal feedback all welcome. 🙏
