<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gunjan Tailor</title>
    <description>The latest articles on DEV Community by Gunjan Tailor (@gunjantailor).</description>
    <link>https://dev.to/gunjantailor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938215%2F3cbbb4e8-fd61-4eac-aeda-e6d393ac966c.png</url>
      <title>DEV Community: Gunjan Tailor</title>
      <link>https://dev.to/gunjantailor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gunjantailor"/>
    <language>en</language>
    <item>
      <title>I Built a Local-First AI Desktop Knowledge Base — Here's What I Learned</title>
      <dc:creator>Gunjan Tailor</dc:creator>
      <pubDate>Sat, 30 May 2026 10:17:30 +0000</pubDate>
      <link>https://dev.to/gunjantailor/i-built-a-local-first-ai-desktop-knowledge-base-heres-what-i-learned-3o4a</link>
      <guid>https://dev.to/gunjantailor/i-built-a-local-first-ai-desktop-knowledge-base-heres-what-i-learned-3o4a</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmin4f2eaw7rqgazq27p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmin4f2eaw7rqgazq27p2.png" alt=" "&gt;&lt;/a&gt;# I Built a Local-First AI Desktop Knowledge Base — Here's What I Learned&lt;/p&gt;

&lt;p&gt;After building &lt;a href="https://pypi.org/project/docnest-ai/" rel="noopener noreferrer"&gt;docnest-ai&lt;/a&gt; — a hybrid RAG engine for Python — the next logical question was: &lt;em&gt;what does a great end-user app built on top of it actually look like?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That question led me to build &lt;strong&gt;Knovex&lt;/strong&gt;: a local-first, AI-powered desktop knowledge base that runs entirely on your machine. No cloud uploads. No subscriptions. No data leakage. Just drop in your documents, ask questions, and learn.&lt;/p&gt;

&lt;p&gt;This post covers the architecture decisions, the problems I hit, and the interesting technical bits. If you want to skip straight to the app: &lt;a href="https://tailorgunjan93.github.io/knovex" rel="noopener noreferrer"&gt;tailorgunjan93.github.io/knovex&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why build a desktop app in 2026?
&lt;/h2&gt;

&lt;p&gt;Every AI knowledge tool I tried had the same deal: your documents leave your machine. Legal contracts, research notes, personal journals — all uploaded to some company's inference server. The privacy trade-off felt wrong.&lt;/p&gt;

&lt;p&gt;The local-first principle changes the threat model entirely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your files never leave your machine unless &lt;em&gt;you&lt;/em&gt; choose to enable cloud features&lt;/li&gt;
&lt;li&gt;The app works fully offline (use Ollama for a zero-network setup)&lt;/li&gt;
&lt;li&gt;API keys are encrypted at rest with Fernet AES-128, readable only by your OS account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The constraint also forced better engineering. When you can't lean on a cloud backend, you have to make the local stack actually fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture overview
&lt;/h2&gt;

&lt;p&gt;Knovex is a fully decoupled tri-layer app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│  Electron 33 (desktop shell)            │
│  ┌─────────────────────────────────┐    │
│  │  React 18 + MUI v6 + TypeScript │    │
│  │  TanStack Query v5 + Zustand    │    │
│  └──────────────┬──────────────────┘    │
└─────────────────│───────────────────────┘
                  │  REST + SSE  (localhost:8765)
┌─────────────────▼───────────────────────┐
│  FastAPI + Python 3.11                  │
│  docnest-ai (hybrid RAG engine)         │
│  SQLite WAL + FTS5                      │
│  LiteLLM (multi-provider LLM bridge)    │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend is a pure API consumer — it knows nothing about RAG, embeddings, or LLMs. All intelligence lives in the Python backend. This made it very easy to swap out components independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Electron?
&lt;/h3&gt;

&lt;p&gt;Electron gets a bad reputation, but for a privacy-first desktop app it's the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single installer ships backend binary (PyInstaller) + frontend + Electron in one &lt;code&gt;.exe&lt;/code&gt;/&lt;code&gt;.dmg&lt;/code&gt;/&lt;code&gt;.AppImage&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The backend process is spawned as a child process, communicates over localhost&lt;/li&gt;
&lt;li&gt;Window state, tray, native OS file dialogs — all handled properly&lt;/li&gt;
&lt;li&gt;Cross-platform with one codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The binary is ~85-92 MB depending on platform. Not tiny, but users get zero setup — no Python, no Node, no CLI gymnastics.&lt;/p&gt;




&lt;h2&gt;
  
  
  The RAG engine: docnest-ai
&lt;/h2&gt;

&lt;p&gt;Rather than naive chunking (split every 512 chars → embed → hope), docnest-ai runs a 6-stage normalization pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Structure extraction&lt;/strong&gt; — reads heading hierarchy, tables, lists (Docling or PyMuPDF)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Section assignment&lt;/strong&gt; — every heading becomes a navigable &lt;code&gt;§section&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table normalization&lt;/strong&gt; — &lt;code&gt;{ caption, headers, rows[] }&lt;/code&gt; JSON, never loses column context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Section summarization&lt;/strong&gt; — LLM called once per document&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document intelligence&lt;/strong&gt; — summary, key numbers, insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding + quantize&lt;/strong&gt; — BM25 keywords + float16 vectors&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Stages 1–3 and 6 run locally at zero LLM cost. Stages 4–5 call an LLM &lt;em&gt;once per document&lt;/em&gt; at ingest time. Every future query benefits from that upfront investment for free.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query resolution: five layers
&lt;/h3&gt;

&lt;p&gt;The query engine tries cheaper layers first before escalating:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L0&lt;/td&gt;
&lt;td&gt;Pre-computed summary/insights&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&amp;lt; 1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;td&gt;BM25 + cosine → navigate to §section&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&amp;lt; 20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2&lt;/td&gt;
&lt;td&gt;Section-scoped LLM&lt;/td&gt;
&lt;td&gt;~300&lt;/td&gt;
&lt;td&gt;1–3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3&lt;/td&gt;
&lt;td&gt;Multi-section synthesis&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;td&gt;2–5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L4&lt;/td&gt;
&lt;td&gt;Full-document fallback&lt;/td&gt;
&lt;td&gt;~4000+&lt;/td&gt;
&lt;td&gt;5–15s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In practice, L0+L1 answer ~70% of real-world questions at zero LLM cost. You only pay when you genuinely need the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic search (v0.7.0+)
&lt;/h3&gt;

&lt;p&gt;For Knovex v0.7.0 I added hybrid semantic search on top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ONNX-based local embeddings (all-MiniLM-L6-v2, ~45 MB, one-time download)
# OR OpenAI text-embedding-3-small via API
&lt;/span&gt;
&lt;span class="c1"&gt;# Results fused with Reciprocal Rank Fusion (RRF):
# score = 1/(k + rank_fts5) + 1/(k + rank_ann)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RRF fusion handles the case where BM25 ranks a document high on keyword match but the semantic model ranks it high on conceptual similarity. The union tends to beat either individually.&lt;/p&gt;

&lt;p&gt;Average query latency on a typical KB is still sub-millisecond for the FTS5 path and ~0.9s end-to-end including the LLM call on an M-series Mac.&lt;/p&gt;




&lt;h2&gt;
  
  
  Learn Mode: turning documents into learning sessions
&lt;/h2&gt;

&lt;p&gt;This was the most fun feature to build. The idea: instead of just answering questions, the app can generate structured learning content from any document or topic.&lt;/p&gt;

&lt;p&gt;Nine formats, all streaming via SSE:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quiz&lt;/strong&gt; — interactive MCQ with XP rewards per question&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flashcards&lt;/strong&gt; — spaced repetition with interval scheduling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mind Map&lt;/strong&gt; — collapsible JSON tree rendered with D3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeline&lt;/strong&gt; — chronological events extracted from the text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guided&lt;/strong&gt; — step-by-step walkthrough via GuidedViewer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Story&lt;/strong&gt; — narrative markdown retelling of the content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ELI5&lt;/strong&gt; — explain like I'm five&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brainstorm&lt;/strong&gt; — creative connections and lateral ideas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed Learn&lt;/strong&gt; — bullet-point summary for fast review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The JSON formats (Quiz, Flashcards, Mind Map, Timeline) use a two-phase approach: LLM generates structured JSON → parse → re-stream the parsed results. Text formats (Story, ELI5, etc.) stream in real-time token by token.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gamification
&lt;/h3&gt;

&lt;p&gt;I added XP, level progression (10 tiers), daily streaks, and achievement badges. This was partly experimental — does adding game mechanics to a local productivity tool actually improve usage? Anecdotally yes: the streak counter creates a small daily habit pull.&lt;/p&gt;

&lt;p&gt;The Progress Page (v0.8.0) shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;26-week activity heatmap (sessions per day, colour-coded)&lt;/li&gt;
&lt;li&gt;Learning velocity chart (sessions/week + active days/week dual-axis)&lt;/li&gt;
&lt;li&gt;XP level with badge&lt;/li&gt;
&lt;li&gt;Week-over-week session delta&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Design patterns used throughout
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Adapter pattern (anti-corruption layer)
&lt;/h3&gt;

&lt;p&gt;Every third-party dependency sits behind a swappable interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/adapters/llm_client.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ILLMClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LiteLLMAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ILLMClient&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Wraps litellm — the only place litellm is imported&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StubLLMClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ILLMClient&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Used in tests — zero network calls&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern for: HTTP client (httpx), PDF parser (PyMuPDF / Docling), web search (DuckDuckGo / Serper / Brave), paragraph parser (python-docx).&lt;/p&gt;

&lt;p&gt;This made testing painless — all 61 E2E tests mock at the adapter boundary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy + plugin registration for parsers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_PARSERS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IFileParser&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;_PARSERS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;

&lt;span class="nd"&gt;@register_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PDFParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IFileParser&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@register_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.docx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DocxParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IFileParser&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding a new file format means writing one class and adding one decorator. Zero changes to the orchestration layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  EventBus for decoupled notifications
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In-process typed EventBus — no external dependencies
&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EventBus&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FileIngested&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;kb_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;chunk_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;

&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emit_typed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FileIngested&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;kb_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;chunk_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The watcher service (which detects stale/missing files) communicates with the KB service through events rather than direct calls. This kept the service layer clean.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenges worth noting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SQLite WAL mode + concurrent async writes&lt;/strong&gt; — FastAPI runs async, and SQLite's WAL mode handles readers well but writers queue. I had to add retry logic with exponential backoff for the ingestion pipeline, which can run as a background task while chat is active.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PyInstaller + Python 3.11 + ONNX&lt;/strong&gt; — packaging the ONNX runtime into a PyInstaller binary was the most painful part of the v0.7.0 release. The model weights need to be bundled correctly, paths resolved at runtime via &lt;code&gt;sys._MEIPASS&lt;/code&gt;. Worth documenting if you're going down this path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSE streaming through Electron's IPC&lt;/strong&gt; — Electron's fetch API handles SSE properly, but the preload script needed explicit keep-alive handling to prevent the renderer from killing long-running streams during Learn Mode generation (which can take 10–30 seconds for complex documents).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows SmartScreen&lt;/strong&gt; — unsigned NSIS installers get flagged. Adding instructions to the download page for "More info → Run anyway" reduced support questions significantly.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Phase 2 of Knovex moves toward cloud + organisation features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Portal&lt;/strong&gt; — web admin for org key management and user management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 deployment modes&lt;/strong&gt; — Personal (own keys) / Organisation (portal-managed) / Self-hosted (Docker)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph agent orchestration&lt;/strong&gt; — beyond single-turn Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual workflow builder&lt;/strong&gt; — chain operations on your KB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile app&lt;/strong&gt; — React Native, same backend API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin/connector marketplace&lt;/strong&gt; — Notion, Confluence, GitHub, etc.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;App:&lt;/strong&gt; &lt;a href="https://tailorgunjan93.github.io/knovex" rel="noopener noreferrer"&gt;tailorgunjan93.github.io/knovex&lt;/a&gt; — free one-click installer for Windows, macOS, Linux&lt;br&gt;&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/tailorgunjan93/knovex" rel="noopener noreferrer"&gt;github.com/tailorgunjan93/knovex&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;RAG engine:&lt;/strong&gt; &lt;code&gt;pip install docnest-ai&lt;/code&gt;  &lt;/p&gt;

&lt;p&gt;MIT licensed. v0.10.0 is stable with 61 E2E tests passing.&lt;/p&gt;

&lt;p&gt;Happy to answer questions about any part of the stack in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>rag</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I was embarrassed by my RAG demo. Turns out the bug was never in my code.</title>
      <dc:creator>Gunjan Tailor</dc:creator>
      <pubDate>Thu, 21 May 2026 17:08:33 +0000</pubDate>
      <link>https://dev.to/gunjantailor/i-was-embarrassed-by-my-rag-demo-turns-out-the-bug-was-never-in-my-code-4hmb</link>
      <guid>https://dev.to/gunjantailor/i-was-embarrassed-by-my-rag-demo-turns-out-the-bug-was-never-in-my-code-4hmb</guid>
      <description>&lt;p&gt;I showed my RAG app to a friend.&lt;/p&gt;

&lt;p&gt;He asked: "which region grew the  most last quarter?"&lt;/p&gt;

&lt;p&gt;It said Europe. The answer was Asia. By a lot.&lt;/p&gt;

&lt;p&gt;I spent two days debugging embeddings, chunk sizes, temperature settings.&lt;br&gt;
The bug was none of those things.&lt;/p&gt;

&lt;p&gt;The table had been turned into this:&lt;/p&gt;

&lt;p&gt;"45.2% Q3 Europe 38.1% Q2 Asia 41.7%..."&lt;/p&gt;

&lt;p&gt;Numbers with no headers. No caption. No context.&lt;br&gt;
The LLM wasn't hallucinating. It was working with garbage.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpc6snla18uijvpwblqh5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpc6snla18uijvpwblqh5.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🛠️ So I built the thing I wished existed&lt;br&gt;
Meet DocNest — not another chunker.&lt;br&gt;
A document normalization engine that reads structure before touching content.&lt;/p&gt;

&lt;p&gt;Every heading → a navigable §section with its own ID&lt;br&gt;
Every table → preserved as { caption, headers, rows[] } JSON&lt;br&gt;
Every section → one-sentence LLM summary + BM25 keyword index&lt;br&gt;
All of it → packed into a portable .udf file&lt;/p&gt;

&lt;p&gt;python&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docnest.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocNestPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docnest.reader&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UDFIndex&lt;/span&gt;

&lt;span class="c1"&gt;# Convert — runs once, costs a few LLM calls
&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocNestPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;groq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# free tier works perfectly
&lt;/span&gt;    &lt;span class="n"&gt;llm_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gsk_...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;emb_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;huggingface&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# local, no API key needed
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → report.udf ✓
&lt;/span&gt;
&lt;span class="c1"&gt;# Query
&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UDFIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.udf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Which region had the highest Q3 growth?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# "Asia grew the most, up +12.4pp"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layer_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# 1
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 0  ← yes, really. zero.
&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;✅ Zero tokens. Correct answer. 18ms.&lt;br&gt;
That's not a cherry-picked example. Here's why it's possible.&lt;/p&gt;

&lt;p&gt;⚡ The 5-layer query engine&lt;br&gt;
Instead of dumping the full document into an LLM, queries escalate through layers — stopping the moment one can answer confidently.&lt;br&gt;
LayerWhat it doesTokensSpeed0Pre-computed summary + key numbers0&amp;lt; 1ms1BM25 + cosine → lands on exact §section0&amp;lt; 20ms2Section-scoped LLM call~3001–3s3Multi-section synthesis~9002–5s4Full document fallback~4000+5–15s&lt;br&gt;
I expected layers 2–4 to do most of the work.&lt;/p&gt;

&lt;p&gt;🤯 Layers 0 and 1 handle roughly 70% of real-world questions — at zero token cost.&lt;br&gt;
Seven out of ten queries answered from a structured index. You pay for LLM compute only when genuine reasoning is needed.&lt;/p&gt;

&lt;p&gt;📊 Real numbers. Not vibes.&lt;br&gt;
25 questions. 500-page open-source nutrition textbook. PyMuPDF + Groq free tier.&lt;br&gt;
Question typeScoreBasic facts (calories, macros)✅ 5/5Detailed nutrition (fiber, glycemic index)✅ 5/5Micronutrients (vitamins, minerals)✅ 4/5Hard synthesis (BMR, omega-3, antioxidants)✅ 5/5Edge cases + hallucination traps✅ 5/5Total24/25 — 96%&lt;br&gt;
The one failure: a table-only page where the text parser extracted nothing.&lt;br&gt;
Fix: use DoclingPDFParser for image-heavy or scanned PDFs.&lt;/p&gt;

&lt;p&gt;🧠 Handles 600-page PDFs without exploding your RAM&lt;br&gt;
Standard Docling loads the full document into memory. 600 pages on a normal laptop = 💀 out of memory.&lt;br&gt;
DocNest chunks automatically, processes each at full ML quality, merges the output. Peak RAM stays constant regardless of document size.&lt;br&gt;
python&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docnest.parsers.pdf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DoclingPDFParser&lt;/span&gt;

&lt;span class="c1"&gt;# Just works — auto-detects large PDFs
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DoclingPDFParser&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;600-page-annual-report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or tune for your hardware
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DoclingPDFParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 💻 low RAM
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DoclingPDFParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 🚀 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;speed mode&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;br&gt;
🚀 Try it&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bashpip &lt;span class="nb"&gt;install &lt;/span&gt;docnest-ai

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Formats: PDF (ML + fast) · DOCX · XLSX · HTML · Markdown&lt;br&gt;
LLM providers: Groq (free) · OpenAI · Ollama (local) · Anthropic · Mistral · Google · Cohere&lt;br&gt;
Vector backends: numpy (zero deps) · FAISS · ChromaDB&lt;br&gt;
bash# CLI — because boilerplate is boring&lt;br&gt;
docnest convert report.pdf --llm-provider groq --llm-model llama-3.3-70b-versatile&lt;br&gt;
docnest query report.udf "What are the key financial risks?"&lt;br&gt;
docnest view report.udf     # structured HTML viewer in browser&lt;br&gt;
GitHub repo — star it if this solved a problem you've had:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tailorgunjan93" rel="noopener noreferrer"&gt;
        tailorgunjan93
      &lt;/a&gt; / &lt;a href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;
        docnest
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      The document normalization engine RAG has always needed. Parse any document, understand its structure, build RAG that actually works.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div&gt;
&lt;a rel="noopener noreferrer" href="https://github.com/tailorgunjan93/docnest/docs/logo.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftailorgunjan93%2Fdocnest%2FHEAD%2Fdocs%2Flogo.svg" alt="DOCNEST Logo" width="100"&gt;&lt;/a&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;DOCNEST&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Secure · Fast · Reliable · Cost-Effective&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The document normalization engine RAG has always needed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tailorgunjan93/docnest/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/tailorgunjan93/docnest/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/08cef40a9105b6526ca22088bc514fbfdbc9aac1ddbf8d4e6c750e3a88a44dca/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d626c75652e737667" alt="License: MIT"&gt;&lt;/a&gt;
&lt;a href="https://python.org" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e7d16618cfb930dc9ed2cbd0283c05e8164571fa019ce4ec0981047de28192f8/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31312532422d626c75653f6c6f676f3d707974686f6e" alt="Python"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/docnest-ai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/09660d947fcded30a96c897b5d5a3962cbac7a9a3e12a5bd36a940f4444744ad/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f646f636e6573742d61693f636f6c6f723d677265656e" alt="PyPI"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/docnest-ai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/0d9ce0b1159aca10d98cdfe739542633d588b4f26a24e5e7ff7995790cfc3d5f/68747470733a2f2f696d672e736869656c64732e696f2f707970692f646d2f646f636e6573742d61693f636f6c6f723d626c7565" alt="PyPI Downloads"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest#-accuracy-benchmark--multi-format-rag-evaluation" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/a0dbfcdb0af1153869b58c59e9cd98bd7b5a3e0a6ba76785f8bb475086331cc3/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f52414725323041636375726163792d382e3525324631302d627269676874677265656e" alt="Accuracy"&gt;&lt;/a&gt;
&lt;a href=""&gt;&lt;img src="https://camo.githubusercontent.com/f3adeea933a64c2014c89092040b8c02f4931f3f5a5d46a189133d4ac21d0ebf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7374617475732d737461626c652d627269676874677265656e" alt="Status"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e2ab1ab10c5e4d7caa102b689469c5c6317ad19c273e05e28f02e048da214e79/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f73746172732f7461696c6f7267756e6a616e39332f646f636e6573743f7374796c653d736f6369616c" alt="Stars"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest/graphs/contributors" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/375d3c06879d4f352880c2fb43546cca4287ddfbf90017f60da8a1b69ab93104/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f636f6e7472696275746f72732f7461696c6f7267756e6a616e39332f646f636e657374" alt="Contributors"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/tailorgunjan93/docnest/docs/banner.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftailorgunjan93%2Fdocnest%2FHEAD%2Fdocs%2Fbanner.svg" alt="DOCNEST Banner" width="100%"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tailorgunjan93/docnest#-why-docnest" rel="noopener noreferrer"&gt;Why DOCNEST&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-installation" rel="noopener noreferrer"&gt;Installation&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-quick-start" rel="noopener noreferrer"&gt;Quick Start&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-python-api" rel="noopener noreferrer"&gt;Python API&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-pdf-parsing--memory-guide" rel="noopener noreferrer"&gt;PDF Parsing&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-how-it-works" rel="noopener noreferrer"&gt;How It Works&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-accuracy-benchmark--multi-format-rag-evaluation" rel="noopener noreferrer"&gt;Benchmark&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-provider-interfaces" rel="noopener noreferrer"&gt;Providers&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-roadmap" rel="noopener noreferrer"&gt;Roadmap&lt;/a&gt;&lt;/p&gt;


&lt;/div&gt;
&lt;br&gt;


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Problem with RAG Today&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Every RAG pipeline ingests documents the same broken way:&lt;/p&gt;

&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;PDF → extract text → split every 512 chars → embed → store → hope
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What gets silently destroyed:&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What blind chunking loses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Financial report&lt;/td&gt;
&lt;td&gt;Table row &lt;code&gt;45.2% | Q3 | Europe&lt;/code&gt; has no column headers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal contract&lt;/td&gt;
&lt;td&gt;Clause split mid-sentence across two chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API documentation&lt;/td&gt;
&lt;td&gt;Code example separated from its description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research paper&lt;/td&gt;
&lt;td&gt;Figure caption disconnected from its analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;The LLM receives noise and returns approximate answers.&lt;/strong&gt; This is not a retrieval problem — it is an ingestion problem.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;See the difference&lt;/h3&gt;
&lt;/div&gt;

&lt;p&gt;Take a financial report with a revenue table. Here is what each approach…&lt;/p&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/docnest-ai" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;a href="https://pypi.org/project/docnest-ai" rel="noopener noreferrer"&gt;https://pypi.org/project/docnest-ai&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Format spec: &lt;a href="https://github.com/tailorgunjan93/udf-spec" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;a href="https://github.com/tailorgunjan93/udf-spec" rel="noopener noreferrer"&gt;https://github.com/tailorgunjan93/udf-spec&lt;/a&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>My RAG app confidently told my client the wrong answer. I spent 3 days debugging the wrong thing.</title>
      <dc:creator>Gunjan Tailor</dc:creator>
      <pubDate>Mon, 18 May 2026 13:35:15 +0000</pubDate>
      <link>https://dev.to/gunjantailor/i-built-a-pdf-parser-that-actually-preserves-table-structure-for-rag-heres-why-it-matters-19fo</link>
      <guid>https://dev.to/gunjantailor/i-built-a-pdf-parser-that-actually-preserves-table-structure-for-rag-heres-why-it-matters-19fo</guid>
      <description>&lt;p&gt;Picture this.&lt;/p&gt;

&lt;p&gt;It's a client demo. They're watching. I type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Which region had the highest revenue growth last quarter?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My RAG app — &lt;strong&gt;three weeks of work&lt;/strong&gt;, carefully tuned embeddings, clever prompts — responds instantly.&lt;/p&gt;

&lt;p&gt;The client nods. Writes it down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The answer was wrong. By almost double.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent three days debugging the wrong things.&lt;/p&gt;

&lt;p&gt;Chunk size? Tried 256, 512, 1024. Nothing.&lt;br&gt;
Temperature? 0.0, 0.3, 0.7. Still wrong.&lt;br&gt;
Embeddings model? Swapped three of them. Nope.&lt;br&gt;
Prompt engineering? Added &lt;em&gt;"think step by step"&lt;/em&gt;, &lt;em&gt;"be precise"&lt;/em&gt;, &lt;em&gt;"do not hallucinate"&lt;/em&gt;. 😭&lt;/p&gt;

&lt;p&gt;The LLM wasn't hallucinating. It was doing its best with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"45.2%  Q3  Europe  38.1%  Q2  Europe  41.7%  Q3  Asia   29.3%"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Orphaned numbers. No column headers. No caption. No context.&lt;/p&gt;

&lt;p&gt;The original table had all of that. My chunker ate it silently.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The bug was never in retrieval. It was in ingestion.&lt;/strong&gt; And I never thought to look there.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  🔥 The dirty secret of RAG tutorials
&lt;/h2&gt;

&lt;p&gt;Every tutorial shows you this pipeline:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF → extract text → chunk at 512 tokens → embed → store → retrieve → answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Clean. Simple. &lt;strong&gt;Completely wrong for structured documents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what blind chunking silently destroys:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Document&lt;/th&gt;
&lt;th&gt;What you had&lt;/th&gt;
&lt;th&gt;What the LLM gets&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Financial report&lt;/td&gt;
&lt;td&gt;Revenue table with headers&lt;/td&gt;
&lt;td&gt;Orphaned numbers, zero context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal contract&lt;/td&gt;
&lt;td&gt;3-page clause&lt;/td&gt;
&lt;td&gt;Split mid-sentence, both halves useless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API docs&lt;/td&gt;
&lt;td&gt;Function + code example&lt;/td&gt;
&lt;td&gt;Code separated from its description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research paper&lt;/td&gt;
&lt;td&gt;Figure with caption&lt;/td&gt;
&lt;td&gt;Caption on chunk 7, analysis on chunk 12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;🗑️ &lt;strong&gt;You're feeding the LLM garbage and expecting gold.&lt;/strong&gt; The model isn't dumb — it's working with broken input.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  🛠️ So I built the thing I wished existed
&lt;/h2&gt;

&lt;p&gt;Meet &lt;strong&gt;DocNest&lt;/strong&gt; — not another chunker.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;document normalization engine&lt;/strong&gt; that reads structure &lt;em&gt;before&lt;/em&gt; touching content.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every heading → a navigable &lt;code&gt;§section&lt;/code&gt; with its own ID&lt;/li&gt;
&lt;li&gt;Every table → preserved as &lt;code&gt;{ caption, headers, rows[] }&lt;/code&gt; JSON&lt;/li&gt;
&lt;li&gt;Every section → one-sentence LLM summary + BM25 keyword index&lt;/li&gt;
&lt;li&gt;All of it → packed into a portable &lt;code&gt;.udf&lt;/code&gt; file
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docnest.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocNestPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docnest.reader&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UDFIndex&lt;/span&gt;

&lt;span class="c1"&gt;# Convert — runs once, costs a few LLM calls
&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocNestPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;groq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# free tier works perfectly
&lt;/span&gt;    &lt;span class="n"&gt;llm_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gsk_...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;emb_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;huggingface&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# local, no API key needed
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → report.udf ✓
&lt;/span&gt;
&lt;span class="c1"&gt;# Query
&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UDFIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.udf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Which region had the highest Q3 growth?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# "Asia grew the most, up +12.4pp"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layer_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# 1
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 0  ← yes, really. zero.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Zero tokens. Correct answer. 18ms.&lt;/strong&gt;&lt;br&gt;
That's not a cherry-picked example. Here's why it's possible.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  ⚡ The 5-layer query engine
&lt;/h2&gt;

&lt;p&gt;Instead of dumping the full document into an LLM, queries escalate through layers — stopping the moment one can answer confidently.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-computed summary + key numbers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BM25 + cosine → lands on exact §section&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Section-scoped LLM call&lt;/td&gt;
&lt;td&gt;~300&lt;/td&gt;
&lt;td&gt;1–3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-section synthesis&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;td&gt;2–5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full document fallback&lt;/td&gt;
&lt;td&gt;~4000+&lt;/td&gt;
&lt;td&gt;5–15s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I expected layers 2–4 to do most of the work.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🤯 &lt;strong&gt;Layers 0 and 1 handle roughly 70% of real-world questions — at zero token cost.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Seven out of ten queries answered from a structured index. You pay for LLM compute only when genuine reasoning is needed.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  📊 Real numbers. Not vibes.
&lt;/h2&gt;

&lt;p&gt;25 questions. 500-page open-source nutrition textbook. PyMuPDF + Groq free tier.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question type&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic facts (calories, macros)&lt;/td&gt;
&lt;td&gt;✅ 5/5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detailed nutrition (fiber, glycemic index)&lt;/td&gt;
&lt;td&gt;✅ 5/5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Micronutrients (vitamins, minerals)&lt;/td&gt;
&lt;td&gt;✅ 4/5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard synthesis (BMR, omega-3, antioxidants)&lt;/td&gt;
&lt;td&gt;✅ 5/5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge cases + hallucination traps&lt;/td&gt;
&lt;td&gt;✅ 5/5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24/25 — 96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The one failure: a table-only page where the text parser extracted nothing.&lt;br&gt;
Fix: use &lt;code&gt;DoclingPDFParser&lt;/code&gt; for image-heavy or scanned PDFs.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧠 Handles 600-page PDFs without exploding your RAM
&lt;/h2&gt;

&lt;p&gt;Standard Docling loads the full document into memory. 600 pages on a normal laptop = 💀 out of memory.&lt;/p&gt;

&lt;p&gt;DocNest chunks automatically, processes each at full ML quality, merges the output. Peak RAM stays constant regardless of document size.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docnest.parsers.pdf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DoclingPDFParser&lt;/span&gt;

&lt;span class="c1"&gt;# Just works — auto-detects large PDFs
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DoclingPDFParser&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;600-page-annual-report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or tune for your hardware
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DoclingPDFParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 💻 low RAM
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DoclingPDFParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 🚀 speed mode
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🚀 Try it
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;docnest-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Formats:&lt;/strong&gt; PDF (ML + fast) · DOCX · XLSX · HTML · Markdown&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM providers:&lt;/strong&gt; Groq (free) · OpenAI · Ollama (local) · Anthropic · Mistral · Google · Cohere&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector backends:&lt;/strong&gt; numpy (zero deps) · FAISS · ChromaDB&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CLI — because boilerplate is boring&lt;/span&gt;
docnest convert report.pdf &lt;span class="nt"&gt;--llm-provider&lt;/span&gt; groq &lt;span class="nt"&gt;--llm-model&lt;/span&gt; llama-3.3-70b-versatile
docnest query report.udf &lt;span class="s2"&gt;"What are the key financial risks?"&lt;/span&gt;
docnest view report.udf     &lt;span class="c"&gt;# structured HTML viewer in browser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;GitHub repo — star it if this solved a problem you've had:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tailorgunjan93" rel="noopener noreferrer"&gt;
        tailorgunjan93
      &lt;/a&gt; / &lt;a href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;
        docnest
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      The document normalization engine RAG has always needed. Parse any document, understand its structure, build RAG that actually works.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div&gt;
&lt;a rel="noopener noreferrer" href="https://github.com/tailorgunjan93/docnest/docs/logo.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftailorgunjan93%2Fdocnest%2FHEAD%2Fdocs%2Flogo.svg" alt="DOCNEST Logo" width="120"&gt;&lt;/a&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;DOCNEST&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The document normalization engine RAG has always needed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tailorgunjan93/docnest/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/tailorgunjan93/docnest/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/08cef40a9105b6526ca22088bc514fbfdbc9aac1ddbf8d4e6c750e3a88a44dca/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d626c75652e737667" alt="License: MIT"&gt;&lt;/a&gt;
&lt;a href="https://python.org" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e7d16618cfb930dc9ed2cbd0283c05e8164571fa019ce4ec0981047de28192f8/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31312532422d626c75653f6c6f676f3d707974686f6e" alt="Python"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/docnest-ai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/09660d947fcded30a96c897b5d5a3962cbac7a9a3e12a5bd36a940f4444744ad/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f646f636e6573742d61693f636f6c6f723d677265656e" alt="PyPI"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/docnest-ai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/0d9ce0b1159aca10d98cdfe739542633d588b4f26a24e5e7ff7995790cfc3d5f/68747470733a2f2f696d672e736869656c64732e696f2f707970692f646d2f646f636e6573742d61693f636f6c6f723d626c7565" alt="PyPI Downloads"&gt;&lt;/a&gt;
&lt;a href=""&gt;&lt;img src="https://camo.githubusercontent.com/a7dd954dfc85fc675b686e7d47fef2274be7095f2b5c77a5a031a945889b018c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7374617475732d616c7068612d79656c6c6f77" alt="Status"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e2ab1ab10c5e4d7caa102b689469c5c6317ad19c273e05e28f02e048da214e79/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f73746172732f7461696c6f7267756e6a616e39332f646f636e6573743f7374796c653d736f6369616c" alt="Stars"&gt;&lt;/a&gt;
&lt;a href="https://github.com/tailorgunjan93/docnest/graphs/contributors" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/375d3c06879d4f352880c2fb43546cca4287ddfbf90017f60da8a1b69ab93104/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f636f6e7472696275746f72732f7461696c6f7267756e6a616e39332f646f636e657374" alt="Contributors"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Parse any document. Understand its structure. Build RAG that actually works.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/tailorgunjan93/docnest#-why-docnest" rel="noopener noreferrer"&gt;Why DOCNEST&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-installation" rel="noopener noreferrer"&gt;Installation&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-quick-start" rel="noopener noreferrer"&gt;Quick Start&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-python-api" rel="noopener noreferrer"&gt;Python API&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-pdf-parsing--memory-guide" rel="noopener noreferrer"&gt;PDF Parsing&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-how-it-works" rel="noopener noreferrer"&gt;How It Works&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-cli-reference" rel="noopener noreferrer"&gt;CLI Reference&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-provider-interfaces" rel="noopener noreferrer"&gt;Providers&lt;/a&gt; •
&lt;a href="https://github.com/tailorgunjan93/docnest#-roadmap" rel="noopener noreferrer"&gt;Roadmap&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Problem with RAG Today&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Every RAG pipeline ingests documents the same broken way:&lt;/p&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;PDF → extract text → split every 512 chars → embed → store → hope
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What gets silently destroyed:&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What blind chunking loses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Financial report&lt;/td&gt;
&lt;td&gt;Table row &lt;code&gt;45.2% | Q3 | Europe&lt;/code&gt; has no column headers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal contract&lt;/td&gt;
&lt;td&gt;Clause split mid-sentence across two chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API documentation&lt;/td&gt;
&lt;td&gt;Code example separated from its description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research paper&lt;/td&gt;
&lt;td&gt;Figure caption disconnected from its analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;The LLM receives noise and returns approximate answers.&lt;/strong&gt; This is not a retrieval problem — it is an ingestion problem.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;See the difference&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;Take a financial report with a revenue table…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;PyPI: &lt;a href="https://pypi.org/project/docnest-ai" rel="noopener noreferrer"&gt;https://pypi.org/project/docnest-ai&lt;/a&gt;&lt;br&gt;
Format spec: &lt;a href="https://github.com/tailorgunjan93/udf-spec" rel="noopener noreferrer"&gt;https://github.com/tailorgunjan93/udf-spec&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔨 Honesty tax
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;🚧 This is &lt;code&gt;0.4.0a2&lt;/code&gt; — alpha. It works on real documents, but PPTX parser isn't built yet, Qdrant/Weaviate backends are on the roadmap, and SharePoint/Confluence connectors are planned.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If any of those sound like something you want to build — &lt;a href="https://github.com/tailorgunjan93/docnest/issues" rel="noopener noreferrer"&gt;good first issues are labeled and waiting&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 One question for you
&lt;/h2&gt;

&lt;p&gt;Most RAG infrastructure assumes text extraction is a solved problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It isn't.&lt;/strong&gt; Not for tables. Not for anything where position and relationship carry meaning.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;What document type has caused you the most RAG pain?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For me it was financial tables. Drop it in the comments — if it's a format DocNest doesn't handle yet, that's probably the next parser I build.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Building in the open at &lt;a href="https://github.com/tailorgunjan93/docnest" rel="noopener noreferrer"&gt;github.com/tailorgunjan93/docnest&lt;/a&gt;. Stars, issues, and brutal feedback all welcome.&lt;/em&gt; 🙏&lt;/p&gt;

</description>
      <category>rag</category>
      <category>python</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
