<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Toheed Asghar</title>
    <description>The latest articles on DEV Community by Toheed Asghar (@toheed_asghar_123132).</description>
    <link>https://dev.to/toheed_asghar_123132</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3741735%2F6d9123f7-8bb8-4b2c-b5c3-1987bbcf521d.jpg</url>
      <title>DEV Community: Toheed Asghar</title>
      <link>https://dev.to/toheed_asghar_123132</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/toheed_asghar_123132"/>
    <language>en</language>
    <item>
      <title>NL2SQL in 2026: How Multi-Agent Pipelines Convert Natural Language to Safe SQL</title>
      <dc:creator>Toheed Asghar</dc:creator>
      <pubDate>Mon, 13 Apr 2026 18:15:47 +0000</pubDate>
      <link>https://dev.to/toheed_asghar_123132/nl2sql-in-2026-how-multi-agent-pipelines-convert-natural-language-to-safe-sql-18l2</link>
      <guid>https://dev.to/toheed_asghar_123132/nl2sql-in-2026-how-multi-agent-pipelines-convert-natural-language-to-safe-sql-18l2</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Most people who need data can't write SQL. Product managers open Jira tickets for simple queries. Support teams need custom admin panels for every lookup. Analysts spend hours on routine joins they've written a hundred times.&lt;/p&gt;

&lt;p&gt;NL2SQL (natural language to SQL) fixes this. You type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Who are the top 5 customers by order volume?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LLMs have made this practical. On the Spider benchmark (200+ databases, standard academic dataset), modern systems hit 85–92% execution accuracy. But getting from "works on benchmarks" to "works in production" requires more than a single prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Single-Prompt NL2SQL Breaks
&lt;/h2&gt;

&lt;p&gt;The naive approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System: Here's the schema: {entire_database_schema}
User: {natural_language_question}
Assistant: {sql_query}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works when your database has 5 tables with names like &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It fails when you have 500 tables, columns named &lt;code&gt;usr_trx_fl&lt;/code&gt;, and foreign key chains 6 layers deep. A single LLM call can't simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Figure out which 5 of 500 tables are relevant&lt;/li&gt;
&lt;li&gt;Check for SQL injection&lt;/li&gt;
&lt;li&gt;Generate correct SQL&lt;/li&gt;
&lt;li&gt;Validate syntax, logic, and performance&lt;/li&gt;
&lt;li&gt;Explain what the query does&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need decomposition. Same principle as microservices — single responsibility per agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Agent NL2SQL: How It Works
&lt;/h2&gt;

&lt;p&gt;I'll walk through the pipeline using the open-source &lt;a href="https://github.com/ToheedAsghar/NL2SQL" rel="noopener noreferrer"&gt;NL2SQL&lt;/a&gt; project as a reference. It uses 8 agents orchestrated with LangGraph.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Table Discovery (Schema Linking)
&lt;/h3&gt;

&lt;p&gt;The hardest part. Given a question, find the relevant tables.&lt;/p&gt;

&lt;p&gt;Three signals run in parallel:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword Matching&lt;/strong&gt; — Token overlap between query and table/column names. No LLM needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"singers" → singer table (fuzzy match score: 0.95)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
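
&lt;p&gt;A minimal sketch of the keyword signal using Python's standard-library &lt;code&gt;difflib&lt;/code&gt; (the project's actual scorer may differ):&lt;/p&gt;

```python
from difflib import SequenceMatcher

def keyword_score(question, table_name):
    """Best fuzzy-match ratio between any question token and the table name."""
    tokens = question.lower().split()
    return max(
        (SequenceMatcher(None, tok, table_name.lower()).ratio() for tok in tokens),
        default=0.0,
    )

# "singers" vs "singer" is a near-perfect fuzzy match
score = keyword_score("Who are the singers from France?", "singer")
```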



&lt;p&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt; — Embedding similarity. Catches conceptual matches keywords miss.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"revenue" → order_details table (cosine similarity: 0.82)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Foreign Key Graph&lt;/strong&gt; — BFS from seed tables through FK relationships.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;singer (seed) → concert (depth 1, score 0.5) → stadium (depth 2, score 0.25)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
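
&lt;p&gt;The BFS-with-decay idea fits in a few lines. The 0.5-per-hop decay matches the scores shown above; the project's actual traversal may differ:&lt;/p&gt;

```python
from collections import deque

def fk_graph_scores(fk_edges, seeds, decay=0.5, max_depth=2):
    """BFS outward from seed tables; the score halves with each FK hop."""
    # Build an undirected adjacency list from (table, referenced_table) pairs.
    adj = {}
    for a, b in fk_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {t: 1.0 for t in seeds}
    queue = deque((t, 0) for t in seeds)
    while queue:
        table, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nbr in adj.get(table, ()):
            if nbr not in scores:
                scores[nbr] = decay ** (depth + 1)
                queue.append((nbr, depth + 1))
    return scores

scores = fk_graph_scores([("singer", "concert"), ("concert", "stadium")], ["singer"])
```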



&lt;p&gt;Scores are weighted and merged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.35&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.45&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;semantic&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fk_graph&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the top-K tables (default: 5) pass through to the next stage.&lt;/p&gt;
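
&lt;p&gt;Putting the merge formula to work is a one-liner per table plus a sort; the weights here are the ones from the formula above:&lt;/p&gt;

```python
def merge_scores(keyword, semantic, fk_graph, k=5):
    """Weighted merge of the three per-table signals, returning the top-k tables."""
    tables = set(keyword) | set(semantic) | set(fk_graph)
    final = {
        t: 0.35 * keyword.get(t, 0.0)
           + 0.45 * semantic.get(t, 0.0)
           + 0.20 * fk_graph.get(t, 0.0)
        for t in tables
    }
    return sorted(final, key=final.get, reverse=True)[:k]

top = merge_scores(
    keyword={"singer": 0.95},
    semantic={"singer": 0.80, "concert": 0.60},
    fk_graph={"singer": 1.0, "concert": 0.5, "stadium": 0.25},
    k=2,
)
```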

&lt;h3&gt;
  
  
  2. Security Filter
&lt;/h3&gt;

&lt;p&gt;Runs before generation. Checks for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL injection patterns (&lt;code&gt;'; DROP TABLE --&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Destructive operations (DELETE, TRUNCATE, ALTER)&lt;/li&gt;
&lt;li&gt;System table access&lt;/li&gt;
&lt;li&gt;Dangerous function calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blocked queries never reach the generator. Non-negotiable for user-facing systems.&lt;/p&gt;
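
&lt;p&gt;A deny-list regex pass is one simple way to implement such a filter. The patterns below are illustrative, not the project's actual rules; a production filter would also run a real SQL parser and an allow-list of statement types:&lt;/p&gt;

```python
import re

# Illustrative deny-list; regexes alone are not a complete defense.
BLOCKED_PATTERNS = [
    r"(?i)\b(drop|delete|truncate|alter|grant|revoke)\b",      # destructive ops
    r"(?i)\b(pg_catalog|information_schema|sqlite_master)\b",  # system tables
    r"--",        # inline comments, common in injection payloads
    r";\s*\S",    # stacked statements after a semicolon
]

def is_safe(question):
    """Reject a request before it ever reaches the generator."""
    return not any(re.search(p, question) for p in BLOCKED_PATTERNS)

safe = is_safe("Who are the top 5 customers by order volume?")
blocked = is_safe("'; DROP TABLE users --")
```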

&lt;h3&gt;
  
  
  3. Query Generation
&lt;/h3&gt;

&lt;p&gt;The generator receives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only the relevant tables (not the full schema)&lt;/li&gt;
&lt;li&gt;The user's question&lt;/li&gt;
&lt;li&gt;Few-shot examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the first attempt fails validation → &lt;strong&gt;one-shot retry&lt;/strong&gt; with the error message appended. This simple pattern catches a surprising number of wrong column names and missing joins.&lt;/p&gt;
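
&lt;p&gt;The retry loop is almost trivial to sketch. Here &lt;code&gt;llm&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; are hypothetical callables, not the project's real interfaces:&lt;/p&gt;

```python
def generate_sql(llm, question, schema_text, validate):
    """One-shot retry: generate, validate, and on failure regenerate exactly once
    with the validator's error appended. Assumes llm(prompt) returns SQL text
    and validate(sql) returns an error string, or None on success."""
    prompt = f"Schema:\n{schema_text}\n\nQuestion: {question}\nSQL:"
    sql = llm(prompt)
    error = validate(sql)
    if error is None:
        return sql
    # Append the failure context and try exactly once more.
    retry = f"{prompt}\n\nPrevious attempt:\n{sql}\nError: {error}\nCorrected SQL:"
    return llm(retry)
```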

&lt;h3&gt;
  
  
  4. Parallel Validation
&lt;/h3&gt;

&lt;p&gt;Four validators run concurrently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌──────────────┐
│   Syntax    │  │    Logic    │  │  Security   │  │ Performance  │
│  validator  │  │  validator  │  │  validator  │  │  validator   │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬───────┘
       │                │                │                 │
       └────────────────┴────────────────┴─────────────────┘
                              │
                     ┌────────▼────────┐
                     │   Fan-in merge  │
                     └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Validator&lt;/th&gt;
&lt;th&gt;Checks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Syntax&lt;/td&gt;
&lt;td&gt;SQL parsing, grammar, dialect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logic&lt;/td&gt;
&lt;td&gt;Tables/columns exist, joins correct, types compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Injection, unauthorized ops, data exfiltration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Full table scans, missing indexes, complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Any failure → back to generator with error context.&lt;/p&gt;
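
&lt;p&gt;In the real pipeline LangGraph handles the fan-out, but a plain &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; shows the same fan-out/fan-in pattern. The toy validators here are stand-ins:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def validate_parallel(sql, validators):
    """Fan one SQL string out to every validator concurrently, then fan in.
    Each validator returns a list of issues; an empty list means it passed."""
    with ThreadPoolExecutor(max_workers=len(validators)) as pool:
        futures = {name: pool.submit(fn, sql) for name, fn in validators.items()}
        issues = {name: f.result() for name, f in futures.items()}
    passed = all(not found for found in issues.values())
    return passed, issues

ok, report = validate_parallel(
    "SELECT name FROM customers",
    {
        "syntax": lambda q: [] if q.upper().startswith("SELECT") else ["not a SELECT"],
        "security": lambda q: [] if "--" not in q else ["comment found"],
    },
)
```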

&lt;h3&gt;
  
  
  5. Explanation + Safety Score
&lt;/h3&gt;

&lt;p&gt;Output includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plain-English explanation of the query&lt;/li&gt;
&lt;li&gt;Safety score (0–4 across four dimensions)&lt;/li&gt;
&lt;li&gt;Optimization recommendations
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;═══════════════════════════════════════
 EXPLANATION
═══════════════════════════════════════
This query retrieves the top 5 customers by order volume by:
1. Joining customers with orders on customer_id
2. Counting orders per customer
3. Sorting by count descending, limiting to 5

Safety Score: 3.5/4.0
✓ Security: Safe (no injection risks)
✓ Syntax: Valid SQL
✓ Logic: Correct table and column usage
⚠ Performance: Consider index on customer_id
═══════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  DDL Vector Stores: The Efficiency Fix
&lt;/h2&gt;

&lt;p&gt;Here's something worth knowing if you're building or evaluating NL2SQL systems.&lt;/p&gt;

&lt;p&gt;The semantic search step in most pipelines (including the one above) &lt;strong&gt;recomputes embeddings on every request&lt;/strong&gt;. It embeds the query + every candidate table, computes cosine similarity in-memory, and discards the vectors. Next request: same computation, same API calls, same cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Better Approach
&lt;/h3&gt;

&lt;p&gt;Pre-compute embeddings for your DDL structure and store them in a vector DB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you embed per table:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Table: orders
Columns: order_id (PK, INTEGER), customer_id (FK → customers.id),
order_date (DATE), total_amount (DECIMAL), status (VARCHAR)
Relationships: references customers, referenced by order_items
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: include &lt;strong&gt;primary keys, foreign keys, types, and relationships&lt;/strong&gt; in the text you embed. This gives the embedding model structural context, not just names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Store in&lt;/strong&gt; FAISS / Chroma / Pinecone / Weaviate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At query time&lt;/strong&gt;: embed only the user's question (1 API call) → vector similarity search against pre-computed embeddings.&lt;/p&gt;
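
&lt;p&gt;The whole pattern fits in a short sketch. A toy hashing embedder stands in for a real embedding model, and a dict stands in for the vector DB; what matters is that table vectors are computed once, offline, and only the question is embedded at query time:&lt;/p&gt;

```python
import hashlib
import math

# Toy hashing embedder standing in for a real embedding model.
def toy_embed(text, dim=512):
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Offline: one embedding per table, stored (in practice, in FAISS/Chroma/...).
ddl_texts = {
    "orders": "orders table with columns order id customer id order date total amount status",
    "customers": "customers table with columns customer id name email signup date",
}
index = {name: toy_embed(text) for name, text in ddl_texts.items()}

# Online: embed only the question, then rank tables by similarity.
question_vec = toy_embed("total amount of each order")
best = max(index, key=lambda name: cosine(question_vec, index[name]))
```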

&lt;h3&gt;
  
  
  Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Per-Request Embedding&lt;/th&gt;
&lt;th&gt;DDL Vector Store&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding calls/query&lt;/td&gt;
&lt;td&gt;1 + N (N = tables)&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;O(N)&lt;/td&gt;
&lt;td&gt;O(1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost at scale&lt;/td&gt;
&lt;td&gt;Linear with schema size&lt;/td&gt;
&lt;td&gt;Near-constant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema changes&lt;/td&gt;
&lt;td&gt;Auto (always fresh)&lt;/td&gt;
&lt;td&gt;Requires re-index&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Research backing&lt;/strong&gt;: LitE-SQL (EACL 2026) uses this exact pattern — pre-computed schema embeddings with contrastive learning. Results: &lt;strong&gt;88.45% on Spider, 72.10% on BIRD&lt;/strong&gt;, with 2–30x fewer parameters than full LLM approaches.&lt;/p&gt;

&lt;p&gt;For production systems where the schema changes less often than queries arrive (i.e., basically every production database), this is a clear win.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Benchmark-to-Production Gap
&lt;/h2&gt;

&lt;p&gt;Numbers you should know:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Schema Type&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Spider 1.0&lt;/td&gt;
&lt;td&gt;Clean, 3–10 tables, descriptive names&lt;/td&gt;
&lt;td&gt;85–92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spider 2.0&lt;/td&gt;
&lt;td&gt;Enterprise-realistic schemas&lt;/td&gt;
&lt;td&gt;6–21%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BIRD&lt;/td&gt;
&lt;td&gt;Dirty schemas, noisy labels&lt;/td&gt;
&lt;td&gt;~72% (best systems)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 85% → 6% drop isn't a bug. Spider 1.0 is clean academic data. Real databases have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hundreds of tables with cryptic names&lt;/li&gt;
&lt;li&gt;"Active users" meaning different things per company&lt;/li&gt;
&lt;li&gt;Nulls, inconsistent formats, undocumented columns&lt;/li&gt;
&lt;li&gt;Multi-hop joins, window functions, nested subqueries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-agent validation doesn't close this gap entirely, but it's the difference between a system that fails silently and one that says "I'm not confident — here's why."&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;Install from PyPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;nl2sql-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure (works with any OpenAI-compatible API — OpenRouter, OpenAI, Ollama, vLLM):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://openrouter.ai/api/v1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"openai/gpt-4o-mini"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_EMBEDDING_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"openai/text-embedding-3-small"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DB_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sqlite"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DB_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/your/database.sqlite"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Interactive REPL&lt;/span&gt;
nl2sql

&lt;span class="c"&gt;# One-shot&lt;/span&gt;
nl2sql &lt;span class="s2"&gt;"Show me all singers from France"&lt;/span&gt;

&lt;span class="c"&gt;# Override database&lt;/span&gt;
nl2sql &lt;span class="nt"&gt;--db&lt;/span&gt; /path/to/other.sqlite &lt;span class="s2"&gt;"List all employees"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with any SQLite database. Point &lt;code&gt;DB_PATH&lt;/code&gt; at your own &lt;code&gt;.sqlite&lt;/code&gt; file or use the Spider benchmark databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/ToheedAsghar/NL2SQL" rel="noopener noreferrer"&gt;github.com/ToheedAsghar/NL2SQL&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/nl2sql-agents/" rel="noopener noreferrer"&gt;pypi.org/project/nl2sql-agents&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture at a Glance
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query
    │
    ▼
┌─────────────────┐
│ Security Filter  │──── Block dangerous queries
└────────┬────────┘
         ▼
┌─────────────────────────────────────┐
│         Table Discovery             │
│  ┌──────┐ ┌────────┐ ┌──────────┐  │
│  │Keywrd│ │Semantic│ │ FK Graph │  │
│  └──┬───┘ └───┬────┘ └────┬─────┘  │
│     └─────────┴────────────┘        │
│         Weighted Merge              │
└────────────┬────────────────────────┘
             ▼
┌─────────────────┐
│ Schema Formatter │──── Format relevant tables
└────────┬────────┘
         ▼
┌─────────────────┐
│ Query Generator  │◄─── Retry with error context
└────────┬────────┘
         ▼
┌─────────────────────────────────────┐
│        Parallel Validation          │
│ ┌──────┐ ┌─────┐ ┌─────┐ ┌──────┐ │
│ │Syntax│ │Logic│ │Secur│ │ Perf │ │
│ └──────┘ └─────┘ └─────┘ └──────┘ │
└────────────┬────────────────────────┘
             ▼
┌─────────────────┐
│   Explainer     │──── Explanation + Safety Score
└─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;8 agents. Parallel fan-out/fan-in. LangGraph &lt;code&gt;StateGraph&lt;/code&gt; with conditional edges. MIT licensed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next for NL2SQL
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DDL vector stores&lt;/strong&gt; replacing per-request embeddings as the default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic layers&lt;/strong&gt; (dbt, Cube) + NL2SQL as complementary approaches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-database support&lt;/strong&gt; — PostgreSQL, MySQL, BigQuery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution + visualization&lt;/strong&gt; — run queries and render results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn conversations&lt;/strong&gt; — follow-up questions with context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Right-sized models&lt;/strong&gt; — fine-tuned 7B models for individual agent tasks instead of GPT-4 for everything&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Star the repo if it's useful: &lt;a href="https://github.com/ToheedAsghar/NL2SQL" rel="noopener noreferrer"&gt;github.com/ToheedAsghar/NL2SQL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture diagram in the README is worth a look even if you're building something completely different — it's a clean reference for structuring multi-agent LangGraph pipelines.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>sql</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Built a Multi-Agent RAG System That Fact-Checks Its Own Answers — Here's How</title>
      <dc:creator>Toheed Asghar</dc:creator>
      <pubDate>Fri, 20 Feb 2026 14:24:23 +0000</pubDate>
      <link>https://dev.to/toheed_asghar_123132/i-built-a-multi-agent-rag-system-that-fact-checks-its-own-answers-heres-how-1g7j</link>
      <guid>https://dev.to/toheed_asghar_123132/i-built-a-multi-agent-rag-system-that-fact-checks-its-own-answers-heres-how-1g7j</guid>
      <description>&lt;p&gt;Every RAG system has the same Achilles' heel: &lt;strong&gt;hallucination&lt;/strong&gt;. You ask a question, it retrieves some documents, and the LLM confidently generates an answer that sounds right but is subtly wrong. No warning, no citation, no second opinion.&lt;/p&gt;

&lt;p&gt;I spent weeks building a system that fixes this. &lt;strong&gt;DocForge&lt;/strong&gt; is an open-source multi-agent RAG pipeline where four specialized AI agents collaborate — and one of them exists solely to fact-check the others.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk you through the architecture, the problems it solves, and how you can run it yourself.&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ToheedAsghar" rel="noopener noreferrer"&gt;
        ToheedAsghar
      &lt;/a&gt; / &lt;a href="https://github.com/ToheedAsghar/DocForge" rel="noopener noreferrer"&gt;
        DocForge
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A RAG pipeline that doesn't trust its own answers. 4 AI agents collaborate to route queries, retrieve docs, synthesize answers, and catch hallucinations automatically.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;DocForge&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;
A &lt;strong&gt;Multi-Agent Retrieval-Augmented Generation (RAG) system&lt;/strong&gt; built with LangGraph, featuring intelligent query routing, adaptive retrieval, fact-checking with automatic retry logic, and a FastAPI backend
&lt;/p&gt;

&lt;p&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/9833615dcd5c4175da2140bbe363cb42af4faaeffc0624453b6154e68a60d0aa/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31312b2d626c75652e737667"&gt;&lt;img src="https://camo.githubusercontent.com/9833615dcd5c4175da2140bbe363cb42af4faaeffc0624453b6154e68a60d0aa/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31312b2d626c75652e737667" alt="Python"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/f8bae329f8d2a971ebb55aeab5bbff19068db78b1c61b908b89ef3fdf640224f/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c616e6747726170682d302e302e32302b2d677265656e2e737667"&gt;&lt;img src="https://camo.githubusercontent.com/f8bae329f8d2a971ebb55aeab5bbff19068db78b1c61b908b89ef3fdf640224f/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c616e6747726170682d302e302e32302b2d677265656e2e737667" alt="LangGraph"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/fdf2982b9f5d7489dcf44570e714e3a15fce6253e0cc6b5aa61a075aac2ff71b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667"&gt;&lt;img src="https://camo.githubusercontent.com/fdf2982b9f5d7489dcf44570e714e3a15fce6253e0cc6b5aa61a075aac2ff71b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667" alt="License"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;a rel="noopener noreferrer" href="https://github.com/ToheedAsghar/DocForge/assets/demo.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FToheedAsghar%2FDocForge%2Fassets%2Fdemo.gif" width="600" alt="DocForge Demo"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Key Features&lt;/h2&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Multi-Agent Architecture&lt;/h3&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Routing Agent&lt;/strong&gt; — Classifies query complexity (simple lookup / complex reasoning / multi-hop) and generates an optimized search query for the vector database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval Agent&lt;/strong&gt; — Adaptive document retrieval (3-10 docs based on complexity, with relaxed thresholds on retries)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis Agent&lt;/strong&gt; — Synthesizes coherent, cited answers from multiple sources using chain-of-thought reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation Agent&lt;/strong&gt; — Fact-checks every claim against source documents, identifies hallucinations, and corrects the answer if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Intelligent Workflow&lt;/h3&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Confidence-based validation skip&lt;/strong&gt; — When retrieval scores are high, sources are sufficient, and no information gaps exist, validation is skipped entirely for faster responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic retry with adaptive strategy&lt;/strong&gt; — On validation failure, the system retries retrieval with 50% more documents and a relaxed relevance threshold (up to 3 attempts)&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ToheedAsghar/DocForge" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;





&lt;h2&gt;
  
  
  Why Traditional RAG Falls Short
&lt;/h2&gt;

&lt;p&gt;A standard RAG pipeline is straightforward: embed a query, retrieve similar chunks from a vector database, and pass them to an LLM to generate an answer. It works — until it doesn't.&lt;/p&gt;

&lt;p&gt;Here are the failure modes I kept hitting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No query understanding&lt;/strong&gt; — A simple factual lookup and a complex multi-hop question both get the same retrieval strategy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fixed retrieval&lt;/strong&gt; — Always fetching the same number of documents regardless of question complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No verification&lt;/strong&gt; — The LLM's answer is accepted as-is, even when it contradicts or fabricates information beyond the source documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No recovery&lt;/strong&gt; — When retrieval fails to find relevant documents, the system has no mechanism to retry with a different strategy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;DocForge addresses every one of these with a multi-agent architecture.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Architecture: Four Agents, One Pipeline
&lt;/h2&gt;

&lt;p&gt;DocForge is built on &lt;strong&gt;LangGraph&lt;/strong&gt;, which orchestrates four specialized agents into a stateful workflow:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query
    │
    ▼
┌─────────────────┐
│   Redis Cache    │ ◄── Check cache first (SHA-256 key)
└────────┬────────┘
         │ (cache miss)
         ▼
┌─────────────────┐
│  Routing Agent   │ ◄── Classify complexity, optimize search query
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Retrieval Agent  │ ◄── Fetch 3–10 docs from Pinecone
└────────┬────────┘     (50% more on retry, relaxed threshold)
         │
         ▼
┌─────────────────┐
│ Analysis Agent   │ ◄── Synthesize cited answer (chain-of-thought)
└────────┬────────┘
         │
         ▼
    Confidence Check
    │
    ├── High confidence ──▶ Skip validation ──▶ Return &amp;amp; Cache
    │
    └── Otherwise:
         │
         ▼
    ┌─────────────────┐
    │ Validation Agent │ ◄── Fact-check every claim
    └────────┬────────┘
             │
             ▼
        ├── Valid           ──▶ Return &amp;amp; Cache
        ├── Invalid (&amp;lt; 3)  ──▶ Retry from Retrieval (adaptive)
        └── Invalid (≥ 3)  ──▶ Return corrected answer &amp;amp; Cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let me break down each agent.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Routing Agent — The Dispatcher
&lt;/h3&gt;

&lt;p&gt;Not all questions are equal. "What is LangGraph?" is a simple lookup. "Compare the tradeoffs of LangGraph vs. CrewAI for multi-agent orchestration" requires complex reasoning across multiple sources.&lt;/p&gt;

&lt;p&gt;The Routing Agent classifies every incoming query into one of three types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple lookup&lt;/strong&gt; — Direct factual questions (retrieves 3 documents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning&lt;/strong&gt; — Questions requiring synthesis across sources (retrieves 7 documents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-hop&lt;/strong&gt; — Questions that chain multiple pieces of information (retrieves 10 documents)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also rewrites the user's natural-language query into an optimized search query for the vector database, improving retrieval relevance.&lt;/p&gt;
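
&lt;p&gt;The mapping from complexity class to retrieval size is just a small lookup. The labels and document counts come from the post; the function name and output shape are illustrative, since the real Routing Agent is an LLM call:&lt;/p&gt;

```python
# Document counts per complexity class, as described in the post.
DOCS_PER_COMPLEXITY = {"simple": 3, "complex": 7, "multi_hop": 10}

def retrieval_plan(complexity, rewritten_query):
    """Turn the router's classification into retrieval parameters."""
    return {
        "query": rewritten_query,
        "top_k": DOCS_PER_COMPLEXITY.get(complexity, 3),  # default to a simple lookup
    }

plan = retrieval_plan("multi_hop", "LangGraph CrewAI orchestration tradeoffs")
```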

&lt;h3&gt;
  
  
  2. Retrieval Agent — Adaptive Search
&lt;/h3&gt;

&lt;p&gt;Based on the routing classification, the Retrieval Agent queries &lt;strong&gt;Pinecone&lt;/strong&gt; with the appropriate number of documents and relevance threshold.&lt;/p&gt;

&lt;p&gt;The key innovation here is &lt;strong&gt;adaptive retry&lt;/strong&gt;. If the Validation Agent later rejects the answer, retrieval reruns with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50% more documents&lt;/strong&gt; than the previous attempt&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;relaxed relevance threshold&lt;/strong&gt; to cast a wider net&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means the system self-corrects when initial retrieval wasn't sufficient.&lt;/p&gt;
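
&lt;p&gt;The retry widening reduces to two arithmetic tweaks. The 50% growth factor is from the post; the relaxation step of 0.05 is an illustrative value, not DocForge's actual constant:&lt;/p&gt;

```python
import math

def retry_params(top_k, threshold, growth=1.5, relax=0.05):
    """Widen the net on each retry: more documents, lower relevance bar."""
    return math.ceil(top_k * growth), max(threshold - relax, 0.0)

# A complex-reasoning query that failed validation on its first pass:
k, t = retry_params(7, 0.75)
```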

&lt;h3&gt;
  
  
  3. Analysis Agent — The Synthesizer
&lt;/h3&gt;

&lt;p&gt;The Analysis Agent takes the retrieved document chunks and synthesizes a coherent, cited answer using chain-of-thought reasoning. Every claim in the answer is tied back to a specific source document.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Validation Agent — The Fact-Checker
&lt;/h3&gt;

&lt;p&gt;This is the agent that makes DocForge different. The Validation Agent independently fact-checks &lt;strong&gt;every claim&lt;/strong&gt; in the synthesized answer against the source documents. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifies unsupported claims&lt;/li&gt;
&lt;li&gt;Detects hallucinated information&lt;/li&gt;
&lt;li&gt;Flags contradictions with sources&lt;/li&gt;
&lt;li&gt;Provides a corrected answer when issues are found&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If validation fails, the system retries from retrieval with an adaptive strategy, up to 3 attempts. If it still fails after the maximum number of retries, it returns the best corrected answer it has.&lt;/p&gt;




&lt;h2&gt;
  
  
  Smart Optimizations That Matter in Production
&lt;/h2&gt;

&lt;p&gt;Building a multi-agent system that's correct is one thing. Making it fast and cost-effective is another.&lt;/p&gt;

&lt;h3&gt;
  
  
  Confidence-Based Validation Skip
&lt;/h3&gt;

&lt;p&gt;Not every answer needs fact-checking. When all three conditions are met, DocForge skips the Validation Agent entirely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval scores are above &lt;strong&gt;0.85&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;At least &lt;strong&gt;3 source documents&lt;/strong&gt; were used&lt;/li&gt;
&lt;li&gt;No information gaps were detected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This cuts latency by 30–40% on high-confidence queries.&lt;/p&gt;
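&lt;p&gt;The skip check is just a conjunction of the three conditions. A minimal sketch, assuming per-chunk relevance scores and a gap flag from the Analysis Agent:&lt;/p&gt;

```python
# Hedged sketch of the confidence-based validation skip: fact-checking is
# skipped only when every condition holds.
def can_skip_validation(scores, gaps_detected,
                        min_score=0.85, min_sources=3):
    """True when retrieval is confident enough to skip the Validation Agent."""
    return (len(scores) >= min_sources          # at least 3 source documents
            and all(s > min_score for s in scores)  # all scores above 0.85
            and not gaps_detected)              # no information gaps

print(can_skip_validation([0.91, 0.88, 0.87], gaps_detected=False))  # True
print(can_skip_validation([0.91, 0.62], gaps_detected=False))        # False
```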

&lt;h3&gt;
  
  
  Redis Caching
&lt;/h3&gt;

&lt;p&gt;Every query result is cached in Redis with a SHA-256 key and 1-hour TTL. Repeated queries return instantly — roughly &lt;strong&gt;10x faster&lt;/strong&gt; than a fresh pipeline run, with zero token cost.&lt;/p&gt;
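&lt;p&gt;A minimal sketch of that cache layer, assuming the key is a SHA-256 hash of the normalized query (the exact key scheme is an implementation detail):&lt;/p&gt;

```python
import hashlib
import json

CACHE_TTL_SECONDS = 3600  # 1-hour TTL

def cache_key(query: str) -> str:
    """Derive a stable key: lowercase, collapse whitespace, then SHA-256."""
    normalized = " ".join(query.lower().split())
    return "docforge:query:" + hashlib.sha256(normalized.encode()).hexdigest()

def cached_answer(redis_client, query: str, run_pipeline):
    """Return a cached result if present; otherwise run the full pipeline
    and store the JSON-serialized result with a 1-hour expiry."""
    key = cache_key(query)
    hit = redis_client.get(key)
    if hit is not None:
        return json.loads(hit)  # cache hit: no pipeline run, no token cost
    result = run_pipeline(query)
    redis_client.set(key, json.dumps(result), ex=CACHE_TTL_SECONDS)
    return result
```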

&lt;h3&gt;
  
  
  Task-Specific Model Selection
&lt;/h3&gt;

&lt;p&gt;Different agents need different capabilities. DocForge lets you assign different models per task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fast, cheap model for simple routing decisions&lt;/span&gt;
&lt;span class="nv"&gt;GEMINI_ROUTING_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemini-2.0-flash-lite

&lt;span class="c"&gt;# More capable model for complex synthesis and validation&lt;/span&gt;
&lt;span class="nv"&gt;GEMINI_ANALYSIS_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemini-2.5-flash
&lt;span class="nv"&gt;GEMINI_VALIDATION_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cuts token costs by &lt;strong&gt;40–50%&lt;/strong&gt; compared to using a single expensive model for everything.&lt;/p&gt;
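&lt;p&gt;Resolving those variables at runtime is a one-liner per task. The mapping below mirrors the env vars above; the function name is illustrative:&lt;/p&gt;

```python
import os

# Per-task model selection, keyed on the env vars shown above, with the
# defaults falling back to the same models the example configures.
TASK_MODEL_VARS = {
    "routing":    ("GEMINI_ROUTING_MODEL",    "gemini-2.0-flash-lite"),
    "analysis":   ("GEMINI_ANALYSIS_MODEL",   "gemini-2.5-flash"),
    "validation": ("GEMINI_VALIDATION_MODEL", "gemini-2.5-flash"),
}

def model_for(task: str) -> str:
    """Resolve the model name for a task, falling back to its default."""
    var, default = TASK_MODEL_VARS[task]
    return os.environ.get(var, default)
```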

&lt;h3&gt;
  
  
  Dual LLM Provider Support
&lt;/h3&gt;

&lt;p&gt;DocForge supports both &lt;strong&gt;OpenAI GPT&lt;/strong&gt; (via OpenRouter) and &lt;strong&gt;Google Gemini&lt;/strong&gt;. Switch providers with a single environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLM_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemini  &lt;span class="c"&gt;# or "gpt"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
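&lt;p&gt;One way to validate such a switch at startup; the endpoint mapping here is an assumption for illustration, not DocForge's actual wiring:&lt;/p&gt;

```python
import os

# Hypothetical provider switch keyed on LLM_PROVIDER, rejecting unknown
# values early instead of failing mid-pipeline.
PROVIDERS = {
    "gemini": "https://generativelanguage.googleapis.com",
    "gpt": "https://openrouter.ai/api/v1",
}

def resolve_provider() -> tuple[str, str]:
    """Read LLM_PROVIDER and return (name, base_url)."""
    name = os.environ.get("LLM_PROVIDER", "gemini").lower()
    if name not in PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {name!r}")
    return name, PROVIDERS[name]
```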






&lt;h2&gt;
  
  
  Getting Started in 5 Minutes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11+&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt; account (free tier works)&lt;/li&gt;
&lt;li&gt;An &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; or &lt;a href="https://ai.google.dev/" rel="noopener noreferrer"&gt;Google Gemini&lt;/a&gt; API key&lt;/li&gt;
&lt;li&gt;Redis (optional, for caching)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ToheedAsghar/DocForge.git
&lt;span class="nb"&gt;cd &lt;/span&gt;DocForge
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLM_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemini
&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-gemini-key
&lt;span class="nv"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-pinecone-key
&lt;span class="nv"&gt;PINECONE_ENVIRONMENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nv"&gt;PINECONE_INDEX_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;techdoc-intelligence
&lt;span class="nv"&gt;REDIS_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;redis://localhost:6379
&lt;span class="nv"&gt;CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ingest Your Documents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;backend.ingestion.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ingest_documents&lt;/span&gt;

&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ingest_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./documents/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ingested &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;documents_loaded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; PDFs → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chunks_created&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Query the System
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;backend.agents.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_graph&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is LangGraph?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fact_checked_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation_passed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;corrected&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Latency: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Or Use the REST API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn backend.main:app &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/api/v1/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"query": "What is LangGraph?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker is also supported:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent Orchestration&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Providers&lt;/td&gt;
&lt;td&gt;OpenAI GPT-4o-mini (via OpenRouter), Google Gemini 2.5 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;OpenAI &lt;code&gt;text-embedding-3-small&lt;/code&gt; (1536 dims)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Database&lt;/td&gt;
&lt;td&gt;Pinecone (serverless, cosine similarity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;Redis (SHA-256 keys, 1-hour TTL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Framework&lt;/td&gt;
&lt;td&gt;FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Framework&lt;/td&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configuration&lt;/td&gt;
&lt;td&gt;Pydantic Settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Containerization&lt;/td&gt;
&lt;td&gt;Docker + Docker Compose&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;A few takeaways from building a multi-agent RAG system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Validation is worth the latency cost.&lt;/strong&gt; In my testing, the Validation Agent caught hallucinated claims in roughly 15–20% of responses. That's up to 1 in 5 answers that would have been wrong without it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Adaptive retry is better than aggressive retrieval.&lt;/strong&gt; Instead of always retrieving 10+ documents (slow, expensive, noisy), start small and retry with more only when needed. Most queries are answered well with 3–5 documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Caching is a multiplier.&lt;/strong&gt; In any production Q&amp;amp;A system, users ask similar questions repeatedly. Redis caching turned repeated queries from 3–5 second operations into sub-100ms responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Different tasks need different models.&lt;/strong&gt; Routing a query is a simple classification task — it doesn't need GPT-4. Synthesizing a multi-source answer does. Task-specific model assignment is an easy win for cost optimization.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;DocForge is actively being developed. Here's what's on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for more document formats (DOCX, TXT, Markdown, HTML)&lt;/li&gt;
&lt;li&gt;Conversation history and multi-turn chat&lt;/li&gt;
&lt;li&gt;A frontend UI for non-technical users&lt;/li&gt;
&lt;li&gt;Multi-tenancy support&lt;/li&gt;
&lt;li&gt;Deployment guides for AWS, Railway, and Render&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;DocForge is fully open-source under the MIT license. If you're building a RAG system and tired of hallucinated answers, give it a spin:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ToheedAsghar/DocForge" rel="noopener noreferrer"&gt;github.com/ToheedAsghar/DocForge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you found this useful, a star on the repo would mean a lot. I'm also happy to answer questions in the comments — whether it's about the architecture, LangGraph, or multi-agent systems in general.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://github.com/ToheedAsghar" rel="noopener noreferrer"&gt;Toheed Asghar&lt;/a&gt; with LangGraph, LangChain, Pinecone, and FastAPI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>langchain</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
