<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ashg2099</title>
    <description>The latest articles on DEV Community by ashg2099 (@ashg2099).</description>
    <link>https://dev.to/ashg2099</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3955365%2Fed9f127b-a2c5-43a6-8f27-8b319e748450.png</url>
      <title>DEV Community: ashg2099</title>
      <link>https://dev.to/ashg2099</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ashg2099"/>
    <language>en</language>
    <item>
      <title>I Built an Open-Source Multi-Agent Fact-Checker — Here's How It Works</title>
      <dc:creator>ashg2099</dc:creator>
      <pubDate>Thu, 28 May 2026 00:25:32 +0000</pubDate>
      <link>https://dev.to/ashg2099/i-built-an-open-source-multi-agent-fact-checker-heres-how-it-works-5eah</link>
      <guid>https://dev.to/ashg2099/i-built-an-open-source-multi-agent-fact-checker-heres-how-it-works-5eah</guid>
      <description>&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;We have a misinformation problem. But more specifically, we have a speed problem.&lt;br&gt;
A journalist spots a suspicious claim. They search for sources. Cross-reference databases. Call experts. Write a verdict. Get it edited. Publish, maybe 6 hours later. Maybe 3 days later.&lt;br&gt;
Meanwhile, the original claim has been screenshotted, reposted, quoted in newsletters, and cited in arguments across five platforms.&lt;br&gt;
I wanted to build something that closed that gap. Not a chatbot that guesses. A proper pipeline, one that retrieves real evidence, reasons from it, and tells you why it reached a verdict.&lt;br&gt;
That's what Sift is.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Sift?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Sift (Source Inspection &amp;amp; Fact-checking Tool)&lt;/strong&gt; is an open-source multi-agent AI pipeline that takes any text, extracts every factual claim, retrieves grounded evidence, and returns auditable verdicts — TRUE, FALSE, or UNCERTAIN, with cited sources and full reasoning chains.&lt;br&gt;
Paste a news article. A politician's speech. A viral statistic. A WhatsApp forward. Sift breaks it into individual claims and fact-checks each one independently.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Multi-Agent?
&lt;/h2&gt;

&lt;p&gt;The naive approach is to ask an LLM: "Is this claim true?"&lt;br&gt;
The problem: LLMs hallucinate. They have knowledge cutoffs. They're confidently wrong in ways that are hard to detect. And critically, they don't show their work.&lt;br&gt;
A single LLM call can't reliably handle the full pipeline of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extracting structured claims from noisy text&lt;/li&gt;
&lt;li&gt;Retrieving dated, traceable evidence from live sources&lt;/li&gt;
&lt;li&gt;Reasoning across conflicting evidence without confabulating&lt;/li&gt;
&lt;li&gt;Adversarially reviewing its own conclusions for overconfidence&lt;/li&gt;
&lt;li&gt;Finding corrections when something is wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is a distinct task with its own failure modes, and each benefits from its own prompt and its own tools. That's why I built five separate agents, orchestrated with LangGraph.&lt;/p&gt;
&lt;h2&gt;
  
  
  The 5-Agent Pipeline
&lt;/h2&gt;
&lt;h2&gt;
  
  
  Agent 1 — Claim Extractor
&lt;/h2&gt;

&lt;p&gt;A single paragraph can contain 4-5 distinct factual claims. Generic LLMs miss them or conflate them.&lt;br&gt;
This agent uses LLaMA 3.3 70B via Groq with Pydantic structured output to extract every distinct verifiable claim from the input text. The output is a typed list of claims, each quoted exactly from the input, with no paraphrasing and no invented claims.&lt;/p&gt;
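
&lt;p&gt;To make the typed list of claims concrete, here is a minimal sketch of validating an extractor reply. It is not Sift's actual code: it uses a standard-library dataclass in place of Pydantic so it runs without dependencies, and the &lt;code&gt;Claim&lt;/code&gt; type, the &lt;code&gt;parse_claims&lt;/code&gt; helper, and the JSON reply shape are all assumptions made for illustration.&lt;/p&gt;

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One verifiable claim, quoted verbatim from the input text."""
    text: str


def parse_claims(raw_json: str, source_text: str) -> list[Claim]:
    """Parse a reply shaped like {"claims": ["...", ...]} (assumed shape).

    Claims that are empty, duplicated, or not found verbatim in the source
    are dropped, which rejects paraphrased or invented claims.
    """
    payload = json.loads(raw_json)
    claims: list[Claim] = []
    seen: set[str] = set()
    for item in payload.get("claims", []):
        text = str(item).strip()
        if text and text in source_text and text not in seen:
            seen.add(text)
            claims.append(Claim(text=text))
    return claims


article = "The Fed raised rates in March 2024. Inflation fell to 2% last year."
reply = json.dumps({"claims": [
    "The Fed raised rates in March 2024.",
    "Inflation collapsed overnight.",  # not in the article, so it is dropped
]})
print(parse_claims(reply, article))
```

&lt;p&gt;The verbatim check is one simple way to enforce exact quoting: anything the model reworded or invented fails the substring test and never reaches the later agents.&lt;/p&gt;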
&lt;h2&gt;
  
  
  Agent 2 — Evidence Hunter
&lt;/h2&gt;

&lt;p&gt;LLMs hallucinate citations. You need real, retrievable, dated evidence.&lt;br&gt;
This agent runs HyDE retrieval across 4,270 indexed Guardian + Wikipedia chunks stored in pgvector, then hits Tavily live web search for recent data.&lt;br&gt;
Why HyDE instead of standard RAG?&lt;br&gt;
Standard RAG embeds the raw claim and searches for similar text. A short factual claim like "The Fed raised rates in March 2024" has a weak semantic signal on its own.&lt;br&gt;
HyDE (Hypothetical Document Embeddings) generates a hypothetical document that would contain the answer — something like a news article excerpt — then embeds that. The result is a richer semantic signal and significantly better retrieval recall on short factual claims.&lt;/p&gt;
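
&lt;p&gt;The difference between the two query strategies can be sketched on a toy corpus. Everything here is a stand-in: &lt;code&gt;embed&lt;/code&gt; is a bag-of-words counter rather than all-MiniLM-L6-v2, the list of chunks replaces pgvector, and &lt;code&gt;hypothetical_document&lt;/code&gt; returns a hard-coded passage where the real pipeline would presumably call the LLM.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; stands in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_text: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by cosine similarity to the embedded query text."""
    query = embed(query_text)
    return sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)[:k]


def hypothetical_document(claim: str) -> str:
    """Placeholder for the LLM call that drafts a passage answering the claim."""
    return (
        f"{claim} The Federal Reserve's policy committee announced its decision on "
        "interest rates after its meeting, citing inflation and the labour market."
    )


def hyde_search(claim: str, chunks: list[str], k: int = 1) -> list[str]:
    """HyDE: embed the hypothetical document instead of the raw claim."""
    return search(hypothetical_document(claim), chunks, k)


chunks = [
    "The Federal Reserve committee left interest rates unchanged after its March meeting, citing inflation.",
    "The city council raised parking rates in March.",
]
claim = "The Fed raised rates in March 2024"
print("raw :", search(claim, chunks))
print("hyde:", hyde_search(claim, chunks))
```

&lt;p&gt;On this toy corpus the raw claim ranks the parking-rates chunk first because of surface word overlap, while the hypothetical document pulls up the Federal Reserve chunk. A real encoder behaves differently in detail, but the mechanism is the same: the query carries the vocabulary of the document you hope to find.&lt;/p&gt;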
&lt;h2&gt;
  
  
  Agent 3 — Synthesis Agent
&lt;/h2&gt;

&lt;p&gt;This agent reasons strictly from retrieved evidence. It returns TRUE / FALSE / UNCERTAIN with a calibrated confidence score.&lt;br&gt;
Critically — if evidence is thin or conflicting, it returns UNCERTAIN instead of confabulating certainty. This was one of the hardest things to get right. LLMs naturally trend toward false confidence. I had to explicitly prompt for epistemic humility and add Pydantic validators to catch zero-confidence outputs.&lt;/p&gt;
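
&lt;p&gt;A rough sketch of the kind of validation described above, written with a plain dataclass instead of Pydantic so it has no dependencies. The &lt;code&gt;Verdict&lt;/code&gt; fields and the rule that downgrades an uncited TRUE or FALSE to UNCERTAIN are my assumptions for illustration, not Sift's actual schema.&lt;/p&gt;

```python
from dataclasses import dataclass, field

ALLOWED_LABELS = {"TRUE", "FALSE", "UNCERTAIN"}


@dataclass
class Verdict:
    label: str
    confidence: float
    evidence_ids: list[str] = field(default_factory=list)  # hypothetical field

    def __post_init__(self):
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        # Reject zero or out-of-range confidence instead of passing it along.
        if not (self.confidence > 0.0 and 1.0 >= self.confidence):
            raise ValueError(f"confidence out of range: {self.confidence}")
        # Assumed rule: a firm verdict with no cited evidence becomes UNCERTAIN.
        if self.label != "UNCERTAIN" and not self.evidence_ids:
            self.label = "UNCERTAIN"


print(Verdict("TRUE", 0.9, ["guardian-123"]))
print(Verdict("FALSE", 0.7))  # no evidence cited, so it is downgraded
```

&lt;p&gt;Raising on a zero score forces the caller to retry or fail loudly, which is easier to debug than a verdict that silently carries no confidence.&lt;/p&gt;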
&lt;h2&gt;
  
  
  Agent 4 — Critic Agent
&lt;/h2&gt;

&lt;p&gt;Synthesis agents tend toward overconfidence when evidence partially supports a claim. You need an adversarial check.&lt;br&gt;
This agent independently reviews every verdict. It flags unsupported reasoning, recognizes when a gap such as 1.1°C versus 1.19°C is a rounding difference rather than a false claim, and adjusts confidence downward when warranted.&lt;br&gt;
This is the step most fact-checking systems skip — and it's the one that matters most for borderline claims.&lt;/p&gt;
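
&lt;p&gt;The rounding case can be made mechanical. The sketch below is one possible rule, not necessarily what the Critic Agent does (the article describes an LLM review): treat two figures as compatible when they differ by less than one unit in the last decimal place the claim states. The helper name &lt;code&gt;within_stated_precision&lt;/code&gt; is hypothetical, and it expects plain decimal strings.&lt;/p&gt;

```python
from decimal import Decimal


def within_stated_precision(claimed: str, observed: str) -> bool:
    """True when the figures differ by less than one unit in the last
    decimal place that the claim actually states."""
    claimed_value = Decimal(claimed)
    exponent = claimed_value.as_tuple().exponent  # "1.1" gives -1
    tolerance = Decimal(1).scaleb(exponent)       # one unit in that place: 0.1
    return tolerance > abs(claimed_value - Decimal(observed))


print(within_stated_precision("1.1", "1.19"))
print(within_stated_precision("1.1", "1.5"))
```

&lt;p&gt;The first call returns &lt;code&gt;True&lt;/code&gt; (a gap of 0.09 against a tolerance of 0.1) and the second returns &lt;code&gt;False&lt;/code&gt;. Using &lt;code&gt;Decimal&lt;/code&gt; on the original strings keeps the stated precision, which a float would lose.&lt;/p&gt;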
&lt;h2&gt;
  
  
  Agent 5 — Correction Agent
&lt;/h2&gt;

&lt;p&gt;Knowing something is false isn't enough. Users need to know what IS true.&lt;br&gt;
This agent fires only on FALSE or UNCERTAIN verdicts. It runs a targeted live search to find the correct information and surfaces it with a cited source. Because it is conditional, it doesn't spend tokens on TRUE verdicts.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why LangGraph?
&lt;/h2&gt;

&lt;p&gt;The pipeline isn't linear for every claim. Some claims have no evidence, so they skip synthesis and go straight to the Critic Agent. Some need multiple retrieval attempts. Some claims loop.&lt;br&gt;
LangGraph's state machine handles conditional branching, loops, and shared state across agents cleanly. The state is typed with TypedDict — every agent reads from and writes to the same state object.&lt;/p&gt;
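
&lt;p&gt;Here is a dependency-free sketch of that routing, using a shared &lt;code&gt;TypedDict&lt;/code&gt; state and two routing functions. It is a plain-Python stand-in, not the LangGraph graph itself: the node bodies are placeholders, the field names are assumed, and the retrieval retry loop is left out. In LangGraph, routing functions of this shape are what you would pass to &lt;code&gt;add_conditional_edges&lt;/code&gt;.&lt;/p&gt;

```python
from typing import TypedDict


class ClaimState(TypedDict):
    claim: str
    evidence: list[str]
    verdict: str
    correction: str
    trace: list[str]  # records which nodes ran, for illustration only


def synthesis(state: ClaimState) -> ClaimState:
    state["trace"].append("synthesis")
    state["verdict"] = "TRUE"  # placeholder for the evidence-grounded LLM call
    return state


def critic(state: ClaimState) -> ClaimState:
    state["trace"].append("critic")
    if not state["evidence"]:
        state["verdict"] = "UNCERTAIN"  # nothing supports a firm verdict
    return state


def correction(state: ClaimState) -> ClaimState:
    state["trace"].append("correction")
    state["correction"] = "(result of a targeted live search)"
    return state


def route_after_evidence(state: ClaimState) -> str:
    # Claims with no evidence skip synthesis and go straight to the critic.
    return "synthesis" if state["evidence"] else "critic"


def route_after_critic(state: ClaimState) -> str:
    # The correction step runs only for FALSE or UNCERTAIN verdicts.
    return "correction" if state["verdict"] in {"FALSE", "UNCERTAIN"} else "end"


def run(claim: str, evidence: list[str]) -> ClaimState:
    state: ClaimState = {"claim": claim, "evidence": evidence,
                         "verdict": "UNCERTAIN", "correction": "", "trace": []}
    if route_after_evidence(state) == "synthesis":
        state = synthesis(state)
    state = critic(state)
    if route_after_critic(state) == "correction":
        state = correction(state)
    return state


print(run("Claim with sources", ["chunk-1"])["trace"])
print(run("Claim with no sources", [])["trace"])
```

&lt;p&gt;The first run visits synthesis and then the critic. The second skips synthesis, lands on UNCERTAIN, and triggers the correction step. Keeping the routing decisions in small pure functions makes each branch easy to test on its own.&lt;/p&gt;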
&lt;h2&gt;
  
  
  Infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;FastAPI&lt;/strong&gt; returns a task ID immediately. &lt;strong&gt;Celery + Redis&lt;/strong&gt; runs the pipeline in the background. The client polls for results.&lt;br&gt;
&lt;strong&gt;Redis cache&lt;/strong&gt; stores results for 7 days — the same viral claim doesn't cost tokens twice. Cache hits at the API layer return in under 1 second, before Celery even runs.&lt;br&gt;
&lt;strong&gt;LangFuse&lt;/strong&gt; traces every LLM call — prompt, output, latency, token count — so I can debug agent failures without guessing.&lt;/p&gt;
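
&lt;p&gt;The cache-then-queue flow can be sketched without any of the real services. In this illustration an in-memory class stands in for Redis, &lt;code&gt;submit&lt;/code&gt; stands in for the FastAPI endpoint, and the Celery enqueue is only a comment. The key prefix, the text normalization, and all function names are assumptions of mine, not taken from the Sift codebase.&lt;/p&gt;

```python
import hashlib
import uuid

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, as described above


def cache_key(text: str) -> str:
    """Hash the normalized text so trivially different copies share a key."""
    normalized = " ".join(text.lower().split())
    return "sift:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self):
        self._store = {}

    def set(self, key: str, value: dict, now: float) -> None:
        self._store[key] = (value, now + CACHE_TTL_SECONDS)

    def get(self, key: str, now: float):
        value, expires_at = self._store.get(key, (None, 0.0))
        return value if expires_at > now else None


def submit(text: str, cache: TTLCache, now: float) -> dict:
    """Return a cached result at once, or a task ID for the client to poll."""
    cached = cache.get(cache_key(text), now)
    if cached is not None:
        return {"status": "done", "result": cached}
    task_id = str(uuid.uuid4())
    # A real service would enqueue the background pipeline job here.
    return {"status": "queued", "task_id": task_id}


cache = TTLCache()
cache.set(cache_key("The Fed raised rates"), {"verdict": "UNCERTAIN"}, now=0.0)
print(submit("the  fed RAISED rates", cache, now=10.0)["status"])
print(submit("the  fed RAISED rates", cache, now=CACHE_TTL_SECONDS + 1.0)["status"])
```

&lt;p&gt;The first call is a cache hit and never touches the queue. The second arrives after the 7-day window, so it is queued again. Passing &lt;code&gt;now&lt;/code&gt; in explicitly keeps the expiry logic testable without waiting on a clock.&lt;/p&gt;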
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;LLM: LLaMA 3.3 70B via Groq API&lt;br&gt;
Embeddings: all-MiniLM-L6-v2 via HuggingFace Inference API&lt;br&gt;
Orchestration: LangGraph state machine&lt;br&gt;
RAG: HyDE + pgvector hybrid search&lt;br&gt;
Vector DB: PostgreSQL + pgvector&lt;br&gt;
API: FastAPI + Pydantic&lt;br&gt;
Task Queue: Celery + Redis&lt;br&gt;
Evidence Sources: Tavily (live) + Guardian API + Wikipedia&lt;br&gt;
Observability: LangFuse + Prometheus + Grafana&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The project is fully open source and Dockerized. One command runs the entire stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/ashg2099/Sift.git
cd Sift
cp .env.example .env
# Add your API keys (Groq, Tavily, HuggingFace — all free tiers)
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;strong&gt;&lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;/strong&gt; and start verifying claims.&lt;br&gt;
I'm actively looking for feedback — especially where it breaks. If you try it, I'd love to know what it gets wrong.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/ashg2099/Sift" rel="noopener noreferrer"&gt;https://github.com/ashg2099/Sift&lt;/a&gt;&lt;br&gt;
LinkedIn: &lt;a href="https://www.linkedin.com/in/ashwin-gururaj-93943816a/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ashwin-gururaj-93943816a/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
