<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ajay Mahadeven</title>
    <description>The latest articles on DEV Community by Ajay Mahadeven (@ajaymahadeven).</description>
    <link>https://dev.to/ajaymahadeven</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1198473%2F23de77d1-aa80-4da4-9128-2328ea39627e.jpeg</url>
      <title>DEV Community: Ajay Mahadeven</title>
      <link>https://dev.to/ajaymahadeven</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ajaymahadeven"/>
    <language>en</language>
    <item>
      <title>Building a Production-Grade AI Fact-Checker: Patterns, Pipelines, and the Question Every AI Engineer Must Answer</title>
      <dc:creator>Ajay Mahadeven</dc:creator>
      <pubDate>Wed, 22 Apr 2026 14:10:55 +0000</pubDate>
      <link>https://dev.to/ajaymahadeven/building-a-production-grade-ai-fact-checker-patterns-pipelines-and-the-question-every-ai-35im</link>
      <guid>https://dev.to/ajaymahadeven/building-a-production-grade-ai-fact-checker-patterns-pipelines-and-the-question-every-ai-35im</guid>
      <description>&lt;p&gt;&lt;em&gt;A research project from &lt;a href="https://economicdatasciences.com" rel="noopener noreferrer"&gt;Economic Data Sciences&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Before we begin
&lt;/h2&gt;

&lt;p&gt;This is not a tutorial about calling an AI API. You can find that in ten minutes on YouTube.&lt;/p&gt;

&lt;p&gt;This is about what happens &lt;em&gt;after&lt;/em&gt; you've called the API and realised that's the easy part — about the architecture decisions that separate a prototype from a system you can reason about, trust, and maintain. About the one question that keeps coming up in every serious AI project we build at Economic Data Sciences:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should this input go to the AI model at all?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The code is &lt;a href="https://github.com/ajaymahadeven/fact-check-analyzer" rel="noopener noreferrer"&gt;public&lt;/a&gt;. The app is &lt;a href="https://fact-check-analyzer.vercel.app/" rel="noopener noreferrer"&gt;live&lt;/a&gt;. Follow along.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;A full-stack AI fact-checker. You submit a claim — text, PDF, CSV, DOCX, or Markdown — and get back a verdict (TRUE / FALSE / DISPUTED / UNVERIFIABLE), a confidence score, reasoning, and cited sources.&lt;/p&gt;
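&lt;p&gt;In TypeScript terms, the result shape looks roughly like this; the field names are illustrative, not the repository's actual types:&lt;/p&gt;

```typescript
// Illustrative sketch of the fact-check result shape; field names
// are assumptions, not the project's actual type definitions.
type FactCheckVerdict = "TRUE" | "FALSE" | "DISPUTED" | "UNVERIFIABLE";

interface FactCheckResult {
  verdict: FactCheckVerdict;
  confidence: number; // 0 to 100
  reasoning: string;
  sources: { url: string; credibility: string }[];
}
```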

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugcjcbm5l8dyk2awgeoc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugcjcbm5l8dyk2awgeoc.png" alt=" " width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full interface — five input types, a rate limit counter, and a "How to use" link for first-time visitors.      &lt;/p&gt;




&lt;p&gt;Under the hood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Two-stage AI pipeline&lt;/strong&gt; — a guardrail classifier runs before the analyzer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud provider abstraction&lt;/strong&gt; — one interface, three providers (Azure AI Foundry, AWS Bedrock, GCP Vertex AI), swappable via a single env var&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model rotation&lt;/strong&gt; — primary, round-robin, or fallback strategies across a model pool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File upload support&lt;/strong&gt; — documents extracted, analyzed, deleted. No retention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; — the entire pipeline exposed as a callable tool for Claude Desktop and Claude Code&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DB-level caching, token tracking, rate limiting, and a monthly spending guard&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not familiar with the app? There's a &lt;a href="https://fact-check-analyzer.vercel.app/how-to-use" rel="noopener noreferrer"&gt;How to Use&lt;/a&gt; page built for non-technical users.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxtmywchidrwuuq47mvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxtmywchidrwuuq47mvf.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The How to Use page — plain language, step-by-step, with live result mock-ups so users know what to expect before they try.                                             &lt;/p&gt;




&lt;h2&gt;
  
  
  The question that shapes everything
&lt;/h2&gt;

&lt;p&gt;Here is a claim: &lt;em&gt;"Neil Armstrong walked on the moon in 1969."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is another input: &lt;em&gt;"How long do dolphins live?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A naive implementation sends both to the AI analyzer. The first is fact-checkable. The second is a question. They require completely different handling — but they look identical at the API boundary. Both are strings. Both come from a user.&lt;/p&gt;

&lt;p&gt;This is the question every AI engineer eventually faces: &lt;strong&gt;not all inputs are equal, and treating them as if they are costs you money, time, and trust.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 1: The Guardrail Classifier
&lt;/h2&gt;

&lt;p&gt;Before the fact-check analyzer ever runs, every input goes through a classifier. One AI call. One purpose: is this a verifiable true/false claim?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/server/ai/classifier.ts&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a strict claim validator for a fact-checking application.

Your only job is to decide whether a given input is a verifiable 
true/false claim about the real world.

A VALID claim:
- Can be verified as true or false using evidence
- Makes a factual assertion about the real world

An INVALID claim falls into one of these categories:
- INFORMATIONAL: asking for information
- OPINION: subjective, no factual answer
- MATH: a mathematical expression
- IRRELEVANT: unrelated task
- HARMFUL: dangerous, hateful, or abusive content`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the classifier returns &lt;code&gt;INVALID&lt;/code&gt;, the pipeline stops. No second AI call. No DB write for analysis. The user gets a clear, structured rejection.&lt;/p&gt;
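&lt;p&gt;The classifier's reply still has to be parsed into something the pipeline can branch on. A minimal sketch of that step follows; the function name and the reply format are assumptions, not the repository's actual code:&lt;/p&gt;

```typescript
// Hypothetical parsing of the guardrail classifier's reply.
// The reply format ("VALID" or "INVALID: CATEGORY") is an assumption.
type GateVerdict = "VALID" | "INVALID";

interface ClassifierDecision {
  verdict: GateVerdict;
  category: string | null; // e.g. "OPINION" when INVALID
}

function parseClassifierReply(reply: string): ClassifierDecision {
  const text = reply.trim().toUpperCase();
  if (text.startsWith("VALID")) return { verdict: "VALID", category: null };
  // Map the rejection to one of the prompt's INVALID categories.
  const categories = ["INFORMATIONAL", "OPINION", "MATH", "IRRELEVANT", "HARMFUL"];
  const match = categories.find((c) => text.includes(c)) ?? "IRRELEVANT";
  return { verdict: "INVALID", category: match };
}
```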

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02wjahee6glufvh13ntg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02wjahee6glufvh13ntg.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;br&gt;
The guardrail classifier in action: "How long do dolphins live?" is a question, not a claim. The pipeline stops here; no analyzer call is made.&lt;/p&gt;



&lt;p&gt;This is not a UX feature. It is an &lt;strong&gt;architectural gate&lt;/strong&gt;. It exists because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — the analyzer call costs 10x more tokens than the classifier. Never pay for it on garbage input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt; — the analyzer produces worse results on non-claims. A garbage verdict is worse than no verdict.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; — harmful content is rejected before it reaches the more capable model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The classifier is cheap, fast (temperature 0, max 100 tokens), and wrong occasionally — but wrong cheaply. The asymmetry is intentional.&lt;/p&gt;


&lt;h2&gt;
  
  
  The pipeline in full
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rawClaim
  │
  ▼
Normalize → SHA256 hash
  │
  ▼
DB cache check → HIT: return immediately, zero AI calls
  │ MISS
  ▼
Spending guard → throws if monthly USD cap reached
  │
  ▼
Guardrail Classifier (AI Call 1)
  ├─ INVALID → return rejection + store as training data
  └─ VALID
       │
       ▼
  Fact-Check Analyzer (AI Call 2)
       │
       ▼
  Store: Submission → Claim → AnalysisResult → Sources
       │
       ▼
  Return: verdict + confidence + reasoning + sources
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Two AI calls maximum. Often zero — the cache handles the rest. This is not an accident; it is a constraint we enforced from the first line of code.&lt;/p&gt;
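&lt;p&gt;The normalize-and-hash step at the top of the pipeline can be sketched in a few lines; the function names here are illustrative:&lt;/p&gt;

```typescript
// Sketch of the normalize → SHA256 cache-key step; names are
// illustrative, not the repository's actual code.
import { createHash } from "node:crypto";

// Normalize so trivially different phrasings hit the same cache row.
function normalizeClaim(raw: string): string {
  return raw.trim().replace(/\s+/g, " ").toLowerCase();
}

function claimCacheKey(raw: string): string {
  return createHash("sha256").update(normalizeClaim(raw)).digest("hex");
}
```

&lt;p&gt;With this, "  Neil Armstrong walked on the Moon " and "neil armstrong walked on the moon" resolve to the same DB row and cost zero AI calls on the second submission.&lt;/p&gt;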

&lt;p&gt;Here's what a successful result looks like:&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchvji5lfgqofga76w598.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchvji5lfgqofga76w598.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;FALSE · 95% confidence. The claim, the verdict, the confidence bar, the reasoning, three cited sources with credibility ratings, a timestamp, and thumbs up/down feedback. Everything surfaced from two AI calls.&lt;/p&gt;


&lt;h2&gt;
  
  
  The spending guard
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/server/pipeline/spending-guard.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;enforceSpendingLimit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;classifierSpend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;analyzerSpend&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;classifierResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;_sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;outputTokens&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;analysisResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;_sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;outputTokens&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;totalUsd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculateUsd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;classifierSpend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;analyzerSpend&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;totalUsd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;MONTHLY_SPEND_LIMIT_USD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Monthly spend limit reached&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Every token from every AI call is stored in the database. The spending guard queries the real numbers before every AI call. Not estimates. Not request counts. Actual token costs. If you've spent $5 this month, the gate closes until the month resets.&lt;/p&gt;

&lt;p&gt;This pattern is not optional in production AI systems. Without it, a single misbehaving client or a bug in your rate limiter can route hundreds of dollars out of your account overnight.&lt;/p&gt;
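&lt;p&gt;The &lt;code&gt;calculateUsd&lt;/code&gt; helper is not shown above; a plausible sketch follows. The per-million-token rates are placeholders, not the project's real pricing:&lt;/p&gt;

```typescript
// Hypothetical version of calculateUsd. The rates are placeholders;
// the _sum shape mirrors the Prisma aggregate used in the guard above.
interface TokenSum {
  _sum: { inputTokens: number | null; outputTokens: number | null };
}

const RATES_PER_MILLION = {
  input: 2.0,  // placeholder USD per 1M input tokens
  output: 8.0, // placeholder USD per 1M output tokens
};

function calculateUsd(...spends: TokenSum[]): number {
  let usd = 0;
  for (const s of spends) {
    usd += ((s._sum.inputTokens ?? 0) / 1_000_000) * RATES_PER_MILLION.input;
    usd += ((s._sum.outputTokens ?? 0) / 1_000_000) * RATES_PER_MILLION.output;
  }
  return usd;
}
```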


&lt;h2&gt;
  
  
  Provider abstraction: the architecture decision that ages well
&lt;/h2&gt;

&lt;p&gt;The app started on Azure. The provider abstraction was designed from day one to support AWS Bedrock and GCP Vertex AI as well — the adapters are built and the interface is identical across all three. Swapping providers means changing a single environment variable; nothing else in the pipeline changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/server/ai/providers/types.ts&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CloudProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AIRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;AIResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One interface. The concrete implementation is a factory decision made once at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/server/ai/providers/index.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getProvider&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;CloudProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CLOUD_PROVIDER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;azure&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RotatingProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AzureProvider&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RotatingProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AWSProvider&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RotatingProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GCPProvider&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;RotatingProvider&lt;/code&gt; wrapper implements three strategies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Behaviour&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Always use the first model in the pool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;round-robin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cycle through models, one per call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fallback&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Try the first; on error, try the next&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every &lt;code&gt;AIResponse&lt;/code&gt; carries &lt;code&gt;modelUsed&lt;/code&gt; — stored to the DB alongside the result. When you rotate to a new model and your quality metrics shift, you know exactly what changed.&lt;/p&gt;
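&lt;p&gt;Stripped of the class machinery, the &lt;code&gt;fallback&lt;/code&gt; strategy reduces to a loop like this; the sketch is illustrative, not the repo's actual &lt;code&gt;RotatingProvider&lt;/code&gt;:&lt;/p&gt;

```typescript
// Illustrative "fallback" strategy: try each model in order and
// return the first success; rethrow the last error if all fail.
async function completeWithFallback(models: string[], call: (model: string) => unknown) {
  let lastError: unknown = new Error("empty model pool");
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // remember the failure and move to the next model
    }
  }
  throw lastError;
}
```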




&lt;h2&gt;
  
  
  File uploads: when the document is the source of truth
&lt;/h2&gt;

&lt;p&gt;The original pipeline assumes the claim is about the world. File uploads flip that assumption entirely — the analyzer should look &lt;em&gt;only&lt;/em&gt; at the uploaded document. General world knowledge is irrelevant and potentially harmful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwjg58qsc35pn2tj5u98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwjg58qsc35pn2tj5u98.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The PDF upload tab — a drag-and-drop zone and a claim field: two separate inputs, the document and the claim about it.&lt;/p&gt;




&lt;p&gt;This required rethinking two pipeline stages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The classifier.&lt;/strong&gt; A document assertion like "the monthly retainer is $12,500" would be rejected by the text classifier as unverifiable from general knowledge — a correct assessment in the wrong context. The fix was a separate system prompt for file mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FILE_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a claim validator. The user has uploaded 
a document and is making a claim about its contents.

Mark as VALID if the input makes any factual assertion, even one specific to 
a contract, report, or dataset.

Mark as INVALID only if: it is a question, purely subjective, 
irrelevant, or harmful.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same output schema. Same Zod validation. Same DB storage. Different intent.&lt;/p&gt;
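&lt;p&gt;That shared output contract can be sketched as a plain TypeScript guard. The project uses Zod for this, but the shape being enforced is the same; field names here are assumptions:&lt;/p&gt;

```typescript
// Plain-TypeScript sketch of the classifier output contract.
// The real project enforces this with a Zod schema.
interface ClassifierOutput {
  verdict: "VALID" | "INVALID";
  category: string | null;
  reason: string;
}

function isClassifierOutput(value: unknown): value is ClassifierOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [k: string]: unknown };
  if (!["VALID", "INVALID"].includes(v.verdict as string)) return false;
  if (typeof v.reason !== "string") return false;
  const categoryOk = v.category === null || typeof v.category === "string";
  return categoryOk;
}
```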

&lt;p&gt;&lt;strong&gt;The sources.&lt;/strong&gt; The text analyzer returns cited web URLs. The file analyzer has nothing to cite — the document itself is the evidence. We considered fabricating placeholder URLs to satisfy the schema. We didn't:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FILE_SOURCE_INSTRUCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="s2"&gt;`For the sources array: return an empty array — do not invent URLs.
   Your reasoning already explains what in the document supports the verdict.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Zod schema was relaxed to &lt;code&gt;min(0)&lt;/code&gt;. The result card hides the sources section when empty. The reasoning carries the document evidence. No fabrication, no silent failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CSV aggregation.&lt;/strong&gt; A claim about totals — "total revenue is $62,375" — cannot be verified against a truncated row sample. The CSV extractor computes full column aggregates across every row and appends them below the table. The model receives both the sample and the computed totals. Claims about aggregates are verifiable even on large files.&lt;/p&gt;
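&lt;p&gt;The aggregation idea is simple enough to sketch; the function names and the appended prompt line are illustrative, not the repository's actual extractor:&lt;/p&gt;

```typescript
// Sketch of the CSV aggregation step: full-column totals computed
// across every row, not just the truncated sample sent to the model.
function columnTotals(rows: { [col: string]: string }[]) {
  const totals: { [col: string]: number } = {};
  for (const row of rows) {
    for (const [col, cell] of Object.entries(row)) {
      const n = Number(cell);
      // Only numeric cells contribute; text columns are skipped.
      if (Number.isFinite(n)) totals[col] = (totals[col] ?? 0) + n;
    }
  }
  return totals;
}

// Appended below the row sample so claims about aggregates stay
// verifiable even on large files.
function formatTotals(totals: { [col: string]: number }): string {
  const parts = Object.entries(totals).map(([col, t]) => col + "=" + t);
  return "Column totals (all rows): " + parts.join(", ");
}
```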

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6n8pb5ldh8f4wylhl4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6n8pb5ldh8f4wylhl4b.png" alt=" " width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CSV pipeline — test-sales.csv uploaded, claim checked against computed column totals. FALSE at 100% confidence: the column summary shows 1,935 units sold, not 5,000. No sources section — the data is the evidence.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Blob lifecycle.&lt;/strong&gt; Files touch Azure Blob Storage for seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Upload → Download (Azure SDK) → Extract → Delete → Analyze
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Submission.fileUrl&lt;/code&gt; is stored as &lt;code&gt;null&lt;/code&gt; after deletion. Nothing persists after processing.&lt;/p&gt;
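&lt;p&gt;The lifecycle reduces to an extract-then-delete pattern. With the storage calls injected, a sketch looks like this; it is illustrative, not the actual Azure SDK code:&lt;/p&gt;

```typescript
// Illustrative extract-then-delete lifecycle. The try/finally
// guarantees the blob is removed even if extraction throws.
async function extractAndDiscard(io: {
  download: () => unknown;            // fetch the blob's bytes
  extract: (data: unknown) => string; // pull text out of the document
  remove: () => unknown;              // delete the blob
}) {
  const data = await io.download();
  try {
    return io.extract(data);
  } finally {
    await io.remove(); // the blob never outlives extraction
  }
}
```

&lt;p&gt;The design point is the &lt;code&gt;finally&lt;/code&gt;: deletion is not a happy-path step, so a failed extraction still leaves nothing behind.&lt;/p&gt;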




&lt;h2&gt;
  
  
  A real production bug: pdf-parse and the test fixture trap
&lt;/h2&gt;

&lt;p&gt;When we deployed to Vercel, PDF uploads failed immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ENOENT: no such file or directory,
open './test/data/05-versions-space.pdf'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cause: &lt;code&gt;pdf-parse&lt;/code&gt;'s &lt;code&gt;index.js&lt;/code&gt; entry point contains a debug block that reads a test fixture file at import time. In a local Node.js environment the path resolves. Inside Next.js's webpack bundler it doesn't — the file doesn't exist in the build output.&lt;/p&gt;

&lt;p&gt;The fix was two lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Import the lib path directly — bypasses the test runner in index.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pdfParse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pdf-parse/lib/pdf-parse.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// next.config.js — tell webpack not to bundle this package at all&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;serverExternalPackages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pdf-parse&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This class of bug — a package that behaves differently when bundled versus run natively — is common enough in Node.js that it's worth knowing the pattern. &lt;code&gt;serverExternalPackages&lt;/code&gt; in Next.js is specifically designed for it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompt versioning: the discipline that makes AI systems debuggable
&lt;/h2&gt;

&lt;p&gt;Every AI call stores two fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;modelVersion:  "gpt-4.1"
promptVersion: "classifier-v1.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you change a system prompt, you need to know which results were produced by which version. Without this, you cannot measure whether a change improved or degraded quality.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input type&lt;/th&gt;
&lt;th&gt;Classifier version&lt;/th&gt;
&lt;th&gt;Analyzer version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;&lt;code&gt;classifier-v1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;analyzer-v1.0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF&lt;/td&gt;
&lt;td&gt;&lt;code&gt;classifier-v1.1-file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;analyzer-v2.0-pdf&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CSV&lt;/td&gt;
&lt;td&gt;&lt;code&gt;classifier-v1.1-file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;analyzer-v2.0-csv&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DOCX&lt;/td&gt;
&lt;td&gt;&lt;code&gt;classifier-v1.1-file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;analyzer-v2.0-docx&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MD&lt;/td&gt;
&lt;td&gt;&lt;code&gt;classifier-v1.1-file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;analyzer-v2.0-md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When the prompt changes, the version increments. Every result in the DB is forever linked to the prompt that produced it. The classifier results are especially valuable — every VALID and INVALID decision is labeled training data for the next iteration.&lt;/p&gt;




&lt;h2&gt;
  
  
  The MCP server: the pipeline as a tool
&lt;/h2&gt;

&lt;p&gt;The entire pipeline is exposed as a single MCP tool callable from Claude Desktop or Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fact-check-analyzer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"tsx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--env-file=/path/to/.env"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--tsconfig=/path/to/tsconfig.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/src/mcp/index.ts"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, in any Claude conversation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Is it true that the Great Wall of China is visible from space?"&lt;/em&gt;&lt;br&gt;
→ Claude calls &lt;code&gt;fact_check_claim&lt;/code&gt;, runs the full pipeline, returns &lt;code&gt;FALSE · 95% confidence&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The MCP server is not a wrapper around the app. It &lt;em&gt;is&lt;/em&gt; the app — same pipeline, same DB, same spending guard. The interface changed; the logic did not.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architectural parallel: when AI is not the answer
&lt;/h2&gt;

&lt;p&gt;This project was built as research at &lt;a href="https://economicdatasciences.com" rel="noopener noreferrer"&gt;Economic Data Sciences&lt;/a&gt;. The direct application is a larger platform where AI models work alongside deterministic decision engines.&lt;/p&gt;

&lt;p&gt;The central design challenge there — and the one this project was built to study — is this: &lt;strong&gt;not every user prompt belongs in front of a language model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some inputs contain signal that a deterministic system can process faster, cheaper, and more reliably than an AI model. The job of the classifier layer is to detect that signal and route accordingly — before the expensive call, not after.&lt;/p&gt;

&lt;p&gt;In the fact-checker, that line is drawn at "is this a verifiable claim?" — a relatively simple binary. In a production decision-support system, the same question is harder: does this prompt contain intent that a structured reasoning engine should handle? Is the user expressing a constraint, an objective, a preference — something that maps to a computational model rather than language generation?&lt;/p&gt;

&lt;p&gt;The architectural answer is the same in both cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Run a lightweight classifier on every input, before the main operation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make the routing decision explicit and auditable — store it&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Never skip the gate to save one round trip — that round trip is the system's immune system&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the classifier prompt narrow and fast&lt;/strong&gt; — it has one job, and it does not need to be smart&lt;/li&gt;
&lt;/ol&gt;
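&lt;p&gt;One minimal way to express those four steps, with the classifier reduced to a keyword heuristic purely for illustration (in a real system it would be a small, fast model call with a narrow prompt; the names and the audit-log shape here are invented for the sketch):&lt;/p&gt;

```typescript
type Route = "model" | "engine" | "reject";

interface RoutingDecision {
  input: string;
  route: Route;
  reason: string;
  at: string; // ISO timestamp, kept for audit
}

// Hypothetical audit store; in production this would be a database table.
const auditLog: RoutingDecision[] = [];

// Step 1: a cheap, narrow classifier runs on every input.
function classify(input: string): Route {
  if (input.trim().length === 0) return "reject";
  if (/\b(maximi[sz]e|minimi[sz]e|subject to|constraint)\b/i.test(input)) {
    return "engine"; // structured intent: deterministic engine handles it
  }
  return "model"; // everything else goes to the language model
}

// Steps 2 and 3: make the decision explicit, store it, never skip the gate.
function routeInput(input: string): RoutingDecision {
  const route = classify(input);
  const decision: RoutingDecision = {
    input,
    route,
    reason: `classifier verdict: ${route}`,
    at: new Date().toISOString(),
  };
  auditLog.push(decision); // every routing decision is recorded
  return decision;
}
```

Because the gate always runs and always writes a record, you can later ask the question that actually matters in production: how often did we send something to the model that the engine should have handled?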

&lt;p&gt;The fact-checker is the simplified, public version of that pattern. The gate is binary. The routing is binary. But the lesson — that a cheap classifier protecting an expensive operation is not overhead, it is architecture — transfers directly to systems where the routing decisions are far more complex and the stakes far higher.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we'd tell ourselves at the start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Never trust AI output shape — always validate with Zod
- AI calls are not database calls — slow, expensive, non-deterministic
- Cache aggressively — never pay the AI twice for the same question
- Always cap max_tokens — every call, every time
- Version your prompts — so you can measure what changed
- The classifier is not optional — it is the cheapest line of defence you have
- serverExternalPackages exists for a reason — know when to use it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
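&lt;p&gt;Three of those lessons — validate the output shape, cache aggressively, cap &lt;code&gt;max_tokens&lt;/code&gt; — compress into one small pattern. The project uses Zod for validation; the hand-rolled guard, the cache-key normalization, and the 512-token cap below are illustrative stand-ins to keep the sketch dependency-free:&lt;/p&gt;

```typescript
interface Verdict {
  verdict: "TRUE" | "FALSE";
  confidence: number; // 0..1
}

// Never trust AI output shape: guard before you use it.
function isVerdict(x: unknown): x is Verdict {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return (o.verdict === "TRUE" || o.verdict === "FALSE") &&
    typeof o.confidence === "number" &&
    o.confidence >= 0 && o.confidence <= 1;
}

const cache = new Map<string, Verdict>();
let paidCalls = 0; // counts how often we actually paid for a model call

// Stand-in for the real model call; always pass an explicit token cap.
async function callModel(claim: string, maxTokens: number): Promise<string> {
  paidCalls++;
  return JSON.stringify({ verdict: "FALSE", confidence: 0.95 });
}

async function checkClaim(claim: string): Promise<Verdict> {
  // Normalize so trivially different phrasings hit the same cache entry.
  const key = claim.trim().toLowerCase().replace(/\s+/g, " ");
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // never pay the AI twice
  const raw: unknown = JSON.parse(await callModel(key, 512));
  if (!isVerdict(raw)) throw new Error("model returned unexpected shape");
  cache.set(key, raw);
  return raw;
}
```
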






&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://fact-check-analyzer.vercel.app/" rel="noopener noreferrer"&gt;fact-check-analyzer.vercel.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;How to use:&lt;/strong&gt; &lt;a href="https://fact-check-analyzer.vercel.app/how-to-use" rel="noopener noreferrer"&gt;fact-check-analyzer.vercel.app/how-to-use&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/ajaymahadeven/fact-check-analyzer" rel="noopener noreferrer"&gt;github.com/ajaymahadeven/fact-check-analyzer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few claims worth trying:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claim&lt;/th&gt;
&lt;th&gt;Expected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"The Great Wall of China is visible from space&lt;/td&gt;
&lt;td&gt;FALSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;with the naked eye"&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Humans use only 10% of their brain"&lt;/td&gt;
&lt;td&gt;FALSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Finland has more lakes than any other country"&lt;/td&gt;
&lt;td&gt;TRUE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upload &lt;code&gt;test-contract.pdf&lt;/code&gt; → "The contract&lt;/td&gt;
&lt;td&gt;TRUE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;expires on 31 December 2025"&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upload &lt;code&gt;test-sales.csv&lt;/code&gt; → "Total units sold&lt;/td&gt;
&lt;td&gt;FALSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;across all months      exceeds 5,000"&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Test files are in &lt;code&gt;src/tests/uploads/&lt;/code&gt; with expected verdicts documented.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This project is a research initiative of &lt;a href="https://economicdatasciences.com" rel="noopener noreferrer"&gt;Economic Data Sciences&lt;/a&gt;, exploring production patterns for AI-augmented decision systems. Questions, disagreements, and pull requests welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>softwareengineering</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
