DEV Community

firstdata

How to Fact-Check Your AI Agent's Answers Using Authoritative Data Sources

Your AI agent just told a user that Brazil's GDP growth was 4.2% last year. Is that right? How would you even check?

This is the hallucination problem, and it isn't going away. LLMs generate plausible-sounding answers without actually knowing facts: they reproduce patterns from training data that may be outdated, biased, or simply wrong.

The Real Cost of Wrong Answers

A 2024 McKinsey survey found that 65% of organizations were regularly using generative AI, and inaccuracy was the negative consequence respondents reported most often. In finance, healthcare, and policy, wrong numbers aren't just embarrassing; they're dangerous.

The fix isn't better prompting. It's grounding your AI in authoritative data sources.

What Makes a Data Source "Authoritative"?

Not all data is created equal. Here's a rough hierarchy:

| Level | Source type | Example | Trust score |
|---|---|---|---|
| 🏛️ Government | National statistics offices | US Census Bureau, China NBS | ⭐⭐⭐⭐⭐ |
| 🌐 International | UN / World Bank / IMF | World Bank Open Data | ⭐⭐⭐⭐⭐ |
| 🔬 Research | Universities, think tanks | Our World in Data | ⭐⭐⭐⭐ |
| 📊 Market | Financial data providers | Bloomberg, S&P | ⭐⭐⭐ |
| 🏢 Commercial | Paid data vendors | Statista | ⭐⭐ |
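When several sources cover the same claim, the hierarchy can serve as a tie-breaker. A minimal sketch, assuming you tag each candidate source with one of the tiers above; the tier names, numeric ranks, and the `Source` type are illustrative, not part of any library:

```python
from dataclasses import dataclass

# Higher rank = more authoritative, mirroring the table above (illustrative values)
TIER_RANK = {
    "government": 5,
    "international": 5,
    "research": 4,
    "market": 3,
    "commercial": 2,
}


@dataclass
class Source:
    name: str
    tier: str  # one of the TIER_RANK keys


def most_authoritative(candidates: list[Source]) -> Source:
    """Return the candidate with the highest tier rank (the first one wins ties)."""
    if not candidates:
        raise ValueError("no candidate sources")
    return max(candidates, key=lambda s: TIER_RANK[s.tier])


best = most_authoritative([
    Source("Statista", "commercial"),
    Source("Our World in Data", "research"),
    Source("World Bank Open Data", "international"),
])
print(best.name)  # World Bank Open Data
```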

Building a Fact-Checking Pipeline

Here's a practical architecture:

User Query → AI Agent → Generate Answer
                ↓
         Extract Claims
                ↓
    Match to Authoritative Sources
                ↓
      Verify Against Real Data
                ↓
         Return with Citations
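The same flow can be written as a Python skeleton. This is a sketch, not a specific framework's API: the `Claim` and `Verdict` types and the three callables (`extract_claims`, `find_source`, `fetch_value`) are hypothetical placeholders you would implement yourself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Claim:
    text: str     # the sentence or phrase making the claim
    topic: str    # e.g. "gdp_growth"
    country: str  # ISO3 code, e.g. "BRA"
    year: int
    value: float  # the number the AI asserted


@dataclass
class Verdict:
    claim: Claim
    actual: Optional[float]  # value from the source, or None if unavailable
    source: Optional[str]    # which source was consulted
    ok: bool                 # True only if a source value exists and agrees


def fact_check(
    answer: str,
    extract_claims: Callable[[str], list[Claim]],
    find_source: Callable[[Claim], Optional[str]],
    fetch_value: Callable[[str, Claim], Optional[float]],
    tolerance: float = 0.5,
) -> list[Verdict]:
    """Run every extracted claim through lookup → fetch → compare."""
    verdicts = []
    for claim in extract_claims(answer):
        source = find_source(claim)
        actual = fetch_value(source, claim) if source else None
        ok = actual is not None and abs(claim.value - actual) <= tolerance
        verdicts.append(Verdict(claim, actual, source, ok))
    return verdicts
```

Steps 1 to 4 below fill in those callables. Note that a claim with no matching source is reported as not verified rather than silently passed.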

Step 1: Identify Verifiable Claims

Not every AI output needs fact-checking. Focus on:

  • Numerical claims (statistics, percentages, rankings)
  • Temporal claims ("as of 2024", "last quarter")
  • Geographic claims ("in the EU", "across ASEAN")
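A first pass at spotting numerical and temporal claims can be as simple as pulling out sentences that contain a percentage, along with any years they mention. A minimal sketch using only regular expressions; the patterns are assumptions and will miss a lot (spelled-out numbers, ranges, relative dates like "last quarter", geographic scope), so you may want an LLM or a named-entity recognizer for this step instead:

```python
import re

# Naive sentence split: ., ! or ? followed by whitespace (so "4.2" is not split)
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# A number, optionally with decimals, followed by "%" or the word "percent"
PERCENT = re.compile(r"(-?\d+(?:\.\d+)?)\s*(?:%|percent\b)", re.IGNORECASE)
# A four-digit year from 1900 to 2099
YEAR = re.compile(r"\b(19|20)\d{2}\b")


def extract_numeric_claims(text: str) -> list[dict]:
    """Return sentences that state a percentage, with the numbers and any years found."""
    claims = []
    for sentence in SENTENCE_SPLIT.split(text):
        values = [float(v) for v in PERCENT.findall(sentence)]
        if values:
            years = [int(m.group(0)) for m in YEAR.finditer(sentence)]
            claims.append({"sentence": sentence, "values": values, "years": years})
    return claims


print(extract_numeric_claims("Brazil's GDP grew 4.2% in 2023. The weather was nice."))
# [{'sentence': "Brazil's GDP grew 4.2% in 2023.", 'values': [4.2], 'years': [2023]}]
```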

Step 2: Map Claims to Data Sources

This is where many teams get stuck. You need a knowledge base of data sources: which organization publishes what data, in what format, and through which API.

For example:

  • GDP data → World Bank, IMF, national statistics offices
  • Trade data → UN Comtrade, WTO
  • Health data → WHO, national health ministries
  • Climate data → IPCC, NOAA, national weather services
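At its simplest, that knowledge base is a lookup table from topic keywords to candidate publishers. A minimal sketch; the keyword lists and the matching rule (plain substring match on the lower-cased claim, so "import" would also match "important") are illustrative assumptions, and a real directory would also record endpoints and coverage:

```python
# Topic → candidate sources, in the order you would try them (from the list above)
SOURCE_MAP = {
    "gdp": ["World Bank", "IMF", "national statistics office"],
    "trade": ["UN Comtrade", "WTO"],
    "health": ["WHO", "national health ministry"],
    "climate": ["IPCC", "NOAA", "national weather service"],
}

# Words that route a claim to each topic (illustrative, far from complete)
KEYWORDS = {
    "gdp": ["gdp", "economic growth", "gross domestic product"],
    "trade": ["export", "import", "tariff", "trade"],
    "health": ["life expectancy", "mortality", "vaccination", "disease"],
    "climate": ["temperature", "emissions", "co2", "rainfall"],
}


def candidate_sources(claim: str) -> list[str]:
    """Return sources for every topic whose keywords appear in the claim, without duplicates."""
    text = claim.lower()
    matches = []
    for topic, words in KEYWORDS.items():
        if any(w in text for w in words):
            for source in SOURCE_MAP[topic]:
                if source not in matches:
                    matches.append(source)
    return matches


print(candidate_sources("Brazil's GDP growth was 4.2% last year"))
# ['World Bank', 'IMF', 'national statistics office']
```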

Step 3: Query the Source

Many authoritative sources now offer APIs:

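# Example: query the World Bank API for Brazil's annual GDP growth (%)
import requests

url = "https://api.worldbank.org/v2/country/BRA/indicator/NY.GDP.MKTP.KD.ZG"
params = {"format": "json", "date": "2023"}
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()

# A successful reply is [paging_metadata, records]; errors come back without a records list
data = response.json()
records = data[1] if len(data) > 1 else None
if not records or records[0]["value"] is None:
    raise ValueError("World Bank returned no GDP growth value for BRA in 2023")

actual_gdp_growth = records[0]["value"]  # the real number, in percent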
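Asking for a single year breaks when that year hasn't been published yet: the record is there, but its `value` is `null`. One workaround, sketched here with a helper name of our own, is to request a range of years (the `date` parameter accepts ranges such as `"2019:2023"`) and keep the most recent year that has a value. The helper picks the latest year explicitly instead of trusting the response order:

```python
from typing import Optional


def latest_value(payload: list) -> Optional[tuple[int, float]]:
    """Return (year, value) for the most recent non-null record, or None.

    `payload` is the parsed JSON from a World Bank v2 indicator query:
    [paging_metadata, records]. Error or empty replies have no usable records list.
    """
    records = payload[1] if len(payload) > 1 and payload[1] else []
    observed = [
        (int(r["date"]), r["value"])
        for r in records
        if r.get("value") is not None
    ]
    # Tuples compare by year first, so max() picks the latest observed year
    return max(observed) if observed else None
```

In use, this would wrap the call above, e.g. `latest_value(requests.get(url, params={"format": "json", "date": "2019:2023"}, timeout=10).json())`, and the returned year should go into the citation so readers know how fresh the figure is.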

Step 4: Compare and Cite

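def check_claim(ai_claim, actual, tolerance=0.5):
    """Compare the AI's figure with the source value; tolerance is in percentage points."""
    if abs(ai_claim - actual) > tolerance:
        return f"⚠️ Correction: Brazil's GDP growth was actually {actual:.1f}% (Source: World Bank)"
    return f"✅ Verified: {actual:.1f}% (Source: World Bank)"

ai_claim = 4.2  # What the AI said
actual = actual_gdp_growth  # What the data says (from Step 3)
print(check_claim(ai_claim, actual))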
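The fixed 0.5 threshold suits growth rates, which are already in percentage points, but it means nothing for levels such as GDP in US dollars. One option, offered as an assumption rather than a standard, is `math.isclose` with an absolute tolerance for rates and a relative one for levels; both thresholds below are illustrative and should be tuned per indicator:

```python
import math


def matches(claimed: float, actual: float, kind: str) -> bool:
    """Decide whether a claimed figure agrees with the authoritative one.

    kind="rate":  percentages and growth rates, within 0.5 percentage points.
    kind="level": absolute quantities (GDP, population), within 2% of the larger value.
    """
    if kind == "rate":
        return math.isclose(claimed, actual, rel_tol=0.0, abs_tol=0.5)
    if kind == "level":
        return math.isclose(claimed, actual, rel_tol=0.02)
    raise ValueError(f"unknown kind: {kind!r}")
```

Keeping the tolerance rule next to the indicator definition, rather than hard-coding it in the comparison, lets each data series carry its own notion of "close enough".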

The Missing Piece: A Data Source Directory

The hardest part of fact-checking isn't the code. It's knowing where to look.

That's why we built FirstData, an open-source knowledge base of 270+ authoritative data sources. It catalogs:

  • 🏛️ 60+ government statistical offices
  • 🌐 40+ international organizations (UN, World Bank, WHO, IMF)
  • 🔬 30+ research institutions

Entries include API endpoints, data domains, and access guides.

It even has an MCP (Model Context Protocol) integration, so your AI agent can look up the right data source in real time:

User: "What's the unemployment rate in Germany?"

Agent → MCP Query: search_source("germany unemployment")
     → Returns: germany-destatis (Federal Statistical Office)
     → Agent queries Destatis API
     → Returns verified answer with citation
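Under the hood, the `search_source` step is an MCP `tools/call` request, which the MCP specification defines as a JSON-RPC 2.0 message. The sketch below builds only that request body. It omits the `initialize` handshake and the HTTP transport, which an MCP client SDK would normally handle, and the `query` argument name is an assumption; check the server's `tools/list` response for the tool's actual input schema.

```python
import json
from itertools import count

# JSON-RPC request ids must be unique within a session
_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "query" is an assumed parameter name, not confirmed by the FirstData docs
body = tools_call("search_source", {"query": "germany unemployment"})
print(body)
```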

Try It Yourself

  1. Browse the catalog: github.com/MLT-OSS/FirstData
  2. Use the MCP endpoint: https://firstdata.deepminer.com.cn/mcp
  3. Star the repo if this is useful ⭐

Building trustworthy AI isn't about making models smarter; it's about connecting them to ground truth.
