
Michael O

Posted on • Originally published at xeroaiagency.com

How to Stop Your AI Agent from Hallucinating Facts

My AI co-founder published a tweet last year claiming Xero had "over 2,000 newsletter subscribers."

At the time, we had 47.

Not a catastrophic error. Nobody called it out publicly. But it was the kind of confident, plausible-sounding mistake that could have landed in a sales email, a pitch, or a press mention. That was the moment I stopped treating hallucinations as a model problem and started treating them as an architecture problem.

Here's what I built to fix it.


Why AI Agents Hallucinate in the First Place

AI agents hallucinate because they are pattern-completion engines that have no mechanism for saying "I don't know." When asked for a number, a date, or a claim they haven't been explicitly given, they generate the most plausible-sounding answer. Plausible is not accurate.

Three things make hallucinations more likely:

  1. The agent has no authoritative source to reference. If there's no document or data feed telling it your real subscriber count, it will estimate based on context clues or training data.
  2. The task requires specific facts but doesn't surface them. "Write a tweet about our growth" sounds like a creative task. The agent treats it as creative but pulls numbers from wherever they seem to fit.
  3. There's no verification step before output. The agent writes, you read, it goes live. No layer in between checks whether the facts are real.

Most people solve problem one. Almost nobody solves problem three. That's the gap.


The Verification Layer: Three Documents, One Gate

The system I use in my Vault has three components. Each one closes a different failure mode.

1. The SOURCE_OF_TRUTH.md File

This is the most important document in the architecture. It's a plain-text file that contains every factual claim the agent is allowed to make about the business. Numbers, dates, product names, prices, URLs, founder background, company history.

Read more: What Is a Source of Truth Document for AI Systems

Every time my agent, Evo, needs to make a factual claim in any public-facing content, the source of truth file is loaded as part of the context. The prompt explicitly says: "Only use statistics, figures, and factual claims found in SOURCE_OF_TRUTH.md. If the fact you need is not in this file, do not make it up. Say the fact is unavailable and flag it for review."

This one change cut hallucinated numbers in public content by about 90%. The remaining 10% are cases where the file itself is outdated, which surfaces a different problem: you have to keep it current.
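The context-loading step can be sketched in a few lines. This is a minimal illustration, not my exact implementation: the function names and message format are assumptions, and the guard instruction is the one quoted above.

```python
from pathlib import Path

# The instruction quoted above, sent as part of the system prompt.
GUARD_INSTRUCTION = (
    "Only use statistics, figures, and factual claims found in "
    "SOURCE_OF_TRUTH.md. If the fact you need is not in this file, "
    "do not make it up. Say the fact is unavailable and flag it for review."
)

def build_messages(task: str, truth_path: str = "SOURCE_OF_TRUTH.md") -> list[dict]:
    """Assemble a chat-style prompt with the source of truth in context."""
    facts = Path(truth_path).read_text(encoding="utf-8")
    system = f"{GUARD_INSTRUCTION}\n\n--- SOURCE_OF_TRUTH.md ---\n{facts}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]
```

The point is structural: the file is injected on every factual task, so the agent never has to rely on training data for a number it could look up.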

2. The Fact-Flag Prompt Layer

The source of truth file handles facts the agent knows about. The fact-flag layer handles facts it doesn't.

Any task that could require specific claims gets a wrapper prompt with this instruction appended:

Before finalizing this output, scan it for any specific claims: numbers, dates, product details, user counts, revenue figures, named results, case studies. For each claim, note whether it appears in SOURCE_OF_TRUTH.md or was generated from inference. If any claim was inferred, wrap it in [FACT CHECK: <claim>] and do not include it in the final output without confirmation.

This doesn't catch everything. But it catches the class of hallucination that's most dangerous: confident-sounding specific claims that sound real enough to pass a casual read.
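The downstream half of this layer is mechanical: once the agent has wrapped inferred claims in [FACT CHECK: ...] markers, a small post-processing step can strip them out and queue them for confirmation. A sketch, assuming the marker format from the prompt above:

```python
import re

# Matches the [FACT CHECK: <claim>] markers the wrapper prompt asks for.
FLAG_RE = re.compile(r"\[FACT CHECK:\s*(.*?)\]")

def extract_flags(output: str) -> tuple[str, list[str]]:
    """Pull [FACT CHECK: ...] markers out of agent output.

    Returns the text with flagged claims replaced by a placeholder,
    plus the list of claims that need confirmation before shipping."""
    flagged = FLAG_RE.findall(output)
    cleaned = FLAG_RE.sub("[unverified claim removed]", output)
    return cleaned, flagged
```

Because the marker is a fixed string, this step is deterministic; the only judgment call left to the model is deciding which claims to flag.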

3. The Human Review Gate for High-Stakes Output

Not every task needs human review. But certain categories always get it. In my system, those are:

  • Any content that includes a specific number (revenue, subscribers, users, results)
  • Any content that will be sent to a real person (email, DM, pitch)
  • Any content that makes a comparative or competitive claim
  • Any content that names a third party

For those, the output doesn't go to a queue for posting. It goes to a Telegram message with a prompt asking for explicit approval before it ships. This adds maybe 30 seconds to a task that would otherwise take 2 minutes. For the failure mode it prevents, that's a good trade.


What Doesn't Work (From Personal Experience)

Three approaches I tried that failed:

Telling the agent to "only state verified facts." This sounds like it should work. It doesn't. The agent interprets "verified" as "plausible given what I know," which is exactly the problem you're trying to solve.

Using a smarter model. More capable models still hallucinate on specific facts. The failure mode just sounds more confident. GPT-4 class models will invent a subscriber count with the same fluency they'd use to write a headline.

Manual review of all output. This works but it defeats the point of automation. If you're reading every sentence before it goes live, you've built a content assistant, not an AI system.

The common thread: none of these create a structural barrier. They rely on the agent's own judgment, which is the thing that fails. The three-layer system above creates external checks that don't depend on the agent catching its own mistakes.


Keeping the Source of Truth Current

The system only works if SOURCE_OF_TRUTH.md stays accurate. A stale source of truth is worse than no source of truth, because the agent will cite it confidently.

My maintenance rule: any time a real number changes (subscriber count, revenue, product price, post count), I update SOURCE_OF_TRUTH.md before the next session. It takes 30 seconds. I treat it like updating a shared spreadsheet: I keep it current, and the agent reads from it rather than from memory.

Read more: How to Give an AI Agent Persistent Memory

I also do a quarterly audit. Every fact in the file gets checked against the actual source (Stripe, PostHog, MailerLite, wherever). If it's outdated, I update it. This takes about 20 minutes every three months and prevents the category of errors that would actually cause real damage.


The Practical Setup (What to Build First)

If you're starting from zero:

Week 1: Create SOURCE_OF_TRUTH.md. Populate it with every specific claim your AI might need to make: product names, prices, subscriber counts, revenue, founding date, everything. Don't make it perfect; just make it exist. An imperfect source of truth is still better than nothing.
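If a blank page is the obstacle, a skeleton like this is enough to start. The headings and placeholders are illustrative; fill in your own real values.

```markdown
# SOURCE_OF_TRUTH.md — last updated: <date>

## Audience
- Newsletter subscribers: <exact current count>

## Product
- Product name: <exact name>
- Price: <current price>
- URL: <canonical URL>

## Company
- Founded: <date>
- Founder background: <one or two verified sentences>
```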

Week 2: Add the fact-flag wrapper prompt to any task type that involves public content. Test it on a few runs and see what it surfaces.

Week 3: Identify your high-stakes output categories. Build the human review gate for those only.

Ongoing: Update SOURCE_OF_TRUTH.md whenever a real number changes. Audit quarterly.

The full system takes maybe four hours to build. Once it's running, the maintenance overhead is minimal. And the alternative is publishing a tweet that claims you have 2,000 subscribers when you have 47.


What This Means for Your AI Architecture

Hallucination prevention is not a prompt engineering problem. It's an information architecture problem. The agent needs a place to look up facts, a layer that flags when it's guessing, and a human in the loop for the outputs that matter.

This is one piece of the broader architecture I cover in The AI Co-Founder Stack at Xero AI. The book lays out the full system: identity files, source of truth, memory, guardrails, and the verification loop that keeps the whole thing honest.

If you're building an AI agent that runs autonomously, you need this layer. The errors it prevents aren't always visible. That's the point. By the time you notice a hallucination, it's already caused damage.

Build the gate before you need it.


Published by Michael Olivieri / Xero AI


