Your agent generates a claim. Then what?
In most agent pipelines: nothing. The claim flows straight into the next action — a tool call, a database write, a message sent. If the claim is wrong, the action is wrong, and the first person to notice is usually the user.
There are three patterns that fix this. Each one adds a verification step between generation and action — pre-action claim verification — and each one fits a different stage of agent maturity.
All three patterns below use AgentOracle (free to try, no wallet, no API keys). The code works as-is. Copy it, run it.
Pattern 1: Verify-Then-Act Gate (simplest)
Your agent has exactly one claim it's about to act on. You want a hard pass/fail before anything happens.
```bash
pip install langchain-agentoracle
```

```python
from langchain_agentoracle import AgentOracleVerifyGateTool

gate = AgentOracleVerifyGateTool()

def verify_then_act(claim: str, action_fn):
    """Gate an action behind a single claim verification."""
    result = gate.run(claim)
    # Gate returns PASS/FAIL + confidence. Parse from formatted output.
    if "Recommendation: ACT" in result:
        return action_fn()
    print(f"Action blocked — verification failed:\n{result}")
    return None

# Example: agent thinks a contract exists and wants to call it
claim = "Contract 0xabc...123 is a valid USDC contract on Base mainnet"
verify_then_act(claim, lambda: call_contract(...))
```
When to use this:
- Single atomic claim ("X is true, therefore do Y")
- Binary decisions (proceed or halt)
- Free — `/verify-gate` has no cost
When it's not enough:
- Your agent generates paragraph-length output with multiple claims
- You need evidence for the verdict, not just pass/fail
- You want per-claim granularity (accept 3 of 4 claims, flag the 4th)
Pattern 2: Decompose-and-Score (most versatile)
Your agent outputs a paragraph. Some claims are factual, some might be hallucinated, and you don't want to throw out the whole output if only one sentence is wrong.
The /evaluate endpoint decomposes text into atomic claims, scores each independently, and returns per-claim verdicts. You can then keep the good claims, correct the bad ones, or flag them for human review.
```python
from langchain_agentoracle import AgentOracleEvaluateTool
import re

evaluator = AgentOracleEvaluateTool()

def audit_agent_output(text: str):
    """Decompose text into claims, verify each, return structured audit."""
    raw = evaluator.run(text)
    # The tool returns a formatted string. Extract per-claim verdicts.
    claims = []
    for block in re.findall(
        r"\[(SUPPORTED|REFUTED|UNVERIFIABLE)\] \((\d\.\d+)\) (.+?)(?=\n\n|\Z)",
        raw,
        re.DOTALL,
    ):
        verdict, confidence, body = block
        lines = body.strip().split("\n")
        claim_text = lines[0].strip()
        evidence = ""
        for line in lines[1:]:
            if line.strip().startswith("Evidence:"):
                evidence = line.split("Evidence:", 1)[1].strip()
                break
        claims.append({
            "claim": claim_text,
            "verdict": verdict,
            "confidence": float(confidence),
            "evidence": evidence,
        })
    return claims

# Example: your agent produced this summary
agent_summary = """
OpenAI released GPT-4 in March 2023.
Bitcoin was created by Elon Musk in 2009.
Python was created by Guido van Rossum in 1991.
"""

audit = audit_agent_output(agent_summary)
for c in audit:
    print(f"{c['verdict']:14} ({c['confidence']:.2f}) {c['claim']}")
    if c['verdict'] == 'REFUTED':
        print(f"  ↳ {c['evidence'][:120]}")
```
Sample output:

```
SUPPORTED      (1.00) OpenAI released GPT-4 in March 2023
REFUTED        (0.83) Bitcoin was created by Elon Musk in 2009
  ↳ Bitcoin's creator is the pseudonymous Satoshi Nakamoto, not Elon Musk.
SUPPORTED      (1.00) Python was created by Guido van Rossum in 1991
```
What you can do with this:

```python
# Keep supported, flag refuted, escalate low-confidence
safe_claims = [c for c in audit if c["verdict"] == "SUPPORTED" and c["confidence"] >= 0.8]
need_human = [c for c in audit if c["verdict"] == "UNVERIFIABLE" or c["confidence"] < 0.5]
refuted = [c for c in audit if c["verdict"] == "REFUTED"]

if refuted:
    # Regenerate with the corrections inline, or just flag
    log_hallucination(refuted)
When to use this:
- Multi-claim agent output (summaries, research, plans)
- You need evidence, not just a verdict
- You want to selectively keep/reject claims
Pattern 3: Multi-Agent Supervisor (most advanced)
Now you have a CrewAI crew with a researcher agent and a writer agent. The writer is about to publish. You want a supervisor agent that:
- Discovers the right verification provider (via Decixa's multi-provider registry, falling back to local)
- Calls that provider to audit the writer's draft
- Only passes the draft through if verification clears a threshold
This is where AgentOracle's MCP server shines. It exposes both the resolve tool (discovery) and the verification tools in a single MCP binary any agent can call.
```bash
# No install — runs via npx
npx agentoracle-mcp
```
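For Claude Desktop specifically, the server goes in `claude_desktop_config.json`. A minimal sketch, assuming the standard MCP server-entry format (the `"agentoracle"` key name is arbitrary):

```json
{
  "mcpServers": {
    "agentoracle": {
      "command": "npx",
      "args": ["agentoracle-mcp"]
    }
  }
}
```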
Hook it into Claude Desktop or Cursor or any MCP-compatible runtime. Then in your agent framework:
```python
from crewai import Agent, Task, Crew
from crewai_agentoracle import AgentOracleEvaluateTool

# Writer agent — generates content, might hallucinate
writer = Agent(
    role="Technical Writer",
    goal="Draft a factual summary of recent AI regulation news",
    backstory="Experienced technical writer. Optimizes for readability.",
)

# Supervisor agent — audits the writer's output using AgentOracle
supervisor = Agent(
    role="Fact-Check Supervisor",
    goal="Catch hallucinations before the writer's draft ships",
    backstory="Pedantic editor. Refuses to pass content with unverified claims.",
    tools=[AgentOracleEvaluateTool()],
)

draft = Task(
    description="Write 3 sentences about recent AI regulation news",
    agent=writer,
    expected_output="A 3-sentence factual summary",
)

review = Task(
    description=(
        "Evaluate the writer's draft using the AgentOracle tool. "
        "If any claim is REFUTED, return 'BLOCKED: <reason>'. "
        "If overall confidence is below 0.7, return 'NEEDS_HUMAN_REVIEW'. "
        "Otherwise return 'APPROVED' plus the cleaned draft."
    ),
    agent=supervisor,
    expected_output="APPROVED | BLOCKED | NEEDS_HUMAN_REVIEW + reasoning",
    context=[draft],
)

Crew(agents=[writer, supervisor], tasks=[draft, review]).kickoff()
```
Why this pattern matters:
- Separation of concerns: one agent writes, one agent verifies
- The supervisor can be a smaller, cheaper model — it just needs to call the tool and apply logic
- Works in CrewAI, AutoGen, LangGraph — any framework that supports agent-as-tool-user
The discovery angle: If you want the supervisor to choose verification providers dynamically (not hardcode AgentOracle), use the resolve tool (v2.1.0 of agentoracle-mcp, via Decixa):
```python
resolve(
    capability="analyze",
    intent="verify a factual claim before acting"
)
```
Returns the best-matching x402 verification endpoint across the ecosystem, ranked by latency, price, and tag match. AgentOracle is the only pre-action truth oracle currently classified under "Analyze → Verification" on Decixa, so it'll come back first today. As more providers list, your supervisor automatically picks the best one for each query.
Which Pattern To Pick
| Your agent setup | Pattern |
|---|---|
| Single binary decision | 1. Verify-then-act gate (free, `/verify-gate`) |
| Paragraph output, need per-claim scoring | 2. Decompose-and-score (free during beta, `/evaluate`) |
| Multi-agent pipeline, supervisor pattern | 3. Multi-agent supervisor (CrewAI + MCP) |
All three work together. Start with Pattern 1 while you're prototyping. Graduate to Pattern 2 when your agent produces structured output. Move to Pattern 3 when you have a real pipeline with distinct agent roles.
Getting Started
Playground (no setup): agentoracle.co
Packages:

- `pip install langchain-agentoracle` — PyPI
- `pip install crewai-agentoracle` — PyPI
- `npx agentoracle-mcp` — npm (Claude Desktop, Cursor, Windsurf)
- `npm install agentoracle-verify` — npm
Source: GitHub
Benchmark: We ran AgentOracle head-to-head against GPT-4o on 200 peer-reviewed FEVER claims. Results + methodology.
Hallucinations aren't a bug to patch. They're a property of large language models that doesn't go away with bigger training runs or better prompts. The only reliable fix is to add a verification step your agent can't bypass.
These three patterns are what that step looks like in production code.