Your LangChain agent is wrong about 10% of the time. Not occasionally — consistently, confidently, and silently.
The problem isn't the model. It's that your agent has no way to know when it's wrong. It receives information, formats a response, and acts. No second opinion. No fact-check. No circuit breaker.
This tutorial shows you how to add a verification layer in 5 minutes that catches hallucinations before your agent acts on them.
## The Problem
LLM hallucination rates in 2026 range from 3% to 20% depending on the task. On a summarization benchmark, GPT-4 looks great. On open-ended factual questions — the kind your agent asks constantly — it's a different story.
The deeper problem: reasoning models hallucinate more on factual tasks, not less. The more a model "thinks through" an answer, the more likely it is to fill gaps with plausible-sounding fiction.
In a simple chatbot, a hallucination is embarrassing. In an autonomous agent pipeline, it's a wrong action. A refunded order, a bad recommendation, a compliance violation, a message sent to the wrong person.
The standard fix is human review. But human review defeats the purpose of an autonomous agent.
The real fix is a verification layer that runs before your agent acts — independently of the model that generated the claim.
## Install
```shell
pip install langchain-agentoracle
```
That's it. No API keys. No configuration. The free tier gives you 20 preview verifications per hour to test with.
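If you want to stay inside that 20-per-hour limit programmatically, a small client-side throttle is enough. This is a sketch of our own, not part of `langchain-agentoracle`; the only number taken from the package is the free tier's 20/hr window:

```python
import time
from collections import deque

class HourlyQuota:
    """Client-side throttle: allow at most `limit` calls per rolling window.

    Local convenience only -- not part of langchain-agentoracle.
    The default of 20 matches the free preview tier described above.
    """
    def __init__(self, limit=20, window=3600.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False

quota = HourlyQuota(limit=20)
# if quota.allow():
#     result = preview_tool.run(text)
```

Wrap any preview-tier call in `quota.allow()` and fall back (or queue the request) when it returns `False`.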
## Quick Start: Verify Before Your Agent Acts
The simplest integration — verify a piece of text and get per-claim verdicts:
```python
from langchain_agentoracle import AgentOracleEvaluateTool

verifier = AgentOracleEvaluateTool()

# Your agent just generated this text — is it true?
agent_output = """
OpenAI released GPT-4 in March 2023.
Bitcoin was created by Elon Musk.
The Python programming language was created by Guido van Rossum.
"""

result = verifier.run(agent_output)
print(result)
```
Here's what comes back:
```
EVALUATION RESULT
Overall confidence: 0.61
Recommendation: ACT
Claims found: 3 | Supported: 2 | Refuted: 1 | Unverifiable: 0
Sources used: sonar, sonar-pro, adversarial, gemma-4

CLAIMS:
✓ [SUPPORTED] (1.00) OpenAI released GPT-4 in March 2023
  Evidence: Widely documented historical fact; GPT-4 was announced
  and released on March 14, 2023.

✗ [REFUTED] (0.83) Bitcoin was created by Elon Musk
  Evidence: Bitcoin's creator is the pseudonymous Satoshi Nakamoto.
  Correction: Bitcoin was created by Satoshi Nakamoto, not Elon Musk.

✓ [SUPPORTED] (1.00) Python was created by Guido van Rossum
  Evidence: Confirmed in official Python documentation and
  Van Rossum's own statements.
```
Three claims went in. Two came back supported with evidence. One came back refuted with a correction. Your agent now knows claim #2 is wrong before it acts on it.
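If you want those verdicts as data rather than formatted text, the claim lines can be parsed with a few lines of stdlib Python. A sketch, assuming the line format shown above (`✓ [SUPPORTED] (1.00) ...`) is stable; the helper is ours, not the package's:

```python
import re

# Matches claim lines of the form shown above: a ✓/✗ marker,
# a bracketed verdict, a parenthesized confidence, then the claim text.
CLAIM_RE = re.compile(r"^[✓✗]\s*\[(\w+)\]\s*\(([\d.]+)\)\s*(.+)$")

def parse_claims(result_text):
    """Extract (verdict, confidence, claim) tuples from an evaluation result."""
    claims = []
    for line in result_text.splitlines():
        m = CLAIM_RE.match(line.strip())
        if m:
            verdict, conf, claim = m.groups()
            claims.append((verdict, float(conf), claim))
    return claims

sample = """\
CLAIMS:
✓ [SUPPORTED] (1.00) OpenAI released GPT-4 in March 2023
✗ [REFUTED] (0.83) Bitcoin was created by Elon Musk
✓ [SUPPORTED] (1.00) Python was created by Guido van Rossum
"""

# The one claim the agent should not act on:
refuted = [c for c in parse_claims(sample) if c[0] == "REFUTED"]
```

From here you can drop refuted claims, rewrite them with the supplied corrections, or escalate to a human.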
## Add It to Your Agent's Toolbelt
Want your agent to verify claims on its own? Add the tools directly:
```python
from langchain_agentoracle import get_agentoracle_tools

# Returns all 6 AgentOracle tools ready for your agent
tools = get_agentoracle_tools()

# Or pick specific ones:
from langchain_agentoracle import (
    AgentOracleEvaluateTool,    # Per-claim verification ($0.01)
    AgentOracleVerifyGateTool,  # Quick pass/fail gate (free)
    AgentOraclePreviewTool,     # Research preview (free, 20/hr)
)
```
The tools follow LangChain's BaseTool interface, so they plug into any agent:
```python
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
from langchain_agentoracle import AgentOracleEvaluateTool, AgentOraclePreviewTool

llm = ChatOpenAI(model="gpt-4")
tools = [
    AgentOracleEvaluateTool(),
    AgentOraclePreviewTool(),
]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
)

# The agent can now verify claims before acting
agent.run("Check if this is true: Tesla's market cap exceeded $2 trillion in 2024")
```
## The Verify-Then-Act Pattern
The most useful pattern: gate your agent's actions on verification confidence.
```python
from langchain_agentoracle import AgentOracleEvaluateTool

verifier = AgentOracleEvaluateTool()

def verify_then_act(text, confidence_threshold=0.8):
    """Only act if verification confidence exceeds threshold."""
    result = verifier.run(text)

    # Parse the confidence from the result.
    # The tool returns a formatted string with overall confidence.
    if "Overall confidence:" in result:
        conf_line = [l for l in result.split('\n') if 'Overall confidence' in l][0]
        confidence = float(conf_line.split(': ')[1])

        if confidence >= confidence_threshold:
            print(f"✅ VERIFIED ({confidence}) — safe to act")
            return True
        else:
            print(f"⚠️ LOW CONFIDENCE ({confidence}) — hold for review")
            return False

    return False

# In your agent pipeline:
claim = "The Federal Reserve raised interest rates in March 2024"
if verify_then_act(claim):
    pass  # proceed with the action
else:
    pass  # flag for human review or use a fallback
```
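The boolean gate generalizes naturally to the three-way ACT / VERIFY / REJECT decision the service returns. Here is one way to band a confidence score yourself; the thresholds below are illustrative choices of ours, not AgentOracle's own calibration:

```python
def route(confidence, act_threshold=0.8, reject_threshold=0.4):
    """Map an overall confidence score to a three-way decision.

    Thresholds are illustrative defaults, not the package's bands:
    high confidence -> act autonomously, very low -> reject outright,
    anything in between -> hold for verification or human review.
    """
    if confidence >= act_threshold:
        return "ACT"
    if confidence < reject_threshold:
        return "REJECT"
    return "VERIFY"

print(route(0.92))  # ACT
print(route(0.61))  # VERIFY
print(route(0.15))  # REJECT
```

Tune the bands per action: a refund workflow might demand 0.95 to act, while a draft-only workflow can tolerate 0.7.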
## Free Quick Check: The Verify Gate
Don't need per-claim breakdowns? The verify gate gives you a fast pass/fail:
```python
from langchain_agentoracle import AgentOracleVerifyGateTool

gate = AgentOracleVerifyGateTool()

# Quick binary check — free, no payment needed
result = gate.run("The speed of light is approximately 300,000 km per second")
print(result)
# VERIFY GATE: FAIL
# Confidence: 1.00
# Recommendation: ACT
# ("FAIL" = gate found no issues — content is safe to act on)
```
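Because the gate's output uses "FAIL" to mean "no issues found" (per the note above), it is worth isolating that inversion in a single helper so the rest of your pipeline reads naturally. A sketch against the output format shown; the helper is ours, not the package's:

```python
def gate_is_safe(gate_output):
    """Return True when the verify gate found no issues.

    Per the sample output above, 'VERIFY GATE: FAIL' means the gate
    failed to find problems, i.e. the content is safe to act on.
    Unparseable output is treated as unsafe.
    """
    for line in gate_output.splitlines():
        if line.startswith("VERIFY GATE:"):
            status = line.split(":", 1)[1].strip()
            return status == "FAIL"
    return False

sample = "VERIFY GATE: FAIL\nConfidence: 1.00\nRecommendation: ACT"
print(gate_is_safe(sample))  # True
```

Defaulting to "unsafe" on a parse miss keeps the gate fail-closed, which is the right posture for a circuit breaker.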
## Why AgentOracle
Most hallucination detection tools are built for humans — dashboards, observability platforms, monitoring UIs. They tell you what went wrong after the fact.
AgentOracle is built for agents. It sits in the pipeline, takes any text, runs it through 4 independent verification sources in parallel, and returns a machine-readable verdict before your agent acts.
No dashboards. No subscriptions. No API keys to configure. Your agent calls /evaluate, gets ACT / VERIFY / REJECT with a confidence score and evidence, and decides what to do next.
What's under the hood:
- 4 independent sources: Sonar, Sonar Pro, Adversarial challenge, and Gemma 4
- Per-claim decomposition — complex text gets broken into individual verifiable claims
- Confidence calibration across sources
- Evidence and corrections for every verdict
- 1,900+ claim fingerprints in the database and growing daily
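To make "confidence calibration across sources" concrete: given several independent verdicts on the same claim, one simple combiner is a weighted average of per-source scores. This is a sketch of the general idea only; AgentOracle's actual calibration is not published here, and the weights below are invented:

```python
def combine_confidence(source_scores, weights=None):
    """Weighted average of per-source confidence scores in [0, 1].

    source_scores: dict of source name -> confidence.
    weights: optional dict of source name -> weight (defaults to 1.0).
    Illustrative only; the real calibration scheme may differ.
    """
    weights = weights or {}
    total = sum(weights.get(s, 1.0) for s in source_scores)
    return sum(score * weights.get(s, 1.0)
               for s, score in source_scores.items()) / total

scores = {"sonar": 0.9, "sonar-pro": 0.95, "adversarial": 0.7, "gemma-4": 0.85}
print(round(combine_confidence(scores), 3))  # 0.85 — unweighted mean
```

Weighting lets you trust some sources more than others, e.g. upweighting the adversarial challenge when the claim is high-stakes.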
## Try It Now
Playground — no setup, no payment: agentoracle.co
Paste any text and see per-claim verdicts in under 15 seconds.
Packages:

- `pip install langchain-agentoracle` — PyPI
- `pip install crewai-agentoracle` — PyPI
- `npm install agentoracle-verify` — npm

Source: GitHub
Hallucinations aren't going away. The models are getting better, but "better" still means wrong 3-10% of the time on the tasks your agents actually run.
A verification layer doesn't replace a good model. It catches the cases where even a good model is confidently wrong — which is exactly when you need it most.