Saurav Bhattacharya

Posted on Jun 8

Hallucination Detection Is Not a Model Problem—It's an Infrastructure Problem

#ai #typescript #testing #observability

Every team I talk to treats hallucination like a model quality issue. "We'll just use a better model." "We'll add more context." "We'll fine-tune."

They're solving the wrong problem.

Hallucination in production agentic systems isn't primarily about model capability. It's about missing infrastructure—the absence of runtime checks that catch when an agent's output diverges from grounded reality.

You don't need a better model. You need better plumbing.

The Three Hallucination Modes That Actually Matter

Forget the academic taxonomy. In production agent systems, hallucinations show up in three ways that actually break things:

1. Entity Fabrication — The agent references something that doesn't exist. A customer ID, a file path, an API endpoint. This is the easiest to catch and the one most teams ignore.

2. Temporal Drift — The agent states something that was true but isn't anymore. Prices, statuses, configurations. Your RAG context was stale 400ms after you fetched it.

3. Confidence Hallucination — The agent presents uncertain information with absolute certainty. No hedging, no caveats. This is the hardest to detect and the most dangerous in customer-facing systems.

Each mode requires a different detection strategy. Treating them uniformly is why most hallucination "solutions" fail.

Build Detection as Middleware, Not Post-Processing

The biggest architectural mistake: treating hallucination detection as a filter you slap on the output. By the time you're post-processing, you've already paid the latency cost of a bad generation, and your fallback path is "try again and hope."

Instead, build detection into your agent's execution middleware—between the LLM call and the action layer.

```typescript
import { AgentMiddleware } from './agent-infra';

interface HallucinationResult {
  passed: boolean;
  mode: 'entity' | 'temporal' | 'confidence';
  evidence: string;
  severity: 'block' | 'warn' | 'log';
}

const hallucinationMiddleware: AgentMiddleware = {
  name: 'hallucination-detector',

  async intercept(ctx, next) {
    // Run the LLM call (and any inner middleware), then inspect the result
    // before it reaches the action layer.
    const output = await next();

    // checkEntityGrounding and checkTemporalClaims are shown below.
    const checks: HallucinationResult[] = await Promise.all([
      // Entity grounding: verify all referenced entities exist
      checkEntityGrounding(output, ctx.knowledgeBase),
      // Temporal check: flag claims that rely on sources past their TTL
      checkTemporalClaims(output, ctx.lastFetchTimestamps),
      // Confidence calibration: detect absolute claims on uncertain data
      checkConfidenceCalibration(output, ctx.uncertaintyScores),
    ]);

    // Any failed check marked 'block' stops the output entirely.
    const blockers = checks.filter(c => !c.passed && c.severity === 'block');

    if (blockers.length > 0) {
      ctx.metrics.increment('hallucination.blocked', {
        modes: blockers.map(b => b.mode),
      });

      return ctx.fallback({
        reason: 'hallucination_detected',
        evidence: blockers,
        strategy: 're-ground-and-retry',
      });
    }

    // 'warn' failures let the output through but mark it for hedging.
    const warnings = checks.filter(c => !c.passed && c.severity === 'warn');
    if (warnings.length > 0) {
      output.metadata.hedging = true;
      output.metadata.hallucinationWarnings = warnings;
    }

    return output;
  }
};
```

This isn't post-hoc filtering. It's structural. The agent cannot emit ungrounded output to the user without passing through detection.
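Since `./agent-infra` isn't shown, here is a minimal, self-contained sketch of the same routing logic that you can run in isolation. Everything in it (`runPipeline`, the `Outcome` type, the `orderCheck` stub and its hard-coded order set) is a hypothetical stand-in, not the post's actual middleware API. What it illustrates: the only way out of `runPipeline` is through the checks, and a blocking failure can only produce a fallback.

```typescript
// Hypothetical, self-contained stand-ins for the post's middleware types.
type Severity = 'block' | 'warn' | 'log';

interface CheckResult {
  passed: boolean;
  mode: 'entity' | 'temporal' | 'confidence';
  evidence: string;
  severity: Severity;
}

interface AgentOutput {
  text: string;
  metadata: { hedging?: boolean; warnings?: CheckResult[] };
}

type Check = (output: AgentOutput) => CheckResult | Promise<CheckResult>;

type Outcome =
  | { kind: 'emit'; output: AgentOutput }
  | { kind: 'fallback'; reason: string; evidence: CheckResult[] };

// Generation and emission are joined only through this function, so every
// output passes through every check before reaching the action layer.
async function runPipeline(
  generate: () => Promise<AgentOutput>,
  checks: Check[]
): Promise<Outcome> {
  const output = await generate();
  const results = await Promise.all(checks.map(check => check(output)));

  const blockers = results.filter(r => !r.passed && r.severity === 'block');
  if (blockers.length > 0) {
    return { kind: 'fallback', reason: 'hallucination_detected', evidence: blockers };
  }

  const warnings = results.filter(r => !r.passed && r.severity === 'warn');
  if (warnings.length > 0) {
    output.metadata.hedging = true;
    output.metadata.warnings = warnings;
  }
  return { kind: 'emit', output };
}

// Hypothetical stub check: block any "order #N" that is not a known order.
const knownOrders = new Set<string>(['12847']);

function orderCheck(output: AgentOutput): CheckResult {
  const mentions: string[] = output.text.match(/order #\d+/gi) ?? [];
  const missing = mentions
    .map(m => m.replace(/\D/g, ''))
    .filter(id => !knownOrders.has(id));
  return {
    passed: missing.length === 0,
    mode: 'entity',
    evidence: missing.length > 0 ? `Unknown orders: ${missing.join(', ')}` : 'All orders found',
    severity: missing.length > 0 ? 'block' : 'log',
  };
}

// Usage: a fabricated order is routed to the fallback path.
runPipeline(async () => ({ text: 'I updated order #99999', metadata: {} }), [orderCheck])
  .then(outcome => console.log(outcome.kind)); // prints "fallback"
```

Returning a discriminated `Outcome` instead of the raw output makes the fallback path impossible to ignore at the call site: the caller has to check `kind` before it can reach `output`.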

Entity Grounding Is Embarrassingly Simple

The highest-ROI hallucination check is entity grounding, and it's almost trivial to implement:

```typescript
async function checkEntityGrounding(
  output: AgentOutput,
  kb: KnowledgeBase
): Promise<HallucinationResult> {
  const entities = extractEntities(output.text);

  // Look up every referenced entity concurrently; any miss is a fabrication.
  const lookups = await Promise.all(
    entities.map(async (entity) => ({
      ref: `${entity.type}:${entity.id}`,
      exists: await kb.exists(entity.type, entity.id),
    }))
  );
  const ungrounded = lookups.filter(l => !l.exists).map(l => l.ref);

  return {
    passed: ungrounded.length === 0,
    mode: 'entity',
    evidence: ungrounded.length > 0
      ? `Ungrounded entities: ${ungrounded.join(', ')}`
      : 'All entities verified',
    // Any fabricated entity blocks the output outright.
    severity: ungrounded.length > 0 ? 'block' : 'log',
  };
}
```

If your agent says "I've updated order #12847" and order #12847 doesn't exist in your system, that's not a nuanced problem. That's a lookup. Do the lookup.
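The check above leans on `extractEntities` and `kb.exists`, which the post doesn't define. Here is one minimal way to fill them in, assuming entity IDs follow recognizable text patterns. The `ENTITY_PATTERNS` table (the `order #N` and `CUST-N` formats are invented for illustration) and the `InMemoryKnowledgeBase` class are hypothetical. If your agent emits structured tool-call arguments, reading IDs from those may be more reliable than running regexes over free text.

```typescript
interface EntityRef {
  type: string;
  id: string;
}

// Hypothetical ID formats; substitute the patterns your own system uses.
const ENTITY_PATTERNS: Array<{ type: string; pattern: RegExp }> = [
  { type: 'order', pattern: /\border #(\d+)/gi },
  { type: 'customer', pattern: /\b(CUST-\d+)\b/g },
];

// Pull every entity reference out of the text, de-duplicated by type and id.
function extractEntities(text: string): EntityRef[] {
  const found = new Map<string, EntityRef>();
  for (const { type, pattern } of ENTITY_PATTERNS) {
    pattern.lastIndex = 0; // global regexes keep state between exec() calls
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      found.set(`${type}:${match[1]}`, { type, id: match[1] });
    }
  }
  return Array.from(found.values());
}

// Hypothetical in-memory stand-in for a real knowledge base or database.
class InMemoryKnowledgeBase {
  constructor(private readonly records: Record<string, Set<string>>) {}

  async exists(type: string, id: string): Promise<boolean> {
    return this.records[type]?.has(id) ?? false;
  }
}

// Usage
const refs = extractEntities('I updated order #12847 for CUST-42');
// refs: [{ type: 'order', id: '12847' }, { type: 'customer', id: 'CUST-42' }]
```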

Temporal Claims Need TTLs, Not Better Prompts

The second mode—temporal drift—is a caching problem wearing an AI costume.

Every piece of context your agent uses has a freshness window. A product price fetched 30 seconds ago? Probably fine. A deployment status fetched 5 minutes ago? Dangerous. A user's subscription tier fetched yesterday? Unacceptable for billing decisions.
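Freshness windows like these map naturally onto a per-source TTL table. The minimal sketch below also supplies the `getTTLForSource` helper that `checkTemporalClaims` calls. The source keys, the millisecond values, and the `isStale` helper are illustrative assumptions chosen to match the examples in this paragraph, not recommendations.

```typescript
// Illustrative freshness windows in milliseconds. Keys and values are
// assumptions chosen to match the examples above; tune them per source.
const SOURCE_TTLS_MS = new Map<string, number>([
  ['product.price', 60_000],            // a 30-second-old price is "probably fine"
  ['deploy.status', 15_000],            // 5 minutes old is already dangerous
  ['billing.subscriptionTier', 60_000], // yesterday's tier is unacceptable for billing
]);

// Sources with no explicit entry fall back to a short window, so they go
// stale quickly instead of being trusted indefinitely.
const DEFAULT_TTL_MS = 5_000;

function getTTLForSource(source: string): number {
  return SOURCE_TTLS_MS.get(source) ?? DEFAULT_TTL_MS;
}

function isStale(source: string, fetchedAt: number, now: number = Date.now()): boolean {
  return now - fetchedAt > getTTLForSource(source);
}

// Usage: a deployment status fetched 5 minutes ago is past its 15-second TTL.
isStale('deploy.status', Date.now() - 5 * 60_000); // true
```

Keeping the TTLs in one table turns freshness into reviewable configuration rather than something implied by prompt wording.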

Attach TTLs to your context, and flag any agent claim that relies on expired data:

```typescript
function checkTemporalClaims(
  output: AgentOutput,
  fetchTimestamps: Map<string, number>
): HallucinationResult {
  const now = Date.now();
  const staleRefs: string[] = [];

  for (const [source, fetchedAt] of fetchTimestamps) {
    const age = now - fetchedAt;
    const ttl = getTTLForSource(source);
    // Only stale sources the output actually relies on count against it.
    if (age > ttl && output.referencesSource(source)) {
      staleRefs.push(`${source} (stale by ${age - ttl}ms)`);
    }
  }

  return {
    passed: staleRefs.length === 0,
    mode: 'temporal',
    evidence: staleRefs.length > 0
      ? `Stale sources referenced: ${staleRefs.join(', ')}`
      : 'All sources within TTL',
    // Stale data is a warning: the middleware hedges the output instead of blocking it.
    severity: staleRefs.length > 0 ? 'warn' : 'log',
  };
}
```

This is infrastructure, not AI. And it catches a class of hallucination that no amount of prompt engineering will solve.
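`checkTemporalClaims` assumes something has been recording when each source was fetched. One way to guarantee that is to route every context fetch through a small wrapper that stamps the time, so the timestamp map cannot drift out of sync with what the agent actually saw. The `FetchTracker` class below is a hypothetical sketch; its injectable `clock` exists only to make it testable.

```typescript
// Hypothetical wrapper: every fetch routed through track() records when it was
// issued, producing the Map<string, number> shape checkTemporalClaims expects.
class FetchTracker {
  readonly timestamps = new Map<string, number>();

  constructor(private readonly clock: () => number = Date.now) {}

  async track<T>(source: string, fetcher: () => Promise<T>): Promise<T> {
    // Stamp the request start, not its completion: the data is at most this
    // old, so measuring age from here errs on the side of "stale".
    const startedAt = this.clock();
    const value = await fetcher();
    // Only successful fetches are recorded; a throw leaves the old stamp alone.
    this.timestamps.set(source, startedAt);
    return value;
  }

  // Milliseconds since the source was last fetched, or undefined if never.
  ageOf(source: string): number | undefined {
    const fetchedAt = this.timestamps.get(source);
    return fetchedAt === undefined ? undefined : this.clock() - fetchedAt;
  }
}
```

As long as every context lookup goes through `track`, the agent cannot use a value whose age the temporal check doesn't know.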

The Confidence Problem Is Harder—But Measurable

Confidence hallucination requires a different approach. You need to track the uncertainty in your retrieval layer and propagate it forward.

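If your RAG similarity scores fall below your relevance threshold (say, 0.7; the right cutoff depends on your embedding model and similarity metric) but your agent is making unhedged claims, something is wrong. If your agent is answering questions where zero documents were retrieved, something is very wrong.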

The pattern: instrument your retrieval confidence, pass it through to the output layer, and flag mismatches between source confidence and output certainty.
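The post leaves `checkConfidenceCalibration` undefined. Here is one crude way to implement the pattern just described, under several assumptions: retrieval confidence is summarized as a document count plus the top similarity score (`RetrievalStats`, a hypothetical shape rather than the `ctx.uncertaintyScores` the middleware passes), "hedged" means the text contains a word from a small hand-written lexicon, and the severities (block for zero documents, warn for low similarity) are illustrative choices. A production system might replace the regex with a classifier.

```typescript
interface HallucinationResult {
  passed: boolean;
  mode: 'entity' | 'temporal' | 'confidence';
  evidence: string;
  severity: 'block' | 'warn' | 'log';
}

// Hypothetical summary of what the retrieval layer found for this answer.
interface RetrievalStats {
  docCount: number;      // documents retrieved for the question
  topSimilarity: number; // best similarity score among them (0 if none)
}

// Crude hedge lexicon (an assumption). It is case-insensitive, so it will
// also match the month "May"; a classifier would avoid false hits like that.
const HEDGE_PATTERN = /\b(might|may|likely|possibly|probably|appears?|seems?|not sure|unclear)\b/i;

function checkConfidenceCalibration(
  text: string,
  retrieval: RetrievalStats,
  minSimilarity = 0.7 // illustrative threshold; calibrate per embedding model
): HallucinationResult {
  const hedged = HEDGE_PATTERN.test(text);

  // Answering confidently with nothing retrieved is the worst case.
  if (retrieval.docCount === 0 && !hedged) {
    return {
      passed: false,
      mode: 'confidence',
      evidence: 'Unhedged answer with zero retrieved documents',
      severity: 'block',
    };
  }

  // Weak retrieval plus unhedged language is a calibration mismatch.
  if (retrieval.topSimilarity < minSimilarity && !hedged) {
    return {
      passed: false,
      mode: 'confidence',
      evidence: `Unhedged answer with top similarity ${retrieval.topSimilarity} < ${minSimilarity}`,
      severity: 'warn',
    };
  }

  return {
    passed: true,
    mode: 'confidence',
    evidence: 'Output certainty consistent with retrieval confidence',
    severity: 'log',
  };
}
```

The key design point is that the check compares two signals, retrieval strength and output certainty, rather than judging either alone: weak retrieval with a hedged answer passes, and so does strong retrieval with a confident one.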

Stop Hoping Models Will Self-Correct

The uncomfortable truth: models will get better at avoiding hallucination. But "better" isn't "reliable." You wouldn't ship a payment system that works 97% of the time without validation checks. Don't ship an agent that way either.

Hallucination detection is infrastructure. Build it like infrastructure—deterministic where possible, probabilistic only where necessary, and always observable.

If you're building this kind of detection into your agent eval pipeline, agent-eval gives you the grounding check primitives out of the box. And if you need to visualize where hallucinations are occurring across runs, AgentLens surfaces the patterns you'd otherwise miss in logs.

Hallucination isn't a model problem. It's your problem. Build the infrastructure.
