DEV Community

HK Lee

Posted on • Originally published at pockit.tools

How to Reduce AI Hallucinations in Production: Grounding, RAG, and Guardrails That Actually Work

Your AI feature shipped last Tuesday. By Thursday, a user screenshotted it confidently citing a Supreme Court case that doesn't exist. By Friday, your support inbox had 47 tickets about the chatbot inventing product features you've never built. By Monday, your PM is asking you to "just add a disclaimer."

Sound familiar? You're not alone. As of early 2026, only 29% of developers trust AI outputs — down from 40% in 2024 — and nearly half of AI-generated code enters codebases without full review. The core issue isn't that LLMs are broken — it's that we're deploying them without the engineering infrastructure to catch when they break.

Hallucinations aren't a bug you can patch. They're a fundamental property of how large language models work. But that doesn't mean you're helpless. There's a growing engineering discipline around making LLMs reliably useful — and it's far more nuanced than "just use RAG."

This guide covers the full stack of hallucination reduction: from understanding why models hallucinate, to implementing grounding techniques, building effective RAG pipelines, adding output guardrails, and monitoring hallucination rates in production. Every technique comes with TypeScript code you can adapt today.

Why LLMs Hallucinate: The Engineering Mental Model

Before you can fix hallucinations, you need to understand why they happen. Not the academic explanation — the practical engineering mental model that helps you predict when they'll occur.

The Completion Machine

LLMs don't "know" things. They predict the most likely next token given the preceding tokens. When you ask GPT-5 "What is the capital of France?", it doesn't look up a fact — it generates "Paris" because that's the statistically most probable continuation of your prompt across its training data.

This distinction matters because it explains when hallucinations happen:

  1. Low-confidence regions. When the model encounters a prompt where multiple continuations are roughly equally probable, it picks one. Sometimes it picks wrong.

  2. Training data gaps. Information that wasn't in the training data (or was sparse) forces the model to interpolate. It doesn't say "I don't know" by default — it generates plausible-sounding text.

  3. Instruction-following pressure. When you tell a model to "always provide an answer," it will — even when it shouldn't. The instruction to be helpful conflicts with the instruction to be accurate.

  4. Context window overflow. When relevant information is buried in a long context, the model may ignore it and generate from parametric memory instead. This is the "lost in the middle" problem.

  5. Format-induced fabrication. When you ask for structured output (JSON, tables, lists with specific fields), the model may fabricate values to fill required fields rather than leave them empty.
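The fifth failure mode has a direct structural countermeasure: make non-essential fields nullable so the schema itself authorizes abstention. Here's a minimal sketch in plain TypeScript — the `CompanyFacts` shape and `parseCompanyFacts` helper are illustrative, not from any library:

```typescript
// A nullable field gives the model a sanctioned way to say
// "not stated in the source" instead of fabricating a plausible value.
interface CompanyFacts {
  companyName: string;
  foundedYear: number | null; // null = "not in the provided context"
  ceo: string | null;
}

// Validate a model's JSON output: reject wrong types or missing keys,
// but accept null as an honest abstention.
function parseCompanyFacts(raw: string): CompanyFacts | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  const obj = data as Record<string, unknown>;
  if (typeof obj.companyName !== 'string') return null;
  if (typeof obj.foundedYear !== 'number' && obj.foundedYear !== null) return null;
  if (typeof obj.ceo !== 'string' && obj.ceo !== null) return null;
  return {
    companyName: obj.companyName,
    foundedYear: obj.foundedYear as number | null,
    ceo: obj.ceo as string | null,
  };
}

const honest = parseCompanyFacts(
  '{"companyName": "Acme Corp", "foundedYear": null, "ceo": null}'
);
console.log(honest !== null); // abstention parses as a valid output
```

Pair this with a system-prompt instruction like "use null for any field not present in the source" so the schema and the prompt reinforce each other.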

The Hallucination Taxonomy

Not all hallucinations are equal. Categorizing them helps you pick the right countermeasure:

| Type | Description | Example | Danger Level |
| --- | --- | --- | --- |
| Factual fabrication | Inventing facts that sound real | "The React useState hook was introduced in version 15.3" | 🔴 High |
| Source fabrication | Citing non-existent sources | "According to the 2024 StackOverflow survey..." (wrong data) | 🔴 High |
| Confident extrapolation | Extending real facts beyond truth | "PostgreSQL supports up to 100TB tables natively" | 🟡 Medium |
| Instruction hallucination | Imagining capabilities | "I've searched your database and found 3 results" (didn't actually search) | 🔴 High |
| Coherence drift | Contradicting earlier statements | Says "X is true" then later "X is false" | 🟡 Medium |
| Temporal confusion | Mixing up time periods | Presenting pre-cutoff facts as if they cover post-cutoff events | 🟡 Medium |

Each type requires different mitigation. Factual fabrication needs grounding. Source fabrication needs citation verification. Instruction hallucination needs tool-use enforcement. Let's build the defenses.
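Tool-use enforcement deserves a sketch, since it's the one defense not covered by the later layers: before serving a response, check that any claimed action is backed by a tool call that actually happened during the turn. The tool names and claim patterns below are hypothetical placeholders for your own:

```typescript
// Reject responses that claim an action (searching, updating, emailing)
// unless the corresponding tool was actually invoked this turn.
const ACTION_CLAIMS: Array<{ pattern: RegExp; requiredTool: string }> = [
  { pattern: /I(?:'ve| have)? searched/i, requiredTool: 'search_database' },
  { pattern: /I(?:'ve| have)? updated/i, requiredTool: 'update_record' },
  { pattern: /I(?:'ve| have)? sent/i, requiredTool: 'send_email' },
];

function enforceToolUse(
  responseText: string,
  invokedTools: string[]  // names of tools actually called this turn
): { allowed: boolean; violations: string[] } {
  const violations = ACTION_CLAIMS
    .filter(c => c.pattern.test(responseText) && !invokedTools.includes(c.requiredTool))
    .map(c => `Claimed action matching /${c.pattern.source}/ but ${c.requiredTool} was never called`);
  return { allowed: violations.length === 0, violations };
}

// A response claiming a search that never happened gets blocked:
const check = enforceToolUse("I've searched your database and found 3 results.", []);
console.log(check.allowed); // false
```

Regex matching is a blunt instrument — in production you'd more likely gate on the structured tool-call log from your agent framework — but the principle is the same: the model's narrative must be reconciled against what actually executed.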

Layer 1: Grounding — Anchoring the Model to Reality

Grounding is the practice of providing the model with authoritative source material and constraining it to generate answers based on that material. It's the most fundamental hallucination reduction technique.

System Prompt Engineering for Grounding

Your system prompt is your first line of defense. Most developers write system prompts that encourage hallucination without realizing it:

// ❌ This system prompt encourages hallucination
const badSystemPrompt = `You are a helpful customer support agent for Acme Corp. 
Answer all customer questions thoroughly and helpfully.`;

// ✅ This system prompt reduces hallucination
const goodSystemPrompt = `You are a customer support agent for Acme Corp.

CRITICAL RULES:
1. ONLY answer questions using the provided context documents.
2. If the answer is not in the provided context, say: "I don't have that information. Let me connect you with a human agent."
3. NEVER invent product features, pricing, or policies.
4. When citing information, reference the specific document section.
5. If you're uncertain about any detail, explicitly say so.

Your knowledge cutoff means you may not have current information. 
When in doubt, direct the user to our documentation at docs.acme.com.`;

The key patterns:

  • Explicit permission to say "I don't know." Models default to generating something. You must explicitly authorize abstention.
  • Specify the knowledge boundary. Tell the model exactly where its knowledge comes from.
  • Require citations. When the model must cite sources, it hallucinates less because fabricated citations are easier to catch.
  • Penalize confidence. Phrases like "if you're uncertain, say so" activate the model's calibration mechanisms.

Few-Shot Grounding Examples

Few-shot examples are surprisingly effective at teaching models when to abstain:

const groundingExamples = [
  {
    role: "user",
    content: "What's the uptime guarantee for the Enterprise plan?"
  },
  {
    role: "assistant", 
    content: "Based on our pricing documentation (Section 3.2), the Enterprise plan includes a 99.99% uptime SLA with financial credits for any downtime exceeding that threshold."
  },
  {
    role: "user",
    content: "Do you support integration with SAP?"
  },
  {
    role: "assistant",
    content: "I don't see SAP listed in our current integrations documentation. I'd recommend checking our integrations page at docs.acme.com/integrations for the most current list, or I can connect you with our integrations team who can discuss custom integration options."
  }
];

The second example is the important one. It teaches the model the shape of an honest "I don't know" response — one that's still helpful without fabricating information.

Temperature and Sampling Controls

Lower temperature doesn't eliminate hallucinations, but it reduces creative fabrication:

import OpenAI from 'openai';

const openai = new OpenAI();

async function groundedCompletion(
  systemPrompt: string,
  context: string,
  userQuery: string
) {
  const response = await openai.chat.completions.create({
    model: 'gpt-5',
    temperature: 0.1,        // Low temperature for factual tasks
    top_p: 0.9,              // Slightly constrained nucleus sampling
    frequency_penalty: 0.3,  // Discourage repetitive patterns
    presence_penalty: 0.0,   // Don't force diversity (we want accuracy)
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${userQuery}` }
    ],
  });

  return response.choices[0].message.content;
}

Important nuance: Temperature 0 doesn't mean "no hallucinations." It means the model picks the single most probable token at each step. If the most probable completion is a hallucination (because the model genuinely doesn't have the right information), temperature 0 will hallucinate with maximum confidence.

Layer 2: RAG — Feeding the Model What It Needs

Retrieval-Augmented Generation is the most widely adopted hallucination reduction technique. The idea is simple: instead of relying on the model's parametric memory, you retrieve relevant documents and inject them into the context.

But most RAG implementations are mediocre at reducing hallucinations because they focus on retrieval quality while ignoring the other half of the problem: how the model uses the retrieved context.

The RAG Pipeline That Actually Reduces Hallucinations

import { OpenAIEmbeddings } from '@langchain/openai';
import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase';
import { createClient } from '@supabase/supabase-js';

// Step 1: Chunking strategy matters more than embedding model
function intelligentChunk(document: string, metadata: Record<string, any>) {
  // Bad: Fixed-size chunks that break mid-sentence
  // Good: Semantic chunks that preserve context

  const sections = document.split(/\n## /);

  return sections.map((section, index) => ({
    content: index === 0 ? section : `## ${section}`,
    metadata: {
      ...metadata,
      sectionIndex: index,
      // Preserve document hierarchy for citation
      documentTitle: metadata.title,
      sectionTitle: section.split('\n')[0]?.trim() || 'Introduction',
      // Add overlap for context continuity
      previousSection: index > 0 ? sections[index - 1].slice(-200) : null,
      nextSection: index < sections.length - 1 
        ? sections[index + 1].slice(0, 200) : null,
    }
  }));
}

// Step 2: Retrieval with relevance scoring
async function retrieveWithScoring(
  query: string,
  vectorStore: SupabaseVectorStore,
  options: { topK: number; scoreThreshold: number }
) {
  const results = await vectorStore.similaritySearchWithScore(
    query,
    options.topK * 2  // Over-retrieve, then filter
  );

  // Filter by relevance score
  const relevant = results
    .filter(([_, score]) => score >= options.scoreThreshold)
    .slice(0, options.topK);

  // If nothing passes the threshold, signal this explicitly
  if (relevant.length === 0) {
    return {
      documents: [],
      confidence: 'none',
      message: 'No sufficiently relevant documents found'
    };
  }

  return {
    documents: relevant.map(([doc, score]) => ({
      content: doc.pageContent,
      metadata: doc.metadata,
      relevanceScore: score
    })),
    confidence: relevant[0][1] > 0.85 ? 'high' : 'moderate',
    message: null
  };
}

// Step 3: Context-aware generation with citation enforcement
async function generateWithRAG(
  query: string,
  retrievalResult: Awaited<ReturnType<typeof retrieveWithScoring>>
) {
  if (retrievalResult.confidence === 'none') {
    return {
      answer: "I don't have enough information in my knowledge base to answer this question accurately. Could you rephrase, or would you like me to connect you with a human expert?",
      citations: [],
      confidence: 'none'
    };
  }

  const contextBlock = retrievalResult.documents
    .map((doc, i) => `[Source ${i + 1}: ${doc.metadata.documentTitle} > ${doc.metadata.sectionTitle}]\n${doc.content}`)
    .join('\n\n---\n\n');

  const response = await openai.chat.completions.create({
    model: 'gpt-5',
    temperature: 0.1,
    messages: [
      {
        role: 'system',
        content: `You are a precise assistant. Answer ONLY based on the provided sources.

Rules:
- Cite sources using [Source N] notation
- If the sources don't fully answer the question, say what you CAN answer and what you CANNOT
- Never extrapolate beyond what the sources explicitly state
- If sources conflict, present both viewpoints with their citations`
      },
      {
        role: 'user',
        content: `Sources:\n${contextBlock}\n\nQuestion: ${query}`
      }
    ]
  });

  return {
    answer: response.choices[0].message.content,
    citations: retrievalResult.documents.map(d => d.metadata),
    confidence: retrievalResult.confidence
  };
}

The Five RAG Mistakes That Cause Hallucinations

Even with RAG, most implementations still hallucinate. Here's why:

1. No relevance threshold. If your RAG pipeline always returns something, the model will try to answer from irrelevant context. That's worse than no context at all — it gives the model false confidence.

// ❌ Always returns results, even irrelevant ones
const results = await vectorStore.similaritySearch(query, 5);

// ✅ Only returns results above a relevance threshold
const results = await vectorStore.similaritySearchWithScore(query, 10);
const filtered = results.filter(([_, score]) => score > 0.75);

2. Chunks are too small. When you chunk documents into 200-token fragments, you lose context. The model sees "the rate is 4.5%" but doesn't see that this was in a section about 2023 pricing that's since been updated.

3. No document metadata. Without metadata (document title, date, section hierarchy), the model can't assess source authority or recency. It treats a 3-year-old blog post the same as yesterday's official documentation.

4. Stuffing too much context. More context isn't always better. When you inject 15 document chunks into a prompt, the model struggles to identify the relevant one. The "lost in the middle" effect means information in positions 4-12 of a 15-chunk context is frequently ignored.

5. No "I don't know" path. If your pipeline always generates an answer, it will hallucinate when the context doesn't contain the answer. You need an explicit abstention path.
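One way to mitigate mistake 4 is to cap how many chunks you inject and arrange them so the strongest evidence sits at the edges of the context, where models attend most reliably. A minimal sketch — the `arrangeContext` helper and its interleaving strategy are illustrative:

```typescript
// Counter the "lost in the middle" effect: keep only the top-scoring
// chunks, then place the best ones at the front and back of the context.
interface ScoredChunk { content: string; score: number }

function arrangeContext(chunks: ScoredChunk[], maxChunks = 6): string[] {
  // Keep only the strongest chunks — more context is not always better
  const top = [...chunks].sort((a, b) => b.score - a.score).slice(0, maxChunks);

  // Interleave: best chunks go to the edges, weakest sink to the middle
  const front: string[] = [];
  const back: string[] = [];
  top.forEach((chunk, i) => {
    (i % 2 === 0 ? front : back).push(chunk.content);
  });
  return [...front, ...back.reverse()];
}
```

With chunks scored 0.9, 0.8, 0.5, this yields the 0.9 chunk first, the 0.5 chunk in the middle, and the 0.8 chunk last — the positions the model is most and least likely to attend to, respectively.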

Hybrid Search: The Best of Both Worlds

Pure vector similarity search misses keyword-exact matches. Pure keyword search misses semantic similarity. Hybrid search combines both:

async function hybridSearch(
  query: string,
  supabase: any,
  options: { topK: number; vectorWeight: number }
) {
  // Vector similarity search
  const vectorResults = await supabase.rpc('match_documents', {
    query_embedding: await embedQuery(query),
    match_threshold: 0.7,
    match_count: options.topK * 2,
  });

  // Full-text search with PostgreSQL ts_rank
  const keywordResults = await supabase.rpc('search_documents', {
    search_query: query,
    match_count: options.topK * 2,
  });

  // Reciprocal Rank Fusion (RRF) to combine results
  const scores = new Map<string, number>();
  const k = 60; // RRF constant

  vectorResults.data?.forEach((doc: any, rank: number) => {
    const id = doc.id;
    const score = (scores.get(id) || 0) + options.vectorWeight / (k + rank + 1);
    scores.set(id, score);
  });

  keywordResults.data?.forEach((doc: any, rank: number) => {
    const id = doc.id;
    const weight = 1 - options.vectorWeight;
    const score = (scores.get(id) || 0) + weight / (k + rank + 1);
    scores.set(id, score);
  });

  // Sort by combined score and return top K
  const ranked = Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, options.topK);

  return ranked;
}

Layer 3: Output Guardrails — Catching Hallucinations After Generation

Grounding and RAG reduce hallucinations. Guardrails catch the ones that slip through. This is your safety net.

Structural Validation

The simplest and most effective guardrail: validate that the output matches the expected structure.

import { z } from 'zod';

// Define the expected output schema
const ProductRecommendationSchema = z.object({
  productName: z.string().min(1),
  productId: z.string().regex(/^PROD-\d{6}$/),  // Must match real ID format
  price: z.number().positive(),
  inStock: z.boolean(),
  reasoning: z.string().min(20),
});

async function validateOutput(
  llmOutput: string,
  schema: z.ZodSchema,
  knownProducts: Map<string, any>  // Real product database
) {
  // Step 1: Parse structure
  let parsed;
  try {
    parsed = schema.parse(JSON.parse(llmOutput));
  } catch (e) {
    return { valid: false, error: 'Structural validation failed', data: null };
  }

  // Step 2: Cross-reference with real data
  const realProduct = knownProducts.get(parsed.productId);
  if (!realProduct) {
    return { 
      valid: false, 
      error: `Product ID ${parsed.productId} does not exist`, 
      data: null 
    };
  }

  // Step 3: Fact-check specific claims
  const issues: string[] = [];

  if (Math.abs(parsed.price - realProduct.price) > 0.01) {
    issues.push(`Price mismatch: LLM said ${parsed.price}, actual is ${realProduct.price}`);
  }

  if (parsed.inStock !== realProduct.inStock) {
    issues.push(`Stock status mismatch: LLM said ${parsed.inStock}, actual is ${realProduct.inStock}`);
  }

  if (issues.length > 0) {
    return { valid: false, error: issues.join('; '), data: parsed };
  }

  return { valid: true, error: null, data: parsed };
}

LLM-as-Judge: Self-Verification

Use a second LLM call (or the same model) to verify the first output. This is more expensive but catches subtle hallucinations that structural validation misses:

async function selfVerify(
  originalQuery: string,
  context: string,
  generatedAnswer: string
): Promise<{ 
  isGrounded: boolean; 
  issues: string[]; 
  confidence: number 
}> {
  const verificationPrompt = `You are a fact-checker. Your job is to verify whether an AI-generated answer is fully supported by the provided context.

Context:
${context}

Original Question: ${originalQuery}

Generated Answer: ${generatedAnswer}

Analyze the answer sentence by sentence. For each claim:
1. Is it directly supported by the context? (SUPPORTED)
2. Is it a reasonable inference from the context? (INFERRED)
3. Is it not found in the context? (UNSUPPORTED)
4. Does it contradict the context? (CONTRADICTED)

Respond in JSON:
{
  "claims": [
    { "claim": "...", "status": "SUPPORTED|INFERRED|UNSUPPORTED|CONTRADICTED", "evidence": "..." }
  ],
  "overallGrounded": true/false,
  "confidence": 0.0-1.0,
  "issues": ["..."]
}`;

  const verification = await openai.chat.completions.create({
    model: 'gpt-5',
    temperature: 0,
    response_format: { type: 'json_object' },
    messages: [
      { role: 'system', content: 'You are a precise fact-checker. Be strict.' },
      { role: 'user', content: verificationPrompt }
    ]
  });

  return JSON.parse(verification.choices[0].message.content!);
}

Cost optimization: You don't need to verify every response. Implement selective verification:

function shouldVerify(response: string, context: any): boolean {
  // Always verify high-stakes responses
  if (context.category === 'medical' || context.category === 'legal') return true;

  // Verify responses with numbers (prone to fabrication)
  if (/\d+%|\$\d+|\d+ (users|customers|times)/.test(response)) return true;

  // Verify responses that cite specific sources
  if (/according to|as stated in|the documentation says/.test(response)) return true;

  // Skip verification for simple acknowledgments
  if (response.length < 100) return false;

  // Random sampling for everything else (10%)
  return Math.random() < 0.1;
}

Citation Verification

When the model claims to cite a source, verify that the citation actually exists and supports the claim:

async function verifyCitations(
  answer: string,
  providedSources: Array<{ id: string; content: string; title: string }>
): Promise<{
  verified: boolean;
  fabricatedCitations: string[];
  unsupportedClaims: string[];
}> {
  // Extract citation references from the answer
  const citationPattern = /\[Source (\d+)\]/g;
  const citedSources = new Set<number>();
  let match;

  while ((match = citationPattern.exec(answer)) !== null) {
    citedSources.add(parseInt(match[1]));
  }

  const fabricatedCitations: string[] = [];
  const unsupportedClaims: string[] = [];

  // Check if cited sources exist
  for (const sourceNum of citedSources) {
    if (sourceNum > providedSources.length || sourceNum < 1) {
      fabricatedCitations.push(`[Source ${sourceNum}] does not exist`);
    }
  }

  // For each cited claim, verify the source supports it
  // Split answer by citations and verify each segment
  const segments = answer.split(/\[Source \d+\]/);
  const citations = [...answer.matchAll(/\[Source (\d+)\]/g)];

  for (let i = 0; i < citations.length; i++) {
    const sourceIdx = parseInt(citations[i][1]) - 1;
    // A [Source N] marker follows the claim it supports, so the claim is
    // the last sentence of the segment BEFORE the citation, not after it
    const sentences = segments[i]?.trim().split('.').filter(s => s.trim());
    const claim = sentences?.[sentences.length - 1]?.trim();

    if (claim && sourceIdx < providedSources.length) {
      const source = providedSources[sourceIdx];
      // Simple check: key terms from the claim should appear in the source
      const keyTerms = claim.toLowerCase().split(' ')
        .filter(w => w.length > 4);
      const sourceText = source.content.toLowerCase();
      const matchRate = keyTerms.filter(t => sourceText.includes(t)).length / keyTerms.length;

      if (matchRate < 0.3) {
        unsupportedClaims.push(
          `Claim "${claim}" cited [Source ${sourceIdx + 1}] but source doesn't appear to support it`
        );
      }
    }
  }

  return {
    verified: fabricatedCitations.length === 0 && unsupportedClaims.length === 0,
    fabricatedCitations,
    unsupportedClaims
  };
}

Layer 4: Confidence Scoring — Knowing When You Don't Know

One of the most powerful but underused techniques: having the model report its own confidence, and using that signal to gate responses.

Token-Level Confidence

OpenAI's GPT-5 supports logprobs — log probabilities for each generated token. (Note: Anthropic's Claude API does not currently support logprobs; use self-reported confidence for Claude instead.) Low-probability tokens are hallucination signals:

⚠️ GPT-5 caveat: logprobs is only available when reasoning_effort is set to "none". Using logprobs with other reasoning levels will throw an error.

async function getConfidenceScore(
  prompt: string,
  systemPrompt: string
): Promise<{
  response: string;
  confidence: number;
  lowConfidenceSpans: Array<{ text: string; probability: number }>;
}> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-5',
    temperature: 0,
    logprobs: true,
    top_logprobs: 3,
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: prompt }
    ]
  });

  const content = completion.choices[0].message.content!;
  const logprobs = completion.choices[0].logprobs?.content || [];

  // Calculate overall confidence (guard against an empty logprobs array)
  const avgLogProb = logprobs.length
    ? logprobs.reduce((sum, token) => sum + token.logprob, 0) / logprobs.length
    : -Infinity;
  const confidence = Math.exp(avgLogProb); // Convert log prob to probability; 0 if no tokens

  // Identify low-confidence spans (potential hallucinations)
  const lowConfidenceSpans: Array<{ text: string; probability: number }> = [];
  let currentSpan = '';
  let spanMinProb = 1;

  for (const token of logprobs) {
    const prob = Math.exp(token.logprob);
    if (prob < 0.5) {
      currentSpan += token.token;
      spanMinProb = Math.min(spanMinProb, prob);
    } else if (currentSpan) {
      lowConfidenceSpans.push({ text: currentSpan, probability: spanMinProb });
      currentSpan = '';
      spanMinProb = 1;
    }
  }

  if (currentSpan) {
    lowConfidenceSpans.push({ text: currentSpan, probability: spanMinProb });
  }

  return { response: content, confidence, lowConfidenceSpans };
}

Self-Reported Confidence

Ask the model to rate its own confidence. It's imperfect but surprisingly useful when combined with other signals:

async function generateWithConfidence(query: string, context: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-5',
    temperature: 0.1,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: `Answer the question based on the provided context. 
For each part of your answer, rate your confidence:
- HIGH: Directly stated in the context
- MEDIUM: Reasonable inference from the context  
- LOW: Not well-supported by the context
- NONE: Pure speculation

Respond in JSON: {
  "answer": "your answer",
  "confidence_breakdown": [
    { "claim": "...", "confidence": "HIGH|MEDIUM|LOW|NONE", "source": "..." }
  ],
  "overall_confidence": "HIGH|MEDIUM|LOW|NONE",
  "caveats": ["any important limitations"]
}`
      },
      {
        role: 'user',
        content: `Context:\n${context}\n\nQuestion: ${query}`
      }
    ]
  });

  const result = JSON.parse(response.choices[0].message.content!);

  // Auto-escalate low-confidence responses
  if (result.overall_confidence === 'LOW' || result.overall_confidence === 'NONE') {
    return {
      ...result,
      shouldEscalate: true,
      userMessage: `I have limited confidence in this answer. ${result.caveats?.join(' ') || 'The available information may not fully address your question.'}`
    };
  }

  return { ...result, shouldEscalate: false };
}

Layer 5: Production Monitoring — Measuring What Matters

You can't improve what you don't measure. Here's how to track hallucination rates in production.

Hallucination Rate Tracking

interface HallucinationEvent {
  id: string;
  timestamp: Date;
  query: string;
  response: string;
  hallucinationType: 'factual' | 'source' | 'instruction' | 'coherence' | 'temporal';
  severity: 'critical' | 'moderate' | 'minor';
  detectionMethod: 'user_report' | 'guardrail' | 'self_verify' | 'citation_check';
  context: {
    model: string;
    temperature: number;
    ragUsed: boolean;
    retrievalScore: number | null;
    tokenConfidence: number;
  };
}

class HallucinationMonitor {
  private events: HallucinationEvent[] = [];
  // Track every response, not just hallucinations, so the rate has a real denominator
  private responseTimestamps: Date[] = [];

  async trackResponse(params: {
    query: string;
    response: string;
    context: string;
    model: string;
    ragScore: number | null;
    confidence: number;
  }) {
    this.responseTimestamps.push(new Date());

    // Run automated checks
    const verificationResult = await selfVerify(
      params.query, 
      params.context, 
      params.response
    );

    if (!verificationResult.isGrounded) {
      const event: HallucinationEvent = {
        id: crypto.randomUUID(),
        timestamp: new Date(),
        query: params.query,
        response: params.response,
        hallucinationType: this.classifyHallucination(verificationResult.issues),
        severity: this.assessSeverity(verificationResult),
        detectionMethod: 'self_verify',
        context: {
          model: params.model,
          temperature: 0.1,
          ragUsed: params.ragScore !== null,
          retrievalScore: params.ragScore,
          tokenConfidence: params.confidence,
        }
      };

      this.events.push(event);
      await this.alert(event);
    }

    return verificationResult;
  }

  getMetrics(timeWindow: { start: Date; end: Date }) {
    const windowEvents = this.events.filter(
      e => e.timestamp >= timeWindow.start && e.timestamp <= timeWindow.end
    );
    const windowResponses = this.responseTimestamps.filter(
      t => t >= timeWindow.start && t <= timeWindow.end
    );

    return {
      totalResponses: windowResponses.length,
      // Hallucination events divided by all responses in the same window
      hallucinationRate: windowResponses.length
        ? windowEvents.length / windowResponses.length
        : 0,
      byType: this.groupBy(windowEvents, 'hallucinationType'),
      bySeverity: this.groupBy(windowEvents, 'severity'),
      byDetectionMethod: this.groupBy(windowEvents, 'detectionMethod'),
      avgRetrievalScore: this.avg(
        windowEvents
          .filter(e => e.context.retrievalScore !== null)
          .map(e => e.context.retrievalScore!)
      ),
    };
  }

  private classifyHallucination(issues: string[]): HallucinationEvent['hallucinationType'] {
    const issueText = issues.join(' ').toLowerCase();
    if (issueText.includes('source') || issueText.includes('citation')) return 'source';
    if (issueText.includes('contradict')) return 'coherence';
    if (issueText.includes('date') || issueText.includes('time')) return 'temporal';
    if (issueText.includes('capability') || issueText.includes('search')) return 'instruction';
    return 'factual';
  }

  private assessSeverity(result: any): HallucinationEvent['severity'] {
    if (result.confidence < 0.3) return 'critical';
    if (result.confidence < 0.6) return 'moderate';
    return 'minor';
  }

  private alert(event: HallucinationEvent) {
    if (event.severity === 'critical') {
      console.error(`🚨 Critical hallucination detected: ${event.hallucinationType}`);
      // Send to alerting system (PagerDuty, Slack, etc.)
    }
  }

  private groupBy<T>(arr: T[], key: keyof T) {
    return arr.reduce((acc, item) => {
      const k = String(item[key]);
      acc[k] = (acc[k] || 0) + 1;
      return acc;
    }, {} as Record<string, number>);
  }

  private avg(nums: number[]): number {
    return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : 0;
  }
}

User Feedback Loop

The most honest signal: let users report hallucinations, and use that data to improve your pipeline.

// Simple feedback API endpoint
async function handleFeedback(feedback: {
  responseId: string;
  feedbackType: 'hallucination' | 'incorrect' | 'helpful' | 'not_helpful';
  userComment?: string;
  correction?: string;
}) {
  // Log the feedback
  await db.insert(feedbackTable).values({
    responseId: feedback.responseId,
    type: feedback.feedbackType,
    comment: feedback.userComment,
    correction: feedback.correction,
    timestamp: new Date(),
  });

  // If it's a hallucination report, trigger review
  if (feedback.feedbackType === 'hallucination') {
    // Add the query to a "known hallucination" dataset
    // This dataset informs RAG improvements and guardrail tuning
    await db.insert(knownHallucinationsTable).values({
      responseId: feedback.responseId,
      userCorrection: feedback.correction,
      reviewStatus: 'pending',
    });
  }
}

Putting It All Together: The Defense-in-Depth Pipeline

No single technique eliminates hallucinations. The production approach is defense-in-depth:

async function safeAIResponse(
  query: string,
  userContext: { userId: string; category: string }
): Promise<{
  response: string;
  confidence: string;
  citations: any[];
  verified: boolean;
}> {
  // Layer 1: Retrieve and score context (RAG)
  const retrieval = await retrieveWithScoring(query, vectorStore, {
    topK: 5,
    scoreThreshold: 0.7,
  });

  // Layer 2: Generate grounded response
  const generation = await generateWithRAG(query, retrieval);

  // Layer 3: Confidence scoring
  // (In production, capture logprobs on the generation call itself rather
  // than paying for a separate completion just to score confidence)
  const confidence = await getConfidenceScore(query, groundedSystemPrompt);

  // Layer 4: Selective verification
  if (shouldVerify(generation.answer, userContext)) {
    const verification = await selfVerify(
      query,
      retrieval.documents.map(d => d.content).join('\n'),
      generation.answer
    );

    if (!verification.isGrounded) {
      // Fall back to a safe response
      return {
        response: "I want to make sure I give you accurate information. " +
          "Based on what I found, here's what I can confirm: " +
          (verification.issues.length > 0
            ? "Some parts of my initial answer couldn't be fully verified. Let me provide only what I'm confident about."
            : generation.answer),
        confidence: 'low',
        citations: generation.citations,
        verified: false,
      };
    }
  }

  // Layer 5: Citation verification
  if (generation.citations.length > 0) {
    const citationCheck = await verifyCitations(
      generation.answer,
      retrieval.documents.map(d => ({
        id: d.metadata.id,
        content: d.content,
        title: d.metadata.documentTitle,
      }))
    );

    if (!citationCheck.verified) {
      // Strip fabricated citations rather than serving them.
      // fabricatedCitations holds messages like "[Source 3] does not exist",
      // so extract the [Source N] token before replacing it in the answer.
      let cleanedAnswer = generation.answer;
      for (const fab of citationCheck.fabricatedCitations) {
        const token = fab.match(/\[Source \d+\]/)?.[0];
        if (token) {
          cleanedAnswer = cleanedAnswer.split(token).join('[citation needed]');
        }
      }
      generation.answer = cleanedAnswer;
    }
  }

  // Layer 6: Monitor
  await hallucinationMonitor.trackResponse({
    query,
    response: generation.answer,
    context: retrieval.documents.map(d => d.content).join('\n'),
    model: 'gpt-5',
    ragScore: retrieval.documents[0]?.relevanceScore || null,
    confidence: confidence.confidence,
  });

  return {
    response: generation.answer,
    confidence: generation.confidence,
    citations: generation.citations,
    verified: true,
  };
}

The Hallucination Reduction Checklist

Before shipping any LLM feature to production, run through this checklist:

Pre-Launch

  • [ ] System prompt explicitly allows "I don't know" responses
  • [ ] Temperature is set appropriately (≤ 0.3 for factual tasks)
  • [ ] RAG pipeline has a relevance threshold (not just top-K)
  • [ ] Document chunks include metadata (title, date, section)
  • [ ] Output schema validation is in place for structured outputs
  • [ ] Citation verification is implemented if the model cites sources
  • [ ] Confidence scoring is integrated (logprobs or self-reported)
  • [ ] Fallback response exists for low-confidence outputs

Post-Launch

  • [ ] Hallucination monitoring dashboard is live
  • [ ] User feedback mechanism is in place (thumbs up/down + report)
  • [ ] Known hallucination dataset is being built from reports
  • [ ] Weekly review of hallucination metrics
  • [ ] A/B testing framework for prompt improvements
  • [ ] Alerting configured for critical hallucination spikes

Conclusion

Hallucinations aren't going away. The fundamental architecture of large language models — predicting probable continuations — means they will always have the potential to generate plausible-sounding nonsense. No amount of RLHF, fine-tuning, or clever prompting will change this completely.

But that doesn't mean you can't build reliable AI features. The teams shipping successful AI products in 2026 aren't the ones who found a magic prompt that eliminates hallucinations. They're the ones who built engineering infrastructure around their models: grounding, retrieval, guardrails, confidence scoring, monitoring, and feedback loops.

The key insight: treat LLM outputs like user input. You wouldn't trust arbitrary user input without validation. Don't trust LLM output without verification either.

Start with the layer that addresses your biggest risk. If your users are seeing fabricated facts, implement RAG with relevance thresholds. If they're seeing fabricated citations, add citation verification. If you don't know what they're seeing, add monitoring first.

The goal isn't perfection. It's engineering an acceptable error rate — and having the infrastructure to detect, measure, and improve it over time.


🛠️ Developer Toolkit: This post first appeared on the Pockit Blog.

Need a Regex Tester, JWT Decoder, or Image Converter? Use them on Pockit.tools or install the Extension to avoid switching tabs. No signup required.
