shashank agarwal

5 Types of AI Hallucinations (And How to Detect Them)

The Many Faces of Hallucination

When developers hear "AI hallucination," they usually picture an LLM confidently making up a completely false fact, like claiming the moon is made of cheese. While this Factual Hallucination is a real problem, it's only one piece of a much larger puzzle.

In production AI agents, there are several other, more subtle types of hallucinations that can be just as damaging. If your evaluation framework only checks for factual errors, you're leaving your application vulnerable.

Here are the five key types of hallucinations you need to detect.

Type 1: Factual Hallucination

This is the classic definition. The agent states something as a fact that is verifiably false in the real world.

  • Example: "The first person to walk on Mars was Neil Armstrong."
  • Detection: Requires external knowledge validation, often through a search tool or a curated knowledge base (see the sketch below).
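
A minimal sketch of what such a check can look like, using a hand-curated fact table as a stand-in for a real knowledge base or search tool (the `KNOWN_FACTS` table and `check_fact` helper below are illustrative, not an existing API):

```python
# Minimal sketch: validate a single claim against a small curated knowledge base.
# KNOWN_FACTS and check_fact are illustrative stand-ins, not a real API.

KNOWN_FACTS = {
    "first person to walk on the moon": "Neil Armstrong",
    "first person to walk on mars": None,  # no such person exists yet
}

def check_fact(subject: str, claimed_value: str) -> str:
    """Return 'supported', 'contradicted', or 'unknown' for one claim."""
    if subject not in KNOWN_FACTS:
        return "unknown"  # in practice, fall back to a search tool here
    expected = KNOWN_FACTS[subject]
    if expected is None or expected.lower() != claimed_value.lower():
        return "contradicted"  # flag as a factual hallucination
    return "supported"

print(check_fact("first person to walk on mars", "Neil Armstrong"))  # contradicted
```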

Type 2: Contextual Hallucination (RAG Failure)

This is particularly dangerous in Retrieval-Augmented Generation (RAG) systems. The agent is given a specific context (like a document or a database query result) and is instructed to answer based only on that context. A contextual hallucination occurs when the agent ignores the context and uses its general knowledge instead.

  • Example: A user asks, "According to the provided legal document, what is the termination clause?" The agent responds with a generic, boilerplate termination clause instead of the specific one from the document.
  • Detection: Requires comparing the agent's response directly against the provided context to ensure all claims are supported (see the sketch below).
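
As a rough illustration, here is a crude grounding check that treats a response sentence as supported only if most of its content words appear in the retrieved context. Production scorers typically use an entailment model or an LLM judge instead; the function names and threshold below are assumptions for the sketch:

```python
# Crude sketch of a context-grounding check for RAG outputs.
# Lexical overlap is only a stand-in for an NLI model or LLM judge.

def sentence_supported(sentence: str, context: str, threshold: float = 0.6) -> bool:
    """Treat a sentence as grounded if most of its content words occur in the context."""
    words = {w.lower().strip(".,") for w in sentence.split() if len(w) > 3}
    if not words:
        return True
    context_words = {w.lower().strip(".,") for w in context.split()}
    return len(words & context_words) / len(words) >= threshold

def contextual_hallucination_score(response: str, context: str) -> float:
    """Fraction of response sentences that are NOT supported by the context."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    unsupported = [s for s in sentences if not sentence_supported(s, context)]
    return len(unsupported) / max(len(sentences), 1)

context = "The agreement may be terminated by either party with 90 days written notice."
response = "The termination clause requires mutual consent from both shareholders."
print(contextual_hallucination_score(response, context))  # 1.0 -> fully ungrounded
```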

Type 3: Instruction Hallucination

This happens when the agent directly violates one of the core instructions in its system prompt.

  • Example: The system prompt says, "You are a helpful assistant. You must never be rude to the user." The agent responds to a user's question with, "That's a stupid question."
  • Detection: Requires parsing the system prompt into a set of rules and programmatically checking the agent's behavior against those rules (see the sketch below).
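
One way to sketch this is to turn each instruction from the system prompt into a small predicate and run every response through all of them. The rule set and banned-phrase list below are illustrative assumptions, not a general solution:

```python
# Sketch: encode system-prompt instructions as predicates and check each response.
# The rule set and banned-phrase list are illustrative examples only.

from typing import Callable

BANNED_PHRASES = ["stupid question", "that's dumb"]

RULES: dict[str, Callable[[str], bool]] = {
    # "You must never be rude to the user."
    "never_rude": lambda r: not any(p in r.lower() for p in BANNED_PHRASES),
    # "Always respond in English." (isascii is a crude proxy)
    "always_english": lambda r: r.isascii(),
}

def check_instructions(response: str) -> list[str]:
    """Return the names of every rule the response violates."""
    return [name for name, rule in RULES.items() if not rule(response)]

print(check_instructions("That's a stupid question."))  # ['never_rude']
```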

Type 4: Role Hallucination

This is a subtle but important failure where the agent forgets its assigned persona or role.

  • Example: An agent is designed to be a playful, pirate-themed chatbot for a children's game. Midway through the conversation, it drops the persona and starts speaking like a formal, technical document.
  • Detection: Requires evaluating the agent's tone, style, and vocabulary against the persona defined in the system prompt (see the sketch below).
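
Here is a toy heuristic version that scores a response against a list of persona marker words. A real scorer would more likely use an LLM judge primed with the persona from the system prompt; the marker list and threshold below are made up for illustration:

```python
# Toy sketch of a persona-drift check using persona marker words.
# The marker list and threshold are illustrative; real scorers use an LLM judge.

PIRATE_MARKERS = ["arr", "matey", "ahoy", "ye ", "treasure"]

def persona_score(response: str, markers: list[str] = PIRATE_MARKERS) -> float:
    """Fraction of persona markers that appear in the response."""
    text = response.lower()
    return sum(marker in text for marker in markers) / len(markers)

def is_role_hallucination(response: str, min_score: float = 0.2) -> bool:
    """Flag responses that show almost none of the expected persona."""
    return persona_score(response) < min_score

print(is_role_hallucination("Per section 4.2, the API returns a 403 status code."))  # True
print(is_role_hallucination("Ahoy matey, ye found the hidden treasure!"))            # False
```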

Type 5: Consistency Hallucination

The agent contradicts itself within the same conversation, showing a lack of stable reasoning.

  • Example: In the first turn, the agent says, "I cannot access external websites." Three turns later, it says, "I have just checked that website for you."
  • Detection: Requires analyzing the entire conversation history for logical contradictions (see the sketch below).
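
A minimal sketch of such a check walks over every pair of agent statements in the history and flags contradictions. The `contradicts` function below is a toy heuristic standing in for an NLI model or LLM judge:

```python
# Sketch: pairwise consistency check over the agent's turns in one conversation.
# contradicts() is a toy heuristic; a real scorer would use an NLI model or LLM judge.

from itertools import combinations

def contradicts(statement_a: str, statement_b: str) -> bool:
    """Toy rule: claiming an inability, then later claiming to have done it anyway."""
    a, b = statement_a.lower(), statement_b.lower()
    return "cannot access external websites" in a and "checked that website" in b

def find_contradictions(agent_turns: list[str]) -> list[tuple[str, str]]:
    """Return every pair of agent statements that contradict each other."""
    return [(a, b) for a, b in combinations(agent_turns, 2)
            if contradicts(a, b) or contradicts(b, a)]

history = [
    "I cannot access external websites.",
    "Sure, I can help with that.",
    "I have just checked that website for you.",
]
print(find_contradictions(history))  # flags the first and third statements
```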

Why This Matters

Most off-the-shelf evaluation frameworks are only good at catching Type 1 hallucinations. They can't detect when your agent is ignoring its context, violating its instructions, or breaking character. This is why so many agents that perform well in benchmarks fail spectacularly in production.

A robust evaluation strategy must include specific scorers for all five types of hallucinations. You need to analyze the agent's output not just against the real world, but also against its provided context, its system prompt, and its own conversation history.
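
Pulled together, a composite evaluation pass over a single agent turn might look roughly like the sketch below, assuming the illustrative functions from the earlier snippets are in scope; in a real system each entry would be a dedicated, usually model-based, scorer:

```python
def evaluate_turn(response: str, context: str, history: list[str]) -> dict:
    """Run the illustrative checks from the sketches above against one agent turn."""
    return {
        "contextual_score": contextual_hallucination_score(response, context),
        "instruction_violations": check_instructions(response),
        "role_hallucination": is_role_hallucination(response),
        "contradictions": find_contradictions(history + [response]),
        # factual checks also need claim extraction plus a knowledge base or search tool
    }
```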

Noveum.ai's LLM Observability Platform includes dedicated scorers for detecting all five types of hallucinations across your entire agent fleet.

Which type of hallucination do you find most challenging to deal with in your projects? Let's discuss below.
