How to Detect AI Hallucinations

Code for this article is available here and here, but as always I suggest reading the full article for better understanding.


Overview 🤔

In Indian philosophy, "maya" refers to the illusory nature of the world, which hides the true essence of reality. Similarly, "mithya" describes phenomena that are perceived as real but lack ultimate truth or substance. These philosophical concepts compel us to question the validity of our perceptions and experiences, highlighting the potential for deception and illusion in our understanding of the world.

I personally think of AI hallucinations as manifestations of maya or mithya within AI systems. Just as maya hides the true nature of reality, AI hallucinations distort the accurate representation of reality within the domain of AI. These hallucinations occur when AI systems generate outputs that differ from reality or are inconsistent with the data they have been trained on.

Hallucinations in AI are not merely about the AI providing incorrect information; it goes deeper than that. It is about the AI genuinely believing that the information it presents is true and actively trying to convince you of its truthfulness. This adds a layer of complexity: the AI's confidence in its hallucination can be persuasive, leading users to trust information that may not align with reality, or even to start questioning their own expertise in the subject matter.


Dream Vs Reality 💭

In the story of the butterfly dream, the ancient Chinese sage and philosopher Zhuangzi contemplated whether he was a man dreaming of being a butterfly or a butterfly dreaming of being a man. A similar ambiguity exists in the realm of artificial intelligence: what an AI system perceives as reality may appear as a hallucination or dream to us, while our interpretation of reality could be perceived as a hallucination by the system. These parallels highlight the fluidity of perception and the challenge of distinguishing between what is real and what is perceived, in both human consciousness and artificial intelligence.

To check out the Hallucination Leaderboard, click here.

Now let's jump to some practical ways to detect and remove hallucinations from AI responses. I have analyzed 4 different libraries and tools for this as part of my CookGPT project.

SelfCheckGPT

SelfCheckGPT is a method designed for detecting hallucinations in Large Language Models (LLMs) without the need for external resources. It works by sampling multiple responses from an LLM for the same query and measuring the consistency between them: inconsistencies and contradictions across the samples indicate that the generated text is likely hallucinated rather than factual. It outperforms other methods and serves as a strong baseline for assessing the reliability of LLM-generated text, and because it requires no external database and treats the model as a black box, it is a versatile tool for flagging unreliable information.
One of the main features of SelfCheckGPT is MQAG, which stands for Multiple-choice Question Answering and Generation. MQAG evaluates information consistency between a source and a summary using multiple-choice questions. It consists of a question generation stage, statistical distance analysis, and an answerability threshold, and it uses total variation as the main statistical distance for comparison, providing a novel way to assess the information content of summaries through question answering.
- Comparing Multiple Responses: SelfCheckGPT compares multiple responses generated by an LLM to measure consistency and identify potential hallucinations.
- Sampling Responses: by sampling multiple responses, SelfCheckGPT can detect inconsistencies and contradictions in the generated text.
- Utilizing Question Answering: SelfCheckGPT incorporates question answering to assess the consistency of information by generating multiple-choice questions and evaluating the answers.
- Entropy-based Metrics: it uses entropy-based metrics to analyze the probability distribution of words in the generated text, providing insights into the reliability of the information.
- Zero-resource Approach: SelfCheckGPT is a zero-resource approach that does not rely on external databases, making it applicable to black-box LLMs (and that is the exact reason I like it).
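
If you want to try this on your own responses, here is a minimal sketch using the selfcheckgpt package. The import path and the predict arguments follow the project's README as I understood it, so treat them as assumptions rather than a definitive recipe; the recipe sentences and sampled passages below are made up for illustration.

```python
# pip install selfcheckgpt
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckMQAG  # import path assumed from the README

device = "cuda" if torch.cuda.is_available() else "cpu"
selfcheck_mqag = SelfCheckMQAG(device=device)

# The response to verify, split into sentences, plus extra samples drawn
# from the same LLM for the same prompt (all made up here for illustration).
sentences = ["Paneer tikka masala is marinated paneer cooked in a spiced tomato gravy."]
sampled_passages = [
    "Paneer tikka masala combines grilled paneer with a creamy tomato-based sauce.",
    "Paneer tikka masala is a chicken curry from southern Italy.",  # deliberately inconsistent sample
    "Marinated paneer simmered in a spiced tomato and cream gravy is paneer tikka masala.",
]

# One score per sentence; higher means less consistent with the samples,
# i.e. more likely to be hallucinated. Argument names assumed from the README.
scores = selfcheck_mqag.predict(
    sentences=sentences,
    passage=" ".join(sentences),
    sampled_passages=sampled_passages,
    num_questions_per_sent=5,
    scoring_method="bayes_with_alpha",
    beta1=0.8,
    beta2=0.8,
)
print(scores)
```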


At this point I will slightly deviate from the original topic and go a bit deeper into RAG, as I am using SelfCheckGPT to build my RAG integration patterns as well.
As more systems are adopting the Retrieval Augmented Generation (RAG) approach, it becomes crucial to establish integration patterns for building complex AI systems. With the growing number of collections on vector databases, determining the appropriate collection for a user query is essential. I propose three approaches for this, starting with a RAG Router.

RAG-Router
This router directs the query to a specific RAG document based on self-evaluation by SelfCheckGPT: the model continuously questions itself to determine which RAG document is most relevant for the query at hand. This dynamic routing ensures that the user query is directed to the most appropriate RAG document, improving the efficiency and accuracy of the system.

```python
from selfcheckgpt.modeling_mqag import MQAG  # import path assumed from the selfcheckgpt README

mqag_model = MQAG(device="cpu")
document = "Paneer tikka masala is made by simmering marinated paneer cubes in a spiced tomato gravy ..."  # candidate RAG document

question = "What is the main ingredient of the recipe?"
options  = ['Paneer', 'Chicken', 'Potato', 'Eggplant']
questions = [{'question': question, 'options': options}]
# A peaked probability distribution over the options means this document can answer the question
probs = mqag_model.answer(questions=questions, context=document)
print(probs[0])
```

Based on the answer probabilities, the router directs the question to the most relevant RAG collection, as sketched below.
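
Here is a small sketch of how those probabilities could drive collection selection. The collection names, the route_query helper, and the use of the peak answer probability as a confidence signal are all my own illustrative assumptions; mqag_model, question, and options are carried over from the snippet above.

```python
# Hypothetical collection names and placeholder documents; the real CookGPT
# collections live in a vector database.
collections = {
    "north_indian_cuisine": "Text of the North Indian Cuisine document ...",
    "paneer_recipes": "Text of the Paneer Recipes document ...",
    "indian_desserts": "Text of the Indian Desserts document ...",
}

def route_query(question, options, collections, mqag_model):
    """Pick the collection whose document answers the MQAG question most confidently."""
    questions = [{"question": question, "options": options}]
    scores = {}
    for name, document in collections.items():
        probs = mqag_model.answer(questions=questions, context=document)
        # Peak probability as a rough confidence signal: a peaked distribution
        # suggests the document actually contains the answer.
        scores[name] = max(probs[0])
    return max(scores, key=scores.get)

best = route_query(question, options, collections, mqag_model)
print(f"Routing query to: {best}")
```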

RAG-Branching
In cases where a query belongs to two documents, or two documents have nearly equal probability, the router sends the query to both documents in the vector database. It then retrieves the results from both documents and merges them before sending the merged result to the Large Language Model (LLM). This approach ensures comprehensive coverage and maximizes the chances of providing relevant information to the user, even when the query overlaps multiple documents or the document probabilities are closely matched. A small sketch of this merge logic follows the worked example below.


Let's consider a scenario where a user submits a query for a recipe that could potentially belong to multiple categories or cuisines. For example, the user might input the query "paneer tikka masala recipe."
In this case, the RAG router would analyze the query and identify multiple relevant documents in the vector database, such as "North Indian Cuisine," "Paneer Recipes," and "Tikka Masala Variations." If these documents have similar probabilities or if the query overlaps with multiple categories, the router will send the query to all relevant documents.
Subsequently, CookGPT will retrieve recipe suggestions from each of these documents independently. For instance, the "North Indian Cuisine" document might provide a traditional paneer tikka masala recipe, while the "Paneer Recipes" document could offer a variety of paneer-based dishes, including paneer tikka masala. Similarly, the "Tikka Masala Variations" document might present alternative versions of tikka masala recipes.
Once CookGPT retrieves the results from all relevant documents, it will merge the recipe suggestions before presenting them to the user. This ensures that the user receives a comprehensive list of recipe options, encompassing various cuisines and variations, thus enhancing their cooking experience with diverse and authentic Indian dishes.
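
Here is a minimal sketch of the branching rule, assuming routing scores are already available: any collection whose score is within a small margin of the best one gets queried, and the retrieved passages are merged. The margin value and the retrieve callable are hypothetical placeholders, not part of any library.

```python
def branch_and_merge(scores, retrieve, query, margin=0.1):
    """Query every collection whose routing score is within `margin` of the
    best score, then merge the retrieved passages before handing them to the LLM.

    scores   -- dict mapping collection name to routing confidence
    retrieve -- hypothetical callable: (collection_name, query) -> list of passages
    """
    best = max(scores.values())
    selected = [name for name, score in scores.items() if best - score <= margin]
    merged = []
    for name in selected:
        merged.extend(retrieve(name, query))
    return merged

# Two collections are nearly tied, so both get queried and their results merged.
scores = {"north_indian_cuisine": 0.62, "paneer_recipes": 0.58, "indian_desserts": 0.11}
fake_retrieve = lambda name, query: [f"passage about '{query}' from {name}"]
print(branch_and_merge(scores, fake_retrieve, "paneer tikka masala recipe"))
```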

RAG-Chaining
In RAG chaining, the router operates sequentially. After receiving an answer from one RAG document, it evaluates whether any other RAG documents need to be queried as well. The router then chains the response from the first RAG document to the next one, continuing this process until all relevant documents have been queried. A sketch of this loop follows the example below.

Suppose the user asks for recipes for "palak paneer" (a popular Indian main dish) and "gulab jamun" (a traditional Indian dessert).
Main Dish (Palak Paneer): Initially, the router queries the RAG document titled "North Indian Cuisine." Upon receiving a response with a palak paneer recipe from this document, the router evaluates whether any other relevant documents need to be consulted. It determines that additional palak paneer variations may be available in the "Paneer Recipes" document. The router then sequentially queries the "Paneer Recipes" RAG document and retrieves any relevant responses.

Dessert (Gulab Jamun): Following the completion of the main dish query, the router moves on to the user's request for a dessert recipe. It starts by querying the RAG document titled "Indian Desserts." Upon obtaining a response with a gulab jamun recipe, the router checks whether there are any other dessert variations to explore. It identifies the "Traditional Sweets" RAG document as a potential source for additional dessert recipes, then queries that document and gathers relevant responses.

Once all relevant RAG documents have been queried and responses obtained for both the main dish and the dessert, the router merges the responses and presents the final set of recipes to the Large Language Model (LLM). This sequential chaining approach ensures a systematic exploration of relevant sources for both main dishes and desserts, enabling CookGPT to provide comprehensive and tailored recipe recommendations to the user.
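
A small sketch of the chaining loop, under the same assumptions as before: the retrieve and next_collection callables are hypothetical placeholders standing in for the vector-store lookup and the router's "should I query another document?" decision.

```python
def chain_rag(query, start_collection, retrieve, next_collection):
    """Sequentially query RAG collections, carrying the results forward.

    retrieve        -- hypothetical callable: (collection, query) -> list of passages
    next_collection -- hypothetical callable deciding which collection (if any)
                       should be consulted next, given the query and results so far
    """
    results, collection, visited = [], start_collection, set()
    while collection and collection not in visited:
        visited.add(collection)
        results.extend(retrieve(collection, query))
        collection = next_collection(query, results)
    return results

# Palak paneer starts in "North Indian Cuisine" and is chained into
# "Paneer Recipes" before the merged results go to the LLM.
fake_retrieve = lambda collection, query: [f"{query} recipe from {collection}"]
order = iter(["Paneer Recipes", None])
print(chain_rag("palak paneer", "North Indian Cuisine", fake_retrieve,
                lambda query, results: next(order)))
```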


Now let's come back to our original topic, "AI Hallucinations," and look at some additional tools.

RefChecker

RefChecker operates through a 3-stage pipeline:
1. Triplet Extraction: uses LLMs to break down text into knowledge triplets for fine-grained analysis.
2. Checker Stage: predicts hallucination labels on the extracted triplets using LLM-based or NLI-based checkers.
3. Aggregation: combines the individual triplet-level results into an overall hallucination label for the input text, based on predefined rules.
Additionally, RefChecker includes a human labeling tool, a search engine for the Zero Context setting, and a localization model to map knowledge triplets back to reference snippets for comprehensive analysis.
Triplets in the context of RefChecker are knowledge units extracted from text using Large Language Models (LLMs). Each triplet consists of a subject, a predicate, and an object, capturing one essential piece of information from the text. Extracting triplets enables finer-grained detection and evaluation of claims by breaking the original text down into structured components for analysis. The triplets play a crucial role in detecting hallucinations and assessing the factual accuracy of claims made by language models.
RefChecker supports a range of Large Language Models (LLMs) for its extraction and checking stages, including GPT-4, GPT-3.5-Turbo, InstructGPT, Falcon, Alpaca, LLaMA 2, and Claude 2, and open models can be run locally so that tasks such as response generation, claim extraction, and hallucination detection do not need external connections to cloud-based services. I did not use it because it requires integration with several other providers, or a large GPU for the Mistral model. But it looks very promising, and I will come back to it in the future (depending on how much I want to spend on a GPU for my open-source project).
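
To illustrate the aggregation stage, here is a toy version of the rule as I read it from the paper: the label names and the precedence (contradiction beats neutral beats entailment) are my paraphrase, not RefChecker's actual code.

```python
def aggregate_triplet_labels(labels):
    """Toy aggregation over per-triplet checker labels.

    The label names and the precedence rule are my paraphrase of the
    aggregation stage described in the RefChecker paper, not its actual code.
    """
    if "Contradiction" in labels:
        return "Contradiction"  # at least one claim conflicts with the reference
    if "Neutral" in labels:
        return "Neutral"        # some claims cannot be verified from the reference
    return "Entailment"         # every extracted triplet is supported

# Labels for triplets extracted from a response (illustrative values only)
print(aggregate_triplet_labels(["Entailment", "Entailment", "Neutral"]))  # -> Neutral
```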

FacTool

FACTOOL is a task and domain-agnostic framework designed to tackle the escalating challenge of factual error detection in generative AI. It is a five-step tool-augmented framework that consists of claim extraction, query generation, tool querying, evidence collection, and verification. FACTOOL uses tools like Google Search, Google Scholar, code interpreters, Python, and even LLMs themselves to detect factual errors in knowledge-based QA, code generation, math problem solving, and scientific literature review writing. It outperforms all other baselines across all scenarios and is shown to be highly robust in performing its specified tasks compared to LLMs themselves.
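
For completeness, this is roughly how the factool package is invoked according to its README at the time I looked at it; the exact constructor and input fields are assumptions on my part, and an OpenAI key (plus a search key for some scenarios) is needed for the tool calls.

```python
# pip install factool  (requires an OpenAI API key, and a search key for some scenarios)
from factool import Factool  # import assumed from the project README

factool_instance = Factool("gpt-3.5-turbo")  # foundation model used for the checking steps

inputs = [
    {
        "prompt": "What is the main ingredient of palak paneer?",
        "response": "The main ingredient of palak paneer is chicken.",  # deliberately wrong claim
        "category": "kbqa",  # knowledge-based QA scenario
    }
]

# FacTool extracts claims, generates queries, gathers evidence and returns
# claim-level and response-level factuality verdicts.
result = factool_instance.run(inputs)
print(result)
```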

FactScore

The FactScore platform tries to solve the challenge of analyzing long-form text, which is difficult because of the complexity of its content: each piece of information may be either true or false. Previous approaches have primarily focused on analyzing text at the sentence level; however, even within a single sentence there can be a mixture of supported and unsupported facts. Right now the platform caters to biographies and data from Wikipedia.

Vectara HHEM
HHEM is another model for detecting hallucinations; it is basically a binary classifier that validates whether a summary is factually consistent with the source document. Complete details about this model are here.
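
The model is published on Hugging Face as vectara/hallucination_evaluation_model. The original model card showed it being loaded as a sentence-transformers CrossEncoder, which is what this sketch assumes; newer HHEM releases may need a different loading path.

```python
# pip install sentence-transformers
from sentence_transformers import CrossEncoder

# Loading path assumed from the original HHEM model card; newer releases may
# need transformers' AutoModel with trust_remote_code instead.
model = CrossEncoder("vectara/hallucination_evaluation_model")

pairs = [
    # (source document, candidate summary)
    ["Palak paneer is a curry of paneer cubes simmered in a pureed spinach gravy.",
     "Palak paneer is a spinach-based curry containing paneer."],
    ["Palak paneer is a curry of paneer cubes simmered in a pureed spinach gravy.",
     "Palak paneer is a chicken dish that is served cold."],
]

# Scores close to 1 mean the summary is factually consistent with the source;
# scores close to 0 flag a likely hallucination.
scores = model.predict(pairs)
print(scores)
```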


Conclusion
Even though I liked SelfCheckGPT and will be using it for my CookGPT project, I highly suggest using more than one approach to detect hallucinations, depending on the type of application you have (finance, healthcare); AI hallucinations can be very serious in those domains. What makes these situations even more dangerous is that these glitches can cast doubt on our confidence in AI systems and, at times, on our own understanding of the subject. What if the answer supplied by the AI system is correct, but I am unable to grasp it? What if others have grasped the answer, and I am the only one who hasn't? Should I raise this issue or not? Will I be judged if I call it out? These questions underscore the complexity and uncertainty surrounding AI hallucinations, prompting us to reconsider our perceptions and interpretations.
