KumarAtGIT

Gen AI Learnings : Hallucinations and your options

Introduction

Generative AI has captured the tech world's attention over the past year. As more use cases are evaluated for production implementation, a prevalent challenge we encounter is hallucinations—confident but incorrect answers from large language models (LLMs).

In this blog, we'll share our real-world experience with this issue and the various approaches we used to mitigate it.

Use Case Details

Our use case involved validating image specifications (aspect ratio, dots per inch (DPI), ADA compliance, etc.) for a set of images in an automated and foolproof manner. The goal was to use a Generative AI solution to automate all these manual QA validations typically performed on images.

Model output and Hallucinations

LLM Details

Model: Gemini 1.5 Flash-002
Temperature: 0
Max Tokens: 8000
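
Below is a minimal sketch, assuming the Vertex AI Python SDK, of how a request with these settings could be issued. The project ID, image URI and prompt text are placeholders, not the exact values from our tests.

```python
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Part

# Hypothetical project/location; replace with your own GCP settings.
vertexai.init(project="your-gcp-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

# Placeholder image URI and prompt; the actual request used our own test images.
image_part = Part.from_uri("gs://your-bucket/sample-image.jpg", mime_type="image/jpeg")

response = model.generate_content(
    [image_part, "What is the aspect ratio of this image?"],
    generation_config=GenerationConfig(temperature=0, max_output_tokens=8000),
)
print(response.text)
```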

Request

[Image: Gemini_Hallucination_UseCase_Request_1]

Response

[Image: Gemini_Hallucination_UseCase_Response_1]

As you can see, Gemini reports the aspect ratio of the given image as 16:9.

Now we will make a minor tweak to the prompt and check the response again.

Response

[Image: Gemini_Hallucination_UseCase_Request_2]

The aspect ratio reported by the model clearly changes, even though the prompt details remain largely the same. Repeated requests with the updated prompt also returned varying values.

While this is not unexpected from multimodal models, given their inherently non-deterministic nature, it is surprising to see considerable deviations in output despite minimal changes in the prompt.
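
A simple way to surface this behaviour is to replay the same request several times and compare the answers. Here is a rough, illustrative sketch that reuses the hypothetical SDK setup from the earlier snippet; `probe_consistency` is our own helper name, not part of any SDK.

```python
from collections import Counter

def probe_consistency(model, contents, generation_config, runs=5):
    """Send the same request several times and tally the distinct answers."""
    answers = [
        model.generate_content(contents, generation_config=generation_config).text.strip()
        for _ in range(runs)
    ]
    return Counter(answers)

# More than one distinct answer means the output is unstable for this prompt, e.g.:
# probe_consistency(model, [image_part, "What is the aspect ratio of this image?"],
#                   GenerationConfig(temperature=0, max_output_tokens=8000))
```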

Mitigation Approaches

1) Switching to a more appropriate model: Gemini Pro results were more consistent for this use case.

[Image: Gemini_Hallucination_UseCase_Request_2]

2) Using non-Gen AI options for more definitive answers can be beneficial. While this may not always be feasible, in our case we were able to use the Google Vision API, which returns dimensions that can be used to calculate the aspect ratio. The result from this API was consistent.

Google Vision API

**Request**

```bash
curl --location 'https://vision.googleapis.com/v1/images:annotate' \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header 'x-goog-user-project: gcp-sample-02' \
--header 'Content-Type: application/json; charset=utf-8' \
--data '{
  "requests": [
    {
      "features": [
        {
          "maxResults": 10,
          "type": "CROP_HINTS"
        }
      ],
      "image": {
        "source": {
          "imageUri": "https://images.xyz-abc.com/is/image/123/french-bread-pizza-bundle-image-web"
        }
      }
    }
  ]
}'
```

**Response**

```json
{
    "responses": [
        {
            "cropHintsAnnotation": {
                "cropHints": [
                    {
                        "boundingPoly": {
                            "vertices": [
                                {
                                    "x": 20
                                },
                                {
                                    "x": 324
                                },
                                {
                                    "x": 324,
                                    "y": 203
                                },
                                {
                                    "x": 20,
                                    "y": 203
                                }
                            ]
                        },
                        "confidence": 0.6875,
                        "importanceFraction": 0.58285713
                    }
                ]
            }
        }
    ]
}
```

Aspect Ratio = (x2 − x1) / (y2 − y1) = (324 − 20) / (203 − 0) = 304 / 203 ≈ 1.5
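
For completeness, here is a small illustrative sketch (not code from the original project) that parses the Vision API response above and derives the aspect ratio from the crop-hint bounding box. The Vision API omits zero-valued coordinates, so missing x/y fields default to 0.

```python
import json

# Response trimmed to the fields used below (same values as shown above).
response_json = """
{
  "responses": [{
    "cropHintsAnnotation": {
      "cropHints": [{
        "boundingPoly": {
          "vertices": [
            {"x": 20}, {"x": 324}, {"x": 324, "y": 203}, {"x": 20, "y": 203}
          ]
        },
        "confidence": 0.6875,
        "importanceFraction": 0.58285713
      }]
    }
  }]
}
"""

data = json.loads(response_json)
vertices = data["responses"][0]["cropHintsAnnotation"]["cropHints"][0]["boundingPoly"]["vertices"]

# Zero-valued coordinates are omitted by the API, so default missing x/y to 0.
xs = [v.get("x", 0) for v in vertices]
ys = [v.get("y", 0) for v in vertices]

width = max(xs) - min(xs)   # 324 - 20 = 304
height = max(ys) - min(ys)  # 203 - 0  = 203
print(f"Aspect ratio: {width / height:.2f}")  # ~1.50
```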

Conclusion

It was fascinating to observe how a minor prompt variation can lead to significant changes in an LLM's output. Hallucinations remain an active research problem in the Gen AI space. While there are many ways to minimize them (such as tuning model parameters, using Retrieval-Augmented Generation (RAG), adding guardrails, etc.), it's important to understand that Gen AI models are inherently non-deterministic by design and will never be completely foolproof against hallucinations. If a solution requires 100% hallucination-proof results, non-Gen AI approaches should also be considered.

Thank you for reading this blog. I hope these real project experiences help you build Gen AI solutions that are less prone to hallucination, if not entirely foolproof! 😊
