Introduction
Generative AI has captured the tech world's attention over the past year. As more use cases are evaluated for production implementation, a prevalent challenge we encounter is hallucinations—confident but incorrect answers from large language models (LLMs).
In this blog, we'll share our real-world experience with this issue and the various approaches we used to mitigate it.
Use Case Details
Our use case involved validating image specifications (aspect ratio, dots per inch (DPI), ADA compliance, etc.) for a set of images in an automated and foolproof manner. The goal was to use a Generative AI solution to automate all of the manual QA validations typically performed on images.
Model Output and Hallucinations
LLM Details
Model: Gemini 1.5 Flash-002
Temperature: 0
Max Tokens: 8000
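For reference, a call with these settings might look roughly like the following (a minimal sketch using the google-generativeai Python SDK; the prompt wording, file name, and API-key setup are illustrative assumptions, not our exact production code):

```python
# Minimal sketch (assumed setup): ask Gemini 1.5 Flash-002 about an image
# using the generation settings listed above.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

model = genai.GenerativeModel("gemini-1.5-flash-002")
image = PIL.Image.open("sample-image.jpg")  # illustrative file name

response = model.generate_content(
    ["What is the aspect ratio of this image?", image],  # illustrative prompt
    generation_config={
        "temperature": 0,           # Temperature: 0
        "max_output_tokens": 8000,  # Max Tokens: 8000
    },
)
print(response.text)
```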
Request
Response
As you can see, Gemini's response states that the aspect ratio for the given image is 16:9.
Now we will make a minor tweak to the prompt and check the response again.
Response
The reported aspect ratio clearly changes even though the prompt remains largely the same. Repeated requests with this updated prompt also returned varying results.
While this is not unexpected from multimodal models, given their inherently non-deterministic nature, it is surprising to see considerable deviations in output despite minimal changes in the prompt.
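One simple way to surface this drift is to repeat the identical request several times and compare the answers (a hypothetical check, reusing the model and image objects from the sketch above):

```python
# Hypothetical consistency check: send the same prompt N times and count the
# distinct answers; a fully deterministic system would yield a single value.
from collections import Counter

answers = Counter()
for _ in range(10):
    resp = model.generate_content(
        ["What is the aspect ratio of this image?", image],
        generation_config={"temperature": 0, "max_output_tokens": 8000},
    )
    answers[resp.text.strip()] += 1

print(answers)  # more than one key indicates inconsistent responses
```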
Mitigation Approaches
1) Switching to a more appropriate model: Gemini Pro produced more consistent results for this use case.
2) Using non-Gen AI options for more definitive answers can be beneficial. While this may not always be feasible, in our case we were able to use the Google Vision API, which provides image dimensions that can be used to calculate the aspect ratio. The result from this API was consistent.
Google Vision API
**Request**
curl --location 'https://vision.googleapis.com/v1/images:annotate' \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header 'x-goog-user-project: gcp-sample-02' \
--header 'Content-Type: application/json; charset=utf-8' \
--data '{
  "requests": [
    {
      "features": [
        {
          "maxResults": 10,
          "type": "CROP_HINTS"
        }
      ],
      "image": {
        "source": {
          "imageUri": "https://images.xyz-abc.com/is/image/123/french-bread-pizza-bundle-image-web"
        }
      }
    }
  ]
}'
**Response**
{
  "responses": [
    {
      "cropHintsAnnotation": {
        "cropHints": [
          {
            "boundingPoly": {
              "vertices": [
                {
                  "x": 20
                },
                {
                  "x": 324
                },
                {
                  "x": 324,
                  "y": 203
                },
                {
                  "x": 20,
                  "y": 203
                }
              ]
            },
            "confidence": 0.6875,
            "importanceFraction": 0.58285713
          }
        ]
      }
    }
  ]
}
Aspect Ratio = (x2 - x1) / (y2 - y1) = (324 - 20) / (203 - 0) = 304 / 203 ≈ 1.5
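The same calculation can be done in code from the cropHints vertices. Note that the Vision API omits coordinate fields whose value is 0, so missing x/y keys should default to zero (a small parsing sketch, assuming the JSON response above has been loaded into a Python dict named vision_response):

```python
# Sketch: derive the aspect ratio from the crop-hint bounding polygon.
# The Vision API drops fields equal to 0, hence the .get(..., 0) defaults.
def aspect_ratio(response: dict) -> float:
    vertices = (response["responses"][0]["cropHintsAnnotation"]
                ["cropHints"][0]["boundingPoly"]["vertices"])
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))  # (324 - 20) / (203 - 0)

print(round(aspect_ratio(vision_response), 2))  # ~1.5
```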
Conclusion
It was fascinating to observe how a minor prompt variation can lead to significant changes in an LLM's output. Hallucinations remain an active research problem in the Gen AI space. While there are many ways to minimize them (such as tuning model parameters, using the Retrieval-Augmented Generation (RAG) approach, adding guardrails, etc.), it's important to understand that Gen AI models are inherently non-deterministic by design and will never be completely foolproof against hallucinations. If a solution requires 100% hallucination-proof results, non-Gen AI approaches should also be considered.
Thank you for reading this blog. I hope these real-world project experiences help you build your Gen AI solution to be more hallucination-free, if not entirely foolproof! 😊