DEV Community: KumarAtGIT

Vector Search Analysis - Google BigQuery vs Azure AI Search

KumarAtGIT — Sun, 09 Feb 2025 18:10:40 +0000

Introduction

As more people embrace generative AI solutions, search technology has become a major focus area. Traditional search methods relied on text matching and some fuzzy logic, but generative AI has brought vector search into the mainstream. This approach significantly enhances traditional methods by adding contextual (semantic) search, greatly improving the natural language search experience for users.

In this blog, we'll share our live project experience using the vector search capabilities of Google BigQuery and Azure AI Search. We'll discuss our findings on how both tech stacks perform for image-related semantic search and deduplication use cases.

Analysis Details

Test Data

The test data used for this analysis consisted of a set of images and their semantically similar counterparts, prepared by human reviewers using a rubric. The idea was to use this human-labeled reference data to evaluate how well each tech stack retrieves similar images.

Criteria

The criteria used for this analysis were two popular distance metrics, Euclidean distance and cosine similarity. Other options exist in the vector search domain, such as the dot product metric and approximate nearest-neighbor index algorithms like HNSW (which speeds up the search itself rather than defining similarity), but we focused on the first two because they are the most widely used for this type of use case.

Euclidean distance (https://en.wikipedia.org/wiki/Euclidean_distance)
Cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity)
Dot Product (https://en.wikipedia.org/wiki/Dot_product)
HNSW (https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)
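To make the difference between these metrics concrete, here is a minimal Python sketch on made-up toy vectors (not the project's embeddings, and not how either platform computes distances internally). It computes Euclidean distance, dot product, and cosine similarity, then ranks two hypothetical candidates with each metric. HNSW does not appear because it is an index for finding nearest neighbours quickly rather than a similarity measure.

```python
import math
from typing import Dict, List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line distance between two embeddings; smaller means more similar.
    return math.dist(a, b)


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Angle-based similarity in [-1, 1]; ignores vector length, larger means more similar.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def nearest(query: Sequence[float], candidates: Dict[str, Sequence[float]], metric: str) -> List[str]:
    # Rank candidate image IDs from most to least similar under the chosen metric.
    if metric == "euclidean":
        return sorted(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
    if metric == "cosine":
        return sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)
    raise ValueError(f"unknown metric: {metric}")


# Toy 3-D "embeddings" (real image embeddings have hundreds of dimensions).
query = [1.0, 2.0, 3.0]
candidates = {"scaled": [2.0, 4.0, 6.0], "nearby": [1.5, 1.5, 3.5]}
print(nearest(query, candidates, "euclidean"))  # ['nearby', 'scaled']
print(nearest(query, candidates, "cosine"))     # ['scaled', 'nearby']
```

The two metrics disagree here because cosine similarity ignores vector length while Euclidean distance does not. If embeddings are normalized to unit length, the two produce the same ranking, since for unit vectors ||a − b||² = 2 − 2·cos(a, b).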

Results

We ran the experiment on more than 50 similar images that varied in image quality, composition, theme, number of items, and lifestyle aspects.

Conclusion

We found that for image search, BigQuery-based search matched the human comparisons more closely than Azure AI Search. While the difference wasn't significant, BigQuery search almost always produced results as close as or closer to the human search results. It's also important to note that both platforms continue to evolve, and these results are subject to change.

Gen AI Learnings: Hallucinations and your options

KumarAtGIT — Wed, 05 Feb 2025 08:07:32 +0000

Introduction

Generative AI has captured the tech world's attention over the past year. As more use cases are evaluated for production implementation, a prevalent challenge we encounter is hallucinations—confident but incorrect answers from large language models (LLMs).

In this blog, we'll share our real-world experiences with this issue and the approaches we used to mitigate it.

Use Case Details

Our use case involved validating image specifications (aspect ratio, dots per inch (DPI), ADA compliance, etc.) for a set of images in an automated and foolproof manner. The goal was to use a Generative AI solution to automate the manual QA validations typically performed on images.

Model output and Hallucinations

LLM Details

Gemini 1.5 Flash-002
Temperature: 0
Max Tokens: 8000

Request

Response

As the response shows, Gemini reports an aspect ratio of 16:9 for the given image.

Now we will make a minor tweak to the prompt and check the response again.

Response

The calculated aspect ratio clearly changes even though the prompt remains largely the same. Repeated requests with the updated prompt also returned varying values.

While this is not unexpected from multimodal models, given their inherently non-deterministic nature, it is surprising to see considerable deviations in output despite minimal changes in the prompt.

Mitigation Approaches

1) Switch to a more appropriate model: Gemini Pro results were more consistent for this use case.

2) Use non-Gen AI options for more definitive answers. While this may not always be feasible, in our case we were able to use the Google Vision API, whose CROP_HINTS response returns bounding-box vertices from which the width, the height, and therefore the aspect ratio can be calculated. The result from this API was consistent.
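As a rough illustration of a deterministic check, the Python sketch below reduces a width and height to a ratio label and tests it against an expected ratio. It assumes the pixel dimensions are already known (for example, from image metadata). The 1% tolerance and the function names are hypothetical choices, not part of the original solution.

```python
from math import gcd


def aspect_ratio_label(width: int, height: int) -> str:
    # Reduce width:height by their greatest common divisor, e.g. 1920x1080 -> "16:9".
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive integers")
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


def matches_ratio(width: int, height: int, expected_w: int, expected_h: int,
                  tolerance: float = 0.01) -> bool:
    # Relative-tolerance check, since many real image sizes don't reduce exactly
    # (1366x768 reduces to 683:384 yet is commonly treated as 16:9).
    expected = expected_w / expected_h
    return abs(width / height - expected) / expected <= tolerance


print(aspect_ratio_label(1920, 1080))   # 16:9
print(aspect_ratio_label(1366, 768))    # 683:384
print(matches_ratio(1366, 768, 16, 9))  # True
```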

Google Vision API

**Request**

curl --location 'https://vision.googleapis.com/v1/images:annotate' \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header 'x-goog-user-project: gcp-sample-02' \
--header 'Content-Type: application/json; charset=utf-8' \
--data '{
  "requests": [
    {
      "features": [
        {
          "maxResults": 10,
          "type": "CROP_HINTS"
        }
      ],
      "image": {
        "source": {
          "imageUri": "https://images.xyz-abc.com/is/image/123/french-bread-pizza-bundle-image-web"
        }
      }
    }
  ]
}'

**Response**
{
    "responses": [
        {
            "cropHintsAnnotation": {
                "cropHints": [
                    {
                        "boundingPoly": {
                            "vertices": [
                                {
                                    "x": 20
                                },
                                {
                                    "x": 324
                                },
                                {
                                    "x": 324,
                                    "y": 203
                                },
                                {
                                    "x": 20,
                                    "y": 203
                                }
                            ]
                        },
                        "confidence": 0.6875,
                        "importanceFraction": 0.58285713
                    }
                ]
            }
        }
    ]
}

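Aspect Ratio = (x2 - x1)/(y2 - y1) = (324 - 20)/(203 - 0) = 304/203 ≈ 1.50 (roughly 3:2), where y1 = 0 because the response omits the zero-valued y coordinate of the top two vertices.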
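The same calculation can be scripted so it runs identically on every image. This Python sketch assumes the response has already been fetched and parsed into a dict. It takes the first crop hint, uses min/max so the result does not depend on vertex order, and treats a missing coordinate as 0. Keep in mind that CROP_HINTS describes a suggested crop region (here it starts at x = 20), so this box is not necessarily the full image frame.

```python
import json


def crop_hint_aspect_ratio(response: dict) -> float:
    """Width/height of the first crop hint in an images:annotate response."""
    hint = response["responses"][0]["cropHintsAnnotation"]["cropHints"][0]
    vertices = hint["boundingPoly"]["vertices"]
    # Zero-valued coordinates are omitted from the JSON (the top vertices in the
    # response above have no "y"), so a missing key is read as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


sample = json.loads("""
{"responses": [{"cropHintsAnnotation": {"cropHints": [{"boundingPoly": {"vertices": [
  {"x": 20}, {"x": 324}, {"x": 324, "y": 203}, {"x": 20, "y": 203}]},
  "confidence": 0.6875, "importanceFraction": 0.58285713}]}}]}
""")
print(round(crop_hint_aspect_ratio(sample), 2))  # 1.5
```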

Conclusion

It was fascinating to observe how a minor prompt variation can lead to significant changes in an LLM's output. Hallucinations remain an active research problem in the Gen AI space. While there are many ways to minimize them (such as tuning model parameters, using the Retrieval-Augmented Generation (RAG) approach, and adding guardrails), it's important to understand that Gen AI models are inherently non-deterministic by design and will never be completely foolproof against hallucinations. If a solution requires 100% hallucination-proof results, non-Gen AI approaches should also be considered.

Thank you for reading this blog. I hope these live project experiences will help you build your Gen AI solution to be more hallucination-free, if not entirely foolproof! 😊

Priority Processing in Event Driven Architectures: Common Design Patterns

KumarAtGIT — Wed, 05 Feb 2025 06:37:14 +0000

Introduction

Have you ever found yourself designing an Event Driven Architecture (EDA)? You're not alone. More and more solutions are adopting EDA, and one challenge you may encounter is how to prioritize messages within such an architecture.

Unlike non-EDA systems, where most of the data is readily available, data in EDA systems is almost always changing, making priority processing a complex task. What may be a priority at a given time T might not hold the same importance at time T+1. In these transient systems, the application processing the messages must adapt and implement some form of priority mechanism. While absolute priority processing may not be feasible in real-time event-driven systems, some level of prioritization can certainly be achieved.

This blog shares some common design patterns for implementing prioritized processing in EDA systems. It focuses on the consumer side of these architectures, since consumers often have limited or no control over whether producers emit messages in a prioritized manner.

Approach 1: Source Separation: Different Sources for Priority vs Non-Priority Messages

This is the simplest approach, and it covers basic prioritization of record processing at the consumer end.

Having a dedicated input source (topic) for priority messages keeps them separate from the bulk of non-priority messages and makes it possible to allocate dedicated processing capacity: separate consumer instances can be spun up to process only priority messages.
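A minimal in-process sketch of the idea, written in Python: two queue.Queue objects stand in for the two hypothetical topics, and each source gets its own pool of consumer threads sized independently (four workers for priority messages, one for bulk by default). A real deployment would use separate consumer processes or consumer groups on the broker rather than threads.

```python
import queue
import threading
from typing import Callable, Dict, List


def start_consumers(source: queue.Queue, handle: Callable[[str], None],
                    workers: int) -> List[threading.Thread]:
    """Start a pool of consumer threads dedicated to a single input source."""
    def run() -> None:
        while True:
            msg = source.get()
            if msg is None:  # sentinel: stop this worker
                return
            handle(msg)

    threads = [threading.Thread(target=run, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_demo(priority_msgs: List[str], bulk_msgs: List[str],
             priority_workers: int = 4, bulk_workers: int = 1) -> Dict[str, List[str]]:
    # In-process queues stand in for two hypothetical topics (priority and bulk).
    priority_q: queue.Queue = queue.Queue()
    bulk_q: queue.Queue = queue.Queue()
    processed: Dict[str, List[str]] = {"priority": [], "bulk": []}
    lock = threading.Lock()

    def recorder(kind: str) -> Callable[[str], None]:
        def handle(msg: str) -> None:
            with lock:
                processed[kind].append(msg)
        return handle

    # Capacity is allocated per source: more workers for priority traffic.
    pools = [
        (priority_q, start_consumers(priority_q, recorder("priority"), priority_workers)),
        (bulk_q, start_consumers(bulk_q, recorder("bulk"), bulk_workers)),
    ]
    for msg in priority_msgs:
        priority_q.put(msg)
    for msg in bulk_msgs:
        bulk_q.put(msg)
    for source, threads in pools:
        for _ in threads:
            source.put(None)  # one shutdown sentinel per worker
        for t in threads:
            t.join()
    return processed


print(run_demo(["p1", "p2", "p3"], ["b1", "b2"]))
```

Because the priority pool has several workers, the order in which priority messages finish is not guaranteed; the single bulk worker processes its source in FIFO order.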

High Level Design

PRO/CON Analysis

Pros

Simple and fast to implement.
Independent scaling for each priority level, based on its traffic pattern.

Cons

Increased operational complexity from maintaining and consuming from two different sources.
Producer systems may not support sending messages to two different destinations.

Approach 2: Priority Queue: Using In-Memory Prioritization Framework

In systems where separate input sources for different priority messages are not feasible, consumer systems can implement an in-memory prioritization mechanism. A commonly used data structure for this purpose is the priority queue. This approach lets the consumer system adjust prioritization criteria to its needs while keeping the source system simple with a single input source.

High Level Design

To optimize the Priority Queue implementation and reduce the frequency of re-arranging, records can be read in batches from the input source. In EDA systems using Kafka, this capability is inherently supported by the platform's consumer APIs through configuration such as max.poll.records.
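The batching idea can be sketched in Python with the standard heapq module. Here poll_batches is a hypothetical stand-in for a consumer poll loop, with max_poll_records playing the role of max.poll.records, and records are (priority, payload) tuples where a lower number means more urgent, which is an assumed convention.

```python
import heapq
import itertools
from typing import Iterator, List, Tuple

Record = Tuple[int, str]  # (priority, payload); lower number = more urgent (assumed)


def poll_batches(stream: List[Record], max_poll_records: int) -> Iterator[List[Record]]:
    # Stand-in for a consumer poll loop; max_poll_records mimics Kafka's
    # max.poll.records (upper bound on records returned per poll).
    for start in range(0, len(stream), max_poll_records):
        yield stream[start:start + max_poll_records]


def in_priority_order(batch: List[Record]) -> Iterator[str]:
    # Heap-order one polled batch; the counter keeps arrival order for equal priorities.
    seq = itertools.count()
    heap = [(priority, next(seq), payload) for priority, payload in batch]
    heapq.heapify(heap)  # O(n) build instead of n separate pushes
    while heap:
        _, _, payload = heapq.heappop(heap)
        yield payload


stream = [(2, "b1"), (0, "p1"), (1, "m1"), (0, "p2"), (2, "b2")]
order = [p for batch in poll_batches(stream, 3) for p in in_priority_order(batch)]
print(order)  # ['p1', 'm1', 'b1', 'p2', 'b2']
```

Note that p2 is processed after b1: prioritization only applies within each polled batch, so the batch size effectively sets the prioritization window.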

Time Complexity of Priority Queue

For a binary-heap priority queue (such as Java's PriorityQueue), insertion (enqueue) and removal of the head (dequeue) take O(log(n)) time.
Removing an arbitrary element and checking whether an element is present (contains) take linear time, while retrieving the head without removing it (peek) takes constant time. By default, Priority Queue elements are kept in their natural ordering. To change the ordering, a comparator can be specified when creating the Priority Queue object, and the queue will then use it to order its elements. Products such as Redis also offer this capability for reuse, for example through sorted sets.
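Python's heapq is a min-heap with no comparator argument, so a common way to get custom ordering is to wrap entries in a dataclass that compares only a sort key. The sketch below assumes hypothetical payload fields priority (higher is more urgent) and ts (arrival time), and annotates the cost of each operation to mirror the complexities above.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class PrioritizedMessage:
    # Only sort_key is compared, much like passing a Comparator to Java's PriorityQueue.
    sort_key: Tuple[int, float]
    payload: dict = field(compare=False)


def to_entry(payload: dict) -> PrioritizedMessage:
    # Hypothetical ordering: higher "priority" first (negated for Python's min-heap),
    # then older "ts" first.
    return PrioritizedMessage((-payload["priority"], payload["ts"]), payload)


def drain(payloads: List[dict]) -> List[str]:
    heap: List[PrioritizedMessage] = []
    for p in payloads:
        heapq.heappush(heap, to_entry(p))  # enqueue: O(log n)
    ids = []
    while heap:
        head = heap[0]                     # peek at the head: O(1)
        heapq.heappop(heap)                # dequeue: O(log n)
        ids.append(head.payload["id"])
    # Removing an arbitrary entry (heap.remove(x) then heapq.heapify(heap)) costs O(n).
    return ids


msgs = [
    {"id": "a", "priority": 1, "ts": 1.0},
    {"id": "b", "priority": 5, "ts": 2.0},
    {"id": "c", "priority": 5, "ts": 1.5},
]
print(drain(msgs))  # ['c', 'b', 'a']
```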

Criteria for deciding priority in a priority queue implementation

An attribute in the incoming payload
A predefined set of rules implemented by the application
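The two criteria can be combined in a small rule table. In this Python sketch the field names (priority, customer_tier, event_type), the rules, and the 0 to 9 scale are all hypothetical. The one design choice worth noting is that when several rules match, the most urgent value wins.

```python
from typing import Callable, List, Optional

Rule = Callable[[dict], Optional[int]]
DEFAULT_PRIORITY = 9  # lowest priority on an assumed 0 (most urgent) to 9 scale


def payload_attribute(msg: dict) -> Optional[int]:
    # Criterion 1: use an explicit priority attribute if the producer sets one.
    value = msg.get("priority")
    return int(value) if value is not None else None


def gold_customer(msg: dict) -> Optional[int]:
    # Criterion 2 (hypothetical rule): gold-tier customers go first.
    return 0 if msg.get("customer_tier") == "gold" else None


def cancellation(msg: dict) -> Optional[int]:
    # Criterion 2 (hypothetical rule): cancellations beat routine updates.
    return 1 if msg.get("event_type") == "ORDER_CANCELLED" else None


RULES: List[Rule] = [payload_attribute, gold_customer, cancellation]


def assign_priority(msg: dict, rules: List[Rule] = RULES) -> int:
    # When several rules fire, keep the most urgent (lowest) value.
    scores = [score for rule in rules if (score := rule(msg)) is not None]
    return min(scores, default=DEFAULT_PRIORITY)


print(assign_priority({"priority": 3}))                           # 3
print(assign_priority({"priority": 5, "customer_tier": "gold"}))  # 0
print(assign_priority({"event_type": "ORDER_CANCELLED"}))         # 1
print(assign_priority({}))                                        # 9
```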

PRO/CON Analysis

Pros

Relatively simple to implement; in-memory data stores such as Redis support it.
Prioritization criteria can be customized at the consumer end.

Cons

An in-memory queue is vulnerable to system failures: messages held only in memory can be lost if the consumer crashes.
It adds an extra hop to the system, which introduces some additional processing time.
Additional overhead of maintaining in-memory queues.

Approach 3: Database: Using DB for Storing and Priority Processing

Using databases to store and fetch messages based on priority can be an alternative for systems seeking more robust and scalable prioritization. Leveraging a database for storage ensures that the system can handle failures more reliably. Records can be retrieved based on various prioritization criteria using an SQL interface.
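A minimal sketch of the database approach, using Python's built-in sqlite3 module so it runs without a server. The schema, status values, and function names are hypothetical. The prioritization lives entirely in NEXT_BATCH_SQL, which is what allows the criteria to be changed outside the application code.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    payload     TEXT    NOT NULL,
    priority    INTEGER NOT NULL,   -- lower value = more urgent (assumed convention)
    received_at REAL    NOT NULL,
    status      TEXT    NOT NULL DEFAULT 'PENDING'
)
"""

# Prioritization lives in SQL, so this string could be loaded from config and
# changed without redeploying the consumer.
NEXT_BATCH_SQL = """
SELECT id, payload FROM messages
WHERE status = 'PENDING'
ORDER BY priority ASC, received_at ASC
LIMIT ?
"""


def ingest(conn: sqlite3.Connection, payload: str, priority: int, received_at: float) -> None:
    # Persist each event from the input source before processing it.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (payload, priority, received_at) VALUES (?, ?, ?)",
            (payload, priority, received_at),
        )


def claim_next_batch(conn: sqlite3.Connection, limit: int) -> List[str]:
    # Fetch the most urgent pending rows and mark them claimed in one transaction.
    with conn:
        rows = conn.execute(NEXT_BATCH_SQL, (limit,)).fetchall()
        conn.executemany(
            "UPDATE messages SET status = 'CLAIMED' WHERE id = ?",
            [(row_id,) for row_id, _ in rows],
        )
    return [payload for _, payload in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for payload, priority, ts in [("bulk-1", 5, 1.0), ("urgent-1", 0, 2.0),
                              ("bulk-2", 5, 0.5), ("urgent-2", 0, 1.5)]:
    ingest(conn, payload, priority, ts)
print(claim_next_batch(conn, 3))  # ['urgent-2', 'urgent-1', 'bulk-2']
print(claim_next_batch(conn, 3))  # ['bulk-1']
```

With several consumers sharing a database such as PostgreSQL, the claim query would typically use SELECT ... FOR UPDATE SKIP LOCKED so that two consumers do not claim the same rows; SQLite is used here only to keep the example self-contained.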

High Level Design

PRO/CON Analysis

Pros

Robust failure handling, because messages are persisted in the database.
High flexibility in customizing or changing prioritization criteria: prioritization SQL queries can be stored and updated outside the application on an on-demand basis.

Cons

Additional overhead of running a database: the time needed to sync records from the input source into the database may add latency to overall processing.
Additional cost related to the database.

Conclusion

Implementing priority in real-time event-based systems remains an architectural challenge, and the design patterns discussed above are some general approaches to address this need. Absolute priority implementation is elusive in these systems due to constantly changing data, necessitating some form of time window or slicing to define the scope of prioritization.

As more systems adopt real-time EDA processing, the demand for priority implementation is growing, and various approaches have evolved across the industry to meet this need. The final choice for any implementation depends on finding the optimal balance between the pros and cons of each approach.

This blog aimed to share knowledge on possible architectures for priority implementation in real-time event-driven systems, based on my years of industry experience with similar use cases. I hope this provides valuable insights and thought processes for teams and applications with similar needs.

References

https://www.confluent.io/blog/prioritize-messages-in-kafka/

https://netflixtechblog.com/timestone-netflixs-high-throughput-low-latency-priority-queueing-system-with-built-in-support-1abf249ba95f