Introduction
In 1950, when Alan Turing published "Computing Machinery and Intelligence" and asked whether machines could think, no one could have imagined how many innovations artificial intelligence would eventually bring to so many domains. One domain that is both popular and important to users is online shopping. With the surge in e-commerce, users increasingly rely on visual cues to guide their purchasing decisions. In response to this shift in consumer behavior, image-driven product search has emerged as a powerful tool for enhancing the shopping experience, and e-commerce platforms like Amazon, Myntra, Ajio, and Meesho use it widely.
You may already be familiar with image-driven search on shopping websites. This approach uses the visual content of images to let users explore products more intuitively and efficiently: by simply uploading or capturing an image, shoppers can quickly find similar or related items within a vast catalog. Whether they are seeking fashion inspiration, home decor ideas, or specific product recommendations, image-driven search offers a dynamic, personalized shopping journey tailored to individual preferences and tastes.
We can make the results more accurate by using ImageBind, the recently released all-in-one embedding model from Meta. But before we can search over those embeddings, we need a vector database to store them.
When it comes to image search, vector databases have been particularly transformative. Traditional image search often relies on metadata tags or textual descriptions, which capture only a fraction of an image's rich visual content. With a vector database, each image is transformed into a high-dimensional vector that encapsulates its visual features, allowing far more accurate and nuanced similarity comparisons. Users can then search by visual similarity, which enables tasks like e-commerce product search from images with remarkable precision. This raises a practical question, though: which vector database is best for our application?
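To make "visual similarity" concrete, here is a minimal, self-contained sketch of the cosine-similarity comparison a vector database performs under the hood. The vectors below are made up for illustration; real image embeddings, such as ImageBind's, have 1,024 dimensions.
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means identical direction, near 0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "image embeddings" (illustrative values only)
red_sneaker = np.array([0.9, 0.1, 0.4, 0.2])
blue_sneaker = np.array([0.8, 0.2, 0.5, 0.1])
floral_dress = np.array([0.1, 0.9, 0.2, 0.7])

print(cosine_similarity(red_sneaker, blue_sneaker))   # high: visually similar
print(cosine_similarity(red_sneaker, floral_dress))   # low: visually different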
Here, I have chosen the Qdrant vector database, which uses the HNSW algorithm for fast approximate nearest neighbor search in AI applications. Meesho already uses Qdrant, but the results are still not as accurate as they could be; we can improve them by pairing Qdrant with ImageBind embeddings. Before diving into the content, let’s look at the steps:
- Loading the Dataset.
- Initializing the Qdrant Vector DB.
- Image Embeddings with ImageBind.
- Deploying with Gradio.
Reverse Product Image Search with Qdrant
Let's install the dependencies first to get started with the reverse product image search.
%pip install opendatasets gradio qdrant-client transformers sentence_transformers sentencepiece tqdm
Loading the Dataset
Using the opendatasets library, download the Kaggle dataset with your Kaggle username and API key. To obtain them, open your account's Settings page on Kaggle and create a new API token; a kaggle.json file containing your username and key will be downloaded.
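For reference, the downloaded kaggle.json is a small JSON file of roughly this shape (the values below are placeholders):
{"username": "<your-kaggle-username>", "key": "<your-api-key>"}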
import opendatasets as od
od.download("https://www.kaggle.com/datasets/vikashrajluhaniwal/fashion-images")
Now, let’s store the image paths in lists so that we can easily access the images later.
import os
import random
import tempfile

import gradio as gr
from PIL import Image
from qdrant_client import QdrantClient
from qdrant_client.http import models
from tqdm import tqdm
def get_image_paths(directory):
    # Initialize an empty list to store the image paths
    image_paths = []
    # Walk through all files and subdirectories within the given directory
    for root, dirs, files in os.walk(directory):
        for file in files:
            # Check if the file has an image extension (.png, .jpg, .jpeg, .gif, .bmp)
            if file.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp')):
                # Construct the full path to the image file
                image_path = os.path.join(root, file)
                # Append the image path to the list
                image_paths.append(image_path)
    return image_paths
# Directory paths
women_directory = './fashion-images/data/Footwear_Women/Images/images_with_product_ids/'
men_directory = './fashion-images/data/Footwear_Men/Images/images_with_product_ids/'
girls_directory = './fashion-images/data/Apparel_Girls/Images/images_with_product_ids/'
boys_directory = './fashion-images/data/Apparel_Boys/Images/images_with_product_ids/'
# Get image paths for different categories
image_paths_Women = get_image_paths(women_directory)
image_paths_Men = get_image_paths(men_directory)
image_paths_Girls = get_image_paths(girls_directory)
image_paths_Boys = get_image_paths(boys_directory)
# Collect the per-category lists in one place (this order matters for the indexing step below)
all_image_paths = [image_paths_Boys, image_paths_Girls, image_paths_Men, image_paths_Women]
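As a quick optional sanity check, you can print how many images were collected per category (the names below just label the lists we built):
# Optional sanity check: number of images collected per category
for name, paths in zip(["Boys", "Girls", "Men", "Women"],
                       [image_paths_Boys, image_paths_Girls, image_paths_Men, image_paths_Women]):
    print(f"{name}: {len(paths)} images")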
Initializing the Qdrant Vector DB
Initialize the Qdrant client with in-memory storage. The collection will be named "imagebind_data", and we will use cosine distance for its "image" vector (ImageBind embeddings have 1,024 dimensions).
# Initialize the Qdrant client with in-memory storage and create the collection
client = QdrantClient(":memory:")
client.recreate_collection(
    collection_name="imagebind_data",
    vectors_config={"image": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
)
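Qdrant builds an HNSW index for each collection, and the defaults work fine for a dataset of this size. If you want to trade indexing time for recall, you can pass an hnsw_config when creating the collection; the values below are illustrative, not tuned. The last line is a quick check that the collection exists:
# Optional alternative: create the collection with explicit HNSW settings (values are illustrative)
client.recreate_collection(
    collection_name="imagebind_data",
    vectors_config={"image": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
    hnsw_config=models.HnswConfigDiff(
        m=16,              # graph connectivity: higher = better recall, more memory
        ef_construct=200,  # build-time search depth: higher = better index, slower build
    ),
)

# Sanity check: the collection exists and is empty for now
print(client.get_collection(collection_name="imagebind_data").points_count)  # 0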
Image Embeddings with ImageBind
ImageBind is an innovative model developed by Meta AI’s FAIR Lab. This model is designed to learn a joint embedding across six different modalities: images, text, audio, depth, thermal, and IMU data. One of the key features of ImageBind is its ability to learn this joint embedding without requiring all combinations of paired data. It has been discovered that only image-paired data is necessary to bind the modalities together effectively. This unique capability allows ImageBind to leverage recent large-scale vision-language models and extend their zero-shot capabilities to new modalities simply by utilizing their natural pairing with images.
We’ll use ImageBind to create the embeddings, but first, let’s go through the steps required to install it.
- Clone the ImageBind git repository:
git clone https://github.com/facebookresearch/ImageBind.git
- Change into the repository directory:
cd ImageBind
- Edit the requirements.txt file: remove the mayavi and cartopy lines if their installation fails or causes problems.
- Install the requirements:
pip install -r requirements.txt
- Return to your project directory:
cd ..
Then, load the model.
import sys
sys.path.append("./ImageBind/")

import torch
import imagebind
from imagebind.models import imagebind_model

# Use a GPU if available; CPU inference works but is slow
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind (huge) checkpoint and switch to inference mode
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)
After initializing the model, we will now create embeddings.
from imagebind.models.imagebind_model import ModalityType
from imagebind import data
import torch

# Embed each category's images with ImageBind's vision encoder
embeddings_list = []
for image_paths in [image_paths_Boys, image_paths_Girls, image_paths_Men, image_paths_Women]:
    inputs = {ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device)}
    with torch.no_grad():
        embeddings = model(inputs)
    embeddings_list.append(embeddings)
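Since ImageBind maps text into the same embedding space as images, an optional way to sanity-check the model before touching the database is a quick cross-modal comparison. This is a minimal sketch; the prompts are made up for illustration:
# Optional cross-modal sanity check: a text prompt should be most similar
# to an image of the same kind of product
text_inputs = {ModalityType.TEXT: data.load_and_transform_text(["a pair of shoes", "a t-shirt"], device)}
with torch.no_grad():
    text_embeddings = model(text_inputs)

# Compare the first men's-footwear image (index 2 = Men in the loop above) to both prompts
image_vec = embeddings_list[2][ModalityType.VISION][0]
sims = torch.nn.functional.cosine_similarity(image_vec.unsqueeze(0), text_embeddings[ModalityType.TEXT])
print(sims)  # the "shoes" prompt should score higher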
Then we’ll update the Qdrant Vector DB with the generated embeddings.
import uuid

points = []
# Iterate over each category's embeddings and the corresponding image paths
for idx, (embeddings, image_paths) in enumerate(zip(embeddings_list, all_image_paths)):
    for sub_idx, sample in enumerate(image_paths):
        # Store the image path in the payload so we can return it from search results
        payload = {"path": sample}
        # Generate a unique UUID for each point
        point_id = str(uuid.uuid4())
        points.append(
            models.PointStruct(
                id=point_id,
                vector={"image": embeddings[ModalityType.VISION][sub_idx].tolist()},
                payload=payload,
            )
        )

client.upsert(collection_name="imagebind_data", points=points)
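As a quick check that the upsert worked, you can count the points in the collection; the number should match the total images across the four categories:
# Verify that all embeddings landed in the collection
print(client.count(collection_name="imagebind_data").count)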
Next, we’ll write a processing function that takes an image as input and performs a reverse image search using its embedding.
def process_text(image_query):
    # Embed the query image with ImageBind's vision encoder
    user_query = [image_query]
    dtype = ModalityType.VISION
    user_input = {dtype: data.load_and_transform_vision_data(user_query, device)}
    with torch.no_grad():
        user_embeddings = model(user_input)

    # Search the collection for the nearest stored image vectors
    image_hits = client.search(
        collection_name="imagebind_data",
        query_vector=models.NamedVector(
            name="image",
            vector=user_embeddings[dtype][0].tolist(),
        ),
    )

    # Return the path of the best hit, if any
    if image_hits and 'path' in image_hits[0].payload:
        return image_hits[0].payload['path']
    else:
        return None
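Before wiring the function into a UI, you can test it locally with one of the images we already indexed (any path from the lists above will do):
# Quick local test using an image from our own dataset
result_path = process_text(image_paths_Women[0])
print(result_path)  # should be a visually similar (possibly identical) product image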
Deploying with Gradio
Now that the processing function is ready, we’ll define a Gradio interface to deploy it.
# Point Gradio's temporary directory at the dataset folder so it can serve the result images
tempfile.tempdir = "./fashion-images/data"

# Gradio interface for the reverse-image-search tab
iface = gr.Interface(
    title="Reverse Image Search with ImageBind",
    description="Leveraging ImageBind to perform reverse image search for e-commerce products",
    fn=process_text,
    inputs=[gr.Image(label="image_query", type="filepath")],
    outputs=[gr.Image(label="Image")],
)
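If you want to preview this tab on its own before building the tabbed app, you can launch it directly (optional; close it before launching the combined app at the end):
# Optional: preview just the reverse-image-search interface
iface.launch()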
Image Search Using Product Category
To search images by product category, we need a couple of helper functions. First, we’ll define a function that loads the images for a given category.
# Define a function to get the images of the selected category
def get_images_from_category(category):
    # Convert the category to a string
    category_str = str(category)
    # Directory path for the selected category (e.g., "Apparel Boys" -> "Apparel_Boys")
    category_dir = f"./fashion-images/data/{category_str.replace(' ', '_')}/Images/images_with_product_ids/"
    # List of image file names in that directory
    image_paths = os.listdir(category_dir)
    # Open and return the images
    images = [Image.open(os.path.join(category_dir, img_path)) for img_path in image_paths]
    return images
Then list the product categories.
# Define your product categories
product_categories = ["Apparel Boys", "Apparel Girls", "Footwear Men", "Footwear Women"]
After that, we’ll define a function for category selection.
# Define a function to handle category selection
def select_category(category):
    # Get the images corresponding to the selected category
    images = get_images_from_category(category)
    # Return a random image from the list
    return random.choice(images)
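You can test the helper directly with one of the categories (the printed size is just to confirm a PIL image came back):
# Quick test: pick a random image from the Boys Apparel category
sample = select_category("Apparel Boys")
print(sample.size)  # PIL image size (width, height)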
Deploying the Category Search with Gradio
Now, we’ll create the Gradio interface components for category selection: a dropdown for the category and an image output (gr.Interface adds its own submit button).
# Create the interface components for category selection
category_dropdown = gr.Dropdown(product_categories, label="Select a product category")
images_output = gr.Image(label="Images of Selected Category")
After that, we’ll create a Gradio interface and pass the functions and components.
category_search_interface = gr.Interface(
fn=select_category,
inputs=category_dropdown,
outputs=images_output,
title="Category-driven Product Search for Ecommerce",
description="Select a product category to view a random image from the corresponding directory.",
)
Merging Two Gradio Interfaces
We have built two Gradio interfaces: one for reverse image search and another for image search by product category. What if we could see both in one application? We can, using Gradio's TabbedInterface.
# Combine both interfaces into the same app, one per tab
# (tab_names is optional; it labels the two tabs)
combined_interface = gr.TabbedInterface(
    [iface, category_search_interface],
    tab_names=["Reverse Image Search", "Category-Driven Product Search"],
)

# Launch the combined interface
combined_interface.launch(share=True)
Launching will print a local URL and a public share URL. The application is ready with two tabs.
Tab 0: Reverse Image Search with ImageBind for E-Commerce
Tab 1: Category-Driven Product Search for E-Commerce
Examples
Let’s see how our application performs.
I passed an image of a girl wearing a sleeveless dress. Let’s see if our application can find a similar item among the products.
The application found a dress that is quite similar. Impressive!
Let’s try shoes next. I passed a picture of three people’s feet wearing sneakers. Let’s see the result.
The result is impressive: the output is sneakers in the same style.
Now let’s try a category search. Say I want to see the products in the Boys Apparel category. Gradio shows only one image at a time, but the application will not return the same image on every search.
The first search returned an image of a white t-shirt. Let’s search again to see what other items come up.
This time the result is a blue-gray t-shirt, so the results are not repeated. Great!
Conclusion
With the Qdrant vector database, reverse image search for e-commerce products becomes possible. We saw how to perform a reverse image search by uploading an image, and how to browse products from a selected category. The results were accurate thanks to the ImageBind embedding model.
Hope you enjoyed reading this blog. Now it is your turn to try this integration.
CodeSpace
You can find the code on GitHub.
Thanks for reading!
This blog was originally published here: https://medium.com/@akriti.upadhyay/perform-image-driven-reverse-image-search-on-e-commerce-sites-with-imagebind-and-qdrant-0a62f0169e19