Introduction
In 1950, when Alan Turing published "Computing Machinery and Intelligence" and asked whether machines could think, no one could have imagined how many innovations artificial intelligence would eventually bring to so many domains. One domain that is both popular and important to users is online shopping. With the surge in e-commerce, users increasingly rely on visual cues to guide their purchasing decisions. In response to this shift in consumer behavior, image-driven product search has emerged as a powerful tool for enhancing the shopping experience, and e-commerce platforms like Amazon, Myntra, Ajio, and Meesho use it widely.
You may already be familiar with image-driven search on shopping websites. This approach uses the visual content of images to let users explore products more intuitively and efficiently: by simply uploading or capturing an image, shoppers can quickly find similar or related items within a vast catalog. Whether they are seeking fashion inspiration, home decor ideas, or specific product recommendations, image-driven search offers a dynamic, personalized shopping journey tailored to individual preferences and tastes.
We can make the results more accurate by using ImageBind, the recently released all-in-one embedding model from Meta. But before we can search over those embeddings, we need a vector database to store them.
When it comes to image search, vector databases have been particularly transformative. Traditional image search often relies on metadata tags or textual descriptions, which capture only a fraction of an image's rich visual content. With a vector database, each image is transformed into a high-dimensional vector that encapsulates its visual features, allowing far more accurate and nuanced similarity comparisons. Users can then search by visual similarity, which enables tasks like e-commerce product search from images with remarkable precision. This raises a practical question, though: which vector database is best for our application?
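To make "visual similarity" concrete, here is a minimal, self-contained sketch of the cosine-similarity comparison a vector database performs under the hood. The vectors below are made up for illustration; real image embeddings, such as ImageBind's, have 1,024 dimensions.
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means identical direction, near 0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "image embeddings" (illustrative values only)
red_sneaker = np.array([0.9, 0.1, 0.4, 0.2])
blue_sneaker = np.array([0.8, 0.2, 0.5, 0.1])
floral_dress = np.array([0.1, 0.9, 0.2, 0.7])

print(cosine_similarity(red_sneaker, blue_sneaker))   # high: visually similar
print(cosine_similarity(red_sneaker, floral_dress))   # low: visually different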
Here, I have chosen the Qdrant vector database, which uses the HNSW algorithm for fast approximate nearest neighbor search in AI applications. Meesho already uses Qdrant, but the results are still not as accurate as they could be; we can improve them by pairing Qdrant with ImageBind embeddings. Before diving into the content, let’s look at the steps:
- Loading the Dataset.
- Initializing the Qdrant Vector DB.
- Image Embeddings with ImageBind.
- Deploying with Gradio.
Reverse Product Image Search with Qdrant
Let's install the dependencies first to get started with the reverse product image search.
%pip install opendatasets gradio qdrant-client transformers sentence_transformers sentencepiece tqdm
Loading the Dataset
Using the opendatasets library, download the Kaggle dataset with your Kaggle username and API key. To obtain them, open your account's Settings page on Kaggle and create a new API token; a kaggle.json file containing your username and key will be downloaded.
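For reference, the downloaded kaggle.json is a small JSON file of roughly this shape (the values below are placeholders):
{"username": "<your-kaggle-username>", "key": "<your-api-key>"}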
import opendatasets as od
od.download("https://www.kaggle.com/datasets/vikashrajluhaniwal/fashion-images")
Now, let’s store the image paths in lists so that we can easily access the images later.
import os
import random
import tempfile

import gradio as gr
from PIL import Image
from qdrant_client import QdrantClient
from qdrant_client.http import models
from tqdm import tqdm
def get_image_paths(directory):
    # Initialize an empty list to store the image paths
    image_paths = []
    # Walk through all files and subdirectories within the given directory
    for root, dirs, files in os.walk(directory):
        for file in files:
            # Check if the file has an image extension (.png, .jpg, .jpeg, .gif, .bmp)
            if file.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp')):
                # Construct the full path to the image file
                image_path = os.path.join(root, file)
                # Append the image path to the list
                image_paths.append(image_path)
    return image_paths
# Directory paths
women_directory = './fashion-images/data/Footwear_Women/Images/images_with_product_ids/'
men_directory = './fashion-images/data/Footwear_Men/Images/images_with_product_ids/'
girls_directory = './fashion-images/data/Apparel_Girls/Images/images_with_product_ids/'
boys_directory = './fashion-images/data/Apparel_Boys/Images/images_with_product_ids/'
# Get image paths for different categories
image_paths_Women = get_image_paths(women_directory)
image_paths_Men = get_image_paths(men_directory)
image_paths_Girls = get_image_paths(girls_directory)
image_paths_Boys = get_image_paths(boys_directory)
# Collect the per-category lists in one place (this order matters for the indexing step below)
all_image_paths = [image_paths_Boys, image_paths_Girls, image_paths_Men, image_paths_Women]
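As a quick optional sanity check, you can print how many images were collected per category (the names below just label the lists we built):
# Optional sanity check: number of images collected per category
for name, paths in zip(["Boys", "Girls", "Men", "Women"],
                       [image_paths_Boys, image_paths_Girls, image_paths_Men, image_paths_Women]):
    print(f"{name}: {len(paths)} images")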
Initializing the Qdrant Vector DB
Initialize the Qdrant client with in-memory storage. The collection will be named "imagebind_data", and we will use cosine distance for its "image" vector (ImageBind embeddings have 1,024 dimensions).
# Initialize the Qdrant client with in-memory storage and create the collection
client = QdrantClient(":memory:")
client.recreate_collection(
    collection_name="imagebind_data",
    vectors_config={"image": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
)
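Qdrant builds an HNSW index for each collection, and the defaults work fine for a dataset of this size. If you want to trade indexing time for recall, you can pass an hnsw_config when creating the collection; the values below are illustrative, not tuned. The last line is a quick check that the collection exists:
# Optional alternative: create the collection with explicit HNSW settings (values are illustrative)
client.recreate_collection(
    collection_name="imagebind_data",
    vectors_config={"image": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
    hnsw_config=models.HnswConfigDiff(
        m=16,              # graph connectivity: higher = better recall, more memory
        ef_construct=200,  # build-time search depth: higher = better index, slower build
    ),
)

# Sanity check: the collection exists and is empty for now
print(client.get_collection(collection_name="imagebind_data").points_count)  # 0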
Image Embeddings with ImageBind
ImageBind is an innovative model developed by Meta AI’s FAIR Lab. This model is designed to learn a joint embedding across six different modalities: images, text, audio, depth, thermal, and IMU data. One of the key features of ImageBind is its ability to learn this joint embedding without requiring all combinations of paired data. It has been discovered that only image-paired data is necessary to bind the modalities together effectively. This unique capability allows ImageBind to leverage recent large-scale vision-language models and extend their zero-shot capabilities to new modalities simply by utilizing their natural pairing with images.
We’ll use ImageBind to create the embeddings, but first, let’s go through the steps required to install it.
- Clone the ImageBind git repository:
git clone https://github.com/facebookresearch/ImageBind.git
- Change into the repository directory:
cd ImageBind
- Edit the requirements.txt file: remove the mayavi and cartopy lines if their installation fails or causes problems.
- Install the requirements:
pip install -r requirements.txt
- Return to your project directory:
cd ..
Then, load the model.
import sys
sys.path.append("./ImageBind/")

import torch
import imagebind
from imagebind.models import imagebind_model

# Use a GPU if available; CPU inference works but is slow
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind (huge) checkpoint and switch to inference mode
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)
After initializing the model, we will now create embeddings.
from imagebind.models.imagebind_model import ModalityType
from imagebind import data
import torch

# Embed each category's images with ImageBind's vision encoder
embeddings_list = []
for image_paths in [image_paths_Boys, image_paths_Girls, image_paths_Men, image_paths_Women]:
    inputs = {ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device)}
    with torch.no_grad():
        embeddings = model(inputs)
    embeddings_list.append(embeddings)
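Since ImageBind maps text into the same embedding space as images, an optional way to sanity-check the model before touching the database is a quick cross-modal comparison. This is a minimal sketch; the prompts are made up for illustration:
# Optional cross-modal sanity check: a text prompt should be most similar
# to an image of the same kind of product
text_inputs = {ModalityType.TEXT: data.load_and_transform_text(["a pair of shoes", "a t-shirt"], device)}
with torch.no_grad():
    text_embeddings = model(text_inputs)

# Compare the first men's-footwear image (index 2 = Men in the loop above) to both prompts
image_vec = embeddings_list[2][ModalityType.VISION][0]
sims = torch.nn.functional.cosine_similarity(image_vec.unsqueeze(0), text_embeddings[ModalityType.TEXT])
print(sims)  # the "shoes" prompt should score higher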
Then we’ll update the Qdrant Vector DB with the generated embeddings.
import uuid

points = []
# Iterate over each category's embeddings and the corresponding image paths
for idx, (embeddings, image_paths) in enumerate(zip(embeddings_list, all_image_paths)):
    for sub_idx, sample in enumerate(image_paths):
        # Store the image path in the payload so we can return it from search results
        payload = {"path": sample}
        # Generate a unique UUID for each point
        point_id = str(uuid.uuid4())
        points.append(
            models.PointStruct(
                id=point_id,
                vector={"image": embeddings[ModalityType.VISION][sub_idx].tolist()},
                payload=payload,
            )
        )

client.upsert(collection_name="imagebind_data", points=points)
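As a quick check that the upsert worked, you can count the points in the collection; the number should match the total images across the four categories:
# Verify that all embeddings landed in the collection
print(client.count(collection_name="imagebind_data").count)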
Next, we’ll write a processing function that takes an image as input and performs a reverse image search using its embedding.
def process_text(image_query):
    # Embed the query image with ImageBind's vision encoder
    user_query = [image_query]
    dtype = ModalityType.VISION
    user_input = {dtype: data.load_and_transform_vision_data(user_query, device)}
    with torch.no_grad():
        user_embeddings = model(user_input)

    # Search the collection for the nearest stored image vectors
    image_hits = client.search(
        collection_name="imagebind_data",
        query_vector=models.NamedVector(
            name="image",
            vector=user_embeddings[dtype][0].tolist(),
        ),
    )

    # Return the path of the best hit, if any
    if image_hits and 'path' in image_hits[0].payload:
        return image_hits[0].payload['path']
    else:
        return None
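Before wiring the function into a UI, you can test it locally with one of the images we already indexed (any path from the lists above will do):
# Quick local test using an image from our own dataset
result_path = process_text(image_paths_Women[0])
print(result_path)  # should be a visually similar (possibly identical) product image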
Deploying with Gradio
Now that the processing function is ready, we’ll define a Gradio interface to deploy it.
# Point Gradio's temporary directory at the dataset folder so it can serve the result images
tempfile.tempdir = "./fashion-images/data"

# Gradio interface for the reverse-image-search tab
iface = gr.Interface(
    title="Reverse Image Search with ImageBind",
    description="Leveraging ImageBind to perform reverse image search for e-commerce products",
    fn=process_text,
    inputs=[gr.Image(label="image_query", type="filepath")],
    outputs=[gr.Image(label="Image")],
)
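If you want to preview this tab on its own before building the tabbed app, you can launch it directly (optional; close it before launching the combined app at the end):
# Optional: preview just the reverse-image-search interface
iface.launch()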
Image Search Using Product Category
To search images by product category, we need a couple of helper functions. First, we’ll define a function that loads the images for a given category.
# Define a function to get the images of the selected category
def get_images_from_category(category):
    # Convert the category to a string
    category_str = str(category)
    # Directory path for the selected category (e.g., "Apparel Boys" -> "Apparel_Boys")
    category_dir = f"./fashion-images/data/{category_str.replace(' ', '_')}/Images/images_with_product_ids/"
    # List of image file names in that directory
    image_paths = os.listdir(category_dir)
    # Open and return the images
    images = [Image.open(os.path.join(category_dir, img_path)) for img_path in image_paths]
    return images
Then list the product categories.
# Define your product categories
product_categories = ["Apparel Boys", "Apparel Girls", "Footwear Men", "Footwear Women"]
After that, we’ll define a function for category selection.
# Define a function to handle category selection
def select_category(category):
    # Get the images corresponding to the selected category
    images = get_images_from_category(category)
    # Return a random image from the list
    return random.choice(images)
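You can test the helper directly with one of the categories (the printed size is just to confirm a PIL image came back):
# Quick test: pick a random image from the Boys Apparel category
sample = select_category("Apparel Boys")
print(sample.size)  # PIL image size (width, height)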
Deploying the Category Search with Gradio
Now, we’ll create the Gradio interface components for category selection: a dropdown for the category and an image output (gr.Interface adds its own submit button).
# Create the interface components for category selection
category_dropdown = gr.Dropdown(product_categories, label="Select a product category")
images_output = gr.Image(label="Images of Selected Category")
After that, we’ll create a Gradio interface and pass the functions and components.
category_search_interface = gr.Interface(
fn=select_category,
inputs=category_dropdown,
outputs=images_output,
title="Category-driven Product Search for Ecommerce",
description="Select a product category to view a random image from the corresponding directory.",
)
Merging Two Gradio Interfaces
We have built two Gradio interfaces: one for reverse image search and another for image search by product category. What if we could see both in one application? We can, using Gradio's TabbedInterface.
# Combine both interfaces into the same app, one per tab
# (tab_names is optional; it labels the two tabs)
combined_interface = gr.TabbedInterface(
    [iface, category_search_interface],
    tab_names=["Reverse Image Search", "Category-Driven Product Search"],
)

# Launch the combined interface
combined_interface.launch(share=True)
Launching will print a local URL and a public share URL. The application is ready with two tabs.
Tab 0: Reverse Image Search with ImageBind for E-Commerce
Tab 1: Category-Driven Product Search for E-Commerce
Examples
Let’s see how our application performs.
I passed an image of a girl wearing a sleeveless dress. Let’s see if our application can find a similar item among the products.
The application found a dress that is quite similar. Impressive!
Let’s try shoes next. I passed a picture of three people’s feet wearing sneakers. Let’s see the result.
The result is impressive: the output is sneakers in the same style.
Now let’s try a category search. Say I want to see the products in the Boys Apparel category. Gradio shows only one image at a time, but the application will not return the same image on every search.
The first search returned an image of a white t-shirt. Let’s search again to see what other items come up.
This time the result is a blue-gray t-shirt, so the results are not repeated. Great!
Conclusion
With the Qdrant vector database, reverse image search for e-commerce products becomes possible. We saw how to perform a reverse image search by uploading an image, and how to browse products from a selected category. The results were accurate thanks to the ImageBind embedding model.
Hope you enjoyed reading this blog. Now it is your turn to try this integration.
CodeSpace
You can find the code on GitHub.
Thanks for reading!
This blog was originally published here: https://medium.com/@akriti.upadhyay/perform-image-driven-reverse-image-search-on-e-commerce-sites-with-imagebind-and-qdrant-0a62f0169e19