Retail Search Observation
I was on a hunt for glow-in-the-dark shoes (don’t ask). After searching a few websites, it became evident that many in-house search engines are, well, for lack of a better description, a bit dumb. They were returning “dark” shoes by keyword matching on “dark,” but I was searching for shoes with a little more pizzazz. I needed the glow.
Tech Introduction
Artificial intelligence capabilities have taken a quantum leap. One such capability is the ability to semantically search images and text. AI search can understand the meaning of a user’s query and go beyond simple keyword matching. It can accurately understand that I am searching not for “dark” but for the concept “glow in the dark,” and get me the glow I so desperately crave.
Below, I will demonstrate a simple AI text-to-image search using OpenAI’s CLIP model. The demo video and code are listed at the top of this article.
Demo Dataset
I chose this fantastic demo dataset that includes 44,000 fashion products. This dataset consists of eCommerce retail clothing images and their captions.
Step 1: Indexing the Fashion Images, aka Storing the AI’s Understanding of Each Image
OpenAI’s CLIP can take an input image, run it through the model, and output a vector. This vector is a numerical understanding of the image contents. For instance, the vector might encode whether there is a shirt in the image, its style, and so on. These vectors are what we will search. In my particular example, we will search them for “glow in the dark shoes,” but the beauty of this AI search is that a user can type any query and the AI will attempt to understand it and find the right images. All of this searching is made possible by vectors.
However, before we can search these vectors, we need to index them and store them for fast retrieval. For this we will use Spotify’s Annoy library (Approximate Nearest Neighbors Oh Yeah).
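For reference, creating an Annoy index is a one-liner. Below is a minimal sketch, assuming the ViT-B/32 variant of CLIP (whose embeddings are 512-dimensional); the variable name t matches the snippet that follows.

from annoy import AnnoyIndex

# 512-dimensional vectors (the output size of CLIP ViT-B/32), compared with
# angular distance, which behaves like cosine similarity.
t = AnnoyIndex(512, "angular")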
Here is the script to create the CLIP indices of our fashion dataset.
image_features = model.encode_image(image)  # CLIP’s vector “understanding” of the image
t.add_item(i, image_features[0])  # add this vector to the Annoy index t under id i
Above are the two important lines to understand. The first line encodes each image; “image_features” is a vector that represents the content/semantics of the image. The second line adds that vector to our index for fast retrieval. To summarize the script: we take each fashion image, get the AI’s numerical understanding of it, and store that understanding so searches can be served quickly later.
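For context, here is a minimal sketch of what the full indexing script might look like. It is an illustration under assumptions, not the exact script from the repo: it assumes the openai/CLIP package with the ViT-B/32 checkpoint, Pillow for image loading, and placeholder names for the image folder (fashion_images) and index file (fashion.ann).

import os
import clip
import torch
from PIL import Image
from annoy import AnnoyIndex

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

t = AnnoyIndex(512, "angular")          # ViT-B/32 embeddings are 512-dimensional

image_dir = "fashion_images"            # placeholder path to the dataset images
for i, filename in enumerate(sorted(os.listdir(image_dir))):
    image = preprocess(Image.open(os.path.join(image_dir, filename))).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)   # vector understanding of the image
    t.add_item(i, image_features[0].cpu().tolist())  # store vector i in the index

t.build(100)                            # more trees = better accuracy, slower build
t.save("fashion.ann")                   # persist the index for the search step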
Step 2: Searching the Fashion Images with Natural Language
In the previous step, we created an index, a data store of the numerical vector understanding of each image. Next, we will compute a similar numerical vector understanding of the user’s search query.
The line below uses OpenAI’s CLIP model to get the vector understanding of the user’s search text.
text_features = model.encode_text(text)  # “text” is the tokenized search query; the output is its embedding vector
The next order of business is to find which fashion product images best match the user’s search. We compare the search text vector with the many image vectors to find the most similar matches; this aligns the AI’s understanding of the text query with its understanding of the fashion images. That is done with the code below.
nns = u.get_nns_by_vector(text_features[0], n_results, search_k=-1, include_distances=False)  # ids of the closest image vectors
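Putting the whole search step together, here is a minimal sketch under the same assumptions as the indexing sketch above (the openai/CLIP ViT-B/32 model, a saved fashion.ann index, and a hypothetical query string):

import clip
import torch
from annoy import AnnoyIndex

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

u = AnnoyIndex(512, "angular")
u.load("fashion.ann")                            # memory-map the index built in Step 1

text = clip.tokenize(["glow in the dark shoes"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(text)     # vector understanding of the search text

n_results = 5
nns = u.get_nns_by_vector(text_features[0].cpu().tolist(), n_results,
                          search_k=-1, include_distances=False)
print(nns)                                       # ids of the best-matching fashion images

The returned ids map back to the images added during indexing, so the top result is the image whose CLIP vector sits closest to the query vector.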
There you have it, a simple way to leverage AI to create a smart retail search!