Chloe Williams for Zilliz

Multimodal Madness! Create a Product Recommender for Smart Shopping

In this issue: 

  • Milvus Magic: Image Search & Smart Shopping!

  • Multimodal RAG resources

  • New! GenAI Resource Hub

  • Community Spotlights

  • Learn Milvus Live! 

🛍️Building a Multimodal Product Recommender Demo Using Milvus and Streamlit 

Ever wished you could find products just by showing a picture and describing what you want? Well, now you can! 🛍️✨ Upload an image, enter text instructions, and find the closest-matching Amazon products from the Milvus vector database. Follow the step-by-step tutorial here.

🧙Key technologies to make the magic happen: 

🔎MagicLens - a multimodal embedding model that uses a dual-encoder architecture, built on CLIP (OpenAI 2021) or CoCa (Google Research 2022), to process text and images together.

💻GPT-4o - OpenAI’s generative multimodal large language model that integrates text, images, and other data types into a single model, extending what traditional text-only LLMs can do.

🐦Milvus - open-source, distributed vector database for storing, indexing, and searching vectors for Generative AI workloads.

🌐Streamlit - open-source Python library that simplifies creating and running web applications. 
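Here’s a rough sketch of how these pieces might fit together. It isn’t the tutorial’s exact code: the embed_query placeholder standing in for a MagicLens checkpoint, the "amazon_products" collection, and its field names are all assumptions for illustration.

```python
# Hedged sketch of the recommender flow: embed the query, search Milvus, summarize with GPT-4o.
import streamlit as st
from openai import OpenAI
from pymilvus import MilvusClient


def embed_query(image_file, text):
    """Hypothetical placeholder for a MagicLens encoder.

    The real demo returns a joint image+text embedding vector here."""
    raise NotImplementedError("plug a MagicLens checkpoint in here")


milvus = MilvusClient(uri="http://localhost:19530")  # or a local Milvus Lite file
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("Multimodal Product Recommender")
image = st.file_uploader("Upload a product image", type=["jpg", "png"])
instruction = st.text_input("Describe what you want")

if image and instruction:
    # Vector search over a pre-built product collection (assumed name and fields)
    hits = milvus.search(
        collection_name="amazon_products",
        data=[embed_query(image, instruction)],
        limit=5,
        output_fields=["title", "image_url"],
    )[0]
    titles = [hit["entity"]["title"] for hit in hits]
    st.write(titles)

    # Ask GPT-4o to explain how the matches relate to the request
    answer = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Given the request '{instruction}', rank and explain these products: {titles}",
        }],
    )
    st.write(answer.choices[0].message.content)
```

Run it with `streamlit run app.py` after populating the collection with product embeddings; the full walkthrough is in the tutorial linked below.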

Read tutorial 

Multimodal RAG Demo - Code 

🧠 Multimodal RAG Resources

Don’t know where to get started with Multimodal RAG? Check out these resources from basic to advanced! 

LEARN - What is Multimodal RAG? 

Multimodal RAG extends the RAG framework to incorporate multiple data types, such as text, images, audio, and video. Real-world applications, challenges, advantages, and more are highlighted below.

Learn more

STEP ONE - Multimodal Embeddings

What’s the first step to building a multimodal retrieval augmented generation (RAG) app? Getting multimodal vector embeddings. Use CLIP to create embeddings of the input data, Milvus to store those multimodal embeddings, and FiftyOne to explore them.

Let’s compare images of words that may have different meanings in different contexts.

🐴🚗 “Pony” is clearly a horse, “Ferrari” is clearly a car, but “Mustang” could be either.

Exploring Multimodal Embeddings with FiftyOne and Milvus
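Below is a minimal sketch of that first step, assuming FiftyOne’s model zoo copy of CLIP ViT-B/32, a local folder of images, and a Milvus server reachable at FiftyOne’s default URI; the paths and brain key are placeholders, not the post’s exact code.

```python
# Embed images with CLIP, index them in Milvus via FiftyOne Brain, then explore the neighbors.
import fiftyone as fo
import fiftyone.brain as fob

# Load a local folder of images into a FiftyOne dataset (path is illustrative)
dataset = fo.Dataset.from_dir("./images", dataset_type=fo.types.ImageDirectory)

# Compute CLIP ViT-B/32 embeddings and store them in a Milvus-backed similarity index
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",  # CLIP ViT-B/32 from the FiftyOne model zoo
    backend="milvus",               # requires pymilvus and a running Milvus server
    brain_key="clip_milvus",
)

# Text queries work too, since CLIP embeds text and images into the same space
view = dataset.sort_by_similarity("mustang", brain_key="clip_milvus", k=10)
session = fo.launch_app(view)  # inspect whether you got horses, cars, or both
```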

FOLLOW ALONG - Multimodal RAG Pipelines 

This resource focuses on how to use data to build a better multimodal RAG pipeline. It emphasizes using free and open-source tools, specifically leveraging FiftyOne for data management and visualization, Milvus as a vector store, and LlamaIndex for orchestrating large language models (LLMs).

Build Better Multimodal RAG Pipelines with FiftyOne, LlamaIndex, and Milvus

Watch Tutorial
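To make the orchestration concrete, here’s a minimal, hedged sketch of wiring Milvus into LlamaIndex; the document folder, Milvus Lite file, and embedding dimension are assumptions, and the default embedding model expects an OPENAI_API_KEY.

```python
# Index local documents into a Milvus-backed LlamaIndex vector store, then query them.
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

documents = SimpleDirectoryReader("./docs").load_data()  # folder name is a placeholder

vector_store = MilvusVectorStore(
    uri="./milvus_demo.db",  # Milvus Lite file; point at a server URI in production
    dim=1536,                # must match your embedding model's output dimension
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

query_engine = index.as_query_engine()
print(query_engine.query("What does this collection of documents cover?"))
```

FiftyOne slots in on the data side of this pipeline, curating and visualizing the images before and after indexing.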

FOLLOW ALONG: Multimodal RAG locally with CLIP and Llama3

The core idea behind CLIP (Contrastive Language-Image Pretraining) is to learn the connection between an image and the text that describes it. Learn how to generate embeddings with the CLIP ViT-B/32 model and use Llama3 as the LLM to build multimodal RAG.

Get Started

Learn more about CLIP
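As a minimal local sketch of that idea, the snippet below uses the CLIP ViT-B/32 checkpoint from sentence-transformers, Milvus Lite for retrieval, and Llama3 served through Ollama; the collection and field names are illustrative, and it assumes an earlier ingestion step already stored caption embeddings.

```python
# Local multimodal RAG sketch: CLIP for the query embedding, Milvus for retrieval, Llama3 for generation.
import ollama
from PIL import Image
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")   # embeds both text and images into one space
milvus = MilvusClient("./multimodal_rag.db")  # Milvus Lite; no server needed

# Embed the query image and retrieve the closest indexed captions (assumed collection/fields)
query_vec = clip.encode(Image.open("query.jpg"))
hits = milvus.search(
    collection_name="image_captions",
    data=[query_vec.tolist()],
    limit=3,
    output_fields=["caption"],
)[0]
context = "\n".join(hit["entity"]["caption"] for hit in hits)

# Ground Llama3 (pulled via `ollama pull llama3`) on the retrieved context
reply = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"Using only this context:\n{context}\n\nDescribe what the photo likely shows.",
    }],
)
print(reply["message"]["content"])
```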

GenAI Resource Hub

The new GenAI Resource Hub has tutorials, code examples, and best practices for developing and deploying GenAI applications.

🎓 Learn

Basic concepts developers need to understand to create RAG/GenAI applications

🛠️ Build

Practical resources to help you build sample RAG/GenAI applications

🌎 Explore

After building your GenAI/RAG demo, learn what it takes to deploy it into production effectively

Start Building

👥 Community Spotlights 

We see you 👀 building cool apps with Milvus! Need inspiration? Check out these fun projects 🔥:

🩺MediSaga by KF Surya: a Retrieval Augmented Generation (RAG) chatbot application designed to answer medical questions!

https://www.linkedin.com/feed/update/urn:li:activity:7221854199649607680/ 

📄PDF Content Analyzer by sarthakgarg07: a web application that extracts, processes, and analyzes content from PDF files

📚Textbook Question Answering System by Aryaman Tiwari: uses advanced natural language processing and machine learning techniques to answer questions based on textbook content

https://www.linkedin.com/feed/update/urn:li:activity:7221782938726588417/ 

🎓 Learn Milvus Live!

Join us for upcoming virtual and in-person events to learn Milvus live. 

Aug 8: Building an Agentic RAG locally with Milvus, Ollama, and Llama Agents (virtual)

With the recent release of Llama Agents, we can now build async-first agents that run as their own services. During this webinar, Stephen will show you how to build an agentic RAG system using Llama Agents and Milvus.

Save Your Spot

Aug 13: South Bay Unstructured Data Meetup (in-person) 

We’ll be back at SAP in Palo Alto for our meetup! Talks from TwelveLabs, Zilliz, and more coming soon.

Register

Aug 13: New York Unstructured Data Meetup (in-person)

We’ve got a stacked speaker lineup for the New York meetup! Join us for the following AI talks:

▶️ Quick intro to unstructured data, edge AI, and Milvus

▶️ Modern Analytics & Reporting with Milvus Vector DB and GenAI

▶️ cuVS+Milvus

▶️ Combining Hugging Face Transformer Models and Visual Data with FiftyOne

Register

👾 Discord

Join our Discord channel to engage with our engineers and community members.

Enjoying Milvus? 

⭐ Give us a Star on GitHub
