Chloe Williams for Zilliz

Multimodal Madness! Create a Product Recommender for Smart Shopping

In this issue: 

  • Milvus Magic: Image Search & Smart Shopping!

  • Multimodal RAG resources

  • New! GenAI Resource Hub

  • Community Spotlights

  • Learn Milvus Live! 

🛍️Building a Multimodal Product Recommender Demo Using Milvus and Streamlit 

Ever wished you could find products just by showing a picture and describing what you want? Well, now you can! 🛍️✨ Upload an image, enter text instructions, and find the closest-matching Amazon products from the Milvus vector database. Follow the step-by-step tutorial here.

🧙Key technologies to make the magic happen: 

🔎MagicLens - a multimodal embedding model that uses a dual-encoder architecture, built on CLIP (OpenAI 2021) or CoCa (Google Research 2022), to process text and images together.

💻GPT-4o - OpenAI’s generative multimodal large language model that integrates text, images, and other data types into a single model, extending what traditional text-only LLMs can do.

🐦Milvus - open-source, distributed vector database for storing, indexing, and searching vectors for Generative AI workloads.

🌐Streamlit - open-source Python library that simplifies creating and running web applications. 
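Here’s a rough sketch of how these pieces might fit together. It isn’t the tutorial’s exact code: the embed_query placeholder standing in for a MagicLens checkpoint, the "amazon_products" collection, and its field names are all assumptions for illustration.

```python
# Hedged sketch of the recommender flow: embed the query, search Milvus, summarize with GPT-4o.
import streamlit as st
from openai import OpenAI
from pymilvus import MilvusClient


def embed_query(image_file, text):
    """Hypothetical placeholder for a MagicLens encoder.

    The real demo returns a joint image+text embedding vector here."""
    raise NotImplementedError("plug a MagicLens checkpoint in here")


milvus = MilvusClient(uri="http://localhost:19530")  # or a local Milvus Lite file
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("Multimodal Product Recommender")
image = st.file_uploader("Upload a product image", type=["jpg", "png"])
instruction = st.text_input("Describe what you want")

if image and instruction:
    # Vector search over a pre-built product collection (assumed name and fields)
    hits = milvus.search(
        collection_name="amazon_products",
        data=[embed_query(image, instruction)],
        limit=5,
        output_fields=["title", "image_url"],
    )[0]
    titles = [hit["entity"]["title"] for hit in hits]
    st.write(titles)

    # Ask GPT-4o to explain how the matches relate to the request
    answer = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Given the request '{instruction}', rank and explain these products: {titles}",
        }],
    )
    st.write(answer.choices[0].message.content)
```

Run it with `streamlit run app.py` after populating the collection with product embeddings; the full walkthrough is in the tutorial linked below.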

Read tutorial 

Multimodal RAG Demo - Code 

🧠 Multimodal RAG Resources

Don’t know where to get started with Multimodal RAG? Check out these resources from basic to advanced! 

LEARN - What is Multimodal RAG? 

Multimodal RAG extends the RAG framework to incorporate multiple data types, such as text, images, audio, and video. Real-world applications, challenges, advantages, and more are highlighted below.

Learn more

STEP ONE - Multimodal Embeddings

What’s the first step to building a multimodal retrieval augmented generation (RAG) app? Getting multimodal vector embeddings. Use CLIP to create embeddings of the input data, Milvus to store those multimodal embeddings, and FiftyOne to explore them.

Let’s compare images of words that may have different meanings in different contexts.

🐴🚗 “Pony” is clearly a horse, “Ferrari” is clearly a car, but “Mustang” could be either.

Exploring Multimodal Embeddings with FiftyOne and Milvus
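Below is a minimal sketch of that first step, assuming FiftyOne’s model zoo copy of CLIP ViT-B/32, a local folder of images, and a Milvus server reachable at FiftyOne’s default URI; the paths and brain key are placeholders, not the post’s exact code.

```python
# Embed images with CLIP, index them in Milvus via FiftyOne Brain, then explore the neighbors.
import fiftyone as fo
import fiftyone.brain as fob

# Load a local folder of images into a FiftyOne dataset (path is illustrative)
dataset = fo.Dataset.from_dir("./images", dataset_type=fo.types.ImageDirectory)

# Compute CLIP ViT-B/32 embeddings and store them in a Milvus-backed similarity index
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",  # CLIP ViT-B/32 from the FiftyOne model zoo
    backend="milvus",               # requires pymilvus and a running Milvus server
    brain_key="clip_milvus",
)

# Text queries work too, since CLIP embeds text and images into the same space
view = dataset.sort_by_similarity("mustang", brain_key="clip_milvus", k=10)
session = fo.launch_app(view)  # inspect whether you got horses, cars, or both
```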

FOLLOW ALONG - Multimodal RAG Pipelines 

This resource focuses on how to use data to build a better multimodal RAG pipeline. It emphasizes using free and open-source tools, specifically leveraging FiftyOne for data management and visualization, Milvus as a vector store, and LlamaIndex for orchestrating large language models (LLMs).

Build Better Multimodal RAG Pipelines with FiftyOne, LlamaIndex, and Milvus

Watch Tutorial
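To make the orchestration concrete, here’s a minimal, hedged sketch of wiring Milvus into LlamaIndex; the document folder, Milvus Lite file, and embedding dimension are assumptions, and the default embedding model expects an OPENAI_API_KEY.

```python
# Index local documents into a Milvus-backed LlamaIndex vector store, then query them.
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

documents = SimpleDirectoryReader("./docs").load_data()  # folder name is a placeholder

vector_store = MilvusVectorStore(
    uri="./milvus_demo.db",  # Milvus Lite file; point at a server URI in production
    dim=1536,                # must match your embedding model's output dimension
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

query_engine = index.as_query_engine()
print(query_engine.query("What does this collection of documents cover?"))
```

FiftyOne slots in on the data side of this pipeline, curating and visualizing the images before and after indexing.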

FOLLOW ALONG: Multimodal RAG locally with CLIP and Llama3

The core idea behind CLIP (Contrastive Language-Image Pretraining) is to learn the connection between an image and the text that describes it. Learn how to generate embeddings with the CLIP ViT-B/32 model and use Llama3 as the LLM to build multimodal RAG.

Get Started

Learn more about CLIP
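As a minimal local sketch of that idea, the snippet below uses the CLIP ViT-B/32 checkpoint from sentence-transformers, Milvus Lite for retrieval, and Llama3 served through Ollama; the collection and field names are illustrative, and it assumes an earlier ingestion step already stored caption embeddings.

```python
# Local multimodal RAG sketch: CLIP for the query embedding, Milvus for retrieval, Llama3 for generation.
import ollama
from PIL import Image
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")   # embeds both text and images into one space
milvus = MilvusClient("./multimodal_rag.db")  # Milvus Lite; no server needed

# Embed the query image and retrieve the closest indexed captions (assumed collection/fields)
query_vec = clip.encode(Image.open("query.jpg"))
hits = milvus.search(
    collection_name="image_captions",
    data=[query_vec.tolist()],
    limit=3,
    output_fields=["caption"],
)[0]
context = "\n".join(hit["entity"]["caption"] for hit in hits)

# Ground Llama3 (pulled via `ollama pull llama3`) on the retrieved context
reply = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"Using only this context:\n{context}\n\nDescribe what the photo likely shows.",
    }],
)
print(reply["message"]["content"])
```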

GenAI Resource Hub

The new GenAI Resource Hub has tutorials, code examples, and best practices for developing and deploying GenAI applications.

🎓 Learn

Basic concepts developers need to understand to create RAG/GenAI applications

🛠️ Build

Practical resources to help you build sample RAG/GenAI applications

🌎 Explore

After building your GenAI/RAG demo, learn what it takes to deploy it into production effectively

Start Building

👥 Community Spotlights 

We see you 👀 building cool apps with Milvus! Need inspiration? Check out these fun projects 🔥:

🩺MediSaga by KF Surya: a Retrieval Augmented Generation (RAG) chatbot application designed to answer medical questions!

https://www.linkedin.com/feed/update/urn:li:activity:7221854199649607680/ 

📄PDF Content Analyzer by sarthakgarg07: a web application that extracts, processes, and analyzes content from PDF files

📚Textbook Question Answering System by Aryaman Tiwari: uses advanced natural language processing and machine learning techniques to answer questions based on textbook content

https://www.linkedin.com/feed/update/urn:li:activity:7221782938726588417/ 

🎓 Learn Milvus Live!

Join us for upcoming virtual and in-person events to learn Milvus live. 

Aug 8: Building an Agentic RAG locally with Milvus, Ollama, and Llama Agents (virtual)

With the recent release of Llama Agents, we can now build async-first agents that run as their own services. During this webinar, Stephen will show you how to build an agentic RAG system using Llama Agents and Milvus.

Save Your Spot

Aug 13: South Bay Unstructured Data Meetup (in-person) 

We’ll be back at SAP in Palo Alto for our meetup! Talks from TwelveLabs, Zilliz, and more coming soon.

Register

Aug 13: New York Unstructured Data Meetup (in-person)

We’ve got a stacked speaker lineup for the New York meetup! Join us for the following AI talks:

▶️ Quick intro to unstructured data, edge AI, and Milvus

▶️ Modern Analytics & Reporting with Milvus Vector DB and GenAI

▶️ cuVS+Milvus

▶️ Combining Hugging Face Transformer Models and Visual Data with FiftyOne

Register

👾 Discord

Join our Discord channel to engage with our engineers and community members.

Enjoying Milvus? 

⭐ Give us a Star on GitHub
