Mrunmayee Rane

Posted on Apr 26 • Edited on May 26 • Originally published at Medium

Personalized Food Recommendation RAG bot on WhatsApp

#rags #nvidia #llm #llmapplications

After moving from New York City to the West Coast, I found it difficult to decide what to eat for my meals, and finding good, healthy restaurants in California was a challenge. In New York City it was easy to pick a spot and a cuisine, because every street already had 20–25 good restaurants. I had covered a wide range of restaurants there, from Chintan Pandya’s Dhamaka to casual Thai at Up Thai, and everything was a quick walk or a few subway stops away. In California, I was in for a surprise.

Let’s face it: finding meal options that match personal preferences such as vegan, gluten-free, sugar-free, or pescatarian food, across a variety of cuisines, is like searching for a needle in a haystack. Having recognized this dilemma, and inspired by NVIDIA’s LLM Developer Day, I embarked on a mission to simplify this search.

Goal? To create a system that understands your craving and points you to the ideal meal.

The journey began with the Yelp academic dataset, a goldmine of user reviews and business information. We narrowed it to California, a hub of diverse and vibrant culinary culture, and sampled it down to 20k records for efficiency.

Leveraging Retrieval-Augmented Generation (RAG) for Personalized Recommendations

A key innovation in our system is the incorporation of Retrieval-Augmented Generation (RAG). RAG combines the strengths of both retrieval-based and generative AI models, enabling our system to provide highly accurate and personalized food recommendations. This approach works by first retrieving relevant information from our extensive dataset — in this case, the Yelp academic dataset — which includes a wide range of user reviews and business information. Then, using generative models, RAG synthesizes this information to produce coherent and context-specific recommendations. This method is particularly effective for catering to diverse dietary preferences and cuisines, as it can seamlessly integrate vast amounts of detailed data, including vegan, gluten-free, sugar-free, and pescatarian options. By leveraging RAG, we ensure that our recommendations are not just data-driven but also finely tuned to each user’s unique taste and preferences, truly embodying the essence of a personalized recommendation system.

We merged the business and user review datasets, creating a detailed hashmap of businesses.

This hashmap contained detailed information for each business, including the name, ID, address, city, state, postal code, user reviews, operational hours, and categories — a treasure of information for any foodie. Recognizing the complexity of handling multiple user reviews and ratings for a single business ID, we employed an aggregation method. This approach averaged user ratings and consolidated multiple reviews per business, ensuring a more streamlined dataset. Subsequently, we transformed the hashmap back into a dataframe, and eventually into a CSV file, to facilitate easier referencing and mapping.
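The aggregation step can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the original notebook code: the function name build_business_map, the field names (business_id, stars, text), and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def build_business_map(businesses, reviews):
    """Merge business records with their reviews, keyed by business ID."""
    # Group every review under the business it belongs to.
    grouped = defaultdict(list)
    for review in reviews:
        grouped[review["business_id"]].append(review)

    business_map = {}
    for biz in businesses:
        biz_reviews = grouped.get(biz["business_id"], [])
        entry = dict(biz)  # copy so the input record is not mutated
        # Average the star ratings; None when a business has no reviews.
        entry["avg_stars"] = (
            round(mean(r["stars"] for r in biz_reviews), 2) if biz_reviews else None
        )
        # Consolidate all review texts into a single field.
        entry["reviews"] = " | ".join(r["text"] for r in biz_reviews)
        business_map[biz["business_id"]] = entry
    return business_map


# Hypothetical sample rows, not taken from the Yelp dataset.
businesses = [
    {"business_id": "b1", "name": "Green Bowl", "city": "Santa Barbara",
     "categories": "Vegan, Salad"},
]
reviews = [
    {"business_id": "b1", "stars": 5, "text": "Great vegan bowls."},
    {"business_id": "b1", "stars": 4, "text": "Fresh and quick."},
]
business_map = build_business_map(businesses, reviews)
```

Each value in business_map is one flat record per business, which is the shape that converts cleanly into a dataframe row and then a CSV line.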

To load the entire CSV document before creating embeddings, we used the CSV loader from langchain.document_loaders.csv_loader. To manage the large volume of data, we divided the document into smaller chunks that the LLM can process efficiently, using LangChain’s RecursiveCharacterTextSplitter for generic text splitting.
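To give a feel for what chunking does, here is a simplified sketch of fixed-size splitting with overlap. It is not LangChain’s implementation: RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and sentences before falling back to characters. The function name split_text and the sizes used are assumptions for illustration.

```python
def split_text(text, chunk_size=200, chunk_overlap=20):
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny example so the overlap is easy to see.
chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

The overlap means the last character of one chunk is repeated at the start of the next, which helps keep context that would otherwise be cut at a chunk boundary.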

Text Embeddings:

The model path selects the pre-trained model used for embeddings, which is sentence-transformers/all-MiniLM-l6-v2.

We configure and initialize this sentence transformer model from Hugging Face to generate the embeddings. The model runs on the CPU and produces non-normalized embeddings. Normalization, when enabled, scales each embedding vector to unit length.
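As a small aside on what normalization changes, the sketch below uses plain Python lists in place of real embeddings. The helper names (l2_normalize, dot, cosine) and the toy vectors are assumptions for illustration; the point is only that once vectors have unit length, a dot product gives the same value as cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy for the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, which ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Two toy vectors pointing in the same direction but with different lengths.
a = [3.0, 4.0]
b = [6.0, 8.0]

raw_dot = dot(a, b)                                      # depends on length
unit_dot = dot(l2_normalize(a), l2_normalize(b))         # length removed
```

With non-normalized embeddings, as used here, the vector store’s distance metric has to account for length differences itself.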

Chroma is a vector store used for efficient similarity search and retrieval over large collections of data. It helps when you need to find the most similar items quickly among a large number of embeddings. from_documents is a method that creates a Chroma database from a set of documents and their embeddings. The embeddings object, initialized using HuggingFaceEmbeddings, converts text documents into vector embeddings. Chroma generates the embeddings for the docs with it and uses them to enable efficient similarity searches.

Retrieved Data:

The retriever is created from the previously initialized Chroma database (db).

as_retriever is a method that transforms the database into a retriever capable of performing search operations. search_type="mmr" specifies the search algorithm; "mmr" stands for Maximal Marginal Relevance. MMR balances relevance and diversity, ensuring that the retrieved documents are not just relevant but also varied. get_relevant_documents is a method that takes a query and returns a list of the documents most relevant to it. num_results=7 specifies the number of results to return, so the top 7 relevant documents are retrieved.
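To make the relevance versus diversity trade-off concrete, here is a minimal pure-Python sketch of MMR selection over toy 2-D vectors. It is an illustration under stated assumptions, not the retriever’s internal code: the function names, the lambda_mult weighting parameter, and the vectors are all chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of k documents, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

diverse = mmr(query, docs, k=2, lambda_mult=0.5)   # balances both terms
greedy = mmr(query, docs, k=2, lambda_mult=1.0)    # relevance only
```

In this toy case the balanced setting skips the near-duplicate and picks the different document second, while the relevance-only setting returns both near-duplicates.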

The embeddings are saved to a local persistent directory so that they can be easily reloaded when needed.

The Chroma database is then loaded from that directory and queried. similarity_search_with_score is a method that returns the documents most similar to the given query, based on their embeddings, together with a score for each. The results are then sorted by that score so the best matches rank first (with LangChain’s Chroma wrapper the score is a distance, so a lower score means a closer match).

Prompt Creation:

I inserted the details of the top 5 retrieved businesses into a detailed prompt using the prompt template in LangChain. The complete prompt is then sent to the Llama2-70B or SteerLM Llama 70B model through NVIDIA’s Cloud Function (NVCF) API, which generates the response.
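A prompt of this kind might be assembled along these lines. The sketch uses plain string formatting instead of LangChain’s prompt template, and the template wording, field names, and helper names (format_business, build_prompt) are assumptions rather than the prompt used in the project.

```python
# Hypothetical template text; the project's actual prompt is not shown in the post.
PROMPT_TEMPLATE = """You are a food recommendation assistant.
Use only the restaurants listed below to answer.

Restaurants:
{context}

User request: {question}
Recommendation:"""


def format_business(rank, biz):
    """Render one retrieved business as a single numbered line."""
    return (
        f"{rank}. {biz['name']} ({biz['city']}) | "
        f"categories: {biz['categories']} | avg rating: {biz['avg_stars']}"
    )


def build_prompt(question, businesses, top_n=5):
    """Fill the template with the top_n retrieved businesses and the user query."""
    context = "\n".join(
        format_business(i, b) for i, b in enumerate(businesses[:top_n], start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Six made-up retrieval results; only the first five should reach the prompt.
retrieved = [
    {"name": f"R{i}", "city": "Santa Barbara", "categories": "Vegan", "avg_stars": 4.5}
    for i in range(1, 7)
]
prompt = build_prompt("gluten-free dinner", retrieved)
```

The resulting string is what would be sent as the model input.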

Integration with WhatsApp through Twilio and Ngrok:

Twilio is a powerful platform for communications, enabling us to send and receive messages, make and receive phone calls, and more. In this project, we use Twilio to receive user queries via SMS and respond with food recommendations.

Setting Up a Twilio Account:

First, create a Twilio account and get a phone number that can send and receive SMS messages. Then obtain your Twilio Account SID, Auth Token, and phone number from the Twilio Console.

Install the Twilio Python helper library to handle messaging: pip install twilio

Configure Twilio to Forward Incoming Messages:

In the Twilio Console, configure your Twilio phone number to forward incoming messages to your FastAPI endpoint exposed by ngrok. This is typically done by setting the “Messaging” webhook URL to point to your /recommendation endpoint (e.g., https://your-domain.com/recommendation).

Processing Incoming Messages:

In the FastAPI app, we define an endpoint /recommendation that will handle incoming POST requests from Twilio. Twilio sends incoming messages to this endpoint. When a message is received, the content of the message is extracted and passed to the generate_answer function, which generates the food recommendation based on the user’s query. The response is then wrapped in a Twilio MessagingResponse object and sent back to the user.
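The request and response shapes involved can be sketched without any framework. The example below is not the FastAPI handler from the project: it parses a form-encoded webhook body the way Twilio posts it (with Body and From fields) and builds the TwiML reply by hand, which is what a MessagingResponse object renders to. The function handle_twilio_webhook and the stub generate_answer are placeholders for illustration.

```python
from urllib.parse import parse_qs
from xml.etree.ElementTree import Element, SubElement, tostring


def generate_answer(query):
    """Stand-in for the RAG pipeline; the real function is not shown in the post."""
    return f"Recommendation for: {query}"


def handle_twilio_webhook(raw_body):
    """Turn a form-encoded Twilio webhook body into a TwiML reply string."""
    form = parse_qs(raw_body)
    # Twilio puts the message text in the "Body" form field.
    incoming = form.get("Body", [""])[0].strip()
    if incoming:
        reply = generate_answer(incoming)
    else:
        reply = "Please tell me what you are craving."

    # TwiML reply: <Response><Message>...</Message></Response>
    response = Element("Response")
    SubElement(response, "Message").text = reply
    return tostring(response, encoding="unicode")


# Example body using a made-up sender number.
twiml = handle_twilio_webhook(
    "From=whatsapp%3A%2B15551234567&Body=vegan+tacos+near+me"
)
```

In the FastAPI version, the same reply would be returned from the /recommendation route with an XML content type.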

Setting Up Webhooks in Twilio:

To complete the integration, you need to set up a webhook in Twilio to point to your FastAPI endpoint exposed by ngrok. Here’s how you can do it:

Log in to your Twilio Console.
Navigate to the “Phone Numbers” section and select the number you want to use.
In the “Messaging” section, set the “A Message Comes In” webhook to your ngrok URL, e.g., https://your-domain.com/recommendation.

Why Use Ngrok?

Ngrok is a tool that creates a secure tunnel to your localhost, allowing you to expose a local server to the internet. When developing locally, your FastAPI application runs on localhost, which is not accessible from the internet. Twilio needs a publicly accessible URL to send webhook requests to your /recommendation endpoint. Ngrok provides this by tunneling requests from a public URL to your local development server.

Setting Up Ngrok:

Install the pyngrok wrapper for Ngrok with the command: pip install pyngrok

Sign Up and Configure Ngrok:

Sign up for a free account on the Ngrok website. After signing up, you will receive an authentication token, which you need in order to configure Ngrok. Use the following command to add it: ngrok authtoken YOUR_AUTH_TOKEN

First, run your FastAPI application on your local machine. Then open a new terminal and start Ngrok with: ngrok http 5000. Once Ngrok has started, it displays the forwarding link that we will use to configure the Twilio webhook.

Update Twilio Webhook:

Set the “A Message Comes In” webhook in the “Messaging” section to your ngrok public URL followed by the /recommendation endpoint.

The technologies behind these tastes were LangChain, Hugging Face, pandas, Chroma for vector storage, Streamlit for the user interface, and the SteerLM Llama 70B model through NVIDIA’s Cloud Function (NVCF).

What’s Next: Enhancing and Expanding:

My vision includes integrating user feedback mechanisms, map functionality, and personalized dietary preferences into the system. We also plan to evaluate our method on a larger dataset that is not limited to California, refining the approach for even better accuracy.

Happy to connect on LinkedIn!
