In this post, I’ll explain how I built a chatbot using the Llama2 model to query Excel data intelligently.
What We’re Building
- Loads an Excel file.
- Splits the data into manageable chunks.
- Stores the data in a vector database for fast retrieval.
- Uses a local Llama2 model to answer questions based on the content of the Excel file.
Prerequisites:
- Python (≥ 3.8)
- Libraries: langchain, langchain-community, pandas, unstructured, chromadb
- Ollama installed and running locally (both the chat model and the embeddings go through it)
Step 1: Install Dependencies
%pip install -q unstructured langchain langchain-community
%pip install -q "unstructured[all-docs]"
%pip install -q chromadb pandas openpyxl
Step 2: Load the Excel File
import pandas as pd
from pathlib import Path

excel_path = Path("Book2.xlsx")

if excel_path.exists():
    # Read the spreadsheet and flatten it into one plain-text string
    df = pd.read_excel(excel_path)
    data = df.to_string(index=False)
else:
    print("Upload an Excel file")
Step 3: Chunk the Data and Store in a Vector Database
Large text data is split into smaller, overlapping chunks for effective embedding and querying. These chunks are stored in a Chroma vector database.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Split the flattened table into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_text(data)

# Embed each chunk with a local Ollama embedding model and store it in Chroma
embedding_model = OllamaEmbeddings(model="nomic-embed-text", show_progress=False)
vector_db = Chroma.from_texts(
    texts=chunks,
    embedding=embedding_model,
    collection_name="local-rag",
)
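Optionally, you can verify the vector store works before wiring up the LLM. This is a minimal check, assuming the code above ran; the query string is just an example:

# How many chunks were embedded?
print(f"{len(chunks)} chunk(s) stored")

# Retrieve the most similar chunk for a test query
results = vector_db.similarity_search("refund", k=1)
print(results[0].page_content[:200])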
Step 4: Initialize the Llama2 Model
We use ChatOllama to load the Llama2 model locally.
from langchain_community.chat_models import ChatOllama

# Point LangChain at the locally served Llama2 model
local_model = "llama2"
llm = ChatOllama(model=local_model)
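A one-line smoke test confirms the model is reachable (it assumes the Ollama server is running and llama2 has been pulled):

# Quick smoke test -- invoke() returns an AIMessage whose .content holds the reply
print(llm.invoke("Reply with one word: ready?").content)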
Step 5: Create a Query Prompt
The MultiQueryRetriever we set up in the next step uses this prompt to rephrase the user's question into several alternative queries, which makes retrieval less sensitive to exact wording. We anchor the rephrasings in the Excel file's column names.
from langchain.prompts import PromptTemplate

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI assistant. Generate five different versions of the
user's question to retrieve relevant data from a table with the columns:
Id, order_id, name, sales, refund, and status. Provide the alternative
questions separated by newlines. Original question: {question}""",
)
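You can preview exactly what the retriever will send to the model by rendering the template yourself (the question here is just an example):

# Render the template with a sample question
print(QUERY_PROMPT.format(question="How many orders were refunded?"))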
Step 6: Set Up the Retriever
We configure a MultiQueryRetriever: it uses the Llama2 model and the prompt above to generate several variants of each question, then fetches the matching chunks from the vector database for the model to answer from.
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT,
)
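To see retrieval in isolation, you can invoke the retriever directly; it returns the matching Document chunks (the question is illustrative):

# Fetch the chunks the retriever considers relevant
docs = retriever.invoke("Which orders have a refund?")
print(f"Retrieved {len(docs)} chunk(s)")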
Step 7: Build the Response Chain
The response chain integrates:
- A retriever to fetch context.
- A prompt to format the question and context.
- The Llama2 model to generate answers.
- An output parser to format the response.
from langchain.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Retrieve context, fill the prompt, call the model, and parse the reply to a string
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
Step 8: Ask a Question
Now we’re ready to ask a question! Here’s how we invoke the chain to get a response:
raw_result = chain.invoke("How many rows are there?")
final_result = f"{raw_result}\n\nIf you have more questions, feel free to ask!"
print(final_result)
Sample Output
When I ran the above code on a sample Excel file, here’s what I got:
Based on the provided context, there are 10 rows in the table.
If you have more questions, feel free to ask!
Conclusion:
This approach leverages the power of embeddings and the Llama2 model to create a smart, interactive chatbot for Excel data. With some tweaks, you can extend this to work with other types of documents or integrate it into a full-fledged app!
Check out a working example with a UI on my LinkedIn:
Introducing BChat Excel: A Conversational AI-Powered Tool for Excel File Interactions