Austin Cunningham
Build a Local AI RAG App with Ollama and Python

I was looking to do a little development around AI and decided to see what I could build on my PC.

I tried Ollama for a local development environment. Ollama is an open-source tool that makes it easy to download, run, and manage large language models (LLMs) on your own computer. Install it with the official script:

curl -fsSL https://ollama.com/install.sh | sh

Once downloaded and installed, you can run it as follows:

~ ollama -v         
ollama version is 0.6.5
~ ollama run mistral
>>> tell me about ollama run mistral
 "Olama Run Mistral is a high-performance, open-source machine learning platform developed by OLMA.AI, a company based in Paris, France. The platform is designed to simplify the development and deployment of large-scale machine 
learning models with a focus on deep learning.
Mistral stands out for its scalability, flexibility, and ease of use. It's built around a distributed architecture that allows users to train and deploy large-scale machine learning models efficiently across multiple GPUs and 
clusters, reducing the time required for training and inference significantly.
Some key features of Olama" # ====> you get the idea a chatbot in your terminal

There are a bunch of models you can run (https://ollama.com/search); we will stick to mistral for the time being. We need to keep it running to use it.

So what are we going to try to do? We are going to create a chat app that can query my local files directory and return answers based on the contents.

First create requirements.txt and add the following: basically some libraries for working with text, embeddings, and Ollama.

langchain
langchain-community
langchain-ollama
langchain-huggingface
chromadb
sentence-transformers

Set up a virtual environment and install the requirements (the compiler and Python headers are needed to build some of the dependencies)

sudo dnf install gcc-c++ python3-devel
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

I create two files, main.py and utils/loaders.py.

utils/loaders.py

from langchain_community.document_loaders import TextLoader

import os


def load_sop_files(directory: str):
    """Walk a directory and load .md, .asciidoc and .txt files as LangChain documents."""
    allowed_exts = ('.md', '.asciidoc', '.txt')

    docs = []
    for root, _, files in os.walk(directory):
        for file in files:
            if file.lower().endswith(allowed_exts):
                path = os.path.join(root, file)
                try:
                    loader = TextLoader(path, encoding='utf-8')
                    docs.extend(loader.load())
                except Exception as e:
                    print(f"❌ Error loading {path}: {e}")
    return docs

The utils/loaders.py function takes in a directory path, walks through all the files, picks out the .md, .asciidoc and .txt ones, and loads each as a UTF-8 LangChain document, returning them as a list.
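
If you want to sanity check the loader on its own, here is a minimal sketch (the ../help/sops/ path is just the same directory I point main.py at below):

from utils.loaders import load_sop_files

# load the docs and confirm something actually came back
docs = load_sop_files("../help/sops/")
print(f"Loaded {len(docs)} documents")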

main.py

The first thing I do in main.py is import the libraries and then call the function from utils/loaders.py to load my docs into the application.

from utils.loaders import load_sop_files
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_ollama import OllamaLLM
from langchain_huggingface import HuggingFaceEmbeddings


# Load and prepare documents
print("πŸ“‚ Loading SOP documents...")
# pointing to local directory that is the same level as this project
docs = load_sop_files("../help/sops/")

LLMs need the data split into smaller chunks so that the relevant pieces can be retrieved and fit within the model's context window.

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(docs)
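Optionally, a quick check of how the split went (my addition, reusing the docs and chunks variables from above):

print(f"Split {len(docs)} documents into {len(chunks)} chunks")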

We then need to convert the document chunks into embeddings, a numeric format that can be stored in a vector database and searched by similarity.


print("🧠 Creating vector database...")
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embeddings)
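This builds the vector database in memory on every run. If you want to skip re-embedding each time, Chroma can also persist to disk; a minimal sketch, where chroma_db is my choice of directory rather than anything from the original code:

# persist the embeddings so later runs can skip the embedding step
db = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# on later runs, reload instead of rebuilding:
# db = Chroma(persist_directory="chroma_db", embedding_function=embeddings)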

We set up the Ollama mistral model to take the retrieved vector db data and use it to formulate its answers. We are not training a model here but using the RAG (Retrieval-Augmented Generation) pattern.

retriever = db.as_retriever()
llm = OllamaLLM(model="mistral")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=True)
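As an optional tweak (not in the original code), the retriever can be told how many chunks to pull back per question via search_kwargs:

# only pass the 3 most similar chunks to the LLM
retriever = db.as_retriever(search_kwargs={"k": 3})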

NOTE: We are using OllamaLLM and RetrievalQA to connect to our local LLM, but we could use the Ollama API here with the /api/generate endpoint; this would be more verbose.
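
For comparison, here is a rough sketch of calling that endpoint directly with requests, assuming Ollama is listening on its default port 11434 (the prompt is just a made-up example; you would have to stitch the retrieved chunks into it yourself):

import requests

# ask the local Ollama server for a single, non-streamed completion
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "tell me about ollama", "stream": False},
)
print(response.json()["response"])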

We then have a loop to answer queries, using the qa chain we created earlier via its invoke function.


print("πŸ€– SOP Assistant ready. Type your question below. Type 'exit' to quit.")



while True:
   query = input("\nπŸ“ You: ")
   if query.lower() in ("exit", "quit"):
       print("πŸ‘‹ Bye! Take care.")
       break


   result = qa.invoke({"query": query})


   print("\nπŸ€– Assistant:\n", result["result"])


   print("\nπŸ“Ž Sources:")
   for doc in result["source_documents"]:
       print(f" - {doc.metadata.get('source')}")

Repo https://github.com/austincunningham/sop_assistant

It returns answers based on the ../help/sops directory we passed in earlier.

Credit to https://github.com/valerymo for a lot of the investigation on getting this operational.
