Retrieval-Augmented Generation (RAG) is the backbone of modern AI applications. But most tutorials send your data to the cloud (Pinecone, OpenAI) or require heavy setups.
I wanted something private, free, and fast.
So I built LocalVaultQA - a 100% local RAG system using Cosdata (a new high-performance vector DB) and Ollama. Here is how I did it, and why this stack is a game-changer.
The Stack
- Ollama: The standard for running local LLMs. I used gemma3:12b for generation and nomic-embed-text for embeddings.
- Cosdata: An open-source, Rust-based vector database. It's incredibly lightweight and supports hybrid search out of the box.
- Streamlit: For a clean, interactive UI.
Key Challenge: Authentication & Search
Cosdata is secure by default, which can be a pain for local dev. It requires an Admin Key for every request.
The Fix: I wrote a custom start-cosdata.sh script that runs inside the Docker container to auto-configure the environment.
```yaml
# docker-compose.yml snippet
command: /bin/sh -c "cd /opt/cosdata && /root/cosdata-v0.1.2-beta/bin/cosdata --admin-key my_secure_password"
```
This bypasses the interactive prompt, making the database spin up effectively "headless" and ready for connection immediately.
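The wrapper script itself is just a thin shell around that same command. A minimal sketch (the COSDATA_ADMIN_KEY environment variable name is my assumption, not something from the actual repo):

```shell
#!/bin/sh
# start-cosdata.sh -- sketch of a headless startup wrapper.
# COSDATA_ADMIN_KEY is a hypothetical env var; falls back to a hardcoded key.
cd /opt/cosdata || exit 1
exec /root/cosdata-v0.1.2-beta/bin/cosdata \
  --admin-key "${COSDATA_ADMIN_KEY:-my_secure_password}"
```

Using exec means the database replaces the shell as PID 1 in the container, so Docker signals (like docker stop) reach it directly.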
The Secret Sauce: Client-Side Hybrid Search
Semantic search (dense vectors) is great for concepts but fails on specific keywords like part numbers or acronyms. Keyword search (BM25) is the opposite: it nails exact terms but misses paraphrases.
Hybrid Search combines them.
Since the Cosdata Python SDK is still in beta, I implemented a robust Reciprocal Rank Fusion (RRF) algorithm directly in the client:
```python
def query(self, question, top_k=5, mode="hybrid"):
    # 1. Get dense (semantic) results via Ollama embeddings
    embedding = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    dense_res = self.collection.search.dense(query_vector=embedding, top_k=top_k * 2)

    # 2. Get keyword (BM25) text results
    text_res = self.collection.search.text(query_text=question, top_k=top_k * 2)

    # 3. Fuse with Reciprocal Rank Fusion (k=60)
    scores = {}
    for rank, hit in enumerate(dense_res):
        scores[hit["text"]] = 1.0 / (60 + rank)
    for rank, hit in enumerate(text_res):
        scores[hit["text"]] = scores.get(hit["text"], 0) + 1.0 / (60 + rank)

    # 4. Sort by fused score and return the top_k chunks
    sorted_results = sorted(scores, key=scores.get, reverse=True)
    return sorted_results[:top_k]
```
This ensures we get the best of both worlds, capturing "meaning" AND "precision".
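To see why the fusion works, the RRF step can be exercised on its own. Here is a self-contained sketch (the rrf_fuse helper and the toy chunk texts are mine, not from the repo) showing how a chunk that appears in both result lists outranks the top hit of either single list:

```python
def rrf_fuse(dense_hits, text_hits, top_k=5, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank); scores sum across lists."""
    scores = {}
    for rank, text in enumerate(dense_hits):
        scores[text] = scores.get(text, 0.0) + 1.0 / (k + rank)
    for rank, text in enumerate(text_hits):
        scores[text] = scores.get(text, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical results: "warranty terms" is only #2 in the dense list,
# but it also shows up in the keyword list, so RRF lifts it to the top.
dense = ["refund policy", "warranty terms", "shipping times"]
text = ["warranty terms", "part no. X-200"]
print(rrf_fuse(dense, text, top_k=3))
# → ['warranty terms', 'refund policy', 'part no. X-200']
```

The constant k=60 is the standard damping value from the original RRF paper; it keeps a single #1 ranking from dominating chunks that rank moderately well in both lists.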
Why Use This?
- Privacy: Your financial docs, legal papers, or personal journals never leave your laptop.
- Speed: Cosdata is written in Rust. Indexing is near-instant.
- Cost: $0. No tokens, no monthly fees.
Try It Yourself
Clone the repo, run docker-compose up -d, and you have a production-grade RAG pipeline running on your Mac.
Check out the code on GitHub! 👇
https://github.com/harishkotra/LocalVaultQA