You do not need Pinecone, OpenAI embeddings, or a $200/month vector database to build a solid RAG system.
The Free Stack
| Component | Free Option |
|---|---|
| Embedding Model | BAAI/bge-large-zh-v1.5 |
| Vector Database | FAISS / Chroma |
| LLM | DeepSeek / Ollama |
| Reranker | BAAI/bge-reranker-base |
| Keyword Search (BM25) | rank-bm25 (pip install rank-bm25) |
Zero API costs. Zero monthly fees.
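The table lists rank-bm25 for keyword search, so it helps to see what BM25 actually computes. The library's `BM25Okapi` class does this for you; what follows is a plain-Python sketch of the scoring idea, not the library's exact implementation. The `k1` and `b` defaults and the smoothed IDF are assumptions picked for illustration, and `bm25_scores` is a hypothetical helper name.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in corpus_tokens against query_tokens."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalised through this term.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

corpus = [
    "faiss is a library for vector search".split(),
    "ollama runs language models locally".split(),
    "a reranker rescores retrieved passages".split(),
]
scores = bm25_scores("vector search".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)  # index of the top document
```

Whitespace splitting only works for space-delimited text. Chinese documents, which the embedding model in this stack targets, need a word segmenter such as jieba before BM25 scoring.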
Embedding Without OpenAI
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
embeddings = model.encode(["Your text"])
Free, MIT-licensed, and reported by its authors to outperform OpenAI's text-embedding-ada-002 on the C-MTEB Chinese benchmark.
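Embeddings are only useful once you compare them, and the usual comparison is cosine similarity. Here is a minimal sketch with toy 3-dimensional vectors standing in for the real 1024-dimensional ones; `normalize` and `cosine` are hypothetical helper names. With real embeddings you would typically let the library do this, for example by passing `normalize_embeddings=True` to `encode`.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity of two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Toy stand-ins for real embeddings.
query = [0.9, 0.1, 0.0]
docs = {"close": [0.8, 0.2, 0.1], "far": [0.0, 0.1, 0.9]}

# Rank document names by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
```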
Vector Search Without Pinecone
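import faiss
import numpy as np

index = faiss.IndexHNSWFlat(1024, 32)  # 1024 = bge-large dimension, 32 = HNSW links per node
index.add(embeddings.astype(np.float32))  # FAISS expects float32
query_embedding = model.encode(["Your query"]).astype(np.float32)
D, I = index.search(query_embedding, k=10)  # D: distances, I: ids of the nearest vectors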
FAISS for speed. Chroma for ease. Both free.
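HNSW is an approximate index, so it can miss true neighbours. A cheap sanity check is to compare its results against an exact brute-force search on a small sample. The sketch below shows the idea in plain Python with 2-dimensional toy vectors; `exact_top_k` and `recall_at_k` are hypothetical helpers, and the "approximate" ids are made up for the example. In FAISS itself, `faiss.IndexFlatL2` plays the role of the exact baseline.

```python
def exact_top_k(query, vectors, k):
    """Return indices of the k vectors with the smallest squared L2 distance."""
    def sq_dist(vec):
        return sum((q - x) ** 2 for q, x in zip(query, vec))
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbours that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
truth = exact_top_k([0.9, 0.1], vectors, k=2)

# Pretend an approximate index returned ids 1 and 3 for the same query.
recall = recall_at_k([1, 3], truth)
```

If recall on your own data comes out low, raising the HNSW search effort or the number of links per node is the usual first thing to try.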
LLM Without API Keys
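Option A: Ollama running locally (ollama pull qwen2.5:7b)
Option B: DeepSeek API free tier (500 requests/day as quoted at the time of writing; this option does require an API key, and quotas change, so verify the current limit)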
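Whichever option you pick, the retrieved chunks still have to reach the model. Here is a rough sketch for Option A: `build_prompt` and `ask_ollama` are hypothetical helper names, the prompt wording is only one possible choice, and the URL assumes Ollama is serving on its default local port.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama's default local port

def build_prompt(question, chunks):
    """Number the retrieved chunks and ask the model to answer only from them."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def ask_ollama(prompt, model="qwen2.5:7b"):
    """Send one non-streaming generation request to a locally running Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

prompt = build_prompt(
    "What is FAISS?",
    ["FAISS is a vector search library.", "Chroma is a vector store."],
)
# answer = ask_ollama(prompt)  # uncomment once `ollama serve` is running
```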
Reranking for Free
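from sentence_transformers import CrossEncoder
reranker = CrossEncoder("BAAI/bge-reranker-base")
query = "Your question"
candidates = ["First retrieved passage", "Second retrieved passage"]  # output of the vector search
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs)  # one relevance score per pair; higher is better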
One step that often improves precision more than weeks of prompt tuning.
The Full Pipeline
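import faiss
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer

class FreeRAG:
    def __init__(self):
        self.embedder = SentenceTransformer("BAAI/bge-large-zh-v1.5")
        self.index = faiss.IndexHNSWFlat(1024, 32)
        self.reranker = CrossEncoder("BAAI/bge-reranker-base")
        self.docs = []  # raw texts; position i matches FAISS id i

    def add(self, texts):
        self.index.add(self.embedder.encode(texts).astype(np.float32))
        self.docs.extend(texts)

    def search(self, query, k=10):
        q_emb = self.embedder.encode([query]).astype(np.float32)
        _, ids = self.index.search(q_emb, 100)  # stage 1: wide vector recall
        cands = [self.docs[i] for i in ids[0][:30] if i != -1]  # FAISS pads misses with -1
        scores = self.reranker.predict([[query, doc] for doc in cands])  # stage 2: rerank
        order = sorted(range(len(cands)), key=lambda j: scores[j], reverse=True)
        return [cands[j] for j in order[:k]]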
Vector search + cross-encoder rerank. Same pipeline that costs $200/month on managed services.
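The stack table also lists BM25, which this pipeline does not use. If you add it as a second retriever, you need a way to merge the two ranked lists before reranking. One common option is reciprocal rank fusion (RRF); this is a technique I'm bringing in for illustration, not something the pipeline above does. A minimal sketch, where the doc ids are invented and `k=60` is a conventional default rather than a tuned value:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one list, best first."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

vector_hits = [3, 1, 7]  # ids from the vector index, best first
bm25_hits = [1, 9, 3]    # ids from keyword search, best first
merged = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF only looks at ranks, so you never have to reconcile BM25 scores with vector distances, which live on different scales.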
When to Graduate to Paid
| Trigger | Move To |
|---|---|
| More than 1M vectors | Milvus (self-hosted, or a managed Milvus service) |
| Need GPU inference | Text Embeddings Inference (TEI) on a GPU host |
| Team access | Weaviate Cloud |
For the first 90% of your journey, free tools are enough.