
马国锦


The $0 RAG Stack: Build a Production Retrieval System Without Paying a Cent

You do not need Pinecone, OpenAI embeddings, or a $200/month vector database to build a solid RAG system.


The Free Stack

| Component | Free Option |
| --- | --- |
| Embedding Model | BAAI/bge-large-zh-v1.5 |
| Vector Database | FAISS / Chroma |
| LLM | DeepSeek / Ollama |
| Reranker | BAAI/bge-reranker-base |
| BM25 | rank-bm25 (pip) |

Zero API costs. Zero monthly fees.
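
BM25 is the keyword-search piece of the stack. As a rough illustration of what a BM25 scorer computes, here is a minimal pure-Python sketch of one common Okapi BM25 variant. The function name `bm25_scores`, the parameter defaults, and the toy corpus are assumptions made for this example; the rank-bm25 package may differ in details such as IDF smoothing.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in corpus_tokens against query_tokens with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it positive for common terms
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalized by document length via b
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus, tokenized by whitespace (Chinese text would need a word segmenter)
corpus = [
    "faiss is a vector search library".split(),
    "bm25 is a keyword ranking function".split(),
    "ollama runs llm models locally".split(),
]
scores = bm25_scores("keyword ranking".split(), corpus)
best = max(range(len(corpus)), key=lambda i: scores[i])
print(best, scores)
```

Only the second document contains the query terms, so it is the only one with a non-zero score. Keyword scores like these are typically used alongside vector search to catch exact-term matches that embeddings can miss.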


Embedding Without OpenAI

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-large-zh-v1.5")  # weights are downloaded on first use
embeddings = model.encode(["Your text"])  # NumPy array of shape (1, 1024)

The model is free and MIT-licensed, and it is reported to outperform OpenAI's text-embedding-ada-002 on Chinese retrieval benchmarks.
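
Embeddings are only useful once you compare them. The sketch below shows cosine similarity, a common way to compare two embedding vectors, using hypothetical 3-dimensional vectors in place of real model output. The helper names `normalize` and `cosine_similarity` are made up for this example.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity is the dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Hypothetical 3-dimensional "embeddings"; real bge-large-zh-v1.5 vectors have 1024 dims
query_vec = [0.9, 0.1, 0.0]
doc_close = [0.8, 0.2, 0.1]
doc_far = [0.0, 0.1, 0.9]

print(cosine_similarity(query_vec, doc_close))  # close to 1: similar direction
print(cosine_similarity(query_vec, doc_far))    # close to 0: nearly orthogonal
```

With sentence-transformers you can skip the manual step by passing `normalize_embeddings=True` to `model.encode`, after which a plain dot product gives the cosine similarity.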


Vector Search Without Pinecone

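import faiss
import numpy as np

# 1024 = embedding dimension of bge-large-zh-v1.5; 32 = HNSW links per node (M)
index = faiss.IndexHNSWFlat(1024, 32)
index.add(embeddings.astype(np.float32))
# Encode the query with the same model; FAISS expects a 2-D float32 array
query_embedding = model.encode(["Your query"]).astype(np.float32)
D, I = index.search(query_embedding, k=10)  # D: distances, I: row ids of the nearest vectors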

FAISS for speed. Chroma for ease. Both free.
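
To make clear what the index is approximating, here is a brute-force sketch of exact nearest-neighbor search by squared L2 distance, returning distances and indices in the same order as FAISS's `search`. The function `exact_search` and the 2-dimensional vectors are assumptions for illustration only; an HNSW index trades a little accuracy for not having to scan every vector like this.

```python
def exact_search(vectors, query, k):
    """Return (distances, indices) of the k vectors closest to query by squared L2."""
    dists = [
        (sum((x - q) ** 2 for x, q in zip(vec, query)), idx)
        for idx, vec in enumerate(vectors)
    ]
    dists.sort()  # smallest distance first; ties are broken by index
    top = dists[:k]
    return [d for d, _ in top], [i for _, i in top]

# Hypothetical 2-dimensional vectors standing in for real 1024-dim embeddings
vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
D, I = exact_search(vectors, query=[1.0, 1.0], k=2)
print(I, D)
```

A linear scan like this is a handy correctness baseline: on a small sample, its results can be compared with the index's results to estimate recall.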


LLM Without API Keys

Option A: run a model locally with Ollama (ollama pull qwen2.5:7b)
Option B: use the DeepSeek API free tier (500 requests/day at the time of writing; check the current limits)
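
For Option A, a retrieval pipeline needs a way to hand retrieved passages to the local model. The sketch below builds a request for Ollama's local `/api/generate` endpoint using only the standard library. It assumes Ollama is listening on its default port 11434; the prompt template and the helper names (`build_prompt`, `build_request`, `generate`) are hypothetical and should be adapted to your use case.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint

def build_prompt(query, contexts):
    """Join retrieved passages and the question into one prompt (hypothetical template)."""
    numbered = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(contexts))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )

def build_request(prompt, model="qwen2.5:7b"):
    """Create the POST request for /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(query, contexts):
    """Send the request; needs a running Ollama server with the model already pulled."""
    request = build_request(build_prompt(query, contexts))
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["response"]

# Building the request does not contact the server; only generate() does
prompt = build_prompt("What is FAISS?", ["FAISS is a vector search library."])
request = build_request(prompt)
```

Calling `generate(query, contexts)` with the passages returned by your search step yields the model's answer as a string.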


Reranking for Free

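from sentence_transformers import CrossEncoder
reranker = CrossEncoder("BAAI/bge-reranker-base")
# `query` is the user question; `candidates` are the texts returned by vector search
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs)  # one relevance score per pair; higher means more relevant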

One step that often improves precision more than weeks of prompt tuning.
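
The reranker only produces scores; the candidates still have to be reordered. A minimal sketch of that last step follows, with made-up scores standing in for the output of `reranker.predict`. The helper name `rerank` is an assumption for this example.

```python
def rerank(candidates, scores, top_k=3):
    """Return the top_k candidates ordered by descending reranker score."""
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

# Hypothetical scores; in practice they come from reranker.predict(pairs)
candidates = ["doc about FAISS", "doc about cooking", "doc about HNSW indexes"]
scores = [0.91, 0.02, 0.77]
print(rerank(candidates, scores, top_k=2))
```

Here the two highest-scoring documents are kept and the off-topic one is dropped, which is exactly the filtering effect that makes reranking worthwhile.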


The Full Pipeline

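import faiss
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer

class FreeRAG:
    def __init__(self):
        self.embedder = SentenceTransformer("BAAI/bge-large-zh-v1.5")
        self.index = faiss.IndexHNSWFlat(1024, 32)
        self.reranker = CrossEncoder("BAAI/bge-reranker-base")
        self.docs = []  # texts; position i matches FAISS row id i

    def add(self, docs):
        # Embed and index documents, keeping the texts for lookup at search time
        self.docs.extend(docs)
        self.index.add(self.embedder.encode(docs).astype(np.float32))

    def search(self, query, k=10):
        q_emb = self.embedder.encode([query]).astype(np.float32)
        _, ids = self.index.search(q_emb, 100)
        # Rerank the top 30 hits; FAISS pads missing results with -1
        cand = [int(i) for i in ids[0][:30] if i != -1]
        if not cand:
            return []
        scores = self.reranker.predict([[query, self.docs[i]] for i in cand])
        order = sorted(range(len(cand)), key=lambda j: scores[j], reverse=True)
        return [self.docs[cand[j]] for j in order[:k]]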

Vector search + cross-encoder rerank. Same pipeline that costs $200/month on managed services.


When to Graduate to Paid

| Trigger | Move To |
| --- | --- |
| More than 1M vectors | Milvus |
| Need GPU inference | TEI (Hugging Face Text Embeddings Inference) |
| Team access | Weaviate Cloud |
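
To see why vector count is a natural trigger, it helps to estimate how much RAM an in-memory index needs. The sketch below uses a rough rule of thumb for `IndexHNSWFlat`: 4 bytes per float32 component plus about `M * 2 * 4` bytes of graph links per vector. Treat the formula as an approximation; the function name `hnsw_flat_bytes` is made up for this example.

```python
def hnsw_flat_bytes(n_vectors, dim=1024, m=32):
    """Rough memory estimate for faiss.IndexHNSWFlat, in bytes.

    Assumes 4 bytes per float32 component plus about m * 2 * 4 bytes of
    graph links per vector; upper graph levels and other overhead are ignored.
    """
    return n_vectors * (dim * 4 + m * 2 * 4)

for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} vectors ~ {hnsw_flat_bytes(n) / 1024**3:.1f} GiB")
```

Under these assumptions, one million 1024-dimensional vectors need roughly 4 GiB, which still fits on a laptop, while ten million need about ten times that, the point where a dedicated vector database starts to pay off.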

For the first 90% of your journey, free tools are enough.


