⚡ From NumPy to FAISS: Making ChatPDF Fast & Scalable – Part 2
In Part 1, we made it work.
In Part 2, we make it usable 🚀
📌 Recap from Part 1
In Part 1, we built a ChatPDF app using:
- PDF → Text → Chunks
- Embeddings using Ollama
- Similarity search using NumPy
- LLM to generate answers
It worked well for small PDFs and helped us understand RAG from first principles.
But once I started testing with slightly larger PDFs…
😅 The Problem Started Showing Up
The issue was not correctness — it was performance.
Let’s revisit what we were doing during search.
❌ NumPy Search (What we had before)
similarities = np.dot(vector_db, query_vector)
top_indices = np.argsort(similarities)[-TOP_K:][::-1]
🧠 What’s actually happening here?
Every time you ask a question:
- Compute similarity with every chunk
- Store all similarity scores
- Sort the entire list
- Pick top K
🚨 Why this becomes a problem
- Time complexity → O(n·d) to score every chunk, plus O(n log n) to sort
- More chunks = slower search
- Entire dataset scanned every time
To make this visible, I added timing:
start_time = time.perf_counter()
# similarity logic
end_time = time.perf_counter()
execution_time = end_time - start_time
print(f"Total time with numpy: {execution_time:.4f}s")
And as the number of chunks increased…
⏳ the delay became noticeable.
💡 So What’s the Solution?
Instead of:
“Search through everything every time”
We need:
“A system that knows where to look”
🔍 Let’s Visualize the Problem (This is the key moment)
👉 This is where the real difference becomes obvious:
💡 What this diagram shows:
- NumPy → hand-rolled scan and sort over every single chunk
- FAISS → an optimized index returns the most relevant results directly
This is the exact shift from:
brute force → intelligent retrieval
🚀 Introducing FAISS
FAISS (Facebook AI Similarity Search) is built for:
- Fast vector similarity search
- Efficient indexing
- Handling large datasets
The key idea:
👉 Build an index once → search efficiently many times
🔄 Step 1: Moving from Raw Vectors → FAISS Index
❌ Before (NumPy mindset)
We stored vectors like this:
vector_db = np.array(vectors, dtype=np.float32)
That’s it.
No structure. No optimization. Just raw data.
✅ After (FAISS approach)
vector_np = np.array(vectors).astype('float32')
dimension = vector_np.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(vector_np)
🧠 Let’s understand this properly
1️⃣ Converting to float32
vector_np = np.array(vectors).astype('float32')
FAISS requires vectors in float32.
Even if your embeddings are already floats, doing this ensures:
- Compatibility
- No runtime surprises
2️⃣ Getting the dimension
dimension = vector_np.shape[1]
Each embedding looks like:
[0.12, -0.45, 0.88, ...]
The number of elements = dimension
FAISS needs this to build the index correctly.
3️⃣ Creating the index
index = faiss.IndexFlatIP(dimension)
- IndexFlatIP → Inner Product search
- Since our embeddings are normalized → 👉 Inner Product ≈ Cosine Similarity
- "Flat" means this index is still exact — it compares against every vector — but it does so in heavily optimized C++; approximate indexes like IndexIVFFlat or HNSW can skip most of the scan for much larger datasets
So we are essentially saying:
“Store these vectors and allow fast similarity-based search.”
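A quick NumPy check of that claim — for unit-length vectors, the inner product and cosine similarity are the same number (synthetic vectors here, not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)

# Normalize both vectors to unit length
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

inner_product = np.dot(a, b)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.isclose(inner_product, cosine))  # True for unit vectors
```

This is why IndexFlatIP is enough — no separate cosine index is needed once the vectors are normalized.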
4️⃣ Adding vectors to FAISS
index.add(vector_np)
This step:
- Loads all embeddings into FAISS
- Builds the internal structure
👉 From here, we stop thinking in terms of arrays and start thinking in terms of an index
🎯 Big Concept Shift
| NumPy | FAISS |
|---|---|
| Raw vectors | Indexed vectors |
| Manual search | Optimized search |
| Full scan | Smart retrieval |
🔍 Step 2: Searching with FAISS
❌ Before (NumPy)
similarities = np.dot(vector_db, query_vector)
top_indices = np.argsort(similarities)[-TOP_K:][::-1]
✅ After (FAISS)
distances, indices = index.search(query_vector.reshape(1, -1), k=TOP_K)
🧠 Let’s break this down
1️⃣ Why reshape?
query_vector.reshape(1, -1)
FAISS expects:
[number_of_queries, dimension]
Even a single query must be shaped like:
[[embedding]]
2️⃣ What does search() do?
distances, indices = index.search(...)
FAISS:
- Finds nearest vectors
- Sorts internally
- Returns top K
3️⃣ Mapping results back
[text_metadata[i] for i in indices[0]]
We use indices to fetch:
- Actual text chunks
- Page numbers
💡 Why this is powerful
Instead of:
- Writing similarity logic ❌
- Writing sorting logic ❌
You now:
👉 Call one optimized function
💾 Step 3: Avoid Recomputing Everything
🚨 Problem in Part 1
Every run:
- Read PDF
- Chunk text
- Generate embeddings
- Build vectors
✅ Solution: Save the Index
faiss.write_index(index, "db/index.faiss")
with open("db/metadata.pkl", "wb") as f:
    pickle.dump(data, f)
🧠 What are we saving?
- FAISS index → vector structure
- Metadata → chunk + page info
- PDF hash → detect changes
🔁 Loading instead of recomputing
index = faiss.read_index("db/index.faiss")
Now:
- ⚡ Faster startup
- ❌ No repeated embedding calls
🔐 Step 4: Detecting PDF Changes
def calculate_pdf_hash(pdf_path):
    sha256_hash = hashlib.sha256()
    with open(pdf_path, "rb") as f:
        sha256_hash.update(f.read())
    return sha256_hash.hexdigest()
🧠 Why this matters
If the PDF changes:
- Old embeddings become invalid
So we:
- Generate hash
- Compare with stored hash
- Rebuild only if needed
👉 Small addition, big impact.
🔥 Step 5: Improving Retrieval with Re-ranking
Even FAISS isn’t perfect.
So we add another layer:
results = ranker.rerank(rerank_request)
🧠 What’s happening here?
- FAISS retrieves top 10 chunks
- Re-ranker evaluates relevance
- Returns best TOP_K
📊 Debug visibility
print("--- Re-ranker Scores ---")
Helps you:
- Understand ranking
- Debug results
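To show the retrieve-then-rerank flow without pulling in a model, here is the same shape with a trivial keyword-overlap scorer standing in for the real re-ranker (which presumably uses a cross-encoder and scores far better):

```python
def rerank(query, candidates, top_k):
    """Stand-in re-ranker: score FAISS candidates by word overlap with the query."""
    query_words = set(query.lower().split())

    def score(chunk):
        return len(query_words & set(chunk["text"].lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_k]

# FAISS returned a wide candidate set; keep the best TOP_K after re-scoring
candidates = [
    {"text": "FAISS builds an index for fast vector search"},
    {"text": "Chunking splits the PDF into passages"},
    {"text": "The index makes similarity search fast"},
]
results = rerank("fast index search", candidates, top_k=2)
print([c["text"] for c in results])
```

The pattern is the same either way: retrieve broadly with FAISS, then let a sharper (slower) scorer pick the final TOP_K.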
💬 Step 6: Streaming Responses (UX Upgrade)
for chunk in generate_answer(user_query, context_llm):
print(chunk['response'], end='', flush=True)
🧠 Why this matters
- Feels real-time
- Improves perceived speed
- Better experience
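The pattern behind it: a generator yields pieces of the answer as they arrive, and the caller prints each one immediately. The chunks here are simulated; the real `generate_answer` presumably streams from Ollama and yields the same `{'response': ...}` shape.

```python
def generate_answer(query, context):
    """Simulated streaming LLM: yields the answer piece by piece."""
    for word in f"Answer to: {query}".split():
        yield {"response": word + " "}

answer = ""
for chunk in generate_answer("what is FAISS?", context=None):
    print(chunk['response'], end='', flush=True)  # appears as it arrives
    answer += chunk['response']
print()
```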
🔁 Final System (Let’s Visualize It)
👉 This is what your ChatPDF system looks like now:
🧠 What this diagram represents
- Query → converted into embedding
- FAISS → retrieves relevant chunks
- Re-ranker → improves quality
- LLM → generates final answer
👉 This is a complete RAG pipeline
🚀 What We Achieved
| Feature | Part 1 (NumPy) | Part 2 (FAISS) |
|---|---|---|
| Search | Brute force | Indexed ⚡ |
| Speed | Slow | Fast |
| Persistence | ❌ | ✅ |
| Accuracy | Basic | Improved |
| UX | Basic | Streaming |
🧠 Final Thoughts
This is where things became real.
From “I understand RAG”
to
“I can build something scalable”
If you’re learning RAG:
- Start with NumPy ✅
- Move to FAISS ✅
That transition is where the real understanding happens.
📂 Project Repo
👉 https://github.com/SharathKurup/chatPDF/tree/faiss_indexing
🔜 What’s Next?
In Part 3:
👉 We’ll build a Streamlit UI
👉 Turn this into a proper app
💬 Let’s Connect
If you're building something similar or exploring local LLMs, I’d love to hear your thoughts 👇