DEV Community

Sharath Kurup

Understanding RAG by Building a ChatPDF App: From NumPy to FAISS (Part 2)

⚡ From NumPy to FAISS: Making ChatPDF Fast & Scalable – Part 2

In Part 1, we made it work.
In Part 2, we make it usable 🚀


📌 Recap from Part 1

In Part 1, we built a ChatPDF app using:

  • PDF → Text → Chunks
  • Embeddings using Ollama
  • Similarity search using NumPy
  • LLM to generate answers

It worked well for small PDFs and helped us understand RAG from first principles.

But once I started testing with slightly larger PDFs…


😅 The Problem Started Showing Up

The issue was not correctness — it was performance.

Let’s revisit what we were doing during search.


❌ NumPy Search (What we had before)

```python
similarities = np.dot(vector_db, query_vector)
top_indices = np.argsort(similarities)[-TOP_K:][::-1]
```

🧠 What’s actually happening here?

Every time you ask a question:

  1. Compute similarity with every chunk
  2. Store all similarity scores
  3. Sort the entire list
  4. Pick top K

🚨 Why this becomes a problem

  • Time complexity → O(n) dot products plus an O(n log n) sort per query
  • More chunks = slower search
  • Entire dataset scanned every time
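To see the full cost, the brute-force search above can be run end to end on toy data (the sizes and `TOP_K` here are made up for illustration):

```python
import numpy as np

# Toy data: 5 chunk embeddings of dimension 4 (real embeddings are much larger)
vector_db = np.random.rand(5, 4).astype(np.float32)
query_vector = np.random.rand(4).astype(np.float32)
TOP_K = 2

# Brute force: score EVERY chunk, sort ALL scores, then slice the top K
similarities = np.dot(vector_db, query_vector)   # O(n) dot products
top_indices = np.argsort(similarities)[-TOP_K:][::-1]

print(top_indices)  # indices of the 2 most similar chunks, best first
```

Every line of this runs again for every question you ask — that is the part FAISS replaces.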

To make this visible, I added timing:

```python
start_time = time.perf_counter()
# similarity logic
end_time = time.perf_counter()
execution_time = end_time - start_time
print(f"Total time with numpy: {execution_time:.4f}s")
```

And as the number of chunks increased…
⏳ the delay became noticeable.


💡 So What’s the Solution?

Instead of:

“Search through everything every time”

We need:

“A system that knows where to look.”


🔍 Let’s Visualize the Problem (This is the key moment)

👉 This is where the real difference becomes obvious:

*(Diagram: Brute Force vs Index Search)*

💡 What this diagram shows:

  • NumPy → scans every single chunk
  • FAISS → directly jumps to the most relevant results

This is the exact shift from:

brute force → intelligent retrieval


🚀 Introducing FAISS

FAISS (Facebook AI Similarity Search) is built for:

  • Fast vector similarity search
  • Efficient indexing
  • Handling large datasets

The key idea:

👉 Build an index once → search efficiently many times


🔄 Step 1: Moving from Raw Vectors → FAISS Index


❌ Before (NumPy mindset)

We stored vectors like this:

```python
vector_db = np.array(vectors, dtype=np.float32)
```

That’s it.

No structure. No optimization. Just raw data.


✅ After (FAISS approach)

```python
vector_np = np.array(vectors).astype('float32')

dimension = vector_np.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(vector_np)
```

🧠 Let’s understand this properly


1️⃣ Converting to float32

```python
vector_np = np.array(vectors).astype('float32')
```

FAISS requires vectors in float32.

Even if your embeddings are already floats, doing this ensures:

  • Compatibility
  • No runtime surprises

2️⃣ Getting the dimension

```python
dimension = vector_np.shape[1]
```

Each embedding looks like:

```
[0.12, -0.45, 0.88, ...]
```

The number of elements = dimension

FAISS needs this to build the index correctly.


3️⃣ Creating the index

```python
index = faiss.IndexFlatIP(dimension)
```

  • IndexFlatIP → Inner Product search
  • Since embeddings are normalized → 👉 Inner Product ≈ Cosine Similarity

So we are essentially saying:

“Store these vectors and allow fast similarity-based search.”


4️⃣ Adding vectors to FAISS

```python
index.add(vector_np)
```

This step:

  • Loads all embeddings into FAISS
  • Builds the internal structure

👉 From here, we stop thinking in terms of arrays and start thinking in terms of an index


🎯 Big Concept Shift

| NumPy | FAISS |
| --- | --- |
| Raw vectors | Indexed vectors |
| Manual search | Optimized search |
| Full scan | Smart retrieval |

🔍 Step 2: Searching with FAISS


❌ Before (NumPy)

```python
similarities = np.dot(vector_db, query_vector)
top_indices = np.argsort(similarities)[-TOP_K:][::-1]
```

✅ After (FAISS)

```python
distances, indices = index.search(query_vector.reshape(1, -1), k=TOP_K)
```

🧠 Let’s break this down


1️⃣ Why reshape?

```python
query_vector.reshape(1, -1)
```

FAISS expects:

```
[number_of_queries, dimension]
```

Even a single query must be shaped like:

```
[[embedding]]
```

2️⃣ What does search() do?

```python
distances, indices = index.search(...)
```

FAISS:

  • Finds nearest vectors
  • Sorts internally
  • Returns top K

3️⃣ Mapping results back

```python
[text_metadata[i] for i in indices[0]]
```

We use indices to fetch:

  • Actual text chunks
  • Page numbers

💡 Why this is powerful

Instead of:

  • Writing similarity logic ❌
  • Writing sorting logic ❌

You now:

👉 Call one optimized function


💾 Step 3: Avoid Recomputing Everything


🚨 Problem in Part 1

Every run:

  • Read PDF
  • Chunk text
  • Generate embeddings
  • Build vectors

✅ Solution: Save the Index

```python
faiss.write_index(index, "db/index.faiss")

with open("db/metadata.pkl", "wb") as f:
    pickle.dump(data, f)
```

🧠 What are we saving?

  • FAISS index → vector structure
  • Metadata → chunk + page info
  • PDF hash → detect changes

🔁 Loading instead of recomputing

```python
index = faiss.read_index("db/index.faiss")
```

Now:

  • ⚡ Faster startup
  • ❌ No repeated embedding calls

🔐 Step 4: Detecting PDF Changes


```python
def calculate_pdf_hash():
    sha256_hash = hashlib.sha256()
    sha256_hash.update(open(PDF_PATH, "rb").read())  # PDF_PATH: the app's PDF file
    return sha256_hash.hexdigest()
```

🧠 Why this matters

If the PDF changes:

  • Old embeddings become invalid

So we:

  • Generate hash
  • Compare with stored hash
  • Rebuild only if needed

👉 Small addition, big impact.
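A path-parameterized sketch of the whole check, exercised on a throwaway file (the chunked read and the `path` argument are my additions; in the app the stored hash would come from `metadata.pkl`):

```python
import hashlib
import os
import tempfile

def calculate_pdf_hash(path):
    """Hash the raw PDF bytes so any edit changes the digest."""
    sha256_hash = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(8192), b""):  # stream large files
            sha256_hash.update(block)
    return sha256_hash.hexdigest()

# Simulate the rebuild check with a throwaway file
pdf_path = os.path.join(tempfile.mkdtemp(), "doc.pdf")
with open(pdf_path, "wb") as f:
    f.write(b"%PDF-1.4 fake content")

stored_hash = calculate_pdf_hash(pdf_path)      # what metadata.pkl would hold
needs_rebuild = calculate_pdf_hash(pdf_path) != stored_hash
print(needs_rebuild)  # → False: file unchanged, reuse the saved index
```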


🔥 Step 5: Improving Retrieval with Re-ranking


Even FAISS isn’t perfect.

So we add another layer:

```python
results = ranker.rerank(rerank_request)
```

🧠 What’s happening here?

  1. FAISS retrieves top 10 chunks
  2. Re-ranker evaluates relevance
  3. Returns best TOP_K
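The three steps above form the retrieve-then-rerank pattern. A minimal sketch of the control flow, using a hypothetical keyword-overlap scorer in place of the real re-ranking model:

```python
# Stand-in re-ranker: in the real app, `ranker` is a re-ranking model;
# a hypothetical keyword-overlap score illustrates the control flow only.
def rerank(query, passages, top_k):
    q_words = set(query.lower().split())

    def score(passage):
        return len(q_words & set(passage.lower().split()))

    return sorted(passages, key=score, reverse=True)[:top_k]

faiss_top10 = [  # pretend these are the candidate chunks FAISS returned
    "FAISS retrieves candidate chunks quickly.",
    "The weather is nice today.",
    "Re-ranking orders chunks by true relevance.",
]
best = rerank("how does re-ranking order chunks", faiss_top10, top_k=2)
print(best[0])  # → "Re-ranking orders chunks by true relevance."
```

The point is the shape of the pipeline: retrieve a generous candidate set cheaply, then spend a more expensive scorer only on those candidates.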

📊 Debug visibility

```python
print("--- Re-ranker Scores ---")
```

Helps you:

  • Understand ranking
  • Debug results

💬 Step 6: Streaming Responses (UX Upgrade)


```python
for chunk in generate_answer(user_query, context_llm):
    print(chunk['response'], end='', flush=True)
```

🧠 Why this matters

  • Feels real-time
  • Improves perceived speed
  • Better experience
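You can exercise the streaming loop with a fake generator (the dict shape with a `"response"` key mirrors the snippet above; the real `generate_answer` wraps the LLM client):

```python
# Stand-in for generate_answer(): a generator that yields response pieces
# as they arrive, mimicking the streaming dicts the LLM client returns
def generate_answer(user_query, context_llm):
    for token in ["FAISS ", "makes ", "retrieval ", "fast."]:
        yield {"response": token}

answer = ""
for chunk in generate_answer("demo question", "demo context"):
    print(chunk["response"], end="", flush=True)  # tokens appear as they stream
    answer += chunk["response"]
print()  # final newline after the streamed answer
```

Because the loop consumes a generator, the first token can hit the terminal before the model has finished the sentence — that is the entire perceived-speed win.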

🔁 Final System (Let’s Visualize It)

👉 This is what your ChatPDF system looks like now:

*(Diagram: RAG Pipeline)*


🧠 What this diagram represents

  • Query → converted into embedding
  • FAISS → retrieves relevant chunks
  • Re-ranker → improves quality
  • LLM → generates final answer

👉 This is a complete RAG pipeline


🚀 What We Achieved

| Feature | Part 1 (NumPy) | Part 2 (FAISS) |
| --- | --- | --- |
| Search | Brute force | Indexed ⚡ |
| Speed | Slow | Fast |
| Persistence | None | Saved index |
| Accuracy | Basic | Improved |
| UX | Basic | Streaming |

🧠 Final Thoughts

This is where things became real.

From “I understand RAG”
to
“I can build something scalable”


If you’re learning RAG:

  • Start with NumPy ✅
  • Move to FAISS ✅

That transition is where the real understanding happens.


📂 Project Repo

👉 https://github.com/SharathKurup/chatPDF/tree/faiss_indexing


🔜 What’s Next?

In Part 3:

👉 We’ll build a Streamlit UI
👉 Turn this into a proper app


💬 Let’s Connect

If you're building something similar or exploring local LLMs, I’d love to hear your thoughts 👇
