(A survival guide you didn’t ask for)
Building an ANN system like Faiss is not hard.
Building a fast ANN system like Faiss will make you question every life decision you’ve ever made.
If you’re thinking, “How hard can vector search be?”… congrats - this article is for you.
Act 1: The Innocent Beginning (Python Era)
- You start in Python. Life is good - NumPy works.
- Accuracy looks decent.
- Latency is… acceptable.
You tell yourself:
“I’ll just prototype it. Later I’ll optimize.”
Classic mistake. Rookie energy.
Act 2: “Let’s Rewrite It in C++” (Boss Music Starts)
At some point, queries feel slow.
You say the forbidden words:
"Let’s rewrite it in C++ for speed."
This is where the tutorial ends and the boss fight begins.
Suddenly:
- You’re not debugging logic
- You’re debugging existence
Segfaults.
Undefined behavior.
Memory crashes… for reasons you swear are illegal.
You fix one bug → three new ones spawn.
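A taste of why, hedged appropriately: the snippet below is a hypothetical distance kernel, not code from any real library, and it shows the classic way an ANN inner loop goes wrong - it's only correct if `dim` matches what was actually stored.

```cpp
#include <cstddef>

// Hypothetical L2 kernel: compiles clean, looks innocent, and is undefined
// behavior the moment `dim` is larger than the buffers actually hold.
float l2_distance(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {   // if dim lies, this walks off the end
        const float d = a[i] - b[i];
        sum += d * d;                         // wrong answer today, segfault next week
    }
    return sum;
}
```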
Act 3: Speed Bound → Memory Bound (The Plot Twist)
At first, you’re speed-bound:
- Bad loops
- Bad data layout
- Unoptimized math
You fix those.
Latency drops.
You feel powerful.
Then… nothing improves.
Welcome to the realization:
You are no longer speed-bound.
You are memory-bound.
And memory-bound is where real suffering begins.
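In concrete terms, being memory-bound is mostly about layout. A minimal sketch (type and field names are made up) of the difference between a pointer-chasing store and a flat, contiguous one:

```cpp
#include <cstddef>
#include <vector>

// Pointer-chasing layout: every row is its own heap allocation, so a full
// scan hops around memory and misses the cache constantly.
using RaggedStore = std::vector<std::vector<float>>;   // rows[i][d]

// Cache-friendly layout: one contiguous block; row i starts at i * dim.
// A full scan becomes a streaming read the hardware prefetcher can follow.
struct FlatStore {
    std::size_t dim = 0;
    std::vector<float> data;                            // size == n * dim

    const float* row(std::size_t i) const { return data.data() + i * dim; }
};
```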
Act 4: Milliseconds Matter (You Finally Understand Big Tech)
Seconds were easy.
Milliseconds are war.
You change one file.
Latency spikes.
QPS drops.
Cache misses explode.
Now your life is:
Change code → Build → Benchmark → Cry → Repeat
You learn:
- Cache misses cost hundreds of QPS
- Memory access > CPU speed
- “Fast code” means nothing if data is in the wrong place
You finally understand why every millisecond matters in tech.
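The loop of despair, written down - a minimal benchmark sketch. `Index` and `search()` here are stand-ins for whatever your engine exposes, not a real API:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical index interface - replace with your own.
struct Index {
    std::vector<int> search(const float* query, int k) const;
};

// Time a batch of queries and report mean latency and QPS.
void benchmark(const Index& index,
               const std::vector<std::vector<float>>& queries, int k) {
    std::size_t sink = 0;                      // keeps the calls from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (const auto& q : queries) {
        sink += index.search(q.data(), k).size();
    }
    const auto end = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(end - start).count();
    std::printf("mean latency: %.3f ms | QPS: %.0f | sink=%zu\n",
                1000.0 * seconds / queries.size(),
                queries.size() / seconds, sink);
}
```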
Act 5: SIMD, AVX, OpenMP (False Hope Arc)
You go full tryhard:
- SIMD
- AVX2 / AVX-512
- OpenMP
- BLAS
- Hand-tuned loops
Then reality hits again:
- Small batches → OpenMP overhead > benefit
- Threads fight for cache
- More cores ≠ more speed
Optimizations now need optimization.
Beautiful. Right?
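For the record, here's roughly what that phase produces - a minimal AVX2 L2 kernel plus an OpenMP loop over queries. It assumes `dim` is a multiple of 8, AVX2 + FMA support, and a build with `-mavx2 -mfma -fopenmp`; none of that is guaranteed on your machine:

```cpp
#include <immintrin.h>
#include <cstddef>

// L2 distance, 8 floats per iteration. Assumes dim % 8 == 0 and both
// pointers valid for `dim` floats.
float l2_avx2(const float* a, const float* b, std::size_t dim) {
    __m256 acc = _mm256_setzero_ps();
    for (std::size_t i = 0; i < dim; i += 8) {
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        const __m256 d  = _mm256_sub_ps(va, vb);
        acc = _mm256_fmadd_ps(d, d, acc);       // acc += d * d
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);               // horizontal sum of the 8 lanes
    float sum = 0.0f;
    for (float x : lanes) sum += x;
    return sum;
}

// Parallelize over queries, not over base vectors - and remember the lesson
// above: for tiny batches the OpenMP overhead eats the win.
void search_batch(const float* queries, std::size_t nq,
                  const float* base, std::size_t nb,
                  std::size_t dim, float* out /* nq * nb distances */) {
    #pragma omp parallel for schedule(static)
    for (long long q = 0; q < static_cast<long long>(nq); ++q) {
        for (std::size_t i = 0; i < nb; ++i) {
            out[q * nb + i] = l2_avx2(queries + q * dim, base + i * dim, dim);
        }
    }
}
```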
Act 6: Python Bindings (New Boss, Same Pain)
“Fine,” you say,
“I’ll just expose this with Python bindings.”
Welcome to pybind11 + CMake hell.
- CMake can’t find pybind
- pybind exists but CMake denies it
- Errors you didn’t know were possible
- Compiler messages that feel personally insulting
Also:
- Python memory
- C++ memory
- NumPy memory
- Recall drops
- Speed lies
At some point you realize:
NumPy math ≠ C++ speed
And yes, you briefly consider throwing your CPU out the window.
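For what it's worth, the binding itself is the small part. A minimal pybind11 sketch (module and function names are made up) that accepts float32 C-contiguous NumPy arrays zero-copy and converts anything else:

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cstddef>
#include <stdexcept>

namespace py = pybind11;

// The C++ kernel we want to expose (defined elsewhere).
float l2_distance(const float* a, const float* b, std::size_t dim);

// float32 + C-contiguous arrays come through zero-copy; anything else is
// converted (i.e. copied) by forcecast - one of the places "speed lies".
using FloatArray = py::array_t<float, py::array::c_style | py::array::forcecast>;

float l2_py(FloatArray a, FloatArray b) {
    const auto ba = a.request();
    const auto bb = b.request();
    if (ba.size != bb.size) {
        throw std::runtime_error("dimension mismatch");
    }
    return l2_distance(static_cast<const float*>(ba.ptr),
                       static_cast<const float*>(bb.ptr),
                       static_cast<std::size_t>(ba.size));
}

PYBIND11_MODULE(my_ann, m) {
    m.def("l2", &l2_py, "L2 distance between two float32 vectors");
}
```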
Act 7: Scalar C++ Reality Check
You try pure scalar C++.
Surprise:
Well-optimized NumPy / Cython can beat naïve C++
Congrats.
Your ego just segfaulted.
Now you:
- Learn data alignment
- Learn cache lines
- Learn prefetching
- Learn why “just C++” is not enough
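Roughly what those lessons turn into, as a sketch: cache-line-aligned storage plus an explicit prefetch hint. `__builtin_prefetch` is a GCC/Clang builtin, `std::aligned_alloc` is C++17, and whether any of this actually helps is strictly an empirical question:

```cpp
#include <cstddef>
#include <cstdlib>

// Allocate float storage aligned to 64-byte cache lines so SIMD loads
// don't straddle line boundaries. Free with std::free.
float* alloc_aligned(std::size_t n_floats) {
    std::size_t bytes = n_floats * sizeof(float);
    bytes = (bytes + 63) / 64 * 64;            // aligned_alloc wants a multiple of 64
    return static_cast<float*>(std::aligned_alloc(64, bytes));
}

float l2_distance(const float* a, const float* b, std::size_t dim);

// Scan a flat store, hinting the next row into cache while computing the
// current one. Measure it - on some machines this is a pure no-op.
void scan(const float* base, std::size_t nb, std::size_t dim,
          const float* query, float* out) {
    for (std::size_t i = 0; i < nb; ++i) {
        if (i + 1 < nb) {
            __builtin_prefetch(base + (i + 1) * dim);   // read prefetch, default locality
        }
        out[i] = l2_distance(base + i * dim, query, dim);
    }
}
```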
Final Act: The Faiss Reality Check
After all this:
- Memory tuning
- Cache tuning
- Layout tuning
- QPS tuning
- Latency tuning
You benchmark against Faiss.
You are…
nowhere near it.
And that’s when it hits:
Faiss isn’t just algorithms.
It’s years of low-level pain, tuning, and memory mastery.
Advice From Someone Who Survived (Barely)
If you’re starting out:
Step 1: Start in Python
Build the algorithm first.
Validate accuracy.
If it’s good enough - stop here. Be happy.
Step 2: Move to C++ only if:
- You hit real memory limits
- You hit real latency ceilings
- You understand what you’re signing up for
Step 3: Optimization Hell
- SIMD
- AVX
- OpenMP (carefully)
- Cache-aware design
- Memory-first thinking
If you reach this stage…
Congrats.
This is where hating your life officially begins.
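One concrete version of the "OpenMP (carefully)" bullet above: OpenMP's `if` clause lets a batched search stay single-threaded when the batch is too small to amortize thread start-up. The threshold of 64 below is invented - tune it on your own hardware:

```cpp
#include <cstddef>

float l2_distance(const float* a, const float* b, std::size_t dim);

// Parallelize across queries, but only when the batch is big enough to
// pay for the threads; tiny batches run on one core.
void search_batch_guarded(const float* queries, std::size_t nq,
                          const float* base, std::size_t nb,
                          std::size_t dim, float* out /* nq * nb */) {
    #pragma omp parallel for schedule(static) if (nq >= 64)
    for (long long q = 0; q < static_cast<long long>(nq); ++q) {
        for (std::size_t i = 0; i < nb; ++i) {
            out[q * nb + i] = l2_distance(queries + q * dim, base + i * dim, dim);
        }
    }
}
```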
Writing an ANN engine is fun.
Writing a fast ANN engine is pain.
Writing one that competes with Faiss?
That’s not a project.
That’s a boss fight marathon.
If you’re still here - respect. 🫡
If you’re thinking of starting - I warned you.
Now excuse me while I benchmark again and cry over cache misses.
So yeah… it's already halfway there.
There is unfinished business.
The ANN is coming.
It will be open-sourced.
Not “soon™”.
Not “startup soon”.
But soon - the kind of soon where code already exists and pain is already paid for.