Jashwanth Thatipamula

Making an ANN Like Faiss Is Not Everyone’s Cup of Tea

(A survival guide you didn’t ask for)

Building an ANN system like Faiss is not hard.
Building a fast ANN system like Faiss will make you question every life decision you’ve ever made.

If you’re thinking, “How hard can vector search be?” - congrats, this article is for you.


Act 1: The Innocent Beginning (Python Era)

  • You start in Python. Life is good - NumPy works.
  • Accuracy looks decent.
  • Latency is… acceptable.

You tell yourself:

I’ll just prototype it. Later I’ll optimize.

Classic mistake. Rookie energy.
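For the record, the innocent prototype usually looks something like the exact brute-force search below. This is a minimal sketch, not anyone's production code: it sticks to the standard library so it runs anywhere (a real prototype would more likely lean on NumPy), and the names `l2_squared` and `brute_force_search` are made up for illustration.

```python
import heapq
import random


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(database, query, k):
    """Return the indices of the k nearest database vectors, nearest first."""
    scored = ((l2_squared(vec, query), idx) for idx, vec in enumerate(database))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


random.seed(0)
dim = 8
database = [[random.random() for _ in range(dim)] for _ in range(1000)]
query = database[42]  # a vector we know is in the database
neighbors = brute_force_search(database, query, k=5)
```

An exact search like this is slow, but it doubles as the ground truth you later measure an approximate index against.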


Act 2: “Let’s Rewrite It in C++” (Boss Music Starts)

At some point, queries feel slow.
You say the forbidden words:

“Let’s rewrite it in C++ for speed.”

This is where the tutorial ends and the boss fight begins.

Suddenly:

  • You’re not debugging logic
  • You’re debugging existence

Segfaults.
Undefined behavior.
Memory crashes… for reasons you swear are illegal.

You fix one bug → three new ones spawn.


Act 3: Speed Bound → Memory Bound (The Plot Twist)

At first, you’re speed-bound (compute-bound, if you want the textbook term):

  • Bad loops
  • Bad data layout
  • Unoptimized math

You fix those.
Latency drops.
You feel powerful.

Then… nothing improves.

Welcome to the realization:

You are no longer speed-bound.
You are memory-bound.

And memory-bound is where real suffering begins.
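A back-of-envelope calculation shows why the wall appears. The sketch below compares two lower bounds for one brute-force scan: the time to stream the vectors from RAM and the time to do the arithmetic. Every number in it is an assumed round figure (1M float32 vectors, 128 dimensions, 20 GB/s of memory bandwidth, 100 GFLOP/s of compute), not a measurement, and `scan_time_bounds` is a hypothetical helper.

```python
def scan_time_bounds(n_vectors, dim, bandwidth_gb_s, peak_gflops, bytes_per_value=4):
    """Lower bounds (in ms) on one brute-force query: memory traffic vs arithmetic."""
    bytes_read = n_vectors * dim * bytes_per_value
    flops = 2 * n_vectors * dim  # one multiply + one add per value (inner product)
    memory_ms = bytes_read / (bandwidth_gb_s * 1e9) * 1e3
    compute_ms = flops / (peak_gflops * 1e9) * 1e3
    return memory_ms, compute_ms


# Assumed hardware numbers, chosen only to illustrate the ratio.
memory_ms, compute_ms = scan_time_bounds(
    n_vectors=1_000_000, dim=128, bandwidth_gb_s=20.0, peak_gflops=100.0
)
print(f"memory bound: {memory_ms:.2f} ms, compute bound: {compute_ms:.2f} ms")
```

Under these assumptions the memory bound is about 25.6 ms and the compute bound about 2.56 ms, so making the math faster cannot help once the loops are decent. Only touching fewer bytes (compression, better layout, fewer candidates) moves the number.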


Act 4: Milliseconds Matter (You Finally Understand Big Tech)

Seconds were easy.
Milliseconds are war.

You change one file.
Latency spikes.
QPS drops.
Cache misses explode.

Now your life is:
Change code → Build → Benchmark → Cry → Repeat

You learn:

  • Cache misses cost hundreds of QPS
  • Memory access patterns matter more than raw CPU speed
  • “Fast code” means nothing if data is in the wrong place

You finally understand why every millisecond matters in tech.
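The benchmark step of that loop can be as small as the harness below. It is a rough sketch with assumptions: `benchmark` and `dummy_search` are hypothetical names, the percentile is a simple nearest-rank pick, and a serious setup would also pin threads and repeat runs.

```python
import time


def benchmark(search_fn, queries, warmup=5):
    """Time search_fn per query; return p50/p99 latency in ms and overall QPS."""
    for q in queries[:warmup]:  # untimed warm-up calls before measuring
        search_fn(q)

    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - t0) * 1e3)
    elapsed = time.perf_counter() - start

    latencies.sort()

    def pct(p):
        # Nearest-rank percentile, clamped to the last element.
        return latencies[min(len(latencies) - 1, p * len(latencies) // 100)]

    return {"p50_ms": pct(50), "p99_ms": pct(99), "qps": len(queries) / elapsed}


def dummy_search(q):
    """Hypothetical stand-in for a real search function."""
    return sorted(q)[:3]


stats = benchmark(dummy_search, [[3, 1, 2, 5, 4]] * 200)
print(stats)
```

Tracking p99 next to p50 matters here: a change that leaves the median alone can still wreck the tail.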


Act 5: SIMD, AVX, OpenMP (False Hope Arc)
You go full tryhard:

  • SIMD
  • AVX2 / AVX-512
  • OpenMP
  • BLAS
  • Hand-tuned loops

Then reality hits again:

  • Small batches → OpenMP overhead > benefit
  • Threads fight for cache
  • More cores ≠ more speed

Optimizations now need optimization.

Beautiful. Right..?
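The small-batch problem can be put into a toy cost model. This is a sketch under loud assumptions: a fixed fork/join overhead per parallel region, perfectly divided work, and invented numbers (0.5 µs per distance, 8 threads, 20 µs overhead). It ignores cache contention entirely, so real results are likely worse than the model says.

```python
def serial_time_us(n_items, per_item_us):
    """Time to process the batch on one thread."""
    return n_items * per_item_us


def parallel_time_us(n_items, per_item_us, threads, overhead_us):
    """Idealized model: fixed fork/join overhead plus perfectly divided work."""
    return overhead_us + n_items * per_item_us / threads


def break_even_items(per_item_us, threads, overhead_us):
    """Batch size above which the parallel version wins under this model."""
    return overhead_us / (per_item_us * (1 - 1 / threads))


# Assumed numbers for illustration only, not measurements.
threshold = break_even_items(per_item_us=0.5, threads=8, overhead_us=20.0)
small = (serial_time_us(10, 0.5), parallel_time_us(10, 0.5, 8, 20.0))
large = (serial_time_us(1000, 0.5), parallel_time_us(1000, 0.5, 8, 20.0))
print(f"break-even at about {threshold:.1f} items")
```

With these made-up numbers, a batch of 10 takes 5 µs serially and about 20.6 µs in parallel, while a batch of 1000 takes 500 µs serially and 82.5 µs in parallel. The break-even sits near 46 items, so below that the threads are pure overhead.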


Act 6: Python Bindings (New Boss, Same Pain)
“Fine,” you say,
“I’ll just expose this with Python bindings.”

Welcome to pybind11 + CMake hell.

  • CMake can’t find pybind
  • pybind exists but CMake denies it
  • Errors you didn’t know were possible
  • Compiler messages that feel personally insulting

Also:

  • Python memory
  • C++ memory
  • NumPy memory
  • Recall drops
  • Speed lies

At some point you realize:

NumPy math ≠ C++ speed

And yes, you briefly consider throwing your CPU out the window.
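Much of the three-kinds-of-memory confusion comes down to one question: is this a view or a copy? The sketch below shows the difference using only Python's buffer protocol from the standard library. It is an analogy, not pybind11 code. The assumption is that your bindings hand arrays across through the same buffer mechanism, so the same view-versus-copy distinction applies there.

```python
from array import array

vectors = array("f", [0.0] * 8)  # flat float32 buffer, similar to a C float[8]
view = memoryview(vectors)       # zero-copy view through the buffer protocol
copied = array("f", vectors)     # independent copy of the same values

vectors[0] = 1.5                 # mutate the original buffer

print(view[0], copied[0])        # the view sees the write, the copy does not
```

A silent copy at the boundary costs time on every call, and a view that outlives its owner is how you end up debugging existence again.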


Act 7: Scalar C++ Reality Check

You try pure scalar C++.

Surprise:

Well-optimized NumPy / Cython can beat naïve C++

Congrats.
Your ego just segfaulted.

Now you:

  • Learn data alignment
  • Learn cache lines
  • Learn prefetching
  • Learn why “just C++” is not enough
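Alignment and cache lines can be reasoned about with plain arithmetic. The sketch below counts how many cache lines one vector touches depending on where it starts in memory. It assumes 64-byte cache lines (common on x86-64, but check your own CPU) and 128-dimensional float32 vectors, and `cache_lines_touched` is a hypothetical helper.

```python
def cache_lines_touched(start_byte, n_bytes, line_size=64):
    """Number of distinct cache lines covered by [start_byte, start_byte + n_bytes)."""
    first = start_byte // line_size
    last = (start_byte + n_bytes - 1) // line_size
    return last - first + 1


vector_bytes = 128 * 4  # 128 float32 values = 512 bytes

aligned = cache_lines_touched(0, vector_bytes)      # starts on a line boundary
misaligned = cache_lines_touched(16, vector_bytes)  # starts 16 bytes into a line

print(aligned, misaligned)
```

The aligned vector touches 8 lines and the misaligned one touches 9. One extra line per vector sounds harmless until you multiply it by every vector in every query.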

Final Act: The Faiss Reality Check
After all this:

  • Memory tuning
  • Cache tuning
  • Layout tuning
  • QPS tuning
  • Latency tuning

You benchmark against Faiss.

You are…
nowhere near it.

And that’s when it hits:

Faiss isn’t just algorithms.
It’s years of low-level pain, tuning, and memory mastery.


Advice From Someone Who Survived (Barely)

If you’re starting out:

Step 1: Start in Python

Build the algorithm first.
Validate accuracy.
If it’s good enough - stop here. Be happy.
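Validating accuracy for an ANN index usually means recall@k against exact search results. A minimal sketch, assuming you already have exact and approximate neighbor ID lists per query (`recall_at_k` and `mean_recall` are illustrative names, and the ID lists below are toy data):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search found."""
    truth = set(exact_ids[:k])
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over all queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Toy data: two queries, four neighbors each.
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
approx = [[1, 2, 9, 4], [5, 6, 7, 8]]

score = mean_recall(approx, exact, k=4)
print(score)
```

Here the first query misses one of four true neighbors and the second misses none, so the mean recall@4 is 0.875. Keep this number next to every latency figure, because speed without recall is the "speed lies" part.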

Step 2: Move to C++ only if:

  • You hit real memory limits
  • You hit real latency ceilings
  • You understand what you’re signing up for

Step 3: Optimization Hell

  • SIMD
  • AVX
  • OpenMP (carefully)
  • Cache-aware design
  • Memory-first thinking

If you reach this stage…

Congrats.
This is where hating your life officially begins.


Writing an ANN engine is fun.
Writing a fast ANN engine is pain.
Writing one that competes with Faiss?

That’s not a project.
That’s a boss fight marathon.

If you’re still here - respect. 🫡
If you’re thinking of starting - I warned you.

Now excuse me while I benchmark again and cry over cache misses.

So yeah… it’s already halfway done.
There is unfinished business.
The ANN is coming.
It will be open-sourced.
Not “soon™”.
Not “startup soon”.
But soon - the kind of soon where code already exists and pain is already paid for.
