Jashwanth Thatipamula

If your ANN is slow, stop blaming the math... your memory is already plotting against you.

Everyone loves algorithms. Nobody respects memory.
That’s why most “fast” ANN systems collapse the moment real queries show up.

Speed isn’t about FLOPs.
It’s about how often you annoy the cache.


RAM Is Not Your Friend

“Touching RAM is not data access. It’s a cry for help.”

If your query path hits RAM frequently, you already lost.
Modern CPUs are absurdly fast until they have to wait.

ANN systems don’t die from computation.
They die from memory latency wearing a nice benchmark suit.


Cache Is King, Everything Else Is Just Vibes

Your goal is simple:

  • Keep data small
  • Keep it contiguous
  • Keep it reused

If the cache isn’t doing most of the work, your CPU is just stretching its legs.


Memory Layout > Model Architecture

You optimized the model. The layout optimized you.

  • AoS vs SoA isn’t academic.
  • Pointer chasing isn’t a design choice. It’s self-sabotage.

Contiguous arrays win because:

  • Fewer cache lines
  • Predictable access
  • Hardware prefetch actually works

Random access kills performance quietly.
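
Here’s a minimal C++ sketch of the two layouts. The names `PointAoS` and `IndexSoA` are made up for illustration, not from any real library:

```cpp
#include <cstddef>
#include <vector>

// AoS + pointer chasing: metadata and an indirection per point. A distance
// scan drags ids and norms through cache alongside the floats, and every
// row starts with a dependent pointer load -- a cache miss waiting to happen.
struct PointAoS {
    std::vector<float> vec;  // separate heap allocation per point
    int   id;
    float norm;
};

// SoA + contiguous storage: one flat buffer for the floats, parallel arrays
// for metadata. A distance scan touches only the bytes it needs, in order,
// so the hardware prefetcher can actually keep up.
struct IndexSoA {
    std::vector<float> data;   // n * dim floats, row i starts at data[i * dim]
    std::vector<int>   ids;
    std::vector<float> norms;
    std::size_t dim = 0;

    const float* vec(std::size_t i) const { return data.data() + i * dim; }
};
```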


Threads Fighting for Data Is Not Parallelism

“If your threads are fighting, the CPU already lost interest.”

False sharing is the silent assassin.
Locks aren’t your main enemy; cache-line contention is.

If multiple threads touch the same cache line:

  • You’re not scaling
  • You’re arguing in silicon

Parallelism only works when data ownership is clean.
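
The classic false-sharing trap, sketched in C++. `PaddedCounter` is a hypothetical name, and 64 bytes is an assumed line size (check yours):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// BAD: adjacent counters share a cache line. Every increment by one thread
// invalidates the line for its neighbors -- arguing in silicon.
struct SharedCounters {
    std::atomic<long> count[8];
};

// BETTER: pad each counter to its own cache line so threads never invalidate
// each other's data. (C++17's hardware_destructive_interference_size in <new>
// is the portable spelling of the 64 below.)
struct PaddedCounter {
    alignas(64) std::atomic<long> count{0};
};

int main() {
    std::vector<PaddedCounter> counters(8);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < 8; ++t) {
        workers.emplace_back([&counters, t] {
            for (int i = 0; i < 1'000'000; ++i)
                counters[t].count.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
}
```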


Single-Pass > Multi-Pass (Unless You Hate Yourself)

Single-pass designs:

  • Load once
  • Compute everything
  • Move on

Multi-pass designs:

  • Reload data
  • Miss cache
  • Regret life choices

ANN pipelines should feel like a conveyor belt, not a boomerang.
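
A sketch of the difference, assuming the flat row-major buffer from before:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Multi-pass: pass 1 materializes every distance, pass 2 re-reads them.
// If the buffer is bigger than cache, pass 2 reloads everything from RAM.
std::size_t nearest_multipass(const std::vector<float>& data,
                              const float* q, std::size_t n, std::size_t dim) {
    std::vector<float> dists(n);
    for (std::size_t i = 0; i < n; ++i) {          // pass 1: compute
        float acc = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) {
            float diff = data[i * dim + d] - q[d];
            acc += diff * diff;
        }
        dists[i] = acc;
    }
    std::size_t best = 0;                          // pass 2: reload + reduce
    for (std::size_t i = 1; i < n; ++i)
        if (dists[i] < dists[best]) best = i;
    return best;
}

// Single-pass: reduce while the row is still in cache.
// Load once, compute everything, move on.
std::size_t nearest_singlepass(const std::vector<float>& data,
                               const float* q, std::size_t n, std::size_t dim) {
    std::size_t best = 0;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) {
            float diff = data[i * dim + d] - q[d];
            acc += diff * diff;
        }
        if (acc < best_d) { best_d = acc; best = i; }
    }
    return best;
}
```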


Cache Should Be Hot, Not On Vacation

  • Warm-up matters.
  • Batching matters.
  • Access order matters.

If your working set doesn’t fit in cache, shrink it.
If it can fit, reuse it aggressively.

Idle cache is wasted performance.
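
One way to get that reuse is blocking: scan the base in cache-sized chunks and run the whole query batch over each chunk before moving on. A rough sketch; the block size of 256 is a placeholder you’d tune for your cache:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each block of base vectors (sized to stay cache-resident), run the
// WHOLE query batch over it. Each base row is loaded from RAM once and
// reused q times, instead of being evicted between queries.
void scan_blocked(const std::vector<float>& base, std::size_t n,
                  const std::vector<float>& queries, std::size_t q,
                  std::size_t dim, std::vector<float>& out /* q * n */) {
    const std::size_t block = 256;  // tune so block * dim * 4 bytes fits in L2
    for (std::size_t b0 = 0; b0 < n; b0 += block) {
        std::size_t b1 = std::min(b0 + block, n);
        for (std::size_t qi = 0; qi < q; ++qi) {    // reuse the hot block
            const float* qv = &queries[qi * dim];
            for (std::size_t i = b0; i < b1; ++i) {
                float acc = 0.0f;
                for (std::size_t d = 0; d < dim; ++d) {
                    float diff = base[i * dim + d] - qv[d];
                    acc += diff * diff;
                }
                out[qi * n + i] = acc;
            }
        }
    }
}
```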


Prefetching Is Free Performance (If You Deserve It)

Sequential access lets the CPU help you.
Random jumps make it give up.

Design layouts so the CPU can guess what you’ll need next.
Yes, **CPUs are psychic**. No, you’re not using it.
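
And when access isn’t sequential but is knowable (say, walking a candidate list in a graph index), you can hint. A sketch using the GCC/Clang `__builtin_prefetch` intrinsic (MSVC spells it differently):

```cpp
#include <cstddef>

// Prefetch the NEXT candidate row while crunching the current one, so the
// load is already in flight when we get there.
float scan_candidates(const float* data, std::size_t dim,
                      const int* cand, std::size_t m, const float* q) {
    float best = 1e30f;
    for (std::size_t j = 0; j < m; ++j) {
        if (j + 1 < m)  // hint: this row is coming up next
            __builtin_prefetch(data + static_cast<std::size_t>(cand[j + 1]) * dim);
        const float* v = data + static_cast<std::size_t>(cand[j]) * dim;
        float acc = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) {
            float diff = v[d] - q[d];
            acc += diff * diff;
        }
        if (acc < best) best = acc;
    }
    return best;
}
```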


Branches Are Also Memory Problems

“Branch misprediction is just cache miss with extra drama.”

Unpredictable branches:

  • Break instruction flow
  • Stall pipelines
  • Trash performance

Branchless or predictable code keeps execution smooth and cache-friendly.
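
A tiny illustration: counting distances under a threshold. Same work, but one version gives the branch predictor nothing to guess:

```cpp
#include <cstddef>

// Branchy: the CPU must predict `d[i] < t` per element. On random data
// that's a coin flip, and each mispredict flushes the pipeline.
std::size_t count_below_branchy(const float* d, std::size_t n, float t) {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (d[i] < t) ++c;
    return c;
}

// Branchless: turn the comparison into data (0 or 1) and add it.
// No control flow to predict; the loop body is identical every iteration.
std::size_t count_below_branchless(const float* d, std::size_t n, float t) {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n; ++i)
        c += static_cast<std::size_t>(d[i] < t);
    return c;
}
```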


Alignment, Padding, and the Stuff Everyone Ignores

  • Alignment matters.
  • Padding matters.
  • Cache line size matters.

Misaligned structures don’t fail loudly.
They fail slowly.
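
A sketch of what “fails slowly” means, assuming 64-byte cache lines:

```cpp
#include <cstddef>

// 72 bytes: on 64-byte lines this node straddles two of them, so every
// access can cost two line fills instead of one. No error, no crash --
// just a quiet tax on every read.
struct NodeUnpadded {
    float dist;
    int   id;
    int   neighbors[16];
};

// Aligned to the line and sized to fill it: one node, one line.
struct alignas(64) NodePadded {
    float dist;
    int   id;
    int   neighbors[14];  // trimmed so the whole node fits one 64-byte line
};

static_assert(sizeof(NodePadded) == 64, "node should occupy exactly one cache line");
```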


Predictability Beats Peak Speed

ANN systems must be:

  • Stable
  • Predictable
  • Boring under load

Spiky latency is worse than slightly slower averages.
Caches like consistency. So do users.


“ANN is not algorithm engineering. It’s memory diplomacy.”

If your system:

  • Rarely touches RAM
  • Keeps cache hot
  • Avoids contention
  • Moves linearly through data

Then, and only then, do you get speed.

Everything else is just math cosplay.
