Everyone loves algorithms. Nobody respects memory.
That’s why most “fast” ANN systems collapse the moment real queries show up.
Speed isn’t about FLOPs.
It’s about how often you annoy the cache.
RAM Is Not Your Friend
“Touching RAM is not data access. It’s a cry for help.”
If your query path hits RAM frequently, you already lost.
Modern CPUs are absurdly fast until they have to wait.
ANN systems don’t die from computation.
They die from memory latency wearing a nice benchmark suit.
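The difference is easy to see in code. Here's a minimal sketch (function names are mine, not from any library) contrasting a sequential scan, where the CPU can overlap loads, with a pointer chase, where each load's address depends on the previous load's value, so every cache miss stalls the pipeline at full price:

```c
#include <stddef.h>

/* Sequential sum: the next address is known in advance, so loads
 * overlap and the hardware prefetcher streams data into cache. */
long sum_sequential(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chase: each load's address comes from the previous load,
 * so misses serialize. Same data volume, wildly different latency. */
long sum_chase(const long *next, size_t n, size_t start) {
    long s = 0;
    size_t i = start;
    for (size_t k = 0; k < n; k++) {
        s += (long)i;
        i = (size_t)next[i];  /* the next address isn't known until now */
    }
    return s;
}
```

Both loops touch the same number of elements; on a large array that spills out of cache, only the second one makes the CPU wait on every step.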
Cache Is King, Everything Else Is Just Vibes
Your goal is simple:
- Keep data small
- Keep it contiguous
- Keep it reused
If the cache isn’t doing most of the work, your CPU is just stretching its legs.
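"Keep data small" is a number, not a vibe. A back-of-envelope helper (cache sizes below are illustrative, not universal; check your own CPU):

```c
#include <stddef.h>

/* How many d-dimensional float32 vectors fit in a cache of
 * cache_bytes? Illustrative arithmetic, not a library call. */
size_t vectors_that_fit(size_t cache_bytes, size_t dim) {
    return cache_bytes / (dim * sizeof(float));
}
```

A 128-dim float32 vector is 512 bytes: a typical 32 KiB L1d holds only 64 of them, a 1 MiB L2 about 2048. That's the budget your inner loop actually lives on.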
Memory Layout > Model Architecture
“You optimized the model. The layout optimized you.”
- AoS vs SoA isn’t academic.
- Pointer chasing isn’t a design choice. It’s self-sabotage.
Contiguous arrays win because:
- Fewer cache lines
- Predictable access
- Hardware prefetch actually works
Random access kills performance quietly.
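Here's what that looks like for a distance scan. Same math, two layouts (a hypothetical sketch, not any particular library's code): a flat contiguous buffer where vector `i` lives at `base + i*dim`, versus a pointer-per-vector layout where each vector is its own heap allocation:

```c
#include <stddef.h>

/* Flat, contiguous layout: sequential scan, few cache lines per
 * vector, hardware prefetch actually works. */
void scan_flat(const float *base, size_t n, size_t dim,
               const float *q, float *dist_out) {
    for (size_t i = 0; i < n; i++) {
        const float *v = base + i * dim;  /* next vector is adjacent */
        float d = 0.0f;
        for (size_t j = 0; j < dim; j++) {
            float t = v[j] - q[j];
            d += t * t;
        }
        dist_out[i] = d;
    }
}

/* Pointer-per-vector layout: identical math, but each vecs[i] can
 * live anywhere on the heap, so the scan hops between cache lines. */
void scan_pointers(const float *const *vecs, size_t n, size_t dim,
                   const float *q, float *dist_out) {
    for (size_t i = 0; i < n; i++) {
        const float *v = vecs[i];  /* extra dependent load per vector */
        float d = 0.0f;
        for (size_t j = 0; j < dim; j++) {
            float t = v[j] - q[j];
            d += t * t;
        }
        dist_out[i] = d;
    }
}
```

The second version produces the same distances and a much worse access pattern: one extra dependent load per vector, and no guarantee any two vectors share a page, let alone a cache line.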
Threads Fighting for Data Is Not Parallelism
“If your threads are fighting, the CPU already lost interest.”
False sharing is the silent assassin.
Locks aren’t your main enemy; cache-line contention is.
If multiple threads touch the same cache line:
- You’re not scaling
- You’re arguing in silicon
Parallelism only works when data ownership is clean.
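The classic fix is to give each thread's data its own cache line. A minimal sketch, assuming a 64-byte line (common on x86, but verify for your CPU):

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumption: 64-byte lines; check your CPU */

/* One counter per thread. Without the padding, adjacent counters
 * share a cache line, and every increment by one thread invalidates
 * the line in every other thread's cache: false sharing. */
typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];  /* round the struct up to one line */
} PaddedCounter;
```

With C11 you can express the same intent as `_Alignas(64)` from `<stdalign.h>`. Either way, each thread now owns its line outright, and increments stop ricocheting through the coherence protocol.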
Single-Pass > Multi-Pass (Unless You Hate Yourself)
Single-pass designs:
- Load once
- Compute everything
- Move on
Multi-pass designs:
- Reload data
- Miss cache
- Regret life choices
ANN pipelines should feel like a conveyor belt, not a boomerang.
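Concretely: if you need both a norm and a dot product, don't run two loops over the same buffer. Fuse them, so each element crosses the cache exactly once (a toy sketch, names invented for illustration):

```c
#include <stddef.h>

/* Single pass: one load of x[i] feeds both accumulators while the
 * value is still in a register. A two-pass version would reload the
 * whole buffer, and if it doesn't fit in cache, re-miss on all of it. */
void norm_and_dot_fused(const float *x, const float *y, size_t n,
                        float *norm2, float *dot) {
    float s = 0.0f, d = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += x[i] * x[i];  /* reuse x[i] immediately */
        d += x[i] * y[i];
    }
    *norm2 = s;
    *dot = d;
}
```

Load once, compute everything, move on.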
Cache Should Be Hot, Not On Vacation
- Warm-up matters.
- Batching matters.
- Access order matters.
If your working set doesn’t fit in cache, shrink it.
If it can fit, reuse it aggressively.
Idle cache is wasted performance.
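Batching is mostly a loop-order trick. Sketch (hypothetical function, not from any library): walk the index in cache-sized blocks, and run the whole query batch against each block while it's still hot, instead of dragging the full index through cache once per query:

```c
#include <stddef.h>

/* Blocked scan: each index block is loaded once and reused by every
 * query in the batch while it is still resident in cache. Pick
 * `block` so block * dim * sizeof(float) fits comfortably in L1/L2. */
void batched_scan(const float *index, size_t n_vecs, size_t dim,
                  const float *queries, size_t n_q,
                  size_t block, float *dists /* n_q x n_vecs */) {
    for (size_t b = 0; b < n_vecs; b += block) {
        size_t end = (b + block < n_vecs) ? b + block : n_vecs;
        for (size_t q = 0; q < n_q; q++) {       /* reuse the hot block */
            const float *qu = queries + q * dim;
            for (size_t i = b; i < end; i++) {
                const float *v = index + i * dim;
                float d = 0.0f;
                for (size_t j = 0; j < dim; j++) {
                    float t = v[j] - qu[j];
                    d += t * t;
                }
                dists[q * n_vecs + i] = d;
            }
        }
    }
}
```

Same arithmetic as the naive version; the only change is that each block of the index is fetched once per batch instead of once per query.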
Prefetching Is Free Performance (If You Deserve It)
Sequential access lets the CPU help you.
Random jumps make it give up.
Design layouts so the CPU can guess what you’ll need next.
Yes, **CPUs are psychic**. No, you’re not using it.
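And when the hardware can't guess (say, a gather through an index list), you can tell it. A GCC/Clang-specific sketch using the real `__builtin_prefetch` builtin; the function around it is invented for illustration:

```c
#include <stddef.h>

/* Gather through an index list: the hardware prefetcher can't predict
 * a[order[i+1]], but we can, so we hint it one iteration ahead.
 * __builtin_prefetch(addr, rw, locality) is GCC/Clang-specific.
 * Don't bother for plain sequential scans; hardware already wins there. */
float sum_with_prefetch(const float *a, const int *order, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(&a[order[i + 1]], 0 /* read */, 1);
        s += a[order[i]];
    }
    return s;
}
```

The prefetch is only a hint: correctness is unchanged, and whether it helps depends on how much work hides between the hint and the use.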
Branches Are Also Memory Problems
“Branch misprediction is just cache miss with extra drama.”
Unpredictable branches:
- Break instruction flow
- Stall pipelines
- Trash performance
Branchless or predictable code keeps execution smooth and cache-friendly.
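A small illustrative example (both functions are toys I made up): counting elements under a threshold, once with a data-dependent branch and once without. On random data the branchy version mispredicts roughly half the time; the branchless one has nothing to mispredict:

```c
#include <stddef.h>

/* Branchy: the comparison becomes a conditional jump the predictor
 * must guess. Random data => roughly coin-flip mispredictions. */
size_t count_below_branchy(const float *x, size_t n, float t) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (x[i] < t)
            c++;
    return c;
}

/* Branchless: the comparison's 0/1 result is added directly, so the
 * loop body is straight-line code with no jump to mispredict. */
size_t count_below_branchless(const float *x, size_t n, float t) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (size_t)(x[i] < t);
    return c;
}
```

Compilers often do this transformation for you, but only when the code doesn't box them in; writing it branchless makes the intent unmissable.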
Alignment, Padding, and the Stuff Everyone Ignores
- Alignment matters.
- Padding matters.
- Cache line size matters.
Misaligned structures don’t fail loudly.
They fail slowly.
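A sketch of doing it on purpose, using C11's `aligned_alloc` (the wrapper function is mine; note `aligned_alloc` requires the size to be a multiple of the alignment, and MSVC spells it `_aligned_malloc` instead):

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/* Allocate a vector buffer starting exactly on a 64-byte cache-line
 * boundary, so fixed-size records don't straddle two lines.
 * Assumes 64-byte lines; C11 aligned_alloc needs size % alignment == 0. */
float *alloc_vectors(size_t n, size_t dim) {
    size_t bytes = n * dim * sizeof(float);
    size_t rounded = (bytes + 63) & ~(size_t)63;  /* round up to 64 */
    return aligned_alloc(64, rounded);
}
```

A misaligned 64-byte record costs two cache lines per access instead of one. Nothing crashes; everything just quietly takes twice the memory traffic.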
Predictability Beats Peak Speed
ANN systems must be:
- Stable
- Predictable
- Boring under load
Spiky latency is worse than slightly slower averages.
Caches like consistency. So do users.
“ANN is not algorithm engineering. It’s memory diplomacy.”
If your system:
- Rarely touches RAM
- Keeps cache hot
- Avoids contention
- Moves linearly through data
Then, and only then, do you get speed.
Everything else is just math cosplay.