Everyone loves algorithms. Nobody respects memory.
That’s why most “fast” ANN systems collapse the moment real queries show up.
Speed isn’t about FLOPs.
It’s about how often you annoy the cache.
RAM Is Not Your Friend
“Touching RAM is not data access. It’s a cry for help.”
If your query path hits RAM frequently, you already lost.
Modern CPUs are absurdly fast until they have to wait.
ANN systems don’t die from computation.
They die from memory latency wearing a nice benchmark suit.
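The difference is easy to see in code. Here's a minimal sketch (function names are mine, not from any library) contrasting a sequential scan, where the CPU can overlap loads, with a pointer chase, where each load's address depends on the previous load's value, so every cache miss stalls the pipeline at full price:

```c
#include <stddef.h>

/* Sequential sum: the next address is known in advance, so loads
 * overlap and the hardware prefetcher streams data into cache. */
long sum_sequential(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chase: each load's address comes from the previous load,
 * so misses serialize. Same data volume, wildly different latency. */
long sum_chase(const long *next, size_t n, size_t start) {
    long s = 0;
    size_t i = start;
    for (size_t k = 0; k < n; k++) {
        s += (long)i;
        i = (size_t)next[i];  /* the next address isn't known until now */
    }
    return s;
}
```

Both loops touch the same number of elements; on a large array that spills out of cache, only the second one makes the CPU wait on every step.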
Cache Is King, Everything Else Is Just Vibes
Your goal is simple:
- Keep data small
- Keep it contiguous
- Keep it reused
If the cache isn’t doing most of the work, your CPU is just stretching its legs.
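"Keep data small" is a number, not a vibe. A back-of-envelope helper (cache sizes below are illustrative, not universal; check your own CPU):

```c
#include <stddef.h>

/* How many d-dimensional float32 vectors fit in a cache of
 * cache_bytes? Illustrative arithmetic, not a library call. */
size_t vectors_that_fit(size_t cache_bytes, size_t dim) {
    return cache_bytes / (dim * sizeof(float));
}
```

A 128-dim float32 vector is 512 bytes: a typical 32 KiB L1d holds only 64 of them, a 1 MiB L2 about 2048. That's the budget your inner loop actually lives on.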
Memory Layout > Model Architecture
“You optimized the model. The layout optimized you.”
- AoS vs SoA isn’t academic.
- Pointer chasing isn’t a design choice. It’s self-sabotage.
Contiguous arrays win because:
- Fewer cache lines
- Predictable access
- Hardware prefetch actually works
Random access kills performance quietly.
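Here's what that looks like for a distance scan. Same math, two layouts (a hypothetical sketch, not any particular library's code): a flat contiguous buffer where vector `i` lives at `base + i*dim`, versus a pointer-per-vector layout where each vector is its own heap allocation:

```c
#include <stddef.h>

/* Flat, contiguous layout: sequential scan, few cache lines per
 * vector, hardware prefetch actually works. */
void scan_flat(const float *base, size_t n, size_t dim,
               const float *q, float *dist_out) {
    for (size_t i = 0; i < n; i++) {
        const float *v = base + i * dim;  /* next vector is adjacent */
        float d = 0.0f;
        for (size_t j = 0; j < dim; j++) {
            float t = v[j] - q[j];
            d += t * t;
        }
        dist_out[i] = d;
    }
}

/* Pointer-per-vector layout: identical math, but each vecs[i] can
 * live anywhere on the heap, so the scan hops between cache lines. */
void scan_pointers(const float *const *vecs, size_t n, size_t dim,
                   const float *q, float *dist_out) {
    for (size_t i = 0; i < n; i++) {
        const float *v = vecs[i];  /* extra dependent load per vector */
        float d = 0.0f;
        for (size_t j = 0; j < dim; j++) {
            float t = v[j] - q[j];
            d += t * t;
        }
        dist_out[i] = d;
    }
}
```

The second version produces the same distances and a much worse access pattern: one extra dependent load per vector, and no guarantee any two vectors share a page, let alone a cache line.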
Threads Fighting for Data Is Not Parallelism
“If your threads are fighting, the CPU already lost interest.”
False sharing is the silent assassin.
Locks aren’t your main enemy; cache-line contention is.
If multiple threads touch the same cache line:
- You’re not scaling
- You’re arguing in silicon
Parallelism only works when data ownership is clean.
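The classic fix is to give each thread's data its own cache line. A minimal sketch, assuming a 64-byte line (common on x86, but verify for your CPU):

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumption: 64-byte lines; check your CPU */

/* One counter per thread. Without the padding, adjacent counters
 * share a cache line, and every increment by one thread invalidates
 * the line in every other thread's cache: false sharing. */
typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];  /* round the struct up to one line */
} PaddedCounter;
```

With C11 you can express the same intent as `_Alignas(64)` from `<stdalign.h>`. Either way, each thread now owns its line outright, and increments stop ricocheting through the coherence protocol.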
Single-Pass > Multi-Pass (Unless You Hate Yourself)
Single-pass designs:
- Load once
- Compute everything
- Move on
Multi-pass designs:
- Reload data
- Miss cache
- Regret life choices
ANN pipelines should feel like a conveyor belt, not a boomerang.
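Concretely: if you need both a norm and a dot product, don't run two loops over the same buffer. Fuse them, so each element crosses the cache exactly once (a toy sketch, names invented for illustration):

```c
#include <stddef.h>

/* Single pass: one load of x[i] feeds both accumulators while the
 * value is still in a register. A two-pass version would reload the
 * whole buffer, and if it doesn't fit in cache, re-miss on all of it. */
void norm_and_dot_fused(const float *x, const float *y, size_t n,
                        float *norm2, float *dot) {
    float s = 0.0f, d = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += x[i] * x[i];  /* reuse x[i] immediately */
        d += x[i] * y[i];
    }
    *norm2 = s;
    *dot = d;
}
```

Load once, compute everything, move on.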
Cache Should Be Hot, Not On Vacation
- Warm-up matters.
- Batching matters.
- Access order matters.
If your working set doesn’t fit in cache, shrink it.
If it can fit, reuse it aggressively.
Idle cache is wasted performance.
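Batching is mostly a loop-order trick. Sketch (hypothetical function, not from any library): walk the index in cache-sized blocks, and run the whole query batch against each block while it's still hot, instead of dragging the full index through cache once per query:

```c
#include <stddef.h>

/* Blocked scan: each index block is loaded once and reused by every
 * query in the batch while it is still resident in cache. Pick
 * `block` so block * dim * sizeof(float) fits comfortably in L1/L2. */
void batched_scan(const float *index, size_t n_vecs, size_t dim,
                  const float *queries, size_t n_q,
                  size_t block, float *dists /* n_q x n_vecs */) {
    for (size_t b = 0; b < n_vecs; b += block) {
        size_t end = (b + block < n_vecs) ? b + block : n_vecs;
        for (size_t q = 0; q < n_q; q++) {       /* reuse the hot block */
            const float *qu = queries + q * dim;
            for (size_t i = b; i < end; i++) {
                const float *v = index + i * dim;
                float d = 0.0f;
                for (size_t j = 0; j < dim; j++) {
                    float t = v[j] - qu[j];
                    d += t * t;
                }
                dists[q * n_vecs + i] = d;
            }
        }
    }
}
```

Same arithmetic as the naive version; the only change is that each block of the index is fetched once per batch instead of once per query.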
Prefetching Is Free Performance (If You Deserve It)
Sequential access lets the CPU help you.
Random jumps make it give up.
Design layouts so the CPU can guess what you’ll need next.
Yes, **CPUs are psychic**. No, you’re not using it.
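And when the hardware can't guess (say, a gather through an index list), you can tell it. A GCC/Clang-specific sketch using the real `__builtin_prefetch` builtin; the function around it is invented for illustration:

```c
#include <stddef.h>

/* Gather through an index list: the hardware prefetcher can't predict
 * a[order[i+1]], but we can, so we hint it one iteration ahead.
 * __builtin_prefetch(addr, rw, locality) is GCC/Clang-specific.
 * Don't bother for plain sequential scans; hardware already wins there. */
float sum_with_prefetch(const float *a, const int *order, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(&a[order[i + 1]], 0 /* read */, 1);
        s += a[order[i]];
    }
    return s;
}
```

The prefetch is only a hint: correctness is unchanged, and whether it helps depends on how much work hides between the hint and the use.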
Branches Are Also Memory Problems
“Branch misprediction is just cache miss with extra drama.”
Unpredictable branches:
- Break instruction flow
- Stall pipelines
- Trash performance
Branchless or predictable code keeps execution smooth and cache-friendly.
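A small illustrative example (both functions are toys I made up): counting elements under a threshold, once with a data-dependent branch and once without. On random data the branchy version mispredicts roughly half the time; the branchless one has nothing to mispredict:

```c
#include <stddef.h>

/* Branchy: the comparison becomes a conditional jump the predictor
 * must guess. Random data => roughly coin-flip mispredictions. */
size_t count_below_branchy(const float *x, size_t n, float t) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (x[i] < t)
            c++;
    return c;
}

/* Branchless: the comparison's 0/1 result is added directly, so the
 * loop body is straight-line code with no jump to mispredict. */
size_t count_below_branchless(const float *x, size_t n, float t) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (size_t)(x[i] < t);
    return c;
}
```

Compilers often do this transformation for you, but only when the code doesn't box them in; writing it branchless makes the intent unmissable.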
Alignment, Padding, and the Stuff Everyone Ignores
- Alignment matters.
- Padding matters.
- Cache line size matters.
Misaligned structures don’t fail loudly.
They fail slowly.
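A sketch of doing it on purpose, using C11's `aligned_alloc` (the wrapper function is mine; note `aligned_alloc` requires the size to be a multiple of the alignment, and MSVC spells it `_aligned_malloc` instead):

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/* Allocate a vector buffer starting exactly on a 64-byte cache-line
 * boundary, so fixed-size records don't straddle two lines.
 * Assumes 64-byte lines; C11 aligned_alloc needs size % alignment == 0. */
float *alloc_vectors(size_t n, size_t dim) {
    size_t bytes = n * dim * sizeof(float);
    size_t rounded = (bytes + 63) & ~(size_t)63;  /* round up to 64 */
    return aligned_alloc(64, rounded);
}
```

A misaligned 64-byte record costs two cache lines per access instead of one. Nothing crashes; everything just quietly takes twice the memory traffic.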
Predictability Beats Peak Speed
ANN systems must be:
- Stable
- Predictable
- Boring under load
Spiky latency is worse than slightly slower averages.
Caches like consistency. So do users.
“ANN is not algorithm engineering. It’s memory diplomacy.”
If your system:
- Rarely touches RAM
- Keeps cache hot
- Avoids contention
- Moves linearly through data
Then, and only then, do you get speed.
Everything else is just math cosplay.