I’ve worked on large partitioned PostgreSQL systems (tens of TBs, heavy ingestion, high query fan-out).
One thing that’s been interesting while diving into AI infrastructure:
Embedding storage systems repeat many of the same distributed storage lessons we already learned in relational systems.
The difference? The failure modes are just harder to see.
Let’s break down a few parallels.
1. Hot Partitions Kill Performance (Even in Vector Systems)
In large time-series databases, recent partitions get hammered:
- Most writes go to the newest partition
- Most reads target recent data
- Vacuum and index maintenance concentrate there
You get a write hotspot and a read hotspot at the same time.
Embedding systems have similar patterns:
- Frequently queried namespaces
- Recently ingested documents
- Popular tenants
- Trending content
If all of that lands in a single logical index (or shard), your ANN structure becomes the hotspot. Even HNSW doesn’t save you here. Why?
Because ANN search still depends on memory locality and graph traversal efficiency.
When a single shard handles most of the traffic:
- Cache pressure rises
- Graph traversal gets deeper as memory fragments
- CPU/GPU utilization stays uneven
Partitioning strategy often matters more than the choice of ANN algorithm.
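One common mitigation is to spread a hot tenant's traffic across shards deterministically, by hashing the tenant ID together with a rotating salt (e.g. a time bucket). Here's a minimal sketch; the function name, shard count, and salting scheme are illustrative, not from any particular system:

```python
import hashlib

def route_shard(tenant_id: str, num_shards: int, salt: str = "") -> int:
    """Deterministically map a tenant (plus an optional salt) to a shard.

    With an empty salt, a hot tenant always lands on one shard.
    Adding a salt such as a day bucket rotates that tenant's
    writes across shards instead of pinning a single node.
    """
    key = f"{tenant_id}:{salt}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same tenant spread over 30 daily buckets hits multiple shards.
shards = {route_shard("tenant-42", 8, salt=f"day-{d}") for d in range(30)}
```

The trade-off is classic: salting spreads load but means a tenant's data lives on several shards, so reads fan out. That is the same tension time-partitioned Postgres tables have always had.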
2. Partition Pruning > Index Cleverness
In large relational systems, performance gains don’t come from better indexes alone. They come from avoiding touching irrelevant data entirely.
Partition pruning reduces the working set before the query planner even considers index scans.
In embedding systems, the equivalent pattern is:
- Tenant-level isolation
- Time-based segmentation
- Metadata filtering before vector search
If you’re not aggressively reducing the candidate set before running similarity search, you’re just brute-forcing with extra math.
For example:
Instead of:
Search entire 100M vector space
You do:
Filter tenant + content_type + time range
Then run ANN on the reduced subset
Cost per query drops and latency improves.
Just classic database engineering.
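The filter-then-search pattern fits in a few lines. A brute-force sketch with NumPy, where the metadata fields (tenant, content_type) and the dataset are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10_000, 64
vectors = rng.standard_normal((N, D)).astype(np.float32)
# Hypothetical per-vector metadata.
tenants = rng.integers(0, 100, size=N)
content_types = rng.integers(0, 5, size=N)

def search(query, tenant, content_type, k=5):
    """Reduce the candidate set with metadata first,
    then score only the survivors."""
    mask = (tenants == tenant) & (content_types == content_type)
    candidate_ids = np.flatnonzero(mask)
    if candidate_ids.size == 0:
        return []
    subset = vectors[candidate_ids]
    scores = subset @ query  # dot-product similarity on the reduced set
    top = np.argsort(-scores)[:k]
    return [(int(candidate_ids[i]), float(scores[i])) for i in top]

query = rng.standard_normal(D).astype(np.float32)
results = search(query, tenant=7, content_type=2)
```

In this toy setup the filter shrinks ~10M candidate comparisons per query down to a few dozen. A real system would push the same filter into the index (partition key, namespace, or pre-filtered ANN), but the principle is identical.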
3. Too Many Partitions Is a Problem
Partitioning is powerful (with the right partition count)
In large Postgres systems, too many partitions cause:
- Increased planning time
- Metadata bloat
- Autovacuum lag
- More file handles
In embedding systems:
- Too many small indexes fragment memory
- Background rebuild jobs multiply
- GPU memory utilization becomes uneven
Sharding everything blindly is not architecture. It’s postponing hard decisions.
The sweet spot usually involves:
- Logical grouping
- Monitoring slow queries
- Periodic rebalancing
Exactly like relational systems.
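Periodic rebalancing often starts with a compaction plan: find undersized segments and group them into merge batches. A greedy sketch, with made-up thresholds; real compactors also weigh deletes, segment age, and rebuild cost:

```python
def plan_merges(segment_sizes, min_size, target_size):
    """Group undersized segments so each merged group approaches
    target_size. Returns lists of segment indices to merge.
    Minimal greedy sketch, smallest segments first."""
    small = sorted(
        (i for i, s in enumerate(segment_sizes) if s < min_size),
        key=lambda i: segment_sizes[i],
    )
    groups, current, total = [], [], 0
    for idx in small:
        current.append(idx)
        total += segment_sizes[idx]
        if total >= target_size:
            groups.append(current)
            current, total = [], 0
    if len(current) > 1:  # leftovers still worth merging together
        groups.append(current)
    return groups

sizes = [120, 5_000, 300, 80, 4_500, 50, 200]
plan = plan_merges(sizes, min_size=1_000, target_size=600)
```

Running this on a schedule (rather than merging eagerly on every write) is the same instinct as batching autovacuum and partition maintenance in Postgres.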
4. Re-indexing Cost Is Underestimated
In traditional databases, index rebuilds are expensive and operationally sensitive.
In embedding systems, the rebuild is even more dangerous, because model evolution forces:
- Full re-embeddings
- Bulk writes
- Index rebuilds
- Graph regeneration
If you don’t design storage with model versioning in mind, you’ll eventually hit:
- Dual-index storage spikes
- Downtime windows
- Cost explosions
A practical pattern:
- Store embeddings with model_version column
- Maintain versioned indexes
- Gradually phase traffic
- Garbage collect old versions
Treat model upgrades like schema migrations.
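The "gradually phase traffic" step can be as simple as stable hash-based routing between versioned indexes. A sketch, where the registry, version names, and fractions are all hypothetical:

```python
import hashlib

# Hypothetical rollout registry: fraction of queries per new version.
ROLLOUT = {"v2": 0.10}  # send ~10% of queries to the v2 index
DEFAULT_VERSION = "v1"

def pick_index_version(query_id: str) -> str:
    """Stable per-query routing: the same query_id always resolves
    to the same version, so results stay consistent mid-rollout."""
    bucket = int.from_bytes(
        hashlib.sha256(query_id.encode()).digest()[:8], "big"
    ) / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for version, fraction in ROLLOUT.items():
        cumulative += fraction
        if bucket < cumulative:
            return version
    return DEFAULT_VERSION

share = sum(pick_index_version(f"q{i}") == "v2" for i in range(10_000)) / 10_000
```

Bump the fraction as confidence grows, then garbage-collect the old index once it reaches zero. Same playbook as a phased schema migration.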
5. Memory Locality Still Wins
ANN performance depends heavily on memory layout.
Fragmented shards → worse locality → more cache misses → higher tail latency.
Same principle as B-tree depth and BRIN locality. Even in “AI systems,” performance still collapses into memory access patterns.
The fundamentals haven’t changed.
6. At Scale, Embedding Systems Become Storage Systems
I initially thought embedding systems were purely ML-driven. That’s largely true at the POC stage.
Once you go to production, the worry shifts to handling big data efficiently. As in relational systems, that means tuning partition boundaries, managing index lifecycles, optimizing candidate pruning, and managing cost per query.
You’re building distributed storage systems. Vector math is just one layer.
Final Thought
As embedding systems move from prototypes to production, the conversation shifts.
From:
“How do we compute similarity?”
To:
“How do we scale, partition, isolate, and evolve this safely?”
The deeper you go, the more familiar the problems look.
And maybe that’s the exciting part!!!
AI systems are pushing us to revisit distributed systems fundamentals with a new set of constraints.
Curious how others are approaching partitioning and model versioning in real-world deployments.