DEV Community

Yasser B.

Posted on • Originally published at rivestack.io

I built a managed pgvector service and here's what I learned about vector search performance

*Cover image: NVMe vs cloud SSD pgvector performance benchmark visualization*

I Ran pgvector on NVMe vs Cloud SSD. The Difference Shocked Me.
2,000 queries per second at under 4ms. That's what I'm getting on a $35/month server. Let me tell you how I got there and what I had to build to make it work.

The problem started with a side project

I was building a RAG pipeline. Standard stuff: OpenAI embeddings, PostgreSQL, pgvector extension, HNSW index. Everything worked fine in development. Then I moved it to a managed database on a regular cloud provider and watched my query latency go from 4ms to 47ms under any real load.

I spent two days thinking my index was wrong. Wrong ef_search value. Wrong m parameter. Wrong dimension count. None of that was it.
The problem was the disk.
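For reference, the setup I was second-guessing looked roughly like this. The schema is illustrative, and the index parameters are pgvector's defaults:

```sql
-- Hypothetical schema for the RAG pipeline
CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  user_id   bigint,
  content   text,
  embedding vector(1536)  -- OpenAI text-embedding-3-small
);

-- HNSW index with the parameters I kept blaming
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);  -- pgvector defaults

-- The per-session knob I kept tuning, to no effect
SET hnsw.ef_search = 40;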

HNSW does not behave like a normal database query
Most conventional database workloads are dominated by sequential reads and well-cached B-tree lookups. Your disk reads a chunk of data in order, hands it back, done. SSDs are fast at this. Even gp3 cloud SSDs are fast at this.

HNSW is different. An HNSW index traversal is essentially a graph walk. You start at an entry point, compare distances, jump to neighbors, compare again, jump again. Each jump is a random read to a different location on disk. The more vectors you have, the more jumps, the more random reads.

Network-attached cloud SSDs are slow at random reads, because every read pays a round trip over the storage network. Local NVMe drives are very fast at random reads. That gap, which barely matters for most database workloads, is everything for pgvector at scale.
I didn't believe it until I benchmarked it myself.
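You can watch the random-read pattern yourself with a buffers-enabled query plan. In the output, `hit` pages came from Postgres's cache and `read` pages came from disk; on a cold or memory-pressured index, the graph walk shows up as a pile of `read` pages. Table and column names here are illustrative:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
ORDER BY embedding <=> '[0.01, 0.02, ...]'::vector  -- your query embedding
LIMIT 10;

-- In the plan, look for the index scan node's line like:
--   Buffers: shared hit=120 read=840
-- Each "read" is a random disk I/O generated by the HNSW traversal.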

The actual numbers

I tested 1 million vectors at 1536 dimensions (OpenAI text-embedding-3-small), HNSW index, cosine distance, ef_search=40, 16 concurrent clients.

- Cloud SSD backed instance: ~410 QPS, p95 latency 18ms
- NVMe: 2,150 QPS, p95 latency 2.8ms

Same PostgreSQL version. Same pgvector version. Same query. Same index parameters. Five times the throughput, six times lower latency. Just from the storage layer.

That's when I decided to build Rivestack.

Why PostgreSQL and not a dedicated vector database

Honest answer: because I didn't want to manage two databases.
The moment you move your vectors to Pinecone or Qdrant or Weaviate, you have split your data across two systems. Your relational data lives in Postgres. Your vectors live somewhere else. Every query that needs both involves a round trip between systems. That latency adds up fast.

With pgvector you write one query. You can filter by user_id, join against your documents table, and do a vector similarity search in a single SQL statement. That's not a minor convenience. It changes how you architect the whole application.
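A sketch of what that single statement can look like, against a hypothetical schema with a `chunks` table holding embeddings:

```sql
-- Tenant filter, relational join, and similarity ranking in one statement
SELECT d.id,
       d.title,
       c.embedding <=> $1 AS distance   -- $1 is the query embedding
FROM chunks c
JOIN documents d ON d.id = c.document_id
WHERE d.user_id = $2                    -- $2 is the tenant/user filter
ORDER BY c.embedding <=> $1
LIMIT 5;
```

One caveat worth knowing: HNSW applies filters after the graph walk, so a highly selective `WHERE` can return fewer rows than your `LIMIT`. Recent pgvector versions add iterative index scans (`hnsw.iterative_scan`) to mitigate this.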
The only real argument against pgvector is scale. If you have a billion vectors, you need a dedicated system. For the other 99% of applications, pgvector on NVMe is genuinely competitive.

What I actually built

Rivestack is a managed PostgreSQL service where pgvector is the whole point, not an afterthought. NVMe storage on every plan. HNSW pre-configured with sensible defaults. Daily backups, point-in-time recovery, HA failover if you want it.

The thing I kept running into with other managed Postgres providers is that pgvector is just... there. An extension you can enable. Nobody has tuned the storage layer for it. Nobody has written documentation for RAG use cases. Nobody has thought about what happens to your index when you have 5 million vectors and 50 concurrent clients.
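Some of that is just Postgres settings nobody surfaces for the vector use case. For example, HNSW builds on millions of rows go dramatically faster when the graph fits in memory during the build and the build runs in parallel. The values below are illustrative, not recommendations:

```sql
-- Give the HNSW build enough memory to keep the graph in RAM
SET maintenance_work_mem = '4GB';

-- pgvector supports parallel HNSW index builds
SET max_parallel_maintenance_workers = 4;

-- CONCURRENTLY avoids blocking writes on a live table
CREATE INDEX CONCURRENTLY ON documents
  USING hnsw (embedding vector_cosine_ops);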

That's the gap I'm filling.

The honest trade-offs

Rivestack is not for everyone. If you need built-in auth, storage buckets, or a real-time WebSocket layer, use Supabase. They're great at that. If you have hundreds of millions of vectors and need automatic sharding, use Pinecone. They're built for that.
Rivestack is for teams who need pgvector to actually perform under load and don't want to spend three days tuning PostgreSQL configuration to get there.

What I'd do differently

I spent too long on the benchmark tooling before I had a single user. Classic founder mistake. The infrastructure is solid but I should have shipped earlier and iterated based on real workloads instead of synthetic ones.

Also, I underestimated how much developers care about EU data residency. It's come up in every conversation I've had since launch. If you're building anything GDPR-adjacent, knowing your vectors never leave EU territory is not a minor detail.

Try it if you're building with embeddings
Free shared tier at rivestack.io, no credit card required. Paid plans start at $35/month for a dedicated node with 2 vCPU, 4GB RAM and 55GB NVMe.
If you're already running pgvector somewhere, migration is just pg_dump and pg_restore. Takes about five minutes for most databases.

What are you using for vector storage right now? And what's the biggest pain point you've hit with it? Genuinely curious whether the storage layer issue I ran into is common or whether I just had unusually bad luck with my cloud provider.
