Vector Database Selection Is Not a Performance Decision

#ai #rag #architecture #database

Everyone is benchmarking the wrong thing.

The conversations I keep seeing in enterprise AI architecture circles treat vector database selection as a performance optimization problem. Which database has the best recall at k=10? Which has the lowest query latency at a million vectors? Which scales most efficiently to a billion records?

These are real questions. They are also mostly irrelevant to the actual decision most enterprises need to make.

Here is the uncomfortable truth about vector database selection for enterprise RAG deployments: at the scale of most enterprise knowledge bases — tens of millions of vectors, not billions — every serious vector database performs adequately. The performance differences between Pinecone, Weaviate, Qdrant, Milvus, and pgvector at 10 million vectors are not going to be the factor that determines whether your enterprise AI deployment succeeds.

The factors that determine success are almost entirely about operational fit, security architecture, and deployment model. Not benchmark scores.

The Questions Nobody Puts in the Benchmark

When a team benchmarks vector databases, they typically measure: queries per second, recall at k, indexing throughput, and latency percentiles. These metrics tell you how the system performs under ideal conditions with clean data and standard query patterns.

They don't tell you:

How does the system handle multi-tenant access control, where user A must not be able to retrieve vectors derived from user B's documents? This is the most common enterprise requirement and the most common gap in vector database capabilities.

How does the system behave when the embedding index and the document metadata are out of sync — when documents have been updated or deleted but the vector index hasn't been updated yet? In production environments with active document corpora, this state is the norm, not the exception.

What does the operational maintenance burden look like? Index compaction, garbage collection for deleted vectors, backup and restore procedures, version upgrades — these operational costs don't show up in benchmarks but accumulate over years of production operation.

How does the system integrate with your existing identity provider and permission model? An enterprise that runs everything through Okta or Azure AD (now Microsoft Entra ID) needs a vector database that can enforce access controls consistent with those policies, not a separate permission model that must be manually kept in sync.

What is the vendor's posture on data residency and subprocessor chains? For a managed vector database service, your indexed embeddings — which are derived from your proprietary documents — live on the vendor's infrastructure. The data handling implications are distinct from the inference API question but no less significant.
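The index drift question above is the easiest one to make concrete. What follows is a minimal sketch, not a production reconciler. It assumes you can list a version number per document from both the source of truth and the vector index; the function name `find_index_drift` and the sample document IDs are hypothetical.

```python
def find_index_drift(source_versions, indexed_versions):
    """Compare the document store with the vector index.

    source_versions:  doc_id -> current version in the source of truth
    indexed_versions: doc_id -> version that was embedded into the index
    """
    # Vectors whose source document has been deleted.
    orphaned = sorted(d for d in indexed_versions if d not in source_versions)
    # Documents that exist but were never embedded.
    missing = sorted(d for d in source_versions if d not in indexed_versions)
    # Documents updated after they were last embedded.
    stale = sorted(
        d for d in source_versions
        if d in indexed_versions and indexed_versions[d] < source_versions[d]
    )
    return {"orphaned": orphaned, "missing": missing, "stale": stale}


# Hypothetical sample data for illustration only.
source_versions = {"policy-001": 3, "handbook-007": 1, "memo-042": 2}
indexed_versions = {"policy-001": 2, "handbook-007": 1, "memo-013": 1}

drift = find_index_drift(source_versions, indexed_versions)
print(drift)
```

Orphaned vectors are the category that matters most for compliance, because they keep deleted content retrievable. How cheaply a given database lets you enumerate indexed IDs and versions is itself worth checking during evaluation.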

The Access Control Problem Is Harder Than It Looks

I want to spend a moment on multi-tenant access control because it is consistently the vector database failure that enterprise architects discover too late.

The naive implementation of enterprise RAG — index everything, retrieve based on semantic similarity, filter by access control after retrieval — has a fundamental problem: the retrieval step returns results without respect to permissions, and the post-retrieval filtering can inadvertently expose that restricted content exists.

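If user A runs a query that retrieves a chunk from a restricted document, that chunk has already been transmitted to the application layer by the time the permission filter removes it. The filter keeps it out of the response, but the retrieval has confirmed that the document exists. In some enterprise contexts, this is a compliance issue even if the content never reaches the user. Post-retrieval filtering also degrades results: when restricted chunks occupy slots in the top k, the user gets fewer results than requested even though authorized matches exist further down the ranking.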

The correct architecture is pre-retrieval access control: the vector database query itself is scoped to vectors that the requesting user is authorized to access, so restricted content never enters the retrieval pipeline. This requires the vector database to support attribute filtering at query time, meaning the ability to restrict the search by metadata fields, including access control attributes, before or during the similarity search rather than after it.

Not all vector databases implement this efficiently. The ones that don't create a fundamental architectural problem for multi-tenant enterprise deployments that no amount of application-layer filtering can cleanly resolve.
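The difference between the two approaches can be sketched with a toy in-memory index. This is an illustration under stated assumptions, not any vendor's API. It uses brute-force cosine similarity instead of an approximate index, the `allowed_groups` field is a hypothetical metadata attribute, and `user_groups` stands in for group claims that would come from your identity provider.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    vector: tuple
    allowed_groups: frozenset  # hypothetical access control attribute


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(chunks, query, k):
    return sorted(chunks, key=lambda c: cosine(c.vector, query), reverse=True)[:k]


def post_filter_search(index, query, user_groups, k):
    # Similarity first, permissions second: restricted chunks reach this layer.
    retrieved = top_k(index, query, k)
    visible = [c for c in retrieved if c.allowed_groups & user_groups]
    return retrieved, visible


def pre_filter_search(index, query, user_groups, k):
    # Permissions first: only authorized chunks are ever ranked.
    authorized = [c for c in index if c.allowed_groups & user_groups]
    return top_k(authorized, query, k)


# Hypothetical sample data for illustration only.
INDEX = [
    Chunk("hr-salary-bands", (0.9, 0.1), frozenset({"hr"})),
    Chunk("public-benefits-faq", (0.8, 0.2), frozenset({"hr", "engineering"})),
    Chunk("eng-onboarding", (0.1, 0.9), frozenset({"engineering"})),
]

query = (1.0, 0.0)
engineer = frozenset({"engineering"})

retrieved, visible = post_filter_search(INDEX, query, engineer, k=2)
scoped = pre_filter_search(INDEX, query, engineer, k=2)

print([c.chunk_id for c in retrieved])  # what the application layer received
print([c.chunk_id for c in visible])    # what survives the post-filter
print([c.chunk_id for c in scoped])     # what pre-filtering returns
```

In the post-filter path the restricted HR chunk is handed to the application before it is dropped, and the engineer ends up with one result instead of two. In the pre-filter path the restricted chunk is never ranked, and both slots are filled with authorized content.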

Self-Hosted versus Managed: The Decision That Matters More Than Which Database

The most consequential vector database decision most enterprises will make is not which database to use. It is whether to run it themselves or use a managed service.

Managed vector database services offer operational simplicity: no infrastructure to manage, automatic scaling, vendor-handled upgrades and maintenance. The trade-off is that your indexed embeddings — derived from your proprietary documents — exist on the vendor's infrastructure.

This is not a hypothetical concern. Embeddings are not the raw text they represent, but they are semantically rich representations of that text. Embedding inversion, which attempts to reconstruct source text from vectors, and membership inference attacks on embedding spaces are both active research areas. The risk is not equivalent to storing the original documents externally, but it is not zero.

For enterprises that have made the architectural decision to keep their AI inference self-hosted specifically to avoid proprietary data leaving their infrastructure, running a managed external vector database is an inconsistency in that security posture. The inference is self-hosted but the retrieval layer sends embedding queries to an external service.

A self-hosted vector database — Weaviate, Qdrant, or pgvector running on your own infrastructure — closes this gap. It adds operational overhead. For enterprises where the data sovereignty argument is the primary driver of the self-hosted decision, it is the architecturally consistent choice.

What the Selection Decision Should Actually Look Like

Start with three questions in order.

First: what are your access control requirements? If you need document-level permissions enforced at the retrieval layer for multi-tenant data, eliminate any option that doesn't support attribute filtering at query time with acceptable performance.

Second: self-hosted or managed? If your data governance requirements or security architecture mandate self-hosted, eliminate managed services regardless of their other merits. If managed is acceptable, the operational simplicity benefit is real and worth weighting.

Third: what does your operational team look like? A self-hosted vector database requires someone who can maintain it. If your team has the capacity, the operational overhead is manageable. If it doesn't, a managed service may be the pragmatic choice even with its data handling trade-offs.

Performance benchmarks belong at the end of this process, as a tiebreaker between options that have passed all three filters, not at the beginning as the primary selection criterion.
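The ordering can be written down as an elimination procedure. The sketch below is one possible encoding, assuming each candidate can be described by a few yes/no capabilities. The class names, field names, candidate labels, and benchmark figures are all hypothetical placeholders, not measurements of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    query_time_acl_filtering: bool
    self_hostable: bool
    managed_offering: bool
    benchmark_qps: float  # placeholder figure, used only as a tiebreaker


@dataclass(frozen=True)
class Requirements:
    needs_retrieval_layer_acl: bool
    must_self_host: bool
    team_can_operate_database: bool


def shortlist(candidates, req):
    remaining = list(candidates)
    # Question 1: access control at the retrieval layer.
    if req.needs_retrieval_layer_acl:
        remaining = [c for c in remaining if c.query_time_acl_filtering]
    # Question 2: a governance mandate to self-host overrides everything else.
    if req.must_self_host:
        remaining = [c for c in remaining if c.self_hostable]
    # Question 3: without operational capacity, only managed options remain.
    elif not req.team_can_operate_database:
        remaining = [c for c in remaining if c.managed_offering]
    # Benchmarks come last, as a tiebreaker among the survivors.
    return sorted(remaining, key=lambda c: c.benchmark_qps, reverse=True)


# Hypothetical candidates for illustration only.
CANDIDATES = [
    Candidate("option-a", False, True, True, 9000.0),
    Candidate("option-b", True, True, False, 5000.0),
    Candidate("option-c", True, False, True, 7000.0),
]

regulated = Requirements(True, must_self_host=True, team_can_operate_database=True)
small_team = Requirements(True, must_self_host=False, team_can_operate_database=False)

print([c.name for c in shortlist(CANDIDATES, regulated)])
print([c.name for c in shortlist(CANDIDATES, small_team)])
```

The candidate with the highest placeholder throughput is eliminated at the first question in both scenarios. A self-hosting mandate combined with a team that cannot operate the database is a staffing problem this sketch does not try to resolve; it simply honors the mandate.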

The fastest vector database that can't enforce your access control requirements is not a viable enterprise option. The one that can, and that fits your operational and governance constraints, is the right answer regardless of where it lands on a benchmark leaderboard.
