This is where everything comes together.
👉 In case you missed it:
- Part 1 → Embeddings, Dimensions, and Similarity Search
- Part 2 → Search Patterns, Filtering, and Chunking
By now, you understand the fundamentals and how retrieval works. The real question is: which AWS service should you actually use?
There's no single "best" option. Each service was designed for a different primary workload and inherited vector search as a capability. That origin story shapes everything — its strengths, limitations, and the cost you'll pay.
Let's break them down.
The Services
OpenSearch Serverless
What it is: A distributed search engine with native vector, keyword, and hybrid search.
Why choose it: OpenSearch is the only AWS service that handles full-text search and vector search natively in one engine. Its Neural Search feature automates the entire hybrid pipeline — you send a query, and it runs keyword + semantic search, then merges results using normalization and combination techniques. No manual score merging required.
Key details:
- Supports the HNSW algorithm for approximate nearest-neighbor indexing
- Distance metrics: Euclidean, Cosine, Inner Product (Dot Product)
- Hybrid search with Neural Search pipeline (keyword + vector, merged automatically)
- GPU-accelerated vector indexing (launched Dec 2025) for faster large-scale ingestion
- Bedrock Knowledge Bases: Yes (most commonly used vector store for Bedrock KB)
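Under the hood, a hybrid query is a single request body with one keyword leg and one neural leg; the search pipeline attached to the index normalizes and merges the two score sets. A minimal sketch in Python, assuming an index with a `text` field, a `text_embedding` k-NN field, and a deployed embedding model (all three names are placeholders):

```python
def build_hybrid_query(query_text, model_id, k=10):
    """Build an OpenSearch hybrid query body pairing a keyword
    (match) clause with a neural (semantic) clause. The search
    pipeline referenced at request time normalizes and merges
    the scores from both legs."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Keyword leg: BM25 scoring over the raw text field.
                    {"match": {"text": {"query": query_text}}},
                    # Semantic leg: the neural clause embeds query_text
                    # with the deployed model and runs k-NN search.
                    {
                        "neural": {
                            "text_embedding": {
                                "query_text": query_text,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        }
    }

body = build_hybrid_query("waterproof trail shoes", model_id="my-model-id")
# Send with opensearch-py: client.search(index="docs", body=body,
# params={"search_pipeline": "my-norm-pipeline"})
```

The pipeline itself (a `normalization-processor` with min-max normalization and weighted combination) is configured once on the cluster; queries just reference it.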
The catch — cost:
OpenSearch Serverless bills by OCU-hours (OpenSearch Compute Units). The minimum is 2 OCUs for production (1 indexing + 1 search, each with HA redundancy) — roughly $350/month before you store a single vector. A dev/test mode drops this to ~$174/month with 0.5 OCUs each. Vector collections also require their own dedicated OCUs — they can't share with search/time-series collections.
For small projects or prototypes, this minimum cost is the biggest friction point. But at scale, the automatic scaling and mature feature set make it the go-to for production RAG.
Scaling: Automatic. OCUs scale up based on workload and scale back down. You set a maximum OCU limit to cap costs.
Aurora PostgreSQL (pgvector)
What it is: A relational database with the open-source pgvector extension for vector search.
Why choose it: If your application already runs on PostgreSQL, pgvector lets you add vector search without introducing a new service. Your vectors live alongside your relational data — same transactions, same SQL, same backups. This is powerful when your queries combine traditional filters (WHERE category = 'shoes' AND price < 100) with vector similarity.
Key details:
- pgvector 0.8.0 (April 2025) brought major improvements: up to 9x faster filtered queries with iterative index scans, and significantly better recall on filtered searches
- Supports HNSW and IVFFlat indexing
- Distance metrics: Euclidean, Cosine, Inner Product
- Hybrid search: Manual — combine tsvector (keyword) and pgvector (semantic) in a single SQL query
- Bedrock Knowledge Bases: Yes
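That filter-plus-similarity combination is ordinary SQL. A sketch assuming a hypothetical `products` table with a pgvector `embedding` column (3 dimensions only to keep the example short):

```python
# `<=>` is pgvector's cosine-distance operator; `<->` is Euclidean
# distance and `<#>` is negative inner product.
SQL = """
SELECT id, name, price
FROM products
WHERE category = %(category)s      -- ordinary relational filter
  AND price < %(max_price)s
ORDER BY embedding <=> %(query_vec)s::vector   -- vector similarity
LIMIT 10;
"""

params = {
    "category": "shoes",
    "max_price": 100,
    "query_vec": "[0.12, -0.03, 0.91]",  # embedding of the user query
}

# With psycopg: cur.execute(SQL, params)
```

Because the filter and the ORDER BY live in one statement, pgvector 0.8.0's iterative index scans can keep pulling index candidates until the WHERE clause is satisfied, which is where the filtered-query speedup comes from.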
The catch — you own the tuning:
pgvector gives you control, but that means you're responsible for index parameter tuning (ef_construction, m, ef_search), choosing between relaxed_order and strict_order for iterative scans, and managing the trade-off between recall and latency. It's not "plug and play" like OpenSearch Neural Search.
Scaling: Aurora Serverless v2 scales compute in fine-grained ACU increments. Read replicas handle query scale-out. I/O-Optimized configuration helps with cost predictability for vector workloads.
Amazon S3 Vectors
What it is: The first cloud object store with native vector support. Purpose-built for storing and querying vectors at massive scale and minimal cost.
Why choose it: When cost is the primary concern and you don't need millisecond latencies. S3 Vectors can reduce the total cost of storing and querying vectors by up to 90% compared to traditional vector databases.
Key details:
- Up to 2 billion vectors per index, up to 10,000 indexes per vector bucket
- Distance metrics: Cosine and Euclidean only (Inner Product not supported)
- Metadata filtering applied during the vector search itself (not purely pre- or post-filter)
- Fully serverless — no infrastructure to provision or manage
- Does not support hybrid search (semantic search only)
- Bedrock Knowledge Bases: Yes
The catch — latency and throughput:
S3 Vectors is designed for infrequent-to-moderate query patterns. Infrequent queries return in under 1 second; more frequent queries get down to ~100ms. Write throughput caps at ~2,500 vectors/second per index, and query throughput is in the hundreds of requests/second per index. This is not the right choice for real-time, high-QPS applications.
Cost example:
For 250K vectors across 40 indexes with 1M queries/month: approximately $11/month. Compare that to OpenSearch Serverless's $350/month minimum.
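The arithmetic behind numbers like this is worth internalizing: float32 embeddings are small. A quick footprint calculator (the 1024-dimension figure is an assumption, since the example above doesn't specify one):

```python
def vector_storage_gb(num_vectors, dims, bytes_per_dim=4, metadata_bytes=0):
    """Raw storage footprint of float32 embeddings, excluding index
    overhead and per-vector metadata unless supplied."""
    total_bytes = num_vectors * (dims * bytes_per_dim + metadata_bytes)
    return total_bytes / 1024**3

# The example above: 250K vectors, assuming 1024-dim float32
print(round(vector_storage_gb(250_000, 1024), 2))  # ~0.95 GB raw
```

Roughly one gigabyte of raw data is why object-storage pricing lands in the dollars, not hundreds of dollars, per month.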
Scaling: Fully elastic. No capacity planning required. Costs scale linearly with storage and queries.
Amazon MemoryDB
What it is: A Redis-compatible, durable, in-memory database with vector search.
Why choose it: When you need single-digit millisecond vector search latency with strong durability. MemoryDB keeps both the vectors and the HNSW index in memory, which is why it's the fastest vector search option on AWS — supporting tens of thousands of queries/second at >99% recall.
Key details:
- Supports FLAT (exact KNN) and HNSW indexing
- Distance metrics: Euclidean, Cosine, Inner Product
- Single-digit millisecond query and update latency
- Multi-AZ durability (unlike typical in-memory caches)
- Uses FT.SEARCH and FT.AGGREGATE commands for vector queries
- Bedrock Knowledge Bases: No
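A KNN query over the wire is just an FT.SEARCH call with the query vector packed as a float32 blob. A sketch with hypothetical index and field names; with redis-py you would pass these arguments to `execute_command`:

```python
import struct

def build_knn_search(index, field, query_vec, k=5):
    """Assemble FT.SEARCH arguments for a KNN query. Vectors are
    passed as a little-endian float32 blob via PARAMS."""
    blob = struct.pack(f"<{len(query_vec)}f", *query_vec)
    query = f"*=>[KNN {k} @{field} $vec AS score]"
    return [
        "FT.SEARCH", index, query,
        "PARAMS", "2", "vec", blob,
        "DIALECT", "2",
    ]

args = build_knn_search("doc-idx", "embedding", [0.1, 0.2, 0.3], k=5)
# With redis-py: client.execute_command(*args)
```

The `*` prefix means "no pre-filter"; replacing it with a filter expression gives you filtered vector search in the same call.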
The catch — single shard and RAM cost:
Vector search is limited to a single shard — no horizontal scaling for vectors. You can scale vertically (bigger instances) and add read replicas, but your total vector dataset must fit in the memory of one node. For a 10M vector dataset with 1024 dimensions, you might need a db.r7g.4xlarge (~105 GB usable memory). RAM is expensive.
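You can sanity-check node sizing with the common HNSW rule of thumb of roughly 1.1 × (4d + 8M) bytes per vector (raw float32 values plus graph links). This is an estimate only; Redis overhead, replication buffers, and growth headroom sit on top of it, which is why a ~105 GB node for this dataset is less oversized than it first appears:

```python
def hnsw_memory_gb(num_vectors, dims, m=16, bytes_per_dim=4):
    """Rough HNSW memory estimate: float32 vectors plus neighbor
    links, using the common 1.1 * (4d + 8M) bytes-per-vector rule
    of thumb. M=16 is a typical default; actual usage varies."""
    per_vector = 1.1 * (dims * bytes_per_dim + m * 2 * 4)
    return num_vectors * per_vector / 1024**3

# The example above: 10M vectors at 1024 dimensions
print(round(hnsw_memory_gb(10_000_000, 1024), 1))  # ~43.3 GB for the index alone
```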
Best for: Real-time RAG where freshness matters (index updates propagate in milliseconds), fraud detection, and real-time recommendation engines where every millisecond counts.
Scaling: Vertical (bigger nodes) + read replicas for query throughput. No horizontal shard scaling for vector workloads.
Amazon ElastiCache (Valkey)
What it is: A managed Valkey (open-source Redis fork) service with vector search, optimized for caching and ephemeral workloads.
Why choose it: Valkey is purpose-built for semantic caching and agent memory. If you're building agentic AI systems and need to cache LLM responses, store conversational memory, or implement fast vector lookups in the hot path of every request — this is the service.
Key details:
- Supports HNSW and FLAT indexing
- Distance metrics: Euclidean, Cosine, Inner Product
- Microsecond-level latency for cached data
- Integrates with LangGraph and mem0 for agent memory layers
- Compatible with Amazon Bedrock AgentCore Runtime
- Horizontal scaling supported — adding shards gives linear improvement in ingestion and recall
- Bedrock Knowledge Bases: No
How it differs from MemoryDB:
- ElastiCache Valkey supports multi-shard horizontal scaling for vectors (MemoryDB is single-shard only)
- MemoryDB provides Multi-AZ durability (writes acknowledged only after replication); Valkey is designed more as a cache layer — it's durable but not to the same degree
- Valkey includes mature cache primitives (TTLs, eviction policies, atomic operations) that make it natural for caching use cases
Best for: Semantic caching to reduce LLM costs, short-term and long-term agent memory via mem0/LangGraph, and any use case where vectors are in the hot path of a latency-sensitive request.
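The caching pattern itself is simple: before calling the LLM, look for a previously answered query whose embedding is close enough, and return its answer on a hit. A toy in-process version (pure Python standing in for Valkey, with a made-up similarity threshold):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy stand-in for a Valkey-backed semantic cache: return a
    cached LLM answer when a new query's embedding is close enough
    to a previously answered one."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def get(self, query_emb):
        best = max(self.entries, key=lambda e: cosine(e[0], query_emb),
                   default=None)
        if best and cosine(best[0], query_emb) >= self.threshold:
            return best[1]  # cache hit: skip the LLM call entirely
        return None

    def put(self, query_emb, answer):
        self.entries.append((query_emb, answer))

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Answer A")
print(cache.get([0.99, 0.05]))  # close paraphrase -> "Answer A"
print(cache.get([0.0, 1.0]))    # unrelated query -> None
```

In production the linear scan becomes an HNSW index in Valkey and TTL/eviction policies handle staleness; the logic stays the same.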
Scaling: Vertical, horizontal (multi-shard), and replica-based. Most flexible scaling model among the in-memory options.
Amazon Neptune Analytics
What it is: A graph analytics engine that also supports vector search, designed to combine graph traversals with vector similarity.
Why choose it: When your data has explicit relationships and you want to combine graph-based reasoning with semantic search. Neptune Analytics powers GraphRAG in Bedrock Knowledge Bases — it automatically extracts entities, facts, and relationships from your documents and stores them as a graph, then combines vector search with graph traversal for more comprehensive, cross-document answers.
Key details:
- Stores embeddings directly on graph nodes
- Combines vector similarity search with graph algorithms (PageRank, shortest path, etc.)
- Supports openCypher query language
- GraphRAG integration with Bedrock Knowledge Bases — auto-builds knowledge graphs from your documents
- Bedrock Knowledge Bases: Yes (GraphRAG)
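A GraphRAG-style query combines a vector top-K lookup with a traversal from the matched nodes. A hedged openCypher sketch held as a Python string: the `neptune.algo.vectors.topKByEmbedding` call is from Neptune Analytics' vector algorithm namespace but its exact name and signature should be verified against current docs, and the `:CITES` relationship and `Document` label are hypothetical:

```python
# Find the 5 nodes most similar to a query embedding, then hop one
# relationship outward to pull in connected documents.
QUERY = """
CALL neptune.algo.vectors.topKByEmbedding(
  $query_embedding, {topK: 5}
) YIELD node, score
MATCH (node)-[:CITES]->(related:Document)
RETURN node.title, related.title, score
ORDER BY score
"""

params = {"query_embedding": [0.12, -0.03, 0.91]}  # from your embedding model
# Execute via the neptune-graph ExecuteQuery API with QUERY and params.
```

The point of the pattern: the vector call finds semantically relevant entry points, and the MATCH clause pulls in related context that pure similarity search would miss.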
The catch:
- Pricing is based on memory-optimized compute units (m-NCU), billed per hour
- Autoscaling is not supported — you choose your graph capacity upfront
- You can pause graphs when not in use (pay 10% of compute cost while paused)
- Best suited for analytical / batch workloads rather than high-QPS online serving
Best for: Knowledge graphs for compliance/regulatory data, entity-relationship analysis combined with semantic search, and use cases where understanding connections between documents matters more than raw search speed.
Scaling: Provisioned (memory-optimized). Choose capacity at creation. No autoscaling.
Amazon DocumentDB
What it is: A MongoDB-compatible document database with vector search support.
Why choose it: If you're already on DocumentDB (or have a MongoDB-based application) and want to add vector search without a new service. Similar logic to Aurora pgvector — keep vectors alongside your document data.
Key details:
- Available on DocumentDB 5.0+ instance-based clusters
- Supports HNSW and IVFFlat indexing
- Distance metrics: Euclidean, Cosine, Dot Product
- Up to 2,000 dimensions with an index (16,000 without index)
- Bedrock Knowledge Bases: No (not a supported Bedrock KB vector store)
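A vector query in DocumentDB is an aggregation stage. A sketch of the `$search`/`vectorSearch` stage for an HNSW index, with field names as placeholders; verify parameter names such as `efSearch` against the current DocumentDB docs before relying on them:

```python
def vector_search_stage(query_vec, path="embedding", k=5, ef_search=64):
    """Build a DocumentDB $search/vectorSearch aggregation stage
    (HNSW variant). `path` is the document field holding the
    embedding; `efSearch` controls the candidate-list size and
    trades latency for recall."""
    return {
        "$search": {
            "vectorSearch": {
                "vector": query_vec,
                "path": path,
                "similarity": "cosine",
                "k": k,
                "efSearch": ef_search,
            }
        }
    }

pipeline = [vector_search_stage([0.1, 0.2, 0.3])]
# With pymongo: db.products.aggregate(pipeline)
```

Because it is just a pipeline stage, you can follow it with ordinary `$match` and `$project` stages against the rest of the document.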
The catch:
- Instance-based scaling (no serverless option for vector workloads)
- I/O costs can add up significantly
- Smaller vector ecosystem and less community tooling compared to pgvector
Best for: Existing DocumentDB/MongoDB workloads that need to add semantic search alongside existing JSON document queries.
Scaling: Instance-based. Vertical scaling + read replicas.
Amazon Kendra (GenAI Index)
What it is: A fully managed enterprise search service. Not a vector database — it's an end-to-end search solution.
Why choose it: When you need enterprise search with built-in connectors to 43+ data sources (SharePoint, Salesforce, Google Drive, Confluence, etc.) and don't want to build a RAG pipeline from scratch. The GenAI Index uses hybrid search (vector + keyword) with pre-optimized parameters — no tuning required.
Key details:
- 43+ built-in data source connectors with metadata and permission filtering
- Hybrid index combining vector and keyword search, pre-optimized
- Integrates with both Bedrock Knowledge Bases and Amazon Q Business
- A single Kendra GenAI Index can serve multiple Q Business apps and Bedrock KBs
- Bedrock Knowledge Bases: Yes
The catch — pricing:
Kendra is expensive for what it offers. The GenAI Enterprise Edition base index starts at $0.32/hour (~$230/month) for up to 20,000 documents. The Basic Enterprise Edition is $1.40/hour (~$1,008/month). This is enterprise pricing — it makes sense when you're connecting to many data sources and need managed connectors, permissions, and relevance tuning out of the box.
Best for: Enterprise search across dozens of data sources where you need managed connectors, user-level access control, and a fully managed experience. Not for custom RAG pipelines where you want control over chunking, embeddings, and retrieval logic.
Scaling: Fully managed. Add storage units and query units as needed.
Decision Matrix
| Scenario | Recommended | Why | Alternative |
|---|---|---|---|
| General-purpose RAG | OpenSearch Serverless | Native hybrid search, most mature Bedrock KB integration | Aurora pgvector |
| Already on PostgreSQL | Aurora pgvector | Add vectors without a new service, SQL + vector in one query | OpenSearch Serverless |
| Cost-sensitive / massive scale | S3 Vectors | 90% cheaper, 2B vectors/index, fully serverless | OpenSearch Serverless |
| Ultra-low latency (real-time) | MemoryDB | Single-digit ms queries, >99% recall, durable | ElastiCache Valkey |
| Semantic caching / reduce LLM cost | ElastiCache Valkey | Cache primitives + vector search, microsecond latency | MemoryDB |
| Agent memory (short-term + long-term) | ElastiCache Valkey | LangGraph/mem0 integration, horizontal scaling | MemoryDB |
| Knowledge graph + vectors | Neptune Analytics | GraphRAG, entity-relationship reasoning | OpenSearch Serverless |
| Enterprise search (managed) | Kendra GenAI Index | 43+ connectors, permissions, zero tuning | Bedrock KB + S3 Vectors |
| Already on MongoDB/DocumentDB | DocumentDB | Add vectors alongside existing JSON data | Aurora pgvector |
Cost Comparison
| Service | Pricing Model | Minimum Monthly Cost | Best Cost Scenario |
|---|---|---|---|
| S3 Vectors | Storage + PUT + queries | ~$11 (250K vectors, 1M queries) | Cheapest at any scale |
| Aurora pgvector | Instance hours + storage + I/O | ~$50+ (Serverless v2 min) | Cheap if DB already exists |
| OpenSearch Serverless | OCU-hours + S3 storage | ~$174 (dev/test), ~$350 (prod) | Good at scale, painful for prototypes |
| DocumentDB | Instance hours + I/O | ~$100+ | Reasonable if already on DocumentDB |
| MemoryDB | Node hours (in-memory) | ~$200+ (r7g.large) | Expensive — RAM is the bottleneck |
| ElastiCache Valkey | Node hours (in-memory) | ~$100+ (r7g.large) | Similar to MemoryDB, scales better |
| Neptune Analytics | m-NCU hours | Varies by graph size | Pausable (10% cost when idle) |
| Kendra GenAI Index | Hourly base + units | ~$230 (GenAI), ~$1,008 (Enterprise) | Enterprise pricing |
When NOT to Use a Vector Database
Before building anything, ask yourself:
- Small dataset (<10K items)? → Use in-memory search (FAISS, numpy) with no infrastructure
- Exact match queries only? → Use a traditional database or search index
- Structured filtering only? → A regular database with indexes will be faster and cheaper
- Static FAQ or lookup table? → Don't overcomplicate it. A simple key-value store works
- Real-time transactional workload? → Use a relational or NoSQL database; vector search is a read-optimized pattern
- Near-zero budget? → Use FAISS locally or with S3 for persistence
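For the small-dataset case, "no infrastructure" really is a few lines of numpy: brute-force cosine similarity is exact (100% recall) and plenty fast below ~10K items:

```python
import numpy as np

def top_k(query, corpus, k=3):
    """Brute-force cosine similarity: normalize everything, take dot
    products, return the k best indices and their scores."""
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n
    idx = np.argsort(scores)[::-1][:k]
    return idx, scores[idx]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384))                   # 1K docs, 384-dim
query = corpus[42] + rng.normal(scale=0.01, size=384)   # near-duplicate of doc 42
idx, scores = top_k(query, corpus)
print(idx[0])  # 42 — the near-duplicate ranks first
```

When this gets slow, swap the matrix multiply for a FAISS index; the interface barely changes, and pickling the array to S3 gives you persistence for pennies.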
That's a wrap on the series. If you're building with RAG or semantic search on AWS, you now have both the conceptual foundation and practical guidance to choose the right architecture.
👉 Missed the earlier parts?
- Part 1 → Embeddings, Dimensions, and Similarity Search
- Part 2 → Search Patterns, Filtering, and Chunking