This is where everything comes together.
👉 In case you missed it:
- Part 1 → Embeddings, Dimensions, and Similarity Search
- Part 2 → Search Patterns, Filtering, and Chunking
By now, you understand the fundamentals and how retrieval works. The real question is: which AWS service should you actually use?
There's no single "best" option. Each service was designed for a different primary workload and inherited vector search as a capability. That origin story shapes everything — its strengths, limitations, and the cost you'll pay.
Let's break them down.
The Services
OpenSearch Serverless
What it is: A distributed search engine with native vector, keyword, and hybrid search.
Why choose it: OpenSearch is the only AWS service that handles full-text search and vector search natively in one engine. Its Neural Search feature automates the entire hybrid pipeline — you send a query, and it runs keyword + semantic search, then merges results using normalization and combination techniques. No manual score merging required.
Key details:
- Supports the HNSW algorithm for approximate nearest-neighbor indexing
- Distance metrics: Euclidean, Cosine, Inner Product (Dot Product)
- Hybrid search with Neural Search pipeline (keyword + vector, merged automatically)
- GPU-accelerated vector indexing (launched Dec 2025) for faster large-scale ingestion
- Bedrock Knowledge Bases: Yes (most commonly used vector store for Bedrock KB)
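Under the hood, a hybrid query is a single request body with one keyword leg and one neural leg; the search pipeline attached to the index normalizes and merges the two score sets. A minimal sketch in Python, assuming an index with a `text` field, a `text_embedding` k-NN field, and a deployed embedding model (all three names are placeholders):

```python
def build_hybrid_query(query_text, model_id, k=10):
    """Build an OpenSearch hybrid query body pairing a keyword
    (match) clause with a neural (semantic) clause. The search
    pipeline referenced at request time normalizes and merges
    the scores from both legs."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Keyword leg: BM25 scoring over the raw text field.
                    {"match": {"text": {"query": query_text}}},
                    # Semantic leg: the neural clause embeds query_text
                    # with the deployed model and runs k-NN search.
                    {
                        "neural": {
                            "text_embedding": {
                                "query_text": query_text,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        }
    }

body = build_hybrid_query("waterproof trail shoes", model_id="my-model-id")
# Send with opensearch-py: client.search(index="docs", body=body,
# params={"search_pipeline": "my-norm-pipeline"})
```

The pipeline itself (a `normalization-processor` with min-max normalization and weighted combination) is configured once on the cluster; queries just reference it.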
The catch — cost:
OpenSearch Serverless bills by OCU-hours (OpenSearch Compute Units). The minimum is 2 OCUs for production (1 indexing + 1 search, each with HA redundancy) — roughly $350/month before you store a single vector. A dev/test mode drops this to ~$174/month with 0.5 OCUs each. Vector collections also require their own dedicated OCUs — they can't share with search/time-series collections.
For small projects or prototypes, this minimum cost is the biggest friction point. But at scale, the automatic scaling and mature feature set make it the go-to for production RAG.
Scaling: Automatic. OCUs scale up based on workload and scale back down. You set a maximum OCU limit to cap costs.
Aurora PostgreSQL (pgvector)
What it is: A relational database with the open-source pgvector extension for vector search.
Why choose it: If your application already runs on PostgreSQL, pgvector lets you add vector search without introducing a new service. Your vectors live alongside your relational data — same transactions, same SQL, same backups. This is powerful when your queries combine traditional filters (WHERE category = 'shoes' AND price < 100) with vector similarity.
Key details:
- pgvector 0.8.0 (April 2025) brought major improvements: up to 9x faster filtered queries with iterative index scans, and significantly better recall on filtered searches
- Supports HNSW and IVFFlat indexing
- Distance metrics: Euclidean, Cosine, Inner Product
- Hybrid search: Manual — combine tsvector (keyword) and pgvector (semantic) in a single SQL query
- Bedrock Knowledge Bases: Yes
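That filter-plus-similarity combination is ordinary SQL. A sketch assuming a hypothetical `products` table with a pgvector `embedding` column (3 dimensions only to keep the example short):

```python
# `<=>` is pgvector's cosine-distance operator; `<->` is Euclidean
# distance and `<#>` is negative inner product.
SQL = """
SELECT id, name, price
FROM products
WHERE category = %(category)s      -- ordinary relational filter
  AND price < %(max_price)s
ORDER BY embedding <=> %(query_vec)s::vector   -- vector similarity
LIMIT 10;
"""

params = {
    "category": "shoes",
    "max_price": 100,
    "query_vec": "[0.12, -0.03, 0.91]",  # embedding of the user query
}

# With psycopg: cur.execute(SQL, params)
```

Because the filter and the ORDER BY live in one statement, pgvector 0.8.0's iterative index scans can keep pulling index candidates until the WHERE clause is satisfied, which is where the filtered-query speedup comes from.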
The catch — you own the tuning:
pgvector gives you control, but that means you're responsible for index parameter tuning (ef_construction, m, ef_search), choosing between relaxed_order and strict_order for iterative scans, and managing the trade-off between recall and latency. It's not "plug and play" like OpenSearch Neural Search.
Scaling: Aurora Serverless v2 scales compute in fine-grained ACU increments. Read replicas handle query scale-out. I/O-Optimized configuration helps with cost predictability for vector workloads.
Amazon S3 Vectors
What it is: The first cloud object store with native vector support. Purpose-built for storing and querying vectors at massive scale and minimal cost.
Why choose it: When cost is the primary concern and you don't need millisecond latencies. S3 Vectors can reduce the total cost of storing and querying vectors by up to 90% compared to traditional vector databases.
Key details:
- Up to 2 billion vectors per index, up to 10,000 indexes per vector bucket
- Distance metrics: Cosine and Euclidean only (Inner Product not supported)
- Metadata filtering applied during the vector search itself (not purely pre- or post-filter)
- Fully serverless — no infrastructure to provision or manage
- Does not support hybrid search (semantic search only)
- Bedrock Knowledge Bases: Yes
The catch — latency and throughput:
S3 Vectors is designed for infrequent-to-moderate query patterns. Infrequent queries return in under 1 second; more frequent queries get down to ~100ms. Write throughput caps at ~2,500 vectors/second per index, and query throughput is in the hundreds of requests/second per index. This is not the right choice for real-time, high-QPS applications.
Cost example:
For 250K vectors across 40 indexes with 1M queries/month: approximately $11/month. Compare that to OpenSearch Serverless's $350/month minimum.
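The arithmetic behind numbers like this is worth internalizing: float32 embeddings are small. A quick footprint calculator (the 1024-dimension figure is an assumption, since the example above doesn't specify one):

```python
def vector_storage_gb(num_vectors, dims, bytes_per_dim=4, metadata_bytes=0):
    """Raw storage footprint of float32 embeddings, excluding index
    overhead and per-vector metadata unless supplied."""
    total_bytes = num_vectors * (dims * bytes_per_dim + metadata_bytes)
    return total_bytes / 1024**3

# The example above: 250K vectors, assuming 1024-dim float32
print(round(vector_storage_gb(250_000, 1024), 2))  # ~0.95 GB raw
```

Roughly one gigabyte of raw data is why object-storage pricing lands in the dollars, not hundreds of dollars, per month.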
Scaling: Fully elastic. No capacity planning required. Costs scale linearly with storage and queries.
Amazon MemoryDB
What it is: A Redis-compatible, durable, in-memory database with vector search.
Why choose it: When you need single-digit millisecond vector search latency with strong durability. MemoryDB keeps both the vectors and the HNSW index in memory, which is why it's the fastest vector search option on AWS — supporting tens of thousands of queries/second at >99% recall.
Key details:
- Supports FLAT (exact KNN) and HNSW indexing
- Distance metrics: Euclidean, Cosine, Inner Product
- Single-digit millisecond query and update latency
- Multi-AZ durability (unlike typical in-memory caches)
- Uses FT.SEARCH and FT.AGGREGATE commands for vector queries
- Bedrock Knowledge Bases: No
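A KNN query over the wire is just an FT.SEARCH call with the query vector packed as a float32 blob. A sketch with hypothetical index and field names; with redis-py you would pass these arguments to `execute_command`:

```python
import struct

def build_knn_search(index, field, query_vec, k=5):
    """Assemble FT.SEARCH arguments for a KNN query. Vectors are
    passed as a little-endian float32 blob via PARAMS."""
    blob = struct.pack(f"<{len(query_vec)}f", *query_vec)
    query = f"*=>[KNN {k} @{field} $vec AS score]"
    return [
        "FT.SEARCH", index, query,
        "PARAMS", "2", "vec", blob,
        "DIALECT", "2",
    ]

args = build_knn_search("doc-idx", "embedding", [0.1, 0.2, 0.3], k=5)
# With redis-py: client.execute_command(*args)
```

The `*` prefix means "no pre-filter"; replacing it with a filter expression gives you filtered vector search in the same call.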
The catch — single shard and RAM cost:
Vector search is limited to a single shard — no horizontal scaling for vectors. You can scale vertically (bigger instances) and add read replicas, but your total vector dataset must fit in the memory of one node. For a 10M vector dataset with 1024 dimensions, you might need a db.r7g.4xlarge (~105 GB usable memory). RAM is expensive.
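You can sanity-check node sizing with the common HNSW rule of thumb of roughly 1.1 × (4d + 8M) bytes per vector (raw float32 values plus graph links). This is an estimate only; Redis overhead, replication buffers, and growth headroom sit on top of it, which is why a ~105 GB node for this dataset is less oversized than it first appears:

```python
def hnsw_memory_gb(num_vectors, dims, m=16, bytes_per_dim=4):
    """Rough HNSW memory estimate: float32 vectors plus neighbor
    links, using the common 1.1 * (4d + 8M) bytes-per-vector rule
    of thumb. M=16 is a typical default; actual usage varies."""
    per_vector = 1.1 * (dims * bytes_per_dim + m * 2 * 4)
    return num_vectors * per_vector / 1024**3

# The example above: 10M vectors at 1024 dimensions
print(round(hnsw_memory_gb(10_000_000, 1024), 1))  # ~43.3 GB for the index alone
```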
Best for: Real-time RAG where freshness matters (index updates propagate in milliseconds), fraud detection, and real-time recommendation engines where every millisecond counts.
Scaling: Vertical (bigger nodes) + read replicas for query throughput. No horizontal shard scaling for vector workloads.
Amazon ElastiCache (Valkey)
What it is: A managed Valkey (open-source Redis fork) service with vector search, optimized for caching and ephemeral workloads.
Why choose it: Valkey is purpose-built for semantic caching and agent memory. If you're building agentic AI systems and need to cache LLM responses, store conversational memory, or implement fast vector lookups in the hot path of every request — this is the service.
Key details:
- Supports HNSW and FLAT indexing
- Distance metrics: Euclidean, Cosine, Inner Product
- Microsecond-level latency for cached data
- Integrates with LangGraph and mem0 for agent memory layers
- Compatible with Amazon Bedrock AgentCore Runtime
- Horizontal scaling supported — adding shards gives linear improvement in ingestion and recall
- Bedrock Knowledge Bases: No
How it differs from MemoryDB:
- ElastiCache Valkey supports multi-shard horizontal scaling for vectors (MemoryDB is single-shard only)
- MemoryDB provides Multi-AZ durability (writes acknowledged only after replication); Valkey is designed more as a cache layer — it's durable but not to the same degree
- Valkey includes mature cache primitives (TTLs, eviction policies, atomic operations) that make it natural for caching use cases
Best for: Semantic caching to reduce LLM costs, short-term and long-term agent memory via mem0/LangGraph, and any use case where vectors are in the hot path of a latency-sensitive request.
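The caching pattern itself is simple: before calling the LLM, look for a previously answered query whose embedding is close enough, and return its answer on a hit. A toy in-process version (pure Python standing in for Valkey, with a made-up similarity threshold):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy stand-in for a Valkey-backed semantic cache: return a
    cached LLM answer when a new query's embedding is close enough
    to a previously answered one."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def get(self, query_emb):
        best = max(self.entries, key=lambda e: cosine(e[0], query_emb),
                   default=None)
        if best and cosine(best[0], query_emb) >= self.threshold:
            return best[1]  # cache hit: skip the LLM call entirely
        return None

    def put(self, query_emb, answer):
        self.entries.append((query_emb, answer))

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Answer A")
print(cache.get([0.99, 0.05]))  # close paraphrase -> "Answer A"
print(cache.get([0.0, 1.0]))    # unrelated query -> None
```

In production the linear scan becomes an HNSW index in Valkey and TTL/eviction policies handle staleness; the logic stays the same.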
Scaling: Vertical, horizontal (multi-shard), and replica-based. Most flexible scaling model among the in-memory options.
Amazon Neptune Analytics
What it is: A graph analytics engine that also supports vector search, designed to combine graph traversals with vector similarity.
Why choose it: When your data has explicit relationships and you want to combine graph-based reasoning with semantic search. Neptune Analytics powers GraphRAG in Bedrock Knowledge Bases — it automatically extracts entities, facts, and relationships from your documents and stores them as a graph, then combines vector search with graph traversal for more comprehensive, cross-document answers.
Key details:
- Stores embeddings directly on graph nodes
- Combines vector similarity search with graph algorithms (PageRank, shortest path, etc.)
- Supports openCypher query language
- GraphRAG integration with Bedrock Knowledge Bases — auto-builds knowledge graphs from your documents
- Bedrock Knowledge Bases: Yes (GraphRAG)
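A GraphRAG-style query combines a vector top-K lookup with a traversal from the matched nodes. A hedged openCypher sketch held as a Python string: the `neptune.algo.vectors.topKByEmbedding` call is from Neptune Analytics' vector algorithm namespace but its exact name and signature should be verified against current docs, and the `:CITES` relationship and `Document` label are hypothetical:

```python
# Find the 5 nodes most similar to a query embedding, then hop one
# relationship outward to pull in connected documents.
QUERY = """
CALL neptune.algo.vectors.topKByEmbedding(
  $query_embedding, {topK: 5}
) YIELD node, score
MATCH (node)-[:CITES]->(related:Document)
RETURN node.title, related.title, score
ORDER BY score
"""

params = {"query_embedding": [0.12, -0.03, 0.91]}  # from your embedding model
# Execute via the neptune-graph ExecuteQuery API with QUERY and params.
```

The point of the pattern: the vector call finds semantically relevant entry points, and the MATCH clause pulls in related context that pure similarity search would miss.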
The catch:
- Pricing is based on memory-optimized compute units (m-NCU), billed per hour
- Autoscaling is not supported — you choose your graph capacity upfront
- You can pause graphs when not in use (pay 10% of compute cost while paused)
- Best suited for analytical / batch workloads rather than high-QPS online serving
Best for: Knowledge graphs for compliance/regulatory data, entity-relationship analysis combined with semantic search, and use cases where understanding connections between documents matters more than raw search speed.
Scaling: Provisioned (memory-optimized). Choose capacity at creation. No autoscaling.
Amazon DocumentDB
What it is: A MongoDB-compatible document database with vector search support.
Why choose it: If you're already on DocumentDB (or have a MongoDB-based application) and want to add vector search without a new service. Similar logic to Aurora pgvector — keep vectors alongside your document data.
Key details:
- Available on DocumentDB 5.0+ instance-based clusters
- Supports HNSW and IVFFlat indexing
- Distance metrics: Euclidean, Cosine, Dot Product
- Up to 2,000 dimensions with an index (16,000 without index)
- Bedrock Knowledge Bases: No (not a supported Bedrock KB vector store)
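A vector query in DocumentDB is an aggregation stage. A sketch of the `$search`/`vectorSearch` stage for an HNSW index, with field names as placeholders; verify parameter names such as `efSearch` against the current DocumentDB docs before relying on them:

```python
def vector_search_stage(query_vec, path="embedding", k=5, ef_search=64):
    """Build a DocumentDB $search/vectorSearch aggregation stage
    (HNSW variant). `path` is the document field holding the
    embedding; `efSearch` controls the candidate-list size and
    trades latency for recall."""
    return {
        "$search": {
            "vectorSearch": {
                "vector": query_vec,
                "path": path,
                "similarity": "cosine",
                "k": k,
                "efSearch": ef_search,
            }
        }
    }

pipeline = [vector_search_stage([0.1, 0.2, 0.3])]
# With pymongo: db.products.aggregate(pipeline)
```

Because it is just a pipeline stage, you can follow it with ordinary `$match` and `$project` stages against the rest of the document.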
The catch:
- Instance-based scaling (no serverless option for vector workloads)
- I/O costs can add up significantly
- Smaller vector ecosystem and less community tooling compared to pgvector
Best for: Existing DocumentDB/MongoDB workloads that need to add semantic search alongside existing JSON document queries.
Scaling: Instance-based. Vertical scaling + read replicas.
Amazon Kendra (GenAI Index)
What it is: A fully managed enterprise search service. Not a vector database — it's an end-to-end search solution.
Why choose it: When you need enterprise search with built-in connectors to 43+ data sources (SharePoint, Salesforce, Google Drive, Confluence, etc.) and don't want to build a RAG pipeline from scratch. The GenAI Index uses hybrid search (vector + keyword) with pre-optimized parameters — no tuning required.
Key details:
- 43+ built-in data source connectors with metadata and permission filtering
- Hybrid index combining vector and keyword search, pre-optimized
- Integrates with both Bedrock Knowledge Bases and Amazon Q Business
- A single Kendra GenAI Index can serve multiple Q Business apps and Bedrock KBs
- Bedrock Knowledge Bases: Yes
The catch — pricing:
Kendra is expensive for what it offers. The GenAI Enterprise Edition base index starts at $0.32/hour (~$230/month) for up to 20,000 documents. The Basic Enterprise Edition is $1.40/hour (~$1,008/month). This is enterprise pricing — it makes sense when you're connecting to many data sources and need managed connectors, permissions, and relevance tuning out of the box.
Best for: Enterprise search across dozens of data sources where you need managed connectors, user-level access control, and a fully managed experience. Not for custom RAG pipelines where you want control over chunking, embeddings, and retrieval logic.
Scaling: Fully managed. Add storage units and query units as needed.
Decision Matrix
| Scenario | Recommended | Why | Alternative |
|---|---|---|---|
| General-purpose RAG | OpenSearch Serverless | Native hybrid search, most mature Bedrock KB integration | Aurora pgvector |
| Already on PostgreSQL | Aurora pgvector | Add vectors without a new service, SQL + vector in one query | OpenSearch Serverless |
| Cost-sensitive / massive scale | S3 Vectors | 90% cheaper, 2B vectors/index, fully serverless | OpenSearch Serverless |
| Ultra-low latency (real-time) | MemoryDB | Single-digit ms queries, >99% recall, durable | ElastiCache Valkey |
| Semantic caching / reduce LLM cost | ElastiCache Valkey | Cache primitives + vector search, microsecond latency | MemoryDB |
| Agent memory (short-term + long-term) | ElastiCache Valkey | LangGraph/mem0 integration, horizontal scaling | MemoryDB |
| Knowledge graph + vectors | Neptune Analytics | GraphRAG, entity-relationship reasoning | OpenSearch Serverless |
| Enterprise search (managed) | Kendra GenAI Index | 43+ connectors, permissions, zero tuning | Bedrock KB + S3 Vectors |
| Already on MongoDB/DocumentDB | DocumentDB | Add vectors alongside existing JSON data | Aurora pgvector |
Cost Comparison
| Service | Pricing Model | Minimum Monthly Cost | Best Cost Scenario |
|---|---|---|---|
| S3 Vectors | Storage + PUT + queries | ~$11 (250K vectors, 1M queries) | Cheapest at any scale |
| Aurora pgvector | Instance hours + storage + I/O | ~$50+ (Serverless v2 min) | Cheap if DB already exists |
| OpenSearch Serverless | OCU-hours + S3 storage | ~$174 (dev/test), ~$350 (prod) | Good at scale, painful for prototypes |
| DocumentDB | Instance hours + I/O | ~$100+ | Reasonable if already on DocumentDB |
| MemoryDB | Node hours (in-memory) | ~$200+ (r7g.large) | Expensive — RAM is the bottleneck |
| ElastiCache Valkey | Node hours (in-memory) | ~$100+ (r7g.large) | Similar to MemoryDB, scales better |
| Neptune Analytics | m-NCU hours | Varies by graph size | Pausable (10% cost when idle) |
| Kendra GenAI Index | Hourly base + units | ~$230 (GenAI), ~$1,008 (Enterprise) | Enterprise pricing |
When NOT to Use a Vector Database
Before building anything, ask yourself:
- Small dataset (<10K items)? → Use in-memory search (FAISS, numpy) with no infrastructure
- Exact match queries only? → Use a traditional database or search index
- Structured filtering only? → A regular database with indexes will be faster and cheaper
- Static FAQ or lookup table? → Don't overcomplicate it. A simple key-value store works
- Real-time transactional workload? → Use a relational or NoSQL database; vector search is a read-optimized pattern
- Near-zero budget? → Use FAISS locally or with S3 for persistence
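For the small-dataset case, "no infrastructure" really is a few lines of numpy: brute-force cosine similarity is exact (100% recall) and plenty fast below ~10K items:

```python
import numpy as np

def top_k(query, corpus, k=3):
    """Brute-force cosine similarity: normalize everything, take dot
    products, return the k best indices and their scores."""
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n
    idx = np.argsort(scores)[::-1][:k]
    return idx, scores[idx]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384))                   # 1K docs, 384-dim
query = corpus[42] + rng.normal(scale=0.01, size=384)   # near-duplicate of doc 42
idx, scores = top_k(query, corpus)
print(idx[0])  # 42 — the near-duplicate ranks first
```

When this gets slow, swap the matrix multiply for a FAISS index; the interface barely changes, and pickling the array to S3 gives you persistence for pennies.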
That's a wrap on the series. If you're building with RAG or semantic search on AWS, you now have both the conceptual foundation and practical guidance to choose the right architecture.
👉 Missed the earlier parts?
- Part 1 → Embeddings, Dimensions, and Similarity Search
- Part 2 → Search Patterns, Filtering, and Chunking