This is Part 2 of the AWS vector database series.
👉 Missed Part 1? Start here: Embeddings, Dimensions, and Similarity Search
In Part 1, we covered the fundamentals of embeddings and how similarity is measured. Now we move into how retrieval actually works in practice.
In this part, we’ll look at search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking strategies — the building blocks of effective RAG systems.
## Vector Search Types
| Approach | How it works | When to use | AWS services |
|---|---|---|---|
| KNN — Exact Nearest Neighbor Search | Check every single item, compare it to your query, return the best matches. Perfectly accurate, but slow. | Small datasets (under 100K vectors) or situations where you absolutely cannot afford to miss a result — like medical diagnostics or legal compliance checks. | All vector services support KNN as a fallback, but it's not practical at scale. |
| ANN — Approximate Nearest Neighbor Search | Uses a smart index structure (a graph or clusters) to find very likely nearest neighbors without checking everything. | Almost everything in production. If you're building a RAG chatbot, semantic search, or recommendation engine, this is your default. | OpenSearch Serverless, Aurora pgvector, MemoryDB, ElastiCache Valkey, DocumentDB, S3 Vectors, Neptune Analytics. |
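Exact KNN is easy to sketch: score the query against every vector and keep the top k. Here's a minimal NumPy version using cosine similarity; the corpus and query are made-up illustrative data, and the linear scan is exactly why this approach stops scaling past small collections:

```python
import numpy as np

def exact_knn(query, corpus, k=3):
    """Brute-force cosine-similarity search: compares the query
    against every corpus vector, so cost grows linearly with size."""
    # Normalize so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # one score per corpus vector
    top = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return [(int(i), float(scores[i])) for i in top]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))                   # 1,000 toy 64-dim embeddings
query = corpus[42] + rng.normal(scale=0.01, size=64)   # a slightly perturbed copy of item 42
print(exact_knn(query, corpus, k=3))
```

With 1,000 vectors this is instant; with 100 million it is not, which is where the ANN index structures below come in.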
## ANN Index Structures
To avoid checking every vector, ANN uses smart indexing. The two most common types on AWS are:
| Index | Simple idea | Trade-off |
|---|---|---|
| HNSW | Connects similar vectors like a network and “walks” through it to find matches | Uses more memory and takes longer to build, but gives faster and more accurate results. Default in most AWS services. |
| IVFFlat | Groups vectors into clusters and only searches the closest groups | Faster to build and uses less memory, but needs tuning and may miss some results. |
### Intuitive way to think about it
**HNSW — like navigating a city with highways**
- Start with highways to get close
- Then use local roads to find the exact place
HNSW does the same:
- Moves from broad → detailed search
- Finds results quickly and accurately
**IVFFlat — like searching in neighborhoods**
- First pick a few likely neighborhoods
- Then search inside them
IVFFlat works similarly:
- Reduces search space
- But can miss results if the right cluster isn’t picked
### Which one should you use?
- Go with HNSW → best performance and accuracy (default choice)
- Use IVFFlat → faster to build, lower memory, but slightly less accurate
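The "neighborhoods" intuition behind IVFFlat fits in a few lines of code. This toy sketch uses three hand-picked centroids instead of trained k-means clusters, and all the data is made up; the point is only to show the build step (assign each vector to its nearest centroid) and the query step (probe just the closest cluster or two):

```python
import numpy as np

rng = np.random.default_rng(1)
centroids = np.array([[10.0, 0.0], [-10.0, 0.0], [0.0, 10.0]])  # 3 "neighborhoods"
vectors = np.concatenate([c + rng.normal(size=(100, 2)) for c in centroids])

# Build phase: remember which cluster each vector belongs to.
assignments = np.argmin(
    np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1
)

def ivf_search(query, nprobe=1):
    """Search only the nprobe clusters whose centroids are nearest the query."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    candidates = np.flatnonzero(np.isin(assignments, order[:nprobe]))
    best = candidates[np.argmin(np.linalg.norm(vectors[candidates] - query, axis=1))]
    return int(best)

# With nprobe=1 we scan ~100 vectors instead of all 300. Raising nprobe
# trades speed for recall — exactly the tuning knob IVFFlat exposes.
print(ivf_search(np.array([9.5, 0.5])))
```

If the query happens to sit near a cluster boundary and the "right" neighborhood isn't probed, the true nearest neighbor is missed — the recall risk described in the table above.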
## Hybrid Search
Hybrid search runs two searches at the same time—one that understands meaning (vector search) and one that looks for exact words (keyword search)—and then combines the results.
For example, a user might search: “lambda timeout issue nodejs.”
- The Vector Search understands the intent (performance/debugging)
- The Keyword Search ensures exact terms like lambda and nodejs are matched.
Note: The scoring method used to combine these two result sets is called Reciprocal Rank Fusion (RRF). It doesn’t simply add scores—it prioritizes documents that rank highly in both searches. For example, if a document ranks #1 in keyword search and #2 in vector search, RRF will push it to the top of the final results.
This is especially useful for enterprise RAG. Users rarely search with purely natural language or purely exact keywords—they usually mix both.
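RRF itself is nearly a one-liner: each document scores the sum, across result lists, of 1/(k + rank), with k commonly set to 60. A minimal sketch with made-up document IDs and rankings:

```python
def rrf(result_lists, k=60):
    """Reciprocal Rank Fusion: rewards documents that rank well in
    several result lists, rather than summing raw relevance scores."""
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # ranked by keyword relevance
vector_hits  = ["doc_b", "doc_d", "doc_a"]   # ranked by vector similarity
# doc_b ranks #2 and #1, so it edges out doc_a (#1 and #3) and easily
# beats doc_c and doc_d, which each appear in only one list.
print(rrf([keyword_hits, vector_hits]))
```

Because only ranks matter, RRF sidesteps the problem that keyword scores (e.g. BM25) and cosine similarities live on incompatible scales.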
| Service | Implementation |
|---|---|
| OpenSearch Serverless | Native RRF support; the most robust option. Its "Neural Search" feature handles the hybrid merging automatically. |
| Aurora pgvector | SQL-based and best for relational data; you manually combine tsvector (keywords) and vector (meaning) in one query. |
## Metadata Filtering
Metadata filtering narrows down results using structured data like date, category, or user ID—before or after the vector search runs.
Think of it like this: a vector search finds books similar to Harry Potter. But you only want books published after 2010 and available in English. Metadata filtering ensures you don’t waste time on the wrong results.
### Pre-filtering vs Post-filtering
| Approach | How it works | Trade-offs |
|---|---|---|
| Pre-filtering | Applies filters first, then runs vector search on the remaining data | Accurate and secure, but can be slower depending on the engine |
| Post-filtering | Runs vector search first, then filters the results | Fast, but may return zero results if none match the filters |
Note: S3 Vectors applies metadata filters during the vector search itself, combining the accuracy of pre-filtering with the performance of post-filtering.
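The difference between the two approaches shows up clearly in a toy in-memory sketch (the documents, fields, and query below are all illustrative):

```python
import numpy as np

docs = [
    {"id": 1, "year": 2008, "vec": np.array([1.0, 0.0])},
    {"id": 2, "year": 2012, "vec": np.array([0.9, 0.1])},
    {"id": 3, "year": 2015, "vec": np.array([0.0, 1.0])},
    {"id": 4, "year": 2019, "vec": np.array([0.8, 0.2])},
]

def nearest(candidates, query, k):
    """Return the k candidates closest to the query vector."""
    return sorted(candidates, key=lambda d: np.linalg.norm(d["vec"] - query))[:k]

query = np.array([1.0, 0.0])

# Pre-filtering: drop non-matching docs first, then search what's left.
pre = nearest([d for d in docs if d["year"] > 2010], query, k=2)

# Post-filtering: search everything, then filter the top-k. Matching
# docs that fell outside the top-k are silently lost.
post = [d for d in nearest(docs, query, k=2) if d["year"] > 2010]

print([d["id"] for d in pre], [d["id"] for d in post])
```

Here pre-filtering returns two valid hits, while post-filtering returns only one: doc 1 consumed a top-k slot despite failing the year filter, squeezing out doc 4. That shrinking-result-set effect is the core post-filtering trade-off.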
## Chunking
Chunking is simply breaking a long document into smaller, meaningful pieces before creating embeddings. If your chunks are too small, you lose context. If they’re too big, the important meaning gets buried in noise. The goal is to find the right balance.
### Common Chunking Strategies
| Strategy | How it works | Chunk size | Best for |
|---|---|---|---|
| Fixed-size | Split every N tokens/characters with optional overlap | 256–512 tokens | Simple content like logs or short descriptions |
| Recursive | Split by paragraphs → sentences → words while preserving structure | Variable | General-purpose text (default in most tools) |
| Semantic | Use an embedding model to split based on topic boundaries | Variable | Long-form content like whitepapers or legal docs |
| Document-structure | Split using headings, sections, or document layout | Variable | Structured docs like manuals, HTML, or code |
| Sentence-window | Store sentences, return surrounding context at query time | 1 sentence (store) / window (return) | High-precision Q&A |
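The fixed-size strategy is the simplest to implement. Here's a minimal word-based sketch with overlap (real pipelines typically split on model tokens rather than whitespace words, and the sizes here are tiny for readability):

```python
def fixed_size_chunks(text, size=6, overlap=2):
    """Split text into chunks of `size` words, each sharing `overlap`
    words with the previous chunk, so a sentence that straddles a
    boundary still appears intact in at least one chunk."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, len(words), step)
        if words[i:i + size]
    ]

doc = "one two three four five six seven eight nine ten eleven twelve"
for chunk in fixed_size_chunks(doc):
    print(chunk)
```

Each chunk starts `size - overlap` words after the previous one, so consecutive chunks share a two-word seam — the same size/overlap knobs Bedrock's fixed-size option exposes, just at token granularity.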
### Bedrock Chunking Options
| Bedrock option | What it does | Equivalent concept |
|---|---|---|
| Default | ~300-token chunks that respect sentence boundaries | Recursive (baseline) |
| Fixed-size | You control chunk size and overlap | Fixed-size |
| Hierarchical | Searches small chunks but returns larger context | Sentence-window |
| Semantic | Splits based on topic boundaries | Semantic |
| None | No splitting — entire file treated as one chunk | Document-structure (manual) |
👉 Continue reading: In Part 3, we’ll compare AWS vector database options and build a practical decision framework to help you choose the right one.