This is Part 2 of the AWS vector database series.
👉 Missed Part 1? Start here: Embeddings, Dimensions, and Similarity Search
In Part 1, we covered the fundamentals of embeddings and how similarity is measured. Now we move into how retrieval actually works in practice.
In this part, we’ll look at search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking strategies — the building blocks of effective RAG systems.
## Vector Search Types
| Approach | How it works | When to use | AWS services |
|---|---|---|---|
| KNN — Exact Nearest Neighbor Search | Check every single item, compare it to your query, return the best matches. Perfectly accurate, but slow. | Small datasets (under 100K vectors) or situations where you absolutely cannot afford to miss a result — like medical diagnostics or legal compliance checks. | All vector services support KNN as a fallback, but it's not practical at scale. |
| ANN — Approximate Nearest Neighbor Search | Uses a smart index structure (a graph or clusters) to find very likely nearest neighbors without checking everything. | Almost everything in production. If you're building a RAG chatbot, semantic search, or recommendation engine, this is your default. | OpenSearch Serverless, Aurora pgvector, MemoryDB, ElastiCache Valkey, DocumentDB, S3 Vectors, Neptune Analytics. |
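Exact KNN is easy to sketch: score the query against every vector and keep the top k. Here's a minimal NumPy version using cosine similarity; the corpus and query are made-up illustrative data, and the linear scan is exactly why this approach stops scaling past small collections:

```python
import numpy as np

def exact_knn(query, corpus, k=3):
    """Brute-force cosine-similarity search: compares the query
    against every corpus vector, so cost grows linearly with size."""
    # Normalize so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # one score per corpus vector
    top = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return [(int(i), float(scores[i])) for i in top]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))                   # 1,000 toy 64-dim embeddings
query = corpus[42] + rng.normal(scale=0.01, size=64)   # a slightly perturbed copy of item 42
print(exact_knn(query, corpus, k=3))
```

With 1,000 vectors this is instant; with 100 million it is not, which is where the ANN index structures below come in.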
## ANN Index Structures
To avoid checking every vector, ANN uses smart indexing. The two most common types on AWS are:
| Index | Simple idea | Trade-off |
|---|---|---|
| HNSW | Connects similar vectors like a network and “walks” through it to find matches | Uses more memory and takes longer to build, but gives faster and more accurate results. Default in most AWS services. |
| IVFFlat | Groups vectors into clusters and only searches the closest groups | Faster to build and uses less memory, but needs tuning and may miss some results. |
### Intuitive way to think about it
**HNSW — like navigating a city with highways**
- Start with highways to get close
- Then use local roads to find the exact place
HNSW does the same:
- Moves from broad → detailed search
- Finds results quickly and accurately
**IVFFlat — like searching in neighborhoods**
- First pick a few likely neighborhoods
- Then search inside them
IVFFlat works similarly:
- Reduces search space
- But can miss results if the right cluster isn’t picked
### Which one should you use?
- Go with HNSW → best performance and accuracy (default choice)
- Use IVFFlat → faster to build, lower memory, but slightly less accurate
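The "neighborhoods" intuition behind IVFFlat fits in a few lines of code. This toy sketch uses three hand-picked centroids instead of trained k-means clusters, and all the data is made up; the point is only to show the build step (assign each vector to its nearest centroid) and the query step (probe just the closest cluster or two):

```python
import numpy as np

rng = np.random.default_rng(1)
centroids = np.array([[10.0, 0.0], [-10.0, 0.0], [0.0, 10.0]])  # 3 "neighborhoods"
vectors = np.concatenate([c + rng.normal(size=(100, 2)) for c in centroids])

# Build phase: remember which cluster each vector belongs to.
assignments = np.argmin(
    np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1
)

def ivf_search(query, nprobe=1):
    """Search only the nprobe clusters whose centroids are nearest the query."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    candidates = np.flatnonzero(np.isin(assignments, order[:nprobe]))
    best = candidates[np.argmin(np.linalg.norm(vectors[candidates] - query, axis=1))]
    return int(best)

# With nprobe=1 we scan ~100 vectors instead of all 300. Raising nprobe
# trades speed for recall — exactly the tuning knob IVFFlat exposes.
print(ivf_search(np.array([9.5, 0.5])))
```

If the query happens to sit near a cluster boundary and the "right" neighborhood isn't probed, the true nearest neighbor is missed — the recall risk described in the table above.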
## Hybrid Search
Hybrid search runs two searches at the same time—one that understands meaning (vector search) and one that looks for exact words (keyword search)—and then combines the results.
For example, a user might search: “lambda timeout issue nodejs.”
- The Vector Search understands the intent (performance/debugging)
- The Keyword Search ensures exact terms like lambda and nodejs are matched.
Note: The scoring method used to combine these two result sets is called Reciprocal Rank Fusion (RRF). It doesn’t simply add scores—it prioritizes documents that rank highly in both searches. For example, if a document ranks #1 in keyword search and #2 in vector search, RRF will push it to the top of the final results.
This is especially useful for enterprise RAG. Users rarely search with purely natural language or purely exact keywords—they usually mix both.
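RRF itself is nearly a one-liner: each document scores the sum, across result lists, of 1/(k + rank), with k commonly set to 60. A minimal sketch with made-up document IDs and rankings:

```python
def rrf(result_lists, k=60):
    """Reciprocal Rank Fusion: rewards documents that rank well in
    several result lists, rather than summing raw relevance scores."""
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # ranked by keyword relevance
vector_hits  = ["doc_b", "doc_d", "doc_a"]   # ranked by vector similarity
# doc_b ranks #2 and #1, so it edges out doc_a (#1 and #3) and easily
# beats doc_c and doc_d, which each appear in only one list.
print(rrf([keyword_hits, vector_hits]))
```

Because only ranks matter, RRF sidesteps the problem that keyword scores (e.g. BM25) and cosine similarities live on incompatible scales.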
| Service | Implementation |
|---|---|
| OpenSearch Serverless | Native RRF support; the most robust option. Its "Neural Search" feature handles the hybrid merging automatically. |
| Aurora pgvector | SQL-based and best for relational data; you manually combine tsvector (keywords) and vector (meaning) in one query. |
## Metadata Filtering
Metadata filtering narrows down results using structured data like date, category, or user ID—before or after the vector search runs.
Think of it like this: a vector search finds books similar to Harry Potter. But you only want books published after 2010 and available in English. Metadata filtering ensures you don’t waste time on the wrong results.
### Pre-filtering vs Post-filtering
| Approach | How it works | Trade-offs |
|---|---|---|
| Pre-filtering | Applies filters first, then runs vector search on the remaining data | Accurate and secure, but can be slower depending on the engine |
| Post-filtering | Runs vector search first, then filters the results | Fast, but may return zero results if none match the filters |
Note: S3 Vectors applies metadata filters during the vector search itself, combining the accuracy of pre-filtering with the performance of post-filtering.
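The difference between the two approaches shows up clearly in a toy in-memory sketch (the documents, fields, and query below are all illustrative):

```python
import numpy as np

docs = [
    {"id": 1, "year": 2008, "vec": np.array([1.0, 0.0])},
    {"id": 2, "year": 2012, "vec": np.array([0.9, 0.1])},
    {"id": 3, "year": 2015, "vec": np.array([0.0, 1.0])},
    {"id": 4, "year": 2019, "vec": np.array([0.8, 0.2])},
]

def nearest(candidates, query, k):
    """Return the k candidates closest to the query vector."""
    return sorted(candidates, key=lambda d: np.linalg.norm(d["vec"] - query))[:k]

query = np.array([1.0, 0.0])

# Pre-filtering: drop non-matching docs first, then search what's left.
pre = nearest([d for d in docs if d["year"] > 2010], query, k=2)

# Post-filtering: search everything, then filter the top-k. Matching
# docs that fell outside the top-k are silently lost.
post = [d for d in nearest(docs, query, k=2) if d["year"] > 2010]

print([d["id"] for d in pre], [d["id"] for d in post])
```

Here pre-filtering returns two valid hits, while post-filtering returns only one: doc 1 consumed a top-k slot despite failing the year filter, squeezing out doc 4. That shrinking-result-set effect is the core post-filtering trade-off.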
## Chunking
Chunking is simply breaking a long document into smaller, meaningful pieces before creating embeddings. If your chunks are too small, you lose context. If they’re too big, the important meaning gets buried in noise. The goal is to find the right balance.
### Common Chunking Strategies
| Strategy | How it works | Chunk size | Best for |
|---|---|---|---|
| Fixed-size | Split every N tokens/characters with optional overlap | 256–512 tokens | Simple content like logs or short descriptions |
| Recursive | Split by paragraphs → sentences → words while preserving structure | Variable | General-purpose text (default in most tools) |
| Semantic | Use an embedding model to split based on topic boundaries | Variable | Long-form content like whitepapers or legal docs |
| Document-structure | Split using headings, sections, or document layout | Variable | Structured docs like manuals, HTML, or code |
| Sentence-window | Store sentences, return surrounding context at query time | 1 sentence (store) / window (return) | High-precision Q&A |
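The fixed-size strategy is the simplest to implement. Here's a minimal word-based sketch with overlap (real pipelines typically split on model tokens rather than whitespace words, and the sizes here are tiny for readability):

```python
def fixed_size_chunks(text, size=6, overlap=2):
    """Split text into chunks of `size` words, each sharing `overlap`
    words with the previous chunk, so a sentence that straddles a
    boundary still appears intact in at least one chunk."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, len(words), step)
        if words[i:i + size]
    ]

doc = "one two three four five six seven eight nine ten eleven twelve"
for chunk in fixed_size_chunks(doc):
    print(chunk)
```

Each chunk starts `size - overlap` words after the previous one, so consecutive chunks share a two-word seam — the same size/overlap knobs Bedrock's fixed-size option exposes, just at token granularity.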
### Bedrock Chunking Options
| Bedrock option | What it does | Equivalent concept |
|---|---|---|
| Default | ~300-token chunks that respect sentence boundaries | Recursive (baseline) |
| Fixed-size | You control chunk size and overlap | Fixed-size |
| Hierarchical | Searches small chunks but returns larger context | Sentence-window |
| Semantic | Splits based on topic boundaries | Semantic |
| None | No splitting — entire file treated as one chunk | Document-structure (manual) |
👉 Continue reading: In Part 3, we’ll compare AWS vector database options and build a practical decision framework to help you choose the right one.