DEV Community

Sabarish Sathasivan

AWS Vector Databases – Part 2: Search, Filtering, and Chunking

This is Part 2 of the AWS vector database series.

👉 Missed Part 1? Start here: Embeddings, Dimensions, and Similarity Search

In Part 1, we covered the fundamentals of embeddings and how similarity is measured. Now we move into how retrieval actually works in practice.

In this part, we’ll look at search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking strategies — the building blocks of effective RAG systems.

Vector Search Types

| Approach | How it works | When to use | AWS services |
| --- | --- | --- | --- |
| KNN (exact nearest neighbor) | Compares the query against every single item and returns the best matches. Perfectly accurate, but slow. | Small datasets (under ~100K vectors), or situations where you absolutely cannot afford to miss a result, such as medical diagnostics or legal compliance checks. | All vector services support KNN as a fallback, but it's not practical at scale. |
| ANN (approximate nearest neighbor) | Uses an index structure (graph- or cluster-based) to find very likely nearest neighbors without checking everything. | Almost everything in production. If you're building a RAG chatbot, semantic search, or a recommendation engine, this is your default. | OpenSearch Serverless, Aurora pgvector, MemoryDB, ElastiCache Valkey, DocumentDB, S3 Vectors, Neptune Analytics |
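
For intuition, exact KNN fits in a few lines. This sketch uses NumPy and toy 4-dimensional vectors; real embeddings have hundreds or thousands of dimensions (see Part 1), which is exactly why scanning everything stops being practical:

```python
import numpy as np

def knn_search(query, vectors, k=3):
    """Exact (brute-force) KNN: score every vector against the query
    by cosine similarity and return the indices of the top-k matches."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                  # cosine similarity with every vector
    return np.argsort(-scores)[:k]  # best matches first

# Toy 4-dimensional "embeddings"
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.8, 0.2, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.9],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
print(knn_search(query, docs, k=2))  # → [0 2]
```

Every document is scored, so nothing is ever missed; the cost is a full scan per query, which is why ANN indexes exist.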

ANN Index Structures

To avoid checking every vector, ANN uses smart indexing. The two most common types on AWS are:

| Index | Simple idea | Trade-off |
| --- | --- | --- |
| HNSW | Connects similar vectors into a graph and "walks" through it to find matches | Uses more memory and takes longer to build, but gives faster, more accurate results. The default in most AWS services. |
| IVFFlat | Groups vectors into clusters and searches only the closest groups | Faster to build and lighter on memory, but needs tuning and may miss some results |

Intuitive way to think about it

HNSW — like navigating a city with highways

  • Start with highways to get close
  • Then use local roads to find the exact place

HNSW does the same:

  • Moves from broad → detailed search
  • Finds results quickly and accurately

IVFFlat — like searching in neighborhoods

  • First pick a few likely neighborhoods
  • Then search inside them

IVFFlat works similarly:

  • Reduces search space
  • But can miss results if the right cluster isn’t picked

Which one should you use?

  • Go with HNSW → best performance and accuracy (default choice)
  • Use IVFFlat → faster to build, lower memory, but slightly less accurate
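
The IVFFlat idea can be sketched in a few lines of NumPy. This toy version uses hand-picked centroids instead of trained k-means clusters, and `nprobe` controls how many "neighborhoods" get searched:

```python
import numpy as np

def ivf_search(query, vectors, centroids, nprobe=1, k=2):
    """IVFFlat-style search: assign each vector to its nearest centroid,
    then scan only the `nprobe` clusters closest to the query."""
    # Inverted lists: which cluster does each vector belong to?
    assign = np.argmin(
        np.linalg.norm(vectors[:, None] - centroids[None, :], axis=2), axis=1)
    # Pick the nprobe centroids nearest the query
    probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    # Exact search, but only inside the probed clusters
    candidates = np.where(np.isin(assign, probe))[0]
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

centroids = np.array([[0.0, 0.0], [10.0, 10.0]])   # two "neighborhoods"
docs = np.array([[0.5, 0.5], [1.0, 0.0], [9.0, 9.0], [10.0, 11.0]])
query = np.array([1.0, 1.0])
print(ivf_search(query, docs, centroids, nprobe=1, k=2))  # → [0 1]
```

With `nprobe=1`, only half the dataset is scanned. That is the trade-off in miniature: if the true nearest neighbor had landed in the unprobed cluster, it would simply be missed.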

Hybrid Search

Hybrid search runs two searches at the same time—one that understands meaning (vector search) and one that looks for exact words (keyword search)—and then combines the results.

For example, a user might search: “lambda timeout issue nodejs.”

  • Vector search understands the intent (performance/debugging)
  • Keyword search ensures exact terms like `lambda` and `nodejs` are matched

Note: The scoring method used to combine these two result sets is called Reciprocal Rank Fusion (RRF). It doesn’t simply add scores—it prioritizes documents that rank highly in both searches. For example, if a document ranks #1 in keyword search and #2 in vector search, RRF will push it to the top of the final results.
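
RRF is simple enough to sketch directly. The helper below is a minimal, generic implementation (the `k=60` smoothing constant is the commonly used default, not an AWS-specific value): each document earns `1 / (k + rank)` from every result list it appears in, so documents ranking well in both lists win.

```python
def rrf_merge(keyword_ranking, vector_ranking, k=60):
    """Reciprocal Rank Fusion: score each doc by 1/(k + rank) in every
    ranking it appears in, then sort by the combined score."""
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "doc-a" is #1 in keyword search and #2 in vector search,
# so RRF pushes it above docs that top only one list.
keyword = ["doc-a", "doc-b", "doc-c"]
vector  = ["doc-d", "doc-a", "doc-b"]
print(rrf_merge(keyword, vector))  # → ['doc-a', 'doc-b', 'doc-d', 'doc-c']
```

Note that `doc-d` tops the vector list yet finishes behind `doc-b`, which appears in both lists: agreement between searches beats a single strong showing.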

This is especially useful for enterprise RAG. Users rarely search with purely natural language or purely exact keywords—they usually mix both.

| Service | Implementation |
| --- | --- |
| OpenSearch Serverless | Native RRF support. The most robust option; its "Neural Search" feature handles the hybrid merging automatically. |
| Aurora pgvector | SQL-based and best for relational data; you manually combine `tsvector` (keywords) and `vector` (meaning) in one query. |

Metadata Filtering

Metadata filtering narrows down results using structured data like date, category, or user ID—before or after the vector search runs.

Think of it like this: a vector search finds books similar to Harry Potter. But you only want books published after 2010 and available in English. Metadata filtering ensures you don’t waste time on the wrong results.

Pre-filtering vs Post-filtering

| Approach | How it works | Trade-offs |
| --- | --- | --- |
| Pre-filtering | Applies filters first, then runs vector search on the remaining data | Accurate and secure, but can be slower depending on the engine |
| Post-filtering | Runs vector search first, then filters the results | Fast, but may return zero results if none match the filters |
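
The difference is easy to see in code. This toy sketch (NumPy, made-up vectors, and a hypothetical `year` metadata field) runs both strategies over the same data:

```python
import numpy as np

def pre_filter_search(query, vectors, metadata, min_year, k=2):
    """Pre-filtering: restrict to matching metadata first, then
    run the (exact) vector search over what remains."""
    keep = [i for i, m in enumerate(metadata) if m["year"] >= min_year]
    dists = np.linalg.norm(vectors[keep] - query, axis=1)
    return [keep[i] for i in np.argsort(dists)[:k]]

def post_filter_search(query, vectors, metadata, min_year, k=2):
    """Post-filtering: take the top-k vector matches first, then drop
    any that fail the filter -- possibly returning fewer than k."""
    dists = np.linalg.norm(vectors - query, axis=1)
    top = np.argsort(dists)[:k]
    return [int(i) for i in top if metadata[i]["year"] >= min_year]

vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
metadata = [{"year": 2005}, {"year": 2008}, {"year": 2015}]
query = np.array([0.0, 0.0])

print(pre_filter_search(query, vectors, metadata, min_year=2010))   # → [2]
print(post_filter_search(query, vectors, metadata, min_year=2010))  # → []
```

Here the two closest vectors both fail the year filter, so post-filtering returns nothing at all, while pre-filtering still finds the one eligible document.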

Note: S3 Vectors applies metadata filters during the vector search itself, combining the accuracy of pre-filtering with the performance of post-filtering.

Chunking

Chunking is simply breaking a long document into smaller, meaningful pieces before creating embeddings. If your chunks are too small, you lose context. If they’re too big, the important meaning gets buried in noise. The goal is to find the right balance.

Common Chunking Strategies

| Strategy | How it works | Chunk size | Best for |
| --- | --- | --- | --- |
| Fixed-size | Split every N tokens/characters with optional overlap | 256–512 tokens | Simple content like logs or short descriptions |
| Recursive | Split by paragraphs → sentences → words while preserving structure | Variable | General-purpose text (the default in most tools) |
| Semantic | Use an embedding model to split at topic boundaries | Variable | Long-form content like whitepapers or legal docs |
| Document-structure | Split using headings, sections, or document layout | Variable | Structured docs like manuals, HTML, or code |
| Sentence-window | Store sentences, return surrounding context at query time | 1 sentence (store) / window (return) | High-precision Q&A |
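
As a concrete illustration of the first row, here is a minimal fixed-size chunker. It splits on whitespace "tokens" for simplicity (a real pipeline would use the embedding model's own tokenizer and sizes like the 256–512 tokens above), and assumes `overlap < size`:

```python
def fixed_size_chunks(tokens, size=5, overlap=2):
    """Fixed-size chunking: windows of `size` tokens that each step
    forward by (size - overlap), so neighbors share `overlap` tokens."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window reached the end
            break
    return chunks

tokens = "the quick brown fox jumps over the lazy dog again".split()
for chunk in fixed_size_chunks(tokens, size=5, overlap=2):
    print(" ".join(chunk))
# → the quick brown fox jumps
#   fox jumps over the lazy
#   the lazy dog again
```

The overlap is what preserves context across boundaries: "fox jumps" and "the lazy" each appear in two chunks, so a query landing near a boundary still retrieves a chunk with enough surrounding text.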

Bedrock Chunking Options

| Bedrock option | What it does | Equivalent concept |
| --- | --- | --- |
| Default | ~300-token chunks that respect sentence boundaries | Recursive (baseline) |
| Fixed-size | You control chunk size and overlap | Fixed-size |
| Hierarchical | Searches small chunks but returns larger context | Sentence-window |
| Semantic | Splits at topic boundaries | Semantic |
| None | No splitting; the entire file is treated as one chunk | Document-structure (manual) |
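
When you create a knowledge base data source, these options map onto Bedrock's chunking configuration. The fragment below is a sketch of what a fixed-size setup roughly looks like (e.g. passed to boto3's `bedrock-agent` client); the exact field names are an assumption from memory, so verify them against the current Bedrock documentation:

```python
# Sketch of a vector-ingestion chunking configuration for a Bedrock
# knowledge base data source. Field names are assumed, not verified --
# check the Bedrock API reference before using.
fixed_size_chunking = {
    "chunkingConfiguration": {
        "chunkingStrategy": "FIXED_SIZE",  # other strategies: HIERARCHICAL,
                                           # SEMANTIC, NONE
        "fixedSizeChunkingConfiguration": {
            "maxTokens": 300,          # the chunk size you control
            "overlapPercentage": 20,   # overlap between adjacent chunks
        },
    }
}
```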

👉 Continue reading: In Part 3, we’ll compare AWS vector database options and build a practical decision framework to help you choose the right one.
