Suhas Mallesh
Bedrock Knowledge Base Advanced RAG with Terraform: Chunking, Hybrid Search, and Reranking 🧠

Fixed-size chunking is just the starting point. Semantic chunking, hierarchical retrieval, hybrid search, reranking, and metadata filtering turn a basic RAG pipeline into a production system. All configurable in Terraform and the retrieval API.

In RAG Post 1, we deployed a basic Bedrock Knowledge Base with fixed-size chunking. It works, but retrieval quality is mediocre. Your users ask complex questions and get incomplete answers. The model pulls in irrelevant chunks while missing the ones that matter.

The fix isn't a better model. It's better retrieval. Bedrock Knowledge Bases supports four chunking strategies, hybrid search, reranking models, metadata filtering, and query decomposition. All of these are configurable through Terraform and the retrieval API. This post covers the production patterns that separate a demo from a system your users actually trust. 🎯

🧱 Chunking Strategies: Choosing the Right One

Chunking is the single biggest lever for RAG quality. How you split documents determines what gets retrieved. Bedrock supports four strategies:

| Strategy | How It Works | Best For | Terraform Value |
| --- | --- | --- | --- |
| FIXED_SIZE | Split every N tokens with overlap | General purpose, predictable costs | `FIXED_SIZE` |
| HIERARCHICAL | Parent/child chunks: search on children, return parents | Long docs with nested structure (manuals, legal) | `HIERARCHICAL` |
| SEMANTIC | Split by meaning using embedding similarity | Dense prose, technical docs | `SEMANTIC` |
| NONE | Each file = one chunk | Pre-processed documents | `NONE` |

Fixed-Size (Baseline)

Good starting point. Predictable chunk sizes make cost estimation easy:

vector_ingestion_configuration {
  chunking_configuration {
    chunking_strategy = "FIXED_SIZE"
    fixed_size_chunking_configuration {
      max_tokens         = 512
      overlap_percentage = 20
    }
  }
}

Tuning guide: Start with 512 tokens and 20% overlap. If answers lack context, increase to 1024. If retrieval returns too much noise, decrease to 256.
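To reason about cost before ingesting, a rough chunk-count sketch helps. This assumes each chunk after the first advances by `max_tokens` minus the overlap; real counts depend on the embedding model's tokenizer, so treat it as an estimate only:

```python
import math

def estimate_chunks(total_tokens: int, max_tokens: int = 512, overlap_pct: int = 20) -> int:
    """Rough chunk-count estimate for FIXED_SIZE chunking.

    The first chunk covers max_tokens; each later chunk advances by
    the stride max_tokens * (1 - overlap), re-embedding the overlap.
    """
    stride = int(max_tokens * (1 - overlap_pct / 100))
    if total_tokens <= max_tokens:
        return 1
    return 1 + math.ceil((total_tokens - max_tokens) / stride)

# a 10,000-token document at 512 tokens / 20% overlap
print(estimate_chunks(10_000))  # → 25
```

Multiply the estimate by your document count and per-token embedding price to budget an ingestion run.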

Hierarchical (Best for Complex Documents)

This is where retrieval quality jumps. Bedrock searches on small child chunks for precision, then returns the larger parent chunk for context:

vector_ingestion_configuration {
  chunking_configuration {
    chunking_strategy = "HIERARCHICAL"
    hierarchical_chunking_configuration {
      level_configuration {
        max_tokens = 1500
      }
      level_configuration {
        max_tokens = 300
      }
      overlap_tokens = 60
    }
  }
}

How it works: A 10-page legal document gets split into parent chunks (~1500 tokens, roughly a page) and child chunks (~300 tokens, roughly a paragraph). When a user asks a question, Bedrock matches against the precise child chunks, then replaces them with the broader parent chunks before passing context to the model. This means the model sees a full page of context rather than an isolated paragraph.

Trade-off: You may get fewer results than numberOfResults requests, because multiple child chunks can map to the same parent.
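A toy illustration of why results come back short (the chunk IDs and the child-to-parent mapping here are hypothetical, just to show the collapse):

```python
# Five child chunks match the query, but several share a parent.
# After Bedrock swaps children for parents and deduplicates, only
# three distinct parent chunks remain.
matched_children = ["p1-c1", "p1-c3", "p2-c2", "p1-c4", "p3-c1"]

parents = []
for child in matched_children:
    parent = child.split("-")[0]  # hypothetical child -> parent lookup
    if parent not in parents:
        parents.append(parent)

print(parents)  # → ['p1', 'p2', 'p3']
```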

Semantic (Best for Dense Prose)

Semantic chunking uses an embedding model to split text at natural semantic boundaries:

vector_ingestion_configuration {
  chunking_configuration {
    chunking_strategy = "SEMANTIC"
    semantic_chunking_configuration {
      max_tokens            = 512
      buffer_size           = 1
      breakpoint_percentile_threshold = 95
    }
  }
}

When to use: Documents where meaning doesn't align with fixed boundaries - research papers, narrative reports, policy documents. The breakpoint_percentile_threshold (0-99) controls sensitivity: higher values create fewer, larger chunks.

Cost note: Semantic chunking calls a foundation model during ingestion, adding cost per document. Factor this into your ingestion pipeline budget.
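To build intuition for the threshold, here's an illustrative sketch of percentile-based splitting: measure the distance between consecutive sentence embeddings and break wherever it exceeds the chosen percentile. This is the general technique, not Bedrock's internal implementation:

```python
import numpy as np

def semantic_breakpoints(embeddings: np.ndarray, percentile: float = 95) -> list:
    """Return indices where a new chunk should start, based on the
    cosine distance between each sentence embedding and the next."""
    a, b = embeddings[:-1], embeddings[1:]
    sims = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    dists = 1 - sims
    # higher percentile -> higher threshold -> fewer, larger chunks
    threshold = np.percentile(dists, percentile)
    return [i + 1 for i, d in enumerate(dists) if d > threshold]

# toy 2-D "embeddings" with one sharp topic shift between sentences 2 and 3
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(semantic_breakpoints(emb, percentile=90))  # → [2]
```

Raising the percentile raises the threshold, which is exactly why higher `breakpoint_percentile_threshold` values produce fewer, larger chunks.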

πŸ” Hybrid Search

By default, Bedrock uses semantic (vector) search only. Hybrid search combines vector similarity with keyword (BM25) matching. This catches cases where exact terminology matters - product codes, legal references, technical terms:

response = client.retrieve(
    knowledgeBaseId="YOUR_KB_ID",
    retrievalQuery={"text": "What is policy ABC-123?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "overrideSearchType": "HYBRID"
        }
    }
)

When to enable: Any knowledge base where users search for specific identifiers, codes, or exact phrases alongside natural language questions. Hybrid search is supported on OpenSearch Serverless and Amazon RDS vector stores.
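Each retrieved result carries the chunk text, a relevance score, and the source location. A sketch of inspecting that shape (the `response` here is mocked, but the field names follow the Bedrock Agent Runtime `retrieve` response):

```python
# Mocked retrieve() response to show the result shape
response = {
    "retrievalResults": [
        {
            "content": {"text": "Policy ABC-123 covers refunds within 30 days..."},
            "score": 0.71,
            "location": {"type": "S3", "s3Location": {"uri": "s3://kb-docs/policies/abc.pdf"}},
        }
    ]
}

for result in response["retrievalResults"]:
    # score, source document, and a preview of the chunk text
    print(f'{result["score"]:.2f}  {result["location"]["s3Location"]["uri"]}')
    print(f'      {result["content"]["text"][:60]}')
```

Logging scores like this is the quickest way to see whether switching to HYBRID actually changes what ranks at the top for your identifier-heavy queries.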

πŸ“Š Reranking

Retrieval returns the top-K chunks by vector similarity. But similarity isn't the same as relevance. A reranking model re-scores those chunks using a deeper understanding of the query-document relationship:

response = client.retrieve_and_generate(
    input={"text": "What are the penalties for late payment?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-20250514",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 15,
                    "overrideSearchType": "HYBRID"
                }
            },
            "orchestrationConfiguration": {
                "rerankingConfiguration": {
                    "bedrockRerankingConfiguration": {
                        "modelConfiguration": {
                            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/cohere.rerank-v3-5:0"
                        },
                        "numberOfRerankedResults": 5
                    }
                }
            }
        }
    }
)

Pattern: Retrieve 15 chunks with hybrid search, rerank down to 5 with Cohere Rerank. This "retrieve wide, rerank narrow" pattern consistently outperforms retrieving 5 directly.

🏷️ Metadata Filtering

Not every chunk is equal. Metadata filtering lets you scope retrieval to specific document categories, dates, or sources:

response = client.retrieve(
    knowledgeBaseId="YOUR_KB_ID",
    retrievalQuery={"text": "What changed in the refund policy?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "filter": {
                "andAll": [
                    {"equals": {"key": "department", "value": "legal"}},
                    {"greaterThan": {"key": "year", "value": 2024}}
                ]
            }
        }
    }
)

Metadata is defined per-document using a companion .metadata.json file in S3:

{
  "metadataAttributes": {
    "department": { "value": "legal", "type": "STRING" },
    "year": { "value": 2025, "type": "NUMBER" },
    "confidential": { "value": false, "type": "BOOLEAN" }
  }
}

Place this file alongside your document: policy.pdf gets policy.pdf.metadata.json.
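Generating the companion file programmatically keeps attributes in sync with your upload pipeline. A minimal sketch: `build_metadata` is a hypothetical helper name, and the S3 upload is shown only as a comment:

```python
import json

def build_metadata(attributes: dict) -> str:
    """Build the companion .metadata.json body from simple
    (value, type) pairs, matching the shape Bedrock expects."""
    return json.dumps({
        "metadataAttributes": {
            name: {"value": value, "type": type_name}
            for name, (value, type_name) in attributes.items()
        }
    })

body = build_metadata({
    "department": ("legal", "STRING"),
    "year": (2025, "NUMBER"),
    "confidential": (False, "BOOLEAN"),
})
print(body)

# Upload alongside the document, e.g. with boto3:
#   s3.put_object(Bucket=bucket, Key="policy.pdf.metadata.json", Body=body)
```

Remember to re-run ingestion after changing metadata files; attributes are indexed at sync time, not at query time.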

πŸ”€ Query Decomposition

Complex questions often contain multiple intents. Query decomposition breaks them into sub-queries that are each answered independently:

response = client.retrieve_and_generate(
    input={"text": "Compare the 2024 and 2025 refund policies and highlight what changed"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-20250514",
            "orchestrationConfiguration": {
                "queryTransformationConfiguration": {
                    "type": "QUERY_DECOMPOSITION"
                }
            }
        }
    }
)

Bedrock may decompose this into: "What is the 2024 refund policy?" and "What is the 2025 refund policy?" - retrieving separately then synthesizing. This increases API calls but significantly improves answers for comparative or multi-part questions.

πŸ“ Putting It All Together: Production Config

Here's a production-ready Terraform data source configuration combining hierarchical chunking with the retrieval optimizations above:

# rag/data_source_prod.tf

resource "aws_bedrockagent_data_source" "s3_prod" {
  name              = "${var.environment}-${var.kb_name}-s3-source"
  knowledge_base_id = aws_bedrockagent_knowledge_base.this.id

  data_source_configuration {
    type = "S3"
    s3_configuration {
      bucket_arn              = aws_s3_bucket.knowledge_base_docs.arn
      inclusion_prefixes      = var.s3_inclusion_prefixes
    }
  }

  vector_ingestion_configuration {
    chunking_configuration {
      chunking_strategy = var.chunking_strategy

      dynamic "hierarchical_chunking_configuration" {
        for_each = var.chunking_strategy == "HIERARCHICAL" ? [1] : []
        content {
          level_configuration {
            max_tokens = var.parent_chunk_tokens
          }
          level_configuration {
            max_tokens = var.child_chunk_tokens
          }
          overlap_tokens = var.overlap_tokens
        }
      }

      dynamic "semantic_chunking_configuration" {
        for_each = var.chunking_strategy == "SEMANTIC" ? [1] : []
        content {
          max_tokens                      = var.semantic_max_tokens
          buffer_size                     = var.semantic_buffer_size
          breakpoint_percentile_threshold = var.semantic_breakpoint_threshold
        }
      }

      dynamic "fixed_size_chunking_configuration" {
        for_each = var.chunking_strategy == "FIXED_SIZE" ? [1] : []
        content {
          max_tokens         = var.fixed_max_tokens
          overlap_percentage = var.fixed_overlap_percentage
        }
      }
    }
  }
}

With environment-specific variables:

# environments/dev.tfvars
chunking_strategy    = "FIXED_SIZE"
fixed_max_tokens     = 300
fixed_overlap_percentage = 10

# environments/prod.tfvars
chunking_strategy    = "HIERARCHICAL"
parent_chunk_tokens  = 1500
child_chunk_tokens   = 300
overlap_tokens       = 60

πŸ”„ Azure vs AWS vs GCP: Advanced RAG Comparison

| Feature | Azure AI Search | AWS Bedrock KB | GCP RAG Engine |
| --- | --- | --- | --- |
| Chunking | Fixed-size + Document Layout skill | Fixed, hierarchical, semantic, Lambda | Fixed-size only |
| Hybrid search | BM25 + vector via RRF (built-in) | Supported on OpenSearch | Alpha-weighted dense/sparse |
| Semantic reranking | Built-in transformer ranker (L2) | Cohere Rerank | Rank Service + LLM Ranker |
| Query decomposition | Agentic retrieval (native) | Native API parameter | Not built-in |
| Metadata filtering | Filterable index fields + OData | JSON metadata files in S3 | Filter string at query time |
| Strictness control | 1-5 scale on data source | Not built-in | Vector distance threshold |
| Reranker score range | 0-4 (calibrated, cross-query consistent) | Model-dependent | Model-dependent |

πŸ’‘ Decision Framework

| Your Documents | Recommended Chunking | Search Type | Reranking |
| --- | --- | --- | --- |
| Short FAQs, structured content | FIXED_SIZE (256 tokens) | Hybrid | Optional |
| Long manuals, legal docs | HIERARCHICAL (1500/300) | Hybrid | Yes |
| Research papers, dense prose | SEMANTIC (512 tokens) | Semantic | Yes |
| Pre-chunked content | NONE | Hybrid | Optional |
| Mixed document types | HIERARCHICAL (safest default) | Hybrid | Yes |

Start with HIERARCHICAL + hybrid search + reranking for production. It's the most robust default. Only switch to SEMANTIC if your documents are uniformly dense prose. Use FIXED_SIZE in dev to iterate quickly.

⏭️ What's Next

This is Post 2 of the AWS RAG Pipeline with Terraform series.


Your RAG pipeline just went from demo to production. Hierarchical chunking for context, hybrid search for precision, reranking for relevance, metadata filtering for scope - all driven by Terraform variables per environment. 🧠

Found this helpful? Follow for the full RAG Pipeline with Terraform series! πŸ’¬
