This is a step-by-step guide to creating an Amazon Bedrock Knowledge Base from the AWS console, following the console screenshot flow end to end.
A Knowledge Base lets you build a RAG (Retrieval Augmented Generation) experience, where foundation models answer questions grounded in your enterprise documents.
Prerequisites
Before you start, confirm you have:
- An AWS account with permissions for Amazon Bedrock, S3, and the vector store you’ll use (commonly OpenSearch Serverless or Aurora PostgreSQL/pgvector, depending on your setup).
- Your source documents prepared (PDF, text, HTML, doc exports, etc.).
- A target AWS Region where Bedrock Knowledge Bases is available.
- A document storage location (typically an S3 bucket).
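If you still need the document bucket, here is a minimal boto3 sketch; the bucket name and Region below are assumptions, so adjust them to your setup.

```python
import boto3

# Hypothetical bucket name and Region; skip this if the bucket already exists.
region = "us-east-1"
s3 = boto3.client("s3", region_name=region)
s3.create_bucket(Bucket="bedrock-dipayan")
# Outside us-east-1, also pass CreateBucketConfiguration={"LocationConstraint": region}.
```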
Step 1 — Open Amazon Bedrock and Navigate to Knowledge Bases
- Sign in to the AWS Console.
- Search for Amazon Bedrock.
- In the Bedrock left navigation, locate Knowledge bases (under the appropriate section such as Builder tools or similar).
- Click Knowledge bases.
Step 2 — Start Creating a Knowledge Base
From the Knowledge Bases page:
Click Create knowledge base.
This begins a guided workflow (wizard-style).
Step 3 — Choose the Knowledge Base Setup Type
In the wizard, you will typically see a choice such as:
Knowledge base with vector store (recommended for RAG)
Other options depending on account/region features
Select the Knowledge Base option that uses a vector store.
Knowledge Bases store embeddings in a vector index so Bedrock can retrieve relevant chunks of text at query time. Embeddings are numerical representations of data (text, images, audio, or other content) that capture the meaning and context of that data in mathematical form. They allow machines to understand similarity, relationships, and intent rather than just keywords. In one line: embeddings turn human knowledge into math that AI can search, compare, and reason over.
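To make that concrete, here is a minimal sketch of generating an embedding with boto3, assuming the Titan Text Embeddings V2 model is enabled in your account and that you are working in the Region shown (both are assumptions).

```python
import json
import boto3

# Assumes us-east-1 and access to Titan Text Embeddings V2.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Power restoration procedure for storm outages"}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # vector length (1024 by default for Titan V2)
```

Sentences with similar meaning produce vectors that sit close together, which is exactly what the Knowledge Base exploits at retrieval time.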
Step 4 — Define Knowledge Base Details
Provide:
Knowledge Base Name (example: utility-ops-kb)
Description (optional but recommended)
Any organizational tags (optional)
I have named it 'knowledge-base-dipayan' here, but best practice is to use a name aligned to domain and environment, e.g., us-outage-kb-dev.
Step 5 — Configure Data Source (S3)
Choose Amazon S3 as the data source.
Select the S3 bucket and prefix/folder where documents are stored.
Confirm document formats and inclusion rules (if prompted).
I used the bedrock-dipayan S3 bucket to upload the file containing Jeff Bezos' 2022 shareholder letter. Best practice is to keep a dedicated prefix for the KB, for example: s3://my-company-kb/energy-utility/outage-procedures/
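As a rough sketch, uploading documents under a dedicated prefix with boto3 could look like this (the bucket, prefix, and file name are just the examples from this walkthrough):

```python
import boto3

s3 = boto3.client("s3")

# Example bucket/prefix from this walkthrough; adjust to your own layout.
bucket = "bedrock-dipayan"
prefix = "shareholder-letters/"

s3.upload_file("2022-shareholder-letter.pdf", bucket, prefix + "2022-shareholder-letter.pdf")
```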
Bedrock needs permissions to:
- Read documents from S3
- Write embeddings to the vector store
- Perform sync operations
In the wizard, you will either:
- Let Bedrock create a new IAM role, or
- Choose an existing IAM role
Best practice is to use least privilege. If you use an existing role, ensure it allows:
- s3:GetObject (for your bucket/prefix)
- Required permissions for your selected vector store
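For illustration, a least-privilege S3 read policy for the service role might look like the sketch below. The bucket and policy names are hypothetical, and the role additionally needs a trust policy for bedrock.amazonaws.com plus permissions for the embeddings model and vector store, which the wizard handles when it creates the role for you.

```python
import json
import boto3

# Hypothetical names; scope S3 read access to the KB bucket/prefix only.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::bedrock-dipayan/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::bedrock-dipayan",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="bedrock-kb-s3-read",
    PolicyDocument=json.dumps(policy_document),
)
```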
Under S3 URI, provide the Amazon S3 bucket location that contains your source documents.
Example:
s3://bedrock-dipayan/
You can click Browse S3 to select the bucket or View to inspect its contents.
Optionally, you may provide a customer-managed KMS key if the S3 data is encrypted with CMK.
Parsing determines how Bedrock extracts content from your source files before embedding.
Based on the screenshot, three parser options are available:
Option 1 – Amazon Bedrock Default Parser (Selected)
This is the recommended choice for most text-based knowledge bases.
Best for:
- Text-heavy documents
- PDFs, Word, Excel, HTML, Markdown, CSV, TXT
Parser output:
- Extracted plain text
This parser works well when paired with Amazon Titan Embeddings or other text embedding models.
Recommended for: Enterprise documentation, SOPs, reports, policies, letters, and manuals.
Option 2 – Amazon Bedrock Data Automation Parser
Designed for multimodal content.
Best for:
- PDFs with complex layouts
- Images, audio, and video files
Parser output:
- Extracted text
- Image descriptions and captions
- Audio/video transcripts and summaries
Use this option when your Knowledge Base includes non-text content that must be converted into searchable text.
Option 3 – Foundation Models as Parser
Uses foundation models to parse rich or complex documents.
Best for:
- Tables, forms, structured documents
- Visual-rich PDFs
Parser output:
- Extracted text
- Descriptions of figures, visuals, and tables
This option provides advanced parsing but may increase cost and processing time.
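If you later script the data source instead of clicking through the console, the parser choice maps to a parsingConfiguration in the CreateDataSource API. Below is a hedged sketch of the foundation-model option; the model ARN and prompt text are assumptions, so check the current API reference for the exact shape and supported models.

```python
# Hypothetical configuration; passed inside vectorIngestionConfiguration when
# the data source is created via the API. Omitting parsingConfiguration
# entirely keeps the default parser (Option 1).
foundation_model_parsing = {
    "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
    "bedrockFoundationModelConfiguration": {
        "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        "parsingPrompt": {
            "parsingPromptText": "Transcribe the document, describing any tables and figures in plain text."
        },
    },
}
```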
Configure Chunking Strategy
Chunking controls how documents are split into smaller segments before embeddings are generated.
From the screenshot:
Default chunking is selected.
Bedrock automatically:
- Splits text into chunks of approximately 300 tokens
- Applies overlap where necessary to preserve context
- Skips chunking if a document is already smaller than the chunk size
Why chunking matters:
- Smaller chunks improve retrieval precision.
- Overlapping chunks preserve semantic continuity.
- Proper chunking reduces hallucinations and improves grounding.
Best practice is to use default chunking unless you have a strong reason to customize (e.g., very long legal documents or structured data).
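If you do customize, the chunking settings map to a chunkingConfiguration in the CreateDataSource API. A small sketch of an explicit fixed-size configuration follows; the values are assumptions used only to illustrate the shape.

```python
# Hypothetical values; passed as vectorIngestionConfiguration={"chunkingConfiguration": ...}
# when the data source is created via the API. Leaving it out keeps default chunking.
chunking_configuration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 300,         # target chunk size in tokens
        "overlapPercentage": 20,  # overlap between adjacent chunks to preserve context
    },
}
```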
Step 6 — Select Embedding Model
Select the embeddings model used to convert your documents into vectors (embeddings). Common choices include:
Amazon Titan Embeddings (typical default choice)
Other provider embedding models (based on what your account has enabled)
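To see which embedding models exist in your Region, a quick boto3 sketch (the Region below is an assumption, and model access still has to be enabled separately):

```python
import boto3

# Lists foundation models whose output modality is embeddings.
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models(byOutputModality="EMBEDDING")["modelSummaries"]:
    print(model["modelId"])
```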
Step 7 – Configure Embeddings Model and Vector Store
In this step, configure how Amazon Bedrock will convert your documents into embeddings and store them for semantic retrieval.
Under Configure data storage and processing, select an Embeddings model.
Click Select model and choose an embeddings model (for example, Amazon Titan Embeddings) to transform your documents into vector representations.
Next, choose a Vector Store where Bedrock will store and manage the embeddings.
Available options include:
Amazon OpenSearch Serverless (recommended for most use cases)
Provides fully managed, scalable vector search optimized for semantic and hybrid search.
Amazon Aurora PostgreSQL Serverless (pgvector)
Suitable if you already use relational databases and want SQL-based vector queries.
Amazon Neptune Analytics (GraphRAG)
Used for graph-based retrieval and advanced relationship-driven RAG scenarios.
Select Amazon OpenSearch Serverless (as shown in the screenshot) for a fully managed vector database optimized for high-performance semantic search.
Once selected, Bedrock will automatically create and manage the required vector index for storing embeddings.
Click Next to proceed.
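For reference, the API equivalent of this configuration is a single CreateKnowledgeBase call. The sketch below assumes a Titan embeddings model and an existing OpenSearch Serverless collection and index, and every ARN and name is a placeholder; unlike the console wizard, the API does not create the collection or index for you.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# All ARNs, index and field names below are placeholders.
response = bedrock_agent.create_knowledge_base(
    name="knowledge-base-dipayan",
    roleArn="arn:aws:iam::123456789012:role/bedrock-kb-service-role",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123example",
            "vectorIndexName": "bedrock-kb-index",
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)
print(response["knowledgeBase"]["knowledgeBaseId"])
```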
Step 8 — Review and Create
- Review the full configuration summary:
  - Knowledge base name
  - Data source path
  - Embedding model
  - Vector store configuration
  - IAM role
- Click Create knowledge base.
At this point, the Knowledge Base object is created, but it still needs to ingest/sync documents.
The Knowledge Base is created, but the 'Test Knowledge Base' option is grayed out because the documents need to sync before you can test the KB.
Step 9 — Sync (Ingest) Your Documents
Once created:
Open your Knowledge Base.
Start a Sync (sometimes labeled Sync data source).
Monitor the sync status until it shows Completed/Ready.
What sync does: It chunks documents, generates embeddings, and stores them in the vector index.
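If you prefer to script this step, here is a minimal sketch using boto3. The knowledge base ID is a placeholder; the data source already exists because the wizard created it, so we just look it up.

```python
import time
import boto3

bedrock_agent = boto3.client("bedrock-agent")
KB_ID = "KB1234567890"  # placeholder; shown on the knowledge base details page

# Look up the data source the wizard created, then start a sync.
data_source_id = bedrock_agent.list_data_sources(knowledgeBaseId=KB_ID)[
    "dataSourceSummaries"
][0]["dataSourceId"]

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=KB_ID,
    dataSourceId=data_source_id,
)["ingestionJob"]

# Poll until the sync finishes (chunking + embedding + indexing).
while job["status"] not in ("COMPLETE", "FAILED", "STOPPED"):
    time.sleep(15)
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=KB_ID,
        dataSourceId=data_source_id,
        ingestionJobId=job["ingestionJobId"],
    )["ingestionJob"]

print(job["status"], job.get("statistics"))
```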
Step 10 — Test Created Knowledge Base
The Test Knowledge Base option in Amazon Bedrock allows you to interactively validate that your Knowledge Base (KB) is working as expected before integrating it into an application or agent. It is essentially a built-in RAG testing console.
This view lets you:
- Ask natural-language questions
- Control retrieval and generation behavior
- Inspect source chunks used for answers
- Verify grounding and relevance
- Tune configuration settings in real time
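The same test can be run programmatically with the RetrieveAndGenerate API. Below is a minimal sketch, assuming the knowledge base ID and generation model ARN shown (both placeholders); the Test Knowledge Base panel performs this retrieve-then-generate flow interactively.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholder IDs/ARNs; swap in your own knowledge base ID and a model you have access to.
response = agent_runtime.retrieve_and_generate(
    input={"text": "What did the 2022 shareholder letter say about cost discipline?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

print(response["output"]["text"])          # grounded answer
for citation in response.get("citations", []):
    for ref in citation["retrievedReferences"]:
        print(ref["location"])             # source chunks used for grounding
```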