This is a step-by-step guide to creating an Amazon Bedrock Knowledge Base from the AWS console, following the console screenshot flow end to end.
A Knowledge Base lets you build a RAG (Retrieval Augmented Generation) experience, where foundation models answer questions grounded in your enterprise documents.
Prerequisites
Before you start, confirm you have:
- An AWS account with permissions for Amazon Bedrock, S3, and the vector store you’ll use (commonly OpenSearch Serverless or Aurora PostgreSQL/pgvector, depending on your setup).
- Your source documents prepared (PDF, text, HTML, doc exports, etc.).
- A target AWS Region where Bedrock Knowledge Bases is available.
- A document storage location (typically an S3 bucket).
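If you still need the document bucket, here is a minimal boto3 sketch; the bucket name and Region below are assumptions, so adjust them to your setup.

```python
import boto3

# Hypothetical bucket name and Region; skip this if the bucket already exists.
region = "us-east-1"
s3 = boto3.client("s3", region_name=region)
s3.create_bucket(Bucket="bedrock-dipayan")
# Outside us-east-1, also pass CreateBucketConfiguration={"LocationConstraint": region}.
```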
Step 1 — Open Amazon Bedrock and Navigate to Knowledge Bases
- Sign in to the AWS Console.
- Search for Amazon Bedrock.
- In the Bedrock left navigation, locate Knowledge bases (under the appropriate section such as Builder tools or similar).
- Click Knowledge bases.
Step 2 — Start Creating a Knowledge Base
From the Knowledge Bases page:
Click Create knowledge base.
This begins a guided workflow (wizard-style).
Step 3 — Choose the Knowledge Base Setup Type
In the wizard, you will typically see a choice such as:
Knowledge base with vector store (recommended for RAG)
Other options depending on account/region features
Select the Knowledge Base option that uses a vector store.
Knowledge Bases store embeddings in a vector index so Bedrock can retrieve relevant chunks of text at query time. Embeddings are numerical representations of data (text, images, audio, or other content) that capture the meaning and context of that data in mathematical form. They allow machines to understand similarity, relationships, and intent rather than just keywords. In one line: embeddings turn human knowledge into math that AI can search, compare, and reason over.
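To make that concrete, here is a minimal sketch of generating an embedding with boto3, assuming the Titan Text Embeddings V2 model is enabled in your account and that you are working in the Region shown (both are assumptions).

```python
import json
import boto3

# Assumes us-east-1 and access to Titan Text Embeddings V2.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Power restoration procedure for storm outages"}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # vector length (1024 by default for Titan V2)
```

Sentences with similar meaning produce vectors that sit close together, which is exactly what the Knowledge Base exploits at retrieval time.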
Step 4 — Define Knowledge Base Details
Provide:
Knowledge Base Name (example: utility-ops-kb)
Description (optional but recommended)
Any organizational tags (optional)
I have named it 'knowledge-base-dipayan' here, but best practice is to use a name aligned to domain and environment, e.g., us-outage-kb-dev.
Step 5 — Configure Data Source (S3)
Choose Amazon S3 as the data source.
Select the S3 bucket and prefix/folder where documents are stored.
Confirm document formats and inclusion rules (if prompted).
I used the bedrock-dipayan S3 bucket to upload the file containing Jeff Bezos' 2022 shareholder letter. Best practice is to keep a dedicated prefix for the KB, for example: s3://my-company-kb/energy-utility/outage-procedures/
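As a rough sketch, uploading documents under a dedicated prefix with boto3 could look like this (the bucket, prefix, and file name are just the examples from this walkthrough):

```python
import boto3

s3 = boto3.client("s3")

# Example bucket/prefix from this walkthrough; adjust to your own layout.
bucket = "bedrock-dipayan"
prefix = "shareholder-letters/"

s3.upload_file("2022-shareholder-letter.pdf", bucket, prefix + "2022-shareholder-letter.pdf")
```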
Bedrock needs permissions to:
- Read documents from S3
- Write embeddings to the vector store
- Perform sync operations
In the wizard, you will either:
- Let Bedrock create a new IAM role, or
- Choose an existing IAM role
Best practice is to use least privilege. If you use an existing role, ensure it allows:
- s3:GetObject (for your bucket/prefix)
- Required permissions for your selected vector store
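For illustration, a least-privilege S3 read policy for the service role might look like the sketch below. The bucket and policy names are hypothetical, and the role additionally needs a trust policy for bedrock.amazonaws.com plus permissions for the embeddings model and vector store, which the wizard handles when it creates the role for you.

```python
import json
import boto3

# Hypothetical names; scope S3 read access to the KB bucket/prefix only.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::bedrock-dipayan/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::bedrock-dipayan",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="bedrock-kb-s3-read",
    PolicyDocument=json.dumps(policy_document),
)
```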
Under S3 URI, provide the Amazon S3 bucket location that contains your source documents.
Example:
s3://bedrock-dipayan/
You can click Browse S3 to select the bucket or View to inspect its contents.
Optionally, you may provide a customer-managed KMS key if the S3 data is encrypted with CMK.
Parsing determines how Bedrock extracts content from your source files before embedding.
Based on the screenshot, three parser options are available:
Option 1 – Amazon Bedrock Default Parser (Selected)
This is the recommended choice for most text-based knowledge bases.
Best for:
- Text-heavy documents
- PDFs, Word, Excel, HTML, Markdown, CSV, TXT
Parser output:
- Extracted plain text
This parser works well when paired with Amazon Titan Embeddings or other text embedding models.
Recommended for: Enterprise documentation, SOPs, reports, policies, letters, and manuals.
Option 2 – Amazon Bedrock Data Automation Parser
Designed for multimodal content.
Best for:
- PDFs with complex layouts
- Images, audio, and video files
Parser output:
- Extracted text
- Image descriptions and captions
- Audio/video transcripts and summaries
Use this option when your Knowledge Base includes non-text content that must be converted into searchable text.
Option 3 – Foundation Models as Parser
Uses foundation models to parse rich or complex documents.
Best for:
- Tables, forms, structured documents
- Visual-rich PDFs
Parser output:
- Extracted text
- Descriptions of figures, visuals, and tables
This option provides advanced parsing but may increase cost and processing time.
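If you later script the data source instead of clicking through the console, the parser choice maps to a parsingConfiguration in the CreateDataSource API. Below is a hedged sketch of the foundation-model option; the model ARN and prompt text are assumptions, so check the current API reference for the exact shape and supported models.

```python
# Hypothetical configuration; passed inside vectorIngestionConfiguration when
# the data source is created via the API. Omitting parsingConfiguration
# entirely keeps the default parser (Option 1).
foundation_model_parsing = {
    "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
    "bedrockFoundationModelConfiguration": {
        "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        "parsingPrompt": {
            "parsingPromptText": "Transcribe the document, describing any tables and figures in plain text."
        },
    },
}
```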
Configure Chunking Strategy
Chunking controls how documents are split into smaller segments before embeddings are generated.
From the screenshot:
Default chunking is selected.
Bedrock automatically:
- Splits text into chunks of approximately 300 tokens
- Applies overlap where necessary to preserve context
- Skips chunking if a document is already smaller than the chunk size
Why chunking matters:
- Smaller chunks improve retrieval precision.
- Overlapping chunks preserve semantic continuity.
- Proper chunking reduces hallucinations and improves grounding.
Best practice is to use default chunking unless you have a strong reason to customize (e.g., very long legal documents or structured data).
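If you do customize, the chunking settings map to a chunkingConfiguration in the CreateDataSource API. A small sketch of an explicit fixed-size configuration follows; the values are assumptions used only to illustrate the shape.

```python
# Hypothetical values; passed as vectorIngestionConfiguration={"chunkingConfiguration": ...}
# when the data source is created via the API. Leaving it out keeps default chunking.
chunking_configuration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 300,         # target chunk size in tokens
        "overlapPercentage": 20,  # overlap between adjacent chunks to preserve context
    },
}
```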
Step 6 — Select Embedding Model
Select the embeddings model used to convert your documents into vectors (embeddings). Common choices include:
Amazon Titan Embeddings (typical default choice)
Other provider embedding models (based on what your account has enabled)
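To see which embedding models exist in your Region, a quick boto3 sketch (the Region below is an assumption, and model access still has to be enabled separately):

```python
import boto3

# Lists foundation models whose output modality is embeddings.
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models(byOutputModality="EMBEDDING")["modelSummaries"]:
    print(model["modelId"])
```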
Step 7 – Configure Embeddings Model and Vector Store
In this step, configure how Amazon Bedrock will convert your documents into embeddings and store them for semantic retrieval.
Under Configure data storage and processing, select an Embeddings model.
Click Select model and choose an embeddings model (for example, Amazon Titan Embeddings) to transform your documents into vector representations.
Next, choose a Vector Store where Bedrock will store and manage the embeddings.
Available options include:
Amazon OpenSearch Serverless (recommended for most use cases)
Provides fully managed, scalable vector search optimized for semantic and hybrid search.
Amazon Aurora PostgreSQL Serverless (pgvector)
Suitable if you already use relational databases and want SQL-based vector queries.
Amazon Neptune Analytics (GraphRAG)
Used for graph-based retrieval and advanced relationship-driven RAG scenarios.
Select Amazon OpenSearch Serverless (as shown in the screenshot) for a fully managed vector database optimized for high-performance semantic search.
Once selected, Bedrock will automatically create and manage the required vector index for storing embeddings.
Click Next to proceed.
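For reference, the API equivalent of this configuration is a single CreateKnowledgeBase call. The sketch below assumes a Titan embeddings model and an existing OpenSearch Serverless collection and index, and every ARN and name is a placeholder; unlike the console wizard, the API does not create the collection or index for you.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# All ARNs, index and field names below are placeholders.
response = bedrock_agent.create_knowledge_base(
    name="knowledge-base-dipayan",
    roleArn="arn:aws:iam::123456789012:role/bedrock-kb-service-role",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123example",
            "vectorIndexName": "bedrock-kb-index",
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)
print(response["knowledgeBase"]["knowledgeBaseId"])
```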
Step 8 — Review and Create
- Review the full configuration summary:
  - Knowledge base name
  - Data source path
  - Embedding model
  - Vector store configuration
  - IAM role
- Click Create knowledge base.
At this point, the Knowledge Base object is created, but it still needs to ingest/sync documents.
The Knowledge Base is created, but the 'Test Knowledge Base' option is grayed out because the documents need to sync before you can test the KB.
Step 9 — Sync (Ingest) Your Documents
Once created:
Open your Knowledge Base.
Start a Sync (sometimes labeled Sync data source).
Monitor the sync status until it shows Completed/Ready.
What sync does: It chunks documents, generates embeddings, and stores them in the vector index.
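If you prefer to script this step, here is a minimal sketch using boto3. The knowledge base ID is a placeholder; the data source already exists because the wizard created it, so we just look it up.

```python
import time
import boto3

bedrock_agent = boto3.client("bedrock-agent")
KB_ID = "KB1234567890"  # placeholder; shown on the knowledge base details page

# Look up the data source the wizard created, then start a sync.
data_source_id = bedrock_agent.list_data_sources(knowledgeBaseId=KB_ID)[
    "dataSourceSummaries"
][0]["dataSourceId"]

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=KB_ID,
    dataSourceId=data_source_id,
)["ingestionJob"]

# Poll until the sync finishes (chunking + embedding + indexing).
while job["status"] not in ("COMPLETE", "FAILED", "STOPPED"):
    time.sleep(15)
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=KB_ID,
        dataSourceId=data_source_id,
        ingestionJobId=job["ingestionJobId"],
    )["ingestionJob"]

print(job["status"], job.get("statistics"))
```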
Step 10 — Test Created Knowledge Base
The Test Knowledge Base option in Amazon Bedrock allows you to interactively validate that your Knowledge Base (KB) is working as expected before integrating it into an application or agent. It is essentially a built-in RAG testing console.
This view lets you:
- Ask natural-language questions
- Control retrieval and generation behavior
- Inspect source chunks used for answers
- Verify grounding and relevance
- Tune configuration settings in real time
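The same test can be run programmatically with the RetrieveAndGenerate API. Below is a minimal sketch, assuming the knowledge base ID and generation model ARN shown (both placeholders); the Test Knowledge Base panel performs this retrieve-then-generate flow interactively.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholder IDs/ARNs; swap in your own knowledge base ID and a model you have access to.
response = agent_runtime.retrieve_and_generate(
    input={"text": "What did the 2022 shareholder letter say about cost discipline?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

print(response["output"]["text"])          # grounded answer
for citation in response.get("citations", []):
    for ref in citation["retrievedReferences"]:
        print(ref["location"])             # source chunks used for grounding
```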