TL;DR:
In this lab, you’ll build a complete RAG workflow using Amazon Bedrock Knowledge Bases on top of your own PDFs stored in S3. Titan Embeddings generates the vectors and the Knowledge Base builds the vector index automatically, so you can run grounded, document-aware queries with a Bedrock model. In about 25–35 minutes, you’ll deploy the full architecture in us-east-1, validate responses, review costs, monitoring, and security, and walk away with a clean blueprint to truly understand GenAI and explain it to others.
I built this lab because I want to truly understand GenAI — not just for an exam, but to teach it, document it, and help others who are starting like me.
These topics were tricky inside sandboxes, so now I’m rebuilding everything from scratch in a real AWS account to see how things work internally.
🧭 Quick Metadata
| Field | Value |
|---|---|
| CB Category | AI/ML |
| AWS Services | Amazon Bedrock, Knowledge Bases, Amazon S3, Titan Embeddings |
| Prerequisites | AWS account, Bedrock enabled, S3 permissions, region: us-east-1 |
| Estimated Cost | Under \$0.50 |
| Architecture | See diagram below |
🗺️ Table of Contents
- Why it matters
- Architecture / What you will build
- Prerequisites
- Step-by-step
- Validation & Testing
- Observability (CloudWatch)
- Security Best Practices
- Cost Analysis
- Troubleshooting
- What’s Next
- Official Resources
💡 Why it matters
Companies want to integrate generative AI into their apps, but raw models aren’t enough. They need answers based on internal knowledge: PDFs, manuals, policies, emails, reports.
That’s what RAG does. AWS makes this easier with Knowledge Bases for Amazon Bedrock, which handles:
- Document extraction
- Embedding generation (Titan)
- Vector indexing
- RAG orchestration
- Grounded responses
This lab was hard for me when using sandboxes, so I rebuilt it from scratch in an AWS Free Tier account, and now I’m sharing the exact steps.
🧰 Architecture / What you will build
Simple flow diagram:
📄 Documents (PDF/Markdown/TXT)
⬇️
🪣 Amazon S3 (raw documents)
⬇️
🧠 Knowledge Base
Titan Embeddings
OpenSearch Vector Index
⬇️
🤖 Amazon Bedrock Model (Nova Micro / Claude)
⬇️
💬 Grounded, context-aware answer
Key Notes
- Titan automatically generates embeddings.
- Vector store is fully managed (OpenSearch Serverless).
- Bedrock grounds its answers in the chunks retrieved from your documents.
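To make the retrieval step in the flow above more concrete, here is a toy sketch of ranking chunks by cosine similarity against a query vector. The three-dimensional vectors and the `INDEX` list are hand-made placeholders, not Titan output, and the managed Knowledge Base does the real work inside OpenSearch Serverless. This only illustrates the idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are closest to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical tiny "embeddings"; real embedding vectors are much longer.
INDEX = [
    ("Bedrock is a managed GenAI service.", [0.9, 0.1, 0.0]),
    ("S3 stores objects in buckets.", [0.1, 0.9, 0.0]),
    ("Knowledge Bases add RAG to Bedrock.", [0.8, 0.2, 0.1]),
]
```

Calling `top_k_chunks([1.0, 0.0, 0.0], INDEX)` picks the two Bedrock-related chunks, which is the same "closest vectors win" logic the vector store applies at scale.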
✅ Prerequisites
- AWS Free Tier account
- Region us-east-1
- Bedrock enabled
- IAM permissions:
  - AmazonBedrockFullAccess
  - AmazonS3FullAccess
🛠️ Step-by-step
1) Create an S3 bucket
- Open AWS Console → search S3
- Click Create bucket
- Use:
  - Bucket name: kb-sina-rag-lab
  - Region: us-east-1
  - Block Public Access: ALL ON
  - Encryption: default (SSE-S3)
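If you prefer scripting this step, the same settings can be sketched with the AWS SDK for Python. This is only a sketch under assumptions: the post itself uses the console, boto3 is my choice here, and the helper names `bucket_settings` and `create_lab_bucket` are made up for illustration. The client is passed in so the helpers stay easy to test.

```python
def bucket_settings(bucket_name):
    """Build the keyword arguments for the three S3 calls used in step 1."""
    return {
        # In us-east-1, create_bucket needs no CreateBucketConfiguration.
        "create_bucket": {"Bucket": bucket_name},
        # Block Public Access: ALL ON
        "put_public_access_block": {
            "Bucket": bucket_name,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        },
        # Default encryption with SSE-S3 (AES256)
        "put_bucket_encryption": {
            "Bucket": bucket_name,
            "ServerSideEncryptionConfiguration": {
                "Rules": [
                    {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            },
        },
    }

def create_lab_bucket(s3_client, bucket_name):
    """Apply the settings in order using a boto3 S3 client for us-east-1."""
    for method_name, kwargs in bucket_settings(bucket_name).items():
        getattr(s3_client, method_name)(**kwargs)
```

With real credentials you would call `create_lab_bucket(boto3.client("s3", region_name="us-east-1"), "kb-sina-rag-lab")`. Bucket names are global, so you may need a different name than mine.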
2) Upload your documents
- Go to your bucket: kb-sina-rag-lab
- Click Upload
- Click Add files
- Choose your PDFs or TXT files
- Click Upload
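The upload can also be scripted. Here is a small sketch, assuming boto3 and a local folder of documents; `plan_uploads`, `upload_documents`, and the optional `prefix` argument are hypothetical names I picked, and the suffix list only reflects the formats mentioned in this lab.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md"}  # formats mentioned in this lab

def plan_uploads(folder, prefix=""):
    """Return (local_path, s3_key) pairs for the supported files in a folder."""
    pairs = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            pairs.append((str(path), f"{prefix}{path.name}"))
    return pairs

def upload_documents(s3_client, bucket_name, folder, prefix=""):
    """Upload each planned file with the boto3 S3 client's upload_file method."""
    for local_path, key in plan_uploads(folder, prefix):
        s3_client.upload_file(local_path, bucket_name, key)
```

Separating the plan from the upload lets you print `plan_uploads("./docs")` first and confirm which files will land in the bucket before anything is sent.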
3) Create the Knowledge Base
Go to Amazon Bedrock → Knowledge Bases → Create knowledge base
3.1 Basic settings
- Knowledge Base name: type the name of your KB, for example rag-lab-sina-kb
- (Optional) Description: “RAG Knowledge Base for GenAI testing.”
3.2 Data source → Amazon S3
Select your S3 bucket.
3.3 Embeddings model
Choose:
Titan Text Embeddings v2. This model generates the embeddings for the vector store.
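The Knowledge Base calls the embeddings model for you, but if you are curious what one embedding call looks like, here is a rough sketch. It assumes boto3's `bedrock-runtime` client and the model ID `amazon.titan-embed-text-v2:0`; the helper names are my own, and you should confirm the model ID in your Bedrock console.

```python
import json

# Assumed model ID for Titan Text Embeddings v2; verify it in your console.
TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"

def build_embedding_request(text):
    """Keyword arguments for invoke_model with a single input text."""
    return {
        "modelId": TITAN_EMBED_MODEL_ID,
        "body": json.dumps({"inputText": text}),
        "contentType": "application/json",
        "accept": "application/json",
    }

def embed_text(runtime_client, text):
    """Call invoke_model on a 'bedrock-runtime' client and return the vector."""
    response = runtime_client.invoke_model(**build_embedding_request(text))
    payload = json.loads(response["body"].read())  # body is a readable stream
    return payload["embedding"]
```

You don't need this for the lab itself. It just shows that an "embedding" is a list of floats returned for a piece of text.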
3.4 Vector store
Choose:
OpenSearch Serverless (auto-configured). Leave the automatic configuration as it is; it works well for this lab.
The Knowledge Base is now created.
4) Sync your data
- On the same screen, scroll to Data source
- Click: Sync
Status should show:
- ✔️ Sync successful
- ✔️ Status: Up to date
This may take between 30 seconds and 2 minutes depending on the size of the PDFs.
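The Sync button corresponds to an ingestion job, so the same step can be sketched in code. This assumes boto3's `bedrock-agent` client; `sync_data_source` is a hypothetical helper name, and the knowledge base ID and data source ID are values you would copy from your own console.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}

def sync_data_source(agent_client, knowledge_base_id, data_source_id,
                     poll_seconds=5, max_polls=60):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]
    polls = 0
    while job["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError("ingestion job did not finish in time")
        time.sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        polls += 1
    return job
```

A returned job with status `COMPLETE` matches the "Sync successful" message in the console. Anything else is worth a look before testing.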
5) Test your RAG system
Here you will see:
- How the Knowledge Base searches for context in your documents
- How it selects the relevant chunks
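If you want to look at the selected chunks outside the console, a retrieval-only call is one way to do it. The sketch below assumes boto3's `bedrock-agent-runtime` client and its `retrieve` operation; `build_retrieve_request` and `summarize_chunks` are names I made up, and the knowledge base ID is a placeholder you would replace with yours.

```python
def build_retrieve_request(knowledge_base_id, question, top_k=3):
    """Keyword arguments for retrieve(): search only, no answer generation."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }

def summarize_chunks(response, preview_chars=80):
    """Turn a retrieve() response into (score, source_uri, preview) tuples."""
    rows = []
    for result in response.get("retrievalResults", []):
        uri = result.get("location", {}).get("s3Location", {}).get("uri", "unknown")
        text = result.get("content", {}).get("text", "")
        rows.append((result.get("score"), uri, text[:preview_chars]))
    return rows
```

In practice you would run `client.retrieve(**build_retrieve_request("YOUR_KB_ID", "What is Bedrock?"))` and pass the response to `summarize_chunks` to see which files the chunks came from.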
1. On your current screen (rag-lab-sina-kb)
Look for the button: Test Knowledge Base (and click it). This will open a simple chat console.
⚠️ Important: Before testing the Knowledge Base, make sure you have at least one generative model enabled in Amazon Bedrock.
To use the Test tab of the Knowledge Base, you need to select a generative model. In new AWS accounts (including Free Tier), Titan and Claude models may not be available immediately.
The compatible option that is available by default is Amazon Nova Micro.
After selecting it, you'll be able to test your queries.
2. Test with a prompt:
Example:
What do the documents say about Amazon Bedrock?
3. Verify grounding: confirm that the answer shows source citations pointing to your documents.
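The same test can be sketched programmatically. This assumes boto3's `bedrock-agent-runtime` client and its `retrieve_and_generate` operation; the helper names are hypothetical, and both the knowledge base ID and the model ARN are placeholders you would take from your own account.

```python
def build_rag_request(knowledge_base_id, model_arn, question):
    """Keyword arguments for retrieve_and_generate() against one Knowledge Base."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }

def cited_sources(response):
    """Collect the distinct S3 URIs cited in a retrieve_and_generate() response."""
    uris = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris

def is_grounded(response):
    """True when the answer cites at least one retrieved document."""
    return bool(cited_sources(response))
```

After `response = client.retrieve_and_generate(**build_rag_request(...))`, the answer text is in `response["output"]["text"]`, and `cited_sources(response)` lists the files it leaned on. An empty list is a hint that the answer was not grounded in your documents.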
📊 Observability (CloudWatch)
In this lab, we work with OpenSearch Serverless, which is managed by AWS. Unlike traditional OpenSearch, this version does not create log groups automatically unless we enable specific logging policies.
In CloudWatch → Log groups, you’ll notice that no logs are generated by default. This is normal for this type of managed service and also helps keep the lab within the Free Tier.
However, we can observe lab activity in:
✔️ Bedrock Metrics
CloudWatch → Metrics → Bedrock
Here you can see model invocations and the latency of each request.
- In the AWS Console → search for CloudWatch
- Left menu → Metrics
- Select Bedrock
- Then choose the view that covers all model IDs
- Select all the metrics
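The same numbers can be pulled with a script. Here is a minimal sketch, assuming boto3's CloudWatch client, the `AWS/Bedrock` namespace, and its `Invocations` metric with a `ModelId` dimension. The helper names are mine, and the model ID is whatever your console shows for the model you tested with.

```python
from datetime import datetime, timedelta, timezone

def build_invocation_query(model_id, hours=1, now=None):
    """Keyword arguments for get_metric_statistics on the AWS/Bedrock namespace."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,          # 5-minute buckets
        "Statistics": ["Sum"],
    }

def total_invocations(cloudwatch_client, model_id, hours=1):
    """Sum the invocation counts over the requested time window."""
    response = cloudwatch_client.get_metric_statistics(
        **build_invocation_query(model_id, hours)
    )
    return sum(point["Sum"] for point in response["Datapoints"])
```

Swapping `"Invocations"` for `"InvocationLatency"` with the `"Average"` statistic would give the latency view instead.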
✔️ CloudTrail
In the console, search for CloudTrail.
- Open the service
- In the left menu, select Event history
- In the filter bar, choose Event Source
- In the search box, type: bedrock.amazonaws.com
This confirms that the Knowledge Base is working even if OpenSearch logs are not generated automatically.
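For a scripted version of the Event history filter, something like the following should work. It assumes boto3's CloudTrail client and its `lookup_events` operation; `recent_bedrock_events` is a hypothetical helper name.

```python
def recent_bedrock_events(cloudtrail_client, max_results=20):
    """Return (event_time, event_name) pairs for recent Bedrock API calls."""
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "bedrock.amazonaws.com"}
        ],
        MaxResults=max_results,
    )
    return [(event["EventTime"], event["EventName"]) for event in response["Events"]]
```

Printing the result after running a few test queries is a quick way to confirm which Bedrock API calls your account actually recorded.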
🔐 Security Best Practices
Even though this is a simple lab, it’s important to follow best practices from the start. Here are the recommendations specifically for a Bedrock Knowledge Base:
- Never expose your bucket (keep Block Public Access ON).
- Create separate roles if you plan to connect the Knowledge Base to applications.
- Use plain-text documents whenever possible (better chunking).
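As one illustration of the "separate roles" idea, here is a sketch of a narrower S3 policy for a role that only needs to read the lab bucket, instead of AmazonS3FullAccess. The function names and `Sid` values are my own; treat the documents as a starting point to review, not a vetted policy.

```python
import json

def kb_read_only_policy(bucket_name):
    """IAM policy document allowing list and read on a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListLabBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadLabDocuments",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }

def bedrock_trust_policy():
    """Trust policy letting the Bedrock service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

def as_json(policy):
    """Pretty-print a policy so it can be pasted into the IAM console."""
    return json.dumps(policy, indent=2)
```

`print(as_json(kb_read_only_policy("kb-sina-rag-lab")))` gives you JSON you can compare against the role the console created for the Knowledge Base.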
💸 Cost Analysis
Estimated cost of the lab:
- Titan Embeddings → a few cents for generating embeddings (very cheap).
- Bedrock – Invocations → only a couple of calls (less than 1 cent each).
- OpenSearch Serverless (Vector Store) → billed per OCU-hour for as long as the collection exists, so delete the collection when you finish the lab.
- Amazon S3 → very small storage (your PDFs and TXT files), practically $0.
- Approximate total: $0.20–$0.50
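To redo this estimate for your own documents, the arithmetic can be sketched as a small function. Every price here is an argument you supply from the current AWS pricing pages. The parameter names are hypothetical, and the per-call price is a simplification that follows how I describe invocations above.

```python
def estimate_lab_cost(embedding_tokens, price_per_1k_embedding_tokens,
                      invocations, price_per_invocation,
                      vector_store_hours, price_per_vector_store_hour,
                      storage_cost=0.0):
    """Add up the four cost lines of the lab and round to 4 decimals."""
    embeddings = (embedding_tokens / 1000) * price_per_1k_embedding_tokens
    model_calls = invocations * price_per_invocation
    vector_store = vector_store_hours * price_per_vector_store_hour
    return round(embeddings + model_calls + vector_store + storage_cost, 4)
```

The vector store line grows with the number of hours the collection stays up, so it is the one to watch if you leave the lab running.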
🧯 Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| KB doesn’t sync | Wrong S3 prefix/path | Re-check bucket |
| Missing documents | Missing IAM permissions | Update KB execution role |
| No grounding | Bad PDF parsing | Prefer TXT/Markdown |
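When the console message is not enough, the ingestion job details can help narrow things down. The sketch below assumes the `ingestionJob` dictionary returned by boto3's `bedrock-agent` `get_ingestion_job` call, including its `statistics` and `failureReasons` fields. `diagnose_ingestion` and its hint texts are my own invention, loosely following the table above.

```python
def diagnose_ingestion(job):
    """Map an ingestionJob dict to a short troubleshooting hint."""
    stats = job.get("statistics", {})
    if job.get("status") == "FAILED":
        reasons = job.get("failureReasons") or ["no reason reported"]
        return "Sync failed: " + "; ".join(reasons)
    if stats.get("numberOfDocumentsScanned", 0) == 0:
        return "No documents scanned: re-check the bucket and prefix."
    if stats.get("numberOfDocumentsFailed", 0) > 0:
        return "Some documents failed: check the execution role and file formats."
    return "Ingestion looks healthy."
```

Feeding it the job returned after a sync gives a one-line hint about which row of the table to start with.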
🚀 What’s Next
- Connect this KB to a Bedrock Agent
- Build a serverless RAG API using Lambda + API Gateway
- Add authentication with Amazon Cognito




















