Moslem Chalfouh

Posted on • Originally published at awstip.com

What I Learned Deploying My First RAG System on AWS Bedrock

For the past two years, I’ve been using Generative AI tools as an enthusiast — experimenting with prompts, testing models, seeing what they could do.

Over the last 12 months, I shifted my focus to understanding how to use these tools professionally on AWS. Not just “Can I build a chatbot?” but “How do I deploy this securely with proper infrastructure?”

To validate what I’d been learning about AWS AI workflows, I decided to build a concrete example: a RAG chatbot using AWS Bedrock Knowledge Bases, Aurora Serverless v2, and Terraform.

This article walks through that build process — what worked, what didn’t, and what I had to do manually because Terraform couldn’t handle it yet.

1. The “Why”: RAG and Embeddings

The first step was understanding why we need this complexity. Large Language Models (LLMs) like Claude have a fixed knowledge cutoff and don’t know my private data.

We could paste documents into the prompt, but that hits token limits quickly. The standard solution is RAG:

  1. Ingest: Convert documents into vectors (embeddings).
  2. Store: Save them in a vector database.
  3. Retrieve: Find relevant chunks when a user asks a question.
  4. Generate: Send those chunks to the LLM to write an answer.
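To make those four steps concrete before any AWS is involved, here is a toy sketch in plain Python; the “embedding” is just a bag of words standing in for a real embedding model, so everything below is illustrative, not Bedrock code:

```python
import re

def embed(text):
    # Toy "embedding": a bag of words (a real model like Titan returns dense vectors)
    return set(re.findall(r"\w+", text.lower()))

def similarity(a, b):
    return len(a & b) / len(a | b)  # Jaccard overlap as a stand-in for cosine similarity

# 1. Ingest + 2. Store
docs = [
    "Tesla Model Y range: 455 km",
    "The town hall was built in 1890",
]
store = [(embed(d), d) for d in docs]

# 3. Retrieve the k closest chunks to the query
def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(store, key=lambda item: similarity(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# 4. Generate: the retrieved chunks become context in the LLM prompt
context = retrieve("What is the range of the Model Y?")[0]
prompt = f"Answer using only this context:\n{context}"
print(context)  # → Tesla Model Y range: 455 km
```

Bedrock Knowledge Bases automates exactly this loop, swapping the toy pieces for Titan embeddings, Aurora storage, and Claude generation.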

What Are Embeddings?

Embeddings are mathematical representations of text meaning. The Titan Embeddings model converts text into a vector of 1,536 numbers:

"Tesla Model Y range: 455 km" → [0.23, −0.45, 0.87, …, 0.12]

The key insight: similar meanings produce similar vectors. “Range,” “autonomy,” and “distance per charge” all generate nearby vectors, even though the words are different. This solves the problem of traditional keyword search, which only finds exact matches.
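A quick sketch of why “nearby vectors” means “similar meanings,” using made-up 3-dimensional vectors (real Titan embeddings have 1,536 dimensions):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; the numbers are invented for illustration
range_vec    = [0.90, 0.10, 0.30]   # "range"
autonomy_vec = [0.85, 0.15, 0.35]   # "autonomy": near-synonym, nearby vector
weather_vec  = [0.10, 0.90, -0.20]  # "weather": unrelated, distant vector

print(cosine_similarity(range_vec, autonomy_vec))  # close to 1.0
print(cosine_similarity(range_vec, weather_vec))   # much lower
```

A keyword search would treat “range” and “autonomy” as completely different strings; cosine similarity over embeddings sees them as neighbors.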

Why Vector Databases?

Regular databases search for exact matches (WHERE id = 123). With embeddings, you need similarity search: "Find the 5 closest vectors to my query."

This requires:

  • Specialized indexes (HNSW) to organize vectors spatially
  • Distance calculations (cosine similarity) across thousands of vectors
  • Fast retrieval (milliseconds, not seconds)

That’s what pg_vector adds to Postgres—a vector(1536) column type and similarity search operators. Without it, searching embeddings would be impossibly slow.
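As a sketch of what such a query looks like from Python (the connection details are placeholders, and the table matches the schema created later in this article), pg_vector’s `<=>` operator computes cosine distance directly in SQL:

```python
# Sketch of a pg_vector similarity query. Assumes the bedrock_kb table
# shown later in this article and a reachable Postgres/Aurora instance;
# connection details below are placeholders, not real endpoints.

# `<=>` is pg_vector's cosine-distance operator; ORDER BY + LIMIT
# returns the k nearest chunks, served by the HNSW index.
NEAREST_CHUNKS_SQL = """
    SELECT chunks, embedding <=> %s::vector AS distance
    FROM bedrock_integration.bedrock_kb
    ORDER BY embedding <=> %s::vector
    LIMIT 5;
"""

def nearest_chunks(query_embedding):
    import psycopg2  # imported lazily; the SQL above is usable on its own
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    conn = psycopg2.connect(host="my-aurora-endpoint", dbname="myapp",
                            user="bedrock_user", password="...")  # placeholders
    with conn, conn.cursor() as cur:
        cur.execute(NEAREST_CHUNKS_SQL, (vec, vec))
        return cur.fetchall()
```

With Bedrock Knowledge Bases you never write this query yourself; the service issues the equivalent search for you.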

2. The Stack: Turning Theory Into Practice

Now that we understand the concepts, how do we actually implement this on AWS? I wanted an architecture that handled all the embedding and vector search complexity while staying simple to operate.

Here’s what I chose:

  • Compute: AWS Bedrock (Serverless AI).
  • Embedding Model: Titan Embeddings G1 (managed by Bedrock).
  • Vector Storage: Aurora Serverless v2 with pg_vector extension.
  • Document Storage: S3 (for source PDFs).
  • IaC: Terraform.
  • UI: Streamlit (Python).

Why this stack?

Bedrock Knowledge Base handles the entire RAG workflow automatically — chunking documents, calling the Titan embedding model, and storing vectors in Aurora. I don’t write the embedding logic; I just configure where the vectors go.

Aurora with pg_vector was chosen over specialized vector databases (like Pinecone or Weaviate) for simplicity. It's Postgres with a vector extension—one SQL command to enable, and I can use standard database tooling I already know.

Aurora Limitations to Know

pg_vector works great for this use case, but keep in mind:

  • HNSW indexes load into memory. With ~10,000 documents (50k chunks), you’re looking at ~300MB of vector data.
  • Query performance may degrade above 100,000 vectors. At that scale, consider OpenSearch Serverless.
  • No distributed search — Aurora is single-instance.

For knowledge bases under 5,000 documents, Aurora + pg_vector is the simplest choice.
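The ~300MB figure above is easy to sanity-check, since pg_vector stores each dimension as a 4-byte float:

```python
# Back-of-envelope check on the memory footprint quoted above
chunks = 50_000      # ~10,000 documents at roughly 5 chunks each
dims = 1_536         # Titan Embeddings G1 output size
bytes_per_dim = 4    # float4 storage per dimension in pg_vector

total_mb = chunks * dims * bytes_per_dim / 1024 / 1024
print(f"{total_mb:.0f} MB of raw vector data")  # ≈ 293 MB, before index overhead
```

The HNSW index adds its own graph structure on top of that, so budget Aurora memory accordingly.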


The complete workflow: User → Guardrail → Bedrock KB → Aurora → Claude


The final result: A Streamlit interface querying proprietary data via Bedrock

3. The Terraform Struggle: Circular Dependencies

When I tried to automate the deployment, I hit a logic problem. Bedrock Knowledge Base needs the Aurora Cluster ARN to know where to store data. However, the IAM Role for Bedrock needs permission to access specific Aurora tables.

Trying to do this in one terraform apply resulted in errors because Terraform couldn't resolve the dependencies.

The Solution: I split the project into two separate stacks.

Stack 1: Infrastructure

Deploys the VPC, S3 bucket, and Aurora Cluster.

# Aurora cluster
resource "aws_rds_cluster" "aurora_serverless" {
  engine      = "aurora-postgresql"
  engine_mode = "provisioned"

  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 16
  }
}

# S3 bucket for documents
resource "aws_s3_bucket" "documents" {
  bucket = "my-bedrock-documents"
}

output "aurora_cluster_arn" {
  value = aws_rds_cluster.aurora_serverless.arn
}

Stack 2: Bedrock Knowledge Base

Reads the outputs from Stack 1 via terraform_remote_state and deploys the Knowledge Base.

data "terraform_remote_state" "stack1" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "stack1/terraform.tfstate"
    region = "us-west-2" # the S3 backend requires a region
  }
}
resource "aws_bedrockagent_knowledge_base" "main" {
  name = "my-bedrock-kb"
  role_arn = aws_iam_role.bedrock_kb_role.arn
  knowledge_base_configuration {
    vector_knowledge_base_configuration {
      embedding_model_arn = "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v1"
    }
    type = "VECTOR"
  }
  storage_configuration {
    type = "RDS"
    rds_configuration {
      credentials_secret_arn = data.terraform_remote_state.stack1.outputs.aurora_secret_arn
      resource_arn = data.terraform_remote_state.stack1.outputs.aurora_cluster_arn
      database_name = "myapp"
      table_name = "bedrock_integration.bedrock_kb"
      field_mapping {
        vector_field = "embedding"
        text_field = "chunks"
        metadata_field = "metadata"
        primary_key_field = "id"
      }
    }
  }
}

This separation made the state management much cleaner and avoided the circular dependency hell.


Success: The Knowledge Base deployed via Terraform and ready in the console

The IAM Role Bedrock Needs

Getting IAM right was critical. Bedrock needs specific permissions to talk to S3, Secrets Manager, and Aurora.

resource "aws_iam_role_policy" "bedrock_kb_policy" {
  role = aws_iam_role.bedrock_kb_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["s3:GetObject", "s3:ListBucket"]
        Resource = ["${aws_s3_bucket.documents.arn}", "${aws_s3_bucket.documents.arn}/*"]
      },
      {
        Effect = "Allow"
        Action = ["bedrock:InvokeModel"]
        Resource = "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v1"
      },
      {
        Effect = "Allow"
        Action = ["secretsmanager:GetSecretValue"]
        Resource = aws_secretsmanager_secret.aurora_credentials.arn
      },
      {
        Effect = "Allow"
        Action = ["rds-data:ExecuteStatement", "rds-data:BatchExecuteStatement"]
        Resource = aws_rds_cluster.aurora_serverless.arn
      }
    ]
  })
}


Missing any of these results in silent failures during sync or query.

4. The Manual Parts (and One That Is Becoming Obsolete)

Despite using Terraform, I realized that AWS Bedrock isn’t fully automatable yet. However, the platform is maturing fast.

Model Access (The “Ghost” Step)

When I started this project back in October, I hit a wall: AccessDeniedException. I had to manually go into the AWS Console and request access for "Titan Embeddings" and "Claude". It was a one-time toggle that Terraform couldn't handle.


The Model Access screen (a necessary stop for older accounts)

Good news for you: As of late 2025, AWS has largely removed this requirement. Most serverless models are now enabled by default in supported regions. If you are building this today, you likely won’t need to touch this, but if you get a permission error, check the Model Access page just in case.

Database Schema

This is still a manual friction point. Bedrock expects the table to exist before it can sync, but it won’t create it for you. I had to connect to Aurora and run the SQL setup manually:


Manually running the SQL in the Query Editor to create the pg_vector schema

CREATE EXTENSION IF NOT EXISTS vector;
CREATE SCHEMA IF NOT EXISTS bedrock_integration; -- Bedrock won't create the schema either
CREATE TABLE bedrock_integration.bedrock_kb (
  id uuid PRIMARY KEY,
  embedding vector(1536), -- Matches Titan G1 output
  chunks text,
  metadata json
);
CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops);

Data Sync

Uploading a file to S3 doesn’t automatically trigger ingestion. You still have to manually click “Sync” in the console or trigger it via the API (start_ingestion_job).
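For reference, here is a sketch of triggering that sync from Python rather than the console; the knowledge base and data source IDs below are placeholders:

```python
# Sketch: trigger and poll a Knowledge Base ingestion job.
# kb_id and data_source_id are placeholders for your own resources.
import time

def sync_knowledge_base(kb_id, data_source_id):
    import boto3  # lazy import: only needed when actually calling AWS
    agent = boto3.client("bedrock-agent")

    job = agent.start_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    # Poll until the sync finishes (chunking + embedding happen here)
    while job["status"] in ("STARTING", "IN_PROGRESS"):
        time.sleep(10)
        job = agent.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]  # "COMPLETE" on success
```

Wiring this to an S3 event via Lambda would make ingestion fully automatic, but that’s beyond this learning project.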

5. The Python Layer: Keeping It Simple

I’m not a Python developer, so I kept the code minimal and modular. Two files handle everything:

bedrock_utils.py (The RAG Logic)

This file contains key functions using two different Bedrock clients.

import boto3

# Two separate clients for different purposes
bedrock_runtime = boto3.client('bedrock-runtime')  
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
  • bedrock-runtime: For invoking foundation models (Claude)
  • bedrock-agent-runtime: For querying the Knowledge Base

This separation was confusing at first, but it makes sense once you understand that the Knowledge Base is technically an “agent” service, while model invocation is a “runtime” service.
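For illustration, this is roughly what the Knowledge Base query looks like: the retrieve_and_generate call on bedrock-agent-runtime performs retrieval and generation in one request (the KB ID and model ARN below are placeholders):

```python
# Sketch of the core RAG call: retrieval AND generation in one request.
# kb_id and model_arn are placeholders for your own resources.
def ask_knowledge_base(question, kb_id, model_arn):
    import boto3  # lazy import so this module loads without AWS credentials
    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # e.g. "ABCD1234"
                "modelArn": model_arn,      # the Claude model used for generation
            },
        },
    )
    return response["output"]["text"]
```

The response also carries citations back to the source chunks, which is useful for showing users where an answer came from.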

app.py (The Streamlit Interface)

The UI is straightforward — 54 lines total. The Streamlit framework handles the chat history and UI rendering automatically.

6. The Guardrail Pattern (Saving Cost & Tokens)

💡 Most RAG tutorials skip this step. They assume every query is valid. In reality, users will ask off-topic questions (“What’s the weather?”), which triggers expensive vector searches for nothing.

Adding a 10-token classification step before RAG saves both money and UX.

During testing, I noticed that every question triggered the full RAG process, which takes time and costs money. A simple “Hello” shouldn’t trigger an expensive vector search.

I added a validation step before RAG using bedrock-runtime directly to classify the intent with a cheaper model (Claude Haiku). The function checks if the user's question falls into predefined categories before triggering the expensive RAG workflow.

Note: My categories are specific to my test documents (local biodiversity, town history, Tesla specs). For your use case, replace these with your own domain-specific categories. The key is to keep it simple — 3–4 categories maximum for reliable classification.
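A minimal sketch of that guardrail pattern; the categories here are illustrative stand-ins for your own, and the payload follows the Anthropic messages format that Bedrock expects:

```python
import json

CATEGORIES = ["biodiversity", "history", "vehicles"]  # example categories: use your own

CLASSIFY_PROMPT = (
    "Classify this question into exactly one word from: "
    + ", ".join(CATEGORIES) + ", or off_topic.\n\nQuestion: {question}"
)

def parse_category(raw):
    """Normalize the model's one-word answer; anything unexpected is off_topic."""
    word = raw.strip().lower().rstrip(".")
    return word if word in CATEGORIES else "off_topic"

def classify(question):
    import boto3  # lazy import: the parsing above works without AWS
    client = boto3.client("bedrock-runtime")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,  # one word is enough for classification
        "messages": [{"role": "user",
                      "content": CLASSIFY_PROMPT.format(question=question)}],
    })
    resp = client.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
    return parse_category(json.loads(resp["body"].read())["content"][0]["text"])
```

Only questions that classify into a valid category go on to the full retrieve_and_generate call; everything else gets a canned reply.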


The logs showing the guardrail in action: Valid categories trigger RAG, while off-topic inputs are filtered out

This simple pattern saved tokens and made the application feel more responsive. The guardrail uses only ~10 tokens, while a full RAG query can use 200–500 tokens.

7. Conclusion

Building this project clarified a few things for me about the AWS AI ecosystem:

  1. Infrastructure is key: The Python code is short, but the IAM roles, Terraform configuration, and network setup took the most time.
  2. Aurora is enough: You don’t necessarily need a specialized vector DB; Postgres works fine for this scale and is easier to maintain.
  3. Bedrock abstractions work: The retrieve_and_generate API effectively hides the complexity of vector search, letting you focus on the application logic.

What’s Missing for Production

This is a learning project, not a production system. To deploy this in a real enterprise environment, you’d need to add:

  • Authentication & Authorization: No login system, no role-based access control
  • API Gateway + Lambda: Replace Streamlit with a proper REST API
  • CI/CD Pipeline: Automated testing and deployment (GitHub Actions, CodePipeline)
  • Cost Monitoring: Budget alerts, usage tracking per user/department
  • Logging & Observability: CloudWatch dashboards, distributed tracing
  • Security hardening: VPC endpoints, encryption at rest, audit trails
  • Rate limiting: Prevent abuse and control Bedrock costs

The goal here was to understand how the pieces fit together, not to build a turnkey solution.

👉 View on GitHub: terraform-bedrock-rag

