Amazon Bedrock: From Zero to Production in 30 Minutes
If you've been curious about Generative AI but haven't dived in yet, Amazon Bedrock is the easiest way to start. No model training, no GPU management, no ML expertise required—just API calls to state-of-the-art foundation models.
In this guide, I'll take you from zero to a working application that you can actually deploy to production.
What is Amazon Bedrock?
Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. Think of it as "LLMs as a Service."
Available models include:
- Claude 4 & Claude 3.5 (Anthropic) - Best for complex reasoning and long documents
- Titan (Amazon) - Cost-effective for general tasks
- Llama 3 (Meta) - Open-source performance
- Mistral Large - Fast inference, great for code and chat
- Stable Diffusion 3 (Stability AI) - Image generation
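One wrinkle behind the "unified API": the `InvokeModel` call is the same for every model, but each provider defines its own JSON request schema. The sketch below shows the shapes for three providers (field names match the Bedrock model reference at the time of writing — `build_request_body` itself is a hypothetical helper, not part of any SDK):

```python
import json

def build_request_body(provider: str, prompt: str, max_tokens: int = 500) -> str:
    """Build the provider-specific JSON body for a Bedrock invoke_model call.

    The invoke_model API is uniform, but each provider expects its own
    request schema -- always check the current Bedrock model reference.
    """
    if provider == "anthropic":  # Claude (Messages API)
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
    elif provider == "amazon":  # Titan text models
        body = {
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens},
        }
    elif provider == "meta":  # Llama models
        body = {"prompt": prompt, "max_gen_len": max_tokens}
    else:
        raise ValueError(f"Unknown provider: {provider}")
    return json.dumps(body)
```

You would pass the returned string as the `body` argument of `invoke_model`, picking the branch that matches your `modelId`.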
Setting Up Your Environment
1. Enable Bedrock Models
First, request access to the models you want to use:
- Go to Amazon Bedrock in the AWS Console
- Navigate to "Model access"
- Click "Manage model access"
- Select the models you need (I recommend starting with Claude 3.5 Sonnet or Claude 4 Sonnet)
- Submit the request
Most models are approved instantly. Some (like Claude 4) may take a few minutes.
2. Configure IAM Permissions
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan*"
      ]
    }
  ]
}
```
3. Install Dependencies
```shell
pip install boto3 langchain langchain-aws langchain-community faiss-cpu
```

(The extra packages — `langchain`, `langchain-community`, and `faiss-cpu` — are used by the RAG example later in this guide.)
Your First Bedrock Application
Let's build a simple text generator:
```python
import boto3
import json

# Initialize the client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

def generate_text(prompt: str, max_tokens: int = 500) -> str:
    """Generate text using Claude 3.5 Sonnet."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json",
        accept="application/json"
    )

    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']

# Test it
result = generate_text("Explain Kubernetes in 3 sentences for a beginner.")
print(result)
```
Output:

```
Kubernetes is a system that helps you run and manage applications in containers
across multiple computers automatically. It handles tasks like starting your
applications, restarting them if they crash, and distributing traffic between
them. Think of it as an automated IT team that keeps your applications running
24/7 without manual intervention.
```
Streaming Responses
For better user experience, stream the response:
```python
def generate_text_streaming(prompt: str):
    """Stream text generation for real-time output."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json"
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Use it
for text_chunk in generate_text_streaming("Write a haiku about cloud computing"):
    print(text_chunk, end='', flush=True)
```
Using LangChain for Production Apps
For more complex applications, LangChain provides a cleaner interface:
```python
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize the model
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20241022-v2:0",
    model_kwargs={
        "max_tokens": 1000,
        "temperature": 0.7
    }
)

# Simple chat
response = llm.invoke([
    SystemMessage(content="You are a helpful AWS architect."),
    HumanMessage(content="What's the best way to set up a VPC?")
])

print(response.content)
```
Building a RAG Application
Retrieval-Augmented Generation (RAG) lets you query your own documents:
```python
from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 1. Initialize embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1"
)

# 2. Load and split your documents
documents = [...]  # Your documents here
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Create RAG chain
template = """Answer based on the following context:

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# 5. Query your documents
answer = rag_chain.invoke("What is our refund policy?")
print(answer.content)
```
Cost Optimization Tips
Bedrock pricing is based on input/output tokens. Here's how to optimize:
1. Choose the Right Model
| Use Case | Recommended Model | Cost |
|---|---|---|
| Simple Q&A | Titan Lite | $ |
| General chat | Claude 3.5 Haiku | $$ |
| Complex reasoning | Claude 3.5 Sonnet | $$$ |
| Advanced code & reasoning | Claude 4 Sonnet/Opus | $$$$ |
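Because input and output tokens are billed separately (and at different rates), it's worth estimating cost before committing to a model tier. A minimal sketch — the per-1K prices below are placeholders for illustration, so check the current Bedrock pricing page, and the 4-characters-per-token heuristic is only a rough English-text approximation:

```python
def approx_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Bedrock bills input and output tokens separately, at different rates."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Example: a 2,000-token prompt with a 500-token answer,
# using hypothetical prices of $0.003/1K in and $0.015/1K out.
cost = estimate_cost(2_000, 500, price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"${cost:.4f} per request")
```

Running the numbers like this per request, then multiplying by expected daily volume, makes the model-tier trade-off in the table above concrete.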
2. Use Provisioned Throughput for High Volume
```python
# For production workloads with consistent traffic
response = bedrock.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-model",
    body=body
)
```
3. Cache Frequent Responses
```python
import hashlib
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_generate(prompt_hash: str, prompt: str) -> str:
    return generate_text(prompt)

def generate_with_cache(prompt: str) -> str:
    prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
    return cached_generate(prompt_hash, prompt)
```
Security Best Practices
1. Use VPC Endpoints
```hcl
resource "aws_vpc_endpoint" "bedrock" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}
```
2. Enable Model Invocation Logging
```python
# CloudWatch logging for compliance
bedrock_client = boto3.client('bedrock')

bedrock_client.put_model_invocation_logging_configuration(
    loggingConfig={
        'cloudWatchConfig': {
            'logGroupName': '/aws/bedrock/invocations',
            'roleArn': 'arn:aws:iam::123456789:role/BedrockLogging'
        },
        'textDataDeliveryEnabled': True,
        'imageDataDeliveryEnabled': False
    }
)
```
3. Use Guardrails
Amazon Bedrock Guardrails help filter harmful content:
```python
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="DRAFT"
)
```
Real-World Architecture
Here's a production-ready architecture I use for enterprise clients:
```
                  ┌─────────────────┐
                  │   CloudFront    │
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │  API Gateway    │
                  └────────┬────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
   ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
   │ Lambda (Chat)  │ │ Lambda  │ │   Lambda     │
   │                │ │  (RAG)  │ │ (Streaming)  │
   └────────┬───────┘ └────┬────┘ └───────┬──────┘
            │              │              │
            └──────────────┼──────────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
   ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
   │    Bedrock     │ │ OpenSrch│ │   DynamoDB   │
   │ (Foundation M) │ │ (Vector)│ │  (Sessions)  │
   └────────────────┘ └─────────┘ └──────────────┘
```
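As a sketch of the "Lambda (Chat)" box, a minimal handler might look like the following. The API Gateway proxy event shape and the model ID are assumptions; the Bedrock client is created lazily and cached at module scope so warm invocations reuse it, and it can be injected for unit testing:

```python
import json

_bedrock = None

def get_client():
    """Create the Bedrock client once per Lambda container and reuse it."""
    global _bedrock
    if _bedrock is None:
        import boto3  # imported lazily so the handler is unit-testable without AWS
        _bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    return _bedrock

def handler(event, context, client=None):
    """Minimal chat handler behind API Gateway (proxy integration assumed)."""
    client = client or get_client()
    payload = json.loads(event.get("body") or "{}")
    prompt = payload.get("prompt", "")
    if not prompt:
        return {"statusCode": 400,
                "body": json.dumps({"error": "prompt is required"})}

    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user", "content": prompt}],
        }),
        contentType="application/json",
        accept="application/json",
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```

The `client` parameter is what makes this testable: in unit tests you pass a stub whose `invoke_model` returns a canned response, so no AWS credentials are needed.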
What's Next?
Now that you have the basics, here are some directions to explore:
- Agents for Bedrock - Create autonomous agents that can use tools
- Knowledge Bases - Managed RAG with automatic chunking and embeddings
- Fine-tuning - Customize models with your own data
- Multi-modal - Work with images and PDFs using Claude's built-in vision capabilities
Have questions about implementing Bedrock in your architecture? Drop a comment below!
About the author: David Petrocelli is a Senior Cloud Architect at Caylent, PhD in Computer Science, and University Professor specializing in cloud architecture and generative AI applications.