Amazon Bedrock: From Zero to Production in 30 Minutes
If you've been curious about Generative AI but haven't dived in yet, Amazon Bedrock is the easiest way to start. No model training, no GPU management, no ML expertise required—just API calls to state-of-the-art foundation models.
In this guide, I'll take you from zero to a working application that you can actually deploy to production.
What is Amazon Bedrock?
Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. Think of it as "LLMs as a Service."
Available models include:
- Claude 4 & Claude 3.5 (Anthropic) - Best for complex reasoning and long documents
- Titan (Amazon) - Cost-effective for general tasks
- Llama 3 (Meta) - Open-source performance
- Mistral Large - Fast inference, great for code and chat
- Stable Diffusion 3 (Stability AI) - Image generation
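One wrinkle behind the "unified API": the `InvokeModel` call is the same for every model, but each provider defines its own JSON request schema. The sketch below shows the shapes for three providers (field names match the Bedrock model reference at the time of writing — `build_request_body` itself is a hypothetical helper, not part of any SDK):

```python
import json

def build_request_body(provider: str, prompt: str, max_tokens: int = 500) -> str:
    """Build the provider-specific JSON body for a Bedrock invoke_model call.

    The invoke_model API is uniform, but each provider expects its own
    request schema -- always check the current Bedrock model reference.
    """
    if provider == "anthropic":  # Claude (Messages API)
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
    elif provider == "amazon":  # Titan text models
        body = {
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens},
        }
    elif provider == "meta":  # Llama models
        body = {"prompt": prompt, "max_gen_len": max_tokens}
    else:
        raise ValueError(f"Unknown provider: {provider}")
    return json.dumps(body)
```

You would pass the returned string as the `body` argument of `invoke_model`, picking the branch that matches your `modelId`.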
Setting Up Your Environment
1. Enable Bedrock Models
First, request access to the models you want to use:
- Go to Amazon Bedrock in the AWS Console
- Navigate to "Model access"
- Click "Manage model access"
- Select the models you need (I recommend starting with Claude 3.5 Sonnet or Claude 4 Sonnet)
- Submit the request
Most models are approved instantly. Some (like Claude 4) may take a few minutes.
2. Configure IAM Permissions
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan*"
      ]
    }
  ]
}
```
3. Install Dependencies
```shell
pip install boto3 langchain langchain-aws langchain-community faiss-cpu
```

(The extra packages — `langchain`, `langchain-community`, and `faiss-cpu` — are used by the RAG example later in this guide.)
Your First Bedrock Application
Let's build a simple text generator:
```python
import boto3
import json

# Initialize the client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

def generate_text(prompt: str, max_tokens: int = 500) -> str:
    """Generate text using Claude 3.5 Sonnet."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json",
        accept="application/json"
    )

    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']

# Test it
result = generate_text("Explain Kubernetes in 3 sentences for a beginner.")
print(result)
```
Output:

```
Kubernetes is a system that helps you run and manage applications in containers
across multiple computers automatically. It handles tasks like starting your
applications, restarting them if they crash, and distributing traffic between
them. Think of it as an automated IT team that keeps your applications running
24/7 without manual intervention.
```
Streaming Responses
For better user experience, stream the response:
```python
def generate_text_streaming(prompt: str):
    """Stream text generation for real-time output."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json"
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Use it
for text_chunk in generate_text_streaming("Write a haiku about cloud computing"):
    print(text_chunk, end='', flush=True)
```
Using LangChain for Production Apps
For more complex applications, LangChain provides a cleaner interface:
```python
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize the model
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20241022-v2:0",
    model_kwargs={
        "max_tokens": 1000,
        "temperature": 0.7
    }
)

# Simple chat
response = llm.invoke([
    SystemMessage(content="You are a helpful AWS architect."),
    HumanMessage(content="What's the best way to set up a VPC?")
])

print(response.content)
```
Building a RAG Application
Retrieval-Augmented Generation (RAG) lets you query your own documents:
```python
from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 1. Initialize embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1"
)

# 2. Load and split your documents
documents = [...]  # Your documents here
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Create RAG chain
template = """Answer based on the following context:

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# 5. Query your documents
answer = rag_chain.invoke("What is our refund policy?")
print(answer.content)
```
Cost Optimization Tips
Bedrock pricing is based on input/output tokens. Here's how to optimize:
1. Choose the Right Model
| Use Case | Recommended Model | Cost |
|---|---|---|
| Simple Q&A | Titan Lite | $ |
| General chat | Claude 3.5 Haiku | $$ |
| Complex reasoning | Claude 3.5 Sonnet | $$$ |
| Advanced code & reasoning | Claude 4 Sonnet/Opus | $$$$ |
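Because input and output tokens are billed separately (and at different rates), it's worth estimating cost before committing to a model tier. A minimal sketch — the per-1K prices below are placeholders for illustration, so check the current Bedrock pricing page, and the 4-characters-per-token heuristic is only a rough English-text approximation:

```python
def approx_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Bedrock bills input and output tokens separately, at different rates."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Example: a 2,000-token prompt with a 500-token answer,
# using hypothetical prices of $0.003/1K in and $0.015/1K out.
cost = estimate_cost(2_000, 500, price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"${cost:.4f} per request")
```

Running the numbers like this per request, then multiplying by expected daily volume, makes the model-tier trade-off in the table above concrete.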
2. Use Provisioned Throughput for High Volume
```python
# For production workloads with consistent traffic
response = bedrock.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-model",
    body=body
)
```
3. Cache Frequent Responses
```python
import hashlib
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_generate(prompt_hash: str, prompt: str) -> str:
    return generate_text(prompt)

def generate_with_cache(prompt: str) -> str:
    prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
    return cached_generate(prompt_hash, prompt)
```
Security Best Practices
1. Use VPC Endpoints
```hcl
resource "aws_vpc_endpoint" "bedrock" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}
```
2. Enable Model Invocation Logging
```python
# CloudWatch logging for compliance
bedrock_client = boto3.client('bedrock')

bedrock_client.put_model_invocation_logging_configuration(
    loggingConfig={
        'cloudWatchConfig': {
            'logGroupName': '/aws/bedrock/invocations',
            'roleArn': 'arn:aws:iam::123456789:role/BedrockLogging'
        },
        'textDataDeliveryEnabled': True,
        'imageDataDeliveryEnabled': False
    }
)
```
3. Use Guardrails
Amazon Bedrock Guardrails help filter harmful content:
```python
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="DRAFT"
)
```
Real-World Architecture
Here's a production-ready architecture I use for enterprise clients:
```
                  ┌─────────────────┐
                  │   CloudFront    │
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │  API Gateway    │
                  └────────┬────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
   ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
   │ Lambda (Chat)  │ │ Lambda  │ │   Lambda     │
   │                │ │  (RAG)  │ │ (Streaming)  │
   └────────┬───────┘ └────┬────┘ └───────┬──────┘
            │              │              │
            └──────────────┼──────────────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
   ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
   │    Bedrock     │ │ OpenSrch│ │   DynamoDB   │
   │ (Foundation M) │ │ (Vector)│ │  (Sessions)  │
   └────────────────┘ └─────────┘ └──────────────┘
```
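As a sketch of the "Lambda (Chat)" box, a minimal handler might look like the following. The API Gateway proxy event shape and the model ID are assumptions; the Bedrock client is created lazily and cached at module scope so warm invocations reuse it, and it can be injected for unit testing:

```python
import json

_bedrock = None

def get_client():
    """Create the Bedrock client once per Lambda container and reuse it."""
    global _bedrock
    if _bedrock is None:
        import boto3  # imported lazily so the handler is unit-testable without AWS
        _bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    return _bedrock

def handler(event, context, client=None):
    """Minimal chat handler behind API Gateway (proxy integration assumed)."""
    client = client or get_client()
    payload = json.loads(event.get("body") or "{}")
    prompt = payload.get("prompt", "")
    if not prompt:
        return {"statusCode": 400,
                "body": json.dumps({"error": "prompt is required"})}

    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user", "content": prompt}],
        }),
        contentType="application/json",
        accept="application/json",
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```

The `client` parameter is what makes this testable: in unit tests you pass a stub whose `invoke_model` returns a canned response, so no AWS credentials are needed.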
What's Next?
Now that you have the basics, here are some directions to explore:
- Agents for Bedrock - Create autonomous agents that can use tools
- Knowledge Bases - Managed RAG with automatic chunking and embeddings
- Fine-tuning - Customize models with your own data
- Multi-modal - Work with images and PDFs using Claude's built-in vision capabilities
Have questions about implementing Bedrock in your architecture? Drop a comment below!
About the author: David Petrocelli is a Senior Cloud Architect at Caylent, PhD in Computer Science, and University Professor specializing in cloud architecture and generative AI applications.