David Marcelo Petrocelli

Amazon Bedrock: From Zero to Production in 30 Minutes

If you've been curious about Generative AI but haven't dived in yet, Amazon Bedrock is the easiest way to start. No model training, no GPU management, no ML expertise required—just API calls to state-of-the-art foundation models.

In this guide, I'll take you from zero to a working application that you can actually deploy to production.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. Think of it as "LLMs as a Service."

Available models include:

  • Claude 4 & Claude 3.5 (Anthropic) - Best for complex reasoning and long documents
  • Titan (Amazon) - Cost-effective for general tasks
  • Llama 3 (Meta) - Open-weight models with solid all-round performance
  • Mistral Large - Fast inference, great for code and chat
  • Stable Diffusion 3 (Stability AI) - Image generation

Setting Up Your Environment

1. Enable Bedrock Models

First, request access to the models you want to use:

  1. Go to Amazon Bedrock in the AWS Console
  2. Navigate to "Model access"
  3. Click "Manage model access"
  4. Select the models you need (I recommend starting with Claude 3.5 Sonnet or Claude 4 Sonnet)
  5. Submit the request

Most models are approved instantly. Some (like Claude 4) may take a few minutes.
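
Once access is granted, you can confirm which models your account can see from the CLI:

aws bedrock list-foundation-models --region us-east-1 \
    --query "modelSummaries[].modelId" --output table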

2. Configure IAM Permissions

Your application's IAM role needs permission to invoke the models you enabled:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan*"
      ]
    }
  ]
}
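
To attach this as an inline policy, save the JSON above to a file and use the CLI (the role name, policy name, and filename here are placeholders):

aws iam put-role-policy \
    --role-name MyBedrockAppRole \
    --policy-name BedrockInvokeAccess \
    --policy-document file://bedrock-policy.json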

3. Install Dependencies

pip install boto3 langchain-aws

Your First Bedrock Application

Let's build a simple text generator:

import boto3
import json

# Initialize the client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

def generate_text(prompt: str, max_tokens: int = 500) -> str:
    """Generate text using Claude 3.5 Sonnet."""

    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json",
        accept="application/json"
    )

    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']


# Test it
result = generate_text("Explain Kubernetes in 3 sentences for a beginner.")
print(result)

Output:

Kubernetes is a system that helps you run and manage applications in containers
across multiple computers automatically. It handles tasks like starting your
applications, restarting them if they crash, and distributing traffic between
them. Think of it as an automated IT team that keeps your applications running
24/7 without manual intervention.
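
In production, invoke_model calls can fail with throttling or other transient errors, so it's worth wrapping them in a retry. A minimal sketch using botocore's ClientError (the backoff values are arbitrary, tune them for your traffic):

import time
from botocore.exceptions import ClientError

def generate_text_with_retry(prompt: str, retries: int = 3) -> str:
    """Retry Bedrock calls on throttling with exponential backoff."""
    for attempt in range(retries):
        try:
            return generate_text(prompt)
        except ClientError as e:
            code = e.response['Error']['Code']
            # Retry only on throttling; re-raise everything else
            if code == 'ThrottlingException' and attempt < retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise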

Streaming Responses

For better user experience, stream the response:

def generate_text_streaming(prompt: str):
    """Stream text generation for real-time output."""

    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json"
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')


# Use it
for text_chunk in generate_text_streaming("Write a haiku about cloud computing"):
    print(text_chunk, end='', flush=True)

Using LangChain for Production Apps

For more complex applications, LangChain provides a cleaner interface:

from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize the model
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20241022-v2:0",
    model_kwargs={
        "max_tokens": 1000,
        "temperature": 0.7
    }
)

# Simple chat
response = llm.invoke([
    SystemMessage(content="You are a helpful AWS architect."),
    HumanMessage(content="What's the best way to set up a VPC?")
])
print(response.content)

Building a RAG Application

Retrieval-Augmented Generation (RAG) lets you query your own documents. This example needs a few extra packages beyond the earlier install (pip install langchain-community langchain-text-splitters faiss-cpu):

from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 1. Initialize embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1"
)

# 2. Load and split your documents
documents = [...]  # Your documents here
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Create RAG chain
template = """Answer based on the following context:

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    """Join retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# 5. Query your documents
answer = rag_chain.invoke("What is our refund policy?")
print(answer.content)

Cost Optimization Tips

Bedrock pricing is based on input/output tokens. Here's how to optimize:

1. Choose the Right Model

Use Case                    Recommended Model       Cost
Simple Q&A                  Titan Lite              $
General chat                Claude 3.5 Haiku        $$
Complex reasoning           Claude 3.5 Sonnet       $$$
Advanced code & reasoning   Claude 4 Sonnet/Opus    $$$$
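
Whichever model you pick, track what you actually consume: Anthropic responses on Bedrock include a usage block you can log. A minimal sketch built on the generate_text example from earlier:

def generate_text_with_usage(prompt: str) -> str:
    """Generate text and print token usage for cost tracking."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body
    )
    response_body = json.loads(response['body'].read())
    usage = response_body['usage']
    # Input and output tokens map directly to what you're billed for
    print(f"Input tokens: {usage['input_tokens']}, "
          f"output tokens: {usage['output_tokens']}")
    return response_body['content'][0]['text']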

2. Use Provisioned Throughput for High Volume

# For production workloads with consistent traffic
response = bedrock.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-model",
    body=body
)

3. Cache Frequent Responses

from functools import lru_cache

@lru_cache(maxsize=1000)
def generate_with_cache(prompt: str) -> str:
    # lru_cache keys on the full prompt string, so repeated
    # prompts are served from memory instead of a new API call
    return generate_text(prompt)

Security Best Practices

1. Use VPC Endpoints

resource "aws_vpc_endpoint" "bedrock" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}

2. Enable Model Invocation Logging

# CloudWatch logging for compliance
bedrock_client = boto3.client('bedrock')

bedrock_client.put_model_invocation_logging_configuration(
    loggingConfig={
        'cloudWatchConfig': {
            'logGroupName': '/aws/bedrock/invocations',
            'roleArn': 'arn:aws:iam::123456789:role/BedrockLogging'
        },
        'textDataDeliveryEnabled': True,
        'imageDataDeliveryEnabled': False
    }
)

3. Use Guardrails

Amazon Bedrock Guardrails help filter harmful content:

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="DRAFT"
)

Real-World Architecture

Here's a production-ready architecture I use for enterprise clients:

                    ┌─────────────────┐
                    │   CloudFront    │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │   API Gateway   │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
     │ Lambda (Chat)  │ │ Lambda  │ │   Lambda     │
     │                │ │ (RAG)   │ │ (Streaming)  │
     └────────┬───────┘ └────┬────┘ └───────┬──────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
     │    Bedrock     │ │ OpenSrch│ │  DynamoDB    │
     │ (Foundation M) │ │ (Vector)│ │  (Sessions)  │
     └────────────────┘ └─────────┘ └──────────────┘
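
To give a feel for the glue code, here's a minimal sketch of what the chat Lambda might look like (the handler shape and request format are illustrative, not from a real deployment):

import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def handler(event, context):
    """API Gateway -> Lambda -> Bedrock chat endpoint."""
    prompt = json.loads(event['body'])['prompt']

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user", "content": prompt}]
        })
    )
    answer = json.loads(response['body'].read())['content'][0]['text']

    return {
        "statusCode": 200,
        "body": json.dumps({"answer": answer})
    }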

What's Next?

Now that you have the basics, here are some directions to explore:

  1. Agents for Bedrock - Create autonomous agents that can use tools
  2. Knowledge Bases - Managed RAG with automatic chunking and embeddings (see the sketch after this list)
  3. Fine-tuning - Customize models with your own data
  4. Multi-modal - Work with images and PDFs using Claude's vision capabilities
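
For Knowledge Bases, the managed retrieve-and-generate flow is a single API call. A minimal sketch, assuming you've already created a knowledge base (the knowledgeBaseId below is a placeholder):

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = agent_runtime.retrieve_and_generate(
    input={'text': 'What is our refund policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123EXAMPLE',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0'
        }
    }
)
print(response['output']['text'])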

Have questions about implementing Bedrock in your architecture? Drop a comment below!


About the author: David Petrocelli is a Senior Cloud Architect at Caylent, PhD in Computer Science, and University Professor specializing in cloud architecture and generative AI applications.
