Karim Khalil

Model Invocation: Amazon Bedrock

🖥️ What is Amazon Bedrock?

Amazon Bedrock is an AWS service that makes it easy to build and scale generative AI applications using pre-trained foundation models (FMs). Key benefits include:

  • No infrastructure management
  • Unified API access to multiple model providers
  • Seamless integration with other AWS services (like Lambda, API Gateway, etc.)

🔑 Prerequisites

To follow along, ensure you have:

  • Python 3.7+
python --version
  • pip installed
  • An AWS account with access to Amazon Bedrock

⚙️ Step 1: Set Up Your Environment

Install the AWS SDK for Python:

pip install boto3

Configure your AWS credentials:

aws configure

🧠 Step 2: Choose your foundational model

Amazon Bedrock gives you access to a variety of foundation models from leading AI model providers. For example:

  • Anthropic Claude 3.5/3.7 Sonnet – great for complex reasoning tasks.
  • Stability AI Stable Diffusion – image generation from text prompts.

For a complete list of supported FMs, see the Amazon Bedrock documentation.
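You can also enumerate models programmatically via the (non-runtime) `bedrock` control-plane client's `list_foundation_models` call. A minimal sketch of filtering such a response by provider — the sample response below is illustrative, not real output:

```python
def models_by_provider(response, provider):
    """Return the IDs of foundation models from a given provider.

    `response` is expected to have the shape returned by
    boto3.client("bedrock").list_foundation_models().
    """
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if summary.get("providerName") == provider
    ]

# Illustrative sample in the documented response shape:
sample = {
    "modelSummaries": [
        {"modelId": "anthropic.claude-3-7-sonnet-20250219-v1:0", "providerName": "Anthropic"},
        {"modelId": "stability.stable-diffusion-xl-v1", "providerName": "Stability AI"},
    ]
}

print(models_by_provider(sample, "Anthropic"))
```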

📤Step 3: Invoke the Model

For this demo, we will use Claude 3.7 Sonnet by Anthropic.

❗ Make sure you have been granted access to the model in Amazon Bedrock (under Model access in the console).

Here’s a basic example to call the Claude model using Bedrock:

📦 Cell 1 – Import libraries

import boto3
import json

🌐 Cell 2 – Create Bedrock client

bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

📝 Cell 3 – Prepare the request body

# Define the request payload
body = json.dumps({
    "max_tokens": 256,  # Maximum number of tokens in the response
    "messages": [
        {"role": "user", "content": "Hello, world"}  # User message to the AI model
    ],
    "anthropic_version": "bedrock-2023-05-31"  # Required version for Claude 3 models
})

NOTE: Each model has its own required input structure (body format).

➡️ You can find the correct request format for each model in the Amazon Bedrock Model Catalog.
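As a sketch of what that means in practice, here is a small helper that builds the body for two different model families. The Anthropic Messages format matches the payload used in this post; the Amazon Titan branch is an assumption based on its published schema, so verify it against the model's catalog entry before relying on it:

```python
import json

def build_body(model_id, prompt, max_tokens=256):
    """Build an invoke_model body for a couple of model families.

    The Anthropic branch matches the payload used in this post; the
    Titan branch is illustrative — check the Model Catalog for the
    authoritative schema.
    """
    if model_id.startswith(("anthropic.", "us.anthropic.")):
        return json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        })
    if model_id.startswith("amazon.titan"):
        return json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens},
        })
    raise ValueError(f"No body template for model {model_id}")
```

You could then pass the result straight to `bedrock.invoke_model(body=..., modelId=...)`.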

🤖 Cell 4 – Invoke the model

# Invoke the Claude 3 Sonnet model with the given input
response = bedrock.invoke_model(
    body=body,
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0"  # Model ID for Claude 3.7 Sonnet
)

NOTE: You can find the ID for each model under the Amazon Bedrock Model Catalog.

⚠️ If you get an error like: “Invocation of model ID with on-demand throughput isn’t supported.”
✅ You will need to prepend “us.” to the model ID.
ℹ️ What happens when “us.” is added? It enables cross-region inference, which allows your inference (model invocation) requests to be dynamically routed across multiple AWS Regions. This helps absorb unplanned traffic bursts, improve throughput, and keep your application resilient during peak usage.
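One way to handle this up front is to normalize the model ID before invoking. A minimal sketch, assuming the region-group prefixes used by Bedrock inference profiles (e.g. "us.", "eu.", "apac."):

```python
def to_inference_profile_id(model_id, region_group="us"):
    """Prefix a model ID with a region group (e.g. "us.") so it targets
    the cross-region inference profile instead of the plain model.
    Leaves the ID unchanged if it already carries a known prefix.
    """
    known_groups = ("us.", "eu.", "apac.")
    if model_id.startswith(known_groups):
        return model_id
    return f"{region_group}.{model_id}"

print(to_inference_profile_id("anthropic.claude-3-7-sonnet-20250219-v1:0"))
# us.anthropic.claude-3-7-sonnet-20250219-v1:0
```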

📥 Cell 5 – Parse and display the response

# Parse the response JSON and print the model's reply
response_body = json.loads(response.get("body").read())
print(response_body["content"][0]["text"])  # The reply text sits inside a list of content blocks

Let’s put it all together:

import boto3
import json

bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

body = json.dumps({
  "max_tokens": 256,
  "messages": [{"role": "user", "content": "Hello, world"}],
  "anthropic_version": "bedrock-2023-05-31"
})

response = bedrock.invoke_model(body=body, modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0")

response_body = json.loads(response.get("body").read())
print(response_body["content"][0]["text"])
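Note that Claude returns `content` as a list of content blocks rather than a plain string, and a longer reply can contain several of them. A small sketch that joins them all, using an illustrative response body in the Messages format:

```python
def extract_text(response_body):
    """Join the text of every text-type content block in a Claude
    Messages response body."""
    return "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )

# Illustrative response body in the Messages format:
sample_body = {"content": [{"type": "text", "text": "Hello! How can I help you today?"}]}
print(extract_text(sample_body))
```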

📌 Wrapping Up

Amazon Bedrock + Python is a powerful combo for integrating generative AI into your workflows. With just a few lines of Python, you can tap into best-in-class models from multiple providers and build intelligent applications without ever managing infrastructure.

💡 Bonus: Streaming with invoke_model_with_response_stream

For real-time applications or when you want to show partial model responses as they're generated (like chat apps), you can use invoke_model_with_response_stream instead of invoke_model.

Here’s an example:

# Use the native inference API to send a text message to Anthropic Claude
# and print the response stream.

import boto3
import json

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Set the model ID, e.g., Claude 3.7 Sonnet.
model_id = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

# Define the prompt for the model.
prompt = "Give me the longest possible story in English"

# Format the request payload using the model's native structure.
native_request = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": prompt}],
        }
    ],
}

# Convert the native request to JSON.
request = json.dumps(native_request)

# Invoke the model with the request.
streaming_response = client.invoke_model_with_response_stream(
    modelId=model_id, body=request
)

# Extract and print the response text in real-time.
for event in streaming_response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk["type"] == "content_block_delta":
        print(chunk["delta"].get("text", ""), end="")

🔍 What’s Happening Here?

  • Instead of waiting for the full model response, you get small parts (“chunks”) as soon as they’re available.
  • This is super useful for smoother user experiences in live UIs.

🛠️ Note: The format of streamed responses may vary slightly between models, so refer to the Bedrock model documentation for the exact schema.
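For Anthropic models, the text arrives in `content_block_delta` events, as in the loop above. Here is a sketch of accumulating a full reply from already-decoded chunk dictionaries — the sample events below are illustrative, not captured output:

```python
def accumulate_stream(chunks):
    """Collect the text deltas from a sequence of decoded Anthropic
    streaming chunks into one string."""
    parts = []
    for chunk in chunks:
        if chunk.get("type") == "content_block_delta":
            parts.append(chunk["delta"].get("text", ""))
    return "".join(parts)

# Illustrative decoded events:
events = [
    {"type": "message_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Once upon "}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "a time..."}},
    {"type": "message_stop"},
]
print(accumulate_stream(events))  # Once upon a time...
```

In a real app you would feed this the parsed `event["chunk"]["bytes"]` payloads from `invoke_model_with_response_stream`.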

🙋 Need Help?

If you have any questions, run into issues, or just want to explore more advanced use cases like chaining Bedrock with Lambda, Step Functions, or building chatbots — don’t hesitate to ask!
