DEV Community

Morgan Willis

Amazon Bedrock for Beginners From First Prompt to AI Agent (Full Tutorial)

So you want to add AI to your application. Maybe you want to build a smart assistant, add a feature that analyzes user input, or you have an AI-powered side project you've been meaning to start.

On the surface, it sounds simple. Call a model, get a response. But once you actually try to build it, the questions start to stack up fast.

  • Which model do you use?
  • How do you call it from your application code?
  • What happens when you want the AI to interact with your own data or external systems?
  • And how do you control costs?

It can feel like you need to understand everything before you can build anything, but you don't.

Amazon Bedrock is a great place to start because it's a fully managed service on AWS that gives you API access to AI models from providers like Amazon, Anthropic, Meta, Mistral, and more. You don't need to set up servers or manage infrastructure, and you only pay for what you use.

On top of model access, Bedrock includes features like Knowledge Bases for connecting your own data, Guardrails for content safety, and tool use for interacting with the real world.

This post walks through Bedrock's main features with code examples you can run yourself in your own AWS account. Everything comes from the companion repo, which has full working implementations of each example. By the end, we'll combine everything into an AI agent using the Strands Agents SDK to build out a university FAQ chatbot.

University Chatbot Architecture

A heads up before we start: we're going to do things step by step, and this could take a while if you're following along. Give yourself an hour or so if you're a total beginner. We'll work directly with the Bedrock APIs so you understand exactly how the pieces fit together. Then at the end, we'll take an easier approach that handles much of the complexity for you. Learning the fundamentals first will make everything make a lot more sense later.

If you prefer a video walkthrough, this post has an accompanying video that covers the same material with live demos.

Prerequisites

Before following along, you'll need:

  • Python 3.12+ installed on your machine
  • An AWS account with credentials configured locally
  • An IAM user or role: create one in your AWS account to follow along with the AWS Console steps; you cannot complete the tutorial using the root user

You'll also need to install boto3, which is the Python SDK for interacting with AWS services programmatically. Run the following in the terminal in your IDE:

pip install boto3

Making API Calls to Amazon Bedrock

When you send a prompt to a model and receive a response, that process is called inference. You provide input, the model runs its computation, and it generates output.

Inference

For AI-powered applications, you need to be able to run inference against models programmatically through an API. Bedrock exposes a set of APIs you can use. Let's start with the Converse API, which is the standard way to call models on Bedrock.

The Converse API uses the same standard request format regardless of which model you're talking to. That means you can switch from Amazon Nova to Meta Llama to Anthropic Claude Haiku but still use the same API.

Here's a complete first API call to Amazon Bedrock:

import boto3

def use_converse_api():
    bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
    model_id = "us.amazon.nova-lite-v1:0"

    # Define a system prompt to set model behavior
    system_prompt = [
        {
            "text": "You are a helpful technical assistant who explains concepts clearly and concisely."
        }
    ]

    # User message
    user_message = "What is serverless computing?"

    # Use the Converse API
    response = bedrock_runtime.converse(
        modelId=model_id,
        system=system_prompt,
        messages=[
            {
                "role": "user",
                "content": [{"text": user_message}]
            }
        ],
        inferenceConfig={
            "temperature": 0.7,
            "maxTokens": 2000
        }
    )

    # Extract the response
    output_text = response['output']['message']['content'][0]['text']
    print(output_text)

    # Display token usage
    usage = response.get('usage', {})
    print(f"Input tokens: {usage.get('inputTokens', 'N/A')}")
    print(f"Output tokens: {usage.get('outputTokens', 'N/A')}")

if __name__ == "__main__":
    use_converse_api()

Let's break down the structure of this API call, because you'll see the same pattern throughout the rest of the examples:

At the top, we import boto3 and create a bedrock-runtime client. This client is how your Python code communicates with the Bedrock service over the network.

Then we define the model_id. We're using Amazon Nova Lite, a fast and cost-efficient model. Every model in Bedrock has a unique ID. You can find the full list in the supported model IDs documentation.

The call to converse() has three main parts:

  • system: The system prompt defines the model's role and behavior. Think of it as instructions for how the model should respond. The system prompt is sent with every inference request.
  • messages: The conversation between the user and the model. Each message has a role (either "user" or "assistant") and content. This structure lets the model understand who said what. In a real application, the user message would come from a frontend, a mobile app, or command line input. We're hardcoding it here to keep things simple.
  • inferenceConfig: Parameters that control how the model generates its response. temperature controls how random or creative the output is. Set it to 0.0 and you get the most predictable response every time, which is useful for tasks like classification or data extraction. Push it higher toward 1.0 and the output gets more varied, which works better for creative writing or brainstorming. maxTokens caps how long the response can be. Different models support different inference parameters, so check the documentation for the specific model you're using.

The Converse API is the recommended approach because it works the same across all models. Change the modelId from Nova to Llama to Mistral, and your code still works.

Understanding Tokens

Notice in the code above we printed token usage at the end of the script. Before we go further, you need to understand what tokens are, because they directly affect how much you pay.

A token is a small chunk of text. It might be a whole word, part of a word, or even punctuation. Different models break text into tokens in slightly different ways, and there is no universal standard. When you send a prompt to a model, your text gets broken into tokens. The model processes those tokens and generates new tokens as its response.

Tokens

A short sentence like "What is serverless computing?" gets broken into several tokens. Longer prompts mean more input tokens. Longer responses mean more output tokens. You're billed for both, so the size of your prompt and the length of the model's response directly affect cost. Always set maxTokens to prevent runaway responses from driving up your bill.

Every model also has a context window, which is the maximum number of tokens it can handle in a single request. This is the model's working memory. Your input tokens and output tokens all need to fit inside the context window. If you exceed the window, then the API returns an error because it cannot process that many tokens in one call. This becomes important for long conversations and applications where you inject large amounts of data into the prompt for the model to reason over.

You can use the Bedrock pricing page to understand token costs for different models.
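The token math is simple enough to sanity-check in code. Here's a rough per-call cost estimator; the per-1K-token prices below are made-up placeholders for illustration, not real Bedrock pricing, so check the pricing page for current rates:

```python
def estimate_cost(input_tokens, output_tokens,
                  price_per_1k_input, price_per_1k_output):
    """Estimate the cost of a single inference call.

    Prices are quoted per 1,000 tokens, which is how most
    model pricing pages list them.
    """
    input_cost = (input_tokens / 1000) * price_per_1k_input
    output_cost = (output_tokens / 1000) * price_per_1k_output
    return input_cost + output_cost

# Example with made-up prices: $0.06 per 1K input, $0.24 per 1K output
cost = estimate_cost(1500, 500, 0.06, 0.24)
print(f"Estimated cost: ${cost:.4f}")  # 0.09 + 0.12 = $0.2100
```

Plug in the `inputTokens` and `outputTokens` values returned in the `usage` field of each response to track real spend per request.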

Multi-Turn Conversations

Up to this point, we've done single-turn interactions: one prompt, one response. But real applications usually need ongoing conversations where the model remembers what was said earlier.

Here's the thing though: models are stateless by design. Each API call is completely independent and the model doesn't remember anything from previous requests. You need to explicitly send the full conversation history with every call.

This is how all AI-powered chat applications work. It seems like they remember everything you talked about between prompts, but that is only because the conversation history is collected and submitted into context through the prompt with every request.

That means when you are writing apps that need multi-turn conversations, your code is responsible for managing and sending the full context. Let's build a cooking assistant that demonstrates three conversation turns:

import boto3

def multi_turn_conversation():
    bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
    model_id = "us.amazon.nova-lite-v1:0"

    # System prompt sets the assistant's behavior
    system_prompt = [
        {
            "text": "You are a helpful cooking assistant. Provide concise recipe suggestions."
        }
    ]

    # Conversation history - we'll build this up with each turn
    conversation_history = []

    # Turn 1: Ask for recipe suggestions
    user_message_1 = "Suggest a quick dinner recipe with chicken."

    conversation_history.append({
        "role": "user",
        "content": [{"text": user_message_1}]
    })

    response_1 = bedrock_runtime.converse(
        modelId=model_id,
        system=system_prompt,
        messages=conversation_history,
        inferenceConfig={"temperature": 0.7, "maxTokens": 200}
    )

    assistant_message_1 = response_1['output']['message']['content'][0]['text']

    # Add assistant's response to history
    conversation_history.append({
        "role": "assistant",
        "content": [{"text": assistant_message_1}]
    })

    # Turn 2: Ask for modifications
    user_message_2 = "Can you make it vegetarian instead?"

    conversation_history.append({
        "role": "user",
        "content": [{"text": user_message_2}]
    })

    response_2 = bedrock_runtime.converse(
        modelId=model_id,
        system=system_prompt,
        messages=conversation_history,
        inferenceConfig={"temperature": 0.7, "maxTokens": 200}
    )

    assistant_message_2 = response_2['output']['message']['content'][0]['text']

    conversation_history.append({
        "role": "assistant",
        "content": [{"text": assistant_message_2}]
    })

    # Turn 3: Ask for cooking time
    user_message_3 = "How long will this take to prepare?"

    conversation_history.append({
        "role": "user",
        "content": [{"text": user_message_3}]
    })

    response_3 = bedrock_runtime.converse(
        modelId=model_id,
        system=system_prompt,
        messages=conversation_history,
        inferenceConfig={"temperature": 0.7, "maxTokens": 200}
    )

    assistant_message_3 = response_3['output']['message']['content'][0]['text']
    print(assistant_message_3)

if __name__ == "__main__":
    multi_turn_conversation()

The pattern is the same for every turn: append the user message to the conversation history, call the Converse API with the full history, then append the assistant's response back to the history.

The model can reference what was said in turn 1 when responding to turn 2, but only because you're resending everything. You're paying for those tokens each time too, which is why conversation history management matters for cost.

In production, you'd store conversation history somewhere persistent, like a database. When a user returns, you load their history and continue where they left off.
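A minimal sketch of that persistence layer, using a local JSON file per user in place of a real database (the directory name and user ID here are invented for illustration):

```python
import json
from pathlib import Path

HISTORY_DIR = Path("conversation_histories")  # stand-in for a database

def load_history(user_id):
    """Load a user's conversation history, or start fresh."""
    path = HISTORY_DIR / f"{user_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return []

def save_history(user_id, history):
    """Persist the full message list after each turn."""
    HISTORY_DIR.mkdir(exist_ok=True)
    (HISTORY_DIR / f"{user_id}.json").write_text(json.dumps(history, indent=2))

# Load before calling converse(), save after appending the reply
history = load_history("student-42")
history.append({"role": "user", "content": [{"text": "When is spring break?"}]})
save_history("student-42", history)
```

The message dictionaries are already JSON-serializable, so the same structure you send to the Converse API can be stored and reloaded without transformation.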

Using the Converse API like this is essentially doing it the hard way, and that's deliberate. In a real application, you also wouldn't have redundant code like this. You'd refactor common code into functions and collect user input dynamically.

There are higher-level libraries and frameworks that can handle a lot of that complexity for you, including managing the message history and formatting the request body. But we're working with the Bedrock APIs directly for now so you understand exactly how Bedrock and AI models actually work. Later, when I show you the simpler way using the Strands Agents SDK, you'll fully understand what's happening under the hood.
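As a taste of that refactoring, here's one way to collapse the append/call/append pattern into a single helper. Passing the client in as a parameter is a design choice that also makes the function easy to exercise with a stub in tests:

```python
def converse_turn(client, model_id, system_prompt, history, user_text,
                  temperature=0.7, max_tokens=200):
    """Append the user message, call the model, append and return the reply."""
    history.append({"role": "user", "content": [{"text": user_text}]})
    response = client.converse(
        modelId=model_id,
        system=system_prompt,
        messages=history,
        inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
    )
    reply = response["output"]["message"]["content"][0]["text"]
    history.append({"role": "assistant", "content": [{"text": reply}]})
    return reply
```

Each turn of the cooking assistant then becomes one line: `reply = converse_turn(bedrock_runtime, model_id, system_prompt, conversation_history, user_input)`.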

Tool Use (Function Calling)

Everything we've done has been purely text in, text out. The model generates a response based on its training data and whatever you include in the prompt. But this is problematic for real world usage.

You can’t rely on training data alone. Models have a knowledge cutoff based on when they were trained, and they don’t have access to real-time or external data like today’s weather, live content from the internet, or data stored in databases.

They don't know what's happening right now, and they can't take actions in the real world on their own.

That's where tool use comes in. Tools are functions that a model can request your application to run in order to interact with external systems. The model doesn't execute tools itself. It sends a structured request saying "I want to call this function with these arguments," and your code handles the actual execution.

This is how most modern AI applications work. A chatbot that does research for you using the internet? That's tool use. A coding assistant that reads files from your local disk? Tool use. A personal assistant bot that checks your calendar? Also tool use.

Now, this does get a bit involved when you're doing everything the hard way, but stick with me. This is an important step in building a foundational understanding of how AI applications work.

Think of it like this:

1. You create tools in your code.
2. You write code to inform the model what tools exist and how to use them; this description is often called a tool schema.
3. You send the model a prompt along with the tool schema.
4. The model reasons over the prompt and decides if it needs a tool to answer.
5. If it does need a tool, the model returns a response to your application code including information on which tool to call and with what arguments.
6. Your code runs the tool.
7. Your code sends the result of the tool call back to the model.
8. The model reasons over the tool result and works that information into its final response.

Tool Use

Here's the full tool use example following this flow:

import boto3

# ---------------------------------------------------------------------------
# Step 1: Define your local Python functions
# ---------------------------------------------------------------------------
# These are regular Python functions. The model will never call them directly.
# Instead, the model will ASK us to call them by returning a tool_use block.

def get_weather(location, unit="fahrenheit"):
    """
    Simulate fetching weather data for a location.
    In a real app, this would call a weather API like OpenWeatherMap.
    """
    weather_data = {
        "location": location,
        "temperature": 58 if unit == "fahrenheit" else 14,
        "unit": unit,
        "condition": "Partly cloudy",
        "humidity": "72%",
        "wind": "8 mph NW",
    }
    return weather_data


# ---------------------------------------------------------------------------
# Step 2: Describe your functions as "tools" for the model
# ---------------------------------------------------------------------------
# The model needs a description of each tool so it knows:
#   - What the tool does (description)
#   - What inputs it expects (inputSchema)
#
# This is like writing documentation so someone else can use your function.

TOOL_CONFIG = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_weather",
                "description": "Get the current weather for a given location.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. 'San Francisco, CA'",
                            },
                            "unit": {
                                "type": "string",
                                "enum": ["fahrenheit", "celsius"],
                                "description": "Temperature unit (default: fahrenheit)",
                            },
                        },
                        "required": ["location"],
                    }
                },
            }
        }
    ]
}


# ---------------------------------------------------------------------------
# Step 3: Map tool names to actual Python functions
# ---------------------------------------------------------------------------
# When the model asks to use a tool, we look up the function by name here.

TOOL_FUNCTIONS = {
    "get_weather": get_weather,
}


def run_tool(tool_name, tool_input):
    """
    Look up a tool by name and call it with the provided input.
    Returns the result as a dictionary.
    """
    func = TOOL_FUNCTIONS.get(tool_name)
    if func is None:
        return {"error": f"Unknown tool: {tool_name}"}

    # ** unpacks the dict into keyword arguments:
    #   get_weather(**{"location": "Seattle"})  →  get_weather(location="Seattle")
    return func(**tool_input)


# ---------------------------------------------------------------------------
# Step 4: The main tool use loop
# ---------------------------------------------------------------------------

def tool_use_demo():
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    model_id = "us.amazon.nova-lite-v1:0"

    user_message = "What's the weather like in Seattle right now?"

    messages = [
        {
            "role": "user",
            "content": [{"text": user_message}],
        }
    ]

    # First API call: send the message AND the tool definitions to the model
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        toolConfig=TOOL_CONFIG,
        inferenceConfig={"temperature": 0.0, "maxTokens": 300},
    )

    stop_reason = response["stopReason"]
    assistant_message = response["output"]["message"]

    # Check: did the model ask to use a tool?
    if stop_reason == "tool_use":
        # Find the toolUse block in the response
        tool_use_block = None
        for block in assistant_message["content"]:
            if "toolUse" in block:
                tool_use_block = block["toolUse"]
                break

        tool_name = tool_use_block["name"]
        tool_input = tool_use_block["input"]
        tool_use_id = tool_use_block["toolUseId"]

        # Run the actual function
        result = run_tool(tool_name, tool_input)

        # Send the result back to the model
        messages.append(assistant_message)
        messages.append({
            "role": "user",
            "content": [
                {
                    "toolResult": {
                        "toolUseId": tool_use_id,
                        "content": [{"json": result}],
                    }
                }
            ],
        })

        # Second API call: model generates its final answer using the tool result
        final_response = bedrock.converse(
            modelId=model_id,
            messages=messages,
            toolConfig=TOOL_CONFIG,
            inferenceConfig={"temperature": 0.0, "maxTokens": 300},
        )

        final_text = final_response["output"]["message"]["content"][0]["text"]
        print(final_text)
    else:
        # Model answered directly without needing a tool
        print(assistant_message["content"][0]["text"])

if __name__ == "__main__":
    tool_use_demo()

Let's walk through what's happening:

Step 1 is defining the actual Python function. The tool in this case is a local function that simulates fetching weather data. In the real world, you'd swap this out by connecting it to a real API. The model will never call this function directly.

Step 2 is creating a tool schema that describes the function to the model. Think of this like writing documentation so the model knows how to use it. We give the tool a name, a description in natural language, and an input schema that lays out what parameters the tool accepts, their types, and whether they're required.

Step 3 is a dictionary that maps tool names to actual functions. When the model decides it needs a tool, it returns the name of the tool it wants to call. We need to be able to look that up and figure out which function to run. The run_tool function handles this dispatch.

Step 4 is the main loop. We call the Converse API with the user message and the tool config. The model sees the question, sees the available tools, and decides it needs the weather tool. It returns a tool_use block with the function name and arguments. Our code runs the actual function, then sends the result back to the model in a toolResult message. The model uses that real data to generate its final response.

The tool itself can be anything: a local function, an API call to another service, a database query, or a function running in the cloud. The pattern stays the same.
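One nice property of the dispatch dictionary is that adding a tool is just adding an entry. Here's the same registry pattern with a second, hypothetical tool; `get_course_schedule` and its return values are invented for illustration, and a real version would also need a matching entry in `TOOL_CONFIG` so the model knows it exists:

```python
def get_weather(location, unit="fahrenheit"):
    """Stub weather lookup, as in the example above."""
    return {"location": location, "temperature": 58, "unit": unit}

def get_course_schedule(course_code):
    """Hypothetical second tool: look up a course's meeting times."""
    return {"course": course_code, "meets": "MWF 10:00-10:50"}

TOOL_FUNCTIONS = {
    "get_weather": get_weather,
    "get_course_schedule": get_course_schedule,
}

def run_tool(tool_name, tool_input):
    """Dispatch a tool request by name, exactly as before."""
    func = TOOL_FUNCTIONS.get(tool_name)
    if func is None:
        return {"error": f"Unknown tool: {tool_name}"}
    return func(**tool_input)

print(run_tool("get_course_schedule", {"course_code": "CS101"}))
```

The main loop doesn't change at all; the model simply sees two tools in the schema and picks whichever one fits the question.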

For more details, see the Bedrock tool use documentation.

RAG and Knowledge Bases

Tools are great, but one of the most common use cases for integrating AI into applications is having it reason over private data, which models can't access by default.

Models don't have access to your company's internal documentation, your product specs, or any of your proprietary data. If you ask a model about a company's internal processes, it's going to hallucinate something that seems plausible but is actually completely made up.

Retrieval Augmented Generation (RAG) is the common fix for this. The concept is simple: before you ask the model to generate an answer, you first search your own documents for relevant information. Then you include that data in the prompt. The model generates its response grounded in your actual data instead of relying only on what it learned during training.

Retrieve the data, augment the prompt, generate the response. That's where the abbreviation RAG comes from.

Retrieval Augmented Generation

How the Retrieval Part Works

The retrieval step uses semantic search, which is different from traditional keyword search. Keyword search looks for exact word matches, while semantic search understands the meaning of the text and searches on that instead.

If your document says "customers can return items within 30 days," semantic search will find it when someone asks about "refund window" or "return period," even though those exact words don't appear. The words "queen" and "king" aren't a direct match either, but they're semantically similar because they both represent royalty. Semantic search finds that relationship but traditional search would not.

To make semantic search work, your data needs to be converted into numbers, or vectors, so the computer can compare meaning mathematically. Here's how the pipeline works:

RAG Process

  1. Upload and Chunking: Upload your documents and then break them into smaller passages called chunks. A 50-page PDF would become many chunks. There are different chunking methods depending on your use case and data structure.
  2. Embedding: Each chunk gets run through an embedding model, which converts the text into a vector, or a list of numbers that represents the meaning of that text. Think of it as a numerical fingerprint of what the text is about.
  3. Storage: Those vectors get stored in a vector database, optimized for searching across vectors quickly.
  4. Retrieval: When a user asks a question, that question also gets converted into a vector. The vector database queries the data and finds the chunks whose vectors are closest to the question's vector semantically. Those are your most relevant passages.
  5. Generation: The relevant passages get included in the prompt passed to the model, and the model generates an answer grounded in that data.
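The "closest vectors" comparison in step 4 is typically cosine similarity. Here's the idea on toy three-dimensional vectors; real embedding models produce vectors with hundreds or thousands of dimensions, and the numbers below are made up, but the math is the same:

```python
import math

def cosine_similarity(a, b):
    """Measure how similar two vectors are in direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: "return period" and "refund window" point in
# nearly the same direction, while "campus parking" does not.
return_period = [0.9, 0.1, 0.2]
refund_window = [0.85, 0.15, 0.25]
campus_parking = [0.1, 0.9, 0.1]

print(cosine_similarity(return_period, refund_window))   # close to 1.0
print(cosine_similarity(return_period, campus_parking))  # much lower
```

Retrieval is then just "embed the question, rank all stored chunks by similarity, return the top few" — which is exactly what the vector database does for you at scale.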

RAG is powerful, but there's a lot of plumbing involved to make it all work. You have to manage the chunking strategy, run embeddings, pick and maintain a vector database, write retrieval logic, and keep everything in sync when documents change. Luckily, Bedrock does this for you.

Bedrock Knowledge Bases

Bedrock Knowledge Bases automate the entire RAG pipeline. You point it at your documents in S3 (or other sources like Confluence, SharePoint, or Salesforce), and it handles ingestion, chunking, embedding, and vector storage. Then you query it with a single API call.

For example, imagine a university wants to build a chatbot to help students find answers to frequently asked questions about course enrollment deadlines, financial aid policies, and general campus information. That is a perfect use case for RAG.

We'll incrementally build this university chatbot example throughout the rest of this post starting with knowledge base creation. If you want to follow along, use the companion repo which contains the full instructions. This includes sample FAQ documents about enrollment, financial aid, housing, and campus services that we will use as the private data.

To create a Knowledge Base, you first need to upload your documents to Amazon S3, AWS's object storage service. Go to the Amazon S3 Console and create a new bucket, then upload the knowledge base documents to the bucket.

After that, go to the Bedrock console to Create a knowledge base. You'll connect the S3 bucket containing the FAQ documents, choose an embedding model (Amazon Titan Embed is a good default), and select a vector store. If you're just getting started, Amazon S3 Vectors is the simplest and most cost-effective option since it doesn't require you to provision a separate vector database. Then sync the data.

Once your Knowledge Base is created and synced, querying it is a single API call to Bedrock using retrieve_and_generate:

import boto3

# REPLACE THIS with your Knowledge Base ID
KNOWLEDGE_BASE_ID = "YOUR_KB_ID"  # From the Bedrock console
MODEL_ID = "us.amazon.nova-lite-v1:0"

def query_knowledge_base(question):
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

    # Use retrieve_and_generate to query the Knowledge Base
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': question
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ID
            }
        }
    )

    # Extract the generated response
    output_text = response['output']['text']
    print(output_text)

    # Display source citations
    citations = response.get('citations', [])
    if citations:
        print("\nSources:")
        for idx, citation in enumerate(citations, 1):
            for reference in citation.get('retrievedReferences', []):
                location = reference.get('location', {})
                s3_location = location.get('s3Location', {})
                uri = s3_location.get('uri', 'Unknown')
                print(f"  [{idx}] {uri}")

if __name__ == "__main__":
    question = "When is spring break this year?"
    query_knowledge_base(question)

Notice the client is bedrock-agent-runtime, not bedrock-runtime. Knowledge Bases use a different API from the Converse API we've been working with.

The retrieve_and_generate call does both RAG steps in a single call: it retrieves relevant documents using semantic search, then passes them to the model to generate a response. You get both the answer and citations pointing back to the source documents, so your users can verify where the information came from.

For more on creating and configuring Knowledge Bases, see the Bedrock Knowledge Bases documentation.

Guardrails (Content Safety)

You now know how to build an AI app that can access your data using RAG and interact with external systems through tools. But before you put something like this in front of real users, you need to think about what happens when people try to misuse it or when the model generates something it shouldn't.

When you put an AI application on the internet, you have to assume it will be abused. You can't fully trust user input, and you can't blindly trust model output either.

Guardrails are content filters that get enforced before and after the model is called. They sit outside the prompt as structural policies. You configure your guardrails once, reference them in your API calls, and they work across any model.

Guardrail Types

Available filters include:

  • Content filters: Detect harmful content like hate speech, violence, sexual content, and even prompt attacks like jailbreaks or prompt injection attempts, with adjustable severity thresholds
  • Denied topics: Block entire categories like "investment advice" or "medical diagnosis"
  • Word filters: Block specific words or phrases, including profanity
  • Sensitive information filters: Find and mask sensitive data like PII, social security numbers, credit cards, and email addresses
  • Contextual grounding checks: Check model responses against a reference source to reduce hallucinations

To create a Guardrail, you can use the Bedrock console. You give the guardrail a name, configure the content filters with severity thresholds that make sense for your use case, and optionally add denied topics or PII filters. Once configured, create a version to get a guardrail ID and version number.

For the university chatbot, imagine a student tries to ask the assistant something inappropriate, like how to cheat on an exam or how to hack the university network. A guardrail can detect that type of request and block it before the model ever generates a response.

Here's how you add it to a Knowledge Base query:

import boto3

# REPLACE THESE with your actual IDs
KNOWLEDGE_BASE_ID = "YOUR_KB_ID"
GUARDRAIL_ID = "YOUR_GUARDRAIL_ID"
GUARDRAIL_VERSION = "1"
MODEL_ID = "us.amazon.nova-lite-v1:0"


def query_kb_with_guardrail(question):
    bedrock_agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

    response = bedrock_agent.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": MODEL_ID,
                "generationConfiguration": {
                    "guardrailConfiguration": {
                        "guardrailId": GUARDRAIL_ID,
                        "guardrailVersion": GUARDRAIL_VERSION,
                    },
                },
            },
        },
    )

    print(response["output"]["text"])


if __name__ == "__main__":
    question = "How can I cheat on my finals this year?"
    query_kb_with_guardrail(question)

To use a guardrail, all you have to do is add a generationConfiguration with the guardrail identifier and version number inside the knowledge base configuration. Everything else in your code stays exactly the same.

A normal question like "When is spring break?" passes through and gets answered normally. But "How can I cheat on my finals?" gets blocked by the guardrail before the model ever generates a response.

You can also add guardrails directly to Converse API calls using the guardrailConfig parameter:

response = bedrock_runtime.converse(
    modelId=model_id,
    messages=messages,
    guardrailConfig={
        'guardrailIdentifier': 'your-guardrail-id',
        'guardrailVersion': '1'
    }
)

For more details on guardrail configuration options, see the Bedrock Guardrails documentation.
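With the Converse API, a blocked request is signaled through the response's stopReason field, which is set to "guardrail_intervened". A quick sketch of checking for it (the sample dict is illustrative, not real API output):

```python
def guardrail_blocked(response: dict) -> bool:
    """Check a Converse API response for a guardrail intervention."""
    return response.get("stopReason") == "guardrail_intervened"


# Illustrative Converse-style response shape
sample = {
    "stopReason": "guardrail_intervened",
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Sorry, I can't help with that."}],
        }
    },
}

print(guardrail_blocked(sample))  # True
```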

Putting It All Together with an Agent

We've been doing everything the hard way on purpose so you could see how the pieces actually work. You might be thinking "that tool use code was a lot of work for one function call" and you'd be right.

Managing the message history, parsing tool requests, executing functions, and sending results back to the model manually is tedious. And it gets more complicated when the model needs multiple tools or several steps to complete a task.

Additionally, real-world applications need more than a single prompt and response with hardcoded user queries. You need to take dynamic input from the user and pass it to the model. Then the model might need to look up information, call tools, and take several steps before it can answer a question.

An agent is a system that lets the model do this. Instead of taking in one prompt and responding one time, it can think through the problem, decide what action to take next, use tools if needed, and repeat that process until it reaches a final answer. Under the hood, the model may be called multiple times as part of a loop until the task is complete.
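In code form, the loop looks roughly like this. This is a toy sketch with a stand-in model function rather than a real Bedrock call, and it skips details a real loop needs (like recording the assistant's tool request in the history), but it shows the shape: call the model, run any requested tool, feed the result back, repeat until there's a final answer.

```python
def agent_loop(model, tools, user_input, max_steps=5):
    """Minimal agent loop: call the model, execute any requested tool,
    append the result to the history, and repeat until a final answer."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = model(messages)                  # one model call per step
        if reply["type"] == "tool_use":          # model asked for a tool
            result = tools[reply["name"]](**reply["input"])
            messages.append({"role": "tool", "content": result})
        else:                                    # final answer reached
            return reply["content"]
    return "Stopped: too many steps."


# Stand-in "model": requests a tool once, then answers using the result
def fake_model(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"type": "text",
                "content": f"CS 201 meets {messages[-1]['content']}"}
    return {"type": "tool_use", "name": "lookup",
            "input": {"course": "CS-201"}}


answer = agent_loop(
    fake_model,
    {"lookup": lambda course: "Tue/Thu 1:00 PM"},
    "When does CS 201 meet?",
)
print(answer)  # CS 201 meets Tue/Thu 1:00 PM
```

This is exactly the orchestration we did by hand in the tool use section, just generalized into a loop.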

Agent Loop

Now we're going to switch to using a higher-level framework that handles much of the complexity of building AI applications for you.

Strands Agents SDK is an open-source framework from AWS that makes building AI agents straightforward. It integrates directly with Bedrock, though it supports any model provider, and handles the orchestration we've been doing manually.

Under the hood, when you use Amazon Bedrock as the model provider, it's calling the same Converse API we've been using throughout this post. That's why it was worth learning the fundamentals first. This should all make sense now rather than feeling like magic.

To get started with Strands, install the packages:

pip install strands-agents strands-agents-tools
Enter fullscreen mode Exit fullscreen mode

Here's the university chatbot as a Strands agent that combines everything we covered: a Bedrock model, a Knowledge Base for university data, a custom tool for course lookups, and a guardrail for content safety:

import os
from strands import Agent, tool
from strands.models import BedrockModel
from strands_tools import retrieve

# ============================================================
# Configuration — Replace these with your resource IDs
# ============================================================

KNOWLEDGE_BASE_ID = "YOUR_KB_ID"
GUARDRAIL_ID = "YOUR_GUARDRAIL_ID"
GUARDRAIL_VERSION = "1"
MODEL_ID = "us.amazon.nova-lite-v1:0"
REGION = "us-east-1"


# ============================================================
# Custom Tool: Look Up Course Schedule
# ============================================================

@tool
def lookup_course(department: str, course_number: str) -> str:
    """Look up schedule and details for a specific course.

    Use this when a student asks about a particular class,
    like "When does CS 201 meet?" or "Who teaches BIO 101?"

    Args:
        department: The department code (e.g., "CS", "BIO", "ENG").
        course_number: The course number (e.g., "101", "201").

    Returns:
        Course details including schedule, instructor, and location.
    """
    # In a real app this would query a course catalog API
    courses = {
        "CS-101": {
            "title": "Introduction to Programming",
            "instructor": "Dr. Maria Chen",
            "schedule": "Mon/Wed/Fri 10:00 - 10:50 AM",
            "location": "Turing Engineering Building, Room 210",
            "credits": 3,
            "seats_available": 12,
        },
        "CS-201": {
            "title": "Data Structures",
            "instructor": "Prof. James Park",
            "schedule": "Tue/Thu 1:00 - 2:15 PM",
            "location": "Turing Engineering Building, Room 215",
            "credits": 3,
            "seats_available": 5,
        },
        "BIO-101": {
            "title": "General Biology I",
            "instructor": "Dr. Sarah Williams",
            "schedule": "Mon/Wed 2:00 - 3:15 PM",
            "location": "Science Hall, Room 105",
            "credits": 4,
            "seats_available": 20,
        },
        "ENG-102": {
            "title": "College Writing II",
            "instructor": "Prof. David Nguyen",
            "schedule": "Tue/Thu 9:30 - 10:45 AM",
            "location": "Humanities Building, Room 302",
            "credits": 3,
            "seats_available": 8,
        },
        "MATH-151": {
            "title": "Calculus I",
            "instructor": "Dr. Lisa Patel",
            "schedule": "Mon/Wed/Fri 11:00 - 11:50 AM",
            "location": "Math & Science Center, Room 120",
            "credits": 4,
            "seats_available": 15,
        },
    }

    key = f"{department.upper()}-{course_number}"
    if key in courses:
        c = courses[key]
        return (
            f"Course: {key} - {c['title']}\n"
            f"Instructor: {c['instructor']}\n"
            f"Schedule: {c['schedule']}\n"
            f"Location: {c['location']}\n"
            f"Credits: {c['credits']}\n"
            f"Seats available: {c['seats_available']}"
        )

    return f"No course found for {key}. Check the department code and course number."


# ============================================================
# Build the Agent
# ============================================================

def create_university_agent():
    """Create the University chatbot agent."""

    # The built-in retrieve tool reads this env var to find the KB
    os.environ["KNOWLEDGE_BASE_ID"] = KNOWLEDGE_BASE_ID
    os.environ["AWS_REGION"] = REGION

    bedrock_model = BedrockModel(
        model_id=MODEL_ID,
        region_name=REGION,
        temperature=0.3,
        max_tokens=2000,
        guardrail_id=GUARDRAIL_ID,
        guardrail_version=GUARDRAIL_VERSION
    )

    system_prompt = """You are the University virtual assistant.
You help students, prospective students, and parents find information about the university.

Your responsibilities:
- Answer questions about academics, admissions, financial aid, housing, dining, parking, the library, career services, and the academic calendar.
- Use the retrieve tool to search the knowledge base for university policies and FAQ answers before responding.
- Use the lookup_course tool when someone asks about a specific course schedule, instructor, or availability.
- Cite your sources when referencing specific policies or dates.

Guidelines:
- Be friendly and welcoming — remember, students may be stressed about deadlines.
- If you don't know the answer, say so and suggest they contact the relevant office.
- Keep answers concise and helpful."""

    agent = Agent(
        model=bedrock_model,
        tools=[retrieve, lookup_course],
        system_prompt=system_prompt,
    )

    return agent


# ============================================================
# Run the Agent
# ============================================================

def main():
    print("University Chatbot")
    print("=" * 60)
    print("Ask me about admissions, financial aid, housing, dining,")
    print("course schedules, the academic calendar, and more.")
    print("\nType 'quit' to exit.\n")

    agent = create_university_agent()

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break

        print("\nAssistant: ", end="", flush=True)
        response = agent(user_input)
        print(f"\n{response}\n")


if __name__ == "__main__":
    main()

Let's walk through what's different here compared to the manual approach.

At the top, we import a few things from the Strands framework: Agent, the @tool decorator for creating custom tools, BedrockModel for the model provider, and the built-in retrieve tool from strands_tools which queries the Knowledge Base we created earlier.

Then we define our configuration. We need a Knowledge Base ID for RAG, a Guardrail ID for content filtering, and we're using Amazon Nova as our model. All things we've already set up and used individually.

The @tool decorator is how you define custom tools in Strands. We've got a lookup_course tool that simulates looking up course information; in a real application, this would query a database. Compare this to the manual tool use code from earlier and you'll notice there are no lengthy tool schemas, no message parsing, and no dispatch logic. You just write a function with a docstring and type hints, and Strands handles the rest.
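For comparison, the decorator derives roughly the same kind of tool spec we wrote by hand earlier from the function's signature and docstring. The dict below is an approximation of that spec for lookup_course, not Strands' exact internal format:

```python
# Approximation of the tool spec the @tool decorator would generate
# from lookup_course's type hints and docstring (not Strands' exact format)
lookup_course_spec = {
    "toolSpec": {
        "name": "lookup_course",
        "description": "Look up schedule and details for a specific course.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "department": {
                        "type": "string",
                        "description": 'The department code (e.g., "CS", "BIO").',
                    },
                    "course_number": {
                        "type": "string",
                        "description": 'The course number (e.g., "101", "201").',
                    },
                },
                "required": ["department", "course_number"],
            }
        },
    }
}

print(lookup_course_spec["toolSpec"]["name"])  # lookup_course
```

Writing that by hand for every tool gets old fast, which is the whole point of the decorator.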

Strands also includes a built-in retrieve tool that works directly with Bedrock Knowledge Bases. You set the Knowledge Base ID as an environment variable, and the agent decides when to use it.

We created a BedrockModel instance with the model ID, region, inference parameters, and guardrail information. Then we defined the system prompt telling the agent it's a university chatbot and how it should handle requests. Finally, we created the agent with the model, the tools list (both custom and built-in), and the system prompt.

The last piece is the interactive loop. We read input from the command line and passed it to the agent. To call the agent, all you need is agent(user_input). The framework handles the entire agent loop: when the model needs a tool, Strands executes it and sends the result back to the model.

Multi-turn conversation management is handled too: each call to the agent keeps the context from previous turns for as long as the program is running.

Under the hood, Strands is calling the Converse API and using the different Bedrock features we covered throughout. This should all make a lot more sense now than if you jumped right into the agent framework.

The Strands documentation has more examples and configuration options.

What's Next

We covered a lot of ground. You now have the knowledge you need to start building real applications with AI on AWS using Amazon Bedrock.

Here are some areas to explore as your application grows:

  • Prompt caching can significantly reduce costs on repeated context. If you have a large system prompt or tool definitions that don't change between requests, caching avoids reprocessing those tokens every time.
  • Cross-region inference distributes your requests across AWS Regions to balance load. Instead of hitting throughput limits in one Region and failing, Bedrock can route requests to other Regions.
  • CloudWatch monitoring tracks token usage, latency, throttling, and error rates. Setting up monitoring early helps you catch cost spikes and performance issues before they become problems.
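As a taste of the first item: with the Converse API, prompt caching works by inserting a cachePoint marker after the stable part of your prompt, so everything before it can be reused across requests. A sketch of what that looks like for the chatbot's system prompt (model support for caching varies, and the long instructions string here is a stand-in):

```python
# Everything before the cachePoint block can be cached and reused
# across requests, so the stable system prompt isn't reprocessed
# every time. The instructions text here is a placeholder.
system = [
    {"text": "You are the University virtual assistant. "
             "...long, stable instructions and policies..."},
    {"cachePoint": {"type": "default"}},
]

# Then passed to the Converse API like:
# bedrock_runtime.converse(modelId=MODEL_ID, system=system, messages=messages)
print(len(system))  # 2
```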

The companion repo has the complete code for every example in this post. Clone it, run the examples, and adapt them to your own use case to push your learning further.
