What are AI Agents? Why are they so popular these days? What's the mechanics behind them? Do we have to learn a lot of new skills to build one?
I found myself asking these same questions recently.
Then I discovered Strands Agents (provided by AWS), and I found it such a simple and interesting way to answer all these questions that I wanted to share my discoveries here.
How does the world work without agents?
I was inspired by a simple but good example in which an agent could be useful: booking a table at a restaurant online.
First, let's see how you could potentially execute that workflow WITHOUT an AI agent.
1. You open your browser
2. You look for a restaurant near the place you want to go
3. You select some restaurants and look at the menu
4. You select the restaurant with your food preferences
5. You fill in a form (your name, date & time, ...)
6. You validate your reservation
N.B.: Our job as engineers is to create value (i.e. make users' lives easier), and shortening workflows is a very common way to do that. AI agents can shorten workflows; that's certainly why they are valuable. So, let's explore this further.
What can an AI agent do?
A Simple conversation example
With an AI agent we could imagine this conversation:
You: What is the best restaurant serving veggie burgers in Paris?
The Agent: Green Farmer's is the best veggie burger restaurant in Paris.
You: Reserve a table at Green Farmer's tomorrow at 7pm please.
The Agent: OK, the table is booked.
How is this possible? I don't have to search the web, fill in a form, or even click a button to validate!
That's because the agent will take care of that for you.
The use of tools behind the scenes
Technically, here are the generic, simplified mechanics of agents (as I understand them).
Let's take a concrete example.
A simple addition example
Imagine that you want to create an agent that computes additions.
Here is what happens.
After I send my prompt "compute 1 + 1":
1 - The agent sends the prompt to the LLM along with the available tools (i.e. my add function), and the LLM answers with the tools to use, if any (in this case it will ask to use the add() tool)
2 - The agent calls the tools (in my case add(1, 1)) and gets the answer
3 - The agent sends the context and the tool's answer back to the LLM, so it can create the final response (in my case 1 + 1 = 2)
Of course, this simple addition can be done by the LLM without any tool, but only because the result is obvious and present in the data the model was trained on.
However, if you want to compute more complex things (a sine function, for example), I would recommend not relying too much on the non-deterministic output of LLMs :)
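To make this concrete, here is a minimal sketch of such an addition agent with Strands (the system prompt is just illustrative, and the agent falls back to the framework's default Bedrock model since none is specified):

from strands import Agent, tool

@tool
def add(a: int, b: int) -> int:
    """Add two numbers and return the result."""
    return a + b

# The agent sends the prompt and the add() tool description to the LLM,
# calls add() when the LLM asks for it, and returns the final answer.
calculator = Agent(tools=[add], system_prompt="You are a calculator assistant.")
result = calculator("compute 1 + 1")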
A simple table booking example
So, an LLM needs tools to do things it cannot do on its own, like computing complex things, searching the web or writing to a database, for example.
For instance, when we say "Reserve a table at Green Farmer's tomorrow at 7pm please.", here is what the mechanics behind the scenes could look like:
In this case the LLM needs a tool to book the table at the restaurant (it cannot do that on its own); that's why we have a book() tool, which is actually a function that executes standard code (e.g. Python).
We could also add tools to search the web, or to tell what date "tomorrow" is (something an LLM cannot know on its own).
In the example above, the agent process would be the following:
1 - Send the prompt to the LLM, specifying that a book() tool is available
2 - Call the book() tool if the LLM asks to do so
3 - Send the tool's answer to the LLM, which creates the final response
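If you wanted to wire that loop yourself, without any framework, it could look roughly like the sketch below. Everything here is hypothetical: call_llm is a stand-in for a real LLM API and is stubbed so the three steps are visible.

# Hypothetical, framework-free sketch of the agent loop (not production code).
def book(restaurant: str, date: str, time: str) -> str:
    """The tool: plain Python code that would actually perform the reservation."""
    return f"Table booked at {restaurant} on {date} at {time}"

def call_llm(messages, tools):
    """Stand-in for a real LLM API call: a real implementation would send the
    messages and the tool descriptions to the model and parse its reply."""
    if not any(m["role"] == "tool" for m in messages):
        # Pretend the model decided to use the book() tool.
        return {"tool": "book", "args": {"restaurant": "Green Farmer's", "date": "tomorrow", "time": "7pm"}}
    return {"text": "OK, your table is booked."}

tools = {"book": book}
messages = [{"role": "user", "content": "Reserve a table at Green Farmer's tomorrow at 7pm please."}]

reply = call_llm(messages, tools)                    # 1 - prompt + available tools
if "tool" in reply:
    result = tools[reply["tool"]](**reply["args"])   # 2 - call the tool the LLM chose
    messages.append({"role": "tool", "content": result})
    reply = call_llm(messages, tools)                # 3 - send the tool answer back
print(reply["text"])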
That's it. At this point, you should understand the basics of agents.
How to build an agent with Strands?
With your new knowledge in mind, you could code your own agents from scratch.
However, it's best to use a framework to handle the plumbing behind the scenes, and AWS actually has its own framework: Strands Agents.
Let's take the table reservation example and see which components we would declare with Strands.
Actually, a Strands agent is a Python class that you instantiate with 3 main parameters:
The LLM model that you want to use (Claude 4 Sonnet, for example)
The tool list (added to the tool registry) that contains all the tools we can send to the LLM so it can make a choice
The system prompt, which defines the precise role of the agent. For example, "You are a restaurant assistant helping customers reserve a table...", to make sure it won't respond off topic.
Here is what the Python code of a Strands AI agent could contain:
...
@tool
def create_booking(restaurant: str, date: str, time: str) -> str:
    ...
...
system_prompt = "You are a restaurant assistant ..."
...
agent = Agent(
    model=model,
    system_prompt=system_prompt,
    tools=[create_booking, delete_booking],
)
...
result = agent("Reserve a table at Green Farmer's tomorrow at 7pm please.")
That's it. Now you know how to build an agent with Strands, and as you can see, it's not very difficult. The difficulty resides more in writing the system prompt and the tools' code. That's where you should focus, in my opinion.
Some more capabilities for agents & Strands
Call external systems like databases
Previously, the "book(...)" tool was supposed to book a table at the restaurant. That could be done by calling an external API and/or writing to a database, for instance.
Here is a representation of a tool calling a database. This database could be any kind of database like DynamoDB, RDS MySQL, ...
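For instance, a create_booking tool could write directly to a DynamoDB table with boto3. This is a sketch under assumptions: the restaurant_bookings table name and its attributes are hypothetical.

import uuid

import boto3
from strands import tool

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("restaurant_bookings")  # hypothetical table name

@tool
def create_booking(restaurant: str, date: str, time: str, name: str) -> str:
    """Store the reservation in DynamoDB and return a confirmation."""
    booking_id = str(uuid.uuid4())
    table.put_item(Item={
        "booking_id": booking_id,
        "restaurant": restaurant,
        "date": date,
        "time": time,
        "name": name,
    })
    return f"Booking {booking_id} confirmed at {restaurant} on {date} at {time}"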
Call remote tools with MCP (Model Context Protocol)
Your tools (i.e. your functions) can be called "locally", inside your agent process, but they can also be called outside the agent process by calling a server.
This server can run locally (through the stdio transport) or remotely (through the streamable HTTP transport).
We can think of an MCP server as a classic HTTP server that can communicate with an AI agent. The AI agent can call this server to list the tools it provides, for example.
Strands integrates with MCP servers, as we can see below.
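Based on my reading of the Strands documentation, connecting to a remote MCP server over streamable HTTP can look roughly like the sketch below (the server URL is hypothetical, and the exact import paths may differ between versions):

from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Connect to a (hypothetical) remote MCP server exposing booking tools.
mcp_client = MCPClient(lambda: streamablehttp_client("http://localhost:8000/mcp"))

with mcp_client:
    # Ask the server which tools it provides, then hand them to the agent.
    tools = mcp_client.list_tools_sync()
    agent = Agent(tools=tools, system_prompt="You are a restaurant assistant ...")
    agent("Reserve a table at Green Farmer's tomorrow at 7pm please.")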
Agent to Agent (A2A) with Strands
We can imagine that an agent not only uses tools, but... other agents.
These agents actually behave like tools, but they wrap a Strands agent.
We can imagine specialized agents, each with its own system prompt, that interact with the LLM to achieve a task. For example, I can have an agent to reserve a restaurant, but also an agent to plan a trip:
You: Reserve a table at Green Farmer's tomorrow at 7pm please and show me how to get there from Gare du Nord, Paris.
Agent: OK, the table is booked. Here is the best way to get there...
Below is a representation of this pattern. Note that we need an "orchestrator agent" to take care of the orchestration between agents.
N.B.: Strands supports the A2A protocol which runs on top of HTTP, exactly like the MCP protocol. We can, by the way, mix A2A and MCP in the same agent depending on the sophistication you need (i.e. MCP is simpler).
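One simple way to sketch this in code, without even going through the A2A protocol, is the "agents as tools" pattern: each specialized agent is wrapped in a tool that an orchestrator agent can call. The system prompts below are illustrative.

from strands import Agent, tool

restaurant_agent = Agent(system_prompt="You are a restaurant assistant. You reserve tables ...")
trip_agent = Agent(system_prompt="You are a trip planner. You compute itineraries ...")

@tool
def restaurant_assistant(request: str) -> str:
    """Delegate restaurant-related requests to the specialized restaurant agent."""
    return str(restaurant_agent(request))

@tool
def trip_planner(request: str) -> str:
    """Delegate itinerary questions to the specialized trip-planning agent."""
    return str(trip_agent(request))

# The orchestrator decides which specialized agent to call for each part of the request.
orchestrator = Agent(
    system_prompt="You are an orchestrator. Route each request to the right assistant.",
    tools=[restaurant_assistant, trip_planner],
)
orchestrator("Reserve a table at Green Farmer's tomorrow at 7pm please "
             "and show me how to get there from Gare du Nord, Paris.")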
Deploy and expose your AI Agent to the world
When you are happy with an agent, you can deploy it on a server and call it through an API.
You can run the AI agent and its tools in a Docker container (using ECS/Fargate, for example) or in a Lambda function, as we can see below.
We could also do something more complex and call tools outside the containers using MCP.
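As an illustration, a Lambda deployment can be as small as a handler that wraps the agent. This sketch assumes the Strands packages are available in a Lambda layer, that the function's role is allowed to call Bedrock, and that the event carries a JSON body, as with a function URL or API Gateway proxy:

import json

from strands import Agent, tool

@tool
def create_booking(restaurant: str, date: str, time: str) -> str:
    """Hypothetical booking tool; a real one would write to a database."""
    return f"Table booked at {restaurant} on {date} at {time}"

# Created once per execution environment and reused across invocations.
agent = Agent(
    system_prompt="You are a restaurant assistant helping customers reserve a table ...",
    tools=[create_booking],
)

def handler(event, context):
    # Expects a JSON body such as {"prompt": "Reserve a table at Green Farmer's ..."}
    body = json.loads(event.get("body") or "{}")
    result = agent(body.get("prompt", ""))
    return {"statusCode": 200, "body": json.dumps({"answer": str(result)})}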
My first AI Agent with Strands
I couldn't resist the idea of generating a Strands agent from a Drawio Diagram with Amazon Q Developer.
That wasn't straightforward and it's not easy to reproduce, but it was really fun and full of learnings. So I decided to share this part as well.
Here is what I wanted to achieve for my first AI agent:
It's a simple Strands AI agent. I want it to... reserve a table at a restaurant :)
First I drew the diagram above, and I created a rule file that I called 'ProfessionalTwin', where I describe what I use as a professional:
- I use python 3.10
- I use AWS CDK V2
- I use AWS Strands SDK for AI Agents
Then I asked Q Developer:
@diagram.drawio.xml generate application
It generated the application, but it introduced a few errors and didn't work right away, so I had to create some more files containing rules for each thing I had put in my ProfessionalTwin file: CDKRules, PythonRules and StrandsRules.
Here is, for example, what StrandsRules contains:
When asked to use AWS Strands SDK, here are the rules
- use lambda layer for strands packages
- install strands-agents python module in lambda layer folder
- install strands-agents-tools python module in lambda layer folder
- Strands Agent Class parameters are model, tools, system_prompt
- create rules with @tool decorator
- create a system prompt that describes the role of the agent
- use "anthropic.claude-3-haiku-20240307-v1:0" model
- Allow lambda calling bedrock with stream response
After several trials I reused:
@diagram.drawio.xml generate application
That gave me a URL as output. I clicked on it, and I could start a conversation with my assistant. I could then verify my reservation in my DynamoDB table. And I didn't write or modify a single line of code!
Don't get me wrong, I'm not saying we can generate everything from diagrams, but that can help bootstrap ideas.
For example, I had no clue how to design and build the landing page. Amazon Q Developer created this in a few seconds!
Now I can iterate on this code.
I shared the code generation result in a GitHub repo.
You can also follow this little tutorial and try to make a generation on your side, improve it or just review and deploy the generated result.
Conclusion
Are AI Agents a revolution? Are they great? If so, why?
From the value-creation perspective, agents seem to offer great potential and should shorten many processes. Combined with voice, that should be impressive. And I clearly understand the enthusiasm around this.
From the technical point of view, I tend to think AI agents are traditional servers using the power of LLMs to process a query. That's one more component in the architecture.
Moreover, it's a non-deterministic component that can sometimes behave unexpectedly. That comes with a lot of technical questions and some challenges as well.
I guess a framework like Strands can really help answer some of them. I'm personally a big fan of its simplicity. That's a great discovery!
If you want to practice and understand more about AI Agents, I warmly recommend this workshop: Getting Started with Strands Agents