Olivier Lemaitre

Understand how AI Agents work, with AWS Strands

I often feel like I understand things more slowly than the people around me.

At least, before I can say something is good or great, I need to understand what makes it so, and the mechanics behind it.

That's what's happening with agentic AI: I see a lot of enthusiasm around it, but I don't see anyone explaining why it's so nice.

Moreover, it's hard (at least for me) to understand the mechanics just from terms like Tools, MCP, A2A, etc.

So I tried to understand why AI agents are so great and what the mechanics behind them are. That's what I share in this blog post.

N.B.: Many of the ideas were inspired by this workshop, which I did end to end: Getting Started with Strands Agents

How does the world work without agents?

The workshop has a simple but good example of a workflow where an agent could be useful: booking a table at a restaurant online.

Before we dive into how AI agents work, let's see how you could potentially do that booking WITHOUT an AI agent:

  1. You open your browser
  2. You look for a restaurant near the place you want to go
  3. You select some restaurants and look at the menu
  4. You select the restaurant with your food preferences
  5. You fill in a form (your name, date & time, ...)
  6. You validate your reservation

N.B.: Our job as engineers is to create value (i.e. make users' lives easier), and shortening workflows is a very common way of creating value. If AI agents can shorten workflows like this one, that is where their value lies. Let's explore this further.

What can an AI agent do?

With an AI agent we could imagine this conversation:

You : What is the best restaurant serving veggie burgers in Paris?
Agent: Green Farmer's is the best veggie burger restaurant in Paris

You : Reserve a table at Green Farmer's tomorrow at 7pm please.
Agent: ok, table is booked.

How is this possible? I didn't have to search the web, fill in a form, or click a button to validate!

That's because the agent will take care of that for you.

Technically, here are the generic, simplified mechanics of agents (as I understand them):

Let's take a concrete example.

Imagine that you want to create an agent that computes additions.

Here is what happens.

After I send my prompt "compute 1 + 1":

1 - The agent sends the prompt to the LLM along with the available tools (my add function), and the LLM answers with the tools to use, if any (in this case, it will ask the agent to use the add() tool)

2 - The agent calls the tool (in my case add(1, 1)) and gets the result

3 - The agent sends the context and the tool's result back to the LLM, so it can create the final response (1 + 1 = 2); see the sketch just after these steps
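To make these three steps concrete, here is a minimal from-scratch sketch of the loop. Everything in it is illustrative: fake_llm stands in for a real chat-completion API with tool calling, and it is hard-coded to pick the add() tool for our example prompt.

```python
def add(a: float, b: float) -> float:
    """The only tool we expose to the "LLM"."""
    return a + b

TOOLS = {"add": add}

def fake_llm(messages, tools):
    # Placeholder for a real LLM call: on the first turn it "decides" to use a tool...
    if messages[-1]["role"] == "user":
        return {"tool_call": {"name": "add", "args": {"a": 1, "b": 1}}}
    # ...then, once it sees the tool result, it writes the final response
    return {"content": f"1 + 1 = {messages[-1]['content']}"}

def run_agent(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    reply = fake_llm(messages, tools=list(TOOLS))        # step 1: prompt + tool list
    while "tool_call" in reply:
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])     # step 2: run the tool
        messages.append({"role": "tool", "content": str(result)})
        reply = fake_llm(messages, tools=list(TOOLS))    # step 3: send result back
    return reply["content"]

print(run_agent("compute 1 + 1"))  # prints: 1 + 1 = 2.0
```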

Of course, this simple addition can be done by the LLM without any tool, but only because the result is obvious and present in the data the model was trained on.

However, if you want to compute more complex things (a sinusoidal function, for example), I would recommend not relying too much on the NON-deterministic output of LLMs :)

So, an LLM needs tools to do things it cannot do on its own, like computing complex things, searching the web, or writing to a database, for example.

For instance, when we say "Reserve a table at Green Farmer's tomorrow at 7pm please.", here is what the mechanics could look like behind the scenes with a "book()" tool:

We could also add tools to search the web or to tell the agent what "tomorrow" is (something an LLM can never know on its own); a sketch of such tools follows.
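Here is what these tools could look like. The function names and the stubbed booking logic are my own illustrations, not real APIs:

```python
from datetime import date, timedelta

def book(restaurant: str, date_iso: str, time: str) -> str:
    """Hypothetical tool: this is where the real booking API call would go."""
    return f"Table booked at {restaurant} on {date_iso} at {time}"

def tomorrow() -> str:
    """Tells the LLM what "tomorrow" is, since the model cannot know the current date."""
    return (date.today() + timedelta(days=1)).isoformat()
```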

That's it. By now you should understand the basics of agents.

With this in mind, you could code your own agents from scratch.

However, it's best to use a framework to handle the plumbing behind the scenes, and AWS actually has its own framework: Strands Agents.

How to build an agent with Strands?

Let's take the table reservation example and see which components we would declare with Strands.

Actually, a Strands agent is a Python class that you instantiate with 3 main parameters:

  • The LLM model that you want to use (Claude 4 Sonnet, for example)

  • The tool list (added to the tool registry), which contains all the tools we can send to the LLM so it can make a choice

  • The system prompt, which defines the exact role of the agent. For example: "You are a restaurant assistant helping customers reserve a table...", to make sure it won't respond off topic.

Here is what the Python code of a Strands AI agent could contain (simplified; the @tool decorator expects plain typed parameters, from which it derives the tool schema):

```python
from strands import Agent, tool

@tool
def create_booking(restaurant: str, date: str, hour: str) -> str:
    """Book a table at the given restaurant, date and time."""
    ...

system_prompt = "You are a restaurant assistant ..."

agent = Agent(
    model=model,  # the LLM model to use
    system_prompt=system_prompt,
    tools=[create_booking, delete_booking],
)

result = agent("Reserve a table at Green Farmer's tomorrow at 7pm please.")
```

That's it. Now you know how to build an agent with Strands, and as you can see, it's not very difficult. The difficulty lies more in the system prompt and the code inside the tools. That's where you should focus, in my view.

Some more capabilities for agents & Strands

Call external systems like databases

Previously, the "book(...)" tool was supposed to book a table at the restaurant. That could be done by calling an external API and/or writing to a database, for instance.

Here is a representation of a tool calling a database. It could be any kind of database: DynamoDB, RDS MySQL, ...
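As a minimal sketch, here is what the booking tool could look like if it wrote to DynamoDB. The table name "restaurant_bookings" and its attributes are my assumptions, not something defined in the workshop:

```python
import uuid

import boto3
from strands import tool

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("restaurant_bookings")  # assumed table name

@tool
def create_booking(restaurant: str, date_iso: str, time: str, guest_name: str) -> str:
    """Persist the reservation in DynamoDB and return a confirmation."""
    booking_id = str(uuid.uuid4())
    table.put_item(Item={
        "booking_id": booking_id,
        "restaurant": restaurant,
        "date": date_iso,
        "time": time,
        "guest_name": guest_name,
    })
    return f"Booking {booking_id} confirmed"
```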

Call remote tools with MCP (Model Context Protocol)

Your tools (i.e. your functions) can be called "locally", inside your agent's process, but they can also be called outside the agent's process by calling a server.

This server can run locally (communicating over the stdio transport) or remotely (over the streamable HTTP transport).

We can think of an MCP server as a classic HTTP server that can communicate with an AI agent. The agent can call this server to list the tools it provides, for example.

Strands integrates with MCP servers, as we can see below.
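Here is a sketch of that integration as I understand it from the Strands documentation, assuming an MCP server is already running at http://localhost:8000/mcp:

```python
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Connect to a remote MCP server over streamable HTTP (the URL is an assumption)
mcp_client = MCPClient(lambda: streamablehttp_client("http://localhost:8000/mcp"))

with mcp_client:
    # Ask the server which tools it provides, then hand them to the agent
    tools = mcp_client.list_tools_sync()
    agent = Agent(tools=tools)
    agent("Reserve a table at Green Farmer's tomorrow at 7pm please.")
```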

Agent to Agent with Strands

We can imagine that an agent uses not only tools, but... other agents.

From the orchestrator's point of view, these agents are just tools, but each of them wraps a Strands agent.

We can imagine specialized agents with their own system prompts that interact with the LLM to achieve a task. For example, I can have an agent to reserve a restaurant, but also an agent to plan a trip:

You: Reserve a table at Green Farmer's tomorrow at 7pm please and show me how to get there from Gare du Nord, Paris.

Agent: ok, table is booked. Here is the best way to get there...

Below is a representation of this pattern, followed by a sketch in code. Note that we need an "orchestrator agent" to coordinate the other agents.
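Here is a sketch of this "agents as tools" pattern. The system prompts and function names are my own illustrations; each specialist wraps its own Strands agent and is exposed to the orchestrator as a regular tool:

```python
from strands import Agent, tool

RESTAURANT_PROMPT = "You are a restaurant assistant helping customers reserve a table..."
TRIP_PROMPT = "You are a trip planning assistant helping customers find their way..."

@tool
def restaurant_assistant(query: str) -> str:
    """Specialized agent for reservations, exposed as a tool."""
    return str(Agent(system_prompt=RESTAURANT_PROMPT)(query))

@tool
def trip_planner(query: str) -> str:
    """Specialized agent for itineraries, exposed as a tool."""
    return str(Agent(system_prompt=TRIP_PROMPT)(query))

orchestrator = Agent(
    system_prompt="You route each request to the right specialized agent...",
    tools=[restaurant_assistant, trip_planner],
)

orchestrator("Reserve a table at Green Farmer's tomorrow at 7pm please "
             "and show me how to get there from Gare du Nord, Paris.")
```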

Deploy and expose your AI Agent to the world

When you are happy with an agent, you can deploy it on a server and call it through an API.

You can run the AI agent and its tools in a Docker container (using ECS/Fargate, for example) or in a Lambda function, as we can see below.
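For the Lambda option, a minimal handler could look like this sketch. The event shape (a JSON body with a "prompt" field) is an assumption to adapt to your API Gateway or function URL setup:

```python
import json

from strands import Agent

# Created once per execution environment, reused across invocations
agent = Agent(system_prompt="You are a restaurant assistant ...")

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    result = agent(body.get("prompt", ""))
    return {
        "statusCode": 200,
        "body": json.dumps({"answer": str(result)}),
    }
```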

We could also do something more complex and call tools outside the container using MCP.

My first AI Agent with Strands

I couldn't resist the idea of generating a Strands agent from a Drawio Diagram with Amazon Q Developer.

That wasn't straightforward and it's not easy to reproduce, but it was really fun and full of learnings. So I decided to share this part as well.

Here is what I wanted to achieve for my first AI agent

It's a simple Strands AI agent. I want it to... reserve a table at a restaurant :)

First I drew the diagram above, and I created a rule file that I called 'ProfessionalTwin', where I describe what I use as a professional:

```
- I use python 3.10
- I use AWS CDK V2
- I use AWS Strands SDK for AI Agents
```

Then I asked Q Developer:

@diagram.drawio.xml generate application

It generated the application, but it introduced a few errors and didn't work right away, so I had to create some more rule files, one for each thing I had put in my ProfessionalTwin file: CDKRules, PythonRules and StrandsRules.

Here is, for example, what StrandsRules contains:

```
When asked to use AWS Strands SDK, here are the rules

- use lambda layer for strands packages
- install strands-agents python module in lambda layer folder
- install strands-agents-tools python module in lambda layer folder
- Strands Agent Class parameters are model, tools, system_prompt
- create rules with @tool decorator
- create a system prompt that describes the role of the agent
- use "anthropic.claude-3-haiku-20240307-v1:0" model
- Allow lambda calling bedrock with stream response
```

After several trials, I reused the same prompt:

@diagram.drawio.xml generate application

That gave me a URL as output. I clicked on it and could start a conversation with my assistant. I then verified my reservation in my DynamoDB table. And I didn't write or modify a single line of code!

Don't get me wrong, I'm not saying we can generate everything from diagrams, but it can help bootstrap ideas.

For example I had no clue how to design and build the web page. Amazon Q Developer created this in a few seconds!

Now I can iterate on this code.

I shared the result in this GitHub repo.

You can try a generation on your side, improve it, or just review and deploy my generated result.

Conclusion

Are AI Agents a revolution? Are they great? If so, why?

From a value-creation perspective, agents seem to offer a lot: they should shorten many workflows, and combined with voice, that should be impressive. I clearly understand the enthusiasm around this.

From a technical point of view, I tend to think of AI agents as traditional servers that use the power of LLMs to process a query. That's one more component in the architecture.

Moreover, it's a NON-deterministic component that can sometimes behave unexpectedly. That comes with a lot of technical questions and challenges as well.

I guess a framework like Strands can really help answer some of them. I'm personally a big fan of its simplicity. That's a great discovery!
