🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Using Strands Agents to build autonomous, self-improving AI agents (AIM426)
In this video, Aaron and Cagatay from AWS's Agentic AI organization demonstrate building self-evolving AI agents with the Strands Agents framework. They showcase agents that read their own source code, create tools dynamically using hot tool reloading, update their system prompts via environment variables, and implement memory through Bedrock knowledge bases and vector stores. The session progresses from basic agents to meta agents that create sub-agents dynamically, use orchestration patterns like swarm and graph, and employ tools like use_agent, think, and journal for autonomous operation. They demonstrate deploying to Bedrock AgentCore using simple CLI commands and the @app.entrypoint decorator, with real-time provisioning. The presentation includes live demos of agents building calculator and weather tools, storing conversation history, and operating with shared context. They discuss AgentCore Policies for constraining agent behavior, evaluation methods, and production deployment via Terraform and CDK, emphasizing the shift from static workflows to model-driven approaches where agents orchestrate themselves.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction to Self-Evolving AI Agents with Strands
We're going to be learning about Strands Agents and building autonomous, self-improving AI agents. My name is Aaron. I'm a Principal Engineer in the Agentic AI organization. I'm Cagatay, a Research Engineer from the same organization. Over the course of this session, we will watch an AI agent read its own source code and create new capabilities while it's running, and then we will deploy it to production: a self-evolving agent. This is not theoretical. We have a GitHub repository we'll share with you as well. It's all working code. Feel free to follow along during the session, or afterwards as well.
We ran a poll just as everybody was coming in, and we have some of the results here. The results are very mixed.
I voted for "I'm new to agents," so don't mind me. Thanks for voting. It looks like we have people building simple workflows, people using frameworks like Strands Agents, people new to agents, and quite a few running production agent systems. A couple of people are even building autonomous agents already. We love that. Perfect. So what are we building today?
We have a GitHub repository. You can scan the QR code, open it in GitHub, and follow along with us through the commits. We have six commits to share with you. Each slide has its own code, so you can follow us while we're explaining the ideas, and you can inspect the code and run it yourself. You can also talk to us outside afterwards; we'll be around. This is the code that we're going through today. You'll see several commits in there, as Cagatay mentioned, and each of those commits builds on the last toward our final self-evolving agent. Feel free to follow along there.
Building Blocks: From Basic Agents to Autonomous Systems
The agenda for today: we'll start with a very basic agent. Looking at the poll results, this shouldn't be anything revolutionary to most folks here. We'll start with a very basic agent, and we'll gradually introduce more and more autonomy. First of all, we'll have the agent build its own tools. For folks who've built agents before, agents are a couple of things: one is the model, one is the tools. These tools, often we write them ourselves. We build Python code or we build an MCP server. What we're doing here is having the agent actually build that tool and test that tool itself. If the tool fails, it can correct the tool's implementation and continue from there.
Once we have that, we will have the agent update its own system prompts. With the work that Cagatay and I have done in Codeium and Q Developer, for example, we have engineering and science teams evaluating, refining, and writing these prompts. In this case, we're actually having the agent do that itself. We'll continue increasing the autonomy from there. We'll talk about learning from interactions—memory concepts with agents. We'll dig into that and some more topics on further increasing autonomy. We've coined them meta agents—agents that orchestrate themselves, so we don't have to define a graph or a workflow. They do all of that for us. We'll dig into that as well. Finally, we will productionize and deploy to AgentCore, and show you how simple that is as well.
We will save five or ten minutes at the end for Q&A, so if you have questions, please write them down. We'll save some time at the end for those. Like I mentioned, we'll be available outside afterwards as well for further discussion.
Creating a Basic Agent with Shell Tool Capabilities
Over to Cagatay. Let's jump in. Starting with a basic agent is simple by design. We designed the SDK for developers to build their agents faster. In this case, we have a shell tool attached to the agent, and we are asking it: what can you do?
In this video, you can see I'm asking the question, and the agent starts streaming to my terminal.
This is the default behavior of Strands. But it only has a shell tool in it. With the shell tool, agents can access the environment they are running on. But I have a question.
Self-Extending Agents: Building Tools at Runtime
But I have a question: can agents build their own tools? We will explain that in a flash. We call them self-extending agents. The key is how to load the tools from a directory. Imagine you are building a front-end service where you put a file in a directory and the framework loads that file as an API route. The same applies to agents. I have a small code example here, and I want to highlight some key points. I give the agent a system prompt which says: create tools for yourself, and start using them directly. That is the goal. Because the framework allows hot tool reloading from a directory, the agent can store a file under the tools directory (in this case a weather or calculator tool, or whatever you want) and define the tool using the tool decorator, which we love and use. We attach the system prompt to our agent, but this is not enough by itself.
We enable loading tools from the directory because, with this approach, the framework watches the directory while you are developing your agent. Alternatively, the agent can write the file and test the tool for us, which is what we are going to watch together. Let's see how the agent creates its tools and starts using them immediately. One thing to notice here is that the agent does not stop after writing the files. You can see the agent created a tools directory, which was empty, and is using the shell tool, the only tool it has, to create some files. You can see the calculator, password generator, text analyzer, and unit converter. This keeps going, but the interesting thing is that after the agent created these tools, it started using them immediately. We did not stop the process. The agent created these files five seconds ago, and now the process has finished. With this example, we showed that agents can create tools at runtime and start using them. This is critical for us and for the agents so they can build their abilities on the fly.
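As a rough sketch, the setup described above can look something like this in code, assuming the shell tool from the strands_tools package and the directory-loading option; exact parameter names may differ between SDK versions, and the bootstrap prompt wording is illustrative:

```python
from strands import Agent
from strands_tools import shell

SYSTEM_PROMPT = """You can create new tools for yourself.
Write each tool as a Python file in the ./tools directory using the
@tool decorator from strands, then start using it immediately."""

# load_tools_from_directory asks the framework to watch ./tools and
# hot-reload any tool files that appear there at runtime.
agent = Agent(
    system_prompt=SYSTEM_PROMPT,
    tools=[shell],
    load_tools_from_directory=True,
)

agent("Create a calculator tool for yourself, then use it to compute 21 * 2.")
```

The agent only starts with the shell tool; anything it writes into the tools directory becomes callable on the very next step, which is the hot tool reloading behavior shown in the demo.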
Dynamic System Prompts and Rudimentary Memory
So, can agents update their own system prompt? Yes. We just saw an agent that builds its own tools. Now we are going to look at an agent that updates its own system prompt as well. We are starting to get into memory concepts a little bit now. Your agent being able to learn over multiple iterations means it is learning over time. What we have here is very similar to what was just shown. We have a system prompt, but this time it is a little bit different. The system prompt is actually coming from an environment variable named SYSTEM_PROMPT. What this allows is persistent storage for our system prompt essentially. It is just an environment variable. Our agent not only has a shell tool now, but it also has an environment tool. This allows the agent to read and write to environment variables. So we can say to our agent in our system prompt: update your system prompt in every turn. Because it has the environment tool, it will go ahead and do that.
There are three important steps here. First, we define our system prompt, which comes from somewhere that can be changed, like an environment variable. If you are deploying to production, this could be any storage—it could be an S3 object, it could be a DynamoDB record—so that it is not tied to the compute environment like an environment variable. You can store that somewhere else, load it, and read it at runtime. The second key piece is that we are reconstructing the system prompt on every invocation. We first define our agent here, right? It does not have any system prompts. There is a system prompt parameter in Strands, but there is not one defined here.
We're adding this dynamic prompt from our environment variable, and that is what we set as the agent's prompt. You'll notice that every single time through this while loop, we're rebuilding the system prompt and resetting it. So if that environment variable has changed, the change now flows through into our agent. And then finally we invoke the agent. So three important steps: define somewhere the agent can read and write the prompt to, reconstruct the system prompt on every invocation, and then invoke the agent.
We can see a small example of this here. Here's the environment tool we give to our agent, reading from some kind of persistent storage and reconstructing the prompt. Here we're resetting the messages; I'll show you that in a second. So Cagatay wrote this. You can see here: my name is Cagatay Cali. He's telling it that we're presenting session AIM426, which is this session that you're all in. We send that prompt to the agent, and there we can see it checking: it called the environment tool, saw that the system prompt was not set, and set the system prompt for us. So now we can go back to the agent, see that it set the system prompt, and ask a follow-up question: what is my name? And now it will say the name is Cagatay. (My name is actually Aaron, but I'm impersonating Cagatay for this demo.) So this is a very rudimentary form of memory, in a sense, and we'll talk about different mechanisms for memory. This is a rudimentary form to introduce the concept.
So now we have an agent that can change its own system prompt as well. Notice how we cleared the agent messages. This is the conversation history of the agent: all the interactions we've had with the agent, all the decisions the agent has made, the tools it's called, and the results of those tools. Those are all in that agent.messages property. We're clearing them in this example to show that the knowledge is not coming from the previous message I just sent. I said my name is Cagatay Cali; that goes into the conversation history, and we're wiping that out to prove that the answer is actually coming from the system prompt, not from the messages, which are now empty. So that's an example of how this loop can apply to system prompts as well.
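Putting those pieces together, a minimal sketch of this loop might look like the following; it assumes the shell and environment tools from strands_tools, and resetting agent.system_prompt and agent.messages directly is just one way to express the reconstruction step described above:

```python
import os
from strands import Agent
from strands_tools import shell, environment

DEFAULT_PROMPT = (
    "You maintain your own system prompt in the SYSTEM_PROMPT environment "
    "variable. Update it on every turn with anything worth remembering."
)

agent = Agent(tools=[shell, environment])

while True:
    user_input = input("> ")
    # Step 1: reconstruct the system prompt from persistent storage
    # (here an environment variable) on every invocation.
    agent.system_prompt = os.environ.get("SYSTEM_PROMPT", DEFAULT_PROMPT)
    # Optionally wipe the conversation history to show that recall comes
    # from the regenerated system prompt, not from previous messages.
    agent.messages = []
    # Step 2: invoke the agent; it can rewrite SYSTEM_PROMPT via the
    # environment tool, which step 1 picks up on the next turn.
    agent(user_input)
```

In production the environment variable would typically be swapped for durable storage such as S3 or DynamoDB, as mentioned above, so the prompt survives beyond the current process.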
The Three-Step Loop: Retrieval, Self-Modification, and Persistence
So now we have a very basic agent with the shell and environment tools. It's created some of its own tools in that tools directory, and we've shown that it can modify its own system prompt. I'll hand over to Cagatay to talk about what's going on behind the scenes. So we started with an agent with one tool, the shell tool, and we gave it an additional environment tool. The shell tool can also update environment variables. Let's talk about the three steps Aaron mentioned. Dynamic prompt construction is the first step. Your agent's system prompt needs to allow a dynamic part. Not everything in your system prompt needs to be dynamic, but there should be a portion that lets your agent get the latest version of that system prompt. It could be DynamoDB, it could be another external resource you are getting your system prompt from, but you need that portion.
Second is self-modification. Agents need agency to be able to update that portion for future users. The third piece is persistence. You need to persist that version of the system prompt somewhere. This is crucial for the agents and the sub-agents your agent will create so they can keep the context going. We will be talking about agents that will create their sub-agents. Unless you store this version of the system prompt, your sub-agents will not be aware of those changes. So what if we ask an agent a question today and expect that agent to have that piece of context available for the agent tomorrow? I'm leaving it to Aaron to talk about it.
Yeah, that's good. I'm just going to jump back a second as well. This loop that we have here is for the prompt, but it really applies to everything to do with these self-evolving agents. It's a three-step loop. You have the retrieval of some information; that's the first step, the dynamic prompt construction. Then we have the second step, where the autonomy comes in: the agent is able to modify its behavior in real time, maybe generating tools, changing system prompts, or using some other mechanism.
This self-modification is what really enables the autonomy. It's not a static definition. Our engineers and scientists have not defined a big, heavy graph with all the edge case handling and error handling, with 50,000 different conditions that we have no hope of writing and covering well. We're offloading some of the engineering and science burden to the models, which are now capable of this orchestration, especially some of the more state-of-the-art models like Claude 4.5 and GPT-5, which are able to orchestrate themselves essentially.
And then finally we persist that. For example, when the agent is writing tools, it's writing them to a file system; it could also be pushing them to GitHub. For the modification of the system prompt, and for memory, it's again persisting that somewhere. We saw the example with an environment variable; as mentioned, it could be DynamoDB, S3, or somewhere else. These three steps are critical: dynamic retrieval, modification, and then persisting those modifications. And it's a loop. On the next invocation, the modifications we just persisted are used, so the agent is continually modifying and evolving itself through every interaction.
Implementing Memory with Vector Stores and Bedrock Knowledge Bases
Those interactions don't have to come from humans. You could have an agent that triggers self-modification on a schedule. It could be triggered by events, perhaps. Maybe you even have an actor-critic kind of pattern, where one agent and another agent battle it out, correcting each other. To take that a step further, I mentioned toward the beginning the meta agent concepts. We spoke about memory through system prompts. There are a few more methods for memory as well, perhaps a more familiar one using a vector store.
In this sample, we have a three-step process. First, we search for relevant conversations and past context. If you recall the previous slide about dynamic construction and dynamic retrieval, that's what's going on in this first step. We're retrieving information from a knowledge base, in this case a Bedrock knowledge base, but it could just as easily be any vector store like Mem0, AgentCore Memory, S3 Vectors, or whatever. We're retrieving that information and putting it into the agent's context. So that's the first step of the loop we saw: the retrieval of modified information and memories.
Step two is the agent responds. We invoke the agent, and you can see here we have that result line; we're just invoking the agent. Because we previously called the retrieve line, the agent has those memories from the Bedrock knowledge base pre-filled into its context. We're doing a search on the knowledge base with the user input. So if I type "what is my name?" our agent will call the retrieve tool that we've added here, which links to Bedrock knowledge bases. It will pass through my query and retrieve some documents that we've stored in our vector store, our memories about what my name is.
Those are pre-filled into the agent's context, which is that agent.messages property. So by doing agent.tool.retrieve, I'm pre-filling the agent.messages context. I'm pre-filling the context window essentially. When I call that retrieve tool in this manner with a direct tool call using agent.tool, I can just programmatically call the tool there. It's pre-filling the agent's context with the results from that tool. So I can now have a workflow as we see here that's pre-filling the agent's context with my memories just with that simple agent.tool.retrieve line.
We invoked the agent on the result line, and then the final step, as we spoke about behind the scenes, is the persistence. The last line here again, we have a direct tool call. This time it's not the retrieve tool, it's a store_in_kb tool. You can guess what that does—it stores in a knowledge base. And you can see the example documents we have here at the bottom. The title of that document is Conversation History, and you see the content is what the user has asked our agent and what the agent responded with.
For every interaction with this agent, we are retrieving information from Bedrock knowledge base, pre-filling the agent's context, running the agent, and then storing the results in that same knowledge base. When it runs again, the agent knows what we just did in the previous interaction. It's all stored in that knowledge base.
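A sketch of that retrieve, respond, store loop is below; it assumes the retrieve tool from strands_tools pointed at a Bedrock knowledge base (typically configured through a KNOWLEDGE_BASE_ID environment variable), while store_in_kb stands in for the custom tool described above, with an illustrative signature:

```python
from strands import Agent
from strands_tools import shell, retrieve

# store_in_kb is assumed to be a custom tool in ./tools that ingests a
# document into the same knowledge base; name and parameters are illustrative.
from tools.store_in_kb import store_in_kb

agent = Agent(tools=[shell, retrieve, store_in_kb])

user_input = "What is my name?"

# Step 1: direct tool call - the retrieved documents are recorded into
# agent.messages, i.e. pre-filled into the agent's context window.
agent.tool.retrieve(text=user_input)

# Step 2: invoke the agent; it answers with those memories in context.
result = agent(user_input)

# Step 3: persist the interaction back into the knowledge base so the
# next invocation can retrieve it.
agent.tool.store_in_kb(
    title="Conversation History",
    content=f"User: {user_input}\nAgent: {result}",
)
```

Because the same tools are also attached to the agent, it can call retrieve or store_in_kb on its own whenever a question warrants it, as described next.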
There are many different options for what you want to store in the database or in the vector store. Maybe you want to store every interaction like we have here. You can also give the agent those tools. You see here we have the shell, the retrieve, and the store_in_kb tool. So not only are we calling them directly in this workflow to prefill our agent context and enable continuous learning through memory with the knowledge base, we're also able to have our agent call them dynamically.
We do not have to program it. You could ask the agent a question like "What did I do last Sunday?" and it will dynamically call the retrieve tool and respond with that information. So the agent is able to, at runtime, dynamically call that tool as well as me manually calling it. Here you have that retrieve, and here we have the store at the very end.
Can we ask questions? Sure, please. Which permissions are being used for the tool at the end? So you need IAM credentials or something like that to access the knowledge bases. Where does it take that from? The retrieve tool uses Bedrock knowledge bases and your agent runtime needs to have that IAM role. If you have that, it's just going to work. But if you don't have that Bedrock knowledge base ready, I will suggest checking Bedrock AgentCore memory, which is going to provision the memory for you.
This is one of the things that comes with a suite of tools. We have those Strands tools at the top there, where you see we're importing shell and retrieve. It comes with a suite of tools that you can use. The retrieve tool here is for retrieving information from a Bedrock knowledge base. We have other tools as well: AgentCore Memory, as mentioned, or you could use S3 Vectors. You could use some of our partners like Mem0; we provide tools for that as well. There are MCP servers for memory, and each of those tools has its own authentication requirements. For Bedrock here, it's AWS credentials. As mentioned, it could be an IAM role, or it could be typing aws configure in your terminal and setting up your access key and secret key. That all works.
Can I ask a question here? Yes. So you say it has history vectors, so the agent can retrieve or look into memory? Yes. A couple of ways you could implement that. We could have multiple retrieve tools, one that talks to our S3 vectors and one that talks to Bedrock knowledge base as we see here. Maybe we have two Bedrock knowledge bases. So there are a couple of ways you could do that. You could provide multiple tools, one per knowledge base, and your agent can dynamically select which to retrieve from.
Another alternative you could do is bundle all of those into one tool, just like a retrieve memory tool. We have the retrieve tool here. In that retrieve tool, it's very simple. It's all open source and you can go look at the code on our GitHub repo. It's a very simple tool and you're able to modify it however you like. You could have it retrieve from S3, then retrieve from Bedrock as well, and combine the results into one. So there are a couple of ways to do that. It depends how you want to surface it to your model. It could be multiple tools or one that combines.
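As a rough sketch of that second option, here is a single tool that fans out to multiple memory backends and merges the results; the two helper functions are illustrative placeholders for real backend queries, not library calls:

```python
from strands import tool


def search_bedrock_kb(query: str) -> list[str]:
    """Illustrative placeholder for a Bedrock knowledge base search."""
    return [f"(Bedrock KB result for: {query})"]


def search_s3_vectors(query: str) -> list[str]:
    """Illustrative placeholder for an S3 Vectors search."""
    return [f"(S3 Vectors result for: {query})"]


@tool
def retrieve_memory(query: str) -> str:
    """Search all memory backends and return the combined results."""
    results = search_bedrock_kb(query) + search_s3_vectors(query)
    return "\n\n".join(results)
```

Whether you expose one combined tool or one tool per store mostly comes down to how you want the model to reason about where memories live.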
So we can see a short example of this running now. Cagatay Cali made this one. Do you know Cagatay Cali? And it's responding with information about things that Cagatay has worked on because Cagatay uses this agent when he's working, so we know what he's working on here. It's given us a nice little bio about our good friend Cagatay here. This is your typical memory kind of integration. These are documents in a knowledge base and our agent is able to use them. Simple as that, really.
Meta Agents: Creating Sub-Agents Dynamically
So I'll hand over to Cagatay again to talk about further increasing autonomy and agency with meta agents. If you are following along, this is the point where we discuss how we can increase agency.
We've given agents the ability to update their system prompt and their own tools, and to learn from every turn, but we haven't talked about the steps we'll explain here. Building their own tools is one level of agency we gave them. The second level is creating agents dynamically. The main agent you are running can create its own sub-agents with a custom system prompt and a custom set of tools, and they can be asynchronous. Your main agent can trigger one, leave it running, and collect the results later. Or it could be synchronous: trigger the agent, wait for the result, and get the results. There is additional agency we're not going to touch today: agents can update their own Python code and continue running, which is possible.
That second point is particularly interesting: creating agents dynamically. We've seen research papers from Amazon and Anthropic, for example the research agent that Anthropic published, where there are predefined agents. We're taking that a step further. We give our orchestrator agent, the main agent that we're talking to, a tool that allows it to create another agent, a sub-agent, dynamically. None of our engineers or scientists have defined what that agent is. We haven't given it a system prompt or tools. We're able to create them dynamically, so the model itself decides how it needs to break down the task and what sub-agents might be ideal.
It's like if you're managing a project—you might know what kind of skills you need to work on that project. That's what the model is doing here. It's figuring out what skills it needs. It knows what tools are available for the agents, and it can dynamically create them. Each of those sub-agents could also be self-evolving. Maybe it's a great time to give an example for this. Imagine your agent having a tool named use_agent. That agent tool will accept system prompt, tools, model provider, and model settings as parameters.
In this example, I attached a handful of tools to the agent: shell, use_agent, think, swarm, graph, and journal. Let's talk about them before we start running the video. The shell tool we all know; it's perfect for access to the environment. Use_agent gives agents the ability to create a new sub-agent, which is another agent instance. You invoke a brand new agent, which accepts a system prompt, tools, and model settings, and it can use a different model provider: it could be Anthropic, it could be Google, it could be Bedrock, any model provider you would like. It accepts its own set of tools because your main agent in this case has shell, use_agent, think, swarm, and graph, but maybe your sub-agent doesn't need all of that in the first place.
You can see there's a think tool. In other agents you may see names like ultra-think or deep thinker, whatever you want to call it. The idea behind the think tool is that we call one agent, the result of that agent is passed to a second agent, and then to another, and so on, a number of times, like recursive thinking. We are allowing agents to create instances of themselves programmatically a number of times to deepen an idea over time. We also have a swarm tool, which provides a shared context between agents.
Orchestration Primitives: Swarm, Graph, and Think Tools
Aaron has a great example for this. We collaborated on this presentation. Do you want to take some time to explain? Yeah, so Cagatay was just talking about some of the meta tools that we have: use_agent, think, swarm, and graph. Swarm you can think of as a self-organizing team of agents, basically. The example Cagatay was giving: he and I have paired, brainstormed, and sat together with pens, paper, PowerPoint, and an IDE to make this presentation. I wasn't telling Cagatay what to do. He wasn't telling me what to do. We were just two people collaborating. That's kind of what a swarm is; there's no lead.
It's just a collaboration of agents, and they figure it out among themselves. It's a super interesting concept, and you see similar things in frameworks like CrewAI as well. Thank you for that example, because we were talking about that today, and we have a great diagram coming up in a couple of seconds.
We have a graph tool as well, which supports conditional edges. You can define a group of agents with directed edges between them. The last piece of this is the journal tool. The journal tool uses the file system to persist data, so our agents can share a backbone of context. It's more of an agent scratch pad; it could be like a to-do list. Exactly. If your agents start working on something, they don't lose the context about each other. They will all be aware of the same context.
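A minimal sketch of an orchestrator carrying these meta tools is shown below, using the tool names as described in the session; the exact import paths in strands_tools may differ between versions, and the system prompt is illustrative:

```python
from strands import Agent
from strands_tools import shell, use_agent, think, swarm, graph, journal

orchestrator = Agent(
    tools=[shell, use_agent, think, swarm, graph, journal],
    system_prompt=(
        "Break work down however you see fit: create sub-agents with "
        "use_agent, collaborate with swarm, wire up graphs, think "
        "recursively, and keep shared notes in the journal."
    ),
)

# The model, not a predefined workflow, decides which of these
# orchestration primitives to use and how to compose them.
orchestrator("Testing the use_agent tool.")
```

There is no graph or workflow hard-coded here; the orchestration emerges from the model choosing among those primitives, which is exactly what the following demo shows.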
So let's see this in an example. I'm saying: use the use_agent tool, that's it. I'm not doing a lot of prompting. It's asking me what to do. I say I'm testing the use_agent tool, and I'm not giving any hint to the model. It says, okay, I'm going to use the use_agent tool a couple of times. I'm going to calculate 2 + 2. This is the best thing to do, right? The main agent invokes two different sub-agents to do two different things.
What we achieve with this ability is that we expand our effective context length, because our main agent had a 1 million token context window in this example, but every sub-agent it creates gets a brand new context that can spend another million tokens, and another, and they can call each other to spend more. So here is the diagram we have been trying to put together. In this diagram, we have three sub-agent tools called four times in parallel. The agent invokes those tools in parallel and does not wait to see the results. It is a fire-and-forget pattern. The main agent is going to be idle after this step.
On the first row, we have one agent using another agent, so you have another 1 million of context again. That agent creates two more sub-agents in parallel, but it is going to wait for the results. The results are propagated back to this agent, and this agent stores the result in a journal. So this row is going to spend 1, 2, 3, 4 million tokens in the worst scenario if you keep everything running for a long time; at most, you are looking at around 4 million tokens.
The second row is the thing we just mentioned—recursively calling the agents and sharing the context across time. I liken the think tool to, you know, if I'm doing research on a topic, I'll do my initial research, write it all up, and then I kind of do the same again, right? I refine my understanding maybe a month later. I'm revisiting what I just did. It's kind of what the think tool does. You invoke the agent, it does something, and you get it to do it again, right? It's like think deeper, think deeper.
In this example, we have it thinking 10 times to demonstrate the idea, but maybe you only want it to revise 2 or 3 times. You asked if there's a difference between the journal tool and other memory solutions. The journal tool uses the file system here, so there is no extra dependency. Let me go back one second for that. The journal tool only uses the file system, so there is no external memory or anything else happening here. If you unplug the Wi-Fi, this agent is still going to work because of that, and the journals will be eventually consistent. I think that's a nice difference as well. It depends, right? There are many ways to implement memory.
Memory Mechanisms: Journal Tool vs. Vector Stores and Composable Patterns
What we saw previously with Bedrock knowledge bases, S3 Vectors, AgentCore Memory, and Mem0: those are vector stores based on embeddings, which allow you to perform semantic retrieval. The journal tool, on the other hand, is markdown file-based. It's another way to store memories, through markdown files, but it limits your ability to do semantic search since nothing is embedded at that point. So there are a couple of different ways to implement a mechanism for your agent to track progress, which is a form of memory.
With semantic search, there's a delay between storing new context and when it can be retrieved and propagated back to you. In this case, time is crucial for us, because another sub-agent may be triggered almost immediately and you may not be able to wait until that ingestion finishes. We've experimented a lot with things like Chroma DB to run a local vector store as well, which is super fast, just a couple of milliseconds per query. This helps with some of the latency issues, but then you have the trade-off that it's not distributed. If the compute it is running on disappears, it's gone.
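As a sketch of that local vector store option, here is what an in-process Chroma collection for conversation memories can look like; the collection name and documents are illustrative:

```python
import chromadb

# In-process vector store: millisecond-level queries and no ingestion
# pipeline, but the data lives with this compute and disappears with it.
client = chromadb.Client()
memories = client.get_or_create_collection("agent_memories")

memories.add(
    ids=["turn-1"],
    documents=["User: My name is Cagatay Cali. Agent: Nice to meet you."],
)

results = memories.query(query_texts=["What is my name?"], n_results=1)
print(results["documents"][0][0])
```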
So there are a lot of different trade-offs. When we show the swarm pattern with context sharing, each agent in the swarm is drawn as a small circle with intersections between them. That intersection is the shared context we are trying to explain. The agents have a shared context about each other, and while they are working, they keep their own context within their boundaries in addition to this shared context. This can be implemented in many ways. In our example swarm tool, we've implemented it in memory. So agent one updates the context, agent two updates the context, and now both agent one and agent two are able to see both of those updates. They're able to almost communicate through the shared context.
It's like Cagatay and I working on this presentation. We wrote down a bunch of notes, which is our shared context in memory. Same idea. It doesn't have to be in memory either. The shared context could be any sort of storage—it could be DynamoDB or whatever you want. After swarm, we have a graph on the last row. You can see these are directed systems. You can actually make them loop again. They can stop whenever the task is finished, but there's an interesting thing we have below here. One of the graph nodes uses an agent. What is that?
You're able to compose these orchestration primitives: a graph, a swarm, use_agent, they can all be mixed and matched. You can have a graph of graphs, you can have a swarm of graphs, you name it. They all compose with one another. When an agent is running, it can change the way it's running. We are leaning towards the model-driven approach. As the models get better, they're getting better at creating these kinds of systems, systems we have never been able to see or explain before. This is something beyond the static workflows we have been building all along.
Deploying to Production with Bedrock AgentCore
Intelligence is the ability to adapt to change. The reason we are building software to help people is that it's not supposed to be static. The way to do that might be agents, might be robots; we will all see it together in the future, I believe. So we have a video here of our code. We changed the way we code earlier: we let the agent update its system prompt and build its tools. But we need to deploy this somewhere on the cloud, because I don't want my MacBook to stay open forever.
You can deploy your agent to any runtime with Strands, but in this case we are going to deploy the agent to Bedrock AgentCore. To deploy to Bedrock AgentCore, you need three different CLI commands and one decorator. The decorator comes from the Bedrock AgentCore Python package, and I am importing the memory primitives from it as well, so my agent will be able to learn from every interaction after we deploy.
In our local environment, we were using Bedrock knowledge bases. When we deployed to AgentCore, I switched to Bedrock AgentCore Memory. I attach a session manager to my agent, so every session and every turn becomes part of my agent's context. I can invoke the agent in three different ways: blocking, fire and forget, or streaming. For example, a creative sub-agent uses fire and forget. I'm using the agentcore launch CLI command to deploy my agent to AgentCore. This provisions the memory and knowledge base for me, and it's immediately usable after the command finishes. I'm running agentcore status to make sure my agent is deployed, and I'm using agentcore invoke to invoke the agent.
Because we saw how fast that provisioning works, the point we're trying to get across is this: we've built a great self-evolving agent that builds its own tools, can modify its own system prompt, has memory, and learns from our interactions. Now we need to ship it somewhere so we can turn it into a business and start making money, perhaps. With a single decorator, you just add @app.entrypoint on top of a function, and now it's an AgentCore agent that you can deploy. As shown, we can run agentcore launch, and that does everything for us. As we saw in that real-time video, not sped up, it provisioned everything. Now it's infinitely scalable, running on AWS in a serverless manner, paid per request.
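A minimal sketch of that entrypoint, assuming the bedrock-agentcore Python package and the agentcore starter CLI; the payload shape and agent configuration here are illustrative:

```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent
from strands_tools import shell

app = BedrockAgentCoreApp()

agent = Agent(tools=[shell], load_tools_from_directory=True)

@app.entrypoint
def invoke(payload):
    # Invoked per request once deployed, e.g.:
    #   agentcore launch
    #   agentcore status
    #   agentcore invoke '{"prompt": "What tools have you built so far?"}'
    result = agent(payload.get("prompt", "Hello"))
    return {"result": str(result)}

if __name__ == "__main__":
    app.run()
```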
So we deployed our agent while I was talking. That might look fast in the video, but let me show you some resources. If we jump back just one second, I wanted to show you something. Let me play this. Do you want to see the code? I'm going the wrong way; let me play this again. So yes, we have two SDKs: Strands, and Bedrock AgentCore has its own SDK, in Python, TypeScript, you name it. We just launched Strands in TypeScript today, if you're interested in that. But here you see that entrypoint, right? It's a decorator; we just have @app.entrypoint. Once we have that, we can run those CLI commands: agentcore launch and agentcore invoke. We've also upgraded the AgentCore CLI this week, and I encourage you to go and play with it. You can also run AgentCore dev, which does the same thing as the launch command but locally. It puts your agent into a container that matches the same input-output interfaces as when it's deployed, just running locally. If you've built on Lambda and you're using the SAM CLI, it's a similar concept: we containerize your application with a local version of the AgentCore runtime, essentially, but it works exactly the same way as when you deploy it.
So far we have been explaining advanced topics. We expected agents to build their system prompts and their tools, and maybe it's hard to keep up with everything we explained today. I would suggest checking out the resources in our GitHub organization, strands-agents, and on strandsagents.com as well. So let us know what we can do better, and let's build together.
I'm leaving a couple more seconds for you to take some photos. You see there are a few links here. As I mentioned, we just launched TypeScript for Strands, and we just launched an evaluation SDK for Strands. We launched a bunch of features. There are bidirectional voice agents we just launched today as well, so you can now talk to an agent in real time and it can understand intonation and rhythm, whether you're stressed or happy, that kind of stuff. We also introduced a new concept called agent steering. If you go to the strandsagents.com website, that third QR code there on the right, you'll see in the navigation menu of our user guide a section called experimental. If you go in there, you'll see some of the new things we've just launched, like the bidirectional voice agents and agent steering as well, which is super interesting. It's more of that actor-critic kind of pattern.
We're applying a lot of this to Kiro as well. If you're familiar with it, Kiro is built on top of Strands, and everything we're showing you here today powers Kiro as well. The best way to predict the future is to invent it.
Here we are at re:Invent. I hope you are enjoying the event so far, and thank you for joining us. We have about 15 minutes for Q&A, so we have plenty of time to chat and answer the questions you have, and we have some resources as well. We will keep this slide up, and we are here to answer your questions.
Q&A: Practical Use Cases, Testing, and Best Practices
So this is all quite abstract. Do you have a practical use case that you think is exemplified by these tools? Yes, we have examples that you can install right now on your computer to test all of these features. Beyond that, we have an agent that can update its own messages and remove tools from its list of tools. We can talk after this and I can share it with you directly, or you can visit the Strands Agents Builder and give it a try yourself.
Some use cases we've seen include research agents, which are very beneficial. I personally find it useful for two things. One is where you're looking for emergent behavior, and the second is where the problem space is so big that it's infeasible for me to engineer it myself. For example, when developing something like Kiro or other tools, having those tools extend themselves with LLM intelligence is something we found to be very useful.
For example, Kiro is now able to write its own tools. If I say, "Where is the space station? How far away is the space station from me?" there's no tool for that, but the agent is able to build its own now. If I work at NASA, for instance, Kiro is now able to use my internal APIs or whatever to find where the space station is. So I personally find it useful for two things: emergent behaviors, and situations where the scope is so broad that I would need a big, expensive team of people to engineer it. That's where we found it useful.
We have two or three more questions. These agents seem to have a huge space of possible behaviors. Do you have any thoughts on that? That's a great question. Let me defer to my principal engineer. So we just released this steering concept, and you'll find we also just launched AgentCore Policies. As you mentioned, these models are stochastic, right? They're not deterministic, which is actually beneficial. But what we've built with AgentCore Policies, and I encourage you to go try it, adds some more of that determinism to these agents.
We have these very powerful dynamic agents, and we kind of want to draw a box around them and say this is your boundary. You can evolve and self-modify, but only within these constraints. Those constraints can be defined in many different ways. You can try to write a bunch of if statements, but then we're back to the same problem where I have to engineer this huge scope. I can build a state machine to validate everything the agent's doing, but it's unwieldy.
With AgentCore Policies, and I encourage you to try them, you're able to type in natural language and it will generate a policy for you. It's very similar to an IAM policy. It's built on the same kind of language and uses Cedar, our open source policy language, for this. You're able to define deterministically what the bounds of that agent are. Those bounds can be literally anything, and the policy can be formally proven using some of our automated reasoning techniques.
You don't need to deeply understand automated reasoning. We have a natural language input there: you write natural language, and it will generate the policy for you. That policy becomes the source of truth for the bounds of the agent. I definitely encourage you to try that. The critical part is that policies live outside of the agents. They are not inside the code base, so even if the agents evolve, the policies are unaffected.
I was at a session before talking about the design of agents.
One of the things they mentioned about the design of Agent Core was that the container is for the session. So if I write these tools, they're not persistent. That's a perfect question. You can store your tools in an S3 bucket and let other agents download and use them, but there are many ways to do that. Is there just a checkbox in Agent Core to say "share this"? That's a perfect idea. I will be working with the team to make it possible, like a persistent environment. We will be taking that feedback very carefully.
There are a couple of things on Agent Core. The runtime is a long-running container, right? You can have a long-running autonomous agent there, but you're right, it's a single piece of compute. It's a container essentially under the hood. So it's ephemeral, right? It might be long-lived, but it is ephemeral ultimately. When these agents are evolving themselves or modifying themselves, we need to persist it somewhere. So it could be S3, it could be wherever, right?
In the example we gave, if you clone that repository, change the system prompt to write tools to S3, and give the agent the Strands use_aws tool, it will write those tools to S3. That said, I think the Docker container needs to be persistent in a way where we can reuse the same environment as is, and that's definitely something we're looking into. We're looking into reinitializing the whole container state, like all of the memory. There are ways to export the Python runtime, but we can talk later.
We are pitching the model-driven approach, so we are not determining what to do and when. We are letting the agent have those tools, and the models decide whether to use them or not. What are the criteria for deciding whether to create a new agent or not? It's completely based on the request you're making. My manager was asking, "Should I buy AWS stock today?" Do we need to create a new tool for that? Definitely not. But the agent created five different sub-agents to do market research and check the relevant documentation. So it basically depends on the question.
Can you talk a bit about production deployment of these agents? I would not use the AgentCore CLI. Yes, that's a very good point. We built the AgentCore CLI for developers. The same reason we built Strands, actually, is developer experience first. People are not going to use AgentCore if it's not easy to use; we understand that. So we built the AgentCore CLI for that purpose: I'm a developer and I want to deploy quickly. We've actually been surprised that a number of companies do deploy to production that way, which is interesting.
On the more traditional side of deploying this to production, AgentCore has support for Terraform and CDK, so you can deploy it through your typical CI/CD pipeline to production as well. Have you tried using self-evolving systems in the real world? Yes, I can give you one example. As a research engineer, I built a research agent for myself that runs on GitHub every day on a schedule. It keeps evolving its system prompt and searching arXiv to find new ideas for me, because a lot of arXiv papers are released every day and I cannot keep up.
What I did for myself around two years ago was deploy my Strands agent; we were using Strands internally back then. It keeps searching arXiv, uses CrossRef to compare the papers, and sends me a Slack message every day. I just read the post every day, and the agent itself built those arXiv tools, CrossRef tools, and whatever else you can imagine from that point.
I have another repository similar to the research agent, but it's for building tools for protocols like TCP and WebSocket. I didn't want to spend my time continuously building tools, so I built an autonomous agent to build them for me. I've found it useful, for example, with Kiro CLI, formerly known as Q CLI. We have an internal version that implements a lot of this style of agent: a self-evolving agent that's essentially my personal assistant. I don't want to have to code it and prompt it on how to do research, coding, testing, planning, and managing my weekend every time. It's basically my personal assistant, and I don't want to define all of those tasks because there are thousands of things that I do.
By enabling it to evolve itself, I don't have to program everything. I'm relying on the model to define that. Obviously, I'm doing human-in-the-loop there: I'm reviewing what it's doing and looking at the tools it's generating. You can also evaluate these agents. Every time they make a modification, you're able to run evaluations. We just launched Strands evals and AgentCore evals, which provide evaluation methods as well.
I think this is super cool and novel, so congratulations to the team who built this. I have more of a comment and question. The entire immutable-artifact deployment paradigm goes out the window, right? Whatever you tested or didn't test doesn't matter, because who knows what actually gets deployed. I want to get your thoughts on that. Testing this becomes effectively impossible, right? You don't know what's going to happen.
I wouldn't say it's impossible. I would say the points at which you run those tests changes. Instead of everything happening within the scope of those tests, my deployment tests would essentially make sure it deploys and that whatever functionality it had previously still works. As it's self-evolving, I would then test it again for every modification it's making. That's the human-in-the-loop portion.
I have a great answer for how we can actually evaluate these systems. When we give agency to people, we need another person with the same agency to compare. Why aren't we thinking about agents the same way? You build one autonomous agent to do something, and you build a similar one to test it. These are the systems we use to actually improve one another. That's the place where such systems can evolve together instead of us trying to predict what might happen. We'll see what will happen.
There are definitely certain use cases where it's appropriate and some where it's not. I think it's inherently stochastic, you know, to your point about humans. We don't always test humans; humans build trust over a period of time. Presumably that's the approach you're taking, more philosophical I suppose: it has done things for me the past nine times.
These agents are almost like learning by interacting with the world. They're living in the world with us doing things. They build a tool that pulls an API, they test it, it fails, they modify the tool, and now it works. At some point in that chain, you probably want a human to review. But where you do that could be immediately at every step, at the end, or after a month. You have a lot of options there, but definitely human-in-the-loop still for certain use cases.
For some other use cases like these personal assistant tools that we've built, I actually want it to just go and do everything. But you could have it with specs and say I want to call another agent that builds this, another agent that checks, another agent that does security checks. Have a chain of stuff where all of them have to sign off before the tool is added to this shared directory. Absolutely, you can do a lot of the automation of what we do today.
Really simple question: are the TypeScript and Python SDKs available? The Python SDK helped us build the TypeScript SDK, and we will be building the rest of the SDKs altogether. We initially built the Python SDK for internal reasons about three years ago, and it became something publicly available today. Lots of our customers asked us for TypeScript, and we have an experimental version that we launched today.
The TypeScript SDK was largely built by an agent, itself built with the Python SDK, running on GitHub. If you check out the TypeScript repo on GitHub, look at the pull requests and issues. The agent built that SDK, and you can see we didn't just take a reckless approach. It's human in the loop, so if you look through there, you can see our engineers working with the agent to build it, and it saved us a significant amount of time.
The idea with the different languages, and we're also talking about more languages, is that they reach feature parity. TypeScript is at v0, kind of a preview release, while Python has been v1 for a couple of months now, but the goal is feature parity. Do they use jsii? Today they're developed separately so that we can make the most of each language's features and ecosystem.
Before we leave, I'll take one more question. We would also love to get your feedback after this session. We would love to see your ideas, and this is critical for us to evaluate whether we did this well. Please check it out in your AWS Events app and give us some votes.
We have one more question. I was wondering about best practices regarding memory, because it seems like there could be collisions and duplication. Semantic indexes basically collapse the data into a high-dimensional space, and that space might end up holding the same context over and over again. How to store the context and how to retrieve it is basically the engineering problem we are all going to be solving.
If you are storing the conversation history in the same place, you may need another knowledge base for additional context so they never overlap each other. These are the kinds of engineering decisions we are all going to make going forward as agent builders. Thank you, everybody. Thank you so much.
; This article is entirely auto-generated using Amazon Bedrock.