
Kazuya


AWS re:Invent 2025 - Building custom agents for intelligent AWS patch automation (COP407)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Building custom agents for intelligent AWS patch automation (COP407)

In this video, Praveen Bhatt and Justin Thomas demonstrate how agentic AI can transform vulnerability management and emergency patching, using the Log4J incident as a reference point. They present "Patchy," a multi-agent system built with the Strands SDK and deployed on Amazon Bedrock AgentCore, featuring a supervisor agent coordinating three specialist agents: Vulnerability Analyst, Patch Manager, and Compliance Analyst. The demo shows how agents can instantly assess production environments, identify 50 high-severity vulnerabilities, calculate SLA violations for PCI-DSS compliance requiring 48-hour patching versus 24-day maintenance windows, execute phased emergency patching across dev/staging/production, and generate compliance breach reports. The presenters walk through their iterative development journey from basic EC2 queries to sophisticated multi-agent orchestration, sharing architectural patterns, prompt engineering techniques, tool design principles, and code examples demonstrating deterministic behavior through structured docstrings, batching optimization, and workflow controls that enable automated health verification and rollback capabilities.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Learning from the Log4J Crisis

Excellent, folks. I just want to start off by saying that Justin and I were quite nervous, as you can imagine, but seeing you all here bright and early, 15 minutes before the session started, means a lot. So I just wanted to say thank you so much for being here so early, and hopefully we can do justice to the time you've invested in us.

Just on that note, I want to do a quick raise of hands if that's okay. How many of you experienced the Log4J event that happened a couple of years ago? Awesome. Half the room. Excellent. How many of you were on the front line on incident bridges having to answer CISOs and CIOs or CTOs around exposure and blast radius? Any of you on the front line? I can see a few hands. How awesome was that experience? I'm being sarcastic here, but you get the drill, right? It was chaotic. It was absolutely nuts. It was insane, right? And that's exactly what we want to talk about today: how can agentic AI help you and your organization be better prepared for situations like the Log4J event.

Everyone, I'm Praveen Bhatt. I'm a Principal Solutions Architect coming all the way from Down Under, or Australia as we like to call it. And with me today I have Justin Thomas.

I'm a Senior Cloud Support Engineer, and I spend my days working with and helping customers on cloud operations. During the Log4J event that Praveen just mentioned, I was there at the front line, helping many customers simultaneously. What I witnessed was chaos: organized chaos at scale. I would be on a call with a customer, helping them try to find their exposure, and suddenly I would see five more customers coming in with the same problems and the same questions.

Thumbnail 130

The Challenge: Manual Everything Corp's Emergency Response Chaos

So I want to take a look at an example of the same experience. This is a fictitious company, Manual Everything Corp. Imagine it's Friday afternoon. The team is ready to wrap up for the week, and suddenly a critical CVE gets announced and now thousands of instances need patching. So an emergency bridge call is scheduled. The Security Director joins the call.

Thumbnail 160

Everyone, are we exposed or not? Do we have exposure? What's our blast radius? The first few minutes, nobody answers. It's silence. The platform team hasn't even joined the call yet. After some time, after we page the platform team, the platform engineer joins and says, I'm still trying to figure that out. I need to log into the instances. I need to log into the accounts. I need to see how many instances are non-compliant right now.

Thumbnail 190

I also need to test this patch that just released before we start deploying to production. The IT Compliance Manager jumps in and asks, are we doing emergency patching right now, or can we wait until our patch maintenance window?

Thumbnail 210

The CTO of the company is listening to all this conversation and is frustrated, because this looks like a scavenger hunt for the right data to make a decision. The CTO acknowledges that we need better automation and better intelligence. This conversation is something I constantly see. For the first 30 minutes to one hour, we are just waiting: waiting for the right people to join the emergency bridge call. And when the right people join, it's the same questions: which instances are affected? When are we doing the patch? Do we need downtime?

Thumbnail 220

Imagine this single assessment: this conversation is for a single CVE, and it goes on for hours. Now imagine doing that for multiple CVEs in a month. The team simply cannot scale to do it. So what if we add intelligent agents that work alongside your engineers to help identify and assess these vulnerabilities, check the compliance requirements, and also do the remediation? Your team stays in control, but it is now augmented by these agents doing all the heavy lifting.

The Solution: Multi-Agent Architecture with Strands and Amazon Bedrock Agent Core

That's what we want to show you today: the intelligent agents that we built. We can move to the demo now. The demo setup that we have is pretty similar to an enterprise setup, so we have three different environments: development, staging, and production. Each environment has a different patch schedule and different patch compliance requirements. Existing services are already configured: Amazon Inspector, AWS Systems Manager, and AWS Config. These are the services that are already detecting vulnerabilities and remediating them. What's missing is an intelligent orchestrator that leads the way when a critical CVE gets released. That's where we add our intelligent patch automation.

Thumbnail 360

This is a multi-agent pattern. We create four agents: one supervisor and three specialist agents. These agents are built using Strands. Strands is an open-source SDK that makes it easier to build and run agents with just a few lines of code. What it brings is a model-driven approach, which means we allow the LLMs to think autonomously and make decisions to reach the results. This is deployed in our demo account using Amazon Bedrock AgentCore. AgentCore is an agentic AI platform that makes it easier to deploy and operate AI agents securely and at scale.

Live Demo: Patchy in Action - From Vulnerability Assessment to Emergency Patching

Let's switch to the terminal now, and we can see the intelligent patch automation that we built in action. Can you see the font okay, especially in the back? Awesome, thank you. So we call this Patchy. That's the name we've given to the agentic system that we built. You can see we have access to the specialist agents. What you're actually seeing is a simple chat client that we built to interact with the agent. Naturally, you would build a proper web app in production, which could even integrate with your internal change systems like Jira or ServiceNow. But here, we simply want to show how we can interact with the agent.

The chat client seems pretty simple. So the first thing during a critical event is to understand the environment and understand what's running in the account. Even while the platform engineer is taking some time to log into the console to see what the production instances and production environment look like, I can ask the agent to summarize what's there in the environment. I'm going to ask it to also give me information about the compliance requirements for those production instances.

What we are doing is interacting with the supervisor agent that we saw, and it's making the necessary API calls. It's reading the EC2 tags and querying the AWS Config service. Within seconds, we have the complete picture of the production environment. We get the instance details and configuration information about production, but most importantly, we see the compliance framework for each instance, and that's useful information for us to make decisions.

Thumbnail 550

At Manual Everything Corp., if you see, this information is spread across spreadsheets, Excel sheets, and database tools. But here we got it with one prompt. Now, do you remember the first question the security director asked? It was: are we exposed? What's our exposure? So let's ask the agent to find if there are vulnerabilities in our production environment.

Thumbnail 580

This time, the supervisor consults a specialist agent—specifically the Vulnerability Analyst agent—which is built to query Amazon Inspector to get these findings. If you were a security engineer, you would do that by logging into the AWS console manually, downloading the findings, checking the CVE details, and verifying whether patches are available for those CVEs.

Thumbnail 630

Thumbnail 650

Thumbnail 660

Then you would download everything and get a priority list, which represents hours of work. Here, within seconds, 37.6 seconds to be exact, we got all those results. The agent summarizes that there are 50 high severity vulnerabilities in the production environment and also summarizes which are the top critical ones based on the CVSS score. It identifies which instances are affected by these vulnerabilities and which compliance frameworks are at risk if we do not patch the CVEs. Now here comes the interesting question that was asked by the IT compliance manager: when is the right time to patch this? There is an SLA requirement from the company and a patch management window. How do we come to a decision about when to patch?

Thumbnail 690

Thumbnail 700

Thumbnail 720

I will ask the agent to tell us when the right time to patch is. Notice that I am not specifying the severity, and I am not specifying the vulnerabilities. I am simply asking when we should patch. This is because with AgentCore, we are using AgentCore Memory, which remembers all the conversation history that we have. So it is a context-aware agent, and we will go through that once we see the architecture. This time, the agent is consulting the Patch Manager agent, which is a specialist agent, and it is checking what our patch management window is and what our SLA is, and it gives you a decision. It says patching decision needed: 24 days until the next window. So your maintenance window is 24 days from today. The vulnerability status initially says SLA data unavailable for some reason. Let us actually check again.

Thumbnail 730

Thumbnail 810

If you look at the code in the backend, we give this Patch Manager agent access to a tool that can calculate the SLA requirement. When we ask when we should patch, it goes and checks what our SLA policies are for those specific severities and then sees the delta with the maintenance window. So here it goes: it says emergency patching required. There is an SLA violation risk and a hard deadline, because your instances have a PCI-DSS requirement, so you need to patch within 48 hours, but your maintenance window is 24 days away. So you are at risk. The recommendation is to do immediate patching within 48 hours. The agents helped us avoid an SLA violation and gave a clear recommendation that you need to patch this now.
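To make that logic concrete, here is a minimal sketch of what such an SLA-calculation tool could look like. The policy table, function name, and hour values are illustrative assumptions, not the presenters' actual code:

```python
from datetime import datetime, timedelta

# Hypothetical SLA policy table: (framework, severity) -> patch-within hours.
SLA_HOURS = {
    ("PCI-DSS", "CRITICAL"): 24,
    ("PCI-DSS", "HIGH"): 48,
    ("SOC2", "HIGH"): 72,
}

def patching_decision(framework: str, severity: str,
                      published_at: datetime, next_window: datetime) -> str:
    """Compare the compliance SLA deadline against the next maintenance window."""
    hours = SLA_HOURS.get((framework, severity))
    if hours is None:
        return "SLA data unavailable - escalate for manual review"
    deadline = published_at + timedelta(hours=hours)
    if next_window <= deadline:
        return f"Wait for the maintenance window on {next_window:%Y-%m-%d}"
    return (f"EMERGENCY PATCHING REQUIRED: the {hours}h SLA deadline "
            f"({deadline:%Y-%m-%d %H:%M}) falls before the next window "
            f"({next_window:%Y-%m-%d})")
```

The agent simply calls this deterministic function instead of reasoning about dates itself, which is what makes the recommendation consistent from run to run.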

Thumbnail 880

So up until now, we have identified which instances are affected and which vulnerabilities are there, and we came to the decision that we need to patch now. Let us ask the agent to do the patching. But I want to start with dev first. We checked everything for prod, but we need to start with dev, then staging, then prod. So I will ask: OK, apply emergency patching for the dev instances. This time, too, the Patch Manager agent comes into the picture; it is built to make all the Systems Manager API calls. To deploy the patches, it runs a Run Command using the AWS-RunPatchBaseline document. It actually asked me to clarify which environment it should patch. OK, so let us patch the dev environment. Yes. At the backend, it is going to generate a Run Command ID, target the 5 dev instances, and use the AWS-RunPatchBaseline document for that.
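Under the hood, the Systems Manager call likely resembles this boto3 sketch; the Environment tag key and the comment string are assumptions for illustration:

```python
import boto3

ssm = boto3.client("ssm")

# Target the dev instances by tag and run the AWS-RunPatchBaseline document.
response = ssm.send_command(
    Targets=[{"Key": "tag:Environment", "Values": ["dev"]}],
    DocumentName="AWS-RunPatchBaseline",
    Parameters={"Operation": ["Install"], "RebootOption": ["RebootIfNeeded"]},
    Comment="Patchy emergency patching - dev phase",
)
command = response["Command"]
print(f"Command ID: {command['CommandId']}, status: {command['Status']}")
```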

Thumbnail 930

Thumbnail 940

I think it's quite interesting to notice the smartness of the large language model here: you've been discussing production the whole time, and all of a sudden you're saying you want to switch to the dev environment, so it actually tries to get a confirmation. In a lot of setups, this doesn't really happen. The more memory and context the large language model has, and the more aware it is of your organization and your processes, the better it becomes at such things.

Thumbnail 960

We can see that a command ID is generated and the status right now is pending. Five instances are targeted. We have some next steps that the agent provides that we can check the status. We can also type proceed to continue with the staging environment. Basically, agents understand that this is a phased rollout. We started with dev, and I didn't specify that it's going to be like staging and then production, so it understands that if I type proceed, it will start patching the staging environment next. Or we can wait, check, and verify that everything's fine before proceeding.

We avoided an SLA violation and did the emergency patching. But what about the times that we were not able to catch that SLA miss? That's what the CTO of Manual Everything Corporation is thinking right now. How many times were we non-compliant just because the team was busy doing a scavenger hunt for the data? This is something even the auditor is going to ask: what was your compliance history? Manual Everything Corporation does not have a tool to get this information.

Thumbnail 1060

Thumbnail 1070

Let's ask the agent how many times we missed the SLA. This time, the last specialist agent, the Compliance Analyst agent, is consulted by the supervisor agent. What it does is fetch the reports from an S3 bucket and then do the math. It checks when each vulnerability was released, when we did the patching, and what the SLA policies are, and determines which patches should have been done within the SLA timeline but were not. It says there are a total of six breaches in the last thirty days and also gives the breach details, like which CVEs were breached and which teams were affected.
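As a rough sketch of that math, the tool could look something like the following; the bucket name, key prefix, and report schema (cve_id, team, patched_at, sla_deadline) are all hypothetical:

```python
import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "patchy-compliance-reports"  # hypothetical bucket name

def count_sla_breaches(days: int = 30) -> list[dict]:
    """List patches applied after their SLA deadline in the last N days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    breaches = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix="reports/"):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            # Assumed schema with timezone-aware ISO-8601 timestamps.
            report = json.loads(body)
            patched = datetime.fromisoformat(report["patched_at"])
            deadline = datetime.fromisoformat(report["sla_deadline"])
            if patched > deadline and patched >= cutoff:
                breaches.append({
                    "cve": report["cve_id"],
                    "team": report.get("team"),
                    "hours_late": (patched - deadline).total_seconds() / 3600,
                })
    return breaches
```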

Thumbnail 1130

Now we want to explore the architecture, how we built the code, and the design principles we followed when building this. For that, I'll hand over to Praveen. A couple of things I'll highlight as we get into the code: the Patch Manager agent itself has the ability to verify and do health checks as part of its patching process. We can't demo that in the interest of time, but under the hood, it's basically checking your application and infrastructure, and we'll go into detail about how it actually does it. Once it completes the patching, it uploads all that information to an S3 bucket.

Thumbnail 1150

Architecture Deep Dive: Building Intelligent Patch Automation from the Ground Up

Now that we've seen the solution in action, let's look at the underlying architecture and code that drives the solution. You've already noticed that you have multiple agents in the mix. This is based on a pattern called the supervisor pattern, wherein you have a supervisor agent that manages the end-to-end orchestration of the whole workflow while it consults and coordinates with specialized agents. Each specialized agent has a specialized role, a set of instructions, and a set of tools that it can use to complete a given task.

Thumbnail 1180

Thumbnail 1190

Agents by themselves don't have the ability to understand user intent or to reason. For that, we have Claude 4.5 as the large language model running on Amazon Bedrock, which gives them the brainpower. Even with all that brainpower, the agents need to be able to interact with AWS services. For that, we have specialized tools that are exposed to the specialist agents. In terms of the AWS services in the mix, we have Amazon Inspector for vulnerability findings and AWS Config to give the agents an understanding of the infrastructure.

AWS Config provides the agents with an understanding of resource dependencies. When the Patch Manager agent makes a patching decision, it understands the business criticality that might affect the whole patching operation. If it's a front-end facing web application with extremely high business impact, you don't want to patch it without factoring that in. All of that is factored in by the Patch Manager agent. In your case, this could be a CMDB like ServiceNow or Jira; anything that can give a resource dependency view will do, and the large language model will absorb that information and make it available to the decision-making process.

Thumbnail 1260

Thumbnail 1270

Systems Manager helps us understand the patch compliance state and apply the patches. We use S3 to store and retrieve the compliance reports, and finally EC2 is where the patches themselves are applied. We're using the Strands SDK, as Justin mentioned, and I want to highlight why Strands matters for one key reason: Strands makes it very easy to build agents that interact in a continuous manner with the large language model and the tools, in what's referred to as an agentic loop.

What that means is, think of a scenario where a user puts in a request: patch my instances in dev environment. The agents don't understand it initially, so they need to understand the intent. This request gets routed to the large language model. The model says, alright, I understand the intent, but I need access to tools that can interact with AWS services. So the request gets routed to the tools, the tools talk to the relevant services, come back with a response. That response is fed back to the large language model. It reasons through that and continues on and on until the user's original request is completed. All of this happens within what's called an agentic loop. You don't need to code it; it's just made available to you, and I'll show you when we get to the code itself.

Thumbnail 1330

In terms of scale and security, it was very easy for us to deploy this on Bedrock AgentCore. AgentCore comes with six to seven primitives, but we're only using three of them: Runtime to actually host and run our agents, Memory for making the agents context aware, and finally Observability to understand the state of our agents through tracing, metrics, and logs. All of these are very useful when you're trying to debug a situation, iterate on or evaluate your prompts, or change your large language model parameters. All of that observability makes it very easy to get that information.

When Justin and I came up with this idea four months ago, we didn't start with this whole architecture at all. We thought we'd start off with something very basic and simple, understand the capabilities of large language models, and understand what enterprise users would prefer in this capability. So we started off pretty small and then continued to iterate. We got feedback from customers and then iterated again and again and again. In the next thirty minutes that we have left, we'll take you all on that accelerated experience of how we started off with something very basic and then how we iterated and how we got to the stage where Patchy is today. Hopefully you can leverage the tips and practices that we followed and the architecture patterns to get you and your teams to a stage where you're more comfortable handling large-scale events.

Thumbnail 1430

Thumbnail 1450

Thumbnail 1470

Thumbnail 1490

Thumbnail 1500

Code Walkthrough Part 1: From Basic EC2 Agent to Tool-Enhanced Intelligence

What we started off with first was a very simple EC2 agent that could just list all your EC2 instances and get their compliance state. I'm going to start off by importing some main classes. The Agent class is the main one: it's the core interface into your large language model, your conversation management, and all of that. This is the most important class that you need. Then, for your agents to be able to interact with your AWS services, you need a tool. Strands has a built-in tool called use_aws that's available with the Strands library. I'm going to quickly define the agent, starting off with a name. Agents have their own identity.

Thumbnail 1510

Thumbnail 1530

Then give it a system prompt. The system prompt is where you specify the role, the persona, the traits, and the instructions that you want your agent to have. In our case, I'm going to keep it very simple: you're an EC2 assistant. That's it. And finally, don't forget the tools: use_aws, the tool that allows the agent to interact with AWS services. That's it. Six or seven lines of code with Strands that allow your agent to talk to the EC2 service in your account, get the list of EC2 instances, and then interact with Systems Manager to get the compliance state.
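For reference, a minimal sketch of that first iteration, assuming the strands-agents and strands-agents-tools packages:

```python
from strands import Agent
from strands_tools import use_aws  # built-in AWS tool from strands-agents-tools

# The entire first iteration: a name, a one-line persona, and one tool.
agent = Agent(
    name="ec2-assistant",
    system_prompt="You are an EC2 assistant.",
    tools=[use_aws],
)

agent("List all my EC2 instances in us-east-1 and their compliance state.")
```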

Thumbnail 1560

Thumbnail 1580

Thumbnail 1590

Thumbnail 1600

Thumbnail 1610

Thumbnail 1620

Thumbnail 1630

So I'm going to quickly start off. I'm going to say: list all my EC2 instances in us-east-1 and their compliance state. That's it. I'm going to run this. All right, Python. Hopefully I didn't make a typo. Yeah, so what you're seeing at the very top is "I'll help you list all your EC2 instances." That's the large language model actually understanding the user intent, and it's figured out that it needs to do a few things: get the EC2 instances, then get the compliance state. Then you see the tool invocation and the response that comes back. The intent is being reasoned over by the large language model, and this goes on and on, so you'll see multiple tool invocations happening. Then it comes up with the final response: here are all your production instances and whether they're noncompliant or compliant, plus a bunch of other information that I probably didn't need. But anyway, this is just a basic agent.

What I do want to highlight is the agentic loop again. The large language model invoked the tool, got the response, and, as you can see here (I don't know if the highlighter is working), hit an InvalidParameterValue exception. We've reached a stage where large language models can autocorrect themselves. It figured out that the syntax it followed was wrong, went back to its knowledge, figured out the right syntax, and submitted the call again. And then it tries AWS Config: because we said compliance state, it goes to Config to get the compliance information, figures out that Config doesn't have it, autocorrects itself, and tries to get that information from Systems Manager, which is here somewhere. Yeah: "I'll use the Systems Manager compliance API to get it." So that was it.

Thumbnail 1680

Thumbnail 1690

So we realized we can do a lot, but the performance was not optimal. The fact that we had to go through five tool invocations, sometimes seven, wasn't ideal. So we thought, how about we make the agent richer and more powerful? Give it more tools and tweak our system prompt. So at this point in time, we went with something like this. I'm just going to remove this. The same agent, but with a slightly more enriched prompt and better tools that it has access to. I'll explain a bit more of what's happening under the hood.

Thumbnail 1700

So you've got the Bedrock model defined. When we didn't define one, Strands used Claude by default (as of today, in us-east-1), and it also set up all the parameters, like temperature and top-p, which control creativity, and a few other things like max tokens. Those defaults might not be suitable for your use case, so we had to control that as well. In our case, we simply tried: can we use Nova, which gives us better cost performance, and see how it works? That was the second iteration, literally just that, plus controlling the temperature to rein in the creativity of the large language model and make it as deterministic as possible.
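A sketch of that second iteration might look like this; the Nova Pro model ID and the parameter values are assumptions for illustration:

```python
from strands import Agent
from strands.models import BedrockModel

# Pin the model and its parameters instead of relying on the defaults.
model = BedrockModel(
    model_id="us.amazon.nova-pro-v1:0",  # assumed Nova Pro inference profile ID
    temperature=0.1,  # low temperature: less creativity, more determinism
    max_tokens=2048,
)

agent = Agent(name="ec2-assistant", model=model,
              system_prompt="You are an EC2 assistant.",
              tools=[])  # specialized tools are added below
```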

Thumbnail 1750

Thumbnail 1760

Thumbnail 1790

Now, for all the Python aficionados, you can see the tools here. I've got two tools: one to get the EC2 instances and the other to get the patch compliance state. Each is just a Python function leveraging boto3, that's all it is. The only difference is that we use a tool decorator, which exposes a Python function to be used as a tool by the large language model. And the way the model knows when and how to use it is through the docstrings: as long as you're clear and structured about what the tool does, it knows when and how to use it. So in this case, you just say: get EC2 instances, optionally filtered by environment, and so on. That's all it is. And we're doing the same thing with get patch compliance as well. So that's the tool bit.
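Here is a sketch of what one of those decorated tools could look like; the function body and the Environment tag key are assumptions, but the shape (decorator plus structured docstring) is the point:

```python
import boto3
from strands import tool

@tool
def get_ec2_instances(environment: str | None = None,
                      region: str = "us-east-1") -> dict:
    """Get EC2 instances, optionally filtered by environment.

    Args:
        environment: Optional Environment tag value, e.g. "production".
        region: AWS region to query. Defaults to us-east-1.

    Returns:
        Dict with an instance count and per-instance id/state details.
    """
    ec2 = boto3.client("ec2", region_name=region)
    filters = ([{"Name": "tag:Environment", "Values": [environment]}]
               if environment else [])
    reservations = ec2.describe_instances(Filters=filters)["Reservations"]
    instances = [i for r in reservations for i in r["Instances"]]
    return {
        "count": len(instances),
        "instances": [{"id": i["InstanceId"], "state": i["State"]["Name"]}
                      for i in instances],
    }
```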

Thumbnail 1800

Now coming to the prompt: previously we saw just "you're an EC2 assistant." We wanted to change that, because you could see all that additional information coming through that we didn't need. You can control that using a behavior section, which is basically a set of instructions I'm giving to the agent to ensure it follows the things I want.

I want it to give me a short summary to begin with, and then the details; markdown tables for multiple instances; and the default region if one isn't mentioned. But here comes the interesting bit. I'm building an EC2 assistant, and I don't want anything else. You can control the scope and the role it needs to perform using the prompt. In this case, I've just said: if a user asks anything about non-EC2-related topics, respond by saying I don't handle it. Simple as that.

Thumbnail 1840

Thumbnail 1860

Thumbnail 1870

Then there's being explicit with your tools. Again, you don't need to (it's a personal preference), because the docstrings themselves help the large language model understand when it should use a tool. But I'm just pedantic; I prefer to be very explicit and make sure the agent understands what it needs to do. Then the output format: we're using a chatbot on a terminal, so we had to follow a certain set of instructions from an output point of view. And then the tools: as you can see, we're not using use_aws anymore; we just have our two tools. I've got a very basic while loop just to start building toward a chatbot experience, and that's all it is.
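Putting those pieces together, the enriched agent could look roughly like this sketch, reusing the hypothetical get_ec2_instances tool above and assuming a similar get_patch_compliance tool:

```python
SYSTEM_PROMPT = """You are an EC2 assistant.

Behavior:
- Start with a short summary, then the details.
- Use markdown tables when listing multiple instances.
- Default to us-east-1 if no region is mentioned.

Scope:
- If the user asks anything not related to EC2, respond:
  "I only handle EC2-related queries."

Tools:
- get_ec2_instances: list instances, optionally filtered by environment.
- get_patch_compliance: fetch the Systems Manager patch compliance state.
"""

agent = Agent(
    name="ec2-assistant",
    model=model,
    system_prompt=SYSTEM_PROMPT,
    tools=[get_ec2_instances, get_patch_compliance],
)

# A very basic while loop for the terminal chatbot experience.
while True:
    user_input = input("> ")
    if user_input.strip().lower() in ("quit", "exit"):
        break
    agent(user_input)
```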

Thumbnail 1880

Thumbnail 1890

Thumbnail 1900

Thumbnail 1910

So let's invoke this one. I'm going to run the EC2-with-tools version in Python. It comes up with a request prompt, so I'm going to copy-paste my previous prompt so you can see exactly the same behavior, but this time with a more intelligent agent. You'll see the streaming information coming through, and this is one of the reasons we built a chatbot on the terminal: just to show you the streaming behavior. We probably could have done this with a web app, but it was just a bit easier with the chatbot.

Thumbnail 1920

Thumbnail 1930

So you can see it figures out the intent and makes only two tool calls. That's it. Before I move on, just observe this behavior as well, because it's quite important: the output is printed one line at a time. I'll explain what's happening under the hood and why, in our experience, this is not ideal. It just did two tool calls. If you remember, the previous agent did five, sometimes seven, sometimes two; it was unpredictable. With tools, you start getting more predictable, deterministic behavior.

Thumbnail 1990

Now, to the point about that single line being printed: that's because the tool we had for the patch compliance state only allowed one instance at a time. So the large language model is making five separate API calls to Systems Manager to get that information, since there are five instances. Not ideal. Batching is the way to go, but we realized that the hard way. Actually, I think we were just being lazy. One last thing, around scope: list all my Lambda instances. Now, if you remember, this is just an EC2 agent, so it shouldn't give me any information. From a security point of view, I can build in guardrails. This is just one way to do it; there are other, more sophisticated ways as well, but this is one way to achieve it, especially when it comes to unwanted content being injected into your prompt.

So this is one way to actually achieve it: having controls in place within your system prompt. It just says I only handle EC2-related queries (ignore the duplication on screen), and the same would apply if you kept trying. So that's it. We reached a stage where we realized we could do a lot more than with use_aws, so we started building more tools. But the agents started getting heavier and heavier, and we got to the point where it became a monolith-versus-microservices sort of debate again. Should we go with a single agent, or should we start splitting into multiple agents?

Thumbnail 2080

Code Walkthrough Part 2: Project Structure and Tool Design Principles

The reason it became more obvious for us to go down the latter path was hallucination and confusion. The simple philosophy we started following was: if an agent is not doing one specialized task, it should be broken up. Again, it's a philosophical debate, monolith versus microservices; I'm not going to get into it. I'm pretty sure we could spend all day, maybe all week, on it and not get anywhere. But the reality for us was that we started getting better results by breaking it down, evolving it, and adding more tools. We got to the stage where we have Patchy today, and I'll walk you through the Patchy codebase now.

Thumbnail 2100

Alright, I'm just going to close these things. Very quickly, in terms of the project structure: all of the agent-related code is within the agent folder itself. We have helper functions within the helper folder, and I'll walk you through that as well. The Bedrock AgentCore YAML file and the Dockerfile that you're seeing pertain to the agent code deployment: they are the configuration file and the Dockerfile used by the runtime to deploy the agents.

Thumbnail 2120

Thumbnail 2130

Everything else includes the four agents: the patch manager, supervisor, vulnerability agent, and compliance analyst. In the helper functions, we have memory, globals, and a few other things. The key thing I want to highlight is the tools file.

Thumbnail 2150

You can either have the tools embedded within your agents. However, coming from a software engineering background, it was easier for me to externalize and have it all in a separate file. This made it easy for maintainability, readability, and other considerations. You can always import it as simply as that. I've already walked you through a few tools, so I'm not going to bore you with that again. I just want to show you one tool and what it could look like from our perspective.

We have the get vulnerability findings tool here. I'm going to remove this so you can see the whole screen. There are probably two or three key things that I want to highlight here. One is the docstrings. If you've been writing Python code, you're familiar with these. Being as clear and structured as possible makes it that much easier for large language models to understand when and how to use a tool. It's garbage in, garbage out, as simple as that.

Thumbnail 2200

In our case, we define what the purpose is, the arguments like CVSS severity, environment, and limit, for example, being very clear in terms of default values and giving examples where possible. We also define what it returns once the tool is invoked. Try to return dictionary objects where possible. The more context that the large language model has compared to just a string output, the better it will be in terms of providing you with richer information as an output.

Thumbnail 2220

Now, one thing I do want to highlight is that you have optional versus required (mandatory) arguments. You can control the behavior of agents using this one thing. To give you an example, take the severity argument in get vulnerability findings. Say Justin's prompt is just "give me all the vulnerabilities" and that's it; he hasn't mentioned the severity at all. At that point, if severity were optional with a default, the large language model might fill it in on the user's behalf; it would probably assume they want critical findings, because critical is the highest severity. There's an assumption being baked in by the large language model.

Thumbnail 2290

Thumbnail 2300

However, if you make it required, the large language model will question or check with the user: do you mean critical, high, medium, or low? So you can control the behavior using just the arguments themselves. Other than that, it's not rocket science. It's a simple Python function, and we're using boto3 clients to get the vulnerability findings.
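Here is a sketch of the tool with severity made required, so the model has to ask rather than assume; the filter shapes follow the boto3 inspector2 API, while the Environment tag filter is an assumption:

```python
import boto3
from strands import tool

@tool
def get_vulnerability_findings(severity: str, environment: str = "production",
                               limit: int = 10) -> dict:
    """Get Amazon Inspector findings for the given severity.

    Args:
        severity: REQUIRED. One of CRITICAL, HIGH, MEDIUM, LOW. With no
            default, the model must ask the user instead of assuming one.
        environment: Optional environment tag filter. Defaults to production.
        limit: Maximum number of findings to return. Defaults to 10.

    Returns:
        Dict with a count and a list of finding titles and severities.
    """
    inspector = boto3.client("inspector2")
    findings = inspector.list_findings(
        filterCriteria={
            "severity": [{"comparison": "EQUALS", "value": severity}],
            "resourceTags": [{"comparison": "EQUALS",
                              "key": "Environment", "value": environment}],
        },
        maxResults=limit,
    )["findings"]
    return {
        "count": len(findings),
        "findings": [{"title": f["title"], "severity": f["severity"]}
                     for f in findings],
    }
```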

Thumbnail 2330

Code Walkthrough Part 3: Supervisor Agent and Patch Manager Implementation

Moving on to the supervisor agent. I'm going to have a quick look at the time; we're good. This is the supervisor agent that we have, basically the one that controls the entire orchestration of the whole workflow. Now, for the supervisor to interact with the other agents, we're using a pattern called agents-as-tools. If I scroll all the way down, you'll start seeing what I mean. I'll just mention one thing very quickly.

Thumbnail 2340

Thumbnail 2350

You'll see these three lines where we're importing the three key agents from the different files that they belong to, and we're making them available as tools by using the tool decorator. I'm going to scroll down very quickly. Here it is. This is just to consult the vulnerability analyst, which is the agent itself, the specialized agent. We just pass the natural language request in as a string, allowing the vulnerability analyst to understand the intent of the user. So all the supervisor agent is doing is understanding the intent, passing that request onto the specialized agent based on their capabilities, and I'll walk you through that as well. Then let the specialized agents do the rest.
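In code, the agents-as-tools pattern could look like this sketch; the module paths and variable names are illustrative assumptions:

```python
from strands import Agent, tool

# Assumed module layout: each specialist agent lives in its own file.
from vulnerability_agent import vulnerability_analyst
from patch_manager_agent import patch_manager
from compliance_agent import compliance_analyst

@tool
def consult_vulnerability_analyst(request: str) -> str:
    """Analyze security vulnerabilities, CVEs, and Inspector findings.

    Args:
        request: The user's natural-language request, passed through as-is.
    """
    return str(vulnerability_analyst(request))

@tool
def consult_patch_manager(request: str) -> str:
    """Plan and execute patching operations via Systems Manager."""
    return str(patch_manager(request))

@tool
def consult_compliance_analyst(request: str) -> str:
    """Answer SLA, audit, and compliance-history questions."""
    return str(compliance_analyst(request))

supervisor = Agent(
    name="patchy-supervisor",
    system_prompt="You are a patch automation coordinator ...",
    tools=[consult_vulnerability_analyst, consult_patch_manager,
           consult_compliance_analyst],
)
```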

Thumbnail 2380

Thumbnail 2390

Thumbnail 2400

In this case, we've got the three tools. We're also using the use_aws tool in case a request is outside the scope of any of these three agents. I'll walk you through the system prompt itself very quickly. You define what the role is, which is a patch automation coordinator routing requests to the specialized agents. I have some response guidelines in terms of the things it needs to follow, like verbosity of the output: being very direct about what we actually need and what format we need it in.

Thumbnail 2410

Thumbnail 2420

Thumbnail 2430

Thumbnail 2440

Thumbnail 2450

What we actually need is to understand what format we need the data in, including pagination and truncating the data if it's more than, say, 10 items. You might have seen that there were 50 high-severity findings available, but we only show 10, for example. Those kinds of things matter. Then there's the routing priority: how do I want the priority to be set, and how do I want the request to be routed? This involves understanding the user's primary intent. If we can't find that information, we go to the context from memory itself. If we can't find that either, then we route to use_aws. That's it. I just provide the details around when to use the specialized agents, which is for analyzing security vulnerabilities, CVEs, and so on. You could use semantic routing, keyword routing, or capability-based routing as well; this is just one way to do it.
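Condensed into a sketch, the routing portion of the supervisor's system prompt could read something like this; the exact wording is an assumption:

```python
SUPERVISOR_PROMPT = """You are a patch automation coordinator. Route each
request to the right specialist agent.

Response guidelines:
- Be direct and concise; truncate lists longer than 10 items and say so.

Routing priority:
1. Identify the user's primary intent from the request itself.
2. If the intent is unclear, use the conversation context from memory.
3. If no specialist fits, fall back to the use_aws tool.

When to use each specialist:
- consult_vulnerability_analyst: vulnerabilities, CVEs, Inspector findings.
- consult_patch_manager: patching decisions, maintenance windows, execution.
- consult_compliance_analyst: SLA history, breaches, audit reports.
"""
```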

Thumbnail 2460

Thumbnail 2500

We mention the tools again. Now, if you recall the second agent, where the output printed one line at a time: this is what I was trying to get to. With tools that you've defined yourself, you can control batching. If an API call needs to be made 10 or 20 times, you can batch it. But with something like use_aws, which is a built-in tool, you can't control that, so you use the prompt itself to shape how the large language model should leverage the tool. In this case, batching related queries and using filters to reduce data transfer all help in reducing token consumption. Yes, you have to pay for tokens, and we realized that again as the cost started going up.
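This is what a batched version of the earlier per-instance compliance tool could look like; describe_instance_patch_states accepts up to 50 instance IDs per call, so one tool invocation replaces N round trips (the field selection is an assumption):

```python
import boto3
from strands import tool

@tool
def get_patch_compliance(instance_ids: list[str],
                         region: str = "us-east-1") -> dict:
    """Get patch compliance for many instances in ONE batched call.

    Args:
        instance_ids: Instance IDs to check (up to 50 per API call).
        region: AWS region to query.

    Returns:
        Dict mapping instance ID to missing/failed patch counts.
    """
    ssm = boto3.client("ssm", region_name=region)
    # One call for up to 50 instances, instead of one call per instance,
    # which is what made the earlier agent print one line at a time.
    states = ssm.describe_instance_patch_states(
        InstanceIds=instance_ids[:50])["InstancePatchStates"]
    return {s["InstanceId"]: {"missing": s["MissingCount"],
                              "failed": s["FailedCount"]}
            for s in states}
```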

Thumbnail 2530

Thumbnail 2540

After that it was just the output format, like emojis to make it look appealing. That's the supervisor agent. To give you an overview, I'll show you Patch Manager very quickly, because that's the most complex one, and every other specialized agent follows the same sort of format. As you can see, we've got a whole heap of helper tools available, because we wanted very specific capabilities out of it: we want it to be deterministic, and we want it to be consistent. Walking through it, we defined the scope very clearly, because what we observed was that when the scope wasn't defined, the supervisor agent would route an incoming request to multiple agents. If a specialized agent is not able to perform the given task because it's not in its scope, it comes back with an out-of-scope response, and the supervisor agent then needs to correct itself and figure out which other specialized agent to route the request to. This is extremely important and something you might want to keep in mind.

Thumbnail 2630

Now, as you might all appreciate, patching as a process is extremely rigorous and very controlled. You don't want it done in an ad hoc way; you want to control the whole process. When we started off, Patch Manager was doing things in parallel when it shouldn't, like trying to patch dev and staging at the same time to be more efficient. But we realized that's not how it should be, so we had to bake the concept of a workflow into the prompt to control its behavior. The way we started was to get the CVE severity first. Either the user provides that in the prompt itself, saying "give me the critical severity in the dev environment," or, if that's not available, we extract it from memory. So it has different ways to absorb that information and use it for its calculation.

Thumbnail 2650

Thumbnail 2660

Thumbnail 2680

Thumbnail 2690

Then we get the patch compliance. We have multiple applications running within these instances, each assigned a compliance framework that has an SLA associated with it. So we get the patch compliance state based on the environment and severity provided. Next, we get the maintenance window for that environment, which is when the next patching execution is going to happen. Now comes the smart decision-making ability of the Patch Manager agent: with the severity information and the SLA of your application, it needs to decide whether to patch in an emergency fashion, straight away, or wait for the next maintenance window. That all happens with just a few lines in the prompt, rather than the pretty large if-else statement we would probably have written otherwise, which would have been far too complex for anybody to understand. Then we just display that information: whether the windows fall within or beyond the SLA deadline, so we either patch now or leave it for later. That whole workflow was baked into the prompt. Could we have optimized it? Absolutely, but we reached this stage based on a lot of tests.
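As a sketch, the workflow section of the Patch Manager prompt could be structured like this; the wording is an illustrative reconstruction of the steps described above, not the presenters' actual prompt:

```python
PATCH_MANAGER_PROMPT = """You are the Patch Manager agent.

Workflow (follow these steps IN ORDER; never patch environments in parallel):
1. Determine the CVE severity from the request, or else from memory.
2. Get the patch compliance state for that environment and severity.
3. Get the next maintenance window for that environment.
4. Decide: if the SLA deadline falls before the next maintenance window,
   recommend emergency patching; otherwise schedule it for the window.
5. Patch in phases: dev, then staging, then production.
6. Run health verification after every environment; roll back that
   environment only if verification fails.
"""
```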

Thumbnail 2720

Thumbnail 2730

Thumbnail 2740

Thumbnail 2750

Thumbnail 2760

Last week there was a release, and as we got closer to re:Invent (we're actually at re:Invent already), there were a lot of new announcements. One of them was Strands Standard Operating Procedures, wherein a lot of the workflow implementation can be externalized. This makes the prompt a lot smaller, which means less token consumption and faster agent responses.

Then comes instance ranking. It performs calculations for emergency versus non-emergency scenarios and can rank your instances based on business impact and blast radius, so you can go through a phased execution. This would have taken a long time to show, which is why we couldn't include it in the demo as I mentioned previously, but under the hood the agent is aware that it needs to proceed in a phased manner, even in an emergency patching situation, and that health verification is mandatory after each environment.

We've all observed what happens when you patch: applications experience issues and just go down, impacting your uptime and so on. So we built a very simple health verification system. You could build a more complex one, obviously, but the logic is the same. We have two layers of checks. The first is a simple ping check with Systems Manager, for the infrastructure layer. The second is CloudWatch alarms. In your case, you could integrate your agents via a webhook with Splunk, New Relic, Datadog, or whatever you use to see if any alarms were triggered by the patch. If that happens, it can roll back.
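A minimal sketch of such a two-layer health check as a tool; the filter shapes follow the boto3 SSM and CloudWatch APIs, while the verdict format is an assumption:

```python
import boto3
from strands import tool

@tool
def verify_environment_health(instance_ids: list[str]) -> dict:
    """Two-layer post-patch health check: SSM ping, then CloudWatch alarms.

    Args:
        instance_ids: The instances that were just patched.

    Returns:
        Dict with an overall verdict and the evidence behind it.
    """
    ssm = boto3.client("ssm")
    cloudwatch = boto3.client("cloudwatch")

    # Layer 1 (infrastructure): is the SSM agent on each instance reachable?
    info = ssm.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": instance_ids}])
    unreachable = [i["InstanceId"] for i in info["InstanceInformationList"]
                   if i["PingStatus"] != "Online"]

    # Layer 2 (application): did any CloudWatch alarm go into ALARM state?
    alarms = cloudwatch.describe_alarms(StateValue="ALARM")["MetricAlarms"]

    healthy = not unreachable and not alarms
    return {
        "healthy": healthy,
        "unreachable_instances": unreachable,
        "active_alarms": [a["AlarmName"] for a in alarms],
        "action": "proceed to next environment" if healthy
                  else "roll back this environment only",
    }
```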

Thumbnail 2820

Thumbnail 2880

How useful would that experience be, where the agent can just roll back a patch it applied because it broke the system? It's also smart enough to roll back only that environment, so you can compare the delta against a previous, lower environment where the patch was successful. Finally, all this information needs to go somewhere for compliance reporting, so our CTO is happy. That happens with a specific tool, and all of that information is captured. We also had a rollback policy and an output format section, but those are pretty straightforward.

A couple of things I want you to take away from a prompt perspective; I'm sure you've all experienced this. As much as I want to say prompt engineering is an art, it feels like there's a lot of science behind it now. A few key tips that have helped us: keep prompts as structured and clear as possible, and keep them concise. The more verbose they are, the more the supervisor carries along every time it works with your specialist agents, which makes things harder and more expensive. Make sure you don't have conflicting instructions at the top and the bottom; it's happened to us a few times where the top says one thing and the bottom says something completely different, and agents get confused quite easily in that regard. And exemplify: show them an example of what good looks like, so they use that as a pattern going forward.

Thumbnail 2900

Thumbnail 2910

Thumbnail 2930

That's pretty much what we had in terms of the code itself. What I'm going to do very quickly is switch back to the slides. We've seen it all now: Justin has shown the demo, and I've gone through the codebase with you. If we have time, we're more than happy to take questions, either offline (depending on how easy Romi is going to be on us) or just outside the room.

We've just seen how agentic AI can help you manage everything better and be better prepared. It's technically and operationally viable; it's not a matter of if, just a matter of when. I'm not saying it will magically solve everything tomorrow, but I think we're very close to a stage where answering the security director's questions about risk and blast radius can be done in seconds. You're no longer dependent on application owners and application teams to give you that information; you can get it within a matter of seconds. Platform engineers now have the confidence and comfort of knowing that if they apply a patch, they can roll it back. This is especially useful in a centralized environment where you have application teams, platform engineering teams, cloud teams, and so on. Compliance managers can feel a bit more at ease, knowing they don't need to rush a critical vulnerability through just because it's labeled critical; if it fits within the maintenance window, they can make that decision sooner. And the CTO is super happy that they're not going to be in the press, because they're more compliant: they understand the health of their business from a patching perspective and can continue to help their teams evolve and get better. So there are three key things we want you to take away.

Key Takeaways: Tools vs Prompts, Augmentation vs Replacement, and Iterative Development

Tools versus prompts is the first one. I think we get to a stage where it becomes a bit more philosophical, but what we've experienced is that tools are more deterministic and more consistent, while prompts are more creative and let the large language model think on its own. Sometimes it works, and sometimes it doesn't. So keep that in mind when you're trying to build a particular solution or get to a certain outcome. Think about tools versus prompts when making those decisions.

Augment or replace is the second key takeaway. A lot of people believe AI is just going to take over the world. Maybe it will eventually, but we're not at that stage yet. Whatever we've shown you today demonstrates the augmenting capabilities of AI, making your teams and systems smarter and more powerful. These tools are meant to augment your existing teams and systems, not replace them. Agents will give you the scale and agility, but they need humans to provide the governance and oversight. Without that, it's going to be a really bad situation for all of us.

Finally, build small, test, and iterate. We didn't get to where we are on day one. It took us a few months to get here. We've all got our day jobs, but I'm sure if you put your mind to it, just like we did, you can accelerate through the whole experience. Start small, test, and iterate. Use your observability tooling to get to a stage where you understand what the agents can do and what the responses look like before you tweak them. Take your time with this process.

Thumbnail 3120

One key thing I want to highlight: even if you're not a software engineer, please follow all the software engineering practices in terms of version control. Look at tools that help you with model evaluation and prompt evaluation as you tweak your prompts and model parameters; they are super useful. We are planning to publish the code soon. Whatever we've built has been based on patterns and samples that are freely available; these are the QR codes for the Bedrock AgentCore code samples and the Strands Agents samples. Hopefully we'll have our code published soon, but in the meantime, you can take the lessons from today, the patterns and the architecture principles, and apply them with these samples.

Thumbnail 3150

Thumbnail 3160

Everyone's here for swag, I'm pretty sure. If you're not, you can come for the demos. We'll be there as well. Please do visit the Cloud Ops kiosk at the AWS Village. On behalf of Justin and myself, thank you so much. We do have five minutes, so we're happy to come around and take questions if that's okay with Ziggy. Thank you so much. We really appreciate it.


; This article is entirely auto-generated using Amazon Bedrock.
