AWS DevOps Agent is a service that autonomously investigates incidents and identifies root causes.
In this demo, I simulate a scenario where EC2 instances behind an Application Load Balancer start failing health checks, then watch AWS DevOps Agent diagnose the problem.
Demo
In production environments, one of the most common incidents is ALB targets becoming unhealthy. This can happen for many reasons:
- Application crashes
- Database connection failures
- Memory exhaustion
- Dependency timeouts
- Misconfigured health checks
In this demo, I'll deploy a Flask web application behind an ALB and simulate a database connection failure that causes the health endpoint to return 503 errors. The ALB will mark the targets as unhealthy, trigger CloudWatch alarms, and AWS DevOps Agent will investigate the root cause.
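To make the failure mode concrete, here is a minimal sketch of a health endpoint that returns 503 when a simulated database connection is down. It is not the Flask application from the template: it uses only the Python standard library, and the `state` flag, the `route` function, and the response bodies are assumptions made for illustration. Only the `/health`, `/simulate/unhealthy`, and `/simulate/healthy` paths come from this demo.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory flag standing in for a real database connection check.
state = {"db_connected": True}


def route(path):
    """Return (status_code, body) for a request path."""
    if path == "/simulate/unhealthy":
        state["db_connected"] = False
        return 200, "simulating database connection failure"
    if path == "/simulate/healthy":
        state["db_connected"] = True
        return 200, "restored"
    if path == "/health":
        if state["db_connected"]:
            return 200, "healthy"
        # A 503 tells the load balancer health check that this target is failing.
        return 503, "database connection failed"
    return 404, "not found"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    # Port 8080 is an arbitrary choice for local experiments.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` starts the stand-in server locally. Keeping the routing in a plain function makes the state change easy to inspect without sending real HTTP requests.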
Here is the diagram: just a simple ALB, two instances, and alarms. I recommend deploying this stack to us-east-1 because, at the time of writing, AWS DevOps Agent is only available there.
I have also included a Lambda function that automatically shuts down the instances after 2 hours, so you don't need to worry about unexpected costs.
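The auto-shutdown idea can be sketched roughly as follows. This is not the code from the template: the tag filter, the `should_stop` helper, and the choice to stop rather than terminate the instances are all assumptions. The boto3 calls used (`describe_instances`, `stop_instances`) are standard EC2 client methods.

```python
from datetime import datetime, timedelta, timezone

MAX_UPTIME = timedelta(hours=2)  # the 2-hour limit mentioned in this post


def should_stop(launch_time, now, max_uptime=MAX_UPTIME):
    """Return True once an instance has been running for max_uptime or longer."""
    return now - launch_time >= max_uptime


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported lazily so
    # should_stop() can be exercised locally without it.
    import boto3

    ec2 = boto3.client("ec2")
    now = datetime.now(timezone.utc)
    # Hypothetical tag filter; the real template may select instances differently.
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Project", "Values": ["devops-agent-demo"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    expired = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
        if should_stop(instance["LaunchTime"], now)
    ]
    if expired:
        ec2.stop_instances(InstanceIds=expired)
    return {"stopped": expired}
```

A function like this would typically run on a schedule, so the instances are stopped shortly after they pass the limit.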
CloudFormation template: https://gist.github.com/Hung-00/e53f4c980baf13d9bb8902fd36a79a6b
After the stack is created successfully, check its outputs.
Next, connect to the two instances. You can connect through Session Manager.
Run this command to check the health status:
curl http://localhost/health
Server is healthy.
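If you prefer to script the same check, a rough Python equivalent of the `curl` call might look like this. The `check_health` name and its return shape are my own choices. One detail worth knowing is that `urllib` raises `HTTPError` for a 503, so the status has to be read from the exception.

```python
import urllib.error
import urllib.request


def check_health(url="http://localhost/health", timeout=5):
    """Return (status_code, body); status_code is None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the response body is still readable.
        return error.code, error.read().decode()
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None, ""
```

Run `check_health()` on an instance before and after the simulation steps in this post to see the status change.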
Now run either of these commands on both servers to interrupt them:
curl -s http://localhost/simulate/unhealthy
or
curl http://localhost/simulate/crash
The alarm has triggered.
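The alarm does not fire on the first failed request. A load balancer marks a target unhealthy only after a number of consecutive failed health checks, and healthy again only after a number of consecutive successes. The sketch below replays a sequence of health check status codes to show that behaviour. The threshold values and the rule that only 200 counts as success are illustrative assumptions; the template's target group settings may differ.

```python
def target_states(status_codes, unhealthy_threshold=2, healthy_threshold=5):
    """Replay health check results and return the target state after each one."""
    state = "healthy"
    streak = 0  # consecutive results that contradict the current state
    history = []
    for code in status_codes:
        ok = code == 200  # assumption: only 200 counts as a passing check
        if (state == "healthy") != ok:
            streak += 1
        else:
            streak = 0
        if state == "healthy" and streak >= unhealthy_threshold:
            state, streak = "unhealthy", 0
        elif state == "unhealthy" and streak >= healthy_threshold:
            state, streak = "healthy", 0
        history.append(state)
    return history


print(target_states([200, 503, 503, 503]))
# ['healthy', 'healthy', 'unhealthy', 'unhealthy']
```

A single failed check between successes resets the streak, which is why brief blips usually do not take a target out of rotation.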
Now let's head to AWS DevOps Agent.
You first need to create an Agent Space; the process is straightforward.
Open Operator Access.
Have a look at your system in the DevOps Center tab, where you can see the stack's resources.
Switch to the Incident Response tab and investigate the latest alarm.
You can watch the Investigation Progress. The agent shows its reasoning as it investigates:
Interestingly, AWS DevOps Agent can tell that a user deliberately interrupted the server, as you can see below:
Each investigation has its own chat session, so you can ask the agent questions about it.
Go to the Prevention tab and run it. The agent analyzes past investigations and gives you recommendations for improvement.
AWS DevOps Agent cannot resolve incidents by itself. You need to fix the root cause and implement the recommendations on your own.
Finally, run this command to restore healthy status:
curl -s http://localhost/simulate/healthy
Remember to delete the stack when you are finished with the demo.
Conclusion
In this demo, we saw how AWS DevOps Agent cuts down resolution time, finds root causes quickly, and suggests ways to prevent similar issues.
The agent works best when it understands your full environment: AWS accounts, external tools, everything. Adding MCP servers for custom integrations could make it even more powerful.
It's free during preview, with some usage limits. Security is administrator-controlled through IAM permissions.
I think DevOps Agent is a tool, not a replacement for engineers. Engineers are still essential for implementing fixes, designing infrastructure improvements, and making critical decisions when rollbacks are needed.
This is just a simple scenario for a first look at AWS DevOps Agent. I will try to simulate more scenarios in the future.
Thank you.