AWS announced DevOps Agent at re:Invent 2025, calling it a "frontier agent that acts as an experienced DevOps engineer." The marketing promises autonomous incident investigation, root cause analysis, and proactive prevention.
I spent the past week digging into the documentation, testing the preview, and analyzing what AWS carefully avoided mentioning. Here's what I found.
TL;DR
AWS DevOps Agent is a powerful diagnostic assistant, not an autonomous operator. It can investigate incidents and suggest fixes, but it cannot execute them. The preview is free, but AWS hasn't disclosed GA pricing. Your job is safe because someone still needs to actually fix things.
What DevOps Agent Actually Does
Think of it as a 24/7 on-call engineer that never sleeps, never gets tired, and never forgets to check the logs. When an alert fires at 2 AM, it immediately starts investigating.
Core capabilities:
| Capability | What It Does |
|---|---|
| Incident Investigation | Correlates metrics, logs, traces, and code changes |
| Root Cause Analysis | Identifies probable cause using topology understanding |
| Mitigation Plans | Suggests steps to fix with rollback guidance |
| Prevention Analysis | Analyzes historical incidents to prevent recurrence |
| Stakeholder Updates | Posts findings to Slack channels and tickets |
What makes it "frontier":
AWS calls this a frontier agent because it can run autonomously for hours or days. It doesn't need you to guide it step by step. Give it an alert, and it figures out what to investigate, which logs to pull, and which deployments to check.
The Integration Ecosystem
This is where DevOps Agent gets interesting. It's not locked into AWS services.
Built-in integrations:
- Observability: CloudWatch, Datadog, Dynatrace, New Relic, Splunk
- CI/CD: GitHub Actions, GitLab CI/CD
- Ticketing: ServiceNow (native), PagerDuty (webhook)
- Collaboration: Slack
The MCP wildcard:
Through Model Context Protocol servers, you can connect anything: Prometheus, Grafana, custom internal tools, proprietary systems. This is the underrated feature. While competitors lock you into their ecosystem, AWS lets you bring your own.
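To give a feel for what an MCP server exposes, here's a minimal sketch in Python using only the standard library. It answers the two JSON-RPC methods MCP uses for tools, tools/list and tools/call, and skips everything else a real server needs (the initialize handshake, transports, authentication). The get_queue_depth tool and its data are made up for illustration, and nothing here is specific to how DevOps Agent connects to a server. For anything real you'd reach for an official MCP SDK rather than hand-rolling this.

```python
import json

# Hypothetical internal tool: a real one would query your own system.
def get_queue_depth(queue: str) -> str:
    fake_depths = {"orders": 42, "emails": 0}  # stand-in data for illustration
    return f"{queue}: {fake_depths.get(queue, 'unknown')} messages pending"

# Tool registry: name -> handler plus the metadata returned by tools/list.
TOOLS = {
    "get_queue_depth": {
        "handler": get_queue_depth,
        "description": "Return the number of pending messages in an internal queue.",
        "inputSchema": {
            "type": "object",
            "properties": {"queue": {"type": "string"}},
            "required": ["queue"],
        },
    }
}

def _error(req_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})

def handle_request(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    req_id, method = req.get("id"), req.get("method")
    params = req.get("params") or {}

    if method == "tools/list":
        result = {"tools": [
            {"name": name,
             "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, "Unknown tool")
        text = tool["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(req_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})

if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "get_queue_depth", "arguments": {"queue": "orders"}}}
    print(handle_request(json.dumps(call)))
```

The point of the sketch is the shape: a tool is a name, a description, and a JSON Schema for its arguments, and the agent decides when to call it based on that description.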
What AWS Isn't Telling You: Pricing
Here's where it gets murky.
Preview limits (documented):
- 20 incident resolution hours per month
- 10 incident prevention hours per month
- 1,000 chat messages per month
- 10 Agent Spaces maximum
- 3 concurrent investigations
- 1 concurrent prevention task
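If you want to sanity-check whether those limits fit your incident volume, a rough sketch like this helps. The limit values come from the list above; the function names and the usage figures are hypothetical, and I'm assuming hours are simply summed across investigations, which the documentation I read doesn't spell out.

```python
# Monthly preview limits as listed above.
PREVIEW_LIMITS = {
    "incident_resolution_hours": 20,
    "incident_prevention_hours": 10,
    "chat_messages": 1000,
}

def remaining_quota(usage: dict) -> dict:
    """Return what is left of each monthly limit, never below zero."""
    return {
        key: max(limit - usage.get(key, 0), 0)
        for key, limit in PREVIEW_LIMITS.items()
    }

def investigations_per_month(avg_hours_per_investigation: float) -> int:
    """How many investigations of a given average length fit in the resolution quota."""
    if avg_hours_per_investigation <= 0:
        raise ValueError("average hours must be positive")
    return int(PREVIEW_LIMITS["incident_resolution_hours"] // avg_hours_per_investigation)

if __name__ == "__main__":
    # Hypothetical usage partway through a month.
    print(remaining_quota({"incident_resolution_hours": 12.5, "chat_messages": 300}))
    print(investigations_per_month(4))  # 20 hours / 4 hours each -> 5
```

At four agent hours per incident, the resolution quota covers five investigations a month, so a team with a noisy on-call rotation could hit the ceiling quickly.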
GA pricing (unknown):
- Per hour? Per investigation? Per seat? Per account?
- Third-party tool API costs passed through?
- Bedrock model usage fees?
- Multi-region pricing?
Hidden costs during preview:
"Queries and API calls made to other AWS and non-AWS services may generate charges from those services."
Translation: Your CloudWatch and X-Ray bills might increase. If Datadog charges per query, those costs are on you.
My speculation on GA pricing:
To be clear, this is my guess and not a number from AWS. Based on Bedrock pricing and similar services, I'd expect something like $50-150 per investigation hour. At that rate, a complex incident taking 4 hours of agent time would cost $200-600 (4 × $50 to 4 × $150). For organizations with frequent incidents, this adds up quickly.
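Here's the same back-of-the-envelope math as a small Python sketch, so you can plug in your own incident volume. The $50 and $150 hourly rates are my speculation from above, not AWS pricing, and the function names are just for illustration.

```python
# Speculative hourly rates from this article, NOT published AWS pricing.
LOW_RATE_USD = 50.0
HIGH_RATE_USD = 150.0

def estimate_cost_range(agent_hours: float,
                        low_rate: float = LOW_RATE_USD,
                        high_rate: float = HIGH_RATE_USD) -> tuple:
    """Return (low, high) cost in USD for a given number of agent hours."""
    return agent_hours * low_rate, agent_hours * high_rate

def estimate_monthly_range(incidents_per_month: int,
                           avg_hours_per_incident: float) -> tuple:
    """Return (low, high) monthly cost in USD for a given incident volume."""
    total_hours = incidents_per_month * avg_hours_per_incident
    return estimate_cost_range(total_hours)

if __name__ == "__main__":
    print(estimate_cost_range(4))         # (200.0, 600.0)
    print(estimate_monthly_range(10, 2))  # (1000.0, 3000.0)
```

Ten incidents a month at two agent hours each would land somewhere between $1,000 and $3,000 under these guessed rates, before any third-party query costs.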
What DevOps Agent Cannot Do
This is the part AWS marketing glosses over.
It cannot:
- Execute fixes (only recommends)
- Deploy code changes
- Modify infrastructure
- Make policy decisions
- Handle unprecedented situations
- Operate autonomously in regulated industries
The regulatory reality:
For healthcare, finance, or any regulated industry, DevOps Agent is a diagnostic assistant. It cannot be an autonomous operator. Compliance requires human decision-making for changes. This alone disqualifies the "replacement" narrative.
Why Your Job Is Safe
I've seen the LinkedIn panic. "AI is coming for DevOps jobs!" Let me explain why that's wrong.
What the agent does:
- Detects and monitors
- Investigates and correlates
- Reports and recommends
What you still do:
- Implement the actual fix
- Deploy changes to production
- Verify the fix worked
- Make architectural decisions
- Handle the weird edge cases
- Build new infrastructure
- Coordinate across teams
The Commonwealth Bank example:
AWS cited Commonwealth Bank, which found a root cause in under 15 minutes using DevOps Agent, versus hours manually. Notice what they didn't say: that the agent fixed it. An engineer still had to implement the solution.
DevOps Agent doesn't reduce headcount. It reduces MTTR.
Your value isn't in correlating logs. Your value is in knowing what to do with that information. The agent accelerates the boring parts so you can focus on the interesting ones.
Agent Spaces: The Security Model
One thing AWS got right is the security boundary model.
Each Agent Space:
- Has its own dedicated IAM role
- Defines exactly which accounts it can access
- Controls which tools are connected
- Isolates data from other Agent Spaces
Access patterns:
- Admins configure via AWS Console
- Operators interact via standalone web app
- IAM Identity Center or direct IAM authentication
Resource discovery:
- CloudFormation stacks (including CDK) are auto-discovered
- Terraform and console resources need tags
- No tags = invisible to the agent
If your infrastructure is a mess of untagged resources, DevOps Agent won't help much. This is actually a feature: it forces infrastructure hygiene.
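A quick way to see how much of your estate would be invisible is to run an inventory through a check like the one below. This is a sketch under assumptions: the resource dictionaries, the managed_by field, and the required tag keys are all hypothetical, and which tags DevOps Agent actually looks for depends on how you configure discovery. In practice you'd feed it data exported from your own inventory tooling.

```python
# Hypothetical tag keys; substitute whatever your discovery configuration expects.
REQUIRED_TAG_KEYS = {"application", "environment"}

def find_invisible_resources(resources: list) -> list:
    """Return (resource_id, missing_tag_keys) for resources the agent likely can't see.

    CloudFormation-managed resources are skipped because they are auto-discovered;
    everything else needs the required tags.
    """
    invisible = []
    for res in resources:
        if res.get("managed_by") == "cloudformation":
            continue
        missing = REQUIRED_TAG_KEYS - set(res.get("tags", {}))
        if missing:
            invisible.append((res["id"], sorted(missing)))
    return invisible

if __name__ == "__main__":
    # Made-up inventory for illustration.
    inventory = [
        {"id": "stack-vpc", "managed_by": "cloudformation", "tags": {}},
        {"id": "tf-rds", "managed_by": "terraform",
         "tags": {"application": "checkout", "environment": "prod"}},
        {"id": "console-ec2", "managed_by": "console", "tags": {}},
        {"id": "tf-queue", "managed_by": "terraform",
         "tags": {"application": "checkout"}},
    ]
    for resource_id, missing in find_invisible_resources(inventory):
        print(f"{resource_id}: missing tags {missing}")
```

Running something like this before you create your first Agent Space tells you how much tagging work stands between you and useful investigations.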
Real Limitations I Found
Testing revealed issues AWS documentation doesn't highlight.
Investigation accuracy varies:
One tester reported that when two alarms fired about 40 minutes apart, the agent couldn't find the root cause on the first pass and needed a re-run. The agent isn't infallible.
English only:
No multilingual support. If your team operates in other languages, this limits adoption.
US East only (for now):
The agent runs in us-east-1, though it can monitor resources in any region. Multi-region redundancy isn't available during preview.
Context dependency:
The agent's effectiveness depends directly on how well you've connected your tools and tagged your resources. Garbage in, garbage out still applies.
Investigation gaps feature:
To its credit, DevOps Agent explicitly shows "Investigation Gaps": things it couldn't analyze because of missing logs, absent SSH access, or incomplete telemetry. This transparency is valuable, but it also confirms the limitations.
How to Maximize Effectiveness
If you're going to use this, do it right.
1. Connect everything:
Don't just connect CloudWatch. Add your GitHub repos so it can correlate deployments. Connect Slack so it updates your incident channel. Add Datadog or whatever you use.
2. Use MCP for custom tools:
Have internal tools? Build an MCP server. The protocol is open and documented. This is how you get real value.
3. Tag your resources:
If it's not in CloudFormation, tag it. Use consistent key-value pairs across your infrastructure.
4. Create runbooks:
DevOps Agent supports runbooks as "pre-loaded guidance." Create them for your common incident patterns. This gives the agent hints about where to look.
5. Start with one Agent Space:
Don't create 10 spaces immediately. Start with one team or application, learn the patterns, then expand.
The Bottom Line
AWS DevOps Agent is genuinely useful. It's not a gimmick. Having something correlate data across five different tools at 3 AM while you sleep is valuable.
But it's not magic. It's not replacing anyone. It's a sophisticated diagnostic tool that still requires human judgment to act on its findings.
Use it if:
- You have frequent incidents and high MTTR
- Your observability tools are already well-integrated
- You want to reduce on-call burden (not headcount)
- You're willing to invest in proper tagging and MCP setup
Skip it if:
- You're in a heavily regulated industry requiring human approval for all changes
- Your infrastructure is poorly documented
- You expect it to fix things, not just find them
The preview is free. Try it. But go in with realistic expectations.
What's your take on AI agents in DevOps? Have you tested the preview? Drop a comment below.