Everyone says AI agents will reduce on-call fatigue.
So I added one to a real production incident workflow, not to replace engineers, but to assist with triage, summarization, and next-step recommendations.
It helped in some places.
It failed in others.
And the biggest lesson had less to do with the model and more to do with system design.
The Setup
I integrated an AI agent into a typical incident response flow:
- Incoming alerts from monitoring systems
- Initial triage and classification
- Root cause hypothesis
- Suggested remediation steps
The agent was allowed to:
- Summarize alerts
- Group duplicate incidents
- Suggest possible causes
- Draft remediation steps
The agent was NOT allowed to:
- Execute production changes
- Restart services
- Modify configs
- Trigger escalations automatically
This was intentional. I wanted to see where it adds value without risking production.
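To make that boundary concrete: the safest version I've found is to enforce the split in code, not in the prompt. Here's a minimal sketch of that permission gate. The action names and the `AgentAction` wrapper are hypothetical, not tied to any specific framework.

```python
from dataclasses import dataclass, field

# Read-only actions the agent may perform on its own.
ALLOWED_ACTIONS = {
    "summarize_alert",
    "group_incidents",
    "suggest_causes",
    "draft_remediation",
}

# Actions that always require a human, no matter what the agent asks for.
BLOCKED_ACTIONS = {
    "execute_change",
    "restart_service",
    "modify_config",
    "trigger_escalation",
}

@dataclass
class AgentAction:
    name: str
    payload: dict = field(default_factory=dict)

def authorize(action: AgentAction) -> bool:
    """Allow read-only work; hard-block anything that touches production."""
    if action.name in BLOCKED_ACTIONS:
        return False
    return action.name in ALLOWED_ACTIONS

# The agent can summarize, but a restart request is rejected at the gate.
assert authorize(AgentAction("summarize_alert", {"alert_id": "a-123"}))
assert not authorize(AgentAction("restart_service", {"service": "checkout"}))
```

The detail that matters: the blocklist wins over the allowlist, so any action name the gate has never seen defaults to denied.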
What Worked Surprisingly Well
1. Alert Summarization
The agent reduced noisy alerts into clean summaries.
Instead of reading through logs, I got:
“High latency observed in service X after deployment Y. Likely related to dependency Z.”
This alone saved time during high-pressure incidents.
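Under the hood this step is mostly prompt construction: collapse the raw alert payloads into a compact request for a two-sentence summary. A simplified sketch, where `call_llm` is a placeholder for whatever model client you use and the alert fields are assumptions about your alert schema:

```python
def build_summary_prompt(alerts: list[dict]) -> str:
    """Collapse raw alert payloads into one compact summarization request."""
    lines = [
        f"- [{a['severity']}] {a['service']}: {a['message']} ({a['timestamp']})"
        for a in alerts
    ]
    return (
        "Summarize these related alerts in two sentences. "
        "Name the affected service, the likely trigger (e.g. a recent deployment), "
        "and any dependency mentioned in the messages.\n"
        + "\n".join(lines)
    )

# call_llm() is a stand-in for your model client of choice.
# summary = call_llm(build_summary_prompt(alerts))
```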
2. Duplicate Incident Grouping
It grouped alerts that were actually the same issue.
This reduced alert fatigue and helped focus on the real root cause faster.
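Worth noting: a lot of the grouping win doesn't even need a model. A crude fingerprint gets you most of the way, and the agent then only has to confirm or split the groups. A sketch, assuming each alert carries a service name, an error class, and an epoch timestamp:

```python
from collections import defaultdict

def fingerprint(alert: dict) -> tuple:
    """Crude dedup key: same service + same error class within a 5-minute window."""
    bucket = alert["timestamp"] // 300  # epoch seconds -> 5-minute bucket
    return (alert["service"], alert["error_class"], bucket)

def group_alerts(alerts: list[dict]) -> dict[tuple, list[dict]]:
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for alert in alerts:
        groups[fingerprint(alert)].append(alert)
    return dict(groups)

# Forty alerts from the same failing dependency collapse into one group,
# which is the thing the on-call engineer actually needs to look at.
```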
3. Drafting Next Steps
It suggested reasonable first actions:
- Check recent deployments
- Validate dependency health
- Inspect error spikes
Not perfect, but a solid starting point.
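What made those drafts reviewable was forcing them into a structure instead of free text, so a human can scan action, rationale, risk, and evidence at a glance. A sketch of the shape I'd aim for; the field names are illustrative, not from any real schema:

```python
from dataclasses import dataclass

@dataclass
class RemediationStep:
    action: str          # what to do, e.g. "roll back deployment Y"
    rationale: str       # why the agent thinks it helps
    risk: str            # "low" | "medium" | "high"
    evidence: list[str]  # alert IDs or log lines that back the suggestion

def needs_human_approval(step: RemediationStep) -> bool:
    """Anything riskier than 'low', or any step with no evidence, goes to a human."""
    return step.risk != "low" or not step.evidence
```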
What Broke Almost Immediately
1. Wrong Prioritization
The agent sometimes treated low-impact issues as critical.
Severity is not just data. It is context.
And context is hard.
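Concretely, the model was missing everything it couldn't see: which tier the service is in, whether customers are affected, whether a change freeze is on. The fix is to enrich the alert with that context before asking for a priority at all. A sketch, with a hypothetical service catalog standing in for your CMDB or service registry:

```python
# Hypothetical service catalog; in reality this comes from your CMDB / service registry.
SERVICE_CATALOG = {
    "checkout": {"tier": 1, "customer_facing": True},
    "batch-reports": {"tier": 3, "customer_facing": False},
}

def enrich(alert: dict) -> dict:
    """Attach business context so severity isn't judged on the raw signal alone."""
    meta = SERVICE_CATALOG.get(alert["service"], {"tier": 3, "customer_facing": False})
    return {**alert, **meta}

def priority(alert: dict) -> str:
    a = enrich(alert)
    if a["tier"] == 1 and a["customer_facing"]:
        return "P1"
    if a["tier"] <= 2:
        return "P2"
    return "P3"  # low impact: should not page anyone at 2 a.m.
```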
2. False Confidence
The responses sounded highly confident even when they were wrong.
This is dangerous in production systems.
Confidence ≠ correctness.
3. Noisy Recommendations
Some suggestions were technically valid but operationally useless.
Example:
- “Restart the service”
In production, that is not always acceptable without deeper checks.
4. Escalation Confusion
It struggled to decide when to involve humans.
Too early → noise
Too late → risk
That balance is harder than it looks.
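My current take: don't let the model make this call at all. Encode the escalation trigger as a boring, explicit rule and let the agent only feed inputs into it. A sketch, with placeholder thresholds; `opened_at` is expected to be timezone-aware:

```python
from datetime import datetime, timedelta, timezone

ACK_TIMEOUT = timedelta(minutes=10)  # placeholder threshold

def should_escalate(priority: str, opened_at: datetime, acked: bool) -> bool:
    """Deterministic escalation: top priority, or nobody acknowledged in time."""
    if priority == "P1":
        return True
    overdue = datetime.now(timezone.utc) - opened_at > ACK_TIMEOUT
    return not acked and overdue

# The agent can argue for a priority; the escalation itself stays rule-driven.
```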
The Real Problem: System Design
After a week, it became clear:
The AI agent was not the main problem.
The real issues were:
- Weak incident workflows
- Poor escalation design
- Lack of structured context
- No clear guardrails
If your system is messy, the AI will reflect that mess faster.
The Architecture That Works Better
Here is what I would recommend instead:
- Alert comes in
- AI summarizes + groups signals
- AI suggests possible causes
- Human validates context
- AI drafts remediation options
- Human approves final action
AI as a co-pilot, not an autopilot.
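Stitched together, that flow is just a pipeline with two mandatory human gates. Here's a sketch reusing the hypothetical helpers from the earlier snippets; it's the shape of the control flow, not a real framework:

```python
def handle_incident(alerts: list[dict], call_llm, ask_human) -> None:
    """Co-pilot flow: the model proposes, a human signs off on every irreversible step."""
    for grouped in group_alerts(alerts).values():                 # AI groups signals
        summary = call_llm(build_summary_prompt(grouped))         # AI summarizes
        causes = call_llm(f"List likely causes for: {summary}")   # AI hypothesizes

        # Gate 1: a human validates the framing before anything else happens.
        if not ask_human(f"Does this context look right?\n{summary}\n{causes}"):
            continue

        draft = call_llm(f"Draft remediation options for: {summary}")

        # Gate 2: a human approves the final action; the agent never executes it.
        if ask_human(f"Approve these steps?\n{draft}"):
            print("Handing approved steps to the on-call engineer")
```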
Key Takeaways
- AI is great at summarization and pattern detection
- It struggles with context and real-world constraints
- Confidence can be misleading
- System design matters more than model capability
Most teams trying AI in incident response are not failing because of the model.
They are failing because their workflow is not designed for AI.
Final Thought
AI can absolutely improve incident response.
But if your escalation paths, permissions, and observability are weak,
the agent will not fix your system.
It will expose it.
Question for You
Would you allow an AI agent in your on-call workflow?
- Recommendation only
- Limited action with approval
- Full automation
Curious to hear how others are approaching this.
Top comments (18)
Great article! You made a valid point. I think agents have their limitations. You need a human in the loop to verify the code or the decisions of the AI. I think OpenClaw is probably the most autonomous agent at the moment. You still need to verify the accuracy of the agent's work.
Appreciate that, Benjamin — completely agree.
Human-in-the-loop is still critical, especially when decisions have real impact. In my experience, the challenge isn’t just accuracy; it’s knowing when the system is confident versus when it’s guessing.
That’s where I see a lot of these agent systems struggling today — they can automate steps, but they still need clear boundaries and validation points.
I like your point on autonomous agents as well. Feels like we’re getting closer, but we’re not at a stage where you can fully trust them without guardrails.
You made a valid point. That is the reason I am cautious about agents these days. I agree with you 100% on having guardrails for agents, or for AGI.
That’s a fair take, and honestly a healthy mindset right now.
I think the real shift is treating guardrails as a core part of the system, not an afterthought. The more critical the workflow, the more you need clear boundaries, validation, and fallback paths.
Agents are powerful, but without those controls, they can fail in very non-obvious ways.
That is true!
Great write-up - and yes, I see AI as promising in this field ...
Appreciate that!
I’m bullish on AI here too, but I think we’re still in the phase where it works best as a “visibility layer” rather than a decision-maker.
It’s great at summarizing and surfacing signals, but the moment it has to reason across messy, real-world systems, things get interesting 😄
Have you tried using it in any production workflows yet?
Not personally, but I agree it can be really powerful (and a time saver) when used for certain things - and it can be a time and money drain for other things ... it all depends on WHEN and HOW you use it, and that's definitely an "evolving art" - people come up with clever ideas and approaches all the time ...
For instance, I just came across this article, which describes a simple but really brilliant approach (for software/app development, in this case):
boristane.com/blog/how-i-use-claud...
The main takeaway, for me, is how he's asking the AI to produce "artifacts" in the form of MD files to document the results - and he then goes into a loop, adding notes/comments, and asking the AI to refine the document based on that, etc - until he's satisfied ...
And using that approach he goes through a "research" => "plan" => "implement" pipeline, with each stage resulting in tangible 'artifacts' (the MD files), which also serve as the Context - rather than having messy/ephemeral/unstructured chat sessions with endless "prompting" and retrying, with the Context not really being tangible, and with the developer not really feeling 'in control' ...
Eye opener, in the category of "why didn't I think of that myself?" ;-)
This is a great example — thanks for sharing.
The “artifact-first” approach really resonates. I’ve seen similar patterns work well when the AI is forced to externalize its thinking instead of keeping everything in a chat loop. It makes the process auditable and, more importantly, gives humans something concrete to validate.
In incident workflows, I think this could translate into things like structured runbooks, timelines, or even evolving incident summaries rather than just transient responses.
The interesting challenge is making sure those artifacts stay grounded in real system context, otherwise they can still drift.
Curious — do you see this approach scaling well in more dynamic environments like on-call or incident response?
This is a brilliant way to put it:
"the AI is forced to externalize its thinking instead of keeping everything in a chat loop"
and this:
"It makes the process auditable"
and:
"do you see this approach scaling well in more dynamic environments like on-call or incident response"
Yes I do, in the ways you're hinting at ... this idea of 'artifacts', to broaden the idea: it's all about adding structure to your process, control, accountability, observability, etc ...
People are now starting to realize that the holy grail is not in "prompting harder" (and burning more tokens), you need a more disciplined/structured/intelligent approach ...
Really appreciate that — you captured it perfectly.
I completely agree, the shift is less about “better prompting” and more about building structured workflows around the AI. The moment you treat outputs as artifacts instead of responses, it changes how you design the entire system.
In my experience, that’s also where reliability starts to improve — you get traceability, iteration, and a clearer feedback loop.
The interesting next step is figuring out how to keep that structure lightweight enough for fast-moving environments like on-call, without slowing engineers down. That balance is where most systems either succeed or fall apart.
"building structured workflows around the AI" - that's a good way to summarize it!
Exactly — that’s the shift I’m seeing too.
Once you start thinking in terms of structured workflows instead of prompts, AI becomes much more predictable and usable.
Feels like we’re moving from “prompt engineering” to more of a “system design” mindset for AI.
Hi Ravi,
I was wondering if you'd like to connect with me on LinkedIn?
Hi Benjamin,
Absolutely, happy to connect. Here’s my LinkedIn:
linkedin.com/in/ravi-teja-reddy-ma...
It sounds good!
Tried something similar in our setup — summarization was a big win, but prioritization was always off without real context. Feels like AI helps only when your incident workflow is already clean, otherwise it just makes the gaps more obvious.
100% agree with this. That was exactly my takeaway too.
The agent didn’t fail because it was “bad”; it failed because our workflow had hidden gaps that humans were compensating for without realizing it.
Once those gaps showed up, the AI just amplified them instead of fixing them.
Curious, did you end up adding more structured context (like better incident metadata or runbooks) after that?