DEV Community


I Put an AI Agent in My Incident Workflow for 7 Days. Here’s What Actually Broke.

Ravi Teja Reddy Mandala on April 09, 2026

Everyone says AI agents will reduce on-call fatigue. So I tried adding one to a real production incident workflow, not to replace engineers, but ...
Benjamin Nguyen • Edited

Great article! You made a valid point. I think that agents have their limitations. You need a human in the loop to verify the code or the decisions of the AI. I think that OpenClaw is probably the most autonomous agent at the moment. You still need to verify the accuracy of the agent's work.

Ravi Teja Reddy Mandala

Appreciate that, Benjamin — completely agree.

Human-in-the-loop is still critical, especially when decisions have real impact. In my experience, the challenge isn’t just accuracy, it’s knowing when the system is confident vs when it’s guessing.

That’s where I see a lot of these agent systems struggling today — they can automate steps, but they still need clear boundaries and validation points.

I like your point on autonomous agents as well. Feels like we’re getting closer, but we’re not at a stage where you can fully trust them without guardrails.

Benjamin Nguyen

You made a valid point. That is the reason why I am cautious about agents these days. I agree with you 100% that we need guardrails for agents, or for AGI.

Ravi Teja Reddy Mandala

That’s a fair take, and honestly a healthy mindset right now.

I think the real shift is treating guardrails as a core part of the system, not an afterthought. The more critical the workflow, the more you need clear boundaries, validation, and fallback paths.

Agents are powerful, but without those controls, they can fail in very non-obvious ways.
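To make that concrete, here is a minimal sketch of what a validation-plus-fallback boundary can look like. Everything in it (the action names, the allowlist, the 0.8 threshold, the `approve` callback) is hypothetical, illustrating the pattern rather than any real framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentSuggestion:
    action: str
    confidence: float  # 0.0-1.0, self-reported by the agent

def guarded_execute(
    suggestion: AgentSuggestion,
    execute: Callable[[str], str],
    approve: Callable[[AgentSuggestion], bool],
    fallback: str = "page-oncall",
    threshold: float = 0.8,
) -> str:
    """Run an agent's suggested action only inside explicit guardrails.

    - Unknown actions are never run, regardless of confidence.
    - Low-confidence suggestions require explicit human approval.
    - Everything else falls back to a safe default (paging a human).
    """
    allowlist = {"restart-service", "scale-up", "page-oncall"}
    if suggestion.action not in allowlist:
        return execute(fallback)           # unknown action: never run it
    if suggestion.confidence >= threshold:
        return execute(suggestion.action)  # confident and allowlisted
    if approve(suggestion):                # human-in-the-loop gate
        return execute(suggestion.action)
    return execute(fallback)

# Low confidence + human rejection -> the fallback path fires
result = guarded_execute(
    AgentSuggestion("restart-service", confidence=0.4),
    execute=lambda a: f"executed:{a}",
    approve=lambda s: False,
)
print(result)  # executed:page-oncall
```

The point of the sketch is that the fallback and approval paths are part of the design, not exception handling bolted on afterwards.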

Benjamin Nguyen

That is true!

leob • Edited

Great write-up - and yes, I see AI as promising in this field ...

Ravi Teja Reddy Mandala

Appreciate that!

I’m bullish on AI here too, but I think we’re still in the phase where it works best as a “visibility layer” rather than a decision-maker.

It’s great at summarizing and surfacing signals, but the moment it has to reason across messy, real-world systems, things get interesting 😄

Have you tried using it in any production workflows yet?

leob • Edited

Not personally, but I agree it can be really powerful (and a time saver) when used for certain things - and it can be a time and money drain for other things ... it all depends on WHEN and HOW you use it, and that's definitely an "evolving art" - people come up with clever ideas and approaches all the time ...

For instance, I just came across this article, which describes a simple but really brilliant approach (for software/app development, in this case):

boristane.com/blog/how-i-use-claud...

The main takeaway, for me, is how he's asking the AI to produce "artifacts" in the form of MD files to document the results - and he then goes into a loop, adding notes/comments, and asking the AI to refine the document based on that, etc - until he's satisfied ...

And using that approach he goes through a "research" => "plan" => "implement" pipeline, with each stage resulting in tangible 'artifacts' (the MD files), which also serve as the Context - rather than having messy/ephemeral/unstructured chat sessions with endless "prompting" and retrying, with the Context not really being tangible, and with the developer not really feeling 'in control' ...
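A rough sketch of that refine-until-satisfied loop, where the Markdown file itself carries the context. The `model` and `get_notes` callables here are stand-ins for whatever AI API and human review step you actually use, not anything from the linked article:

```python
import tempfile
from pathlib import Path
from typing import Callable

def refine_artifact(
    path: Path,
    model: Callable[[str, str], str],
    get_notes: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Iteratively refine a Markdown artifact instead of chatting.

    Each round: read the current document, collect reviewer notes,
    ask the model to rewrite the document with those notes applied,
    and write the result back. The file itself is the context.
    """
    for _ in range(max_rounds):
        doc = path.read_text()
        notes = get_notes(doc)
        if not notes:          # reviewer is satisfied: stop
            break
        path.write_text(model(doc, notes))
    return path.read_text()

# Example with a stub "model" that just appends the note verbatim
plan = Path(tempfile.mkdtemp()) / "plan.md"
plan.write_text("# Incident response plan\n")
notes = ["add a rollback step", ""]   # second round: satisfied
final = refine_artifact(
    plan,
    model=lambda doc, note: doc + f"- {note}\n",
    get_notes=lambda doc: notes.pop(0),
)
print(final)
```

Because every round reads and writes the same file, the "state" of the collaboration is always inspectable on disk rather than buried in a chat transcript.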

Eye opener, in the category of "why didn't I think of that myself?" ;-)

Ravi Teja Reddy Mandala

This is a great example — thanks for sharing.

The “artifact-first” approach really resonates. I’ve seen similar patterns work well when the AI is forced to externalize its thinking instead of keeping everything in a chat loop. It makes the process auditable and, more importantly, gives humans something concrete to validate.

In incident workflows, I think this could translate into things like structured runbooks, timelines, or even evolving incident summaries rather than just transient responses.

The interesting challenge is making sure those artifacts stay grounded in real system context, otherwise they can still drift.
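As a sketch of what "grounded" could mean in practice (all names and fields here are hypothetical), an evolving incident summary could simply refuse any timeline entry that doesn't reference real event IDs:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TimelineEntry:
    timestamp: str
    summary: str
    source_event_ids: List[str]  # ties each claim to real log/alert IDs

@dataclass
class IncidentSummary:
    incident_id: str
    status: str = "open"
    timeline: List[TimelineEntry] = field(default_factory=list)

    def add_entry(self, entry: TimelineEntry) -> None:
        # Refuse ungrounded entries: every line in the artifact must
        # point back at a concrete system event, or it is drift.
        if not entry.source_event_ids:
            raise ValueError("timeline entry has no grounding events")
        self.timeline.append(entry)
```

The validation rule is deliberately dumb; the idea is that the artifact's schema, not the model's goodwill, is what keeps the summary anchored to the system.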

Curious — do you see this approach scaling well in more dynamic environments like on-call or incident response?

leob • Edited

This is a brilliant way to put it:

"the AI is forced to externalize its thinking instead of keeping everything in a chat loop"

and this:

"It makes the process auditable"

and:

"do you see this approach scaling well in more dynamic environments like on-call or incident response"

Yes I do, in the ways you're hinting at ... this idea of 'artifacts', to broaden the idea: it's all about adding structure to your process, control, accountability, observability, etc ...

People are now starting to realize that the holy grail is not in "prompting harder" (and burning more tokens), you need a more disciplined/structured/intelligent approach ...

Ravi Teja Reddy Mandala

Really appreciate that — you captured it perfectly.

I completely agree, the shift is less about “better prompting” and more about building structured workflows around the AI. The moment you treat outputs as artifacts instead of responses, it changes how you design the entire system.

In my experience, that’s also where reliability starts to improve — you get traceability, iteration, and a clearer feedback loop.

The interesting next step is figuring out how to keep that structure lightweight enough for fast-moving environments like on-call, without slowing engineers down. That balance is where most systems either succeed or fall apart.

Benjamin Nguyen

Hi Ravi,

I was wondering if you want to connect with me on LinkedIn?

Ravi Teja Reddy Mandala

Hi Benjamin,

Absolutely, happy to connect. Here’s my LinkedIn:
linkedin.com/in/ravi-teja-reddy-ma...

Bhavin Sheth

Tried something similar in our setup — summarization was a big win, but prioritization was always off without real context. Feels like AI helps only when your incident workflow is already clean, otherwise it just makes the gaps more obvious.

Ravi Teja Reddy Mandala

100% agree with this. That was exactly my takeaway too.

The agent didn’t fail because it was “bad”; it failed because our workflow had hidden gaps that humans were compensating for without realizing.

Once those gaps showed up, the AI just amplified them instead of fixing them.

Curious, did you end up adding more structured context (like better incident metadata or runbooks) after that?