DEV Community

Ravi Teja Reddy Mandala

What Actually Happens When You Put an AI Agent on Call


Everyone is talking about AI agents.

But very few are actually using them in real production workflows.

And almost no one talks about what really happens when they are put on call.

Today, most agents stick to the safe work:

Writing code.

Reviewing pull requests.

Answering questions.

Automating workflows.

That part is easy.

But production is different.

It is noisy.

It is unpredictable.

And it does not forgive mistakes.

So I kept thinking about one question:

What actually happens when an AI agent becomes part of an on-call workflow?

Not in a demo.

Not in a toy setup.

But in real systems.


Why this matters

Modern production systems generate too much information during an incident.

A single issue can create:

  • alerts from multiple services
  • spikes in logs and metrics
  • duplicate symptom reports
  • confusion around root cause

This is exactly where an AI agent looks useful.

In theory, it can:

  • summarize the incident
  • correlate alerts
  • scan logs
  • suggest likely causes
  • recommend next actions

That sounds great.

But the real value is not in replacing the engineer.

It is in reducing the time spent navigating noise.


Where AI actually helps

After thinking through how an AI agent fits into on-call, I see four areas where it can be genuinely useful.

1. Incident summarization

During an active issue, the first problem is usually information overload.

An AI agent can quickly turn scattered signals into something readable:

  • what changed
  • which services are affected
  • when the issue started
  • what symptoms are most visible

That alone can save valuable time.
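To make that concrete, here is a rough sketch of what "turn scattered signals into something readable" can look like. The alert shape and field names are invented for illustration; real alerts would come from your monitoring stack, and the summary would likely feed an LLM prompt rather than stand alone.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical alert records -- the shape is an assumption, not a real API.
alerts = [
    {"service": "checkout", "symptom": "5xx spike", "at": "2024-05-01T09:02:00"},
    {"service": "payments", "symptom": "latency p99 > 2s", "at": "2024-05-01T09:04:00"},
    {"service": "checkout", "symptom": "5xx spike", "at": "2024-05-01T09:06:00"},
]

def summarize_incident(alerts):
    """Collapse raw alerts into the first facts an engineer needs."""
    by_service = defaultdict(list)
    for a in alerts:
        by_service[a["service"]].append(a["symptom"])
    # Earliest alert timestamp approximates when the issue started.
    started = min(datetime.fromisoformat(a["at"]) for a in alerts)
    return {
        "services_affected": sorted(by_service),
        "started_at": started.isoformat(),
        "top_symptoms": sorted({s for v in by_service.values() for s in v}),
    }

print(summarize_incident(alerts))
```

Even this trivial version answers "which services, since when, showing what" in one glance instead of five dashboards.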


2. Log and alert correlation

A human usually jumps between dashboards, logs, alerts, and deployment history.

An AI agent can act like a first-pass investigator:

  • group similar errors
  • detect repeated patterns
  • connect alerts across services
  • highlight suspicious deployments or config changes

This does not replace debugging.

But it gives the engineer a much better starting point.
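A first-pass investigator does not need to be clever to be useful. One common trick is fingerprinting: strip the volatile parts of log lines (numbers, request ids) so that repeats of the same error group together. The log lines below are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical log lines; in practice these come from your log store.
logs = [
    "timeout calling payments-api after 3000ms (req 8f2a)",
    "timeout calling payments-api after 3001ms (req 91bc)",
    "config reloaded by deploy 4512",
    "timeout calling payments-api after 2998ms (req 77de)",
]

def fingerprint(line):
    """Replace volatile tokens (hex ids, numbers) so repeats group together."""
    line = re.sub(r"\b[0-9a-f]{4,}\b", "<id>", line)
    return re.sub(r"\d+", "<n>", line)

patterns = Counter(fingerprint(l) for l in logs)
# most_common surfaces the dominant repeated pattern first
print(patterns.most_common(1))
```

The dominant pattern ("timeout calling payments-api", seen 3 times) is exactly the starting point you want handed to you.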


3. Runbook guidance

During an incident, even experienced engineers forget details.

An AI agent can help by pulling the most relevant runbook steps:

  • known mitigation paths
  • rollback instructions
  • common checks
  • escalation conditions

This is especially useful when incidents happen outside normal working hours.
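Pulling "the most relevant runbook steps" can start as something as simple as ranking entries by word overlap with the incident description. The runbook titles and steps below are invented; a real system would use embeddings or a proper search index, but the shape of the idea is the same.

```python
# Hypothetical runbook entries -- titles and steps are made up for illustration.
RUNBOOK = {
    "payments latency": ["check upstream bank API status", "roll back last payments deploy"],
    "checkout 5xx": ["check recent config changes", "scale checkout pods"],
    "database failover": ["confirm replica lag", "promote standby"],
}

def relevant_steps(incident_text, runbook=RUNBOOK):
    """Rank runbook entries by word overlap with the incident description."""
    words = set(incident_text.lower().split())
    scored = sorted(
        runbook.items(),
        key=lambda kv: len(words & set(kv[0].split())),
        reverse=True,
    )
    best_title, steps = scored[0]
    return best_title, steps

title, steps = relevant_steps("latency spike in payments after deploy")
print(title, steps)
```

At 3 a.m., being handed "payments latency: check the bank API, consider rolling back" beats searching a wiki.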


4. Post-incident support

AI can also help after the issue is resolved:

  • summarize the timeline
  • draft incident notes
  • organize contributing factors
  • prepare a clean starting point for postmortem review

That reduces operational overhead and improves documentation quality.
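Drafting the timeline is the most mechanical of these tasks, which is exactly why it is a good fit. A sketch, with invented event records standing in for whatever your incident tool exports:

```python
from datetime import datetime

# Hypothetical event records gathered during the incident.
events = [
    {"at": "2024-05-01T09:14:00", "what": "rollback completed, errors recovered"},
    {"at": "2024-05-01T09:02:00", "what": "first 5xx alert fired"},
    {"at": "2024-05-01T09:08:00", "what": "suspect deploy identified"},
]

def draft_timeline(events):
    """Order events chronologically and render a postmortem-ready list."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["at"]))
    return "\n".join(f"- {e['at']}: {e['what']}" for e in ordered)

print(draft_timeline(events))
```

The engineer still writes the analysis; the agent just removes the tedium of assembling the raw timeline.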


Where people get it wrong

The biggest mistake is expecting AI to function like an autonomous incident commander.

That is risky.

Production systems are full of edge cases, hidden dependencies, and partial signals.

An AI agent may sound confident while still being wrong.


A safer way to use it

If you are introducing an AI agent into an on-call workflow, boundaries matter.

  • summarize, don't decide
  • recommend, don't auto-execute
  • explain reasoning before suggesting action
  • stay within approved operational limits
  • escalate to humans for risky changes

This is where AI becomes useful without becoming dangerous.
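Those boundaries can be enforced in code rather than in a prompt. A minimal guardrail sketch (the action names and policy are assumptions for illustration): the agent proposes an action, and a policy layer decides whether it is read-only and safe to surface, or mutating and must go to a human.

```python
# Read-only actions are safe to recommend; mutating actions always escalate.
READ_ONLY = {"summarize", "fetch_logs", "list_recent_deploys"}
RISKY = {"rollback", "restart_service", "change_config"}

def gate(action):
    """Decide how an agent-proposed action should be handled."""
    if action in READ_ONLY:
        return "recommend"          # show to the engineer, no side effects
    if action in RISKY:
        return "escalate_to_human"  # never auto-execute mutating actions
    return "reject"                 # unknown actions are rejected by default

print(gate("fetch_logs"), gate("rollback"), gate("drop_table"))
```

Note the default: anything the policy does not recognize is rejected, not executed. That single choice is most of the safety.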


Final thought

AI in production is not about intelligence.

It is about control, constraints, and safety.

The teams that benefit most will not be the ones that hand over control too early.

They will be the ones that use AI to make engineers faster, clearer, and more informed.


👇 Curious

Would you trust an AI agent to take action in production?

Or should it stay as a recommendation layer?

Top comments (1)

Bhavin Sheth

This matches what I’ve seen too.

AI is great at cutting through noise and giving a quick starting point, but the moment it tries to act, things get risky fast. Production always has edge cases you didn’t expect.

For me, it’s most useful as a “smart assistant” during incidents — summarize, suggest, guide — but final decisions should stay with humans.