DEV Community


I Replaced My On-Call Runbook with AI — Here’s What Happened in Production

Ravi Teja Reddy Mandala on March 12, 2026

Last month I tried something risky. Instead of waking up at 3AM to debug production incidents, I experimented with an AI assistant handling the fi...
Benjamin Nguyen • Edited

Nice! I agree with the direction of your project. AI agents are great for coding these days. Companies should build applications that isolate malicious attacks and contain incidents automatically. We need guardrails around AI. My Sentinel project is an early concept for cybersecurity: an application that detects malicious attacks and runs around the clock.

Ravi Teja Reddy Mandala

Thanks, Benjamin, really appreciate this!

Totally agree on guardrails. That’s actually one of the biggest gaps I noticed early on: without constraints, AI tends to over-suggest or miss critical signals.

Your point about isolating malicious activity is interesting, especially integrating incident triage with security detection. In my setup, I focused more on reliability signals (timeouts, retries, missing observability), but combining that with security signals would make it much more powerful.

Curious, how are you handling false positives in your sentinel project? That’s been one of the tougher challenges on my side.

Benjamin Nguyen

Nice! Yes, I did.

Ravi Teja Reddy Mandala

Nice! Curious to hear what worked well for you vs. where it struggled.

Benjamin Nguyen

The main issue I faced was how quickly Gemini 3 Flash ran out of tokens on a previous project. That hasn’t happened with my current project (Sentinel).

Ravi Teja Reddy Mandala

That’s interesting. I’ve seen similar behavior with token limits depending on context size and prompt patterns.

In my case, breaking workflows into smaller steps and adding retrieval (instead of passing full context each time) helped a lot with token efficiency and consistency.
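The "retrieval instead of full context" idea can be sketched roughly like this. This is a minimal, illustrative example — the function names (`score_chunk`, `top_chunks`) and the crude word-overlap scoring are made up for illustration, not from any specific library; a real setup would use embeddings:

```python
# Sketch of retrieval-based context selection: score stored chunks against
# the current question and send only the top few to the model, instead of
# passing the full history on every call.

def score_chunk(chunk: str, query: str) -> int:
    """Crude relevance score: how many query words appear in the chunk."""
    chunk_words = set(chunk.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in chunk_words)

def top_chunks(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Keep only the k most relevant chunks, not the whole history."""
    ranked = sorted(chunks, key=lambda c: score_chunk(c, query), reverse=True)
    return ranked[:k]

history = [
    "Deploy logs show repeated timeouts on the payments service.",
    "Team lunch is scheduled for Friday at noon.",
    "Retry storm started after the cache nodes restarted.",
]
context = top_chunks(history, "why are payments timing out after the restart")
# `context` now holds only the incident-related chunks; the prompt stays small.
```

The irrelevant chunk never reaches the model, which is where most of the token savings come from.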

Curious, did you change anything in your architecture between the previous project and Sentinel, or do you think it’s mostly model-related?

Benjamin Nguyen

Interesting! I should clarify something about my Arctic AI project from January. I ended up restructuring the system and switching models because Gemini 3 Flash kept running out of tokens for the workload. For that project, I moved back to Gemini 2.5 Flash, which handled the token demands much better. I’ve also heard from other people on Dev.to who ran into token issues with Gemini 3 Flash.

What’s funny is that I never had any issues with my system or with my Sentinel projects when using Gemini 3 Flash—only the Arctic project pushed it past its limits.

Ravi Teja Reddy Mandala

That makes sense, it sounds like your Arctic workload was hitting the upper bounds of context and chaining more aggressively.

I have seen similar patterns where certain use cases, especially long reasoning chains or heavy context stitching, expose limits that do not show up in typical flows like Sentinel-type systems.

In those cases, moving to a hybrid approach with retrieval, tighter prompt windows, and step-wise execution usually stabilizes things much more than just switching models.
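The "tighter prompt windows plus step-wise execution" part can be sketched as a loop where each step carries only a compact rolling state instead of the full transcript. Everything here is hypothetical — `run_step` is a stand-in for a real model call, and the character budget stands in for a token budget; nothing is tied to any Gemini API:

```python
# Step-wise execution with a bounded prompt window: each step receives a
# short rolling state, never the full conversation history.

MAX_STATE_CHARS = 120  # stand-in for a token budget

def run_step(state: str, task: str) -> str:
    """Pretend model call: appends what it did to the running state."""
    return f"{state} | {task}: done"

def compress(state: str) -> str:
    """Keep only the tail of the state so the prompt stays bounded."""
    return state[-MAX_STATE_CHARS:] if len(state) > MAX_STATE_CHARS else state

state = "incident triage started"
for task in ["collect logs", "correlate retries", "draft summary"]:
    state = compress(run_step(state, task))

# The prompt never grows past MAX_STATE_CHARS, regardless of chain length.
```

The point is that long reasoning chains stop scaling linearly in context size, which is usually what exposes the limits in the first place.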

Curious, was the Arctic system doing more multi-step reasoning or large context aggregation compared to Sentinel?

Benjamin Nguyen

Yes, it was! Gemini 3 Flash was pulling information from three to five websites to generate a summary. I had to correct a mistake in the code, and after refreshing, Gemini 3 Flash reported that it had run out of tokens. That’s why I switched back to Gemini 2.5 Flash.

Benjamin Nguyen

I am cautious with all of the new Gemini 3 models.

Ravi Teja Reddy Mandala

Makes sense, that multi-source aggregation can hit token limits pretty quickly.

I’ve seen similar behavior when agents try to compress too much context into a single pass. Breaking it into smaller steps or adding retrieval usually helps stabilize things.
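A rough sketch of why multi-source aggregation blows up a single pass, and the per-source-budget workaround: trim each source to its share of the total budget before the merge step. Budgets are in characters here as a stand-in for tokens, and all numbers and names are illustrative, not from any real setup:

```python
# Per-source budgeting for multi-source summarization: four raw sources far
# exceed the context budget combined, but equal trimmed slices fit in one pass.

TOTAL_BUDGET = 200  # stand-in for a model context limit

def allocate(sources: list[str], total: int) -> list[str]:
    """Give each source an equal slice of the budget and trim to fit."""
    per_source = total // len(sources)
    return [s[:per_source] for s in sources]

sources = ["site A " * 30, "site B " * 30, "site C " * 30, "site D " * 30]
combined_raw = sum(len(s) for s in sources)       # far over budget
trimmed = allocate(sources, TOTAL_BUDGET)
combined_trimmed = sum(len(s) for s in trimmed)   # fits in one pass
```

In practice you'd summarize each source down to its slice rather than truncate it, but the budget arithmetic is the same.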

Curious how Gemini 2.5 Flash is handling that workload for you now 👍

Benjamin Nguyen

Nice! To borrow the expression, it’s night and day with Gemini 2.5 Flash. I never had any issues with that model, and I never ran out of tokens with it.

Ravi Teja Reddy Mandala

That’s great to hear, sounds like a much more stable setup 👍

Yeah, 2.5 Flash seems to handle context and token management much better. Curious if you’re still doing multi-source aggregation the same way, or if the model just handles it more efficiently now.

Benjamin Nguyen

Honestly, I don’t use Gemini 2.5 Flash anymore, but it handled multi-source work better and more efficiently than Gemini 3 Flash.

Ravi Teja Reddy Mandala

Got it, that makes sense 👍

Yeah, I’ve noticed similar trade-offs between the newer models and stability. Always interesting to see how different setups behave in real use cases.

Benjamin Nguyen

yeah!