Hey dev.to!
This is my first post here.
I wanted to share something I built at work: a system I created in a single evening, completely by myself.
I built it because I saw an opportunity and knew this domain really matters in my company.
The Context
Most SRE improvements come with more tooling, more dashboards, and more complexity.
I went the opposite direction.
No new system. No big infra changes. Just a different way of working.
During incidents, we kept asking:
Where is this happening most?
Is it tenant-specific?
Is it region-related?
Is this new or recurring?
The data existed, but the process to get answers was slow and inconsistent.
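To make those four questions concrete, here is a minimal Python sketch of answering them the same way every time from a batch of error events. It is not the system described in this post. The `ErrorEvent` fields, the `triage` helper, the sample data, and the 0.8 concentration threshold are all assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical fields; a real log schema will differ.
    service: str
    tenant: str
    region: str
    signature: str  # e.g. a normalized error message


def top_share(values):
    """Return the most common value and its share of all values."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


def triage(events, known_signatures, threshold=0.8):
    """Answer the four triage questions for a non-empty batch of events."""
    service, _ = top_share([e.service for e in events])
    tenant, tenant_share = top_share([e.tenant for e in events])
    region, region_share = top_share([e.region for e in events])
    signature, _ = top_share([e.signature for e in events])
    return {
        # Where is this happening most?
        "hotspot_service": service,
        # Is it tenant-specific? Only if one tenant dominates the batch.
        "tenant_specific": tenant if tenant_share >= threshold else None,
        # Is it region-related? Only if one region dominates the batch.
        "region_related": region if region_share >= threshold else None,
        # Is this new or recurring? Compare with signatures seen before.
        "recurring": signature in known_signatures,
    }


# Made-up sample data: one tenant, two regions, one known error signature.
events = [
    ErrorEvent("checkout", "acme", "eu-west", "TimeoutError: db"),
    ErrorEvent("checkout", "acme", "us-east", "TimeoutError: db"),
    ErrorEvent("checkout", "acme", "eu-west", "TimeoutError: db"),
    ErrorEvent("search", "acme", "us-east", "TimeoutError: db"),
]
summary = triage(events, known_signatures={"TimeoutError: db"})
print(summary)
```

In this sample, every event comes from one tenant while the regions are split evenly, so the sketch flags the tenant and leaves the region unflagged. The point is only that fixed questions with fixed rules give consistent answers, whoever is on call.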
What I Built
What started as a Markdown file turned into something much bigger:
An AI-powered SRE teammate.
A system that:
- understands our architecture
- queries logs and metrics in real time
- searches past incidents and runbooks
- and investigates production issues end-to-end
Like a senior engineer who's been here since day one, available 24/7.
At a Glance
- ~4 minutes to triage incidents
- End-to-end investigations from a single input
- Zero context switching between tools
- Live correlation between code, logs, and metrics
Full article here: I Cut MTTR to 4 Minutes - My "SRE" Is a 619-Line Markdown File
Why I'm Sharing This
This wasn't meant to be a "big solution".
It was just:
"Let's make on-call a bit less painful"
But it ended up having a real impact.
So I figured it's worth sharing, and also getting feedback.
What I'm Thinking About Next
I want to go deeper in the next posts.
A couple of directions I'm considering:
1. Designing Deterministic Skills & Agents
How I built skills and agents that behave predictably,
so you can test, extend, and evolve them without breaking things.
- test at different layers
- extend with confidence
- avoid hidden regressions
2. New Ideas for Agents
Less about hype, more about:
- practical use cases
- where agents actually help
- and some practical methods I've found effective
Would Love Your Input
If any of this sounds interesting, let me know:
- What would you want me to dive into next?
- Have you tried something similar?
- Do your on-call shifts feel harder than they should be?
Thanks for reading!