The first 10–20 minutes after a deploy used to be the most stressful part of my day. Not because I lacked monitoring, but because I had too many alerts and no clear decision.
So I built a simple workflow: a 15-minute decision window.
## The idea
After each deploy, evaluate at three checkpoints:

- 5 minutes
- 10 minutes
- 15 minutes

Then output a single decision:

**Proceed / Review / Rollback**
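The loop above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the tool's actual logic: the `decide` function, the 0–1 risk scores, and the thresholds are all assumptions made up for the example.

```python
# Illustrative sketch of the 15-minute decision window.
# Checkpoint risk scores (0..1) and thresholds are invented for the example.

CHECKPOINTS = (5, 10, 15)  # minutes after deploy

def decide(scores: dict) -> str:
    """Collapse per-checkpoint risk scores into one decision."""
    worst = max(scores.values())
    # Risk rising across every window (5m -> 10m -> 15m) is treated as a trend.
    trend_up = scores[5] < scores[10] < scores[15]
    if worst >= 0.8 or (trend_up and worst >= 0.5):
        return "Rollback"
    if worst >= 0.4:
        return "Review"
    return "Proceed"
```

For example, `decide({5: 0.1, 10: 0.1, 15: 0.2})` returns `"Proceed"`, while a steadily worsening `{5: 0.3, 10: 0.5, 15: 0.7}` returns `"Rollback"` even though no single window crossed the hard limit.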
## What signals matter (not raw noise)
Instead of “error count went up”, I track deploy-relevant buckets:
- Regressions (existing errors got worse)
- New error types (new fingerprints)
- Severity-weighted growth (critical > warning)
- Concentration shift (one service dominating failures)
- Trend across windows (5m → 10m → 15m)
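Two of these buckets are easy to make concrete. The sketch below is a hypothetical implementation: the severity weights, function names, and input shapes are assumptions, not the real pipeline.

```python
# Illustrative sketch of two deploy-relevant buckets.
# Severity weights are invented; tune them to your own alerting policy.

SEVERITY_WEIGHT = {"critical": 3.0, "warning": 1.0, "info": 0.2}

def severity_weighted_growth(baseline: dict, window: dict) -> float:
    """Relative growth of severity-weighted error volume vs. the pre-deploy baseline."""
    def weigh(counts):
        return sum(SEVERITY_WEIGHT.get(sev, 1.0) * n for sev, n in counts.items())
    base = weigh(baseline)
    if base == 0:
        # No baseline at all: any post-deploy errors count as full growth.
        return float(weigh(window) > 0)
    return (weigh(window) - base) / base

def new_fingerprints(baseline: set, window: set) -> set:
    """Error fingerprints first seen after the deploy (the 'new error types' bucket)."""
    return window - baseline
```

Note how one new critical error moves `severity_weighted_growth` three times as far as one new warning, which is the whole point of weighting by severity instead of counting raw errors.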
## Confidence is the anti-spam lever
Confidence isn’t “how accurate the score is”; it’s a measure of data quality:

- Is the baseline volume large enough?
- Are there enough samples in each window?
- Is coverage only partial?
- Is overall volume too low to trust at all?

If confidence is LOW, don’t shout. Say “not enough evidence”.
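As a sketch, the gate can be a plain function in front of the alert. Everything here is hypothetical: the event-count and coverage thresholds are placeholders you would calibrate against your own traffic.

```python
# Illustrative confidence gate: confidence reflects data quality, not score accuracy.
# All thresholds are placeholders, not recommendations.

def confidence(baseline_events: int, window_events: int, coverage: float) -> str:
    """LOW means there isn't enough data to trust any score."""
    if baseline_events < 50 or window_events < 10 or coverage < 0.5:
        return "LOW"
    if baseline_events < 500 or coverage < 0.9:
        return "MEDIUM"
    return "HIGH"

def report(score: float, conf: str) -> str:
    # The anti-spam rule: LOW confidence never alerts, it admits uncertainty.
    if conf == "LOW":
        return "not enough evidence"
    return f"risk={score:.2f} (confidence {conf})"
```

The key design choice is that `report` refuses to emit a risk number at LOW confidence, so a quiet canary or a partially instrumented service never generates a scary-looking score.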
## Question
How do you handle the first 15 minutes after deploy without drowning in alerts?
(If you want context, here’s the tool I’m building around this workflow: https://app.relivio.dev)