Samson Tanimawo

The Reliability Roadmap: A 90-Day Plan for New SRE Teams

New SRE team at your company? Here's a 90-day plan I've used twice. It works because it balances 'show immediate value' with 'build for the long term.'

Days 1-14: Observe

Resist the urge to change things. Watch the current system, read existing post-mortems, shadow on-call, talk to engineers about their pain.

Output: a list of the top 5 reliability problems ranked by 'engineering time lost per week.'
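One way to build that ranking is sketched below. It assumes you estimate, for each problem, how many engineers it hits and roughly how many hours each of them loses per week; the problem names and numbers are made up for illustration, not taken from a real team.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """A reliability problem noted during the observation phase (hypothetical fields)."""
    name: str
    engineers_affected: int
    hours_per_engineer_per_week: float

    @property
    def hours_lost_per_week(self) -> float:
        # Total engineering time lost per week across everyone affected.
        return self.engineers_affected * self.hours_per_engineer_per_week


def top_problems(problems: list[Problem], n: int = 5) -> list[Problem]:
    """Return the n problems costing the most engineering time per week."""
    return sorted(problems, key=lambda p: p.hours_lost_per_week, reverse=True)[:n]


# Illustrative estimates gathered from shadowing on-call and interviews.
problems = [
    Problem("Flaky disk-space alert", 6, 1.5),
    Problem("Manual cert rotation", 2, 3.0),
    Problem("Broken latency dashboard", 10, 0.5),
    Problem("Slow CI on main", 20, 1.0),
    Problem("Noisy deploy notifications", 8, 0.25),
    Problem("Unowned cron failures", 3, 1.0),
]

ranked = top_problems(problems)
for rank, p in enumerate(ranked, start=1):
    print(f"{rank}. {p.name}: {p.hours_lost_per_week:.1f} h/week")
```

The estimates will be rough, and that is fine: the point is a defensible ordering, not precise accounting.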

Days 15-30: Quick wins

Pick the top 2 from your list. Fix them. Make the fixes visible: announce them at the eng all-hands.

Good quick wins: delete a flaky alert, automate a repetitive runbook, fix a broken dashboard everyone complains about.

Bad quick wins: rewrite the deployment pipeline. It's too big and would eat the whole 90 days on its own.

Output: visible reliability improvements + trust from engineering teams.

Days 31-60: Foundations

Now use your trust. Introduce the boring stuff:

  • Define SLOs for the top 3 critical services
  • Set up an error budget dashboard
  • Establish a weekly reliability review (10 minutes, not an hour)
  • Write an incident response runbook template

Output: measurable reliability targets that engineering can rally around.
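For the error budget dashboard, the underlying arithmetic is small. Here is a minimal sketch, assuming a request-based availability SLO (good requests divided by total requests over a window); the function name, the 99.9% target, and the request counts are illustrative assumptions, and your SLOs may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_failures: float   # failures the SLO tolerates in this window
    budget_consumed: float    # fraction of the budget used (1.0 = fully spent)
    budget_remaining: float   # fraction of the budget left (negative = overspent)


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> ErrorBudget:
    """Compute error budget usage for a request-based availability SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the share of requests the SLO allows to fail.
    allowed = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed
    return ErrorBudget(allowed, consumed, 1.0 - consumed)


# Example: a 99.9% SLO over 1,000,000 requests with 250 failures.
budget = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"Allowed failures: {budget.allowed_failures:.0f}")
print(f"Budget consumed:  {budget.budget_consumed:.0%}")
print(f"Budget remaining: {budget.budget_remaining:.0%}")
```

With these numbers the SLO tolerates about 1,000 failed requests, so 250 failures uses roughly a quarter of the budget. That single percentage is what the weekly review can look at in 10 minutes.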

Days 61-90: Programmatic change

Start turning the reliability work into ongoing programs:

  • Post-mortem process with action-item tracking
  • Monthly toil survey (what did engineers do this month that could've been automated?)
  • Quarterly reliability review with leadership
  • A clear hand-off process: when does reliability work become product engineering work?

Output: processes that continue working when you take a vacation.
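Action-item tracking does not need a tool purchase to start. As a rough sketch, assuming each post-mortem produces items with an owner and a due date, something like the following is enough to surface what is overdue before each review; the field names and incident IDs are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    """One follow-up from a post-mortem (hypothetical fields)."""
    incident_id: str
    description: str
    owner: str
    due: date
    closed_on: Optional[date] = None  # None means still open


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return open action items whose due date has passed, oldest first."""
    late = [i for i in items if i.closed_on is None and i.due < today]
    return sorted(late, key=lambda i: i.due)


items = [
    ActionItem("INC-101", "Add retry to payment client", "alice", date(2024, 3, 1)),
    ActionItem("INC-101", "Alert on queue depth", "bob", date(2024, 3, 15), closed_on=date(2024, 3, 10)),
    ActionItem("INC-102", "Document failover steps", "carol", date(2024, 2, 20)),
    ActionItem("INC-103", "Raise connection pool limit", "dave", date(2024, 4, 30)),
]

for item in overdue(items, today=date(2024, 4, 1)):
    print(f"{item.incident_id}: {item.description} (owner: {item.owner}, due {item.due})")
```

Whatever form the tracker takes, the property that matters is that an overdue list exists without anyone on the SRE team having to assemble it by hand.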

The trap

'This is fine, we can do all this in the first week.' You cannot. Every team I've seen try it ended up burned out or resented. 90 days is the minimum. More is normal.

The real goal

At day 91, the engineering team should be able to describe your team's value in one sentence. If they can't, you spent 90 days on the wrong things.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com