DEV Community

Samson Tanimawo

The On-Call Schedule Math Nobody Does

Most on-call schedules are designed in a Slack thread, in 20 minutes, by whoever drew the short straw. Then the team lives with it for years. The math is almost never done, and the result is the same in every company: a few engineers burn out and quit, and management is surprised.

Here's the math, in the order it matters.

Page volume per engineer per week

Add up every page your team got last quarter. Divide by 13 (weeks per quarter). Divide again by the number of engineers in the rotation.

If the answer is more than 3 pages per engineer per week, you have a burnout problem. The exact threshold matters less than the trajectory: if the number is growing quarter over quarter, your team is going to lose people.

This is the only metric that matters. Everything else (alert ratio, MTTR, false positive rate) is a contributing factor to this number.
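
Here is that arithmetic as a minimal Python sketch. The function names, the 6-person team, and the quarterly page totals are made up for illustration; only the divide-by-13 step and the 3-page threshold come from the text above.

```python
WEEKS_PER_QUARTER = 13
BURNOUT_THRESHOLD = 3.0  # pages per engineer per week, from the text above


def pages_per_engineer_per_week(total_pages: int, engineers: int,
                                weeks: int = WEEKS_PER_QUARTER) -> float:
    """Quarterly page count divided by weeks, then by rotation size."""
    if engineers <= 0 or weeks <= 0:
        raise ValueError("engineers and weeks must be positive")
    return total_pages / weeks / engineers


def is_growing(quarterly_loads: list[float]) -> bool:
    """True if each quarter's load is higher than the one before it."""
    return len(quarterly_loads) >= 2 and all(
        later > earlier
        for earlier, later in zip(quarterly_loads, quarterly_loads[1:])
    )


# Hypothetical team: 6 engineers, page totals for the last three quarters.
loads = [pages_per_engineer_per_week(p, engineers=6) for p in (156, 195, 273)]

print(loads)                          # [2.0, 2.5, 3.5]
print(is_growing(loads))              # True
print(loads[-1] > BURNOUT_THRESHOLD)  # True
```

In this made-up example the team only crosses the threshold in the latest quarter, but the upward trend was visible two quarters earlier.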

Off-hours fraction

What fraction of pages land off-hours (say, between 8 PM and 8 AM local time)? If it's more than 30%, your on-call is significantly worse than the day shift, and your compensation should reflect that.

Most teams don't compensate for night pages at all. This is fine when night pages are rare. When they're 40% of all pages, you've quietly converted on-call into a second job with no extra pay.
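
If you can export page timestamps, the fraction takes a few lines to compute. This sketch assumes the timestamps are already in the responder's local time and uses the 8 PM to 8 AM window from above. It does not treat weekends as off-hours, which you may want to add. The sample pages are invented.

```python
from datetime import datetime, time

OFF_HOURS_START = time(20, 0)  # 8 PM local
OFF_HOURS_END = time(8, 0)     # 8 AM local


def is_off_hours(ts: datetime) -> bool:
    """True if the local timestamp falls in the overnight 8 PM to 8 AM window."""
    t = ts.time()
    # The window wraps past midnight, so it is "late enough OR early enough".
    return t >= OFF_HOURS_START or t < OFF_HOURS_END


def off_hours_fraction(page_times: list[datetime]) -> float:
    """Share of pages that fired during off-hours (0.0 if there were none)."""
    if not page_times:
        return 0.0
    return sum(is_off_hours(ts) for ts in page_times) / len(page_times)


# Hypothetical sample of five pages.
pages = [
    datetime(2024, 3, 4, 2, 15),   # off-hours
    datetime(2024, 3, 4, 10, 30),
    datetime(2024, 3, 5, 14, 0),
    datetime(2024, 3, 5, 22, 45),  # off-hours
    datetime(2024, 3, 6, 9, 5),
]
fraction = off_hours_fraction(pages)
print(fraction)  # 0.4
```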

Rotation length

The default 7-day rotation is wrong for most teams. Here's the trade-off:

  • Short rotations (1-3 days): hard to context-switch into on-call mode, but the load is bounded.
  • Long rotations (7+ days): easier to settle in, but if it's a bad week you suffer for 7 days straight.

For a team with high page volume, shorter rotations are kinder. For a team with low page volume, longer rotations have less context-switch overhead. The crossover point is roughly 1 page per day. If you're paging more than that, switch to shorter rotations.
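
As a rough sketch of that rule of thumb: the 1-page-per-day crossover comes from the paragraph above, while the function names and the sample numbers are assumptions for illustration. The second function shows what "the load is bounded" means in practice, as the expected number of pages one person absorbs in a single shift.

```python
CROSSOVER_PAGES_PER_DAY = 1.0  # rough crossover point from the text above


def daily_page_rate(total_pages: int, days: int) -> float:
    """Average pages per day hitting whoever is on call."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_pages / days


def suggest_rotation(rate: float) -> str:
    """Shorter shifts above the crossover, longer shifts at or below it."""
    if rate > CROSSOVER_PAGES_PER_DAY:
        return "short (1-3 days)"
    return "long (7+ days)"


def expected_pages_per_shift(rate: float, shift_days: int) -> float:
    """Average load one engineer carries over a whole shift."""
    return rate * shift_days


# Hypothetical quarter: 182 pages over 91 days.
rate = daily_page_rate(182, 91)
print(rate)                               # 2.0
print(suggest_rotation(rate))             # short (1-3 days)
print(expected_pages_per_shift(rate, 7))  # 14.0
print(expected_pages_per_shift(rate, 2))  # 4.0
```

At two pages a day, a 7-day shift means roughly 14 pages for one person before they get a break. A 2-day shift caps that at about 4.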

Coverage gaps

Holidays, conferences, vacations. Most schedules silently break during these. Run through your next 90 days, marking every day where coverage is at risk. Fix the gaps before they become incidents.
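
The 90-day walk-through can be scripted if your schedule and time-off calendar are exportable. This is a sketch under assumed data shapes: `schedule` maps each date to the engineer on call, and `unavailable` maps each engineer to a list of (first day, last day) ranges. Both structures and the names in them are hypothetical, not any scheduling tool's real format.

```python
from datetime import date, timedelta


def coverage_gaps(schedule: dict[date, str],
                  unavailable: dict[str, list[tuple[date, date]]],
                  start: date,
                  days: int = 90) -> list[tuple[date, str]]:
    """Return (day, reason) for every at-risk day in the window."""
    gaps = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        engineer = schedule.get(day)
        if engineer is None:
            gaps.append((day, "nobody scheduled"))
        elif any(first <= day <= last
                 for first, last in unavailable.get(engineer, [])):
            gaps.append((day, f"{engineer} is away"))
    return gaps


# Hypothetical holiday week, checked over a 4-day window to keep it short.
schedule = {
    date(2024, 12, 23): "alice",
    date(2024, 12, 24): "alice",
    date(2024, 12, 25): "bob",
    # Dec 26 was never assigned.
}
unavailable = {"alice": [(date(2024, 12, 24), date(2024, 12, 26))]}

gaps = coverage_gaps(schedule, unavailable, start=date(2024, 12, 23), days=4)
for day, reason in gaps:
    print(day.isoformat(), reason)
# 2024-12-24 alice is away
# 2024-12-26 nobody scheduled
```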

Bonus check: when an engineer leaves the company, do you have at least three people who can cover their on-call role? If not, that's a single point of failure waiting to bite you.

The kindness moves

Most teams don't do these. Most should:

  • No on-call the week before vacation. The engineer is already half-checked-out.
  • No on-call the week after a major release if you led it. You've already done enough.
  • On-call buddies for juniors. A senior is "shadow on-call" available for escalation but not primary. The junior learns by doing.
  • Pager-free Friday afternoon. Either nobody's on-call after 3 PM Friday, or it's a dedicated junior shift. Weekends will arrive in time.

The math is the floor. The kindness is what makes the math sustainable.
