The Rotation Nobody Wants
Our on-call rotation was a spreadsheet. Updated manually. Someone always got scheduled during their vacation. Two people occasionally got double-booked. Holidays were a battleground.
Designing Fair Rotations
Principle 1: Equal Burden Distribution
Track total on-call hours, not just shift count:
def calculate_oncall_burden(engineer, period_days=90):
    # get_shifts() is the scheduler's own data-access helper (not shown here)
    shifts = get_shifts(engineer, period_days)
    return {
        'total_hours': sum(s.duration_hours for s in shifts),
        'weekend_hours': sum(s.duration_hours for s in shifts if s.is_weekend),
        'holiday_hours': sum(s.duration_hours for s in shifts if s.is_holiday),
        'night_hours': sum(s.duration_hours for s in shifts if s.is_night),
        'pages_received': sum(s.page_count for s in shifts),
        'burden_score': calculate_weighted_score(shifts),
    }
def calculate_weighted_score(shifts):
    """Weight different types of on-call differently."""
    score = 0
    for s in shifts:
        base = s.duration_hours
        # Multipliers compound: a weekend holiday night counts 1.5 * 2.0 * 1.3 = 3.9x
        if s.is_weekend:
            base *= 1.5
        if s.is_holiday:
            base *= 2.0
        if s.is_night:
            base *= 1.3
        score += base
    return round(score, 1)
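To see how the weights play out, here is a self-contained sketch. It assumes a hypothetical `Shift` dataclass that carries the same fields the functions above read (`duration_hours`, `is_weekend`, `is_holiday`, `is_night`, `page_count`). The multipliers are copied from `calculate_weighted_score`, and `weighted_hours` and `burden_score` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Shift:
    """Hypothetical shift record with the fields the burden functions read."""
    duration_hours: float
    is_weekend: bool = False
    is_holiday: bool = False
    is_night: bool = False
    page_count: int = 0

# Same multipliers as calculate_weighted_score
WEEKEND, HOLIDAY, NIGHT = 1.5, 2.0, 1.3

def weighted_hours(shift: Shift) -> float:
    """Hours for one shift with every applicable multiplier applied (they compound)."""
    base = shift.duration_hours
    if shift.is_weekend:
        base *= WEEKEND
    if shift.is_holiday:
        base *= HOLIDAY
    if shift.is_night:
        base *= NIGHT
    return base

def burden_score(shifts: list[Shift]) -> float:
    return round(sum(weighted_hours(s) for s in shifts), 1)

# Same raw hours (24 each), very different burden
alice = [Shift(12), Shift(12)]  # weekday daytime
bob = [Shift(12, is_weekend=True, is_night=True), Shift(12, is_weekend=True)]
print(burden_score(alice))  # 24.0
print(burden_score(bob))    # 41.4
```

Both engineers log 24 hours, but Bob's weekend and night hours give him a score roughly 1.7x higher. A shift-count metric would miss that difference completely.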
Principle 2: Respect Preferences
onCall_preferences:
  alice:
    blackout_dates: ["2024-03-25", "2024-04-01:2024-04-05"]  # Vacation
    preferred_days: ["Mon", "Tue", "Wed"]  # Family on weekends
    max_consecutive_days: 3
  bob:
    blackout_dates: ["2024-04-10"]
    preferred_days: ["any"]
    max_consecutive_days: 7
    prefers_weekends: true  # Weekend differential pay
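A scheduler might consume this config as a per-day check. The sketch below makes several assumptions:

- The preferences are already loaded into a dict (for example by a YAML parser) with the shape shown above.
- `"start:end"` in `blackout_dates` means an inclusive date range.
- Blackouts are hard constraints and `preferred_days` is a soft one.

`parse_blackouts`, `is_available`, and `prefers` are hypothetical names. `max_consecutive_days` depends on the existing schedule, so the sketch leaves it out.

```python
from datetime import date

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # index = date.weekday()

def parse_blackouts(entries: list[str]) -> list[tuple[date, date]]:
    """Turn "YYYY-MM-DD" or "YYYY-MM-DD:YYYY-MM-DD" into inclusive (start, end) ranges."""
    ranges = []
    for entry in entries:
        start, _, end = entry.partition(":")
        ranges.append((date.fromisoformat(start), date.fromisoformat(end or start)))
    return ranges

def is_available(prefs: dict, day: date) -> bool:
    """Hard constraint: False if `day` falls inside any blackout range."""
    for start, end in parse_blackouts(prefs.get("blackout_dates", [])):
        if start <= day <= end:
            return False
    return True

def prefers(prefs: dict, day: date) -> bool:
    """Soft constraint: True if `day` is a preferred weekday (or prefs say "any")."""
    preferred = prefs.get("preferred_days", ["any"])
    return "any" in preferred or DAY_NAMES[day.weekday()] in preferred

alice = {
    "blackout_dates": ["2024-03-25", "2024-04-01:2024-04-05"],
    "preferred_days": ["Mon", "Tue", "Wed"],
    "max_consecutive_days": 3,
}
print(is_available(alice, date(2024, 4, 3)))  # False (inside the vacation range)
print(prefers(alice, date(2024, 4, 6)))       # False (a Saturday)
```

Keeping the two checks separate lets a scheduler drop blacked-out engineers outright and use `prefers` only to rank the remaining candidates.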
Principle 3: Minimum Pool Size
The math on sustainable rotations:
| Pool size | Frequency | Burnout risk |
|---|---|---|
| 3 people | 1 week on / 2 off | High (unsustainable) |
| 4 people | 1 week on / 3 off | Medium (barely okay) |
| 5 people | 1 week on / 4 off | Low (comfortable) |
| 6+ people | 1 week on / 5+ off | Minimal (ideal) |
Rule: Minimum 5 people per rotation.
If you have fewer, reduce on-call scope or hire.
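The frequencies in the table come from simple arithmetic. In a one-week rotation with n people, each person is on call 1/n of the time, which works out to about 52/n weeks a year. The only assumption in this sketch is a 52-week year.

```python
def weeks_on_call_per_year(pool_size: int, weeks_per_year: int = 52) -> float:
    """Weeks each engineer spends on call per year in a 1-week-on rotation."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return weeks_per_year / pool_size

for n in (3, 4, 5, 6):
    print(n, round(weeks_on_call_per_year(n), 1), f"{1 / n:.0%} of the year")
```

With 3 people, each engineer carries roughly 17 weeks of on-call a year, about one week in three. With 5 people, that drops to about 10 weeks.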
Principle 4: Escalation Tiers
escalation_chain:
  tier_1:  # Primary on-call
    response_time: 5 minutes
    scope: all pages
  tier_2:  # Secondary on-call (backup)
    response_time: 15 minutes
    scope: escalated or unacknowledged
  tier_3:  # Engineering manager
    response_time: 30 minutes
    scope: P1 only or when both T1+T2 unavailable
  tier_4:  # CTO/VP Engineering
    response_time: 60 minutes
    scope: Extended P1 (>1 hour), customer escalation
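The chain implies a timing rule for unacknowledged pages: page the next tier once the current tier misses its response window. The sketch below makes two assumptions:

- Each tier gets its full window before escalation, so tier 2 is paged at 5 minutes and the engineering manager at 20.
- "Both T1+T2 unavailable" means both missed their windows.

Tier 4 is triggered by incident duration rather than acknowledgment, so the sketch leaves it out. `CHAIN` and `tiers_paged` are hypothetical names.

```python
# (tier name, response window in minutes), copied from tier_1..tier_3 above
CHAIN = [
    ("primary", 5),
    ("secondary", 15),
    ("engineering_manager", 30),
]

def tiers_paged(minutes_unacked: float) -> list[str]:
    """Tiers that should have been paged after `minutes_unacked` minutes with no ack.

    Assumes each tier gets its full response window before the next tier is
    paged, so escalation happens at 0, 5 and 20 minutes.
    """
    paged = []
    escalate_at = 0
    for name, window in CHAIN:
        if minutes_unacked < escalate_at:
            break
        paged.append(name)
        escalate_at += window
    return paged
```

For example, `tiers_paged(12)` returns `['primary', 'secondary']`. After 20 minutes without an acknowledgment, the engineering manager is added.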
The Override System
Life happens. Make swaps easy:
def request_swap(requesting_engineer, target_date, volunteer=None):
    """Allow easy on-call swaps."""
    # execute_swap, notify_team and post_to_channel are the scheduler's own helpers (not shown)
    if volunteer:
        # Direct swap: Alice asks Bob to cover
        execute_swap(requesting_engineer, volunteer, target_date)
        notify_team(f"{requesting_engineer} swapped with {volunteer} for {target_date}")
    else:
        # Open request: Alice needs coverage, anyone can take it
        post_to_channel(
            f"{requesting_engineer} needs coverage for {target_date}. "
            "Reply to volunteer. Comp: standard on-call rate."
        )
# Key: NO manager approval needed for swaps
# This reduces friction dramatically
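`execute_swap` is called above but not defined. Here is a minimal sketch, assuming the schedule is a plain dict that maps each date to the engineer on call. The guard clauses are illustrative choices, not the original implementation. The function returns a new dict instead of mutating the old one, so a failed swap leaves the schedule untouched.

```python
from datetime import date

def execute_swap(schedule: dict[date, str], requester: str, volunteer: str,
                 day: date) -> dict[date, str]:
    """Hypothetical in-memory swap: hand `day` from requester to volunteer.

    Raises ValueError if the requester isn't on call that day or tries to
    swap with themselves. Returns a new schedule; the input is not modified.
    """
    if schedule.get(day) != requester:
        raise ValueError(f"{requester} is not on call on {day}")
    if volunteer == requester:
        raise ValueError("volunteer must be a different engineer")
    updated = dict(schedule)
    updated[day] = volunteer
    return updated

schedule = {date(2024, 4, 10): "alice", date(2024, 4, 11): "bob"}
schedule = execute_swap(schedule, "alice", "bob", date(2024, 4, 10))
print(schedule[date(2024, 4, 10)])  # bob
```

A fuller version would also check the volunteer against their own blackout dates. The only gate here is whether the requester actually holds the shift, in keeping with the no-approval rule.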
Holiday Fairness
The holiday rotation is separate and tracked year-over-year:
holidays_2024 = [
    'New Years', 'MLK Day', 'Presidents Day', 'Memorial Day',
    'July 4th', 'Labor Day', 'Thanksgiving', 'Christmas'
]
def assign_holidays(team, year):
    # Get historical holiday assignments: {engineer: holidays covered}
    # get_holiday_history() is the scheduler's own helper (not shown)
    history = get_holiday_history(team, years=3)
    # Sort by who has covered the FEWEST holidays recently
    sorted_team = sorted(team, key=lambda e: history.get(e, 0))
    assignments = {}
    # Deal holidays out round-robin, starting with the least-burdened engineer
    for i, holiday in enumerate(holidays_2024):
        engineer = sorted_team[i % len(sorted_team)]
        assignments[holiday] = engineer
    return assignments
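Round-robin spreads one year's holidays evenly, but it never lets someone with a large deficit catch up. Take one engineer with 0 holidays covered and two with 5 each: round-robin gives each of them one holiday. A greedy variant, which substitutes for the approach above rather than reproducing it, recomputes running totals after each assignment and always picks the current minimum. This sketch takes the history as a plain dict instead of calling `get_holiday_history`. `assign_holidays_balanced` is a hypothetical name.

```python
def assign_holidays_balanced(team: list[str], holidays: list[str],
                             history: dict[str, int]) -> dict[str, str]:
    """Greedy: give each holiday to whoever has the fewest holidays so far.

    `history` maps engineer -> holidays covered in the lookback window.
    Ties go to whoever appears first in `team` (min() returns the first minimum).
    """
    counts = {e: history.get(e, 0) for e in team}
    assignments = {}
    for holiday in holidays:
        engineer = min(team, key=lambda e: counts[e])
        assignments[holiday] = engineer
        counts[engineer] += 1  # update running total before the next pick
    return assignments

team = ["alice", "bob", "carol"]
print(assign_holidays_balanced(team, ["New Years", "July 4th", "Thanksgiving"],
                               {"alice": 0, "bob": 5, "carol": 5}))
```

In this example, Alice covers all three holidays and the 3-year totals even out at 3, 5, 5 instead of 1, 6, 6. If that concentrates too much in a single year, a per-year cap per engineer is a simple addition.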
Metrics We Track
| Metric | Target | Current |
|---|---|---|
| Burden score variance | < 15% | 8% |
| Swap request fulfillment | > 95% | 98% |
| Pages per shift (average) | < 3 | 1.8 |
| NPS for on-call experience | > 0 | +32 |
| Holiday coverage fairness | < 1 shift variance | 0.5 |
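The table does not define "burden score variance". Because it is given as a percentage, one plausible reading is the coefficient of variation, meaning the standard deviation divided by the mean, of per-engineer burden scores. The sketch below computes that under this assumption. It uses the population standard deviation because the whole team is the population. `burden_cv` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev

def burden_cv(scores: dict[str, float]) -> float:
    """Coefficient of variation of burden scores, as a percentage.

    Assumption: 'burden score variance' in the table means stdev / mean.
    """
    values = list(scores.values())
    avg = mean(values)
    if avg == 0:
        return 0.0
    return 100 * pstdev(values) / avg

team = {"alice": 100.0, "bob": 110.0, "carol": 90.0}
print(f"{burden_cv(team):.1f}%")  # 8.2%
```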
If you want AI-powered on-call scheduling that optimizes for fairness automatically, check out what we're building at Nova AI Ops.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com