The Post-Mortem Nobody Learns From
I've sat through hundreds of post-mortems. Most follow the same pattern: something breaks, someone writes a Google Doc, we hold a meeting and list action items, nobody follows up, and the same thing happens again three months later.
Here's how to break the cycle.
The Blameless Culture Trap
"Blameless" doesn't mean "actionless." The biggest failure mode I see is teams that use blameless culture as an excuse to avoid accountability.
Blameless means: we don't punish the person who pushed the bad deploy.
Blameless does NOT mean: nobody is responsible for fixing the systemic issue.
My Post-Mortem Template
# Incident: [SERVICE] [SYMPTOM] on [DATE]
## Impact
- Duration: X minutes
- Users affected: N
- Revenue impact: $X
- SLO budget consumed: X%
## Timeline (UTC)
- HH:MM - First alert fired
- HH:MM - On-call acknowledged
- HH:MM - Root cause identified
- HH:MM - Fix deployed
- HH:MM - Service recovered
- HH:MM - All-clear declared
## Root Cause
[2-3 sentences. Technical but readable.]
## Contributing Factors
1. [Factor that made the incident possible]
2. [Factor that made detection slow]
3. [Factor that made resolution slow]
## What Went Well
- [Something that worked]
- [Something that helped]
## What Went Wrong
- [Process failure]
- [Technical gap]
## Action Items
| Action | Owner | Priority | Due Date | Status |
|--------|-------|----------|----------|--------|
| ... | ... | P1/P2/P3 | ... | Open |
## Lessons Learned
[1-2 paragraphs of genuine insight]
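The "SLO budget consumed" line in the template is the one people tend to guess at. Here is a minimal sketch of the arithmetic, assuming an availability SLO and treating the incident as a full outage. The 99.9% target, the 30-day window, and the function name are illustrative assumptions, not values from any particular team.

```python
def slo_budget_consumed(downtime_minutes: float,
                        slo_target: float = 0.999,
                        window_days: int = 30) -> float:
    """Return the percentage of the error budget used by one incident.

    Assumes a full outage for the whole duration; a partial outage
    would need the downtime weighted by the fraction of failed requests.
    """
    window_minutes = window_days * 24 * 60
    # The error budget is the share of the window allowed to be "bad".
    budget_minutes = window_minutes * (1 - slo_target)
    return 100 * downtime_minutes / budget_minutes
```

With these assumed defaults the budget is about 43.2 minutes per window, so a 21.6-minute full outage consumes roughly half of it.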
The Action Item Problem
Action items from post-mortems have a 30% completion rate industry-wide. That's terrible. Here's why:
- Too many items (I've seen post-mortems with 15 action items)
- No clear ownership
- No deadline
- No follow-up mechanism
- Competing with feature work
The Fix: Three Rules
Rule 1: Maximum 3 action items per post-mortem.
If you can't narrow it to 3, you haven't identified the real problems.
Rule 2: Every action item gets a Jira ticket scheduled for the next sprint.
Not "someday." Not "backlog." Next sprint. If it's not important enough for next sprint, it's not an action item.
Rule 3: Review completion in the next post-mortem.
Start every post-mortem meeting by reviewing open action items from previous incidents. This creates accountability without blame.
# Post-mortem meeting agenda
1. Review open action items (10 min)
   - Incident #42: "Add circuit breaker" - DONE
   - Incident #43: "Add canary deploys" - IN PROGRESS (blocked on CI)
   - Incident #44: "Fix retry logic" - NOT STARTED (reassigning)
2. Current incident review (30 min)
   - Timeline walkthrough
   - Contributing factors
   - Action items (max 3)
3. Pattern analysis (10 min)
   - Any recurring themes?
   - Systemic issues to address?
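The first two rules are simple enough to check mechanically before a post-mortem is marked complete. This is a rough sketch, assuming a hypothetical `ActionItem` record. The field names and the `check_action_items` helper are made up for illustration and do not correspond to any ticketing system's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MAX_ACTION_ITEMS = 3  # Rule 1


@dataclass
class ActionItem:
    title: str
    owner: Optional[str] = None
    due: Optional[date] = None
    ticket: Optional[str] = None  # e.g. a Jira key (hypothetical field)


def check_action_items(items: List[ActionItem],
                       next_sprint_end: date) -> List[str]:
    """Return a list of rule violations; an empty list means the items pass."""
    problems = []
    if len(items) > MAX_ACTION_ITEMS:
        problems.append(
            f"{len(items)} action items; maximum is {MAX_ACTION_ITEMS}")
    for item in items:
        if not item.owner:
            problems.append(f"'{item.title}' has no owner")
        if not item.ticket:
            problems.append(f"'{item.title}' has no ticket")
        # Rule 2: the work must land in the next sprint, not "someday".
        if item.due is None:
            problems.append(f"'{item.title}' has no due date")
        elif item.due > next_sprint_end:
            problems.append(f"'{item.title}' is due after the next sprint")
    return problems
```

Rule 3 stays a human step: the output of a check like this is something to read aloud at the start of the next meeting, not a replacement for the review.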
The Metric That Matters
Track your Repeat Incident Rate: the percentage of incidents that share a root cause with a previous incident.
When we started tracking this, our repeat rate was 45%. After implementing the three rules above, it dropped to 12% over six months.
That's the real measure of whether your post-mortems are working.
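For anyone who wants to compute this from an incident log, here is a minimal sketch. It assumes each incident has already been tagged with a root-cause label and that the list is in chronological order. The labels and the function name are hypothetical.

```python
def repeat_incident_rate(root_causes):
    """Percentage of incidents whose root cause was already seen earlier.

    `root_causes` is a chronological list of root-cause labels,
    one per incident.
    """
    if not root_causes:
        return 0.0
    seen = set()
    repeats = 0
    for cause in root_causes:
        if cause in seen:
            repeats += 1  # same root cause as an earlier incident
        seen.add(cause)
    return 100 * repeats / len(root_causes)


# Hypothetical log: five incidents, two of them repeats.
incidents = ["retry-storm", "bad-deploy", "retry-storm",
             "disk-full", "bad-deploy"]
rate = repeat_incident_rate(incidents)  # 2 of 5 incidents are repeats
```

The number is only as good as the labeling. If root causes are tagged too finely, nothing ever repeats; too coarsely, and everything does.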
If you're looking for better incident learning loops and pattern detection across your post-mortems, check out what we're building at Nova AI Ops.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com