You write the postmortem. You file the action items. Everyone nods, the doc gets archived, and life moves on.
Six months later, the exact same root cause takes down the exact same service — and nobody in the room remembers the first incident, let alone that its fix never actually shipped.
"We use rootly to track this automatically. It flags when incidents have the same root cause as previous ones."
That's a real answer from an SRE thread about this exact problem — and it's a paid, hosted feature of a full incident-management platform. Most teams don't have rootly or incident.io. What they have is a folder of markdown postmortems that nobody diffs against each other.
So I built rootecho: a zero-dependency CLI that does the one useful thing those platforms do for this — flag when a new incident's root cause echoes a past one, and show you whether that past incident's action items ever actually got finished.
How it works
Each postmortem is one JSON record — free-text root_cause and/or curated root_cause_tags, plus action_items with a status:
{
  "id": "INC-2026-014",
  "title": "Payment webhook retries exhausted",
  "root_cause": "webhook retry queue misconfigured to drop after 3 attempts, no dead-letter fallback",
  "root_cause_tags": ["webhook", "retry-queue", "dead-letter", "config"],
  "action_items": [
    { "id": "AI-1", "description": "Add dead-letter queue for webhook retries", "owner": "alice", "status": "open" }
  ]
}
rootecho add records it and compares against your history:
$ rootecho add inc-2026-014.json
⚠ root cause echo detected for "INC-2026-014":
INC-2026-003 (2026-03-15) — 100% similar root cause
Payment webhook retries exhausted
✓ Add retry backoff [done]
✗ Add monitoring alert for queue depth [open] — 93d overdue
→ 1 action item(s) from this past incident were never finished.
recorded to .rootecho/history.jsonl
That's the whole point of the tool in one output: not just "you've seen this before," but "and here's the fix that never happened."
rootecho check does the same comparison without recording — exit code 1 on an echo, so you can wire it into CI as a gate on the PR that closes out a postmortem.
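A minimal sketch of such a gate as a Python wrapper, not rootecho's own code. It assumes rootecho check takes the postmortem file as its only argument, the same way rootecho add does above; find_echoes and its command parameter are hypothetical names used only for illustration.

```python
from __future__ import annotations

import subprocess


def find_echoes(
    paths: list[str],
    command: tuple[str, ...] = ("rootecho", "check"),  # assumed invocation
) -> list[str]:
    """Return the postmortem files for which the check command exited with 1."""
    echoed = []
    for path in paths:
        result = subprocess.run([*command, path])
        # Exit code 1 means "echo detected" per the post. Any other non-zero
        # code is ignored in this sketch; a real gate should probably fail on it.
        if result.returncode == 1:
            echoed.append(path)
    return echoed
```

In a CI step you would pass the postmortem files touched by the PR and fail the job when the returned list is non-empty.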
Why not just grep the old postmortems?
Because "webhook retry queue misconfigured" and "retry queue drops webhooks after repeated failures" are the same root cause in different words, and grep doesn't know that. rootecho scores similarity with Jaccard overlap — tags first (curated, low-noise, weighted 70%), free text as a fallback/secondary signal (30%) — no ML dependency, no network call, runs in milliseconds on a folder of JSON files.
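To make the scoring concrete, here is a rough Python sketch of a 70/30 weighted Jaccard score. It is an illustration, not the actual rootecho source: the tokenizer, the function names, and the rule of falling back to free text alone when either record has no tags are all assumptions.

```python
from __future__ import annotations

import re

TAG_WEIGHT = 0.7   # from the post: tags weighted 70%
TEXT_WEIGHT = 0.3  # from the post: free text weighted 30%


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokenize(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric word tokens, no stemming, no stop words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(new: dict, old: dict) -> float:
    """Score two postmortem records in the range 0.0 to 1.0."""
    new_tags = {t.lower() for t in new.get("root_cause_tags", [])}
    old_tags = {t.lower() for t in old.get("root_cause_tags", [])}
    text_score = jaccard(
        tokenize(new.get("root_cause", "")),
        tokenize(old.get("root_cause", "")),
    )
    # Assumption: with no tags on either side, free text is the only signal.
    if not new_tags or not old_tags:
        return text_score
    return TAG_WEIGHT * jaccard(new_tags, old_tags) + TEXT_WEIGHT * text_score
```

On the two phrasings above, with no tags on either record, this text-only score works out to 2/9 (about 0.22): "retry" and "queue" overlap, while "webhook" and "webhooks" do not, because the sketch does no stemming. That is the kind of gap curated tags are meant to close.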
Design notes for the technical reader
- Storage is project-local, not ~/.rootecho. History lives in .rootecho/history.jsonl in your repo, one JSON object per line, meant to be committed, so git blame / git log on that file doubles as an incident timeline the whole team shares, and git diff on it during a PR review shows exactly what changed.
- Zero dependencies, dual-language. A Node build (npx rootecho) and a Python build (pip install rootecho) exist because teams aren't single-language, and they read/write the exact same history file, down to byte-identical --json output. That took more care than I expected: Python's json.dumps escapes non-ASCII by default and prints whole floats as 1.0, and JS's Date.parse and Python's datetime.fromisoformat accept different date grammars depending on Python version. All ironed out: timestamps are epoch milliseconds on both sides, and dates go through a hand-rolled strict ISO 8601 parser, shared in spirit between both implementations, instead of trusting either language's built-in leniency.
- Corrupt data degrades, it doesn't crash. A JSONL file meant to be hand-edited and merged by a team will occasionally get a stray null line from a bad merge. Both CLIs skip malformed entries instead of taking down add/check/list for the whole team.
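The last two notes can be sketched in a few lines of Python. This is a hedged illustration rather than the shipped code: the function names are hypothetical, the date parser assumes date-only YYYY-MM-DD input, the loader assumes a usable entry is a JSON object with an "id" key, and compact separators are an assumption about what the --json output looks like.

```python
from __future__ import annotations

import json
import re
from datetime import date
from pathlib import Path

_ISO_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_strict_date(text: object) -> date | None:
    """Accept only YYYY-MM-DD (assumed format); return None for anything else."""
    match = _ISO_DATE.fullmatch(text) if isinstance(text, str) else None
    if match is None:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:  # well-formed but impossible, e.g. 2026-02-30
        return None


def load_history(path: str | Path) -> list[dict]:
    """Read a JSONL history file, skipping blank, malformed, or non-object lines."""
    records: list[dict] = []
    history = Path(path)
    if not history.exists():
        return records
    with history.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a half-merged line
            # A stray `null` parses fine but is not a record, so filter by type.
            if isinstance(entry, dict) and "id" in entry:
                records.append(entry)
    return records


def dump_record(record: dict) -> str:
    # ensure_ascii=False keeps non-ASCII text as-is instead of \uXXXX escapes;
    # compact separators mirror what JSON.stringify emits by default.
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))
```

Note that dump_record does nothing about whole floats printing as 1.0; matching the Node output there would need extra handling that this sketch leaves out.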
Install
npx rootecho init incident.json # scaffold a postmortem
npx rootecho add incident.json # record it, flag any echo
pip install rootecho # or the Python build
MIT licensed, ~500 lines total, no dependencies in either language.
- npm: https://www.npmjs.com/package/rootecho
- PyPI: https://pypi.org/project/rootecho/
- GitHub (Node): https://github.com/jjdoor/rootecho
- GitHub (Python): https://github.com/jjdoor/rootecho-py
Does your team already have a home-grown way of catching repeat root causes — a spreadsheet, a Slack bot, a wiki convention? I'd like to hear what's actually working (or not) before I add anything past this MVP.