I Built a Self-Healing PR Monitor With OpenClaw (And It Caught Its Own Bugs)
6 months of running an automated PR monitoring system. Here's what broke, what I learned, and why self-healing automation is the future.
The Problem
I contribute to open source. Not just one repo: 25+ repos across Node.js, Python, TypeScript, and JavaScript ecosystems.
The problem: I can't check 25 GitHub repos for review comments manually every 5 minutes.
I tried:
- GitHub notification emails: too much noise, wrong format, delayed by hours
- GitHub mobile app: notifications buried between stars and likes
- Manual checking: I'd forget, miss things, or spend hours just refreshing pages
So I built something better.
The Architecture
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│   Cron Job    │────▶│  PR Scanner   │────▶│   State DB    │
│ (every 5min)  │     │   (GraphQL)   │     │  (JSON file)  │
└───────────────┘     └───────────────┘     └───────────────┘
                              │                     │
                              ▼                     ▼
                      ┌───────────────┐     ┌───────────────┐
                      │Event Detector │────▶│ Alert Sender  │
                      │ (diff state)  │     │(Feishu/Slack) │
                      └───────────────┘     └───────────────┘
Core components (a minimal loop wiring them together is sketched after the list):
- Scanner: GraphQL API queries all tracked PRs in parallel
- State manager: compares current state vs last known state
- Event detector: identifies NEW events (comments, reviews, status changes)
- Alert router: sends urgent alerts to the right channel
- Watchdog: monitors the monitor itself (meta, but necessary)
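To make the flow concrete, here's roughly how one scan cycle ties those pieces together. This is a minimal sketch, not the production code: loadState, saveState, scanAllPRs, and detectEvents are shown later in this post, and sendAlert is a hypothetical stand-in for the Feishu/Slack webhook call.

// Sketch of one 5-minute cycle. loadState/saveState/scanAllPRs/detectEvents appear
// later in this post; sendAlert() is a hypothetical webhook helper.
async function runOnce(prIds) {
  const previous = loadState().prs || [];     // last snapshot; empty on first run
  const current = await scanAllPRs(prIds);    // one GraphQL round-trip for all tracked PRs
  const events = await detectEvents(current, previous);
  for (const event of events) {
    await sendAlert(event);                   // hypothetical: posts to Feishu/Slack
  }
  // Persist only after alerts go out, so a crash mid-run re-detects instead of dropping events
  saveState({ prs: current, scannedAt: new Date().toISOString() });
}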
The v1 Mistakes (Learn From My Pain)
Mistake 1: No State Persistence
My first version just checked PRs and printed changes. Great for testing, useless in production.
When the process restarts (and it WILL restart), it has no memory of what it saw before. Every event looks "new" and you get duplicate alerts forever.
Fix: Persist state to a JSON file after every scan:
const fs = require('fs');
const STATE_FILE = './pr-state.json';

function saveState(state) {
  // Atomic write to prevent corruption on crash
  const tmp = STATE_FILE + '.tmp';
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, STATE_FILE);
}

function loadState() {
  try { return JSON.parse(fs.readFileSync(STATE_FILE)); }
  catch { return {}; } // First run
}
Mistake 2: Silent Failures
Version 2 ran for two weeks before I noticed it had stopped working. The cause was a simple API rate-limit error that failed silently.
Fix: Health checks + watchdog:
#!/bin/bash
# pr-monitor-watchdog.sh
LOG="/tmp/pr-monitor-v3.log"
AGE=$(( $(date +%s) - $(stat -c %Y "$LOG" 2>/dev/null || echo 0) ))

if [ "$AGE" -gt 600 ]; then  # 10 minutes
  echo "[FAIL] Log stale (${AGE}s old)"
  # Restart or alert
else
  LINES=$(wc -l < "$LOG" 2>/dev/null || echo 0)
  echo "[OK] v3 healthy - log updated ${AGE}s ago, ${LINES} lines"
fi
Mistake 3: Variable Name Conflicts in Browser Automation
This one's specific but painful: when using browser automation tools across multiple eval calls, variable names persist. Using const b in one call and const b in another throws SyntaxError: Identifier 'b' has already been declared.
Fix: Use IIFE (Immediately Invoked Function Expressions):
// Bad (fails on second call):
eval("const b = document.getElementById('foo')")
// Good (always works):
eval("(function(){ const b = document.getElementById('foo'); ... })()")
Mistake 4: Not Testing After Writing
I wrote a monitoring script, deployed it via crontab, assumed it worked. Three weeks later I discovered it had never successfully run once due to a syntax error.
The rule now: Every script must be tested manually before being added to crontab.
# ALWAYS test before deploying
python3 scripts/pr-monitor-v3.py --test
# Check output is valid JSON
# Check log file was created
# Verify timestamp is recent
The Real-World Results
After 6 months of running this system:
| Metric | Value |
|---|---|
| PRs tracked | 208 |
| Events detected | 3,400+ |
| False positives | < 2% |
| Uptime | ~98% (crashes from OOM) |
| Avg detection time | < 5 minutes |
| Response time (me) | < 15 minutes |
The biggest win: A maintainer commented on my PR at 8 PM on a Friday. I saw it, replied within 10 minutes, fixed the issue over the weekend, and got merged Monday morning.
Without automation? That comment would've sat there until Tuesday. Maybe longer.
What I'd Do Differently
1. Start With Logging, Add Features Later
I built features first, logging second. Reverse that order. A working system you can debug beats a feature-rich black box.
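In practice, "logging first" can be a single helper that every other function calls. Here's a minimal sketch; the log path matches the watchdog script above, and the helper name is mine, not from the production code:

const fs = require('fs');
const LOG_FILE = '/tmp/pr-monitor-v3.log'; // same file the watchdog checks for freshness

// Append one timestamped line per event. The watchdog only needs the mtime to advance,
// but the timestamps are what make post-mortems possible.
function log(message) {
  fs.appendFileSync(LOG_FILE, `${new Date().toISOString()} ${message}\n`);
}

Every scan, error, and backoff goes through that one function, so the log becomes the single source of truth when something breaks.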
2. Rate Limit Handling from Day One
GitHub's GraphQL API has a rate limit of 5,000 points/hour. With 208 PRs, each query costs ~50-100 points. That's 25-50 scans per hour max.
I hit this limit hard. Build in backoff logic immediately:
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function graphqlQuery(query, vars) {
  const res = await fetch(GITHUB_GRAPHQL, { /* ... */ });
  const remaining = Number(res.headers.get('x-ratelimit-remaining')); // header is a string
  if (remaining < 100) {
    console.log(`Rate limit low: ${remaining}. Backing off.`);
    await sleep(60000); // Wait 1 minute
  }
  return res.json();
}
3. Separate "Detection" from "Action"
My first version detected AND sent alerts in the same process. When I wanted to change alert formatting, I had to redeploy the whole thing.
Better architecture:
Detector → writes to /tmp/pr-events.json (append-only)
Reader → watches the file, sends alerts independently
This way you can swap alert channels without touching the detector.
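A rough sketch of that split, assuming one JSON object per line in /tmp/pr-events.json and Node 18+ for global fetch; the ALERT_WEBHOOK_URL environment variable and the payload shape are placeholders, not the real Feishu/Slack integration:

const fs = require('fs');
const EVENTS_FILE = '/tmp/pr-events.json';

// Detector side: append one JSON object per line and forget about delivery.
function emitEvent(event) {
  fs.appendFileSync(EVENTS_FILE, JSON.stringify(event) + '\n');
}

// Reader side: runs as its own process and only consumes complete lines past the last offset.
let offset = 0;
async function drainEvents() {
  if (!fs.existsSync(EVENTS_FILE)) return;
  const text = fs.readFileSync(EVENTS_FILE, 'utf8');
  const end = text.lastIndexOf('\n') + 1;        // ignore a partially written last line
  const fresh = text.slice(offset, end);
  offset = end;
  for (const line of fresh.split('\n').filter(Boolean)) {
    const event = JSON.parse(line);
    await fetch(process.env.ALERT_WEBHOOK_URL, { // placeholder webhook, swap freely
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ text: `PR #${event.pr}: new ${event.type} from ${event.author}` }),
    });
  }
}

Because the reader owns the webhook, changing the alert format or channel means restarting one small process; the detector never notices.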
The Code (Simplified)
Here's the core scanning loop (the actual production version is ~500 lines):
const { graphql } = require('@octokit/graphql');

const QUERY = `
  query($prs: [ID!]!) {
    nodes(ids: $prs) {
      ... on PullRequest {
        number title state url
        comments(last: 5) { nodes { author { login } body createdAt } }
        reviews(last: 3) { nodes { author { login } state body } }
        statusCheckRollup { state }
        updatedAt
      }
    }
  }
`;

async function scanAllPRs(prIds) {
  // Auth assumes a personal access token in GITHUB_TOKEN
  const result = await graphql(QUERY, {
    prs: prIds,
    headers: { authorization: `token ${process.env.GITHUB_TOKEN}` },
  });
  return result.nodes.filter(Boolean); // Remove nulls for deleted PRs
}
async function detectEvents(current, previous) {
  const events = [];
  for (const pr of current) {
    const prev = previous.find(p => p.number === pr.number);
    if (!prev) continue; // New PR, not an event

    // Check for new comments
    const newComments = pr.comments.nodes.filter(
      c => !prev.comments.nodes.find(p => p.createdAt === c.createdAt)
    );
    if (newComments.length) {
      events.push({
        type: 'comment',
        pr: pr.number,
        author: newComments[0].author.login,
        body: newComments[0].body.substring(0, 200),
      });
    }
    // Similar logic for reviews, status changes...
  }
  return events;
}
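The elided "similar logic" for reviews and status checks looks roughly like this in my version. It's a sketch that assumes the fields from the GraphQL query above: the query fetches no review IDs or timestamps, so deduplication keys on author, state, and body.

// Inside the same for (const pr of current) loop, after the comment check (sketch):
const reviewKey = r => `${r.author?.login}:${r.state}:${r.body}`;
const newReviews = pr.reviews.nodes.filter(
  r => !prev.reviews.nodes.find(p => reviewKey(p) === reviewKey(r))
);
for (const review of newReviews) {
  events.push({
    type: 'review',
    pr: pr.number,
    author: review.author?.login,
    state: review.state,                        // APPROVED, CHANGES_REQUESTED, COMMENTED
    body: (review.body || '').substring(0, 200),
  });
}

// Status changes: a simple state comparison against the previous snapshot
if (pr.statusCheckRollup && prev.statusCheckRollup &&
    pr.statusCheckRollup.state !== prev.statusCheckRollup.state) {
  events.push({ type: 'status', pr: pr.number, state: pr.statusCheckRollup.state });
}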
The Meta Lesson
Building a tool to monitor your own work feels narcissistic. But here's the thing:
Every hour this system saves me is an hour I can spend on actual development.
That's 27+ hours per week recovered from manual checking. What would you do with an extra day per week?
For me: write more code, ship more projects, build more things that matter.
The investment in automation pays compound interest. And unlike financial compound interest, you don't need capital to start: just time and willingness to solve your own boring problems.
Running your own automation setup? I'd love to hear about it. Drop a comment or find me on GitHub.
Follow @armorbreak for more practical engineering posts.