I Built a Self-Healing PR Monitor With OpenClaw (And It Caught Its Own Bugs)
What This Is
This is a walkthrough of one real system my OpenClaw agent runs every day: a self-healing PR monitoring daemon that watches 26+ pull requests across GitHub, detects changes in seconds, and even caught its own critical bug.
I'm submitting this to the OpenClaw Challenge - OpenClaw in Action.
The Problem
When you're contributing to 13 open-source repositories simultaneously, keeping track of PR status becomes a full-time job:
- Did a maintainer just leave review comments on that asyncapi PR?
- Did someone request changes on the n8n-as-code submission?
- Was that bounty PR merged while I wasn't looking?
- Is there a new comment I need to respond to?
GitHub's email notifications are unreliable (more on this later). The GitHub API exists but polling it manually for 26 PRs every few minutes isn't sustainable.
I needed something that:
- Monitors continuously: not just when I remember to check
- Detects everything: comments, reviews, status changes, merges, closures
- Alerts immediately: not "sometime in the next few hours"
- Heals itself: if the monitor crashes, something should notice
The Architecture
┌──────────────────────────────────────────────────┐
│                  OpenClaw Agent                  │
│                                                  │
│  ┌───────────┐      ┌───────────────────────┐    │
│  │ Heartbeat │─────▶│   pr-monitor-v3.py    │    │
│  │  (cron)   │      │  (every 5 minutes)    │    │
│  └───────────┘      └──────────┬────────────┘    │
│                                │                 │
│                                ▼                 │
│  ┌────────────────────────────────────────────┐  │
│  │ State files:                               │  │
│  │   /tmp/pr-monitor-v3.log                   │  │
│  │   /tmp/pr-monitor-state.json               │  │
│  │   /tmp/pr-monitor-pending-alerts.json      │  │
│  └────────────────────────────────────────────┘  │
│                                │                 │
│  ┌───────────┐      ┌──────────▼────────┐        │
│  │ Watchdog  │◀─────┤   Health Check    │        │
│  │ (every    │      │  (every 30 min)   │        │
│  │  30 min)  │      └───────────────────┘        │
│  └─────┬─────┘                                   │
│        │ alert                                   │
│        ▼                                         │
│  Feishu Notification ◀── Heartbeat reads         │
│                          state files             │
└──────────────────────────────────────────────────┘
Three components working together:
Component 1: pr-monitor-v3.py - The Scanner
A Python script using gh (GitHub CLI) to poll all tracked PRs:
# Core loop (simplified)
for pr in tracked_prs:
    data = gh_api(f"repos/{pr['owner']}/{pr['repo']}/pulls/{pr['number']}")
    old_state = load_state(pr['key'])
    if detect_changes(data, old_state):
        log_event(pr['key'], data, old_state)
        save_state(pr['key'], data)
        if is_important_change(data, old_state):
            write_alert({
                'type': change_type,
                'pr': pr,
                'data': data,
                'timestamp': now_utc()
            })
What it tracks per PR:
- State: open / closed / merged
- Review status: approved / changes_requested / commented
- Comment count + last comment timestamp
- Updated at timestamp
- Mergeable status
Runs via crontab every 5 minutes. Each run produces a log line with structured JSON.
Component 2: State Files - The Memory
Instead of a database (overkill for this), everything goes to flat JSON files:
// /tmp/pr-monitor-state.json - current snapshot of all PRs
{
  "asyncapi/modelina#2518": {
    "state": "open",
    "review_status": null,
    "comment_count": 0,
    "last_updated": "2026-04-14T12:43:17Z",
    "checked_at": "2026-04-17T13:05:00Z"
  },
  "memtomem/memtomem#130": {
    "state": "merged",
    "merged_at": "2026-04-15T22:34:41Z",
    ...
  }
}

// /tmp/pr-monitor-pending-alerts.json - events waiting to be pushed
[
  {
    "pr": "EtienneLescot/n8n-as-code#328",
    "type": "COMMENT_ADDED",
    "delivered": false,
    "payload": { ... }
  }
]
The agent's heartbeat script reads pending-alerts.json, pushes notifications via Feishu, then marks them delivered: true.
Why JSON files? Simplicity. No database to set up, no migrations, easy to debug with cat. If something breaks, I can read the state file and understand exactly what happened.
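The consuming side can stay just as simple. A sketch of what the heartbeat's alert-draining step might look like (the function name and the `deliver` callback are my assumptions; in the real setup the callback would post to Feishu):

```python
import json
from pathlib import Path

def drain_pending_alerts(path: Path, deliver) -> int:
    """Push undelivered alerts via deliver(alert) and mark them delivered.

    Returns the number of alerts sent this pass.
    """
    alerts = json.loads(path.read_text()) if path.exists() else []
    sent = 0
    for alert in alerts:
        if not alert.get("delivered"):
            deliver(alert)              # e.g. post to Feishu
            alert["delivered"] = True
            sent += 1
    # Rewrite the file so a crash after delivery doesn't re-send everything.
    path.write_text(json.dumps(alerts, indent=2))
    return sent
```

Because delivered alerts stay in the file with `delivered: true`, the file doubles as a crude audit log you can inspect with `cat`.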
Component 3: pr-monitor-watchdog.sh - The Safety Net
The most important lesson from this build: monitoring that isn't monitored is useless.
#!/bin/bash
# pr-monitor-watchdog.sh - verifies v3 is alive and healthy

LOG_FILE="/tmp/pr-monitor-v3.log"
STATE_FILE="/tmp/pr-monitor-state.json"

# Check 1: Is the log file being updated?
if [ ! -f "$LOG_FILE" ]; then
    echo "[FAIL] Log file missing"
    exit 1
fi

LAST_MOD=$(stat -c %Y "$LOG_FILE" 2>/dev/null || echo 0)
NOW=$(date +%s)
AGE=$((NOW - LAST_MOD))
if [ "$AGE" -gt 600 ]; then  # 10 minutes without update = dead
    echo "[FAIL] Log stale (${AGE}s old)"
    # Try to restart or alert
    exit 1
fi

# Check 2: Does the state file contain valid JSON?
if ! python3 -c "import json; json.load(open('$STATE_FILE'))" 2>/dev/null; then
    echo "[FAIL] State file corrupted"
    exit 1
fi

# Check 3: Are we tracking the expected number of PRs?
PR_COUNT=$(python3 -c "import json; d=json.load(open('$STATE_FILE')); print(len(d))")
if [ "$PR_COUNT" -lt 20 ]; then  # Should have 25+
    echo "[WARN] Only tracking ${PR_COUNT} PRs (expected 25+)"
fi

echo "[OK] v3 healthy - ${PR_COUNT} PRs tracked, state age ${AGE}s"
The watchdog runs every 30 minutes via crontab. If it detects a failure, it fires an immediate alert to Feishu.
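For reference, the two schedules map onto crontab entries like these (the paths are placeholders; adjust to wherever the scripts actually live):

```
# Scanner: every 5 minutes, appending structured log lines
*/5 * * * *  /usr/bin/python3 /path/to/pr-monitor-v3.py >> /tmp/pr-monitor-v3.log 2>&1
# Watchdog: every 30 minutes
*/30 * * * * /bin/bash /path/to/pr-monitor-watchdog.sh
```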
How It All Connects (The OpenClaw Part)
Here's where OpenClaw ties everything together. The agent's HEARTBEAT.md contains instructions like:
PR Monitoring (v3 + watchdog)

Every heartbeat must execute:
- Run watchdog: bash scripts/pr-monitor-watchdog.sh
- If FAIL: fix + notify commander
- If OK + pending alerts exist: notify commander
- Check /tmp/pr-monitor-pending-alerts.json for urgent events
So every ~30 minutes, the agent wakes up and:
- Runs the watchdog to confirm the scanner is healthy
- Reads pending alerts and pushes anything new to me
- Goes back to sleep
No constant polling by the main agent. The scanner does the heavy lifting, the agent just reads results. Clean separation of concerns.
The Bug That Proved Why We Need This
Here's the ironic part. The v2 version of this monitoring script had a bug:
# BUG: .ends is not a valid Python string method
if filename.endswith(".py"):
    process_file(filename)
elif filename.ends(".js"):  # ← CRASHES EVERY TIME this branch runs
    process_js_file(filename)
.ends() doesn't exist in Python. It's .endswith().
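Worth internalizing why this bit so hard: Python raises AttributeError for a nonexistent method only when the line actually executes, so the deploy looks clean and the failure stays silent. A minimal reproduction (the filename is hypothetical):

```python
filename = "monitor.js"

# Nothing complains at definition or import time; the error fires only when
# this branch runs, which is why the broken v2 "deployed fine" and then died
# on its very first pass.
try:
    filename.ends(".js")
except AttributeError as exc:
    caught = str(exc)  # mentions that str has no attribute 'ends'

# The fix: the real string method.
is_js = filename.endswith(".js")
```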
This script was supposed to have been running for 18 days. It crashed on its very first run and never once executed successfully.
How did I find out? Only when I built the v3 version with the watchdog and the watchdog reported: "[FAIL] Log file missing."
Before that? I assumed it was working because "I deployed it and didn't get any errors." A classic case of an unmonitored failure reading as success.
The fix was trivial: swapping in the right method name. But finding the bug took 18 days without proper observability.
What I'd Do Differently
Looking back after 3 months of running this:
| Decision | Would I Change It? | Why |
|---|---|---|
| JSON files instead of DB | No | Simple, debuggable, zero maintenance |
| Python + gh CLI | Maybe | Works well, but a Node.js version would integrate more tightly with the rest of the stack |
| 5-minute poll interval | Yes | Could use GitHub webhooks for instant push; polling wastes API calls |
| Separate watchdog script | No | Best decision made. Catches exactly the class of bugs that matter |
| Flat alert JSON file | Yes | Would use a small queue (Redis/SQLite) for reliability if the agent misses a heartbeat |
Live Stats (Right Now)
As of writing this, the system is tracking:
- 26 pull requests across repositories including asyncapi/modelina, n8n-as-code, memtomem, claude-builders-bounty, and others
- 1 merged (memtomem#130, a $100 bounty)
- 1 closed (pgmpy#3323: maintainer rejected it, lesson learned)
- 24 open and waiting for review feedback
- Scanner uptime: Healthy (watchdog confirmed < 10 min ago)
Try It Yourself
The core pattern here (scan → compare state → emit events → consume events) applies well beyond PR monitoring. You could use the same architecture for:
- Issue triage (new issues matching your criteria)
- Dependency security alerts (new CVEs in your deps)
- Competitor monitoring (changes to their repos/docs)
- Your own product's issue tracker
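If you want to riff on it, the whole architecture boils down to one pure step: take a fetcher and the previous state, return the new state plus any events. A generic sketch (all names here are mine, for illustration):

```python
def scan(fetch, old_state: dict) -> tuple[dict, list]:
    """One generic monitoring pass.

    fetch() returns {key: item} for everything currently tracked; we diff
    it against old_state and emit (event_type, key, item) tuples.
    """
    new_state, events = {}, []
    for key, item in fetch().items():
        new_state[key] = item
        if key not in old_state:
            events.append(("NEW", key, item))
        elif item != old_state[key]:
            events.append(("CHANGED", key, item))
    return new_state, events
```

Swap in a fetcher for issues, CVE feeds, or a competitor's docs and the rest of the pipeline (state file, pending-alerts file, watchdog) carries over unchanged.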
The key insight: let the dumb script do the dumb work, and let your agent make the smart decisions.
Built with OpenClaw. Running in production. Catching its own bugs since 2026.
Questions? Drop a comment. Happy to share more details about the implementation.