DEV Community

Armor Break

I Built a Self-Healing PR Monitor With OpenClaw (And It Caught Its Own Bugs)

OpenClaw Challenge Submission 🦞

What This Is

This is a walkthrough of one real system my OpenClaw agent runs every day: a self-healing PR monitoring daemon that watches 26+ pull requests across GitHub, detects changes within minutes, and even caught its own critical bug.

I'm submitting this to the OpenClaw Challenge - OpenClaw in Action.

The Problem

When you're contributing to 13 open-source repositories simultaneously, keeping track of PR status becomes a full-time job:

  • Did a maintainer just leave review comments on that asyncapi PR?
  • Did someone request changes on the n8n-as-code submission?
  • Was that bounty PR merged while I wasn't looking?
  • Is there a new comment I need to respond to?

GitHub's email notifications are unreliable (more on this later). The GitHub API exists but polling it manually for 26 PRs every few minutes isn't sustainable.

I needed something that:

  1. Monitors continuously - not when I remember to check
  2. Detects everything - comments, reviews, status changes, merges, closures
  3. Alerts immediately - not "sometime in the next few hours"
  4. Heals itself - if the monitor crashes, something should notice

The Architecture

┌─────────────────────────────────────────────┐
│              OpenClaw Agent                 │
│                                             │
│  ┌──────────┐   ┌──────────────────────┐    │
│  │ Heartbeat│──▶│  pr-monitor-v3.py    │    │
│  │ (cron)   │   │  (every 5 minutes)   │    │
│  └──────────┘   └─────────┬────────────┘    │
│                           │                 │
│                           ▼                 │
│  ┌──────────────────────────────────────┐   │
│  │  State Files:                        │   │
│  │  /tmp/pr-monitor-v3.log              │   │
│  │  /tmp/pr-monitor-state.json          │   │
│  │  /tmp/pr-monitor-pending-alerts.json │   │
│  └──────────────────────────────────────┘   │
│                           │                 │
│  ┌──────────┐   ┌─────────▼──────┐          │
│  │ Watchdog │◀──│  Health Check  │          │
│  │ (every   │   │  (every 30 min)│          │
│  │  30 min) │   └────────────────┘          │
│  └────┬─────┘                               │
│       │ alert                               │
│       ▼                                     │
│  Feishu Notification ◀── Heartbeat reads    │
│       state files                           │
└─────────────────────────────────────────────┘

Three components working together:

Component 1: pr-monitor-v3.py - The Scanner

A Python script using gh (GitHub CLI) to poll all tracked PRs:

# Core loop (simplified)
for pr in tracked_prs:
    data = gh_api(f"repos/{pr['owner']}/{pr['repo']}/pulls/{pr['number']}")
    old_state = load_state(pr['key'])

    changes = detect_changes(data, old_state)  # which fields changed, if any
    if changes:
        log_event(pr['key'], data, old_state)
        save_state(pr['key'], data)

        if is_important_change(data, old_state):
            write_alert({
                'type': changes[0],  # e.g. a comment_count bump → COMMENT_ADDED
                'pr': pr,
                'data': data,
                'timestamp': now_utc()
            })

What it tracks per PR:

  • State: open / closed / merged
  • Review status: approved / changes_requested / commented
  • Comment count + last comment timestamp
  • Updated at timestamp
  • Mergeable status
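The change check over these fields can be a plain field-by-field diff. Here's a sketch of what detect_changes might look like - the real implementation isn't shown in this post, and the field names simply mirror the list above:

```python
# Hypothetical sketch of detect_changes: a field-by-field diff
# over the fields the scanner tracks per PR.
WATCHED_FIELDS = ("state", "review_status", "comment_count",
                  "last_updated", "mergeable")

def detect_changes(new_data, old_state):
    """Return the list of watched fields that differ from the last snapshot."""
    if old_state is None:  # first time we see this PR: everything is "new"
        return list(WATCHED_FIELDS)
    return [f for f in WATCHED_FIELDS
            if new_data.get(f) != old_state.get(f)]

changed = detect_changes({"comment_count": 3}, {"comment_count": 2})
print(changed)  # ['comment_count']
```

A list return keeps the caller simple: an empty list is falsy, so `if detect_changes(...)` works, and the changed field names double as a crude event type.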

Runs via crontab every 5 minutes. Each run produces a log line with structured JSON.

Component 2: State Files - The Memory

Instead of a database (overkill for this), everything goes to flat JSON files:

// /tmp/pr-monitor-state.json - current snapshot of all PRs
{
  "asyncapi/modelina#2518": {
    "state": "open",
    "review_status": null,
    "comment_count": 0,
    "last_updated": "2026-04-14T12:43:17Z",
    "checked_at": "2026-04-17T13:05:00Z"
  },
  "memtomem/memtomem#130": {
    "state": "merged",
    "merged_at": "2026-04-15T22:34:41Z",
    ...
  }
}
// /tmp/pr-monitor-pending-alerts.json - events waiting to be pushed
[
  {
    "pr": "EtienneLescot/n8n-as-code#328",
    "type": "COMMENT_ADDED",
    "delivered": false,
    "payload": { ... }
  }
]

The agent's heartbeat script reads pending-alerts.json, pushes notifications via Feishu, then marks them delivered: true.

Why JSON files? Simplicity. No database to set up, no migrations, easy to debug with cat. If something breaks, I can read the state file and understand exactly what happened.
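The consume-and-mark-delivered step is only a few lines itself. A hedged sketch of how the heartbeat could do it (the actual heartbeat script isn't shown in this post; `notify` stands in for the Feishu push):

```python
import json
from pathlib import Path

ALERTS_FILE = Path("/tmp/pr-monitor-pending-alerts.json")

def consume_pending_alerts(notify):
    """Push each undelivered alert via `notify`, then mark it delivered."""
    if not ALERTS_FILE.exists():
        return 0
    alerts = json.loads(ALERTS_FILE.read_text())
    delivered = 0
    for alert in alerts:
        if not alert.get("delivered"):
            notify(f"[{alert['type']}] {alert['pr']}")  # e.g. Feishu webhook call
            alert["delivered"] = True
            delivered += 1
    ALERTS_FILE.write_text(json.dumps(alerts, indent=2))  # persist the flags
    return delivered
```

Marking alerts delivered in place (rather than deleting them) means the file doubles as a delivery log you can inspect later with `cat`.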

Component 3: pr-monitor-watchdog.sh - The Safety Net

The most important lesson from this build: monitoring that isn't monitored is useless.

#!/bin/bash
# pr-monitor-watchdog.sh - verifies v3 is alive and healthy

LOG_FILE="/tmp/pr-monitor-v3.log"
STATE_FILE="/tmp/pr-monitor-state.json"

# Check 1: Is the log file being updated?
if [ ! -f "$LOG_FILE" ]; then
    echo "[FAIL] Log file missing"
    exit 1
fi

LAST_MOD=$(stat -c %Y "$LOG_FILE" 2>/dev/null || echo 0)
NOW=$(date +%s)
AGE=$((NOW - LAST_MOD))

if [ $AGE -gt 600 ]; then  # 10 minutes without update = dead
    echo "[FAIL] Log stale (${AGE}s old)"
    # Try to restart or alert
    exit 1
fi

# Check 2: Does state file have valid JSON?
if ! python3 -c "import json; json.load(open('$STATE_FILE'))" 2>/dev/null; then
    echo "[FAIL] State file corrupted"
    exit 1
fi

# Check 3: Are we tracking expected number of PRs?
PR_COUNT=$(python3 -c "import json; d=json.load(open('$STATE_FILE')); print(len(d))")
if [ $PR_COUNT -lt 20 ]; then  # Should have 25+
    echo "[WARN] Only tracking ${PR_COUNT} PRs (expected 25+)"
fi

echo "[OK] v3 healthy - ${PR_COUNT} PRs tracked, log age ${AGE}s"

The watchdog runs every 30 minutes via crontab. If it detects failure → immediate alert to Feishu.
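Wired together, the two crontab entries look roughly like this - the paths and python3 location are illustrative, not taken from my actual setup:

```shell
# crontab -e  (illustrative paths)
# Scanner: poll all tracked PRs every 5 minutes, append to the log
*/5 * * * *  /usr/bin/python3 /home/agent/scripts/pr-monitor-v3.py >> /tmp/pr-monitor-v3.log 2>&1
# Watchdog: verify the scanner is alive every 30 minutes
*/30 * * * * /bin/bash /home/agent/scripts/pr-monitor-watchdog.sh
```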

How It All Connects (The OpenClaw Part)

Here's where OpenClaw ties everything together. The agent's HEARTBEAT.md contains instructions like:

PR Monitoring (v3 + watchdog)

Every heartbeat must execute:

  1. Run watchdog: bash scripts/pr-monitor-watchdog.sh
  2. If FAIL → fix + notify commander
  3. If OK + pending alerts exist → notify commander
  4. Check /tmp/pr-monitor-pending-alerts.json for urgent events

So every ~30 minutes, the agent wakes up and:

  1. Runs the watchdog → confirms scanner is healthy
  2. Reads pending alerts → pushes anything new to me
  3. Goes back to sleep

No constant polling by the main agent. The scanner does the heavy lifting, the agent just reads results. Clean separation of concerns.

The Bug That Proved Why We Need This

Here's the ironic part. The v2 version of this monitoring script had a bug:

# BUG: .ends is not a valid Python string method
if filename.endswith(".py"):
    process_file(filename)
elif filename.ends(".js"):  # ← raises AttributeError every time
    process_js_file(filename)

.ends() doesn't exist in Python. It's .endswith().

This script was supposed to have been running for 18 days. In reality, it crashed on its very first run and never once completed successfully.

How did I find out? Only when I built the v3 version with the watchdog and the watchdog reported: "[FAIL] Log file missing."

Before that? I assumed it was working because "I deployed it and didn't get any errors." Classic case of monitoring that doesn't work giving you false confidence.

The fix was trivial (a few characters). But finding it took 18 days without proper observability.

What I'd Do Differently

Looking back after 3 months of running this:

  • JSON files instead of a DB - keep. Simple, debuggable, zero maintenance.
  • Python + gh CLI - maybe change. Works well, but a Node.js version would integrate tighter with the rest of the stack.
  • 5-minute poll interval - change. GitHub webhooks would give instant push; polling wastes API calls.
  • Separate watchdog script - keep. Best decision of the build; it catches exactly the class of bugs that matter.
  • Flat alert JSON file - change. A small queue (Redis/SQLite) would be more reliable if the agent misses a heartbeat.
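For the webhook route, the receiving end wouldn't need much. Here's a minimal sketch using only the standard library - the port and secret are placeholders, and the signature check follows GitHub's documented X-Hub-Signature-256 scheme:

```python
import hashlib
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET = b"replace-with-your-webhook-secret"  # set the same secret on GitHub

def valid_signature(body: bytes, header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against our secret."""
    expected = "sha256=" + hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header)

class Hook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if not valid_signature(body, self.headers.get("X-Hub-Signature-256", "")):
            self.send_response(401); self.end_headers(); return
        event = self.headers.get("X-GitHub-Event", "")
        if event in ("pull_request", "issue_comment", "pull_request_review"):
            json.loads(body)  # parse payload; append to pending-alerts.json here
        self.send_response(204)
        self.end_headers()

# To run it: HTTPServer(("", 8080), Hook).serve_forever()
```

The trade-off: webhooks need a publicly reachable endpoint, which is exactly the operational overhead the 5-minute poll avoids.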

Live Stats (Right Now)

As of writing this, the system is tracking:

  • 26 pull requests across repositories including asyncapi/modelina, n8n-as-code, memtomem, claude-builders-bounty, and others
  • 1 merged (memtomem#130 - $100 bounty) ✅
  • 1 closed (pgmpy#3323 - maintainer rejected, lesson learned)
  • 24 open and waiting for review feedback
  • Scanner uptime: Healthy (watchdog confirmed < 10 min ago)

Try It Yourself

The core pattern here - scan → compare state → emit events → consume events - is applicable way beyond PR monitoring. You could use the same architecture for:

  • Issue triage (new issues matching your criteria)
  • Dependency security alerts (new CVEs in your deps)
  • Competitor monitoring (changes to their repos/docs)
  • Your own product's issue tracker

The key insight: let the dumb script do the dumb work, and let your agent make the smart decisions.
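Distilled down, the scan → compare → emit half of the pattern fits in about a dozen lines. This is a generic sketch (the function and file names are mine, not from the monitor) that you can point at any data source:

```python
import json
from pathlib import Path

def scan_and_diff(fetch, state_path, keys):
    """Generic scan → compare → emit: return events for anything that changed."""
    path = Path(state_path)
    old = json.loads(path.read_text()) if path.exists() else {}
    events = []
    for key in keys:
        new = fetch(key)  # e.g. one gh/API call per tracked item
        if new != old.get(key):
            events.append({"key": key, "old": old.get(key), "new": new})
        old[key] = new
    path.write_text(json.dumps(old, indent=2))  # persist the new snapshot
    return events  # the consumer (your agent) decides what matters
```

Swap `fetch` for a CVE feed, an issue search, or a docs scraper and the rest stays the same; only the consumer needs to know what the events mean.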


Built with OpenClaw. Running in production. Catching its own bugs since 2026.

Questions? Drop a comment. Happy to share more details about the implementation.
