Building a Real-Time GitHub PR Monitor (That Actually Works)
I spent 2 months building a PR monitoring system. Here's what I learned about tracking 200+ pull requests across 25 repositories.
The Problem
I contribute to open source. A lot. At one point I had 30+ open PRs across different repos — Node.js, Python, Pillow, asyncapi, you name it.
The problem? I had no idea what was happening with them.
- Did a maintainer leave a review comment?
- Did CI fail on a rebased branch?
- Was my PR superseded by someone else's?
- Did it get merged while I wasn't looking?
GitHub sends email notifications, but they're noisy, delayed, and easy to miss in an inbox full of other stuff.
I needed something better.
What I Tried (And Why It Failed)
Attempt 1: GitHub Webhooks
Set up a webhook receiver that listens for pull_request events.
Problem: You need to register webhooks per-repo. I don't have write access to most repos I contribute to. And maintaining a public webhook endpoint that handles auth correctly is annoying.
Verdict: Only works for your own repos.
Attempt 2: GitHub Actions + Slack
A scheduled workflow that checks PR status and posts to Slack.
Problem: Rate limits (5000/hour sounds like a lot until you're polling 200 PRs every 5 minutes). Also, Actions has a cold start delay of 1-5 minutes, which defeats the purpose of "real-time."
Verdict: Works but slow and eats into rate limits.
Attempt 3: Polling the GraphQL API
The GitHub GraphQL API lets you query multiple PRs in a single request. Much more efficient than REST.
query ($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    pullRequests(states: OPEN, first: 20, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        number
        title
        state
        createdAt
        updatedAt
        reviews(last: 5) { nodes { author { login }, state, body } }
        comments(last: 3) { nodes { author { login }, body, createdAt } }
        commits(last: 1) { nodes { commit { statusCheckRollup { state } } } }
        mergeable
        labels(first: 10) { nodes { name } }
      }
    }
  }
}
This was the winner.
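The fetch wrapper around it is small. A minimal sketch using Node 18+'s native fetch, where PR_QUERY is the query above pasted in as a string and GITHUB_TOKEN is a personal access token (the function name is illustrative):

const PR_QUERY = `...`; // paste the GraphQL query from above

// POST the query to GitHub's GraphQL endpoint for one repo.
async function fetchOpenPRs(owner, name) {
  const res = await fetch('https://api.github.com/graphql', {
    method: 'POST',
    headers: {
      Authorization: `bearer ${process.env.GITHUB_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query: PR_QUERY, variables: { owner, name } }),
  });
  if (!res.ok) throw new Error(`GitHub GraphQL HTTP ${res.status}`);
  const { data, errors } = await res.json();
  if (errors) throw new Error(`GraphQL errors: ${JSON.stringify(errors)}`);
  return data.repository.pullRequests.nodes;
}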
The Final Architecture
┌─────────────────────────────────────────────────┐
│                Monitoring Server                │
│                                                 │
│  ┌───────────┐   ┌───────────┐   ┌────────┐     │
│  │ Scheduler │──▶│  GraphQL  │──▶│ Parser │     │
│  │  (cron)   │   │  Fetcher  │   └───┬────┘     │
│  └───────────┘   └───────────┘       │          │
│                             ┌────────▼───────┐  │
│                             │ State Compare  │  │
│                             │ (diff vs last) │  │
│                             └────────┬───────┘  │
│             ┌──────────────┬─────────┴───┐      │
│             ▼              ▼             ▼      │
│        ┌──────────┐   ┌──────────┐   ┌───────┐  │
│        │ Log File │   │  Alert   │   │ State │  │
│        │ (JSONL)  │   │ Trigger  │   │ File  │  │
│        └──────────┘   └──────────┘   └───────┘  │
└─────────────────────────────────────────────────┘
Key Design Decisions
1. JSONL Log Format
Every poll cycle appends one line to a log file:
{"ts":1715847200,"repo":"nodejs/node","pr":63040,"state":"OPEN","reviews":1,"ci":"SUCCESS","mergeable":true,"last_event":"review"}
{"ts":1715847100,"repo":"python-pillow/Pillow","pr":9581,"state":"OPEN","reviews":0,"ci":"SUCCESS","mergeable":true,"last_event":null}
Why JSONL? Append-only (no file locking issues), grep-friendly, easy to parse.
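The logging side is correspondingly small. A minimal sketch, assuming one snapshot object per PR per cycle (the file name is illustrative):

const fs = require('fs');

// Append one record per PR per poll cycle. Short O_APPEND writes
// don't interleave in practice, so no locking is needed.
function logSnapshot(snapshot, logPath = 'pr-monitor.jsonl') {
  const record = { ts: Math.floor(Date.now() / 1000), ...snapshot };
  fs.appendFileSync(logPath, JSON.stringify(record) + '\n');
}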
2. Diff-Based Event Detection
Don't alert on every poll. Only alert when something CHANGES:
// current / previous: Maps of PR key → snapshot from the last two polls.
function detectEvents(current, previous) {
  const events = [];
  for (const [key, pr] of current) {
    const old = previous.get(key);
    if (!old) continue; // brand-new PR this cycle: nothing to diff yet

    // New review?
    if (pr.reviews > old.reviews) {
      events.push({ type: 'REVIEW', pr, repo: pr.repo });
    }
    // CI status changed?
    if (pr.ci !== old.ci && pr.ci === 'FAILURE') {
      events.push({ type: 'CI_FAIL', pr, repo: pr.repo });
    }
    // No longer mergeable?
    if (old.mergeable && !pr.mergeable) {
      events.push({ type: 'CONFLICT', pr, repo: pr.repo });
    }
    // Got merged!
    if (old.state === 'OPEN' && pr.state === 'MERGED') {
      events.push({ type: 'MERGED', pr, repo: pr.repo });
    }
  }
  return events;
}
3. Rate Limit Awareness
GitHub's GraphQL API has a 5000 points/hour limit. Each PR query costs roughly 5-15 points depending on how many reviews/comments you request.
With 200 PRs at ~10 points each = 2000 points per poll cycle.
At 5-minute intervals = 12 cycles/hour = 24,000 points/hour.
That's 5x over the limit!
Solution: Staggered polling.
- Priority repos (your own active PRs): Every 5 minutes
- Watched repos (repos you care about): Every 30 minutes
- Background repos (just keeping an eye): Every 2 hours
This brings usage to ~3000 points/hour — within limits with headroom.
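A sketch of how the tiers can drive a single once-a-minute scheduler; the repo assignments are examples, and pollRepo stands in for the fetch → parse → diff pipeline:

// Tier table: every repo lives in exactly one tier.
const TIERS = [
  { everyMin: 5,   repos: ['nodejs/node'] },          // priority: my active PRs
  { everyMin: 30,  repos: ['python-pillow/Pillow'] }, // watched
  { everyMin: 120, repos: ['asyncapi/spec'] },        // background
];

// Invoked once a minute (cron: * * * * *); only due tiers get polled.
async function tick(pollRepo) {
  const minute = Math.floor(Date.now() / 60000); // minutes since epoch
  for (const { everyMin, repos } of TIERS) {
    if (minute % everyMin !== 0) continue;
    for (const repo of repos) await pollRepo(repo);
  }
}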
4. Persistence & Recovery
The system writes state to a JSON file after each successful poll:
{
  "lastPoll": 1715847200,
  "repos": {
    "nodejs/node": { "63040": { "state": "OPEN", ... } },
    "python-pillow/Pillow": { "9581": { "state": "OPEN", ... } }
  },
  "pendingAlerts": []
}
On restart, it loads this file and only polls for changes since the last timestamp. No duplicate alerts after crashes.
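Load and save are a handful of lines. A sketch (file name illustrative), using write-then-rename so a crash mid-write can't corrupt the state file:

const fs = require('fs');
const STATE_PATH = 'monitor-state.json';

function loadState() {
  try {
    return JSON.parse(fs.readFileSync(STATE_PATH, 'utf8'));
  } catch {
    // First run (or unreadable file): start fresh and alert on nothing.
    return { lastPoll: 0, repos: {}, pendingAlerts: [] };
  }
}

function saveState(state) {
  // Write to a temp file, then atomically swap it into place.
  fs.writeFileSync(STATE_PATH + '.tmp', JSON.stringify(state, null, 2));
  fs.renameSync(STATE_PATH + '.tmp', STATE_PATH);
}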
What It Catches (That Email Notifications Miss)
| Event Type | GitHub Email | My Monitor | Speedup |
|---|---|---|---|
| New review/comment | ✅ (delayed 5-60 min) | ✅ (next poll, <5 min) | ~12x |
| CI failure | ❌ (not always sent) | ✅ (next poll) | ∞ |
| Merge conflict | ❌ never | ✅ (next poll, ~5 min) | ∞ |
| PR merged | ✅ (delayed) | ✅ (next poll) | ~60x |
| Label added | ❌ never | ✅ (next poll, ~5 min) | ∞ |
| Superseded by another PR | ❌ never | ✅ (detected, next poll) | ∞ |
The biggest win: CI failures. GitHub doesn't proactively notify you when CI fails on your PR. You find out next time you check manually (which could be days).
The Notification Problem
Detecting events is half the battle. The other half: getting them to the right place.
My setup routes notifications based on severity:
function route(event) {
  switch (event.type) {
    case 'MERGED':
      return ['feishu', 'log'];        // Celebrate! Tell everyone
    case 'CI_FAIL':
      return ['feishu', 'urgent-log']; // Fix this NOW
    case 'REVIEW':
      // isMaintainer() is a lookup defined elsewhere in the monitor.
      if (isMaintainer(event.author))
        return ['feishu', 'log'];      // Maintainer responded!
      else
        return ['log'];                // Just log it
    case 'CONFLICT':
      return ['feishu'];               // Need to rebase
    default:
      return ['log'];
  }
}
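Delivery is then a loop over whatever route() returns. A sketch: Feishu custom bots accept a plain-text payload at a webhook URL, and the env var name and the event.pr.number field are assumptions here:

// Dispatch one detected event to every channel route() picked for it.
async function dispatch(event) {
  for (const channel of route(event)) {
    if (channel === 'feishu') {
      // Feishu custom-bot webhook; the URL comes from the bot's settings.
      await fetch(process.env.FEISHU_WEBHOOK_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          msg_type: 'text',
          content: { text: `[${event.type}] ${event.repo} PR #${event.pr.number}` },
        }),
      });
    } else {
      console.log(`[${channel}]`, event.type, event.repo); // stand-in for the JSONL logger
    }
  }
}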
Lessons Learned
1. Don't Trust GitHub Emails
They're unreliable. Sometimes they arrive. Sometimes they don't. Sometimes they arrive in batches 6 hours late. Build your own monitoring if you care about response time.
2. State Files Are Your Friend
After a crash/restart, the last thing you want is to re-alert everything that happened in the past hour. Persist state. Load it on startup. Only look forward.
3. Watchdog Your Monitor
Your monitor needs its own monitor. I have a separate cron job that checks "did the monitoring script run in the last 10 minutes?" If not → alert.
Yes, it's turtles all the way down.
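The watchdog itself is tiny. A sketch that reads lastPoll from the state file shown earlier (path and threshold as above), run from its own cron entry:

const fs = require('fs');

const state = JSON.parse(fs.readFileSync('monitor-state.json', 'utf8'));
const ageMinutes = (Date.now() / 1000 - state.lastPoll) / 60;
if (ageMinutes > 10) {
  // The monitor may be dead, so this must alert through an independent path.
  console.error(`pr-monitor stale: last poll ${ageMinutes.toFixed(1)} min ago`);
  process.exitCode = 1; // non-zero exit → cron mail / external alerting
}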
4. Log Everything, Read Nothing (Until You Need To)
I log every event to a JSONL file. Most of the time I never read it. But when something goes wrong ("why didn't I get alerted about PR #9581?"), I can grep through 30 days of history and find exactly what happened.
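That "grep" is usually a five-line script rather than actual grep. A sketch against the record format shown earlier (log path assumed):

const fs = require('fs');

// Replay every logged snapshot of one PR, oldest first.
const records = fs.readFileSync('pr-monitor.jsonl', 'utf8')
  .split('\n')
  .filter(Boolean)
  .map((line) => JSON.parse(line))
  .filter((r) => r.repo === 'python-pillow/Pillow' && r.pr === 9581);

for (const r of records) {
  console.log(new Date(r.ts * 1000).toISOString(), r.ci, r.reviews, r.last_event);
}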
The Code
The core logic is about 200 lines of JavaScript. Runs as a cron job inside my main automation process. Zero dependencies beyond node-fetch (or native fetch in Node 18+).
If there's interest, I can clean it up and open-source it. Let me know in the comments.
What's Next
- Web dashboard: A simple page showing current status of all tracked PRs
- Slack/Discord integration: For teams who want channel-based notifications
- Smart prioritization: ML model to predict which PRs are likely to get merged (focus effort there)
- Auto-rebase: When a conflict is detected, automatically attempt rebase (risky but useful)
What's your strategy for tracking contributions across many repos? Still using email? Built something custom? Drop a comment — I'd love to hear how others handle this.
Follow @armorbreak for more deep-dives into developer tooling.