Building a Real-Time GitHub PR Monitor (That Actually Works)
I spent 2 months building a PR monitoring system. Here's what I learned about tracking 200+ pull requests across 25 repositories.
The Problem
I contribute to open source. A lot. At one point I had 30+ open PRs across different repos — Node.js, Python, Pillow, asyncapi, you name it.
The problem? I had no idea what was happening with them.
- Did a maintainer leave a review comment?
- Did CI fail on a rebased branch?
- Was my PR superseded by someone else's?
- Did it get merged while I wasn't looking?
GitHub sends email notifications, but they're noisy, delayed, and easy to miss in an inbox full of other stuff.
I needed something better.
What I Tried (And Why It Failed)
Attempt 1: GitHub Webhooks
Set up a webhook receiver that listens for pull_request events.
Problem: You need to register webhooks per-repo. I don't have write access to most repos I contribute to. And maintaining a public webhook endpoint that handles auth correctly is annoying.
Verdict: Only works for your own repos.
Attempt 2: GitHub Actions + Slack
A scheduled workflow that checks PR status and posts to Slack.
Problem: Rate limits (5000/hour sounds like a lot until you're polling 200 PRs every 5 minutes). Also, Actions has a cold start delay of 1-5 minutes, which defeats the purpose of "real-time."
Verdict: Works but slow and eats into rate limits.
Attempt 3: Polling the GraphQL API
The GitHub GraphQL API lets you query multiple PRs in a single request. Much more efficient than REST.
query ($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    pullRequests(states: OPEN, first: 20, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        number
        title
        state
        createdAt
        updatedAt
        reviews(last: 5) { nodes { author { login }, state, body } }
        comments(last: 3) { nodes { author { login }, body, createdAt } }
        commits(last: 1) { nodes { commit { statusCheckRollup { state } } } }
        mergeable
        labels(first: 10) { nodes { name } }
      }
    }
  }
}
This was the winner.
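The fetch wrapper around it is small. A minimal sketch using Node 18+'s native fetch, where PR_QUERY is the query above pasted in as a string and GITHUB_TOKEN is a personal access token (the function name is illustrative):

const PR_QUERY = `...`; // paste the GraphQL query from above

// POST the query to GitHub's GraphQL endpoint for one repo.
async function fetchOpenPRs(owner, name) {
  const res = await fetch('https://api.github.com/graphql', {
    method: 'POST',
    headers: {
      Authorization: `bearer ${process.env.GITHUB_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query: PR_QUERY, variables: { owner, name } }),
  });
  if (!res.ok) throw new Error(`GitHub GraphQL HTTP ${res.status}`);
  const { data, errors } = await res.json();
  if (errors) throw new Error(`GraphQL errors: ${JSON.stringify(errors)}`);
  return data.repository.pullRequests.nodes;
}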
The Final Architecture
┌─────────────────────────────────────────────────┐
│                Monitoring Server                │
│                                                 │
│  ┌───────────┐   ┌───────────┐   ┌────────┐     │
│  │ Scheduler │──▶│  GraphQL  │──▶│ Parser │     │
│  │  (cron)   │   │  Fetcher  │   └───┬────┘     │
│  └───────────┘   └───────────┘       │          │
│                             ┌────────▼───────┐  │
│                             │ State Compare  │  │
│                             │ (diff vs last) │  │
│                             └────────┬───────┘  │
│             ┌──────────────┬─────────┴───┐      │
│             ▼              ▼             ▼      │
│        ┌──────────┐   ┌──────────┐   ┌───────┐  │
│        │ Log File │   │  Alert   │   │ State │  │
│        │ (JSONL)  │   │ Trigger  │   │ File  │  │
│        └──────────┘   └──────────┘   └───────┘  │
└─────────────────────────────────────────────────┘
Key Design Decisions
1. JSONL Log Format
Every poll cycle appends one line to a log file:
{"ts":1715847200,"repo":"nodejs/node","pr":63040,"state":"OPEN","reviews":1,"ci":"SUCCESS","mergeable":true,"last_event":"review"}
{"ts":1715847100,"repo":"python-pillow/Pillow","pr":9581,"state":"OPEN","reviews":0,"ci":"SUCCESS","mergeable":true,"last_event":null}
Why JSONL? Append-only (no file locking issues), grep-friendly, easy to parse.
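The logging side is correspondingly small. A minimal sketch, assuming one snapshot object per PR per cycle (the file name is illustrative):

const fs = require('fs');

// Append one record per PR per poll cycle. Short O_APPEND writes
// don't interleave in practice, so no locking is needed.
function logSnapshot(snapshot, logPath = 'pr-monitor.jsonl') {
  const record = { ts: Math.floor(Date.now() / 1000), ...snapshot };
  fs.appendFileSync(logPath, JSON.stringify(record) + '\n');
}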
2. Diff-Based Event Detection
Don't alert on every poll. Only alert when something CHANGES:
// current / previous: Maps of PR key → snapshot from the last two polls.
function detectEvents(current, previous) {
  const events = [];
  for (const [key, pr] of current) {
    const old = previous.get(key);
    if (!old) continue; // brand-new PR this cycle: nothing to diff yet

    // New review?
    if (pr.reviews > old.reviews) {
      events.push({ type: 'REVIEW', pr, repo: pr.repo });
    }
    // CI status changed?
    if (pr.ci !== old.ci && pr.ci === 'FAILURE') {
      events.push({ type: 'CI_FAIL', pr, repo: pr.repo });
    }
    // No longer mergeable?
    if (old.mergeable && !pr.mergeable) {
      events.push({ type: 'CONFLICT', pr, repo: pr.repo });
    }
    // Got merged!
    if (old.state === 'OPEN' && pr.state === 'MERGED') {
      events.push({ type: 'MERGED', pr, repo: pr.repo });
    }
  }
  return events;
}
3. Rate Limit Awareness
GitHub's GraphQL API has a 5000 points/hour limit. Each PR query costs roughly 5-15 points depending on how many reviews/comments you request.
With 200 PRs at ~10 points each = 2000 points per poll cycle.
At 5-minute intervals = 12 cycles/hour = 24,000 points/hour.
That's 5x over the limit!
Solution: Staggered polling.
- Priority repos (your own active PRs): Every 5 minutes
- Watched repos (repos you care about): Every 30 minutes
- Background repos (just keeping an eye): Every 2 hours
This brings usage to ~3000 points/hour — within limits with headroom.
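A sketch of how the tiers can drive a single once-a-minute scheduler; the repo assignments are examples, and pollRepo stands in for the fetch → parse → diff pipeline:

// Tier table: every repo lives in exactly one tier.
const TIERS = [
  { everyMin: 5,   repos: ['nodejs/node'] },          // priority: my active PRs
  { everyMin: 30,  repos: ['python-pillow/Pillow'] }, // watched
  { everyMin: 120, repos: ['asyncapi/spec'] },        // background
];

// Invoked once a minute (cron: * * * * *); only due tiers get polled.
async function tick(pollRepo) {
  const minute = Math.floor(Date.now() / 60000); // minutes since epoch
  for (const { everyMin, repos } of TIERS) {
    if (minute % everyMin !== 0) continue;
    for (const repo of repos) await pollRepo(repo);
  }
}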
4. Persistence & Recovery
The system writes state to a JSON file after each successful poll:
{
  "lastPoll": 1715847200,
  "repos": {
    "nodejs/node": { "63040": { "state": "OPEN", ... } },
    "python-pillow/Pillow": { "9581": { "state": "OPEN", ... } }
  },
  "pendingAlerts": []
}
On restart, it loads this file and only polls for changes since the last timestamp. No duplicate alerts after crashes.
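Load and save are a handful of lines. A sketch (file name illustrative), using write-then-rename so a crash mid-write can't corrupt the state file:

const fs = require('fs');
const STATE_PATH = 'monitor-state.json';

function loadState() {
  try {
    return JSON.parse(fs.readFileSync(STATE_PATH, 'utf8'));
  } catch {
    // First run (or unreadable file): start fresh and alert on nothing.
    return { lastPoll: 0, repos: {}, pendingAlerts: [] };
  }
}

function saveState(state) {
  // Write to a temp file, then atomically swap it into place.
  fs.writeFileSync(STATE_PATH + '.tmp', JSON.stringify(state, null, 2));
  fs.renameSync(STATE_PATH + '.tmp', STATE_PATH);
}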
What It Catches (That Email Notifications Miss)
| Event Type | GitHub Email | My Monitor | Speedup |
|---|---|---|---|
| New review/comment | ✅ (delayed 5-60 min) | ✅ (next poll, <5 min) | ~12x |
| CI failure | ❌ (not always sent) | ✅ (next poll) | ∞ |
| Merge conflict | ❌ never | ✅ (next poll, ~5 min) | ∞ |
| PR merged | ✅ (delayed) | ✅ (next poll) | ~60x |
| Label added | ❌ never | ✅ (next poll, ~5 min) | ∞ |
| Superseded by another PR | ❌ never | ✅ (detected, next poll) | ∞ |
The biggest win: CI failures. GitHub doesn't proactively notify you when CI fails on your PR. You find out next time you check manually (which could be days).
The Notification Problem
Detecting events is half the battle. The other half: getting them to the right place.
My setup routes notifications based on severity:
function route(event) {
  switch (event.type) {
    case 'MERGED':
      return ['feishu', 'log'];        // Celebrate! Tell everyone
    case 'CI_FAIL':
      return ['feishu', 'urgent-log']; // Fix this NOW
    case 'REVIEW':
      // isMaintainer() is a lookup defined elsewhere in the monitor.
      if (isMaintainer(event.author))
        return ['feishu', 'log'];      // Maintainer responded!
      else
        return ['log'];                // Just log it
    case 'CONFLICT':
      return ['feishu'];               // Need to rebase
    default:
      return ['log'];
  }
}
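Delivery is then a loop over whatever route() returns. A sketch: Feishu custom bots accept a plain-text payload at a webhook URL, and the env var name and the event.pr.number field are assumptions here:

// Dispatch one detected event to every channel route() picked for it.
async function dispatch(event) {
  for (const channel of route(event)) {
    if (channel === 'feishu') {
      // Feishu custom-bot webhook; the URL comes from the bot's settings.
      await fetch(process.env.FEISHU_WEBHOOK_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          msg_type: 'text',
          content: { text: `[${event.type}] ${event.repo} PR #${event.pr.number}` },
        }),
      });
    } else {
      console.log(`[${channel}]`, event.type, event.repo); // stand-in for the JSONL logger
    }
  }
}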
Lessons Learned
1. Don't Trust GitHub Emails
They're unreliable. Sometimes they arrive. Sometimes they don't. Sometimes they arrive in batches 6 hours late. Build your own monitoring if you care about response time.
2. State Files Are Your Friend
After a crash/restart, the last thing you want is to re-alert everything that happened in the past hour. Persist state. Load it on startup. Only look forward.
3. Watchdog Your Monitor
Your monitor needs its own monitor. I have a separate cron job that checks "did the monitoring script run in the last 10 minutes?" If not → alert.
Yes, it's turtles all the way down.
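The watchdog itself is tiny. A sketch that reads lastPoll from the state file shown earlier (path and threshold as above), run from its own cron entry:

const fs = require('fs');

const state = JSON.parse(fs.readFileSync('monitor-state.json', 'utf8'));
const ageMinutes = (Date.now() / 1000 - state.lastPoll) / 60;
if (ageMinutes > 10) {
  // The monitor may be dead, so this must alert through an independent path.
  console.error(`pr-monitor stale: last poll ${ageMinutes.toFixed(1)} min ago`);
  process.exitCode = 1; // non-zero exit → cron mail / external alerting
}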
4. Log Everything, Read Nothing (Until You Need To)
I log every event to a JSONL file. Most of the time I never read it. But when something goes wrong ("why didn't I get alerted about PR #9581?"), I can grep through 30 days of history and find exactly what happened.
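That "grep" is usually a five-line script rather than actual grep. A sketch against the record format shown earlier (log path assumed):

const fs = require('fs');

// Replay every logged snapshot of one PR, oldest first.
const records = fs.readFileSync('pr-monitor.jsonl', 'utf8')
  .split('\n')
  .filter(Boolean)
  .map((line) => JSON.parse(line))
  .filter((r) => r.repo === 'python-pillow/Pillow' && r.pr === 9581);

for (const r of records) {
  console.log(new Date(r.ts * 1000).toISOString(), r.ci, r.reviews, r.last_event);
}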
The Code
The core logic is about 200 lines of JavaScript. Runs as a cron job inside my main automation process. Zero dependencies beyond node-fetch (or native fetch in Node 18+).
If there's interest, I can clean it up and open-source it. Let me know in the comments.
What's Next
- Web dashboard: A simple page showing current status of all tracked PRs
- Slack/Discord integration: For teams who want channel-based notifications
- Smart prioritization: ML model to predict which PRs are likely to get merged (focus effort there)
- Auto-rebase: When a conflict is detected, automatically attempt rebase (risky but useful)
What's your strategy for tracking contributions across many repos? Still using email? Built something custom? Drop a comment — I'd love to hear how others handle this.
Follow @armorbreak for more deep-dives into developer tooling.