I've been building PR Focus, a Chrome extension that helps developers triage GitHub pull requests. One of the first decisions I had to make was: how do I actually sort PRs by priority?
The obvious answer is "use AI to score the risk". But I didn't want to rely entirely on an LLM, because:
- AI is inconsistent (same diff, different scores).
- It costs users tokens on every poll.
- A wrong AI score can bury a broken PR.
So I built a hybrid system: deterministic signals (CI status + PR age) form the floor, and the AI risk score is a tiebreaker on top. Failing CI always floats to the top, regardless of what the AI says.
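To make the ordering concrete, here is a minimal sketch of how such a comparator could look. It is not PR Focus's actual code: the `PullRequest` shape, the `CI_RANK` table, the 24-hour age bucket, and the function names are all assumptions made for illustration. The bucketing in particular is my guess at how an AI score could ever act as a tiebreaker, since two PRs rarely have exactly the same age.

```typescript
// Hypothetical types and names, for illustration only.
type CiStatus = "failing" | "pending" | "passing";

interface PullRequest {
  id: number;
  ciStatus: CiStatus;
  ageHours: number; // hours since the PR was opened
  aiRisk?: number;  // optional LLM risk score in [0, 1]; may be absent
}

// Lower rank sorts first, so failing CI always leads the list.
const CI_RANK: Record<CiStatus, number> = { failing: 0, pending: 1, passing: 2 };

// Assumption: PRs in the same 24-hour bucket count as "equally old".
const AGE_BUCKET_HOURS = 24;

function comparePriority(a: PullRequest, b: PullRequest): number {
  // 1. Deterministic signal: CI status decides first.
  const byCi = CI_RANK[a.ciStatus] - CI_RANK[b.ciStatus];
  if (byCi !== 0) return byCi;

  // 2. Deterministic signal: older age bucket sorts first.
  const bucketA = Math.floor(a.ageHours / AGE_BUCKET_HOURS);
  const bucketB = Math.floor(b.ageHours / AGE_BUCKET_HOURS);
  if (bucketA !== bucketB) return bucketB - bucketA;

  // 3. AI score only breaks remaining ties; a missing score counts as 0.
  return (b.aiRisk ?? 0) - (a.aiRisk ?? 0);
}

// Returns a new array; the input is left untouched.
function sortByPriority(prs: PullRequest[]): PullRequest[] {
  return [...prs].sort(comparePriority);
}
```

Because the AI score is only consulted after CI status and age bucket are equal, a wildly wrong score can reorder PRs within a tier but cannot push a failing PR below a passing one. It also means the list still sorts sensibly when `aiRisk` is missing entirely.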
I wrote up the full decision, including the trade-offs and what it cost, in my new Build Logs repo.
If you're building dev tools or wrestling with AI reliability, the full log might be useful:
🔗 Why PR risk scoring is a hybrid, not a pure AI verdict