When I built ReleaseHub - a CLI that generates release notes from merged PRs - every PR went to the AI. Feature, bugfix, dependency bump, CI fix. All of them.
The problem: in a typical repo, roughly 40% of merged PRs will always end up marked "internal". They have zero user-facing impact, but I was spending tokens on them anyway.
The fix was a pre-filter step that runs before any API call.
The pattern
const INTERNAL_PATTERNS = [
  // Conventional commit prefixes, with an optional scope: "chore:", "ci(actions):"
  /^(chore|ci|build|test|refactor|style|perf)(\(.+\))?:/i,
  /^bump\s/i,
  /\bdependabot\b/i,
  /\brenovate\b/i,
]

const INTERNAL_LABELS = ['dependencies', 'ci', 'chore', 'internal']

export function prefilterPRs(prs: PullRequest[]) {
  const toAnalyze: PullRequest[] = [] // still needs the AI
  const prefiltered = [] // decided locally, never sent to the API

  for (const pr of prs) {
    const isInternal =
      INTERNAL_PATTERNS.some(p => p.test(pr.title)) ||
      pr.labels.some(l => INTERNAL_LABELS.includes(l.toLowerCase()))

    if (isInternal) {
      prefiltered.push({
        original_title: pr.title,
        category: 'internal',
        visible: false,
        confidence: 1,
      })
    } else {
      toAnalyze.push(pr)
    }
  }

  return { toAnalyze, prefiltered }
}
The key insight: a regex match on these patterns is effectively confidence 1.0. A PR titled "chore: bump lodash" is internal. Always. No LLM needed.
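To get a feel for what the title patterns catch, here is a quick check you can run on its own. The patterns are copied from the filter; the helper name `matchesInternalPattern` and the sample titles are made up for this demo, not taken from ReleaseHub or a real repo.

```typescript
// Same title patterns as in the pre-filter.
const INTERNAL_PATTERNS: RegExp[] = [
  /^(chore|ci|build|test|refactor|style|perf)(\(.+\))?:/i,
  /^bump\s/i,
  /\bdependabot\b/i,
  /\brenovate\b/i,
]

// Hypothetical helper for this demo: true if any pattern matches the title.
function matchesInternalPattern(title: string): boolean {
  return INTERNAL_PATTERNS.some(p => p.test(title))
}

// Made-up sample titles.
const samples = [
  'chore: bump lodash',
  'ci(actions): cache node_modules',
  'Bump axios from 1.6.0 to 1.6.2',
  'Update auth middleware',
  'feat: add dark mode',
  'refactor!: drop legacy config loader',
]

for (const title of samples) {
  console.log(matchesInternalPattern(title) ? 'internal' : 'analyze ', '|', title)
}
```

The first three titles are filtered out and the last three go on to the AI. Note the last one: the prefix pattern expects the colon right after the type or scope, so a title with the `!` breaking-change marker does not match and still gets analyzed.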
Results
On a 12-PR release:
- Before: 12 PRs sent to AI
- After: 7 PRs sent to AI (5 pre-filtered)
- Token savings: ~40%
- Accuracy change: none
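The arithmetic behind that number, as a small sketch. `prefilterRate` is a name I made up for illustration. It measures the share of PRs skipped, which only approximates token savings if PRs cost roughly the same number of tokens each.

```typescript
// Share of PRs that never reach the API, as a percentage.
function prefilterRate(total: number, prefiltered: number): number {
  if (total === 0) return 0 // avoid dividing by zero on an empty release
  return (prefiltered / total) * 100
}

console.log(prefilterRate(12, 5).toFixed(1) + '%') // 41.7%
```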
The AI still handles the ambiguous ones - "Update auth middleware" could be a refactor or a security fix. That's where the reasoning step matters.
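Here is a rough sketch of how the two halves could be stitched back together. Everything in it is an assumption for illustration: the `PullRequest` and `NoteEntry` shapes, the `Classifier` signature, and the `buildEntries` name are hypothetical and not ReleaseHub's actual API.

```typescript
// Minimal shapes assumed for this sketch; the real types may differ.
interface PullRequest {
  title: string
  labels: string[]
}

interface NoteEntry {
  original_title: string
  category: string
  visible: boolean
  confidence: number
}

// Hypothetical signature for whatever wraps the model call.
type Classifier = (prs: PullRequest[]) => Promise<NoteEntry[]>

// Combine entries decided by the pre-filter with entries decided by the model.
async function buildEntries(
  split: { toAnalyze: PullRequest[]; prefiltered: NoteEntry[] },
  classify: Classifier,
): Promise<NoteEntry[]> {
  // Skip the API call entirely when nothing is ambiguous.
  const analyzed = split.toAnalyze.length > 0 ? await classify(split.toAnalyze) : []
  return [...split.prefiltered, ...analyzed]
}
```

Passing the classifier in as a parameter keeps the model call swappable, so the pre-filter logic can be tested without spending tokens.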
One gotcha
Don't pre-filter by keyword matching on the full title. "Add CI status badge to README" contains "CI" but it's user-facing. Match against conventional commit prefixes at the start of the title, not anywhere in the string.
/^ci:/i // correct - only matches "ci: ..." prefix
/\bci\b/i // wrong - matches "Add CI status badge"
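You can see the difference by running both regexes against the same titles. The first sample title is made up; the second is the one from above.

```typescript
const anchored = /^ci:/i // only a "ci:" prefix at the start of the title
const loose = /\bci\b/i // the word "ci" anywhere in the title

const titles = ['ci: pin Node version', 'Add CI status badge to README']

for (const title of titles) {
  console.log(title, '->', { anchored: anchored.test(title), loose: loose.test(title) })
}
```

Both regexes match the first title, but only the loose one matches the README badge PR, which is exactly the false positive to avoid.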
I shipped this in ReleaseHub v1.1.1. The full pre-filter is here.
What other patterns would you add to the filter?