How I cut AI token usage by 40% with a pre-filter step

#typescript #ai #opensource #github

When I built ReleaseHub - a CLI that generates release notes from merged PRs - every PR went to the AI. Feature, bugfix, dependency bump, CI fix. All of them.

The problem: in a typical repo, roughly 40% of merged PRs end up classified as "internal". They have zero user-facing impact, but I was spending tokens on them anyway.

The fix was a pre-filter step that runs before any API call.

The pattern

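// Assumed shape of a merged PR; adjust to match your GitHub client
interface PullRequest {
  title: string
  labels: string[] // label names as plain strings
}

interface PrefilteredPR {
  original_title: string
  category: 'internal'
  visible: boolean
  confidence: number
}

// Conventional commit prefixes (with optional scope), bump titles, and bot names
const INTERNAL_PATTERNS = [
  /^(chore|ci|build|test|refactor|style|perf)(\(.+\))?:/i,
  /^bump\s/i,
  /\bdependabot\b/i,
  /\brenovate\b/i,
]

const INTERNAL_LABELS = ['dependencies', 'ci', 'chore', 'internal']

export function prefilterPRs(prs: PullRequest[]) {
  const toAnalyze: PullRequest[] = []
  const prefiltered: PrefilteredPR[] = []

  for (const pr of prs) {
    // Internal if the title matches a known pattern or any label is internal
    const isInternal =
      INTERNAL_PATTERNS.some(p => p.test(pr.title)) ||
      pr.labels.some(l => INTERNAL_LABELS.includes(l.toLowerCase()))

    if (isInternal) {
      // Deterministic match: no LLM call, full confidence
      prefiltered.push({
        original_title: pr.title,
        category: 'internal',
        visible: false,
        confidence: 1,
      })
    } else {
      // Ambiguous: leave it for the AI step
      toAnalyze.push(pr)
    }
  }

  return { toAnalyze, prefiltered }
}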

The key insight: on these patterns, regex confidence is effectively 1.0. A PR titled "chore: bump lodash" is internal. Always. No LLM needed.
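One way to sanity-check which titles the pre-filter treats as certain is to run the title patterns against a few samples. This is a minimal sketch: the helper name `isInternalTitle` is hypothetical, and the sample titles are illustrative, not data from ReleaseHub.

```typescript
// Same title patterns as the pre-filter above
const INTERNAL_PATTERNS: RegExp[] = [
  /^(chore|ci|build|test|refactor|style|perf)(\(.+\))?:/i,
  /^bump\s/i,
  /\bdependabot\b/i,
  /\brenovate\b/i,
]

// Hypothetical helper: true when the title alone is enough to mark a PR internal
function isInternalTitle(title: string): boolean {
  return INTERNAL_PATTERNS.some(p => p.test(title))
}

const samples = [
  'chore: bump lodash',              // conventional prefix -> internal
  'ci(actions): pin runner version', // prefix with scope -> internal
  'Bump axios from 1.6.0 to 1.6.2',  // bump-style title -> internal
  'Update auth middleware',          // ambiguous -> left for the AI
  'feat: add dark mode',             // user-facing prefix -> left for the AI
]

for (const title of samples) {
  console.log(isInternalTitle(title) ? 'internal  ' : 'send to AI', '|', title)
}
```

Because none of these regexes use the `g` flag, `test()` keeps no `lastIndex` state between calls, so reusing the same pattern objects across many titles is safe.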

Results

On a 12-PR release:

Before: 12 PRs sent to AI
After: 7 PRs sent to AI (5 pre-filtered)
Token savings: ~40% (5 of 12 PRs never reach the API)
Accuracy change: none observed on this release
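The ~40% figure matches the PR-count ratio (5/12 ≈ 42%) only if every PR costs roughly the same number of tokens. A minimal sketch of a token-weighted estimate, assuming you have a rough per-PR token count; `PrCost` and `tokenSavings` are hypothetical names, and the flat 1,000 tokens per PR below is an illustrative assumption, not a measurement.

```typescript
// Hypothetical per-PR record: estimated prompt tokens and whether the
// pre-filter handled it locally
interface PrCost {
  tokens: number
  prefiltered: boolean
}

// Fraction of tokens that never reach the API. With equal per-PR cost this
// reduces to prefilteredCount / totalCount.
function tokenSavings(prs: PrCost[]): number {
  const total = prs.reduce((sum, pr) => sum + pr.tokens, 0)
  const saved = prs
    .filter(pr => pr.prefiltered)
    .reduce((sum, pr) => sum + pr.tokens, 0)
  return total === 0 ? 0 : saved / total
}

// Illustrative release: 12 PRs at an assumed flat 1,000 tokens each, 5 pre-filtered
const release: PrCost[] = Array.from({ length: 12 }, (_, i) => ({
  tokens: 1000,
  prefiltered: i < 5,
}))

console.log(`${(tokenSavings(release) * 100).toFixed(1)}%`) // 41.7%
```

If the pre-filtered PRs tend to be shorter than the feature PRs, the token-weighted figure will come out lower than the PR-count ratio.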

The AI still handles the ambiguous ones - "Update auth middleware" could be a refactor or a security fix. That's where the reasoning step matters.
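Put together, the flow is: classify deterministic matches locally, then send only what is left to the model in one batch. A minimal sketch follows; `Classifier`, `classifyAll`, and the stub classifier are hypothetical stand-ins, not ReleaseHub's actual API, and a real classifier would call an LLM.

```typescript
// Assumed shapes for this sketch; ReleaseHub's real types may differ
interface PullRequest {
  title: string
  labels: string[]
}

interface ClassifiedPR {
  original_title: string
  category: string
  visible: boolean
  confidence: number
}

// Any async function that sends a batch of PRs to an LLM and returns one result per PR
type Classifier = (prs: PullRequest[]) => Promise<ClassifiedPR[]>

const INTERNAL_PATTERNS: RegExp[] = [
  /^(chore|ci|build|test|refactor|style|perf)(\(.+\))?:/i,
  /^bump\s/i,
  /\bdependabot\b/i,
  /\brenovate\b/i,
]
const INTERNAL_LABELS = ['dependencies', 'ci', 'chore', 'internal']

function isInternal(pr: PullRequest): boolean {
  return (
    INTERNAL_PATTERNS.some(p => p.test(pr.title)) ||
    pr.labels.some(l => INTERNAL_LABELS.includes(l.toLowerCase()))
  )
}

// Hypothetical orchestration: deterministic matches never reach the model,
// and the ambiguous remainder goes out in a single batch
async function classifyAll(prs: PullRequest[], classify: Classifier): Promise<ClassifiedPR[]> {
  const prefiltered: ClassifiedPR[] = []
  const ambiguous: PullRequest[] = []

  for (const pr of prs) {
    if (isInternal(pr)) {
      prefiltered.push({ original_title: pr.title, category: 'internal', visible: false, confidence: 1 })
    } else {
      ambiguous.push(pr)
    }
  }

  // Skip the API call entirely when every PR was pre-filtered
  const analyzed = ambiguous.length > 0 ? await classify(ambiguous) : []
  return [...prefiltered, ...analyzed]
}

// Stub classifier for demonstration; it records the size of the batch it was sent
let lastBatchSize = 0
const stubClassifier: Classifier = async prs => {
  lastBatchSize = prs.length
  return prs.map(pr => ({ original_title: pr.title, category: 'needs-review', visible: true, confidence: 0.5 }))
}

const prs: PullRequest[] = [
  { title: 'chore(deps): bump lodash', labels: [] },
  { title: 'Update auth middleware', labels: [] },
  { title: 'Add CI status badge to README', labels: ['docs'] },
]

classifyAll(prs, stubClassifier).then(results => {
  console.log(`${results.length} results, ${lastBatchSize} sent to the model`)
})
```

Only "Update auth middleware" and the README badge PR reach the stub; the `chore(deps)` PR is resolved locally with confidence 1.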

One gotcha

Don't pre-filter by keyword matching on the full title. "Add CI status badge to README" contains "CI" but it's user-facing. Match against conventional commit prefixes at the start of the title, not anywhere in the string.

/^ci:/i   // correct - only matches "ci: ..." prefix
/\bci\b/i // wrong - matches "Add CI status badge"
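Side by side, the difference is easy to check. This sketch also allows an optional scope such as "ci(actions):", which the main INTERNAL_PATTERNS regex already accepts but the bare /^ci:/i does not; the sample titles are illustrative.

```typescript
// Anchored: "ci" must be the conventional-commit type at the start of the title,
// optionally followed by a scope in parentheses, then a colon
const anchored = /^ci(\(.+\))?:/i
// Unanchored: matches the word "ci" anywhere in the title
const unanchored = /\bci\b/i

const titles = [
  'ci: cache node_modules',
  'ci(actions): pin runner version',
  'Add CI status badge to README',
]

for (const title of titles) {
  console.log(`anchored=${anchored.test(title)} unanchored=${unanchored.test(title)} | ${title}`)
}
```

Both patterns flag the two real CI changes, but only the anchored one leaves the README badge PR for the AI.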

I shipped this in ReleaseHub v1.1.1. The full pre-filter is here.

What other patterns would you add to the filter?

DEV Community
