DEV Community

Şahin Uygutalp
I Built a GitHub Action to Stop AI-Generated PRs Before They Reach My Queue

Last year, Daniel Stenberg — the author of curl — shut down his project's bug bounty program.

The reason? 20% of the incoming reports were AI-generated garbage. Not just low-quality — worthless. Hallucinated vulnerabilities, copy-pasted exploit templates, fabricated CVEs. His team was spending more time triaging noise than fixing real bugs.

This is the asymmetry nobody talks about: AI can generate 500 lines of plausible-looking code in two seconds. Reviewing it still takes a human hours.

And it's breaking open source.


The industry's fix made things worse

When the "AI PR flood" problem became obvious, the market responded with AI code review bots — CodeRabbit, Copilot review, and friends.

Here's the problem: they review code the way an anxious intern would. They flood your PR timeline with comments about variable naming, whitespace, missing docstrings. They are glorified linters with a chat interface.

Maintainers went from dealing with one source of noise (AI-generated PRs) to dealing with two (AI-generated PRs + AI-generated review comments).

I call this double review fatigue. And it's what made me build something different.


A different approach: Zero-Nitpick

I built PR-Sentry — a GitHub Action with one core rule:

Never comment on style. Only report things that break in production.

That means: security vulnerabilities, runtime crashes, memory leaks, race conditions. Nothing else.

Here's how it works under the hood:

1. Statistical slop detection (no LLM needed)

Before calling any API, PR-Sentry runs a local analysis on the PR description and diff. It calculates a "slop score" based on buzzword density (robust, seamless, leverage, synergy...), passive voice ratio, sentence length patterns, and repetition score.

If the PR scores 60 or above, it's flagged as AI slop without burning a single API token.

# Each factor is normalized to the 0.0–1.0 range before weighting,
# so the weighted sum lands on a 0–100 scale.
slop_score = (
    buzzword_density * 30 +     # "robust", "seamless", "leverage"...
    passive_voice_ratio * 20 +
    sentence_length_avg * 20 +  # normalized average sentence length
    repetition_score * 30
)

is_slop = slop_score >= 60

2. Security scanning with entropy analysis

The diff parser checks for 50+ security patterns before any LLM touches the code:

PATTERNS = {
    "aws_key":    r"AKIA[0-9A-Z]{16}",
    "github_pat": r"gh[pousr]_[A-Za-z0-9_]{36,}",
    "openai_key": r"sk-[A-Za-z0-9]{48}",
    "sql_inject": r"SELECT.*FROM.*WHERE.*=.*\$",
    "xss":        r"innerHTML\s*=|document\.write\(",
    # ... 45 more
}
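The repo's exact scanner isn't shown here, but a minimal sketch of how one might walk a unified diff with two of the patterns above looks like this (the function name `scan_added_lines` is mine, not PR-Sentry's):

```python
import re

# Two patterns from the table above; the real scanner ships 50+.
PATTERNS = {
    "aws_key":    r"AKIA[0-9A-Z]{16}",
    "github_pat": r"gh[pousr]_[A-Za-z0-9_]{36,}",
}

def scan_added_lines(diff: str) -> list[tuple[str, str]]:
    """Return (pattern_name, offending_line) for each added line that matches."""
    findings = []
    for line in diff.splitlines():
        # Only scan lines the PR introduces: "+..." but not the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if re.search(pattern, line):
                findings.append((name, line[1:]))
    return findings
```

Because this runs on the raw diff text, a leaked key is flagged before any model ever sees the PR.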

High-entropy strings (Shannon entropy > 4.5) are also flagged to catch accidentally committed secrets.
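Shannon entropy over character frequencies is the standard trick here: random API tokens mix cases and digits and score high, while ordinary identifiers and prose score lower. A self-contained sketch of the metric (my own helper, not the repo's code):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character: -sum(p * log2(p)) over character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
```

A repeated string like `"aaaa"` scores 0 bits, while a long random token pushes past the 4.5-bit threshold mentioned above.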

3. Constrained AI review

Only PRs that pass the slop filter and show signs of potential runtime issues reach the LLM. And the system prompt is strict:

"You are a zero-nitpick code reviewer. Report ONLY: runtime crashes, memory leaks, race conditions, security vulnerabilities. If the code is logically sound, say nothing."

One concise comment. Or silence. Never noise.
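The gating logic described above can be sketched in a few lines (names like `should_call_llm` are illustrative, not from the repo): tokens are only spent when both free local checks agree the diff is worth a look.

```python
SLOP_THRESHOLD = 60  # same cutoff as the slop scorer above

def should_call_llm(slop_score: float, has_runtime_risk_signals: bool) -> bool:
    """Gate the expensive LLM pass behind the free local checks."""
    # Slop gets rejected locally; clean-but-boring diffs get silence.
    return slop_score < SLOP_THRESHOLD and has_runtime_risk_signals
```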


Setup takes 2 minutes

Add this to .github/workflows/pr-sentry.yml:

name: PR-Sentry Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run PR-Sentry
        uses: Ebuodinde/PR_SENTRY@v3
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          # Optional: switch providers
          # provider: openai  # or deepseek

Then add ANTHROPIC_API_KEY to your repo secrets. Done.

No database. No external server. No lock-in — it supports Anthropic, OpenAI, and DeepSeek out of the box.


It also works locally via MCP

If you use Cursor or Claude Code, PR-Sentry ships an MCP server. You can run the slop detector and security scanner against your diff before pushing, directly from your IDE.


What's next

The tool is at v3.0.0 with 262 passing tests. What I want to improve next:

  • Smarter language-aware slop detection (Python idioms vs JS patterns)
  • VS Code extension for local pre-push checks
  • Feedback loop: learning from maintainer decisions over time

If you maintain an open source project and review fatigue is real for you, give it a try. Remove it anytime; it's just a YAML file.

→ github.com/Ebuodinde/PR_SENTRY


Have you noticed an uptick in AI-generated PRs in your repos? Curious how others are handling it — drop a comment.
