xzawed

I built a tool that runs static analysis + Claude AI review on every GitHub Push/PR — SCAManager

It started as a small annoyance
PR reviews are always a chore. On a small team — or a side project I run alone — the "someone has to look at this" person is always me. And if you're pushing straight to main, code review effectively disappears.
I started by stacking pylint and flake8 on top of GitHub Actions. But those don't answer the questions that actually matter: did this change do what I meant it to? Or does the commit message actually describe what changed? Static analysis catches grammar and style. It can't read intent.
So I asked Claude to review the same diffs, fused both signals together, scored them out of 100, and pushed the result to Telegram. That became SCAManager.
GitHub: https://github.com/xzawed/SCAManager

What it does
When a GitHub Webhook fires for a Push or PR event, the following runs in parallel:

Static analysis — pylint, flake8, bandit
AI code review — Claude Haiku 4.5
Commit message evaluation — Claude AI

Results map to a 100-point score and an A–F grade, then ship to whichever of the nine channels you've configured: Telegram, GitHub PR Comment, GitHub Commit Comment, GitHub Issue, Discord, Slack, Email, Generic Webhook, n8n.
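The channel fan-out can be sketched with `asyncio.gather(return_exceptions=True)`, so one dead channel surfaces as an exception object instead of taking the others down. The sender functions below are hypothetical stand-ins, not SCAManager's actual channel API:

```python
import asyncio

# Hypothetical senders -- stand-ins for the real channel integrations.
async def send_telegram(report: dict) -> str:
    return "telegram: ok"

async def send_slack(report: dict) -> str:
    raise ConnectionError("slack webhook unreachable")

async def notify_all(report: dict) -> list:
    # return_exceptions=True isolates failures: a failed sender comes
    # back as an exception object instead of cancelling its siblings.
    senders = [send_telegram(report), send_slack(report)]
    return await asyncio.gather(*senders, return_exceptions=True)

results = asyncio.run(notify_all({"score": 87, "grade": "B"}))
# results is a mix of return values and exception objects, one per channel
```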
For PRs, the score drives the gate automatically:

Auto mode — Above threshold → GitHub APPROVE. Below → REQUEST_CHANGES.
Semi-auto mode — Inline buttons in Telegram for manual approval.
Auto-merge — Above a separate threshold → squash merge.
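The gate boils down to two threshold comparisons. A minimal sketch, with made-up threshold values and result keys (SCAManager's actual config and GitHub review payloads may differ):

```python
# Hypothetical thresholds -- tune these before trusting the gate.
APPROVE_THRESHOLD = 80
AUTO_MERGE_THRESHOLD = 90

def gate_decision(score: int, mode: str = "auto") -> dict:
    if mode == "semi-auto":
        # Semi-auto: defer to the human behind the Telegram buttons.
        return {"review": "PENDING", "merge": False}
    review = "APPROVE" if score >= APPROVE_THRESHOLD else "REQUEST_CHANGES"
    # Auto-merge needs both an approval and the higher merge threshold.
    merge = review == "APPROVE" and score >= AUTO_MERGE_THRESHOLD
    return {"review": review, "merge": merge}
```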

The scoring system — why these weights
| Item | Points | Evaluator |
| --- | --- | --- |
| Code quality | 25 | pylint + flake8 |
| Security | 20 | bandit |
| Commit message | 15 | Claude AI |
| Implementation direction | 25 | Claude AI |
| Test coverage | 15 | Claude AI |
| Total | 100 | |
Things machines see well go to machines (pylint, bandit). Things that need human judgment go to AI. AI evaluations come back on a 0–10 or 0–20 scale, then get re-weighted into the final score.
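The re-weighting is simple proportional scaling. A sketch, assuming the weights from the table above; the `rescale` helper and the raw-score shape are my own illustration, not the project's code:

```python
# Weights from the scoring table (points out of 100).
WEIGHTS = {
    "code_quality": 25,      # pylint + flake8
    "security": 20,          # bandit
    "commit_message": 15,    # Claude AI
    "implementation": 25,    # Claude AI
    "test_coverage": 15,     # Claude AI
}

def rescale(raw: float, raw_max: float, weight: int) -> float:
    # Map a raw score on its native scale (e.g. 0-10 or 0-20)
    # onto that item's share of the 100-point total.
    return raw / raw_max * weight

def total_score(raw: dict) -> float:
    # raw maps item name -> (raw value, raw scale maximum)
    return sum(rescale(v, m, WEIGHTS[k]) for k, (v, m) in raw.items())

score = total_score({
    "code_quality": (20, 25),
    "security": (18, 20),
    "commit_message": (8, 10),    # AI item on a 0-10 scale
    "implementation": (15, 20),   # AI item on a 0-20 scale
    "test_coverage": (7, 10),
})
```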
If ANTHROPIC_API_KEY isn't set, the AI items default to a neutral middle score, and static analysis alone can still reach at most 89 points (a B grade). The tool isn't useless without API spend.

Architecture — the parts that were interesting to build

  1. **asyncio.gather() for parallelism**
  Running static analysis and AI review serially makes per-PR analysis time miserable. Wrapping them in `asyncio.gather()` collapses total wall-clock time to whatever the slowest task takes. I use `asyncio.gather(return_exceptions=True)` for the nine notification channels too — but here the goal is isolation, not speed. If Telegram is down, that shouldn't block Slack.
  2. **Idempotency — same SHA, no double work**
  GitHub Webhooks get retransmitted (response timeouts, retries, etc.). Running the same commit SHA twice costs money and produces no new information, so I dedupe by SHA at the DB layer.

```
GitHub Push/PR
 └─ POST /webhooks/github  (HMAC-SHA256 verification)
     └─ BackgroundTask: run_analysis_pipeline()
         ├─ Repo register · SHA dedup (idempotency)
         ├─ asyncio.gather() ── parallel
         │   ├─ analyze_file() × N  (pylint · flake8 · bandit)
         │   └─ review_code()       (Claude AI)
         ├─ calculate_score() → grade
         ├─ run_gate_check()  [PR only]
         └─ asyncio.gather(return_exceptions=True) → notification channels
```
  3. **Two ways to use the AI**
  Same review, two call paths:

Server mode — Anthropic API. Needs ANTHROPIC_API_KEY. Costs money.
Local hook mode — Claude Code CLI (claude -p). Runs locally, no API key needed.

Local hook mode runs as a pre-push git hook. Output goes to terminal and to the dashboard. Environments without the CLI (Codespaces, mobile) silently skip the hook — exit 0 always, never blocks the push.
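The interesting part of the hook is the never-block contract: exit 0 no matter what. A sketch of such a pre-push hook, written as my own illustration of the behavior described (the only real commands assumed are `claude -p` and `git diff`; SCAManager's actual hook may differ):

```python
#!/usr/bin/env python3
"""Sketch of a pre-push hook that reviews the diff but never blocks the push."""
import shutil
import subprocess
import sys

def main() -> int:
    if shutil.which("claude") is None:
        # No Claude Code CLI here (Codespaces, mobile): skip silently.
        return 0
    try:
        # Diff the commits about to be pushed; check=False tolerates
        # branches without an upstream.
        diff = subprocess.run(
            ["git", "diff", "@{push}..HEAD"],
            capture_output=True, text=True, check=False,
        ).stdout
        review = subprocess.run(
            ["claude", "-p", "Review this diff:\n" + diff],
            capture_output=True, text=True, timeout=120, check=False,
        ).stdout
        print(review)
    except Exception:
        pass  # any failure still returns 0 -- never block the push
    return 0

# In the real hook script: sys.exit(main())
```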

  4. **DB failover**
  I built a FailoverSessionFactory that switches to a fallback PostgreSQL instance when the primary dies. /health reports which DB is currently active. Honestly, this is probably over-engineered. Whether a small side project actually needs failover is a separate question; building it was largely a learning exercise.
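A failover factory like that can be sketched as a thin wrapper that tries the primary, flips a flag on failure, and lets the health endpoint read that flag. This is a generic illustration, not SCAManager's actual FailoverSessionFactory or its SQLAlchemy wiring:

```python
class FailoverSessionFactory:
    """Tries the primary connection factory first; falls back on error."""

    def __init__(self, primary, fallback):
        self._factories = {"primary": primary, "fallback": fallback}
        self.active = "primary"

    def __call__(self):
        try:
            session = self._factories["primary"]()
            self.active = "primary"   # primary recovered (or never failed)
            return session
        except ConnectionError:
            # Primary is down: switch over and remember it for /health.
            self.active = "fallback"
            return self._factories["fallback"]()

def health(factory: FailoverSessionFactory) -> dict:
    # What a /health endpoint would report: which DB is serving traffic.
    return {"db": factory.active}
```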

Limits and trade-offs
This tool isn't going to fit every team. Being honest about it:

Python-only — Static analysis is pylint/flake8/bandit. For non-Python repos, only the AI review piece gives you value.
AI score consistency — LLM output isn't fully deterministic. Treat the score as a trend indicator, not a hard, trustworthy number.
API cost — Teams shipping big PRs frequently can rack up Claude API spend fast. File filters and thresholds give you some control, but it's a real cost line.
Auto-merge risk — Score-driven squash merge is convenient and dangerous. Validate your threshold settings before turning it on. Start in semi-auto mode.

If you want to try it

Repo: https://github.com/xzawed/SCAManager
License: MIT
Required: Python 3.13 · PostgreSQL · GitHub OAuth App
Optional: ANTHROPIC_API_KEY · Telegram Bot Token · SMTP

Easiest deploy: Railway with the PostgreSQL plugin and your env vars filled in. For on-prem, uvicorn + nginx + systemd works fine.
Feedback, issues, and "wait, is this actually how it should behave?" reports are all welcome.
