Pixelwitch

Posted on Jun 29

A Pre-Push Code Review Agent Built With Python and Git Hooks

#python #ai #automation #devops

Every developer I know has the same evening ritual: push code before bed, wake up to a CI failure, spend the morning debugging something you wrote at midnight.

What if an agent could catch that before you pushed? Not just linting -- actual code review: reading the logic, spotting the obvious bugs, checking the tests, and flagging what needs human attention.

This is what I built. Here is how it works.

What the agent actually does

The setup has three stages that run on every push:

Stage 1: Static analysis -- run linters, type checkers, and complexity metrics. Flag anything that violates your team's standards.

Stage 2: Logic review -- read the changed lines and look for common mistake patterns: missing or swallowed error handling, mutable default arguments, unvalidated type conversions, code that runs but is probably wrong.

Stage 3: Test coverage check -- verify that the changed code has corresponding tests. If you changed a module that has no test file, or whose test file does not import it, flag it.

Apart from Stage 1, where hard errors block the push, the agent does not approve or block -- it reports. You decide what to act on.

The architecture

[git push]
    -> [pre-push hook]
    -> [agent reviews changed files]
    -> [writes a report: terminal, file, PR comment, or Slack]

The pre-push hook runs the agent locally before anything goes remote. The output is a structured report that your hook can post to GitHub, save to a file, or send to Slack.
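Git also tells the pre-push hook exactly what is being pushed. Per the githooks documentation, each ref arrives on standard input as `<local ref> <local sha> <remote ref> <remote sha>`, with an all-zeros object name standing in for a ref that does not exist (and `(delete)` as the local ref when a branch is being deleted). The scripts in this post do not read that input; if you want to review precisely the pushed commits, here is a minimal sketch of parsing it. `push_ranges` is a hypothetical helper name, not part of the original setup.

```python
def is_zero(sha):
    """Git reports a missing ref as an all-zeros object name (40 or 64 chars)."""
    return set(sha) == {"0"}


def push_ranges(stdin_text):
    """Parse pre-push stdin into (base, head) commit pairs to review.

    Each line has the form: <local ref> <local sha> <remote ref> <remote sha>.
    base is None when the branch does not exist on the remote yet, so the
    caller has to choose a base itself (for example, your main branch).
    """
    pairs = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        _local_ref, local_sha, _remote_ref, remote_sha = parts
        if is_zero(local_sha):
            continue  # the push deletes this ref: nothing to review
        base = None if is_zero(remote_sha) else remote_sha
        pairs.append((base, local_sha))
    return pairs
```

In a Python hook you would call `push_ranges(sys.stdin.read())` and run `git diff --name-only <base> <head>` for each pair that has a base.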

Stage 1: Static Analysis

This stage is fast -- under 10 seconds for most codebases. If it fails, your hook fails and nothing goes remote.

#!/usr/bin/env python3
# stage1_static.py - run linters and collect issues
import subprocess
import json
import sys

def run_linter(cmd):
    """Run a linter (given as an argument list) and return structured issues."""
    tool = cmd[0]
    try:
        # An argument list instead of shell=True keeps file names with spaces intact.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except subprocess.TimeoutExpired:
        return [{"tool": tool, "line": "Timeout", "severity": "error"}]
    except OSError as e:  # e.g. the linter is not installed
        return [{"tool": tool, "line": str(e), "severity": "error"}]

    # Exit code 0 means a clean run; skip chatter such as "All checks passed!".
    if result.returncode == 0:
        return []

    issues = []
    for line in result.stdout.splitlines() + result.stderr.splitlines():
        if not line.strip():
            continue
        issues.append({
            "tool": tool,
            "line": line,
            # Rough heuristic: lines mentioning "error" (including summary lines
            # such as "Found 2 errors.") count as errors, the rest as warnings.
            "severity": "error" if "error" in line.lower() else "warning"
        })
    return issues

def get_changed_files():
    """List .py files changed in commits that are not yet on the upstream branch."""
    result = subprocess.run(
        # --diff-filter=d skips deleted files, which the linters cannot open.
        ["git", "diff", "--name-only", "--diff-filter=d", "@{upstream}...HEAD"],
        capture_output=True, text=True
    )
    # A non-zero exit usually means no upstream is configured (e.g. a new branch).
    if result.returncode != 0:
        return []
    return [f.strip() for f in result.stdout.splitlines() if f.endswith('.py')]

def main():
    changed = get_changed_files()
    if not changed:
        print("No changes to review")
        return

    all_issues = []

    # Run ruff (fast Python linter)
    all_issues.extend(run_linter(["ruff", "check", *changed]))

    # Run mypy on changed files
    all_issues.extend(run_linter(["mypy", *changed]))

    report = {
        "stage": "static_analysis",
        "files_reviewed": changed,
        "issues": all_issues,
        "summary": f"Found {len(all_issues)} issues"
    }

    print(json.dumps(report, indent=2))

    errors = [i for i in all_issues if i["severity"] == "error"]
    sys.exit(1 if errors else 0)

if __name__ == "__main__":
    main()
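`run_linter` guesses severity by looking for the word "error" anywhere in a line. mypy's default output already carries the severity as its own field (`file.py:12: error: ...` or `file.py:12: note: ...`), so a small parser can read it directly. This is a sketch assuming mypy's default plain-text output and POSIX-style paths (a Windows drive letter would confuse the `[^:]+` file match); `parse_mypy_output` is a hypothetical helper.

```python
import re

# "file.py:12: error: msg", or "file.py:12:5: error: msg" with --show-column-numbers.
MYPY_LINE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+)(?::(?P<col>\d+))?: "
    r"(?P<severity>error|warning|note): (?P<message>.*)$"
)


def parse_mypy_output(text):
    """Turn mypy's default text output into structured issues.

    Summary lines such as "Found 1 error in 1 file" do not match the
    pattern, so they are no longer counted as issues themselves.
    """
    issues = []
    for raw in text.splitlines():
        m = MYPY_LINE.match(raw)
        if not m:
            continue
        issues.append({
            "tool": "mypy",
            "file": m["file"],
            "line": int(m["line"]),
            "severity": m["severity"],
            "message": m["message"],
        })
    return issues
```

You would feed it `result.stdout` from `subprocess.run(["mypy", *changed], capture_output=True, text=True)` and block the push only on entries whose severity is `error`.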

Stage 2: Logic Review

This is where it gets interesting. Static analysis catches style and type issues. Logic review looks for mistakes that pass the linter but are still wrong.

#!/usr/bin/env python3
# stage2_logic.py - read changed code and flag logic issues
import subprocess
import json
import re

BUG_PATTERNS = [
    {
        "name": "mutable_default_arg",
        "pattern": r"def \w+\([^)]*=\s*(\[\]|\{\})",
        "message": "Mutable default argument detected. Use None instead.",
        "severity": "high"
    },
    {
        "name": "bare_except",
        "pattern": r"except\s*:",
        "message": "Bare except clause. Catch specific exceptions.",
        "severity": "medium"
    },
    {
        "name": "empty_except_pass",
        # Lines are checked one at a time, so only one-line "except ...: pass" matches.
        "pattern": r"\bexcept\b[^:]*:\s*pass\b",
        "message": "Exception caught and silently ignored.",
        "severity": "medium"
    },
    {
        "name": "comparison_to_bool",
        "pattern": r"if\s+\w+\s*==\s*True\b|if\s+\w+\s*!=\s*False\b",
        "message": "Use 'if x:' instead of 'if x == True:'",
        "severity": "low"
    },
    {
        "name": "unchecked_cast",
        "pattern": r"\bint\([^)]*\)|\bstr\([^)]*\)|\bfloat\([^)]*\)",
        "message": "Type cast without validation. May raise on bad input.",
        "severity": "medium"
    },
    {
        "name": "todo_comment",
        "pattern": r"#\s*(TODO|FIXME|HACK|XXX):",
        "message": "Unresolved TODO/FIXME comment found.",
        "severity": "low"
    },
]

def get_changed_diff():
    """Return the diff of .py files in commits not yet on the upstream branch."""
    result = subprocess.run(
        ["git", "diff", "@{upstream}...HEAD", "--", "*.py"],
        capture_output=True, text=True
    )
    return result.stdout

def analyze_diff(diff_text):
    issues = []
    lines = diff_text.splitlines()

    for pattern_def in BUG_PATTERNS:
        for i, line in enumerate(lines):
            # Only added lines matter: skip removed lines, context lines and "+++" headers.
            if not line.startswith("+") or line.startswith("+++"):
                continue
            if re.search(pattern_def["pattern"], line[1:]):
                context_start = max(0, i - 3)
                context = "\n".join(lines[context_start:i+2])
                issues.append({
                    "pattern": pattern_def["name"],
                    "message": pattern_def["message"],
                    "severity": pattern_def["severity"],
                    "context": context[:200]
                })
    return issues

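def main():
    diff = get_changed_diff()
    if not diff:
        # Still emit JSON: the pre-push hook parses this output with json.load.
        print(json.dumps({"stage": "logic_review", "issues": [], "summary": "No changes to review"}))
        return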

    issues = analyze_diff(diff)
    report = {
        "stage": "logic_review",
        "issues": issues,
        "summary": f"Found {len(issues)} potential logic issues"
    }
    print(json.dumps(report, indent=2))

if __name__ == "__main__":
    main()
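`analyze_diff` reports a snippet of context but not the file and line number where an issue sits. A sketch of recovering both from the unified diff's `+++ b/<path>` headers and `@@ -a,b +c,d @@` hunk headers follows; it assumes git's default `a/`/`b/` path prefixes, and `added_lines` is a hypothetical helper. It also yields only added lines, so code you deleted is never flagged.

```python
import re

# "@@ -10,2 +11,3 @@ optional section text": captures the new-file start line.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(?P<start>\d+)(?:,\d+)? @@")


def added_lines(diff_text):
    """Yield (path, new_line_number, text) for every line added in a unified diff."""
    path = None
    lineno = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/app/x.py" for changed files, "+++ /dev/null" for deletions.
            target = line[4:]
            path = target[2:] if target.startswith("b/") else None
            continue
        m = HUNK.match(line)
        if m:
            lineno = int(m["start"])
            continue
        if path is None:
            continue
        if line.startswith("+"):
            yield path, lineno, line[1:]
            lineno += 1
        elif line.startswith(" "):
            lineno += 1  # context line: it also exists in the new file
        # "-" lines exist only in the old file, so they do not advance lineno.
```

Inside `analyze_diff` you could loop over `added_lines(diff_text)` instead of raw lines and store `path` and `lineno` in each issue.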

The bug patterns are a starting point. The real value comes from tuning them to your codebase -- adding checks for your specific anti-patterns, your team's conventions, the mistakes you keep making.
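Line-by-line regexes have limits: they miss a mutable default on the second line of a signature and can match text inside strings. If you outgrow them, Python's built-in `ast` module is one alternative. This is a different technique from the regex approach above, shown as a sketch; `ast_issues` is a hypothetical name, and it only works on files that parse (a syntax error raises `SyntaxError`).

```python
import ast


def ast_issues(source, filename="<string>"):
    """Flag mutable defaults, bare excepts and `except ...: pass` via the AST.

    Because it uses Python's own parser, multi-line signatures and handlers
    are handled, and matches inside strings or comments are ignored.
    """
    issues = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # kw_defaults holds None for keyword-only args without a default.
            defaults = node.args.defaults + [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    issues.append((filename, default.lineno, "mutable_default_arg"))
        elif isinstance(node, ast.ExceptHandler):
            if node.type is None:
                issues.append((filename, node.lineno, "bare_except"))
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                issues.append((filename, node.lineno, "empty_except_pass"))
    return sorted(issues, key=lambda issue: issue[1])
```

Run it on the full contents of each changed file, then keep only the issues whose line numbers fall on lines you actually changed.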

Stage 3: Test Coverage Check

#!/usr/bin/env python3
# stage3_tests.py - verify changed code has test coverage
import subprocess
import json
from pathlib import Path

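def get_changed_files():
    """List changed non-test .py files in commits not yet on the upstream branch."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", "@{upstream}...HEAD"],
        capture_output=True, text=True
    )
    return [
        f.strip() for f in result.stdout.splitlines()
        if f.endswith('.py')
        and not f.startswith('tests/')
        and not Path(f).name.startswith('test_')
        and not f.endswith('_test.py')
    ]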

def find_test_file(source_file):
    filename = Path(source_file).stem
    possible_tests = [
        f"tests/test_{filename}.py",
        f"tests/{filename}_test.py",
        f"test_{filename}.py",
    ]
    for test_path in possible_tests:
        if Path(test_path).exists():
            return test_path
    return None

def check_coverage(source_file):
    test_file = find_test_file(source_file)
    if not test_file:
        return {
            "file": source_file,
            "has_test": False,
            "test_file": None,
            "imports_module": False,
            "message": "No corresponding test file found"
        }

    test_content = Path(test_file).read_text()
    # "app/utils.py" -> "app.utils" (with_suffix strips only the final ".py")
    module_name = ".".join(Path(source_file).with_suffix("").parts)
    imports_module = module_name in test_content

    return {
        "file": source_file,
        "has_test": True,
        "test_file": test_file,
        "imports_module": imports_module,
        "message": "Test exists and imports module" if imports_module else "Test file exists but does not import module"
    }

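def main():
    changed = get_changed_files()
    if not changed:
        # Still emit JSON: the pre-push hook parses this output with json.load.
        print(json.dumps({"stage": "test_coverage", "results": [], "summary": "No source files changed"}))
        return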

    results = [check_coverage(f) for f in changed]
    missing = [r for r in results if not r["has_test"] or not r["imports_module"]]

    report = {
        "stage": "test_coverage",
        "files_checked": changed,
        "results": results,
        "summary": f"{len(changed) - len(missing)}/{len(changed)} files have a test file that imports them"
    }
    print(json.dumps(report, indent=2))

if __name__ == "__main__":
    main()
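The check above works per file: a module counts as covered if a matching test file imports it. A finer-grained variant flags individual functions you changed that the test file never mentions. This sketch assumes you already know which line numbers changed in the file (for example from the diff's hunk headers); `touched_functions` and `untested_functions` are hypothetical names, and "the name appears in the test file" is only a rough proxy for "is tested".

```python
import ast
import re


def touched_functions(source, changed_lines):
    """Return names of functions whose definition spans any changed line number."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+; nested functions count too.
            span = range(node.lineno, node.end_lineno + 1)
            if any(line in span for line in changed_lines):
                names.add(node.name)
    return names


def untested_functions(source, changed_lines, test_source):
    """Touched functions whose name never appears as a whole word in the test file."""
    return sorted(
        name for name in touched_functions(source, changed_lines)
        if not re.search(rf"\b{re.escape(name)}\b", test_source)
    )
```

`check_coverage` could call `untested_functions(Path(source_file).read_text(), changed_lines, test_content)` and add the result to its report.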

The pre-push hook

#!/bin/bash
# .git/hooks/pre-push

echo "=== Running agent code review ==="

echo "Stage 1: Static analysis..."
python3 scripts/stage1_static.py
STATIC_RESULT=$?
if [ $STATIC_RESULT -ne 0 ]; then
    echo "Static analysis found issues. Push blocked."
    exit 1
fi

echo "Stage 2: Logic review..."
python3 scripts/stage2_logic.py > /tmp/logic_report.json
LOGIC_ISSUES=$(python3 -c "import json; d=json.load(open('/tmp/logic_report.json')); print(d['summary'])")
echo "  $LOGIC_ISSUES"

echo "Stage 3: Test coverage check..."
python3 scripts/stage3_tests.py > /tmp/coverage_report.json
COVERAGE=$(python3 -c "import json; d=json.load(open('/tmp/coverage_report.json')); print(d['summary'])")
echo "  $COVERAGE"

echo "=== Review complete ==="
exit 0

Make it executable:

chmod +x .git/hooks/pre-push

What you actually get

When you push, you get a structured report in your terminal and saved reports in /tmp/. For a team, extend this to post the report as a PR comment or a Slack message.
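For the Slack route, a sketch of condensing the saved reports into one message and posting it to a Slack incoming webhook, which accepts a JSON body such as `{"text": "..."}`. The helper names are hypothetical, and the webhook URL is an assumption: you create one in your Slack workspace and keep it out of the repo (an environment variable such as `SLACK_WEBHOOK_URL` is one option).

```python
import json
import urllib.request


def build_summary(reports):
    """Combine stage reports (dicts with 'stage' and 'summary') into one Slack message."""
    lines = ["*Pre-push review*"]  # *bold* in Slack's mrkdwn
    for report in reports:
        lines.append(f"- {report['stage']}: {report['summary']}")
    return "\n".join(lines)


def post_to_slack(text, webhook_url):
    """POST the message to a Slack incoming webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

From the hook you could load /tmp/logic_report.json and /tmp/coverage_report.json with `json.load`, pass the list to `build_summary`, and send the result with `post_to_slack`. Keeping the formatting separate from the network call lets you test the message without touching Slack.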

The key insight: the agent does not try to be smart about the code -- it applies consistent rules that your team decides on. That is what catches the midnight bugs. Not intelligence, just discipline.

Extending it

The pattern -- three stages, structured output, human in the loop for decisions -- works for almost any code quality check:

Add a security scanning stage (Bandit, Semgrep rules)
Add a dependency check (are you pulling in new transitive deps?)
Add a performance stage (is this loop going to be a problem at scale?)

The architecture stays the same. You just add more stages.
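As an example of a new stage, here is a minimal dependency-check sketch that reads the same kind of diff text Stage 2 uses and reports lines added to requirements*.txt files. The stage name, helper names, and file layout are assumptions. It only sees what is written in those files: it catches new transitive dependencies only if the file is a fully pinned lock (as produced by a tool like pip-compile); for a hand-written requirements.txt it reports new direct dependencies only.

```python
import re

# "+++ b/<path>" header of a requirements*.txt file (default "b/" prefix assumed).
REQ_HEADER = re.compile(r"^\+\+\+ b/(?:.*/)?requirements[^/]*\.txt$")


def new_requirements(diff_text):
    """Return requirement lines added to requirements*.txt files in a unified diff."""
    added = []
    in_req_file = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_req_file = False  # a new file section starts
        elif line.startswith("+++ "):
            in_req_file = bool(REQ_HEADER.match(line))
        elif in_req_file and line.startswith("+"):
            # Strip inline " # comments" (pip needs whitespace before "#").
            req = re.split(r"\s+#", line[1:], maxsplit=1)[0].strip()
            if req and not req.startswith("#"):
                added.append(req)
    return added


def dependency_report(diff_text):
    """Build a report in the same shape as the other stages."""
    added = new_requirements(diff_text)
    return {
        "stage": "dependency_check",
        "issues": [
            {"requirement": r, "message": "New requirement line", "severity": "low"}
            for r in added
        ],
        "summary": f"New requirement lines: {len(added)}",
    }
```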

This setup runs on my own codebase. More agent patterns at https://thesolai.github.io

DEV Community