Every developer I know has the same evening ritual: push code before bed, wake up to a CI failure, spend the morning debugging something you wrote at midnight.
What if your agent could catch that before you pushed? Not just linting -- actual code review. Reading the logic, spotting the obvious bugs, checking the tests, and flagging what needs human attention.
This is what I built. Here is how it works.
What the agent actually does
The setup has three stages that run on every push:
Stage 1: Static analysis -- run linters, type checkers, and complexity metrics. Flag anything that violates your team's standards.
Stage 2: Logic review -- read the changed files and look for common mistake patterns: unhandled edge cases, missing error handling, potential null dereferences, logic that runs but is clearly wrong.
Stage 3: Test coverage check -- verify that the changed code has corresponding tests. If you touched a source file that has no test file, or whose test file never references it, flag it.
Apart from Stage 1, which blocks the push when static analysis reports errors, the agent does not approve or block -- it comments. You decide what to act on.
The architecture
[git push]
  -> [pre-push hook]
    -> [agent reviews changed files]
      -> [posts comment to PR/changelog]
The pre-push hook runs the agent locally before anything goes remote. The output is a structured report that your hook can post to GitHub, save to a file, or send to Slack.
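Git hands a pre-push hook one line on standard input per ref being pushed, in the form `<local ref> <local sha> <remote ref> <remote sha>`. Here is a minimal sketch of turning those lines into commit ranges, assuming you want the review to cover exactly what is going remote. `PushedRef` and `parse_push_lines` are hypothetical names, not part of the scripts in this post:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PushedRef:
    local_ref: str
    remote_ref: str
    commit_range: str  # revision argument, e.g. for `git rev-list`


def is_zero(sha: str) -> bool:
    """Git uses an all-zero object name to mean 'no commit'."""
    return set(sha) == {"0"}


def parse_push_lines(lines: Iterable[str]) -> List[PushedRef]:
    """Parse pre-push stdin lines into the commit ranges being pushed."""
    refs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # not a well-formed pre-push line
        local_ref, local_sha, remote_ref, remote_sha = parts
        if is_zero(local_sha):
            continue  # the push deletes a remote ref: nothing to review
        if is_zero(remote_sha):
            # New remote branch: there is no remote commit to compare against
            commit_range = local_sha
        else:
            commit_range = f"{remote_sha}..{local_sha}"
        refs.append(PushedRef(local_ref, remote_ref, commit_range))
    return refs
```

A Python script invoked from the hook could call `parse_push_lines(sys.stdin)`, provided nothing earlier in the hook has already consumed standard input.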
Stage 1: Static Analysis
This stage is fast -- under 10 seconds for most codebases. If it fails, your hook fails and nothing goes remote.
#!/usr/bin/env python3
# stage1_static.py - run linters and collect issues
import json
import subprocess
import sys


def run_linter(cmd):
    """Run a linter given as an argument list and return structured issues."""
    tool = cmd[0]
    try:
        # No shell: file names with spaces or shell metacharacters stay intact
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except subprocess.TimeoutExpired:
        return [{"tool": tool, "line": "Timeout", "severity": "error"}]
    except OSError as e:
        # Raised, for example, when the tool is not installed
        return [{"tool": tool, "line": str(e), "severity": "error"}]
    if result.returncode == 0:
        # Clean run: status lines such as "All checks passed!" are not issues
        return []
    issues = []
    for line in result.stdout.splitlines() + result.stderr.splitlines():
        if not line.strip():
            continue
        issues.append({
            "tool": tool,
            "line": line,
            "severity": "error" if "error" in line.lower() else "warning",
        })
    return issues


def get_changed_files():
    """List changed .py files in the commits about to be pushed."""
    # At push time the index usually matches HEAD, so a staged diff is empty.
    # Diff against the upstream branch instead; if no upstream is configured,
    # fall back to the staged diff. Deleted files are filtered out.
    base = ["git", "diff", "--name-only", "--diff-filter=ACMR"]
    result = subprocess.run(
        base + ["@{upstream}...HEAD"], capture_output=True, text=True
    )
    if result.returncode != 0:
        result = subprocess.run(
            base + ["--cached", "HEAD"], capture_output=True, text=True
        )
    return [f.strip() for f in result.stdout.splitlines() if f.strip().endswith(".py")]


def main():
    changed = get_changed_files()
    if not changed:
        print("No changes to review")
        return
    all_issues = []
    # Run ruff (fast Python linter)
    all_issues.extend(run_linter(["ruff", "check", *changed]))
    # Run mypy on changed files
    all_issues.extend(run_linter(["mypy", *changed]))
    report = {
        "stage": "static_analysis",
        "files_reviewed": changed,
        "issues": all_issues,
        "summary": f"Found {len(all_issues)} issues",
    }
    print(json.dumps(report, indent=2))
    # A non-zero exit code is what makes the pre-push hook block the push
    errors = [i for i in all_issues if i["severity"] == "error"]
    sys.exit(1 if errors else 0)


if __name__ == "__main__":
    main()
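The script above decides severity by searching each output line for the word "error". A slightly more structured alternative is sketched here, under the assumption that the linter prints findings as `path:line[:col]: message` (the shape mypy uses, and which ruff's concise output resembles). `parse_finding` is a hypothetical helper, and the regex does not handle Windows drive letters:

```python
import re
from typing import Optional

# path:line: message   or   path:line:col: message
FINDING_RE = re.compile(
    r"^(?P<path>[^:\s][^:]*):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<message>.+)$"
)


def parse_finding(tool: str, line: str) -> Optional[dict]:
    """Parse one linter output line; return None for non-finding lines."""
    match = FINDING_RE.match(line.strip())
    if match is None:
        return None  # summaries such as "Found 2 errors." land here
    message = match["message"]
    # mypy prefixes its messages with "error:" or "note:"
    if message.startswith("error:"):
        severity = "error"
    elif message.startswith("note:"):
        severity = "note"
    else:
        severity = "warning"
    return {
        "tool": tool,
        "file": match["path"],
        "line_number": int(match["line"]),
        "column": int(match["col"]) if match["col"] else None,
        "message": message,
        "severity": severity,
    }
```

Lines that do not look like findings return `None`, so summary text no longer inflates the issue count.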
Stage 2: Logic Review
This is where it gets interesting. Static analysis catches style and type issues. Logic review catches mistakes that pass the linter and type checker but are still wrong.
#!/usr/bin/env python3
# stage2_logic.py - read changed code and flag logic issues
import json
import re
import subprocess

BUG_PATTERNS = [
    {
        "name": "mutable_default_arg",
        "pattern": r"def \w+\([^)]*=\s*(\[\]|\{\})",
        "message": "Mutable default argument detected. Use None instead.",
        "severity": "high",
    },
    {
        "name": "bare_except",
        "pattern": r"except\s*:",
        "message": "Bare except clause. Catch specific exceptions.",
        "severity": "medium",
    },
    {
        # Matching is line by line, so this only catches the one-line form
        "name": "empty_except_pass",
        "pattern": r"except[^:]+:\s+pass",
        "message": "Exception caught and silently ignored.",
        "severity": "medium",
    },
    {
        "name": "comparison_to_bool",
        "pattern": r"if\s+\w+\s+==\s+True|if\s+\w+\s+!=\s+False",
        "message": "Use 'if x:' instead of 'if x == True:'",
        "severity": "low",
    },
    {
        "name": "unchecked_cast",
        "pattern": r"\bint\([^)]*\)|\bstr\([^)]*\)|\bfloat\([^)]*\)",
        "message": "Type cast without validation. May raise on bad input.",
        "severity": "medium",
    },
    {
        "name": "todo_comment",
        "pattern": r"#\s*(TODO|FIXME|HACK|XXX):",
        "message": "Unresolved TODO/FIXME comment found.",
        "severity": "low",
    },
]


def get_changed_diff():
    """Return the diff of the commits about to be pushed."""
    # At push time the index usually matches HEAD, so diff against the
    # upstream branch; fall back to the staged diff if no upstream is set.
    result = subprocess.run(
        ["git", "diff", "@{upstream}...HEAD"], capture_output=True, text=True
    )
    if result.returncode != 0:
        result = subprocess.run(
            ["git", "diff", "--cached"], capture_output=True, text=True
        )
    return result.stdout


def analyze_diff(diff_text):
    issues = []
    lines = diff_text.splitlines()
    for pattern_def in BUG_PATTERNS:
        for i, line in enumerate(lines):
            # Only inspect added lines; skip removals, context, and "+++" headers
            if not line.startswith("+") or line.startswith("+++"):
                continue
            if re.search(pattern_def["pattern"], line[1:]):
                context_start = max(0, i - 3)
                context = "\n".join(lines[context_start:i + 2])
                issues.append({
                    "pattern": pattern_def["name"],
                    "message": pattern_def["message"],
                    "severity": pattern_def["severity"],
                    "context": context[:200],
                })
    return issues


def main():
    # An empty diff yields an empty issue list, so the output is always
    # valid JSON that the hook can parse.
    issues = analyze_diff(get_changed_diff())
    report = {
        "stage": "logic_review",
        "issues": issues,
        "summary": f"Found {len(issues)} potential logic issues",
    }
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()
The bug patterns are a starting point. The real value comes from tuning them to your codebase -- adding checks for your specific anti-patterns, your team's conventions, the mistakes you keep making.
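As one illustration of that tuning, here is a sketch of a team-specific rule in the same dict shape as the `BUG_PATTERNS` entries. The `debug_print` rule and the `check_line` helper are hypothetical examples, not part of the script above:

```python
import re

# Hypothetical team rule, same shape as the BUG_PATTERNS entries
DEBUG_PRINT = {
    "name": "debug_print",
    # print( not preceded by a word character or a dot, so that
    # blueprint(...) and self.print(...) are left alone
    "pattern": r"(?<![\w.])print\(",
    "message": "Leftover print() call. Consider the logging module.",
    "severity": "low",
}


def check_line(pattern_def, line):
    """Return an issue dict if the line matches the rule, else None."""
    if re.search(pattern_def["pattern"], line):
        return {
            "pattern": pattern_def["name"],
            "message": pattern_def["message"],
            "severity": pattern_def["severity"],
        }
    return None
```

Testing a new rule against a few known-good and known-bad lines before adding it to the list is a cheap way to keep false positives down.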
Stage 3: Test Coverage Check
#!/usr/bin/env python3
# stage3_tests.py - verify changed code has test coverage
import json
import subprocess
from pathlib import Path


def is_test_file(path):
    """True for files under tests/ or named test_*.py / *_test.py."""
    name = Path(path).name
    return path.startswith("tests/") or name.startswith("test_") or name.endswith("_test.py")


def get_changed_files():
    """List changed non-test .py files in the commits about to be pushed."""
    # At push time the index usually matches HEAD, so diff against the
    # upstream branch; fall back to the staged diff if no upstream is set.
    base = ["git", "diff", "--name-only", "--diff-filter=ACMR"]
    result = subprocess.run(
        base + ["@{upstream}...HEAD"], capture_output=True, text=True
    )
    if result.returncode != 0:
        result = subprocess.run(
            base + ["--cached", "HEAD"], capture_output=True, text=True
        )
    files = [f.strip() for f in result.stdout.splitlines()]
    return [f for f in files if f.endswith(".py") and not is_test_file(f)]


def find_test_file(source_file):
    filename = Path(source_file).stem
    possible_tests = [
        f"tests/test_{filename}.py",
        f"tests/{filename}_test.py",
        f"test_{filename}.py",
    ]
    for test_path in possible_tests:
        if Path(test_path).exists():
            return test_path
    return None


def check_coverage(source_file):
    test_file = find_test_file(source_file)
    if not test_file:
        return {
            "file": source_file,
            "has_test": False,
            "test_file": None,
            "imports_module": False,
            "message": "No corresponding test file found",
        }
    test_content = Path(test_file).read_text()
    # "pkg/mod.py" -> "pkg.mod"; only the final suffix is removed
    module_name = ".".join(Path(source_file).with_suffix("").parts)
    # Heuristic: a plain substring match on the dotted module name
    imports_module = module_name in test_content
    return {
        "file": source_file,
        "has_test": True,
        "test_file": test_file,
        "imports_module": imports_module,
        "message": "Test exists and imports module" if imports_module else "Test file exists but does not import module",
    }


def main():
    # With no changed files the report is still valid JSON ("0/0 files ..."),
    # so the hook can always parse the output.
    changed = get_changed_files()
    results = [check_coverage(f) for f in changed]
    missing = [r for r in results if not r["has_test"] or not r["imports_module"]]
    report = {
        "stage": "test_coverage",
        "files_checked": changed,
        "results": results,
        "summary": f"{len(changed) - len(missing)}/{len(changed)} files have test coverage",
    }
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()
The pre-push hook
#!/bin/bash
# .git/hooks/pre-push
echo "=== Running agent code review ==="
echo "Stage 1: Static analysis..."
python3 scripts/stage1_static.py
STATIC_RESULT=$?
if [ "$STATIC_RESULT" -ne 0 ]; then
    echo "Static analysis found issues. Push blocked."
    exit 1
fi
echo "Stage 2: Logic review..."
python3 scripts/stage2_logic.py > /tmp/logic_report.json
LOGIC_ISSUES=$(python3 -c "import json; d=json.load(open('/tmp/logic_report.json')); print(d['summary'])")
echo " $LOGIC_ISSUES"
echo "Stage 3: Test coverage check..."
python3 scripts/stage3_tests.py > /tmp/coverage_report.json
COVERAGE=$(python3 -c "import json; d=json.load(open('/tmp/coverage_report.json')); print(d['summary'])")
echo " $COVERAGE"
echo "=== Review complete ==="
exit 0
Make it executable:
chmod +x .git/hooks/pre-push
What you actually get
When you push, you get a structured report in your terminal and saved reports in /tmp/. For a team, extend this to post the report as a PR comment or a Slack message.
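One way to get from the JSON reports to something postable is to render them as plain text first. Here is a minimal sketch, assuming each report carries the `stage` and `summary` keys used by the scripts above. `render_comment` is a hypothetical helper, and actually sending the text to GitHub or Slack is left out:

```python
from typing import Iterable


def render_comment(reports: Iterable[dict]) -> str:
    """Render stage reports as a short plain-text summary."""
    lines = ["Agent code review"]
    for report in reports:
        lines.append(f"- {report['stage']}: {report['summary']}")
        # Stage 3 has no "issues" key, so default to an empty list
        for issue in report.get("issues", []):
            severity = issue.get("severity", "info")
            # Stage 2 issues carry "message"; Stage 1 issues carry the raw "line"
            text = issue.get("message") or issue.get("line", "")
            lines.append(f"  - [{severity}] {text}")
    return "\n".join(lines)
```

Loading the two files the hook writes with `json.load` and passing the resulting dicts in a list would produce one block of text covering both stages.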
The key insight: the agent does not try to be smart about the code -- it applies consistent rules that your team decides on. That is what catches the midnight bugs. Not intelligence, just discipline.
Extending it
The pattern -- three stages, structured output, human in the loop for decisions -- works for almost any code quality check:
- Add a security scanning stage (Bandit, Semgrep rules)
- Add a dependency check (are you pulling in new transitive deps?)
- Add a performance stage (is this loop going to be a problem at scale?)
The architecture stays the same. You just add more stages.
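Here is a sketch of what "just add more stages" could look like if the stages were Python callables instead of separate scripts. This is an assumed refactoring, not how the scripts in this post are wired. `Stage`, `run_stages`, and the `blocking` flag are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Report = Dict[str, object]


@dataclass
class Stage:
    name: str
    run: Callable[[], Report]  # returns a report dict with an "issues" list
    blocking: bool = False     # if True, any issue fails the whole review


def run_stages(stages: List[Stage]) -> Tuple[List[Report], int]:
    """Run stages in order; return their reports and a process exit code."""
    reports: List[Report] = []
    exit_code = 0
    for stage in stages:
        report = stage.run()
        report.setdefault("stage", stage.name)
        reports.append(report)
        if stage.blocking and report.get("issues"):
            # Same behavior as the hook: stop at the first blocking failure
            exit_code = 1
            break
    return reports, exit_code
```

Adding a security or dependency stage then means appending one more `Stage` to the list, with `blocking` left off until the team trusts its signal.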
This setup runs on my own codebase. More agent patterns at https://thesolai.github.io