# Build Your Own Code Security Scanner in 30 Minutes (Python + Semgrep + Claude)
Most developers don't scan their code for vulnerabilities until it's too late. The tools exist — Semgrep, Bandit, CodeQL — but setting them up, writing custom rules, and interpreting results takes hours.
I built a security scanner that chains these tools together with AI-powered triage. It finds real vulnerabilities in real codebases. Here's exactly how to build one yourself.
## What We're Building
A Python script that:
- Clones any GitHub repo
- Runs Semgrep with security-focused rulesets
- Uses Claude to triage findings (filtering false positives)
- Outputs a ranked vulnerability report
I've used this workflow to find real vulnerabilities in open-source projects — including path traversal bugs, SSRF via user-controlled URLs, and command injection through unsanitized inputs.
## Prerequisites

```shell
pip install semgrep anthropic
```

You'll need a Claude API key and Semgrep installed locally.
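A quick sanity check before moving on (this is a setup sketch, assuming a POSIX shell; `your-key-here` is a placeholder, not a real key):

```shell
# Confirm Semgrep is on PATH and working
semgrep --version

# The anthropic SDK reads this variable automatically
export ANTHROPIC_API_KEY="your-key-here"
```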
## Step 1: The Scanner Core

```python
import subprocess
import json
import os
from pathlib import Path

def run_semgrep(repo_path: str, ruleset: str = "p/security-audit") -> list:
    """Run Semgrep against a repository and return findings."""
    result = subprocess.run(
        ["semgrep", "--config", ruleset, "--json", repo_path],
        capture_output=True, text=True
    )
    if result.returncode not in (0, 1):  # 1 = findings exist
        raise RuntimeError(f"Semgrep failed: {result.stderr}")
    data = json.loads(result.stdout)
    return data.get("results", [])
```
This gives you raw Semgrep output — every potential vulnerability with file path, line number, and rule ID.
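To make the shape of that output concrete, here is one abridged result as a Python dict (the field names follow Semgrep's JSON output; the rule ID, path, and values are invented for illustration):

```python
# One Semgrep finding, abridged. The rest of the pipeline only relies on
# these fields: check_id, path, start.line, and extra.lines / extra.severity.
finding = {
    "check_id": "python.lang.security.audit.subprocess-shell-true",  # hypothetical rule ID
    "path": "app/runner.py",
    "start": {"line": 42},
    "extra": {
        "lines": "subprocess.run(cmd, shell=True)",
        "severity": "ERROR",
    },
}
```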
The problem? Semgrep is noisy. On a typical 10K-line codebase, you'll get 50-200 findings. Most are low-severity or false positives. That's where AI triage comes in.
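Before spending API calls on triage, you can also cut volume using Semgrep's own severity field (a sketch; Semgrep's JSON output labels each result `INFO`, `WARNING`, or `ERROR` under `extra.severity`):

```python
def prefilter(findings: list, min_severity: str = "WARNING") -> list:
    """Keep only findings at or above min_severity (Semgrep uses INFO/WARNING/ERROR)."""
    rank = {"INFO": 0, "WARNING": 1, "ERROR": 2}
    threshold = rank[min_severity]
    return [
        f for f in findings
        # Missing or unknown severity ranks lowest, so it gets filtered out
        if rank.get(f.get("extra", {}).get("severity", "INFO"), 0) >= threshold
    ]
```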
## Step 2: AI-Powered Triage

```python
from anthropic import Anthropic

client = Anthropic()

def triage_finding(finding: dict, source_context: str) -> dict:
    """Use Claude to assess severity and exploitability."""
    prompt = f"""Analyze this security finding:
Rule: {finding['check_id']}
File: {finding['path']}:{finding['start']['line']}
Code: {finding['extra']['lines']}
Surrounding context:
{source_context}
Assess:
1. Is this a true positive or false positive? Why?
2. Severity (critical/high/medium/low)
3. Is it exploitable? What would an attacker need?
4. Suggested fix (one-liner)
Be concise. No disclaimers."""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return {
        "finding": finding,
        "triage": response.content[0].text
    }
```
The key insight: giving Claude the surrounding source context (not just the flagged line) dramatically improves triage accuracy. A subprocess.run() call isn't dangerous if the arguments are hardcoded constants. It IS dangerous if they come from user input three functions up the call stack.
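To make that distinction concrete, here it is in miniature (illustrative only; `user_input` stands in for any untrusted value):

```python
import subprocess

# Attacker-controlled in the worst case; harmless here, but treat it as untrusted
user_input = "docs; rm -rf /"

# Dangerous pattern (what Semgrep flags): the string would reach a real shell
# subprocess.run(f"ls {user_input}", shell=True)

# Safe pattern: the untrusted value is a single argv element, no shell involved
safe = subprocess.run(["echo", user_input], capture_output=True, text=True)
# echo prints the string literally; the ";" never becomes a command separator
```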
## Step 3: Context Extraction

```python
def get_source_context(file_path: str, line: int, window: int = 15) -> str:
    """Extract source code around the finding for better triage."""
    try:
        with open(file_path) as f:
            lines = f.readlines()
        start = max(0, line - window)
        end = min(len(lines), line + window)
        return "".join(
            f"{'>>> ' if i == line - 1 else '    '}{i+1}: {l}"
            for i, l in enumerate(lines[start:end], start=start)
        )
    except FileNotFoundError:
        return "(file not found)"
```
## Step 4: Putting It Together

```python
def scan_repo(repo_url: str) -> list:
    """Full scan pipeline: clone, scan, triage, rank."""
    # Clone (shallow, into /tmp) unless we already have it
    repo_name = repo_url.rstrip("/").split("/")[-1]
    repo_path = f"/tmp/scan-{repo_name}"
    if not os.path.exists(repo_path):
        subprocess.run(["git", "clone", "--depth", "1", repo_url, repo_path], check=True)

    # Scan with multiple rulesets
    findings = []
    for ruleset in ["p/security-audit", "p/owasp-top-ten", "p/python"]:
        findings.extend(run_semgrep(repo_path, ruleset))

    # Deduplicate by file + line
    seen = set()
    unique = []
    for f in findings:
        key = (f["path"], f["start"]["line"])
        if key not in seen:
            seen.add(key)
            unique.append(f)

    # Triage top findings (limit API calls)
    triaged = []
    for finding in unique[:30]:  # triage at most 30 findings
        full_path = os.path.join(repo_path, finding["path"])
        context = get_source_context(full_path, finding["start"]["line"])
        triaged.append(triage_finding(finding, context))

    # Sort by the severity word Claude used in its triage text
    severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    triaged.sort(key=lambda x: severity_order.get(
        next((s for s in severity_order if s in x["triage"].lower()), "low"), 3
    ))
    return triaged
```
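One caveat on that sort: plain substring matching is blunt ("high" also matches "highest", and a triage that says "not critical" still sorts as critical). A word-boundary regex is a small, still-imperfect improvement (a sketch; it trusts any whole severity word in the text and prefers the most severe one):

```python
import re

SEVERITIES = ["critical", "high", "medium", "low"]  # ordered most to least severe

def extract_severity(triage_text: str) -> str:
    """Return the most severe rating that appears as a whole word, defaulting to low."""
    text = triage_text.lower()
    for sev in SEVERITIES:
        if re.search(rf"\b{sev}\b", text):
            return sev
    return "low"
```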
## Step 5: Generate the Report

```python
def print_report(results: list):
    """Print a clean vulnerability report."""
    print(f"\n{'='*60}")
    print(f"SECURITY SCAN REPORT — {len(results)} findings triaged")
    print(f"{'='*60}\n")
    for i, r in enumerate(results, 1):
        f = r["finding"]
        print(f"[{i}] {f['check_id']}")
        print(f"    File: {f['path']}:{f['start']['line']}")
        print(f"    Code: {f['extra']['lines'].strip()}")
        print(f"\n    AI Triage:")
        for line in r["triage"].split("\n"):
            print(f"    {line}")
        print(f"\n{'-'*60}\n")

# Run it
results = scan_repo("https://github.com/some-org/some-repo")
print_report(results)
```
## Real Results: What This Finds
I've run this against dozens of open-source repos. Common findings that hold up after triage:
- Path traversal in file-serving endpoints where `os.path.join()` doesn't prevent `../` — more common than you'd think in Python web apps
- SSRF where URL parameters get passed directly to `requests.get()` without allowlist validation
- SQL injection in ORMs used with raw queries (SQLAlchemy's `text()` with f-strings)
- Command injection via `subprocess.run(f"cmd {user_input}", shell=True)`
The AI triage step typically filters out 60-70% of Semgrep findings as false positives, leaving you with a focused list of actually-exploitable issues.
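If you want the final report to show only confirmed issues, a crude text filter on the triage verdict works (a sketch; it assumes the model follows the prompt and literally writes "false positive" when it means one — for anything serious, ask for a structured verdict field instead):

```python
def drop_false_positives(triaged: list) -> list:
    """Keep results whose triage text does not call the finding a false positive."""
    return [
        r for r in triaged
        if "false positive" not in r["triage"].lower()
    ]
```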
## Scaling This: Claude Code Skills

The manual version above works, but I eventually packaged this into a reusable Claude Code skill. Instead of running a Python script, I type `/scan https://github.com/target/repo` and get a full triaged report in my terminal.
If you want a pre-built version with more rulesets, better reporting, and automatic fix suggestions, I packaged my production scanner as a Claude Code Security Scanner Skill — it includes custom Semgrep rules I've tuned over months of bounty hunting.
For teams that need API-level monitoring alongside code scanning, the API Connector Skill chains with this to test live endpoints for the same vulnerability classes.
## What Makes This Better Than Running Semgrep Alone
- Triage cuts noise by 60-70% — you only review real issues
- Context-aware — the AI reads surrounding code, not just the flagged line
- Actionable — each finding comes with a fix suggestion
- Fast — scanning a 20K-line repo takes ~2 minutes including triage
## Next Steps

- Add more rulesets: `p/jwt`, `p/secrets`, `p/docker` for broader coverage
- Integrate with CI: run on every PR via GitHub Actions
- Track findings over time: pipe results to a JSON file and diff between scans
- Target bounty programs: sort repos by bounty size, scan systematically
The code above is fully functional. Clone a repo, scan it, triage the results. You'll find real vulnerabilities in real projects — I have.
Build developer tools faster with Claude Code skills. Check out the Security Scanner, Dashboard Builder, and API Connector on Gumroad.