DEV Community

manja316
Build Your Own Code Security Scanner in 30 Minutes (Python + Semgrep + Claude)

Most developers don't scan their code for vulnerabilities until it's too late. The tools exist — Semgrep, Bandit, CodeQL — but setting them up, writing custom rules, and interpreting results takes hours.

I built a security scanner that chains these tools together with AI-powered triage. It finds real vulnerabilities in real codebases. Here's exactly how to build one yourself.

What We're Building

A Python script that:

  1. Clones any GitHub repo
  2. Runs Semgrep with security-focused rulesets
  3. Uses Claude to triage findings (filtering false positives)
  4. Outputs a ranked vulnerability report

I've used this workflow to find real vulnerabilities in open-source projects — including path traversal bugs, SSRF via user-controlled URLs, and command injection through unsanitized inputs.

Prerequisites

```shell
pip install semgrep anthropic
```

You'll need a Claude API key and Semgrep installed locally.
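The Anthropic Python SDK reads your key from the ANTHROPIC_API_KEY environment variable by default, so there's no need to hardcode it anywhere:

```shell
# The Anthropic() client in the code below picks this up automatically.
export ANTHROPIC_API_KEY="your-key-here"
```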

Step 1: The Scanner Core

```python
import subprocess
import json
import os

def run_semgrep(repo_path: str, ruleset: str = "p/security-audit") -> list:
    """Run Semgrep against a repository and return findings."""
    result = subprocess.run(
        ["semgrep", "--config", ruleset, "--json", repo_path],
        capture_output=True, text=True
    )

    if result.returncode not in (0, 1):  # 1 = findings exist
        raise RuntimeError(f"Semgrep failed: {result.stderr}")

    data = json.loads(result.stdout)
    return data.get("results", [])
```

This gives you raw Semgrep output — every potential vulnerability with file path, line number, and rule ID.
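For reference, the only fields the rest of this pipeline reads from each result look roughly like this (a trimmed, hypothetical example; real results carry many more keys):

```python
# A trimmed, hypothetical Semgrep result with just the fields used later.
finding = {
    "check_id": "python.lang.security.audit.dangerous-subprocess-use",
    "path": "/tmp/scan-myrepo/app/views.py",
    "start": {"line": 42},
    "extra": {
        "lines": 'subprocess.run(f"convert {name}", shell=True)',
        "severity": "ERROR",
    },
}
print(f'{finding["check_id"]} at {finding["path"]}:{finding["start"]["line"]}')
```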

The problem? Semgrep is noisy. On a typical 10K-line codebase, you'll get 50-200 findings. Most are low-severity or false positives. That's where AI triage comes in.
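You can see the noise for yourself before spending any API calls by bucketing raw findings on Semgrep's own severity label (the extra.severity field in its JSON output). A minimal sketch:

```python
from collections import Counter

def severity_histogram(findings: list) -> Counter:
    """Count raw findings by Semgrep's built-in severity (INFO/WARNING/ERROR)."""
    return Counter(f.get("extra", {}).get("severity", "UNKNOWN") for f in findings)

# Example with the list shape run_semgrep() returns:
sample = [
    {"extra": {"severity": "WARNING"}},
    {"extra": {"severity": "ERROR"}},
    {"extra": {"severity": "WARNING"}},
]
print(severity_histogram(sample))  # Counter({'WARNING': 2, 'ERROR': 1})
```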

Step 2: AI-Powered Triage

```python
from anthropic import Anthropic

client = Anthropic()

def triage_finding(finding: dict, source_context: str) -> dict:
    """Use Claude to assess severity and exploitability."""
    prompt = f"""Analyze this security finding:

Rule: {finding['check_id']}
File: {finding['path']}:{finding['start']['line']}
Code: {finding['extra']['lines']}

Surrounding context:
{source_context}

Assess:
1. Is this a true positive or false positive? Why?
2. Severity (critical/high/medium/low)
3. Is it exploitable? What would an attacker need?
4. Suggested fix (one-liner)

Be concise. No disclaimers."""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )

    return {
        "finding": finding,
        "triage": response.content[0].text
    }
```

The key insight: giving Claude the surrounding source context (not just the flagged line) dramatically improves triage accuracy. A `subprocess.run()` call isn't dangerous if the arguments are hardcoded constants. It IS dangerous if they come from user input three functions up the call stack.

Step 3: Context Extraction

```python
def get_source_context(file_path: str, line: int, window: int = 15) -> str:
    """Extract source code around the finding for better triage."""
    try:
        with open(file_path) as f:
            lines = f.readlines()

        start = max(0, line - window)
        end = min(len(lines), line + window)

        # Prefix the flagged line with ">>>" and keep 1-based numbering
        return "".join(
            f"{'>>> ' if i == line - 1 else '    '}{i+1}: {l}"
            for i, l in enumerate(lines[start:end], start=start)
        )
    except FileNotFoundError:
        return "(file not found)"
```

Step 4: Putting It Together

```python
def scan_repo(repo_url: str) -> list:
    """Full scan pipeline: clone, scan, triage, rank."""
    # Shallow-clone into /tmp, reusing an existing checkout
    repo_name = repo_url.rstrip("/").split("/")[-1]
    repo_path = f"/tmp/scan-{repo_name}"

    if not os.path.exists(repo_path):
        subprocess.run(["git", "clone", "--depth", "1", repo_url, repo_path], check=True)

    # Scan with multiple rulesets
    findings = []
    for ruleset in ["p/security-audit", "p/owasp-top-ten", "p/python"]:
        findings.extend(run_semgrep(repo_path, ruleset))

    # Deduplicate by file+line
    seen = set()
    unique = []
    for f in findings:
        key = (f["path"], f["start"]["line"])
        if key not in seen:
            seen.add(key)
            unique.append(f)

    # Triage top findings (limit API calls)
    triaged = []
    for finding in unique[:30]:  # Triage top 30
        # os.path.join returns finding["path"] unchanged if Semgrep already
        # reported it as absolute; the join only matters for relative paths.
        full_path = os.path.join(repo_path, finding["path"])
        context = get_source_context(full_path, finding["start"]["line"])
        triaged.append(triage_finding(finding, context))

    # Rank by the severity label Claude put in its triage text
    severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    triaged.sort(key=lambda x: severity_order.get(
        next((s for s in severity_order if s in x["triage"].lower()), "low"), 3
    ))

    return triaged
```

Step 5: Generate the Report

```python
def print_report(results: list):
    """Print a clean vulnerability report."""
    print(f"\n{'='*60}")
    print(f"SECURITY SCAN REPORT — {len(results)} findings triaged")
    print(f"{'='*60}\n")

    for i, r in enumerate(results, 1):
        f = r["finding"]
        print(f"[{i}] {f['check_id']}")
        print(f"    File: {f['path']}:{f['start']['line']}")
        print(f"    Code: {f['extra']['lines'].strip()}")
        print("\n    AI Triage:")
        for line in r["triage"].split("\n"):
            print(f"    {line}")
        print(f"\n{'-'*60}\n")

# Run it
results = scan_repo("https://github.com/some-org/some-repo")
print_report(results)
```

Real Results: What This Finds

I've run this against dozens of open-source repos. Common findings that hold up after triage:

  • Path traversal in file-serving endpoints where `os.path.join()` doesn't prevent `../` — more common than you'd think in Python web apps
  • SSRF where URL parameters get passed directly to `requests.get()` without allowlist validation
  • SQL injection in ORMs used with raw queries (SQLAlchemy's `text()` with f-strings)
  • Command injection via `subprocess.run(f"cmd {user_input}", shell=True)`
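To make the last class concrete, the exploitable and safe forms differ only in how arguments reach subprocess. A minimal sketch (the user_input value is hypothetical):

```python
import subprocess

user_input = "alice; rm -rf /"  # hypothetical attacker-controlled value

# Vulnerable: the shell parses the f-string, so ';' starts a second command.
#   subprocess.run(f"id {user_input}", shell=True)

# Safe: the list form bypasses the shell, so the payload stays one literal argument.
result = subprocess.run(["echo", user_input], capture_output=True, text=True)
print(result.stdout)  # alice; rm -rf /
```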

The AI triage step typically filters out 60-70% of Semgrep findings as false positives, leaving you with a focused list of actually-exploitable issues.

Scaling This: Claude Code Skills

The manual version above works, but I eventually packaged this into a reusable Claude Code skill. Instead of running a Python script, I type /scan https://github.com/target/repo and get a full triaged report in my terminal.

If you want a pre-built version with more rulesets, better reporting, and automatic fix suggestions, I packaged my production scanner as a Claude Code Security Scanner Skill — it includes custom Semgrep rules I've tuned over months of bounty hunting.

For teams that need API-level monitoring alongside code scanning, the API Connector Skill chains with this to test live endpoints for the same vulnerability classes.

What Makes This Better Than Running Semgrep Alone

  1. Triage cuts noise by 60-70% — you only review real issues
  2. Context-aware — the AI reads surrounding code, not just the flagged line
  3. Actionable — each finding comes with a fix suggestion
  4. Fast — scanning a 20K-line repo takes ~2 minutes including triage

Next Steps

  • Add more rulesets: `p/jwt`, `p/secrets`, `p/docker` for broader coverage
  • Integrate with CI: run on every PR via GitHub Actions
  • Track findings over time: pipe results to a JSON file and diff between scans
  • Target bounty programs: sort repos by bounty size, scan systematically
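For the tracking idea, the diff between two scans is just set arithmetic over (path, line, rule) keys. A minimal sketch with hypothetical findings:

```python
def diff_scans(old: list, new: list) -> list:
    """Return findings present in `new` but absent from `old`."""
    def key(f: dict):
        return (f["path"], f["start"]["line"], f["check_id"])
    old_keys = {key(f) for f in old}
    return [f for f in new if key(f) not in old_keys]

# Example: one finding persists between scans, one new one appears.
old = [{"path": "app.py", "start": {"line": 10}, "check_id": "ssrf"}]
new = [{"path": "app.py", "start": {"line": 10}, "check_id": "ssrf"},
       {"path": "views.py", "start": {"line": 4}, "check_id": "path-traversal"}]
print([f["check_id"] for f in diff_scans(old, new)])  # ['path-traversal']
```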

The code above is fully functional. Clone a repo, scan it, triage the results. You'll find real vulnerabilities in real projects — I have.


Build developer tools faster with Claude Code skills. Check out the Security Scanner, Dashboard Builder, and API Connector on Gumroad.
