Why I built attack-chain correlation on top of Semgrep and Joern

#security #ai #appsec #opensource

I've been running security scans on codebases for a while, and the thing that always bothered me about Semgrep wasn't the false positive rate or the speed. It was that the output was useless in the wrong way.

You'd get a list. Line 42, SQL injection. Line 187, hardcoded secret. Line 304, missing auth check. Fifty findings, no story. Nothing that says "here's how an attacker actually gets from the front door to the database." Just a queue of problems with no context for how bad any of them actually are.

So I built Vulnchain to fix that.

What Semgrep is good at

Semgrep is a pattern matcher. Fast, accurate within a file, easy to write rules for. If you want to catch mysql_query($_GET['id']) across 50 PHP files, it does that in seconds.

The problem is that the open-source engine's analysis stops at function boundaries. Take this from DVWA:

// login.php
function login($user, $pass) {
    $query = buildQuery($user, $pass);  // Semgrep stops here
    return db_execute($query);
}
// db_helpers.php
function buildQuery($u, $p) {
    return "SELECT * FROM users WHERE user='$u' AND password='$p'";  // user input is interpolated into the SQL string here
}

Semgrep sees buildQuery() and moves on. It doesn't follow the call. The SQLi goes undetected.

Joern goes further

Joern builds a Code Property Graph: the AST, control flow graph, and program dependence graph (which carries the data flow edges) combined into one structure.
When I ran Vulnchain against DVWA, the Joern pass found 11 findings that didn't overlap at all with Semgrep's 63. All inter-procedural. All things Semgrep couldn't see.
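
One way to make "didn't overlap" concrete is to key every finding on its location and intersect the two sets. The sketch below is illustrative and is not Vulnchain's actual code; the `path` and `line` field names and the sample findings are assumptions made up for the example.

```python
def location_key(finding):
    """Identify a finding by where it is, ignoring which tool reported it."""
    return (finding["path"], finding["line"])


def overlap(findings_a, findings_b):
    """Return the findings in findings_b that share a location with findings_a."""
    keys_a = {location_key(f) for f in findings_a}
    return [f for f in findings_b if location_key(f) in keys_a]


# Hypothetical sample data, not real scan output.
semgrep_findings = [
    {"path": "login.php", "line": 19, "rule": "tainted-sql-string"},
    {"path": "config.php", "line": 4, "rule": "hardcoded-secret"},
]
joern_findings = [
    {"path": "db_helpers.php", "line": 24, "rule": "sqli-flow"},
]

shared = overlap(semgrep_findings, joern_findings)
```

Matching on exact line numbers is strict. A looser key, such as file plus vulnerability class, would report more overlap, so the definition is worth stating whenever the numbers are quoted.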

The taint script for SQLi:

def source = cpg.parameter.name(".*user.*|.*pass.*|.*input.*")
def sink = cpg.call.name("mysql_query|pg_query|sqlite_exec")
sink.reachableByFlows(source).l

reachableByFlows follows data across function calls (within the engine's configured call depth), so the source and the sink can sit several hops apart in different files.
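
To show why crossing function boundaries matters, here is a toy version of the idea in Python. It is a plain breadth-first search over a hand-written data flow graph for the login example, not how Joern is implemented. The node labels are invented for this sketch.

```python
from collections import deque

# Toy data-flow edges for the login/buildQuery example. Each node is a
# "function:value" label made up for this sketch, not Joern output.
EDGES = {
    "login:$user": ["buildQuery:$u"],              # argument passed to the callee
    "login:$pass": ["buildQuery:$p"],
    "buildQuery:$u": ["buildQuery:return"],        # interpolated into the SQL string
    "buildQuery:$p": ["buildQuery:return"],
    "buildQuery:return": ["login:$query"],         # returned to the caller
    "login:$query": ["login:db_execute($query)"],  # handed to the database call
}


def function_of(node):
    """Return the function part of a 'function:value' label."""
    return node.split(":", 1)[0]


def find_flow(source, sink, edges, cross_function=True):
    """Breadth-first search for a data-flow path from source to sink.

    With cross_function=False, edges that leave the current function are
    skipped, which mimics a purely intra-procedural analysis.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for nxt in edges.get(node, []):
            if not cross_function and function_of(nxt) != function_of(node):
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


SINK = "login:db_execute($query)"
inter = find_flow("login:$user", SINK, EDGES)
intra = find_flow("login:$user", SINK, EDGES, cross_function=False)
```

Here `inter` is the five-node path that runs through `buildQuery` and back, while `intra` is `None`: the source and the sink are both inside `login`, but the only route between them passes through another function.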

Running it on DVWA

Here's what the pipeline logged:

[run_semgrep] Semgrep found 63 findings
[run_joern] CPG built at /tmp/joern-ws (1,549,296 bytes)
[run_joern] Joern found 11 findings across 13 scripts
[llm_code_review] 13 LLM findings across 6 files
[synthesize_attack_chains] invoking LLM

179 files, 4.5 minutes.

The attack chain that came out the other end:

SQL Injection → Credential Dump → Admin Takeover (CVSS 9.1)
Inject into the login form, pull the users table via UNION-based SQLi.
DVWA stores passwords as unsalted MD5, so crack them offline.
Log in as admin.
Full database read/write, session hijack, potential RCE via file write.

That's the output I wanted. Not "SQLi on line 42." Three steps, one chain, obvious business impact.

Why correlation matters

A security team staring at 87 findings will often prioritize the wrong ones. The SQLi on line 42 sounds bad in isolation. It sounds a lot worse when it's the first move in a chain that ends at admin access. The chain does the triage work: you don't have to manually trace through the codebase to figure out what's actually dangerous.
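
A minimal sketch of what "the chain does the triage work" could look like in code: each finding is ranked by the higher of its standalone severity and the score of the worst chain it takes part in. The data classes, the scoring rule and the standalone severities are assumptions for illustration, not Vulnchain's actual model. Only the chain name and its 9.1 score come from the run above.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    id: str
    title: str
    severity: float  # standalone score, 0-10 (hypothetical values below)


@dataclass
class Chain:
    name: str
    score: float
    steps: list  # finding ids, in attack order


def prioritize(findings, chains):
    """Sort findings by max(standalone severity, best chain score), highest first."""
    best_chain = {}
    for chain in chains:
        for fid in chain.steps:
            best_chain[fid] = max(best_chain.get(fid, 0.0), chain.score)

    def priority(f):
        # Ties between findings in the same chain fall back to standalone severity.
        return (max(f.severity, best_chain.get(f.id, 0.0)), f.severity)

    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("F1", "SQL injection in login", 7.5),
    Finding("F2", "Unsalted MD5 password hashes", 5.3),
    Finding("F3", "Hardcoded secret", 8.0),
]
chains = [
    Chain("SQL Injection → Credential Dump → Admin Takeover", 9.1, ["F1", "F2"]),
]

ranked = [f.id for f in prioritize(findings, chains)]
unchained = [f.id for f in prioritize(findings, [])]
```

With the chain applied the order is F1, F2, F3. Without it the hardcoded secret comes first and the MD5 finding sinks to the bottom, which is the mis-prioritization a flat list invites.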

I wanted a scanner that reasons about the code the way an attacker would, not one that just flags lines.

Self-hosted, MIT licensed, needs nothing except an Anthropic API key and Docker:

docker compose up
curl -X POST http://localhost:8080/api/scans \
  -H "Content-Type: application/json" \
  -d '{"repo_url": "https://github.com/digininja/DVWA"}'
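
The same request from Python, using only the standard library. The endpoint and payload are copied from the curl call above. The response format isn't described here, so this sketch just returns the raw body, and the function names are my own.

```python
import json
import urllib.request


def build_scan_request(repo_url, base_url="http://localhost:8080"):
    """Build the POST /api/scans request without sending it."""
    body = json.dumps({"repo_url": repo_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scans",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_scan(repo_url, base_url="http://localhost:8080", timeout=30):
    """Send the scan request and return the raw response body as text."""
    request = build_scan_request(repo_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, once `docker compose up` is running:
#   print(submit_scan("https://github.com/digininja/DVWA"))
```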