<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hamza Miladin</title>
    <description>The latest articles on DEV Community by Hamza Miladin (@hamza_miladin).</description>
    <link>https://dev.to/hamza_miladin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3865366%2F82b0eb86-1a4e-4f8d-bc24-0dc72eb6ca82.jpg</url>
      <title>DEV Community: Hamza Miladin</title>
      <link>https://dev.to/hamza_miladin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hamza_miladin"/>
    <language>en</language>
    <item>
      <title>Why I built attack-chain correlation on top of Semgrep and Joern</title>
      <dc:creator>Hamza Miladin</dc:creator>
      <pubDate>Tue, 07 Apr 2026 09:29:45 +0000</pubDate>
      <link>https://dev.to/hamza_miladin/why-i-built-attack-chain-correlation-on-top-of-semgrep-and-joern-1gcd</link>
      <guid>https://dev.to/hamza_miladin/why-i-built-attack-chain-correlation-on-top-of-semgrep-and-joern-1gcd</guid>
      <description>&lt;p&gt;I've been running security scans on codebases for a while, and the thing that always bothered me about Semgrep wasn't the false positive rate or the speed. It was that the output was useless in the wrong way.&lt;/p&gt;

&lt;p&gt;You'd get a list. Line 42, SQL injection. Line 187, hardcoded secret. Line 304, missing auth check. Fifty findings, no story. Nothing that says "here's how an attacker actually gets from the front door to the database." Just a queue of problems with no context for how bad any of them actually are.&lt;/p&gt;

&lt;p&gt;So I built Vulnchain to fix that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Semgrep is good at
&lt;/h2&gt;

&lt;p&gt;Semgrep is a pattern matcher. Fast, accurate within a file, easy to write rules for. If you want to catch mysql_query($_GET['id']) across 50 PHP files, it does that in seconds.&lt;/p&gt;

&lt;p&gt;The problem is that the open-source engine's analysis is intraprocedural: it stops at function boundaries. Take this from DVWA:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="n"&gt;php&lt;/span&gt;&lt;span class="c1"&gt;// login.php&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$pass&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$pass&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Semgrep stops here&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;db_execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// db_helpers.php&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;buildQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM users WHERE user='&lt;/span&gt;&lt;span class="nv"&gt;$u&lt;/span&gt;&lt;span class="s2"&gt;' AND password='&lt;/span&gt;&lt;span class="nv"&gt;$p&lt;/span&gt;&lt;span class="s2"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// actual sink&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Semgrep sees buildQuery() and moves on. It doesn't follow the call. The SQLi goes undetected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Joern goes further
&lt;/h2&gt;

&lt;p&gt;Joern builds a Code Property Graph: the AST, control flow graph, and data flow graph combined into one structure. When I ran Vulnchain against DVWA, the Joern pass found &lt;strong&gt;11 findings&lt;/strong&gt;, none of which overlapped with Semgrep's 63. All inter-procedural. All things Semgrep couldn't see.&lt;/p&gt;

&lt;p&gt;The taint script for SQLi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;source&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;cpg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;parameter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".*user.*|.*pass.*|.*input.*"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sink&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;cpg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;call&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mysql_query|pg_query|sqlite_exec"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;sink&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;reachableByFlows&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;l&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;reachableByFlows follows the data across function calls and returns each source-to-sink path it finds. Doesn't matter how many hops the data takes.&lt;/p&gt;
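
&lt;p&gt;To make the difference concrete, here is a minimal Python sketch of the idea. It is not how Joern is implemented: the FLOWS table and its node names are hand-written assumptions that model the login example above, and find_flow is a hypothetical helper. It searches the same graph twice, once confined to the function that contains the source and once following flows across calls.&lt;/p&gt;

```python
from collections import deque

# Hand-written toy data-flow graph for the login example (illustrative only).
# Each edge means "data flows from this node to that node".
# Node names use the form "enclosing_function:element".
FLOWS = {
    "login:$user": ["buildQuery:$u"],
    "login:$pass": ["buildQuery:$p"],
    "buildQuery:$u": ["buildQuery:return"],
    "buildQuery:$p": ["buildQuery:return"],
    "buildQuery:return": ["login:$query"],
    "login:$query": ["login:db_execute($query)"],
}


def function_of(node):
    """Return the enclosing function, e.g. 'login' for 'login:$user'."""
    return node.split(":", 1)[0]


def find_flow(source, sink, same_function_only=False):
    """Breadth-first search for a data-flow path from source to sink.

    With same_function_only=True the search never leaves the function
    that contains the source, mimicking an intraprocedural pass.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for nxt in FLOWS.get(node, []):
            if nxt in seen:
                continue
            if same_function_only and function_of(nxt) != function_of(source):
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


SOURCE = "login:$user"
SINK = "login:db_execute($query)"

intra = find_flow(SOURCE, SINK, same_function_only=True)
inter = find_flow(SOURCE, SINK)
print("intraprocedural:", intra)
print("interprocedural:", " to ".join(inter))
```

&lt;p&gt;The confined search returns None because the only route from $user to db_execute passes through buildQuery. The unconfined search returns the full five-node path, which is the kind of result a flow query reports.&lt;/p&gt;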

&lt;h2&gt;
  
  
  Running it on DVWA
&lt;/h2&gt;

&lt;p&gt;Here's what the pipeline logged:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[run_semgrep] Semgrep found 63 findings&lt;br&gt;
[run_joern]   CPG built at /tmp/joern-ws (1,549,296 bytes)&lt;br&gt;
[run_joern]   Joern found 11 findings across 13 scripts&lt;br&gt;
[llm_code_review] 13 LLM findings across 6 files&lt;br&gt;
[synthesize_attack_chains] invoking LLM&lt;br&gt;
179 files, 4.5 minutes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The attack chain that came out the other end:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SQL Injection → Credential Dump → Admin Takeover (CVSS 9.1)&lt;br&gt;
Inject into the login form, pull the users table via UNION-based SQLi. DVWA stores passwords as unsalted MD5 — crack them offline. Log in as admin. Full database read/write, session hijack, potential RCE via file write.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That's the output I wanted. Not "SQLi on line 42." Three steps, one chain, obvious business impact.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why correlation matters
&lt;/h2&gt;

&lt;p&gt;A security team staring at 87 findings (63 from Semgrep, 11 from Joern, 13 from the LLM review) has no obvious way to decide what to fix first. The SQLi on line 42 sounds bad in isolation. It sounds a lot worse when it's the first move in a chain that ends at admin access. The chain does the triage work: you don't have to manually trace through the codebase to figure out what's actually dangerous.&lt;/p&gt;
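
&lt;p&gt;As a rough illustration of what "chaining" means, here is a small Python sketch. Vulnchain hands this step to an LLM, so this is a deterministic stand-in and not the tool's method. The Finding class, its requires and grants fields, and the sample findings are all hypothetical. The rule is simple: a finding can follow another if it needs exactly the capability the previous step gave the attacker.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    requires: str  # capability the attacker needs first (hypothetical field)
    grants: str    # capability the attacker gains (hypothetical field)


def build_chain(findings, start="unauthenticated"):
    """Greedily link findings where one step's gain enables the next."""
    chain = []
    capability = start
    remaining = list(findings)
    while True:
        # Pick the first unused finding the attacker can exploit right now.
        step = next((f for f in remaining if f.requires == capability), None)
        if step is None:
            return chain
        chain.append(step)
        remaining.remove(step)
        capability = step.grants


# Sample findings, deliberately out of order; the second one is unrelated noise.
findings = [
    Finding("Admin Takeover", "cracked admin password", "admin session"),
    Finding("Hardcoded secret in test fixture", "repo access", "test credentials"),
    Finding("SQL Injection", "unauthenticated", "users table dump"),
    Finding("Credential Dump", "users table dump", "cracked admin password"),
]

chain = build_chain(findings)
print(" then ".join(f.title for f in chain))
```

&lt;p&gt;Starting from an unauthenticated attacker, the sketch links SQL Injection, Credential Dump and Admin Takeover in that order and leaves the unrelated finding out of the chain. A real correlator has to infer those capabilities from code and context, which is the part the LLM is there for.&lt;/p&gt;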

&lt;p&gt;I wanted a scanner that reasons about the code the way an attacker would, not one that just flags lines.&lt;/p&gt;

&lt;p&gt;Self-hosted, MIT licensed, needs nothing except an Anthropic API key and Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/api/scans &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"repo_url": "https://github.com/digininja/DVWA"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
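
&lt;p&gt;If you'd rather kick off a scan from a script, the same request can be built with Python's standard library. This is a sketch: the endpoint and payload are copied from the curl command above, submit_scan is a hypothetical helper, and since the response format isn't described here the body is returned as plain text instead of being parsed.&lt;/p&gt;

```python
import json
import urllib.request

# Endpoint and payload copied from the curl command above.
API_URL = "http://localhost:8080/api/scans"
payload = {"repo_url": "https://github.com/digininja/DVWA"}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)


def submit_scan(req=request, timeout=30):
    """Send the prepared request and return the raw response body as text.

    Hypothetical helper: needs the Vulnchain containers to be running.
    """
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

&lt;p&gt;Call submit_scan() once the containers are up and print whatever comes back.&lt;/p&gt;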



&lt;p&gt;→ &lt;a href="https://github.com/hamzamiladin/Vulnchain" rel="noopener noreferrer"&gt;github.com/hamzamiladin/Vulnchain&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>appsec</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
