<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patience Mpofu</title>
    <description>The latest articles on DEV Community by Patience Mpofu (@pgmpofu).</description>
    <link>https://dev.to/pgmpofu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3805080%2F73f107c0-c84d-4ef3-aa44-8c4d2dc40b03.jpeg</url>
      <title>DEV Community: Patience Mpofu</title>
      <link>https://dev.to/pgmpofu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pgmpofu"/>
    <language>en</language>
    <item>
      <title>Why I Built an ML-Powered Secrets Detector Instead of Just Using Regex</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Sun, 10 May 2026 15:40:08 +0000</pubDate>
      <link>https://dev.to/pgmpofu/why-i-built-an-ml-powered-secrets-detector-instead-of-just-using-regex-4koa</link>
      <guid>https://dev.to/pgmpofu/why-i-built-an-ml-powered-secrets-detector-instead-of-just-using-regex-4koa</guid>
      <description>&lt;p&gt;Most secrets scanners work the same way.&lt;/p&gt;

&lt;p&gt;They maintain a list of regex patterns — one for AWS access keys, one for GitHub personal access tokens, one for Stripe keys, one for JWT headers — and they scan your code looking for matches. When a pattern fires, they report a finding. When it doesn't, they stay silent.&lt;/p&gt;

&lt;p&gt;This works well for secrets that have distinctive, consistent formats. A long-term AWS access key ID starts with &lt;code&gt;AKIA&lt;/code&gt; followed by 16 uppercase alphanumeric characters (temporary credentials use &lt;code&gt;ASIA&lt;/code&gt; instead). A GitHub PAT has a recognisable prefix. A private key has a PEM header. Regex catches these reliably.&lt;/p&gt;

&lt;p&gt;But it's only part of the problem. And the part it misses is exactly where real breaches happen.&lt;/p&gt;

&lt;p&gt;This is the story of why I built a machine learning secrets detector — what the existing approaches get wrong, what ML adds, and what the combined system catches that neither approach catches alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Two Failure Modes of Existing Tools
&lt;/h2&gt;

&lt;p&gt;Before building anything, I spent time understanding where the leading tools fail. TruffleHog, detect-secrets, and Gitleaks are all excellent tools. They're also all vulnerable to the same two failure modes in different proportions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Mode 1: The Regex Gap
&lt;/h3&gt;

&lt;p&gt;Regex-only scanners miss secrets that don't match a known pattern.&lt;/p&gt;

&lt;p&gt;The most dangerous class of missed secrets is the &lt;strong&gt;generic hardcoded credential&lt;/strong&gt; — a password, database URL, or internal API key that doesn't follow any publicly documented format because it was generated internally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# No regex pattern catches this reliably
&lt;/span&gt;&lt;span class="n"&gt;DB_PASSWORD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tr0ub4dor&amp;amp;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;INTERNAL_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-backend-service-key-2019&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SMTP_PASSWORD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;companyname_mail_2018!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are real secrets. They're low entropy by the standards of a cryptographically random key. They don't match any known service's key format. A regex scanner walks past them silently.&lt;/p&gt;

&lt;p&gt;This is not a theoretical concern. A significant proportion of credential exposures in real breaches involve exactly this type of secret — human-chosen passwords and internal tokens that were never designed to be detected by pattern matching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Mode 2: The Entropy False Positive Flood
&lt;/h3&gt;

&lt;p&gt;Some tools compensate by flagging anything with high Shannon entropy — the reasoning being that secrets are random, and random strings have high entropy.&lt;/p&gt;

&lt;p&gt;This is directionally correct and practically unusable in many codebases.&lt;/p&gt;

&lt;p&gt;High-entropy strings that are not secrets appear constantly in normal code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# UUID — high entropy, not a secret
&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;550e8400-e29b-41d4-a716-446655440000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# MD5 hash (32 hex characters): high entropy, not a secret
&lt;/span&gt;&lt;span class="n"&gt;expected_checksum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;d8e8fca2dc0f896fd7cb4cb0031ba249&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Base64-encoded image data — extremely high entropy, not a secret
&lt;/span&gt;&lt;span class="n"&gt;avatar_placeholder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Package integrity hash — high entropy, not a secret
&lt;/span&gt;&lt;span class="n"&gt;integrity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sha512-abc123def456...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A pure entropy scanner flags all of these. In a Node.js project with a &lt;code&gt;package-lock.json&lt;/code&gt;, an entropy scanner generates thousands of findings from integrity hashes alone. Engineers learn to ignore it within a week.&lt;/p&gt;
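
&lt;p&gt;To make the failure mode concrete, here is a minimal sketch of an entropy-only check. The function names and the 3.0 bits-per-character cut-off are assumptions made for illustration; they are not taken from TruffleHog, detect-secrets, or Gitleaks.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical cut-off in bits per character, chosen only for illustration.
ENTROPY_THRESHOLD = 3.0


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_only_scan(assignments: dict, threshold: float = ENTROPY_THRESHOLD) -> list:
    """Flag every variable whose value exceeds the entropy threshold."""
    return [name for name, value in assignments.items()
            if shannon_entropy(value) > threshold]


# Benign values taken from the examples above.
benign = {
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "expected_checksum": "d8e8fca2dc0f896fd7cb4cb0031ba249",
}

flagged = entropy_only_scan(benign)
print(flagged)  # both benign names are reported as findings
```

&lt;p&gt;Raising the threshold reduces the noise, but the scanner still has no way to tell a checksum from a key, because it never looks at anything except the value.&lt;/p&gt;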




&lt;h2&gt;
  
  
  What ML Adds: Context-Aware Classification
&lt;/h2&gt;

&lt;p&gt;The insight that drove the ML approach is that whether a string is a secret depends on &lt;strong&gt;context&lt;/strong&gt;, not just the string itself.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;d8e8fca2dc0f896fd7cb4cb0031ba249&lt;/code&gt; is either a secret or a benign hash depending on what variable contains it. A human security engineer can tell these apart instantly by reading the surrounding code. A regex scanner and an entropy scanner cannot.&lt;/p&gt;

&lt;p&gt;The question I asked was: can I teach a classifier to do what a human engineer does — look at the full context of a string and make a judgment about whether it's a secret?&lt;/p&gt;

&lt;p&gt;The answer turned out to be yes, with a 26-dimensional feature vector that captures what a human eye actually processes when making that judgment.&lt;/p&gt;

&lt;p&gt;Here's the comparison that drove the design:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Catches High-Entropy Secrets&lt;/th&gt;
&lt;th&gt;Catches Low-Entropy Secrets&lt;/th&gt;
&lt;th&gt;False Positive Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Regex only&lt;/td&gt;
&lt;td&gt;Yes (known formats)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entropy only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML classifier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Significantly reduced&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ML classifier doesn't replace regex — it adds a second layer. Known-format secrets (AWS keys, GitHub PATs, JWTs) are still caught by pattern flags that are part of the feature vector. Generic hardcoded credentials that no regex would catch are caught by the combination of entropy, character distribution, and — most importantly — the variable name context.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Feature That Changed Everything: Key Name Risk
&lt;/h2&gt;

&lt;p&gt;When I looked at feature importances after training the initial model, one feature stood above all others: &lt;code&gt;key_name_risk&lt;/code&gt;, with an importance score of 0.28 out of 1.0.&lt;/p&gt;

&lt;p&gt;That's the variable name. Not the value — the name of the variable holding the value.&lt;/p&gt;

&lt;p&gt;This makes intuitive sense once you see it. These two lines of code contain the same string value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;checksum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;d8e8fca2dc0f896fd7cb4cb0031ba249&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;d8e8fca2dc0f896fd7cb4cb0031ba249&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A human engineer looks at these and immediately knows: the first is almost certainly a hash, the second is almost certainly a secret. The string itself carries no information about its purpose. The variable name carries everything.&lt;/p&gt;

&lt;p&gt;I built a risk scoring function that assigns numerical scores to variable names based on their semantic association with sensitive data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;password&lt;/code&gt;, &lt;code&gt;passwd&lt;/code&gt;, &lt;code&gt;secret&lt;/code&gt;, &lt;code&gt;private_key&lt;/code&gt; → score 1.0&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;api_key&lt;/code&gt;, &lt;code&gt;token&lt;/code&gt;, &lt;code&gt;credential&lt;/code&gt;, &lt;code&gt;auth&lt;/code&gt; → score 0.9
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;access_key&lt;/code&gt;, &lt;code&gt;client_secret&lt;/code&gt;, &lt;code&gt;bearer&lt;/code&gt; → score 0.85&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;config&lt;/code&gt;, &lt;code&gt;setting&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt; → score 0.1&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;checksum&lt;/code&gt;, &lt;code&gt;hash&lt;/code&gt;, &lt;code&gt;version&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt; → score 0.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The classifier learns to combine this score with the entropy and character distribution features to make decisions that mirror what a human reviewer would make.&lt;/p&gt;

&lt;p&gt;The result: &lt;code&gt;password = "abc123"&lt;/code&gt; gets flagged despite low entropy. &lt;code&gt;checksum = "d8e8fca2dc0f896fd7cb4cb0031ba249"&lt;/code&gt; gets passed despite high entropy. Neither outcome is achievable with regex or entropy alone.&lt;/p&gt;
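
&lt;p&gt;A scoring function along these lines could be sketched as follows. The tiers come from the list above, but the matching strategy (longest keyword found in the lower-cased name) and the default score for unknown names are assumptions for illustration, not necessarily how the tool implements it.&lt;/p&gt;

```python
# Tiers copied from the list above; everything else here is a hypothetical sketch.
RISK_TIERS = [
    (1.0, ("password", "passwd", "secret", "private_key")),
    (0.9, ("api_key", "token", "credential", "auth")),
    (0.85, ("access_key", "client_secret", "bearer")),
    (0.1, ("config", "setting", "value")),
    (0.0, ("checksum", "hash", "version", "id")),
]

# Assumed score for names that match no keyword at all.
DEFAULT_RISK = 0.3


def key_name_risk(name: str) -> float:
    """Score a variable name by the longest risk keyword it contains."""
    lowered = name.lower()
    best_keyword = ""
    best_score = DEFAULT_RISK
    for score, keywords in RISK_TIERS:
        for keyword in keywords:
            # Prefer the longest match so "client_secret" is not scored as "secret".
            if keyword in lowered and len(keyword) > len(best_keyword):
                best_keyword = keyword
                best_score = score
    return best_score


for candidate in ("password", "CLIENT_SECRET", "expected_hash", "retry_count"):
    print(candidate, key_name_risk(candidate))
```

&lt;p&gt;Plain substring matching is crude (a short keyword such as &lt;code&gt;id&lt;/code&gt; will match inside unrelated words), so a production version would more plausibly split names into tokens first.&lt;/p&gt;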




&lt;h2&gt;
  
  
  Why Random Forest, Not a Neural Network
&lt;/h2&gt;

&lt;p&gt;When people hear "ML classifier," they often assume deep learning. I chose Random Forest deliberately, and it's worth explaining why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interpretability.&lt;/strong&gt; A Random Forest is far easier to interrogate than a neural network. Feature importances show which inputs drive the model overall, and the decision paths through its trees can be traced for a particular classification. When an engineer asks "why did the scanner flag this?", I can show them the feature breakdown: high entropy (0.82), key name risk (0.95), matches JWT pattern (true). A neural network produces a probability with no built-in explanation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Size.&lt;/strong&gt; The trained model is approximately 1MB as a pickle file. It ships with the tool, requires no internet connection, and adds negligible overhead to a scan. A neural network of sufficient sophistication would be orders of magnitude larger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training speed.&lt;/strong&gt; The model trains on 6,000 labeled samples in seconds on a standard laptop CPU. No GPU required. This matters enormously for the retraining feature — teams can add their own training samples and retrain in their local environment without specialist infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Less prone to overfitting on small data.&lt;/strong&gt; With 6,000 training samples, which is small by deep learning standards, Random Forest is likely to generalise better than a neural network would. The structured feature engineering does the heavy lifting; the model itself doesn't need to be sophisticated.&lt;/p&gt;

&lt;p&gt;The tradeoff is ceiling accuracy. A neural network operating on raw token sequences would likely achieve higher peak accuracy given sufficient data. But for a tool that needs to be deployable, explainable, and retrainable by a team without ML expertise, Random Forest is the right choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Synthetic Training Data: The Ethical Constraint
&lt;/h2&gt;

&lt;p&gt;One early design decision shaped everything else: I would not train on real leaked secrets from public repositories.&lt;/p&gt;

&lt;p&gt;The alternative — scraping GitHub for accidentally committed credentials and using them as positive training examples — is technically straightforward and has been done. It's also legally and ethically problematic. Those credentials belong to real people and organisations. Even if the data is technically public, using it to train a commercial tool raises questions I didn't want to answer.&lt;/p&gt;

&lt;p&gt;Instead, I built a synthetic data generator that produces realistic examples of both secrets and benign high-entropy strings:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets (label=1):&lt;/strong&gt; Algorithmically generated AWS access keys, GitHub PAT formats, JWT structures, OpenAI key formats, Slack tokens, database connection strings, and — critically — synthetically generated "human-chosen" passwords that follow common patterns without being anyone's real password.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benign (label=0):&lt;/strong&gt; UUIDs, MD5 and SHA-256 hashes, version strings, base64-encoded image data fragments, color hex codes, package integrity hashes, lorem ipsum text fragments.&lt;/p&gt;

&lt;p&gt;The synthetic approach has one significant advantage beyond ethics: I can generate unlimited training data and precisely control the class distribution. The 6,000 sample baseline can be scaled to 50,000 samples with a single command, which meaningfully improves model accuracy on edge cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three-Layer Detection Architecture
&lt;/h2&gt;

&lt;p&gt;The final tool combines three detection mechanisms, each compensating for the others' weaknesses:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Pattern matching flags.&lt;/strong&gt; Sixteen binary features in the feature vector correspond to known secret formats (AWS, GitHub, JWT, OpenAI, Slack, database URLs, private key headers, and so on). These fire on known formats with near-zero false positives and form the backbone of high-confidence detections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 — Entropy and character analysis.&lt;/strong&gt; Shannon entropy, character class ratios, repetition ratio, longest run of repeated characters — these features capture the statistical "shape" of a secret without requiring a specific format match. High entropy combined with a high-risk key name is a strong signal even when no pattern matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — Key name risk scoring.&lt;/strong&gt; The variable name context that neither regex nor entropy captures. This is what allows the classifier to catch &lt;code&gt;password = "simple123"&lt;/code&gt; despite its low entropy and lack of a recognisable format.&lt;/p&gt;

&lt;p&gt;A finding is reported when the classifier's confidence exceeds a configurable threshold (default: 0.7). Findings include the confidence score, the matched pattern if any, and — for CI/CD integration — an exit code that can gate builds.&lt;/p&gt;
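
&lt;p&gt;The reporting step described above can be sketched roughly like this. The 0.7 default comes from the article; the &lt;code&gt;Finding&lt;/code&gt; dataclass, its field names, and the helper functions are hypothetical and only illustrate how a threshold and an exit code fit together.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_THRESHOLD = 0.7  # default stated in the article


@dataclass
class Finding:
    """Hypothetical shape of one candidate secret scored by the classifier."""
    path: str
    line: int
    confidence: float
    matched_pattern: Optional[str] = None


def select_findings(candidates: List[Finding],
                    threshold: float = DEFAULT_THRESHOLD) -> List[Finding]:
    """Keep only candidates whose confidence exceeds the threshold."""
    return [c for c in candidates if c.confidence > threshold]


def exit_code(findings: List[Finding]) -> int:
    """Return a non-zero code when anything was reported, so CI can fail the build."""
    return 1 if findings else 0


candidates = [
    Finding("settings.py", 12, 0.89),
    Finding("checksums.py", 4, 0.12),
]
reported = select_findings(candidates)
print(len(reported), exit_code(reported))
```

&lt;p&gt;Keeping the threshold as a parameter lets a team trade recall for noise without retraining the model.&lt;/p&gt;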




&lt;h2&gt;
  
  
  What This Actually Catches
&lt;/h2&gt;

&lt;p&gt;I ran the tool against a collection of test cases designed to stress each approach. Results that illustrate the gap:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caught by all approaches:&lt;/strong&gt; &lt;code&gt;AWS_KEY = "AKIAIOSFODNN7EXAMPLE"&lt;/code&gt; — known format, high entropy, high-risk key name. Every tool gets this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caught only by ML:&lt;/strong&gt; &lt;code&gt;DB_PASS = "Winter2019!"&lt;/code&gt; — low entropy, no known format, but the key name &lt;code&gt;DB_PASS&lt;/code&gt; scores 1.0 and the classifier flags it at 89% confidence. Regex misses it. Entropy misses it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positive in entropy tools, not in ML:&lt;/strong&gt; &lt;code&gt;expected_hash = "d8e8fca2dc0f896fd7cb4cb0031ba249"&lt;/code&gt; — high entropy, but key name scores 0.0 and the ML classifier correctly passes it. A pure entropy scanner flags it; the ML classifier does not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positive in regex tools, not in ML:&lt;/strong&gt; An internal test file contains &lt;code&gt;TEST_TOKEN = "fake-token-for-testing"&lt;/code&gt;, annotated with &lt;code&gt;# secrets-ignore&lt;/code&gt;. The suppression annotation is respected, and even without it the low-entropy value combined with the test-file context (another feature) keeps the confidence below the threshold.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Fits in a Security Programme
&lt;/h2&gt;

&lt;p&gt;A secrets detector — even an ML-powered one — is one layer of a defence-in-depth approach, not a complete solution.&lt;/p&gt;

&lt;p&gt;It catches secrets at the point of scanning. It doesn't prevent secrets from being created in the first place (that's developer education and code review). It doesn't rotate compromised credentials (that's incident response). It doesn't enforce secrets management policies (that's your secrets manager — Vault, AWS Secrets Manager, Azure Key Vault).&lt;/p&gt;

&lt;p&gt;What it does well: systematically surface secret exposure across a codebase and git history, prevent new secrets from reaching the repository via pre-commit hooks, and provide a measurable baseline for "how many secret exposures exist in our codebase right now."&lt;/p&gt;

&lt;p&gt;That baseline matters more than most teams realise — you can't improve what you can't measure.&lt;/p&gt;




&lt;p&gt;The full source, including the feature extractor, trainer, and pre-commit hook, is at &lt;a href="https://github.com/pgmpofu/secrets-detector" rel="noopener noreferrer"&gt;github.com/pgmpofu/secrets-detector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next up: a deep dive into the 26-dimensional feature vector — exactly what the model sees when it evaluates a candidate secret, and how each feature contributes to the final decision.&lt;/p&gt;

</description>
      <category>security</category>
      <category>machinelearning</category>
      <category>appsec</category>
      <category>python</category>
    </item>
    <item>
      <title>What Building a SAST Tool Taught Me About AppSec That 13 Years of Software Engineering Didn't</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Sat, 09 May 2026 23:16:23 +0000</pubDate>
      <link>https://dev.to/pgmpofu/what-building-a-sast-tool-taught-me-about-appsec-that-13-years-of-software-engineering-didnt-3n2l</link>
      <guid>https://dev.to/pgmpofu/what-building-a-sast-tool-taught-me-about-appsec-that-13-years-of-software-engineering-didnt-3n2l</guid>
      <description>&lt;p&gt;I've been writing software professionally since 2011.&lt;/p&gt;

&lt;p&gt;Java, C#, Kotlin, Node.js. Enterprise backends, microservices, APIs, data pipelines. I've shipped production code that millions of people have used without knowing it. I've led teams, reviewed architectures, mentored junior engineers, and done all the things that accumulate into what people call "senior software engineer."&lt;/p&gt;

&lt;p&gt;And yet, when I decided to transition into application security, I realised I had significant blind spots — not about how software works, but about how software &lt;em&gt;fails&lt;/em&gt;. Specifically, how it fails in ways that attackers can exploit.&lt;/p&gt;

&lt;p&gt;This is the final article in a series about building a SAST scanner from scratch, embedding it in CI/CD pipelines, writing custom detection rules, and managing false positives. But it's really about what that whole process taught me about application security as a discipline — and what I wish I'd understood earlier.&lt;/p&gt;




&lt;h2&gt;
  
  
  I Knew How to Write Secure Code. I Didn't Know Why It Was Secure.
&lt;/h2&gt;

&lt;p&gt;Here's an embarrassing admission: I've been using parameterised queries for SQL for at least a decade. I knew you were supposed to use them. I used them every time. I would have told you confidently that they prevent SQL injection.&lt;/p&gt;

&lt;p&gt;But if you'd asked me, before I started studying AppSec seriously, to explain &lt;em&gt;why&lt;/em&gt; they prevent SQL injection — the actual mechanism — I would have given you a hand-wavy answer about "the database handling it separately."&lt;/p&gt;

&lt;p&gt;Building the SQL injection detection rule forced me to get precise. I had to understand exactly what makes &lt;code&gt;"SELECT * FROM users WHERE id = " + userId&lt;/code&gt; dangerous, what makes &lt;code&gt;SELECT * FROM users WHERE id = ?&lt;/code&gt; with a bound parameter safe, and why the difference matters at the level of how the database parses and executes the statement.&lt;/p&gt;

&lt;p&gt;The answer — that parameterised queries send the query structure and the data in separate messages, so the database never attempts to parse the data as SQL syntax — is not complicated. But I didn't actually know it at that level of precision until I had to write a rule that distinguishes between the two patterns.&lt;/p&gt;
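
&lt;p&gt;A small sketch makes the difference visible. It uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module with an in-memory database purely for illustration; the table and the input value are made up. SQLite runs in-process, so there are no separate network messages here, but the same separation applies: the statement is compiled with a placeholder first and the value is bound afterwards as data.&lt;/p&gt;

```python
import sqlite3

# Throwaway in-memory database with two illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

user_id = "1 OR 1=1"  # attacker-controlled input

# Vulnerable: the input becomes part of the SQL text, so the database
# parses "OR 1=1" as syntax and the filter matches every row.
unsafe_rows = conn.execute(
    "SELECT * FROM users WHERE id = " + user_id
).fetchall()

# Safe: the query structure is fixed and the input is bound as a single
# value, so it is only ever compared against the id column.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE id = ?", (user_id,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

&lt;p&gt;The concatenated query returns both users, while the bound query returns nothing, because no row has an id equal to the literal string the attacker supplied.&lt;/p&gt;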

&lt;p&gt;This was a theme throughout the project. I knew the &lt;em&gt;what&lt;/em&gt; of secure coding from years of following conventions and best practices. Building detection rules forced me to learn the &lt;em&gt;why&lt;/em&gt; — the actual attack mechanics that the conventions are defending against.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Knowing the secure pattern is not the same as understanding the vulnerability. For a software engineer, the secure pattern is enough to write safe code. For an AppSec engineer, you need to understand the attack, because your job is to find it when someone else didn't write the safe pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security Is an Adversarial Discipline
&lt;/h2&gt;

&lt;p&gt;Software engineering is largely a collaborative discipline. You're building something. The goal is for it to work. Your mental model of the system is oriented around the happy path — the flow where inputs are valid, networks are reliable, and users do what you expect.&lt;/p&gt;

&lt;p&gt;AppSec is adversarial. The mental shift required is genuinely disorienting at first.&lt;/p&gt;

&lt;p&gt;When I was building the rule for JWTs that declare &lt;code&gt;alg: none&lt;/code&gt;, I had to think like someone who wants to forge authentication tokens. Not because I want to do that, but because I can't write a rule that reliably detects the attack unless I understand exactly how it works: what the attacker controls, what assumptions the vulnerable code makes, and what the exploit chain looks like.&lt;/p&gt;

&lt;p&gt;This is the skill that 13 years of software engineering didn't develop: adversarial thinking. The question isn't "does this code do what it's supposed to do?" It's "how could someone make this code do something it's not supposed to do?"&lt;/p&gt;
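
&lt;p&gt;To illustrate what the attacker controls in the &lt;code&gt;alg: none&lt;/code&gt; case, here is a minimal sketch using only the standard library. The helper names are hypothetical, and this is not the detection rule from the scanner; it only shows that the token header, including the declared algorithm, is attacker-supplied data that anyone can construct without a key.&lt;/p&gt;

```python
import base64
import json


def b64url(data: dict) -> str:
    """Encode a dict as unpadded base64url JSON, the way JWT segments are encoded."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a JWT."""
    segment = token.split(".")[0]
    padding = "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment + padding))


def declares_alg_none(token: str) -> bool:
    """Return True when the header asks the verifier to skip signature checks."""
    return str(jwt_header(token).get("alg", "")).lower() == "none"


# A forged token: no key is needed, and the signature segment is simply empty.
forged = b64url({"alg": "none", "typ": "JWT"}) + "." + b64url({"sub": "admin"}) + "."

print(declares_alg_none(forged))
```

&lt;p&gt;Vulnerable code trusts the header to choose the verification algorithm. A safer design pins the accepted algorithms on the server side and rejects anything else before looking at the claims.&lt;/p&gt;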

&lt;p&gt;The OWASP Top 10 is, at its core, a catalogue of the assumptions developers make that attackers exploit. In the 2021 numbering, A03 (Injection) exploits the assumption that input is data, not instructions. A07 (Authentication Failures) exploits the assumption that the code correctly validates identity. A02 (Cryptographic Failures) exploits the assumption that encryption means the data is protected.&lt;/p&gt;

&lt;p&gt;Every category is a place where the developer's mental model of the system diverges from what an attacker can actually do to it. Understanding OWASP deeply means understanding those divergences — not as a checklist, but as a way of thinking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; You can't find vulnerabilities you can't imagine. Developing adversarial thinking — the habit of asking "how could this go wrong for someone who wants it to go wrong" — is the most important cognitive shift in the AppSec transition.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools Are Amplifiers, Not Answers
&lt;/h2&gt;

&lt;p&gt;Before I built my own SAST tool, I used SAST tools. And I treated them roughly like a compiler warning: something fires, I look at it, I decide whether to fix it or ignore it.&lt;/p&gt;

&lt;p&gt;Building one changed how I think about what a SAST tool actually is.&lt;/p&gt;

&lt;p&gt;A SAST tool is a codified set of heuristics about what vulnerable code looks like. Those heuristics are written by humans, based on human understanding of vulnerability patterns, with human decisions about confidence levels and severity ratings. The tool doesn't know your codebase. It doesn't know your threat model. It doesn't know whether the finding it just generated is actually exploitable in your specific deployment context.&lt;/p&gt;

&lt;p&gt;This sounds like a criticism. It isn't. It's a description of a tool's appropriate role.&lt;/p&gt;

&lt;p&gt;When I run Snyk or Semgrep now, I engage with the results differently than I did before. I ask: what pattern is this rule trying to catch? Is that pattern present in my code for the reason the rule assumes? Does the vulnerability the rule targets actually apply in my context? What would an attacker need to control to exploit this?&lt;/p&gt;

&lt;p&gt;Those are AppSec questions, not DevOps questions. A DevOps mindset treats SAST output as a compliance gate. An AppSec mindset treats it as a starting point for analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; A SAST scanner is a signal generator, not an oracle. The value it provides is proportional to the quality of thinking applied to its output — not to the number of findings it generates or suppresses.&lt;/p&gt;




&lt;h2&gt;
  
  
  False Positives Taught Me About Risk Tolerance
&lt;/h2&gt;

&lt;p&gt;Every time I suppressed a finding in my own scanner, I had to make a decision: is this actually safe, and how confident am I?&lt;/p&gt;

&lt;p&gt;That turns out to be the central skill of AppSec: structured risk assessment under uncertainty.&lt;/p&gt;

&lt;p&gt;You almost never have complete information. You can't always trace every data flow through a complex system. You can't always know whether a finding is exploitable without building a proof of concept. You have to make a judgment call about whether the risk is acceptable given what you know.&lt;/p&gt;

&lt;p&gt;What I learned from managing false positives is that risk tolerance is not a feeling — it's a position that needs to be documented and defensible. "I suppressed this because it looked fine" is not a risk assessment. "I suppressed this because the data being processed is always from our internal configuration system and never from user input, as confirmed by tracing the call stack in lines 42–67" is a risk assessment.&lt;/p&gt;

&lt;p&gt;The difference matters when something goes wrong. And in security, things go wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Risk assessment is a core AppSec competency, not a soft skill. Developing a structured, documented approach to risk decisions — even informal ones — is more valuable than any specific technical knowledge.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gap Between Writing Secure Code and Finding Insecure Code
&lt;/h2&gt;

&lt;p&gt;These are related skills. They are not the same skill.&lt;/p&gt;

&lt;p&gt;Writing secure code is a constructive activity. You know what you're building. You apply secure patterns. You follow established conventions. The feedback loop is relatively tight — if you use parameterised queries, you know you're not vulnerable to SQL injection there.&lt;/p&gt;

&lt;p&gt;Finding insecure code is a forensic activity. You're examining code you didn't write, often without full context, looking for patterns that indicate vulnerability. The feedback loop is loose — you might flag something, triage it, determine it's a false positive, and never know whether your triage was correct.&lt;/p&gt;

&lt;p&gt;The cognitive skills are different. Construction requires knowing the secure pattern. Detection requires knowing the vulnerable pattern and all its variations. It requires understanding which variations are genuinely dangerous and which are contextually safe. It requires maintaining a mental model of an attacker's perspective while reading code that was written from a developer's perspective.&lt;/p&gt;

&lt;p&gt;I've spent 13 years getting good at construction. Building this scanner was the first systematic exercise I did in detection. It was harder than I expected — not technically, but cognitively. Shifting from "I'm building this thing to work" to "I'm looking for ways this thing could be exploited" is a genuine gear change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; AppSec is not "software engineering plus security knowledge." It's a different cognitive discipline that happens to use the same raw material. Senior software engineers making this transition should expect a genuine learning curve, not just a knowledge gap.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Tell Someone Starting This Transition
&lt;/h2&gt;

&lt;p&gt;If you're a software engineer moving into AppSec — or considering it — here's what I'd tell you based on this project and the broader transition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build something.&lt;/strong&gt; Reading about OWASP is useful. Reading CVE writeups is useful. Neither teaches you what building a detection rule teaches you. The act of translating "this is a vulnerability" into "this is what the vulnerable code looks like in text" forces a precision of understanding that passive learning doesn't produce.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Study the attacks, not just the defences.&lt;/strong&gt; Most of your software engineering career was spent learning defences — secure patterns, safe APIs, frameworks that handle the dangerous parts for you. AppSec requires understanding the attacks those defences are designed against. Read exploit writeups. Understand how CVEs actually work. Build your own vulnerable applications and attack them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get comfortable with ambiguity.&lt;/strong&gt; Software engineering has right answers. Does this code compile? Does this test pass? Does this function return the correct value? AppSec often doesn't. Is this finding exploitable? Is this suppression justified? Is this risk acceptable? These questions frequently don't have clean answers, and developing comfort with that ambiguity is part of the transition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use your engineering background as a superpower, not a crutch.&lt;/strong&gt; The thing that makes engineers valuable in AppSec is the ability to read code at scale, understand system architecture, and reason about data flows — skills most pure security professionals develop slowly. Use that. But don't assume that understanding how the code is supposed to work means you understand how it can be broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write about what you're learning.&lt;/strong&gt; This series started as a way to document my own thinking. Every article forced me to be more precise about something I thought I understood. The act of explaining something to someone else reveals the gaps in your own understanding faster than almost anything else.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Goes Next
&lt;/h2&gt;

&lt;p&gt;Building this scanner and writing this series was one project. The transition is ongoing.&lt;/p&gt;

&lt;p&gt;The next project is taking an old Java service and doing something I haven't done yet in this series: running Snyk against a real dependency tree on real legacy code, remediating real CVEs, and measuring the before-and-after security posture with actual metrics.&lt;/p&gt;

&lt;p&gt;That's a different kind of AppSec work — Software Composition Analysis rather than static analysis, dependency vulnerabilities rather than code vulnerabilities, Snyk's recommendations rather than my own rules. But the underlying skills are the same: understand the attack, assess the risk, make a defensible decision, measure the outcome.&lt;/p&gt;

&lt;p&gt;The transition from software engineer to AppSec engineer is not a destination. It's an ongoing process of developing adversarial thinking, structured risk assessment, and the forensic discipline of finding what's broken rather than building what works.&lt;/p&gt;

&lt;p&gt;Thirteen years in, I'm still learning. That's the right state to be in.&lt;/p&gt;




&lt;p&gt;The full SAST tool that this series was built around is at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If this series was useful to you — or if you're making a similar transition and want to compare notes — I'd genuinely like to hear from you. Find me here on dev.to or connect on LinkedIn.&lt;/p&gt;

</description>
      <category>career</category>
      <category>security</category>
      <category>webdev</category>
      <category>appsec</category>
    </item>
    <item>
      <title>False Positives in SAST — How I Built Suppression Into My Scanner and Why It Matters</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Sat, 09 May 2026 05:02:58 +0000</pubDate>
      <link>https://dev.to/pgmpofu/false-positives-in-sast-how-i-built-suppression-into-my-scanner-and-why-it-matters-48lo</link>
      <guid>https://dev.to/pgmpofu/false-positives-in-sast-how-i-built-suppression-into-my-scanner-and-why-it-matters-48lo</guid>
      <description>&lt;p&gt;There's a failure mode that kills security tooling programmes quietly, without drama, and it's not a technical failure.&lt;/p&gt;

&lt;p&gt;It's a trust failure.&lt;/p&gt;

&lt;p&gt;It goes like this: a team enables a SAST scanner. The scanner fires on 200 things. Engineers triage 40 of them and discover that 25 are false positives. They fix the 15 real findings, suppress the 25 false positives, and then face another 160 findings they haven't looked at yet. Two sprints later, nobody is triaging anymore. The scanner still runs. The reports still generate. Nobody reads them. The security programme is theatre.&lt;/p&gt;

&lt;p&gt;False positives are the mechanism by which this happens. Not because developers are lazy — because time is finite and trust is fragile. If a scanner cries wolf enough times, engineers stop listening. That's rational behaviour, not negligence.&lt;/p&gt;

&lt;p&gt;This article is about how I thought about false positives when building my SAST tool, what I built to manage them, and why the suppression system design matters as much as the detection rules themselves.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a False Positive Actually Costs
&lt;/h2&gt;

&lt;p&gt;Before getting into solutions, it's worth being precise about the cost.&lt;/p&gt;

&lt;p&gt;A false positive in a SAST scanner costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Triage time&lt;/strong&gt; — an engineer has to read the finding, understand the rule, examine the code in context, and reach a conclusion. Even for an experienced engineer, that's 5–15 minutes per finding for anything non-trivial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust capital&lt;/strong&gt; — every false positive is a small withdrawal from the trust account between the security team and the engineering team. Trust capital is finite and slow to rebuild.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attention budget&lt;/strong&gt; — the more false positives exist, the less attention real findings receive. This is the most dangerous cost. Security is fundamentally an attention allocation problem.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A scanner with a 40% false positive rate isn't 40% less useful. It's potentially useless, because the signal-to-noise ratio has collapsed to the point where engineers can't efficiently find real findings among the noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Sources of False Positives
&lt;/h2&gt;

&lt;p&gt;Not all false positives are the same. Understanding where they come from determines how to address them.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Context-Blind Pattern Matching
&lt;/h3&gt;

&lt;p&gt;This is the most common source in regex-based scanners. The pattern matches the text but doesn't understand what the code is doing.&lt;/p&gt;

&lt;p&gt;The MD5 example I've used throughout this series is the canonical case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# False positive — MD5 for file integrity, not passwords
&lt;/span&gt;&lt;span class="n"&gt;file_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# True positive — MD5 for password storage
&lt;/span&gt;&lt;span class="n"&gt;stored_password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_password&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both lines match the pattern &lt;code&gt;\bmd5\s*\(&lt;/code&gt;. Only the second is a vulnerability. A regex scanner cannot tell them apart without understanding the semantic context — what type of data is being hashed.&lt;/p&gt;
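
&lt;p&gt;A quick way to confirm this is to run the pattern against both lines. The sketch below uses only Python's standard &lt;code&gt;re&lt;/code&gt; module; the labels and the &lt;code&gt;samples&lt;/code&gt; dictionary are illustrative names, not part of the scanner.&lt;/p&gt;

```python
import re

# The context-blind pattern quoted above.
MD5_CALL = re.compile(r"\bmd5\s*\(")

# The two lines from the example, keyed by what the code is actually doing.
samples = {
    "file integrity": "file_hash = hashlib.md5(file_content).hexdigest()",
    "password storage": "stored_password = hashlib.md5(user_password).hexdigest()",
}

# The regex sees the call itself, not the kind of data being hashed,
# so it cannot separate the benign use from the vulnerable one.
matches = {label: bool(MD5_CALL.search(line)) for label, line in samples.items()}

for label, matched in matches.items():
    print(f"{label}: matched={matched}")
```

&lt;p&gt;Both entries report a match, which is exactly the problem: the pattern has no access to the meaning of the argument.&lt;/p&gt;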

&lt;h3&gt;
  
  
  2. Safe Framework Usage That Looks Dangerous
&lt;/h3&gt;

&lt;p&gt;Some frameworks make inherently dangerous operations safe through abstraction. The dangerous-looking code is actually fine because the framework handles the dangerous part.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Looks like SQL injection — it's not&lt;/span&gt;
&lt;span class="c1"&gt;// Spring Data JPA with @Query annotation handles parameterisation&lt;/span&gt;
&lt;span class="nd"&gt;@Query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT u FROM User u WHERE u.email = :email"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="nf"&gt;findByEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@Param&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"email"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A naive injection rule that flags anything resembling a SQL query with a variable near it would fire here. The query is safe because &lt;code&gt;:email&lt;/code&gt; is a named bind parameter, so the value is never concatenated into the query string, but the scanner doesn't know that.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Test and Configuration Code
&lt;/h3&gt;

&lt;p&gt;Test files are full of patterns that would be alarming in production code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_auth.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_jwt_none_algorithm_rejected&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Testing that we correctly REJECT the none algorithm
&lt;/span&gt;    &lt;span class="n"&gt;malicious_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/auth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;malicious_token&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;  &lt;span class="c1"&gt;# Should be rejected
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test is doing exactly the right thing — verifying that the application rejects the none algorithm attack. But a scanner looking for &lt;code&gt;algorithm="none"&lt;/code&gt; will flag it as AUTHN-001 without understanding that this is a negative test case.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built: The Suppression System
&lt;/h2&gt;

&lt;p&gt;My scanner supports two suppression mechanisms, each designed for different scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inline Suppression Annotations
&lt;/h3&gt;

&lt;p&gt;The simplest mechanism: a comment on the same line as the finding tells the scanner to skip it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;file_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# sast-ignore
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I support two annotation formats — &lt;code&gt;# sast-ignore&lt;/code&gt; and &lt;code&gt;# nosec&lt;/code&gt; — because &lt;code&gt;nosec&lt;/code&gt; is the Bandit convention and teams coming from Bandit shouldn't have to change their existing annotations.&lt;/p&gt;

&lt;p&gt;The scanner checks for these annotations before reporting a finding. If either is present on the matched line, the finding is suppressed silently.&lt;/p&gt;
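
&lt;p&gt;A minimal sketch of that check, assuming the annotation lives in a trailing comment on the matched line. The names &lt;code&gt;SUPPRESS_MARKER&lt;/code&gt; and &lt;code&gt;is_suppressed&lt;/code&gt; are hypothetical and not taken from the scanner's source. A regex this simple would also treat a marker inside a string literal as an annotation, which a real implementation may want to handle.&lt;/p&gt;

```python
import re

# Assumption: either marker may appear in a trailing comment on the matched line.
SUPPRESS_MARKER = re.compile(r"#\s*(sast-ignore|nosec)\b")


def is_suppressed(line):
    """Return True when the line carries a suppression annotation."""
    return SUPPRESS_MARKER.search(line) is not None


# Lines that some rule has already matched.
matched_lines = [
    "file_hash = hashlib.md5(file_content).hexdigest()  # sast-ignore",
    "digest = hashlib.md5(blob).hexdigest()  # nosec",
    "stored_password = hashlib.md5(user_password).hexdigest()",
]

# Annotated lines are dropped before anything is reported.
reported = [line for line in matched_lines if not is_suppressed(line)]
print(reported)
```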

&lt;p&gt;&lt;strong&gt;The problem with silent suppression:&lt;/strong&gt; It's invisible. If every suppression silently disappears from the report, there's no way to audit whether suppressions are legitimate or whether engineers are using them to hide real findings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suppression With Justification
&lt;/h3&gt;

&lt;p&gt;The better pattern — and what I recommend teams enforce in code review — is annotating &lt;em&gt;why&lt;/em&gt; the suppression is valid:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MD5 used for file integrity checking only, not credential storage
# Tracked in SEC-REVIEW-2024-041 — confirmed non-sensitive context
&lt;/span&gt;&lt;span class="n"&gt;file_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# sast-ignore
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The annotation still suppresses the finding, but the comment creates a paper trail. When a security audit happens — and it will — every suppression has a documented rationale that a reviewer can evaluate. "We reviewed this and it's fine because X" is defensible. A bare &lt;code&gt;# sast-ignore&lt;/code&gt; with no context is not.&lt;/p&gt;
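
&lt;p&gt;One way a tool could pick up that rationale automatically is to read the comment lines directly above the suppressed line. The function below is a sketch under that assumption; &lt;code&gt;justification_for&lt;/code&gt; is a hypothetical helper, and the scanner may derive its reason text differently.&lt;/p&gt;

```python
def justification_for(lines, index):
    """Collect the contiguous comment-only lines directly above lines[index]."""
    collected = []
    cursor = index - 1
    while cursor >= 0 and lines[cursor].strip().startswith("#"):
        collected.append(lines[cursor].strip().lstrip("#").strip())
        cursor -= 1
    # Lines were gathered bottom-up, so reverse to restore reading order.
    return " ".join(reversed(collected))


source = [
    "# MD5 used for file integrity checking only, not credential storage",
    "# Tracked in SEC-REVIEW-2024-041",
    "file_hash = hashlib.md5(file_content).hexdigest()  # sast-ignore",
    "other = hashlib.md5(blob).hexdigest()  # sast-ignore",
]

justified = justification_for(source, 2)
bare = justification_for(source, 3)  # no comment lines directly above

print(repr(justified))
print(repr(bare))
```

&lt;p&gt;An empty result marks a bare suppression, which is the case a reviewer should push back on.&lt;/p&gt;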

&lt;h3&gt;
  
  
  The Suppression Inventory in JSON Output
&lt;/h3&gt;

&lt;p&gt;Here's a design decision I'm particularly pleased with: suppressed findings don't disappear from the JSON report. They appear in a separate &lt;code&gt;suppressed_findings&lt;/code&gt; array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"findings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CRYPTO-002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SHA-1 Usage Detected"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HIGH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/utils/crypto.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;47&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suppressed_findings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CRYPTO-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Weak Hashing — MD5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HIGH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/utils/file_integrity.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"suppression_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MD5 used for file integrity only — sast-ignore"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_findings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suppressed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"by_severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"HIGH"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The pipeline counts only active findings when deciding whether to fail&lt;/li&gt;
&lt;li&gt;The full report shows both active and suppressed findings&lt;/li&gt;
&lt;li&gt;Security reviewers can audit suppressions without looking at individual source files&lt;/li&gt;
&lt;li&gt;Trend analysis can track suppression rates over time alongside finding rates
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters for measuring programme health. If your suppression count is growing faster than your finding count, something is wrong: either your rules are too noisy, or engineers are gaming the system.&lt;/p&gt;
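
&lt;p&gt;As a rough illustration of how a pipeline step might consume a report shaped like the one above, the sketch below loads a trimmed copy of that JSON and derives the numbers discussed here. The variable names and the definition of suppression rate (suppressed findings divided by all detections) are assumptions for the example.&lt;/p&gt;

```python
import json

# Trimmed copy of the report structure shown above.
report = json.loads("""
{
  "findings": [
    {"id": "CRYPTO-002", "severity": "HIGH", "file": "src/utils/crypto.py", "line": 47}
  ],
  "suppressed_findings": [
    {"id": "CRYPTO-001", "severity": "HIGH", "file": "src/utils/file_integrity.py",
     "line": 23, "suppression_reason": "MD5 used for file integrity only"}
  ]
}
""")

active = report["findings"]
suppressed = report["suppressed_findings"]

# Only active findings decide whether the build fails.
build_should_fail = len(active) > 0

# Suppression rate: share of all detections that were suppressed.
total = len(active) + len(suppressed)
suppression_rate = len(suppressed) / total if total else 0.0

# Suppressions with no stated reason are the first ones to question in review.
unjustified = [f for f in suppressed if not f.get("suppression_reason", "").strip()]

print(build_should_fail, suppression_rate, len(unjustified))
```

&lt;p&gt;Recording &lt;code&gt;suppression_rate&lt;/code&gt; on every run is what makes the trend visible over time.&lt;/p&gt;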




&lt;h2&gt;
  
  
  Confidence Levels as Pre-Emptive Noise Reduction
&lt;/h2&gt;

&lt;p&gt;The suppression system deals with false positives after they appear. Confidence levels deal with them before.&lt;/p&gt;

&lt;p&gt;Every pattern in my rule engine declares a confidence level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pickle\.loads?\s*\('&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;     &lt;span class="c1"&gt;# Almost always a real finding&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unserialize\s*\('&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;   &lt;span class="c1"&gt;# Real finding in PHP web context, benign in CLI context&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request\.headers\.get\(["\'&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="s"&gt;Origin["\']\)'&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LOW&lt;/span&gt;      &lt;span class="c1"&gt;# Could be proper allowlist implementation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confidence levels serve two purposes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For engineers reading findings:&lt;/strong&gt; Confidence communicates how much manual review a finding deserves. A HIGH confidence finding deserves immediate attention. A LOW confidence finding is a prompt to look at the code and make a judgment call. Without this signal, every finding looks equally important — which means either everything gets treated as urgent (unsustainable) or everything gets triaged with the same low attention (misses real issues).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For pipeline configuration:&lt;/strong&gt; Teams can configure their build gate to fail only on findings above a confidence threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fail on HIGH severity + HIGH confidence only&lt;/span&gt;
python main.py ./src &lt;span class="nt"&gt;--fail-on&lt;/span&gt; HIGH &lt;span class="nt"&gt;--min-confidence&lt;/span&gt; HIGH

&lt;span class="c"&gt;# See everything including LOW confidence findings in audit mode&lt;/span&gt;
python main.py ./src &lt;span class="nt"&gt;--fail-on&lt;/span&gt; none &lt;span class="nt"&gt;--min-confidence&lt;/span&gt; LOW
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a more nuanced gate than severity alone. A MEDIUM severity finding with HIGH confidence (this is almost certainly real, and it's moderately serious) might warrant blocking. A HIGH severity finding with LOW confidence (this is probably bad, but it might be fine) might not. The two dimensions together give you much more precise control over your signal-to-noise ratio.&lt;/p&gt;
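
&lt;p&gt;The two-dimensional gate can be sketched as a single predicate. This assumes both flags use "at or above" semantics and that &lt;code&gt;none&lt;/code&gt; disables the gate entirely; the function name &lt;code&gt;blocks_build&lt;/code&gt; and the &lt;code&gt;LEVELS&lt;/code&gt; ordering are hypothetical, so check the tool's own documentation for the exact behaviour of &lt;code&gt;--fail-on&lt;/code&gt; and &lt;code&gt;--min-confidence&lt;/code&gt;.&lt;/p&gt;

```python
# Shared ordering for both severity and confidence.
LEVELS = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def blocks_build(finding, fail_on, min_confidence):
    """Return True when a finding meets both the severity and confidence bars."""
    if fail_on == "none":
        return False  # audit mode: report everything, block nothing
    return (
        LEVELS[finding["severity"]] >= LEVELS[fail_on]
        and LEVELS[finding["confidence"]] >= LEVELS[min_confidence]
    )


likely_real = {"severity": "MEDIUM", "confidence": "HIGH"}
maybe_fine = {"severity": "HIGH", "confidence": "LOW"}

print(blocks_build(likely_real, "MEDIUM", "HIGH"))
print(blocks_build(maybe_fine, "HIGH", "HIGH"))
```

&lt;p&gt;With these settings the moderately serious but near-certain finding blocks the build, while the severe but uncertain one is left for manual review.&lt;/p&gt;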




&lt;h2&gt;
  
  
  The Suppression Review Process
&lt;/h2&gt;

&lt;p&gt;The suppression mechanism is only as good as the governance around it. A suppression system without a review process is just a way to silence the scanner faster.&lt;/p&gt;

&lt;p&gt;Here's the process I'd implement in a team setting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Developer identifies a finding they believe is a false positive.&lt;/strong&gt;&lt;br&gt;
They don't suppress it immediately. They raise it in the PR for discussion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — The team reviews the claim.&lt;/strong&gt;&lt;br&gt;
Is the developer's reasoning sound? Is the code actually safe in context? Does anyone have concerns? This is a two-minute conversation in most cases, not a security committee meeting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — If accepted, the suppression is added with justification.&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;# sast-ignore&lt;/code&gt; goes in with a comment explaining why. The suppression is visible in the PR diff — it can't be hidden.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 — The suppression is tracked.&lt;/strong&gt;&lt;br&gt;
In the JSON report, in a suppression registry spreadsheet, or in a dedicated Notion page — wherever works for your team. What matters is that someone periodically reviews the suppression inventory and asks: are these still valid?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Periodic suppression review.&lt;/strong&gt;&lt;br&gt;
Suppressions rot. Code changes. The context that made a suppression valid six months ago may no longer apply. A quarterly review of active suppressions — not of the whole codebase, just the suppression inventory — keeps the list honest.&lt;/p&gt;
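
&lt;p&gt;The quarterly review is easier to keep up if something lists the suppressions that are due. The sketch below assumes a registry where each entry records when it was last reviewed; the field names, the 90-day interval, and the sample entries are all hypothetical.&lt;/p&gt;

```python
from datetime import date

REVIEW_INTERVAL_DAYS = 90  # roughly one quarter


def overdue_for_review(registry, today):
    """Return entries whose last review is older than the review interval."""
    return [
        entry for entry in registry
        if (today - entry["last_reviewed"]).days > REVIEW_INTERVAL_DAYS
    ]


# Hypothetical registry entries, for illustration only.
registry = [
    {"id": "CRYPTO-001", "file": "src/utils/file_integrity.py",
     "last_reviewed": date(2026, 4, 1)},
    {"id": "AUTHN-001", "file": "tests/test_auth.py",
     "last_reviewed": date(2025, 11, 1)},
]

overdue = overdue_for_review(registry, today=date(2026, 5, 9))
print([entry["id"] for entry in overdue])
```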


&lt;h2&gt;
  
  
  Tuning Rules to Reduce Systemic False Positives
&lt;/h2&gt;

&lt;p&gt;When a specific rule consistently generates false positives across the codebase, the right answer isn't to suppress every instance — it's to tune the rule.&lt;/p&gt;

&lt;p&gt;The MD5 rule is a good example. Rather than flagging every &lt;code&gt;md5(&lt;/code&gt; call at HIGH confidence, I could tighten the pattern to focus on contexts that suggest credential handling:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (noisy):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\bmd5\s*\('&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (tighter):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;md5\s*\(\s*(password|passwd|pwd|secret|credential|token)'&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(password|passwd|pwd)\s*=\s*.*md5\s*\('&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\bmd5\s*\('&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LOW&lt;/span&gt;   &lt;span class="c1"&gt;# Generic usage — review context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the rule distinguishes between MD5 in credential contexts (HIGH confidence, almost certainly a problem) and generic MD5 usage (LOW confidence, warrants a look but probably fine). The total finding count might be the same, but the HIGH-confidence findings are now the ones that genuinely require a fix, so a far larger share of what lands at the top of the triage queue is actionable.&lt;/p&gt;
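
&lt;p&gt;To see the tiering in action, the sketch below runs the three tightened patterns in order and reports the confidence of the first one that matches. The "first match wins" evaluation and the name &lt;code&gt;classify&lt;/code&gt; are assumptions made for the example; the rule engine may combine patterns differently.&lt;/p&gt;

```python
import re

# The three tightened patterns, most specific first.
TIERED_PATTERNS = [
    (re.compile(r"md5\s*\(\s*(password|passwd|pwd|secret|credential|token)"), "HIGH"),
    (re.compile(r"(password|passwd|pwd)\s*=\s*.*md5\s*\("), "HIGH"),
    (re.compile(r"\bmd5\s*\("), "LOW"),
]


def classify(line):
    """Return the confidence of the first pattern that matches, or None."""
    for pattern, confidence in TIERED_PATTERNS:
        if pattern.search(line):
            return confidence
    return None


print(classify("digest = md5(password.encode())"))
print(classify("stored_password = hashlib.md5(user_password).hexdigest()"))
print(classify("file_hash = hashlib.md5(file_content).hexdigest()"))
print(classify("checksum = zlib.crc32(data)"))
```

&lt;p&gt;The first two lines come back as HIGH, the file integrity line drops to LOW, and the unrelated line is not flagged at all.&lt;/p&gt;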

&lt;p&gt;This is the most sustainable way to reduce false positives: better rules, not more suppressions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The False Negative Trade-off
&lt;/h2&gt;

&lt;p&gt;Every time you tune a rule to reduce false positives, you risk introducing false negatives — real vulnerabilities the scanner no longer catches.&lt;/p&gt;

&lt;p&gt;This is the fundamental tension in SAST tool design. It has no clean resolution. It only has a deliberate choice.&lt;/p&gt;

&lt;p&gt;If you tighten the MD5 rule so that only credential contexts are flagged at HIGH confidence, you'll miss (or demote to LOW) the case where the variable names don't line up with the keyword list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Now invisible to the tightened rule
&lt;/span&gt;&lt;span class="n"&gt;user_auth_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_password&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
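
&lt;p&gt;Checking that line against the two credential-context patterns shows why it slips through. This is a small verification sketch; &lt;code&gt;HIGH_CONFIDENCE&lt;/code&gt; is an illustrative name for the two patterns quoted earlier.&lt;/p&gt;

```python
import re

# The two credential-context patterns from the tightened rule.
HIGH_CONFIDENCE = [
    re.compile(r"md5\s*\(\s*(password|passwd|pwd|secret|credential|token)"),
    re.compile(r"(password|passwd|pwd)\s*=\s*.*md5\s*\("),
]

line = "user_auth_hash = hashlib.md5(user_password).hexdigest()"

# Neither pattern fires: the argument starts with user_ rather than a keyword,
# and the assigned name contains none of the listed keywords.
high_hits = [p.pattern for p in HIGH_CONFIDENCE if p.search(line)]
print(high_hits)
```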



&lt;p&gt;The question is: which failure mode is more expensive for your specific context?&lt;/p&gt;

&lt;p&gt;If your team is diligent about triage and the cost of a false negative (missed vulnerability) is high — financial services, healthcare, anything with regulatory consequences — keep rules broader and invest in the triage process.&lt;/p&gt;

&lt;p&gt;If your team is drowning in noise and findings aren't getting triaged at all — the scanner has already effectively failed — tighten the rules to rebuild trust, accept the trade-off, and plan to layer in additional controls elsewhere.&lt;/p&gt;

&lt;p&gt;There's no universally correct answer. There's only an honest assessment of your specific situation.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Healthy Suppression Profile Looks Like
&lt;/h2&gt;

&lt;p&gt;After a few months of running the scanner with a consistent process, here's what healthy metrics look like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suppression rate below 20%.&lt;/strong&gt; If more than 1 in 5 findings is being suppressed, your rules are too noisy for your codebase. Tune the rules rather than suppressing everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No suppressions without justification comments.&lt;/strong&gt; Bare &lt;code&gt;# sast-ignore&lt;/code&gt; annotations with no explanation are a red flag. Make justification comments a code review requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suppression inventory reviewed quarterly.&lt;/strong&gt; Old suppressions that are no longer valid are silent technical debt. A quarterly review catches them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positive rate declining over time.&lt;/strong&gt; As you tune rules based on real-world results, your false positive rate should go down. If it's stable or increasing, you're not learning from your suppression data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New findings triaged within one sprint.&lt;/strong&gt; If findings from a scan are still unreviewed after two weeks, your triage process isn't keeping up. Either reduce the finding volume (tune rules) or increase triage capacity.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Point
&lt;/h2&gt;

&lt;p&gt;False positive management is not a technical problem. It's a trust and process problem that has technical levers.&lt;/p&gt;

&lt;p&gt;The suppression system in my scanner — inline annotations, justification comments, suppressed findings in the JSON output, confidence levels on patterns — these are all technical levers. But they only work in the context of a team that has agreed on how to use them.&lt;/p&gt;

&lt;p&gt;The best SAST implementation I can imagine is one where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineers trust the scanner because it has a low false positive rate&lt;/li&gt;
&lt;li&gt;The scanner trusts engineers because suppressions are reviewed and justified&lt;/li&gt;
&lt;li&gt;Security teams trust both because the suppression inventory is auditable and periodically reviewed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not a configuration. That's a culture. The configuration just makes the culture possible.&lt;/p&gt;




&lt;p&gt;Full source and suppression documentation at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next up — the final article in this series: what building all of this taught me about application security that 13 years of software engineering didn't.&lt;/p&gt;

</description>
      <category>appsec</category>
      <category>devops</category>
      <category>testing</category>
      <category>security</category>
    </item>
    <item>
      <title>The Adoption Trap to Avoid</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Thu, 07 May 2026 18:41:12 +0000</pubDate>
      <link>https://dev.to/pgmpofu/the-adoption-trap-to-avoid-2ekd</link>
      <guid>https://dev.to/pgmpofu/the-adoption-trap-to-avoid-2ekd</guid>
      <description>&lt;p&gt;The single biggest mistake teams make with CI/CD-integrated security tooling is treating it as a one-time setup rather than an ongoing programme.&lt;/p&gt;

&lt;p&gt;The scanner is not the security programme. The scanner is a signal generator. The security programme is the process by which signals become fixes, fixes become patterns, and patterns become rules that prevent the same issue from appearing again.&lt;/p&gt;

&lt;p&gt;Configurable thresholds give you the controls to introduce that programme without breaking your team's deployment workflow. Use them gradually, communicate the reasoning at each phase, and invest as much in the suppression review process as you do in the initial setup.&lt;/p&gt;

&lt;p&gt;A scanner your team trusts and engages with is worth ten scanners that get bypassed.&lt;/p&gt;




&lt;p&gt;Full source and GitHub Actions workflow examples at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next up: the one everyone's been asking about — false positives in SAST, how I built suppression into the scanner, and why managing false positives is as important as finding real vulnerabilities.&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>cicd</category>
      <category>github</category>
    </item>
    <item>
      <title>Writing Custom SAST Rules for Vulnerabilities Your Scanner Doesn't Cover</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Thu, 07 May 2026 18:38:18 +0000</pubDate>
      <link>https://dev.to/pgmpofu/writing-custom-sast-rules-for-vulnerabilities-your-scanner-doesnt-cover-5hhi</link>
      <guid>https://dev.to/pgmpofu/writing-custom-sast-rules-for-vulnerabilities-your-scanner-doesnt-cover-5hhi</guid>
      <description>&lt;p&gt;Every SAST tool ships with a default ruleset. And every default ruleset has gaps.&lt;/p&gt;

&lt;p&gt;Sometimes the gap is a framework-specific vulnerability that the tool's authors didn't anticipate. Sometimes it's an internal pattern unique to your organisation — a custom authentication library, a legacy data access layer, a home-grown serialisation format that every engineer knows is sensitive but no off-the-shelf rule covers.&lt;/p&gt;

&lt;p&gt;This is the article where I show you how to close those gaps using the YAML rule engine I built. No Python required. No rebuilding the scanner. Just a YAML file and an understanding of what you're trying to detect.&lt;/p&gt;

&lt;p&gt;By the end, you'll have written three custom rules from scratch — a Java-specific one, a Node.js-specific one, and an organisation-level one that catches usage of a fictional internal library pattern. The process is the same for any vulnerability you want to target.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before You Write a Rule: The Four Questions
&lt;/h2&gt;

&lt;p&gt;Every good detection rule starts with the same four questions. Skip them and you end up with either a rule that fires on everything or a rule that fires on nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What does the vulnerable code actually look like in text?&lt;/strong&gt;&lt;br&gt;
Not the conceptual vulnerability — the literal characters that appear on screen when a developer writes the bad pattern. Be specific. "SQL injection" is not an answer. &lt;code&gt;"SELECT * FROM users WHERE id = " + userId&lt;/code&gt; is an answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. What does safe code look like?&lt;/strong&gt;&lt;br&gt;
You need the counterexample. If your pattern would also match safe code, you have a false positive problem. If you can't articulate what safe code looks like, you don't understand the vulnerability well enough to write a rule yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Which languages does this apply to?&lt;/strong&gt;&lt;br&gt;
Some patterns are universal — hardcoded secrets look similar everywhere. Others are language or framework-specific. Writing a broad rule when a narrow one is appropriate generates noise and erodes trust in the scanner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. What's the right confidence level?&lt;/strong&gt;&lt;br&gt;
HIGH means "this is almost certainly a real vulnerability." MEDIUM means "this warrants human review." LOW means "this is suspicious but probably benign." If you're unsure, start at MEDIUM and tighten it after you see the results on real code.&lt;/p&gt;
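
&lt;p&gt;One way to make questions 1 and 2 concrete is to write down the vulnerable and safe snippets first, then check a candidate pattern against both. The sketch below does that with Python's &lt;code&gt;re&lt;/code&gt; module. The candidate pattern and the parameterised "safe" snippet are illustrative assumptions, not rules shipped with the scanner.&lt;/p&gt;

```python
import re

# Candidate pattern: a SQL string literal that ends in "= " and is then
# concatenated with an identifier. Illustrative only.
candidate = re.compile(r'"SELECT\b[^"]*=\s*"\s*\+\s*\w+', re.IGNORECASE)

# Question 1: the literal text of the bad pattern.
vulnerable = 'String q = "SELECT * FROM users WHERE id = " + userId;'
# Question 2: the counterexample, here a parameterised query.
safe = 'String q = "SELECT * FROM users WHERE id = ?";'

fires_on_vulnerable = candidate.search(vulnerable) is not None
fires_on_safe = candidate.search(safe) is not None
print(fires_on_vulnerable, fires_on_safe)
```

&lt;p&gt;If the pattern fires on the safe snippet, or misses the vulnerable one, it needs more work before it goes into a rule file.&lt;/p&gt;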

&lt;p&gt;Now let's write some rules.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Rule Format (Quick Reference)
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CUSTOM-001&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Short descriptive title&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;What the vulnerability is and why it matters.&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL | HIGH | MEDIUM | LOW&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Injection | Secrets | Cryptography | Authentication | Misconfiguration | Path Traversal&lt;/span&gt;
    &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-XXX&lt;/span&gt;
    &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AXX:2021 - Category Name&lt;/span&gt;
    &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;java"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;javascript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;typescript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csharp"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kotlin"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ruby"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;php"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;What the developer should do instead.&lt;/span&gt;
    &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-pattern-here'&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH | MEDIUM | LOW&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Save the file anywhere under &lt;code&gt;rules/&lt;/code&gt;. The scanner discovers all YAML files in that directory automatically. If you want to keep your custom rules separate from the core ruleset, create a &lt;code&gt;rules/custom/&lt;/code&gt; subdirectory and point the scanner at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py ./src &lt;span class="nt"&gt;--rules&lt;/span&gt; ./rules/custom/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
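
&lt;p&gt;Before running a new rule file through the scanner, it can help to sanity-check the rule's shape. The sketch below is a rough pre-flight check, assuming the rule has already been parsed from YAML into a Python dict and that every field in the quick reference is required. The scanner's real validation may be stricter or looser, and &lt;code&gt;validate_rule&lt;/code&gt; is a hypothetical helper, not part of the tool.&lt;/p&gt;

```python
import re

# Fields taken from the quick reference; treating all of them as required
# is an assumption.
REQUIRED_KEYS = {"id", "title", "description", "severity", "category",
                 "cwe", "owasp", "languages", "remediation", "patterns"}
SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}
CONFIDENCES = {"HIGH", "MEDIUM", "LOW"}


def validate_rule(rule):
    """Return a list of problems; an empty list means the rule looks well-formed."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(rule))]
    if "severity" in rule and rule["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {rule['severity']}")
    for index, pattern in enumerate(rule.get("patterns", [])):
        if pattern.get("confidence") not in CONFIDENCES:
            problems.append(f"pattern {index}: unknown confidence")
        try:
            re.compile(pattern.get("regex", ""))
        except re.error as exc:
            problems.append(f"pattern {index}: invalid regex ({exc})")
    return problems


rule = {
    "id": "CUSTOM-001",
    "title": "Short descriptive title",
    "description": "What the vulnerability is and why it matters.",
    "severity": "HIGH",
    "category": "Injection",
    "cwe": "CWE-XXX",
    "owasp": "AXX:2021 - Category Name",
    "languages": ["python"],
    "remediation": "What the developer should do instead.",
    "patterns": [{"regex": r"eval\s*\(", "confidence": "MEDIUM"}],
}
broken = dict(rule, severity="URGENT", patterns=[{"regex": "(", "confidence": "HIGH"}])

print(validate_rule(rule))
print(validate_rule(broken))
```

&lt;p&gt;Compiling each regex up front catches typos early, before a malformed pattern has a chance to fail or silently match nothing during a scan.&lt;/p&gt;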






&lt;h2&gt;
  
  
  Rule 1: Java — Spring &lt;code&gt;@Transactional&lt;/code&gt; on Private Methods
&lt;/h2&gt;

&lt;p&gt;This one is Java-specific and framework-specific. It's the kind of vulnerability that no generic SAST tool covers because it requires understanding Spring's transaction management model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; In Spring, &lt;code&gt;@Transactional&lt;/code&gt; annotations on &lt;code&gt;public&lt;/code&gt; methods in &lt;code&gt;@Service&lt;/code&gt; or &lt;code&gt;@Repository&lt;/code&gt; classes work as expected because Spring creates a proxy. But when &lt;code&gt;@Transactional&lt;/code&gt; is placed on a &lt;code&gt;private&lt;/code&gt; method, Spring's proxy-based AOP cannot intercept it — the transaction is silently ignored. This is especially dangerous when the private method performs database writes that need to be atomic.&lt;/p&gt;

&lt;p&gt;This isn't a traditional security vulnerability in the CVE sense — it's a correctness issue that can become a security issue when the failed transaction silently corrupts data, leaves partial writes in the database, or bypasses audit logging that was supposed to be transactional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What safe code looks like:&lt;/strong&gt; &lt;code&gt;@Transactional&lt;/code&gt; on &lt;code&gt;public&lt;/code&gt; methods, or using &lt;code&gt;TransactionTemplate&lt;/code&gt; for programmatic transaction management on private methods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What vulnerable code looks like:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Transactional&lt;/span&gt;  &lt;span class="c1"&gt;// silent no-op — Spring proxy can't intercept private methods&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processRefund&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ledgerRepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;debit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;auditRepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;log&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"REFUND"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// may not be in same transaction&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JAVA-001&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@Transactional&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Private&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Method&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Transaction&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Silently&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ignored"&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Spring's proxy-based AOP cannot intercept @Transactional annotations on&lt;/span&gt;
      &lt;span class="s"&gt;private methods. The annotation is silently ignored, meaning the method&lt;/span&gt;
      &lt;span class="s"&gt;executes without transaction management. This can cause partial writes,&lt;/span&gt;
      &lt;span class="s"&gt;data corruption, and bypassed audit logging in database operations.&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Misconfiguration&lt;/span&gt;
    &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-362&lt;/span&gt;
    &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A05:2021 - Security Misconfiguration&lt;/span&gt;
    &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;java"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Move @Transactional to public methods only. For private methods that&lt;/span&gt;
      &lt;span class="s"&gt;require transaction management, either make them public, use&lt;/span&gt;
      &lt;span class="s"&gt;TransactionTemplate for programmatic transactions, or restructure&lt;/span&gt;
      &lt;span class="s"&gt;the code so the public caller method is annotated instead.&lt;/span&gt;
    &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;@Transactional[\s\S]{0,100}private\s+\w+\s+\w+\s*\('&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;private\s+\w+\s+\w+\s*\([\s\S]{0,100}@Transactional'&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Testing your rule&lt;/strong&gt; — create a test file &lt;code&gt;test_java_transactional.java&lt;/code&gt; and verify it fires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Should fire — JAVA-001&lt;/span&gt;
&lt;span class="nd"&gt;@Transactional&lt;/span&gt;
&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;updateBalance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Should NOT fire — public method is fine&lt;/span&gt;
&lt;span class="nd"&gt;@Transactional&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processPayment&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py ./test_java_transactional.java &lt;span class="nt"&gt;--rules&lt;/span&gt; ./rules/custom/java-rules.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
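
&lt;p&gt;The first JAVA-001 pattern can also be checked in isolation with Python's &lt;code&gt;re&lt;/code&gt; module. This sketch assumes the scanner applies the regex to multi-line text in roughly the way &lt;code&gt;re.search&lt;/code&gt; does, which is an assumption about its internals. The third snippet is a hypothetical extra case added to show a limit of the 100-character window.&lt;/p&gt;

```python
import re

# First JAVA-001 pattern, copied from the rule above.
pattern = re.compile(r"@Transactional[\s\S]{0,100}private\s+\w+\s+\w+\s*\(")

should_fire = "@Transactional\nprivate void updateBalance(String id, BigDecimal amount) { }"
should_not_fire = "@Transactional\npublic void processPayment(String id, BigDecimal amount) { }"
# Hypothetical extra case: an annotated public method followed closely by an
# unannotated private one.
neighbouring = "@Transactional\npublic void a() { }\n\nprivate void b() { }"

results = {
    "private": bool(pattern.search(should_fire)),
    "public": bool(pattern.search(should_not_fire)),
    "neighbouring": bool(pattern.search(neighbouring)),
}
print(results)
```

&lt;p&gt;The third case matches even though the private method carries no annotation, because the window after &lt;code&gt;@Transactional&lt;/code&gt; can span a short public method. Under these assumptions, expect some false positives in classes with small methods, and consider narrowing the window if they become noisy.&lt;/p&gt;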






&lt;h2&gt;
  
  
  Rule 2: Node.js — &lt;code&gt;child_process.exec&lt;/code&gt; with Template Literals
&lt;/h2&gt;

&lt;p&gt;This one targets a Node.js-specific pattern that's extremely common in backend services written by developers who came from a systems programming background.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; &lt;code&gt;child_process.exec()&lt;/code&gt; passes its argument to the shell for execution. If that argument contains user-controlled input — even through a template literal that looks clean — it enables OS command injection. The shell will happily interpret special characters like &lt;code&gt;;&lt;/code&gt;, &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;, &lt;code&gt;|&lt;/code&gt;, and backticks as command separators or subshell operators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What safe code looks like:&lt;/strong&gt; &lt;code&gt;child_process.execFile()&lt;/code&gt; or &lt;code&gt;child_process.spawn()&lt;/code&gt; with arguments as an array — these bypass the shell entirely and treat the command and arguments as separate values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What vulnerable code looks like:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Dangerous — shell injection possible&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`convert &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; -resize 800x600 output.jpg`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Also dangerous — looks safer but isn't&lt;/span&gt;
&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ffmpeg -i &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; output.mp4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What safe code looks like:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Safe — no shell involved&lt;/span&gt;
&lt;span class="nf"&gt;execFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;convert&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-resize&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;800x600&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output.jpg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Safe — spawn with args array&lt;/span&gt;
&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ffmpeg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-i&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output.mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NODE-001&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;child_process.exec&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Dynamic&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Input&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Command&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Injection"&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;child_process.exec() passes its argument to the system shell, enabling&lt;/span&gt;
      &lt;span class="s"&gt;OS command injection when the argument includes user-controlled input,&lt;/span&gt;
      &lt;span class="s"&gt;template literals, or string concatenation. Attackers can inject shell&lt;/span&gt;
      &lt;span class="s"&gt;metacharacters to execute arbitrary commands on the host system.&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Injection&lt;/span&gt;
    &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-78&lt;/span&gt;
    &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A03:2021 - Injection&lt;/span&gt;
    &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;javascript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;typescript"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Replace exec() with execFile() or spawn() and pass command arguments&lt;/span&gt;
      &lt;span class="s"&gt;as an array. These functions bypass the shell entirely and treat each&lt;/span&gt;
      &lt;span class="s"&gt;argument as a literal string, preventing shell metacharacter injection.&lt;/span&gt;
      &lt;span class="s"&gt;Never concatenate user input into exec() arguments.&lt;/span&gt;
    &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exec\s*\(\s*`[^`]*\$\{'&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exec\s*\(\s*["\'&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;^"\'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;s]&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;s*&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;w'&lt;/span&gt;
        &lt;span class="s"&gt;confidence:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;
      &lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;regex:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'exec&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;s*&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;s*&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;w+&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;s*&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;+'&lt;/span&gt;
        &lt;span class="s"&gt;confidence:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MEDIUM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three patterns cover the three common forms: template literals with interpolation, concatenation with a string prefix, and concatenation that starts with a variable. The last one is MEDIUM because a call like &lt;code&gt;exec(baseCommand + options)&lt;/code&gt;, where both values come from static config, is less dangerous but still warrants review.&lt;/p&gt;
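
&lt;p&gt;To see which form each sample triggers, the patterns can be exercised directly in Python. This is a sketch under two assumptions: that the scanner's matching behaves like &lt;code&gt;re.search&lt;/code&gt;, and that the string-prefix pattern is meant to tolerate whitespace between the closing quote and the &lt;code&gt;+&lt;/code&gt;. The pattern names and the &lt;code&gt;fired&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import re

PATTERNS = {
    "template": re.compile(r"exec\s*\(\s*`[^`]*\$\{"),
    # Assumed form of the string-prefix pattern: optional whitespace is
    # allowed between the closing quote and the "+".
    "prefix": re.compile(r'''exec\s*\(\s*["'][^"']*["']\s*\+\s*\w'''),
    "variable": re.compile(r"exec\s*\(\s*\w+\s*\+"),
}

# JavaScript samples mapped to whether any pattern is expected to fire.
SAMPLES = {
    "exec(`convert ${filename} -resize 800x600 output.jpg`, callback);": True,
    'exec("ffmpeg -i " + userInput + " output.mp4", callback);': True,
    "exec(baseCommand + options, callback);": True,
    "execFile('convert', [filename, '-resize', '800x600', 'output.jpg'], callback);": False,
    "spawn('ffmpeg', ['-i', userInput, 'output.mp4']);": False,
}


def fired(code):
    """Names of the patterns that match the given line of JavaScript."""
    return [name for name, regex in PATTERNS.items() if regex.search(code)]


for code, expected in SAMPLES.items():
    print(bool(fired(code)) == expected, fired(code), code)
```

&lt;p&gt;Note that &lt;code&gt;execFile(&lt;/code&gt; does not match, because each pattern requires &lt;code&gt;exec&lt;/code&gt; to be followed directly by optional whitespace and an opening parenthesis.&lt;/p&gt;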




&lt;h2&gt;
  
  
  Rule 3: Organisation-Level — Internal Audit Logger Bypass
&lt;/h2&gt;

&lt;p&gt;This is the most interesting type of custom rule: one that only makes sense for your specific codebase.&lt;/p&gt;

&lt;p&gt;Imagine your organisation has an internal library called &lt;code&gt;AuditLogger&lt;/code&gt; that must be called for any database mutation. The security policy is clear: every write operation must produce an audit event. But the library has a &lt;code&gt;skipAudit()&lt;/code&gt; method that was added for performance testing and was never supposed to reach production code.&lt;/p&gt;

&lt;p&gt;This isn't in any public CVE database. No off-the-shelf SAST tool would ever flag it. But it's a real security control bypass in your organisation's context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ORG-001&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AuditLogger.skipAudit()&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Security&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Control&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Bypass"&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;The skipAudit() method on AuditLogger disables audit event generation&lt;/span&gt;
      &lt;span class="s"&gt;for database mutations. This method was introduced for load testing&lt;/span&gt;
      &lt;span class="s"&gt;only and must never appear in production code. Its presence bypasses&lt;/span&gt;
      &lt;span class="s"&gt;the organisation's regulatory audit trail requirement and may&lt;/span&gt;
      &lt;span class="s"&gt;constitute a compliance violation.&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Misconfiguration&lt;/span&gt;
    &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-778&lt;/span&gt;
    &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A09:2021 - Security Logging and Monitoring Failures&lt;/span&gt;
    &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;java"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kotlin"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csharp"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Remove skipAudit() immediately. All database mutations must generate&lt;/span&gt;
      &lt;span class="s"&gt;audit events via AuditLogger. If performance is a concern, use&lt;/span&gt;
      &lt;span class="s"&gt;AuditLogger.asyncLog() instead, which queues events without blocking&lt;/span&gt;
      &lt;span class="s"&gt;the main thread. Contact the security team if an exemption is required.&lt;/span&gt;
    &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\.skipAudit\s*\('&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AuditLogger\s*\.\s*skip'&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what this rule does that a generic tool can't: it encodes your organisation's security policy directly into the scanner. The remediation text names the correct alternative (&lt;code&gt;asyncLog()&lt;/code&gt;). The description mentions the regulatory context. The severity is CRITICAL because in this fictional organisation, bypassing audit logging is a compliance issue, not just a best practice.&lt;/p&gt;

&lt;p&gt;This is the highest-value type of custom rule because it's completely unavailable from any third-party source.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Pattern Rules: Increasing Coverage Without Losing Precision
&lt;/h2&gt;

&lt;p&gt;One pattern rarely catches all instances of a vulnerability. The best rules use multiple patterns with appropriate confidence levels to maximise coverage while communicating certainty to the reviewer.&lt;/p&gt;

&lt;p&gt;Here's a well-structured multi-pattern rule for detecting hardcoded database credentials in connection strings — a pattern that appears differently across languages and frameworks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CUSTOM-DB-001&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hardcoded&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Database&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Credentials&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Connection&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;String"&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Database connection strings with embedded credentials expose sensitive&lt;/span&gt;
      &lt;span class="s"&gt;authentication material in source code, version control history, and&lt;/span&gt;
      &lt;span class="s"&gt;build artifacts.&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secrets&lt;/span&gt;
    &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-798&lt;/span&gt;
    &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A07:2021 - Identification and Authentication Failures&lt;/span&gt;
    &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;java"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csharp"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;javascript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;typescript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kotlin"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Move credentials to environment variables or a secrets manager such as&lt;/span&gt;
      &lt;span class="s"&gt;AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault. Never commit&lt;/span&gt;
      &lt;span class="s"&gt;credentials to version control.&lt;/span&gt;
    &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# JDBC connection strings&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jdbc:[a-z]+://[^/]+/[^?]+\?.*password=[^&amp;amp;\s"'&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]{3,}'&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
      &lt;span class="c1"&gt;# .NET connection strings&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Password\s*=\s*[^;"\s]{4,}\s*;'&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
      &lt;span class="c1"&gt;# Generic password assignment near connection context&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(conn|connection|db).*password\s*=\s*["\'&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;^"''&lt;/span&gt;&lt;span class="pi"&gt;]{&lt;/span&gt;&lt;span class="nv"&gt;4&lt;/span&gt;&lt;span class="pi"&gt;,}[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;']'&lt;/span&gt;
        &lt;span class="s"&gt;confidence:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MEDIUM&lt;/span&gt;
      &lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SQLAlchemy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Django&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;URLs&lt;/span&gt;
      &lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;regex:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'(postgresql|mysql|sqlite|mongodb)://&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;w+:[^@&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;s"&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="pi"&gt;]{&lt;/span&gt;&lt;span class="nv"&gt;4&lt;/span&gt;&lt;span class="pi"&gt;,}&lt;/span&gt;&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;
        &lt;span class="s"&gt;confidence:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each pattern carries its own confidence because each has a different false positive profile. JDBC connection strings with password parameters are nearly always real findings, so that pattern is HIGH. The generic &lt;code&gt;connection.password =&lt;/code&gt; pattern is MEDIUM because it can also match configuration code where the quoted value is only a placeholder that gets resolved from an environment variable at runtime.&lt;/p&gt;
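&lt;p&gt;To make the false positive profile concrete, here is a minimal sketch that runs the generic pattern against three made-up lines. It assumes the scanner applies each regex per line with Python's &lt;code&gt;re.search&lt;/code&gt;; the sample lines and variable names are hypothetical.&lt;/p&gt;

```python
import re

# Generic pattern from CUSTOM-DB-001, with quote escaping adapted for Python.
GENERIC = re.compile(r"""(conn|connection|db).*password\s*=\s*["'][^"']{4,}["']""")

hardcoded = 'db.password = "hunter2-prod"'
placeholder = 'connection.password = "${DB_PASSWORD}"'  # resolved from the environment at runtime
from_env = 'db.password = os.environ["DB_PASSWORD"]'

for line in (hardcoded, placeholder, from_env):
    # Prints whether the pattern fires, followed by the line it was tested on.
    print(bool(GENERIC.search(line)), line)
```

&lt;p&gt;The hardcoded value and the placeholder both match, while the direct environment lookup does not. The placeholder case is the kind of finding that a MEDIUM confidence asks the reviewer to check by hand.&lt;/p&gt;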




&lt;h2&gt;
  
  
  Testing Your Custom Rules
&lt;/h2&gt;

&lt;p&gt;Before you add a rule to your pipeline, test it against both positive and negative cases.&lt;/p&gt;

&lt;p&gt;Create a dedicated test file with clearly labelled sections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_custom_rules.py
&lt;/span&gt;
&lt;span class="c1"&gt;# --- SHOULD FIRE ---
# NODE-001: exec with template literal
&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`convert ${userInput} output.jpg`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# CUSTOM-DB-001: hardcoded JDBC credentials
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc:postgresql://localhost/mydb?user=admin&amp;amp;password=supersecret123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# --- SHOULD NOT FIRE ---
# Safe: spawn with args array
&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;convert&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output.jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Safe: password from environment
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc:postgresql://localhost/mydb?user=admin&amp;amp;password=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_PASS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the scanner and verify the output matches your expectations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py ./test_custom_rules.py &lt;span class="nt"&gt;--rules&lt;/span&gt; ./rules/custom/ &lt;span class="nt"&gt;--format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every &lt;code&gt;SHOULD FIRE&lt;/code&gt; comment corresponds to a finding in the output&lt;/li&gt;
&lt;li&gt;Every &lt;code&gt;SHOULD NOT FIRE&lt;/code&gt; comment has no corresponding finding&lt;/li&gt;
&lt;li&gt;The confidence and severity levels match what you intended&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a false positive appears, either tighten the regex or downgrade the confidence level. If a true positive is missed, your pattern isn't covering that form of the vulnerability.&lt;/p&gt;
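&lt;p&gt;The same check can be scripted for a single pattern before the rule goes anywhere near the scanner. This is a minimal sketch, assuming the pattern is applied per line with Python's &lt;code&gt;re.search&lt;/code&gt;; the &lt;code&gt;CASES&lt;/code&gt; list and the &lt;code&gt;mismatches&lt;/code&gt; helper are hypothetical names, not part of the tool.&lt;/p&gt;

```python
import re

# JDBC pattern from CUSTOM-DB-001. \x26 is the ampersand, written as a hex
# escape so the snippet is safe to paste into HTML or XML.
JDBC = re.compile(r"""jdbc:[a-z]+://[^/]+/[^?]+\?.*password=[^\x26\s"']{3,}""")

# (expected_to_fire, line) pairs, mirroring SHOULD FIRE / SHOULD NOT FIRE.
CASES = [
    (True, 'conn = "jdbc:postgresql://localhost/mydb?user=admin\x26password=supersecret123"'),
    (False, 'conn = "jdbc:postgresql://localhost/mydb?user=admin"'),
    (False, 'conn = f"jdbc:postgresql://localhost/mydb?password={os.getenv(\'DB_PASS\')}"'),
]

def mismatches(pattern, cases):
    """Return the lines where the pattern's behaviour differs from the expectation."""
    return [line for expected, line in cases
            if (pattern.search(line) is not None) != expected]

for line in mismatches(JDBC, CASES):
    print("unexpected result:", line)
```

&lt;p&gt;With these cases the interpolated f-string line is reported, because &lt;code&gt;{os.getenv(&lt;/code&gt; is enough to satisfy the password character class. A negative case for this rule is more reliable when the password stays out of the URL literal entirely, or the character class can be tightened to exclude &lt;code&gt;{&lt;/code&gt;.&lt;/p&gt;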




&lt;h2&gt;
  
  
  The Broader Point: Rules as Institutional Knowledge
&lt;/h2&gt;

&lt;p&gt;The most valuable thing about a YAML-driven rule engine isn't the rules it ships with. It's the rules your team writes over time.&lt;/p&gt;

&lt;p&gt;Every time a security engineer finds a vulnerability in a code review, there's a question worth asking: &lt;em&gt;could this have been caught by a scanner rule?&lt;/em&gt; If the answer is yes, write the rule. Now the scanner catches that pattern forever, across every future PR, without anyone needing to remember it.&lt;/p&gt;

&lt;p&gt;Rules become institutional knowledge. They encode the hard-won understanding of what goes wrong in your specific codebase, your specific frameworks, your specific compliance requirements. That's something no off-the-shelf tool can give you — and it compounds over time.&lt;/p&gt;




&lt;p&gt;The full scanner and core ruleset are at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;. Drop your custom rules in &lt;code&gt;rules/&lt;/code&gt; and they're picked up automatically on the next scan.&lt;/p&gt;

&lt;p&gt;Next up: embedding the scanner in a CI/CD pipeline with configurable severity thresholds — how to go from zero security gates to blocking builds on critical findings without breaking your team's deployment workflow.&lt;/p&gt;

</description>
      <category>security</category>
      <category>appsec</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How I Modelled the OWASP Top 10 Into a YAML Rule Engine</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Thu, 07 May 2026 18:35:39 +0000</pubDate>
      <link>https://dev.to/pgmpofu/how-i-modelled-the-owasp-top-10-into-a-yaml-rule-engine-2g48</link>
      <guid>https://dev.to/pgmpofu/how-i-modelled-the-owasp-top-10-into-a-yaml-rule-engine-2g48</guid>
      <description>&lt;p&gt;When I set out to write detection rules for my SAST tool, I didn't start with a list of regex patterns. I started with the OWASP Top 10.&lt;/p&gt;

&lt;p&gt;That might sound obvious, but it matters. The OWASP Top 10 is the closest thing the AppSec world has to a universal curriculum. Every security engineer speaks it. Every compliance framework references it. When I map my rules to OWASP categories, I'm not just organising them — I'm making them legible to the people who will ultimately use them.&lt;/p&gt;

&lt;p&gt;This article is about the thought process behind translating OWASP into a machine-readable rule engine. Not just &lt;em&gt;what&lt;/em&gt; rules I wrote, but &lt;em&gt;why&lt;/em&gt; I wrote them the way I did, and where the tricky ones gave me the most trouble.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rule Schema
&lt;/h2&gt;

&lt;p&gt;Every rule in the engine follows the same structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AUTHN-001&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JWT&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Algorithm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;None&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Attack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Vector"&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;"&lt;/span&gt;
    &lt;span class="s"&gt;The application accepts JWTs with algorithm set to 'none', allowing&lt;/span&gt;
    &lt;span class="s"&gt;attackers to forge tokens without a valid signature.&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
  &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Authentication&lt;/span&gt;
  &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-347&lt;/span&gt;
  &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A07:2021 - Identification and Authentication Failures&lt;/span&gt;
  &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;javascript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;java"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csharp"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Always explicitly specify and enforce the expected algorithm when&lt;/span&gt;
    &lt;span class="s"&gt;verifying JWTs. Never accept 'none' as a valid algorithm. Use an&lt;/span&gt;
    &lt;span class="s"&gt;allowlist of accepted algorithms.&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;algorithm[s]?\s*[=:]\s*["\'&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="s"&gt;none["\']'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;verify\s*=\s*False'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six things matter in this schema beyond the obvious metadata:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;CWE ID&lt;/strong&gt; — links to the Common Weakness Enumeration, which is the language of vulnerability databases and CVEs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OWASP category&lt;/strong&gt; — maps to a Top 10 entry using the 2021 version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages array&lt;/strong&gt; — controls which file types the pattern is applied to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple patterns&lt;/strong&gt; — a rule can have several patterns, each with its own confidence level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence&lt;/strong&gt; — HIGH means the pattern is very likely a real vulnerability; MEDIUM means it warrants manual review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remediation&lt;/strong&gt; — not just "this is bad" but "here's what to do instead"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last one is deliberate. A scanner that flags vulnerabilities without telling developers how to fix them creates noise, not security. Every rule in my tool includes actionable remediation guidance.&lt;/p&gt;
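&lt;p&gt;As a rough illustration of how this schema could be held in memory, here is a minimal sketch using dataclasses. The class names, the &lt;code&gt;scan_line&lt;/code&gt; method and the validation are assumptions made for the example, not the tool's actual internals.&lt;/p&gt;

```python
import re
from dataclasses import dataclass, field

CONFIDENCE_LEVELS = ("HIGH", "MEDIUM")  # the two levels used in the article's rules

@dataclass
class Pattern:
    regex: str
    confidence: str

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence}")
        self.compiled = re.compile(self.regex)  # fails fast on an invalid regex

@dataclass
class Rule:
    id: str
    title: str
    severity: str
    cwe: str
    owasp: str
    languages: list
    remediation: str
    patterns: list = field(default_factory=list)

    def scan_line(self, line):
        """Return (rule id, confidence) for every pattern that matches the line."""
        return [(self.id, p.confidence) for p in self.patterns if p.compiled.search(line)]

rule = Rule(
    id="AUTHN-001",
    title="JWT Algorithm None Attack Vector",
    severity="CRITICAL",
    cwe="CWE-347",
    owasp="A07:2021 - Identification and Authentication Failures",
    languages=["python"],
    remediation="Use an explicit allowlist of algorithms.",
    patterns=[Pattern(r"verify\s*=\s*False", "MEDIUM")],
)
print(rule.scan_line("jwt.decode(token, verify=False)"))
```

&lt;p&gt;Keeping the confidence on the pattern rather than on the rule is what lets one rule report the same weakness with different levels of certainty.&lt;/p&gt;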




&lt;h2&gt;
  
  
  How the 28 Rules Map to OWASP
&lt;/h2&gt;

&lt;p&gt;Here's the full picture before we go deep on individual rules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OWASP 2021 Category&lt;/th&gt;
&lt;th&gt;My Rules&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A01 — Broken Access Control&lt;/td&gt;
&lt;td&gt;AUTHN-005 (IDOR), MISC-001 (Path Traversal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A02 — Cryptographic Failures&lt;/td&gt;
&lt;td&gt;CRYPTO-001 through CRYPTO-006, SEC-003, SEC-004&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A03 — Injection&lt;/td&gt;
&lt;td&gt;INJ-001 through INJ-005, MISC-003 (XXE), MISC-006 (Deserialization)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A04 — Insecure Design&lt;/td&gt;
&lt;td&gt;MISC-004 (File Upload)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A05 — Security Misconfiguration&lt;/td&gt;
&lt;td&gt;MISC-002 (Debug Mode), MISC-003, MISC-005 (CORS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A07 — Auth &amp;amp; Identity Failures&lt;/td&gt;
&lt;td&gt;AUTHN-001 through AUTHN-005, SEC-001 through SEC-006&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A08 — Software &amp;amp; Data Integrity&lt;/td&gt;
&lt;td&gt;MISC-006 (Insecure Deserialization)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some OWASP categories are underrepresented — A06 (Vulnerable Components) is better handled by SCA tools like Snyk than a SAST scanner, and A09 (Logging Failures) and A10 (SSRF) would require data flow analysis that regex can't reliably deliver. I'll come back to this.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep Dive: The Rules That Required Real Thought
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AUTHN-001 — JWT Algorithm None Attack Vector
&lt;/h3&gt;

&lt;p&gt;This one is my favourite rule in the entire set, because it targets a specific, well-known attack that is both elegant and devastating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; The JWT specification allows the &lt;code&gt;alg&lt;/code&gt; header to be set to &lt;code&gt;"none"&lt;/code&gt;, which means "no signature required." Some libraries honour this. If an attacker intercepts a JWT, changes the payload (for example, escalating &lt;code&gt;"role": "user"&lt;/code&gt; to &lt;code&gt;"role": "admin"&lt;/code&gt;), sets &lt;code&gt;alg: none&lt;/code&gt;, and removes the signature, a vulnerable library will accept it as valid.&lt;/p&gt;

&lt;p&gt;This is CWE-347 — Improper Verification of Cryptographic Signature. It's not a cryptographic weakness in the algorithm — it's a logic flaw in how the algorithm is selected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The detection challenge:&lt;/strong&gt; The attack can be enabled in several ways. The most obvious is setting the algorithm explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;algorithms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But it can also be enabled by disabling verification entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Python jwt library
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or by using a wildcard algorithm list that implicitly includes &lt;code&gt;none&lt;/code&gt;. My rule covers the first two patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;algorithm[s]?\s*[=:]\s*["\'&lt;/span&gt;&lt;span class="err"&gt;]&lt;/span&gt;&lt;span class="s"&gt;none["\']'&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;verify\s*=\s*False'&lt;/span&gt;
    &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second pattern (&lt;code&gt;verify=False&lt;/code&gt;) is MEDIUM confidence rather than HIGH because disabling verification has legitimate uses in test environments. That's an important distinction — the same code can be correct or dangerous depending on context, and the confidence level communicates that to the developer reviewing the finding.&lt;/p&gt;
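&lt;p&gt;One thing worth checking with a rule like this is which call shapes the first pattern actually reaches. The sketch below is an illustration in Python, not part of the scanner: &lt;code&gt;WIDENED&lt;/code&gt; is a hypothetical variant that also tolerates a quoted dictionary key and a list bracket, and the sample lines are made up for the comparison.&lt;/p&gt;

```python
import re

# The rule's first pattern, with quote escaping adapted for Python.
AS_WRITTEN = re.compile(r"""algorithm[s]?\s*[=:]\s*["']none["']""")
# Hypothetical widened variant: optional closing quote on the key, optional list bracket.
WIDENED = re.compile(r"""algorithms?["']?\s*[=:]\s*\[?\s*["']none["']""")

SAMPLES = {
    "keyword": 'jwt.decode(token, algorithm="none")',
    "dict_list": 'jwt.decode(token, options={"algorithms": ["none"]})',
    "allowlist": 'jwt.decode(token, key, algorithms=["HS256"])',
}

def hits(pattern):
    """Return the names of the sample lines the pattern matches."""
    return sorted(name for name, line in SAMPLES.items() if pattern.search(line))

print("as written:", hits(AS_WRITTEN))
print("widened:   ", hits(WIDENED))
```

&lt;p&gt;Under Python's &lt;code&gt;re&lt;/code&gt; semantics the pattern as written matches only the keyword form, while the widened variant also reaches the dictionary-and-list form and still leaves the &lt;code&gt;HS256&lt;/code&gt; allowlist alone.&lt;/p&gt;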

&lt;p&gt;&lt;strong&gt;Remediation:&lt;/strong&gt; Always pass an explicit allowlist of algorithms when decoding JWTs and never include &lt;code&gt;none&lt;/code&gt;. In Python's PyJWT library, that looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;algorithms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HS256&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# explicit allowlist
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  MISC-006 — Insecure Deserialization
&lt;/h3&gt;

&lt;p&gt;This is the rule I found hardest to write well, because insecure deserialization is one of those vulnerability classes where the &lt;em&gt;presence&lt;/em&gt; of the function call isn't necessarily dangerous — it's the &lt;em&gt;source&lt;/em&gt; of the data being deserialized that makes it dangerous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; Deserializing untrusted data can lead to remote code execution. In Python, &lt;code&gt;pickle.loads()&lt;/code&gt; will execute arbitrary Python code embedded in the serialized payload. In Java, &lt;code&gt;ObjectInputStream.readObject()&lt;/code&gt; has been the source of countless critical CVEs. In PHP, &lt;code&gt;unserialize()&lt;/code&gt; is a classic RCE vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The detection challenge:&lt;/strong&gt; I can't tell from the call site alone whether the data being deserialized is trusted (coming from a file the application wrote itself) or untrusted (coming from a user-submitted HTTP body or a message queue). Both look identical to a regex scanner.&lt;/p&gt;

&lt;p&gt;My decision was to flag it at HIGH confidence with a remediation note that acknowledges the context-dependence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MISC-006&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Insecure Deserialization&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
  &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-502&lt;/span&gt;
  &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A08:2021 - Software and Data Integrity Failures&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pickle\.loads?\s*\('&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ObjectInputStream\s*\('&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unserialize\s*\('&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
  &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Avoid deserializing untrusted data. If deserialization is required,&lt;/span&gt;
    &lt;span class="s"&gt;use safer formats like JSON. If using pickle, only deserialize data&lt;/span&gt;
    &lt;span class="s"&gt;from trusted, integrity-verified sources. Consider signing serialized&lt;/span&gt;
    &lt;span class="s"&gt;payloads. For Java, use safer alternatives like Jackson or Gson for&lt;/span&gt;
    &lt;span class="s"&gt;JSON deserialization.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I gave &lt;code&gt;unserialize()&lt;/code&gt; in PHP a MEDIUM confidence rather than HIGH because PHP codebases legitimately use it in contexts where the data comes from internal sources. The confidence difference is a signal to the developer: &lt;em&gt;look harder at this one, but don't automatically treat it as a defect.&lt;/em&gt;&lt;/p&gt;
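&lt;p&gt;The remediation text mentions signing serialized payloads. A minimal sketch of that idea in Python, assuming an HMAC-SHA256 tag prepended to the pickle bytes, could look like this. The &lt;code&gt;SECRET&lt;/code&gt; value and both function names are placeholders for illustration.&lt;/p&gt;

```python
import hashlib
import hmac
import pickle

SECRET = b"replace-with-a-key-from-your-secrets-manager"  # placeholder value
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def dumps_signed(obj):
    """Serialize obj and prepend an HMAC-SHA256 tag computed over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return tag + payload

def loads_signed(blob):
    """Verify the tag before unpickling; refuse anything that fails the check."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch, refusing to deserialize")
    return pickle.loads(payload)

blob = dumps_signed({"user": "alice"})
print(loads_signed(blob))
```

&lt;p&gt;A call like this would still be flagged by the &lt;code&gt;pickle.loads&lt;/code&gt; pattern, which seems right: the reviewer confirms the integrity check is there and then accepts the finding.&lt;/p&gt;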




&lt;h3&gt;
  
  
  AUTHN-004 — Timing Attack in Auth Comparison
&lt;/h3&gt;

&lt;p&gt;This is the subtlest rule in the set, and the one most likely to generate confused questions from developers who haven't encountered it before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; When you compare two strings (say, a provided token against a stored token) using a standard equality operator (&lt;code&gt;==&lt;/code&gt;), most implementations short-circuit on the first mismatched character. This means a token that is wrong from the first character is rejected slightly faster than one that matches the first 30 characters, and that small difference leaks how much of the guess was correct.&lt;/p&gt;

&lt;p&gt;An attacker can exploit this by measuring response times to brute-force secrets character by character. It sounds theoretical. It isn't — it's been used in practice against authentication systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Use a constant-time comparison function. In Python, that's &lt;code&gt;hmac.compare_digest()&lt;/code&gt;. In Node.js, it's &lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt;.&lt;/p&gt;
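
&lt;p&gt;A minimal sketch of the Python side of that fix, assuming both tokens are text strings. The helper name &lt;code&gt;tokens_match&lt;/code&gt; and the sample token are illustrative and not taken from the tool.&lt;/p&gt;

```python
import hmac


def tokens_match(provided, stored):
    """Compare two secrets without revealing where they first differ."""
    # Encode to bytes so non-ASCII input is handled; compare_digest is
    # designed so its running time does not depend on the mismatch position.
    return hmac.compare_digest(provided.encode("utf-8"), stored.encode("utf-8"))


stored_token = "example-token-value"  # hypothetical value for illustration
print(tokens_match("example-token-value", stored_token))
print(tokens_match("example-token-valuX", stored_token))
```

&lt;p&gt;The call site looks the same as an &lt;code&gt;==&lt;/code&gt; check, which is why the rule can recommend it as a drop-in replacement.&lt;/p&gt;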

&lt;p&gt;&lt;strong&gt;The detection:&lt;/strong&gt; I look for direct string comparison in contexts that suggest authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AUTHN-004&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Timing Attack in Auth Comparison&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
  &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-208&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(token|secret|password|api_key)\s*==\s*'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;==\s*(token|secret|password|api_key)'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MEDIUM severity, MEDIUM confidence. The false positive rate here is real — lots of code compares passwords or tokens with &lt;code&gt;==&lt;/code&gt; in contexts where timing attacks are a genuine concern, but also in test code, logging, and input validation where they aren't. The finding is a prompt to review, not an automatic defect.&lt;/p&gt;




&lt;h3&gt;
  
  
  CRYPTO-005 — ECB Mode Encryption
&lt;/h3&gt;

&lt;p&gt;This rule catches one of the most common misuses of encryption that isn't immediately obvious to developers who aren't cryptographers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; AES-ECB (Electronic Codebook) mode encrypts each block of plaintext independently using the same key. This means identical plaintext blocks produce identical ciphertext blocks, which leaks structural information about the data even when it's "encrypted."&lt;/p&gt;

&lt;p&gt;The classic demonstration is encrypting a bitmap image with AES-ECB — the overall pattern of the image remains visible in the ciphertext because regions of the same colour encrypt to the same blocks. For structured data like JSON or database rows, the same leakage applies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The detection:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRYPTO-005&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ECB Mode Encryption&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
  &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-327&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AES\.MODE_ECB|Cipher\.getInstance\(["'']AES["'']|AES/ECB'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern catches Java's &lt;code&gt;Cipher.getInstance("AES")&lt;/code&gt; because, when you don't specify a mode, the default JCE provider resolves &lt;code&gt;"AES"&lt;/code&gt; to &lt;code&gt;AES/ECB/PKCS5Padding&lt;/code&gt;. This is a documentation trap that developers fall into all the time. They think they're using secure AES; they're actually using AES-ECB because they didn't know to request a mode explicitly, such as &lt;code&gt;AES/GCM/NoPadding&lt;/code&gt;.&lt;/p&gt;
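
&lt;p&gt;To see which lines the rule would and would not fire on, here is a small sketch that runs the same regex through Python's &lt;code&gt;re&lt;/code&gt; module. The sample lines are made up for illustration, and the loop is not the tool's actual scanning code.&lt;/p&gt;

```python
import re

# Pattern copied from the CRYPTO-005 rule, written as a Python raw string.
ECB_PATTERN = re.compile(r"""AES\.MODE_ECB|Cipher\.getInstance\(["']AES["']|AES/ECB""")

# Hypothetical source lines: three ECB uses and one explicit GCM request.
samples = [
    'cipher = AES.new(key, AES.MODE_ECB)',
    'Cipher c = Cipher.getInstance("AES");',
    'Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");',
    'Cipher c = Cipher.getInstance("AES/GCM/NoPadding");',
]

for line in samples:
    flagged = ECB_PATTERN.search(line) is not None
    print("FLAG" if flagged else "ok  ", line)
```

&lt;p&gt;The bare &lt;code&gt;"AES"&lt;/code&gt; case only matches because the pattern requires the closing quote directly after &lt;code&gt;AES&lt;/code&gt;, so an explicit GCM transformation is left alone.&lt;/p&gt;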

&lt;p&gt;&lt;strong&gt;Remediation:&lt;/strong&gt; Use AES-GCM for authenticated encryption (preferred) or AES-CBC with a random IV and separate HMAC for integrity verification.&lt;/p&gt;




&lt;h3&gt;
  
  
  MISC-005 — CORS Wildcard / Reflected Origin
&lt;/h3&gt;

&lt;p&gt;This rule sits at MEDIUM severity because CORS misconfiguration is context-dependent in a way that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vulnerability:&lt;/strong&gt; A wildcard CORS header (&lt;code&gt;Access-Control-Allow-Origin: *&lt;/code&gt;) lets any website read responses from your API, although browsers refuse to combine the wildcard with credentialed requests. A reflected origin header, where the server echoes back whatever &lt;code&gt;Origin&lt;/code&gt; header the request sent, is worse: it behaves like a wildcard but sidesteps that restriction, so when it is paired with &lt;code&gt;Access-Control-Allow-Credentials: true&lt;/code&gt; any site can make authenticated requests with the victim's cookies and read the responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The patterns:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MISC-005&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CORS Wildcard / Reflected Origin&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
  &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-942&lt;/span&gt;
  &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A05:2021 - Security Misconfiguration&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Access-Control-Allow-Origin[''"]?\s*[,:]\s*[''"]?\*'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;allow_origins\s*=\s*\[?\s*["'']\*["'']'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request\.headers\.get\(["'']Origin["'']\)'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEDIUM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The third pattern — looking for code that reads the &lt;code&gt;Origin&lt;/code&gt; header — is a signal that reflected origin might be happening, not a definitive finding. A developer reading the &lt;code&gt;Origin&lt;/code&gt; header might be implementing proper allowlist validation. MEDIUM confidence reflects that ambiguity.&lt;/p&gt;
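
&lt;p&gt;For contrast, this is roughly what a legitimate read of the &lt;code&gt;Origin&lt;/code&gt; header looks like: an exact-match allowlist check before anything is echoed back. It is a framework-free sketch; the origins and the &lt;code&gt;cors_headers&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}


def cors_headers(request_origin):
    """Return the CORS response headers for a request's Origin header value."""
    if request_origin in ALLOWED_ORIGINS:
        # Echo the origin only after an exact-match allowlist check.
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Access-Control-Allow-Credentials": "true",
            "Vary": "Origin",
        }
    # Unknown or missing origin: send no CORS headers, so the browser
    # refuses to expose the response to the calling page.
    return {}


print(cors_headers("https://app.example.com"))
print(cors_headers("https://evil.example"))
```

&lt;p&gt;Code like this would still trip the third pattern, which is the ambiguity the MEDIUM confidence is meant to signal.&lt;/p&gt;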




&lt;h2&gt;
  
  
  The Categories I Deliberately Left Out
&lt;/h2&gt;

&lt;p&gt;Being honest about gaps matters as much as documenting what you built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A06 — Vulnerable and Outdated Components&lt;/strong&gt; belongs to Software Composition Analysis (SCA), not SAST. SCA tools like Snyk and Dependabot check your dependency versions against CVE databases. A regex scanner can't do this — it would need to parse package manifests and cross-reference them against live vulnerability feeds. I deferred this entirely to dedicated SCA tooling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A09 — Security Logging and Monitoring Failures&lt;/strong&gt; requires understanding what &lt;em&gt;isn't&lt;/em&gt; in the code — which authentication events aren't being logged, which error handlers swallow exceptions silently. Pattern matching can only find things that are present in the text. Detecting absence requires semantic understanding the tool doesn't have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A10 — Server-Side Request Forgery (SSRF)&lt;/strong&gt; requires taint analysis. An SSRF vulnerability exists when user-controlled input reaches an HTTP request function without validation. That's exactly the kind of multi-step data flow that regex can't trace. I flagged this in the README as a known gap and a candidate for future AST-based analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Mapping to OWASP Gave Me
&lt;/h2&gt;

&lt;p&gt;Structuring the rules against OWASP rather than building them ad hoc gave me three things I didn't expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coverage gaps become visible.&lt;/strong&gt; When you're mapping rules to a framework, the categories with no rules stand out immediately. That's a forcing function for honesty about what your tool actually covers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The output speaks to security professionals.&lt;/strong&gt; When a finding says &lt;code&gt;A03:2021 - Injection&lt;/code&gt; and &lt;code&gt;CWE-89&lt;/code&gt;, a security engineer doesn't need to read the description to understand what they're looking at. The taxonomy does the communication work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's defensible.&lt;/strong&gt; If someone asks why I chose to flag MD5 usage, I can say: &lt;em&gt;because CWE-327 maps to A02:2021 - Cryptographic Failures, and OWASP identifies weak hashing as a top-tier risk category.&lt;/em&gt; That's not me making a judgment call — it's me implementing an industry-standard framework.&lt;/p&gt;

&lt;p&gt;Building your own tool is one of the fastest ways to understand why the standards are structured the way they are. You don't really understand OWASP until you've had to decide how to implement it.&lt;/p&gt;




&lt;p&gt;The full rule set is in the &lt;code&gt;rules/&lt;/code&gt; directory at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;. Each YAML file corresponds to a rule category, and every rule follows the schema described above.&lt;/p&gt;

&lt;p&gt;Next up: writing custom SAST rules for vulnerabilities your scanner doesn't cover — a practical tutorial using the YAML rule format to extend the tool for stack-specific patterns.&lt;/p&gt;

</description>
      <category>appsec</category>
      <category>security</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Why I Chose Regex Over AST Parsing in My SAST Tool (And When That Would Be Wrong)</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Mon, 04 May 2026 18:16:26 +0000</pubDate>
      <link>https://dev.to/pgmpofu/why-i-chose-regex-over-ast-parsing-in-my-sast-tool-and-when-that-would-be-wrong-md2</link>
      <guid>https://dev.to/pgmpofu/why-i-chose-regex-over-ast-parsing-in-my-sast-tool-and-when-that-would-be-wrong-md2</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/pgmpofu/i-built-a-sast-scanner-from-scratch-and-ran-it-against-4-famous-vulnerable-apps-heres-what-it-1ko"&gt;last article&lt;/a&gt;, I mentioned that my SAST tool uses regex-based pattern matching instead of AST parsing, and that this was a deliberate tradeoff. A few people asked me to go deeper on that decision — because on the surface, it sounds like I took a shortcut.&lt;/p&gt;

&lt;p&gt;I didn't. Or rather — I did, but it was an informed shortcut, and there's a meaningful difference.&lt;/p&gt;

&lt;p&gt;Let me explain what AST parsing actually is, why it's considered the "correct" approach, why I chose not to use it, and — most importantly — when that choice would be the wrong one.&lt;/p&gt;




&lt;h2&gt;
  
  
  First, What's the Difference?
&lt;/h2&gt;

&lt;p&gt;When your SAST tool scans a file, it needs to understand what the code is doing. There are two fundamentally different ways to approach this.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Regex Approach
&lt;/h3&gt;

&lt;p&gt;Regex treats source code as plain text and looks for patterns that &lt;em&gt;look like&lt;/em&gt; vulnerabilities. Here's a simplified version of what my SQL injection rule does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(execute|query|cursor)\s*\(\s*["\'].*\+.*["\']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern says: &lt;em&gt;find any call to &lt;code&gt;execute&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, or &lt;code&gt;cursor&lt;/code&gt; whose argument opens with a string literal and contains a &lt;code&gt;+&lt;/code&gt; followed by another quote on the same line.&lt;/em&gt; If it matches, flag it as a potential SQL injection.&lt;/p&gt;

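&lt;p&gt;It's fast, simple, and largely language-agnostic. The same pattern catches suspicious SQL construction in Python, Java, and JavaScript without modification; PHP concatenates strings with &lt;code&gt;.&lt;/code&gt; rather than &lt;code&gt;+&lt;/code&gt;, so it needs its own variant of the pattern.&lt;/p&gt;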
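
&lt;p&gt;A quick way to get a feel for this simplified pattern is to run it over a few lines with Python's &lt;code&gt;re&lt;/code&gt; module. The sample lines and the &lt;code&gt;is_flagged&lt;/code&gt; helper below are illustrative only, and the real rule in the tool may differ.&lt;/p&gt;

```python
import re

# The simplified SQL injection pattern from above, as a Python raw string.
SQLI_PATTERN = re.compile(r"""(execute|query|cursor)\s*\(\s*["'].*\+.*["']""")


def is_flagged(line):
    """Return True when the pattern matches anywhere in the line."""
    return SQLI_PATTERN.search(line) is not None


# Hypothetical lines from different languages.
samples = [
    'cursor.execute("SELECT * FROM users WHERE id = \'" + user_id + "\'")',
    'db.query("SELECT * FROM users WHERE id = " + id + " LIMIT 1");',
    'stmt.execute("SELECT * FROM users WHERE id = " + userId);',
    'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))',
]

for line in samples:
    print("FLAG" if is_flagged(line) else "ok  ", line)
```

&lt;p&gt;The third sample is worth noticing: it is concatenation, but no quote follows the &lt;code&gt;+&lt;/code&gt;, so this simplified version stays silent. Small details like that are where a text-only approach starts to show its limits.&lt;/p&gt;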

&lt;h3&gt;
  
  
  The AST Approach
&lt;/h3&gt;

&lt;p&gt;AST stands for Abstract Syntax Tree. When a compiler or interpreter reads your code, it doesn't see text — it parses the text into a structured tree that represents the &lt;em&gt;meaning&lt;/em&gt; of the code.&lt;/p&gt;

&lt;p&gt;Take this Python snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE id = &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An AST parser doesn't just see the word &lt;code&gt;execute&lt;/code&gt; followed by some text. It understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt; is a variable assigned from &lt;code&gt;request.args&lt;/code&gt; — a known source of user-controlled input&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;query&lt;/code&gt; is a string built by concatenating that variable — which is a taint propagation step&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cursor.execute(query)&lt;/code&gt; is a database call receiving that tainted string, which is a sink&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is &lt;strong&gt;taint analysis&lt;/strong&gt;: tracking the flow of untrusted data from a source to a dangerous sink. It's the gold standard of SAST analysis because it understands context, not just surface patterns.&lt;/p&gt;
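
&lt;p&gt;The three steps above can be sketched with Python's built-in &lt;code&gt;ast&lt;/code&gt; module. This is a deliberately naive illustration, not how a production analyser works: it assumes &lt;code&gt;request.args&lt;/code&gt; is the only source and &lt;code&gt;.execute()&lt;/code&gt; the only sink, looks only at top-level statements, and knows nothing about sanitisers. All helper names are hypothetical.&lt;/p&gt;

```python
import ast

SOURCE = '''
user_id = request.args.get("id")
query = "SELECT * FROM users WHERE id = " + user_id
cursor.execute(query)
'''


def names_in(node):
    """Collect every variable name referenced inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def is_source(node):
    """Treat any expression that mentions request.args as user-controlled."""
    return any(
        isinstance(n, ast.Attribute)
        and n.attr == "args"
        and isinstance(n.value, ast.Name)
        and n.value.id == "request"
        for n in ast.walk(node)
    )


def find_tainted_sinks(source):
    """Return line numbers of execute() calls that receive tainted data."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            # Propagation: the target becomes tainted when the right-hand
            # side is a source or mentions an already-tainted variable.
            if is_source(stmt.value) or tainted.intersection(names_in(stmt.value)):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            is_execute = isinstance(call.func, ast.Attribute) and call.func.attr == "execute"
            # Sink: an execute() call with a tainted variable in its arguments.
            if is_execute and any(tainted.intersection(names_in(a)) for a in call.args):
                findings.append(stmt.lineno)
    return findings


print(find_tainted_sinks(SOURCE))
```

&lt;p&gt;Even this toy version needs to know Python's node types by name, which hints at the per-language cost of the AST route.&lt;/p&gt;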




&lt;h2&gt;
  
  
  What Regex Gets Wrong
&lt;/h2&gt;

&lt;p&gt;Let me show you a concrete example of where regex fails.&lt;/p&gt;

&lt;h3&gt;
  
  
  False Positive: The Innocuous MD5
&lt;/h3&gt;

&lt;p&gt;My rule &lt;code&gt;CRYPTO-001&lt;/code&gt; flags any use of MD5 as a potential weak hashing vulnerability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRYPTO-001&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Weak Hashing — MD5&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\bmd5\s*\('&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will correctly flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hashed_password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_password&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# BAD — MD5 for passwords
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But it will also flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;file_checksum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_contents&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# FINE — MD5 for file integrity
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An AST-based tool with data flow analysis could potentially distinguish between these cases by understanding what type of data is being hashed. A regex tool cannot. It sees &lt;code&gt;md5(&lt;/code&gt; and fires regardless.&lt;/p&gt;

&lt;h3&gt;
  
  
  False Negative: The Indirect Injection
&lt;/h3&gt;

&lt;p&gt;Regex also misses vulnerabilities that span multiple lines or involve intermediate variables. Consider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getParameter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buildQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;           &lt;span class="c1"&gt;// vulnerability travels through this function&lt;/span&gt;
&lt;span class="nc"&gt;Statement&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createStatement&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;                         &lt;span class="c1"&gt;// regex might not flag this&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My regex looks for string concatenation &lt;em&gt;at the point of execution&lt;/em&gt;. If the tainted input is assembled in a helper function and passed in as a completed string, the regex never fires. The vulnerability is invisible to pattern matching.&lt;/p&gt;

&lt;p&gt;An AST tool with interprocedural taint analysis would follow the data through &lt;code&gt;buildQuery()&lt;/code&gt; and flag the eventual &lt;code&gt;execute()&lt;/code&gt; call correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  So Why Did I Choose Regex Anyway?
&lt;/h2&gt;

&lt;p&gt;Three reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Language-Agnostic by Design
&lt;/h3&gt;

&lt;p&gt;AST parsing is inherently language-specific. Every language has its own grammar and its own parser. Python's AST looks nothing like Java's. Kotlin's is different again. JavaScript has multiple competing parsers with different behaviours across versions.&lt;/p&gt;

&lt;p&gt;To support AST-based analysis across 12 languages — Python, Java, Kotlin, JavaScript, TypeScript, C#, Go, Ruby, PHP, Shell, YAML, Terraform — I'd need 12 separate parsing libraries, each with their own dependencies, version constraints, and maintenance requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;tree-sitter&lt;/strong&gt; comes closest to solving this problem. It's a parser generator that provides a unified API across dozens of languages, and it powers features such as GitHub's code navigation. But even with tree-sitter, you still need to write language-specific query logic to express what you're looking for in each language's AST structure.&lt;/p&gt;

&lt;p&gt;Regex patterns, by contrast, can be written once and applied across any language where the vulnerable pattern looks similar in text form. Hardcoded AWS access keys follow the same format everywhere. JWT secrets look the same in any language. That's genuine value that regex delivers cheaply.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Vulnerability Surface I'm Targeting
&lt;/h3&gt;

&lt;p&gt;Not all vulnerability classes require deep analysis. Some are genuinely well-served by pattern matching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets detection&lt;/strong&gt; — hardcoded API keys, passwords, connection strings, private key material — is almost entirely a pattern matching problem. The secret has to appear literally in the source code for it to be a finding. Regex is exactly the right tool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SEC-001&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Hardcoded AWS Access Key&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AKIA[0-9A-Z]{16}'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern will catch a hardcoded AWS key in any language, in any file, instantly. AST analysis adds nothing here.&lt;/p&gt;
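
&lt;p&gt;As a small illustration of that language independence, the sketch below applies the SEC-001 regex to lines from three different file types. The file names are hypothetical, and the key is the placeholder value AWS uses in its own documentation.&lt;/p&gt;

```python
import re

# Pattern copied from the SEC-001 rule.
AWS_KEY_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")

# Hypothetical file contents, one line each.
samples = {
    "config.py": 'AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"',
    "Main.java": 'String key = "AKIAIOSFODNN7EXAMPLE";',
    "deploy.sh": "export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "notes.txt": "AKIA is the prefix used by long-term access keys",
}

# Keep only the files where the pattern matches.
flagged_files = [name for name, line in samples.items() if AWS_KEY_PATTERN.search(line)]
print(flagged_files)
```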

&lt;p&gt;&lt;strong&gt;Misconfiguration detection&lt;/strong&gt; — debug mode enabled, CORS wildcards, insecure session settings — is similarly pattern-oriented. These are usually single-line declarations that look the same regardless of context.&lt;/p&gt;

&lt;p&gt;The injection and authentication categories are where regex struggles most. But even there, high-confidence patterns — direct string concatenation in SQL calls, &lt;code&gt;algorithm: "none"&lt;/code&gt; in JWT configurations — catch a meaningful portion of real vulnerabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Pragmatic Scope
&lt;/h3&gt;

&lt;p&gt;I built this tool to learn application security deeply, not to compete with Checkmarx. Scope matters. A tool that actually ships with 28 working rules across 6 categories is more valuable than a tool that was going to have perfect taint analysis but never got finished.&lt;/p&gt;

&lt;p&gt;The regex approach let me build a complete, functional, deployable tool. That's not nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  When This Choice Would Be Wrong
&lt;/h2&gt;

&lt;p&gt;I want to be direct about the scenarios where choosing regex would be the wrong call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're scanning a single language at scale&lt;/strong&gt;, the language-agnostic argument evaporates. If you're only scanning Java — which is common in enterprise AppSec programmes — you should be using a Java AST parser or a tool like SpotBugs or SonarQube that understands Java's type system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you need to catch data flow vulnerabilities reliably&lt;/strong&gt;, regex will miss too much. Injection vulnerabilities that travel through multiple functions, variables, or modules require taint analysis. The indirect injection example I showed earlier is not an edge case — it's the norm in real codebases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're running this in a high-security environment where false negatives are more dangerous than false positives&lt;/strong&gt;, the calculus changes. A false negative means a real vulnerability gets missed. In a financial services or healthcare context, that might be unacceptable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're trying to replace a commercial SAST tool&lt;/strong&gt;, you need AST analysis. There's no way around it. Tools like Semgrep (which uses a hybrid AST/pattern approach), Checkmarx, and Veracode achieve their accuracy because they understand code structure. Pattern matching is a starting point, not a destination.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hybrid Path Forward
&lt;/h2&gt;

&lt;p&gt;The most pragmatic production approach is a hybrid — which is exactly what Semgrep does.&lt;/p&gt;

&lt;p&gt;Semgrep's rule syntax looks like pattern matching but operates on the AST. When you write a Semgrep rule that matches &lt;code&gt;cursor.execute($X + $Y)&lt;/code&gt;, Semgrep isn't doing string matching. It's matching against the AST, which means it correctly handles whitespace, string formatting variations, and code structure in ways that regex cannot.&lt;/p&gt;

&lt;p&gt;For my tool, the natural evolution would be to keep the YAML rule engine and regex patterns as the default layer, but add an optional tree-sitter AST pass for languages where it's available. The two approaches aren't mutually exclusive — they're complementary. Regex for speed and coverage, AST for accuracy on the highest-risk patterns.&lt;/p&gt;
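
&lt;p&gt;A rough sketch of what that layering could look like for Python sources, using the standard-library &lt;code&gt;ast&lt;/code&gt; module as a stand-in for tree-sitter. The function names and sample source are hypothetical, and the second pass only checks whether the first argument of a flagged call is really built with &lt;code&gt;+&lt;/code&gt;.&lt;/p&gt;

```python
import ast
import re

SQLI_PATTERN = re.compile(r"""(execute|query|cursor)\s*\(\s*["'].*\+.*["']""")


def regex_layer(source):
    """Fast first pass: return the 1-based line numbers the regex flags."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SQLI_PATTERN.search(line)
    ]


def ast_layer(source, candidate_lines):
    """Second pass: keep only calls whose first argument is a + expression."""
    confirmed = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and node.lineno in candidate_lines
            and node.args
            and isinstance(node.args[0], ast.BinOp)
            and isinstance(node.args[0].op, ast.Add)
        ):
            confirmed.append(node.lineno)
    return sorted(confirmed)


# Line 1 concatenates user input; line 2 only has a + inside the SQL text.
SOURCE = '''cursor.execute("SELECT * FROM t WHERE id = " + user_id + " LIMIT 1")
cursor.execute("SELECT 1 + 1 AS total, 'x' AS label")
'''

candidates = regex_layer(SOURCE)
print(candidates)
print(ast_layer(SOURCE, candidates))
```

&lt;p&gt;The regex flags both lines, and the AST pass drops the second because its argument is a single string constant. That is the kind of false positive a second layer is meant to remove.&lt;/p&gt;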

&lt;p&gt;That's the architecture note I left in the README:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;For a production tool, layering in tree-sitter AST analysis per language would reduce false positives.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's not hedging. That's honest engineering — knowing where your current approach has limits and documenting the path to improving it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Practical Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're building a SAST tool or evaluating one, here's how to think about the regex vs AST question:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Right Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multi-language scanning, broad coverage&lt;/td&gt;
&lt;td&gt;Regex or hybrid (Semgrep-style)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single language, high accuracy&lt;/td&gt;
&lt;td&gt;AST-based analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets detection&lt;/td&gt;
&lt;td&gt;Regex — it's optimal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taint/data flow analysis&lt;/td&gt;
&lt;td&gt;AST — regex can't do this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD gate with low false positive tolerance&lt;/td&gt;
&lt;td&gt;AST or hybrid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning how SAST works&lt;/td&gt;
&lt;td&gt;Build both and compare&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "correct" approach depends entirely on your threat model, your team's language footprint, and how much false positive noise your developers will tolerate before they disable the scanner entirely.&lt;/p&gt;

&lt;p&gt;A scanner that developers trust and actually use is more valuable than a theoretically perfect scanner that gets switched off after the first sprint.&lt;/p&gt;




&lt;p&gt;The full source code, including all YAML rules, is at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next up: how I modelled the OWASP Top 10 into a YAML rule engine — and the thought process behind some of the trickier rules like JWT algorithm confusion and insecure deserialization.&lt;/p&gt;

</description>
      <category>security</category>
      <category>regex</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Built a SAST Scanner From Scratch — Here's Every Design Decision I Made</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Sun, 03 May 2026 13:18:49 +0000</pubDate>
      <link>https://dev.to/pgmpofu/i-built-a-sast-scanner-from-scratch-heres-every-design-decision-i-made-5454</link>
      <guid>https://dev.to/pgmpofu/i-built-a-sast-scanner-from-scratch-heres-every-design-decision-i-made-5454</guid>
      <description>&lt;p&gt;When most developers want to scan their code for security vulnerabilities, they install Semgrep or Snyk and call it a day. I did the opposite. I built one from scratch.&lt;/p&gt;

&lt;p&gt;Not because the existing tools are bad — they're excellent. But because I'm transitioning from 13 years of software engineering into application security, and I wanted to understand what a SAST tool actually &lt;em&gt;is&lt;/em&gt; underneath the hood. What decisions go into building one? What tradeoffs do you make? What does "language-agnostic" really mean when you have to implement it yourself?&lt;/p&gt;

&lt;p&gt;This is the story of those decisions. Some were obvious. Some I got wrong the first time. All of them taught me something.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Set Out to Build
&lt;/h2&gt;

&lt;p&gt;The goal was a tool that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan source code across &lt;strong&gt;any language&lt;/strong&gt; without needing language-specific parsers&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;rule engine&lt;/strong&gt; that non-engineers could extend without touching code&lt;/li&gt;
&lt;li&gt;Produce output in &lt;strong&gt;three formats&lt;/strong&gt; — terminal, JSON, and HTML — so it could fit into both human workflows and CI/CD pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail builds&lt;/strong&gt; when findings exceeded a configurable severity threshold&lt;/li&gt;
&lt;li&gt;Handle &lt;strong&gt;false positive suppression&lt;/strong&gt; with inline annotations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not a toy. That's a real tool with real requirements. So let's talk about how I built it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision 1: Regex Over AST (And Why I'd Make the Same Choice Again)
&lt;/h2&gt;

&lt;p&gt;This was the most consequential decision in the whole project, and I want to be honest about the tradeoffs.&lt;/p&gt;

&lt;p&gt;A proper SAST tool ideally parses code into an &lt;strong&gt;Abstract Syntax Tree (AST)&lt;/strong&gt; — a structured representation of the code's meaning, not just its text. AST-based analysis can understand context. It knows that &lt;code&gt;password&lt;/code&gt; on line 42 is a variable assignment, not a string literal. It can trace data flow. It can detect that user input on line 10 reaches an unparameterised SQL query on line 87 without being sanitised in between.&lt;/p&gt;

&lt;p&gt;Regex can't do any of that. Regex sees text.&lt;/p&gt;

&lt;p&gt;So why did I choose regex?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because AST parsing is language-specific by definition.&lt;/strong&gt; Every language has its own grammar, its own AST format, its own parsing library. Java's AST looks nothing like Python's. Kotlin's is different again. If I wanted to support 12+ languages — Python, Java, Kotlin, JavaScript, TypeScript, C#, Go, Ruby, PHP, Shell, YAML, Terraform — I'd need 12+ separate AST parsers, each with its own dependencies, its own quirks, its own maintenance burden.&lt;/p&gt;

&lt;p&gt;Regex patterns, by contrast, can be written to match suspicious code constructs across any language where those constructs look similar in text form. SQL injection via string concatenation looks recognisably similar in Java, Python, and PHP. Hardcoded AWS access keys follow the same pattern everywhere. MD5 usage reads roughly the same in most languages.&lt;/p&gt;

&lt;p&gt;The tradeoff is accuracy. Regex-based SAST has higher false positive rates than AST-based analysis because it can't understand context. It sees &lt;code&gt;md5(&lt;/code&gt; and flags it regardless of whether it's being used for a password hash or a file integrity check.&lt;/p&gt;

&lt;p&gt;My answer to this was &lt;strong&gt;confidence scoring on rules&lt;/strong&gt; and &lt;strong&gt;inline suppression annotations&lt;/strong&gt;. Rules can declare their confidence level (&lt;code&gt;HIGH&lt;/code&gt;, &lt;code&gt;MEDIUM&lt;/code&gt;, &lt;code&gt;LOW&lt;/code&gt;), and developers can annotate lines with &lt;code&gt;# sast-ignore&lt;/code&gt; or &lt;code&gt;# nosec&lt;/code&gt; to suppress false positives with a documented reason. That's not perfect, but it's pragmatic — and it mirrors how production tools like Bandit handle the same problem.&lt;/p&gt;
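&lt;p&gt;To make the suppression idea concrete, here is a minimal Python sketch of a line-based matcher that skips annotated lines and carries a confidence label on each finding. The &lt;code&gt;Rule&lt;/code&gt; class, the &lt;code&gt;scan_lines&lt;/code&gt; function, and the &lt;code&gt;CRYPTO-001&lt;/code&gt; rule are hypothetical names chosen for illustration, not the scanner's actual code.&lt;/p&gt;

```python
import re
from dataclasses import dataclass

# Markers that suppress a finding on the line they appear on
# (the two annotations named in the article).
SUPPRESS = re.compile(r"#\s*(sast-ignore|nosec)\b")


@dataclass(frozen=True)
class Rule:
    rule_id: str      # hypothetical identifier
    pattern: str      # regex applied to each line
    confidence: str   # HIGH, MEDIUM or LOW


def scan_lines(lines, rules):
    """Yield (line_number, rule_id, confidence) for unsuppressed matches."""
    compiled = [(rule, re.compile(rule.pattern)) for rule in rules]
    for number, line in enumerate(lines, start=1):
        if SUPPRESS.search(line):
            continue  # the developer has accepted this line
        for rule, regex in compiled:
            if regex.search(line):
                yield number, rule.rule_id, rule.confidence


weak_hash = Rule("CRYPTO-001", r"\bmd5\s*\(", "MEDIUM")
source = [
    "digest = md5(password)",
    "checksum = md5(blob)  # sast-ignore: file integrity only",
]
findings = list(scan_lines(source, [weak_hash]))
```

&lt;p&gt;Only the first line is reported here. The second is skipped because of its annotation, and the trailing text after the marker is where the documented reason would go.&lt;/p&gt;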

&lt;p&gt;If I were building a production-grade commercial tool, I'd layer in AST analysis per language using something like &lt;code&gt;tree-sitter&lt;/code&gt;, which provides a unified API across dozens of languages. But for a portfolio project built to understand the domain? Regex got me 80% of the value at 20% of the complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision 2: YAML-Driven Rules (The Best Decision I Made)
&lt;/h2&gt;

&lt;p&gt;Every detection rule in the scanner is defined in a YAML file. Not in code. Here's what one looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;INJ-001&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SQL Injection — String Concatenation&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;User-controlled input is concatenated directly into a SQL query,&lt;/span&gt;
    &lt;span class="s"&gt;bypassing parameterisation and enabling SQL injection attacks.&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
  &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;INJECTION&lt;/span&gt;
  &lt;span class="na"&gt;cwe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CWE-89&lt;/span&gt;
  &lt;span class="na"&gt;owasp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A03:2021 - Injection&lt;/span&gt;
  &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;java"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;javascript"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;php"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csharp"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;remediation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Use parameterised queries or prepared statements. Never concatenate&lt;/span&gt;
    &lt;span class="s"&gt;user input directly into SQL strings.&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'(execute|query|cursor)\s*\(\s*["''].*\+.*["'']'&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why is this the best decision I made?&lt;/p&gt;

&lt;p&gt;Because it &lt;strong&gt;separates the detection logic from the engine&lt;/strong&gt;. The scanner engine — the part that reads files, applies patterns, generates findings, produces reports — never needs to change when you add a new vulnerability category. You just write a new YAML file and drop it in the &lt;code&gt;rules/&lt;/code&gt; directory. The engine discovers it automatically on startup.&lt;/p&gt;

&lt;p&gt;This is exactly how production tools like Semgrep and Nuclei work. Rules are data. The engine is infrastructure. Keeping them separate means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security teams can contribute new detections without needing to understand Python&lt;/li&gt;
&lt;li&gt;Rules are version-controlled and diff-able like any other file&lt;/li&gt;
&lt;li&gt;Rules can be reviewed in pull requests by people who've never written a line of the engine&lt;/li&gt;
&lt;li&gt;Custom organisational rules can be maintained separately from the core ruleset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ended up with five rule files covering 28 rules across six categories: Injection, Secrets, Cryptography, Authentication, Misconfiguration, and Path Traversal. Every rule maps to a CWE identifier and an OWASP Top 10 category. That structure matters because it's the language that security professionals and auditors actually speak.&lt;/p&gt;
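&lt;p&gt;The engine side of that separation can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the project's real loader: &lt;code&gt;discover_rule_files&lt;/code&gt; and &lt;code&gt;compile_rules&lt;/code&gt; are hypothetical names, and the dict literal stands in for what a YAML parser would return for the INJ-001 rule shown earlier, so the example runs without a YAML library.&lt;/p&gt;

```python
import re
from pathlib import Path


def discover_rule_files(rules_dir):
    """Return every YAML rule file in the directory, in a stable order."""
    return sorted(Path(rules_dir).glob("*.yaml"))


def compile_rules(parsed_rules):
    """Turn parsed rule dicts into (rule_id, severity, regex) tuples."""
    compiled = []
    for rule in parsed_rules:
        for entry in rule["patterns"]:
            compiled.append(
                (rule["id"], rule["severity"], re.compile(entry["regex"]))
            )
    return compiled


# Shape of one rule after a YAML parser has loaded it (values from INJ-001).
parsed = [
    {
        "id": "INJ-001",
        "severity": "CRITICAL",
        "patterns": [
            {
                "regex": r"""(execute|query|cursor)\s*\(\s*["'].*\+.*["']""",
                "confidence": "HIGH",
            }
        ],
    }
]

engine_rules = compile_rules(parsed)
line = 'cursor.execute("SELECT * FROM users WHERE id = " + user_id + "")'
hits = [rule_id for rule_id, _, regex in engine_rules if regex.search(line)]
```

&lt;p&gt;Nothing in &lt;code&gt;compile_rules&lt;/code&gt; knows about SQL injection. Adding a new detection means adding another dict, which in practice means another YAML entry.&lt;/p&gt;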




&lt;h2&gt;
  
  
  Decision 3: Three Output Formats From Day One
&lt;/h2&gt;

&lt;p&gt;I could have built a tool that prints to terminal and called it done. Instead I built three output formats simultaneously: rich terminal output, JSON, and HTML.&lt;/p&gt;

&lt;p&gt;This wasn't vanity. Each format serves a completely different consumer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terminal output&lt;/strong&gt; is for developers running scans locally during development. It needs to be immediately readable, colour-coded by severity, and show exactly the file and line number of each finding. I used Python's &lt;code&gt;rich&lt;/code&gt; library for this, which gives you nice bordered panels with colour-coded severity labels without writing a lot of custom formatting code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON output&lt;/strong&gt; is for machines. CI/CD pipelines, SIEM systems, dashboards, and any downstream tooling that needs to process findings programmatically. The JSON schema includes everything: finding ID, title, severity, category, CWE, OWASP reference, file path, line number, matched content, and remediation guidance. That's a schema a security team could ingest into Splunk or Elastic without modification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTML output&lt;/strong&gt; is for stakeholders. Developers understand terminal output. Product managers and engineering leads don't. The HTML report is a self-contained file — no server required, just open it in a browser — with severity filtering and full remediation guidance. You generate it, you email it, anyone can read it.&lt;/p&gt;

&lt;p&gt;The design principle here is that a security tool's effectiveness is limited by how well its output reaches the people who need to act on it. Building three output formats from the start wasn't over-engineering — it was thinking about the full workflow.&lt;/p&gt;
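&lt;p&gt;As a rough picture of the JSON output described above, the following Python sketch serialises one finding with the fields listed in that paragraph. The field names, the &lt;code&gt;Finding&lt;/code&gt; class, and the sample values (including the file path) are illustrative assumptions, not the tool's published schema.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    # Field names are illustrative; they follow the list in the article.
    rule_id: str
    title: str
    severity: str
    category: str
    cwe: str
    owasp: str
    file_path: str
    line_number: int
    matched_content: str
    remediation: str


def to_json(findings):
    """Serialise findings into a single JSON document for downstream tools."""
    payload = {
        "total": len(findings),
        "findings": [asdict(finding) for finding in findings],
    }
    return json.dumps(payload, indent=2)


example = Finding(
    rule_id="INJ-001",
    title="SQL Injection (String Concatenation)",
    severity="CRITICAL",
    category="INJECTION",
    cwe="CWE-89",
    owasp="A03:2021 - Injection",
    file_path="app/db.py",
    line_number=87,
    matched_content='cursor.execute("SELECT ... " + name)',
    remediation="Use parameterised queries or prepared statements.",
)
report = to_json([example])
```

&lt;p&gt;Keeping every finding as a flat object with stable keys is what lets a pipeline or dashboard filter on severity or CWE without knowing anything about the scanner.&lt;/p&gt;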




&lt;h2&gt;
  
  
  Decision 4: CI/CD Exit Codes and Configurable Severity Thresholds
&lt;/h2&gt;

&lt;p&gt;This is where the tool goes from "interesting project" to "actually useful in production."&lt;/p&gt;

&lt;p&gt;The scanner exits with code &lt;code&gt;1&lt;/code&gt; when findings meet or exceed a configurable severity threshold. In practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fail the build on any HIGH or CRITICAL finding&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SAST Scan&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;docker run --rm -v ${{ github.workspace }}:/src \&lt;/span&gt;
      &lt;span class="s"&gt;sast-tool /src \&lt;/span&gt;
      &lt;span class="s"&gt;--fail-on HIGH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Audit mode — run without failing the build&lt;/span&gt;
python main.py ./src &lt;span class="nt"&gt;--fail-on&lt;/span&gt; none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--fail-on&lt;/code&gt; flag is the key design decision here. It lets teams adopt the tool incrementally:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start in audit mode (&lt;code&gt;--fail-on none&lt;/code&gt;). Get a baseline of your existing findings. Don't break anything.&lt;/li&gt;
&lt;li&gt;Tighten to &lt;code&gt;--fail-on CRITICAL&lt;/code&gt;. Only the most severe issues block releases.&lt;/li&gt;
&lt;li&gt;Over time, tighten to &lt;code&gt;--fail-on HIGH&lt;/code&gt; as the codebase gets cleaned up.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This reflects something I learned from running Snyk against a production Node.js codebase: you can't go from zero security gates to blocking every high-severity finding overnight. The build will fail constantly and engineers will start disabling the check to ship. Incremental adoption with configurable thresholds is how security tooling actually gets embedded into teams.&lt;/p&gt;
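&lt;p&gt;The threshold logic behind &lt;code&gt;--fail-on&lt;/code&gt; is small enough to sketch in Python. This is one plausible way to implement it, assuming four severity levels ordered from LOW to CRITICAL; the &lt;code&gt;exit_code&lt;/code&gt; function and &lt;code&gt;SEVERITY_ORDER&lt;/code&gt; list are hypothetical names.&lt;/p&gt;

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def exit_code(finding_severities, fail_on):
    """Return 1 if any finding meets or exceeds the threshold, else 0."""
    if fail_on.lower() == "none":
        return 0  # audit mode: report but never fail the build
    threshold = SEVERITY_ORDER.index(fail_on.upper())
    for severity in finding_severities:
        if SEVERITY_ORDER.index(severity.upper()) >= threshold:
            return 1
    return 0


found = ["MEDIUM", "HIGH"]
audit = exit_code(found, "none")        # step 1: baseline only
lenient = exit_code(found, "CRITICAL")  # step 2: only CRITICAL blocks
strict = exit_code(found, "HIGH")       # step 3: HIGH and above block
```

&lt;p&gt;A command-line entry point would pass the result to &lt;code&gt;sys.exit()&lt;/code&gt;, which is all a CI runner needs in order to mark the step as failed.&lt;/p&gt;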




&lt;h2&gt;
  
  
  Decision 5: Docker-First Distribution
&lt;/h2&gt;

&lt;p&gt;The scanner ships as a Docker image. Local Python installation is an option, but Docker is the primary recommended path.&lt;/p&gt;

&lt;p&gt;Why? &lt;strong&gt;Zero dependency hell.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Python dependency management is a known pain point. Different teams run different Python versions, and a &lt;code&gt;pip install&lt;/code&gt; that works on one machine can behave differently on another. A tool that fails to install never gets used.&lt;/p&gt;

&lt;p&gt;Docker eliminates this. One command, any machine with Docker installed, consistent results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:/src sast-tool /src
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI/CD integration, which is where this tool matters most, Docker is even more natural. GitHub Actions, GitLab CI, and Jenkins can all run steps in containers, so a Docker-first tool drops into most pipelines with little extra configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd add tree-sitter for at least two or three languages.&lt;/strong&gt; Python and JavaScript are well-supported, and adding AST-based passes for those two languages would dramatically reduce false positives on the most commonly scanned codebases. The regex engine would remain the fallback for everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd add a findings baseline.&lt;/strong&gt; The first time you scan a legacy codebase, you might get 200 findings. That's not useful — it's noise. A baseline file that records the current state of findings and only alerts on &lt;em&gt;new&lt;/em&gt; ones since the last scan is critical for real-world adoption. Snyk does this. I didn't build it.&lt;/p&gt;
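&lt;p&gt;A baseline like this could be built on finding fingerprints. The Python sketch below is one possible design, not something the tool implements: it hashes the rule ID, file path, and matched text (deliberately leaving out the line number, so unrelated edits that shift lines do not resurrect old findings). The function names and sample findings are hypothetical.&lt;/p&gt;

```python
import hashlib


def fingerprint(rule_id, file_path, matched_content):
    """Stable ID for a finding that survives line-number shifts."""
    raw = "|".join([rule_id, file_path, matched_content.strip()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def new_findings(current, baseline_fingerprints):
    """Keep only findings whose fingerprint is not in the baseline."""
    return [
        finding
        for finding in current
        if fingerprint(*finding) not in baseline_fingerprints
    ]


# Findings as (rule_id, file_path, matched_content) tuples.
legacy = ("CRYPTO-001", "legacy/auth.py", "md5(password)")
fresh = ("INJ-001", "api/users.py", 'execute("SELECT " + name)')

baseline = {fingerprint(*legacy)}  # recorded on the first scan
reported = new_findings([legacy, fresh], baseline)
```

&lt;p&gt;The trade-off in this design is that two identical matches in the same file share a fingerprint, so a second copy of an already-baselined line would not be reported as new.&lt;/p&gt;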

&lt;p&gt;&lt;strong&gt;I'd invest more in the HTML report.&lt;/strong&gt; The current version is functional but basic. A proper interactive report with trend data across multiple scans, drill-down into individual findings, and a remediation progress tracker would make it genuinely compelling for security leadership conversations.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Building This Taught Me
&lt;/h2&gt;

&lt;p&gt;Understanding how SAST tools work made me a better consumer of them. When I run Snyk or Semgrep now, I have a much clearer mental model of what's happening under the hood, why certain findings are false positives, and what "confidence level" actually means.&lt;/p&gt;

&lt;p&gt;The design decisions in a security tool aren't just engineering decisions — they're security decisions. Choosing regex over AST isn't just a technical tradeoff; it's a decision about your false positive rate, which is a decision about how much friction you introduce into developer workflows, which determines whether the tool actually gets used.&lt;/p&gt;

&lt;p&gt;Building something from scratch is still one of the fastest ways to understand a domain deeply.&lt;/p&gt;




&lt;p&gt;The full source code is on GitHub at &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;. The rules are all in &lt;code&gt;rules/&lt;/code&gt;, the engine is in &lt;code&gt;sast/&lt;/code&gt;, and there's a &lt;code&gt;vulnerable_sample.py&lt;/code&gt; you can use to test it immediately.&lt;/p&gt;

&lt;p&gt;If you want to see it in action against real vulnerable applications, I wrote about that in &lt;a href="https://dev.to/pgmpofu/i-built-a-sast-scanner-from-scratch-and-ran-it-against-4-famous-vulnerable-apps-heres-what-it-1ko"&gt;my previous article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next up: the regex vs AST debate in depth — when pattern matching is good enough, and when it'll get you into trouble.&lt;/p&gt;

</description>
      <category>security</category>
      <category>design</category>
      <category>python</category>
      <category>owasp</category>
    </item>
    <item>
      <title>I Built a SAST Scanner from Scratch and Ran It Against 4 Famous Vulnerable Apps — Here's What It Found</title>
      <dc:creator>Patience Mpofu</dc:creator>
      <pubDate>Wed, 04 Mar 2026 04:47:10 +0000</pubDate>
      <link>https://dev.to/pgmpofu/i-built-a-sast-scanner-from-scratch-and-ran-it-against-4-famous-vulnerable-apps-heres-what-it-1ko</link>
      <guid>https://dev.to/pgmpofu/i-built-a-sast-scanner-from-scratch-and-ran-it-against-4-famous-vulnerable-apps-heres-what-it-1ko</guid>
      <description>&lt;p&gt;Static Application Security Testing (SAST) tools are a staple of any mature AppSec programme. Tools like Semgrep, Bandit, and SonarQube are used daily by security engineers to catch vulnerabilities before code ships to production. But how do they actually work under the hood?&lt;/p&gt;

&lt;p&gt;As part of my transition from 13 years of software engineering into application security, I built my own SAST scanner from scratch in Python and ran it against four of the most well-known intentionally vulnerable applications in the OWASP ecosystem. This post covers what I built, how I tested it, and what the results tell us about real-world vulnerability patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The tool is a language-agnostic, regex-based static analysis scanner with a YAML-driven rule engine. The core design decisions were:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML rules over hardcoded logic.&lt;/strong&gt; Every detection is a YAML file — no code changes required to add new vulnerability patterns. This mirrors how production tools like Semgrep work, and means a security team could extend the ruleset without touching the engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language-agnostic by design.&lt;/strong&gt; Rather than building AST parsers per language (which is how deeper tools work), the scanner uses regex patterns that fire across any file type. This trades some precision for breadth — it can scan Java, TypeScript, PHP, Python, Kotlin, Go, and more in a single pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three output modes.&lt;/strong&gt; Terminal output for quick scans, JSON for CI/CD pipeline integration (with configurable exit codes to fail builds), and an HTML report for sharing findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker-first.&lt;/strong&gt; The entire tool runs as a container — no local Python environment needed. Mount your codebase, get a report.&lt;/p&gt;

&lt;p&gt;The scanner covers 27 rules across five categories mapped to CWE identifiers and OWASP Top 10:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Injection&lt;/strong&gt; (INJ): SQL injection, command injection, XSS, SSTI, LDAP injection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets&lt;/strong&gt; (SEC): AWS keys, hardcoded passwords, private keys, JWT secrets, database connection strings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cryptography&lt;/strong&gt; (CRYPTO): MD5/SHA-1 usage, insecure random, ECB mode, disabled TLS verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt; (AUTHN): JWT algorithm confusion, insecure session cookies, timing attacks, IDOR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Misconfiguration&lt;/strong&gt; (MISC): Path traversal, debug mode, XXE, unrestricted file upload, CORS wildcards, insecure deserialization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full source is on GitHub: &lt;a href="https://github.com/pgmpofu/sast-tool" rel="noopener noreferrer"&gt;github.com/pgmpofu/sast-tool&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Test Targets
&lt;/h2&gt;

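&lt;p&gt;I chose four well-known intentionally vulnerable applications. WebGoat, Juice Shop, and NodeGoat are OWASP projects; DVWA is maintained independently. Together they span Java, TypeScript, PHP, and JavaScript, and each targets a different slice of the vulnerability landscape:&lt;/p&gt;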

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;App&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Files Scanned&lt;/th&gt;
&lt;th&gt;Scan Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WebGoat&lt;/td&gt;
&lt;td&gt;Java&lt;/td&gt;
&lt;td&gt;562&lt;/td&gt;
&lt;td&gt;6.70s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OWASP Juice Shop&lt;/td&gt;
&lt;td&gt;TypeScript/Node.js&lt;/td&gt;
&lt;td&gt;805&lt;/td&gt;
&lt;td&gt;12.62s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;PHP&lt;/td&gt;
&lt;td&gt;189&lt;/td&gt;
&lt;td&gt;1.17s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NodeGoat&lt;/td&gt;
&lt;td&gt;JavaScript/Node.js&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;1.49s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Results at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;App&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;th&gt;High&lt;/th&gt;
&lt;th&gt;Medium&lt;/th&gt;
&lt;th&gt;Low&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WebGoat&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Juice Shop&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;67&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NodeGoat&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;34&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;145&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;182&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;182 findings across four codebases in under 23 seconds of total scan time.&lt;/p&gt;




&lt;h2&gt;
  
  
  WebGoat — 92 Findings (31 Critical)
&lt;/h2&gt;

&lt;p&gt;WebGoat is a Spring Boot Java application built by OWASP specifically to teach developers about security vulnerabilities. It had the highest finding count by far, which makes sense — it's a structured learning platform with one lesson per vulnerability type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Injection was the dominant finding.&lt;/strong&gt; The scanner flagged string concatenation in SQL queries across multiple lesson files. A representative example from &lt;code&gt;SqlInjectionLesson6a.java&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SELECT * FROM user_data WHERE last_name = '"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;accountName&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"'"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is textbook SQLi — user input concatenated directly into a query string with no parameterization. The fix is straightforward: use &lt;code&gt;PreparedStatement&lt;/code&gt; with &lt;code&gt;?&lt;/code&gt; placeholders. What's notable here is that the same pattern appeared in 8 separate lesson files, each demonstrating a different variant of the attack (blind injection, union-based, error-based).&lt;/p&gt;
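&lt;p&gt;The difference between concatenation and placeholders is easy to see in a runnable form. The sketch below uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module rather than Java's &lt;code&gt;PreparedStatement&lt;/code&gt;, purely so it runs as a single file; the table contents and the payload are made up for the demonstration.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_data (last_name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO user_data VALUES (?, ?)",
    [("Smith", "s1"), ("Jones", "s2")],
)

account_name = "x' OR '1'='1"  # classic injection payload

# Vulnerable: the payload becomes part of the SQL text.
unsafe_sql = "SELECT * FROM user_data WHERE last_name = '" + account_name + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safe: the driver binds the value, so it is only ever compared as data.
safe_sql = "SELECT * FROM user_data WHERE last_name = ?"
bound = conn.execute(safe_sql, (account_name,)).fetchall()
```

&lt;p&gt;The concatenated query returns every row in the table, while the parameterised one returns nothing because no user has that literal last name.&lt;/p&gt;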

&lt;p&gt;&lt;strong&gt;Insecure deserialization showed up as Critical.&lt;/strong&gt; In &lt;code&gt;InsecureDeserializationTask.java&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ObjectInputStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ByteArrayInputStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Base64&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getDecoder&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;decode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b64token&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Java's &lt;code&gt;ObjectInputStream&lt;/code&gt; is one of the most dangerous APIs in the language. Deserializing untrusted data with it can lead to remote code execution — this class of vulnerability was at the heart of the Apache Commons Collections exploit chain that affected thousands of Java applications. The scanner correctly flagged both the task file and the &lt;code&gt;SerializationHelper&lt;/code&gt; utility class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Command injection via &lt;code&gt;Runtime.exec()&lt;/code&gt;&lt;/strong&gt; was caught in &lt;code&gt;VulnerableTaskHolder.java&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Process&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;taskAction&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



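&lt;p&gt;Passing unsanitized user input to &lt;code&gt;exec()&lt;/code&gt; is the Java counterpart of passing it to &lt;code&gt;os.system()&lt;/code&gt; in Python. &lt;code&gt;Runtime.exec()&lt;/code&gt; does not start a shell on its own, but an attacker who controls &lt;code&gt;taskAction&lt;/code&gt; still decides which program runs and with which arguments, which amounts to arbitrary command execution on the server.&lt;/p&gt;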

&lt;p&gt;&lt;strong&gt;Private key material hardcoded in source.&lt;/strong&gt; The scanner found PEM key headers in &lt;code&gt;CryptoUtil.java&lt;/code&gt; — a reminder that even educational codebases can demonstrate the exact mistakes they're teaching against.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coverage verdict: Partial.&lt;/strong&gt; WebGoat covers lessons across the full OWASP Top 10. The scanner did well on the categories it has rules for — SQL injection, deserialization, command injection, and cryptographic failures. However, several vulnerability classes present in WebGoat went undetected:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vulnerability&lt;/th&gt;
&lt;th&gt;In WebGoat&lt;/th&gt;
&lt;th&gt;Scanner Detected&lt;/th&gt;
&lt;th&gt;Gap Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Regex matched string concatenation patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Runtime.exec()&lt;/code&gt; pattern matched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Insecure Deserialization&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ObjectInputStream&lt;/code&gt; pattern matched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardcoded Private Key&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;PEM header pattern matched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XXE (XML External Entity)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Scanner has XXE rule but WebGoat's parser config is in XML files, not Java — rule needs expanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Path Traversal&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;WebGoat's path traversal uses Spring's &lt;code&gt;@RequestParam&lt;/code&gt; binding, not direct &lt;code&gt;open()&lt;/code&gt; calls — SAST missed it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broken Access Control / IDOR&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Requires understanding of authorization logic — not detectable by regex alone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CSRF&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;No CSRF token rule exists in current ruleset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JWT Vulnerabilities&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;JWTHeaderKIDEndpoint&lt;/code&gt; SQLi was caught; algorithm confusion attack was not&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Insecure HTTP Communication&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Runtime/config issue, not detectable in source code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  OWASP Juice Shop — 67 Findings (3 Critical)
&lt;/h2&gt;

&lt;p&gt;Juice Shop is a modern Node.js/TypeScript e-commerce application and one of the most widely used security training platforms. At 805 files it was the largest codebase of the four, and the scan took 12.6 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The most interesting Critical finding was a private RSA key hardcoded in &lt;code&gt;lib/insecurity.ts&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-----BEGIN RSA PRIVATE KEY-----
MIICXAIBAAKBgQDNwqLEe9wgTXCbC7+RPdDbBbeqjdbs4kOPOIGzqLpXvJXlxxW8...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is intentional in Juice Shop's case: the key is used to sign JWTs as part of a challenge. But in a real application, committing an RSA private key to a public repository means anyone who has cloned the repo can sign their own tokens, and the application will accept those forged JWTs as genuine. This is exactly the kind of finding that would be Severity 1 in a real penetration test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL injection appeared in the challenge fix files&lt;/strong&gt;, specifically in &lt;code&gt;codefixes/dbSchemaChallenge_1.ts&lt;/code&gt;. This is a meta-finding: Juice Shop includes intentionally wrong "fix" options as part of its challenge mechanic, and the scanner correctly identified that one of the proposed fixes still contained vulnerable code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sequelize&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT * FROM Products WHERE ((name LIKE '%&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;criteria&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;%'...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
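
&lt;p&gt;The concatenation above is the shape a regex rule keys on: a string literal that opens with a SQL verb and is joined to a variable with &lt;code&gt;+&lt;/code&gt;. A minimal sketch, assuming a hypothetical rule named &lt;code&gt;SQL_CONCAT&lt;/code&gt; (not the scanner's actual pattern):&lt;/p&gt;

```javascript
// Hypothetical rule: a quoted string that starts with a SQL verb, closes with
// the same quote character, and is immediately concatenated with an identifier.
const SQL_CONCAT = /(["'`])\s*(?:SELECT|INSERT|UPDATE|DELETE)\b(?:(?!\1).)*\1\s*\+\s*[A-Za-z_$]/i;

function looksLikeSqlConcat(line) {
  return SQL_CONCAT.test(line);
}

// The flagged line from the challenge file (truncated as in the report).
const vulnerable = `models.sequelize.query("SELECT * FROM Products WHERE ((name LIKE '%" + criteria + "%'...")`;
// A parameterised query using Sequelize's `replacements` option.
const parameterised = `models.sequelize.query("SELECT * FROM Products WHERE name LIKE :q", { replacements: { q: criteria } })`;

console.log(looksLikeSqlConcat(vulnerable));    // true
console.log(looksLikeSqlConcat(parameterised)); // false
```

&lt;p&gt;The rule only sees one line at a time, so it cannot tell whether &lt;code&gt;criteria&lt;/code&gt; is user-controlled. It surfaces a candidate for review, nothing more.&lt;/p&gt;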



&lt;p&gt;&lt;strong&gt;The bulk of findings (63 HIGH) were insecure random number generation&lt;/strong&gt; — &lt;code&gt;Math.random()&lt;/code&gt; used throughout data generation scripts. These are mostly false positives in context (seeding test data doesn't require cryptographic randomness), which is an important lesson about SAST tuning in practice. A production deployment of this tool would add &lt;code&gt;# sast-ignore&lt;/code&gt; annotations or exclude data generation files from scans. SAST tools always require triage — raw finding counts are never the whole story.&lt;/p&gt;
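
&lt;p&gt;As a rough sketch of how inline suppression could work, the filter below drops any finding whose flagged line carries a &lt;code&gt;sast-ignore&lt;/code&gt; marker. The finding shape and the matching rule (marker anywhere on the same line) are assumptions for illustration, not the tool's documented behaviour.&lt;/p&gt;

```javascript
// Assumption: a finding is suppressed when the flagged line carries a
// "sast-ignore" marker in a trailing comment. The finding shape is hypothetical.
const SUPPRESS_MARKER = "sast-ignore";

function applySuppressions(findings, sourceLines) {
  return findings.filter((finding) => {
    const text = sourceLines[finding.line - 1] || "";
    return !text.includes(SUPPRESS_MARKER);
  });
}

const sourceLines = [
  "const qty = Math.floor(Math.random() * 40) + 1; // sast-ignore: test data only",
  "const token = Math.random().toString(36);",
];
const findings = [
  { rule: "insecure-random", line: 1 },
  { rule: "insecure-random", line: 2 },
];

console.log(applySuppressions(findings, sourceLines));
// [ { rule: 'insecure-random', line: 2 } ]
```

&lt;p&gt;Requiring a reason after the marker, as in the first sample line, keeps the triage decision visible to the next reviewer.&lt;/p&gt;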

&lt;p&gt;&lt;strong&gt;Coverage verdict: Low — but not because the scanner is weak.&lt;/strong&gt; Juice Shop encompasses vulnerabilities from the entire OWASP Top Ten and beyond, including categories that are fundamentally undetectable by static analysis:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vulnerability&lt;/th&gt;
&lt;th&gt;In Juice Shop&lt;/th&gt;
&lt;th&gt;Detected&lt;/th&gt;
&lt;th&gt;Gap Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;String concatenation in challenge files caught&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardcoded Private Key&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;PEM header in &lt;code&gt;insecurity.ts&lt;/code&gt; caught&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Insecure Random&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (mostly FP)&lt;/td&gt;
&lt;td&gt;Correct pattern, wrong context — test data seeding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broken Access Control / IDOR&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Requires understanding which basket belongs to which user — dataflow, not regex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broken Authentication&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Logic flaw that only shows up in runtime behaviour; no single source pattern to match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XSS (DOM-based)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;innerHTML&lt;/code&gt; rule fires on some cases; Angular template bindings missed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security Misconfiguration&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Express security headers are a runtime/config concern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NoSQL Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;No NoSQL injection rules in current ruleset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prototype Pollution&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Requires AST-level analysis of object property assignments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSRF&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Requires taint tracking from user input to HTTP call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most important gap here is the entire class of &lt;strong&gt;logic vulnerabilities&lt;/strong&gt; — IDOR, broken auth flows, business logic abuse. These are the vulnerabilities that cause real breaches and they are essentially invisible to regex-based static analysis. They require DAST (dynamic testing against a running application) or manual review.&lt;/p&gt;




&lt;h2&gt;
  
  
  DVWA — 14 Findings (0 Critical, 12 High)
&lt;/h2&gt;

&lt;p&gt;DVWA (Damn Vulnerable Web Application) is a classic PHP application. Its smaller codebase (189 files) and simpler architecture produced a tighter, more focused set of findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XSS via &lt;code&gt;innerHTML&lt;/code&gt; assignment&lt;/strong&gt; was the most frequent pattern, appearing 5 times in &lt;code&gt;vulnerabilities/authbypass/authbypass.js&lt;/code&gt;. Each instance directly assigned user-controlled data to &lt;code&gt;innerHTML&lt;/code&gt; — the canonical XSS pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;cell0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;input type="hidden" ...&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An attacker who can influence &lt;code&gt;user['user_id']&lt;/code&gt; can inject arbitrary HTML and JavaScript. The fix is to use &lt;code&gt;textContent&lt;/code&gt; for plain text or sanitize with DOMPurify before any HTML rendering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CORS wildcard&lt;/strong&gt; was flagged twice in the API vulnerability module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nb"&gt;header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Access-Control-Allow-Origin: *"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



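&lt;p&gt;A wildcard CORS policy lets a script on any website read the API's responses to cross-origin requests. Browsers refuse to attach credentials when the allowed origin is &lt;code&gt;*&lt;/code&gt;, so the main exposure is endpoints that return sensitive data without cookie-based authentication, such as services that assume they are only reachable from an internal network.&lt;/p&gt;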
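
&lt;p&gt;One common alternative to the wildcard is to echo back only origins from an explicit allowlist. The sketch below shows the idea in JavaScript rather than DVWA's PHP; the origins and the &lt;code&gt;corsHeadersFor&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical allowlist; a real deployment would load this from configuration.
const ALLOWED_ORIGINS = new Set(["https://app.example.com", "https://admin.example.com"]);

// Returns the CORS headers to send for a given request Origin header.
function corsHeadersFor(origin) {
  if (!ALLOWED_ORIGINS.has(origin)) {
    return {}; // unknown origin: send no CORS headers, so the browser blocks the read
  }
  return {
    "Access-Control-Allow-Origin": origin,
    "Vary": "Origin", // caches must not reuse a response across origins
  };
}

console.log(corsHeadersFor("https://app.example.com"));
console.log(corsHeadersFor("https://evil.example.net")); // {}
```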

&lt;p&gt;&lt;strong&gt;One genuinely interesting false positive:&lt;/strong&gt; the scanner flagged &lt;code&gt;$password = "password";&lt;/code&gt; in &lt;code&gt;vulnerabilities/sqli/test.php&lt;/code&gt; as a hardcoded credential. It is technically a hardcoded password — but it's a test fixture, not an application secret. This illustrates a core challenge in SAST: the tool can't understand intent, only pattern. A mature AppSec workflow would suppress this with an inline comment and document the reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coverage verdict: Moderate.&lt;/strong&gt; DVWA has 14 named vulnerability categories. The scanner caught XSS and CORS-related issues, but missed the majority:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vulnerability&lt;/th&gt;
&lt;th&gt;In DVWA&lt;/th&gt;
&lt;th&gt;Detected&lt;/th&gt;
&lt;th&gt;Gap Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;XSS (DOM, Reflected, Stored)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;innerHTML&lt;/code&gt; caught in JS files; PHP &lt;code&gt;echo&lt;/code&gt; XSS patterns not in ruleset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CORS Wildcard&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Wildcard header pattern matched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;DVWA's SQLi is in PHP — our PHP SQL injection pattern (&lt;code&gt;$_GET&lt;/code&gt;, &lt;code&gt;$_POST&lt;/code&gt; concatenation) missing from ruleset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;DVWA uses &lt;code&gt;shell_exec()&lt;/code&gt; and &lt;code&gt;system()&lt;/code&gt; in PHP — no PHP command injection rule exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CSRF&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;No CSRF token validation rule exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File Inclusion (LFI/RFI)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;PHP &lt;code&gt;include($_GET['page'])&lt;/code&gt; pattern not in ruleset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File Upload&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;move_uploaded_file()&lt;/code&gt; without MIME validation — no PHP upload rule&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brute Force / Weak Session IDs&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Runtime behaviour, not detectable statically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blind SQL Injection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Same gap as SQL Injection: PHP SQL patterns missing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CSP Bypass&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Runtime/header concern&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The DVWA results reveal a specific gap: &lt;strong&gt;the current ruleset underserves PHP&lt;/strong&gt;. Most of DVWA's vulnerabilities are in PHP server-side code using functions like &lt;code&gt;shell_exec()&lt;/code&gt;, &lt;code&gt;include()&lt;/code&gt;, &lt;code&gt;mysql_query()&lt;/code&gt;, and &lt;code&gt;move_uploaded_file()&lt;/code&gt;. Adding PHP-specific rules for these functions would significantly improve coverage.&lt;/p&gt;




&lt;h2&gt;
  
  
  NodeGoat — 9 Findings (0 Critical, 9 High)
&lt;/h2&gt;

&lt;p&gt;NodeGoat is a smaller Node.js application (34 files) that maps directly to the OWASP Top 10. Its low finding count reflects its size rather than its security posture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All 8 code findings were &lt;code&gt;Math.random()&lt;/code&gt; usage&lt;/strong&gt; in financial contexts — generating stock quantities and fund amounts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;funds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a real financial application, predictable random number generation for account values would be a significant finding. &lt;code&gt;Math.random()&lt;/code&gt; is not cryptographically secure: an attacker who observes enough outputs can reconstruct the generator's internal state and predict later values. Here it's test data seeding, but the scanner correctly flags the pattern.&lt;/p&gt;
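
&lt;p&gt;Where predictability does matter, Node's built-in &lt;code&gt;crypto&lt;/code&gt; module offers a near drop-in replacement. A minimal sketch, reusing the variable names and the 1 to 40 range from the snippet above:&lt;/p&gt;

```javascript
const { randomInt } = require("node:crypto");

// crypto.randomInt(min, max) draws from a CSPRNG; min is inclusive, max is exclusive.
// The range 1..40 mirrors Math.floor(Math.random() * 40 + 1).
const stocks = randomInt(1, 41);
const funds = randomInt(1, 41);

console.log({ stocks, funds });
```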

&lt;p&gt;&lt;strong&gt;The ninth finding was particularly interesting&lt;/strong&gt; — it came from &lt;code&gt;package-lock.json&lt;/code&gt;, where a deprecated dependency's own warning message mentioned &lt;code&gt;Math.random()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"deprecated": "Please upgrade to version 7 or higher. Older versions may use Math.random()..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scanner flagged the deprecation warning itself as a finding. This is a genuine false positive — useful as a demonstration that SAST tools scan all files indiscriminately unless you configure exclusions. In practice, &lt;code&gt;package-lock.json&lt;/code&gt; and similar lockfiles should be excluded from scans.&lt;/p&gt;
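
&lt;p&gt;A file-level exclusion check is simple to sketch. The default lists and the &lt;code&gt;shouldScan&lt;/code&gt; helper below are hypothetical, as are the example paths; the point is only that lockfiles are skipped by name and dependency folders by directory.&lt;/p&gt;

```javascript
// Hypothetical defaults: lockfiles and dependency folders rarely contain first-party code.
const EXCLUDED_FILES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml"]);
const EXCLUDED_DIRS = new Set(["node_modules", "build", "dist"]);

// Decides whether a forward-slash-separated relative path should be scanned.
function shouldScan(filePath) {
  const parts = filePath.split("/");
  const fileName = parts[parts.length - 1];
  const dirs = parts.slice(0, -1);
  if (EXCLUDED_FILES.has(fileName)) {
    return false;
  }
  return !dirs.some((dir) => EXCLUDED_DIRS.has(dir));
}

const candidates = [
  "package-lock.json",
  "app/routes/profile.js",
  "node_modules/lodash/index.js",
];

console.log(candidates.filter(shouldScan)); // [ 'app/routes/profile.js' ]
```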




&lt;h2&gt;
  
  
  What This Tells Us About SAST in Practice
&lt;/h2&gt;

&lt;p&gt;Running the tool across four codebases surfaced some clear patterns worth highlighting for any engineer new to application security:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Injection vulnerabilities are everywhere, even in educational apps.&lt;/strong&gt; SQL injection via string concatenation is one of the oldest vulnerabilities in existence — it was in the OWASP Top 10 when it was first published in 2003 and it's still there today. The fact that it appears across Java, PHP, and TypeScript codebases in the same scan demonstrates that it's a language-agnostic problem rooted in developer habit, not language design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives are inevitable and manageable.&lt;/strong&gt; Of the 182 findings, a meaningful portion are context-dependent: &lt;code&gt;Math.random()&lt;/code&gt; seeding test data, hardcoded values in test fixtures, private keys that are intentionally public for training purposes. A SAST tool's job is to surface candidates for human review, not to replace it. The value is in the signal-to-noise ratio and the speed — 182 candidates across 1,590 files in 23 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SAST is most powerful as a prevention tool, not a detection tool.&lt;/strong&gt; Every finding in these codebases was already known — they're intentionally vulnerable. The real value of SAST is catching these patterns before they reach a code review, not after they've been running in production. Embedding this kind of scanner as a pre-commit hook or CI/CD gate means developers get feedback at the moment they write the code.&lt;/p&gt;
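
&lt;p&gt;A CI gate usually comes down to turning findings into an exit code. The sketch below assumes a hypothetical finding shape and severity scale; the scanner's real report format may differ.&lt;/p&gt;

```javascript
// Hypothetical finding shape: { rule, severity, file, line }.
const SEVERITY_RANK = { LOW: 1, MEDIUM: 2, HIGH: 3, CRITICAL: 4 };

// Returns a process exit code: 1 if any finding meets the threshold, else 0.
function gateExitCode(findings, threshold) {
  const limit = SEVERITY_RANK[threshold];
  const blocking = findings.filter((f) => SEVERITY_RANK[f.severity] >= limit);
  return blocking.length > 0 ? 1 : 0;
}

const findings = [
  { rule: "insecure-random", severity: "HIGH", file: "data/seed.js", line: 12 },
  { rule: "debug-logging", severity: "LOW", file: "server.js", line: 40 },
];

console.log(gateExitCode(findings, "CRITICAL")); // 0
console.log(gateExitCode(findings, "HIGH"));     // 1
```

&lt;p&gt;In a pipeline, the result would be assigned to &lt;code&gt;process.exitCode&lt;/code&gt; so that a non-zero value fails the build. Starting with a high threshold and lowering it as the backlog is triaged avoids blocking every commit on day one.&lt;/p&gt;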

&lt;p&gt;&lt;strong&gt;Regex-based scanning has limits.&lt;/strong&gt; Several real vulnerabilities in these apps weren't caught: NoSQL injection in NodeGoat's MongoDB queries, prototype pollution patterns, and some of the more subtle authentication bypass issues in DVWA. These require either AST-level analysis or semantic understanding of data flow that regex alone can't provide. Taint-tracking engines such as Semgrep's address this, and adopting that approach is the natural next step for this scanner.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Would Full Coverage Require?
&lt;/h2&gt;

&lt;p&gt;Across all four apps, a pattern emerges around three distinct gaps. Each requires a different technical approach to close.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Missing Rules (Fixable Now)
&lt;/h3&gt;

&lt;p&gt;The simplest gaps — vulnerabilities the scanner &lt;em&gt;could&lt;/em&gt; detect with regex but currently has no rule for. These are straightforward YAML additions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Missing Rule&lt;/th&gt;
&lt;th&gt;Target Apps&lt;/th&gt;
&lt;th&gt;Example Pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PHP SQL Injection&lt;/td&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mysql_query("SELECT * FROM users WHERE id=" . $_GET['id'])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PHP Command Injection&lt;/td&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;shell_exec($_GET['cmd'])&lt;/code&gt; or &lt;code&gt;system($_POST['input'])&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PHP File Inclusion&lt;/td&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;&lt;code&gt;include($_GET['page'])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PHP File Upload&lt;/td&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;move_uploaded_file()&lt;/code&gt; without MIME validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NoSQL Injection&lt;/td&gt;
&lt;td&gt;NodeGoat, Juice Shop&lt;/td&gt;
&lt;td&gt;&lt;code&gt;db.collection.find({$where: req.query.input})&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CSRF Token Absence&lt;/td&gt;
&lt;td&gt;DVWA, WebGoat&lt;/td&gt;
&lt;td&gt;State-changing forms/endpoints without token validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Redirect&lt;/td&gt;
&lt;td&gt;DVWA, Juice Shop&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;res.redirect(req.query.url)&lt;/code&gt; without validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Injection (PHP)&lt;/td&gt;
&lt;td&gt;DVWA&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;eval()&lt;/code&gt; or &lt;code&gt;preg_replace()&lt;/code&gt; with &lt;code&gt;/e&lt;/code&gt; modifier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These could be added to the ruleset in an afternoon and would immediately improve detection on PHP codebases.&lt;/p&gt;
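
&lt;p&gt;To make the table concrete, here is a sketch of three of those rules expressed as JavaScript objects. The rule IDs and patterns are hypothetical; the scanner's real rules live in YAML and its schema may differ.&lt;/p&gt;

```javascript
// Hypothetical rules: each flags a dangerous PHP call that receives a
// request superglobal directly on the same line.
const PHP_RULES = [
  {
    id: "php-command-injection",
    pattern: /\b(?:shell_exec|system|exec|passthru)\s*\(\s*\$_(?:GET|POST|REQUEST|COOKIE)\b/,
  },
  {
    id: "php-file-inclusion",
    pattern: /\b(?:include|require)(?:_once)?\s*\(?\s*\$_(?:GET|POST|REQUEST|COOKIE)\b/,
  },
  {
    id: "php-sql-injection",
    pattern: /\bmysqli?_query\s*\([^;]*\$_(?:GET|POST|REQUEST|COOKIE)\b/,
  },
];

// Returns the IDs of every rule that matches the given line of PHP.
function matchRules(line) {
  return PHP_RULES.filter((rule) => rule.pattern.test(line)).map((rule) => rule.id);
}

console.log(matchRules("$out = shell_exec($_GET['cmd']);")); // [ 'php-command-injection' ]
console.log(matchRules("include($_GET['page']);"));          // [ 'php-file-inclusion' ]
console.log(matchRules(`mysql_query("SELECT * FROM users WHERE id=" . $_GET['id']);`)); // [ 'php-sql-injection' ]
console.log(matchRules("include('header.php');"));           // []
```

&lt;p&gt;These only fire when the superglobal appears on the same line as the call. Input that is first copied into a local variable slips past, which is the gap the next section covers.&lt;/p&gt;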

&lt;h3&gt;
  
  
  2. Taint Tracking (Requires Architecture Change)
&lt;/h3&gt;

&lt;p&gt;Several missed vulnerabilities involve user input flowing through multiple functions before reaching a dangerous sink. For example, in NodeGoat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Input enters here (source)&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/profile/:userId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// ...flows through several functions...&lt;/span&gt;

  &lt;span class="c1"&gt;// ...reaches the sink here&lt;/span&gt;
  &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;findOne&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ObjectId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt; &lt;span class="nx"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Regex can detect patterns at the sink (&lt;code&gt;findOne&lt;/code&gt; with a variable), but can't verify whether &lt;code&gt;userId&lt;/code&gt; actually came from user input without following the data flow across function boundaries. This is the core limitation of regex-based scanning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's needed:&lt;/strong&gt; A taint analysis engine that tracks data from &lt;em&gt;sources&lt;/em&gt; (HTTP request parameters, headers, cookies) to &lt;em&gt;sinks&lt;/em&gt; (SQL queries, shell commands, file paths, HTML output). This is how tools like Semgrep's Pro engine, CodeQL, and Checkmarx work. Implementing this would require moving from regex to an AST-based approach using a library like &lt;code&gt;tree-sitter&lt;/code&gt; for cross-language parsing.&lt;/p&gt;
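
&lt;p&gt;The core idea of taint propagation fits in a few lines once the code has been reduced to a simple intermediate form. The sketch below is a toy under heavy assumptions: the statement format is invented for illustration, it handles straight-line code only (no branches, sanitisers, or calls across functions), and a real engine would build this representation from an AST.&lt;/p&gt;

```javascript
// Toy intermediate representation (hypothetical): each statement assigns
// `target` from the variables in `from`, or passes `args` to a sink call.
const program = [
  { kind: "source", target: "userId" },                    // userId = req.params.userId
  { kind: "assign", target: "trimmed", from: ["userId"] }, // trimmed = userId.trim()
  { kind: "assign", target: "limit", from: [] },           // limit = 10
  { kind: "assign", target: "filter", from: ["trimmed"] }, // filter = { _id: trimmed }
  { kind: "sink", name: "findOne", args: ["filter", "limit"] },
];

// Forward propagation: a variable is tainted if it is a source or is derived
// from any tainted variable. Sinks report which arguments arrive tainted.
function findTaintedSinks(statements) {
  const tainted = new Set();
  const reports = [];
  for (const stmt of statements) {
    if (stmt.kind === "source") {
      tainted.add(stmt.target);
    } else if (stmt.kind === "assign") {
      if (stmt.from.some((name) => tainted.has(name))) {
        tainted.add(stmt.target);
      } else {
        tainted.delete(stmt.target); // overwritten with clean data
      }
    } else if (stmt.kind === "sink") {
      const dirty = stmt.args.filter((name) => tainted.has(name));
      if (dirty.length > 0) {
        reports.push({ sink: stmt.name, taintedArgs: dirty });
      }
    }
  }
  return reports;
}

console.log(findTaintedSinks(program));
// [ { sink: 'findOne', taintedArgs: [ 'filter' ] } ]
```

&lt;p&gt;Note that &lt;code&gt;filter&lt;/code&gt; is reported even though the request object never appears on the sink line, which is exactly what a line-by-line regex cannot do.&lt;/p&gt;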

&lt;h3&gt;
  
  
  3. Runtime/Logic Vulnerabilities (DAST Territory)
&lt;/h3&gt;

&lt;p&gt;Some vulnerabilities simply cannot be found by reading source code. They require observing the application's behaviour at runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Broken Access Control / IDOR&lt;/strong&gt; — can only be confirmed by making requests as User A and observing whether User B's data is returned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brute Force / Rate Limiting&lt;/strong&gt; — requires actually sending repeated requests and observing the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak Session IDs&lt;/strong&gt; — requires generating multiple sessions and analysing their entropy statistically
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Headers&lt;/strong&gt; — requires making an HTTP request and inspecting the response headers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Logic Flaws&lt;/strong&gt; — requires understanding the intended workflow and deviating from it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the domain of &lt;strong&gt;DAST tools&lt;/strong&gt; (project #10 and #11 on the portfolio list) and &lt;strong&gt;manual penetration testing&lt;/strong&gt;. No amount of SAST improvement will close this gap — it's a fundamental constraint of static analysis.&lt;/p&gt;
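
&lt;p&gt;To give a flavour of the session ID check from the list above, a first statistical pass might estimate Shannon entropy per character across a sample of IDs. This is a rough screen only, and the sample IDs are made up: it detects a small or skewed alphabet, not sequential predictability, so dedicated tools layer further tests on top.&lt;/p&gt;

```javascript
const { randomBytes } = require("node:crypto");

// Shannon entropy in bits per character, estimated from character frequencies
// pooled across all sampled IDs. Low values suggest a small or skewed alphabet.
function entropyPerChar(ids) {
  const counts = new Map();
  let total = 0;
  for (const id of ids) {
    for (const ch of id) {
      counts.set(ch, (counts.get(ch) || 0) + 1);
      total += 1;
    }
  }
  let bits = 0;
  for (const count of counts.values()) {
    const p = count / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Zero-padded counters (a classic weak pattern) versus hex strings from a CSPRNG.
const weak = ["0000000001", "0000000002", "0000000003"];
const strong = Array.from({ length: 3 }, () => randomBytes(16).toString("hex"));

console.log("weak:  ", entropyPerChar(weak).toFixed(2), "bits/char");
console.log("strong:", entropyPerChar(strong).toFixed(2), "bits/char");
```

&lt;p&gt;For hex output the theoretical ceiling is 4 bits per character; a sample that scores far below the ceiling for its alphabet deserves a closer look.&lt;/p&gt;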

&lt;h3&gt;
  
  
  The Takeaway: SAST + DAST + Manual = Defence in Depth
&lt;/h3&gt;

&lt;p&gt;The industry consensus is that no single tool type provides complete coverage. The standard approach in mature AppSec programmes is layered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SAST (this tool)           → Catches code-level flaws early, in the IDE or CI pipeline
SCA (project #3)           → Catches vulnerable dependencies
DAST (projects #10, #11)   → Catches runtime behaviour, logic flaws, config issues  
Manual Review              → Catches everything that requires human judgement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer has different strengths and blind spots. The goal isn't a single perfect tool — it's defence in depth across the entire SDLC.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running It Yourself
&lt;/h2&gt;

&lt;p&gt;The tool is open source. To scan any codebase:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the scanner&lt;/span&gt;
git clone https://github.com/pgmpofu/sast-tool
&lt;span class="nb"&gt;cd &lt;/span&gt;sast-tool
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; sast-tool &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Scan any project&lt;/span&gt;
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /path/to/your/project:/src &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/reports:/reports &lt;span class="se"&gt;\&lt;/span&gt;
  sast-tool /src &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--exclude&lt;/span&gt; &lt;span class="s2"&gt;"node_modules"&lt;/span&gt; &lt;span class="s2"&gt;"build"&lt;/span&gt; &lt;span class="s2"&gt;"dist"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt; html &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /reports/report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HTML report includes severity filtering, CWE/OWASP references, and remediation guidance for every finding.&lt;/p&gt;

&lt;p&gt;Contributions and additional rule PRs are welcome — particularly for Terraform misconfiguration patterns, Kubernetes YAML security issues, and deeper Java deserialization gadget detection.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of a series documenting my transition from software engineering into application security. Next up: building a dependency vulnerability auditor that cross-references your &lt;code&gt;pom.xml&lt;/code&gt;, &lt;code&gt;build.gradle&lt;/code&gt;, and &lt;code&gt;package.json&lt;/code&gt; against the NVD and OSV databases.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>programming</category>
      <category>javascript</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
