
Toni Antunovic

Posted on • Originally published at lucidshark.com

SAST False Positives in AI-Generated Code: Why 91% of Alerts Are Noise (And How to Fix It)

This article was originally published on LucidShark Blog.


Your SAST scanner just flagged 847 issues across a codebase that Claude Code wrote over the weekend. You stare at the list. Most of it looks like noise. You're right: it probably is.

A March 2026 study by Ghost Security scanned public GitHub repositories in Go, Python, and PHP using traditional SAST tools. Of 2,116 vulnerabilities flagged, only 180 were real. That's a 91% false positive rate. And that's on human-written code.

AI-generated code makes this dramatically worse. CodeRabbit's 2026 analysis found AI-generated code contains 2.74 times more vulnerabilities than human-written code, and Snyk's research found 48% of AI-generated code contains security flaws. Feed AI-produced code into a traditional SAST scanner and you get a deluge of findings, the overwhelming majority of which lead nowhere actionable.

The result is alert fatigue. Developers tune out. The security team loses credibility. Real vulnerabilities slip through because they're buried under a thousand false alarms. This is the state of SAST in 2026.

The Scale of the Problem
The OX Security 2026 Application Security Benchmark analyzed 216 million findings across 250 organizations. The average enterprise now faces 865,398 security alerts per year. After reachability and exploitability analysis, only 795 were critical. That's 0.092% signal. The other 99.9% was noise.

Why Traditional SAST Breaks on AI Code

Traditional SAST tools operate on deterministic rule sets. They pattern-match against known vulnerability signatures, track data flows from sources to sinks, and flag anything that resembles a dangerous construct. This approach worked reasonably well when developers wrote every line and had clear intent behind structural patterns.

AI-generated code breaks this model in several ways.

First, AI models favor recognizable patterns from their training data. They tend to write code that looks like common open-source examples, including the boilerplate security antipatterns those examples sometimes contain. A SAST tool sees the pattern and flags it, even when the context makes exploitation impossible.

Second, AI agents like Claude Code are prolific. They generate hundreds or thousands of lines in minutes. The absolute count of flagged items scales with volume even if the ratio of real issues stays constant. More code means more alerts, not necessarily more risk.

Third, AI-generated code frequently lacks the nuanced defensive comments and context that help SAST tools distinguish intentional from accidental patterns. A human developer might write `// intentionally using eval here for config parsing, input is validated above`. Claude Code does not annotate its decisions in ways that help static analyzers calibrate.

```javascript
// Example: SAST flags this as a potential eval injection
// But the input comes from a config file with strict schema validation
const configValue = JSON.parse(process.env.APP_CONFIG);
const handler = new Function('ctx', configValue.handlerCode);
// ...invoked later with a request-scoped context object: handler(ctx)
// Traditional SAST: CRITICAL - new Function() with dynamic code
// Reality: Controlled config input, schema-validated, no user data path
```

The SAST scanner sees `new Function()` with dynamic input and raises a critical finding. The context that would exonerate it, the schema-validated config file, is invisible to a tool that only sees the data flow, not the provenance.

The New Research: GCN-Based False Positive Prediction

A paper published to arXiv on March 11, 2026, called FP-Predictor, directly addresses this problem. The researchers built a Graph Convolutional Network (GCN) that consumes Code Property Graphs (CPGs) to predict whether a SAST finding is a true or false positive.

CPGs capture the structural and semantic relationships within code in a way that flat AST analysis cannot. They encode control flow, data flow, and program dependency relationships into a unified graph. The GCN then learns to classify findings based on graph-level features rather than simple pattern matching.
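To make the CPG idea concrete, here is a toy sketch of the data structure, assuming a simplified model where nodes are program points and edges carry a type (AST, CFG, or DFG). The names (`CpgNode`, `hasDataFlowPath`) are illustrative, not FP-Predictor's actual API:

```typescript
// Simplified Code Property Graph: one node set, typed edges.
type EdgeKind = "AST" | "CFG" | "DFG";

interface CpgNode {
  id: number;
  code: string; // source snippet this node represents
}

interface CpgEdge {
  from: number;
  to: number;
  kind: EdgeKind;
}

// A pattern match is only interesting if tainted data actually flows
// source -> sink; BFS over the data-flow edges answers that question.
function hasDataFlowPath(edges: CpgEdge[], source: number, sink: number): boolean {
  const dfg = edges.filter((e) => e.kind === "DFG");
  const visited = new Set<number>([source]);
  const queue = [source];
  while (queue.length > 0) {
    const n = queue.shift()!;
    if (n === sink) return true;
    for (const e of dfg) {
      if (e.from === n && !visited.has(e.to)) {
        visited.add(e.to);
        queue.push(e.to);
      }
    }
  }
  return false;
}
```

The GCN in the paper learns over exactly this kind of typed-edge structure; the point of the sketch is that a CFG edge between two nodes does not imply a DFG path, which is the distinction a flat pattern matcher misses.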

Results on the CryptoAPI-Bench benchmark: up to 96.6% accuracy. On the test set: 100%.

How CPG-Based Analysis Differs from Classic SAST
A classic SAST tool asks: "does this code match a known vulnerable pattern?" A CPG-based tool asks: "given the full structural context of this code, including how data actually flows and what conditions gate execution, is this pattern reachable and exploitable?" The difference is the difference between keyword search and semantic understanding.

The FP-Predictor research acknowledges current limitations: incomplete interprocedural control-flow representation and training data coverage. But the direction is clear. False positive reduction through ML is not a future research direction. It is an active deployment problem in 2026.

The Hybrid Approach: LLM Triage on Top of SAST Core

The most effective pattern emerging in 2026 is a two-stage pipeline. A SAST engine (Semgrep, Bandit, ESLint Security, Gosec) does the initial scan and produces findings with intermediate representations: data flow paths, source-to-sink traces, call graphs. An LLM layer then reads those representations alongside the surrounding code and makes a triage decision.

This hybrid approach consistently outperforms either layer alone. Semgrep alone has a reported precision of 35.7%. The same findings run through an LLM triage layer have shown false positive reductions of up to 91% in production deployments.

```shell
# Traditional scan: 847 findings, 770 false positives
$ semgrep --config=auto ./src

# Hybrid approach: same codebase, 77 findings, 12 false positives
$ lucidshark scan ./src --sast --llm-triage

# LucidShark runs the SAST tools locally, then uses the MCP connection
# to Claude Code to contextually triage each finding with codebase awareness
```

The key insight is that LLMs, trained on massive code datasets, understand common patterns, defensive idioms, and contextual signals that deterministic rules cannot capture. They can reason about whether a flagged eval() call is actually reachable with user-controlled input, whether a SQL concatenation is behind an ORM layer that sanitizes it, or whether a hardcoded value in a test file matters for production security.
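The hand-off between the two stages can be sketched as a function that packages a SAST finding and its data-flow trace into a triage prompt for the LLM layer. The `Finding` shape and `buildTriagePrompt` are illustrative assumptions, not any particular tool's API:

```typescript
// What the SAST engine emits: a finding plus its intermediate representation.
interface Finding {
  ruleId: string;
  file: string;
  line: number;
  source: string;  // where tainted data enters
  sink: string;    // the dangerous construct it reaches
  trace: string[]; // intermediate data-flow steps from the SAST engine
}

// What the LLM layer consumes: the finding, its trace, and a pointed question.
function buildTriagePrompt(f: Finding): string {
  return [
    `SAST finding ${f.ruleId} at ${f.file}:${f.line}`,
    `Source: ${f.source}`,
    `Sink: ${f.sink}`,
    `Data-flow trace:`,
    ...f.trace.map((step, i) => `  ${i + 1}. ${step}`),
    `Question: is the sink reachable with attacker-controlled input?`,
    `Answer "true positive" or "false positive" and cite the gating code path.`,
  ].join("\n");
}
```

The design choice worth noting: the prompt demands a cited code path, not a verdict, which keeps the LLM layer auditable.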

Where Local-First Matters: Privacy and Speed

Cloud-based SAST-plus-LLM pipelines solve the false positive problem but introduce new ones: latency, cost, and privacy. Sending your entire codebase to a cloud API for triage is slow, expensive at scale, and raises questions about who else might see your proprietary code and findings.

If you're building with Claude Code on a local-first workflow, the security analysis should match that architecture. The SAST scans should run locally. The LLM triage should run through a local or on-prem inference path. The findings should never leave your machine unless you explicitly export them.

Why Sending Your SAST Findings to the Cloud Creates New Risk
SAST findings are a map of your codebase's weaknesses. A list of "SQL injection candidate at auth.ts:142, XSS candidate at dashboard.tsx:88, hardcoded credential at config.js:34" is highly valuable to an attacker. Cloud triage services need careful trust evaluation beyond just their core LLM quality.

This is one of the core design decisions behind LucidShark. The tool runs entirely locally. The SAST/SCA/linting pipeline executes on your machine. When LucidShark uses the Claude Code MCP integration for contextual analysis, that communication stays within your local Claude Code session: your code, your machine, your context.

What a LucidShark Scan Actually Looks Like

When you run LucidShark on an AI-generated codebase, it coordinates multiple static analysis passes in a single pipeline:

```shell
# Install LucidShark
npm install -g lucidshark

# Run a full scan with SAST, SCA, linting, and coverage analysis
lucidshark scan ./src --format=json --output=report.json

# In Claude Code, use the MCP integration for interactive analysis
# The MCP server surfaces findings directly in your coding context
```

The output is structured and prioritized. LucidShark distinguishes between confirmed findings (with exploitable paths), probable findings (pattern matches with supporting context), and informational items (patterns worth noting but not blocking). This is the triage layer built into the tool, not bolted on afterward.
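In practice, the three-tier categorization means a CI gate can block on confirmed findings only. A minimal sketch of consuming such a report, assuming a per-finding `category` field with those three values (the field names are assumptions, not LucidShark's documented schema):

```typescript
// Assumed report shape: confirmed | probable | informational per finding.
type Category = "confirmed" | "probable" | "informational";

interface ReportFinding {
  id: string;
  category: Category;
  file: string;
}

// Gate the merge only on findings with an exploitable path;
// probable and informational items surface as warnings instead.
function shouldBlockMerge(findings: ReportFinding[]): boolean {
  return findings.some((f) => f.category === "confirmed");
}
```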

For each finding LucidShark surfaces, you get:

- The rule or analyzer that flagged it

- The data flow path from source to sink (for SAST findings)

- The dependency version and CVE references (for SCA findings)

- The contextual assessment: why this pattern appears risky in this specific location

- A remediation suggestion with a code diff
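One way to model the per-finding fields listed above is as a TypeScript type. This mirrors the bullet list; the exact field names are illustrative, not LucidShark's published output schema:

```typescript
// Illustrative shape for a triaged finding, mirroring the list above.
interface TriagedFinding {
  analyzer: string;        // the rule or analyzer that flagged it
  dataFlowPath?: string[]; // source-to-sink trace (SAST findings only)
  dependency?: {           // present for SCA findings only
    name: string;
    version: string;
    cves: string[];
  };
  assessment: string;      // why this pattern is risky in this location
  remediation: {
    summary: string;
    diff: string;          // proposed code change
  };
}
```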

That contextual assessment is what traditional SAST cannot provide. It closes the gap between "this pattern matches a known dangerous construct" and "this specific instance, in this codebase, with this data flow, is actually exploitable."

Practical Triage: A Developer Workflow

Here's how to work with SAST findings on AI-generated code without drowning in false positives, regardless of whether you're using LucidShark or another tool.

Step 1: Separate tool categories. Linting findings (unused variables, style violations) are not security findings. Treat them differently. A real SAST pipeline focuses on security-relevant rules: injection, auth bypass, cryptographic weaknesses, insecure deserialization.

Step 2: Filter by reachability. A finding in dead code, unreachable branches, or test-only paths has near-zero production risk. Most modern SAST tools support reachability filtering. Use it.

Step 3: Prioritize by data flow completeness. A full source-to-sink trace with user-controlled input at the source is a high-priority finding. A pattern match with no confirmed input path is a candidate for triage, not immediate remediation.
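Step 3's rule reduces to a simple predicate, sketched here with an illustrative finding shape (not any tool's real output format):

```typescript
// Two signals decide the queue: a complete source-to-sink trace,
// and whether the source is attacker-influenced input.
interface TraceFinding {
  id: string;
  hasFullTrace: boolean;         // complete source-to-sink data flow confirmed
  userControlledSource: boolean; // source is user-controlled input
}

function priority(f: TraceFinding): "fix-now" | "triage-queue" {
  return f.hasFullTrace && f.userControlledSource ? "fix-now" : "triage-queue";
}
```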

Step 4: Use Claude Code for contextual triage. If you're already using Claude Code, you can paste a SAST finding directly into context:

```
SAST Finding: potential SQL injection at src/db/users.ts:88
  Source: req.query.userId (user-controlled)
  Sink: db.query(`SELECT * FROM users WHERE id = ${userId}`)

Review this finding. Is the userId value validated or typed before this line?
Check the router middleware chain and any TypeScript type constraints.
```

Claude Code, with access to your codebase via MCP, can trace the actual data flow through your middleware, check TypeScript types, and confirm or deny the finding with full context. This is LLM-assisted triage in practice, no cloud SAST service required.

Do Not Use AI Triage to Dismiss Findings Wholesale
The goal of LLM triage is to prioritize and contextualize, not to rubber-stamp dismissals. If an AI assistant tells you a finding is a false positive, ask it to show you the specific code path that prevents exploitation. "It looks fine" is not a remediation. "The input is validated at line 44 by Zod schema X which rejects non-numeric values, making the SQL template injection inert" is.
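Here is what such a gating code path looks like in miniature: a numeric guard upstream of the query makes the flagged template injection inert. (In the example dismissal above, this role is played by a Zod schema; a hand-rolled check is used here to keep the sketch dependency-free.)

```typescript
// Guard upstream of the query: reject anything that is not a plain
// decimal integer before it ever reaches the SQL template.
function parseUserId(raw: string): number | null {
  if (!/^\d+$/.test(raw)) return null;
  return Number(raw);
}

// Downstream, the interpolated value can only ever be a number, so
// `SELECT * FROM users WHERE id = ${userId}` cannot carry SQL syntax.
```

This is the shape of evidence an AI triage layer should be required to cite before a finding is dismissed.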

The Alert Debt Problem

Veracode's 2026 State of Software Security report found that 82% of organizations now carry security debt, an 11% increase year-over-year. High-risk vulnerabilities spiked 36%. The finding that stands out: the backlog of unresolved vulnerabilities is growing faster than teams can fix them.

AI-generated code accelerates this problem. An engineer who would previously write 200 lines per day now ships 2,000. If even 1% of those lines contain a true security issue, the absolute count of unresolved vulnerabilities grows ten times faster. Traditional SAST, with its 91% false positive rate, makes this worse by obscuring the 1% that matters in a cloud of noise.

The answer is not to scan less. It's to scan smarter. Local-first, context-aware, triage-capable tooling is the way to maintain a manageable security posture while shipping at the velocity that AI coding tools enable.

LucidShark's Role in the SAST Triage Pipeline

LucidShark is purpose-built for this environment. It runs SAST (via ESLint Security, Bandit, and Semgrep rules), SCA (dependency CVE scanning), linting, coverage analysis, and duplication detection in a single local pass. The MCP integration with Claude Code means findings surface directly in your development context, not in a separate dashboard you have to remember to check.

The architecture keeps your code and your findings private. There is no upload, no cloud storage of analysis results, no third party learning from your vulnerability patterns. For teams working on proprietary codebases or operating under compliance constraints, this is a foundational requirement, not a nice-to-have.

Running LucidShark before committing AI-generated code is the equivalent of having a senior security engineer look at every diff before it lands. Except it runs in under a second and never gets tired of reviewing boilerplate.

Start Cutting SAST Noise Today
LucidShark is open source and installs in seconds. Run it on your next Claude Code session and see the difference between contextual security analysis and a raw SAST dump. [Install LucidShark from GitHub](https://github.com/toniantunovi/lucidshark) or follow the [quickstart guide](https://lucidshark.com/docs) to integrate it with Claude Code via MCP in under five minutes.
```shell
npm install -g lucidshark
lucidshark scan ./src
```
