Lucky

Posted on Jun 11

The 2026 State of GitHub Security: What 100 Repos Taught Me About Dependency CVEs and AI Code

#cybersecurity #debuggix #opensource #news

Introduction

Three months ago, I started an experiment. I took 100 GitHub repositories (some huge, some tiny, some built by AI, some maintained for a decade) and ran them through 9 security engines.

The goal was simple: understand the actual state of code security in 2026. Not marketing claims. Not vendor reports. Real data from real repositories.

What I found surprised me. Not because it was shocking, but because it was consistent.

Every single repository had at least one security issue. Every one.

This is not a headline designed to scare you. It is a statement of fact based on running Semgrep, Bandit, Gitleaks, TruffleHog, Trivy, ESLint, Hadolint, Checkov, and OSV-Scanner across 100 codebases of varying sizes, languages, and purposes.

Here is what the data actually shows.

Finding One: Dependency CVEs Are Universal

The most consistent finding across all 100 repositories was the presence of dependency vulnerabilities.

Not some repositories. Not most repositories. Every single repository scanned had at least one CVE in its dependency tree.

The most common vulnerable packages were protobufjs, xmldom, axios, and Hono. These are not obscure libraries. They are foundational to large portions of the JavaScript ecosystem. Protobufjs alone has over 4 million weekly downloads. Axios has over 20 million.

What makes this finding significant is not that these vulnerabilities exist. It is that they exist in projects of every size. A 50-star personal project can carry the same dependency CVEs as a 50,000-star project maintained by a full-time team. The difference is that the larger project is far more likely to have a security team or an automated pipeline watching for them. The smaller project simply never finds out.

This is the gap that existing tools like Snyk and GitHub Advanced Security attempt to fill. Snyk scans dependencies and reports known CVEs. GitHub does the same through Dependabot alerts, and GitHub Advanced Security adds dependency review on pull requests. Trivy and OSV-Scanner also provide dependency scanning, with Trivy focusing heavily on containers and OSV-Scanner drawing on OSV, the open source vulnerability database launched by Google.
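For a sense of what sits underneath OSV-Scanner, OSV.dev exposes a public HTTP API that can be queried per package version. A minimal sketch, assuming the documented `POST https://api.osv.dev/v1/query` endpoint and its `package`/`version` request body; the helper names are hypothetical, and large result sets may be paginated, which this sketch ignores:

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_request(name: str, version: str, ecosystem: str = "npm") -> urllib.request.Request:
    """Build (but do not send) a POST request asking OSV about one package version."""
    payload = {"package": {"name": name, "ecosystem": ecosystem}, "version": version}
    return urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vuln_ids(response_body: bytes) -> list[str]:
    """Extract advisory IDs from an OSV response; an empty JSON object means no known vulns."""
    data = json.loads(response_body)
    return [v["id"] for v in data.get("vulns", [])]


def query_osv(name: str, version: str, ecosystem: str = "npm") -> list[str]:
    """Send the query over the network and return the matching advisory IDs."""
    request = build_osv_request(name, version, ecosystem)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return vuln_ids(resp.read())
```

Calling `query_osv("axios", "0.21.0")` returns whatever advisory IDs OSV currently lists for that version; the package and version here are only illustrative.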

But these tools face a common problem: their full feature sets are priced for organizations, not for individual developers. Snyk's paid plans start at $25 per user per month. Semgrep's start at $50. GitHub Advanced Security is free on public repositories but a paid add-on for private ones, and for a team that runs to thousands per year. Trivy and OSV-Scanner are free and open source, but they are command-line tools that require installation, configuration, and integration into a workflow.

The result is a two-tier system. Large companies with budgets run automated dependency scanning. Individual developers and small teams do not. And yet the vulnerabilities are the same.

Finding Two: AI-Generated Code Shows Distinct Security Patterns

A subset of the repositories I scanned were built entirely with AI coding tools (Lovable, Bolt, Cursor, and similar platforms). These projects revealed a consistent set of security patterns.

Hardcoded API keys appeared in configuration files that were committed to the repository. Firebase configuration objects were exposed alongside databases that accepted writes. Input validation was frequently missing on form submissions. CORS policies were set to wildcard origins. Dependency versions were unpinned, leaving the projects open to whatever a future, possibly malicious, release contains.
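To make these patterns concrete, here is a deliberately naive sketch of the kind of zero-configuration check that could flag two of them, wildcard CORS and hardcoded key-like assignments, in source text. The regexes and the `scan_source` helper are hypothetical illustrations, far cruder than real Semgrep or Gitleaks rules, and they will miss many variants:

```python
import re

# Hypothetical, intentionally simple patterns; real scanners use far richer rule sets.
PATTERNS = {
    # e.g. res.setHeader("Access-Control-Allow-Origin", "*") or cors({ origin: "*" })
    "wildcard-cors": re.compile(
        r"""(Access-Control-Allow-Origin['"]?\s*[,:]\s*|origin\s*:\s*)['"]\*['"]""",
        re.IGNORECASE,
    ),
    # e.g. const API_KEY = "...": a key-like name assigned a long string literal
    "hardcoded-key": re.compile(
        r"""(api[_-]?key|secret|token)\w*\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]""",
        re.IGNORECASE,
    ),
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) pairs for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits
```

Regexes like these are cheap enough to run without setup, which is the point being made above; the trade-off is that they flag by shape alone, so they need the kind of triage discussed in Finding Three.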

None of this suggests that AI coding tools are inherently insecure. The AI builds what the developer asks for. If a developer says "build me a login form," the AI builds a login form. It does not ask whether the form should rate-limit attempts, validate email formats, or sanitize inputs. Those are security considerations, not functional requirements.
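As an illustration of a requirement that rarely appears in a prompt, here is a minimal in-memory sketch of per-account rate limiting for login attempts using a sliding window. The class name, the default of five attempts per minute, and the in-memory storage are all assumptions; a real deployment would need storage shared across processes:

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Allow at most `max_attempts` login attempts per key within `window_seconds`."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._attempts: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Return whether an attempt for `key` (an email or IP) may proceed, recording it if so."""
        now = self.clock()
        attempts = self._attempts[key]
        # Drop timestamps that have fallen out of the sliding window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```

Rejected attempts are not recorded, so a blocked user regains access once the window passes rather than extending their own lockout; whether that is the right policy depends on the application.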

This is the difference between working code and secure code. Existing static analysis tools like Semgrep and ESLint can catch many of these issues. Semgrep, in particular, excels at custom rules for application-specific vulnerabilities. ESLint with eslint-plugin-security can flag dangerous patterns in JavaScript and TypeScript.

But both tools require configuration. Semgrep users must write or select rules. ESLint requires installing plugins and configuring rulesets. The developer using an AI coding tool is typically moving fast, often without a deep security background. They are not likely to stop and configure a static analysis tool.

The result is that AI-generated code ships with the same predictable security gaps, and most developers never know.

Finding Three: False Positives Are the Real Barrier to Adoption

One of the most telling findings came from scanning deliberately vulnerable training projects like Kubernetes Goat, WebGoat, OWASP Juice Shop, and nodejs-goof.

These projects are designed to contain security issues. Kubernetes Goat produced 134 raw findings when scanned, including 2 critical and 32 high-severity issues. WebGoat produced 57, with 4 critical and 22 high.

But here is what matters: every security scanner flags these issues. Semgrep finds them. Trivy finds them. Gitleaks finds them. The challenge is not detection. It is classification.

A developer running a standard security scan on a real project might receive 134 findings. Some are real. Many are false positives from test files, build artifacts, or intentional patterns. The developer now faces a choice: spend hours triaging each finding, or ignore the scanner entirely.

This is the problem that existing tools have not solved. Snyk and GitHub Advanced Security provide prioritization features, but they still require human triage. Semgrep's false positive rate depends entirely on the quality of the rules selected. Gitleaks flags potential secrets but requires a developer to determine whether each flag is a real credential or an example key.
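Part of that judgment can be automated by scoring each flagged string. The sketch below is a hypothetical heuristic, not Gitleaks' own logic: it treats a candidate as a likely placeholder if it contains an obvious marker word (AWS's documented sample key `AKIAIOSFODNN7EXAMPLE`, for instance, ends in `EXAMPLE`) or has low Shannon entropy. The marker list and the 3.0 bits-per-character threshold are assumptions, and anything it calls real still deserves human review:

```python
import math
from collections import Counter

# Hypothetical marker words that usually indicate documentation or test values.
PLACEHOLDER_MARKERS = ("example", "sample", "dummy", "placeholder", "changeme", "xxxx", "your_")


def shannon_entropy(s: str) -> float:
    """Average bits of information per character of `s`."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_placeholder(candidate: str, min_entropy: float = 3.0) -> bool:
    """Guess whether a flagged secret is an example value rather than a live credential."""
    lowered = candidate.lower()
    if any(marker in lowered for marker in PLACEHOLDER_MARKERS):
        return True
    # Real keys are usually random; repetitive strings such as "aaaaaaaa" score low.
    return shannon_entropy(candidate) < min_entropy
```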

The technical capability exists to reduce false positives. Scanners can read documentation. They can identify test directories. They can recognize build scripts. They can learn which patterns are intentional. But most tools do not do this because they are designed to cast a wide net and let the developer sort through the catch.
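A minimal sketch of the directory awareness described above: demote findings whose paths look like tests, fixtures, vendored code, or build output. The `Finding` type and the path patterns are assumptions for illustration, and a real filter would also need per-project overrides, since some code under these directory names does ship to production:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical directory names that usually hold tests, fixtures, vendored code, or build output.
NOISY_DIRS = {"test", "tests", "__tests__", "spec", "fixtures", "examples",
              "node_modules", "vendor", "dist", "build"}


@dataclass(frozen=True)
class Finding:
    engine: str      # e.g. "semgrep", "gitleaks"
    rule_id: str
    path: str        # repository-relative, forward slashes
    line: int
    severity: str    # "critical", "high", "medium", "low"


def is_noisy_path(path: str) -> bool:
    """True if a directory component or a test-style file name marks the path as non-production."""
    p = PurePosixPath(path)
    if any(part.lower() in NOISY_DIRS for part in p.parts[:-1]):
        return True
    name = p.name.lower()
    return name.startswith("test_") or ".test." in name or ".spec." in name


def split_findings(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Partition findings into (needs_attention, likely_noise)."""
    keep = [f for f in findings if not is_noisy_path(f.path)]
    noise = [f for f in findings if is_noisy_path(f.path)]
    return keep, noise
```

Demoting rather than deleting keeps the noisy findings available for anyone who wants to audit the filter itself.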

Finding Four: Maintainers Fix Issues Quickly When Shown Real Problems

The most encouraging finding from this experiment was the response from maintainers.

When approached respectfully with a small number of real issues (not 134 findings, but the 6 that actually mattered), maintainers responded quickly.

One team fixed 3 of 4 reported issues within a week. Another fixed 9 Rust crate CVEs within hours. A third fixed unsafe PyTorch model loading and pinned its Hugging Face model revisions on the same day. The average fix time after receiving a clear, actionable report was under 24 hours.

This suggests that the barrier to secure code is not developer willingness. It is discovery. Developers want to ship secure code. They simply do not have the time to run multiple scanners, triage hundreds of findings, and figure out which issues are real.

The tools exist. The technology works. The missing piece is a workflow that surfaces only what needs attention.

What This Means for How We Scan Code

The data from 100 repositories points to a clear conclusion.

Dependency scanning needs to be universal. Every project has CVEs. Every developer needs to know about them. This is not a problem that should require an enterprise budget.

AI-generated code needs automated security review. The patterns are predictable. Hardcoded keys, missing validation, wildcard CORS. These can be caught without developer configuration.

False positives are the enemy of adoption. A scanner that produces 134 findings produces zero action. A scanner that produces 6 findings produces fixes within 24 hours.

The infrastructure for all of this exists. Semgrep, Bandit, Gitleaks, TruffleHog, Trivy, ESLint, Hadolint, Checkov, and OSV-Scanner are all capable engines. The challenge is not building a scanner. It is building a filter that sits on top of them.
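As a rough sketch of where such a filter might start, the code below merges findings from several engines, collapses duplicates that point at the same file and line for the same category, and keeps the top N by severity. The field names, the normalized categories, and the cutoff are hypothetical; nothing here reflects how Debuggix or any specific product works:

```python
from dataclasses import dataclass

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    engine: str
    category: str   # hypothetical normalized category, e.g. "secret", "injection"
    path: str
    line: int
    severity: str


def merge_findings(findings: list[Finding], limit: int = 10) -> list[tuple[Finding, set[str]]]:
    """Collapse cross-engine duplicates; return the `limit` most severe with the engines that agreed."""
    merged: dict[tuple[str, str, int], tuple[Finding, set[str]]] = {}
    for f in findings:
        key = (f.category, f.path, f.line)
        if key not in merged:
            merged[key] = (f, {f.engine})
            continue
        best, engines = merged[key]
        engines.add(f.engine)
        # Keep whichever report rates the issue most severely.
        if SEVERITY_RANK[f.severity] < SEVERITY_RANK[best.severity]:
            merged[key] = (f, engines)
    # Most severe first; issues confirmed by more engines break ties.
    ranked = sorted(merged.values(), key=lambda item: (SEVERITY_RANK[item[0].severity], -len(item[1])))
    return ranked[:limit]
```

Agreement between independent engines is used only as a tiebreaker here; treating it as stronger evidence of a real issue is a reasonable but unproven assumption.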

That is the problem worth solving.

What You Can Do Today

Regardless of which tools you use, here is a practical checklist based on what the data revealed.

First, scan your dependencies. If you are using JavaScript, run npm audit or yarn audit. If you are using Python, use pip-audit or Safety. If you are using Rust, use cargo audit. These are free, fast, and run from your own terminal. There is no excuse not to know what CVEs exist in your dependency tree.
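If you want a pass/fail gate rather than a wall of output, the JSON report can be summarized. A minimal sketch, assuming npm 7 or later, whose `npm audit --json` report includes a `metadata.vulnerabilities` object of per-severity counts; the helper names and the choice to fail on high or critical are assumptions:

```python
import json
import subprocess


def summarize_audit(report: dict) -> dict[str, int]:
    """Pull per-severity counts out of an `npm audit --json` report (npm 7+ format)."""
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    return {sev: int(counts.get(sev, 0)) for sev in ("critical", "high", "moderate", "low", "info")}


def should_fail(counts: dict[str, int]) -> bool:
    """Hypothetical policy: block a merge on any high or critical advisory."""
    return counts["critical"] + counts["high"] > 0


def run_npm_audit(project_dir: str = ".") -> dict[str, int]:
    """Run npm audit in `project_dir`; npm exits non-zero when it finds issues, so check=False."""
    proc = subprocess.run(["npm", "audit", "--json"], cwd=project_dir,
                          capture_output=True, text=True, check=False)
    return summarize_audit(json.loads(proc.stdout))
```

In CI, `should_fail(run_npm_audit("."))` could decide the job's exit status; the threshold is a policy choice, not something npm prescribes.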

Second, check your AI-generated code for hardcoded secrets. Run gitleaks or trufflehog on your repository. Both are free and open source. They look for API keys, tokens, and credentials committed to your codebase.

Third, look at your CORS policy. If it is set to * in production, change it. This is one of the most common findings across AI-generated projects, and one of the easiest to fix.
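Replacing the wildcard usually means echoing back only origins you trust. A framework-agnostic sketch, assuming an explicit allowlist (the origins below are placeholders); most frameworks' CORS middleware accepts an equivalent list, so check yours before hand-rolling this:

```python
from typing import Optional

# Placeholder origins; replace with the sites that should be allowed to call your API.
ALLOWED_ORIGINS = frozenset({"https://app.example.com", "https://admin.example.com"})


def cors_headers(request_origin: Optional[str]) -> dict[str, str]:
    """Return CORS response headers for a request, never falling back to "*"."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches the response varies by Origin so one origin's answer isn't reused for another.
            "Vary": "Origin",
        }
    # Unknown or missing origin: send no allow header, so browsers block cross-origin reads.
    return {"Vary": "Origin"}
```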

Fourth, pin your dependencies. Unpinned versions mean your next deployment might pull in a malicious update. Lockfiles such as package-lock.json, npm-shrinkwrap.json (generated by npm shrinkwrap), and yarn.lock, along with pip freeze for Python, exist for this reason.
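For Python projects, a quick way to spot unpinned entries is to check each requirement for an exact `==` pin. A rough sketch under simplifying assumptions: it understands only plain `name==version` lines (with optional extras), skips comments and pip options, and does not handle URLs, hashes, or the full requirement grammar, which the `packaging` library covers properly:

```python
import re

# Matches "name==1.2.3" or "name[extra]==1.2.3"; anything else (>=, ~=, bare names) counts as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?\s*==\s*[^\s;#]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines from a requirements.txt body that are not pinned with ==."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r or --index-url
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

Exact pins stop surprise upgrades, but they also freeze known-vulnerable versions in place, which is why pinning pairs naturally with the dependency scanning in the first step.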

Fifth, if you are using a security scanner, look at how it handles false positives. Does it require you to triage every finding? Does it understand your test directories? Does it read your documentation? If not, you are spending time on noise that could be spent on real issues.

Conclusion

The state of code security in 2026 is not broken. The tools work. The engines are capable. The vulnerabilities are being found.

But the workflow is broken. Security scanning should not require a full-time employee to triage false positives. It should not require an enterprise budget. It should not require hours of configuration.

The data from 100 repositories is clear. Every project has issues. Maintainers fix them when told. The only missing piece is making the process accessible to every developer, not just those with enterprise contracts.

The technology exists. It just needs to work for the people building most of the software on the internet.

This analysis was conducted using Debuggix, a platform that runs 9 security engines in parallel and applies AI filtering to separate real threats from false positives. Debuggix is free for open source projects. Paid plans for private repositories start at $29 per month. No sales calls. No enterprise contracts. More at debuggix.space.
