GitGuardian published the fifth edition of its State of Secrets Sprawl report on March 27. It's the largest study of credential exposure on public GitHub, and this year's edition lands a finding that the AI agent ecosystem needs to sit with.
AI-assisted commits leak secrets at roughly twice the rate of human-only commits. And 24,008 unique secrets were found specifically in MCP configuration files.
Those aren't estimates. They're counts.
The Numbers
The headline stats from the report:
- 28.65 million new hardcoded secrets detected in public GitHub commits in 2025, a 34% year-over-year increase and the largest single-year jump GitGuardian has recorded.
- AI-assisted commits had a 3.2% secret-leak rate, versus a 1.5% baseline across all public GitHub commits. That's roughly 2x the baseline.
- AI-service credentials (API keys for LLM providers, embedding services, AI platforms) increased 81% year-over-year, reaching 1,275,105 detected leaks.
- 24,008 unique secrets were found in MCP configuration files on public GitHub. Of those, 2,117 were confirmed valid — live credentials sitting in public repos.
- 64% of valid secrets from 2022 are still active in 2026. Four years later, not revoked.
Why AI Tools Leak More
The 2x leak rate for AI-assisted commits is not a simple "AI is bad at security" story. GitGuardian's report is careful about this, and the nuance matters.
Developers remain in control of what gets accepted, edited, and pushed. AI coding tools suggest code. Humans approve it, modify it, and commit it. The leak happens through a human workflow — but the workflow has changed.
Three things are different when AI is in the loop:
Speed. AI-assisted development moves faster. More code reviewed per hour, more commits per day, more surface area for a secret to slip through. The cognitive load of reviewing AI-generated code for security issues sits on top of reviewing it for correctness.
Confidence. When a tool generates code that works, the instinct is to ship it. The review step becomes shallower. A hardcoded API key in a config block generated by an AI assistant looks the same as any other config value — unremarkable, easy to miss.
Defaults. AI coding tools generate what they've seen in training data. If thousands of public repositories contain hardcoded API keys in configuration files, that pattern gets learned and reproduced. The model isn't being malicious. It's being accurate — accurately reproducing the insecure patterns it was trained on.
The result is not a tool failure. It's a process gap: the velocity increased, but the guardrails didn't.
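The "defaults" point is easy to see in a minimal before/after. This is an illustrative sketch, not code from the report; the key and the `LLM_API_KEY` variable name are made up:

```python
import os

# The pattern AI tools frequently reproduce from training data:
# a credential hardcoded as an ordinary-looking config value.
INSECURE_CONFIG = {
    "api_key": "sk-proj-EXAMPLE0000000000000000",  # fake key, for illustration
}

# The same value with the secret moved out of the source tree.
def load_api_key() -> str:
    """Read the key from the environment; fail loudly if it is missing."""
    key = os.environ.get("LLM_API_KEY")  # illustrative variable name
    if not key:
        raise RuntimeError("LLM_API_KEY is not set; refusing to start")
    return key
```

The second version costs nothing at review time, which is exactly why it has to be the default the tooling suggests rather than something a rushed reviewer is expected to catch.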
24,008 Secrets in MCP Configs
This finding deserves its own section because it points to something structural.
MCP (Model Context Protocol) is how AI agents connect to external tools — databases, APIs, file systems, code repositories. An MCP configuration file defines which servers to connect to, what credentials to use, and how to authenticate.
GitGuardian found 24,008 unique secrets across MCP-related configuration files on public GitHub. The report identifies a root cause that's uncomfortable: the documentation itself encourages the pattern.
Popular MCP setup guides — including official quickstarts — routinely show API keys placed directly in configuration files or command-line arguments. When the getting-started guide puts the API key inline, developers follow that pattern. When those config files get committed, the secret goes with them.
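For illustration, the inline pattern looks like this (server name and key are invented here, not taken from any real quickstart):

```json
{
  "mcpServers": {
    "example-server": {
      "command": "npx",
      "args": ["-y", "example-mcp-server", "--api-key", "sk-live-DO-NOT-COMMIT"]
    }
  }
}
```

Once a file like this exists in a working tree, it is one careless `git add .` away from being public, which is why the fix has to be keeping the literal value out of any committable file, not just reviewing more carefully.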
This is not surprising. It's the same pattern that plagued .env files, Docker Compose files, and Kubernetes manifests before tooling caught up. The difference is scale and timing: MCP adoption is accelerating fast, and the ecosystem's security tooling hasn't caught up yet.
Of the 24,008 secrets found, 2,117 were confirmed valid. That means 2,117 live credentials — capable of authenticating against real services — were sitting in public GitHub repositories at the time of the scan.
The Remediation Gap
Perhaps the most alarming number in the report isn't about AI at all.
64% of valid secrets detected in 2022 are still active in 2026. Four years later. Not rotated, not revoked, not expired.
This isn't a detection problem. GitGuardian detected them. The problem is what happens after detection: somebody needs to identify the secret's owner, assess its blast radius, revoke it, rotate it, update every system that depends on it, and verify nothing breaks. For most organisations, that workflow either doesn't exist or stalls at "identify the owner."
AI agents make this worse in a specific way. An AI coding tool that generates a config file with a hardcoded secret doesn't know who owns that secret, what it connects to, or what the rotation procedure is. It can't file the remediation ticket. It just writes the code and moves on.
The gap between "secret detected" and "secret revoked" is where the real risk lives. And it's growing.
What This Means for Agent Infrastructure
If you're building or operating AI agents, three things from this report should change your threat model:
1. MCP config files are a credential attack surface. Treat .cursor/mcp.json, claude_desktop_config.json, and any MCP server configuration with the same paranoia you'd apply to .env files. Don't commit them. Don't share them in Slack. Don't paste them in documentation.
2. AI-generated code needs secret scanning in the commit pipeline. Pre-commit hooks that catch secrets before they hit the repository are no longer optional. Tools like GitGuardian, TruffleHog, and detect-secrets belong in every pipeline that ships AI-assisted code. The 2x leak rate makes this arithmetic simple.
3. The output side matters as much as the input side. Most discussion about AI agent security focuses on what goes into the agent — prompt injection, poisoned context, malicious tool responses. This report is about what comes out: the code the agent writes, the configs it generates, the credentials it embeds. Output scanning — including DLP on tool call payloads — catches the secrets that pre-commit hooks miss, because not all agent output flows through git.
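A runtime check on agent output can be sketched in a few lines. The patterns below are an illustrative subset covering a few well-known key formats (OpenAI-style `sk-` keys, AWS access key IDs, GitHub personal access tokens); a production scanner would use a maintained ruleset from a tool like GitGuardian, TruffleHog, or detect-secrets rather than this hand-rolled list:

```python
import re
from typing import Any

# Illustrative patterns only; real scanners ship hundreds of
# maintained detectors with far better precision.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),   # OpenAI-style API keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),      # AWS access key IDs
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),   # GitHub personal access tokens
]

def find_secrets(payload: Any, path: str = "$") -> list[tuple[str, str]]:
    """Recursively walk a tool-call payload; return (json_path, match) pairs."""
    hits: list[tuple[str, str]] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            hits += find_secrets(value, f"{path}.{key}")
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            hits += find_secrets(item, f"{path}[{i}]")
    elif isinstance(payload, str):
        for pattern in SECRET_PATTERNS:
            for match in pattern.findall(payload):
                hits.append((path, match))
    return hits
```

Run as middleware on tool-call payloads before they leave the process, a check like this catches exactly the cases pre-commit hooks can't: secrets embedded in output that never touches git.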
Honest Context
We build DLP for MCP tool calls at mistaike.ai, so we have a stake in this conversation. We're not pretending otherwise.
But the GitGuardian data stands on its own. 29 million secrets. 24,008 in MCP configs. 2x leak rate from AI-assisted code. 64% still valid after four years. These are someone else's numbers from an independent study, and they describe a problem that exists whether or not you use our product.
The practical takeaway is simple: if your AI agents generate code, configs, or tool call payloads, something needs to be scanning that output for secrets. That something could be a pre-commit hook, a CI pipeline check, a runtime DLP layer, or all three. The specific tool matters less than having the coverage at all.
Right now, most teams don't.
Sources: GitGuardian State of Secrets Sprawl 2026 (March 27, 2026) · GitGuardian blog: AI-Service Leaks Surge 81% · Help Net Security: AI frenzy feeds credential chaos · HackerNoon: AI Coding Tools Double Secret Leak Rates
Originally published on mistaike.ai