Gus

AI Agents Don't Understand Secrets. That's Your Problem.

23.8 million new secrets were leaked on public GitHub in 2024. A 25% increase year-over-year. And 70% of them are still active two years later.

Now add AI coding assistants to the mix.

GitGuardian found that repositories where GitHub Copilot is active have a 40% higher secret leak rate than the baseline: 6.4% vs 4.6%. In a controlled test, Copilot generated 3.0 valid secrets per prompt on average across 8,127 code suggestions.

AI agents write code fast. They also hardcode credentials fast. And they do it without understanding what a secret is, why it matters, or what happens when it ships.

This post walks through the problem, the real-world data, and the practical defenses you can apply today.


The numbers

These are not projections. They come from published research:

| Stat | Source |
| --- | --- |
| 23.8M secrets leaked on public GitHub in 2024 | GitGuardian State of Secrets Sprawl 2025 |
| 25% year-over-year increase | GitGuardian 2025 |
| 70% of leaked secrets still active 2 years later | GitGuardian 2025 |
| 6.4% of Copilot-active repos leak at least one secret | GitGuardian Copilot Research |
| 3.0 valid secrets per Copilot prompt (avg) | GitGuardian Copilot Research |
| 1,212x surge in OpenAI API key leaks (2023) | GitGuardian 2024 |
| 72% of Android AI apps contain hardcoded secrets | Cybernews |
| 196 of 198 iOS AI apps had Firebase misconfigurations | CovertLabs |
| 11,908 live API keys in Common Crawl (2.67B web pages) | Truffle Security |
| 35% of private repos contain plaintext secrets | GitGuardian 2025 |
| 7,000 valid AWS keys exposed on DockerHub | GitGuardian 2025 |
| 1 in 5 vibe-coded websites exposes at least one secret | RedHuntLabs |
| 90% of leaked secrets still active after 5 days | GitGuardian 2024 |

The pattern is clear: AI accelerates code production. It also accelerates secret sprawl.


How AI agents leak secrets

There are five main paths:

1. Hardcoding during generation

You ask the agent to integrate Stripe. It generates:

import stripe
stripe.api_key = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"

def create_charge(amount):
    return stripe.Charge.create(amount=amount, currency="usd")

The agent doesn't know that sk_live_ is a production key. It doesn't know it should reference an environment variable instead. It saw the pattern in training data and reproduced it.

The developer reviews the code, maybe notices the key, maybe doesn't. The commit goes through. The key is now in Git history forever, even if the file is later edited.

This isn't theoretical. The Moltbook platform was built entirely by "vibe coding" (prompting an AI assistant with no manual security review). The result: 1.5 million API tokens, 35,000 user email addresses, and private agent messages exposed to the public internet. Root cause: a hardcoded Supabase API key in client-side JavaScript and Row Level Security disabled. RedHuntLabs found that 1 in 5 vibe-coded websites exposes at least one sensitive secret.

2. Context window exposure

When you paste code into a public LLM API (ChatGPT, Claude API, etc.), the prompt data may be retained by the provider for abuse monitoring or model improvement.

If that code contains credentials, those credentials are now outside your control. Even if providers don't use them for training, they exist in logs, caches, and processing pipelines you can't audit.
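If you must share code externally, redact it first. Here is a minimal pre-paste redactor sketch; the pattern list and placeholder names are illustrative assumptions, not an exhaustive ruleset:

```python
import re

# Illustrative patterns only -- extend with the key formats your stack uses
SECRET_PATTERNS = [
    (re.compile(r"sk_(?:live|test)_[A-Za-z0-9]+"), "<STRIPE_KEY>"),
    (re.compile(r"AKIA[A-Z0-9]{16}"), "<AWS_ACCESS_KEY>"),
    # user:password embedded in any connection-string URL
    (re.compile(r"://[^@\s/]+:[^@\s/]+@"), "://<CREDENTIALS>@"),
    (re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"
    ), "<PRIVATE_KEY>"),
]

def redact(text: str) -> str:
    """Replace credential-looking substrings with placeholders before sharing."""
    for pattern, placeholder in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

snippet = 'stripe.api_key = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"'
print(redact(snippet))  # stripe.api_key = "<STRIPE_KEY>"
```

The placeholders keep the code's shape intact, so the LLM can still reason about it while the credential itself never leaves your machine.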

3. Training data memorization

When you fine-tune a model on internal repositories that contain embedded secrets, the model memorizes them. Researchers have demonstrated that fine-tuned models can regurgitate API keys, database connection strings, and private keys verbatim when prompted with related context.

Truffle Security scanned the December 2024 Common Crawl archive (400 terabytes from 2.67 billion web pages) and found 11,908 live, actively valid secrets including AWS keys and MailChimp credentials. 63% of these secrets were repeated across multiple web pages. One WalkScore API key appeared 57,029 times across 1,871 subdomains. LLMs trained on this data can't distinguish between valid and invalid secrets, so they reinforce insecure patterns in generated output.

It goes deeper than keys. Research by Irregular (February 2026) found that LLM-generated passwords are fundamentally weak. Claude's passwords tend to start with an uppercase "G" and the digit "7". ChatGPT's nearly always start with "v". A batch of 50 Claude-generated passwords produced only 30 unique results. The measured entropy: 27 bits for a 16-character password, vs. 98 bits expected for a truly random password of that length. These passwords can be brute-forced in hours. And developers are using them: the characteristic patterns appear in public GitHub repos.
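The takeaway: never ask an LLM to generate a password or key. A CSPRNG does it correctly in three lines. A minimal sketch using Python's standard `secrets` module:

```python
import secrets
import string

def random_password(length: int = 16) -> str:
    """Draw each character from the OS CSPRNG.

    Over this 94-symbol alphabet that is ~6.55 bits per character,
    i.e. full entropy -- no characteristic prefixes, no repeats.
    """
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Unlike the LLM-generated batches described above, 50 calls to this function yield 50 unique passwords with overwhelming probability.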

4. MCP tool exfiltration

The newest vector. SANDWORM_MODE (disclosed by Socket's Threat Research Team, February 2026) is a supply chain attack where 19 malicious npm packages install rogue MCP servers into AI coding tools (Claude Code, Cursor, Windsurf, VS Code Continue). Three packages impersonated Claude Code specifically.

The attack is two-stage: the first stage captures credentials and crypto keys; the second activates 48 hours later (with per-machine jitter) for deeper harvesting. The "McpInject" module deploys a malicious MCP server with embedded prompt injection that tells the AI agent to read SSH keys, AWS credentials, npm tokens, and .env files. It targets LLM API keys from 9 providers (OpenAI, Anthropic, Cohere, Mistral, and more), and its payloads are AES-256-GCM encrypted for obfuscation.

The agent doesn't know it's compromised. It just follows the tool's instructions.

5. Framework-level vulnerabilities

Some AI frameworks have vulnerabilities that directly enable credential theft:

  • CVE-2025-68664 ("LangGrinch"): A serialization injection in LangChain Core (CVSS 9.3) allows attackers to exfiltrate environment variables containing secrets. A single prompt can trigger it indirectly by instantiating classes that make requests populated with secrets_from_env.
  • CVE-2025-3248 (Langflow, CVSS 9.8): Unauthenticated RCE via the /api/v1/validate/code endpoint. 361 malicious IPs observed exploiting it. Used to deploy the Flodrix botnet.
  • GitHub MCP Credential Theft (Invariant Labs, May 2025): Malicious GitHub Issues hijack AI agents and coerce them into exfiltrating data from private repositories. The root cause: developers use Personal Access Tokens that grant AI assistants broad access to all repos, public and private.

The three rules

Rule 1: Use a secrets manager with automatic rotation

The only way to guarantee an LLM won't leak a secret is to make sure the secret never exists in source code. Period.

Use a secrets manager:

| Manager | Best for | Key feature |
| --- | --- | --- |
| HashiCorp Vault | Multi-cloud, on-prem | Dynamic secrets, automatic rotation |
| AWS Secrets Manager | AWS-native workloads | Native IAM integration, auto-rotation |
| Azure Key Vault | Azure workloads | HSM-backed, RBAC integration |
| GCP Secret Manager | GCP workloads | IAM conditions, audit logging |
| Doppler | Developer-focused | Universal sync, env-agnostic |

The integration pattern:

Instead of this:

# DON'T: hardcoded secret
DATABASE_URL = "postgresql://admin:s3cret@db.example.com:5432/prod"

Do this:

import os

# DO: reference from environment
DATABASE_URL = os.environ["DATABASE_URL"]

Or for more control:

import hvac
import os

def get_secret(path):
    # Vault address comes from the environment; no endpoint or token in code
    client = hvac.Client(url=os.environ["VAULT_ADDR"])
    secret = client.secrets.kv.v2.read_secret_version(path=path)
    return secret["data"]["data"]

DATABASE_URL = get_secret("database/prod")

The key insight: your AI agent should generate code that references secrets, never code that contains them.

When you review AI-generated code, the first thing to check is whether any string looks like a credential. If the agent hardcoded it, replace it with an environment variable or secrets manager call before committing.
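Credential-looking strings are easy to miss by eye. One review-time heuristic is to flag long, high-entropy string literals; here is a rough sketch (the 20-character minimum and the 4.0 bits-per-character threshold are tunable assumptions, not a standard):

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    counts = {c: s.count(c) for c in set(s)}
    return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

def suspicious_literals(source: str, min_len: int = 20, threshold: float = 4.0):
    """Flag high-entropy quoted literals that may be embedded credentials."""
    literals = re.findall(r'"([^"\s]{%d,})"' % min_len, source)
    return [lit for lit in literals if shannon_entropy(lit) >= threshold]

code = 'stripe.api_key = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"'
print(suspicious_literals(code))  # ['sk_live_4eC39HqLyjWDarjtT1zdp7dc']
```

Entropy checks produce false positives (hashes, UUIDs), so treat hits as review prompts, not verdicts; dedicated scanners like gitleaks combine entropy with provider-specific rules.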

Rule 2: Never paste code with credentials into public LLM APIs

Before you copy-paste code into ChatGPT, Claude, or any public API:

  1. Grep for patterns: sk_live_, AKIA, -----BEGIN, mongodb+srv://, postgres://
  2. Strip .env files: never include environment files in context
  3. Sanitize connection strings: replace actual credentials with placeholders
  4. Use local models for sensitive code: if the code touches credentials, use a local model or a private deployment with data retention controls
# Quick check before pasting code into an LLM
# (-E for portable extended regex; \s is a GNU-only extension)
grep -rnE "sk_live|sk_test|AKIA|BEGIN.*PRIVATE|password[[:space:]]*=" ./src/

For organizations: establish a policy. Define what can and cannot be shared with external LLM APIs. Enforce it with tooling, not trust. GitGuardian, TruffleHog, and gitleaks can all scan content before it leaves your environment.

Rule 3: Pre-commit hooks with secret scanning

When AI generates code, the usual "I know where the secrets are" mental model breaks. You didn't write it. You might not recognize the credential patterns.

Automated scanning is your safety net.

Option A: gitleaks (open source, fast)

# Install
brew install gitleaks

# Add to pre-commit
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.22.1
    hooks:
      - id: gitleaks

Option B: TruffleHog (open source, deep)

# Install
brew install trufflehog

# Scan before commit
trufflehog filesystem --directory . --only-verified

Option C: GitHub Push Protection (built-in)

GitHub's push protection blocks pushes containing recognized secret patterns. Enable it at the repository or organization level:

Settings > Code security > Secret scanning > Push protection > Enable

This catches secrets at push time, before they reach the remote. It supports 200+ secret patterns from partners including AWS, GCP, Stripe, and OpenAI.

Option D: GitGuardian (SaaS, comprehensive)

# Install
pip install ggshield

# Pre-commit hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitguardian/ggshield
    rev: v1.34.0
    hooks:
      - id: ggshield
        language_version: python3

The important thing isn't which tool you pick. It's that you have something between the AI's output and your Git history.


What to check in AI-generated code

A quick checklist for every AI-generated code review:

[ ] No hardcoded API keys, tokens, or passwords
[ ] No connection strings with embedded credentials
[ ] No private keys or certificates
[ ] Secrets referenced via environment variables or secrets manager
[ ] No .env files committed (check .gitignore)
[ ] No credentials in comments or TODOs
[ ] No base64-encoded secrets (LLMs sometimes encode credentials)
[ ] Pre-commit secret scanning hook is active

Common patterns to watch for:

# API keys
sk_live_*, sk_test_*, pk_live_*, pk_test_*   # Stripe
AKIA[A-Z0-9]{16}                              # AWS Access Key
AIza[0-9A-Za-z-_]{35}                         # Google API
sk-[a-zA-Z0-9]{48}                            # OpenAI

# Connection strings
mongodb+srv://user:pass@cluster
postgresql://user:pass@host:5432/db
mysql://root:password@localhost

# Private keys
-----BEGIN RSA PRIVATE KEY-----
-----BEGIN OPENSSH PRIVATE KEY-----
-----BEGIN EC PRIVATE KEY-----
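The patterns above can be wired into a small scanner for spot checks. A sketch (this pattern set is illustrative and far from exhaustive; real scanners like gitleaks ship hundreds of rules):

```python
import re

# The checklist patterns, compiled (illustrative subset, not exhaustive)
PATTERNS = {
    "stripe_key": re.compile(r"\b[sp]k_(?:live|test)_[A-Za-z0-9]+"),
    "aws_access_key": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),
    "connection_string": re.compile(r"\b(?:mongodb\+srv|postgresql|mysql)://[^\s@]+@"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
}

def scan_text(source: str):
    """Return (pattern_name, line_number) for every hit, for review or CI."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits
```

Running `scan_text` over a diff before committing catches the obvious cases; it complements, not replaces, a pre-commit hook.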

The CI/CD layer

Pre-commit hooks are your first line. CI/CD is your second.

# GitHub Actions: scan for secrets on every push
name: Secret Scanning
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

For organizations using SARIF output to feed into GitHub Code Scanning:

gitleaks detect --source . --report-format sarif --report-path results.sarif

This creates alerts directly in your Security tab, alongside other code scanning findings.


The MCP configuration problem

If you're using MCP servers (Claude Desktop, Cursor, Windsurf), your configuration file is another secret exposure point.

A typical insecure MCP config:

{
  "mcpServers": {
    "database": {
      "command": "npx",
      "args": ["-y", "@example/mcp-db"],
      "env": {
        "DB_PASSWORD": "s3cret_production_password",
        "API_KEY": "sk-live-abc123def456"
      }
    }
  }
}

This file sits on your local machine, often unencrypted, often in a dotfile directory. If an infostealer hits your machine (like the Vidar variant that began targeting OpenClaw configs in February 2026), these credentials are harvested along with everything else.

The fix:

{
  "mcpServers": {
    "database": {
      "command": "npx",
      "args": ["-y", "@example/mcp-db@1.2.3"],
      "env": {
        "DB_PASSWORD": "${DB_PASSWORD}",
        "API_KEY": "${API_KEY}"
      }
    }
  }
}

Reference environment variables. Pin package versions. Never hardcode credentials in MCP configs.
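The placeholder rule is easy to lint mechanically. A minimal sketch (it assumes the `${VAR}` convention shown in the fixed config above; adjust the regex for your client's substitution syntax):

```python
import json
import re

# A value passes only if it is a ${VAR} placeholder resolved at launch time
PLACEHOLDER = re.compile(r"^\$\{[A-Z_][A-Z0-9_]*\}$")

def check_mcp_config(config_text: str):
    """List env entries in an MCP config whose values are literals, not placeholders."""
    config = json.loads(config_text)
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        for key, value in server.get("env", {}).items():
            if not PLACEHOLDER.match(str(value)):
                problems.append(f"{name}.env.{key} holds a literal value")
    return problems
```

Run it against each client's config file in CI or a dotfile-audit script; any hit means a credential is sitting in plaintext on a developer machine.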

For automated scanning of MCP configurations:

# Scan all auto-discovered MCP client configs
aguara scan --auto --severity high

Aguara has 19 detection rules for credential leaks including API key patterns, private keys, database connection strings, and hardcoded secrets in MCP config files.


If you're fine-tuning: scan before you train

Before feeding internal code into a fine-tuning pipeline:

  1. Scan the training corpus for secrets with gitleaks or TruffleHog
  2. Remove or redact any files containing credentials
  3. Strip .env files, config files, and deployment scripts from the dataset
  4. Test the fine-tuned model with prompts designed to elicit credential recall
  5. Monitor model outputs for patterns matching known secret formats
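Steps 1 through 3 can be sketched as a simple corpus filter. The hint patterns and excluded filenames here are illustrative assumptions; a real pipeline should run gitleaks or TruffleHog over the dataset instead:

```python
import re

# Cheap tripwires for obviously unsafe training files (illustrative only)
SECRET_HINTS = re.compile(
    r"sk_(?:live|test)_[A-Za-z0-9]+|AKIA[A-Z0-9]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----"
)
EXCLUDED_NAMES = {".env", ".env.local", ".env.production"}

def filter_corpus(files):
    """files: iterable of (filename, content). Keep only entries safe to train on."""
    kept = []
    for name, content in files:
        if name in EXCLUDED_NAMES:
            continue  # env files never belong in a training set
        if SECRET_HINTS.search(content):
            continue  # drop (or redact) files carrying credentials
        kept.append((name, content))
    return kept
```

Anything the filter drops should be triaged: a hit in the training corpus usually means a live credential in the repo itself.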

A model that has memorized your production database password will eventually produce it in a code suggestion. The only mitigation is to never expose it during training.


The speed problem

GitGuardian estimates that developers push a new secret to Git every 8 seconds. Over 90% of those secrets remain active 5 days after leaking. 70% are still active two years later. The industry calls these "zombie leaks": secrets that everyone forgot about, but attackers haven't.

AI agents make this worse in two ways. First, they generate secrets faster than humans can review them. Second, they normalize the pattern. When Copilot produces code with a hardcoded API key and the developer accepts it, the developer learns that this is how you integrate an API. The bad pattern spreads.


The bottom line

AI agents are powerful code generators. They are also powerful secret generators.

The same patterns that made them good at writing code (learning from millions of repositories) are the patterns that make them dangerous with secrets (reproducing what they've seen, including credentials).

Three rules:

  1. Secrets manager with rotation. The secret never touches source code.
  2. Never paste credentials into public LLMs. Sanitize before you share.
  3. Pre-commit hooks. Automate the catch. Don't trust the review.

The tooling exists. The patterns are established. The data shows the problem is getting worse, not better.

AI agents don't understand secrets. That's your job.



Top comments (5)

Matthew Hou

The 40% higher leak rate in Copilot-active repos is a number that should be on every engineering manager's desk. Most teams evaluate AI coding tools on velocity — lines generated, PRs merged — but nobody's tracking the security regression rate.

What's especially concerning is the "valid secrets per prompt" metric. The model isn't generating random strings that look like keys — it's generating actual working credentials it saw in training data. That's a leak from someone else's repo showing up in yours.

The defense I've found most practical: treat every AI-generated commit the same way you'd treat a commit from an untrusted contributor. Pre-commit hooks with secret scanning, mandatory. Not optional. Not "we'll add it later." The 3-second scan is the cheapest security investment you'll ever make.

Gus

Hi Matthew! Thanks for adding your perspective.

One blind spot worth flagging: MCP config files. API keys and database passwords in plaintext, sitting in dotfile directories on developer machines. Pre-commit hooks never see them because they never touch Git.

Jan Luca Sandmann

Hey Gus! I think you're spot-on, and those are terrifying stats, especially the 40% higher leak rate with Copilot and those zombie secrets still active years later...

One emerging pattern I'm seeing in 2026: MCP-based tools (Cursor, Claude Code, etc.) are now prime targets for supply-chain worms like SANDWORM_MODE (Socket disclosed Feb 26), where malicious packages inject prompt exfiltration to steal LLM keys and env vars from the agent's runtime.

Adding secrets scanning specifically for MCP config files (e.g., via Aguara or custom TruffleHog rules) has caught several hardcoded values that should have been ${...} placeholders, which devs missed in review.

Great wake-up call post! Thanks for writing :)
