Dwayne McDaniel for GitGuardian

Originally published at blog.gitguardian.com

The Bot Left a Fingerprint: Detecting and Attributing LLM-Generated Passwords

In February 2026, researchers at Irregular published a detailed post about LLM-generated passwords, showing how passwords generated by LLMs follow notable patterns and are generally highly predictable.

The root cause is fundamental: LLMs are optimized to predict probable outputs, which is the exact opposite of what secure password generation demands.

That observation raised a natural follow-on question: if LLMs leave statistical fingerprints in the passwords they generate, can those fingerprints be detected and attributed? Can we look at a password found in a leaked dataset and say which model generated it? More importantly, can we measure how widely those LLM passwords are used in the wild? That is what this research set out to answer.

Extending the perimeter

We extended the scope of the analysis to 40 LLMs from 11 providers, including both closed-source (OpenAI GPT, Anthropic Claude, etc.) and open-source (Qwen, DeepSeek, etc.) models. We increased the password sample size to 200 per model for better statistical accuracy, yielding a total dataset of 8,000 passwords.

An initial analysis confirmed the original findings: generated passwords are biased, though the degree of bias varies across models:

  • Anthropic's models show poor uniqueness: Claude Opus 4.6 is the worst, with only 35% of its passwords being unique.
  • Open-source Qwen, Llama, and Gemma models show between 50 and 60% uniqueness.
  • The GPT-5 family generates only unique passwords.

The uniqueness of generated passwords does not guarantee their security. Generated passwords tend to follow similar patterns and use common substrings. Nearly all models repeat the same "upper, digit, symbol, lower" character-class pattern (a short sketch after the list below shows how these statistics can be computed):

  • Anthropic models lock position 0 firmly: claude-opus always starts lowercase (100%), claude-haiku and claude-sonnet-4.6 always start uppercase (100%).
  • Llama models are 99–100% uppercase at position 0.
  • GPT-4.1-mini is 92% uppercase at position 0.
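
Both kinds of measurement are straightforward to reproduce. Here is a minimal Python sketch using a tiny illustrative sample, not passwords from the actual 8,000-password dataset:

from collections import Counter

def uniqueness(passwords: list[str]) -> float:
    """Share of distinct values in a generated sample."""
    return len(set(passwords)) / len(passwords)

def position0_classes(passwords: list[str]) -> Counter:
    """Distribution of character classes at position 0."""
    def klass(c: str) -> str:
        if c.isupper():
            return "upper"
        if c.islower():
            return "lower"
        if c.isdigit():
            return "digit"
        return "symbol"
    return Counter(klass(p[0]) for p in passwords if p)

# Illustrative sample only -- not passwords from the study's dataset.
sample = ["Kx9mP2vQ8nR5tY7w", "Kx9mP2vQ8nR5tY7w", "Xa7#bQ2$mN9@pL4!"]
print(f"{uniqueness(sample):.0%} unique")  # 67% unique
print(position0_classes(sample))           # Counter({'upper': 3})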

All models exhibit a strong statistical deviation from random. The most common substrings per model (factor vs. random in parentheses):

  • gpt-5.2: the 7! bigram in 52% of passwords (x4.5k) and the vQ7!mZ substring in 6% (x41B)
  • Mistral-medium-3.1: the x7#pL9 substring in 65% of passwords (x448B)
  • Llama-3.3-70b-instruct: the 8d bigram in all passwords, and Gx#8dL in 96% — the worst score of all models.

Interestingly, some substrings are shared across multiple providers (a sketch after this list shows how such enrichment factors can be approximated):

  • The L2 bigram appears in passwords from 10 of the 11 providers, with an average probability of 27% (x114)
  • The #kL9 substring appears in passwords from 4 providers (mistralai, deepseek, qwen, openai) at an average probability of 13% (x954M)
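
One way to read the "(factor vs. random)" figures is as an enrichment factor: the observed share of passwords containing a substring, divided by the chance that a uniformly random password of the same length would contain it. A rough sketch follows; the 94-character alphabet and 16-character length are our assumptions, so the factors it produces won't exactly match the ones reported above:

ALPHABET_SIZE = 94  # printable ASCII minus space -- an assumption, not necessarily the study's charset
PASSWORD_LEN = 16   # assumed password length

def observed_share(passwords: list[str], sub: str) -> float:
    """Fraction of passwords containing the substring."""
    return sum(sub in p for p in passwords) / len(passwords)

def random_baseline(sub: str, length: int = PASSWORD_LEN) -> float:
    """Approximate chance that a uniformly random password contains the substring."""
    p_here = (1 / ALPHABET_SIZE) ** len(sub)  # substring at one given position
    positions = length - len(sub) + 1
    return 1 - (1 - p_here) ** positions      # at least one position matches

# 27 of 100 passwords contain "L2" -- an illustrative sample, not real data.
sample = ["aL2xxxxxxxxxxxxx"] * 27 + ["bXXXXXXXXXXXXXXX"] * 73
print(f"x{observed_share(sample, 'L2') / random_baseline('L2'):.0f}")  # x159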

Fighting robots with a rusty sword

The previous results suggested that LLM-generated passwords could be modeled with Markov chains, a mathematical model introduced by Russian mathematician Andrei Markov in 1906, more than a century before LLMs.

For password recognition, a Markov chain can be as simple as:

  • One state for each character in the password alphabet
  • Transitions set to the probability of encountering a character after the current state

A Markov chain trained with the passwords: PASS, P@SS, PA$$, etc
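
To make that concrete, here is a minimal character-level chain in Python, trained on the caption's example passwords. This is a sketch of the general technique, not the study's exact implementation:

from collections import Counter, defaultdict

START = "^"  # synthetic start-of-password state (our convention)

def train_chain(passwords: list[str]) -> dict[str, dict[str, float]]:
    """Count character transitions, then normalize each row into probabilities."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for pw in passwords:
        prev = START
        for ch in pw:
            counts[prev][ch] += 1
            prev = ch
    return {state: {ch: n / sum(nxt.values()) for ch, n in nxt.items()}
            for state, nxt in counts.items()}

chain = train_chain(["PASS", "P@SS", "PA$$"])
print(chain[START])  # {'P': 1.0}            -- every sample starts with P
print(chain["A"])    # {'S': 0.5, '$': 0.5}  -- after A, S and $ are equally likely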

We used the LLM-generated password sample to build multiple Markov chains:

  • One chain per selected model
  • One chain per model family or provider
  • One chain aggregating the whole LLM password dataset

Results (a scoring sketch follows):

  • The chains identify the right model in 55% of cases and the correct provider in 65% of cases.
  • The generic chain was, on average, half as surprised when seeing an LLM-generated password as when seeing a random value or generic password.
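
The "surprise" metric here is perplexity: the exponential of the average negative log-probability the chain assigns to each character of the password. The sketch below reuses the trainer from the previous snippet and attributes a password to whichever chain is least surprised by it; the 1e-6 floor for unseen transitions is an arbitrary smoothing choice of ours:

import math
from collections import Counter, defaultdict

def train_chain(passwords):  # same trainer as in the previous sketch
    counts = defaultdict(Counter)
    for pw in passwords:
        prev = "^"
        for ch in pw:
            counts[prev][ch] += 1
            prev = ch
    return {s: {c: n / sum(nxt.values()) for c, n in nxt.items()}
            for s, nxt in counts.items()}

def perplexity(chain, pw, floor=1e-6):
    """exp of the average negative log-probability; lower means less 'surprised'."""
    logp, prev = 0.0, "^"
    for ch in pw:
        logp += math.log(chain.get(prev, {}).get(ch, floor))
        prev = ch
    return math.exp(-logp / len(pw))

# Attribute a password to the per-model chain that is least surprised by it.
# The two training sets below are illustrative, not real model outputs.
chains = {"model_a": train_chain(["Kx9mP2vQ", "Kx9mQ2vP"]),
          "model_b": train_chain(["x7QpL2n9", "x7QpL9n2"])}
print(min(chains, key=lambda m: perplexity(chains[m], "x7QpL2n9")))  # model_b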

Hunting bot passwords in the wild

We classified a sample of passwords collected by GitGuardian's public monitoring platform: 34 million passwords observed on GitHub between November 2025 and March 2026.

In a conservative approach (sketched below), we considered a password LLM-generated if:

  • A model-specific chain predicts it with >75% confidence
  • A provider-specific chain predicts it with >75% confidence
  • The general chain sees the password with a perplexity level <100

(We excluded xAI models because they often generate non-random-looking passwords like P@ssw0rdS3cur3!2023, which would capture weak human-generated passwords.)
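
Read together, the criteria amount to a simple decision rule. The post's wording leaves the exact combination implicit, so treating the three checks as a conjunction is our assumption here, in keeping with the "conservative" framing:

MODEL_CONF_MIN = 0.75     # model-specific chain confidence threshold
PROVIDER_CONF_MIN = 0.75  # provider-specific chain confidence threshold
PERPLEXITY_MAX = 100      # general-chain perplexity ceiling

def is_llm_generated(model_conf: float, provider_conf: float,
                     general_perplexity: float) -> bool:
    # Assumption: all three criteria must hold ("conservative" classification).
    return (model_conf > MODEL_CONF_MIN
            and provider_conf > PROVIDER_CONF_MIN
            and general_perplexity < PERPLEXITY_MAX)

print(is_llm_generated(0.92, 0.88, 40.0))  # True  -- all signals agree
print(is_llm_generated(0.60, 0.88, 40.0))  # False -- model-level signal too weak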

With this method, we classified 28,000 passwords as LLM-generated. The most frequently predicted providers are Anthropic, Qwen, and Google, together representing 63% of all occurrences.

Provider      Count
Anthropic     7,951
Qwen          6,643
Google        3,184
OpenAI        2,812
Amazon        2,661
Mistral AI    1,710
Meta Llama    1,498
Cohere        1,405
DeepSeek        182
Microsoft        91

Anthropic passwords are the most certain candidates, with an average confidence level of 92%.

LLM-generated passwords have been committed consistently at an average rate of 1,500 per week during the study timeframe.

The passwords mostly appear in JSON files, but a significant proportion are hardcoded in source and configuration files. We found 1,800 .env files containing at least one LLM-generated secret, including application security keys, encryption keys, and passwords for third-party services.

# Company AWS Services Configuration
USE_AWS_SERVICES=true
AWS_REGION=us-east-1

# Database
DATABASE_PASSWORD=Kx9mP2vQ8nR5tY7w

A typical LLM-generated password used to connect to a database.

"lx01" = {
  name     = "lx01"
  password = "x7QpL2n9V8F5"
}

A typical LLM-generated password used as the default password for machines provisioned with Terraform.

The most frequent file extensions belong to configuration and source files.

Among the 8,700 commits containing a predicted Anthropic-generated password, 41 are explicitly marked as co-authored by Claude, confirming that AI agents can independently generate and hardcode passwords in code.

What this means in practice

LLM-generated passwords remain a small fraction of all leaked passwords. But two behaviors are worth noting.

Some people are using LLMs as password generators

We observed LLM-generated passwords used in connection strings to web and database services. Unless the coding agent configured those services, a user purposely asked an AI to generate their password.

Asking an AI agent to generate a password is bad practice. The password transits over the network, the LLM provider sees it, and it might end up in an agent log file, leaked before it even reaches the developer's machine. And as we've shown, those passwords are weak and mostly predictable.

AI agents autonomously generate and hardcode passwords

We observed LLM-generated passwords hardcoded in Terraform files and committed by AI agents. While not widespread, the behavior exists, and even if the code is never publicly leaked, the predictability of these passwords enables efficient online enumeration.

This risk should be taken into account when designing your AI security policy. Some of the passwords in this category were generated using flagship models, so the issue isn't limited to older or entry-level models.

Attacking and defending

Markov chains could also be used to attack LLM-generated passwords with much higher efficiency than brute-force. Common password-cracking tools already implement a Markov mode:

$ calc_stat opus_46.passwords opus_46.john.stats

$ vim john.conf
[Markov:opus46]
Statsfile = opus_46.john.stats
MkvLvl = 400
MkvMaxLen = 20
MkvMinLen = 16

$ john --markov=opus46 --stdout
k7$Lm9#Qx2vP!nW4rT&jR&j
k7$Lm9#Qx2vP!nW4rT&jR&8
[...]

John the Ripper generating Opus 4.6-looking passwords.

On the defender side:

  • LLMs should not be used as password generators. Use a vault, password manager, or dedicated tool.
  • Agent-generated passwords require tight guardrails. GitGuardian's ggshield now supports scanning AI agent hook events for secrets; setup is as simple as:
$ ggshield install -t cursor -m global
$ ggshield install -t claude -m global

Setting up ggshield to scan Claude and Cursor hook events for secrets.

It's a small step, but given what's at stake, it's one no team shipping AI-assisted code can afford to skip.
