TL;DR
- AI editors frequently output MD5 or SHA-1 for password hashing -- both broken for this purpose
- Root cause: training data from pre-2015 tutorials where MD5 was common practice
- Fix: bcrypt (cost 12) or argon2. Pick one. Three-line swap.
I was reviewing a friend's side project last month. Node.js backend, clean architecture, React frontend. He'd built the whole thing with Cursor over a weekend. The auth flow looked right at a glance -- passwords were stored as hashes, login worked, tests passed.
Then I saw the hashing function.
import hashlib
hashed = hashlib.md5(password.encode()).hexdigest() # CWE-328
MD5 is not a password hashing algorithm. It never was. It's a fast general-purpose digest that completes in under 1 microsecond per hash. On a consumer GPU, an attacker can run 60 billion MD5 hashes per second. At that rate, every entry in the rockyou2024 wordlist -- roughly 10 billion passwords -- can be tried against an unsalted hash in under a second.
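To make the speed problem concrete, here is a minimal sketch of a dictionary attack against an unsalted MD5 hash. The leaked hash, the four-word list, and the `crack_md5` helper are all hypothetical and exist only for illustration; dedicated cracking tools run the same loop on a GPU at far higher rates.

```python
import hashlib
from typing import Iterable, Optional


def crack_md5(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose MD5 hex digest equals target_hex, or None."""
    for candidate in candidates:
        if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


# Hypothetical leaked hash and a toy wordlist (a real list has billions of entries).
leaked = hashlib.md5(b"hunter2").hexdigest()
wordlist = ["password", "123456", "hunter2", "letmein"]

print(crack_md5(leaked, wordlist))
```

Because there is no salt and no work factor, the only cost to the attacker is one cheap digest per guess.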
Why AI Editors Keep Generating This
It's not a model bug. It's a training data artifact.
The internet has thousands of Python and PHP tutorials from 2008-2015 that use MD5 for password storage. Stack Overflow answers with thousands of upvotes. Old framework docs. These pages rank well in search, which means they were in the training corpus.
When you type "hash user password Python", the model pattern-matches against the most common code it's seen -- and that code is frequently from a decade ago. The model has no concept of "this was common but is now a known vulnerability". It sees frequency. MD5 appears often next to "password hash" in training data, so it appears often in output.
SHA-1 shows up for the same reason. Slightly slower than MD5, equally broken for this use case.
The Vulnerable Pattern
# CWE-328: Use of Weak Hash -- AI-generated output
import hashlib

def hash_password(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest() # ❌

def verify_password(password: str, stored_hash: str) -> bool:
    return hashlib.md5(password.encode()).hexdigest() == stored_hash # ❌
This is the exact output Cursor gives when you prompt "add password hashing to this Flask endpoint" with no other context. The JavaScript version is equally common:
// CWE-328 -- same problem
const crypto = require('crypto');
const hash = crypto.createHash('md5').update(password).digest('hex'); // ❌
The Fix
bcrypt is the safe default. Deliberately slow, well-audited, supported in every language:
import bcrypt

def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12)) # ✅

def verify_password(password: str, stored_hash: bytes) -> bool:
    return bcrypt.checkpw(password.encode(), stored_hash) # ✅
The rounds=12 work factor means each hash takes roughly 250ms on typical server hardware, and every +1 to the cost doubles that. A GPU that cracks MD5 at 60B hashes/sec is reduced to ~1,000 bcrypt hashes/sec. The economics of brute force collapse.
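A quick back-of-the-envelope calculation shows what that slowdown means in practice. The rates below are the figures quoted in this post, not measurements, and the 10-billion wordlist size is the rockyou2024 number from earlier; treat the result as an order-of-magnitude sketch.

```python
# Figures quoted in this post (assumptions, not benchmarks).
WORDLIST_SIZE = 10_000_000_000   # ~10 billion candidate passwords
MD5_RATE = 60_000_000_000        # MD5 hashes per second on a consumer GPU
BCRYPT_RATE = 1_000              # bcrypt (cost 12) hashes per second on the same GPU

# Time to try every candidate against ONE stored hash.
md5_seconds = WORDLIST_SIZE / MD5_RATE
bcrypt_seconds = WORDLIST_SIZE / BCRYPT_RATE
bcrypt_days = bcrypt_seconds / 86_400  # seconds per day

print(f"MD5:    {md5_seconds:.2f} seconds")
print(f"bcrypt: {bcrypt_days:.0f} days")
```

With these numbers the full wordlist takes about 0.17 seconds against an MD5 hash and about 116 days against a single bcrypt hash. Since bcrypt salts every hash, that cost is paid again for each user.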
For new projects, argon2 (specifically the Argon2id variant) is the current OWASP recommendation:
from argon2 import PasswordHasher
ph = PasswordHasher()
hashed = ph.hash(password) # ✅
ph.verify(hashed, password) # ✅ raises VerifyMismatchError on a wrong password
Node.js:
const bcrypt = require('bcrypt');
// await needs an async function in CommonJS
async function hashAndCheck(password) {
  const hash = await bcrypt.hash(password, 12); // ✅
  return bcrypt.compare(password, hash); // ✅ resolves to true
}
One migration note: if you have existing MD5 hashes in your database, you cannot re-hash them -- you don't have the plaintext. The correct path is re-hashing on next login. User logs in, verify against MD5, then immediately replace with bcrypt. Active users migrate naturally. Dormant accounts get a forced password reset.
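The re-hash-on-login flow can be sketched roughly as follows. Everything here is hypothetical: the in-memory `users` dict stands in for your database, and `hash_password` / `verify_password` use the standard library's `hashlib.scrypt` purely as a stand-in so the sketch runs without third-party packages. In a real codebase you would swap in the bcrypt (or argon2) functions shown above.

```python
import hashlib
import hmac
import os

# Hypothetical user table: username -> stored hash string (one legacy MD5 row).
users = {"alice": hashlib.md5(b"hunter2").hexdigest()}


def hash_password(password: str) -> str:
    # Stand-in for bcrypt.hashpw: salted, deliberately slow KDF from the stdlib.
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return f"scrypt${salt.hex()}${key.hex()}"


def verify_password(password: str, stored: str) -> bool:
    # Stand-in for bcrypt.checkpw: recompute with the stored salt and compare.
    _, salt_hex, key_hex = stored.split("$")
    salt = bytes.fromhex(salt_hex)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(key.hex(), key_hex)


def is_legacy_md5(stored: str) -> bool:
    # An MD5 hex digest is exactly 32 lowercase hex characters.
    return len(stored) == 32 and all(c in "0123456789abcdef" for c in stored)


def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    if is_legacy_md5(stored):
        legacy = hashlib.md5(password.encode()).hexdigest()
        if not hmac.compare_digest(legacy, stored):
            return False
        # The plaintext is only available right now, so upgrade the row immediately.
        users[username] = hash_password(password)
        return True
    return verify_password(password, stored)
```

The format check is what lets old and new hashes coexist in one column. Real bcrypt hashes start with a `$2` prefix, so the same length-and-charset test distinguishes them from MD5 digests.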
Also -- add this to your .cursorrules: "For all password hashing, use bcrypt cost 12 or argon2. Never use MD5, SHA-1, or SHA-256 for password storage." One instruction. The model follows it every time.
I've been running SafeWeave for this. It hooks into Cursor and Claude Code as an MCP server and flags these patterns before I move on. That said, even a basic pre-commit hook with semgrep and gitleaks will catch most of what's in this post. The important thing is catching it early, whatever tool you use.
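If you want something even smaller than semgrep to start with, a crude check along these lines can run in a pre-commit hook. This is a hypothetical regex-based sketch, not how semgrep or SafeWeave work: the patterns and the `find_weak_hashes` name are my own, and it will also flag legitimate non-password uses of MD5 such as file checksums.

```python
import re
from typing import List, Tuple

# Patterns for the Python and Node.js calls shown in this post.
WEAK_HASH_PATTERNS = [
    re.compile(r"hashlib\.(md5|sha1)\s*\("),
    re.compile(r"createHash\(\s*['\"](md5|sha1)['\"]\s*\)"),
]


def find_weak_hashes(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line using MD5 or SHA-1."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in WEAK_HASH_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


sample = "import hashlib\nh = hashlib.md5(pw.encode()).hexdigest()\n"
for lineno, line in find_weak_hashes(sample):
    print(f"line {lineno}: {line}")
```

A hook would read each staged file, call `find_weak_hashes` on its contents, and exit non-zero if anything is returned.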