Charles Kern
Why Cursor Keeps Generating MD5 Password Hashes in 2026

TL;DR

  • AI editors surface MD5 hashing from training data dominated by 2008-2014 tutorials
  • MD5 hashes crack in milliseconds on modern GPUs -- any breach becomes full password exposure
  • Fix: one import swap to bcrypt (Python) or argon2id (Node) -- no architecture changes needed

I was reviewing a side project a friend built with Cursor. Flask backend, JWT auth, clean structure. It looked solid. Then I got to the password module.

hashlib.md5(password.encode()).hexdigest()

MD5. In 2026. About to go live.

He hadn't written it -- Cursor had. He'd accepted it because it ran, the tests passed, and the login flow worked. Nothing flagged a problem. Why would it? The code is syntactically correct. It's just cryptographically broken.

The Vulnerable Pattern (CWE-328)

Here's the exact output from Cursor on a basic auth route:

import hashlib

def hash_password(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()  # CWE-328

def verify_password(password: str, stored_hash: str) -> bool:
    return hash_password(password) == stored_hash

CWE-328: use of a weak cryptographic hash for passwords. MD5 is a checksum algorithm designed to be fast. That speed is the problem. A modern GPU computes 10 billion MD5 hashes per second. A 6-character password cracks in milliseconds. A full user database dump becomes a near-complete plaintext list within hours.
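You don't need a GPU rig to see the speed problem. Here's a minimal sketch that counts how many MD5 digests a single CPU core grinds through per second using only the standard library; even unoptimized Python manages hundreds of thousands, and dedicated GPU crackers are orders of magnitude faster:

```python
import hashlib
import time

def md5_rate(seconds: float = 0.2) -> int:
    """Count MD5 digests computed per second on one CPU core."""
    count = 0
    pw = b"hunter2"
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        hashlib.md5(pw).hexdigest()
        count += 1
    return int(count / seconds)

print(f"~{md5_rate():,} MD5 hashes/sec on one CPU core, in pure Python")
```

A password hash that a scripting language can compute this quickly was never going to survive an attacker with purpose-built hardware.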

Why AI Keeps Writing This

MD5 password hashing was the standard internet recommendation from roughly 2008 to 2014. Every PHP walkthrough, every "build a login system" tutorial from that era used it. Stack Overflow answers recommending md5() accumulated thousands of upvotes before the security community caught up.

LLMs train on that corpus. The pattern is embedded. "Hash a password in Python" statistically surfaces MD5 because MD5 dominated the training data. The model isn't being careless -- it's doing exactly what next-token prediction does. The problem is the training data is old and the internet doesn't auto-update bad tutorials when better practices emerge.

SHA-1 has the same issue. SHA-256 is a sound general-purpose hash but still wrong here -- it's built for speed, and password storage needs deliberate slowness.

The Fix

Replace MD5 with bcrypt. It's slow by design -- the work factor is the security property.

import bcrypt

def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt(rounds=12))

def verify_password(password: str, stored_hash: bytes) -> bool:
    return bcrypt.checkpw(password.encode('utf-8'), stored_hash)

rounds=12 is the current recommended baseline. Cost-12 takes ~250ms to hash -- imperceptible to a user, catastrophic for an attacker running a brute-force. The cost factor is stored in the hash itself, so increasing it later is backward compatible.
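You can see the self-describing format yourself: a bcrypt hash looks like `$2b$12$<22-char salt><31-char digest>`, with the cost factor sitting right in the string. A quick stdlib-only sketch (the hash value below is illustrative, not a real credential):

```python
# A bcrypt hash is self-describing: $<version>$<cost>$<salt+digest>
# The string below is an illustrative example, not a real stored credential.
sample = "$2b$12$R9h1cIPz0gi3XQ9aBmN0FeLWkTGv7Qxu2M5dEoYpJrs8KnCwHtZaa"

def bcrypt_cost(stored_hash: str) -> int:
    """Read the work factor embedded in a bcrypt hash string."""
    _, version, cost, _ = stored_hash.split("$")
    return int(cost)

print(bcrypt_cost(sample))  # 12
```

This is why raising the cost later is painless: `checkpw` reads the cost out of each stored hash, so old cost-12 hashes keep verifying while new signups get cost-13.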

For Node.js, argon2id is OWASP's current top recommendation:

const argon2 = require('argon2');

async function hashPassword(password) {
  return argon2.hash(password, { type: argon2.argon2id });
}

async function verifyPassword(storedHash, password) {
  return argon2.verify(storedHash, password);
}

Both are one-dependency swaps. Existing hashes need migration -- typically handled transparently on next login with a re-hash.
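The login-time migration is worth sketching, because it's the part people put off. Here's one minimal, hypothetical shape for it -- I'm using stdlib PBKDF2-HMAC-SHA256 as a stand-in for bcrypt so the sketch runs with no dependencies, and the `login`/`strong_hash` names are mine, not from any framework:

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # OWASP's current PBKDF2-HMAC-SHA256 baseline

def strong_hash(password: str) -> str:
    """Stand-in for bcrypt: stdlib PBKDF2 with a random per-user salt."""
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"pbkdf2${salt.hex()}${dk.hex()}"

def strong_verify(password: str, stored: str) -> bool:
    _, salt_hex, dk_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(dk.hex(), dk_hex)

def login(user: dict, password: str) -> bool:
    stored = user["password_hash"]
    if stored.startswith("pbkdf2$"):
        return strong_verify(password, stored)
    # Legacy MD5 row: verify the old way once, then upgrade in place.
    if hmac.compare_digest(hashlib.md5(password.encode()).hexdigest(), stored):
        user["password_hash"] = strong_hash(password)  # transparent migration
        return True
    return False
```

Users who never log in again keep their MD5 hashes, so pair this with an expiry: after a deadline, force a password reset for any row still in the legacy format.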

Do This Right Now

Grep your AI-generated code before anything ships:

grep -r "md5\|sha1" --include="*.py" --include="*.js" .

30 seconds. Either you're clean or you find something worth fixing before it matters.

I've been running SafeWeave for this. It hooks into Cursor and Claude Code as an MCP server and flags these patterns before I move on. That said, even a basic pre-commit hook with semgrep and gitleaks will catch most of what's in this post. The important thing is catching it early, whatever tool you use.
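If you go the semgrep route, a rule for this post's exact pattern is a few lines. A minimal sketch -- the rule id and message are mine, not from any published ruleset (semgrep's registry ships more thorough crypto rules):

```yaml
rules:
  - id: weak-password-hash-md5
    pattern: hashlib.md5(...)
    message: "MD5 is not suitable for password hashing (CWE-328); use bcrypt or argon2id"
    languages: [python]
    severity: ERROR
```

Drop it in a `.semgrep.yml`, wire it into a pre-commit hook, and the pattern from the top of this post never reaches a commit.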
