ShadowStrike

Posted on Apr 26

How to Build a HaveIBeenPwned Breach Auditor in Python

#python #security #cli #tutorial

Version 1.0.0

Data breaches happen constantly. When credentials from one breach get reused in credential-stuffing attacks against other services, the ripple effect can last years. That's why checking whether an email address or password has appeared in a known breach is a routine first step in any security assessment.

HaveIBeenPwned (HIBP) maintains one of the most comprehensive breach databases available, with over 12 billion compromised accounts indexed. In this tutorial, you'll build a Python CLI tool that checks email addresses and passwords against that database using the HIBP API, with a proper k-anonymity implementation to protect privacy.

What You'll Build

A command-line Python tool called hibp_auditor.py that:

Checks passwords using k-anonymity (your password never leaves your machine) — works without any API key
Checks email addresses against the HIBP breach database (requires paid API subscription)
Handles API rate limiting gracefully
Outputs results to console and optionally to a timestamped report file

Note: Password checking is free and works immediately. Email breach checking requires a paid HIBP API subscription (see haveibeenpwned.com/API/Key for current pricing).

Prerequisites

Python 3.6 or later
requests library (pip install requests)
No API key needed for password checking (uses k-anonymity)
Optional: Paid HIBP API subscription for email breach checking (haveibeenpwned.com/API/Key)

Understanding K-Anonymity

Before diving into code, it's worth understanding why password checking with HIBP is safe: the Pwned Passwords API is built on k-anonymity.

The problem: If you send your password to an API to check if it's compromised, you're... sending your password to an API. That's not great.

The solution: Instead of sending the full password, HIBP's Pwned Passwords API uses a clever technique:

You hash your password locally using SHA-1
You send only the first 5 characters of that hash to the API
The API returns all pwned password hashes that start with those 5 characters
You check locally whether your full hash is in that list

This means the API never sees your actual password or even your full hash. Each 5-character prefix is shared by hundreds of breached password hashes, so the API can't determine which specific password you're checking.

This is k-anonymity: your query is indistinguishable from k other possible queries, where k is large enough to preserve privacy.
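The four steps above can be tried offline before touching the network. This is a minimal sketch: the helper names split_sha1 and count_in_range are made up for illustration, and the response body is invented sample data in the "SUFFIX:COUNT" line format, not real HIBP output.

```python
import hashlib


def split_sha1(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, range_body):
    """Look up a hash suffix in a 'SUFFIX:COUNT' range response body."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip() == suffix:
            return int(count)
    return 0  # suffix not listed, so the hash is not in the data set


prefix, suffix = split_sha1("password")
print(prefix)  # 5BAA6 (the only part that would be sent to the API)

# Invented response body for illustration; the counts are not real HIBP data.
sample_body = "0000000000000000000000000000000000A:3\r\n" + suffix + ":42\r\n"
print(count_in_range(suffix, sample_body))  # 42
```

Only the 5-character prefix would ever go over the wire. The comparison against the remaining 35 characters happens entirely on your machine.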

The Complete Script

Here's the full implementation:

#!/usr/bin/env python3
"""
HaveIBeenPwned Breach Auditor
Purpose: Check email addresses and passwords against the HIBP breach database
Author: ShadowStrike (Strategos)
License: MIT
"""

import argparse
import hashlib
import requests
import sys
import time
from datetime import datetime

# HIBP API endpoints
HIBP_BREACH_API = "https://haveibeenpwned.com/api/v3/breachedaccount/{}"
HIBP_PASSWORD_API = "https://api.pwnedpasswords.com/range/{}"

def check_email_breaches(email, api_key=None):
    """
    Check if an email address appears in known data breaches.

    Args:
        email: Email address to check
        api_key: HIBP API key (required for email checks)

    Returns:
        List of breach dictionaries or None if error
    """
    if not api_key:
        print("[ERROR] Email breach checking requires an HIBP API key")
        print("[INFO] Get a key at: https://haveibeenpwned.com/API/Key")
        return None

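    # URL-encode the address before placing it in the request path
    url = HIBP_BREACH_API.format(requests.utils.quote(email, safe=''))
    headers = {
        'hibp-api-key': api_key,
        'user-agent': 'HIBP-Breach-Auditor'
    }
    # Full breach details (Domain, BreachDate, ...) are only returned when
    # truncateResponse is false; the default response contains names only
    params = {'truncateResponse': 'false'}

    try:
        response = requests.get(url, headers=headers, params=params, timeout=10)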

        if response.status_code == 200:
            return response.json()
        elif response.status_code == 404:
            return []  # No breaches found (good news!)
        elif response.status_code == 429:
            print("[ERROR] Rate limit exceeded - wait and try again")
            return None
        else:
            print(f"[ERROR] API returned status code: {response.status_code}")
            return None

    except requests.exceptions.RequestException as e:
        print(f"[ERROR] Network error: {e}")
        return None

def check_password_pwned(password):
    """
    Check if a password appears in known breaches using k-anonymity.

    This uses the Pwned Passwords API with k-anonymity - only the first 5 
    characters of the SHA-1 hash are sent to the API, protecting privacy.

    Args:
        password: Password to check (never sent to API in plain text)

    Returns:
        Tuple of (is_pwned: bool, count: int) or (None, None) if error.
        count is how many times the password appears in the HIBP data set.
    """
    # Hash the password locally
    sha1_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()

    # Send only the first 5 characters
    prefix = sha1_hash[:5]
    suffix = sha1_hash[5:]

    url = HIBP_PASSWORD_API.format(prefix)

    try:
        response = requests.get(url, timeout=10)

        if response.status_code == 200:
            # Parse the response - each line is "suffix:count"
            hashes = response.text.splitlines()

            for hash_line in hashes:
                if ':' in hash_line:
                    hash_suffix, count = hash_line.split(':')
                    if hash_suffix == suffix:
                        return (True, int(count))

            # Hash not found in response = password not pwned
            return (False, 0)

        elif response.status_code == 429:
            print("[ERROR] Rate limit exceeded")
            return (None, None)
        else:
            print(f"[ERROR] API returned status code: {response.status_code}")
            return (None, None)

    except requests.exceptions.RequestException as e:
        print(f"[ERROR] Network error: {e}")
        return (None, None)

def format_breach_info(breach):
    """Format a breach dictionary into readable output"""
    name = breach.get('Name', 'Unknown')
    domain = breach.get('Domain', 'N/A')
    breach_date = breach.get('BreachDate', 'Unknown')
    pwn_count = breach.get('PwnCount', 0)
    data_classes = ', '.join(breach.get('DataClasses', []))

    return f"""
  Breach: {name}
  Domain: {domain}
  Date: {breach_date}
  Accounts: {pwn_count:,}
  Data: {data_classes}
"""

def main():
    parser = argparse.ArgumentParser(
        description='Check email addresses and passwords against HaveIBeenPwned database',
        epilog='Example: python hibp_auditor.py --email test@example.com --api-key YOUR_KEY'
    )

    parser.add_argument('--email', type=str,
                        help='Email address to check for breaches')
    parser.add_argument('--password', type=str,
                        help='Password to check (uses k-anonymity - safe)')
    parser.add_argument('--api-key', type=str,
                        help='HIBP API key (required for email checks)')
    parser.add_argument('--output', type=str,
                        help='Write results to file (default: console only)')

    args = parser.parse_args()

    if not args.email and not args.password:
        parser.print_help()
        sys.exit(1)

    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    results = []
    results.append(f"HIBP Breach Audit Report - {timestamp}")
    results.append("=" * 60)
    results.append("")

    # Check email if provided
    if args.email:
        print(f"\n[*] Checking email: {args.email}")
        results.append(f"Email: {args.email}")

        breaches = check_email_breaches(args.email, args.api_key)

        if breaches is None:
            results.append("  Status: ERROR - Could not complete check")
        elif len(breaches) == 0:
            print("[OK] No breaches found - this email is clean!")
            results.append("  Status: CLEAN - No breaches found")
        else:
            print(f"[WARNING] Found in {len(breaches)} breach(es):")
            results.append(f"  Status: COMPROMISED - Found in {len(breaches)} breach(es)")
            results.append("")

            for breach in breaches:
                breach_info = format_breach_info(breach)
                print(breach_info)
                results.append(breach_info)

        results.append("")

        # Rate limiting courtesy
        if args.password:
            time.sleep(1.5)

    # Check password if provided
    if args.password:
        print("\n[*] Checking password (using k-anonymity)...")
        results.append("Password: [REDACTED]")

        is_pwned, count = check_password_pwned(args.password)

        if is_pwned is None:
            results.append("  Status: ERROR - Could not complete check")
        elif is_pwned:
            print(f"[WARNING] Password found in {count:,} breaches!")
            print("[ADVICE] This password is compromised - change it immediately")
            results.append(f"  Status: PWNED - Found in {count:,} breaches")
            results.append("  Advice: Change this password immediately")
        else:
            print("[OK] Password not found in known breaches")
            results.append("  Status: CLEAN - Not found in known breaches")

        results.append("")

    # Write to file if requested
    if args.output:
        try:
            with open(args.output, 'w', encoding='utf-8') as f:
                f.write('\n'.join(results))
            print(f"\n[*] Report written to: {args.output}")
        except IOError as e:
            print(f"\n[ERROR] Could not write to file: {e}")

    print("\n[*] Audit complete")

if __name__ == "__main__":
    main()

How to Use It

Installation

# Install dependencies
pip install requests


# Download the script
# (or clone from GitHub - link at end of tutorial)

# Linux/Mac only - make executable
chmod +x hibp_auditor.py

# Windows - no chmod needed, just run with python
python hibp_auditor.py --help

Security Note: Read the Code Before Running It

Before executing this or any Python script from the internet:

Read the source code completely — every function, every API call, every file operation
Verify the logic — does it do what it claims and nothing else?
Check for risks — unexpected network calls, file access, credential storage

Apply the ABC principle:

Assume nothing
Believe nothing
Check everything

Never execute code you haven't personally reviewed and understood, regardless of where it came from. You are the final security control.

Check a Password (No API Key Required)

python hibp_auditor.py --password "password123"

Sample output:

[*] Checking password (using k-anonymity)...
[WARNING] Password found in 2,254,650 breaches!
[ADVICE] This password is compromised - change it immediately

[*] Audit complete

Or with a strong password:

[*] Checking password (using k-anonymity)...
[OK] Password not found in known breaches

[*] Audit complete

Save Results to File

python hibp_auditor.py --email test@example.com --api-key YOUR_KEY --output report.txt

Check an Email Address (Requires Paid API Key)

Note: Email breach checking requires a paid HIBP API subscription. If you don't have an API key, the tool will show an error and direct you to the HIBP website.

python hibp_auditor.py --email test@example.com --api-key YOUR_API_KEY

Sample output (with valid API key):

[*] Checking email: test@example.com
[WARNING] Found in 2 breach(es):

  Breach: Adobe
  Domain: adobe.com
  Date: 2013-10-04
  Accounts: 152,445,165
  Data: Email addresses, Password hints, Passwords, Usernames

  Breach: LinkedIn
  Domain: linkedin.com
  Date: 2012-05-05
  Accounts: 164,611,595
  Data: Email addresses, Passwords

Code Walkthrough

Email Breach Checking

The check_email_breaches() function hits the HIBP API v3 endpoint:

url = HIBP_BREACH_API.format(email)
headers = {
    'hibp-api-key': api_key,
    'user-agent': 'HIBP-Breach-Auditor'
}
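params = {'truncateResponse': 'false'}  # request full breach details, not just names
response = requests.get(url, headers=headers, params=params, timeout=10)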

Key points:

Requires a paid API key, sent in the hibp-api-key header
Returns HTTP 404 if no breaches found (which we treat as good news)
Returns HTTP 429 if rate-limited (wait and retry)
Returns JSON array of breach objects if compromised

Password Checking with K-Anonymity

The check_password_pwned() function implements the k-anonymity protocol:

# Hash locally
sha1_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()

# Send only first 5 chars
prefix = sha1_hash[:5]
suffix = sha1_hash[5:]

url = HIBP_PASSWORD_API.format(prefix)
response = requests.get(url, timeout=10)

# Check if our full hash is in the response
hashes = response.text.splitlines()
for hash_line in hashes:
    hash_suffix, count = hash_line.split(':')
    if hash_suffix == suffix:
        return (True, int(count))

Why this is safe:

Password is hashed locally with SHA-1
Only the first 5 characters of the hash are sent
API returns ~500-1000 hash suffixes matching that prefix
We check locally if our full hash is in that list
The API never learns which specific hash we're checking

Rate Limiting Courtesy

The HIBP API has rate limits. The script implements courtesy delays:

if args.password:
    time.sleep(1.5)  # Wait between email and password checks

For production use checking multiple emails, implement exponential backoff when hitting 429 responses.
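Here is one way that backoff could look. Treat it as a sketch under a few assumptions: fetch_with_backoff is a hypothetical helper that is not part of the script above, the retry count and base delay are arbitrary starting points, and it assumes the 429 response may carry a retry-after header with a number of seconds.

```python
import time


def fetch_with_backoff(fetch, max_retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429 or retries run out.

    fetch is any zero-argument callable returning an object with
    .status_code and .headers, for example:
        lambda: requests.get(url, headers=headers, timeout=10)
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        if response.status_code != 429 or attempt == max_retries:
            return response
        # Prefer the server's retry-after hint (assumed to be in seconds);
        # otherwise double the delay on each attempt.
        retry_after = response.headers.get("retry-after")
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):
            delay = base_delay * (2 ** attempt)
        sleep(delay)
```

Passing the request in as a callable keeps the retry logic separate from the HTTP code, so the same wrapper could sit around both the email and password checks.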

Real-World Use Cases

Small team audit: Check all company email addresses to see who's been compromised:

for email in alice@company.com bob@company.com charlie@company.com; do
  python hibp_auditor.py --email "$email" --api-key YOUR_KEY --output "report_$email.txt"
  sleep 2  # Rate limiting courtesy
done

Password policy enforcement: Check candidate passwords against HIBP before accepting them in your system:

is_pwned, count = check_password_pwned(user_password)
if is_pwned and count > 100:
    return "This password appears in known breaches - choose another"

Incident response: When investigating a suspected breach, check if credentials have appeared in public dumps:

python hibp_auditor.py --email suspicious@victim.com --api-key YOUR_KEY

Security awareness training: Demonstrate to users how common their passwords are:

python hibp_auditor.py --password "password123"
# Reports how many times the password appears in HIBP's breach data

Extending the Script

Bulk email checking:
Add a --email-list parameter that reads from a CSV file and checks multiple addresses with proper rate limiting.

Domain-wide audit:
Integrate with your company directory (LDAP, Azure AD) to audit all @yourcompany.com addresses automatically.

Slack/Teams notifications:
Add webhook integration to alert security teams when compromised accounts are detected.

Password strength scoring:
Combine HIBP checking with zxcvbn or similar libraries to provide holistic password strength assessment.

MFA enforcement triggers:
Automatically enforce MFA for accounts found in breaches as part of your identity management workflow.
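For the bulk email checking idea above, the file-reading half might start out like this. It's a rough sketch that assumes the addresses sit in the first column of the CSV. The helper name read_email_list is hypothetical, and the sample data is made up.

```python
import csv
import io


def read_email_list(handle):
    """Yield unique, non-empty addresses from the first column of a CSV file."""
    seen = set()
    for row in csv.reader(handle):
        if not row:
            continue  # blank line
        email = row[0].strip().lower()
        # Skip blanks, a header cell such as "email", and repeats.
        if "@" not in email or email in seen:
            continue
        seen.add(email)
        yield email


sample = io.StringIO("email\nalice@company.com\nBob@company.com\n\nalice@company.com\n")
print(list(read_email_list(sample)))  # ['alice@company.com', 'bob@company.com']
```

Each address it yields could then be passed to check_email_breaches(), with a pause between requests to stay within the rate limit.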

Security Considerations

Never log passwords: The script outputs [REDACTED] instead of the actual password in reports. Maintain this practice.
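The same care applies to how the password gets into the script: a value passed with --password can end up in shell history and be visible in process listings. One possible mitigation, sketched here with a hypothetical resolve_password helper that is not in the script above, is to prompt without echo when no value is given on the command line.

```python
import getpass


def resolve_password(cli_value=None, prompt=getpass.getpass):
    """Return the password to check, prompting without echo if none was given."""
    if cli_value:
        return cli_value
    # getpass.getpass reads from the terminal without echoing the input.
    return prompt("Password to check: ")
```

In main(), the result of resolve_password(args.password) could be passed to check_password_pwned() in place of args.password. Note that this sketch as written would also prompt when only --email is supplied, so you would want a separate flag to opt in.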

API key protection: Store your HIBP API key in environment variables, not hardcoded in scripts:

export HIBP_API_KEY="your-key-here"
python hibp_auditor.py --email test@example.com --api-key "$HIBP_API_KEY"
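You could go one step further and let the script read the variable itself, so the key never appears on the command line at all. This is a sketch of that idea, not the script's current behaviour: build_parser is a hypothetical helper, and HIBP_API_KEY is simply the variable name used in the export above.

```python
import argparse
import os


def build_parser(env=os.environ):
    """Build a parser whose --api-key falls back to the HIBP_API_KEY variable."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--email", type=str)
    parser.add_argument(
        "--api-key",
        type=str,
        default=env.get("HIBP_API_KEY"),  # None if the variable is unset
        help="HIBP API key (defaults to the HIBP_API_KEY environment variable)",
    )
    return parser
```

An explicit --api-key still wins over the environment, which keeps the existing command lines working.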

TLS verification: The requests library verifies TLS certificates by default. Don't disable this.

Rate limits: Respect HIBP's rate limits. The Pwned Passwords API is provided for free, so don't abuse it.

GitHub Repository

The complete script, requirements.txt, and this tutorial are available on GitHub:

ShadowStrike-CTF / hibp-breach-auditor

A Python CLI tool to check email addresses against the HaveIBeenPwned breach database.


Conclusion

Password reuse and credential stuffing remain among the most effective attack vectors in 2026. Building a simple Python CLI tool that checks against HaveIBeenPwned's database gives you a practical way to assess exposure for individuals or small teams.

The k-anonymity implementation means you can check passwords safely without ever transmitting them, and the email breach checking provides immediate visibility into which accounts need attention.

For security teams, incident responders, and IT administrators working with small-to-medium organisations, this is a foundational tool whose password checks cost nothing to run and which provides immediate actionable intelligence.

Built by ShadowStrike (Strategos) — where we build actual security tools instead of theatre 🎃.
