MarkPy

Python Guide: How to Detect If a Domain Is a Scam

Shopping online and signing up for new websites are everyday activities, but so is stumbling across scam domains. These shady sites may take your money, steal sensitive info, or vanish after operating for only a few weeks. So, how can you tell if a domain is sketchy—before you get burned?

In this complete guide, you’ll learn how to use Python to automatically screen websites for scam signals. You’ll see why and how each check works, get a working script, learn to interpret results, and discover how to tailor it for your needs.

What You'll Learn

  1. Why scam domains are so hard to spot
  2. Which technical signals matter most—and why
  3. How to fetch WHOIS, DNS, HTTPS, and content info in Python
  4. A full script that weighs each check to give you a risk score
  5. How to interpret results wisely (and avoid false positives)
  6. Where to look for deeper verification or more advanced checks

Why Are Scam Domains So Common?

Scammers can create a slick web store or fake landing page in minutes. Most use:

  • Cheap or free domains registered in the last year, often just weeks ago
  • WHOIS privacy shields to hide their real identity
  • No real email setup—just a web form, if that
  • Broken or missing HTTPS
  • Aggressive sales or big discounts (to lure impulse buyers)
  • Almost no “real” policy pages, social proof, or company footprint

Many legitimate startups show some of these signals at first, of course. But the more red flags you spot together, the higher the risk.

Red Flags You Can Automatically Check

  • Domain Age: Was the domain registered in the last few months? Scam sites often run on newly registered domains; the script treats anything under 90 days old as very new.
  • WHOIS Privacy: When a domain owner hides behind privacy services (like WhoisGuard, DomainsByProxy), you can’t verify them.
  • No MX Record: Real businesses usually have a public email setup. Scam sites often don’t bother.
  • HTTPS/SSL: No HTTPS or expired certificates are big trust issues.
  • Suspicious On-Page Content: Language like “70% off today only!” and generic “secure checkout” badges are classic scam tactics.
  • Missing or Fake Contact/Policy Pages: If there’s no easy way to reach out, or refund and privacy policies are missing or copy-pasted, beware.

No single signal is proof of a scam, but several together raise the odds considerably.
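
The HTTPS bullet mentions expired certificates, so here is a rough sketch of how a certificate-expiry check could look using only the standard library. The helper names `fetch_not_after` and `days_until_expiry` are made up for this example and are not part of the script in this guide.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=10):
    """Return the certificate's 'notAfter' string for host (needs network access).

    The default context verifies the chain and hostname, so an already-expired
    or mismatched certificate raises ssl.SSLCertVerificationError instead.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a 'notAfter' string such as 'Jan  5 09:34:43 2018 GMT' to whole days left."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)
```

Calling `days_until_expiry(fetch_not_after("example.com"))` returns the whole days remaining. A small number may deserve a closer look, though short-lived certificates are also normal on sites that renew automatically, so treat it as one more weak signal rather than a verdict.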

What you'll need:

  • Python 3.7 or above
  • Required libraries: python-whois, requests, beautifulsoup4, dnspython (2.0 or later, since the script calls dns.resolver.resolve), tldextract

Install them with:

pip install python-whois requests beautifulsoup4 dnspython tldextract

The Python Code Explained

Below is a script that does the following:

  • Fetches WHOIS info to check domain age and privacy
  • Checks DNS records for email (MX)
  • Tries to fetch homepage using HTTPS (and falls back to HTTP)
  • Scrapes for suspicious text (flash sales, missing policies, trust badges)
  • Combines evidence into a risk score and verdict

Pass the target domain as a command-line argument when you run it.

import re
import json
import whois
import requests
import dns.resolver
from bs4 import BeautifulSoup
from datetime import datetime, timezone
import tldextract

HEADERS = {"User-Agent": "Mozilla/5.0 (DomainRisk/0.1)"}
TIMEOUT = 10

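def domain_age_days(w):
    """Return the domain's age in days, or None if WHOIS has no usable creation date."""
    created = w.get("creation_date")
    # Some registries return several creation dates; use the first one.
    if isinstance(created, list):
        created = created[0] if created else None
    if not isinstance(created, datetime):
        return None
    # Treat naive timestamps as UTC so the subtraction is well defined.
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - created).days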

def whois_privacy(w):
    text = " ".join(str(w.get(k, "")).lower() for k in ["registrar","org","name"])
    return any(t in text for t in ["privacy","proxy","whoisguard","redacted","withheld"])

def resolve_dns(domain):
    """Look up A and MX records; a failed lookup leaves that record list empty."""
    out = {"A": [], "MX": []}
    for rtype in out:
        try:
            out[rtype] = [r.to_text() for r in dns.resolver.resolve(domain, rtype)]
        except Exception:
            pass  # NXDOMAIN, no answer, timeout: all count as "no records"
    return out

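def fetch(url):
    """Return the page body, or None on any request failure or error status."""
    try:
        # requests verifies TLS certificates by default, so an expired or
        # mismatched certificate raises here and the HTTPS fetch returns None.
        r = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
        if 200 <= r.status_code < 400:
            return r.text
    except Exception:
        pass
    return None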

def text_signals(html):
    soup = BeautifulSoup(html, "html.parser")
    text = re.sub(r"\s+", " ", soup.get_text(" ").lower())
    signals = {
        "aggressive_discounts": bool(re.search(r"\b(\d{2,3})% off\b|flash sale|limited time", text)),
        "no_contact_info": not any(k in text for k in ["contact us","email","phone","address"]),
        "no_returns_policy": not any(k in text for k in ["refund","returns","return policy"]),
    }
    trust_imgs = [img for img in soup.find_all("img", alt=True) if "trust" in img.get("alt","").lower()]
    signals["trust_badges_unverified"] = any(img.parent.name != "a" for img in trust_imgs)
    return signals

def risk_score(signals):
    weights = {
        "domain_very_new": 20,
        "whois_privacy": 5,
        "no_mx": 4,
        "no_https": 8,
        "aggressive_discounts": 10,
        "no_contact_info": 10,
        "no_returns_policy": 8,
        "trust_badges_unverified": 8,
    }
    return sum(weights[k] for k, v in signals.items() if v and k in weights)

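def analyze(domain):
    # tldextract also accepts full URLs and strips the scheme and path.
    ext = tldextract.extract(domain)
    norm = ".".join(p for p in [ext.domain, ext.suffix] if p)
    if ext.subdomain: norm = f"{ext.subdomain}.{norm}"

    # WHOIS lookups can raise (no match, network error); fall back to no data.
    try:
        w = whois.whois(norm) or {}
    except Exception:
        w = {}
    age = domain_age_days(w)
    # Named dns_records so the local does not shadow the imported dns module.
    dns_records = resolve_dns(norm)
    # Request the HTTPS page once and reuse it; try HTTP only if that fails.
    https_html = fetch(f"https://{norm}")
    https_ok = https_html is not None
    html = https_html or fetch(f"http://{norm}")

    signals = {
        # An unknown age is scored as "very new", which can inflate the
        # score when WHOIS data is simply unavailable.
        "domain_very_new": age is None or age < 90,
        "whois_privacy": whois_privacy(w),
        "no_mx": len(dns_records.get("MX", [])) == 0,
        "no_https": not https_ok,
    }
    if html:
        signals.update(text_signals(html))

    score = risk_score(signals)
    band = "High Risk" if score >= 50 else ("Moderate Risk" if score >= 30 else "Lower Risk")
    return {
        "domain": norm,
        "age_days": age,
        "dns": dns_records,
        "signals": signals,
        "risk_score": score,
        "risk_band": band,
    }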

if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: python scan.py <domain>")
        raise SystemExit(1)
    print(json.dumps(analyze(sys.argv[1]), indent=2))
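
The aggressive-discount check in text_signals() is a single regular expression run over the lowercased page text. The small sketch below, using made-up sample strings and a hypothetical helper called `flags_discount`, shows what that pattern does and does not catch.

```python
import re

# Same pattern the script applies to the lowercased page text.
DISCOUNT_RE = re.compile(r"\b(\d{2,3})% off\b|flash sale|limited time")

# Made-up sample phrases, not taken from any real site.
samples = [
    "70% off today only!",
    "Flash sale ends at midnight",
    "5% off your first order",
    "Made from 100% cotton",
    "Free shipping for a limited time",
]


def flags_discount(text):
    """Return True if the text would trip the aggressive_discounts signal."""
    return bool(DISCOUNT_RE.search(text.lower()))


for s in samples:
    print(f"{flags_discount(s)!s:5}  {s}")
```

The first two samples and the last one are flagged; the single-digit discount and the "100% cotton" line are not. The last sample is the interesting one: "limited time" is common on perfectly ordinary shops, which is one reason this signal only contributes part of the score instead of deciding the outcome.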

Running the Script

  1. Save your script as scan.py.
  2. Open your terminal.
  3. Type: pip install python-whois requests beautifulsoup4 dnspython tldextract
  4. To test a domain, type: python scan.py example.com

You’ll get a clear JSON output with:

  • Domain age
  • DNS status (especially MX/email)
  • Aggressive discount detection, missing policies, unverified trust badges
  • A risk_score and one of: High Risk, Moderate Risk, Lower Risk
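
To make the score easier to read, here is a small worked example of how the weights add up. The weights and thresholds are copied from the script above; the signal values and the helper name `score_and_band` are invented for illustration.

```python
# Weights copied from risk_score() in the script above.
WEIGHTS = {
    "domain_very_new": 20,
    "whois_privacy": 5,
    "no_mx": 4,
    "no_https": 8,
    "aggressive_discounts": 10,
    "no_contact_info": 10,
    "no_returns_policy": 8,
    "trust_badges_unverified": 8,
}


def score_and_band(signals):
    """Sum the weights of every signal that fired and map the total to a band."""
    score = sum(WEIGHTS[k] for k, v in signals.items() if v and k in WEIGHTS)
    band = "High Risk" if score >= 50 else ("Moderate Risk" if score >= 30 else "Lower Risk")
    return score, band


# Hypothetical result: a very new store with a discount banner and no contact details.
example = {
    "domain_very_new": True,
    "whois_privacy": False,
    "no_mx": False,
    "no_https": False,
    "aggressive_discounts": True,
    "no_contact_info": True,
    "no_returns_policy": False,
    "trust_badges_unverified": False,
}

print(score_and_band(example))
```

This prints `(40, 'Moderate Risk')`: 20 for the new domain plus 10 each for the discount language and the missing contact details. The maximum possible score is 73. Note that when the homepage cannot be fetched at all, the four content signals are never computed, so the score tops out at 37 and an unreachable site can reach Moderate Risk at most.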

Real World Example

Want to see this in action? Here’s a review where an automatic “robot” check was part of catching a likely scam:
CKlinen.com: A Scam Fashion Store Review

Conclusion

With a little Python and the right checks, you can screen a domain for common scam signals in seconds. Treat the score as a prompt for a closer look, not as proof either way. Stay curious, share what works, and help keep friends and family safe when they shop online.

If you have suggestions or want to contribute improvements to this script, leave a comment or open a GitHub gist—every bit helps in the fight against online fraud!
