Building a Dark Pattern Detector: Scraping UI/UX Anti-Patterns at Scale

Dark patterns manipulate users into actions they didn't intend — from hidden subscription traps to guilt-tripping opt-outs. What if you could automatically detect them? In this guide, we'll build a Python tool that scrapes websites and flags common UI/UX anti-patterns.

What Are Dark Patterns?

Dark patterns are deceptive design choices: confirmshaming ("No thanks, I don't want to save money"), hidden costs revealed at checkout, roach motels (easy to enter, impossible to leave), and forced continuity. The EU's Digital Services Act and California's privacy regulator, the CPPA, now target these practices.

Architecture Overview

Our detector will:

  1. Crawl target pages and extract DOM structure
  2. Analyze text for manipulation patterns (confirmshaming, urgency)
  3. Detect visual tricks (hidden checkboxes, tiny unsubscribe links)
  4. Generate a dark pattern audit report
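
Each detector below returns a list of finding dicts that share a `type` and `severity` field plus pattern-specific context, so the report step can aggregate them uniformly. A sketch of that shape (the field names are this article's convention, not a standard):

# Shape of a single finding (illustrative values)
finding = {
    "type": "confirmshaming",                  # which anti-pattern matched
    "text": "No, I don't want to save money",  # the offending copy
    "severity": "high",                        # high / medium / low
}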

Setting Up the Scraper

import requests
from bs4 import BeautifulSoup
import re

API_KEY = "YOUR_SCRAPERAPI_KEY"

def scrape_page(url):
    payload = {
        "api_key": API_KEY,
        "url": url,
        "render": "true"  # JavaScript rendering for dynamic UIs
    }
    response = requests.get(
        "https://api.scraperapi.com", params=payload, timeout=60
    )
    response.raise_for_status()  # surface blocks/quota errors instead of parsing an error page
    return BeautifulSoup(response.text, "html.parser")

Using ScraperAPI handles JavaScript rendering — essential since dark patterns often rely on dynamic UI elements.
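
If you'd rather render locally than go through an API, a headless browser works too. A minimal sketch with Playwright (my own substitution; the rest of the article assumes the ScraperAPI version above):

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_page_local(url):
    # Render JS with headless Chromium, then parse with BeautifulSoup
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return BeautifulSoup(html, "html.parser")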

Detecting Confirmshaming

Confirmshaming uses guilt to push users away from opting out:

SHAME_PATTERNS = [
    r"no\s*,?\s*i\s+don'?t\s+want",
    r"i('?m|\s+am)\s+(not interested|fine without)",
    r"no\s+thanks,?\s+i('?d|\s+would)\s+rather",
    r"i\s+prefer\s+not\s+to",
    r"i\s+don'?t\s+(need|like|care)",
]

def detect_confirmshaming(soup):
    findings = []
    for link in soup.find_all(["a", "button", "span"]):
        text = link.get_text(strip=True).lower()
        for pattern in SHAME_PATTERNS:
            if re.search(pattern, text):
                findings.append({
                    "type": "confirmshaming",
                    "text": link.get_text(strip=True),
                    "element": str(link)[:200],
                    "severity": "high"
                })
                break  # one finding per element, even if several patterns match
    return findings
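
A quick sanity check on hand-written markup (the snippet is made up, not from a real site):

sample = BeautifulSoup(
    "<a href='#'>No thanks, I don't want to save money</a>",
    "html.parser",
)
print(detect_confirmshaming(sample))
# [{'type': 'confirmshaming', 'text': "No thanks, I don't want to save money", ...}]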

Detecting Hidden Pre-Checked Boxes

Pre-checked checkboxes for newsletters or terms are a classic trick:

def detect_prechecked_boxes(soup):
    findings = []
    for checkbox in soup.find_all("input", {"type": "checkbox"}):
        if checkbox.get("checked") is not None:
            label = ""
            label_el = soup.find("label", {"for": checkbox.get("id", "")})
            if label_el:
                label = label_el.get_text(strip=True)
            if any(kw in label.lower() for kw in
                   ["newsletter", "marketing", "partner", "third party", "agree"]):
                findings.append({
                    "type": "pre_checked_consent",
                    "label": label,
                    "severity": "high"
                })
    return findings
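
The same trick is used to hide checkboxes outright. Static HTML only exposes inline styles, so this companion heuristic (my own addition, in the article's spirit) catches the crude cases; styles applied from external CSS need a rendered browser:

HIDDEN_STYLE_RE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|opacity\s*:\s*0",
    re.IGNORECASE,
)

def detect_hidden_checkboxes(soup):
    findings = []
    for checkbox in soup.find_all("input", {"type": "checkbox"}):
        # Inline styles only; external stylesheets are invisible to bs4
        if HIDDEN_STYLE_RE.search(checkbox.get("style", "")):
            findings.append({
                "type": "hidden_checkbox",
                "element": str(checkbox)[:200],
                "severity": "high",
            })
    return findings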

Urgency and Scarcity Detection

Fake urgency ("Only 2 left!") pressures users into impulse decisions:

URGENCY_PATTERNS = [
    r"only\s+\d+\s+(left|remaining|available)",
    r"(offer|deal|sale)\s+(ends|expires)\s+(soon|today|in\s+\d+)",
    r"\d+\s+people\s+(are\s+)?(viewing|watching|looking)",
    r"limited\s+(time|stock|availability)",
    r"act\s+(now|fast|quickly)",
    r"don'?t\s+miss\s+(out|this)",
]

def detect_urgency(soup):
    findings = []
    page_text = soup.get_text()
    for pattern in URGENCY_PATTERNS:
        matches = re.finditer(pattern, page_text, re.IGNORECASE)
        for match in matches:
            findings.append({
                "type": "urgency_scarcity",
                "text": match.group(),
                "severity": "medium"
            })
    return findings
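
Countdown widgets are another common urgency cue. A companion heuristic that keys off class and id naming conventions (the name patterns are a guess; real markup varies widely):

def detect_countdown_timers(soup):
    findings = []
    name_re = re.compile(r"countdown|timer", re.IGNORECASE)
    # bs4 tests the regex against each class token and against the id
    candidates = soup.find_all(attrs={"class": name_re}) + soup.find_all(id=name_re)
    for el in candidates:
        findings.append({
            "type": "countdown_timer",
            "element": str(el)[:200],
            "severity": "medium",
        })
    return findings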

Running the Full Audit

def audit_dark_patterns(url):
    soup = scrape_page(url)
    results = {
        "url": url,
        "confirmshaming": detect_confirmshaming(soup),
        "prechecked_boxes": detect_prechecked_boxes(soup),
        "urgency_scarcity": detect_urgency(soup),
    }
    total = sum(len(v) for v in results.values() if isinstance(v, list))
    results["total_findings"] = total
    results["risk_level"] = (
        "high" if total > 5 else "medium" if total > 2 else "low"
    )
    return results

# Audit multiple e-commerce sites
targets = [
    "https://example-store.com/checkout",
    "https://example-saas.com/pricing",
]
for target in targets:
    report = audit_dark_patterns(target)
    print(f"{report['url']}: {report['total_findings']} findings ({report['risk_level']})")
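
To make the "audit report" step concrete, a small helper that persists each run as JSON so results can be diffed over time (the file-naming scheme is just a suggestion):

import json
from datetime import datetime, timezone

def save_report(report):
    # One timestamped JSON file per audit run
    path = f"audit-{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)
    return path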

Scaling with Proxy Rotation

When auditing many sites, rotate proxies to avoid blocks. ThorData provides residential proxies with geo-targeting — useful when dark patterns vary by region. ScrapeOps offers a proxy aggregator that auto-rotates between providers.
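
Whichever provider you choose, the plumbing on the requests side looks the same. A minimal per-request rotation sketch with placeholder proxy URLs (substitute your provider's gateway and credentials):

import random
import requests

PROXIES = [  # placeholders, not real endpoints
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]

def fetch_via_proxy(url):
    proxy = random.choice(PROXIES)
    return requests.get(
        url, proxies={"http": proxy, "https": proxy}, timeout=60
    )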

Extending the Detector

Consider adding:

  • Cookie consent analysis: Detect reject buttons that are harder to find than the accept button (a starter sketch follows this list)
  • Price comparison: Scrape the same product from different sessions to detect dynamic pricing
  • Accessibility dark patterns: Hidden elements only visible to screen readers
  • Screenshot comparison: Use Playwright to capture visual hierarchy differences
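
For the first idea, a crude starting point (the keyword lists are mine and deliberately short): flag pages that offer an accept control with no reject control among the same button-like elements.

ACCEPT_RE = re.compile(r"\b(accept|agree|allow)\b", re.IGNORECASE)
REJECT_RE = re.compile(r"\b(reject|decline|refuse|deny)\b", re.IGNORECASE)

def detect_consent_asymmetry(soup):
    # If accepting takes one click but no reject control is visible,
    # the opt-out path is probably buried behind extra screens.
    buttons = soup.find_all(["button", "a"])
    has_accept = any(ACCEPT_RE.search(b.get_text()) for b in buttons)
    has_reject = any(REJECT_RE.search(b.get_text()) for b in buttons)
    if has_accept and not has_reject:
        return [{
            "type": "consent_asymmetry",
            "note": "accept control found, no visible reject control",
            "severity": "medium",
        }]
    return []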

Legal and Ethical Notes

Dark pattern detection is a legitimate audit tool. Many compliance teams use similar techniques to ensure their own sites meet DSA and CPPA requirements. Always respect robots.txt and terms of service.


Dark patterns cost consumers billions annually. Building detection tools isn't just technically interesting — it contributes to a more transparent web. The techniques here can be adapted for compliance auditing, competitive analysis, or consumer advocacy.

Happy scraping!
