The Markup built Blacklight as an investigative tool for journalists. It visits a URL with a headless browser, inventories every third-party script that loads, and classifies what those scripts do — ad tracking, fingerprinting, session recording, keystroke capture.
I used it to scan 96 websites and cross-referenced every URL with VirusTotal's 94-vendor malware detection. Here's what I learned about web surveillance at scale.
The Stack Behind Blacklight
Blacklight runs a headless Chromium instance and monitors:
- Third-party requests — any HTTP request to a domain different from the page's own domain
- Cookies — both first-party and third-party, with classification of known tracking cookies
- Canvas fingerprinting — detects when a site draws to a hidden canvas element and reads the pixel data back (a technique for generating a unique browser fingerprint without cookies)
- Session recording — detects scripts from known session replay services (FullStory, Hotjar, etc.) that capture mouse movements, clicks, scrolls, and DOM changes
- Keystroke logging — detects scripts that attach event listeners to keypress/keydown events and transmit the captured data
- Facebook Pixel, Google Analytics remarketing, TikTok Pixel — identified by their specific script signatures
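To make the first check concrete, here is a minimal sketch of a third-party test. It is my own approximation, not Blacklight's code: `site_key` and `is_third_party` are hypothetical names, and comparing the last two host labels is an assumption that breaks on suffixes like `.co.uk`. A real pipeline would use the Public Suffix List.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Approximate the registrable domain as the last two host labels.

    Assumption: good enough for illustration only; suffixes such as
    .co.uk need the Public Suffix List to be handled correctly.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def is_third_party(page_url: str, request_url: str) -> bool:
    """True when the request goes to a different site than the page."""
    return site_key(page_url) != site_key(request_url)
```

With this rule, a request from `www.example.com` to `cdn.example.com` counts as first-party, while a request to `tracker.example.net` counts as third-party.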
It's not a vulnerability scanner. It's a surveillance auditor. The distinction matters: a site can be malware-free (VirusTotal 0/94) while running 10 tracking scripts that report your behavior to advertising networks.
What 96 Scans Taught Me About Tracking
Distribution is bimodal
Sites cluster at either 0-2 trackers (clean) or 5-10+ trackers (heavy tracking). Very few sites sit at 3-4. The industry has roughly split into "we don't track" and "we track everything." The middle ground barely exists.
Cookies correlate weakly with trackers
You'd expect high tracker counts to mean high cookie counts. The correlation is weaker than I expected. Some sites load 6 trackers with 0 cookies (using fingerprinting instead). Others set 7 cookies with only 1 tracker (using first-party cookies for behavioral tracking without third-party scripts).
The takeaway: tracker count and cookie count measure different vectors. You need both to understand a site's surveillance posture.
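If you want to check this on your own scan data, a plain Pearson coefficient is enough. The sketch below is a generic implementation, not the exact calculation behind my numbers, and the sample lists are placeholders that only mirror the two patterns described above. They are not my scan results.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values (NOT real scan data): one entry per site.
trackers = [6, 1, 0, 8]
cookies = [0, 7, 1, 12]
r = pearson(trackers, cookies)
```

A value near 1 would mean cookies and trackers rise together. The further it sits from 1, the more the two counts are telling you different things.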
Session recording is rare but targeted
Out of 96 sites, only a handful had session recording scripts. But the sites that did have it were platforms where users input sensitive information — chat messages, payment details, personal preferences. Session recording on a static content site is invasive but limited in damage. Session recording on an interactive platform where users type private messages is a fundamentally different risk.
Canvas fingerprinting is more common than expected
Multiple sites used canvas fingerprinting, a technique that's harder to block than cookies and persists across browsing sessions. The technique works by rendering invisible text and graphics to a <canvas> element, reading the pixel data back, and hashing it. The hash depends on your combination of GPU, driver version, OS, browser, and font rendering, so it is often distinctive enough to act as a device fingerprint that doesn't require storage on your machine.
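The hashing step is simple to illustrate. In a real page the pixel bytes come from the browser's canvas readback. In this sketch they are stand-in byte buffers I made up, and `canvas_fingerprint` is a hypothetical name. SHA-256 is my choice here, since fingerprinting scripts can use any hash.

```python
import hashlib


def canvas_fingerprint(pixels: bytes) -> str:
    """Hash raw RGBA pixel bytes into a stable identifier."""
    return hashlib.sha256(pixels).hexdigest()


# Stand-in buffers: four RGBA pixels each. The second differs in a single
# channel of a single pixel, the kind of tiny rendering difference that
# separates one GPU/driver/font stack from another.
machine_a = bytes([255, 0, 0, 255] * 4)
machine_b = bytes([255, 0, 0, 255] * 3 + [254, 0, 0, 255])

same_device = canvas_fingerprint(machine_a) == canvas_fingerprint(machine_a)
other_device = canvas_fingerprint(machine_a) == canvas_fingerprint(machine_b)
```

The same buffer always hashes to the same value, and a one-bit rendering difference produces a different one. That is why nothing needs to be stored on your machine.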
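uBlock Origin blocks most known fingerprinting scripts. The Brave browser randomizes canvas readback by default, so the fingerprint changes instead of staying stable. Firefox's Enhanced Tracking Protection blocks scripts from known fingerprinting companies by default, but its stronger canvas protections are limited to stricter modes. Standard Chrome does neither.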
VirusTotal as a Complement
Blacklight tells you about surveillance. VirusTotal tells you about security. They answer different questions:
- Blacklight: "Is this site watching what I do?"
- VirusTotal: "Is this site trying to harm my device?"
The overlap is smaller than you'd think. A site can have 0/94 VirusTotal flags (no malware, no phishing) while loading 10 trackers, running session recording, and fingerprinting your browser. Conversely, a site with 1/94 VirusTotal flags might have a perfect 0/0 Blacklight scan — the flag often comes from the ad network, not the site itself.
For a complete assessment of any URL, you need both tools.
How to Run Your Own Scans
Blacklight: Visit themarkup.org/blacklight, enter any URL, click Scan. Results in about 30 seconds. No account needed. Free.
VirusTotal: Visit virustotal.com, paste a URL in the search tab. Results from 94 security vendors plus community score. Free, with API access for bulk scanning.
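For bulk lookups, the VirusTotal v3 API identifies a URL by its unpadded URL-safe base64 encoding. A minimal sketch of building that lookup request follows. The helper names are mine, the API key is a placeholder, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import base64
import urllib.request

VT_API = "https://www.virustotal.com/api/v3/urls/"


def vt_url_id(url: str) -> str:
    """URL identifier for the v3 API: URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def build_report_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an existing URL report."""
    return urllib.request.Request(
        VT_API + vt_url_id(url),
        headers={"x-apikey": api_key},  # the API key travels in this header
    )


# "YOUR_API_KEY" is a placeholder; substitute a real key before sending.
req = build_report_request("https://example.com/", "YOUR_API_KEY")
```

The JSON response should carry the per-vendor verdict counts under `data.attributes.last_analysis_stats`. The public API is rate limited, so throttle the loop when you run a whole list of URLs.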
Programmatic approach: VirusTotal has a public API. For Blacklight-style analysis at scale, you'd need to run your own headless browser pipeline: Puppeteer or Playwright with request interception to log third-party domains. The classification (which scripts are trackers vs. analytics vs. functional) is the hard part. The Markup has open-sourced Blacklight's collector as blacklight-collector on GitHub, which is worth reading before you write your own, and the EasyList and EasyPrivacy filter lists (used by uBlock Origin) provide a solid starting point for script classification.
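As a starting point for the classification step, here is a minimal sketch that matches request hostnames against the domain-anchor rules (`||domain^`) found in EasyList-style files. It is a deliberate simplification and not how uBlock Origin or Blacklight work internally: it ignores rule options, exception rules, and path patterns. The function names and the sample rules are made up for illustration.

```python
def parse_domain_rules(filter_text: str) -> set[str]:
    """Collect hostnames from plain '||host^' rules, ignoring everything else."""
    domains: set[str] = set()
    for raw in filter_text.splitlines():
        line = raw.strip()
        if not line.startswith("||"):
            continue  # skips comments (!), exceptions (@@) and path rules
        body = line[2:].split("$", 1)[0]  # drop options such as $third-party
        if not body.endswith("^"):
            continue
        host = body[:-1].lower()
        if host and all(c.isalnum() or c in ".-" for c in host):
            domains.add(host)
    return domains


def is_listed(hostname: str, domains: set[str]) -> bool:
    """True if the hostname or any of its parent domains is in the rule set."""
    labels = hostname.lower().split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


# Made-up rules in EasyList syntax, for illustration only.
SAMPLE = """\
! comment line
||tracker.example^
||ads.example^$third-party
@@||allowed.example^
/banner.js
"""
rules = parse_domain_rules(SAMPLE)
```

Feed `is_listed` the hostnames your request interceptor logs, and you get a first-pass tracker flag per request. Dropping the rule options makes this over-match compared with a real blocker, so treat the result as a candidate list rather than a verdict.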
The Project
I published all 96 scan results at NSFWRanker. Each site has an individual safety report with raw Blacklight data, VirusTotal score, and a privacy rating based solely on the scan results.
The scan data drives the editorial rankings.
If you're building something similar for a different vertical — health sites, fintech, news — the methodology transfers directly. Blacklight + VirusTotal + editorial context = a privacy audit framework that works on any category of website.
Questions about the scanning methodology? Drop them in the comments.