Oarcom
I Used a Newsroom Privacy Tool to Audit 100+ Adult Sites. Here's How the Stack Works.

Blacklight, VirusTotal, Supabase, and a lot of uncomfortable browser tabs.


This started because I wanted to build an adult site review platform that wasn't just opinions. Every "best porn sites" list on the internet is the same thing — ten affiliate links, a paragraph about "great content," and zero verifiable data. I wanted scan results. Numbers. Something a reader could cross-check.

The problem: there's no standardized privacy scanning pipeline for adult websites. Nobody's built one. The tools exist in pieces across different contexts — journalism, security research, browser extension development — but nobody's stitched them together for this specific use case.

So I did. Here's the stack, the methodology, and the non-obvious problems I ran into along the way.


The scanning tool: Blacklight

Blacklight is an open-source real-time privacy inspector built by The Markup, a nonprofit investigative newsroom. It loads a URL in a headless browser and detects:

  • Third-party trackers — scripts loaded from external domains that track user behavior
  • Third-party cookies — cookies set by domains other than the one you're visiting
  • Canvas fingerprinting — using the HTML5 Canvas API to generate a unique device identifier
  • Session recording — scripts that replay your mouse movements, clicks, and scrolls (think FullStory, Hotjar)
  • Keystroke capture — logging form field input before submission
  • Facebook/Google tracking pixels — specific integrations with ad platforms

For this project, the relevant outputs are trackers, cookies, fingerprinting, session recording, and keystroke capture. Facebook/Google pixels aren't meaningful in the adult space — those platforms ban adult advertisers, so the pixels are rarely present.

What Blacklight doesn't catch

This matters more than what it does catch.

Blacklight measures third-party tracking. It does not detect first-party analytics (server-side logging, Plausible, Matomo). Every website with a server collects access logs — IP addresses, timestamps, user agents, referrers. Blacklight can't see that because it's server-side. A site returning 0 trackers and 0 cookies in Blacklight is not collecting "no data." It's collecting no data via third-party scripts in your browser.

That distinction matters, and I've had to explain it in roughly 40 reviews so far.
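To make the first-party point concrete: even a site with zero third-party scripts records something like this for every request. This is a standard nginx combined-format access-log line (IP, path, and values are illustrative):

```text
203.0.113.9 - - [12/Jan/2026:14:03:22 +0000] "GET /video/12345 HTTP/1.1" 200 48231 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
```

IP address, timestamp, the exact page requested, referrer, user agent. No browser-side scanner will ever see this layer.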


The malware layer: VirusTotal

Blacklight tells you about tracking. It doesn't tell you about malware. For that I use VirusTotal's URL scanner — it submits the URL to 90+ antivirus engines and returns a consensus score.

Every site I've scanned came back 0/94 or close to it. The malware threat on major adult platforms in 2026 is effectively zero. The actual malware risk in the adult ecosystem lives in the ad network — pop-ups, redirects, and interstitials that route through sketchy ad exchanges. The sites themselves are clean. The ads around them aren't always.

This is why uBlock Origin is the single most impactful browser extension for adult site safety. It kills the ad layer entirely.
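If you want to automate the VirusTotal step, the v3 API addresses a URL report by the URL's unpadded base64url encoding, so you can fetch an existing report without re-submitting. A minimal sketch for Node 18+ (the `"malicious/total"` formatting is my own convention for the `vtScore` field, not something VT returns):

```typescript
const VT_API = "https://www.virustotal.com/api/v3";

// VirusTotal v3 identifies a URL report by the URL's unpadded base64url encoding.
function vtUrlId(url: string): string {
  return Buffer.from(url).toString("base64url");
}

// Fetch the latest analysis stats for a URL and format them like "0/94".
// Requires an API key (free tier: 4 requests/minute).
async function vtScore(url: string, apiKey: string): Promise<string> {
  const res = await fetch(`${VT_API}/urls/${vtUrlId(url)}`, {
    headers: { "x-apikey": apiKey },
  });
  if (!res.ok) throw new Error(`VirusTotal returned ${res.status}`);
  const body = await res.json();
  // Stats buckets: harmless / malicious / suspicious / undetected / timeout.
  const stats = body.data.attributes.last_analysis_stats;
  const total = Object.values(stats).reduce(
    (a: number, b) => a + (b as number),
    0
  );
  return `${stats.malicious}/${total}`;
}
```

The free tier's rate limit makes this slow for 100+ sites, but it's set-and-forget.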


The data model

I store everything in a TypeScript object (siteData.ts) that looks like this for each site:

{
  trackers: 2,
  cookies: 0,
  fingerprinting: false,
  sessionRecording: false,
  keystrokeCapture: false,
  billingDescriptor: 'Aylo/Probiller',
  paymentProcessor: 'Probiller',
  vtScore: '0/94',
  vtFlagged: false,
  monthlyVisits: null,
  topCountry: null,
  domainAge: '2000',
  paymentMethods: {
    creditCard: false,
    crypto: true,
    paypal: false,
    giftCard: false
  }
}
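For reference, the shape I type these records against. Field names mirror the object above; the `example` key and the nullable unions are my own illustration:

```typescript
interface PaymentMethods {
  creditCard: boolean;
  crypto: boolean;
  paypal: boolean;
  giftCard: boolean;
}

interface SiteData {
  trackers: number;
  cookies: number;
  fingerprinting: boolean;
  sessionRecording: boolean;
  keystrokeCapture: boolean;
  billingDescriptor: string | null;
  paymentProcessor: string | null;
  vtScore: string; // e.g. "0/94"
  vtFlagged: boolean;
  monthlyVisits: number | null;
  topCountry: string | null;
  domainAge: string; // registration year as a string
  paymentMethods: PaymentMethods;
}

// Keyed by slug, so every page reads the same record at build time.
const sites: Record<string, SiteData> = {
  example: {
    trackers: 2,
    cookies: 0,
    fingerprinting: false,
    sessionRecording: false,
    keystrokeCapture: false,
    billingDescriptor: "Aylo/Probiller",
    paymentProcessor: "Probiller",
    vtScore: "0/94",
    vtFlagged: false,
    monthlyVisits: null,
    topCountry: null,
    domainAge: "2000",
    paymentMethods: { creditCard: false, crypto: true, paypal: false, giftCard: false },
  },
};
```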

This feeds into everything — review pages, comparison tables, the privacy score tool, aggregate statistics on guide pages, and the sitemap. When a scan value changes, every page that references it updates at the next build. No manual propagation. No copy-paste errors.

The site runs on Next.js (App Router, server components), deployed on Vercel, with a self-hosted Umami instance (backed by Supabase) for analytics. The data layer is deliberately not in a database — it's a TypeScript file that gets bundled at build time. For 100 sites this is faster than any database query and the DX is simpler: edit the file, push, done.

At 1,000+ sites I'll probably need to migrate to Supabase, but premature database architecture is how side projects die.


The non-obvious problems

Adult sites break headless browsers differently

Blacklight runs a headless Chromium instance. Some adult sites detect headless browsers and serve different content — fewer ads, different tracking scripts, or outright blocks. This means the scan may undercount trackers on sites that fingerprint the scanner itself.

I've dealt with this by scanning each site multiple times and comparing results. If a site returns 0/0 on one run and 3/2 on the next, something is being served conditionally. I report the higher number with a note.
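The "report the higher number" rule reduces to a trivial merge over repeated runs (my own convention, not part of Blacklight):

```typescript
interface ScanCounts {
  trackers: number;
  cookies: number;
}

// Across repeated scans of the same site, keep the worst (highest) counts:
// headless detection can only hide trackers from the scanner, not invent them.
function worstCase(runs: ScanCounts[]): ScanCounts {
  return runs.reduce((acc, r) => ({
    trackers: Math.max(acc.trackers, r.trackers),
    cookies: Math.max(acc.cookies, r.cookies),
  }));
}

// worstCase([{ trackers: 0, cookies: 0 }, { trackers: 3, cookies: 2 }])
// → { trackers: 3, cookies: 2 }
```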

Paywall gating changes the scan surface

A site like Brazzers has a landing page (what Blacklight scans) and a members-only area (what it can't reach). The tracking infrastructure behind the login may be completely different. Blacklight scans the public-facing homepage. This is a limitation I document for every paid site.

For free sites, the scan surface is the actual user experience. For paid sites, it's the marketing layer. Keep that in mind when comparing free tubes (full scan surface) to premium studios (partial scan surface).

Cookie counts are noisy

A site with 7 cookies isn't necessarily worse than a site with 1 cookie. One session management cookie is benign. Seven analytics and marketing cookies are not. But Blacklight reports the count, not the function. I've started doing manual cookie audits on the worst offenders to break down what each cookie actually does, but for the aggregate data, the count is the reported metric.

Billing data requires a subscription

I can't scan what a billing descriptor looks like without actually subscribing. For some sites I've subscribed, noted the descriptor, and canceled. For others I rely on user reports and processor documentation. The billing descriptor field in my data model has varying confidence levels — some confirmed firsthand, some sourced from forums and reviews.


What 100+ scans taught me about the adult industry's privacy landscape

The data clusters into three tiers:

Tier 1 — Clean (0-1 trackers, 0-1 cookies, no invasive tech): This includes platforms you wouldn't expect. XNXX — one of the top 50 most visited websites globally — returns 0/0. Chaturbate, the biggest cam site, returns 0 trackers and 1 cookie. OnlyFans returns 0/0. Literotica returns 0/0. The clean tier is bigger than most people assume.

Tier 2 — Standard (2-3 trackers, 0-3 cookies, no invasive tech): Most major sites land here. Pornhub at 2/0. XHamster at 2/0. Erika Lust at 2/0. This is the baseline for sites running standard analytics and ad attribution without anything aggressive.

Tier 3 — Heavy (4+ trackers, 4+ cookies, or invasive tech present): This is where it gets interesting. Ashley Madison at 5 trackers and 7 cookies. Fansly at 6/6. Promptchan AI at 3/12. LiveJasmin with session recording AND keystroke capture. Ersties at 3 trackers and 9 cookies. The sites in this tier have business models or organizational structures that demand more data from users.

The pattern: the heaviest tracking doesn't correlate with the sketchiest sites. It correlates with the most complex business models. VC-backed startups, sites with affiliate programs, platforms running multiple payment processors — these are the ones that light up the scan. A simple tube site with ads runs cleaner than a "premium ethical" platform with investors, because the tube doesn't need cohort analytics for board presentations.
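The three tiers reduce to a small classifier. The boundaries below are my reading of the data — edge cases like 0 trackers with 3 cookies fall to Tier 2:

```typescript
type Tier = 1 | 2 | 3;

interface ScanResult {
  trackers: number;
  cookies: number;
  fingerprinting: boolean;
  sessionRecording: boolean;
  keystrokeCapture: boolean;
}

function tierOf(s: ScanResult): Tier {
  const invasive = s.fingerprinting || s.sessionRecording || s.keystrokeCapture;
  if (s.trackers >= 4 || s.cookies >= 4 || invasive) return 3; // Heavy
  if (s.trackers <= 1 && s.cookies <= 1) return 1;             // Clean
  return 2;                                                    // Standard
}
```

Invasive tech is an automatic Tier 3 regardless of counts — session recording on a site with 2 trackers is worse than 5 trackers of ad attribution.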


The tool I built from this data

The scan data feeds a public tool at nsfwranker.com/tools/privacy-score. Type a site name, get the Blacklight results. No signup. No paywall. The dataset covers 100+ sites now and scales to 1,000+ without code changes (it reads from siteData.ts dynamically).

The guide pages — adult site privacy rankings, does Pornhub track you, cam site privacy report — all pull from the same data source. One TypeScript file. Every page reflects the current scan state at build time.

If you're building something similar in a different vertical (health sites, kids' apps, news outlets), the methodology is directly transferable. Blacklight is open-source. VirusTotal has a free API tier. The hard part isn't the tooling — it's scanning enough sites to make the patterns visible.


Replicate this yourself

  1. Go to themarkup.org/blacklight
  2. Enter any URL
  3. Wait 30-60 seconds for the headless browser to load and scan
  4. Read the results — trackers, cookies, fingerprinting, session recording, keystroke capture

That's it. No account needed. No API key. The tool is free and public.

For batch scanning, The Markup has published the Blacklight source code — you can run it locally against a list of URLs and collect results programmatically. I haven't done this yet (manual scans for 100 sites was tedious but manageable), but it's the obvious next step for scaling.
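The programmatic step would look something like this: collapse each raw inspection result into the counts and flags the data model needs. The field names here (`third_party_trackers`, `canvas_fingerprinters`, `session_recorders`, `key_logging`) are my recollection of the collector's JSON output and may not match the current schema exactly — check the repo before relying on them:

```typescript
// Hedged, assumed shape of one Blacklight inspection result.
interface Inspection {
  third_party_trackers: string[];
  cookies: string[];
  canvas_fingerprinters: string[];
  session_recorders: string[];
  key_logging: string[];
}

// Collapse a raw inspection into the counts/flags used in siteData.ts.
function summarize(i: Inspection) {
  return {
    trackers: i.third_party_trackers.length,
    cookies: i.cookies.length,
    fingerprinting: i.canvas_fingerprinters.length > 0,
    sessionRecording: i.session_recorders.length > 0,
    keystrokeCapture: i.key_logging.length > 0,
  };
}
```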

VirusTotal: virustotal.com. Paste any URL. Free. 4 scans/minute on the free API tier if you want to automate.


The full dataset and all reviews: nsfwranker.com

Source code for the scanning methodology and data model is not open-source (yet), but the tools it relies on are. If enough people want a standalone privacy-scanning pipeline for any vertical, I'll consider packaging it.


Tags: privacy security webdev javascript opensource
