DEV Community

Spicy

Data Brokers: What They Collect, How the Industry Works, and How to Opt Out at Scale

Most developers know abstractly that data brokers exist. Fewer have actually looked up their own profile and seen what's there — their home address, every previous address, relatives' names and addresses, income estimate, vehicle history, court records, and consumer interest categories.

Here's how the data pipeline actually works, what a profile contains at the data level, and how to approach opt-outs at scale rather than one form at a time.


How Data Broker Pipelines Work

Data brokers aggregate from three primary source categories:

Public records: property records, voter registration, court filings, professional licenses, business registrations, marriage/divorce records, death records. Most of these are legally public in the US, though access rules vary by state and differ widely in other countries. Brokers ingest them continuously via bulk data agreements with county, state, and federal agencies.

Commercial data: purchase history from retailers (via loyalty programs and direct sales), subscription records, warranty registrations, financial transaction metadata (purchased from banks and credit card processors), insurance records, telecommunications data.

Third-party data: scraped from social media and the public web, purchased from other data brokers (the industry extensively resells to itself), or purchased from app developers who include data-sharing SDKs.

The aggregation logic:

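# Simplified version of identity resolution logic
# (what brokers call "entity resolution" or "data matching")
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional

THRESHOLD = 0.6  # illustrative cutoff, not a value any broker is known to use

@dataclass
class Record:
    name: str
    address: Optional[str] = None
    dob: Optional[str] = None
    phone: Optional[str] = None

@dataclass
class PersonCluster:
    names: set = field(default_factory=set)
    addresses: set = field(default_factory=set)
    phones: set = field(default_factory=set)
    dob: Optional[str] = None

    def merge(self, record: Record) -> None:
        self.names.add(record.name)
        if record.address: self.addresses.add(record.address)
        if record.phone: self.phones.add(record.phone)
        self.dob = self.dob or record.dob

def fuzzy_name_match(name: str, known: set, cutoff: float = 0.85) -> bool:
    # Stand-in for a real fuzzy matcher: character-level similarity ratio
    return any(SequenceMatcher(None, name.lower(), k.lower()).ratio() >= cutoff
               for k in known)

def identity_match_score(record: Record, cluster: PersonCluster) -> float:
    score = 0.0
    if fuzzy_name_match(record.name, cluster.names): score += 0.4
    if record.address in cluster.addresses: score += 0.3
    if record.dob is not None and record.dob == cluster.dob: score += 0.2
    if record.phone in cluster.phones: score += 0.1
    return score

def resolve_identity(records: list[Record]) -> list[PersonCluster]:
    """
    Match records across sources using probabilistic
    identity resolution: name + address + DOB + phone
    combinations weighted by confidence score
    """
    clusters: list[PersonCluster] = []
    for record in records:
        for cluster in clusters:
            if identity_match_score(record, cluster) > THRESHOLD:
                cluster.merge(record)
                break
        else:  # no existing cluster matched, so start a new one
            new_cluster = PersonCluster()
            new_cluster.merge(record)
            clusters.append(new_cluster)
    return clusters  # each cluster is one resolved person profile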

This is why a data broker profile contains people you've lived with — shared address history creates a probabilistic link that their systems treat as a relationship signal.
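
The shared-address link described above can be sketched in a few lines. This is a minimal illustration, not any broker's actual method: the function name `infer_associates`, the input shape, and the person "Alex Doe" are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def infer_associates(address_history: dict) -> dict:
    """Link any two people who appear at the same address (hypothetical helper)."""
    residents = defaultdict(set)  # address -> people seen at that address
    for person, addresses in address_history.items():
        for address in addresses:
            residents[address].add(person)

    associates = defaultdict(set)  # person -> people they share an address with
    for people in residents.values():
        for a, b in combinations(sorted(people), 2):
            associates[a].add(b)
            associates[b].add(a)
    return dict(associates)


history = {
    "Jane Smith": {"123 Main St, Austin TX 78701", "456 Oak Ave, Denver CO 80203"},
    "Robert Smith": {"123 Main St, Austin TX 78701"},
    "Alex Doe": {"456 Oak Ave, Denver CO 80203"},  # made-up former roommate
}
links = infer_associates(history)
print(sorted(links["Jane Smith"]))
```

Note that this sketch ignores dates entirely, so two people who lived at the same address years apart would still be linked. A system that checked for overlapping date ranges would presumably produce fewer false associations.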


What's Actually in a Profile (Data Schema)

A typical commercial data broker profile at the API level:

{
  "person": {
    "names": ["Jane Smith", "Jane A. Smith", "Jane Adams"],
    "dob_range": {"min": "1985-01-01", "max": "1985-12-31"},
    "phones": ["+15551234567", "+15559876543"],
    "emails": ["jane@gmail.com", "jane.smith@oldwork.com"]
  },
  "locations": [
    {
      "address": "123 Main St, Austin TX 78701",
      "type": "current",
      "confidence": 0.94,
      "date_range": {"from": "2021-03", "to": "present"}
    },
    {
      "address": "456 Oak Ave, Denver CO 80203", 
      "type": "previous",
      "confidence": 0.87,
      "date_range": {"from": "2018-06", "to": "2021-02"}
    }
  ],
  "associates": [
    {
      "name": "Robert Smith",
      "relationship": "relative",
      "confidence": 0.76,
      "shared_addresses": ["123 Main St, Austin TX 78701"]
    }
  ],
  "financials": {
    "income_estimate": {"min": 75000, "max": 100000},
    "net_worth_estimate": {"min": 50000, "max": 150000},
    "homeowner": true,
    "property_value": 385000
  },
  "records": {
    "criminal": [],
    "civil": [],
    "bankruptcies": [],
    "liens": []
  },
  "consumer_segments": [
    "health_conscious_shopper",
    "frequent_traveler", 
    "suburban_homeowner",
    "political_donor_democrat"
  ]
}

The consumer_segments field is the advertising product — these interest/demographic categories are what marketers buy. The address and associate data is what stalkers, scammers, and PI firms buy.
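
To make that split concrete, here is a rough sketch that separates a profile shaped like the JSON above into the two views. The function name `split_profile` and the choice of which fields go in which view are assumptions for illustration, based only on the distinction drawn in the previous paragraph.

```python
def split_profile(profile: dict) -> tuple:
    """Separate the advertising view from the people-search view (hypothetical)."""
    ad_view = {"consumer_segments": profile.get("consumer_segments", [])}
    lookup_view = {
        "addresses": [loc["address"] for loc in profile.get("locations", [])],
        "associates": [a["name"] for a in profile.get("associates", [])],
    }
    return ad_view, lookup_view


# Trimmed copy of the sample profile shown earlier
profile = {
    "locations": [
        {"address": "123 Main St, Austin TX 78701", "type": "current"},
        {"address": "456 Oak Ave, Denver CO 80203", "type": "previous"},
    ],
    "associates": [{"name": "Robert Smith", "relationship": "relative"}],
    "consumer_segments": ["frequent_traveler", "suburban_homeowner"],
}
ad_view, lookup_view = split_profile(profile)
print(lookup_view["addresses"])
```

When reviewing your own profile, the lookup view is the one to prioritize for opt-outs, since it carries the physical-safety risk.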


The Opt-Out Landscape

A commonly cited estimate puts the number of data brokers at around 4,000. Manually opting out of each one is not realistic. The practical approach is tiered:

Tier 1 — High-traffic consumer-facing sites (manual opt-out, highest priority)

| Site | Opt-Out URL | Method | TTL |
|---|---|---|---|
| Spokeo | spokeo.com/optout | Email form | ~3-6 months |
| WhitePages | whitepages.com/suppression_requests | Web form | ~3-6 months |
| BeenVerified | beenverified.com/opt-out | Web form | ~3-6 months |
| MyLife | mylife.com | Phone call required | ~3-6 months |
| Radaris | radaris.com/page/privacy | Email form | ~3-6 months |
| Intelius | intelius.com/optout | Web form | ~3-6 months |

TTL = time before listing typically reappears from re-aggregation.
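
Because listings reappear, it helps to track when each opt-out was submitted and when to look again. The sketch below assumes a 90-day recheck interval, which is the low end of the ~3-6 month range. The names `TTL_DAYS`, `next_check`, and `due_for_recheck`, and the sample dates, are made up for illustration.

```python
from datetime import date, timedelta

TTL_DAYS = 90  # assumption: recheck at the low end of the ~3-6 month range


def next_check(opted_out_on: date, ttl_days: int = TTL_DAYS) -> date:
    """Date on which a listing should be searched for again."""
    return opted_out_on + timedelta(days=ttl_days)


def due_for_recheck(opt_outs: dict, today: date) -> list:
    """Sites whose recheck date has arrived, in alphabetical order."""
    return sorted(site for site, done in opt_outs.items()
                  if next_check(done) <= today)


# Sample submission dates (illustrative only)
opt_outs = {"Spokeo": date(2024, 1, 10), "Radaris": date(2024, 4, 1)}
due = due_for_recheck(opt_outs, today=date(2024, 5, 1))
print(due)
```

Running this from a weekly cron job or calendar reminder is usually enough, since the interval is measured in months.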

Tier 2 — Automated opt-out via paid services

Services like DeleteMe, Incogni, and Privacy Bee submit opt-outs across 100–750 brokers and resubmit on a schedule. Worth the cost if you're doing this for yourself or building it into a product for users.

Tier 3 — Enterprise data brokers (requires legal process)

Acxiom, LexisNexis, CoreLogic, Equifax (non-credit), and TransUnion Marketing serve enterprise customers and have different opt-out mechanisms. Acxiom has an opt-out at aboutthedata.com. LexisNexis requires a written request with ID verification. CCPA requests tend to get the fastest response from these sources.


Automating Opt-Out Submissions

For the manual tier, the process is repetitive and automatable for the sites that use web forms rather than email or phone:

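// Playwright automation for form-based opt-outs
// (Shown for educational purposes;
//  check each site's ToS before automating)

const { chromium } = require('playwright');

// The selectors below are illustrative. Inspect each live form and update
// them before running, since these pages change without notice.
async function submitOptOut(site, profileUrl, email) {
  // headless: false shows the browser window, which helps when debugging selectors
  const browser = await chromium.launch({ headless: false });
  try {
    const page = await browser.newPage();

    switch (site) {
      case 'spokeo':
        await page.goto('https://www.spokeo.com/optout');
        await page.fill('#email', email);
        await page.fill('#profile_url', profileUrl);
        await page.click('[type="submit"]');
        break;

      case 'radaris':
        await page.goto('https://radaris.com/page/privacy');
        await page.fill('input[name="email"]', email);
        await page.fill('input[name="url"]', profileUrl);
        await page.click('.submit-btn');
        break;

      default:
        // Fail loudly instead of silently skipping an unsupported site
        throw new Error(`No opt-out flow defined for site: ${site}`);
    }
  } finally {
    // Always release the browser, even if a selector or navigation fails
    await browser.close();
  }
}

// Submit sequentially with a randomized 2-5 s pause between requests
// to avoid triggering bot detection
async function batchOptOut(profiles) {
  for (const profile of profiles) {
    try {
      await submitOptOut(profile.site, profile.url, profile.email);
    } catch (err) {
      // Log and continue so one broken form does not stop the batch
      console.error(`Opt-out failed for ${profile.site}: ${err.message}`);
    }
    await new Promise(r => setTimeout(r, 2000 + Math.random() * 3000));
  }
}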

The main friction points: CAPTCHA on some forms, email confirmation required on most, and a few sites require the user to find their own profile URL first (can't just submit a name).
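
Since each of those friction points leaves a request in a different half-finished state, it can help to track status explicitly rather than treating a form submission as "done". The following is one possible sketch. The `Status` values, the `OptOutRequest` class, and the example profile URL are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    NEEDS_PROFILE_URL = "needs_profile_url"    # user must locate their listing first
    READY = "ready"
    BLOCKED_BY_CAPTCHA = "blocked_by_captcha"  # a human has to finish the form
    AWAITING_EMAIL = "awaiting_email"          # form sent, confirmation link not clicked
    DONE = "done"


@dataclass
class OptOutRequest:
    site: str
    profile_url: Optional[str] = None
    status: Status = Status.NEEDS_PROFILE_URL

    def __post_init__(self) -> None:
        # A request becomes submittable once a profile URL is known
        if self.profile_url:
            self.status = Status.READY

    def record_submission(self, hit_captcha: bool = False) -> None:
        if self.status is not Status.READY:
            raise ValueError(f"{self.site}: cannot submit while {self.status.value}")
        self.status = (Status.BLOCKED_BY_CAPTCHA if hit_captcha
                       else Status.AWAITING_EMAIL)

    def confirm_email(self) -> None:
        if self.status is not Status.AWAITING_EMAIL:
            raise ValueError(f"{self.site}: no confirmation pending")
        self.status = Status.DONE


# Placeholder URL, not a real listing
req = OptOutRequest("spokeo", profile_url="https://www.spokeo.com/example-profile")
req.record_submission()
req.confirm_email()
print(req.status.value)
```

Keeping `AWAITING_EMAIL` separate from `DONE` matters in practice: a submitted form whose confirmation link was never clicked generally does not remove anything.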


CCPA as a Lever

The CCPA applies to California residents. Residents of other states have no statutory rights under it, though some brokers honor their requests anyway. It gives covered individuals the right to:

  • Know what data is collected about them
  • Request deletion
  • Opt out of sale
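
If you are sending these requests to many brokers, a small template generator saves retyping. This is a sketch only: the wording is plain English rather than vetted legal language, and `build_request`, `RIGHTS`, and the broker name "ExampleBroker" are invented for the example.

```python
from string import Template

# Plain-English request text; not reviewed legal language
CCPA_TEMPLATE = Template(
    "To: $broker Privacy Team\n"
    "Subject: CCPA request from a California resident\n\n"
    "Under the California Consumer Privacy Act, I request that you:\n"
    "$requests\n\n"
    "Name: $name\n"
    "Email: $email\n"
)

# One line per right listed above
RIGHTS = {
    "know": "Disclose the personal information you have collected about me.",
    "delete": "Delete the personal information you hold about me.",
    "opt_out": "Stop selling my personal information.",
}


def build_request(broker: str, name: str, email: str,
                  rights: tuple = ("know", "delete", "opt_out")) -> str:
    """Render a request letter covering the selected rights."""
    lines = [f"{i}. {RIGHTS[r]}" for i, r in enumerate(rights, start=1)]
    return CCPA_TEMPLATE.substitute(
        broker=broker, requests="\n".join(lines), name=name, email=email
    )


print(build_request("ExampleBroker", "Jane Smith", "jane@example.com"))
```

Brokers usually ask for identity verification before acting, so expect a follow-up step after sending.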

Submitting a CCPA deletion request often gets a faster and more thorough response than the standard opt-out form, even from brokers that theoretically don't have to respond. Separately, enable the Global Privacy Control (GPC) signal in your browser. It is sent as an HTTP request header, and it is legally recognized as an opt-out-of-sale request in California and several other states:

// GPC is supported natively by Firefox and Brave.
// Page scripts can read the setting (it is read-only, not settable):
navigator.globalPrivacyControl // true if GPC enabled

// For server-side requests, send the header yourself:
const headers = { 'Sec-GPC': '1' };
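
If you are on the receiving side, for example building a product that handles user data, honoring the signal starts with a header check. A minimal sketch, assuming request headers arrive as a plain string mapping. The helper name `gpc_opt_out` is hypothetical.

```python
from typing import Mapping


def gpc_opt_out(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries the header `Sec-GPC: 1`."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive
        if key.lower() == "sec-gpc":
            return value.strip() == "1"
    return False


print(gpc_opt_out({"Sec-GPC": "1"}))
print(gpc_opt_out({"User-Agent": "example"}))
```

Most web frameworks already expose a case-insensitive header lookup, in which case the loop collapses to a single `get` call.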

The Realistic Picture

Manual opt-outs from the top 10-15 consumer-facing sites take about 2-3 hours and provide a meaningful short-term privacy improvement, particularly for address exposure. Expect the listings to reappear within 3-6 months as brokers re-aggregate, so the opt-outs need repeating.

The deeper problem is that the legal architecture in the US makes this a whack-a-mole exercise until federal privacy legislation passes. For users who need durable protection — domestic violence survivors, public figures, journalists — the paid services plus CCPA requests plus synthetic identity strategies (PO boxes, registered agents for property) are the more serious toolkit.

Consumer-facing explanation of the same topic, including the exact opt-out steps for each major site: lucas8.com/data-broker-opt-out-guide
