Most developers know abstractly that data brokers exist. Fewer have actually looked up their own profile and seen what's there — their home address, every previous address, relatives' names and addresses, income estimate, vehicle history, court records, and consumer interest categories.
Here's how the data pipeline actually works, what a profile contains at the data level, and how to approach opt-outs at scale rather than one form at a time.
How Data Broker Pipelines Work
Data brokers aggregate from three primary source categories:
Public records: property records, voter registration, court filings, professional licenses, business registrations, marriage/divorce records, death records. Most of these are legally public in the US; availability elsewhere varies by country. Brokers ingest them continuously via bulk data agreements with county, state, and federal agencies.
Commercial data: purchase history from retailers (via loyalty programs and direct sales), subscription records, warranty registrations, financial transaction metadata (purchased from banks and credit card processors), insurance records, and telecommunications data.
Third-party data: scraped from social media and the public web, purchased from other data brokers (the industry extensively resells to itself), or purchased from app developers who include data-sharing SDKs.
The aggregation logic:
```python
# Simplified version of identity resolution logic
# (what brokers call "entity resolution" or "data matching").
# THRESHOLD, PersonCluster, PersonProfile, and fuzzy_name_match are
# placeholders for illustration, not part of a real library.
def resolve_identity(records: list[dict]) -> list["PersonProfile"]:
    """
    Match records across sources using probabilistic
    identity resolution: name + address + DOB + phone
    combinations weighted by confidence score.
    """
    clusters = []
    for record in records:
        matched = False
        for cluster in clusters:
            if identity_match_score(record, cluster) > THRESHOLD:
                cluster.merge(record)
                matched = True
                break
        if not matched:
            # No existing cluster is close enough, so start a new one
            clusters.append(PersonCluster(record))
    return [cluster.to_profile() for cluster in clusters]


def identity_match_score(record: dict, cluster) -> float:
    """Sum fixed weights for each identifier the record shares with the cluster."""
    score = 0.0
    if fuzzy_name_match(record["name"], cluster.names):
        score += 0.4
    if record.get("address") in cluster.addresses:
        score += 0.3
    # Two missing DOBs must not count as agreement
    if record.get("dob") is not None and record.get("dob") == cluster.dob:
        score += 0.2
    if record.get("phone") in cluster.phones:
        score += 0.1
    return score
```
This is why a data broker profile contains people you've lived with — shared address history creates a probabilistic link that their systems treat as a relationship signal.
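The relationship signal itself can be sketched separately from the matching step. The following is a minimal illustration and not any broker's actual logic: the `Residency` record, the `infer_associates` function, and the rule that two stays must overlap in time are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Residency:
    """Hypothetical record: one person at one address for a span of months."""
    person: str
    address: str
    start: str  # "YYYY-MM"
    end: str    # "YYYY-MM"; "9999-12" stands in for "present"


def infer_associates(residencies: list[Residency]) -> set[tuple[str, str, str]]:
    """Return (person_a, person_b, address) for every pair of different
    people whose stays at the same address overlap in time."""
    links = set()
    for a, b in combinations(residencies, 2):
        same_place = a.address == b.address and a.person != b.person
        # Zero-padded "YYYY-MM" strings sort chronologically as plain strings
        overlaps = a.start <= b.end and b.start <= a.end
        if same_place and overlaps:
            first, second = sorted((a.person, b.person))
            links.add((first, second, a.address))
    return links


history = [
    Residency("Jane Smith", "123 Main St", "2021-03", "9999-12"),
    Residency("Robert Smith", "123 Main St", "2019-01", "2022-06"),
    Residency("Alex Doe", "123 Main St", "2015-01", "2018-12"),
]
print(infer_associates(history))
# {('Jane Smith', 'Robert Smith', '123 Main St')}
```

Alex Doe lived at the same address but moved out before either of the others arrived, so no link is created. A system that skipped the overlap check would also link previous tenants to current ones.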
What's Actually in a Profile (Data Schema)
A typical commercial data broker profile at the API level:
```json
{
  "person": {
    "names": ["Jane Smith", "Jane A. Smith", "Jane Adams"],
    "dob_range": {"min": "1985-01-01", "max": "1985-12-31"},
    "phones": ["+15551234567", "+15559876543"],
    "emails": ["jane@gmail.com", "jane.smith@oldwork.com"]
  },
  "locations": [
    {
      "address": "123 Main St, Austin TX 78701",
      "type": "current",
      "confidence": 0.94,
      "date_range": {"from": "2021-03", "to": "present"}
    },
    {
      "address": "456 Oak Ave, Denver CO 80203",
      "type": "previous",
      "confidence": 0.87,
      "date_range": {"from": "2018-06", "to": "2021-02"}
    }
  ],
  "associates": [
    {
      "name": "Robert Smith",
      "relationship": "relative",
      "confidence": 0.76,
      "shared_addresses": ["123 Main St, Austin TX 78701"]
    }
  ],
  "financials": {
    "income_estimate": {"min": 75000, "max": 100000},
    "net_worth_estimate": {"min": 50000, "max": 150000},
    "homeowner": true,
    "property_value": 385000
  },
  "records": {
    "criminal": [],
    "civil": [],
    "bankruptcies": [],
    "liens": []
  },
  "consumer_segments": [
    "health_conscious_shopper",
    "frequent_traveler",
    "suburban_homeowner",
    "political_donor_democrat"
  ]
}
```
The consumer_segments field is the advertising product — these interest/demographic categories are what marketers buy. The address and associate data is what stalkers, scammers, and PI firms buy.
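To make that split concrete, here is a small sketch that counts how many entries a profile carries in each group. The grouping constants and the `summarize_exposure` function are hypothetical, and the field names are taken from the example schema above rather than from any real broker API.

```python
import json

# Hypothetical grouping that mirrors the distinction drawn in the text
SAFETY_SENSITIVE = ("locations", "associates")
MARKETING = ("consumer_segments",)


def summarize_exposure(profile: dict) -> dict:
    """Count the entries in each top-level list field, grouped by buyer type."""
    def count(keys):
        # A missing field counts as zero entries
        return {key: len(profile.get(key, [])) for key in keys}

    return {
        "safety_sensitive": count(SAFETY_SENSITIVE),
        "marketing": count(MARKETING),
    }


sample = json.loads("""
{
  "locations": [{"address": "123 Main St, Austin TX 78701", "type": "current"}],
  "associates": [{"name": "Robert Smith", "relationship": "relative"}],
  "consumer_segments": ["frequent_traveler", "suburban_homeowner"]
}
""")
print(summarize_exposure(sample))
# {'safety_sensitive': {'locations': 1, 'associates': 1}, 'marketing': {'consumer_segments': 2}}
```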
The Opt-Out Landscape
Commonly cited estimates put the number of data brokers at around 4,000. Manually opting out of each one is not realistic. The practical approach is tiered:
Tier 1 — High-traffic consumer-facing sites (manual opt-out, highest priority)
| Site | Opt-Out URL | Method | TTL |
|---|---|---|---|
| Spokeo | spokeo.com/optout | Web form with email confirmation | ~3-6 months |
| WhitePages | whitepages.com/suppression_requests | Web form | ~3-6 months |
| BeenVerified | beenverified.com/opt-out | Web form | ~3-6 months |
| MyLife | mylife.com | Phone call required | ~3-6 months |
| Radaris | radaris.com/page/privacy | Web form with email confirmation | ~3-6 months |
| Intelius | intelius.com/optout | Web form | ~3-6 months |
TTL = time before listing typically reappears from re-aggregation.
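Because listings reappear, each opt-out needs a follow-up check. A minimal sketch of that bookkeeping is shown here; the 90-day interval (the short end of the range in the table) and the names `next_check` and `due_for_recheck` are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Assumption: re-check at the short end of the ~3-6 month window
RECHECK_AFTER = timedelta(days=90)


def next_check(opted_out_on: date) -> date:
    """Date on which to search the site again for a reappeared listing."""
    return opted_out_on + RECHECK_AFTER


def due_for_recheck(submissions: dict[str, date], today: date) -> list[str]:
    """Sites whose last opt-out is old enough that the listing may be back."""
    return sorted(
        site for site, day in submissions.items() if next_check(day) <= today
    )


log = {
    "spokeo": date(2024, 1, 10),
    "whitepages": date(2024, 3, 1),
    "radaris": date(2024, 1, 25),
}
print(due_for_recheck(log, today=date(2024, 4, 20)))
# ['spokeo']
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test and to run from a scheduled job.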
Tier 2 — Automated opt-out via paid services
Services like DeleteMe, Incogni, and Privacy Bee submit opt-outs across 100–750 brokers and resubmit on a schedule. Worth the cost if you're doing this for yourself or building it into a product for users.
Tier 3 — Enterprise data brokers (requires legal process)
Acxiom, LexisNexis, CoreLogic, Equifax (non-credit), and TransUnion Marketing serve enterprise customers and use different opt-out mechanisms. Acxiom has an opt-out at aboutthedata.com. LexisNexis requires a written request with ID verification. For these sources, a CCPA request from a California resident tends to get the fastest response.
Automating Opt-Out Submissions
For the manual tier, the process is repetitive and automatable for the sites that use web forms rather than email or phone:
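```javascript
// Playwright automation for form-based opt-outs
// (Shown for educational purposes;
// check each site's ToS before automating.)
// The selectors below are illustrative and must be verified
// against each site's current form markup.
const { chromium } = require('playwright');

async function submitOptOut(site, profileUrl, email) {
  // headless: false keeps the browser window visible
  const browser = await chromium.launch({ headless: false });
  try {
    const page = await browser.newPage();
    switch (site) {
      case 'spokeo':
        await page.goto('https://www.spokeo.com/optout');
        await page.fill('#email', email);
        await page.fill('#profile_url', profileUrl);
        await page.click('[type="submit"]');
        break;
      case 'radaris':
        await page.goto('https://radaris.com/page/privacy');
        await page.fill('input[name="email"]', email);
        await page.fill('input[name="url"]', profileUrl);
        await page.click('.submit-btn');
        break;
      default:
        // Fail loudly instead of silently doing nothing
        throw new Error(`No opt-out flow defined for site: ${site}`);
    }
  } finally {
    // Always release the browser, even if a step above throws
    await browser.close();
  }
}

// Pause 2-5 seconds between submissions to avoid triggering bot detection
async function batchOptOut(profiles) {
  for (const profile of profiles) {
    try {
      await submitOptOut(profile.site, profile.url, profile.email);
    } catch (err) {
      // Log and continue so one broken form does not stop the batch
      console.error(`Opt-out failed for ${profile.site}: ${err.message}`);
    }
    await new Promise(r => setTimeout(r, 2000 + Math.random() * 3000));
  }
}
```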
The main friction points: CAPTCHA on some forms, email confirmation required on most, and a few sites require the user to find their own profile URL first (can't just submit a name).
CCPA as a Lever
The CCPA applies to California residents, not to everyone dealing with a California-based broker, although some brokers honor requests from other states voluntarily. It gives covered individuals the right to:
- Know what data is collected about them
- Request deletion
- Opt out of sale
Submitting a CCPA deletion request often gets a faster and more thorough response than the standard opt-out form, even from brokers that are not legally required to respond. It also helps to enable the Global Privacy Control (GPC) signal, which your browser sends as an HTTP request header to the sites you visit. It is legally recognized as an opt-out of sale in California and several other states:
```javascript
// GPC is supported natively by Firefox and Brave.
// Page scripts can read (but not set) the current state:
navigator.globalPrivacyControl // true if GPC enabled
// For server-side requests, add the header yourself:
headers['Sec-GPC'] = '1';
```
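For Python tooling, the same header can be attached and checked as follows. This is only a sketch: `with_gpc` and `gpc_enabled` are hypothetical helper names, the URL is a placeholder, and no request is actually sent.

```python
from urllib.request import Request


def with_gpc(url: str) -> Request:
    """Build (but do not send) a request carrying the GPC opt-out signal."""
    # urllib stores the name as "Sec-gpc"; HTTP header names are case-insensitive
    return Request(url, headers={"Sec-GPC": "1"})


def gpc_enabled(headers: dict[str, str]) -> bool:
    """Receiving side: match the header name case-insensitively, value must be "1"."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("sec-gpc") == "1"


request = with_gpc("https://example.com/")
print(gpc_enabled(dict(request.header_items())))  # True
print(gpc_enabled({"Accept": "text/html"}))       # False
```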
The Realistic Picture
Manually opting out of the top 10-15 consumer-facing sites takes about 2-3 hours and provides a meaningful short-term privacy improvement, particularly for address exposure. Listings typically reappear within 3-6 months, so the process has to be repeated.
The deeper problem is that the legal architecture in the US makes this a whack-a-mole exercise until federal privacy legislation passes. For users who need durable protection, such as domestic violence survivors, public figures, and journalists, the more serious toolkit combines paid services, CCPA requests, and address-shielding strategies (PO boxes, registered agents for property).
Consumer-facing explanation of the same topic, including the exact opt-out steps for each major site: lucas8.com/data-broker-opt-out-guide