Vhub Systems


How to Extract B2B Contact Information From Any Website at Scale

Your prospecting list is only as good as the contact data behind it. Most teams pay $200–500/month for email databases that are 6–18 months stale. The alternative? Pull contact data directly from company websites — at the moment you need it.

Here's how to build a B2B contact extraction pipeline that runs on demand.

The problem with purchased contact lists

Bought email lists have three failure modes:

  1. Staleness — People change jobs every 2–3 years. A list from 6 months ago is already 15–25% wrong.
  2. Relevance — You're buying contacts from a segment, not a curated list of your actual ICP.
  3. Deliverability — Sending to stale lists tanks your domain reputation fast.

The better approach: identify target companies yourself, then scrape their contact pages on demand.

What contact information you can extract

Public contact data typically includes:

  • Email addresses (contact@, hello@, sales@, support@)
  • Phone numbers (main, direct, WhatsApp)
  • Social profiles (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok, GitHub)
  • Physical addresses
  • Contact form URLs

Most company websites publish this on /contact, /about, /team, or in the site footer. A scraper that crawls 5–10 pages per domain catches 90%+ of what's publicly listed.
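
To make the extraction step concrete, here is a minimal sketch of pulling emails, phone numbers, and social links out of a single page's HTML with regular expressions. It is not how the Contact Info Scraper is implemented; the patterns, the `extract_contacts` helper, and the sample HTML are all illustrative assumptions.

```python
import re

# Deliberately simple pattern; it misses obfuscated addresses ("name [at] domain").
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Phone numbers are read only from explicit tel: links to limit false positives.
TEL_RE = re.compile(r'href=["\']tel:([^"\']+)["\']', re.IGNORECASE)
SOCIAL_RE = re.compile(
    r'href=["\'](https?://(?:www\.)?'
    r'(?:linkedin\.com|twitter\.com|x\.com|facebook\.com|github\.com)'
    r'/[^"\']+)["\']',
    re.IGNORECASE,
)


def extract_contacts(html: str) -> dict:
    """Return de-duplicated emails, phones, and social links found in one page."""
    return {
        "emails": sorted({m.lower() for m in EMAIL_RE.findall(html)}),
        "phones": sorted(set(TEL_RE.findall(html))),
        "socials": sorted(set(SOCIAL_RE.findall(html))),
    }


# Hypothetical footer markup, used only to exercise the helper.
sample_html = """
<footer>
  <a href="mailto:Sales@example.com">sales@example.com</a>
  <a href="tel:+1-555-0100">Call us</a>
  <a href="https://www.linkedin.com/company/example">LinkedIn</a>
</footer>
"""
contacts = extract_contacts(sample_html)
print(contacts)
```

Regex matching like this is cheap but noisy, which is one reason to lean on a maintained scraper rather than a hand-rolled one.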

Setting up the contact scraper

The Contact Info Scraper on Apify extracts all of the above from any list of URLs. Here's the input format:

{
  "startUrls": [
    { "url": "https://company1.com" },
    { "url": "https://company2.com" },
    { "url": "https://company3.com" }
  ],
  "maxDepth": 2,
  "maxPagesPerCrawl": 10,
  "includePersonalEmails": false
}

maxDepth: 2 tells the crawler to follow internal links up to 2 levels deep — enough to hit most contact pages without crawling the entire site.
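
One common way to read a depth limit like this is as a breadth-first walk that stops following links once it is a set number of hops from the start page. The sketch below shows that idea on a hypothetical in-memory link graph; the `crawl_order` function and the `site` structure are assumptions for illustration, and the actor's real crawl logic may differ.

```python
from collections import deque


def crawl_order(links: dict, start: str, max_depth: int, max_pages: int) -> list:
    """Breadth-first walk over a link graph, bounded by depth and page count."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure: page -> internal links found on that page.
site = {
    "/": ["/about", "/products"],
    "/about": ["/team", "/contact"],
    "/products": ["/products/a"],
    "/team": ["/team/jane"],
}
order = crawl_order(site, "/", max_depth=2, max_pages=10)
print(order)
```

With a depth of 2, `/contact` and `/team` are reached but `/team/jane` (three hops from the homepage) is not.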

Building the pipeline

Step 1: Get your target company list

You likely already have a list of target companies from:

  • LinkedIn Sales Navigator search exports
  • Industry directories (Crunchbase, G2, etc.)
  • Conference attendee lists
  • Job board postings (companies actively hiring = growing = buying)

Export as CSV. Extract the website column.
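
A minimal sketch of that extraction step, assuming the export has a column named `website` (your column name will likely differ) and that you only need one homepage URL per domain. The `load_company_urls` helper and the sample rows are hypothetical.

```python
import csv
import io
from urllib.parse import urlparse


def load_company_urls(csv_file, column: str = "website") -> list:
    """Read one column from a CSV and return unique, normalized homepage URLs."""
    urls, seen = [], set()
    for row in csv.DictReader(csv_file):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip companies with no website listed
        if "://" not in raw:
            raw = "https://" + raw  # assume HTTPS when the scheme is missing
        host = urlparse(raw).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            # Simplification: always emit an https:// homepage URL.
            urls.append(f"https://{host}")
    return urls


# In practice you would pass an open file: open("targets.csv", newline="")
sample = io.StringIO(
    "company,website\n"
    "Acme,acme.com\n"
    "Acme Corp,https://www.acme.com/about\n"
    "Globex,http://globex.example\n"
    "NoSite,\n"
)
company_urls = load_company_urls(sample)
print(company_urls)
```

Collapsing rows to one URL per host keeps you from paying to crawl the same domain twice.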

Step 2: Run the scraper

Via Apify API:

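import requests

API_TOKEN = "your_apify_token"
ACTOR_ID = "lanky_quantifier~contact-info-scraper"

# Target company websites, e.g. the website column exported in Step 1
# (placeholder values shown here)
company_urls = ["https://company1.com", "https://company2.com"]

# Start the run
response = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
    params={"token": API_TOKEN},
    json={
        "startUrls": [{"url": url} for url in company_urls],
        "maxDepth": 2,
        "maxPagesPerCrawl": 8
    },
    timeout=30,
)
response.raise_for_status()  # fail early on a bad token or actor ID

run_id = response.json()["data"]["id"]
print(f"Run started: {run_id}")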

Step 3: Retrieve results

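import time

# Every status after which a run will not change again. Checking only
# SUCCEEDED and FAILED would loop forever on an aborted or timed-out run.
TERMINAL_STATUSES = ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT")

# Wait for completion
while True:
    status = requests.get(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}",
        params={"token": API_TOKEN}
    ).json()["data"]["status"]

    if status in TERMINAL_STATUSES:
        break
    time.sleep(5)

if status != "SUCCEEDED":
    raise RuntimeError(f"Run {run_id} ended with status {status}")

# Get results
results = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}/dataset/items",
    params={"token": API_TOKEN}
).json()

for item in results:
    print(f"{item['url']}: {item.get('emails', [])} | {item.get('phones', [])}")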

Step 4: Push to CRM

Map the output fields to your CRM schema. Most CRMs (HubSpot, Pipedrive, Salesforce) have API endpoints for bulk contact creation.

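import hubspot
from hubspot.crm.contacts import SimplePublicObjectInput

client = hubspot.Client.create(access_token="your_hubspot_token")

for item in results:
    # `phones` may be missing or an empty list; fall back to a blank value
    # instead of raising IndexError on [0].
    phones = item.get('phones') or ['']
    for email in item.get('emails', []):
        contact = SimplePublicObjectInput(properties={
            "email": email,
            "website": item['url'],
            "phone": phones[0]
        })
        # Note: newer hubspot-api-client releases may expect
        # SimplePublicObjectInputForCreate here; check your installed version.
        client.crm.contacts.basic_api.create(simple_public_object_input=contact)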

Cost comparison

Method          | Cost            | Freshness         | Control
ZoomInfo        | ~$15,000/year   | 3–6 months stale  | None
Apollo.io       | $99–$499/month  | 6–12 months stale | Limited
Hunter.io       | $49–$399/month  | Unknown           | None
Contact scraper | ~$0.001/domain  | Real-time         | Full

For a list of 1,000 companies, Apollo's entry plan costs $99/month for as long as you stay subscribed. The scraper costs about $1 in total (1,000 domains × ~$0.001 each).

What this does NOT get you

To set expectations:

  • No contact details from behind login walls (for example, anything visible only to signed-in LinkedIn users)
  • No verified delivery status (still need an email validator like NeverBounce)
  • Some sites block crawlers — typically 5–15% of corporate sites

For verified deliverability, run results through Hunter.io's verification API or NeverBounce before sending.
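
Before paying a verifier per address, it can help to drop strings that are obviously not emails. The sketch below is a cheap local pre-filter, not a deliverability check; the `prefilter` helper and its suffix list are assumptions, based on the common case of regex scrapers picking up image filenames such as `logo@2x.png`.

```python
import re

EMAIL_SHAPE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
# File extensions that regex-based extraction can mistake for top-level
# domains, e.g. "logo@2x.png" from retina image filenames.
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def prefilter(emails: list) -> list:
    """Keep addresses that look plausible; this does NOT confirm deliverability."""
    kept = []
    for email in emails:
        candidate = email.strip().lower()
        if not EMAIL_SHAPE.match(candidate):
            continue  # not shaped like an email address at all
        if candidate.endswith(ASSET_SUFFIXES):
            continue  # almost certainly a static asset filename
        kept.append(candidate)
    return kept


filtered = prefilter(["Sales@acme.com", "logo@2x.png", "not-an-email", "hello@acme.io "])
print(filtered)
```

Whatever survives this filter still needs a real verification pass before you send to it.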

Production tips

Rate limiting: Set maxConcurrency to 5–10 to avoid triggering bot detection on shared hosting providers.

Deduplication: The same email appears on multiple pages. Deduplicate by email address before pushing to CRM.
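
A minimal sketch of that deduplication, assuming each result item carries the `url`, `emails`, and `phones` fields used in Step 3. The `dedupe_contacts` helper is hypothetical, and treating addresses as case-insensitive is a simplifying assumption.

```python
def dedupe_contacts(results: list) -> list:
    """Flatten scraper items into one record per unique email address."""
    seen = set()
    contacts = []
    for item in results:
        phones = item.get("phones") or []
        for email in item.get("emails") or []:
            key = email.strip().lower()
            if not key or key in seen:
                continue  # already captured from an earlier page
            seen.add(key)
            contacts.append({
                "email": key,
                "website": item.get("url", ""),
                "phone": phones[0] if phones else "",
            })
    return contacts


# Hypothetical items: the same address appears on two pages of one site.
sample_results = [
    {"url": "https://acme.com/contact", "emails": ["Sales@acme.com"], "phones": ["+1-555-0100"]},
    {"url": "https://acme.com/about", "emails": ["sales@acme.com", "hello@acme.com"], "phones": []},
]
unique_contacts = dedupe_contacts(sample_results)
for contact in unique_contacts:
    print(contact)
```

The first page on which an address appears wins, so its URL and phone number are the ones that reach the CRM.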

Scheduling: Run weekly against your target account list. New hires, new contact pages — you catch them before your competitors do.
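
To act on only what changed between weekly runs, you can keep a snapshot of addresses already seen and diff each new run against it. This is a sketch under assumptions: the `new_emails` helper, the JSON snapshot file, and the sample data are all hypothetical, and a real setup might store the snapshot in a database instead.

```python
import json
import tempfile
from pathlib import Path


def new_emails(results: list, snapshot_path: Path) -> set:
    """Return emails not seen in earlier runs, then update the snapshot file."""
    previous = set()
    if snapshot_path.exists():
        previous = set(json.loads(snapshot_path.read_text()))
    current = {
        email.strip().lower()
        for item in results
        for email in item.get("emails") or []
    }
    # Persist the union so addresses stay "seen" even if a page later drops them.
    snapshot_path.write_text(json.dumps(sorted(previous | current)))
    return current - previous


# Simulate two weekly runs against the same (hypothetical) account.
with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "seen_emails.json"
    week1 = [{"url": "https://acme.com", "emails": ["sales@acme.com"]}]
    week2 = [{"url": "https://acme.com", "emails": ["sales@acme.com", "jane@acme.com"]}]
    first_run = new_emails(week1, snapshot)
    second_run = new_emails(week2, snapshot)
    print(first_run)
    print(second_run)
```

Only the second week's new address comes back from the second call, so that is the only record you would push to the CRM.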


Take the next step

The actor is live on Apify: Contact Info Scraper — 831 runs and counting.

If you want a complete lead generation workflow (scraping + enrichment + CRM push + outreach sequencing), that's packaged in the AI Lead Gen Kit — $49 one-time.

Works with n8n, runs on any VPS or Apify cloud.
