Vhub Systems

Posted on Apr 2

How to Extract B2B Contact Information From Any Website at Scale

#webscraping #python #marketing

Your prospecting list is only as good as the contact data behind it. Most teams pay $200–500/month for email databases that are 6–18 months stale. The alternative? Pull contact data directly from company websites — at the moment you need it.

Here's how to build a B2B contact extraction pipeline that runs on demand.

The problem with purchased contact lists

Bought email lists have three failure modes:

Staleness — People change jobs every 2–3 years. A list from 6 months ago is already 15–25% wrong.
Relevance — You're buying contacts from a segment, not a curated list of your actual ICP.
Deliverability — Sending to stale lists tanks your domain reputation fast.

The better approach: identify target companies yourself, then scrape their contact pages on demand.

What contact information you can extract

Public contact data typically includes:

Email addresses (contact@, hello@, sales@, support@)
Phone numbers (main, direct, WhatsApp)
Social profiles (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok, GitHub)
Physical addresses
Contact form URLs

Most company websites publish this on /contact, /about, /team, or in the site footer. A scraper that crawls 5–10 pages per domain catches 90%+ of what's publicly listed.
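To make the extraction idea concrete, here is a minimal sketch that pulls emails and social links out of raw page HTML with regular expressions. It is an illustration only, not the Apify actor's actual implementation; `extract_contacts`, both patterns, and the sample footer are assumptions made for this example.

```python
import re

# Deliberately simple patterns (assumptions for this sketch): they favor
# readability over full RFC 5322 coverage and can produce false positives
# such as asset names like "logo@2x.png".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?"
    r"(?:linkedin\.com|twitter\.com|x\.com|facebook\.com|github\.com)"
    r"/[^\s\"'<>]+"
)


def extract_contacts(html: str) -> dict:
    """Return de-duplicated, sorted emails and social profile URLs found in html."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(html)})
    socials = sorted(set(SOCIAL_RE.findall(html)))
    return {"emails": emails, "socials": socials}


# Hypothetical footer markup, standing in for a fetched /contact page
sample = (
    '<footer><a href="mailto:Sales@example.com">Sales</a> '
    '<a href="https://www.linkedin.com/company/example">LinkedIn</a> '
    "hello@example.com</footer>"
)
print(extract_contacts(sample))
```

A production scraper also has to handle obfuscated addresses ("name [at] company.com"), JavaScript-rendered pages, and phone number formats, which is a large part of why a ready-made actor is convenient.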

Setting up the contact scraper

The Contact Info Scraper on Apify extracts all of the above from any list of URLs. Here's the input format:

{
  "startUrls": [
    { "url": "https://company1.com" },
    { "url": "https://company2.com" },
    { "url": "https://company3.com" }
  ],
  "maxDepth": 2,
  "maxPagesPerCrawl": 10,
  "includePersonalEmails": false
}

maxDepth: 2 tells the crawler to follow internal links up to 2 levels deep — enough to hit most contact pages without crawling the entire site.
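How a depth cap and a page cap interact is easier to see on a toy site map. The sketch below walks a hypothetical link graph breadth-first; it assumes the start page counts as depth 0 and that pages are visited in breadth-first order, which may not match the actor's internals. `crawl_order` and the `site` dictionary are made up for the illustration.

```python
from collections import deque


def crawl_order(links, start, max_depth, max_pages):
    """Return pages in visit order under a depth cap and a page cap."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth cap
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical internal link structure of one company site
site = {
    "/": ["/about", "/contact", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/comments"],
}
print(crawl_order(site, "/", max_depth=2, max_pages=10))
# ['/', '/about', '/contact', '/blog', '/team', '/blog/post-1']
```

With a depth of 2, /contact, /about, and /team are all reached, while the comment thread three clicks deep is skipped.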

Building the pipeline

Step 1: Get your target company list

You likely already have a list of target companies from:

LinkedIn Sales Navigator search exports
Industry directories (Crunchbase, G2, etc.)
Conference attendee lists
Job board postings (companies actively hiring = growing = buying)

Export as CSV. Extract the website column.
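A minimal sketch of that extraction step, using only the standard library. The column name `website` and the helper `load_company_urls` are assumptions; adjust them to match your export. The sketch also forces `https://` and collapses `www.` variants, which is a simplification that will not suit every site.

```python
import csv
import io
from urllib.parse import urlparse


def load_company_urls(csv_file, column="website"):
    """Read one column of a CSV export and return unique homepage URLs."""
    urls, seen = [], set()
    for row in csv.DictReader(csv_file):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip companies with no website listed
        if "://" not in raw:
            raw = "https://" + raw  # bare domains need a scheme to parse
        host = urlparse(raw).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            urls.append(f"https://{host}")
    return urls


# In practice: with open("targets.csv", newline="") as f: ...
sample = io.StringIO(
    "name,website\n"
    "Acme,acme.com\n"
    "Acme EU,https://www.acme.com/contact\n"
    "NoSite,\n"
    "Beta,http://beta.io\n"
)
company_urls = load_company_urls(sample)
print(company_urls)  # ['https://acme.com', 'https://beta.io']
```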

Step 2: Run the scraper

Via Apify API:

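import requests

API_TOKEN = "your_apify_token"
ACTOR_ID = "lanky_quantifier~contact-info-scraper"

# Websites from Step 1 (placeholder values; replace with your own list)
company_urls = ["https://company1.com", "https://company2.com"]

# Start the run
response = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
    params={"token": API_TOKEN},
    json={
        "startUrls": [{"url": url} for url in company_urls],
        "maxDepth": 2,
        "maxPagesPerCrawl": 8
    },
    timeout=30,
)
response.raise_for_status()  # fail early on a bad token or actor ID

run_id = response.json()["data"]["id"]
print(f"Run started: {run_id}")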

Step 3: Retrieve results

import time

# Every terminal run status, so the loop cannot spin forever
# on an aborted or timed-out run
TERMINAL_STATUSES = ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT")

# Wait for completion
while True:
    status = requests.get(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}",
        params={"token": API_TOKEN}
    ).json()["data"]["status"]

    if status in TERMINAL_STATUSES:
        break
    time.sleep(5)

if status != "SUCCEEDED":
    raise RuntimeError(f"Run ended with status {status}")

# Get results
results = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}/dataset/items",
    params={"token": API_TOKEN}
).json()

for item in results:
    print(f"{item['url']}: {item.get('emails', [])} | {item.get('phones', [])}")

Step 4: Push to CRM

Map the output fields to your CRM schema. Most CRMs (HubSpot, Pipedrive, Salesforce) have API endpoints for bulk contact creation.

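import hubspot
# Note: recent hubspot-api-client releases may expect
# SimplePublicObjectInputForCreate here; check your installed version.
from hubspot.crm.contacts import SimplePublicObjectInput

client = hubspot.Client.create(access_token="your_hubspot_token")

for item in results:
    # Fall back to an empty string when no phone was found;
    # indexing an empty list with [0] would raise IndexError
    phones = item.get('phones') or ['']
    for email in item.get('emails', []):
        contact = SimplePublicObjectInput(properties={
            "email": email,
            "website": item['url'],
            "phone": phones[0]
        })
        client.crm.contacts.basic_api.create(simple_public_object_input=contact)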

Cost comparison

Method	Cost	Freshness	Control
ZoomInfo	~$15,000/year	3–6 months stale	None
Apollo.io	$99–$499/month	6–12 months stale	Limited
Hunter.io	$49–$399/month	Unknown	None
Contact scraper	~$0.001/domain	Real-time	Full

For a list of 1,000 companies: Apollo's entry tier is $99/month for as long as you stay subscribed. The scraper costs about $1 per pass (1,000 domains × $0.001).
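The arithmetic behind that comparison, as a quick sketch. The per-domain price and the Apollo tier are the approximate figures from the table above; `scraper_cost` is a hypothetical helper, and the calculation ignores email verification and any platform subscription fees.

```python
COST_PER_DOMAIN = 0.001  # USD, approximate per-domain figure from the table
APOLLO_MONTHLY = 99      # USD, lowest Apollo.io tier from the table


def scraper_cost(domains, runs=1):
    """Estimated scraping cost in USD for a number of domains and passes."""
    return round(domains * COST_PER_DOMAIN * runs, 2)


print(scraper_cost(1_000))           # one pass over 1,000 companies: 1.0
print(scraper_cost(1_000, runs=52))  # weekly refresh for a year: 52.0
print(APOLLO_MONTHLY * 12)           # a year of the entry tier: 1188
```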

What this does NOT get you

To set expectations:

No personal emails from behind login walls (for example, addresses visible only to signed-in LinkedIn users)
No verified delivery status (still need an email validator like NeverBounce)
Some sites block crawlers — typically 5–15% of corporate sites

For verified deliverability, run results through Hunter.io's verification API or NeverBounce before sending.
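To see which of your targets fall into the blocked or empty share, you can compare the URLs you submitted with the hosts that actually returned data. This is a sketch under assumptions: it relies on the `url`, `emails`, and `phones` fields used in the examples above, and `host_of` and `domains_without_contacts` are hypothetical helpers.

```python
from urllib.parse import urlparse


def host_of(url):
    """Lowercased hostname without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domains_without_contacts(start_urls, items):
    """Return the start URLs whose host yielded no email or phone at all."""
    found = {
        host_of(item.get("url", ""))
        for item in items
        if item.get("emails") or item.get("phones")
    }
    return [url for url in start_urls if host_of(url) not in found]


# Hypothetical run: one hit, one empty page, one domain with no output
start_urls = ["https://acme.com", "https://beta.io", "https://gamma.dev"]
items = [
    {"url": "https://www.acme.com/contact", "emails": ["sales@acme.com"], "phones": []},
    {"url": "https://beta.io/", "emails": [], "phones": []},
]
print(domains_without_contacts(start_urls, items))
# ['https://beta.io', 'https://gamma.dev']
```

The leftover domains are candidates for a manual check or a retry with different crawl settings.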

Production tips

Rate limiting: Set maxConcurrency to 5–10 to avoid triggering bot detection on shared hosting providers.

Deduplication: The same email often appears on several pages of a site (footer, /contact, /about). Deduplicate by lowercased email address before pushing to your CRM.

Scheduling: Run weekly against your target account list. New hires, new contact pages — you catch them before your competitors do.
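The deduplication and scheduling tips can be sketched together: collapse each run into one record per email, then compare this week's run with last week's to find what is new. The helper names and sample data are assumptions, and the item shape follows the `url` and `emails` fields used earlier.

```python
def unique_contacts(items):
    """Collapse scraper items into one record per lowercased email."""
    contacts = {}
    for item in items:
        for email in item.get("emails") or []:
            key = email.strip().lower()
            if key and key not in contacts:
                # Keep the first page where the address was seen
                contacts[key] = {"email": key, "website": item.get("url", "")}
    return contacts


def new_since(previous, current):
    """Emails present in the current run but not in the previous one."""
    return sorted(set(current) - set(previous))


# Hypothetical results from two weekly runs
last_week = unique_contacts(
    [{"url": "https://acme.com", "emails": ["sales@acme.com"]}]
)
this_week = unique_contacts([
    {"url": "https://acme.com/contact", "emails": ["Sales@acme.com", "hello@acme.com"]},
    {"url": "https://acme.com/about", "emails": ["hello@acme.com "]},
])
print(new_since(last_week, this_week))  # ['hello@acme.com']
```

Pushing only the output of `new_since` to the CRM keeps weekly runs from creating duplicate contacts.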

Take the next step

The actor is live on Apify: Contact Info Scraper — 831 runs and counting.

If you want a complete lead generation workflow (scraping + enrichment + CRM push + outreach sequencing), that's packaged in the AI Lead Gen Kit — $49 one-time.

Works with n8n, runs on any VPS or Apify cloud.
