Your prospecting list is only as good as the contact data behind it. Most teams pay $200–500/month for email databases that are 6–18 months stale. The alternative? Pull contact data directly from company websites — at the moment you need it.
Here's how to build a B2B contact extraction pipeline that runs on demand.
## The problem with purchased contact lists
Bought email lists have three failure modes:
- **Staleness** — People change jobs every 2–3 years. A list from 6 months ago is already 15–25% wrong.
- **Relevance** — You're buying contacts from a segment, not a curated list of your actual ICP.
- **Deliverability** — Sending to stale lists tanks your domain reputation fast.
The better approach: identify target companies yourself, then scrape their contact pages on demand.
## What contact information you can extract
Public contact data typically includes:
- Email addresses (contact@, hello@, sales@, support@)
- Phone numbers (main, direct, WhatsApp)
- Social profiles (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok, GitHub)
- Physical addresses
- Contact form URLs
Most company websites publish this on /contact, /about, /team, or in the site footer. A scraper that crawls 5–10 pages per domain catches 90%+ of what's publicly listed.
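For intuition, the pages a shallow crawl tends to surface can be sketched by enumerating the common paths above. This is a rough illustration only — the scraper itself discovers pages by following links, and `candidate_pages` is a hypothetical helper, not part of the Actor:

```python
# Common locations for public contact data (from the list above).
COMMON_PATHS = ["", "/contact", "/about", "/team"]

def candidate_pages(domain: str) -> list[str]:
    """Build likely contact-page URLs for a bare domain."""
    base = domain if domain.startswith("http") else f"https://{domain}"
    return [base.rstrip("/") + path for path in COMMON_PATHS]

print(candidate_pages("company1.com")[:2])
# ['https://company1.com', 'https://company1.com/contact']
```

In practice a link-following crawler also catches footers and nonstandard paths (`/contact-us`, `/kontakt`), which is why depth-limited crawling beats a fixed path list.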
## Setting up the contact scraper
The Contact Info Scraper on Apify extracts all of the above from any list of URLs. Here's the input format:
```json
{
  "startUrls": [
    { "url": "https://company1.com" },
    { "url": "https://company2.com" },
    { "url": "https://company3.com" }
  ],
  "maxDepth": 2,
  "maxPagesPerCrawl": 10,
  "includePersonalEmails": false
}
```
`"maxDepth": 2` tells the crawler to follow internal links up to two levels deep, enough to hit most contact pages without crawling the entire site.
## Building the pipeline

### Step 1: Get your target company list
You likely already have a list of target companies from:
- LinkedIn Sales Navigator search exports
- Industry directories (Crunchbase, G2, etc.)
- Conference attendee lists
- Job board postings (companies actively hiring = growing = buying)
Export as CSV. Extract the website column.
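A short sketch of that extraction step, assuming the export has a `website` column (adjust the column name to match whatever your tool exports):

```python
import csv

def load_company_urls(csv_path: str, column: str = "website") -> list[str]:
    """Read the website column from a CSV export and normalize to URLs.

    The column name is an assumption -- rename to match your export.
    """
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            site = (row.get(column) or "").strip()
            if not site:
                continue  # skip rows with no website
            if not site.startswith(("http://", "https://")):
                site = "https://" + site
            urls.append(site)
    return urls

# Usage: company_urls = load_company_urls("target_accounts.csv")
```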
### Step 2: Run the scraper

Via the Apify API:

```python
import requests

API_TOKEN = "your_apify_token"
ACTOR_ID = "lanky_quantifier~contact-info-scraper"

# Start the run with your target URLs
response = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
    params={"token": API_TOKEN},
    json={
        "startUrls": [{"url": url} for url in company_urls],
        "maxDepth": 2,
        "maxPagesPerCrawl": 8,
    },
)
response.raise_for_status()

run_id = response.json()["data"]["id"]
print(f"Run started: {run_id}")
```
### Step 3: Retrieve results

```python
import time

# Poll until the run reaches a terminal state
while True:
    status = requests.get(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}",
        params={"token": API_TOKEN},
    ).json()["data"]["status"]
    if status in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(5)

# Fetch the run's dataset items
results = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}/dataset/items",
    params={"token": API_TOKEN},
).json()

for item in results:
    print(f"{item['url']}: {item.get('emails', [])} | {item.get('phones', [])}")
```
### Step 4: Push to CRM
Map the output fields to your CRM schema. Most CRMs (HubSpot, Pipedrive, Salesforce) have API endpoints for bulk contact creation.
```python
import hubspot
from hubspot.crm.contacts import SimplePublicObjectInput

client = hubspot.Client.create(access_token="your_hubspot_token")

for item in results:
    for email in item.get("emails", []):
        contact = SimplePublicObjectInput(properties={
            "email": email,
            "website": item["url"],
            # Guard against a missing or empty phones list
            "phone": (item.get("phones") or [""])[0],
        })
        client.crm.contacts.basic_api.create(simple_public_object_input=contact)
```
## Cost comparison
| Method | Cost | Freshness | Control |
|---|---|---|---|
| ZoomInfo | ~$15,000/year | 3–6 months stale | None |
| Apollo.io | $99–$499/month | 6–12 months stale | Limited |
| Hunter.io | $49–$399/month | Unknown | None |
| Contact scraper | ~$0.001/domain | Real-time | Full |
For a list of 1,000 companies: Apollo charges $99/month ongoing. The scraper costs about $1 total.
## What this does NOT get you
To set expectations:
- No personal emails from behind login walls (LinkedIn InMail addresses, etc.)
- No verified delivery status (still need an email validator like NeverBounce)
- Some sites block crawlers — typically 5–15% of corporate sites
For verified deliverability, run results through Hunter.io's verification API or NeverBounce before sending.
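Before paying per verification, a cheap syntax pre-filter cuts obvious junk from the scraped output. This is a sketch, intentionally not RFC-complete — it only checks that a string is shaped like an email address:

```python
import re

# Rough email-shape check: rejects obvious junk only.
# It does NOT confirm the mailbox exists; use a verifier for that.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def plausible(email: str) -> bool:
    """True if the string is at least shaped like an email address."""
    return bool(EMAIL_RE.match(email.strip()))

print(plausible("hello@company1.com"))  # True
print(plausible("contact us"))          # False
```

Note that some scraper artifacts (e.g. image filenames like `logo@2x.png`) still pass a shape check, which is another reason to run a real verifier before sending.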
## Production tips
**Rate limiting:** Set `maxConcurrency` to 5–10 to avoid triggering bot detection on shared hosting providers.
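For example, assuming the Actor accepts a `maxConcurrency` field alongside the input options shown earlier, the throttled input would look like:

```json
{
  "startUrls": [{ "url": "https://company1.com" }],
  "maxDepth": 2,
  "maxPagesPerCrawl": 10,
  "maxConcurrency": 5
}
```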
**Deduplication:** The same email appears on multiple pages. Deduplicate by email address before pushing to CRM.
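A minimal dedup pass over the scraper output (a sketch, assuming the `emails` and `url` fields from Step 3) might look like:

```python
def dedupe_contacts(items: list[dict]) -> dict[str, dict]:
    """Collapse scraper output to one record per unique email.

    Keeps the first page each address was seen on.
    """
    by_email: dict[str, dict] = {}
    for item in items:
        for email in item.get("emails", []):
            key = email.strip().lower()  # normalize before comparing
            by_email.setdefault(key, {"email": key, "source": item.get("url")})
    return by_email

items = [
    {"url": "https://a.com", "emails": ["Hello@a.com", "sales@a.com"]},
    {"url": "https://a.com/contact", "emails": ["hello@a.com"]},
]
print(len(dedupe_contacts(items)))  # 2
```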
**Scheduling:** Run weekly against your target account list. New hires and new contact pages: you catch them before your competitors do.
## Take the next step
The actor is live on Apify: Contact Info Scraper — 831 runs and counting.
If you want a complete lead generation workflow (scraping + enrichment + CRM push + outreach sequencing), that's packaged in the AI Lead Gen Kit — $49 one-time.
Works with n8n, runs on any VPS or Apify cloud.