Your CRM is full of company names and domains with no contact data. Manually researching each one takes 5–10 minutes per prospect. At 100 prospects a week, that's one to two full days of research.
Here's how to automate it.
The enrichment gap
Sales teams typically capture a company name and domain from an inbound form or LinkedIn search — then stop there. The result: a CRM with 2,000 entries that look like:
Company: Acme Corp
Website: acmecorp.com
Email: [empty]
Phone: [empty]
Owner: [empty]
To actually reach someone, a rep has to manually visit the site, find the contact page, copy the email, and paste it into the CRM. Multiply by 500 prospects and you've wasted a week.
What automated enrichment looks like
A contact enrichment pipeline has three stages:
- Extract — Pull all publicly visible contact info from company websites
- Deduplicate + Validate — Remove duplicate emails, flag invalid formats
- Sync — Push enriched data back into your CRM
The extraction step is where most teams get stuck. Here's the simplest way to do it.
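Before getting to extraction, it's worth seeing how small the middle stage really is. A minimal sketch of the deduplicate + validate step (the function name and regex are illustrative, not from any specific library):

```python
import re

# Loose syntactic check -- a real pipeline would follow this with an
# SMTP-level deliverability check (Hunter.io, NeverBounce, etc.)
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def clean_emails(raw_emails):
    """Lowercase, dedupe, and drop syntactically invalid addresses."""
    seen = set()
    valid = []
    for email in raw_emails:
        email = email.strip().lower()
        if email in seen or not EMAIL_RE.match(email):
            continue
        seen.add(email)
        valid.append(email)
    return valid

print(clean_emails(["Hello@acme.com", "hello@acme.com", "not-an-email"]))
# → ['hello@acme.com']
```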
The contact scraper
The Contact Info Scraper crawls any list of URLs and returns structured contact data:
{
  "url": "https://acmecorp.com",
  "emails": ["hello@acmecorp.com", "sales@acmecorp.com"],
  "phones": ["+1-415-555-0100"],
  "linkedIn": "https://linkedin.com/company/acme-corp",
  "twitter": "https://twitter.com/acmecorp",
  "facebook": "https://facebook.com/acmecorp"
}
It crawls up to 10 pages per domain by default (enough to hit /contact, /about, /team, and the footer), extracts all contact patterns using regex and semantic detection, and returns structured JSON.
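The regex portion of that extraction can be sketched in a few lines. This is a simplified illustration, not the actor's actual code; the real scraper layers semantic detection (mailto: links, footer heuristics) on top of patterns like these:

```python
import re

# Deliberately loose patterns: over-match first, validate later
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

def extract_contacts(html: str) -> dict:
    """Return deduplicated emails and phone-like strings found in raw HTML."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "phones": sorted(set(m.strip() for m in PHONE_RE.findall(html))),
    }

page = '<footer>Email hello@acmecorp.com or call +1-415-555-0100</footer>'
print(extract_contacts(page))
# → {'emails': ['hello@acmecorp.com'], 'phones': ['+1-415-555-0100']}
```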
Building the enrichment pipeline
Step 1: Export domains from CRM
Pull all records with an empty email field:
# HubSpot example (pip install hubspot-api-client)
import hubspot

client = hubspot.Client.create(access_token="YOUR_TOKEN")

# Get contacts that have a website but no email
contacts = client.crm.contacts.search_api.do_search({
    "filterGroups": [{
        "filters": [{"propertyName": "email", "operator": "NOT_HAS_PROPERTY"}]
    }],
    "properties": ["company", "website"]
})

domains = [c.properties.get('website') for c in contacts.results if c.properties.get('website')]
print(f"Enriching {len(domains)} records")
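Step 4 below relies on a `domain_to_contact_id` dict to match scraped results back to CRM records, so it's worth building that lookup at export time. A minimal sketch with plain dicts standing in for CRM search results (with the HubSpot client, you'd iterate `contacts.results` the same way):

```python
def normalize_domain(url: str) -> str:
    """Strip scheme and trailing slash so URLs compare consistently."""
    return url.replace("https://", "").replace("http://", "").rstrip("/")

records = [  # stand-ins for CRM search results
    {"id": "101", "website": "https://acmecorp.com/"},
    {"id": "102", "website": "globex.com"},
]

domain_to_contact_id = {
    normalize_domain(r["website"]): r["id"] for r in records
}
print(domain_to_contact_id)
# → {'acmecorp.com': '101', 'globex.com': '102'}
```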
Step 2: Run the contact scraper
import requests

API_TOKEN = "your_apify_token"
ACTOR = "lanky_quantifier~contact-info-scraper"

run = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/runs",
    params={"token": API_TOKEN},
    json={
        "startUrls": [{"url": d if d.startswith("http") else f"https://{d}"} for d in domains],
        "maxDepth": 2,
        "maxPagesPerCrawl": 10
    }
).json()["data"]
print(f"Run ID: {run['id']} — started")
Step 3: Wait and retrieve
import time

run_id = run['id']
while True:
    r = requests.get(
        f"https://api.apify.com/v2/acts/{ACTOR}/runs/{run_id}",
        params={"token": API_TOKEN}
    ).json()["data"]
    if r["status"] in ("SUCCEEDED", "FAILED"):
        print(f"Done: {r['status']} | Items: {r.get('stats', {}).get('itemCount', 0)}")
        break
    time.sleep(10)

# Fetch the run's dataset items
items = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR}/runs/{run_id}/dataset/items",
    params={"token": API_TOKEN}
).json()
Step 4: Push back to CRM
for item in items:
    domain = item['url'].replace('https://', '').replace('http://', '').rstrip('/')
    emails = item.get('emails', [])
    phones = item.get('phones', [])
    if not emails:
        continue

    # domain_to_contact_id maps each domain back to its CRM record ID
    # (build this lookup when exporting domains in step 1)
    contact_id = domain_to_contact_id.get(domain)
    if not contact_id:
        continue

    # Update the HubSpot contact
    client.crm.contacts.basic_api.update(
        contact_id=contact_id,
        simple_public_object_input={
            "properties": {
                "email": emails[0],
                "phone": phones[0] if phones else "",
                "hs_linkedin_company_page": item.get('linkedIn', '')
            }
        }
    )
    print(f"Enriched: {domain} → {emails[0]}")
Real numbers
In a test against 500 B2B SaaS company domains:
- 432 domains returned at least one email address (86%)
- 218 domains returned a phone number (44%)
- 380 domains returned a LinkedIn company page (76%)
- Average run time: ~4 minutes for 500 URLs
The 14% that returned nothing: mostly enterprise sites with no public contact info (large banks, governments), or aggressive bot detection (Cloudflare Enterprise).
Scheduling for ongoing enrichment
Run this weekly on new CRM entries:
# n8n workflow trigger: every Monday at 9am
# 1. Query CRM for contacts added in last 7 days without email
# 2. Run contact scraper on their domains
# 3. Push enriched data back
# 4. Send Slack alert with enrichment summary
Or use Apify's built-in scheduler to run the actor on a recurring basis against a continuously updated URL list.
What to do with the data
Once you have email addresses:
- Validate before sending — use Hunter.io or NeverBounce API to verify deliverability
- Segment by contact type — hello@ = generic, cto@ = executive, support@ = ops
- Personalize — check the LinkedIn URL for company updates before outreach
- Sequence — push into your outreach tool (Apollo, Lemlist, Instantly) with warm-up enabled
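The segmentation rule above boils down to a lookup on the email's local part. A quick sketch (the role lists are illustrative; tune them to your own ICP):

```python
# Illustrative role buckets -- adjust for your market
GENERIC = {"hello", "info", "contact", "sales"}
EXECUTIVE = {"ceo", "cto", "cfo", "founder"}
OPS = {"support", "help", "billing"}

def segment(email: str) -> str:
    """Classify an address by its local part (the text before the @)."""
    local = email.split("@", 1)[0].lower()
    if local in EXECUTIVE:
        return "executive"
    if local in OPS:
        return "ops"
    if local in GENERIC:
        return "generic"
    return "named"  # likely a personal address, e.g. jane@acmecorp.com

print(segment("cto@acmecorp.com"))    # → executive
print(segment("hello@acmecorp.com"))  # → generic
```

Named addresses are usually the highest-value bucket, so it pays to route them to a separate, more personalized sequence.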
Automate the whole thing
The extraction step is solved. If you want the full pipeline — CRM pull → scrape → validate → sequence — that's what the AI Lead Gen Kit ($49) covers. Two complete n8n workflows, documented and import-ready.
Actor link: Contact Info Scraper on Apify — 831 runs, pay-per-result pricing.