DEV Community

agenthustler
agenthustler

Posted on

Scraping B2B Lead Data: Combining LinkedIn, Clearbit, and Hunter.io

B2B lead generation runs on data. The best sales teams enrich prospects with company data, verified emails, and professional profiles. Here's how to build an automated lead enrichment pipeline combining multiple data sources.

The Lead Enrichment Stack

  • LinkedIn: Professional profiles, job titles, company associations
  • Clearbit: Company data, technographics, funding info
  • Hunter.io: Email discovery and verification
  • Public sources: Company websites, press releases, SEC filings

Hunter.io Email Discovery

Hunter.io offers a generous free tier (25 searches/month) and affordable paid plans:

import requests
import time

class HunterClient:
    """Thin wrapper around the Hunter.io v2 REST API.

    Methods return plain dicts/lists extracted from the JSON ``data``
    payload. Network failures and non-2xx responses raise ``requests``
    exceptions instead of silently producing empty results.
    """

    BASE_URL = "https://api.hunter.io/v2"
    TIMEOUT = 30  # seconds; prevents a stalled connection from hanging the pipeline

    def __init__(self, api_key):
        self.api_key = api_key

    def _get(self, endpoint, params):
        """GET ``/{endpoint}``, raise on HTTP errors, return the 'data' dict."""
        resp = requests.get(
            f"{self.BASE_URL}/{endpoint}",
            params={**params, "api_key": self.api_key},
            timeout=self.TIMEOUT,
        )
        # Surface quota/auth problems here rather than as mysterious empty data.
        resp.raise_for_status()
        return resp.json().get("data", {})

    def domain_search(self, domain, limit=10):
        """Return up to *limit* email records discovered for *domain*."""
        data = self._get("domain-search", {"domain": domain, "limit": limit})
        return [{
            "email": e["value"],
            "type": e.get("type"),
            "confidence": e.get("confidence"),
            "first_name": e.get("first_name"),
            "last_name": e.get("last_name"),
            "position": e.get("position"),
        } for e in data.get("emails", [])]

    def find_email(self, domain, first_name, last_name):
        """Find the most likely email for a named person at *domain*."""
        data = self._get("email-finder", {
            "domain": domain,
            "first_name": first_name,
            "last_name": last_name,
        })
        return {
            "email": data.get("email"),
            "confidence": data.get("confidence"),
            "sources": data.get("sources"),
        }

    def verify_email(self, email):
        """Check deliverability of *email* via Hunter's verifier."""
        data = self._get("email-verifier", {"email": email})
        return {
            "status": data.get("status"),  # valid, invalid, accept_all
            "disposable": data.get("disposable"),
            "webmail": data.get("webmail"),
        }
Enter fullscreen mode Exit fullscreen mode

Clearbit Company Enrichment

class ClearbitClient:
    """Minimal client for the Clearbit Company Enrichment API."""

    TIMEOUT = 30  # seconds; avoids hanging forever on a stalled request

    def __init__(self, api_key):
        self.api_key = api_key
        self.session = requests.Session()
        # Clearbit authenticates with a Bearer token on every request.
        self.session.headers["Authorization"] = f"Bearer {api_key}"

    def enrich_company(self, domain):
        """Return a flattened dict of company facts for *domain*, or None.

        None is returned on any non-200 response (unknown domain, rate
        limit, auth failure) so callers can treat enrichment as optional.
        """
        resp = self.session.get(
            "https://company.clearbit.com/v2/companies/find",
            params={"domain": domain},
            timeout=self.TIMEOUT,
        )
        if resp.status_code != 200:
            return None
        data = resp.json()
        return {
            "name": data.get("name"),
            "domain": data.get("domain"),
            "industry": data.get("category", {}).get("industry"),
            "employee_count": data.get("metrics", {}).get("employees"),
            "revenue_range": data.get("metrics", {}).get("estimatedAnnualRevenue"),
            "tech_stack": data.get("tech", []),
            "funding": data.get("metrics", {}).get("raised"),
            "location": data.get("geo", {}).get("city"),
            "description": data.get("description"),
        }
Enter fullscreen mode Exit fullscreen mode

Scraping Company Websites for Context

API_KEY = "YOUR_SCRAPERAPI_KEY"

def scrape_company_page(url):
    """Fetch *url* through ScraperAPI (with JS rendering) and extract hints.

    Returns a dict with the original ``url``, an optional ``team_page``
    link (first anchor whose href or text mentions team/about/leadership),
    and ``tech_indicators`` inferred from external script URLs.

    Raises ``requests.HTTPError`` on a non-2xx proxy response instead of
    silently parsing an error page.
    """
    params = {
        "api_key": API_KEY,
        "url": url,
        "render": "true",  # execute JavaScript so SPA content is present
    }
    resp = requests.get(
        "https://api.scraperapi.com", params=params, timeout=60
    )
    # Don't feed an error/ban page into the parser as if it were the site.
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    info = {"url": url}

    # Team/About page links: keep the first plausible match only.
    for link in soup.find_all("a"):
        href = link.get("href", "").lower()
        text = link.get_text(strip=True).lower()
        if any(kw in href or kw in text for kw in ["team", "about", "leadership"]):
            info["team_page"] = link.get("href")
            break

    # Tech indicators from externally-sourced <script> tags.
    scripts = [s.get("src", "") for s in soup.find_all("script") if s.get("src")]
    info["tech_indicators"] = detect_tech(scripts)
    return info

def detect_tech(script_urls):
    """Infer frontend/SaaS technologies from a page's <script src> URLs.

    Each known technology maps to a case-insensitive regex; a technology
    is reported once no matter how many URLs match it. The result is
    sorted so output order is deterministic (the previous ``list(set(...))``
    returned an arbitrary order, which made results flaky to diff/test).
    """
    patterns = {
        "react": "react", "vue": "vue", "angular": "angular",
        "stripe": "stripe", "intercom": "intercom",
        "hubspot": "hubspot", "segment": "segment",
        "google_analytics": "google-analytics|gtag",
    }
    found = set()
    for url in script_urls:
        for tech_name, pattern in patterns.items():
            # Skip the regex once a technology is already confirmed.
            if tech_name not in found and re.search(pattern, url, re.I):
                found.add(tech_name)
    return sorted(found)
Enter fullscreen mode Exit fullscreen mode

ScraperAPI routes requests through rotating proxies and can render JavaScript, which makes access to most company websites considerably more reliable — though no proxy service can guarantee access against every anti-bot setup.

The Full Enrichment Pipeline

from bs4 import BeautifulSoup
import re
import csv

class LeadEnrichmentPipeline:
    """Orchestrates company, email, and website enrichment for B2B leads.

    Each enrichment step is best-effort: a network/API failure on one
    step (or one lead) no longer aborts the whole batch — previously a
    single unreachable domain raised out of ``enrich_batch`` and lost
    all work done so far.
    """

    def __init__(self, hunter_key, clearbit_key, scraper_key):
        self.hunter = HunterClient(hunter_key)
        self.clearbit = ClearbitClient(clearbit_key)
        self.scraper_key = scraper_key

    def enrich_lead(self, domain, first_name=None, last_name=None):
        """Return an enrichment dict for *domain*.

        Keys present depend on what succeeded: ``company`` (Clearbit),
        ``email`` or ``emails`` (Hunter, depending on whether a person's
        name was supplied), and ``web_intel`` (site scrape, None on failure).
        """
        lead = {"domain": domain}

        # Step 1: Company enrichment (optional — skip on API failure).
        try:
            company = self.clearbit.enrich_company(domain)
        except requests.RequestException:
            company = None
        if company:
            lead["company"] = company

        # Step 2: Email discovery — person lookup when we have a name,
        # otherwise a broad domain search.
        try:
            if first_name and last_name:
                lead["email"] = self.hunter.find_email(domain, first_name, last_name)
            else:
                lead["emails"] = self.hunter.domain_search(domain, limit=5)
        except requests.RequestException:
            pass  # the lead is still useful without email data

        # Step 3: Website intelligence.
        try:
            lead["web_intel"] = scrape_company_page(f"https://{domain}")
        except requests.RequestException:
            lead["web_intel"] = None

        time.sleep(1)  # Rate limiting between a lead's API calls
        return lead

    def enrich_batch(self, leads):
        """Enrich every lead dict in *leads* (needs at least a 'domain' key)."""
        enriched = []
        for lead in leads:
            result = self.enrich_lead(
                lead["domain"],
                lead.get("first_name"),
                lead.get("last_name")
            )
            enriched.append(result)
            time.sleep(2)  # extra pause between leads to stay under rate limits
        return enriched

# Example usage — replace the placeholder keys with real credentials.
pipeline = LeadEnrichmentPipeline(
    hunter_key="YOUR_HUNTER_KEY",
    clearbit_key="YOUR_CLEARBIT_KEY",
    scraper_key=API_KEY,  # ScraperAPI key defined above
)

# A lead needs at least "domain"; first/last name enables person-level
# email lookup instead of a broad domain search.
leads = [
    {"domain": "stripe.com", "first_name": "John", "last_name": "Doe"},
    {"domain": "notion.so"},
    {"domain": "linear.app"},
]
enriched = pipeline.enrich_batch(leads)
Enter fullscreen mode Exit fullscreen mode

Exporting to CRM Format

def export_to_csv(enriched_leads, filename="leads_enriched.csv"):
    """Write enriched leads to a CRM-friendly CSV file.

    *enriched_leads* is a list of dicts as produced by
    ``LeadEnrichmentPipeline.enrich_lead``; missing fields become empty
    cells. The file is written as UTF-8 explicitly — relying on the
    platform default encoding crashes on non-ASCII company names on
    systems with a legacy locale (e.g. cp1252 on Windows).
    """
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([
            "Domain", "Company", "Industry", "Employees",
            "Revenue", "Email", "Confidence", "Tech Stack"
        ])
        for lead in enriched_leads:
            company = lead.get("company", {})
            email = lead.get("email", {})
            writer.writerow([
                lead.get("domain", ""),  # tolerate partially-built lead dicts
                company.get("name", ""),
                company.get("industry", ""),
                company.get("employee_count", ""),
                company.get("revenue_range", ""),
                email.get("email", ""),
                email.get("confidence", ""),
                ", ".join(company.get("tech_stack", [])),
            ])
Enter fullscreen mode Exit fullscreen mode

For scaling lead enrichment across thousands of companies, use ThorData for residential proxies and ScrapeOps for monitoring.


B2B lead enrichment combines multiple data sources into actionable intelligence. API-first services like Hunter.io and Clearbit provide structured data, while web scraping fills the gaps. The key is building a pipeline that enriches reliably and respects rate limits.

Happy scraping!

Top comments (0)