agenthustler

Scraping B2B Lead Data: Combining LinkedIn, Clearbit, and Hunter.io

B2B lead generation runs on data. The best sales teams enrich prospects with company data, verified emails, and professional profiles. Here's how to build an automated lead enrichment pipeline combining multiple data sources.

The Lead Enrichment Stack

  • LinkedIn: Professional profiles, job titles, company associations
  • Clearbit: Company data, technographics, funding info
  • Hunter.io: Email discovery and verification
  • Public sources: Company websites, press releases, SEC filings

Hunter.io Email Discovery

Hunter.io offers a free tier (25 searches/month) and paid plans for higher volumes:

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
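For reference, a domain lookup boils down to a single HTTP request. The following is a minimal sketch, not the author's implementation. It assumes Hunter.io's public v2 `domain-search` endpoint and its `data.emails[].value` / `confidence` response fields; `parse_domain_search`, `find_email`, and the `min_confidence` threshold are hypothetical names chosen for illustration. The `session` argument is expected to be a `requests.Session()`.

```python
# Assumption: Hunter.io's v2 domain-search endpoint.
HUNTER_DOMAIN_SEARCH = "https://api.hunter.io/v2/domain-search"


def parse_domain_search(payload, min_confidence=70):
    """Pick the highest-confidence email from a domain-search payload.

    Returns None when no email meets the (illustrative) threshold.
    """
    emails = (payload.get("data") or {}).get("emails") or []
    candidates = [
        e for e in emails if (e.get("confidence") or 0) >= min_confidence
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: e.get("confidence") or 0)
    return {
        "email": best.get("value"),
        "confidence": best.get("confidence"),
        "position": best.get("position"),
    }


def find_email(session, api_key, domain, min_confidence=70):
    """Query Hunter.io for a domain and return the best email, or None."""
    resp = session.get(
        HUNTER_DOMAIN_SEARCH,
        params={"domain": domain, "api_key": api_key},
        timeout=10,
    )
    if resp.status_code != 200:
        return None
    return parse_domain_search(resp.json(), min_confidence)
```

The returned dict uses the `email` and `confidence` keys that `export_to_csv` reads later in this post. Keeping the parsing separate from the HTTP call makes the logic easy to test without spending API credits.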

Clearbit Company Enrichment

import requests


class ClearbitClient:
    """Thin wrapper around Clearbit's Company API."""

    def __init__(self, api_key, timeout=10):
        self.api_key = api_key
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {api_key}"

    def enrich_company(self, domain):
        """Return a flat dict of company attributes, or None if the lookup fails."""
        resp = self.session.get(
            "https://company.clearbit.com/v2/companies/find",
            params={"domain": domain},
            timeout=self.timeout,
        )
        # Treat any non-200 response as "no data" for this domain.
        if resp.status_code != 200:
            return None
        data = resp.json()
        # Nested objects may be null in the response, so fall back to {}.
        category = data.get("category") or {}
        metrics = data.get("metrics") or {}
        geo = data.get("geo") or {}
        return {
            "name": data.get("name"),
            "domain": data.get("domain"),
            "industry": category.get("industry"),
            "employee_count": metrics.get("employees"),
            "revenue_range": metrics.get("estimatedAnnualRevenue"),
            "tech_stack": data.get("tech") or [],
            "funding": metrics.get("raised"),
            "location": geo.get("city"),
            "description": data.get("description"),
        }

Scraping Company Websites for Context

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).

A proxy API such as ScraperAPI can make access to company websites more reliable when sites rate-limit or block direct requests.
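What counts as "context" depends on your use case. As one hedged sketch (not the author's implementation), the parser below pulls the page title and meta description out of a homepage's HTML using only Python's standard library. `PageContextParser` and `extract_context` are illustrative names, and the choice of these two fields is an assumption.

```python
from html.parser import HTMLParser


class PageContextParser(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def extract_context(html):
    """Return {"title": ..., "description": ...} for an HTML document."""
    parser = PageContextParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}
```

Feed it the HTML of a fetched page, for example `extract_context(requests.get(url, timeout=10).text)`. Pages that render their content with JavaScript will need a headless browser or a rendering proxy instead.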

The Full Enrichment Pipeline

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
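The orchestration step can be sketched as a loop that runs each lookup per domain and collects the results. This is a minimal sketch under stated assumptions, not the author's pipeline: `enrich_leads` is a hypothetical name, the two lookups are passed in as callables that take a domain, and a fixed delay between domains stands in for real rate limiting. Each lead uses the `domain` / `company` / `email` shape that `export_to_csv` reads.

```python
import time


def enrich_leads(domains, enrich_company, find_email, delay_seconds=1.0):
    """Enrich each unique domain with company and email data.

    enrich_company and find_email are callables taking a domain and
    returning a dict (or None). A failure in one lookup is recorded on
    the lead and does not stop the run.
    """
    leads = []
    seen = set()
    for raw in domains:
        domain = raw.strip().lower()
        if not domain or domain in seen:
            continue  # skip blanks and duplicates
        seen.add(domain)
        lead = {"domain": domain, "company": {}, "email": {}}
        for key, fetch in (("company", enrich_company), ("email", find_email)):
            try:
                lead[key] = fetch(domain) or {}
            except Exception as exc:
                lead[key] = {}
                lead.setdefault("errors", []).append(f"{key}: {exc}")
        leads.append(lead)
        time.sleep(delay_seconds)  # crude pacing between domains
    return leads
```

To wire it up, pass `ClearbitClient(key).enrich_company` as the first callable and any function that maps a domain to an `{"email": ..., "confidence": ...}` dict as the second, then hand the result to `export_to_csv`.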

Exporting to CRM Format

import csv


def export_to_csv(enriched_leads, filename="leads_enriched.csv"):
    """Write enriched leads to a CSV file with one row per domain."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([
            "Domain", "Company", "Industry", "Employees",
            "Revenue", "Email", "Confidence", "Tech Stack"
        ])
        for lead in enriched_leads:
            # A failed lookup may leave these as None, so fall back to {}.
            company = lead.get("company") or {}
            email = lead.get("email") or {}
            writer.writerow([
                lead["domain"],
                company.get("name", ""),
                company.get("industry", ""),
                company.get("employee_count", ""),
                company.get("revenue_range", ""),
                email.get("email", ""),
                email.get("confidence", ""),
                ", ".join(company.get("tech_stack") or []),
            ])

For scaling lead enrichment across thousands of companies, use ThorData for residential proxies and ScrapeOps for monitoring.
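At that volume, API rate limits tend to become the main constraint. One common way to handle them is retrying with exponential backoff. The helper below is a sketch under assumptions: it treats HTTP 429 as the throttling signal, honors a numeric `Retry-After` header when one is present, and expects a `requests.Session()`-like object. `get_with_backoff` and its parameters are hypothetical names, not part of any library.

```python
import time


def get_with_backoff(session, url, params=None, max_retries=4,
                     base_delay=1.0, sleep=time.sleep):
    """GET a URL, retrying on HTTP 429 with exponential backoff.

    Returns the last response, which may still be a 429 if every
    retry was throttled.
    """
    resp = None
    for attempt in range(max_retries + 1):
        resp = session.get(url, params=params, timeout=10)
        if resp.status_code != 429 or attempt == max_retries:
            return resp
        retry_after = resp.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)  # server-suggested wait, in seconds
        else:
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
        sleep(delay)
    return resp
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use the default `time.sleep` applies.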


B2B lead enrichment combines multiple data sources into actionable intelligence. API-first services like Hunter.io and Clearbit provide structured data, while web scraping fills the gaps. The key is building a pipeline that enriches reliably and respects rate limits.

Happy scraping!
