DEV Community

Charles

How to Scrape LinkedIn Company Data Legally and Efficiently

LinkedIn is a goldmine for B2B data. But scraping it is famously difficult.

The Legal Framework

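First, let's be clear about what the courts actually said. In hiQ Labs v. LinkedIn (9th Cir. 2022), the appeals court held that scraping publicly accessible LinkedIn data likely does not violate the Computer Fraud and Abuse Act. That is narrower than "scraping is legal": later in 2022 the district court found that hiQ had breached LinkedIn's User Agreement, so contract claims remain a risk. Respect robots.txt and rate limits regardless.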
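
To make the robots.txt point concrete, here is a minimal sketch of a check you could run before requesting a path. The function names `parseDisallowRules` and `isPathAllowed` are made up for this post, and the parser is deliberately simplified: it only reads groups addressed to `User-agent: *` and treats each `Disallow` value as a plain path prefix (no wildcards, no `Allow` overrides).

```javascript
// Simplified robots.txt check (an illustration, not a full parser).
// Assumptions: only "User-agent: *" groups are honored, and Disallow
// values are matched as plain path prefixes.
function parseDisallowRules(robotsTxt) {
  const rules = [];
  let appliesToAll = false; // does the current group include "User-agent: *"?
  let inAgentRun = false;   // true while reading consecutive User-agent lines
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    if (!line) continue;
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group.
      if (!inAgentRun) appliesToAll = false;
      if (value === '*') appliesToAll = true;
      inAgentRun = true;
    } else {
      inAgentRun = false;
      // An empty Disallow value means "nothing is disallowed".
      if (field === 'disallow' && appliesToAll && value) rules.push(value);
    }
  }
  return rules;
}

function isPathAllowed(robotsTxt, path) {
  return !parseDisallowRules(robotsTxt).some((prefix) => path.startsWith(prefix));
}
```

In practice you would download the site's robots.txt once, cache the text, and call `isPathAllowed(text, '/company/microsoft/')` before each request. A production crawler should use a complete parser that follows the Robots Exclusion Protocol.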

What You Can Scrape

  • Public company pages: About text, size, industry, website
  • Public profiles: name, headline, location (as shown to logged-out visitors)
  • Company employee lists: aggregated data only (nothing personal)

What You Cannot Scrape

  • Private profiles (visible only when logged in)
  • Messages, endorsements, connections: LinkedIn's ToS explicitly forbids collecting these
  • Data at high volume: LinkedIn rate-limits and blocks aggressively

The Tech Approach

Standard browser automation rarely works: LinkedIn is quick to detect headless Chrome.

XCrawl's approach:

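// Request a public company page through XCrawl's scrape endpoint.
const response = await fetch('https://run.xcrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'X-API-Key': 'your-key',
    'Content-Type': 'application/json' // the body below is JSON
  },
  body: JSON.stringify({
    url: 'https://www.linkedin.com/company/microsoft/',
    proxy: { country: 'US' },
    js_rendering: true
  })
});
// Fail loudly on non-2xx responses instead of parsing an error page.
if (!response.ok) throw new Error(`Scrape request failed: HTTP ${response.status}`);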

Anti-Detection Features Required

  1. Real residential IPs — Datacenter IPs get blocked instantly
  2. Browser fingerprint spoofing — Headers, WebGL, canvas
  3. Human-like interaction patterns — Random delays, scroll behavior
  4. CAPTCHA solving — LinkedIn throws captchas at high volume
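
As a rough illustration of item 3, the random delays can be sketched as a pause of varying length between page visits. Everything here is an assumption for the sake of the example: the helper names (`randomDelayMs`, `visitWithPauses`), the 2 to 8 second range, and the `visit` callback, which stands in for whatever function actually fetches a page.

```javascript
// Pick a delay uniformly in [minMs, maxMs). The `random` parameter exists
// only so the function can be tested deterministically.
function randomDelayMs(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit URLs one at a time, pausing a random interval between visits.
// `visit` is a hypothetical async callback supplied by the caller.
async function visitWithPauses(urls, visit, options = {}) {
  const { minMs = 2000, maxMs = 8000, sleepFn = sleep } = options;
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await visit(urls[i]));
    // No need to wait after the last URL.
    if (i < urls.length - 1) await sleepFn(randomDelayMs(minMs, maxMs));
  }
  return results;
}
```

Sequential visits with irregular gaps are only one part of looking human; this sketch does not cover scroll behavior or fingerprinting.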

Best Practices

  • Start small: 50-100 pages/day, not 10,000
  • Use a dedicated proxy pool per account
  • Store results in structured format (JSON/CSV)
  • Monitor the rate of HTTP 429 (Too Many Requests) responses and back off immediately
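
One way to implement the last point is exponential backoff with jitter. The sketch below is an assumption-laden example rather than anything XCrawl prescribes: `backoffDelayMs` and `fetchWithBackoff` are hypothetical names, the 1 second base and 60 second cap are arbitrary, and `doFetch` stands in for any function that returns a fetch-style response with a `status` field.

```javascript
// Exponential backoff with "equal jitter": wait at least half of the
// exponential ceiling, plus a random share of the other half.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  const half = ceiling / 2;
  return Math.floor(half + random() * half);
}

const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `doFetch` while it returns HTTP 429, up to `maxRetries` retries.
// `doFetch` is a hypothetical zero-argument async function from the caller.
async function fetchWithBackoff(doFetch, options = {}) {
  const { maxRetries = 5, waitFn = wait } = options;
  for (let attempt = 0; ; attempt++) {
    const response = await doFetch();
    if (response.status !== 429) return response;
    if (attempt >= maxRetries) {
      throw new Error(`Still rate-limited after ${maxRetries} retries`);
    }
    // Honor a numeric Retry-After header (seconds) when the server sends one;
    // otherwise fall back to the computed backoff delay.
    const retryAfter = Number(response.headers?.get?.('Retry-After'));
    const delay = retryAfter > 0 ? retryAfter * 1000 : backoffDelayMs(attempt);
    await waitFn(delay);
  }
}
```

A call might look like `fetchWithBackoff(() => fetch(url, requestOptions))`. If 429s keep appearing even with backoff, that is a signal to lower the daily page budget rather than to retry harder.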

XCrawl handles LinkedIn anti-detection automatically: dash.xcrawl.com
