How to Scrape LinkedIn Company Data Legally and Efficiently

#webscraping #linkedin #tutorial #javascript

LinkedIn is a goldmine for B2B data. But scraping it is famously difficult.

The Legal Framework

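First, let's be clear: in hiQ Labs v. LinkedIn (9th Cir. 2022), the court held that scraping publicly accessible LinkedIn data likely does not violate the Computer Fraud and Abuse Act. That is narrower than "scraping is legal": the same litigation later found hiQ in breach of LinkedIn's User Agreement, so terms of service still matter, and you should still respect robots.txt and rate limits.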
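
Respecting robots.txt is easy to automate before any request goes out. Here is a minimal sketch of such a check. The function names `parseRobots` and `isAllowed` are made up for this post, and the parser is deliberately simplified: it matches plain path prefixes only and ignores `*` and `$` wildcards, so treat it as a starting point rather than a full implementation of the Robots Exclusion Protocol.

```javascript
// Minimal robots.txt check: plain path prefixes only, no * or $ wildcards.
function parseRobots(text) {
  const groups = []; // each group: { agents: [...], rules: [{ allow, path }] }
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === 'allow' || field === 'disallow') && current) {
      // An empty value means "no restriction", so there is nothing to record.
      if (value !== '') current.rules.push({ allow: field === 'allow', path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Prefer groups that name this agent; otherwise fall back to "*".
  let matched = groups.filter(g =>
    g.agents.some(a => a !== '*' && a.length > 0 && ua.includes(a)));
  if (matched.length === 0) matched = groups.filter(g => g.agents.includes('*'));
  let best = null;
  for (const rule of matched.flatMap(g => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    // Longest matching path wins; Allow wins a tie.
    if (!best || rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true; // no matching rule means allowed
}
```

You would fetch the site's robots.txt once, cache the text, and call `isAllowed(robotsText, 'my-crawler/1.0', '/company/microsoft/')` before queueing each URL. The user agent string here is a placeholder.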

What You Can Scrape

Public company pages: About, size, industry, website
Public profiles: name, headline, location (only what is visible without logging in)
Company employee lists: aggregated data (not personal)

What You Cannot Scrape

Private profiles (only visible when logged in)
Messages, endorsements, connections: explicitly forbidden by LinkedIn's ToS
Rate-limited data: LinkedIn aggressively blocks clients that exceed its limits

The Tech Approach

Off-the-shelf browser automation rarely works: LinkedIn's bot detection tends to flag a default headless Chrome session very quickly.

XCrawl's approach:

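// Runs as an ES module (top-level await); replace 'your-key' with your API key.
const response = await fetch('https://run.xcrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'X-API-Key': 'your-key',
    'Content-Type': 'application/json' // the body below is JSON
  },
  body: JSON.stringify({
    url: 'https://www.linkedin.com/company/microsoft/',
    proxy: { country: 'US' },
    js_rendering: true
  })
});

// Fail loudly on HTTP errors such as 401 or 429.
if (!response.ok) throw new Error(`Scrape request failed: ${response.status}`);
const result = await response.json(); // assumes the API replies with JSON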

Anti-Detection Features Required

Real residential IPs: datacenter IP ranges tend to get blocked almost immediately
Browser fingerprint spoofing: headers, WebGL, canvas
Human-like interaction patterns: random delays, scroll behavior
CAPTCHA solving: LinkedIn serves CAPTCHAs at high request volume
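
Of the features above, random delays are the one you can reasonably build yourself. The sketch below shows one way to space out sequential requests with a randomized pause. The names `randomDelayMs` and `runWithPauses` and the 2 to 8 second default window are assumptions made for this example, not values LinkedIn or XCrawl publish.

```javascript
// Pick a delay between minMs and maxMs; rng is injectable so tests can be deterministic.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError('expected 0 <= minMs <= maxMs');
  }
  return Math.round(minMs + rng() * (maxMs - minMs));
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Run async tasks one at a time, pausing a random interval between them.
async function runWithPauses(tasks, options = {}) {
  const { minMs = 2000, maxMs = 8000, sleepFn = sleep, rng = Math.random } = options;
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    results.push(await tasks[i]());
    // No pause is needed after the final task.
    if (i < tasks.length - 1) {
      await sleepFn(randomDelayMs(minMs, maxMs, rng));
    }
  }
  return results;
}
```

Each task is a function returning a promise, for example `() => fetch(...)`, so nothing starts until the previous pause has finished.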

Best Practices

Start small: 50-100 pages/day, not 10,000
Use a dedicated proxy pool per account
Store results in structured format (JSON/CSV)
Monitor 429 rates and back off immediately
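
The last practice, backing off on 429 responses, can be sketched as a small wrapper around `fetch`. The helper names `backoffDelayMs` and `fetchWithBackoff`, the 1 second base delay, the 60 second cap, and the retry count are all illustrative assumptions. The sketch only understands the seconds form of the `Retry-After` header, not the HTTP date form.

```javascript
// Decide how long to wait before retry number `attempt` (0-based).
function backoffDelayMs(attempt, retryAfterHeader, baseMs = 1000, capMs = 60000) {
  if (retryAfterHeader != null && retryAfterHeader !== '') {
    const seconds = Number(retryAfterHeader);
    // Honor the server's hint when it is a plain number of seconds.
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Otherwise double the wait on each attempt, up to the cap.
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Retry a request while the server answers 429, waiting longer each time.
async function fetchWithBackoff(url, options = {}, settings = {}) {
  const {
    maxRetries = 4,
    fetchImpl = globalThis.fetch,
    sleepFn = ms => new Promise(resolve => setTimeout(resolve, ms)),
  } = settings;
  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url, options);
    if (response.status !== 429) return response;
    if (attempt >= maxRetries) {
      throw new Error(`Still rate limited after ${maxRetries} retries: ${url}`);
    }
    await sleepFn(backoffDelayMs(attempt, response.headers.get('Retry-After')));
  }
}
```

Counting how often this wrapper has to retry gives you the 429 rate to monitor. If that number climbs, lower your daily page budget instead of raising `maxRetries`.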

XCrawl handles LinkedIn anti-detection automatically: dash.xcrawl.com
