How to Scrape LinkedIn Company Data Legally and Efficiently
LinkedIn is a goldmine for B2B data. But scraping it is famously difficult.
The Legal Framework
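First, let's be clear: in hiQ Labs v. LinkedIn (2022), the Ninth Circuit held that scraping publicly accessible LinkedIn data likely does not violate the Computer Fraud and Abuse Act. That is narrower than "legal" across the board: the ruling did not settle contract claims, and the district court later found that hiQ had breached LinkedIn's User Agreement. Whatever the legal position, you should still respect robots.txt and rate limits.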
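As a rough illustration of what "respecting robots.txt" can look like in code, here is a minimal sketch of a checker. It is deliberately simplified: it handles plain prefix rules only (no `*` or `$` wildcards), matches user agents by substring, and picks the longest matching rule with Allow winning ties. The function name `isPathAllowed` is an assumption made for this example, not part of any library.

```javascript
// Simplified robots.txt check: prefix rules only, no wildcard (* or $) support.
function isPathAllowed(robotsTxt, userAgent, path) {
  const groups = []; // [{ agents: [...], rules: [{ allow, prefix }] }]
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if ((field === 'allow' || field === 'disallow') && current && value !== '') {
        current.rules.push({ allow: field === 'allow', prefix: value });
      }
    }
  }

  const ua = userAgent.toLowerCase();
  // Prefer a group that names this agent; otherwise fall back to the "*" group.
  const group =
    groups.find(g => g.agents.some(a => a && a !== '*' && ua.includes(a))) ||
    groups.find(g => g.agents.includes('*'));
  if (!group) return true; // no applicable rules: allowed

  // Longest matching prefix wins; Allow wins when lengths are equal.
  let best = null;
  for (const rule of group.rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (
      !best ||
      rule.prefix.length > best.prefix.length ||
      (rule.prefix.length === best.prefix.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```

In practice you would fetch the site's `/robots.txt` once, cache it, and call a check like this before every request. For production use, a maintained parser that supports wildcards is a safer choice than this sketch.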
What You Can Scrape
- Public company pages — About, size, industry, website
- Public profiles — Name, headline, location (only what is visible without logging in)
- Company employee lists — Aggregated data (not personal)
What You Cannot Scrape
- Private profiles (only visible when logged in)
- Messages, endorsements, connections (explicitly forbidden by LinkedIn's ToS)
- Rate-limited data (LinkedIn blocks high-volume requests aggressively)
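One way to keep a pipeline inside the two lists above is an explicit allow-list applied before anything is stored. The sketch below assumes scraped company records arrive as plain objects; the field names (`about`, `size`, `industry`, `website`) mirror the public company page items listed earlier but are hypothetical, so adjust them to whatever your scraper actually returns.

```javascript
// Hypothetical field names, chosen to mirror the public company page items.
const PUBLIC_COMPANY_FIELDS = ['about', 'size', 'industry', 'website'];

// Return a copy of `record` containing only allow-listed fields.
// Anything else (messages, connections, personal data) is dropped.
function pickPublicCompanyFields(record) {
  const result = {};
  for (const field of PUBLIC_COMPANY_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      result[field] = record[field];
    }
  }
  return result;
}
```

An allow-list fails safe: a new, unexpected field in the scraped payload is discarded by default instead of being persisted by accident.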
The Tech Approach
Standard headless browsers won't work: LinkedIn detects stock headless Chrome almost immediately.
XCrawl's approach:
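// Uses top-level await: run inside an ES module or an async function.
const response = await fetch('https://run.xcrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'X-API-Key': 'your-key',
    'Content-Type': 'application/json' // the body below is JSON
  },
  body: JSON.stringify({
    url: 'https://www.linkedin.com/company/microsoft/',
    proxy: { country: 'US' },
    js_rendering: true
  })
});
// Fail loudly on non-2xx responses instead of parsing an error page.
if (!response.ok) throw new Error(`Scrape failed: HTTP ${response.status}`);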
Anti-Detection Features Required
- Real residential IPs — Datacenter IPs get blocked instantly
- Browser fingerprint spoofing — Headers, WebGL, canvas
- Human-like interaction patterns — Random delays, scroll behavior
- CAPTCHA solving — LinkedIn throws captchas at high volume
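Of the items above, randomized delays are the easiest to sketch in a few lines. The example below covers only that one item (it does nothing about IPs, fingerprints, or CAPTCHAs). The names `jitteredDelayMs`, `sleep`, and `crawlPolitely` are made up for this example, `visit` is a caller-supplied async function, and the 2 to 8 second default range is an arbitrary assumption, not a value known to be safe for LinkedIn.

```javascript
// Pick a delay in [minMs, maxMs). `rand` is injectable so the function
// can be tested deterministically; it defaults to Math.random.
function jitteredDelayMs(minMs, maxMs, rand = Math.random) {
  return Math.floor(minMs + rand() * (maxMs - minMs));
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Visit URLs one at a time with a randomized pause between requests.
// `visit` is a hypothetical caller-supplied async function (url) => result.
async function crawlPolitely(urls, visit, minMs = 2000, maxMs = 8000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await visit(urls[i]));
    // No need to wait after the final page.
    if (i < urls.length - 1) await sleep(jitteredDelayMs(minMs, maxMs));
  }
  return results;
}
```

Sequential requests with jitter also keep the overall request rate low, which fits the small daily volumes recommended in this article.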
Best Practices
- Start small: 50-100 pages/day, not 10,000
- Use a dedicated proxy pool per account
- Store results in structured format (JSON/CSV)
- Monitor the rate of HTTP 429 (Too Many Requests) responses and back off immediately
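The last point can be sketched as a small retry wrapper around `fetch`. This is an illustrative sketch, not XCrawl's implementation: the helper names, the 1 second base delay, the 60 second cap, and the limit of 5 retries are all assumptions. It honors a `Retry-After` header when it is given in seconds (the HTTP-date form is not handled) and otherwise falls back to exponential backoff.

```javascript
// Exponential backoff with a cap; `attempt` starts at 0.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Parse a Retry-After header given in seconds.
// Returns null when the header is absent, empty, or not a number.
function retryAfterMs(headerValue) {
  if (headerValue == null || headerValue.trim() === '') return null;
  const seconds = Number(headerValue);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : null;
}

// Retry on HTTP 429 only; any other response is returned to the caller.
// `fetchImpl` defaults to the global fetch (Node 18+ or a browser).
async function fetchWithBackoff(url, options = {}, maxRetries = 5, fetchImpl = fetch) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url, options);
    if (response.status !== 429 || attempt >= maxRetries) return response;
    const waitMs =
      retryAfterMs(response.headers.get('Retry-After')) ?? backoffDelayMs(attempt);
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
}
```

If 429s keep arriving after the retries are exhausted, the safer reaction is to stop the run and lower the daily volume, not to raise the retry count.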
XCrawl handles LinkedIn anti-detection automatically: dash.xcrawl.com