The Problem with LinkedIn Data Access
LinkedIn charges $800–$1,200/month for Sales Navigator API access. For developers building lead-gen tools, market research pipelines, or sales automation, that's a brutal barrier. But there's a smarter path: build your own LinkedIn scraper using open-source tools — Playwright, Crawlee, and Apify.
Here's how.
Why Playwright + Crawlee?
Crawlee (by Apify) is a Node.js library built specifically for reliable web scraping. It handles:
- Request queuing — no duplicate URLs, automatic retries
- Browser fingerprint rotation — reduces detection risk
- Session management — persists cookies across requests
- Proxy integration — built-in support for residential proxies
Playwright drives a real Chromium/Chrome browser, meaning JavaScript-rendered content (like LinkedIn's React app) loads normally.
Anti-Detection Tips
LinkedIn is aggressive about bot detection. These measures matter:
- Use residential proxies — datacenter IPs get blocked fast
- Randomize delays — add 2–5s between requests; never use a fixed interval
- Rotate user agents — match real Chrome versions
- Avoid headless mode — use headless: false locally and set realistic window sizes
- Persist sessions — re-use cookies so you don't re-login every run
- Limit concurrency — max 1–2 parallel requests on LinkedIn
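The delay tip above can be sketched as a small helper. This is illustrative code, not part of any library API — the function names are my own:

```javascript
// Returns a random integer delay between minMs (inclusive) and maxMs (exclusive),
// so consecutive requests never arrive at a fixed interval.
function jitteredDelay(minMs = 2000, maxMs = 5000) {
  return Math.floor(minMs + Math.random() * (maxMs - minMs));
}

// Promise-based sleep, usable with await inside a request handler.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage inside a request handler:
// await sleep(jitteredDelay());
```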
Basic Apify Actor Structure
Here's a minimal actor that scrapes LinkedIn company employee data:
```javascript
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

// Actor input; fall back to an empty object so destructuring defaults apply.
const input = (await Actor.getInput()) ?? {};
const { linkedinUrls = [], maxProfiles = 50 } = input;

let scrapedCount = 0;

const crawler = new PlaywrightCrawler({
    // Residential proxies — datacenter IPs get blocked quickly.
    proxyConfiguration: await Actor.createProxyConfiguration({
        groups: ['RESIDENTIAL'],
    }),
    launchContext: {
        launchOptions: {
            // headless: true is fine on the Apify platform; when running
            // locally, prefer headless: false per the anti-detection tips.
            headless: true,
            args: ['--no-sandbox', '--disable-blink-features=AutomationControlled'],
        },
    },
    // Keep concurrency low on LinkedIn.
    minConcurrency: 1,
    maxConcurrency: 2,
    async requestHandler({ page, request }) {
        // Randomized 2–5s delay, never a fixed interval.
        await page.waitForTimeout(2000 + Math.random() * 3000);

        const employees = await page.$$eval('.org-people-profile-card', (cards) =>
            cards.map((card) => ({
                name: card.querySelector('.artdeco-entity-lockup__title')?.innerText?.trim(),
                title: card.querySelector('.artdeco-entity-lockup__subtitle')?.innerText?.trim(),
                profileUrl: card.querySelector('a')?.href,
            }))
        );

        // Respect the maxProfiles cap across all requests.
        const remaining = Math.max(0, maxProfiles - scrapedCount);
        const batch = employees.slice(0, remaining);
        scrapedCount += batch.length;

        if (batch.length > 0) {
            // Pay-Per-Event pricing — charge per profile scraped.
            await Actor.charge({ eventName: 'profile-scraped', count: batch.length });
            await Actor.pushData(batch);
        }
    },
});

await crawler.addRequests(linkedinUrls.map((url) => ({ url })));
await crawler.run();
await Actor.exit();
```
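LinkedIn pages can repeat cards across paginated loads, and partially rendered cards come back with empty fields. A small post-processing helper (hypothetical, not part of the actor above) keeps the dataset clean before you export it:

```javascript
// Deduplicates scraped employee records by profileUrl (falling back to
// name + title when the URL is missing) and drops rows with no name.
function dedupeEmployees(employees) {
  const seen = new Set();
  const result = [];
  for (const e of employees) {
    const key = e.profileUrl ?? `${e.name}|${e.title}`;
    if (!e.name || seen.has(key)) continue;
    seen.add(key);
    result.push(e);
  }
  return result;
}
```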
Deploying to Apify
```bash
npm install -g apify-cli
apify create my-linkedin-scraper --template playwright-js
cd my-linkedin-scraper
apify push
apify call my-linkedin-scraper --input '{"linkedinUrls": ["https://www.linkedin.com/company/google/people/"]}'
```
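To expose `linkedinUrls` and `maxProfiles` in the Apify Console UI, the actor also needs an input schema. A sketch of `.actor/input_schema.json` matching the input fields used above (titles and descriptions are my own wording):

```json
{
  "title": "LinkedIn Employee Scraper input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "linkedinUrls": {
      "title": "LinkedIn company people URLs",
      "type": "array",
      "editor": "stringList",
      "description": "Company /people/ pages to scrape"
    },
    "maxProfiles": {
      "title": "Max profiles",
      "type": "integer",
      "default": 50
    }
  },
  "required": ["linkedinUrls"]
}
```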
The Cost Math
| Approach | Monthly Cost |
|---|---|
| Sales Navigator API | $800–$1,200/month |
| DIY (Apify + Residential Proxies) | $20–$80/month |
| Savings | 90–98% |
Real-World Use Cases
- Sales teams: Build prospect lists from target company employee pages
- Recruiters: Source candidates at scale without LinkedIn Recruiter
- Market researchers: Track hiring signals, team growth, org changes
- Competitive intel: Monitor competitor headcount and hiring patterns
Get the Code
All my scraping actors — including the LinkedIn Employee Scraper with 600+ runs in production — are open-source on GitHub.
GitHub: https://github.com/the-ai-entrepreneur-ai-hub
Built in Nairobi. Running on 45 Apify actors, 599 users, 11,974 total runs.