How to Build a LinkedIn Data Pipeline Without Paying for Sales Navigator

#node #javascript #webdev #scraping

The Problem with LinkedIn Data Access

LinkedIn charges $800–$1,200/month for Sales Navigator API access. For developers building lead-gen tools, market research pipelines, or sales automation, that's a brutal barrier. But there's a smarter path: build your own LinkedIn scraper with the open-source libraries Playwright and Crawlee, and run it on Apify.

Here's how.

Why Playwright + Crawlee?

Crawlee (by Apify) is a Node.js library built specifically for reliable web scraping. It handles:

Request queuing — no duplicate URLs, automatic retries
Browser fingerprint rotation — reduces detection risk
Session management — persists cookies across requests
Proxy integration — built-in support for residential proxies

Playwright drives a real Chromium/Chrome browser, so JavaScript-rendered content (like LinkedIn's single-page app) loads the same way it does for a human visitor.
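As a rough sketch of how the features listed above map onto crawler settings, the object below could be spread into the PlaywrightCrawler constructor. The option names are Crawlee's; the specific values (3 retries, a pool of 5 sessions) are illustrative assumptions, not tuned recommendations.

```javascript
// Hypothetical values for illustration; tune them for your own workload.
const crawlerOptions = {
  // Request queuing: failed requests are retried up to this many times
  maxRequestRetries: 3,
  // Session management: keep a pool of sessions and persist their cookies
  useSessionPool: true,
  persistCookiesPerSession: true,
  sessionPoolOptions: { maxPoolSize: 5 },
  // Fingerprint rotation: let the browser pool generate browser fingerprints
  browserPoolOptions: { useFingerprints: true },
};

// Usage (inside an actor): new PlaywrightCrawler({ ...crawlerOptions, requestHandler })
```

Proxy integration is configured separately through proxyConfiguration, as in the actor in the next sections.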

Anti-Detection Tips

LinkedIn is aggressive about bot detection. These measures matter:

Use residential proxies: datacenter IPs get blocked fast
Randomize delays: wait a random 2–5 s between requests, never a fixed interval
Rotate user agents: match real Chrome versions
Avoid headless mode: run with headless set to false, or at least set a realistic window size
Persist sessions: reuse cookies so you don't log in again on every run
Limit concurrency: at most 1–2 parallel requests on LinkedIn
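The randomized-delay tip can be sketched as a small helper. The names randomDelayMs and politePause are hypothetical, and the 2–5 s defaults simply mirror the range suggested above.

```javascript
// Returns a random whole number of milliseconds in [minMs, maxMs].
function randomDelayMs(minMs = 2000, maxMs = 5000) {
  return Math.floor(minMs + Math.random() * (maxMs - minMs + 1));
}

// Resolves after a freshly randomized pause and reports how long it waited.
function politePause(minMs, maxMs) {
  const ms = randomDelayMs(minMs, maxMs);
  return new Promise((resolve) => setTimeout(resolve, ms)).then(() => ms);
}

// Usage inside a request handler:
//   await politePause();            // 2–5 s
//   await politePause(4000, 9000);  // slower, for sensitive pages
```

Drawing a new value on every call is what keeps the request rhythm irregular; a constant interval is easy to spot in traffic logs.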

Basic Apify Actor Structure

Here's a minimal actor that scrapes LinkedIn company employee data:

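import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

// getInput() resolves to null when the run has no input, so fall back to {}
const { linkedinUrls = [], maxProfiles = 50 } = (await Actor.getInput()) ?? {};

// Profiles saved so far, used to enforce maxProfiles across all pages
let scrapedCount = 0;

const crawler = new PlaywrightCrawler({
  proxyConfiguration: await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
  }),
  launchContext: {
    launchOptions: {
      // Headless is simplest on a server; switch to false to follow the
      // anti-detection tip above (this needs a display, e.g. Xvfb)
      headless: true,
      args: ['--no-sandbox', '--disable-blink-features=AutomationControlled'],
    },
  },
  minConcurrency: 1,
  maxConcurrency: 2,

  async requestHandler({ page, request, log }) {
    // Random 2–5 s pause before reading the page
    await page.waitForTimeout(2000 + Math.random() * 3000);

    // These selectors depend on LinkedIn's markup and can break when it changes
    const employees = await page.$$eval('.org-people-profile-card', cards =>
      cards.map(card => ({
        name: card.querySelector('.artdeco-entity-lockup__title')?.innerText?.trim(),
        title: card.querySelector('.artdeco-entity-lockup__subtitle')?.innerText?.trim(),
        profileUrl: card.querySelector('a')?.href,
      }))
    );

    // Keep only as many profiles as the maxProfiles budget still allows
    const batch = employees.slice(0, Math.max(0, maxProfiles - scrapedCount));
    scrapedCount += batch.length;
    if (batch.length === 0) {
      log.info(`No profiles saved for ${request.url}`);
      return;
    }

    // Pay-Per-Event pricing: charge per profile scraped
    await Actor.charge({ eventName: 'profile-scraped', count: batch.length });
    await Actor.pushData(batch);
  },
});

await crawler.addRequests(linkedinUrls.map(url => ({ url })));
await crawler.run();
await Actor.exit();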
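Scraped cards are often messy: some lack a name or link, and the same person can show up twice with different tracking parameters. A minimal cleanup sketch, assuming records shaped like the ones the actor extracts (name, title, profileUrl); the helper name cleanEmployees is hypothetical and not part of Apify or Crawlee.

```javascript
// Drops incomplete cards, strips query strings from profile URLs,
// removes duplicates, and stops once maxProfiles records are collected.
function cleanEmployees(rawCards, maxProfiles = 50) {
  const seen = new Set();
  const cleaned = [];
  for (const card of rawCards) {
    if (cleaned.length >= maxProfiles) break;
    if (!card.name || !card.profileUrl) continue;

    let profileUrl;
    try {
      const url = new URL(card.profileUrl);
      profileUrl = `${url.origin}${url.pathname}`; // discard ?query and #hash
    } catch {
      continue; // skip values that are not valid absolute URLs
    }

    if (seen.has(profileUrl)) continue;
    seen.add(profileUrl);
    cleaned.push({ name: card.name, title: card.title ?? null, profileUrl });
  }
  return cleaned;
}
```

Calling it on the extracted array before saving, for example `Actor.pushData(cleanEmployees(employees, maxProfiles))`, would keep empty or repeated rows out of the dataset.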

Deploying to Apify

npm install -g apify-cli
apify login
apify create my-linkedin-scraper --template playwright-js
cd my-linkedin-scraper
apify push
apify call my-linkedin-scraper --input '{"linkedinUrls": ["https://www.linkedin.com/company/google/people/"]}'

The Cost Math

Approach	Monthly Cost
Sales Navigator API	$800–$1,200/month
DIY (Apify + Residential Proxies)	$20–$80/month
Savings	roughly 90–98%
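To check the savings row yourself, the percentage can be recomputed from the two cost ranges in the table. This is a small arithmetic sketch; the variable and function names are hypothetical, and the dollar figures are the ones quoted above.

```javascript
// Monthly cost ranges in USD, taken from the table above.
const salesNavigator = { low: 800, high: 1200 };
const diy = { low: 20, high: 80 };

// Percentage saved by paying `alternative` instead of `baseline`, to one decimal.
function savingsPercent(baseline, alternative) {
  return Math.round((1 - alternative / baseline) * 1000) / 10;
}

// Worst case: cheapest Sales Navigator price vs. most expensive DIY setup.
const worstCase = savingsPercent(salesNavigator.low, diy.high);
// Best case: most expensive Sales Navigator price vs. cheapest DIY setup.
const bestCase = savingsPercent(salesNavigator.high, diy.low);

console.log(`Savings range: ${worstCase}% to ${bestCase}%`);
```

Pairing the extremes this way gives the widest plausible range; your real figure depends mostly on proxy traffic.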

Real-World Use Cases

Sales teams: Build prospect lists from target company employee pages
Recruiters: Source candidates at scale without LinkedIn Recruiter
Market researchers: Track hiring signals, team growth, org changes
Competitive intel: Monitor competitor headcount and hiring patterns

Get the Code

All my scraping actors — including the LinkedIn Employee Scraper with 600+ runs in production — are open-source on GitHub.

GitHub: https://github.com/the-ai-entrepreneur-ai-hub

Built in Nairobi. Running on 45 Apify actors, 599 users, 11,974 total runs.
