I woke up last week to an email from Apify saying my LinkedIn Employee Scraper had earned the Rising Star badge — meaning it cracked the top 20 actors on the entire platform. 176 users, 2,430 runs, and counting.
This is the story of how a side project built in Nairobi turned into one of the most-used LinkedIn scrapers on Apify.
The Problem: LinkedIn Has No Real API for Employee Data
If you've ever tried to pull employee data from LinkedIn programmatically, you already know the pain. LinkedIn's official API is locked down tight — you need partner status or a Sales Navigator license ($800–$1,200/month) just to get basic company employee info.
For indie developers, recruiters building internal tools, or startups doing competitive intel, that price tag kills the project before it starts. I needed a different approach.
How It Works: Playwright + Crawlee + Anti-Detection
The scraper runs as an Apify Actor using Crawlee (Apify's open-source crawling framework) with Playwright driving a real Chromium browser. Here's the core pattern:
```javascript
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

const input = (await Actor.getInput()) ?? {};
const { linkedinUrls = [], maxProfiles = 50 } = input;

let scraped = 0;

const crawler = new PlaywrightCrawler({
    proxyConfiguration: await Actor.createProxyConfiguration({
        groups: ['RESIDENTIAL'],
    }),
    launchContext: {
        launchOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-blink-features=AutomationControlled'],
        },
    },
    minConcurrency: 1,
    maxConcurrency: 2,
    async requestHandler({ page }) {
        // Human-like delay between actions: 2–5 s, never a fixed interval
        await page.waitForTimeout(2000 + Math.random() * 3000);

        const employees = await page.$$eval('.org-people-profile-card', (cards) =>
            cards.map((card) => ({
                name: card.querySelector('.artdeco-entity-lockup__title')?.innerText?.trim(),
                title: card.querySelector('.artdeco-entity-lockup__subtitle')?.innerText?.trim(),
                profileUrl: card.querySelector('a')?.href,
            }))
        );

        // Respect the user's profile cap across all pages
        const batch = employees.slice(0, Math.max(0, maxProfiles - scraped));
        scraped += batch.length;

        // Pay-per-event billing: one charge per profile actually delivered
        await Actor.charge({ eventName: 'profile-scraped', count: batch.length });
        await Actor.pushData(batch);
    },
});

await crawler.addRequests(linkedinUrls.map((url) => ({ url })));
await crawler.run();
await Actor.exit();
```
The architecture isn't complicated, but the details are what make it survive in production. LinkedIn is one of the most aggressive anti-bot platforms out there, so every layer matters.
What Actually Keeps It Running
Three things separate a LinkedIn scraper that works once from one that runs 2,430 times without breaking:
Session management. Instead of logging in fresh every run, the scraper persists cookies and reuses sessions. This mimics real user behavior and avoids triggering LinkedIn's "new device" verification flow.
Residential proxies. Datacenter IPs get flagged within minutes on LinkedIn. The actor routes through Apify's residential proxy pool, rotating IPs per request. Each request looks like it comes from a different home internet connection.
Randomized timing. No fixed delays. Every pause between actions uses Math.random() to vary between 2–5 seconds. Fixed, predictable timing patterns are the easiest signal for bot-detection systems to catch.
I also limit concurrency to 1–2 parallel requests max. It's slower, but LinkedIn's rate limiting is harsh enough that going faster just burns through proxy credits with nothing to show for it.
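The session-reuse idea can be sketched without any LinkedIn specifics. In the real actor the saved state lives in Apify's key-value store (`Actor.getValue` / `Actor.setValue`); the in-memory store below is just a stand-in, so the shape of the session object is illustrative:

```javascript
// Minimal sketch of session reuse. In production the store would be
// Apify's key-value store; this in-memory version is a stand-in.
function createSessionStore() {
  let saved = null;
  return {
    // Returns the previous run's session, or a fresh one on first run
    load: () => saved ?? { cookies: [], createdAt: Date.now() },
    save: (session) => { saved = session; },
  };
}

const store = createSessionStore();

// Run 1: no saved session yet, so start fresh and persist afterwards
const session = store.load();
session.cookies.push({ name: 'li_at', value: '<redacted>' });
store.save(session);

// Run 2: the cookie jar survives, so no fresh login is needed
const reused = store.load();
console.log(reused.cookies.length); // 1
```

The point is the lifecycle, not the storage backend: load before the crawl, save after, and only fall back to a fresh login when the saved session has expired.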
The Numbers
Here's where the scraper stands today:
- 176 users on Apify Store
- 2,430 total runs in production
- Rising Star badge — top 20 actor on the platform
- Pay-per-event pricing at $0.004 per profile scraped
For context, I launched this about a year ago as one of my first Apify actors. It started getting steady traction around the 500-run mark, and growth has been compounding since. The Rising Star badge was a genuine surprise — I didn't realize it had climbed that high until the notification hit my inbox.
Lessons Learned Building Scrapers at Scale
LinkedIn changes its DOM constantly. I've had to update selectors at least four times. If you build a LinkedIn scraper, abstract your selectors into a config object so you can patch them without rewriting handler logic.
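A sketch of that pattern, with the selector names pulled out of the handler. The class names here mirror the ones in the snippet above, but treat them as placeholders since LinkedIn rotates them:

```javascript
// Selector config: update this object when LinkedIn's DOM shifts,
// without touching the extraction logic below.
const SELECTORS = {
  card: '.org-people-profile-card',
  name: '.artdeco-entity-lockup__title',
  title: '.artdeco-entity-lockup__subtitle',
};

// Extraction logic depends only on the keys, never on raw class names
function extractCard(card, selectors = SELECTORS) {
  return {
    name: card.querySelector(selectors.name)?.innerText?.trim(),
    title: card.querySelector(selectors.title)?.innerText?.trim(),
  };
}
```

When a selector breaks, the fix is a one-line change to `SELECTORS` and a redeploy, instead of hunting through handler code.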
Users will throw anything at your actor. Company pages with 50,000 employees, URLs with typos, private profiles, pages behind auth walls. Defensive coding isn't optional — it's the entire job. Every edge case that crashes your actor is a 1-star review waiting to happen.
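For the URL-typo class of problems, a small guard up front goes a long way. A sketch, assuming only linkedin.com URLs count as valid input:

```javascript
// Reject anything that isn't a parseable linkedin.com URL before it
// ever reaches the crawler; returning null lets the caller skip-and-log.
function normalizeLinkedinUrl(raw) {
  let url;
  try {
    url = new URL(String(raw).trim());
  } catch {
    return null; // not a URL at all
  }
  if (!/(^|\.)linkedin\.com$/i.test(url.hostname)) return null;
  return url.href;
}
```

Filtering the input list through this (`linkedinUrls.map(normalizeLinkedinUrl).filter(Boolean)`) means a single bad URL degrades one request instead of crashing the whole run.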
Pay-per-event pricing works. Charging per profile scraped instead of per run aligns cost with value. Users scraping 10 profiles pay less than users scraping 10,000. This keeps casual users happy while still generating real revenue from power users.
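The arithmetic behind that, using integer tenths of a cent to sidestep floating-point drift:

```javascript
// $0.004 per profile = 4 tenths of a cent per profile
const PRICE_TENTHS_OF_CENT = 4;

function runCostUsd(profilesScraped) {
  // Integer math in tenths of a cent, converted to dollars at the end
  return (profilesScraped * PRICE_TENTHS_OF_CENT) / 1000;
}

console.log(runCostUsd(10));    // 0.04 — a casual lookup
console.log(runCostUsd(10000)); // 40   — a power user's bulk run
```

A four-cent trial run and a forty-dollar bulk run sit on the same price curve, which is exactly the alignment a flat per-run fee can't give you.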
Good README = more users. My most-used actors all have detailed READMEs with input/output examples, Mermaid architecture diagrams, and clear pricing breakdowns. Developers don't install tools they can't understand in 30 seconds.
What's Next
I'm currently running 38+ actors on Apify covering everything from Google Scholar to Telegram channels to OFAC sanctions data. The LinkedIn scraper remains my top performer, and I'm working on v2 with better pagination handling and support for scraping by department filters.
If you're building scrapers and want to see the code, everything is on GitHub. If you just need LinkedIn employee data without building anything, the actor is ready to run on Apify Store.
Apify Store: https://apify.com/george.the.developer
GitHub: https://github.com/the-ai-entrepreneur-ai-hub
Built in Nairobi. Questions about the scraper or Apify actors in general — drop them in the comments.