Vhub Systems

How to Build a LinkedIn Lead Enrichment Pipeline Without Sales Navigator (Using Apify)

The Sales Navigator Problem at Early-Stage

If you're running outbound at an early-stage company, you've probably looked at Sales Navigator and done the math: $79–$99/month per seat, or $948–$1,188/year. At $0–$2M ARR, that's not a rounding error — that's 40–100% of your entire prospecting budget for a small team.

The frustration isn't that Sales Navigator is overpriced. It's that there's nothing in between that actually works.

The free-tier options are well-known and consistently disappointing:

  • Apollo.io free tier: 50 credits/month. If you're doing any real volume, that's gone in two days.
  • Hunter.io: Email-only. No job titles, no seniority, no company size. Useful for one thing.
  • PhantomBuster: 10 execution minutes per day on the free tier, and LinkedIn actively flags and blocks sessions from known automation IPs.

The buyer who ends up building this pipeline is usually saying something like: "I can't justify that at our ARR but I'm spending 10 hours a week manually filling in LinkedIn job titles." That's the real cost that doesn't show up on the vendor comparison page.


Why Free-Tier Alternatives Don't Scale: The Three Broken Paths

Before getting into the solution, it's worth naming exactly why the standard alternatives break down. There are three paths most early-stage founders try, and all three have the same failure mode: they don't scale past 500 contacts.

Path 1: Manual copy-paste

You open LinkedIn, you find the person, you copy the job title, you paste it into your spreadsheet. Multiply that by 1,000 contacts and you're looking at 15–40 hours of work — time that compounds against you because your backlog grows faster than your output. Every new contact you add to the CRM is another row that needs to be enriched by hand.

Path 2: Stitching together free-tier tools

This looks reasonable on paper. Apollo for some credits, Hunter for emails, PhantomBuster for LinkedIn scraping. In practice:

  • Apollo's 50 monthly credits are exhausted before the end of the first week of any real campaign
  • Hunter gives you emails but nothing else — no job title, no seniority, no company size
  • PhantomBuster's LinkedIn session restrictions mean your primary account is at real risk of getting flagged and restricted
  • The data formats coming out of three different tools are incompatible with each other and require manual reconciliation before CRM import

Path 3: Buying a contact database

Purchasing a pre-built list sounds like it shortcuts the problem. The issue: 30–40% of job titles in most purchased databases are stale at the time of acquisition. People change jobs. Companies get acquired. Titles shift. You're paying for data that's already incorrect, and you're also taking on GDPR exposure depending on how that data was collected.

None of these paths gets you to a reliable, maintained, automated enrichment workflow. That's what the pipeline below is built to do.


How the Apify-Based Enrichment Pipeline Works: The Architecture

The pipeline has four components: input, processing, output, and optional alerting.

Input

A CSV or Google Sheet of LinkedIn profile URLs. These come from wherever your contacts live — CRM exports, LinkedIn search exports, list providers. If you have the LinkedIn URL, you have what you need.

Processing

Two Apify actors run in sequence:

  1. LinkedIn Profile Scraper — extracts job title, company name, company size, industry, seniority level, and location from each profile URL
  2. LinkedIn Company Scraper — supplements the profile-level data with company signals: actual headcount, hiring velocity, recent job posting count

Both actors run on a weekly cron schedule on the Apify platform. New contacts are added to the input sheet and automatically enter the enrichment queue on the next scheduled run.

Output

A Google Sheet in CRM-import format with the following columns: first_name, last_name, job_title, company, company_size, linkedin_url, industry. This schema imports directly into HubSpot, Pipedrive, and Airtable without transformation.

Alerting

An optional Slack notification fires when each enrichment batch completes. This is a one-line Apify webhook configuration — not a separate integration to maintain.
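
As a sketch, that webhook is a small JSON payload attached to the actor or task. The field names below follow Apify's webhook API; the Slack URL is a placeholder for your own incoming-webhook URL:

```json
{
  "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
  "requestUrl": "https://hooks.slack.com/services/T000/B000/XXXX",
  "payloadTemplate": "{\"text\": \"Enrichment batch finished: {{resource.status}}\"}"
}
```

The `payloadTemplate` interpolates run details (here, the run status) into the Slack message, so the notification carries enough context without any extra glue code.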

The key architectural point: this is not a manual process. It's not a monthly subscription. It's a pipeline. You configure it once, and it runs.


Step-by-Step Setup Guide

The setup requires no code for the basic configuration. These are the exact steps.

Step 1: Build the input list

Gather LinkedIn profile URLs from your existing sources:

  • Export contacts from your CRM (HubSpot, Pipedrive, and Airtable all support LinkedIn URL fields)
  • Use LinkedIn's own search export (with a basic or premium account, you can export search results as CSV)
  • If using a list provider, require LinkedIn URL as a field in the delivery spec

Add these URLs to a Google Sheet with a single column header: linkedin_url. This is the actor input.
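
In CSV form, the whole input reduces to a header plus one URL per row (the URLs here are placeholders):

```
linkedin_url
https://www.linkedin.com/in/example-one/
https://www.linkedin.com/in/example-two/
```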

Step 2: Configure the LinkedIn Profile Scraper

In Apify, search for the LinkedIn Profile Scraper actor. The input schema takes:

  • Your Google Sheet URL (or direct CSV upload)
  • Output fields to include: job_title, company, company_size, industry, seniority

Run a test batch of 10–20 URLs before scheduling. Verify the output schema matches what your CRM expects. Most issues at this stage are column naming mismatches, not scraper failures.
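
A quick way to catch those naming mismatches on the test batch is to diff the output header against the schema your CRM expects. A minimal sketch, assuming the test run was exported to CSV and using the output schema from the architecture section:

```python
import csv

# The CRM-import schema described above.
EXPECTED = {"first_name", "last_name", "job_title", "company",
            "company_size", "linkedin_url", "industry"}

def check_columns(csv_path):
    """Compare the scraper output's header row against the expected schema."""
    with open(csv_path, newline="") as f:
        found = set(next(csv.reader(f)))
    return {"missing": sorted(EXPECTED - found),
            "unexpected": sorted(found - EXPECTED)}
```

Anything in `missing` needs a column rename (or a scraper output-field tweak) before the weekly schedule goes live.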

Step 3: Configure the LinkedIn Company Scraper

Run the LinkedIn Company Scraper actor with the company URLs extracted from the Profile Scraper output. This supplements each contact record with:

  • Actual employee headcount (not self-reported)
  • Recent job opening count (a proxy for hiring velocity and growth stage)
  • Company industry classification

This step is optional but high-value if you're doing account-based outbound where company signals matter.

Step 4: Set up the Google Sheets output

Apify has a native Google Sheets integration. Configure the output dataset to write directly to a target sheet. Use the pre-built output schema: first_name, last_name, job_title, company, company_size, linkedin_url, industry.

Enable deduplication by LinkedIn URL — this prevents duplicate rows when contacts appear in multiple input batches.
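
If your sheet setup doesn't have built-in deduplication, the same rule is easy to apply as a post-processing pass. A sketch that treats trailing slashes and letter case as equivalent, keeping the first occurrence of each URL:

```python
def dedupe_by_url(rows):
    """Keep the first row seen for each linkedin_url; drop later duplicates."""
    seen, out = set(), []
    for row in rows:
        # Normalize trailing slash and case so near-identical URLs collapse.
        key = row["linkedin_url"].rstrip("/").lower()
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```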

Step 5: Schedule the weekly run

In the Apify actor settings, configure a cron schedule. A Sunday evening run works well — data is fresh for Monday prospecting. The schedule runs automatically; new contacts added to the input sheet during the week are enriched on the next cycle.
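
In cron syntax, a Sunday 8pm run is a one-liner (the timezone is set separately in the Apify schedule settings):

```
0 20 * * 0
```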

Step 6: Handle edge cases

Two edge cases come up consistently:

  • Private profiles: Typically under 5% of a B2B contact list is fully private on LinkedIn. These return no data. Flag them in your output sheet for manual review or skip them entirely.
  • Job title normalization: The scraper returns raw title strings. "VP of Sales", "Vice President, Sales", and "VP Sales" are the same role. A simple normalization pass (a Google Sheets formula or a one-time Airtable automation) handles this before CRM import. Company size bucketing (1–10, 11–50, 51–200, etc.) similarly benefits from a post-processing step to match your CRM's picklist values.
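
Both post-processing steps fit in a few lines. A sketch — the title map is hypothetical and should be extended with the variants that actually appear in your data, and the bucket labels should match your CRM's picklist values:

```python
import re

# Hypothetical mapping of raw title variants to a canonical form.
TITLE_MAP = {
    "vp of sales": "VP Sales",
    "vice president, sales": "VP Sales",
    "vp sales": "VP Sales",
}

def normalize_title(raw):
    """Collapse whitespace and case, then map known variants to one label."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return TITLE_MAP.get(key, raw.strip())

def size_bucket(headcount):
    """Bucket a raw headcount into common CRM picklist ranges."""
    for upper, label in [(10, "1-10"), (50, "11-50"),
                         (200, "51-200"), (1000, "201-1000")]:
        if headcount <= upper:
            return label
    return "1000+"
```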

LinkedIn ToS Considerations and Rate Limit Avoidance

This is the section most tutorials skip, and it's the first thing buyers with prior scraping experience will ask about.

If you got your LinkedIn account flagged using PhantomBuster or a similar tool, the concern is legitimate. Session-based scraping from a primary account using aggressive timing patterns is exactly how accounts get restricted. That experience is worth taking seriously.

Apify actors handle this differently. The LinkedIn scrapers on the Apify platform use built-in session management and rate limiting. Requests are spaced at human-like intervals. IP rotation is managed at the infrastructure level. This doesn't make it risk-free — it means the risk profile is different from running PhantomBuster on your primary account at 3am.

Practical best practices for running this pipeline:

  • Use a non-primary LinkedIn account for the scraper session. Create a secondary account specifically for data operations.
  • Rate-limit to reasonable request volume: 200–500 profiles per run is well within normal human browsing behavior patterns over the course of a day.
  • Avoid off-hours patterns: Schedule runs during business hours when the request patterns look normal.
  • Focus on public profile data only: The pipeline is designed to extract publicly visible fields — job title, company, industry, location. It does not access private connections, messages, or anything behind a login-required page.

On the legal and compliance question: scraping public LinkedIn profile data sits in a gray area. The hiQ Labs v. LinkedIn litigation established that scraping publicly accessible data does not automatically violate the Computer Fraud and Abuse Act, but LinkedIn's ToS explicitly prohibits scraping. These two things coexist. You need to make your own judgment about this based on your risk tolerance and jurisdiction.

This pipeline is designed for public data only. If you're in a regulated industry or have specific legal constraints, run it by your counsel before deploying at scale.


What You Get vs. Sales Navigator: The Comparison

Here's a direct capability comparison between Sales Navigator and the Apify pipeline:

| Capability | Sales Navigator | Apify Pipeline |
| --- | --- | --- |
| Job title enrichment | ✅ | ✅ |
| Company size | ✅ | ✅ |
| Industry | ✅ | ✅ |
| Seniority level | ✅ | ✅ |
| CRM-import-ready output | ❌ | ✅ (with setup) |
| Scheduled enrichment refresh | ❌ | ✅ (Apify scheduler) |
| InMail messaging | ✅ | ❌ (not replicable) |
| Lead recommendations | ✅ | ❌ |
| Real-time data freshness | ✅ | ❌ (weekly refresh) |
| Cost | $79–$99/seat/month | ~$0–$5/month in Apify compute |

The honest read on this table: Sales Navigator does more. InMail and lead recommendations are genuinely useful features that this pipeline doesn't and can't replicate.

But if you're an early-stage founder who needs enrichment — job titles, company size, industry, seniority — and doesn't need InMail outreach through LinkedIn's platform, the pipeline replaces the $79–$99/month function at near-zero marginal cost. That's the specific use case this is built for.

At $1/month in Apify compute for a typical weekly enrichment batch, the ROI calculation is simple: you're getting the data layer that Sales Navigator provides for the enrichment use case at roughly 1–2% of the cost.
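
The arithmetic behind that claim, as a quick sanity check using the figures above:

```python
# Annual cost of one Sales Navigator seat at the higher price point.
sales_nav_annual = 99 * 12      # $1,188/year

# Annual Apify compute for a typical weekly enrichment batch (~$1/month).
pipeline_annual = 1 * 12        # $12/year

# Pipeline cost as a percentage of a Sales Navigator seat.
pct_of_seat = 100 * pipeline_annual / sales_nav_annual
```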


Get the Template and Start This Afternoon

Sales Navigator is $99/month — $1,188/year. If you've been spending 10 hours/month on manual enrichment, you already know the math doesn't work at early-stage ARR.

The template below has the Apify actor configurations pre-built, the Google Sheets output schema ready to import, and the CRM mapping docs for HubSpot, Pipedrive, and Airtable. The pipeline gets running in an afternoon. $29, one-time.

Also available as part of the B2B Cold Outbound Launch Pack ($49) — includes the domain warmup checklist, list hygiene audit, and this enrichment pipeline template. Everything you need to build the data layer for your first outbound motion.

If you're at the stage where you're spending real time on manual contact enrichment and can't justify a Sales Navigator seat, this is the gap this pipeline fills.