Best LinkedIn Job Scrapers in 2026: Public Job Data via Apify

LinkedIn is the world's largest professional network, and its job board is one of the largest anywhere. What many people don't realize is that LinkedIn exposes job listings through a public endpoint that requires no login — the jobs-guest API. This makes LinkedIn job data accessible for market research, recruiting analytics, and salary analysis without using a logged-in account.

The LinkedIn Jobs-Guest Endpoint

LinkedIn serves public job listings at URLs like:

https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search
    ?keywords=python+developer
    &location=United+States
    &start=0

This endpoint returns HTML fragments with job cards. No authentication needed. It's the same data LinkedIn shows to logged-out visitors when they browse job listings.
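Because the response is an HTML fragment rather than JSON, you parse it like any other page. Here's a rough sketch of parsing one job card with BeautifulSoup — the sample markup and class names below reflect what the endpoint has returned in practice, and LinkedIn can change them at any time:

```python
from bs4 import BeautifulSoup

# A trimmed sample of the kind of fragment the endpoint returns.
# Real responses contain roughly 25 such cards per page.
SAMPLE_FRAGMENT = """
<div class="base-card">
  <h3 class="base-search-card__title">Python Developer</h3>
  <h4 class="base-search-card__subtitle">Acme Corp</h4>
  <span class="job-search-card__location">Remote, United States</span>
  <a class="base-card__full-link"
     href="https://www.linkedin.com/jobs/view/123?refId=abc"></a>
</div>
"""

soup = BeautifulSoup(SAMPLE_FRAGMENT, "html.parser")
card = soup.select_one("div.base-card")
job = {
    "title": card.select_one("h3.base-search-card__title").text.strip(),
    "company": card.select_one("h4.base-search-card__subtitle").text.strip(),
    "location": card.select_one("span.job-search-card__location").text.strip(),
    # Strip tracking query parameters from the job URL
    "url": card.select_one("a.base-card__full-link")["href"].split("?")[0],
}
print(job)
```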

What you can extract:

  • Job title, company name, location
  • Posted date, applicant count
  • Job description (from individual job pages)
  • Employment type (full-time, contract, etc.)
  • Seniority level
  • Industry and function tags
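Pulling those fields together, a normalized record might look like the sketch below. The field names are my own convention, not anything LinkedIn defines; everything beyond the first three fields may be missing on any given listing:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class JobPosting:
    """One scraped job listing; optional fields may be absent."""
    title: str
    company: str
    location: str
    posted_date: Optional[str] = None
    applicant_count: Optional[int] = None
    employment_type: Optional[str] = None
    seniority_level: Optional[str] = None
    description: Optional[str] = None
    url: Optional[str] = None

job = JobPosting(title="Python Developer", company="Acme Corp", location="Berlin")
print(asdict(job))
```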

Why Scrape LinkedIn Jobs?

The use cases go well beyond job hunting:

  • Job market research — track demand for specific skills across regions and industries
  • Recruiting intelligence — identify which companies are hiring aggressively, what roles they're filling
  • Salary analysis — correlate job descriptions with salary ranges (when disclosed)
  • Competitive analysis — monitor competitor hiring patterns to infer strategic direction
  • Academic research — labor economics, skill demand trends, geographic employment patterns
  • Career tools — build job aggregators, alert services, or recommendation engines

Approach 1: Apify Store Actors

The Apify Store has several LinkedIn job scrapers. These cloud-based actors handle proxy rotation and pagination automatically.

Key considerations when choosing an actor:

  • Uses jobs-guest endpoint — actors that require LinkedIn login are riskier and may violate LinkedIn's Terms of Service
  • Pagination support — LinkedIn caps results at 1000 per search; good actors work around this by narrowing search parameters
  • Data freshness — check when the actor was last updated; LinkedIn changes its HTML frequently
  • Output fields — some actors only grab titles and links; better ones extract full job descriptions
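Whether you pick an actor or roll your own, it helps to know how these search URLs are built. A small helper using the standard library's `urlencode` handles spaces and special characters; the `f_TPR` time-posted filter shown here is a parameter documented by the scraping community, not an official API:

```python
from urllib.parse import urlencode

BASE = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"

def build_search_url(keywords, location, start=0, seconds_since_posted=None):
    """Build a jobs-guest search URL with properly encoded parameters."""
    params = {"keywords": keywords, "location": location, "start": start}
    if seconds_since_posted is not None:
        # Community-documented "time posted" filter, e.g. r86400 = past 24 hours
        params["f_TPR"] = f"r{seconds_since_posted}"
    return f"{BASE}?{urlencode(params)}"

url = build_search_url(
    "python developer", "United States", start=25, seconds_since_posted=86400
)
print(url)
```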

Popular actors include general-purpose LinkedIn scrapers as well as job-specific ones. Search "linkedin jobs" on the Apify Store to see current options and their ratings.

Our Upcoming Actor

We're building a focused LinkedIn Jobs scraper at apify.com/cryptosignals/linkedin-jobs-scraper designed around the public jobs-guest endpoint:

  • No login required — uses only publicly accessible data
  • Full job details — title, company, location, description, seniority, employment type
  • Smart pagination — splits broad searches into narrower queries to bypass the 1000-result cap
  • Structured output — clean JSON with consistent field names

This actor is upcoming and not yet publicly available — check the link for launch updates.
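The pagination workaround mentioned above — splitting one broad search into narrower queries — can be sketched like this. The splitting dimensions (locations, time windows) are just examples; the idea is that each narrower query stays under the ~1000-result cap:

```python
from itertools import product

def split_search(keywords, locations, time_windows):
    """Expand one broad query into narrower (location, time-window) combos.

    Each combo becomes its own search, so together they can recover
    more results than a single broad search capped at ~1000.
    """
    return [
        {"keywords": keywords, "location": loc, "f_TPR": f"r{secs}"}
        for loc, secs in product(locations, time_windows)
    ]

queries = split_search(
    "data engineer",
    locations=["New York", "San Francisco", "Austin"],
    time_windows=[86400, 604800],  # past day, past week
)
print(len(queries))
```

Note that overlapping time windows (a day inside a week, as above) will return duplicate listings, so in practice you'd deduplicate results by job URL.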

Approach 2: DIY with Python

Here's a basic scraper using the jobs-guest endpoint:

import requests
from bs4 import BeautifulSoup
import time

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}

def search_jobs(keywords, location, max_results=50):
    jobs = []
    for start in range(0, max_results, 25):
        # Pass params separately so requests URL-encodes
        # multi-word keywords and locations correctly
        params = {
            "keywords": keywords,
            "location": location,
            "start": start,
        }
        resp = requests.get(
            "https://www.linkedin.com/jobs-guest/jobs/"
            "api/seeMoreJobPostings/search",
            headers=HEADERS,
            params=params,
        )
        if resp.status_code != 200:
            break

        soup = BeautifulSoup(resp.text, "lxml")
        cards = soup.select("div.base-card")

        if not cards:
            break

        for card in cards:
            title_el = card.select_one("h3.base-search-card__title")
            company_el = card.select_one("h4.base-search-card__subtitle")
            location_el = card.select_one("span.job-search-card__location")
            link_el = card.select_one("a.base-card__full-link")

            jobs.append({
                "title": title_el.text.strip()
                    if title_el else None,
                "company": company_el.text.strip()
                    if company_el else None,
                "location": location_el.text.strip()
                    if location_el else None,
                "url": link_el["href"].split("?")[0]
                    if link_el else None,
            })

        time.sleep(2)
    return jobs

results = search_jobs("data engineer", "San Francisco")
print(f"Found {len(results)} jobs")

Getting Full Job Descriptions

The search endpoint gives you summaries. For full descriptions, scrape individual job pages:

def get_job_details(job_url):
    resp = requests.get(job_url, headers=HEADERS)
    soup = BeautifulSoup(resp.text, "lxml")

    desc_el = soup.select_one(
        "div.show-more-less-html__markup"
    )
    criteria = soup.select(
        "li.description__job-criteria-item"
    )

    details = {
        "description": desc_el.text.strip()
            if desc_el else None,
    }

    for item in criteria:
        label = item.select_one("h3")
        value = item.select_one("span")
        if label and value:
            key = label.text.strip().lower().replace(" ", "_")
            details[key] = value.text.strip()

    return details

Approach 3: LinkedIn's Official API

LinkedIn does offer a Jobs API, but access is restricted:

  • Requires a LinkedIn developer account with approved use case
  • Limited to certain partner integrations
  • Rate limits are strict
  • Not available for general market research

For most use cases — especially research and analytics — the public jobs-guest endpoint is more practical.

Handling LinkedIn's Anti-Scraping Measures

LinkedIn is more aggressive about blocking scrapers than many sites:

  • IP blocking — rotate proxies or use residential IPs
  • Rate limiting — keep requests to 1 every 2-3 seconds minimum
  • HTML changes — LinkedIn updates its markup frequently; expect to update selectors
  • CAPTCHA — appears after sustained scraping from a single IP

Tips for reliability:

import random

def polite_request(url):
    delay = random.uniform(2.0, 4.0)
    time.sleep(delay)
    try:
        resp = requests.get(
            url, headers=HEADERS, timeout=10
        )
        resp.raise_for_status()
        return resp
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return None
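Building on `polite_request`, a retry wrapper with exponential backoff handles transient blocks and timeouts. The schedule here (base 2 s, doubling, capped, with jitter) is an arbitrary but common choice, not anything LinkedIn-specific:

```python
import time
import random
import requests

def backoff_delays(attempts, base=2.0, cap=60.0):
    """Exponential backoff schedule with jitter: base * 2^n seconds, capped."""
    return [min(cap, base * (2 ** n)) + random.uniform(0, 1) for n in range(attempts)]

def request_with_retries(url, headers=None, attempts=4):
    """Retry a GET with increasing delays; return None if all attempts fail."""
    for delay in backoff_delays(attempts):
        try:
            resp = requests.get(url, headers=headers, timeout=10)
            if resp.status_code == 200:
                return resp
        except requests.RequestException:
            pass
        time.sleep(delay)  # back off before the next attempt
    return None
```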

Comparison Table

Feature             Apify Actors      DIY Python        LinkedIn API
Setup time          Minutes           Hours             Days (approval)
Login required      No (jobs-guest)   No (jobs-guest)   Yes
Proxy handling      Built-in          Manual            N/A
Rate limits         Managed           Manual            Strict
Full descriptions   Usually yes       Extra requests    Varies
Cost                Pay per usage     Free + proxies    Free (limited)

Legal Notes

The jobs-guest endpoint serves publicly accessible data to any visitor, logged in or not. The legal landscape:

  • The hiQ v. LinkedIn rulings held that scraping publicly accessible data does not violate the CFAA (though hiQ ultimately settled and lost on breach-of-contract claims)
  • LinkedIn's Terms of Service prohibit scraping, but the ToS primarily bind logged-in users, and violating them is a civil matter, not a criminal one
  • Scraping for research, analytics, and building job tools is the lowest-risk use case
  • Avoid scraping personal profile data — stick to job postings

Getting Started

If you want results today: Search "linkedin jobs" on the Apify Store or check our upcoming LinkedIn Jobs scraper for a dedicated solution.

If you want full control: Start with the Python code above. The jobs-guest endpoint is widely used and documented by the scraping community, though LinkedIn can change or restrict it at any time.

If you need scale: Use a managed platform with proxy rotation. Scraping 10,000+ job listings from a single IP will get you blocked within minutes.

LinkedIn job data is one of the most valuable datasets for understanding labor markets. With the right tools, you can access it programmatically and build powerful analytics on top of it.
