Indeed is the largest job board in the world with over 350 million unique visitors per month. Whether you're building a job aggregator, tracking salary trends, or doing labor market research, Indeed's data is incredibly valuable.
But Indeed doesn't offer a public API. And their anti-bot systems are among the most aggressive I've tested.
In this guide, I'll show you how to scrape Indeed job listings, salary data, and company reviews using Python — and how to handle the anti-bot measures that will try to stop you.
What Data Can You Extract from Indeed?
Indeed has three main data types worth scraping:
- Job listings: Title, company, location, salary range, job type (full-time/part-time/contract), posted date, job description
- Salary data: Average salaries by job title and location, salary ranges, pay transparency info
- Company reviews: Overall rating, work-life balance, compensation, management, culture scores, review text
The Anti-Bot Problem
Indeed uses multiple layers of protection:
- Cloudflare WAF — blocks suspicious request patterns
- JavaScript challenges — requires browser-like JS execution
- Rate limiting — aggressive throttling after a few dozen requests
- CAPTCHA walls — triggered by unusual traffic patterns
- Session fingerprinting — tracks browser characteristics across requests
A simple requests.get() will get you blocked within 5-10 requests. You need a real strategy.
Method 1: Basic Scraping with requests + BeautifulSoup
This works for small-scale scraping (under 50 pages). You'll need proper headers and delays.
import requests
from bs4 import BeautifulSoup
import time
import random
import json
def scrape_indeed_jobs(query: str, location: str, pages: int = 3) -> list[dict]:
    """Scrape Indeed job listings for a given query and location.

    Args:
        query: Search keywords, e.g. "python developer".
        location: Location string, e.g. "New York, NY".
        pages: Number of result pages to fetch (Indeed serves 10 jobs/page).

    Returns:
        A list of job dicts (title, company, location, salary, job_type,
        snippet, posted, url). Stops early if a request fails.
    """
    jobs: list[dict] = []
    base_url = "https://www.indeed.com/jobs"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.indeed.com/",
        "DNT": "1",
    }
    session = requests.Session()
    session.headers.update(headers)
    # Visit homepage first to get cookies — cookie-less searches get
    # flagged by Indeed's anti-bot layer much sooner.
    session.get("https://www.indeed.com", timeout=10)
    time.sleep(random.uniform(2, 4))
    for page in range(pages):
        params = {
            "q": query,
            "l": location,
            "start": page * 10,  # Indeed paginates in steps of 10
            "sort": "date",
        }
        try:
            response = session.get(base_url, params=params, timeout=15)
            response.raise_for_status()
        except requests.RequestException as e:
            print(f"Request failed on page {page}: {e}")
            break
        soup = BeautifulSoup(response.text, "html.parser")
        # Indeed embeds job data in a script tag as JSON
        script_tag = soup.find("script", {"id": "mosaic-data"})
        # NOTE: .string is None when the tag has nested children; without
        # this guard json.loads(None) raises an uncaught TypeError.
        if script_tag and script_tag.string:
            jobs.extend(parse_mosaic_data(script_tag.string))
        else:
            # Fallback: parse HTML directly
            job_cards = soup.select("div.job_seen_beacon")
            for card in job_cards:
                job = extract_job_from_card(card)
                if job:
                    jobs.append(job)
        # Random delay between pages (critical for avoiding blocks)
        time.sleep(random.uniform(3, 7))
    return jobs
def extract_job_from_card(card) -> dict | None:
"""Extract job data from an Indeed job card HTML element."""
title_elem = card.select_one("h2.jobTitle a")
company_elem = card.select_one('span[data-testid="company-name"]')
location_elem = card.select_one('div[data-testid="text-location"]')
salary_elem = card.select_one("div.salary-snippet-container")
snippet_elem = card.select_one("div.job-snippet")
date_elem = card.select_one("span.date")
job_type_elem = card.select_one('div[data-testid="attribute_snippet_testid"]')
if not title_elem:
return None
return {
"title": title_elem.get_text(strip=True),
"company": company_elem.get_text(strip=True) if company_elem else None,
"location": location_elem.get_text(strip=True) if location_elem else None,
"salary": salary_elem.get_text(strip=True) if salary_elem else None,
"job_type": job_type_elem.get_text(strip=True) if job_type_elem else None,
"snippet": snippet_elem.get_text(strip=True) if snippet_elem else None,
"posted": date_elem.get_text(strip=True) if date_elem else None,
"url": "https://www.indeed.com" + title_elem["href"] if title_elem.get("href") else None,
}
def parse_mosaic_data(script_content: str) -> list[dict]:
    """Parse the mosaic-data JSON that Indeed embeds in search pages.

    Args:
        script_content: Raw text content of the `mosaic-data` <script> tag.

    Returns:
        A list of job dicts; empty when the payload is missing or malformed.
    """
    jobs: list[dict] = []
    try:
        data = json.loads(script_content)
        results = (
            data.get("mosaic", {})
            .get("providerData", {})
            .get("jobListing", {})
            .get("results", [])
        )
        for result in results:
            # `salarySnippet` can be present but null in real payloads;
            # `.get("salarySnippet", {})` would then return None and the
            # chained .get would raise AttributeError. `or {}` covers both
            # the missing-key and explicit-null cases.
            salary = (result.get("salarySnippet") or {}).get("text")
            job_types = result.get("jobTypes")
            jobs.append({
                "title": result.get("title"),
                "company": result.get("company"),
                "location": result.get("formattedLocation"),
                "salary": salary,
                "job_type": job_types[0] if job_types else None,
                "snippet": result.get("snippet"),
                "posted": result.get("formattedRelativeTime"),
                "url": f"https://www.indeed.com/viewjob?jk={result.get('jobkey')}",
            })
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Unexpected payload shape: return whatever parsed cleanly so far.
        pass
    return jobs
# Usage
if __name__ == "__main__":
    found_jobs = scrape_indeed_jobs("python developer", "New York, NY", pages=3)
    print(f"Found {len(found_jobs)} jobs")
    # Preview the first few results.
    for job in found_jobs[:5]:
        print(f" {job['title']} at {job['company']} — {job['salary'] or 'No salary listed'}")
Method 2: Scraping Salary Pages
Indeed has dedicated salary pages at indeed.com/career/{job-title}/salaries. These are less protected than job search:
def scrape_indeed_salaries(job_title: str, location: str = "") -> dict:
    """Scrape salary data for a specific job title from Indeed.

    Fetches indeed.com/career/<slug>/salaries (optionally location-scoped)
    and returns a dict with the average, range, and top-paying companies.
    """
    title_slug = job_title.lower().replace(" ", "-")
    url = f"https://www.indeed.com/career/{title_slug}/salaries"
    if location:
        # Location segment: lowercase, hyphenated, commas dropped,
        # double hyphens collapsed.
        loc_segment = "/" + location.lower().replace(" ", "-").replace(",", "")
        url += loc_segment.replace("--", "-")
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36",
    }
    page = requests.get(url, headers=headers, timeout=15)
    soup = BeautifulSoup(page.text, "html.parser")

    salary_data: dict = {"job_title": job_title, "location": location}
    # Single-valued fields share the same extract-if-present pattern.
    for key, selector in (
        ("average_salary", 'div[data-testid="avg-salary-value"]'),
        ("salary_range", 'div[data-testid="salary-range"]'),
    ):
        node = soup.select_one(selector)
        if node:
            salary_data[key] = node.get_text(strip=True)
    # Top-paying companies (capped at five).
    company_nodes = soup.select('a[data-testid="top-company-salary"] span')
    salary_data["top_companies"] = [n.get_text(strip=True) for n in company_nodes[:5]]
    return salary_data
# Usage
salary = scrape_indeed_salaries("software engineer", "San Francisco CA")
for label, key in (("Average", "average_salary"), ("Range", "salary_range")):
    print(f"{label}: {salary.get(key, 'N/A')}")
Method 3: Scraping Company Reviews
Company reviews live at indeed.com/cmp/{company-name}/reviews:
def scrape_indeed_reviews(company_slug: str, pages: int = 3) -> list[dict]:
    """Scrape company reviews from Indeed (indeed.com/cmp/<slug>/reviews)."""

    def text_or_none(card, selector):
        # Stripped text of the first match inside the card, or None.
        node = card.select_one(selector)
        return node.get_text(strip=True) if node else None

    collected: list[dict] = []
    reviews_url = f"https://www.indeed.com/cmp/{company_slug}/reviews"
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36",
    })
    for page_num in range(1, pages + 1):
        # Reviews paginate 20 per page via the `start` offset.
        resp = session.get(reviews_url, params={"start": (page_num - 1) * 20}, timeout=15)
        soup = BeautifulSoup(resp.text, "html.parser")
        for card in soup.select('div[data-testid="review-card"]'):
            stars = card.select_one('button[aria-label*="out of 5 stars"]')
            collected.append({
                "title": text_or_none(card, 'h2[data-testid="review-title"]'),
                # Rating lives in the aria-label, e.g. "4.0 out of 5 stars".
                "rating": stars["aria-label"].split()[0] if stars else None,
                "text": text_or_none(card, 'span[data-testid="review-text"]'),
                "author": text_or_none(card, 'span[data-testid="review-author"]'),
                "date": text_or_none(card, 'span[data-testid="review-date"]'),
            })
        time.sleep(random.uniform(3, 6))
    return collected
Scaling Up: When Basic Scraping Isn't Enough
The code above works for small projects — maybe 100-200 pages per day. But if you need to track thousands of job listings across multiple cities, you'll hit Indeed's anti-bot walls fast.
The main problems at scale:
- IP bans after 50-100 requests from the same IP
- CAPTCHA challenges that break automated flows
- JavaScript rendering that `requests` can't handle
- Session invalidation that kills your cookies mid-scrape
Using ScraperAPI for Scale
ScraperAPI handles the hard parts — proxy rotation, CAPTCHA solving, browser fingerprinting — so you can focus on parsing the data:
SCRAPERAPI_KEY = "your_api_key_here"

def scrape_with_scraperapi(url: str) -> str:
    """Fetch a page through ScraperAPI with anti-bot bypass.

    Args:
        url: The target page URL (e.g. an Indeed search URL).

    Returns:
        The rendered HTML of the target page.

    Raises:
        requests.HTTPError: When ScraperAPI reports a failed fetch.
    """
    params = {
        "api_key": SCRAPERAPI_KEY,
        "url": url,
        "render": "true",  # JavaScript rendering
        "country_code": "us",  # US residential IPs
        "premium": "true",  # Premium proxy pool (needed for Indeed)
    }
    # HTTPS, not HTTP: the api_key travels as a query parameter and must
    # not be sent in cleartext.
    response = requests.get("https://api.scraperapi.com", params=params, timeout=60)
    # Fail loudly rather than returning an API error page as "data".
    response.raise_for_status()
    return response.text
# Use it in your scraping function
html = scrape_with_scraperapi("https://www.indeed.com/jobs?q=data+engineer&l=Austin%2C+TX")
soup = BeautifulSoup(html, "html.parser")
# ... parse as before
ScraperAPI automatically rotates through millions of residential IPs, solves CAPTCHAs, and handles JavaScript rendering. You pay per successful request instead of managing proxy infrastructure yourself.
Storing and Analyzing Job Data
Once you have the data, store it properly for analysis:
import csv
from datetime import datetime
def save_jobs_to_csv(jobs: list[dict], filename: str = "indeed_jobs.csv"):
    """Append scraped jobs to a CSV file, stamping each row with scrape time.

    Args:
        jobs: Job dicts as produced by the scraping functions above.
        filename: Target CSV path; created on first call, appended to after.
    """
    fieldnames = [
        "title", "company", "location", "salary",
        "job_type", "snippet", "posted", "url", "scraped_at",
    ]
    with open(filename, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        # Write the header only when the file is empty (first run).
        if f.tell() == 0:
            writer.writeheader()
        for job in jobs:
            # Copy rather than mutating the caller's dict in place.
            writer.writerow({**job, "scraped_at": datetime.now().isoformat()})
    # Fixed: message previously printed a broken "(unknown)" placeholder
    # instead of the actual filename.
    print(f"Saved {len(jobs)} jobs to {filename}")
Key Takeaways
- Start with the embedded JSON — Indeed's `mosaic-data` script tag is more reliable than parsing HTML
- Rotate user agents and add delays — minimum 3-5 seconds between requests
- Use sessions — visit the homepage first to collect cookies before searching
- Salary and review pages are less protected than job search — start there if you need that data
- For anything over 200 pages/day, use a proxy service like ScraperAPI to handle anti-bot detection
Related Tools
If you're doing job market research at scale, check out the LinkedIn Jobs Scraper on Apify — it handles LinkedIn's anti-bot measures and returns structured data you can combine with your Indeed scrapes.
Need to scrape other job boards? I'm building more scrapers on my Apify profile. Follow for updates.
Top comments (0)