Indeed is the largest job aggregator, with over 300 million unique visitors per month. Whether you are building a job board aggregator, analyzing salary trends, or monitoring hiring activity in specific industries, Indeed data is incredibly valuable.
This guide shows you how to extract job listings, salary data, and company reviews from Indeed in 2026 — with working code, honest limitations, and practical workarounds.
What Data Can You Get from Indeed?
Indeed exposes several types of data:
- Job listings: title, company, location, salary (when posted), description, date posted
- Salary data: reported salaries by job title and location
- Company reviews: ratings, pros/cons, CEO approval
- Job trends: hiring velocity by industry and region
The most common use case is job listing aggregation — pulling thousands of postings for a specific role or location and analyzing salary ranges, required skills, or remote work availability.
Method 1: Scraping Indeed Search Results with Requests
Indeed search results load server-side, which means you can get listing data with plain HTTP requests — no browser automation needed for the initial results.
import requests
from bs4 import BeautifulSoup
import time
import re
def scrape_indeed_jobs(query: str, location: str, num_pages: int = 5) -> list:
    """Scrape Indeed job listings for a given query and location.

    Args:
        query: Search keywords, e.g. "python developer".
        location: Location string, e.g. "New York, NY".
        num_pages: Maximum number of result pages to fetch (10 jobs/page).

    Returns:
        A list of dicts with keys: title, url, company, location, salary,
        snippet, posted, job_key. Missing fields are set to None, so
        downstream code can rely on every key being present.
    """
    base_url = "https://www.indeed.com/jobs"
    all_jobs = []
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
    for page in range(num_pages):
        params = {
            "q": query,
            "l": location,
            "start": page * 10,  # Indeed uses 10 results per page
            "sort": "date",
        }
        # Explicit timeout: requests has no default, so a stalled
        # connection would otherwise hang the scraper forever.
        resp = requests.get(
            base_url, params=params, headers=headers, timeout=30
        )
        if resp.status_code != 200:
            print(f"Page {page} failed with status {resp.status_code}")
            break
        soup = BeautifulSoup(resp.text, "html.parser")
        # Indeed wraps each job in a div with data-jk attribute (job key)
        job_cards = soup.find_all("div", attrs={"data-jk": True})
        for card in job_cards:
            # Pre-populate every key so consumers never hit a KeyError
            # when a field is missing from a card.
            job = {
                "title": None,
                "url": None,
                "company": None,
                "location": None,
                "salary": None,
                "snippet": None,
                "posted": None,
                "job_key": card.get("data-jk"),  # for deduplication
            }
            # Job title + link to the detail page
            title_el = card.find("h2", class_=re.compile("jobTitle"))
            if title_el:
                link = title_el.find("a")
                job["title"] = (
                    link.get_text(strip=True) if link
                    else title_el.get_text(strip=True)
                )
                if link and link.get("href"):
                    job["url"] = "https://www.indeed.com" + link["href"]
            # Company name
            company_el = card.find(
                "span", attrs={"data-testid": "company-name"}
            )
            if company_el:
                job["company"] = company_el.get_text(strip=True)
            # Location
            location_el = card.find(
                "div", attrs={"data-testid": "text-location"}
            )
            if location_el:
                job["location"] = location_el.get_text(strip=True)
            # Salary (only present when the poster disclosed it)
            salary_el = card.find("div", class_=re.compile("salary"))
            if salary_el:
                job["salary"] = salary_el.get_text(strip=True)
            # Job snippet / description preview
            snippet_el = card.find("div", class_=re.compile("job-snippet"))
            if snippet_el:
                job["snippet"] = snippet_el.get_text(strip=True)
            # Relative posting date, e.g. "3 days ago"
            date_el = card.find("span", class_=re.compile("date"))
            if date_el:
                job["posted"] = date_el.get_text(strip=True)
            all_jobs.append(job)
        print(f"Page {page + 1}: found {len(job_cards)} jobs")
        time.sleep(2)  # Respect rate limits
    return all_jobs


# Example usage
jobs = scrape_indeed_jobs("python developer", "New York, NY", num_pages=3)
for job in jobs[:5]:
    print(f"{job['title']} at {job['company']}")
    print(f"  Location: {job['location']}")
    print(f"  Salary: {job['salary'] or 'Not listed'}")
    print(f"  Posted: {job['posted']}")
    print()
Getting Full Job Descriptions
The search results only show a snippet. To get the full job description, you need to fetch each individual job page:
def get_full_job_description(job_url: str) -> dict:
    """Fetch the complete job description from an Indeed job page.

    Args:
        job_url: Absolute URL of an Indeed job detail page.

    Returns:
        A dict that may contain "description", "details" (list of
        strings) and "salary_detail"; empty when the request fails.
    """
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36"
        )
    }
    # Timeout so one stalled connection cannot hang the crawl.
    resp = requests.get(job_url, headers=headers, timeout=30)
    result = {}
    # Bail out on error pages (CAPTCHA walls, 404s) instead of
    # parsing an error body as if it were a job posting.
    if resp.status_code != 200:
        return result
    soup = BeautifulSoup(resp.text, "html.parser")
    # Full description
    desc_el = soup.find("div", id="jobDescriptionText")
    if desc_el:
        result["description"] = desc_el.get_text(
            separator="\n", strip=True
        )
    # Job details (type, schedule, benefits)
    details = soup.find_all(
        "div", attrs={"data-testid": re.compile("jobDetail")}
    )
    result["details"] = [d.get_text(strip=True) for d in details]
    # Salary details (sometimes more specific on the detail page)
    salary_section = soup.find("div", id="salaryInfoAndJobType")
    if salary_section:
        result["salary_detail"] = salary_section.get_text(strip=True)
    return result


# Fetch full descriptions for top results
for job in jobs[:3]:
    if job.get("url"):
        details = get_full_job_description(job["url"])
        print(f"\n=== {job['title']} at {job['company']} ===")
        desc = details.get("description", "")
        print(desc[:500] + "..." if len(desc) > 500 else desc)
        time.sleep(2)
Method 2: Playwright for Dynamic Content
Indeed has been progressively moving to client-side rendering for some features. If the requests approach starts returning incomplete data, Playwright is your fallback:
import asyncio
from playwright.async_api import async_playwright
async def scrape_indeed_with_browser(
    query: str, location: str, max_pages: int = 3
) -> list:
    """Scrape Indeed search results with a real (headless) browser.

    Use this fallback when the plain-requests approach starts
    returning incomplete, client-side-rendered pages.

    Args:
        query: Search keywords (spaces/commas are fine — encoded below).
        location: Location string, e.g. "New York, NY" or "Remote".
        max_pages: Number of result pages to visit (10 jobs per page).

    Returns:
        A list of dicts with title, company, location, salary, job_key.
    """
    # Stdlib import kept local so the snippet stays self-contained.
    from urllib.parse import urlencode

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            )
        )
        page = await context.new_page()
        all_jobs = []
        for page_num in range(max_pages):
            # Properly URL-encode the query string: raw f-string
            # interpolation breaks on spaces and commas
            # (e.g. "New York, NY").
            qs = urlencode({
                "q": query,
                "l": location,
                "start": page_num * 10,
            })
            url = f"https://www.indeed.com/jobs?{qs}"
            await page.goto(url, wait_until="domcontentloaded")
            await page.wait_for_timeout(2000)  # let JS render the cards
            jobs = await page.evaluate("""
                () => {
                    const cards = document.querySelectorAll('[data-jk]');
                    return Array.from(cards).map(card => {
                        const titleEl = card.querySelector('h2 a, h2 span');
                        const companyEl = card.querySelector(
                            '[data-testid="company-name"]'
                        );
                        const locationEl = card.querySelector(
                            '[data-testid="text-location"]'
                        );
                        const salaryEl = card.querySelector(
                            '[class*="salary"]'
                        );
                        return {
                            title: titleEl?.textContent?.trim() || null,
                            company: companyEl?.textContent?.trim() || null,
                            location: locationEl?.textContent?.trim() || null,
                            salary: salaryEl?.textContent?.trim() || null,
                            job_key: card.getAttribute('data-jk')
                        };
                    });
                }
            """)
            all_jobs.extend(jobs)
            print(f"Page {page_num + 1}: {len(jobs)} jobs")
            await asyncio.sleep(3)
        await browser.close()
        return all_jobs


jobs = asyncio.run(
    scrape_indeed_with_browser("data engineer", "Remote")
)
Scraping Indeed Salary Data
Indeed has a dedicated salary section that aggregates reported salaries by job title and location. This data is useful for market research:
def scrape_indeed_salaries(job_title: str, location: str = "") -> dict:
    """Scrape salary data for a specific job title from Indeed.

    Args:
        job_title: Job title, e.g. "Python Developer".
        location: Optional "City, ST" string to narrow the results.

    Returns:
        A dict with the queried title/location, always containing
        "top_companies" (possibly empty), plus "average_salary" and
        "salary_range" when found on the page.
    """
    # Indeed career pages use hyphenated lowercase slugs.
    slug = job_title.lower().replace(" ", "-")
    url = f"https://www.indeed.com/career/{slug}/salaries"
    if location:
        loc_slug = location.lower().replace(", ", "-").replace(" ", "-")
        url += f"/{loc_slug}"
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36"
        )
    }
    # Timeout keeps a dead connection from blocking the caller.
    resp = requests.get(url, headers=headers, timeout=30)
    salary_data = {"job_title": job_title, "location": location}
    if resp.status_code != 200:
        # Unknown title/location slug or a CAPTCHA wall: return the
        # skeleton rather than parsing an error page.
        salary_data["top_companies"] = []
        return salary_data
    soup = BeautifulSoup(resp.text, "html.parser")
    # Headline average salary
    avg_el = soup.find("div", attrs={"data-testid": "avg-salary"})
    if avg_el:
        salary_data["average_salary"] = avg_el.get_text(strip=True)
    # Salary range widgets (low/high)
    range_els = soup.find_all(
        "div", attrs={"data-testid": re.compile("salary-range")}
    )
    if range_els:
        salary_data["salary_range"] = [
            el.get_text(strip=True) for el in range_els
        ]
    # Top paying companies (capped at 10 entries)
    companies = soup.find_all(
        "div", attrs={"data-testid": "company-salary"}
    )
    salary_data["top_companies"] = []
    for comp in companies[:10]:
        name_el = comp.find("a")
        pay_el = comp.find("span", class_=re.compile("salary"))
        if name_el and pay_el:
            salary_data["top_companies"].append({
                "company": name_el.get_text(strip=True),
                "avg_salary": pay_el.get_text(strip=True)
            })
    return salary_data


# Example: check Python developer salaries in San Francisco
salaries = scrape_indeed_salaries("Python Developer", "San Francisco, CA")
print(f"Average: {salaries.get('average_salary', 'N/A')}")
for comp in salaries.get("top_companies", [])[:5]:
    print(f"  {comp['company']}: {comp['avg_salary']}")
Scraping Company Reviews
Indeed company review pages contain ratings, written reviews, and pros/cons:
def scrape_company_reviews(company_slug: str, num_pages: int = 3) -> list:
    """Scrape Indeed company reviews.

    Args:
        company_slug: Company path segment as used on indeed.com/cmp/,
            e.g. "Google".
        num_pages: Maximum review pages to fetch (20 reviews per page).

    Returns:
        A list of dicts with title, pros, cons (None when absent) and,
        when present, rating (the star button's aria-label text).
    """
    all_reviews = []
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36"
        )
    }
    for page_num in range(num_pages):
        start = page_num * 20  # Indeed paginates reviews 20 at a time
        url = (
            f"https://www.indeed.com/cmp/{company_slug}"
            f"/reviews?start={start}"
        )
        # Timeout so one dead connection cannot stall the whole run.
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code != 200:
            # Blocked, rate-limited, or unknown company slug — stop
            # instead of parsing error pages.
            break
        soup = BeautifulSoup(resp.text, "html.parser")
        reviews = soup.find_all(
            "div", attrs={"data-testid": "review-card"}
        )
        if not reviews:
            # Past the last page (or the markup changed): no point in
            # requesting further pages.
            break
        for review in reviews:
            r = {}
            title_el = review.find("h2")
            r["title"] = (
                title_el.get_text(strip=True) if title_el else None
            )
            # Star rating lives in the button's aria-label,
            # e.g. "4.0 out of 5 stars".
            rating_el = review.find(
                "button", attrs={"aria-label": re.compile("stars")}
            )
            if rating_el:
                r["rating"] = rating_el.get("aria-label", "")
            pros_el = review.find(
                "div", attrs={"data-testid": "pros-text"}
            )
            r["pros"] = (
                pros_el.get_text(strip=True) if pros_el else None
            )
            cons_el = review.find(
                "div", attrs={"data-testid": "cons-text"}
            )
            r["cons"] = (
                cons_el.get_text(strip=True) if cons_el else None
            )
            all_reviews.append(r)
        time.sleep(2)  # Respect rate limits
    return all_reviews


# Scrape Google reviews from Indeed
reviews = scrape_company_reviews("Google")
for r in reviews[:3]:
    print(f"[{r.get('rating', '?')}] {r.get('title', 'No title')}")
    print(f"  Pros: {r.get('pros', 'N/A')}")
    print(f"  Cons: {r.get('cons', 'N/A')}")
    print()
Handling Anti-Bot Protection
Indeed's bot detection is moderate compared to sites like LinkedIn or Zillow. Here is what you will face:
- Rate limiting: More than ~1 request per second from the same IP will trigger CAPTCHAs
- Cookie requirements: Sessions without proper cookies get flagged
- CAPTCHA walls: Automated requests eventually hit a CAPTCHA page
- IP blocking: Persistent scraping from datacenter IPs gets blocked within minutes
Solutions That Work
For small-scale scraping (under 500 pages): Add delays (2-3 seconds between requests), rotate User-Agent strings, and use a residential proxy.
For production-scale scraping: Use a scraping API that handles anti-bot measures automatically. ScraperAPI works well for Indeed — their proxy pool and header rotation handle the CAPTCHA problem:
SCRAPERAPI_KEY = "YOUR_KEY"


def scrape_indeed_via_api(query: str, location: str) -> str:
    """Use ScraperAPI to scrape Indeed without getting blocked.

    Args:
        query: Search keywords (spaces/commas are fine — encoded below).
        location: Location string, e.g. "New York, NY".

    Returns:
        The raw HTML of the Indeed results page, as relayed by the
        ScraperAPI proxy.
    """
    # Stdlib import kept local so the snippet stays self-contained.
    from urllib.parse import urlencode

    # URL-encode the target's query string: raw f-string interpolation
    # breaks on spaces ("python developer") and commas ("New York, NY").
    target_url = (
        "https://www.indeed.com/jobs?"
        + urlencode({"q": query, "l": location})
    )
    # The target URL must itself be percent-encoded when passed as a
    # parameter to the API endpoint — requests does that via `params`.
    resp = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": SCRAPERAPI_KEY, "url": target_url},
        timeout=60,
    )
    return resp.text
For comparing scraping API options (pricing, success rates, speed), ScrapeOps maintains a helpful proxy comparison dashboard that benchmarks different providers against popular targets including Indeed.
Exporting to Structured Data
Here is a complete pipeline that scrapes, cleans, and exports Indeed data:
import csv
import json
from datetime import datetime
def export_jobs(jobs: list, fmt: str = "csv"):
    """Deduplicate, clean, and export scraped jobs to CSV or JSON.

    Args:
        jobs: List of job dicts as produced by the scrapers above
            (field values may be missing or explicitly None).
        fmt: "csv" for CSV output; anything else writes JSON.

    Returns:
        The name of the timestamped file that was written.
    """
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Fixed column order — also makes the CSV writer work when the
    # input list is empty (header-only file instead of an IndexError
    # from clean_jobs[0]).
    fieldnames = [
        "title", "company", "location", "salary",
        "posted", "job_key", "scraped_at",
    ]
    # Clean the data
    clean_jobs = []
    seen_keys = set()
    for job in jobs:
        # Deduplicate on Indeed's job key when one is present.
        key = job.get("job_key")
        if key and key in seen_keys:
            continue
        if key:
            seen_keys.add(key)
        # `or ""` guards against explicit None values, which
        # .get(key, "") does NOT catch — the scrapers store None for
        # missing salaries/companies, and None.strip() would crash.
        salary = job.get("salary") or ""
        if salary:
            # Normalize the non-breaking spaces Indeed embeds in salaries.
            salary = salary.replace("\xa0", " ").strip()
        clean_jobs.append({
            "title": (job.get("title") or "").strip(),
            "company": (job.get("company") or "").strip(),
            "location": (job.get("location") or "").strip(),
            "salary": salary or "Not disclosed",
            "posted": job.get("posted") or "",
            "job_key": key,
            "scraped_at": datetime.now().isoformat()
        })
    if fmt == "csv":
        filename = f"indeed_jobs_{timestamp}.csv"
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(clean_jobs)
    else:
        filename = f"indeed_jobs_{timestamp}.json"
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(clean_jobs, f, indent=2, ensure_ascii=False)
    # The original printed a literal placeholder here instead of the
    # actual file name.
    print(f"Exported {len(clean_jobs)} jobs to {filename}")
    return filename
# Write the scraped listings to a timestamped CSV file.
output_file = export_jobs(jobs, fmt="csv")
Legal and Ethical Notes
Indeed's robots.txt disallows most scraping paths, and their ToS explicitly prohibits automated data collection. That said:
- Public job listings are widely considered factual, non-copyrightable data
- The hiQ v. LinkedIn precedent suggests that scraping public data is not a CFAA violation
- However, commercial redistribution of Indeed data could trigger a civil lawsuit
- Indeed actively sends cease-and-desist letters to scrapers that redistribute their data
Best practices: Scrape at respectful rates, do not redistribute raw data, use the data for analysis and insights rather than building a competing job board, and consider whether Indeed official APIs or partner programs might serve your needs.
The Easy Path: Pre-Built Job Scrapers
If you need job data without maintaining scraping infrastructure, the Apify Store has ready-to-use job scrapers including our Glassdoor Jobs and Reviews Scraper for a complementary data source. These handle proxy rotation, anti-bot bypasses, and structured output — you just define your search parameters and get clean JSON or CSV output.
For job market analysis, combining data from multiple sources (Indeed + Glassdoor + LinkedIn) gives you the most complete picture of salary ranges and hiring trends.
Building something cool with job data? Drop a comment below or check out our other scrapers on the Apify Store.
Top comments (0)