DEV Community

agenthustler
agenthustler

Posted on

Building a Job Board Aggregator: Indeed, LinkedIn, and Glassdoor

Why Build a Job Board Aggregator?

Job seekers waste hours checking multiple platforms daily. Recruiters need market intelligence across boards. A job aggregator solves both problems — one API, all listings, structured data.

Let's build a Python aggregator that pulls from Indeed, LinkedIn, and Glassdoor.

Architecture

Our aggregator follows a plugin pattern where each job board gets its own scraper class:

from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict
from typing import List, Optional
import json

@dataclass
class JobListing:
    """Normalized job posting shared by all scrapers.

    Required fields are populated by every board; the optional fields stay
    None when a board does not expose that data.
    """
    title: str
    company: str
    location: str
    salary: Optional[str]  # raw salary text as displayed on the board; not parsed to a number
    url: str  # absolute link to the posting
    source: str  # board identifier, e.g. "indeed" or "linkedin"
    description: Optional[str] = None
    posted_date: Optional[str] = None  # board-formatted date string — format varies per board, TODO confirm

class JobScraper(ABC):
    """Interface each board-specific scraper implements (plugin pattern)."""

    @abstractmethod
    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        """Return listings matching `query` near `location`, scanning up to `pages` result pages."""
        pass
Enter fullscreen mode Exit fullscreen mode

Indeed Scraper

Indeed is the largest job board. Their listings are rendered server-side, making them relatively easy to parse:

import time
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup

class IndeedScraper(JobScraper):
    """Scrapes Indeed search results, proxied through the ScraperAPI service."""

    def __init__(self, api_key):
        # ScraperAPI credentials; every request goes through base_url.
        self.api_key = api_key
        self.base_url = "https://api.scraperapi.com"

    def search(self, query, location, pages=1):
        """Return JobListings for `query`/`location` across `pages` result pages.

        Indeed paginates in steps of 10 via the `start` query parameter.
        """
        listings = []
        for page in range(pages):
            # URL-encode user-supplied terms so spaces and special
            # characters survive the round trip through the proxy.
            url = (
                "https://www.indeed.com/jobs"
                f"?q={quote_plus(query)}&l={quote_plus(location)}&start={page * 10}"
            )
            html = self._fetch(url)
            listings.extend(self._parse(html))
            time.sleep(2)  # throttle: stay polite between page fetches
        return listings

    def _fetch(self, url):
        """Fetch `url` through ScraperAPI and return the rendered HTML."""
        # render=true asks ScraperAPI to execute JavaScript before responding.
        resp = requests.get(
            self.base_url,
            params={"api_key": self.api_key, "url": url, "render": "true"},
            timeout=60,  # rendered fetches are slow, but never hang forever
        )
        resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error page
        return resp.text

    def _parse(self, html):
        """Extract JobListings from an Indeed search-results page."""
        soup = BeautifulSoup(html, "html.parser")
        jobs = []

        # Selectors cover both current and legacy Indeed result markup.
        for card in soup.select(".job_seen_beacon, .jobsearch-ResultsList > li"):
            title_el = card.select_one(".jobTitle a, h2.jobTitle")
            company_el = card.select_one(".companyName, [data-testid=company-name]")
            location_el = card.select_one(".companyLocation, [data-testid=text-location]")
            salary_el = card.select_one(".salary-snippet, .estimatedSalary")

            # Skip cards missing the essentials (ads, malformed markup).
            if title_el and company_el:
                link = title_el.get("href", "")
                if link.startswith("/"):
                    # Indeed links are site-relative; make them absolute.
                    link = f"https://www.indeed.com{link}"

                jobs.append(JobListing(
                    title=title_el.get_text(strip=True),
                    company=company_el.get_text(strip=True),
                    location=location_el.get_text(strip=True) if location_el else "",
                    salary=salary_el.get_text(strip=True) if salary_el else None,
                    url=link,
                    source="indeed"
                ))
        return jobs
Enter fullscreen mode Exit fullscreen mode

LinkedIn Scraper

LinkedIn's public job listings do not require authentication:

class LinkedInScraper(JobScraper):
    """Scrapes LinkedIn's public (unauthenticated) job search via ScraperAPI."""

    def __init__(self, api_key):
        # ScraperAPI credentials; every request goes through base_url.
        self.api_key = api_key
        self.base_url = "https://api.scraperapi.com"

    def search(self, query, location, pages=1):
        """Return JobListings for `query`/`location` across `pages` result pages.

        LinkedIn paginates in steps of 25 via the `start` query parameter.
        """
        listings = []
        for page in range(pages):
            start = page * 25
            # URL-encode user input so spaces/special characters survive the URL.
            url = (
                "https://www.linkedin.com/jobs/search"
                f"?keywords={quote_plus(query)}&location={quote_plus(location)}&start={start}"
            )
            html = self._fetch(url)
            listings.extend(self._parse(html))
            time.sleep(3)  # LinkedIn is aggressive with rate limiting
        return listings

    def _fetch(self, url):
        """Fetch `url` through ScraperAPI and return the rendered HTML."""
        # render=true makes ScraperAPI execute JavaScript before responding.
        resp = requests.get(
            self.base_url,
            params={"api_key": self.api_key, "url": url, "render": "true"},
            timeout=60,  # bound the request instead of hanging indefinitely
        )
        resp.raise_for_status()  # fail loudly on 4xx/5xx proxy/target errors
        return resp.text

    def _parse(self, html):
        """Extract JobListings from a LinkedIn search-results page."""
        soup = BeautifulSoup(html, "html.parser")
        jobs = []

        for card in soup.select(".base-card, .job-search-card"):
            title_el = card.select_one(".base-search-card__title")
            company_el = card.select_one(".base-search-card__subtitle")
            location_el = card.select_one(".job-search-card__location")
            link_el = card.select_one("a.base-card__full-link")

            # Skip cards missing the essentials (ads, malformed markup).
            if title_el and company_el:
                jobs.append(JobListing(
                    title=title_el.get_text(strip=True),
                    company=company_el.get_text(strip=True),
                    location=location_el.get_text(strip=True) if location_el else "",
                    salary=None,  # LinkedIn search cards do not expose salary
                    url=link_el.get("href", "") if link_el else "",
                    source="linkedin"
                ))
        return jobs
Enter fullscreen mode Exit fullscreen mode

The Aggregator

Combine all scrapers into a unified interface:

class JobAggregator:
    """Fans a search out to every registered scraper and merges the results."""

    def __init__(self, api_key):
        # One scraper instance per supported board, keyed by source name.
        self.scrapers = {
            "indeed": IndeedScraper(api_key),
            "linkedin": LinkedInScraper(api_key),
        }

    def search_all(self, query, location, pages=1):
        """Run the search on every board and return de-duplicated listings.

        A failing scraper is reported and skipped so one broken board does
        not abort the whole run. Duplicates are detected by (title, company),
        case-insensitively; the first occurrence wins.
        """
        all_jobs = []
        for name, scraper in self.scrapers.items():
            print(f"Searching {name}...")
            try:
                jobs = scraper.search(query, location, pages)
                all_jobs.extend(jobs)
                print(f"  Found {len(jobs)} jobs")
            except Exception as e:
                # Best-effort aggregation: log and continue with other boards.
                print(f"  Error: {e}")

        # Deduplicate by title + company
        seen = set()
        unique = []
        for job in all_jobs:
            key = (job.title.lower(), job.company.lower())
            if key not in seen:
                seen.add(key)
                unique.append(job)

        return unique

    def export_json(self, jobs, filename):
        """Write `jobs` to `filename` as a pretty-printed JSON array of dicts."""
        # utf-8 + ensure_ascii=False keeps non-ASCII job titles readable.
        with open(filename, "w", encoding="utf-8") as f:
            json.dump([asdict(j) for j in jobs], f, indent=2, ensure_ascii=False)
        # Bug fix: the message previously omitted the destination file name.
        print(f"Exported {len(jobs)} jobs to {filename}")

# Usage — guarded so importing this module does not trigger network calls.
if __name__ == "__main__":
    agg = JobAggregator(api_key="YOUR_SCRAPERAPI_KEY")
    jobs = agg.search_all("python developer", "San Francisco", pages=3)
    agg.export_json(jobs, "sf_python_jobs.json")
Enter fullscreen mode Exit fullscreen mode

Handling Anti-Bot Protection

Job boards use aggressive bot detection:

  • ScraperAPI handles CAPTCHAs and IP rotation automatically
  • ThorData residential proxies help avoid IP blocks
  • Rate limiting is essential — never send more than one request every 2-3 seconds per site

Monitoring

Track your scraper health with ScrapeOps. Job boards frequently change their HTML structure, and ScrapeOps alerts you when success rates drop.

Taking It Further

  • Add email alerts for new postings matching criteria
  • Build a simple web dashboard with Flask/FastAPI
  • Track salary trends over time
  • Add filters for remote-only, seniority level, etc.
  • Store in PostgreSQL for advanced querying

Conclusion

A job board aggregator is a practical project with real users. Whether for personal job searching or building a commercial product, the ability to unify job data across platforms creates significant value.

Top comments (0)