DEV Community

agenthustler

Building a Job Board Aggregator: Indeed, LinkedIn, and Glassdoor

Why Build a Job Board Aggregator?

Job seekers lose hours every day checking multiple platforms, and recruiters need market intelligence across boards. A job aggregator addresses both problems: one interface, listings from every board, structured data.

Let's build a Python aggregator for Indeed, LinkedIn, and Glassdoor. The code in this post wires up Indeed and LinkedIn; a Glassdoor scraper plugs in through the same interface.

Architecture

Our aggregator follows a plugin pattern where each job board gets its own scraper class:

from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict
from typing import List, Optional
import json

@dataclass
class JobListing:
    title: str
    company: str
    location: str
    salary: Optional[str]
    url: str
    source: str
    description: Optional[str] = None
    posted_date: Optional[str] = None

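class JobScraper(ABC):
    """Base class that every job board plugin implements."""

    @abstractmethod
    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        """Return listings for `query` in `location`, reading up to `pages` result pages."""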
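
To make the plugin contract concrete, here is a minimal sketch of a board plugin. `StaticScraper` is a hypothetical class invented for illustration: it serves canned listings instead of scraping, which is also handy for testing the aggregator offline. The base definitions are repeated in trimmed form so the snippet runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobListing:  # trimmed copy of the dataclass above
    title: str
    company: str
    location: str
    salary: Optional[str]
    url: str
    source: str


class JobScraper(ABC):
    @abstractmethod
    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        ...


class StaticScraper(JobScraper):
    """Hypothetical plugin that filters a fixed list instead of hitting a site."""

    def __init__(self, listings: List[JobListing]) -> None:
        self._listings = listings

    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        # Only the query is matched here; location and pages are ignored in this stub.
        needle = query.lower()
        return [job for job in self._listings if needle in job.title.lower()]


demo = StaticScraper([
    JobListing("Python Developer", "Acme Corp", "San Francisco", None,
               "https://example.com/1", "static"),
    JobListing("Java Developer", "Globex", "Austin", "$120k",
               "https://example.com/2", "static"),
])
results = demo.search("python", "San Francisco")
```

Because `search` is abstract, Python refuses to instantiate `JobScraper` itself or any subclass that forgets to implement it, so a half-finished plugin fails at construction time rather than mid-search.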

Indeed Scraper

Indeed is one of the largest job boards. Its listings are rendered server-side, which makes them relatively easy to parse:

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
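
Since the real implementation is not shown, here is a rough sketch of what parsing server-rendered listings can look like. Everything about the markup is an assumption: the class names `job-card`, `job-title`, `company`, and `location` are placeholders, not Indeed's actual HTML, and the parser uses Python's standard `html.parser` rather than whatever the actor uses internally.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical markup: these class names are placeholders, not Indeed's real ones.
SAMPLE_HTML = """
<div class="job-card">
  <a class="job-title" href="/viewjob?jk=abc123">Python Developer</a>
  <span class="company">Acme Corp</span>
  <span class="location">San Francisco, CA</span>
</div>
<div class="job-card">
  <a class="job-title" href="/viewjob?jk=def456">Data Engineer</a>
  <span class="company">Globex</span>
  <span class="location">Remote</span>
</div>
"""

# Maps a (placeholder) CSS class to the field name it fills.
FIELD_CLASSES = {"job-title": "title", "company": "company", "location": "location"}


class JobCardParser(HTMLParser):
    """Collects one dict per job card from server-rendered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field whose text is being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        css_class = attr.get("class") or ""
        if tag == "div" and css_class == "job-card":
            self.cards.append({})  # a new card starts
        elif css_class in FIELD_CLASSES and self.cards:
            self._field = FIELD_CLASSES[css_class]
            if self._field == "title":
                self.cards[-1]["url"] = attr.get("href") or ""

    def handle_data(self, data):
        if self._field and data.strip():
            self.cards[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None  # stop capturing text


def parse_cards(html: str) -> List[Dict[str, str]]:
    parser = JobCardParser()
    parser.feed(html)
    return parser.cards


cards = parse_cards(SAMPLE_HTML)
```

A real plugin would wrap `parse_cards` in a `JobScraper` subclass and turn each dict into a `JobListing`. Expect to update the selectors often, since boards change their markup without notice.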

LinkedIn Scraper

LinkedIn's public job listings do not require authentication:

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).

The Aggregator

Combine all scrapers into a unified interface:

class JobAggregator:
    def __init__(self, api_key):
        self.scrapers = {
            "indeed": IndeedScraper(api_key),
            "linkedin": LinkedInScraper(api_key),
        }

    def search_all(self, query, location, pages=1):
        all_jobs = []
        for name, scraper in self.scrapers.items():
            print(f"Searching {name}...")
            try:
                jobs = scraper.search(query, location, pages)
                all_jobs.extend(jobs)
                print(f"  Found {len(jobs)} jobs")
            except Exception as e:
                # A failure on one board should not abort the others
                print(f"  Error searching {name}: {e}")

        # Deduplicate by title + company (case-insensitive); the first occurrence wins
        seen = set()
        unique = []
        for job in all_jobs:
            key = (job.title.lower(), job.company.lower())
            if key not in seen:
                seen.add(key)
                unique.append(job)

        return unique

    def export_json(self, jobs, filename):
        with open(filename, "w", encoding="utf-8") as f:
            json.dump([asdict(j) for j in jobs], f, indent=2, ensure_ascii=False)
        print(f"Exported {len(jobs)} jobs to {filename}")

# Usage
agg = JobAggregator(api_key="YOUR_SCRAPERAPI_KEY")
jobs = agg.search_all("python developer", "San Francisco", pages=3)
agg.export_json(jobs, "sf_python_jobs.json")
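
One caveat with a title + company key: it collapses the same role posted in different cities into one listing, and it treats "Acme, Inc." and "Acme Inc" as different companies. A possible refinement, sketched here under the assumption that location should be part of the key, normalizes each field first. `normalize`, `dedupe_key`, and `dedupe` are illustrative names, not part of the aggregator above.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class JobListing:  # trimmed copy of the dataclass from the Architecture section
    title: str
    company: str
    location: str
    url: str
    source: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe_key(job: JobListing) -> Tuple[str, str, str]:
    # Assumption: the same title at the same company in another city is a distinct job.
    return (normalize(job.title), normalize(job.company), normalize(job.location))


def dedupe(jobs: Iterable[JobListing]) -> List[JobListing]:
    """Keep the first listing seen for each key, preserving input order."""
    seen = set()
    unique = []
    for job in jobs:
        key = dedupe_key(job)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


jobs = [
    JobListing("Python Developer", "Acme, Inc.", "San Francisco", "https://example.com/1", "indeed"),
    JobListing("python developer ", "Acme Inc", "san francisco", "https://example.com/2", "linkedin"),
    JobListing("Python Developer", "Acme, Inc.", "Austin", "https://example.com/3", "indeed"),
]
unique = dedupe(jobs)
```

Here the first two listings collapse into one, while the Austin posting survives as a separate job.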

Handling Anti-Bot Protection

Job boards use aggressive bot detection. Common mitigations:

  • ScraperAPI handles CAPTCHAs and IP rotation automatically
  • ThorData residential proxies help avoid IP blocks
  • Rate limiting is essential: send no more than one request every 2-3 seconds
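
The rate-limiting rule is easy to enforce in code. Below is a minimal sketch of a blocking limiter; `RateLimiter` is a hypothetical helper, and the 2.5 second default is just the midpoint of the range above. The clock and sleep functions are injectable so the class can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Ensures consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(
        self,
        min_interval: float = 2.5,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


limiter = RateLimiter(min_interval=2.5)
first_delay = limiter.wait()  # the first call never sleeps
```

Call `limiter.wait()` right before each HTTP request inside a scraper's `search` loop. Using one limiter per board keeps a slow site from throttling the others.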

Monitoring

Track your scraper health with ScrapeOps. Job boards frequently change their HTML structure, and ScrapeOps alerts you when success rates drop.
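
If you want a feel for what such monitoring measures, the core idea fits in a few lines. This is a local sketch only, not the ScrapeOps SDK: `HealthTracker` is a hypothetical class, and the window size and 80% threshold are arbitrary assumptions.

```python
from collections import deque


class HealthTracker:
    """Tracks the success rate over the most recent `window` requests."""

    def __init__(self, window: int = 50, threshold: float = 0.8) -> None:
        self.results = deque(maxlen=window)  # old results fall off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def success_rate(self) -> float:
        if not self.results:
            return 1.0  # no data yet, assume healthy
        return sum(self.results) / len(self.results)

    def is_degraded(self) -> bool:
        return self.success_rate < self.threshold


tracker = HealthTracker(window=4, threshold=0.75)
for outcome in (True, True, True, True, False, False):
    tracker.record(outcome)
```

A sensible definition of "success" for a scraper is "the page parsed into at least one listing", since a layout change usually shows up as an HTTP 200 with zero results rather than an error.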

Taking It Further

  • Add email alerts for new postings matching criteria
  • Build a simple web dashboard with Flask/FastAPI
  • Track salary trends over time
  • Add filters for remote-only, seniority level, etc.
  • Store in PostgreSQL for advanced querying
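
As a starting point for the filtering idea, here is a small sketch. The keyword hints and the `filter_jobs` helper are assumptions made for illustration; real listings are messy, so treat the matching rules as a first pass.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class JobListing:  # trimmed copy of the dataclass from the Architecture section
    title: str
    company: str
    location: str
    salary: Optional[str]
    url: str
    source: str


# Assumed keyword hints; tune these against real listings.
REMOTE_HINTS = ("remote", "work from home", "anywhere")


def is_remote(job: JobListing) -> bool:
    text = f"{job.title} {job.location}".lower()
    return any(hint in text for hint in REMOTE_HINTS)


def filter_jobs(
    jobs: Iterable[JobListing],
    remote_only: bool = False,
    seniority: Optional[str] = None,
) -> List[JobListing]:
    """Keep jobs that pass every requested filter; no filters means keep all."""
    selected = []
    for job in jobs:
        if remote_only and not is_remote(job):
            continue
        if seniority and seniority.lower() not in job.title.lower():
            continue
        selected.append(job)
    return selected


jobs = [
    JobListing("Senior Python Developer", "Acme Corp", "Remote", None,
               "https://example.com/1", "demo"),
    JobListing("Junior Python Developer", "Globex", "San Francisco, CA", None,
               "https://example.com/2", "demo"),
    JobListing("Senior Data Engineer", "Initech", "Austin, TX (Remote)", "$150k",
               "https://example.com/3", "demo"),
]
remote_senior = filter_jobs(jobs, remote_only=True, seniority="senior")
```

The same predicate can drive email alerts: run it over each fresh batch and notify only on listings whose URL you have not seen before.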

Conclusion

A job board aggregator is a practical project with real users. Whether for personal job searching or building a commercial product, the ability to unify job data across platforms creates significant value.
