Why Build a Job Board Aggregator?
Job seekers waste hours checking multiple platforms daily. Recruiters need market intelligence across boards. A job aggregator solves both problems — one API, all listings, structured data.
Let's build a Python aggregator that pulls from Indeed and LinkedIn, using a design that leaves room for other boards such as Glassdoor.
Architecture
Our aggregator follows a plugin pattern where each job board gets its own scraper class:
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict
from typing import List, Optional
import json


@dataclass
class JobListing:
    title: str
    company: str
    location: str
    salary: Optional[str]
    url: str
    source: str
    description: Optional[str] = None
    posted_date: Optional[str] = None


class JobScraper(ABC):
    @abstractmethod
    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        pass
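To make the plugin pattern concrete, here is a minimal sketch of a scraper that satisfies the interface. `StaticScraper` is a hypothetical class invented for illustration: it serves canned listings instead of making network calls, which is also a convenient way to test the aggregator offline. The dataclass and base class are repeated so the snippet runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobListing:
    title: str
    company: str
    location: str
    salary: Optional[str]
    url: str
    source: str
    description: Optional[str] = None
    posted_date: Optional[str] = None


class JobScraper(ABC):
    @abstractmethod
    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        ...


class StaticScraper(JobScraper):
    """Hypothetical plugin that serves canned data (no network access)."""

    def __init__(self, listings: List[JobListing]):
        self._listings = listings

    def search(self, query: str, location: str, pages: int = 1) -> List[JobListing]:
        # Case-insensitive substring match on the title and the location.
        q, loc = query.lower(), location.lower()
        return [
            job for job in self._listings
            if q in job.title.lower() and loc in job.location.lower()
        ]


sample = [
    JobListing("Python Developer", "Acme", "San Francisco, CA", None,
               "https://example.com/jobs/1", "static"),
    JobListing("Java Developer", "Acme", "San Francisco, CA", None,
               "https://example.com/jobs/2", "static"),
]
matches = StaticScraper(sample).search("python", "san francisco")
print([job.title for job in matches])
```

Any class that implements `search` with this signature can be registered with the aggregator, so adding a new board should not require changes elsewhere.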
Indeed Scraper
Indeed is one of the largest job boards. Its listings are rendered server-side, which makes them relatively easy to parse. The IndeedScraper implementation is not published in this article; the author offers a ready-made Apify actor instead.
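Since the original implementation is not shown, the following is only a minimal sketch of the general technique: pulling listing fields out of server-rendered HTML with the standard library's `html.parser`. The sample markup, the class names (`job-card`, `job-title`, `company`, `location`), and `JobCardParser` are all hypothetical and do not reflect Indeed's actual pages, which use different and frequently changing markup.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical markup for illustration only; real job boards differ.
SAMPLE_HTML = """
<div class="job-card">
  <a class="job-title" href="/jobs/1">Python Developer</a>
  <span class="company">Acme</span>
  <span class="location">San Francisco, CA</span>
</div>
<div class="job-card">
  <a class="job-title" href="/jobs/2">Data Engineer</a>
  <span class="company">Globex</span>
  <span class="location">Remote</span>
</div>
"""

# Maps a (hypothetical) CSS class to the listing field it fills.
FIELD_CLASSES = {"job-title": "title", "company": "company", "location": "location"}


class JobCardParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.jobs: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        if tag == "div" and css_class == "job-card":
            self.jobs.append({})  # a new card starts a new listing
        elif css_class in FIELD_CLASSES and self.jobs:
            self._field = FIELD_CLASSES[css_class]
            if tag == "a":
                self.jobs[-1]["url"] = attrs.get("href") or ""

    def handle_data(self, data):
        # Only keep text that appears inside a recognized field element.
        if self._field and data.strip():
            self.jobs[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


parser = JobCardParser()
parser.feed(SAMPLE_HTML)
for job in parser.jobs:
    print(job)
```

A real scraper would fetch the page first and map each dictionary onto a `JobListing`. Many projects use a dedicated HTML parsing library for this step; the standard-library parser is used here only to keep the sketch dependency-free.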
LinkedIn Scraper
LinkedIn's public job listings can be viewed without authentication. As with Indeed, the LinkedInScraper implementation is not published in this article; the author offers a ready-made Apify actor instead.
The Aggregator
Combine all scrapers into a unified interface:
class JobAggregator:
    def __init__(self, api_key):
        self.scrapers = {
            "indeed": IndeedScraper(api_key),
            "linkedin": LinkedInScraper(api_key),
        }

    def search_all(self, query, location, pages=1):
        all_jobs = []
        for name, scraper in self.scrapers.items():
            print(f"Searching {name}...")
            try:
                jobs = scraper.search(query, location, pages)
                all_jobs.extend(jobs)
                print(f"  Found {len(jobs)} jobs")
            except Exception as e:
                print(f"  Error: {e}")

        # Deduplicate by title + company
        seen = set()
        unique = []
        for job in all_jobs:
            key = (job.title.lower(), job.company.lower())
            if key not in seen:
                seen.add(key)
                unique.append(job)
        return unique

    def export_json(self, jobs, filename):
        with open(filename, "w") as f:
            json.dump([asdict(j) for j in jobs], f, indent=2)
        print(f"Exported {len(jobs)} jobs to {filename}")


# Usage
agg = JobAggregator(api_key="YOUR_SCRAPERAPI_KEY")
jobs = agg.search_all("python developer", "San Francisco", pages=3)
agg.export_json(jobs, "sf_python_jobs.json")
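The deduplication step compares lowercased titles and companies, so small formatting differences between boards (extra spaces, "Acme, Inc." versus "ACME Inc") would still count as distinct listings. One possible refinement, sketched here with hypothetical helpers `dedupe_key` and `dedupe`, is to strip punctuation and collapse whitespace before comparing:

```python
import re
from typing import Iterable, List, Tuple


def dedupe_key(title: str, company: str) -> Tuple[str, str]:
    """Normalize case, punctuation, and whitespace before comparing."""
    def norm(text: str) -> str:
        # Replace every run of non-alphanumeric characters with one space.
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return norm(title), norm(company)


def dedupe(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first occurrence of each normalized (title, company) pair."""
    seen = set()
    unique = []
    for title, company in pairs:
        key = dedupe_key(title, company)
        if key not in seen:
            seen.add(key)
            unique.append((title, company))
    return unique


rows = [
    ("Python Developer", "Acme, Inc."),
    ("python  developer ", "ACME Inc"),
    ("Python Developer", "Globex"),
]
unique_rows = dedupe(rows)
print(unique_rows)
```

The trade-off is that aggressive normalization can merge listings that are genuinely different: with this rule, "C++ Developer" and "C# Developer" collapse to the same key, so the pattern may need tuning for technical job titles.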
Handling Anti-Bot Protection
Job boards use aggressive bot detection:
- ScraperAPI handles CAPTCHAs and IP rotation automatically
- ThorData residential proxies help avoid IP blocks
- Rate limiting is essential: send at most one request every 2-3 seconds to any single site
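The rate-limiting advice can be sketched as a small helper that enforces a minimum gap between requests. `RateLimiter` is a hypothetical class written for this example, not part of ScraperAPI or any other library. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
import time


class RateLimiter:
    """Blocks so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# Demonstration with a fake clock, so nothing actually sleeps.
now = [0.0]
slept = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    now[0] += seconds


limiter = RateLimiter(2.0, clock=lambda: now[0], sleep=fake_sleep)
first = limiter.wait()   # first request goes out immediately
now[0] += 0.5            # half a second of work passes
second = limiter.wait()  # must wait for the rest of the interval
print(first, second)
```

In a scraper, calling `limiter.wait()` before each HTTP request with `min_interval` set between 2 and 3 keeps the request rate within the guideline above. Using one limiter per board avoids one slow site throttling the others.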
Monitoring
Track your scraper health with ScrapeOps. Job boards frequently change their HTML structure, and ScrapeOps alerts you when success rates drop.
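To illustrate what "success rate drops" means in practice, here is a minimal local sketch that tracks a rolling success rate and flags degradation. It does not use or imitate the ScrapeOps API; `HealthTracker`, the window size, and the threshold are assumptions chosen for the example.

```python
from collections import deque


class HealthTracker:
    """Tracks the success rate over the most recent `window` requests."""

    def __init__(self, window: int = 50, threshold: float = 0.8):
        self.results = deque(maxlen=window)  # old results fall off automatically
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def success_rate(self) -> float:
        if not self.results:
            return 1.0  # no data yet, so nothing to flag
        return sum(self.results) / len(self.results)

    def is_degraded(self) -> bool:
        return self.success_rate < self.threshold


tracker = HealthTracker(window=4, threshold=0.75)
for ok in [True, True, False, False]:
    tracker.record(ok)

if tracker.is_degraded():
    print(f"Success rate fell to {tracker.success_rate:.0%}; check the HTML selectors")
```

A sudden drop for one board while the others stay healthy usually points to a markup change on that board and not to a network problem.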
Taking It Further
- Add email alerts for new postings matching criteria
- Build a simple web dashboard with Flask/FastAPI
- Track salary trends over time
- Add filters for remote-only, seniority level, etc.
- Store in PostgreSQL for advanced querying
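As a starting point for the alerting idea, new postings can be detected by remembering which URLs earlier runs have already seen. This is a minimal sketch with a hypothetical `find_new_jobs` helper. It works on plain dictionaries such as those produced by `asdict`, and it assumes the listing URL is a stable identifier, which may not hold on every board.

```python
from typing import Dict, Iterable, List, Set


def find_new_jobs(jobs: Iterable[Dict[str, str]], seen_urls: Set[str]) -> List[Dict[str, str]]:
    """Return jobs whose URL has not been seen before and mark them as seen."""
    fresh = []
    for job in jobs:
        if job["url"] not in seen_urls:
            seen_urls.add(job["url"])
            fresh.append(job)
    return fresh


seen: Set[str] = set()

run_1 = [{"title": "Python Developer", "url": "https://example.com/jobs/1"}]
run_2 = [
    {"title": "Python Developer", "url": "https://example.com/jobs/1"},
    {"title": "Data Engineer", "url": "https://example.com/jobs/2"},
]

new_in_run_1 = find_new_jobs(run_1, seen)
new_in_run_2 = find_new_jobs(run_2, seen)
print([job["title"] for job in new_in_run_2])
```

Persisting `seen` between runs (in a file or a database table) and sending `new_in_run_2` to an email or chat notifier would complete the alert loop.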
Conclusion
A job board aggregator is a practical project with real users. Whether for personal job searching or building a commercial product, the ability to unify job data across platforms creates significant value.