DEV Community: Devil Scrapes

Breezy HR Jobs Scraper: Pull Every Posting From Any Breezy HR Career Board

Devil Scrapes — Sun, 26 Jul 2026 17:42:39 +0000

Breezy HR shows up more than you'd expect

Breezy HR is a lighter-weight ATS that a lot of SMB and mid-market employers reach for — I've seen it running healthcare networks with 600+ open reqs and three-person equipment dealers with a single listing, on the exact same {company}.breezy.hr URL pattern. Every one of those boards is served from one JSON endpoint (https://{company}.breezy.hr/json), no pagination, the whole board in one response.

That simplicity on Breezy's end doesn't mean scraping it at scale is simple on yours — which is why I built a dedicated Breezy HR jobs scraper under Devil Scrapes, tested end to end against boards ranging from a handful of postings to several hundred, so the same code path holds regardless of company size.

What the data looks like

A real row from the dataset:

{
  "posting_id": "5a3b663b95ce",
  "company_slug": "rhynocare",
  "company_name": "RhynoCare",
  "title": "Cook",
  "apply_url": "https://rhynocare.breezy.hr/p/5a3b663b95ce-cook",
  "published_date": "2026-07-25T00:00:00.673Z",
  "job_type": "Full-Time",
  "department": null,
  "salary": "$25 – $27 / hour",
  "location_city": "St. Catharines",
  "location_state": "ON",
  "location_country": "CA",
  "is_remote": false,
  "scraped_at": "2026-07-26T15:00:00Z"
}

salary is deliberately kept as freeform text exactly as published ("$25 – $27 / hour") rather than force-parsed into numeric fields — Breezy employers write pay ranges in enough different formats that guessing at parsing would introduce more noise than it removes.

The naive approach and why it's trickier than it looks

Breezy's /json endpoint is genuinely convenient on paper — no auth, whole board in one call. The gotcha that trips up a quick script is what happens when a company slug is wrong or unused: Breezy doesn't return a clean 404. It redirects to its own marketing site instead, which means a naive scraper will either choke trying to parse marketing-site HTML as job data, or — worse — silently report zero postings for a company that's actually fine, because it never noticed the redirect at all.
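To make the redirect gotcha concrete, here is a minimal sketch of how a script could tell a bad slug apart from an empty board. It is not the Actor's actual code. The function name, the three outcome labels, and the JSON content-type check are assumptions made for illustration. The caller does the HTTP request and passes in what it saw after following redirects.

```python
from urllib.parse import urlsplit


def classify_breezy_response(slug, final_url, status_code, content_type):
    """Classify the outcome of GET https://{slug}.breezy.hr/json.

    Returns "ok", "bad_slug", or "error". Pure function: no network access.
    """
    expected_host = f"{slug.lower()}.breezy.hr"
    final_host = (urlsplit(final_url).hostname or "").lower()

    # Landing on any other host means we were redirected away from the
    # company's board: report a bad slug, never "zero postings".
    if final_host != expected_host:
        return "bad_slug"
    if status_code != 200:
        return "error"
    # Assumption: a real board answers with a JSON content type.
    if "json" not in (content_type or "").lower():
        return "error"
    return "ok"
```

Only an "ok" result should ever be parsed as job data. A "bad_slug" result gets logged and skipped, so it never shows up as an empty board.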

Add to that the usual scale problems — a client that doesn't look like a browser at the TLS layer, no backoff when a board briefly rate-limits, one bad slug taking down an entire multi-company batch — and "quick script" turns into an afternoon of edge-case chasing pretty fast.

There's also a proxy wrinkle worth being honest about: Breezy's feed showed no blocking in our testing from a typical residential-style exit IP, but a run through a plain datacenter IP got a flat 403 in our cloud validation. We ship this Actor defaulting to a residential proxy through Apify Proxy for exactly that reason: we'd rather absorb that cost up front than have your run fail intermittently depending on which exit IP you happened to draw that day.

The Actor

Breezy HR Jobs Scraper detects that marketing-site redirect precisely and skips the company instead of choking on it, rotates Chrome/Firefox/Safari TLS fingerprints via curl-cffi on every request, retries transient 429/5xx responses with backoff up to 5 attempts, and routes every request through a rotating residential proxy session, so the proxy setup is already in place if Breezy tightens its blocking further.

import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/breezy-hr-jobs-scraper").call(run_input={
    "companySlugs": ["rhynocare", "barloworldequipment"],
    "maxResultsPerCompany": 200,
    "proxyConfiguration": {"useApifyProxy": True},
})

# Iterate over the rows written to the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["salary"], item["location_city"])

Or paste the same JSON straight into the Apify Console input form for a one-off pull.

What you'd actually use this for

Recruiting & talent intelligence: track what a Breezy-hosted employer is hiring for, where, and at what pay band when one is published.
Hiring-intent signals for BD/SDR teams — a sudden spike in open reqs at a target account is a decent lead qualifier.
Job-board aggregation — Breezy coverage slots next to our SmartRecruiters, Workday, Workable, and Teamtailor scrapers in one pipeline.
Labor-market research — sample hiring demand by industry or region using structured, machine-readable data.
HR-tech pipelines — wire rows straight into a CRM, dashboard, or n8n/Make workflow on a schedule.

Pricing, honestly

$0.005 per run flat, plus $0.0015 per posting written. A 1,000-posting pull costs $1.505 total ($0.005 + 1,000 × $0.0015). A company with an invalid slug or zero open reqs is never billed for rows it didn't produce. Apify's $5 free credit for new accounts covers roughly 3,300 postings before you'd spend anything yourself.
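The arithmetic is easy to sanity-check yourself. This is a small sketch using the two published rates. The function names are mine, and it assumes everything lands in a single run, so the flat fee is paid once. Decimal avoids float rounding noise on amounts this small.

```python
from decimal import Decimal

RUN_FEE = Decimal("0.005")       # flat fee per run, from the pricing above
POSTING_FEE = Decimal("0.0015")  # fee per posting written


def run_cost(postings):
    """Cost in USD of a single run that writes `postings` rows."""
    return RUN_FEE + POSTING_FEE * postings


def postings_within_budget(budget_usd):
    """Largest posting count a single run can write within the budget."""
    remaining = Decimal(str(budget_usd)) - RUN_FEE
    if remaining < 0:
        return 0
    return int(remaining // POSTING_FEE)


print(run_cost(1000))             # 1.5050
print(postings_within_budget(5))  # 3330
```

So a 1,000-posting run comes to $1.505 (about $1.51 rounded to the cent), and the $5 credit stretches to 3,330 postings in one run, which is where the "roughly 3,300" figure comes from.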

Why this is more interesting than it looks

The redirect-based bad-slug detection is the single most important piece of engineering in this Actor, and it's invisible if you've never hit it: get it wrong and you either crash on marketing-site HTML or — worse — report a false "zero postings" for a company that's actually fine. Getting that distinction right, plus routing consistently through a residential exit IP so a future tightening on Breezy's side doesn't turn into an intermittently-broken run, is where the real reliability work went.

Honest limitations

This covers public postings only, not authenticated internal views. Breezy's list endpoint doesn't publish the full job-description body — only the apply_url, which links to the full text on the HTML page that this Actor never fetches. You supply company slugs, not display names, since a brand like "RhynoCare" may not match its actual subdomain. And it's a point-in-time snapshot — schedule recurring runs to track how a board changes.

Try it

Breezy HR Jobs Scraper is live on the Apify Store at $1.50 per 1,000 results, with Apify's standard $5 free trial credit and no card required to start. It's part of a growing ATS-coverage fleet under Devil Scrapes — BambooHR, Recruitee, Ashby, and Personio are all covered too.

Have you hit a Breezy board that behaves differently from what's described here? Tell me in the comments — that's how the redirect-detection logic in this Actor got as solid as it is.

Personio Jobs Scraper: Pull Every Posting From Any Personio Careers Page

Devil Scrapes — Sun, 26 Jul 2026 17:42:03 +0000

Personio is the default ATS across DACH

If you're doing anything with hiring data in Germany, Austria, or Switzerland, Personio comes up constantly — it's the leading HR/recruiting SaaS across the DACH region, and every company using it to publish open roles gets a careers page at {company}.jobs.personio.de. That page is backed by a public, unauthenticated XML feed (still carrying its legacy workzag-jobs branding under the hood) listing every currently-open position: title, office, department, employment type, seniority, years of experience, and a full job description split into named sections.

There was no dedicated Personio Actor on the Apify Store as of when I built this, despite Personio's footprint being large enough to matter for anyone doing DACH-region recruiting or labor-market research. So I built one under Devil Scrapes: point it at a list of company subdomains and get one clean row per open posting, across as many companies as you want in a single run.

What the data looks like

A real row from the dataset:

{
  "company": "chrono24",
  "position_id": 1234567,
  "job_title": "Senior Backend Engineer",
  "subcompany": null,
  "office": "Karlsruhe",
  "additional_offices": null,
  "department": "Engineering",
  "recruiting_category": "Tech",
  "employment_type": "Full-time",
  "seniority": "Senior",
  "schedule": null,
  "years_of_experience": "3-5 years",
  "keywords": ["backend", "python", "aws"],
  "occupation": "Software Development",
  "occupation_category": "IT",
  "created_at": "2026-06-01T09:00:00",
  "job_url": "https://chrono24.jobs.personio.de/job/1234567",
  "description_html": null,
  "description_text": "We are looking for a Senior Backend Engineer to join our team ...",
  "scraped_at": "2026-07-26T12:00:00Z"
}

description_text is always populated regardless of settings; description_html only fills in when you opt into the raw HTML sections — useful if you're rendering postings somewhere downstream rather than just reading them.

The naive approach and why it's more work than it looks

Personio's XML feed is genuinely public and well-structured, which makes it tempting to think a quick script will do. It mostly will — until you try running it across more than a handful of companies:

Feed availability varies per subdomain. Some companies have deactivated their board, some never had one; your loop needs to treat "zero postings" as a normal outcome and keep going, not as a reason to stop the batch.
Company input normalization. People will hand you a bare subdomain, a full URL, a URL with or without the trailing path — resolving all of those to the same identifier cleanly is fiddly enough that it's easy to get wrong on the first pass.
Fingerprinting and pacing. A bare Python client doesn't look like a browser at the TLS layer, and firing requests across dozens of companies in a tight loop is exactly the pattern that starts drawing attention over time.
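The input-normalization point is the easiest one to show in code. Below is a minimal sketch, assuming the only resolvable form is {company}.jobs.personio.de. The function name and the choice to return None for anything else are illustrative, not the Actor's actual behavior.

```python
import re
from urllib.parse import urlsplit

_SUFFIX = ".jobs.personio.de"
_LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")


def normalize_company(raw):
    """Reduce a bare subdomain, host, or full URL to the company subdomain.

    Returns None when the input cannot be a Personio careers subdomain.
    """
    value = raw.strip().lower()
    if not value:
        return None
    # urlsplit only fills in the host when a scheme is present.
    if "://" not in value:
        value = "https://" + value
    try:
        host = urlsplit(value).hostname or ""
    except ValueError:
        return None  # malformed URL
    if host.endswith(_SUFFIX):
        host = host[: -len(_SUFFIX)]
    elif "." in host:
        # Some other domain (e.g. a custom careers site): not resolvable here.
        return None
    return host if _LABEL.match(host) else None
```

With this, "chrono24", "Chrono24.jobs.personio.de/" and "https://chrono24.jobs.personio.de/job/1234567" all resolve to "chrono24", while "careers.example.com" comes back as None instead of producing a request to a host that doesn't exist.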

The Actor

Personio Jobs Scraper resolves any of those input formats to the right subdomain automatically, walks the XML feed for every company you list, and isolates failures per company so one bad subdomain never derails the rest of the run. We rotate Chrome/Firefox TLS fingerprints via curl-cffi and retry 408/429/5xx with exponential backoff up to 5 attempts, honoring Retry-After when the target sends one.

import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/personio-jobs-scraper").call(run_input={
    "companies": ["chrono24", "urbansportsclub"],
    "maxJobsPerCompany": 100,
    "includeHtml": False,
    "proxyConfiguration": {"useApifyProxy": True},
})

# Iterate over the rows written to the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["job_title"], item["office"], item["seniority"])

Or paste the same JSON into the Apify Console input form for a quick one-off pull without writing any code.
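If you're rolling your own client instead, the retry policy described for the Actor (408/429/5xx, exponential backoff, up to 5 attempts, Retry-After honored) can be sketched as a small delay calculator. The base delay, the 30-second cap, and the jitter are my assumptions, not the Actor's real tuning. The sketch only handles the numeric form of Retry-After.

```python
import random

RETRYABLE = {408, 429} | set(range(500, 600))
MAX_ATTEMPTS = 5


def retry_delay(attempt, status_code, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait before the next attempt, or None to stop retrying.

    `attempt` is 1 for the first request that just failed.
    """
    if status_code not in RETRYABLE or attempt >= MAX_ATTEMPTS:
        return None
    # Honor a numeric Retry-After header when the server sends one.
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), cap)
        except (TypeError, ValueError):
            pass  # HTTP-date form: fall back to computed backoff
    # Exponential backoff with full jitter: up to 1s, 2s, 4s, 8s, capped.
    return random.uniform(0, min(cap, base * 2 ** (attempt - 1)))
```

The caller sleeps for the returned number of seconds and tries again. A None result means either the status isn't worth retrying (a 404, say) or the five attempts are used up.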

What you'd actually use this for

DACH-region recruiter pipelines — pull every open role across a portfolio of German, Austrian, and Swiss employers into one normalized feed, alongside our Teamtailor and Workday coverage.
Hiring-intent signal for BD/sales teams — watch which offices and departments a target account is scaling before it's public news.
Job-board aggregation: fold Personio postings into one feed alongside our Multi-ATS and SmartRecruiters scrapers.
Labor-market research — a structured, machine-readable sample of DACH-region postings for wage- or skills-demand studies.
Competitive intelligence — track a rival's open roles over time to infer which office or team is actually scaling.

Pricing, honestly

$0.005 per run as a flat warm-up fee, plus $0.0015 per posting written to the dataset. A 1,000-posting pull costs $1.505 total. A company with zero current openings costs nothing beyond the warm-up fee, so you don't pay for rows that never existed. Apify's $5 free trial credit for new accounts covers roughly 3,300 postings before you'd spend anything yourself.

Why this is more interesting than it looks

The part that took actual iteration wasn't the XML parsing itself — it was normalizing Personio's inconsistently-populated optional fields (subcompany, schedule, additional_offices) across companies that each configure Personio slightly differently, without inventing a value the source feed never actually provided. Getting a stable schema that holds across an arbitrary set of DACH employers, most of whom you'll never have tested against directly, is most of the real engineering here.
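Here is a rough sketch of that "never invent a value" rule using the standard-library XML parser. The sample document and its element names (position, id, name, office, subcompany, schedule, seniority) are assumptions modeled on the output row shown earlier, not a verified copy of Personio's feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names are assumptions for illustration.
SAMPLE = """<workzag-jobs>
  <position>
    <id>1234567</id>
    <name>Senior Backend Engineer</name>
    <office>Karlsruhe</office>
    <subcompany></subcompany>
    <seniority>Senior</seniority>
  </position>
</workzag-jobs>"""


def text_or_none(parent, tag):
    """Return stripped element text, or None when missing or empty."""
    node = parent.find(tag)
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def parse_positions(xml_text, company):
    rows = []
    for pos in ET.fromstring(xml_text).iter("position"):
        raw_id = text_or_none(pos, "id")
        if raw_id is None or not raw_id.isdigit():
            continue  # skip entries without a usable numeric id
        rows.append({
            "company": company,
            "position_id": int(raw_id),
            "job_title": text_or_none(pos, "name"),
            "office": text_or_none(pos, "office"),
            # Optional fields stay None when the feed omits or blanks them.
            "subcompany": text_or_none(pos, "subcompany"),
            "schedule": text_or_none(pos, "schedule"),
            "seniority": text_or_none(pos, "seniority"),
        })
    return rows
```

In the sample, subcompany is present but blank and schedule is absent entirely. Both end up as None in the row, so downstream code sees one stable shape however a given company configured its board.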

Honest limitations

This Actor covers Personio only; Greenhouse, Lever, Ashby, Workday, SmartRecruiters, Workable, Teamtailor, and BambooHR live in sibling Actors. It only resolves the bare {company}.jobs.personio.de subdomain form, so a company fronting Personio behind its own careers.example.com domain isn't resolvable this way. There's no expiry field either: Personio's feed only ever shows currently-open postings, so created_at tells you when a listing opened but nothing tells you when it will close. And it's a point-in-time snapshot of live postings, so schedule recurring runs if you want to track change over time.

Try it

Personio Jobs Scraper is live on the Apify Store at $1.50 per 1,000 results, with Apify's standard $5 free trial credit and no card required to start. The devil's in the data — and it's one of several ATS-coverage Actors we run under Devil Scrapes, alongside BambooHR, Ashby, Recruitee, and Breezy HR.

If you're tracking hiring across DACH and hitting a Personio board this doesn't handle right, tell me in the comments — that's exactly the feedback that ends up in next week's fix.

Ashby Jobs Scraper: Pull Every Posting From Any Ashby Job Board

Devil Scrapes — Sun, 26 Jul 2026 17:41:27 +0000

Ashby is where venture-backed hiring lives

If you're tracking hiring at startups and scale-ups, Ashby shows up constantly. It's become one of the default ATS choices for venture-backed companies, and its careers pages (jobs.ashbyhq.com/{orgSlug}) tend to publish more than most — real team names, workplace type (remote/hybrid/on-site), and often a compensation band, since a chunk of Ashby's customer base operates in markets with pay-transparency requirements.

That combination — startup hiring signal plus actual comp data — is exactly why I built a dedicated Ashby jobs scraper under the Devil Scrapes banner, separate from the Ashby coverage baked into our broader multi-ats-jobs-scraper. Ashby's board endpoint is different enough (its own GraphQL host, its own field set) to justify a purpose-built Actor rather than folding it into a generic multi-ATS tool.

What the data looks like

A single row from a real run:

{
  "posting_id": "7af121a1-d29a-4745-84c1-ef1b58a3b840",
  "title": "3P Silicon Architect",
  "org_slug": "openai",
  "team_name": null,
  "location_name": "San Francisco",
  "workplace_type": "Hybrid",
  "employment_type": "FullTime",
  "secondary_locations": ["Seattle"],
  "compensation_summary": "$342K – $555K • Offers Equity",
  "apply_url": "https://jobs.ashbyhq.com/openai/7af121a1-d29a-4745-84c1-ef1b58a3b840",
  "scraped_at": "2026-07-26T15:00:00Z"
}

compensation_summary is the field most scrapers skip because it requires trusting the org actually published one — we surface it exactly as given, null when it wasn't, never a guess.

The naive approach and why it stalls

Ashby's board data lives behind an internal GraphQL endpoint (jobs.ashbyhq.com/api/non-user-graphql?op=ApiJobBoardWithTeams) rather than a documented public API, which means the honest starting point for most people is opening devtools, watching the careers page load, and copying the request. That gets you a proof of concept in twenty minutes. It gets messier once you try to run it against more than one org reliably:

GraphQL isn't a stable public contract. There's no versioned docs page — you're reverse-engineering the operation name and payload shape from network traffic, and you need to handle the response's nested teams hierarchy correctly to get real department-style groupings instead of guessing.
Team names require a join, not a lookup. team_name isn't sitting flat on each posting — it's referenced by ID against a separate teams array in the same response, and getting that join wrong silently produces blank team fields.
A batch across many orgs needs resilience. One org having zero open roles, or a slug that doesn't exist at all, shouldn't take down the rest of a 20-org run.
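The team join in the second point looks roughly like this. The field names (teamId, parentTeamId, id, name) and the "Parent / Child" output format are assumptions for illustration. The real response shape has to be read off the network traffic, and the Actor may flatten the hierarchy differently.

```python
def build_team_index(teams):
    """Map team id -> team dict for constant-time joins."""
    return {team["id"]: team for team in teams}


def team_path(team_id, index):
    """Return 'Parent / Child' for a team id, or None when it is unknown."""
    names = []
    seen = set()
    while team_id is not None and team_id in index and team_id not in seen:
        seen.add(team_id)  # guard against accidental parent cycles
        team = index[team_id]
        names.append(team["name"])
        team_id = team.get("parentTeamId")
    return " / ".join(reversed(names)) or None


def attach_team_names(postings, teams):
    """Return copies of the postings with a joined team_name field."""
    index = build_team_index(teams)
    return [
        {**p, "team_name": team_path(p.get("teamId"), index)}
        for p in postings
    ]
```

A posting whose team ID is missing or doesn't appear in the teams array gets team_name set to None, not an empty string, which keeps "no team published" distinguishable from a join bug.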

The Actor

Ashby Jobs Scraper fetches an org's entire board in one GraphQL call — no pagination — joins the team hierarchy into a clean team_name field, and constructs a guaranteed-live apply_url without a second HTTP request. We rotate browser TLS fingerprints and retry transient 429/5xx responses with backoff so a multi-org batch finishes even when one org's board hiccups mid-run.

import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/ashby-jobs-scraper").call(run_input={
    "orgSlugs": ["openai", "ramp"],
    "maxResultsPerOrg": 100,
    "locationFilter": None,
    "proxyConfiguration": {"useApifyProxy": True},
})

# Iterate over the rows written to the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["team_name"], item["compensation_summary"])

Or drop the same JSON straight into the Apify Console input form for a one-off pull.

What you'd actually use this for

Recruiting & sourcing against startups — track what a target company is hiring for, on which team, at what comp band, all normalized.
Hiring-intent signal for GTM/SDR teams — a burst of open reqs at a funded startup is a leading indicator worth watching before a headcount announcement goes public.
Job-board aggregation — Ashby coverage slots next to our Workday, SmartRecruiters, and Multi-ATS scrapers.
Compensation benchmarking — a structured feed of published comp bands across companies that use Ashby, useful for comp-research and market-rate studies.
HR-tech pipelines — wire rows straight into a CRM, dashboard, or n8n/Make workflow on a schedule.

Pricing, honestly

$0.005 per run as a flat warm-up fee, plus $0.0015 per posting written. A 1,000-posting run costs $1.505 total. An org with zero open roles costs nothing beyond the warm-up fee. Apify's $5 free credit for new accounts covers roughly 3,300 postings before you spend anything yourself.

Why this is more interesting than it looks

The genuinely tricky part isn't the fetch — Ashby returns a full board in one response, which is a gift compared to paginated APIs. It's building the apply_url reliably without a second request. Ashby's schema doesn't hand you a ready-made canonical link on every posting; constructing one that's guaranteed to resolve, across arbitrary orgs you've never scraped before, without an extra round-trip per posting, is where the real work went.
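Going by the sample row earlier, the link follows a jobs.ashbyhq.com/{orgSlug}/{postingId} pattern. Here is a minimal sketch of building it with no extra request. The helper name is hypothetical, and the URL pattern is inferred from that one row, not taken from any Ashby documentation.

```python
from urllib.parse import quote


def build_apply_url(org_slug, posting_id):
    """Build a posting URL on jobs.ashbyhq.com from the slug and posting id."""
    if not org_slug or not posting_id:
        raise ValueError("org_slug and posting_id are both required")
    # Percent-encode both parts so an unusual slug cannot break the path.
    return "https://jobs.ashbyhq.com/{}/{}".format(
        quote(org_slug, safe=""), quote(posting_id, safe="")
    )
```

For the row shown earlier, build_apply_url("openai", "7af121a1-d29a-4745-84c1-ef1b58a3b840") reproduces its apply_url exactly.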

Honest limitations

This Actor covers Ashby only — Greenhouse, Lever, Workday, and SmartRecruiters live in sibling Actors. There's no first-class department field in Ashby's schema; team_name is the closest real grouping signal it exposes. There's no org-name enrichment either — org_slug is exactly the input slug you gave us, since the endpoint itself doesn't expose a display name. And it reflects the board's current, live state only — no historical or removed postings.

Try it

Ashby Jobs Scraper is live on the Apify Store at $1.50 per 1,000 results, with Apify's standard $5 free trial credit and no card needed to start. It's part of a growing ATS-coverage fleet under Devil Scrapes — BambooHR, Recruitee, Personio, and Breezy HR are all covered too.

What's the trickiest Ashby board you've tried to scrape? Drop it in the comments — genuinely curious what edge cases are out there.

Recruitee Jobs Scraper: Pull Every Posting From Any Recruitee Career Site

Devil Scrapes — Sun, 26 Jul 2026 17:40:51 +0000

Why Recruitee boards are worth scraping

Recruitee is a popular ATS for mid-market companies across Europe, and every employer running their careers page on it — {company}.recruitee.com — is quietly serving the entire job board from one public JSON endpoint. That's genuinely useful: unlike ATS platforms that paginate or gate descriptions behind a second call, Recruitee's list endpoint already inlines the full HTML description, salary fields, and location data in a single response.

The catch, as with every ATS scraping job, is doing this reliably across more than one or two companies. I run a small fleet of ATS-specific Actors under Devil Scrapes, and Recruitee was a clean gap: solid data density, salary fields when published, and no dedicated coverage on the Apify Store worth the price it charged.

What the data looks like

Here's one real row from the dataset:

{
  "job_id": 2680730,
  "guid": "dl9ua",
  "slug": "data-architect",
  "title": "Data Architect",
  "company_id": "auditdata",
  "company_name": "Auditdata",
  "department": "R&D",
  "category_code": "information_technology",
  "employment_type_code": "fulltime_permanent",
  "experience_code": "experienced",
  "education_code": "bachelor_degree",
  "city": "remote",
  "country": "Poland",
  "country_code": "PL",
  "remote": true,
  "hybrid": false,
  "on_site": false,
  "salary_min": null,
  "salary_max": null,
  "salary_period": null,
  "salary_currency": null,
  "description_html": "<h4>...</h4><p>... HTML string ...</p>",
  "careers_url": "https://auditdata.recruitee.com/o/data-architect",
  "careers_apply_url": "https://auditdata.recruitee.com/o/data-architect/c/new",
  "published_at": "2026-07-21T07:48:42+00:00",
  "updated_at": "2026-07-21T07:48:42+00:00",
  "scraped_at": "2026-07-26T15:00:00+00:00"
}

Salary fields are null here because this employer didn't publish a band — when they do, salary_min/salary_max come back as clean floats instead of Recruitee's raw numeric strings.

The naive approach and why it gets messy fast

Recruitee's list endpoint is genuinely convenient — one request per company gets you everything, no second call for descriptions. That said, going from "I found the endpoint in devtools" to "I have a reliable pipeline" still runs into real friction:

Recruitee's own schema is inconsistently populated. salary sometimes arrives as a nested object with null fields, department is sometimes missing entirely — you need to normalize these into stable nullable fields rather than crashing on a KeyError the third time you hit a company that omitted a field.
A 404 on a bad company ID isn't a real failure. If you type a company subdomain wrong or an employer's board is currently empty, that's zero postings — not a crash. Getting that distinction right across a large batch matters more than it sounds.
Fingerprinting. A bare Python HTTP client doesn't replicate the TLS handshake a real browser sends, and once you're firing requests across many companies in a loop, that starts to matter.

The Actor

Recruitee Jobs Scraper does the one-call-per-company fetch for you, handles Recruitee's inconsistent nested objects, and treats a per-company zero-result response as a normal outcome rather than a run-ending error. We rotate browser TLS fingerprints on every request and retry transient failures with backoff, so a batch of fifty companies doesn't die because one hiccuped.

Python SDK example:

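import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/recruitee-jobs-scraper").call(run_input={
    "companyIds": ["auditdata"],
    "maxResultsPerCompany": 50,
    "includeDescription": True,
    "proxyConfiguration": {"useApifyProxy": True},
})

# Iterate over the rows written to the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["city"], item["salary_min"])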

Or paste the same JSON straight into the Apify Console input form if you just want a quick pull without writing a script.

What you'd actually use this for

Recruiting & talent intelligence — track what a target Recruitee-hosted employer is hiring for, by department and location.
Hiring-intent signal for BD/SDR teams — open reqs, especially a sudden cluster in one department, are a decent proxy for growth before it's public news.
Job-board aggregation — Recruitee coverage slots next to our Ashby, Workday, SmartRecruiters, and Multi-ATS scrapers into one hiring-intel pipeline.
Compensation research — where employers publish salary bands, this is one of the few ATS platforms that gives you clean numeric fields for it instead of freeform text.
ATS data pipelines — wire structured rows straight into a CRM, dashboard, or n8n/Make workflow on a schedule.

Pricing, honestly

$0.005 per run as a flat warm-up fee, plus $0.0015 per posting written to your dataset. A 1,000-posting pull runs $1.505 total. Nothing is charged for a company that returns zero postings beyond the warm-up. Apify's $5 free credit for new accounts covers roughly 3,300 postings before you'd spend anything out of pocket.

The interesting part

The fiddly engineering here isn't the HTTP call, it's the schema cleanup. Recruitee's API returns null-riddled nested objects inconsistently — a salary object with all-null children, a department that's sometimes a string and sometimes absent entirely. Turning that into stable, always-present nullable fields (rather than crashing on the third company that structures things slightly differently) is most of what makes this reliable across an arbitrary batch of employers you've never scraped before.
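As a rough illustration of that cleanup, here is a sketch that flattens one raw offer into always-present nullable fields. The raw key names (salary, min, max, period, currency, department) are assumptions about the upstream payload, and the helper names are mine.

```python
def to_float(value):
    """Convert a number or numeric string to float; anything else is None."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def normalize_offer(raw):
    """Flatten one raw offer dict into always-present nullable fields."""
    salary = raw.get("salary")
    if not isinstance(salary, dict):
        salary = {}  # missing, null, or malformed salary all behave alike
    department = raw.get("department")
    if not isinstance(department, str) or not department.strip():
        department = None
    return {
        "title": raw.get("title"),
        "department": department,
        "salary_min": to_float(salary.get("min")),
        "salary_max": to_float(salary.get("max")),
        "salary_period": salary.get("period") or None,
        "salary_currency": salary.get("currency") or None,
    }
```

Every output row has the same six keys whether the company sent a full salary object, one with all-null children, or none at all, so nothing downstream needs a KeyError guard.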

Honest limitations

This only covers publicly published boards, not authenticated internal views. You supply company IDs (the subdomain from the careers URL), not display names, because "Auditdata" the brand and auditdata the subdomain aren't guaranteed to match. And the Actor doesn't paginate: every company we've sampled returned its full board in one response, so an unusually large board is taken as-is, with whatever that single response contains.

Try it

Recruitee Jobs Scraper is live on the Apify Store at $1.50 per 1,000 results, with Apify's standard $5 free trial credit and no card required to start. It's one Actor in a growing ATS-coverage fleet under Devil Scrapes — BambooHR, Ashby, Personio, and Breezy HR are covered too.

Got a Recruitee board that behaves differently from what's documented here? Drop it in the comments — that's exactly the kind of edge case we fix fast.

BambooHR Jobs Scraper: Pull Every Open Role From Any BambooHR Careers Site

Devil Scrapes — Sun, 26 Jul 2026 17:40:16 +0000

The hiring-signal hiding in plain sight

BambooHR runs the careers page for a huge slice of the SMB and mid-market world — everything from regional healthcare networks to ballet companies. If you've ever needed to answer "who is this company hiring right now, and for what," BambooHR's public {company}.bamboohr.com/careers page is usually where the answer lives. The catch is that there's no dashboard for pulling that data across dozens of companies at once, no bulk export, and no official API key you can just request.

I build small, focused scrapers for a living (under the Devil Scrapes banner), and BambooHR kept coming up as a gap: recruiters, SDRs, and labor-market researchers all wanted "every open req at this employer" without hand-copying a careers page. So I built a BambooHR jobs scraper that talks to BambooHR's own careers JSON endpoints directly and normalizes the output into one schema across every tenant.

What the data looks like

Here's a single row from a real run — this is what lands in your dataset per posting:

{
  "company": "adobe",
  "job_id": 15,
  "job_title": "IT Security Engineer",
  "job_url": "https://adobe.bamboohr.com/careers/15",
  "department": "IT",
  "employment_type": "Full-Time",
  "job_status": "Open",
  "location_city": "Mayfair",
  "location_state": "London, City of",
  "location_country": "United Kingdom",
  "is_remote": null,
  "date_posted": "2025-11-29",
  "description_html": "<p><strong>About Us</strong></p>...",
  "description_text": "About Us Our mission is simple...",
  "scraped_at": "2026-07-26T12:00:00Z"
}

Note that is_remote is tri-state (true, false, or null) rather than a forced true/false: when the employer never said, we leave it null instead of guessing "on-site." That distinction matters if you're building any kind of remote-hiring filter downstream.
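On the consuming side, the practical consequence is that a plain truthiness check lumps "unknown" in with "not remote." Here is a small sketch of a filter that keeps the three states apart. The function and bucket names are illustrative.

```python
def split_by_remote(rows):
    """Bucket rows into remote, not remote, and unknown (is_remote is None)."""
    buckets = {"remote": [], "not_remote": [], "unknown": []}
    for row in rows:
        flag = row.get("is_remote")
        # Compare with `is` so None never falls into the False bucket.
        if flag is True:
            buckets["remote"].append(row)
        elif flag is False:
            buckets["not_remote"].append(row)
        else:
            buckets["unknown"].append(row)
    return buckets
```

Whether the "unknown" bucket is shown, hidden, or flagged for review is then an explicit product decision, not a side effect of how the filter was written.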

The naive approach, and where it falls apart

If you open devtools on any BambooHR careers page, you'll find the /careers/list endpoint returning JSON almost immediately — it looks trivial at first glance. The friction shows up once you try to do this at any real scale:

Company-by-company inconsistency. Some subdomains have zero open jobs, some 404 because the company deactivated its careers page, and some sit behind a custom-domain front end and won't answer on the subdomain you tried. Your script needs to treat all three as normal outcomes, not exceptions that kill the whole batch.
Full descriptions are a second endpoint. The list feed gives you titles and locations; the actual HTML job description body lives behind a separate /careers/{id}/detail call per posting, which means orchestrating N+1 requests cleanly.
Client fingerprinting. A bare requests.get() from a script looks nothing like a browser at the TLS/HTTP2 layer, and once you're doing this against dozens of company subdomains in a loop, that difference starts mattering.

None of this is exotic, but it's exactly the kind of "boring at small scale, annoying at real scale" work that eats an afternoon you didn't plan to spend.
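For a feel of the N+1 orchestration and the per-company isolation, here is a minimal sketch. It uses the /careers/list and /careers/{id}/detail paths mentioned above, but the response keys ("result", "id", "description") are assumptions, and fetch_json is a hypothetical callable you inject, which also makes the logic testable without touching the network.

```python
def scrape_company(company, fetch_json, max_jobs=50):
    """Fetch the list feed, then one detail call per posting.

    `fetch_json(url)` is supplied by the caller and should return parsed
    JSON or raise on failure.
    """
    base = f"https://{company}.bamboohr.com/careers"
    listing = fetch_json(f"{base}/list")
    rows = []
    for job in listing.get("result", [])[:max_jobs]:
        row = {"company": company, "job_id": job["id"], "description_html": None}
        try:
            detail = fetch_json(f"{base}/{job['id']}/detail")
            row["description_html"] = detail.get("description")
        except Exception:
            pass  # keep the list-level row even if the detail call fails
        rows.append(row)
    return rows


def scrape_batch(companies, fetch_json):
    """Run every company independently so one failure never stops the batch."""
    results, failures = [], {}
    for company in companies:
        try:
            results.extend(scrape_company(company, fetch_json))
        except Exception as exc:
            failures[company] = repr(exc)
    return results, failures
```

A failed detail call degrades one row to a null description, and a failed list call costs you one company. Neither takes down the rest of the batch. The failures dict tells you afterwards which subdomains to look at.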

The Actor

This is where BambooHR Jobs Scraper comes in. We rotate Chrome/Firefox TLS fingerprints via curl-cffi on every request, retry 408/429/5xx with exponential backoff up to 5 attempts, and isolate failures per company so one bad subdomain never sinks the rest of your batch. You get back clean, Pydantic-validated rows — no half-parsed HTML, no silently-empty datasets.

Run it via the Apify Python SDK:

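import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/bamboohr-jobs-scraper").call(run_input={
    "companies": ["adobe", "nycballet"],
    "maxJobsPerCompany": 50,
    "fetchFullDescription": True,
    "proxyConfiguration": {"useApifyProxy": True},
})

# Iterate over the rows written to the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["job_title"], item["location_city"])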

Or fire it directly from the Apify Console with the same JSON as input — no code required if you just want a one-off pull.

What you'd actually use this for

Recruiting pipelines — build sourcing lists against every BambooHR-hosted employer in a target industry, normalized into one schema.
Hiring-intent signal for sales/BD — a sudden burst of open reqs at a target account is a decent proxy for headcount growth or a new-market push, before it shows up anywhere else.
Job-board aggregation — BambooHR coverage slots next to our Teamtailor, Workday, SmartRecruiters, Workable, and Multi-ATS scrapers to build one hiring-intel feed across the whole ATS landscape.
Labor-market research — sample hiring demand by industry, region, or company size band with a structured, machine-readable dataset.
Competitive intelligence — watch a specific competitor's open reqs over time to infer where they're actually growing.

Pricing, honestly

The Actor charges $0.005 per run (a flat warm-up fee) plus $0.0015 per job posting written to your dataset. A 1,000-posting pull costs $1.505 total. A company with zero current openings costs nothing beyond the warm-up fee, so you're not billed for rows that never existed. Apify gives every new account $5 of free credit, no card required, which covers roughly 3,300 postings before you'd spend a cent of your own money.

Why this is more interesting than it looks

The genuinely fiddly part wasn't the HTTP calls — it was normalizing BambooHR's own inconsistencies across tenants. Some companies fill in department, some leave it null; some populate location_state with a full region name, others with an abbreviation; is_remote is sometimes explicit and usually isn't. Building one stable schema that holds up across hundreds of independently-configured BambooHR instances, without silently inventing values the source data doesn't actually contain, is most of the real engineering here.

Honest limitations

This Actor only resolves {subdomain}.bamboohr.com addresses — if a company fronts its BambooHR careers page with a custom domain, you'll need the underlying BambooHR subdomain instead. There's no compensation field either, because BambooHR's public feed doesn't expose structured salary data on any company we've checked — we don't invent one. And it's a point-in-time snapshot: schedule recurring runs if you want to track how a board changes over time.

Try it

BambooHR Jobs Scraper is live on the Apify Store now, priced at $1.50 per 1,000 results with Apify's standard $5 free trial credit (no credit card needed to start). It's one of several ATS-coverage Actors we run under Devil Scrapes — if you need Recruitee, Ashby, Personio, or Breezy HR coverage too, those are live as well.

What ATS is your target company running that isn't covered yet? Drop it in the comments — that's usually how the next Actor in this fleet gets picked.

Multi-ATS Jobs Scraper: One Normalized Feed for Greenhouse, Lever & Ashby

Devil Scrapes — Sat, 25 Jul 2026 09:36:16 +0000

Greenhouse, Lever, and Ashby together power the hiring pipelines of a huge share of tech companies — but they're three different APIs with three different field names, and you don't always know upfront which one a given company runs on. If you're building a hiring-intel dataset, a job aggregator, or a "who's hiring" signal for sales, that ambiguity is the actual bottleneck, not the scraping itself. This post covers how the Multi-ATS Jobs Scraper collapses all three into one schema.

Why cross-ATS job data is worth pulling

Each platform publishes structured, public postings data — no login needed to see them, since they're built for candidates to browse:

Greenhouse: boards-api.greenhouse.io/v1/boards/{slug}/jobs
Lever: api.lever.co/v0/postings/{slug}?mode=json
Ashby: api.ashbyhq.com/posting-api/job-board/{slug}

That's useful for:

HR-tech builders assembling cross-company hiring datasets without hand-picking which API to call per employer
SDR / sales-intel teams using "who is hiring for X" as a buying-intent signal
Job-aggregator and ATS-analytics platforms that need one schema instead of three

The naive approach, and where it falls apart

Say you want Stripe, Palantir, and Ramp's open roles. Stripe runs Greenhouse, Palantir runs Lever, Ramp runs Ashby — and nothing about a bare company name tells you that upfront. You'd need to either maintain a lookup table of company-to-ATS mappings (which goes stale) or probe each API in turn and see what returns a 200. Then, even once you know which platform, the field names don't line up: Greenhouse has no team field distinct from department; Lever and Ashby do. Only Ashby exposes structured salary. Only Greenhouse reliably reports updated_at. Writing three separate parsers and reconciling them into one row shape is the actual work — not any single API call.

What we built instead

The Multi-ATS Jobs Scraper takes a bare company slug ("stripe"), a full careers URL ("https://jobs.lever.co/palantir"), or an explicit {atsType, companySlug} object if you already know the platform — and auto-detects which of the three it's dealing with when you don't tell it. Every row comes back in one consistent schema regardless of source platform, with curl-cffi fingerprint impersonation, retry-with-backoff on 429/5xx, and Apify Proxy session rotation absorbing the parts that make a large multi-company sync fragile.
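
To make the three input forms concrete, here is a rough sketch of how they could be reduced to a platform and a slug before any request is made. It is an illustration, not the Actor's source: `resolve_company` and the `HOSTS` table are hypothetical names, the Lever and Ashby hosts are taken from the URLs in this post, and the Greenhouse host is an assumption.

```python
from urllib.parse import urlparse

# Careers-page hosts mapped to platform names. The Lever and Ashby hosts
# appear in this post; the Greenhouse host is an assumption.
HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
    "jobs.ashbyhq.com": "ashby",
}


def resolve_company(entry):
    """Return (ats_type, slug); ats_type is None when detection is still needed."""
    if isinstance(entry, dict):
        # Explicit form: the caller already knows the platform.
        return entry["atsType"], entry["companySlug"]
    if "://" not in entry and "/" not in entry:
        # Bare slug: the platform is unknown and has to be probed.
        return None, entry
    parsed = urlparse(entry if "://" in entry else "https://" + entry)
    ats = HOSTS.get(parsed.netloc.lower())
    parts = [p for p in parsed.path.split("/") if p]
    if ats is None or not parts:
        raise ValueError(f"unrecognised careers URL: {entry!r}")
    return ats, parts[0]


print(resolve_company("stripe"))
print(resolve_company("https://jobs.lever.co/palantir"))
print(resolve_company({"atsType": "greenhouse", "companySlug": "stripe"}))
```

Only the bare-slug case returns `None` for the platform, which is the one case where probing each API is unavoidable.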

Run it via the Apify Python SDK:

from apify_client import ApifyClient

client = ApifyClient("APIFY_TOKEN")
run = client.actor("DevilScrapes/multi-ats-jobs-scraper").call(run_input={
    "companies": [
        {"atsType": "greenhouse", "companySlug": "stripe"},
        "https://jobs.lever.co/palantir",
        "https://jobs.ashbyhq.com/ramp",
    ],
    "maxItemsPerCompany": 5,
    "includeDescription": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["company"], item["ats"], item["title"], item["salary_min"])

Or paste the same array into the Store page input form directly.

Sample output row:

{
  "company": "ramp",
  "ats": "ashby",
  "job_id": "3b1c2e9a-0000-0000-0000-000000000000",
  "title": "Senior Software Engineer, Payments",
  "department": "Engineering",
  "team": "Payments",
  "location": "New York City",
  "remote": false,
  "salary_min": 211400,
  "salary_max": 290600,
  "salary_currency": "USD",
  "url": "https://jobs.ashbyhq.com/ramp/3b1c2e9a-0000-0000-0000-000000000000",
  "posted_at": "2026-05-02T18:11:00.000Z",
  "updated_at": null,
  "description_html": "<p>About the role...</p>"
}

The technically interesting bit

Auto-detection sounds like a simple try-each-API-in-turn loop, and mostly it is — but the edge case is a slug that happens to resolve on more than one platform. It's rare (we haven't hit it in practice), but when it happens we keep the platform with more open postings and log a warning rather than silently picking one arbitrarily. If you need a guarantee rather than a best-effort tie-break, the explicit {atsType, companySlug} form skips detection entirely and runs slightly faster. We'd rather document that edge case than pretend detection is infallible.
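
The tie-break itself fits in a few lines. This is a sketch of the rule as described above, not the Actor's code: `pick_platform` is a hypothetical name, and breaking an exact tie in counts alphabetically is our assumption for the sake of a deterministic example.

```python
import logging

logger = logging.getLogger("ats_detect")


def pick_platform(slug, counts):
    """Choose a platform from {platform: open_posting_count}.

    `counts` holds only the platforms on which the slug resolved.
    """
    if not counts:
        return None
    # Highest count first; equal counts fall back to name order (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    if len(ranked) > 1:
        logger.warning(
            "slug %r resolved on %s; keeping %s", slug, sorted(counts), winner
        )
    return winner


print(pick_platform("acme", {"lever": 3, "ashby": 12}))
```

Passing the explicit {atsType, companySlug} form means this function is never reached at all.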

Pricing — no fine print

Pay-Per-Event: $0.005 per run plus $0.0015 per job posting written to the dataset — $1.50 / 1,000 results. A run collecting 500 postings across three companies costs roughly $0.75 in result charges plus the $0.005 start fee. Empty runs cost only the start fee. New Apify accounts get $5 of free credit, no card required.

Honest limitations

Salary is Ashby-only — Greenhouse and Lever don't publish a structured comp field, and we won't regex-guess one out of free-text descriptions, so salary_min/salary_max/salary_currency are honestly null for those platforms. team is Lever/Ashby only since Greenhouse doesn't split team from department. And this only reaches public, unauthenticated boards — internal/private listings need employer credentials this Actor doesn't support.

Try it, and the rest of the fleet

Greenhouse, Lever, and Ashby are three platforms in a bigger landscape. We also run the Workday Jobs Scraper, the SmartRecruiters Jobs Scraper, and the Workable Jobs Scraper — same keyless, pay-per-result approach, covering the rest of the major ATS market.

Grab the Multi-ATS Jobs Scraper on the Apify Store and try it with the free $5 credit. Found a company slug that returns wrong results, or hit an ATS quirk we don't handle? Open a support ticket or leave a review — we read every one. Browse the rest of the fleet at apify.com/DevilScrapes.

Which three companies would you point this at first?

Workable Jobs Scraper: Pull Every Posting From Any Workable Careers Board

Devil Scrapes — Sat, 25 Jul 2026 09:35:41 +0000

Workable is one of the most common ATS choices for startups and mid-size companies — if you've ever seen a careers page at apply.workable.com/{subdomain} or {subdomain}.workable.com, that's it. Thousands of employers run on it, and every one of those boards is served by the same public widget API. This post covers what that data looks like, why hand-rolling a scraper against it gets annoying past one company, and how the Workable Jobs Scraper handles it for you.

Why Workable job data is worth pulling

Workable exposes open postings — title, department, location, remote flag, and description — through a public widget endpoint that any careers page already calls client-side. No login, no API key. That makes it a solid signal for:

Recruiters and sourcers pulling every open req from a target employer's board in one pass
Job-board aggregator operators who want Workable coverage alongside Workday, SmartRecruiters, Greenhouse, Lever, and Ashby
SDR / BD teams using active hiring as a buying-intent signal — a company scaling its engineering team is a company that might need your product
HR-tech pipeline builders wiring structured rows into a CRM, dashboard, or n8n/Make workflow on a schedule

The naive approach, and where it gets annoying

apply.workable.com/api/v1/widget/accounts/{subdomain} returns a full board in one call, which sounds like the easy path — and for one company, it basically is. The friction shows up in the details: Workable accepts both {subdomain}.workable.com and apply.workable.com/{subdomain} as valid public URL forms and you have to normalize both to the same subdomain before you can hit the widget endpoint. The server-side facet filters that look like they should narrow results (department, location) don't reliably work against this endpoint, so filtering has to happen after the fetch, not before. And subdomains that don't exist need to fail gracefully rather than take down a whole multi-company batch. Multiply that across dozens of subdomains and it's a script you're now maintaining, not writing once.
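
As an illustration of the normalization step, here is a minimal sketch that reduces either URL form (or a bare subdomain) to the account subdomain and builds the widget URL quoted above. `workable_subdomain` and `widget_url` are hypothetical helper names, and the sketch assumes the first path segment on apply.workable.com is the account.

```python
from urllib.parse import urlparse


def workable_subdomain(entry: str) -> str:
    """Reduce any accepted input form to the bare account subdomain."""
    text = entry.strip()
    if "." not in text and "/" not in text:
        return text  # already a bare subdomain
    parsed = urlparse(text if "://" in text else "https://" + text)
    host = parsed.netloc.lower()
    if host == "apply.workable.com":
        # Assumption: the first path segment names the account.
        parts = [p for p in parsed.path.split("/") if p]
        if not parts:
            raise ValueError(f"no account in {entry!r}")
        return parts[0]
    if host.endswith(".workable.com"):
        return host[: -len(".workable.com")]
    raise ValueError(f"not a Workable URL: {entry!r}")


def widget_url(subdomain: str) -> str:
    """Widget endpoint for one account, as quoted in this post."""
    return f"https://apply.workable.com/api/v1/widget/accounts/{subdomain}"


for form in ("remotebase", "remotebase.workable.com", "https://apply.workable.com/remotebase/"):
    print(widget_url(workable_subdomain(form)))
```

Inputs that are not Workable URLs raise instead of returning a guess, so one bad entry can be skipped without poisoning the rest of a batch.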

What we built instead

The Workable Jobs Scraper hits the same public widget API and absorbs the annoying parts: curl-cffi browser-fingerprint impersonation on every request, retries with backoff on transient 429/5xx, and normalization across whichever mix of subdomains you feed it. It also documents the client-side-filter constraint honestly instead of pretending server-side filters work — searchQuery, department, and location all run after the single fetch per company, at no extra request cost, so you're never billed twice for the same board.

Run it via the Apify Python SDK:

from apify_client import ApifyClient

client = ApifyClient("APIFY_TOKEN")
run = client.actor("DevilScrapes/workable-jobs-scraper").call(run_input={
    "companies": ["remotebase"],
    "maxResultsPerCompany": 25,
    "includeDescription": True,
    "proxyConfiguration": {"useApifyProxy": True},
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["location_country"], item["is_remote"])

Or paste the same JSON into the Store page input form directly.

Sample output row:

{
  "job_id": "6E7795E82F",
  "title": "AI Engineer",
  "company": "remotebase",
  "company_name": "Remotebase",
  "department": "Core",
  "location_city": null,
  "location_country": "Pakistan",
  "is_remote": true,
  "url": "https://apply.workable.com/j/6E7795E82F",
  "posted_date": "2026-04-01",
  "description_html": null,
  "scraped_at": "2026-07-23T15:00:00Z"
}

The technically interesting bit

Workable accepts a bare subdomain, apply.workable.com/{subdomain}, or {subdomain}.workable.com as equally valid public entry points, but they don't all resolve identically against the widget API — you have to extract the actual account subdomain regardless of which form someone pastes in. And the facet filters that look like a free server-side narrowing option (department=, location=) are a trap: they don't reliably filter this specific endpoint's results, so a scraper that trusts them silently returns incomplete data. We treat that as a documented constraint, not a bug to route around with retries — filtering happens client-side, after the one fetch, so you always get the full board first.
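
Client-side filtering after a single fetch can be sketched like this. The field names follow the sample output row above; the matching rules (case-insensitive substring, with location checked against city and country) are our assumption for illustration and may differ from what the Actor does.

```python
def matches(row, search_query=None, department=None, location=None):
    """Case-insensitive substring filters; a filter left as None is skipped."""

    def contains(value, needle):
        if needle is None:
            return True
        return value is not None and needle.lower() in value.lower()

    # Null city or country values are simply left out of the searchable text.
    location_text = " ".join(
        v for v in (row.get("location_city"), row.get("location_country")) if v
    )
    return (
        contains(row.get("title"), search_query)
        and contains(row.get("department"), department)
        and contains(location_text, location)
    )


def filter_board(rows, **filters):
    """Apply the filters to a board that has already been fetched in full."""
    return [row for row in rows if matches(row, **filters)]


board = [
    {"title": "AI Engineer", "department": "Core",
     "location_city": None, "location_country": "Pakistan"},
]
print(filter_board(board, search_query="engineer", location="pakistan"))
```

Because the filter runs over rows already in memory, tightening or loosening it never triggers another request.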

Pricing — no fine print

Pay-Per-Event: $0.005 per run plus $0.0015 per job posting written to the dataset — $1.50 / 1,000 results. No data lands, no charge beyond the warm-up fee. New Apify accounts get $5 of free credit, no card required.

Honest limitations

A posting listed at multiple locations is emitted as a single row carrying only its first location. Only currently live postings are returned; there's no historical archive of removed postings. And as noted above, filtering runs client-side after the fetch, since Workable's own facet params don't reliably narrow this endpoint.

Try it, and the rest of the fleet

Workable is one piece of the ATS puzzle. We also run the Workday Jobs Scraper, the SmartRecruiters Jobs Scraper, and the Multi-ATS Jobs Scraper covering Greenhouse, Lever, and Ashby in one normalized schema — same keyless, pay-per-result model throughout.

Grab the Workable Jobs Scraper on the Apify Store and try it with the free $5 credit. Hit a board that behaves differently, or need a field we don't expose? Reach out via DevilScrapes on Apify — we ship fixes fast. Browse the rest of the fleet there too.

Which Workable board would you pull first — and what would you do with the "actively hiring" signal?

SmartRecruiters Jobs Scraper: Pull Any Employer's Postings From One Keyless API

Devil Scrapes — Sat, 25 Jul 2026 09:34:30 +0000

Thousands of mid-size and enterprise employers run their careers page on SmartRecruiters — Visa, Bosch, and a long tail of companies you'd recognize. If you're building recruiting pipelines, a job-board aggregator, or a competitive hiring-intel feed, SmartRecruiters coverage is table stakes. This post walks through what the data looks like, why the naive scraping approach breaks down at scale, and how the SmartRecruiters Jobs Scraper gets you clean rows instead.

Why SmartRecruiters data is worth pulling

Every SmartRecruiters careers board (jobs.smartrecruiters.com/{companyId}) exposes structured posting data — title, location, department, function, employment type, and a canonical apply URL — through the same public JSON API the page itself calls. That's useful for:

Recruiters and sourcers tracking open reqs at target employers by function and location
Job-board aggregator operators who need SmartRecruiters alongside Workday, Greenhouse, Lever, and Ashby
Competitive-intel analysts inferring a rival's growth areas from where they're actively hiring
ATS/data-pipeline builders wiring structured job rows into a CRM or workflow tool

The naive approach, and where it falls apart

api.smartrecruiters.com/v1/companies/{companyId}/postings is a real, documented-ish public endpoint, and a single curl against it will return JSON. The friction shows up once you're doing this for more than one company: you need to paginate correctly, apply the server-side filters (q, department, city, country) without silently dropping rows, handle the detail endpoint for full descriptions without doubling your request count unnecessarily, and survive transient 429s across a batch of dozens of employers without the whole run dying on employer #12. None of that is hard in isolation — it's the compounding maintenance burden that gets expensive.
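
"Paginate correctly" mostly means not stopping early and not requesting pages you don't need. Here is a minimal sketch of an offset/limit loop. `fetch_page` is a hypothetical callable you would supply (for example, a wrapper around your HTTP client), and the response shape with `totalFound` and `content` keys is an assumption about the postings payload, so check it against a real response first.

```python
def iter_postings(fetch_page, page_size=100, max_results=None):
    """Yield postings from an offset/limit API until the board is exhausted.

    `fetch_page(offset, limit)` is a hypothetical callable returning the decoded
    JSON for one page, assumed to look like {"totalFound": int, "content": [...]}.
    """
    offset = 0
    yielded = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("content", [])
        for item in items:
            if max_results is not None and yielded >= max_results:
                return
            yield item
            yielded += 1
        if max_results is not None and yielded >= max_results:
            return  # cap reached: do not request another page
        offset += len(items)
        total = page.get("totalFound")
        # Stop on an empty page, or once every reported row has been seen.
        if not items or (total is not None and offset >= total):
            return


# Usage with an in-memory stand-in for the API:
fake_board = [{"id": i} for i in range(250)]


def fake_fetch(offset, limit):
    return {"totalFound": len(fake_board), "content": fake_board[offset:offset + limit]}


print(len(list(iter_postings(fake_fetch))))
```

Advancing the offset by the number of rows actually returned, rather than by the page size, keeps the loop correct even when the server hands back short pages.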

What we built instead

The SmartRecruiters Jobs Scraper hits that same public postings API directly and absorbs the parts that make multi-employer batches fragile: curl-cffi browser-fingerprint impersonation so requests present a real handshake, retries with exponential backoff on 429/5xx, and normalized output regardless of how many company IDs you pass in one run. Server-side filters (searchQuery, department, city, country) narrow the result set before you're billed for a row.

Run it via the Apify Python SDK:

from apify_client import ApifyClient

client = ApifyClient("APIFY_TOKEN")
run = client.actor("DevilScrapes/smartrecruiters-jobs-scraper").call(run_input={
    "companyIds": ["Visa"],
    "maxResultsPerCompany": 25,
    "includeDescription": True,
    "proxyConfiguration": {"useApifyProxy": True},
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["location_city"], item["department"])

Or drop the same JSON straight into the input form on the Store page — no code needed.

Sample output row:

{
  "posting_id": "743999...abc",
  "title": "Sr. Manager, Data Platform",
  "company_id": "Visa",
  "company_name": "Visa",
  "location_city": "Austin",
  "location_region": "Texas",
  "location_country": "US",
  "is_remote": false,
  "department": "Technology",
  "employment_type": "Full-time",
  "apply_url": "https://jobs.smartrecruiters.com/Visa/743999...abc",
  "description_html": "Visa is looking for a Sr. Manager...",
  "scraped_at": "2026-07-21T14:00:00Z"
}

The technically interesting bit

SmartRecruiters' companyId isn't always the display name you'd guess — it's a slug pulled straight from the careers URL, and it's case-sensitive with its own quirks per tenant. Most "generic" scrapers treat this as a solved problem and just string-match against a display name, which silently returns zero results for a lot of real companies. We take the company ID as-is from the URL structure the site itself uses, so jobs.smartrecruiters.com/Visa maps directly to companyIds: ["Visa"] without any guessing layer that can quietly fail.
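
Pulling the ID out of the careers URL is a one-liner in spirit; the only thing to get right is leaving the case alone. A small sketch, with `company_id_from_url` as a hypothetical helper name:

```python
from urllib.parse import urlparse


def company_id_from_url(url: str) -> str:
    """Return the first path segment of a careers URL, with case preserved."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.netloc.lower() != "jobs.smartrecruiters.com":
        raise ValueError(f"not a SmartRecruiters careers URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        raise ValueError(f"no company ID in {url!r}")
    return parts[0]  # deliberately no .lower(): the ID is case-sensitive


print(company_id_from_url("jobs.smartrecruiters.com/Visa"))
```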

Pricing — no fine print

Pay-Per-Event: $0.005 per run plus $0.0015 per job posting written to the dataset. A 1,000-posting pull costs $1.51. No matching postings, no charge beyond the small warm-up fee. New Apify accounts get $5 of free credit, no card required.

Honest limitations

We only reach the public, unauthenticated postings boards — not internal employee-only views. includeDescription adds one detail request per posting, so leave it off if you just need metadata fast. And you supply the SmartRecruiters companyId, not a freeform company name — a display name like "Bosch" can differ from its actual slug, so pull the ID straight from the careers URL.

Try it, and the rest of the fleet

SmartRecruiters is one platform in a bigger landscape. We also run the Workday Jobs Scraper, the Workable Jobs Scraper, and the Multi-ATS Jobs Scraper for Greenhouse, Lever, and Ashby in one normalized schema — same keyless, pay-per-result approach across the board.

Grab the SmartRecruiters Jobs Scraper on the Apify Store and try it with the free $5 credit. Found a company that behaves differently, or need a field we don't expose? Open an issue on the Actor's Issues tab — we ship fixes weekly. Browse the rest of the fleet at apify.com/DevilScrapes.

Which employer's SmartRecruiters board would you pull first?

Workday Jobs Scraper: How to Pull Every Posting From Any Workday Career Site

Devil Scrapes — Sat, 25 Jul 2026 09:34:25 +0000

If you've ever tried to build a recruiting pipeline, a labor-market dataset, or a "who's hiring" competitive-intel feed, you've hit Workday. It powers the career sites of a huge share of the Fortune 500 — NVIDIA, Salesforce, Adobe, half of enterprise healthcare and finance. Every one of those *.myworkdayjobs.com boards looks different on the surface, but underneath they all run on the same internal JSON API. That's the detail this post is about, and it's also the reason we built the Workday Jobs Scraper.

Why Workday job data is worth pulling

Workday postings are structured, current, and — unlike a lot of enterprise HR data — genuinely public. No login wall, no gated candidate portal. Every req has a title, location, requisition ID, and (usually) a full description sitting behind the site's own search box. That's a clean signal for:

Recruiters and sourcers mapping which enterprise employers have open reqs in a given function
Job-board aggregators who need Workday coverage next to Greenhouse, Lever, and Ashby
Competitive-intel teams watching a rival's hiring velocity as a proxy for headcount growth
Labor-market researchers sampling demand by role, location, or industry at scale

The naive approach, and why it gets annoying fast

Open devtools on any Workday career site, filter to XHR, and you'll spot a call to /wday/cxs/{tenant}/{site}/jobs. Looks trivial — just a POST with a search payload. The catch is everything downstream of that first successful call. Workday sits behind Cloudflare, so a bare requests.post() from a script gets a fingerprint mismatch and a block long before you've paginated through a real board. You also need to work out the tenant/data-center/site triple from the URL structure ({tenant}.{dc}.myworkdayjobs.com/{site}), handle pagination correctly, and — if you want full descriptions — fire a second request per posting without tripping rate limits. None of that is exotic, but multiply it by dozens of tenants and it's a maintenance job, not a script.
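
Working out the tenant/data-center/site triple is the one piece you can do offline. Here is a minimal sketch that parses the URL structure described above and builds the listing endpoint path from it. `CareerSite` and `parse_career_site` are hypothetical names, and the optional locale prefix (such as en-US) that the regex skips is an assumption about how some career-site URLs are written.

```python
import re
from typing import NamedTuple

# {tenant}.{dc}.myworkdayjobs.com/{site}, with an optional locale segment
# such as "en-US/" before the site name (the locale handling is an assumption).
SITE_URL = re.compile(
    r"^(?:https?://)?(?P<tenant>[^./]+)\.(?P<dc>[^./]+)\.myworkdayjobs\.com"
    r"/(?:[a-z]{2}-[A-Z]{2}/)?(?P<site>[^/?#]+)"
)


class CareerSite(NamedTuple):
    tenant: str
    dc: str
    site: str

    @property
    def jobs_endpoint(self) -> str:
        """Listing endpoint, following the /wday/cxs/{tenant}/{site}/jobs path."""
        return (
            f"https://{self.tenant}.{self.dc}.myworkdayjobs.com"
            f"/wday/cxs/{self.tenant}/{self.site}/jobs"
        )


def parse_career_site(url: str) -> CareerSite:
    match = SITE_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a Workday career-site URL: {url!r}")
    return CareerSite(match["tenant"], match["dc"], match["site"])


site = parse_career_site("nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite")
print(site)
print(site.jobs_endpoint)
```

The parsing is the easy half; the fingerprinting, pagination, and rate-limit handling described above are what the sketch leaves out.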

What we built instead

The Workday Jobs Scraper talks to that same cxs JSON endpoint directly, but does the parts that make it fragile for you: we present a real Chrome TLS/H2 handshake via curl-cffi browser impersonation, retry 408/429/5xx responses with exponential backoff (Retry-After honoured), and rotate proxy sessions through Apify Proxy when a run is pulling many tenants back to back. You give it career-site URLs — or explicit {tenant, dc, site} objects if you already know them — and get back one normalized row per posting, regardless of which of the hundreds of Workday tenants you're pointed at.

Run it via the Apify Python SDK:

from apify_client import ApifyClient

client = ApifyClient("APIFY_TOKEN")
run = client.actor("DevilScrapes/workday-jobs-scraper").call(run_input={
    "careerSites": ["nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite"],
    "maxResultsPerSite": 25,
    "includeDescription": True,
    "proxyConfiguration": {"useApifyProxy": True},
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["location"], item["job_req_id"])

Or paste the same input directly into the Store page and hit Start — no code required.

One output row looks like this:

{
  "job_id": "/job/US-CA-Santa-Clara/Senior-Factory-Support-Firmware-Engineer_JR1998421",
  "title": "Senior Factory Support Firmware Engineer",
  "location": "US, CA, Santa Clara",
  "posted_on_text": "Posted 5 Days Ago",
  "time_type": "Full time",
  "job_req_id": "JR1998421",
  "url": "https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Factory-Support-Firmware-Engineer_JR1998421",
  "company": "nvidia",
  "dc": "wd5",
  "site": "NVIDIAExternalCareerSite",
  "description_html": "NVIDIA is seeking a Senior Factory Support Firmware Engineer...",
  "scraped_at": "2026-07-21T11:40:00Z"
}

The technically interesting bit

Workday's public URL ({tenant}.{dc}.myworkdayjobs.com/{site}/job/...) and its internal detail payload aren't the same thing — the detail data actually lives behind the cxs path, keyed by an externalPath that only shows up once you've already paginated the listing endpoint. A lot of "generic" Workday scrapers on the market skip this and just re-request the public HTML page for descriptions, which is slower and more fragile. We fetch the detail payload the way the site's own JS does, which is also why includeDescription costs one extra request per posting rather than a full page render.
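
To show the difference between the two addresses, here is a small sketch that builds both from the same externalPath. The public URL shape matches the sample row above; the detail endpoint shape (the cxs prefix followed by the externalPath) is our assumption from watching the site's own requests, and both function names are hypothetical.

```python
def public_url(tenant: str, dc: str, site: str, external_path: str) -> str:
    """Human-facing posting URL, as seen in the `url` field of an output row."""
    return f"https://{tenant}.{dc}.myworkdayjobs.com/{site}{_rooted(external_path)}"


def detail_endpoint(tenant: str, dc: str, site: str, external_path: str) -> str:
    """JSON detail URL behind the cxs path (assumed shape, see the note above)."""
    return (
        f"https://{tenant}.{dc}.myworkdayjobs.com"
        f"/wday/cxs/{tenant}/{site}{_rooted(external_path)}"
    )


def _rooted(path: str) -> str:
    """Make sure the externalPath starts with exactly one leading slash."""
    return "/" + path.lstrip("/")


path = "/job/US-CA-Santa-Clara/Senior-Factory-Support-Firmware-Engineer_JR1998421"
print(public_url("nvidia", "wd5", "NVIDIAExternalCareerSite", path))
print(detail_endpoint("nvidia", "wd5", "NVIDIAExternalCareerSite", path))
```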

Pricing — no fine print

Pay-Per-Event: $0.005 per run (one-off warm-up) plus $0.0015 per job posting written to the dataset. A 1,000-posting pull costs $1.51. No data lands, no charge beyond the warm-up fee. Apify gives every new account $5 of free credit, no card required, which covers roughly 3,300 rows before you spend a cent.

Honest limitations

We only reach public career sites, not SSO-gated internal portals, and Workday doesn't expose a structured salary field, so we don't invent one. maxResultsPerSite caps how many rows a run writes but doesn't change the per-row rate, because you're billed per row written, not per HTTP call. And like any scrape, this is a point-in-time snapshot: schedule recurring runs if you want to track how a board changes week over week.

Try it, and the rest of the fleet

Workday is one piece of the ATS landscape. If you need coverage beyond it, we also run the SmartRecruiters Jobs Scraper, the Workable Jobs Scraper, and the Multi-ATS Jobs Scraper covering Greenhouse, Lever, and Ashby in one normalized schema — all keyless, all pay-per-result.

Grab the Workday Jobs Scraper on the Apify Store and try it with the free $5 credit — no card needed. If you hit a tenant that behaves differently or need a field we don't expose yet, open an issue on the Actor's Issues tab; we read every one. Browse the rest of the fleet at apify.com/DevilScrapes.

What Workday tenant would you point this at first?

How to Scrape Mobile.bg Car Listings (Bulgaria) to JSON/CSV

Devil Scrapes — Sat, 13 Jun 2026 10:33:11 +0000

Quick answer: Use the Mobile.bg Bulgaria Car Scraper on Apify. Paste a filtered Mobile.bg search URL (or leave it empty for the default cars-and-jeeps feed), set maxResults, and the Actor returns structured rows — price in EUR and BGN, make, model, year, mileage, fuel, gearbox, engine power, body type, colour, and seller — as a JSON or CSV dataset. No API key needed.

Why scrape Mobile.bg?

Mobile.bg is Bulgaria's largest used-car marketplace and the primary venue for Bulgarian used-car transactions. It carries tens of thousands of listings from private sellers and dealers, priced in both EUR and BGN (Bulgarian lev). Bulgaria is adopting the euro, so the site exposes dual-currency pricing natively — the price field is in EUR and price_bgn carries the lev equivalent.

Mobile.bg publishes no public API. The site is served in Windows-1251 encoding — the Cyrillic character set that predates UTF-8 adoption in Eastern Europe. Getting clean, correctly decoded data out of it requires handling that encoding correctly, matching the browser's TLS fingerprint so the site does not block automated requests, and mapping Bulgarian-language spec labels to standard field names. The Actor handles all of that.
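
The encoding problem is easy to reproduce locally. A minimal sketch, assuming you have the raw response bytes in hand: decode them explicitly with Python's `cp1251` codec (its name for Windows-1251) rather than trusting a client's default guess. `decode_page` is a hypothetical helper name.

```python
def decode_page(body: bytes) -> str:
    """Decode a raw response body as Windows-1251 (Python codec name: cp1251)."""
    # errors="replace" keeps one stray byte from aborting the whole page.
    return body.decode("cp1251", errors="replace")


raw = "Дизелов".encode("cp1251")  # bytes as a Windows-1251 server would send them
print(decode_page(raw))           # correct Cyrillic text
print(raw.decode("latin-1"))      # wrong codec: the same bytes come out garbled
```

The second print shows what ends up in your dataset when the wrong codec is applied: valid characters, but not the ones the seller typed.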

What fields does the scraper return? 🔥

Every row maps to the ResultRow Pydantic model in the Actor source. Here is a realistic sample:

{
  "listing_id": "11772628462288076",
  "listing_url": "https://www.mobile.bg/obiava-11772628462288076-audi-a3-s-line-s3",
  "title": "Audi A3 S-line S3",
  "make": "Audi",
  "model": "A3",
  "year": 2010,
  "price": 7600,
  "currency": "EUR",
  "price_bgn": 14864,
  "mileage_km": 255000,
  "fuel_type": "Дизелов",
  "transmission": "Ръчна",
  "engine_power_hp": 140,
  "engine_size_cc": 2000,
  "body_type": "Хечбек",
  "color": "Бял",
  "first_registration": "октомври 2010",
  "location": "гр. Варна",
  "region": null,
  "seller_type": "private",
  "seller_name": null,
  "photo_urls": [
    "//mobistatic1.focus.bg/mobile/photosorg/076/1/11772628462288076_t4.webp"
  ],
  "description": "Audi A3 2.0 TDI ...",
  "posted_date": "НОВА ОБЯВА",
  "scraped_at": "2026-06-02T10:00:00+00:00"
}

The complete field set is: listing_id, listing_url, title, make, model, year, price, currency (always EUR), price_bgn, mileage_km, fuel_type, transmission, engine_power_hp, engine_size_cc, body_type, color, first_registration, location, region, seller_type, seller_name, photo_urls, description, posted_date, and scraped_at. Fields tagged enrichment-only (location, region, seller_type, seller_name) are populated when enrichDetails is true.

A note on Cyrillic values: fuel type, gearbox, body type, and colour are returned in their original Bulgarian text (e.g. Дизелов, Автоматична, Хечбек). The column names are standard English, so your pipeline can process them without knowing Bulgarian — just map the Cyrillic values if you need English labels in your output.

What does it cost to scrape Mobile.bg listings? 💰

Pricing is Pay-Per-Event.

Event	Price
`actor-start` (one-off per run)	$0.05
`result-row` (per car listing)	$0.002

1,000 results in a single run costs $2.05 ($0.05 start + 1,000 × $0.002). Every new Apify account gets $5 of free credit — enough for two full runs of 1,000 listings before you spend anything. No credit card required to start.

How does the anti-blocking work?

Mobile.bg is a Windows-1251 encoded site with anti-bot protections on its listing pages. We handle the blocks:

We rotate Chrome, Firefox, and Safari TLS fingerprints using curl-cffi browser impersonation — the site sees a real-browser handshake, not a Python HTTP client.
We decode the Windows-1251 Cyrillic encoding correctly on every response before parsing, so you never get garbled text in fuel_type, body_type, or description.
We rotate residential proxy sessions through Apify Proxy on every block — a fresh session ID and a fresh Bulgarian exit IP, automatically.
We retry on 408, 429, and 5xx responses with exponential backoff (starting at two seconds, capping at thirty, up to five attempts per page), honouring Retry-After headers.
Partial runs surface with a clear status message; we never return empty data with a green status.
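
The retry policy in the list above can be sketched as two small functions. The numbers (two-second start, thirty-second cap, five attempts, 408/429/5xx) come from this post; doubling the delay on each attempt and capping a Retry-After value at the same thirty seconds are our assumptions for the example, and both function names are hypothetical.

```python
from typing import Optional

RETRYABLE = {408, 429}
MAX_ATTEMPTS = 5


def should_retry(status: int, attempt: int) -> bool:
    """`attempt` is 1-based: the number of the attempt that just failed."""
    retryable = status in RETRYABLE or 500 <= status <= 599
    return retryable and attempt < MAX_ATTEMPTS


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 2.0, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        # Honour the server's hint, bounded by the cap (an assumption).
        return min(max(retry_after, 0.0), cap)
    # Doubling schedule: 2s, 4s, 8s, 16s, then held at the cap.
    return min(base * 2 ** (attempt - 1), cap)


print([backoff_delay(n) for n in range(1, 5)])
print(backoff_delay(1, retry_after=7))
```

A production version would usually add random jitter to the delay so that parallel workers don't retry in lockstep.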

How to run the scraper from Python

Install apify-client and call the Actor:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_API_TOKEN")

run_input = {
    "searchUrl": "https://www.mobile.bg/obiavi/avtomobili-dzhipove/bmw",
    "maxResults": 150,
    "enrichDetails": True,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

run = client.actor("DevilScrapes/mobile-bg-bulgaria-cars").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        item["make"], item["model"], item["year"],
        item["price"], "EUR", "/", item["price_bgn"], "BGN",
        item["mileage_km"], "km"
    )

To target a specific make or model, apply filters on mobile.bg and paste the resulting URL into searchUrl. Leave searchUrl empty to scrape the default cars-and-jeeps feed. Turn enrichDetails off to skip the extra detail-page request for each listing: you still get make, model, year, price (both EUR and BGN), mileage, fuel, gearbox, engine power and size, body type, colour, and photos from the listing cards.

What are the main use cases?

Cross-border arbitrage: Bulgaria vs Western Europe 💡

Bulgaria's used-car prices are among the lowest in the EU. A dataset of Mobile.bg listings filtered by make, year, and mileage_km, compared against equivalent data from coches.net, leboncoin, or Marktplaats, reveals the price delta for specific models. Importers use this to source vehicles for resale in higher-price markets.

EUR/BGN dual-currency price tracking

Bulgaria is in the process of euro adoption. Mobile.bg already shows both prices. The price (EUR) and price_bgn (BGN) fields let you track market pricing through the currency transition — useful for financial modelling and policy research.

Depreciation modelling: price vs mileage vs age

Pull a broad dataset for a specific model (e.g. all Volkswagen Passat listings), export to CSV, and fit a regression. The year, mileage_km, engine_power_hp, and transmission fields give you the standard independent variables. Bulgarian private-seller pricing tends to be less filtered by dealer margin, which produces more signal for depreciation studies.

Dealer intelligence

Filter seller_type to dealer (requires enrichDetails: true), segment by location, and you have a directory of active Bulgarian car dealers with their listed inventory. Track price changes across successive runs.

Fresh listings monitoring

The posted_date field carries Mobile.bg's own relative posting marker (e.g. НОВА ОБЯВА — "new listing"). Run the Actor daily and filter to new listings to catch fresh private-seller posts before they attract attention.
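
A daily monitor needs two things: the marker check and a memory of what earlier runs already saw. Here is a minimal sketch, assuming you persist a set of listing_id values between runs (how you store it is up to you); `fresh_listings` is a hypothetical helper name.

```python
NEW_MARKER = "НОВА ОБЯВА"  # Mobile.bg's "new listing" label, as in the sample row


def fresh_listings(rows, seen_ids):
    """Return rows marked new that did not appear in any earlier run.

    `seen_ids` is a set of listing_id values kept between runs; it is
    updated in place with every ID from this run.
    """
    fresh = []
    for row in rows:
        listing_id = row["listing_id"]
        if row.get("posted_date") == NEW_MARKER and listing_id not in seen_ids:
            fresh.append(row)
        seen_ids.add(listing_id)
    return fresh


seen = set()
today = [{"listing_id": "11772628462288076", "posted_date": "НОВА ОБЯВА"}]
print(len(fresh_listings(today, seen)))  # first run: the listing is fresh
print(len(fresh_listings(today, seen)))  # second run: already seen
```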

FAQ

Is it legal to scrape Mobile.bg?

The Actor accesses publicly visible listing data — the same data any browser user sees without logging in. Mobile.bg's terms of service govern automated access; review them for your jurisdiction and use case before running at scale. We recommend responsible pacing and legitimate analytical use.

The fuel type and body type values are in Bulgarian — can I translate them?

The column names (fuel_type, body_type, transmission, color) are standard English. The values are preserved in the original Bulgarian so nothing is lost. Common mappings: Дизелов → Diesel, Бензинов → Petrol, Електрически → Electric, Ръчна → Manual, Автоматична → Automatic, Хечбек → Hatchback, Седан → Sedan, Комби → Estate/Wagon.
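
If you want English labels in your export, the mappings above drop straight into a lookup table. A minimal sketch covering only the values listed in this answer; `translate_row` is a hypothetical helper, and anything not in the table is left in Bulgarian rather than guessed.

```python
# Only the mappings listed above; extend the tables as you meet new values.
TRANSLATIONS = {
    "fuel_type": {"Дизелов": "Diesel", "Бензинов": "Petrol", "Електрически": "Electric"},
    "transmission": {"Ръчна": "Manual", "Автоматична": "Automatic"},
    "body_type": {"Хечбек": "Hatchback", "Седан": "Sedan", "Комби": "Estate/Wagon"},
}


def translate_row(row):
    """Return a copy with known Bulgarian values swapped for English labels."""
    out = dict(row)
    for field, mapping in TRANSLATIONS.items():
        value = out.get(field)
        if value in mapping:
            out[field] = mapping[value]
    return out


row = {"fuel_type": "Дизелов", "transmission": "Ръчна", "body_type": "Хечбек", "color": "Бял"}
print(translate_row(row))
```

The original row is not modified, so you can keep the Bulgarian values alongside the translated copy.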

What currency are prices in?

price is always in EUR and currency is always EUR. price_bgn carries the Bulgarian-lev equivalent shown on the site. If price_bgn is null, the listing did not show a lev amount.

What does enrichDetails add?

Setting enrichDetails: true fetches each listing's detail page for the full technical table, exact engine displacement, the seller's city (location), and seller_type. Without enrichment you still get make, model, year, price (EUR + BGN), mileage, fuel, gearbox, engine power and size, body type, colour, and photos from the listing card.

How much does a run of 2,000 listings cost?

$0.05 (start) + 2,000 × $0.002 = $4.05.

Start collecting Bulgarian car listing data

The Mobile.bg Bulgaria Car Scraper is live on the Apify Store. Click Try for free — $5 of credit included, no card required.

Your first dataset exports to JSON or CSV in minutes. If you hit an edge case or need a field added, open an issue on the Actor's Issues tab and we'll address it in the next weekly release.

How to Scrape Marktplaats Car Listings (Netherlands) to JSON/CSV

Devil Scrapes — Sat, 13 Jun 2026 10:27:55 +0000

Quick answer: Use the Marktplaats Netherlands Car Scraper on Apify. Paste a filtered Marktplaats auto's search URL, set maxResults, and the Actor walks the result pages and delivers structured rows — price in EUR, make, model, year, mileage, fuel, transmission, body type, drivetrain, energy label, seller, and photos — into a JSON or CSV dataset. No API key, no local scraping stack.

Why scrape Marktplaats for car listings?

Marktplaats.nl is the Netherlands' dominant classifieds marketplace — the Dutch equivalent of Craigslist and eBay Classifieds rolled into one. The auto's category carries listings from both private sellers and professional dealers (handelaren), making it the primary source for Dutch used-car pricing data. Marktplaats publishes no public API for its listings.

The site has a few characteristics that make extraction non-trivial. Listing data is served through Marktplaats' own internal search service rather than plain HTML, the platform uses Dutch-language spec labels (bouwjaar, kilometerstand, Handgeschakeld, Voorwielaandrijving), and its search endpoints run proxy detection, which is why the Actor defaults to residential exits. The Actor handles all of that, including Dutch-specific quirks like bid-based listings (Bieden) and the Dutch energy label system (A through G).

What fields does the scraper return? 💡

Every row maps directly to the ResultRow Pydantic model in the Actor's source. Here is a realistic sample row:

{
  "listing_id": "m2406042278",
  "listing_url": "https://www.marktplaats.nl/v/auto-s/skoda/m2406042278-skoda-octavia-1-6-comfort",
  "title": "Skoda Octavia 1.6 Comfort",
  "make": "Skoda",
  "model": "Octavia",
  "year": 2001,
  "price": 1150,
  "currency": "EUR",
  "price_type": "Te koop",
  "mileage_km": 315844,
  "fuel_type": "Benzine",
  "transmission": "Handgeschakeld",
  "body_type": "Hatchback",
  "drive_train": "Voorwielaandrijving",
  "energy_label": "B",
  "color": "Grijs",
  "location": "Lienden",
  "region": "Nederland",
  "seller_type": "dealer",
  "seller_name": "MBSmart",
  "seller_id": "56392587",
  "photo_urls": [
    "https://images.marktplaats.com/api/v1/hz-mp-pro-listing/images/4933080c-8de2-40e5-95e9-e9cc46631ef3?rule=ecg_mp_eps$_85.jpg"
  ],
  "description": "Algemene informatie Aantal deuren: 5 Kleur: Grijs ...",
  "posted_date": "Vandaag",
  "scraped_at": "2026-06-02T00:00:00+00:00"
}

The complete field set is: listing_id, listing_url, title, make, model, year, price, currency (always EUR), price_type, mileage_km, fuel_type, transmission, body_type, drive_train, energy_label, color, location, region, seller_type, seller_name, seller_id, photo_urls, description, posted_date, and scraped_at. The color and description fields are populated via detail-page enrichment.

Note drive_train — this is the exact field name in the ResultRow model (not drivetrain). Use it as-is when processing the dataset.

What does it cost to scrape Marktplaats listings?

Pricing is Pay-Per-Event.

Event	Price
`actor-start` (one-off per run)	$0.05
`result-row` (per car listing)	$0.002

A single run of 1,000 results costs $2.05 ($0.05 start + 1,000 × $0.002). Every new Apify account gets $5 of free credit, enough to pull your first two thousand listings before spending anything. No credit card required to start.
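The arithmetic is easy to script when you are budgeting runs. This sketch encodes the two event prices from the table; run_cost and max_rows are hypothetical helper names, and Decimal is used so the cents come out exact.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")   # one-off fee per run
RESULT_ROW = Decimal("0.002")   # fee per car listing


def run_cost(rows):
    """Cost in USD of one run that returns `rows` listings."""
    return ACTOR_START + RESULT_ROW * rows


def max_rows(budget):
    """Largest number of listings a single run can return within `budget` USD."""
    remaining = Decimal(str(budget)) - ACTOR_START
    return int(remaining // RESULT_ROW) if remaining > 0 else 0


print(run_cost(1000))
print(max_rows(5))
```

Splitting the same volume across several runs costs slightly more, because each run pays the start fee again.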

How does the anti-blocking work? 🔥

Marktplaats.nl protects its search endpoints from automated scraping. Here is how we handle it:

We rotate Chrome, Firefox, and Safari TLS fingerprints on every request using curl-cffi browser impersonation. The platform sees a real-browser TLS handshake, not a Python HTTP client.
We rotate residential proxy sessions through Apify Proxy on any block — a new session ID and a new Dutch exit IP, automatically.
We retry on 408, 429, and 5xx responses with exponential backoff (starting at two seconds, capping at thirty, up to five attempts per page), honouring Retry-After headers.
Sponsored Admarkt ads (item IDs beginning with a) are detected and skipped — they contaminate price analysis datasets.
Partial successes surface with a clear status message. We never quietly return an empty dataset.

The default proxy configuration already requests Dutch RESIDENTIAL exits. Nothing to configure.
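For readers building their own client, the retry policy described above can be sketched in a few lines. This is an illustration of the stated numbers (two-second start, thirty-second cap, five attempts), not the Actor's source. The names should_retry and delay_for are hypothetical, and letting a numeric Retry-After value override the computed delay is an assumption.

```python
RETRYABLE = {408, 429}
MAX_ATTEMPTS = 5


def should_retry(status):
    """True for the status codes the text lists: 408, 429, and any 5xx."""
    return status in RETRYABLE or 500 <= status <= 599


def delay_for(attempt, retry_after=None, base=2.0, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After header wins over the computed delay; the HTTP-date
    form of the header is not handled in this sketch.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # fall through to the exponential schedule
    return min(cap, base * 2 ** attempt)


# Five attempts leave room for four waits between them.
print([delay_for(n) for n in range(MAX_ATTEMPTS - 1)])
```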

How to run the scraper from Python

Install apify-client and call the Actor:

from apify_client import ApifyClient

# Authenticate with your personal Apify API token.
client = ApifyClient("YOUR_APIFY_API_TOKEN")

# Actor input: the filtered search URL, a row cap, and the proxy settings.
run_input = {
    "searchUrl": "https://www.marktplaats.nl/l/auto-s/volkswagen/golf/#Language:all-languages",
    "maxResults": 200,
    "enrichDetails": True,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/marktplaats-netherlands-cars").call(run_input=run_input)

# Stream the rows from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        item["make"], item["model"], item["year"],
        item["price"], item["mileage_km"], item["energy_label"]
    )

To target a specific search, apply your filters on marktplaats.nl, copy the URL, and paste it into searchUrl. The Actor translates it to the internal search query. Set enrichDetails to False if you only need the search-payload fields — price, make, model, year, mileage, fuel, transmission, body type, drivetrain, energy label, seller, and photos — without fetching each detail page.

What are the main use cases?

Used-car price analytics for the Dutch market

Marktplaats combines private sellers and dealers in a single feed, giving a complete picture of Dutch retail and C2C pricing. Pull listings for a model range, export to CSV, and build price-vs-age and price-vs-mileage charts. The energy_label column adds a sustainability dimension you will not find on most car data APIs.

Energy label composition research

The Netherlands has aggressive emission regulations and a strong push toward electrification. The energy_label field (A through G) lets you aggregate the label distribution across any search filter — useful for fleet compliance reporting, real-estate proximity analysis, or EV market research.
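A minimal sketch of that aggregation, assuming the rows have already been loaded as dictionaries. The function name label_shares and the sample rows are invented for illustration; rows without a label are left out of the denominator.

```python
from collections import Counter


def label_shares(rows, field="energy_label"):
    """Return {label: share of rows} for rows that carry a value in `field`."""
    counts = Counter(row[field] for row in rows if row.get(field))
    total = sum(counts.values())
    if not total:
        return {}
    return {label: n / total for label, n in sorted(counts.items())}


# Invented rows for illustration only.
rows = [
    {"energy_label": "A"},
    {"energy_label": "B"},
    {"energy_label": "B"},
    {"energy_label": None},
]
print(label_shares(rows))
```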

Dealer inventory monitoring

Filter seller_type to dealer, run on a cron schedule, and diff by listing_id to track new stock and price changes. The seller_id field lets you group all listings from the same dealership even when the seller_name display name changes.
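The diff itself needs nothing beyond a dictionary keyed by listing_id. The sketch below assumes both runs are already in memory as lists of rows; diff_runs and the sample values are hypothetical.

```python
def diff_runs(previous, current):
    """Compare two runs keyed by listing_id.

    Returns (new_ids, price_changes) where price_changes maps
    listing_id -> (old_price, new_price).
    """
    old = {row["listing_id"]: row for row in previous}
    new_ids = []
    price_changes = {}
    for row in current:
        lid = row["listing_id"]
        if lid not in old:
            new_ids.append(lid)
        elif row.get("price") != old[lid].get("price"):
            price_changes[lid] = (old[lid].get("price"), row.get("price"))
    return new_ids, price_changes


# Invented rows for illustration only.
yesterday = [{"listing_id": "m1", "price": 5000}, {"listing_id": "m2", "price": 7500}]
today = [{"listing_id": "m1", "price": 4800}, {"listing_id": "m3", "price": 9900}]
print(diff_runs(yesterday, today))
```

Listings present yesterday but missing today (m2 here) are not reported; treating them as sold or withdrawn would be a separate judgment call.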

Bid-based listing research

Listings with price_type: "Bieden" are minimum-bid auctions. Filter to Bieden listings and you have a view of the distressed-sale segment of the Dutch market — a common source of under-market deals.

Cross-border sourcing

Dutch used-car prices are competitive against some neighbouring markets, especially on specific segments. A Marktplaats dataset filtered by fuel_type, mileage_km, and year gives European importers a current Dutch benchmark.

FAQ

Is it legal to scrape Marktplaats.nl?

The Actor accesses publicly visible listing data — the same data any browser user sees without logging in. Marktplaats's terms of service govern automated access; review them for your use case before running at scale. We recommend responsible pacing and legitimate analytical use.

What is price_type and why is it sometimes Bieden?

Marktplaats supports fixed-price listings (Te koop), bid-based listings (Bieden), and reserved listings (Gereserveerd). The price field is still populated for Bieden listings (it shows the minimum bid), and price_type tells you which mode applies.
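Because a Bieden price is a minimum bid rather than an asking price, it usually makes sense to separate the modes before averaging anything. A minimal sketch with invented rows; split_by_price_type is a hypothetical helper.

```python
def split_by_price_type(rows):
    """Group rows by price_type so bid-based prices are not mixed with asking prices."""
    groups = {}
    for row in rows:
        groups.setdefault(row.get("price_type"), []).append(row)
    return groups


# Invented rows for illustration only.
rows = [
    {"listing_id": "m1", "price": 1150, "price_type": "Te koop"},
    {"listing_id": "m2", "price": 500, "price_type": "Bieden"},
    {"listing_id": "m3", "price": 8900, "price_type": "Te koop"},
]
groups = split_by_price_type(rows)
asking_prices = [r["price"] for r in groups.get("Te koop", [])]
print(asking_prices)
```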

Why are color and description sometimes null?

Both come from the listing detail page, not the search payload. Set enrichDetails: true (the default) to fill them in. Turn it off to halve the request count — price, specs, seller, and photos all come from the search results.

How much does a run of 5,000 listings cost?

$0.05 (start) + 5,000 × $0.002 = $10.05.

Are sponsored Admarkt ads included?

No. Admarkt ads (item IDs starting with a) are detected and skipped automatically. They distort price comparisons and typically point to dealer landing pages rather than individual listings.

Start collecting Dutch car listing data

The Marktplaats Netherlands Car Scraper is live on the Apify Store. Click Try for free — $5 of credit included, no card required.

Your first dataset exports to JSON or CSV in minutes. Questions or edge cases? Open an issue on the Actor's Issues tab and we'll address it in the next weekly release.

How to Scrape leboncoin Car Listings (France) to JSON/CSV

Devil Scrapes — Sat, 13 Jun 2026 10:22:38 +0000

Quick answer: Use the leboncoin France Car Scraper on Apify. Paste a filtered leboncoin.fr Voitures search URL, set maxResults, and the Actor walks the result pages and returns structured rows — price in EUR, brand, model, year, mileage, fuel, gearbox, Crit'Air sticker, seller, location, and photos — as a JSON or CSV dataset. No API key, no local scraping stack to configure.

Why scrape leboncoin for car listings?

leboncoin.fr is France's largest classifieds marketplace and the default venue for millions of French used-car transactions each year. The Voitures section carries hundreds of thousands of live listings from private sellers and professional dealers alike. There is no public API — structured data access means extracting it from the search pages.

That extraction is not straightforward. leboncoin is fronted by bot protection. Datacenter IP ranges get challenged or blocked. The listing data is embedded in JSON inside server-rendered HTML, paginated at 35 results per page. The French automotive classification has its own vocabulary: Crit'Air air-quality stickers (a regulatory requirement for driving in low-emission zones), horsepower measured in chevaux, fuel types like Essence and Hybride. The Actor handles all of it and delivers clean, typed rows.

What fields does the scraper return?

Every row maps exactly to the ResultRow Pydantic class in the Actor source. Here is a realistic sample row:

{
  "listing_id": "3117822207",
  "listing_url": "https://www.leboncoin.fr/ad/voitures/3117822207",
  "title": "Morris Mini COOPER 1275 S",
  "make": "MORRIS",
  "model": "Autre",
  "year": 1968,
  "price": 34900,
  "currency": "EUR",
  "mileage_km": 101890,
  "fuel_type": "Essence",
  "transmission": "Manuelle",
  "engine_power_hp": null,
  "engine_size_cc": null,
  "body_type": "Citadine",
  "color": "Vert",
  "first_registration": "07/1968",
  "location": "Saint-Jean-du-Cardonnay",
  "region": "Seine-Maritime",
  "postcode": "76150",
  "seller_type": "dealer",
  "seller_name": "MECA SPORT",
  "critair": "Non classé",
  "photo_urls": [
    "https://img.leboncoin.fr/api/v1/lbcpb1/images/12/99/eb/1299eb9f07aa41b5f0430ce684d22b4e689dc47c.jpg?rule=ad-image"
  ],
  "description": "Magnifique Morris Mini Cooper 1275 S restaurée.",
  "posted_date": "2025-12-26 14:04:00",
  "scraped_at": "2026-06-02T00:00:00+00:00"
}

The full field set is: listing_id, listing_url, title, make, model, year, price, currency (always EUR), mileage_km, fuel_type, transmission, engine_power_hp, engine_size_cc, body_type, color, first_registration, location, region, postcode, seller_type, seller_name, critair, photo_urls, description, posted_date, and scraped_at. The description field comes from the search payload; when enrichDetails is on, it is re-fetched from the detail page for the freshest copy.

What does it cost to scrape leboncoin listings? 💰

Pricing is Pay-Per-Event — you pay only when rows land in the dataset.

Event	Price
`actor-start` (one-off per run)	$0.05
`result-row` (per car listing)	$0.002

A single run of 1,000 results costs $2.05 ($0.05 start + 1,000 × $0.002). Every new Apify account receives $5 of free credit, enough to pull your first couple of thousand listings before you spend anything. No credit card required to begin.

How does the anti-blocking work?

leboncoin.fr is fronted by bot protection. Residential proxy sessions are not optional — they are required for reliable access. Here is what we do so you do not have to:

We rotate Chrome, Firefox, and Safari TLS fingerprints on every request using curl-cffi browser impersonation. The site's protection stack sees a real browser handshake, not a Python HTTP client.
We rotate residential proxy sessions through Apify Proxy on every block — a fresh session ID and a fresh French exit IP, automatically.
We retry on 408, 429, and 5xx responses with exponential backoff (starting at two seconds, capped at thirty, up to five attempts per page), honouring Retry-After headers.
When the site pushes back, we slow down rather than triggering a harder block. Partial runs surface with a clear status message; we never return empty data with a green status.

The default proxy config already requests RESIDENTIAL exits. You do not need to configure anything.

How to run the scraper from Python 🔥

Install the apify-client library and call the Actor:

from apify_client import ApifyClient

# Authenticate with your personal Apify API token.
client = ApifyClient("YOUR_APIFY_API_TOKEN")

# Actor input: the filtered search URL, a row cap, and the proxy settings.
run_input = {
    "searchUrl": "https://www.leboncoin.fr/recherche?category=2&brand=Peugeot&vehicle_type=4",
    "maxResults": 300,
    "enrichDetails": True,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/leboncoin-france-cars").call(run_input=run_input)

# Stream the rows from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["make"], item["model"], item["price"], item["mileage_km"], item["critair"])

To target a specific search, apply your filters on leboncoin.fr, copy the URL, and paste it into searchUrl. The Actor walks result pages from there. Set enrichDetails to False if you only need the search-payload fields — price, brand, model, year, mileage, fuel, gearbox, colour, seller, location, and photos — and want to halve the request count.

What are the main use cases?

Used-car price analytics for the French market

Pull listings for a specific make and model, export to CSV, and load into your analysis tool of choice. leboncoin's depth in the private-seller segment gives a view of French retail pricing that dealer-only datasets miss.
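Apify can export the dataset to CSV directly, but if you are already iterating rows in Python and only want a few columns, the standard csv module covers it. In this sketch rows_to_csv is a hypothetical helper, the "|" separator for list fields is an arbitrary choice, and the sample values are taken from the sample row shown earlier.

```python
import csv
import io


def rows_to_csv(rows, fields):
    """Serialise dataset rows to CSV text, keeping only `fields`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Lists such as photo_urls do not fit one CSV cell; join them with a separator.
        flat = {
            key: "|".join(map(str, value)) if isinstance(value, list) else value
            for key, value in row.items()
        }
        writer.writerow(flat)
    return buffer.getvalue()


sample = [
    {
        "make": "MORRIS",
        "model": "Autre",
        "year": 1968,
        "price": 34900,
        "mileage_km": 101890,
        "critair": "Non classé",
        "description": "ignored because it is not in the field list",
    }
]
print(rows_to_csv(sample, ["make", "model", "year", "price", "mileage_km", "critair"]))
```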

Crit'Air label composition research

France's Crit'Air system classifies vehicles from label 0 (fully electric) through to Non classé for the oldest polluters. Low-emission zones in Paris, Lyon, and other major cities restrict entry by Crit'Air label. The critair field lets you aggregate the sticker distribution across any filtered search — useful for policy research, fleet management planning, or market forecasting.

Dealer inventory monitoring

Filter seller_type to dealer, set a daily schedule, and diff successive datasets by listing_id to catch new arrivals and price cuts. seller_name, location, and postcode let you segment by dealer and geography.

Cross-border arbitrage from France

French private-seller prices, especially on diesel estates and older luxury cars, can diverge meaningfully from equivalent listings in neighbouring markets. A dataset filtered by fuel_type, year, and mileage_km gives importers a current benchmark.

Lead generation: French car dealers

Dealer listings carry seller_name and the dealer's location and postcode. Run a broad sweep across the Voitures section and you have a structured directory of active French car dealers, filterable by region.
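One way to turn a sweep into that directory is to deduplicate dealers and group them by region. The sketch below assumes rows are plain dictionaries; dealer_directory is a hypothetical helper, the dealer values come from the sample row shown earlier, and "private" is an assumed value for non-dealer rows.

```python
from collections import defaultdict


def dealer_directory(rows):
    """Map region -> sorted unique (seller_name, location, postcode) tuples for dealers."""
    by_region = defaultdict(set)
    for row in rows:
        if row.get("seller_type") != "dealer" or not row.get("seller_name"):
            continue
        by_region[row.get("region") or ""].add(
            (row["seller_name"], row.get("location") or "", row.get("postcode") or "")
        )
    return {region: sorted(dealers) for region, dealers in by_region.items()}


dealer = {
    "seller_type": "dealer",
    "seller_name": "MECA SPORT",
    "location": "Saint-Jean-du-Cardonnay",
    "region": "Seine-Maritime",
    "postcode": "76150",
}
rows = [
    dealer,
    dict(dealer),  # a second listing from the same dealer collapses into one entry
    {"seller_type": "private", "seller_name": None, "region": "Seine-Maritime"},  # assumed value
]
print(dealer_directory(rows))
```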

FAQ

Is it legal to scrape leboncoin.fr?

The Actor accesses publicly visible listing data — the same data any browser user sees without logging in. leboncoin's terms of service govern automated access; review them for your jurisdiction and use case before running at scale. We recommend responsible pacing and legitimate analytical use.

Why does this Actor require a residential proxy?

leboncoin.fr's bot protection challenges or blocks datacenter IP ranges. Residential exit IPs from Apify Proxy ride past this reliably. The default proxy configuration already requests RESIDENTIAL — you do not need to change anything.

What is the Crit'Air field?

The Crit'Air (Certificat Qualité de l'Air) is a French government air-quality sticker required for driving in designated low-emission zones (Zones à Faibles Émissions). It is published on listings where the seller has declared it. Values include 0, 1, 2, 3, 4, 5, and Non classé. Not all listings include it.

How much do 5,000 listings cost?

$0.05 (start) + 5,000 × $0.002 = $10.05 for one run of 5,000 results.

How often can I run it?

As often as you need. Schedule in the Apify Console with a cron expression. The Apify API lets you access and diff successive datasets programmatically.

Start collecting French car listing data

The leboncoin France Car Scraper is live on the Apify Store. Click Try for free — $5 of credit included, no card required.

Your first dataset is minutes away. If you hit an edge case or need an additional field, open an issue on the Actor's Issues tab and we'll address it in the next weekly release.
