If you want remote job data, you do not need to scrape HTML or sign up for anything. Four of the bigger remote job boards publish keyless public feeds. The catch is that they all speak different dialects, so the real work is normalization. Here are the endpoints and the traps.
The four feeds
RemoteOK returns its whole current board as one JSON array:
GET https://remoteok.com/api
The first element is a legal notice, not a job: they ask for a link back with attribution as a condition of using the feed. Skip element zero, and honor the attribution if you republish. Jobs carry salary_min and salary_max as numbers, tags, and ISO dates.
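A minimal sketch of the "skip element zero" step, assuming the payload has already been parsed into a Python list. The helper names `fetch_json` and `remoteok_jobs` and the User-Agent string are placeholders of mine, not part of RemoteOK's API, and the sample notice object used to try it out is made up.

```python
import json
import urllib.request

REMOTEOK_URL = "https://remoteok.com/api"


def fetch_json(url, user_agent="job-feed-sketch/0.1"):
    """Fetch a URL and parse the body as JSON.

    The User-Agent value is an arbitrary placeholder (assumption); swap in
    something that identifies your own project.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def remoteok_jobs(payload):
    """Drop the leading legal notice and keep only dict-shaped job entries."""
    return [item for item in payload[1:] if isinstance(item, dict)]
```

Usage would look like `jobs = remoteok_jobs(fetch_json(REMOTEOK_URL))`. Keeping the slicing separate from the network call makes the rule easy to test offline.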
Remotive has the friendliest API of the four, including server-side search:
GET https://remotive.com/api/remote-jobs?search=python&limit=100
Salary here is free text ("$120k - $160k"), so do not expect numbers. Attribution with a link back is required here too.
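For illustration, the search URL can be built with the standard library so that multi-word queries are encoded correctly. The `search` and `limit` parameters are the ones shown above; the function name `remotive_url` is my own.

```python
from urllib.parse import urlencode

REMOTIVE_BASE = "https://remotive.com/api/remote-jobs"


def remotive_url(search, limit=100):
    """Build a Remotive search URL with properly encoded query parameters."""
    return f"{REMOTIVE_BASE}?{urlencode({'search': search, 'limit': limit})}"
```

A query like `remotive_url("data engineer")` comes out with the space encoded as `+`, which is easy to get wrong with string concatenation.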
WeWorkRemotely publishes RSS:
GET https://weworkremotely.com/remote-jobs.rss
Two quirks. First, the company name is not a field: it is baked into the title as Company: Role, so split on the first colon. Second, useful data hides in nonstandard tags like <region>, <skills>, and <category> that generic RSS parsers drop on the floor.
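Both quirks can be handled with a plain XML parser instead of an RSS library. The sketch below assumes the feed text is already downloaded and that the extra tags sit directly under each item without a namespace. The function names and output keys are my own choices.

```python
import xml.etree.ElementTree as ET


def split_title(title):
    """Split "Company: Role" on the first colon only.

    Returns (None, title) when there is no colon, so roles that contain
    later colons stay intact.
    """
    company, sep, role = title.partition(":")
    if not sep:
        return None, title.strip()
    return company.strip(), role.strip()


def parse_wwr_items(rss_text):
    """Read each <item>, including the nonstandard tags a generic parser drops."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        company, role = split_title(item.findtext("title", default=""))
        rows.append({
            "company": company,
            "title": role,
            "region": item.findtext("region"),      # None when the tag is absent
            "skills": item.findtext("skills"),
            "category": item.findtext("category"),
            "pubDate": item.findtext("pubDate"),    # still RFC 822 at this point
        })
    return rows
```

Using `partition` rather than `split(":")` is what keeps a title like "Acme Corp: Engineer: Platform" from being cut in the wrong place.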
Himalayas has a proper paginated API with a surprisingly deep catalog (100k+ listings):
GET https://himalayas.app/jobs/api?limit=100&offset=0
It gives structured minSalary/maxSalary with a currency and period, seniority arrays, location restrictions, and even timezone restrictions as UTC offsets. Dates are epoch seconds, not ISO strings.
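A rough sketch of walking the `limit`/`offset` pagination. The HTTP call is left as an injectable `fetch_page(offset, limit)` callable that returns a list of jobs, because the response envelope is not covered here; that callable, the `paginate` name, and the `max_rows` safety cap are all assumptions of this example.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], list],
             limit: int = 100,
             max_rows: int = 1000) -> Iterator[dict]:
    """Yield jobs page by page until a page comes back empty or max_rows is hit.

    The offset advances by the number of rows actually returned, so the loop
    still works if the server hands back fewer rows than requested.
    """
    offset = 0
    yielded = 0
    while yielded < max_rows:
        page = fetch_page(offset, limit)
        if not page:
            return
        for job in page:
            if yielded >= max_rows:
                return
            yield job
            yielded += 1
        offset += len(page)
```

With a catalog this deep, a hard cap like `max_rows` is worth keeping so a scheduled run cannot page forever.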
The normalization layer
The row schema that survived contact with all four sources:
{
  "source": "Remotive",
  "title": "Senior Backend Engineer",
  "company": "Acme Corp",
  "tags": ["python", "aws"],
  "salaryMin": null,
  "salaryMax": null,
  "salaryText": "$120k - $160k",
  "location": "Worldwide",
  "postedAt": "2026-07-03T20:01:13.000Z",
  "applyUrl": "https://..."
}
Rules that mattered in practice:
- Keep both salary shapes. Boards with numbers fill salaryMin/salaryMax; boards with prose fill salaryText. Collapsing one into the other loses information either way.
- Normalize every date to ISO 8601 at the edge. Epoch seconds, RFC 822 RSS dates, and ISO strings all flow through one converter, so downstream code never branches on source.
- Dedupe on lowercased title|company. Companies cross-post to multiple boards, and the same listing showing up four times makes the feed look broken.
- Carry source and sourceUrl on every row. It satisfies the attribution requirements, and it turns out buyers of job data want to know provenance anyway.
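The first three rules can be sketched in a few lines of standard-library Python. This is one possible shape, not the packaged implementation: the function names, the `(min, max)` tuple input for numeric salaries, and the choice to keep the first copy of a duplicate are all assumptions of this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def salary_fields(raw):
    """Keep both shapes: numbers go to salaryMin/salaryMax, prose to salaryText."""
    fields = {"salaryMin": None, "salaryMax": None, "salaryText": None}
    if isinstance(raw, str):
        fields["salaryText"] = raw.strip() or None
    elif isinstance(raw, (tuple, list)) and len(raw) == 2:
        fields["salaryMin"], fields["salaryMax"] = raw
    return fields


def to_iso(value):
    """Convert epoch seconds, RFC 822 dates, or ISO strings to ISO 8601 in UTC."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        text = str(value).strip()
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"  # older fromisoformat versions reject "Z"
        try:
            dt = datetime.fromisoformat(text)
        except ValueError:
            dt = parsedate_to_datetime(text)  # RFC 822 style RSS dates
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    iso = dt.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return iso.replace("+00:00", "Z")


def dedupe(rows):
    """Drop repeats of the same lowercased title|company, keeping the first seen."""
    seen = set()
    unique = []
    for row in rows:
        key = f"{row['title'].strip()}|{row['company'].strip()}".lower()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

All three date shapes end up in the same format as postedAt in the schema above, which is what lets downstream code ignore where a row came from.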
What this is good for
The obvious build is a job alert pipeline: run it hourly with keywords, diff against what you have seen, push new rows to Slack. The less obvious one is sales intelligence: a company hiring for a role is telling you what they are about to spend money on, and job feeds are the earliest public signal of that.
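The "diff against what you have seen" step of the alert pipeline might look like the sketch below. It assumes rows in the normalized schema, matches keywords against title and tags only, and keeps seen keys in an in-memory set; a real hourly job would persist that set somewhere. All names here are hypothetical, and the Slack push is left out.

```python
def row_key(row):
    """Identity for a listing: lowercased title|company."""
    return f"{row['title'].strip()}|{row['company'].strip()}".lower()


def matches(row, keywords):
    """True if any keyword appears in the title or tags (case-insensitive)."""
    haystack = " ".join([row["title"], *row.get("tags", [])]).lower()
    return any(keyword.lower() in haystack for keyword in keywords)


def new_matches(rows, keywords, seen):
    """Return matching rows not seen before, and record them in `seen`."""
    fresh = []
    for row in rows:
        key = row_key(row)
        if key in seen or not matches(row, keywords):
            continue
        seen.add(key)
        fresh.append(row)
    return fresh
```

Calling `new_matches` twice with the same rows and the same `seen` set returns the listing the first time and nothing the second, which is the behavior an alert channel needs.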
I packaged the whole thing (four fetchers, normalization, dedupe, keyword and freshness filters) into an actor on Apify if you want it as a scheduled feed. But every endpoint above works with nothing more than fetch, and the boards deserve the links back that their terms ask for.