Job posts are the strongest B2B buying signal there is. Here's how we turned public Google search results into a hiring-intent lead finder — and the parsing traps that nearly sank it.
A company advertising a "Marketing Manager, London" is telling you three things at once: it has budget, it has a gap right now, and you know exactly what the gap is. That's the strongest cold-outreach trigger in B2B — and it's sitting in public, on job boards, for free.
So we built a small Apify actor that turns it into a lead list: give it roles + locations, get back one lead per hiring company with the role, the location, the job link, and a ready-to-paste opener. Here's how it works, and — more usefully — the three parsing traps that nearly made the output garbage.
The core trick: don't scrape the job boards. Search them.
Indeed, LinkedIn and Glassdoor all run serious anti-bot protection (Cloudflare, DataDome). Scraping them directly means residential proxies, headless browsers, and a constant cat-and-mouse game you will eventually lose.
You don't have to play. Google has already crawled those postings. So instead of fetching indeed.com, you ask Google:
"Marketing Manager" "London" (site:indeed.com OR site:linkedin.com/jobs OR site:glassdoor.com)
Read the search-results HTML, parse the titles, done. No login, no cookie, and no anti-bot wall on the boards themselves, so there is nothing of yours to get blocked. We send the Google request with got-scraping through Apify's GOOGLE_SERP proxy, which is HTTP-only: you request http://www.google.com/search?... and the proxy handles the TLS to Google. If Google returns an empty result, we fall back to Bing.
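As a rough sketch of the query-building step, the search URL could be assembled like this. The helper name buildSearchUrl and its argument shape are assumptions for illustration, not the actor's actual code, and sending the request through the proxy is left out.

```javascript
// Hypothetical helper: builds the Google search URL for one role + location.
// The function name and argument shape are assumptions for illustration.
function buildSearchUrl(role, location, sites) {
  // e.g. site:indeed.com OR site:linkedin.com/jobs OR site:glassdoor.com
  const siteClause = sites.map((s) => `site:${s}`).join(' OR ');
  // Quote role and location so Google treats each as an exact phrase.
  const query = `"${role}" "${location}" (${siteClause})`;
  // URLSearchParams percent-encodes quotes, colons, and parentheses.
  const params = new URLSearchParams({ q: query });
  // Plain http on purpose: the GOOGLE_SERP proxy expects http:// URLs.
  return `http://www.google.com/search?${params.toString()}`;
}

const url = buildSearchUrl('Marketing Manager', 'London', [
  'indeed.com',
  'linkedin.com/jobs',
  'glassdoor.com',
]);
console.log(url);
```

Keeping the builder a pure function makes it easy to check the exact query string before any request is made.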
That part took an afternoon. Then we ran it for real, and the output was junk. Here's why — and the fixes.
Trap 1: site:indeed.com returns category pages, not jobs
The first live run for "Marketing Manager / Leeds" returned "companies" like Email Marketing Leeds and Performance Marketing Leeds Ls10. Those aren't businesses — they're Indeed's category/listing pages (indeed.com/q-email-marketing-l-leeds-jobs.html), which rank brilliantly for SEO and name no single employer.
The fix is to target the posting path, not the board root:
const BOARD_SITES = {
  indeed: 'indeed.com/viewjob',
  linkedin: 'linkedin.com/jobs/view',
  glassdoor: 'glassdoor.com/job-listing',
};
site:linkedin.com/jobs/view "Marketing Manager" "London" returns individual postings whose titles read cleanly, such as "Marketing Manager - Spotify" and "House of CB hiring Marketing Manager". The same query against the board root returns the listing-page noise. It's a one-line change with completely different output quality.
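To make the "titles read cleanly" point concrete, here is a minimal sketch of a parser for the two title shapes quoted above. parseTitle is a hypothetical name and this is not the actor's real parser; real result titles often carry extra suffixes (board names, locations) that it does not handle.

```javascript
// Hypothetical title parser for the two title shapes quoted above.
// Returns { company, role } or null when neither shape matches.
function parseTitle(title) {
  const clean = title.trim();

  // Shape 1: "House of CB hiring Marketing Manager"
  const hiring = clean.match(/^(.+?)\s+hiring\s+(.+)$/i);
  if (hiring) {
    return { company: hiring[1].trim(), role: hiring[2].trim() };
  }

  // Shape 2: "Marketing Manager - Spotify" (spaced hyphen or en dash)
  const parts = clean.split(/\s+[-\u2013]\s+/);
  if (parts.length >= 2 && parts[0] && parts[1]) {
    return { company: parts[1].trim(), role: parts[0].trim() };
  }

  return null; // unknown shape: better to drop than to guess
}

console.log(parseTitle('Marketing Manager - Spotify'));
console.log(parseTitle('House of CB hiring Marketing Manager'));
console.log(parseTitle('Email Marketing Leeds'));
```

Returning null for anything unrecognised means a listing-page title such as "Email Marketing Leeds" is discarded instead of being turned into a company name.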
Trap 2: a Google login link that looked like a job host
An accounts.google.com/ServiceLogin?...continue=...site:indeed.com... URL slipped through and became a "lead." The bug: we were checking whether the job-host string appeared anywhere in the URL, and the search query (with site:indeed.com in it) was echoed inside the continue= parameter.
Fix: match on the parsed host, not a substring of the whole URL.
function hostMatches(url, hosts) {
  const u = new URL(url);
  const host = u.hostname.toLowerCase();
  const hostPath = (host + u.pathname).toLowerCase();
  return hosts.some(h =>
    h.includes('/')
      ? hostPath.includes(h)                    // linkedin.com/jobs/view
      : host === h || host.endsWith(`.${h}`));  // indeed.com
}
Lesson that keeps recurring in scraping: parse the thing, don't substring-match the thing.
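The function above still uses a substring check for path-qualified entries such as linkedin.com/jobs/view. If that ever proves too loose, one possible tightening is to compare host and path separately. The sketch below is a variant under that assumption, not the actor's code: hostMatchesStrict and naiveMatch are hypothetical names, and the login URL is made up purely for illustration.

```javascript
// Naive check (the original bug): the host string anywhere in the URL.
function naiveMatch(url, hosts) {
  return hosts.some((h) => url.includes(h));
}

// Stricter variant: compare host and path separately.
// Assumes each entry is either "host" or "host/path-prefix".
function hostMatchesStrict(url, hosts) {
  let u;
  try {
    u = new URL(url);
  } catch {
    return false; // not an absolute URL, so it cannot be a job link
  }
  const host = u.hostname.toLowerCase();
  const path = u.pathname.toLowerCase();
  return hosts.some((entry) => {
    const slash = entry.indexOf('/');
    const wantHost = (slash === -1 ? entry : entry.slice(0, slash)).toLowerCase();
    const wantPath = slash === -1 ? '' : entry.slice(slash).toLowerCase();
    // Exact host or a subdomain of it, never a bare substring.
    const hostOk = host === wantHost || host.endsWith(`.${wantHost}`);
    return hostOk && path.startsWith(wantPath);
  });
}

const hosts = ['indeed.com', 'linkedin.com/jobs/view'];
// Illustrative URL only: a login redirect that echoes the search query.
const loginUrl =
  'https://accounts.google.com/ServiceLogin?continue=' +
  encodeURIComponent('https://www.google.com/search?q=site:indeed.com');

console.log(naiveMatch(loginUrl, hosts));        // true (the false "lead")
console.log(hostMatchesStrict(loginUrl, hosts)); // false
```

The try/catch is a deliberate choice: a malformed or relative href becomes a rejected link instead of an exception that ends the run.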
Trap 3: Google's near-matches
Searching for "Plumber" surfaced "Solar Installer" and "Cyber Security Architect" postings — Google helpfully returns loosely-related results, and our title parser dutifully extracted those roles as companies.
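The fix is a relevance gate: keep a posting only if its title contains at least one significant word (four or more characters) from the role you searched for. Roles made up only of short words, such as "QA", fall back to matching on any of their words.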
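export function titleMatchesRole(title, role) {
  const t = title.toLowerCase();
  // Split the role into lowercase alphanumeric words.
  const tokens = role.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  // Prefer "significant" words (4+ characters); short roles keep all words.
  const sig = tokens.filter(w => w.length >= 4);
  // Pass if any of those words appears anywhere in the title.
  return (sig.length ? sig : tokens).some(w => t.includes(w));
}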
This sharpened precision dramatically for named professional roles (marketing, sales, ops) — exactly the roles where "you're hiring for this, here's why you might not need to" is a killer opener.
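One trade-off worth noting: because the gate passes on any single matching word, a "Project Manager" title would pass a search for "Marketing Manager". If that matters for your roles, a stricter variant could require every significant word to appear as a whole word in the title. The sketch below is one way to do that; titleMatchesRoleStrict is a hypothetical name, not part of the actor.

```javascript
// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical stricter gate: every significant role word must appear
// as a whole word in the title (not just as a substring).
function titleMatchesRoleStrict(title, role) {
  const titleWords = new Set(tokenize(title));
  const roleWords = tokenize(role);
  // Same "significant word" rule as the original: 4+ characters,
  // falling back to all words for short roles.
  const sig = roleWords.filter((w) => w.length >= 4);
  const required = sig.length ? sig : roleWords;
  return required.length > 0 && required.every((w) => titleWords.has(w));
}

console.log(titleMatchesRoleStrict('House of CB hiring Marketing Manager', 'Marketing Manager')); // true
console.log(titleMatchesRoleStrict('Project Manager - Acme', 'Marketing Manager'));               // false
```

The cost is recall: whole-word matching will miss plural or reworded titles, so which gate to use depends on how noisy the results are for a given role.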
The honest part
Even after all that, company-name extraction from arbitrary job-board titles isn't perfect — Indeed titles especially are inconsistent. So every result carries the jobUrl: one click verifies the company. We say so plainly in the docs rather than pretending the parse is flawless. LinkedIn and Glassdoor titles (Company hiring Role) extract cleanest; Indeed adds breadth.
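Since the output is one lead per hiring company and each lead keeps its jobUrl for verification, the collapsing step might look roughly like this. It is a sketch under assumptions: only jobUrl is a field name taken from this post, while company, role, location, and the dedupeByCompany helper are hypothetical.

```javascript
// Hypothetical posting shape: { company, role, location, jobUrl }.
// Only jobUrl is a field name from the article; the rest are assumed.
function dedupeByCompany(postings) {
  const leads = new Map();
  for (const p of postings) {
    // Normalise case and whitespace so "Spotify" and " spotify " collapse.
    const key = (p.company || '').trim().toLowerCase().replace(/\s+/g, ' ');
    if (!key || !p.jobUrl) continue; // no link means nothing to verify
    if (!leads.has(key)) leads.set(key, p); // keep the first posting seen
  }
  return [...leads.values()];
}

// Made-up sample data using placeholder URLs.
const sample = [
  { company: 'Spotify', role: 'Marketing Manager', location: 'London', jobUrl: 'https://example.com/job/1' },
  { company: ' spotify ', role: 'Marketing Manager', location: 'London', jobUrl: 'https://example.com/job/2' },
  { company: 'House of CB', role: 'Marketing Manager', location: 'London', jobUrl: 'https://example.com/job/3' },
  { company: 'No Link Ltd', role: 'Marketing Manager', location: 'London', jobUrl: '' },
];
console.log(dedupeByCompany(sample).map((lead) => lead.jobUrl));
```

Dropping postings without a jobUrl keeps the promise that every lead can be checked in one click.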
Optional last step: flip on findEmails and, for each distinctively named company, the actor finds a decision-maker from public LinkedIn results and looks up a verified work email via your own Prospeo key. We gate that to distinctive company names: running an email lookup on a vague extracted name ("Delivery & Digital") just matches a random person at the wrong company, and a confidently wrong email is worse than none.
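How "distinctive" is decided isn't spelled out here, so the following is only one plausible heuristic, not the actor's actual rule: treat a name as vague when every word in it is a generic business term. Both the word list and the isDistinctiveName helper are assumptions for illustration.

```javascript
// Assumed list of words too generic to identify a company on their own.
const GENERIC_WORDS = new Set([
  'delivery', 'digital', 'marketing', 'sales', 'services', 'solutions',
  'group', 'consulting', 'recruitment', 'agency', 'and', 'the', 'of',
]);

// Hypothetical gate: a name is distinctive if at least one of its words
// is not on the generic list.
function isDistinctiveName(company) {
  const words = company.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return words.some((w) => !GENERIC_WORDS.has(w));
}

console.log(isDistinctiveName('Delivery & Digital')); // false
console.log(isDistinctiveName('Spotify'));            // true
```

A gate like this errs on the side of skipping the lookup, which fits the reasoning above: a missing email costs less than a confidently wrong one.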
Try it
It's live on the Apify Store, pay-per-result: Hiring Intent Lead Finder. Point it at a role + city and you'll get a graded list of companies with a live buying signal.
It's one piece of a bigger thing we're building — SignalEngine, agentic outbound that discovers, enriches, and emails leads autonomously. The hiring finder is a taste of the discovery layer.
If you'd rather find which local businesses are leaking leads than who's hiring, we shipped a sibling actor for that too — Local Business Website Audit grades a homepage's lead-capture (contact form, click-to-call, chat, booking) and hands back the weak ones as a prospect list.
Building these in public — next up is pushing them toward Apify Rising Stars. The recurring lesson across all of them: reaching the data is easy; the entire game is in how honestly you parse it.