DEV Community

Shoogar


I automated my job search pipeline — here's what the architecture looks like


A few months ago I was spending about an hour every morning doing the same thing: opening Glassdoor, LinkedIn, Indeed, and a couple of niche boards; filtering by role and location; skimming postings to see if they were actually relevant; then closing 80% of them because they weren't. And that was before writing a single word of a cover letter.

I'm a developer. I automate repetitive tasks. So I started pulling on that thread.

The core problem is signal-to-noise

The job board aggregation problem looks simple from the outside: fetch postings, deduplicate, display. But the part that actually wastes your time isn't the volume; it's that every posting looks plausible until you read it. A "senior fullstack developer" role might require 10 years of Salesforce experience, buried in paragraph four, and you only find that out after reading the whole thing.

The fix I landed on was using the job description text itself as an input to an AI relevance score before surfacing the result to the user. You upload your resume once. Every posting gets scored against it before you see it. Jobs scoring below a threshold get filtered out entirely.

The scoring prompt matters a lot here. You can't just ask "is this relevant?" and get useful output. I structure it as: role alignment (does the job title and core responsibilities match the candidate's experience), skills gap (are there hard requirements the resume clearly doesn't meet), seniority fit, and location/remote compatibility. Each dimension contributes to a composite score, and the model returns a short explanation alongside the number so you can sanity-check its reasoning.
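A minimal sketch of how the four dimensions could be requested and combined is below. The 0-100 scale, the weights, and the JSON reply shape are assumptions for illustration, not the actual prompt; the call to the model itself is left out, so `composite_score` only parses a reply string.

```python
import json

# Hypothetical weights for the four dimensions; they sum to 1.0.
WEIGHTS = {
    "role_alignment": 0.35,
    "skills_gap": 0.30,
    "seniority_fit": 0.20,
    "location_fit": 0.15,
}

# Doubled braces are literal braces after str.format().
PROMPT_TEMPLATE = """Score how well this job posting fits the candidate's resume.
Rate each dimension from 0 to 100:
- role_alignment: do the job title and core responsibilities match the candidate's experience?
- skills_gap: 100 means no hard requirement is clearly unmet, 0 means many are.
- seniority_fit: does the required seniority match the candidate's level?
- location_fit: is the location or remote policy compatible?
Reply with JSON only: {{"role_alignment": n, "skills_gap": n, "seniority_fit": n, "location_fit": n, "explanation": "..."}}

RESUME:
{resume}

JOB POSTING:
{job}
"""


def build_prompt(resume: str, job: str) -> str:
    return PROMPT_TEMPLATE.format(resume=resume, job=job)


def composite_score(model_reply: str) -> tuple[float, str]:
    """Parse the model's JSON reply and combine the dimensions into one weighted score."""
    data = json.loads(model_reply)
    score = sum(weight * float(data[dim]) for dim, weight in WEIGHTS.items())
    return round(score, 1), data.get("explanation", "")


reply = '{"role_alignment": 80, "skills_gap": 60, "seniority_fit": 90, "location_fit": 100, "explanation": "Stack matches; no Kafka experience."}'
score, why = composite_score(reply)
print(score, why)  # 79.0 Stack matches; no Kafka experience.
```

Asking for per-dimension numbers and doing the weighting in code, rather than asking the model for one number, keeps the weights easy to tune and lets you compare the explanation against the breakdown.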

That alone cuts the time spent reading irrelevant postings by a significant margin.

Multi-source scraping without getting rate-limited

The second piece is source aggregation. Different job boards serve different audiences: Adzuna works well for broad discovery, Job Bank Canada for government and public-sector roles, and We Work Remotely and Himalayas for remote engineering roles. Querying every board on every search would be too slow, so the approach is:

  • Detect country from IP and profile settings at search time
  • Select the relevant source subset automatically
  • Fan out requests with per-domain rate limiting and caching
  • Normalize the response format before AI scoring
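The fan-out step with per-domain rate limiting and caching could be sketched with `asyncio` as follows. This assumes a fixed minimum interval between requests to the same domain and a simple in-memory cache with a time-to-live; `fake_fetch` is a hypothetical stand-in for a real HTTP client call, and the URLs are placeholders.

```python
import asyncio
import time
from typing import Optional
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforces a minimum interval between requests that hit the same domain."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._locks: dict[str, asyncio.Lock] = {}
        self._last_request: dict[str, float] = {}

    async def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        lock = self._locks.setdefault(domain, asyncio.Lock())
        async with lock:  # requests to one domain queue up; other domains proceed in parallel
            last = self._last_request.get(domain)
            if last is not None:
                remaining = self.min_interval - (time.monotonic() - last)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last_request[domain] = time.monotonic()


class TTLCache:
    """In-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is not None and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request to a job board API.
    await asyncio.sleep(0.01)
    return f"results from {url}"


async def fetch_all(urls: list[str], limiter: DomainRateLimiter, cache: TTLCache) -> list[str]:
    async def fetch_one(url: str) -> str:
        cached = cache.get(url)
        if cached is not None:
            return cached  # cache hit: no request and no rate-limit wait
        await limiter.wait(url)
        result = await fake_fetch(url)
        cache.set(url, result)
        return result

    # Fan out: all URLs are fetched concurrently; gather preserves input order.
    return await asyncio.gather(*(fetch_one(u) for u in urls))


urls = [
    "https://board-a.example/search?q=python",
    "https://board-a.example/search?q=go",
    "https://board-b.example/jobs?q=python",
]
limiter = DomainRateLimiter(min_interval=0.5)
cache = TTLCache(ttl=600)
results = asyncio.run(fetch_all(urls, limiter, cache))
```

Here the two `board-a` requests are spaced at least half a second apart while the `board-b` request runs alongside them, and repeating the same search within ten minutes is served from the cache.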

The normalization step is tedious but necessary. Every board returns slightly different field names, date formats, and description structures. A unified schema (title, company, location, posted_at, description, source_url) makes downstream processing consistent.
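A minimal sketch of that normalization layer, assuming two hypothetical boards with made-up field names (real board APIs will differ): each source gets its own small mapping function into the shared schema, and timestamps are converted to timezone-aware UTC datetimes so postings from different sources can be compared and sorted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Posting:
    """Unified schema every source is mapped into before AI scoring."""
    title: str
    company: str
    location: str
    posted_at: datetime  # always timezone-aware UTC
    description: str
    source_url: str


def from_board_a(raw: dict) -> Posting:
    # Hypothetical source A: ISO 8601 timestamps with a trailing "Z", nested company object.
    return Posting(
        title=raw["job_title"].strip(),
        company=raw["org"]["name"],
        location=raw["place"],
        posted_at=datetime.fromisoformat(raw["date_posted"].replace("Z", "+00:00")),
        description=raw["text"],
        source_url=raw["link"],
    )


def from_board_b(raw: dict) -> Posting:
    # Hypothetical source B: Unix epoch seconds, flat fields, empty location for remote roles.
    return Posting(
        title=raw["position"].strip(),
        company=raw["employer"],
        location=raw.get("location") or "Remote",
        posted_at=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        description=raw["body"],
        source_url=raw["url"],
    )


# One entry per source; adding a board means adding one mapping function.
NORMALIZERS = {"board_a": from_board_a, "board_b": from_board_b}


def normalize(source: str, raw: dict) -> Posting:
    return NORMALIZERS[source](raw)
```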

One thing I got wrong initially: I tried to scrape employer career pages directly via a generic crawler. That's a maintenance nightmare. Every site has different markup, anti-bot measures, and update cadence. I pulled back from that for most sources and kept it selective.

The resume generation problem

This is where I spent most of my time, and honestly where the interesting technical work is.

The problem with a generic resume isn't that it's bad; it's that ATS systems run keyword matching before a human ever reads the document. A strong candidate whose resume says "built" where the job description says "developed" might score lower than a weaker candidate who happened to use the right synonyms. That's the system working as designed, which is frustrating but real.

The approach I use is full job description ingestion: not just the title, but the complete text including requirements, responsibilities, and preferred qualifications. I extract keyword patterns from it (hard skills, tools, methodologies, verb patterns) and use that as a constraint layer when generating the resume and cover letter. The model isn't rewriting your experience; it's selecting which aspects of your real background to emphasize based on what this specific role asks for.
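A rough sketch of the keyword-extraction step, assuming a fixed vocabulary of known terms and a hypothetical per-section weighting in which requirements count more than preferred qualifications. A production version would need much larger curated lists (or an NLP model) and would handle multi-word terms, which this simple tokenizer does not.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny vocabularies. Verb forms are listed explicitly; no stemming.
HARD_SKILLS = {"python", "typescript", "sql", "kubernetes", "terraform", "react"}
ACTION_VERBS = {"build", "built", "design", "designed", "develop", "developed", "maintain", "maintained"}

# Hypothetical weighting: hard requirements matter more than nice-to-haves.
SECTION_WEIGHTS = {"requirements": 2.0, "responsibilities": 1.5, "preferred": 1.0}


def extract_keywords(sections: dict[str, str]) -> dict[str, float]:
    """Return {term: weight}, adding the section's weight for every occurrence of a known term."""
    weights = Counter()
    for section, text in sections.items():
        factor = SECTION_WEIGHTS.get(section, 1.0)
        for token in re.findall(r"[a-z0-9+#]+", text.lower()):
            if token in HARD_SKILLS or token in ACTION_VERBS:
                weights[token] += factor
    return dict(weights)


jd = {
    "requirements": "5+ years of Python and SQL. You have developed services on Kubernetes.",
    "responsibilities": "Design and maintain Python APIs.",
    "preferred": "Terraform is a plus.",
}
keywords = extract_keywords(jd)
```

With this input, "python" ends up as the heaviest term (it appears in both requirements and responsibilities) and "terraform" the lightest, since it only appears under preferred qualifications.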

The implementation is roughly: parse JD → extract weighted keywords → retrieve candidate experience bullets from profile → rank bullets by keyword overlap → regenerate summary and skills sections with mirrored language → generate cover letter grounded in the same keyword set.
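The "rank bullets by keyword overlap" step could look roughly like this. It assumes the keywords arrive as a mapping from term to weight; the bullets and weights in the example are made up, and the later regeneration steps (summary, skills, cover letter) are not shown.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def rank_bullets(bullets: list[str], keywords: dict[str, float], top_n: int = 3) -> list[tuple[float, str]]:
    """Score each experience bullet by the summed weight of the JD keywords it contains."""
    scored = []
    for bullet in bullets:
        present = tokens(bullet)
        score = sum(weight for term, weight in keywords.items() if term in present)
        scored.append((score, bullet))
    # sort() is stable, so tied bullets keep their original profile order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


keywords = {"python": 3.5, "sql": 2.0, "kubernetes": 2.0, "developed": 1.5}
bullets = [
    "Built a React dashboard for internal metrics",
    "Developed Python ETL jobs loading data into SQL warehouses",
    "Migrated legacy services to Kubernetes",
]
top = rank_bullets(bullets, keywords, top_n=2)
```

The first bullet scores zero even if the work was relevant, because exact matching only sees "built", not "developed". Ranking selects from real bullets rather than inventing new ones, which matches the idea of emphasizing parts of an existing background.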

You can validate that this works by pasting the output back into a resume scoring tool and checking the keyword match rate. It's not gaming the system; it's speaking the system's language.
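A simple way to approximate that check locally is a weighted match rate: the share of keyword weight whose terms appear verbatim in the resume. This is an assumption about what such tools measure; commercial ATS and resume scanners use their own methods, which may differ.

```python
import re


def match_rate(resume_text: str, keywords: dict[str, float]) -> float:
    """Weighted share (0.0 to 1.0) of JD keywords that appear verbatim in the resume."""
    present = set(re.findall(r"[a-z0-9+#]+", resume_text.lower()))
    total = sum(keywords.values())
    if total == 0:
        return 0.0
    matched = sum(weight for term, weight in keywords.items() if term in present)
    return matched / total


keywords = {"python": 3.0, "developed": 1.0}
print(match_rate("Built Python services for payments", keywords))      # 0.75
print(match_rate("Developed Python services for payments", keywords))  # 1.0
```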

What I actually built

This became scourr. It handles multi-board discovery, AI relevance scoring, and ATS-optimized generation end to end. The free tier gives you 3 application generations a day with no credit card required. Paid tiers use pay-as-you-go credits that don't expire.

It's not going to replace every part of your job search. Networking still matters. Referrals still get you further. But the mechanical part (scanning boards, reading irrelevant postings, spending 45 minutes tailoring a resume for a job you're not sure about) can be automated.

A few things I'd do differently

If I were building this again, I'd spend more time on the relevance scoring calibration earlier. The first version had too coarse a threshold and filtered out some genuinely good matches. The model also occasionally hallucinates seniority mismatches when job descriptions are poorly written, which produces confusing explanations. I added a "why this was flagged" display so users can override the score when the reasoning is clearly wrong.
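One way to model the override, as a sketch: the user's choice, when present, takes precedence over the score, and the model's explanation is stored alongside it so it can be displayed. The field names, the 0-100 scale, and the three-state override are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredJob:
    job_id: str
    score: float                          # composite relevance score (assumed 0-100)
    explanation: str                      # the model's "why this was flagged" text
    user_override: Optional[bool] = None  # True: always show, False: always hide, None: defer to score


def is_visible(job: ScoredJob, threshold: float) -> bool:
    """An explicit user decision beats the model's score; otherwise apply the threshold."""
    if job.user_override is not None:
        return job.user_override
    return job.score >= threshold


jobs = [
    ScoredJob("a1", 38.0, "Seniority mismatch: posting seems to want a staff engineer"),
    ScoredJob("b2", 74.0, "Strong skills overlap"),
]
jobs[0].user_override = True  # the user read the explanation and judged it wrong
visible = [j.job_id for j in jobs if is_visible(j, threshold=60.0)]
```

Keeping the override separate from the score also leaves the original model output intact, so overridden cases can later be reviewed when recalibrating the threshold.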

The cover letter generation is still the weakest part. It's good enough to use, but cover letters that sound generated are easy to spot. I've been iterating on prompts that produce more specific, less formulaic output, but it's harder than the resume problem.

The other thing I underestimated was how much the source selection matters by region. The "just use LinkedIn" approach misses a huge portion of the market outside the US, particularly for government, non-profit, and regional employers. Building the source routing properly was worth the extra complexity.
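A minimal sketch of that routing, assuming a static table keyed by ISO country code and assuming the profile setting takes precedence over the IP-derived country. The source identifiers are illustrative labels, not real API names.

```python
from typing import Optional

# Hypothetical routing table; identifiers are illustrative labels, not API names.
BROAD_SOURCES = ["adzuna"]
REMOTE_SOURCES = ["weworkremotely", "himalayas"]
REGIONAL_SOURCES = {
    "CA": ["jobbank_canada"],  # government and public-sector coverage for Canada
}


def select_sources(profile_country: Optional[str], ip_country: Optional[str], wants_remote: bool) -> list[str]:
    """Pick the subset of boards to query for one search."""
    # Assumption: an explicit profile setting wins over the IP-derived country.
    country = (profile_country or ip_country or "").upper()
    sources = BROAD_SOURCES + REGIONAL_SOURCES.get(country, [])  # new list; globals stay untouched
    if wants_remote:
        sources += REMOTE_SOURCES
    return sources
```

With routing in a table, covering a new region means adding an entry rather than changing the search code.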

If you've built anything similar or have opinions on the ATS parsing side, I'd be interested to hear it.
