The Problem
Job hunting in 2026 is mostly busywork:
- Hours scrolling LinkedIn and job portals
- Customizing the same resume 30+ times (same content, different keywords)
- Losing track of applications
- Missing JD keywords that matter for ATS systems
- Cold outreach skipped because researching each company takes too long
I built a tool to automate all of it.
What It Does
Scrape jobs → Filter → Score (AI) → Tailor resume (AI) → Generate PDF → Export Excel → Open top URLs
One command runs the whole pipeline:
python -m src.main --keywords "React Developer" --location "India" --pages 2
How It Works
1. Scraping
Four scrapers (LinkedIn, Internshala, Naukri, Indeed) with a shared base class handling:
- Rotating user agents via fake-useragent
- Random 1.5-3.5s delays between requests
- Full job description fetching from detail pages
- Rate limiting to avoid getting blocked
LinkedIn uses their public guest API, Internshala parses HTML, Naukri tries JSON API then falls back to HTML.
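The shared base class boils down to two habits: pick a fresh user agent per request and sleep a random interval between requests. Here's a minimal sketch of that idea (class and constant names are my own; a static UA list stands in for fake-useragent to keep the snippet self-contained):

```python
import random
import time

# Static stand-in for fake-useragent's rotating pool
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

class BaseScraper:
    MIN_DELAY, MAX_DELAY = 1.5, 3.5  # seconds between requests

    def _headers(self) -> dict:
        # Rotate user agents so consecutive requests look distinct
        return {"User-Agent": random.choice(USER_AGENTS)}

    def _throttle(self) -> None:
        # Random jitter makes the request pattern less bot-like
        time.sleep(random.uniform(self.MIN_DELAY, self.MAX_DELAY))
```

Each site-specific scraper then only has to implement the parsing, calling `_headers()` and `_throttle()` around every fetch.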
2. Filtering
Drops jobs that don't meet your criteria:
- Salary below threshold (parses ₹ 4,50,000 - 6,00,000 → 6.0 LPA)
- Internships (--exclude-internships by default)
- Tags product companies (Razorpay, Swiggy, CRED, etc.) vs service (TCS, Infosys)
3. Scoring with AI
Every job gets sent to Claude with your resume:
{
"match_score": 87,
"matched_keywords": ["React", "Redux", "WebSocket"],
"missing_keywords": ["AWS", "Kubernetes"],
"recommendation": "Strong Match",
"reason": "4+ years React experience matches the 3+ requirement..."
}
Two backends:
- Anthropic SDK (if you have an API key)
- Claude Code CLI via subprocess (if you have Claude Code installed — no API key needed)
4. Resume Tailoring
For top matches (≥70/100), Claude rewrites your resume:
- Reorders skills by JD relevance
- Rewrites bullets using EXACT JD vocabulary
- Injects core_competencies badges for the modern PDF style
- Validates structure: can't fabricate skills, can't change company names/dates
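The validation step is what keeps the AI honest: immutable facts in the original resume must survive tailoring untouched. A minimal sketch of that check (field names like `experience` and `dates` are assumptions, not the project's actual schema):

```python
def validate_tailored(original: dict, tailored: dict) -> bool:
    """Reject AI output that alters immutable resume facts."""
    for orig_job, new_job in zip(original["experience"], tailored["experience"]):
        # Company names and employment dates must match exactly
        if orig_job["company"] != new_job["company"]:
            return False
        if orig_job["dates"] != new_job["dates"]:
            return False
    # Tailored skills may be reordered but never invented
    return set(tailored["skills"]) <= set(original["skills"])
```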
5. PDF Generation
Two styles:
- Classic: fpdf2, system fonts (Georgia/Times fallback). ATS-safe, plain, small file size.
- Modern: HTML + Playwright Chromium. Space Grotesk + DM Sans fonts, cyan-purple gradient header, keyword badges.
6. Excel Export
4-sheet workbook with openpyxl:
- All Jobs: sorted by score, color-coded (green ≥70, yellow 40-69, red <40), clickable URLs
- Top Matches: ≥70 only, product companies highlighted
- Cold Outreach: 20 curated product companies (Razorpay, CRED, Postman, BrowserStack, etc.) with LinkedIn links
- Summary: stats, salary distribution, skill gaps
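The color coding in the All Jobs sheet is a straight score-to-fill mapping. A sketch of the helper (hex values are illustrative of the green/yellow/red bands, not necessarily the exact shades used):

```python
def score_color(score: int) -> str:
    """Map a match score to a fill color hex for openpyxl's PatternFill."""
    if score >= 70:
        return "C6EFCE"  # green: strong match
    if score >= 40:
        return "FFEB9C"  # yellow: partial match
    return "FFC7CE"      # red: weak match
```

In openpyxl this would feed something like `PatternFill(start_color=score_color(s), fill_type="solid")` applied per row.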
7. One-Click Apply
--open-top 10 opens the top 10 job URLs in your browser. Zero friction to apply.
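Under the hood this needs nothing more than the standard library. A sketch of what `--open-top` amounts to (function and field names are my own):

```python
import webbrowser

def open_top(jobs: list[dict], n: int = 10) -> list[str]:
    """Open the n highest-scoring job URLs in the default browser."""
    top = sorted(jobs, key=lambda j: j["match_score"], reverse=True)[:n]
    for job in top:
        webbrowser.open(job["url"])  # fires one tab per job
    return [j["url"] for j in top]
```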
8. Scheduled Scans
--schedule daily sets up a Windows Task Scheduler entry or crontab line. Fresh postings get fewer applicants — first-mover advantage.
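On the crontab side, the installed entry might look something like this (illustrative only; the actual schedule, path, and flags depend on your setup):

```
# Run the pipeline every morning at 9:00 (hypothetical crontab line)
0 9 * * * cd ~/ai-job-pipeline && python -m src.main --keywords "React Developer" --location "India"
```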
What Works, What Doesn't
Works reliably:
- LinkedIn (guest API is stable)
- Internshala (HTML scraping, 40 jobs/page)
- All AI-powered parts (with API key or Claude Code CLI)
Known limitations:
- Naukri often returns 400 (their API is inconsistent)
- Indeed returns 403 (Cloudflare anti-bot)
- Requests + BS4 can't render JS-heavy pages (would need Playwright scrapers — contributions welcome)
Try It
git clone https://github.com/parth-r-parmar/ai-job-pipeline
cd ai-job-pipeline
pip install -r requirements.txt
# Edit resume.json with your data
python -m src.main --keywords "React Developer" --location "India"
Results end up in output/.
Stack
Python 3.10+ · requests · BeautifulSoup · openpyxl · fpdf2 · Playwright · Anthropic SDK · Claude Code CLI
What's Next
- Playwright-based scrapers for Naukri/Indeed
- LinkedIn detail pages (currently only getting listing-level data)
- Email digest for scheduled runs
- Optional browser extension for one-click scraping
MIT licensed. Feedback, issues, and PRs welcome.