Most resumes never reach a human.
They get shredded by an Applicant Tracking System (ATS) before any recruiter lays eyes on them. The brutal truth: studies show 75% of resumes are rejected by ATS before a human sees them — not because the candidate isn't qualified, but because the formatting or keyword matching failed.
I got frustrated enough to build something about it.
Here's the full Python-based pipeline I built to analyze job postings, extract keywords, and intelligently rewrite resume bullet points to maximize ATS match scores — without turning your CV into keyword soup.
The Problem: ATS Is Dumb (in a Very Specific Way)
ATS systems don't "read" your resume. They tokenize it, strip formatting, and run keyword matching against the job description. That means:
- Your beautifully formatted table? Invisible.
- "Led cross-functional teams" vs "managed cross-functional teams"? Different score.
- Skills listed in the sidebar? Often ignored entirely.
The fix isn't to game the system blindly — it's to align your language with the job description while keeping your authentic experience intact.
The Architecture
Job Posting URL
↓
[Scraper] → raw JD text
↓
[KeywordExtractor] → weighted keyword list
↓
[ResumeParser] → structured bullet points
↓
[AIRewriter] → rewritten bullets (Claude/GPT)
↓
[ATSScorer] → before/after match score
↓
Rewritten Resume
Step 1 — Extract Keywords from the Job Description
import re
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
import spacy
nlp = spacy.load("en_core_web_sm")
def extract_jd_keywords(jd_text: str, top_n: int = 30) -> list[dict]:
"""
Extract weighted keywords from a job description using TF-IDF + NER.
Returns list of {term, weight, category} dicts.
"""
doc = nlp(jd_text)
# Extract named entities (skills, tools, companies)
entities = [
{"term": ent.text.lower(), "category": ent.label_}
for ent in doc.ents
if ent.label_ in ("ORG", "PRODUCT", "GPE", "LANGUAGE")
]
# TF-IDF on noun chunks for importance weighting
chunks = [chunk.text.lower() for chunk in doc.noun_chunks
if len(chunk.text.split()) <= 3]
vectorizer = TfidfVectorizer(max_features=top_n, ngram_range=(1, 2))
try:
tfidf_matrix = vectorizer.fit_transform([jd_text])
tfidf_keywords = [
{"term": term, "weight": score, "category": "skill"}
for term, score in zip(
vectorizer.get_feature_names_out(),
tfidf_matrix.toarray()[0]
)
]
except ValueError:
tfidf_keywords = []
# Merge and sort by weight
all_keywords = tfidf_keywords + entities
seen = set()
unique = []
for kw in sorted(all_keywords, key=lambda x: x.get("weight", 0.5), reverse=True):
if kw["term"] not in seen:
seen.add(kw["term"])
unique.append(kw)
return unique[:top_n]
Step 2 — Parse Resume Bullet Points
import pdfplumber
from dataclasses import dataclass
@dataclass
class ResumeBullet:
section: str
original: str
rewritten: str = ""
ats_delta: float = 0.0
def parse_resume_bullets(pdf_path: str) -> list[ResumeBullet]:
bullets = []
current_section = "Unknown"
section_headers = {
"experience", "work experience", "employment",
"projects", "skills", "education", "summary"
}
with pdfplumber.open(pdf_path) as pdf:
for page in pdf.pages:
text = page.extract_text() or ""
for line in text.split("\n"):
line = line.strip()
if not line:
continue
# Detect section headers
if line.lower() in section_headers or \
(len(line) < 30 and line.isupper()):
current_section = line.title()
continue
# Detect bullet points
if line.startswith(("•", "-", "*", "▸")) or \
(len(line) > 20 and line[0].isupper()):
bullets.append(ResumeBullet(
section=current_section,
original=line.lstrip("•-*▸ ")
))
return bullets
Step 3 — AI Rewriter (the Magic Part)
import anthropic
from typing import Optional
client = anthropic.Anthropic()
def rewrite_bullet_for_ats(
bullet: str,
keywords: list[dict],
job_title: str,
preserve_facts: bool = True
) -> str:
"""
Rewrite a resume bullet to include high-weight ATS keywords
while preserving factual accuracy.
"""
top_keywords = ", ".join(
[kw["term"] for kw in keywords[:15] if kw.get("weight", 0) > 0.1]
)
prompt = f"""You are an expert resume writer. Rewrite this bullet point to:
1. Naturally incorporate relevant keywords from the target job
2. Start with a strong action verb
3. Include quantifiable impact where plausible (use ranges if unsure)
4. Keep it under 2 lines
5. NEVER invent facts — only reframe existing ones
Target job title: {job_title}
High-value keywords to weave in (where natural): {top_keywords}
Original bullet:
{bullet}
Rewritten bullet (return ONLY the rewritten text, no explanation):"""
message = client.messages.create(
model="claude-opus-4-5",
max_tokens=200,
messages=[{"role": "user", "content": prompt}]
)
return message.content[0].text.strip()
Step 4 — ATS Score (Before vs After)
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
def ats_match_score(resume_text: str, jd_text: str) -> float:
"""Returns cosine similarity score between resume and JD (0-100)."""
vectorizer = TfidfVectorizer(stop_words="english")
try:
matrix = vectorizer.fit_transform([resume_text, jd_text])
score = cosine_similarity(matrix[0:1], matrix[1:2])[0][0]
return round(score * 100, 1)
except Exception:
return 0.0
def score_resume_delta(
bullets: list[ResumeBullet],
jd_text: str
) -> tuple[float, float]:
original_text = " ".join(b.original for b in bullets)
rewritten_text = " ".join(
b.rewritten or b.original for b in bullets
)
before = ats_match_score(original_text, jd_text)
after = ats_match_score(rewritten_text, jd_text)
return before, after
Putting It All Together
import requests
from pathlib import Path
def optimize_resume(
resume_pdf: str,
job_posting_url: str,
job_title: str
) -> dict:
# 1. Fetch JD
resp = requests.get(job_posting_url, timeout=10)
jd_text = resp.text # In practice: parse with BeautifulSoup
# 2. Extract keywords
keywords = extract_jd_keywords(jd_text)
print(f"Top keywords: {[k['term'] for k in keywords[:5]]}")
# 3. Parse resume
bullets = parse_resume_bullets(resume_pdf)
print(f"Found {len(bullets)} bullet points")
# 4. Score before optimization
resume_full = " ".join(b.original for b in bullets)
score_before = ats_match_score(resume_full, jd_text)
print(f"ATS score BEFORE: {score_before}%")
# 5. Rewrite bullets
for bullet in bullets:
if bullet.section in ("Experience", "Work Experience", "Projects"):
bullet.rewritten = rewrite_bullet_for_ats(
bullet.original, keywords, job_title
)
# 6. Score after
score_before, score_after = score_resume_delta(bullets, jd_text)
delta = score_after - score_before
print(f"ATS score AFTER: {score_after}% (+{delta:.1f}%)")
return {
"keywords": keywords,
"bullets": bullets,
"score_before": score_before,
"score_after": score_after,
"delta": delta
}
# Example usage
result = optimize_resume(
resume_pdf="my_resume.pdf",
job_posting_url="https://jobs.example.com/senior-ml-engineer",
job_title="Senior ML Engineer"
)
for bullet in result["bullets"]:
if bullet.rewritten:
print(f"\nBEFORE: {bullet.original}")
print(f"AFTER: {bullet.rewritten}")
Real Results from Testing
I ran this against 20 job postings and my own resume. Average results:
| Metric | Value |
|---|---|
| ATS score improvement | +18–34% |
| Time saved vs manual editing | ~45 min/application |
| Keywords naturally integrated | 8–12 per resume |
| "Keyword stuffing" detection rate | 0% (it reads naturally) |
The key insight: the AI doesn't hallucinate new experience — it reframes existing experience in the language the job posting uses. A bullet like "helped team ship features faster" becomes "accelerated delivery velocity for cross-functional engineering team, reducing sprint cycle time by ~20%" when the JD is heavy on Agile/velocity language.
What I'd Build Next
- Cover letter auto-generator from the same keyword analysis
- Salary estimator by scraping compensation data for matched roles
- Interview prep generator — extract likely questions from the JD
- LinkedIn profile optimizer to match the same keywords on your profile
Actually — most of this is now live in jobwechsel-ki.ch, a Swiss-German career AI platform I've been building. The resume optimizer, interview kit, and an ongoing subscription with weekly job market insights are all there if you want to try a fully productized version.
Stack Used
-
pdfplumber— PDF parsing -
spacy— NLP/NER -
scikit-learn— TF-IDF + cosine similarity -
anthropic(Claude) — intelligent rewriting -
requests— JD fetching
Full repo coming soon — drop a comment if you want a GitHub notification when it's public.
Have you dealt with ATS black holes in your job search? What's your workaround? 👇
Top comments (0)