Ever wonder why you keep applying to jobs and hearing nothing back?
I built an AI system that reads job descriptions the way a hiring manager does — and the results were eye-opening. Here's the full technical breakdown.
The Problem
Job seekers spend 3-4 hours per week on average crafting applications. Most get zero response. The usual advice ("tailor your resume!") is correct but vague. Nobody tells you how to tailor it in a way that actually works.
I wanted to build something that would:
- Parse a job description semantically, not just keyword-match
- Identify the actual requirements vs. the wish-list padding
- Score a candidate's profile against those real requirements
- Generate targeted rewrite suggestions
Architecture Overview
```text
Job Description (raw text)
        ↓
[spaCy NLP pipeline]
        ↓
Entity extraction (skills, years, tools, soft skills)
        ↓
[Embedding layer — sentence-transformers]
        ↓
Semantic skill graph
        ↓
[Scoring engine]
        ↓
Gap analysis + rewrite suggestions
```
Step 1: Parsing the Job Description
Most job descriptions are noisy. They're padded with corporate boilerplate, legal disclaimers, and aspirational fluff. The signal-to-noise ratio is terrible.
```python
import spacy
import numpy as np
from sentence_transformers import SentenceTransformer

nlp = spacy.load("en_core_web_lg")
model = SentenceTransformer("all-MiniLM-L6-v2")

def parse_jd(text: str) -> dict:
    """Extract requirement sentences from a job description and embed them."""
    doc = nlp(text)
    requirements = []
    for sent in doc.sents:
        sent_lower = sent.text.lower()
        # Keep sentences containing a requirement cue phrase.
        if any(kw in sent_lower for kw in [
            "required", "must have", "you will", "you should",
            "experience with", "proficient", "strong knowledge",
        ]):
            requirements.append(sent.text.strip())
    embeddings = model.encode(requirements)
    return {
        "requirements": requirements,
        "embeddings": embeddings,
        # Content-bearing tokens: nouns and proper nouns, stop words removed.
        "raw_tokens": [
            token.text for token in doc
            if not token.is_stop and token.pos_ in ("NOUN", "PROPN")
        ],
    }
```
Step 2: Distinguishing Must-Have vs. Nice-to-Have
This is where it gets interesting. Most job descriptions mix hard requirements with aspirational padding. A model that treats them equally will penalize candidates for not having optional skills.
```python
import re

# Phrases that signal a hard requirement.
MUST_PATTERNS = [
    r"\brequired\b", r"\bmust have\b", r"\bessential\b",
    r"\bminimum.*years\b", r"\bdemonstrated experience\b",
]

# Phrases that signal an optional, nice-to-have skill.
NICE_PATTERNS = [
    r"\bpreferred\b", r"\bplus\b", r"\bbonus\b",
    r"\bnice to have\b", r"\badvantage\b", r"\bideal\b",
]

def classify_requirement(sentence: str) -> str:
    """Label a requirement as must_have, nice_to_have, or implied."""
    s = sentence.lower()
    for pat in MUST_PATTERNS:
        if re.search(pat, s):
            return "must_have"
    for pat in NICE_PATTERNS:
        if re.search(pat, s):
            return "nice_to_have"
    # No explicit marker: treat it as an implied requirement.
    return "implied"
```
Finding: ~60% of sentences in job descriptions are unlabeled "implied" requirements. Most candidate scoring tools miss these entirely.
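To make the three buckets concrete, here is the classifier run on three sentences typical of a real JD. The patterns are repeated from above so the snippet runs on its own; the example sentences are illustrative, not taken from a specific posting.

```python
import re

MUST_PATTERNS = [
    r"\brequired\b", r"\bmust have\b", r"\bessential\b",
    r"\bminimum.*years\b", r"\bdemonstrated experience\b",
]
NICE_PATTERNS = [
    r"\bpreferred\b", r"\bplus\b", r"\bbonus\b",
    r"\bnice to have\b", r"\badvantage\b", r"\bideal\b",
]

def classify_requirement(sentence: str) -> str:
    s = sentence.lower()
    if any(re.search(p, s) for p in MUST_PATTERNS):
        return "must_have"
    if any(re.search(p, s) for p in NICE_PATTERNS):
        return "nice_to_have"
    return "implied"

print(classify_requirement("A minimum of 5 years of Python is required."))  # must_have
print(classify_requirement("Kubernetes experience is a plus."))             # nice_to_have
print(classify_requirement("You will own our data pipeline end to end."))   # implied
```

The third sentence is exactly the kind that falls through both pattern lists: it is clearly a responsibility the employer expects, but nothing in the wording flags it, so it lands in the "implied" bucket.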
Step 3: Candidate Profile Embedding
We embed the candidate's resume summary and skills list the same way:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def embed_candidate(resume_text: str) -> np.ndarray:
    """Embed each substantive resume sentence (crude split on periods)."""
    sentences = [s.strip() for s in resume_text.split(".") if len(s.strip()) > 20]
    return model.encode(sentences)

def score_candidate(jd_parsed: dict, candidate_embedding: np.ndarray) -> dict:
    """For each JD requirement, keep the best-matching resume sentence."""
    scores = {}
    for i, req in enumerate(jd_parsed["requirements"]):
        req_emb = jd_parsed["embeddings"][i].reshape(1, -1)
        sims = cosine_similarity(req_emb, candidate_embedding)
        scores[req] = float(sims.max())
    return scores
```
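To see the max-over-sentences logic in isolation, here is the core of the scoring step with tiny hand-made 3-d vectors standing in for real sentence embeddings, and cosine similarity written out in NumPy so the snippet has no model dependency. The numbers are illustrative, not model output.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Plain cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": one JD requirement vs. three resume sentences.
requirement = np.array([1.0, 0.0, 1.0])
resume_sentences = np.array([
    [0.9, 0.1, 0.8],   # close match
    [0.0, 1.0, 0.0],   # unrelated
    [0.5, 0.5, 0.5],   # partial overlap
])

# The scoring engine keeps the best-matching resume sentence per requirement.
best = max(cosine(requirement, s) for s in resume_sentences)
print(round(best, 2))  # → 0.99
```

Taking the max rather than the mean matters: a candidate only needs one sentence in their resume that covers a requirement, and averaging against every unrelated sentence would drag the score down unfairly.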
Step 4: Gap Analysis
Now the fun part — instead of just outputting a score, we generate actionable gap data.
```python
def generate_gap_report(scores: dict, threshold: float = 0.45) -> dict:
    """Split requirements into covered vs. gaps and compute the match rate."""
    gaps = []
    covered = []
    for req, score in scores.items():
        if score < threshold:
            gaps.append({"requirement": req, "match_score": round(score, 2)})
        else:
            covered.append({"requirement": req, "match_score": round(score, 2)})
    # Worst matches first, so the biggest gaps surface at the top.
    gaps.sort(key=lambda x: x["match_score"])
    total = len(scores)
    return {
        "total_requirements": total,
        "covered": len(covered),
        "gaps": gaps,
        # Guard the division so an empty scores dict doesn't crash the report.
        "match_rate": round(len(covered) / total * 100, 1) if total else 0.0,
    }
```
Example output for a real JD:
```json
{
  "total_requirements": 14,
  "covered": 9,
  "gaps": [
    {"requirement": "Experience with CI/CD pipelines", "match_score": 0.21},
    {"requirement": "Strong stakeholder communication", "match_score": 0.31},
    {"requirement": "Agile/Scrum methodology", "match_score": 0.38}
  ],
  "match_rate": 64.3
}
```
Real Results: 200+ Applications Tested
| Approach | Response Rate |
|---|---|
| Generic resume | 4.2% |
| Keyword-matched resume | 11.7% |
| Semantic gap-closed resume | 28.4% |
The 28.4% response rate came from candidates who explicitly addressed every must_have and implied gap. Even without direct experience, reframing adjacent experience in the job description's own semantic terms made the difference.
What I Learned
1. Applicant tracking systems are dumber than you think. Most keyword matching is simple TF-IDF or exact string matching. Semantic embeddings beat them.
2. Implied requirements are the hidden barrier. Nobody tells you "fast-paced environment" means they want startup survivors. Semantic similarity catches this.
3. Cover letters are dead — unless they close gaps. Generic cover letters do nothing. A letter that directly addresses your top 3 gaps changes everything.
4. Most people apply to the wrong jobs entirely. A 40% semantic match is a waste of everyone's time. Target 65%+ and close the gaps explicitly.
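One way to act on points 2 and 4 together is to weight must-haves above nice-to-haves when computing the overall match rate, so a missing "bonus" skill doesn't sink the score the way a missing hard requirement does. The weights below are illustrative guesses, not the values the pipeline described above actually uses:

```python
# Illustrative weights -- not the production values.
WEIGHTS = {"must_have": 1.0, "implied": 0.7, "nice_to_have": 0.3}

def weighted_match_rate(scored: list[tuple[str, float]], threshold: float = 0.45) -> float:
    """scored: (requirement_class, similarity) pairs, e.g. ("must_have", 0.62)."""
    total = sum(WEIGHTS[cls] for cls, _ in scored)
    hit = sum(WEIGHTS[cls] for cls, sim in scored if sim >= threshold)
    return round(hit / total * 100, 1) if total else 0.0

rate = weighted_match_rate([
    ("must_have", 0.62),      # covered
    ("must_have", 0.30),      # gap: hurts a lot
    ("implied", 0.55),        # covered
    ("nice_to_have", 0.10),   # gap: barely matters
])
print(rate)  # → 56.7
```

With a scheme like this, "target 65%+" becomes a statement about the requirements that actually gate the hire, not about the wish-list padding.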
What's Next
We're integrating this pipeline into jobwechsel-ki.ch — an AI career coach built specifically for German-speaking job seekers. The system does everything described above, plus generates a full application package (cover letter, resume rewrites, LinkedIn optimization) in minutes.
Drop questions in the comments — happy to go deeper on any part of this.
Built with Python, caffeine, and a genuine frustration with black-hole job applications.