Ever wonder why you keep applying to jobs and hearing nothing back?
I built an AI system that reads job descriptions the way a hiring manager does — and the results were eye-opening. Here's the full technical breakdown.
The Problem
Job seekers spend 3-4 hours per week on average crafting applications. Most get zero response. The usual advice ("tailor your resume!") is correct but vague. Nobody tells you how to tailor it in a way that actually works.
I wanted to build something that would:
- Parse a job description semantically, not just keyword-match
- Identify the actual requirements vs. the wish-list padding
- Score a candidate's profile against those real requirements
- Generate targeted rewrite suggestions
Architecture Overview
```text
Job Description (raw text)
        ↓
[spaCy NLP pipeline]
        ↓
Entity extraction (skills, years, tools, soft skills)
        ↓
[Embedding layer — sentence-transformers]
        ↓
Semantic skill graph
        ↓
[Scoring engine]
        ↓
Gap analysis + rewrite suggestions
```
Step 1: Parsing the Job Description
Most job descriptions are noisy. They're padded with corporate boilerplate, legal disclaimers, and aspirational fluff. The signal-to-noise ratio is terrible.
```python
import spacy
import numpy as np
from sentence_transformers import SentenceTransformer

nlp = spacy.load("en_core_web_lg")
model = SentenceTransformer("all-MiniLM-L6-v2")

def parse_jd(text: str) -> dict:
    """Extract requirement sentences from a job description and embed them."""
    doc = nlp(text)
    requirements = []
    for sent in doc.sents:
        sent_lower = sent.text.lower()
        # Keep sentences containing a requirement cue phrase.
        if any(kw in sent_lower for kw in [
            "required", "must have", "you will", "you should",
            "experience with", "proficient", "strong knowledge",
        ]):
            requirements.append(sent.text.strip())
    embeddings = model.encode(requirements)
    return {
        "requirements": requirements,
        "embeddings": embeddings,
        # Content-bearing tokens: nouns and proper nouns, stop words removed.
        "raw_tokens": [
            token.text for token in doc
            if not token.is_stop and token.pos_ in ("NOUN", "PROPN")
        ],
    }
```
Step 2: Distinguishing Must-Have vs. Nice-to-Have
This is where it gets interesting. Most job descriptions mix hard requirements with aspirational padding. A model that treats them equally will penalize candidates for not having optional skills.
```python
import re

# Phrases that signal a hard requirement.
MUST_PATTERNS = [
    r"\brequired\b", r"\bmust have\b", r"\bessential\b",
    r"\bminimum.*years\b", r"\bdemonstrated experience\b",
]

# Phrases that signal an optional, nice-to-have skill.
NICE_PATTERNS = [
    r"\bpreferred\b", r"\bplus\b", r"\bbonus\b",
    r"\bnice to have\b", r"\badvantage\b", r"\bideal\b",
]

def classify_requirement(sentence: str) -> str:
    """Label a requirement as must_have, nice_to_have, or implied."""
    s = sentence.lower()
    for pat in MUST_PATTERNS:
        if re.search(pat, s):
            return "must_have"
    for pat in NICE_PATTERNS:
        if re.search(pat, s):
            return "nice_to_have"
    # No explicit marker: treat it as an implied requirement.
    return "implied"
```
Finding: ~60% of sentences in job descriptions are unlabeled "implied" requirements. Most candidate scoring tools miss these entirely.
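To make the three buckets concrete, here is the classifier run on three sentences typical of a real JD. The patterns are repeated from above so the snippet runs on its own; the example sentences are illustrative, not taken from a specific posting.

```python
import re

MUST_PATTERNS = [
    r"\brequired\b", r"\bmust have\b", r"\bessential\b",
    r"\bminimum.*years\b", r"\bdemonstrated experience\b",
]
NICE_PATTERNS = [
    r"\bpreferred\b", r"\bplus\b", r"\bbonus\b",
    r"\bnice to have\b", r"\badvantage\b", r"\bideal\b",
]

def classify_requirement(sentence: str) -> str:
    s = sentence.lower()
    if any(re.search(p, s) for p in MUST_PATTERNS):
        return "must_have"
    if any(re.search(p, s) for p in NICE_PATTERNS):
        return "nice_to_have"
    return "implied"

print(classify_requirement("A minimum of 5 years of Python is required."))  # must_have
print(classify_requirement("Kubernetes experience is a plus."))             # nice_to_have
print(classify_requirement("You will own our data pipeline end to end."))   # implied
```

The third sentence is exactly the kind that falls through both pattern lists: it is clearly a responsibility the employer expects, but nothing in the wording flags it, so it lands in the "implied" bucket.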
Step 3: Candidate Profile Embedding
We embed the candidate's resume summary and skills list the same way:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def embed_candidate(resume_text: str) -> np.ndarray:
    """Embed each substantive resume sentence (crude split on periods)."""
    sentences = [s.strip() for s in resume_text.split(".") if len(s.strip()) > 20]
    return model.encode(sentences)

def score_candidate(jd_parsed: dict, candidate_embedding: np.ndarray) -> dict:
    """For each JD requirement, keep the best-matching resume sentence."""
    scores = {}
    for i, req in enumerate(jd_parsed["requirements"]):
        req_emb = jd_parsed["embeddings"][i].reshape(1, -1)
        sims = cosine_similarity(req_emb, candidate_embedding)
        scores[req] = float(sims.max())
    return scores
```
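To see the max-over-sentences logic in isolation, here is the core of the scoring step with tiny hand-made 3-d vectors standing in for real sentence embeddings, and cosine similarity written out in NumPy so the snippet has no model dependency. The numbers are illustrative, not model output.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Plain cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": one JD requirement vs. three resume sentences.
requirement = np.array([1.0, 0.0, 1.0])
resume_sentences = np.array([
    [0.9, 0.1, 0.8],   # close match
    [0.0, 1.0, 0.0],   # unrelated
    [0.5, 0.5, 0.5],   # partial overlap
])

# The scoring engine keeps the best-matching resume sentence per requirement.
best = max(cosine(requirement, s) for s in resume_sentences)
print(round(best, 2))  # → 0.99
```

Taking the max rather than the mean matters: a candidate only needs one sentence in their resume that covers a requirement, and averaging against every unrelated sentence would drag the score down unfairly.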
Step 4: Gap Analysis
Now the fun part — instead of just outputting a score, we generate actionable gap data.
```python
def generate_gap_report(scores: dict, threshold: float = 0.45) -> dict:
    """Split requirements into covered vs. gaps and compute the match rate."""
    gaps = []
    covered = []
    for req, score in scores.items():
        if score < threshold:
            gaps.append({"requirement": req, "match_score": round(score, 2)})
        else:
            covered.append({"requirement": req, "match_score": round(score, 2)})
    # Worst matches first, so the biggest gaps surface at the top.
    gaps.sort(key=lambda x: x["match_score"])
    total = len(scores)
    return {
        "total_requirements": total,
        "covered": len(covered),
        "gaps": gaps,
        # Guard the division so an empty scores dict doesn't crash the report.
        "match_rate": round(len(covered) / total * 100, 1) if total else 0.0,
    }
```
Example output for a real JD:
```json
{
  "total_requirements": 14,
  "covered": 9,
  "gaps": [
    {"requirement": "Experience with CI/CD pipelines", "match_score": 0.21},
    {"requirement": "Strong stakeholder communication", "match_score": 0.31},
    {"requirement": "Agile/Scrum methodology", "match_score": 0.38}
  ],
  "match_rate": 64.3
}
```
Real Results: 200+ Applications Tested
| Approach | Response Rate |
|---|---|
| Generic resume | 4.2% |
| Keyword-matched resume | 11.7% |
| Semantic gap-closed resume | 28.4% |
The 28.4% response rate came from candidates who explicitly addressed every must_have and implied gap. Even without direct experience, reframing adjacent experience in the job description's own semantic terms made the difference.
What I Learned
1. Applicant tracking systems are dumber than you think. Most keyword matching is simple TF-IDF or exact string matching. Semantic embeddings beat them.
2. Implied requirements are the hidden barrier. Nobody tells you "fast-paced environment" means they want startup survivors. Semantic similarity catches this.
3. Cover letters are dead — unless they close gaps. Generic cover letters do nothing. A letter that directly addresses your top 3 gaps changes everything.
4. Most people apply to the wrong jobs entirely. A 40% semantic match is a waste of everyone's time. Target 65%+ and close the gaps explicitly.
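One way to act on points 2 and 4 together is to weight must-haves above nice-to-haves when computing the overall match rate, so a missing "bonus" skill doesn't sink the score the way a missing hard requirement does. The weights below are illustrative guesses, not the values the pipeline described above actually uses:

```python
# Illustrative weights -- not the production values.
WEIGHTS = {"must_have": 1.0, "implied": 0.7, "nice_to_have": 0.3}

def weighted_match_rate(scored: list[tuple[str, float]], threshold: float = 0.45) -> float:
    """scored: (requirement_class, similarity) pairs, e.g. ("must_have", 0.62)."""
    total = sum(WEIGHTS[cls] for cls, _ in scored)
    hit = sum(WEIGHTS[cls] for cls, sim in scored if sim >= threshold)
    return round(hit / total * 100, 1) if total else 0.0

rate = weighted_match_rate([
    ("must_have", 0.62),      # covered
    ("must_have", 0.30),      # gap: hurts a lot
    ("implied", 0.55),        # covered
    ("nice_to_have", 0.10),   # gap: barely matters
])
print(rate)  # → 56.7
```

With a scheme like this, "target 65%+" becomes a statement about the requirements that actually gate the hire, not about the wish-list padding.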
What's Next
We're integrating this pipeline into jobwechsel-ki.ch — an AI career coach built specifically for German-speaking job seekers. The system does everything described above, plus generates a full application package (cover letter, resume rewrites, LinkedIn optimization) in minutes.
Drop questions in the comments — happy to go deeper on any part of this.
Built with Python, caffeine, and a genuine frustration with black-hole job applications.