Use of AI in a job search

#ai #python #career #llm

AI is being used in recruitment widely and everybody knows that. That led me to think: what if we do it the opposite way? Give a user the power of AI in his job search? Why and what for? For two reasons: first and obvious: let him find a better suited job. But there's also another thing: by studying the results we could see what in the user's profile made a given job offer to be suitable or not? Maybe there can be something changed in the profile? Because... that's also probably the same thing the recruitment's AI system is observing. So, in this way, user could make his profile better.

Now, the whole system I've made consists of 4 stages: web crawling, offer analysis, profile analysis and finally job matching. The final output is a proposal of a decision (apply, reject, maybe), summary of pros and cons and some thoughts and reasoning.

Web crawling is a huge thing and I won't discuss it here, as it's way too big for this article. I'll just mention it requires analysing the pages, deciding if it's even related, if it's a job offer or maybe a job offers list. It should analyze the links inside and find consecutive subpages. The output should be pairs of: url of an offer and its full text, without any HTML tags etc., just a pure text. All the other elements of this system we'll discuss below.

Technical stuff

Before we start diving into details of each of the stages, let's talk for a moment about technology used here. I've decided to make it as simple as possible - yet very easy to understand - and easy to run on user's home machine. That's why for provider of LLM models I chose Ollama, very handy system for self hosted LLMs. Coding will be done in Python (I'm using Python 3.12) using only elementary additional packages, like json, PdfReader, tqdm, requests and ollama. For communicating with LLMs through Ollama I've made a simple class with two types of clients: the one using ollama implementation and the one communicating through HTTP JSON API. You can see this little library on GitHub. It also contains few examples on how to use it in various scenarios. In this project we will use the structured JSON way.

Some general thoughts on how to utilize AI in similar jobs

The key is a very precise system prompt, defining how the model should act and what it should produce as an output. Something like "You are a job offer analyzer ... Return the output as JSON with the following fields ... do not add any explanations. Output only JSON ...". Also setting a model's temperature to 0 helps getting highly consistent, deterministic, and focused output. Another thing is to granulate the overall job to as small steps as possible. This is what we can see on above diagram - do not try to do few steps in one go, give model a single and precise job to do.

Offer analysis

The input to this stage is a pair of info: an url of an offer and its full text, stripped from any HTML tags etc - just a text. As an output we want a structured info, containing all the relevant info, like title, salary, type of job, requirements, nice-to-have etc. Let's see an example:

{
    "url": "https://great.company.com/job-offer",
    "company": "Great Company",
    "title": "Tech Lead",
    "location": "100% remote",
    "remote": true,
    "seniority": "lead",
    "salary": "",
    "description": "Provide technical leadership to the delivery team, be accountable for delivering defined feature sets, design and develop components within the data and analytics layer of an investment research platform, co-create system architecture and technology standards, ensure solution quality through code reviews, mentoring, and oversight of engineering practices, support the Product Owner and work closely with the team in backlog planning and execution, actively contribute to the development of analytical tools for investment analysts, participate in R&D work related to future iterations of the platform.",
    "responsibilities": [],
    "requirements": [
        "proven experience in providing technical leadership and acting as a Tech Lead in enterprise scale projects",
        "expert knowledge of agile software delivery and DevOps across the SDLC",
        "strong mentoring and coaching skills, including implementing engineering, architecture, and testing best practices",
        "experience in initiating and driving continuous improvement initiatives",
        "ability to work closely with Product Owners, stakeholders, and business users",
        "English proficiency at a minimum B2+ level"
    ],
    "nice_to_have": [],
    "technologies": [
        "agile software delivery",
        "DevOps",
        "SDLC"
    ],
    "offer": [],
    "language": "en"
}

Most important thing (as in whole this system) is a system prompt, making sure AI will return a precise output, formatted as we need:

system_prompt = """
You are a job offer analyzer. 
Extract structured information from the job description text. 
Return the output as JSON with the following fields:

- url: URL of the job offer
- company: infer company name if possible from the text or URL
- title: job title
- location: city / country / "remote"
- remote: true / false / "unknown"
- employment_type: full-time / contract / internship / unknown
- seniority: junior / mid / senior / lead / manager / unknown
- salary: salary range if specified
- description: short summary in your own words
- responsibilities: list of main responsibilities
- requirements: list of key requirements
- nice_to_have: list of additional "nice to have" skills
- technologies: programming languages, frameworks, tools
- offer: list of benefits, if available
- language: language of the job offer (en/pl)

Do not add any explanations. Output only JSON.
Never invent jobs that are not clearly present.
Never hallucinate technologies.
If a field is missing, use empty string, empty array, or "unknown".
"""

Then, the function itself is really simple if we'll use the ollama library I've quoted above:

def extract_job_offer(url, text):
    llm_client = LLMClientOllama()
    llm_client.set_model("qwen3:14b")
    llm_client.set_temperature(0)
    llm_client.set_json_format(True)

    data, role = llm_client.call_llm(f"URL: {url}\n\nText:\n{text}", system_prompt)
    return data

call_llm method from LLMClientOllama class will take care of proper JSON payload sending, receiving and extracting from the response.

💡
For all the structuring/extraction jobs I've chosen the QWEN3 model, but you can check other models. Specifically, QWEN3 14B is running smoothly on nVidia with 12GB of VRAM.

User's profile creation from CV

Before we can match the offer, we need to have a second side - the user's profile. Again, it will be a structured JSON. We could do it manually, but we'll make a PDF extractor, as headhunting systems are doing. That can give us a feedback on how well our CV is composed.

First step is extracting all the text from CV. This already gives a hint: DO NEVER send PDFs made of graphics (scanning or other composition tools) - it has to be a real text (not rendered). For this task we'll use PdfReader class from pypdf package.

from pypdf import PdfReader

def extract_text_from_pdf(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    pages = []

    for page in reader.pages:
        text = page.extract_text()
        if text:
            pages.append(text)

    return "\n".join(pages)

Then we prepare the system prompt:

CANDIDATE_SYSTEM_PROMPT = """
You are an expert technical recruiter and career analyst.

Your task is to analyze a CV (resume) and extract a structured candidate profile.

Rules:
- Output ONLY valid JSON
- Do NOT include explanations, markdown, comments or prose
- If some information is missing, infer conservatively or use null
- Normalize names (e.g. "C plus plus" → "C++")
- Seniority must be one of:
  ["junior", "mid", "senior", "lead", "staff", "principal", "staff / principal / lead", "manager", "director", "cto", "ceo", "unknown"]

The JSON schema MUST match exactly:

{
  "seniority": string,
  "years_of_experience": number,
  "primary_roles": string[],
  "core_languages": string[],
  "secondary_languages": string[],
  "domains": string[],
  "leadership": {
    "people_management": boolean,
    "tech_lead": boolean,
    "scrum_master": boolean
  },
  "cloud": string[],
  "devops": string[],
  "frontend_level": string,
  "remote_preference": boolean,
  "languages_spoken": { "pl": string, "en": string },
  "job_preferences": {
    "roles_to_avoid": string[],
    "preferred_roles": string[]
  }
}

Think carefully. This profile will be used for automated job matching.
"""

and a small function preparing a user prompt:

def build_candidate_prompt(cv_text: str) -> str:
    return f"""
Analyze the following CV and extract the candidate profile.

CV TEXT:
----------------
{cv_text}
----------------
"""

Now we're ready to build a profile from CV:

def build_candidate_profile_from_cv(pdf_path: str) -> dict:
    cv_text = extract_text_from_pdf(pdf_path)

    llm_client = LLMClientOllama()
    llm_client.set_model("qwen3:14b")
    llm_client.set_temperature(0.2)
    llm_client.set_json_format(True)

    profile, llm_role = llm_client.call_llm(build_candidate_prompt(cv_text), CANDIDATE_SYSTEM_PROMPT)
    return profile

Of course the resulting JSON data can be modified to tweak it, but also it could be used to verify if maybe something should be added to the original CV instead.

Algorithmic match

It would be tempting to do all matching job using AI, but there're at least two points against it:

every AI call takes time - and if we can avoid it with some obvious rejects, it's a plus;
collecting some info, categorizing it etc. will be done better in simple code, as AI may sometimes hallucinate things.

That's why as a first step in job matching we'll perform some algorithmic data collection and first decision.

What you can do in this step, of course depends on the kind of a job, but for software developers you could score things like tech stack, seniority, domains, leadership duties, and general logistics. Let's see an example:

def normalize_token(s: str) -> str:
    return s.lower().replace(" ", "").replace(".", "")

def extract_technologies_from_offer(offer) -> set[str]:
    required_text = " ".join(offer.get("requirements", [])).lower()
    optional_text = " ".join(offer.get("nice_to_have", [])).lower()

    known_tech = {
        "c++", "c#", "python", "java", "javascript", "typescript",
        "golang", "rust", "swift", "kotlin", "scala",
        "node.js", "react", "angular", "docker", "kubernetes",
        "aws", "azure", "gcp", "qt", ".net", "opengl", "vulkan", "webgl", "webassembly",
        "postgresql", "mysql", "mongodb", "redis", "elasticsearch", "postgres", "mongo",
        "gitlab", "git", "GitLab CI", "ci/cd", "github actions", "github"
    }

    found_required = set()
    found_optional = set()
    for tech in known_tech:
        if tech in required_text:
            found_required.add(tech)
        if tech in optional_text:
            found_optional.add(tech)

    return found_required, found_optional

def score_tech_stack(profile, offer):
    offer_required_tech, offer_optional_tech = extract_technologies_from_offer(offer)

    profile_tech = {
        normalize_token(t)
        for t in profile["core_languages"] + profile["secondary_languages"]
    }

    strengths = []
    gaps = []
    score = 0

    for tech in offer_required_tech:
        if normalize_token(tech) in profile_tech:
            strengths.append(tech)
            score += 10
        else:
            gaps.append(tech)
            score -= 5

    for tech in offer_optional_tech:
        if normalize_token(tech) in profile_tech:
            strengths.append(tech)
            score += 5

    return max(min(score, 30), 0), strengths, gaps

Whatever you do, in the end of this step you should have something like this as a result:

{
    "decision": "reject",
    "score": 50,
    "strengths": [
        "java",
        "python",
        "c++"
    ],
    "gaps": [
        "javascript"
    ]
}

You should set the overall score levels for taking a decision, depending on how you've set scoring. Set 3 levels of decision: apply / maybe / reject - and filter out the rejected ones before passing the offers to the next step, which is AI matching.

AI match

Now it's time for the last step - the key one. Its input would be an offer, user's profile and algorithmic matching results, so it can learn from it.

Let's start from a system prompt:

REVIEW_SYSTEM_PROMPT = """
You are a senior technical recruiter and staff-level software engineer.

Your task is to evaluate whether this job offer is worth applying to
for experienced software engineer with attached profile.

You MUST be critical and skeptical.
Reject roles that are:
- execution-only
- lacking ownership or technical impact

Return ONLY valid JSON.
Do NOT include markdown.
Do NOT include explanations outside JSON.

JSON schema:

{
  "final_verdict": "apply" | "maybe" | "reject",
  "confidence": 0-100,
  "key_reasons": [string],
  "risks": [string],
  "positive_signals": [string],
  "summary": string
}
"""

Of course it should be altered to suit your needs.

Next, let's prepare a user prompt:

def prepare_llm_input(profile: dict, offer: dict, match: dict) -> dict:
    """
    Builds a clean, stable input structure for LLM evaluation.
    No formatting, no text generation.
    """

    return {
        "candidate": {
            "seniority": profile.get("seniority"),
            "years_of_experience": profile.get("years_of_experience"),
            "core_stack": profile.get("core_languages"),
            "secondary_stack": profile.get("secondary_languages"),
            "domains": profile.get("domains"),
            "leadership": profile.get("leadership"),
            "preferences": {
                "remote": profile.get("remote_preference"),
                "preferred_roles": profile.get("preferred_roles"),
                "roles_to_avoid": profile.get("roles_to_avoid"),
            }
        },

        "job": {
            "title": offer.get("title"),
            "location": offer.get("location"),
            "responsibilities": offer.get("responsibilities", [])[:10],
            "requirements": offer.get("requirements", [])[:10],
            "nice_to_have": offer.get("nice_to_have", [])[:5]
        },

        "algorithmic_assessment": {
            "score": match.get("score"),
            "decision": match.get("decision"),
            "strengths": match.get("strengths", []),
            "gaps": match.get("gaps", []),
            "red_flags": match.get("red_flags", [])
        }
    }

def build_llm_prompt(llm_input: dict) -> str:
    """
    Converts structured LLM input into a readable prompt.
    """

    return f"""
Candidate profile:
{json.dumps(llm_input["candidate"], indent=2, ensure_ascii=False)}

Job offer:
{json.dumps(llm_input["job"], indent=2, ensure_ascii=False)}

Algorithmic assessment:
{json.dumps(llm_input["algorithmic_assessment"], indent=2, ensure_ascii=False)}

Evaluate realistically whether applying makes sense.
"""

Now we can call it:

def review_offer_llm(profile: dict, offer: dict, match: dict) -> dict:

    llm_data = prepare_llm_input(profile, offer, match)
    llm_input = build_llm_prompt(llm_data)

    llm_client = LLMClientOllama()
    llm_client.set_model("qwen3:14b")
    llm_client.set_temperature(0)
    llm_client.set_json_format(True)

    data, role = llm_client.call_llm(llm_input, REVIEW_SYSTEM_PROMPT)

    return data

For this last step you could experiment with various models, as they can give slightly different reasoning. While the overall results will be probably similar, the reasoning can be very helpful for user can react and either take his action or maybe update his/her CV according to those results.

Let's see an example output of this step:

{
    "final_verdict": "reject",
    "confidence": 85,
    "key_reasons": [
        "Role lacks technical ownership and leadership responsibilities",
        "Candidate's seniority (manager) far exceeds job requirements",
        "Focus on code evaluation rather than system design/architecture",
        "Part-time hourly contract misaligned with candidate's experience level"
    ],
    "risks": [
        "Underutilization of candidate's leadership and technical expertise",
        "Potential for role to be perceived as junior-level despite candidate's seniority",
        "Mismatch between compensation structure (hourly) and candidate's career stage"
    ],
    "positive_signals": [
        "Remote work flexibility",
        "Opportunity to work with AI systems",
        "Python/C++ stack alignment"
    ],
    "summary": "While the technical stack aligns, the role's responsibilities and compensation structure are fundamentally misaligned with a senior manager's experience and career expectations. The position offers limited technical impact and leadership opportunities, making it unsuitable for someone with 25 years of experience in complex domains like medical devices and embedded systems."
}