
Nikalai Ninichuk

Posted on • Originally published at Medium

Building an Async Job Aggregator with Python: Semantic Deduplication, Hybrid AI Scoring, and LLM Fallback Cascades

Finding a job or landing freelance contracts on platforms like Upwork and Freelancer has turned into a pure speed contest. The most relevant postings often close within the first 10–15 minutes. But standard automation approaches, like blindly feeding every raw posting to ChatGPT, fail quickly in production. You end up facing massive OpenAI API bills, unstable JSON responses from LLMs, strict Telegram Bot API rate limits, and aggressive Cloudflare blocks on corporate career pages.

In this article, I will break down the engineering inner workings of my project, Moi Job Finder Bot, an async AI assistant built with aiogram 3, Elasticsearch, and PostgreSQL. There is no magic under the hood, just pragmatic architectural trade-offs, fallback chains, and hybrid math designed to keep the system alive when the infrastructure starts to sweat.

1. Hybrid AI Job Scoring: Merging Embeddings and LLMs

When a job seeker requests a smart search based on their resume, the system must return a sorted list of vacancies with a strict match percentage (score from 0 to 100) and a textual explanation (reason).

Doing this solely via LLM reasoning is slow and cost-prohibitive. Relying only on keywords is ineffective. To solve this, the project utilizes a hybrid blending architecture.

Pre-filtering via Semantic Vectors

First, the system calculates a semantic similarity score based on embeddings. We use OpenAI text-embedding-3-small with an automatic fallback to local sentence-transformers models if the external network experiences lag or downtime.
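The fallback described above can be sketched as a small wrapper. This is a minimal illustration, not the project's code: `embed_with_fallback`, `remote_embed`, `local_embed`, and the 3-second timeout are all hypothetical names and values, and the two embedders are passed in as plain callables so the sketch stays independent of any particular client library.

```python
import asyncio
from typing import Awaitable, Callable

Vector = list[float]
RemoteEmbedder = Callable[[str], Awaitable[Vector]]  # e.g. a wrapper around a hosted API
LocalEmbedder = Callable[[str], Vector]              # e.g. a wrapper around a local model


async def embed_with_fallback(
    text: str,
    remote_embed: RemoteEmbedder,
    local_embed: LocalEmbedder,
    timeout_s: float = 3.0,  # hypothetical latency budget, not taken from the project
) -> tuple[Vector, str]:
    """Return (vector, source), where source names the backend that answered."""
    try:
        vector = await asyncio.wait_for(remote_embed(text), timeout=timeout_s)
        return vector, "remote"
    except (asyncio.TimeoutError, ConnectionError):
        # Lag or downtime: run the CPU-bound local model in a worker thread
        # so the event loop keeps serving other users.
        # A real client would add its own exception types to the tuple above.
        vector = await asyncio.to_thread(local_embed, text)
        return vector, "local"
```

Returning the source alongside the vector matters in practice: vectors produced by different models are not comparable, so the resume and the job text have to be embedded by the same backend before their cosine similarity means anything.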

Since raw cosine similarity values typically drift within a narrow range, we apply a linear rescaling using a custom cosine_to_score_percent function. It maps the raw cosine metric onto a 20-to-92 point range, followed by a strict clamp(0, 100). This normalizes the vector score before it mixes with the LLM data.
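A rough sketch of what such a function might look like is shown below. The 20-to-92 target range and the final clamp come from the description above; the input bounds `cos_low` and `cos_high` are hypothetical placeholders, since the article does not state which cosine values the project treats as a weak or strong match.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def cosine_to_score_percent(
    cosine: float,
    cos_low: float = 0.15,    # hypothetical: cosine treated as a weak match
    cos_high: float = 0.75,   # hypothetical: cosine treated as a strong match
    score_low: float = 20.0,  # bottom of the 20-to-92 range
    score_high: float = 92.0,  # top of the 20-to-92 range
) -> float:
    """Linearly map a cosine similarity to a percentage score."""
    # Position of the cosine inside the expected input band (0.0 .. 1.0 when in band).
    position = (cosine - cos_low) / (cos_high - cos_low)
    score = score_low + position * (score_high - score_low)
    # Cosines outside the band extrapolate past 20..92, so clamp to a valid percentage.
    return clamp(score, 0.0, 100.0)
```

With these placeholder bounds, a cosine at `cos_low` lands on 20, one at `cos_high` lands on 92, and anything far outside the band is caught by the clamp instead of producing a negative or over-100 score.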

Expert LLM Verification

Next, the filtered jobs are pushed to the LLM. The model is strictly instructed to return a structured JSON object containing a matches array.

For OpenAI: In the generate_json method, we explicitly pass response_format={"type": "json_object"} to guarantee structural integrity at the API level.
For Groq (Llama 3.3): Since a native JSON mode wasn’t universally available in our shared connector interface, we implement strict validation. The response is first parsed with json.loads() (a synchronous call, even though the surrounding handler is async). If the model wraps the JSON in conversational prose or markdown code blocks, a regex heuristic kicks in: re.search(r'\{[\s\S]*\}', ...) extracts the object boundaries for a secondary deserialization attempt. Scores are always read from the parsed JSON; percentages are never scraped out of raw text with regex.
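The two-step parsing for models without a native JSON mode can be sketched as follows. The function name `parse_llm_json` and the error messages are illustrative assumptions; only the regex, the json.loads() attempts, and the `matches` array come from the description above.

```python
import json
import re
from typing import Any

# Greedy match from the first "{" to the last "}" in the reply.
JSON_OBJECT_RE = re.compile(r"\{[\s\S]*\}")


def parse_llm_json(raw: str) -> dict[str, Any]:
    """Parse an LLM reply into a dict that contains a 'matches' list."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # The model added prose or a markdown fence: cut out the object and retry.
        found = JSON_OBJECT_RE.search(raw)
        if found is None:
            raise ValueError("no JSON object found in LLM reply") from None
        parsed = json.loads(found.group(0))  # may still raise JSONDecodeError
    # Validate the shape instead of trusting the model.
    if not isinstance(parsed, dict) or not isinstance(parsed.get("matches"), list):
        raise ValueError("LLM reply is missing a 'matches' array")
    return parsed
```

Because the pattern is greedy, a reply that contains stray braces after the object will fail the second json.loads() as well. Since json.JSONDecodeError is a subclass of ValueError, a caller can catch ValueError once and treat every failure path as a signal to retry or skip the job.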

Score Blending Math

The final percentage displayed to the user is calculated using a weighted average. The blending coefficient is defined in the environment config:

    # Blending logic concept from resume_job_semantic.py
    RESUME_JOB_EMBED_BLEND = 0.42

    final_score = (embedding_score * RESUME_JOB_EMBED_BLEND) + (llm_score * (1 - RESUME_JOB_EMBED_BLEND))

In the Telegram bot interface, we set a target smart search window of 70% to 85%. If no vacancies fall into this bracket, the system smoothly relaxes the query filters so the user doesn’t end up staring at an empty screen.
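To make the blending and the target window concrete, here is a small worked sketch. The 0.42 coefficient and the 70-to-85 window come from the article. Everything about the relaxation is an assumption: the article does not say which filters are loosened, so the sketch takes the simplest option and lowers the bottom of the score window in fixed steps. The names `blend_score` and `pick_matches`, and the `step` and `floor` values, are hypothetical.

```python
RESUME_JOB_EMBED_BLEND = 0.42  # value quoted in the article


def blend_score(embedding_score: float, llm_score: float,
                blend: float = RESUME_JOB_EMBED_BLEND) -> float:
    """Weighted average of the embedding score and the LLM score."""
    return embedding_score * blend + llm_score * (1 - blend)


def pick_matches(scores: dict[str, float], low: float = 70.0, high: float = 85.0,
                 step: float = 10.0, floor: float = 40.0) -> tuple[list[str], float]:
    """Return (job ids inside [low, high], the lower bound that was finally used).

    Assumed relaxation strategy: lower `low` by `step` until a job matches
    or `floor` is reached. Results are sorted best score first.
    """
    while True:
        hits = [job for job, score in scores.items() if low <= score <= high]
        if hits or low <= floor:
            hits.sort(key=scores.__getitem__, reverse=True)
            return hits, low
        low = max(floor, low - step)
```

For example, an embedding score of 60 and an LLM score of 90 blend to 60 × 0.42 + 90 × 0.58 = 77.4, which falls inside the default window without any relaxation.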
