I sold my business after 16 years and went all-in on AI. Week one of the job search: read JDs, map skills, customize CV, fill forms. Everything manual, everything repetitive.
By week two I stopped applying. I was building the system that would do it for me.
631 evaluations later, Career-Ops makes more filtering decisions than I do.
## What I Built
A multi-agent system with 12 operational modes, each a Claude Code skill file with its own context and rules. Not a script — an agent that reasons about the problem domain.
The key architectural choice: modes over one long prompt.
```
career-ops/
├── modes/
│   ├── _shared.md        # North Star archetypes, proof points
│   ├── auto-pipeline.md  # Full pipeline: JD → eval → PDF → tracker
│   ├── oferta.md         # Single-offer evaluation (A-F)
│   ├── batch.md          # Parallel processing with workers
│   ├── pdf.md            # ATS-optimized CV per offer
│   ├── scan.md           # Portal discovery
│   ├── apply.md          # Playwright form-filling
│   └── ...               # (12 total)
├── reports/              # 631 evaluation files
├── output/               # Generated PDFs
├── applications.md       # Central tracker
└── scan-history.tsv      # 680 deduplicated URLs
```
Why modes? Each one loads only the context it needs. `auto-pipeline` skips contact rules; `apply` skips scoring logic. Less context = better decisions from the LLM.
## The 10-Dimension Scoring
Every offer runs through a weighted evaluation framework:
| Dimension | What It Measures | Weight |
|---|---|---|
| Role Match | Alignment with CV proof points | Gate-pass |
| Skills Alignment | Tech stack overlap | Gate-pass |
| Seniority | Stretch level | High |
| Compensation | Market rate vs target | High |
| Geographic | Remote/hybrid feasibility | Medium |
| Company Stage | Startup/growth/enterprise fit | Medium |
| Product-Market Fit | Problem domain resonance | Medium |
| Growth Trajectory | Career ladder visibility | Medium |
| Interview Likelihood | Callback probability | High |
| Timeline | Hiring urgency | Low |
Role Match and Skills Alignment are gate-pass — if they fail, the final score drops regardless of everything else. 74% of evaluated offers scored below 4.0.
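The gate-pass mechanic can be sketched like this. The numeric weights, thresholds, and the 3.9 cap are illustrative assumptions, not the system's actual values; only the High/Medium/Low tiers and the gate behavior come from the table above.

```javascript
// Illustrative gate-pass weighted score. Weights and cap are assumptions.
const WEIGHTS = {
  seniority: 3, compensation: 3, interviewLikelihood: 3,          // High
  geographic: 2, companyStage: 2, productMarketFit: 2,            // Medium
  growthTrajectory: 2,                                            // Medium
  timeline: 1,                                                    // Low
};

function score10d(dims) {
  // dims: one 0-10 value per dimension, including the two gate dimensions
  const totalWeight = Object.values(WEIGHTS).reduce((a, b) => a + b, 0);
  const weighted =
    Object.entries(WEIGHTS).reduce((sum, [k, w]) => sum + dims[k] * w, 0) /
    totalWeight;
  // Gate-pass: failing Role Match or Skills Alignment drags the final score
  // down no matter how strong the other dimensions are.
  const gatesPass = dims.roleMatch >= 5 && dims.skillsAlignment >= 5;
  return gatesPass ? weighted : Math.min(weighted, 3.9);
}
```

With a cap like this, an offer with a great salary but the wrong role can never climb above the rejection band, which is exactly why 74% of offers land below 4.0.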
## The Pipeline
`auto-pipeline` is the flagship mode. A URL goes in, and the pipeline runs end to end:

1. **Extract JD** — Playwright navigates to the URL and extracts structured content
2. **Evaluate 10D** — Claude reads the JD + CV + portfolio and generates the scoring
3. **Generate report** — markdown with 6 blocks: summary, CV match, level, comp, personalization, interview probability
4. **Generate PDF** — HTML template + keyword injection + Puppeteer render
5. **Register tracker** — TSV auto-merge via a Node.js script
6. **Dedup** — checks against 680 URLs in `scan-history.tsv`. Zero re-evaluations
## Batch Processing
For high volume, batch mode launches a conductor that orchestrates parallel workers:
```bash
# conductor spawns N workers, each an independent Claude Code process
./batch-runner.sh --input batch/batch-input.tsv --workers 4

# Each worker:
# 1. Claims a URL from the queue (lock file prevents doubles)
# 2. Runs auto-pipeline
# 3. Writes result to batch-state.tsv
# 4. Picks next URL
```
122 URLs processed in parallel. Fault-tolerant: a worker failure never blocks the rest. Resumable — reads state and skips completed items.
## The AI Resume Builder
A generic PDF loses. Career-Ops generates a different ATS-optimized CV for each offer:
- Extract 15-20 keywords from the JD
- Detect language (English JD → English CV)
- Detect region (US → Letter, Europe → A4)
- Detect archetype (6 predefined: AI Platform, Agentic, PM, SA, FDE, Transformation)
- Select top 3-4 projects by relevance
- Reorder bullets — most relevant experience moves up
- Render PDF — Puppeteer, self-hosted fonts, single-column ATS-safe
Same CV. 6 different framings. All real — keywords get reformulated, never fabricated.
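The keyword-driven reordering can be sketched as follows. The tokenizer, stop-word list, and overlap scoring here are deliberate simplifications; the real system uses the LLM for extraction:

```javascript
// Illustrative keyword extraction + bullet reorder (simplified assumptions).
function extractKeywords(jdText, limit = 20) {
  const stop = new Set(["the", "and", "for", "with", "you", "our", "will", "are"]);
  const counts = new Map();
  for (const w of jdText.toLowerCase().match(/[a-z][a-z+#]{2,}/g) || []) {
    if (!stop.has(w)) counts.set(w, (counts.get(w) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent JD terms first
    .slice(0, limit)
    .map(([w]) => w);
}

function reorderBullets(bullets, keywords) {
  // Score each bullet by keyword overlap; most relevant experience moves up.
  const score = (b) => keywords.filter((k) => b.toLowerCase().includes(k)).length;
  return [...bullets].sort((a, b) => score(b) - score(a));
}
```

The key property: nothing is invented. The bullets are the same true statements; only their order and emphasis change per offer.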
## Results
2 months in production. Real numbers, not demos.
- 631 reports generated
- 68 applications sent
- 354 PDFs generated
- 680 URLs deduplicated
- 0 re-evaluations
## What I Learned
**Automate analysis, not decisions.** Career-Ops evaluated 631 offers. I decide which ones get my time. HITL is not a limitation; it is the design.

**Modes beat a long prompt.** 12 modes with precise context outperform a 10,000-token system prompt. This was my biggest mistake early on: I started with one massive prompt, and the quality was terrible.

**Dedup is more valuable than scoring.** 680 deduplicated URLs mean 680 evaluations I never had to repeat. Boring infrastructure, highest ROI.

**A CV is an argument, not a document.** A generic PDF convinces nobody. A CV that reorganizes proof points by relevance and adapts its framing to the archetype is one that converts.

**The system IS the portfolio.** Building a multi-agent system to search for multi-agent roles is the most direct proof of competence.
## Stack
- **Claude Code** — LLM agent: reasoning, evaluation, content generation
- **Playwright** — browser automation: portal scanning and form-filling
- **Puppeteer** — PDF rendering from HTML templates
- **Node.js** — utility scripts: merge-tracker, cv-sync-check
- **tmux** — parallel sessions: conductor + workers in batch mode
Full case study: santifer.io/career-ops-system
Has anyone else built tooling for their job search? Curious about different approaches — especially around evaluation frameworks and dedup strategies.