I Built a Multi-Agent Job Search System with Claude Code — 631 Evaluations, 12 Modes

I sold my business after 16 years and went all-in on AI. Week one of the job search: read JDs, map skills, customize CV, fill forms. Everything manual, everything repetitive.

By week two I stopped applying. I was building the system that would do it for me.

631 evaluations later, Career-Ops makes more filtering decisions than I do.

What I Built

A multi-agent system with 12 operational modes, each a Claude Code skill file with its own context and rules. Not a script — an agent that reasons about the problem domain.

The key architectural choice: modes over one long prompt.

```
career-ops/
├── modes/
│   ├── _shared.md          # North Star archetypes, proof points
│   ├── auto-pipeline.md    # Full pipeline: JD → eval → PDF → tracker
│   ├── oferta.md           # Single-offer evaluation (A-F)
│   ├── batch.md            # Parallel processing with workers
│   ├── pdf.md              # ATS-optimized CV per offer
│   ├── scan.md             # Portal discovery
│   ├── apply.md            # Playwright form-filling
│   └── ... (12 total)
├── reports/                # 631 evaluation files
├── output/                 # Generated PDFs
├── applications.md         # Central tracker
└── scan-history.tsv        # 680 deduplicated URLs
```

Why modes? Each one loads only the context it needs. auto-pipeline skips contact rules. apply skips scoring logic. Less context = better decisions from the LLM.
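The post doesn't show an actual mode file, so here is a hypothetical sketch of what one might look like. The frontmatter fields and step names are my invention, not the real files:

```markdown
---
name: oferta
description: Single-offer evaluation (A-F)
---
Load: _shared.md (archetypes, proof points)
Skip: contact rules, form-filling logic

1. Read the JD at the given URL.
2. Score it on the 10 dimensions (gate-pass: Role Match, Skills Alignment).
3. Write the report to reports/ using the 6-block template.
```

The point is the `Load`/`Skip` scoping: each mode declares what it needs, and everything else stays out of the context window.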

The 10-Dimension Scoring

Every offer runs through a weighted evaluation framework:

| Dimension | What It Measures | Weight |
| --- | --- | --- |
| Role Match | Alignment with CV proof points | Gate-pass |
| Skills Alignment | Tech stack overlap | Gate-pass |
| Seniority | Stretch level | High |
| Compensation | Market rate vs target | High |
| Geographic | Remote/hybrid feasibility | Medium |
| Company Stage | Startup/growth/enterprise fit | Medium |
| Product-Market Fit | Problem domain resonance | Medium |
| Growth Trajectory | Career ladder visibility | Medium |
| Interview Likelihood | Callback probability | High |
| Timeline | Hiring urgency | Low |

Role Match and Skills Alignment are gate-pass — if either fails, the final score is capped regardless of how the other dimensions score. 74% of evaluated offers scored below 4.0.
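The gate-pass mechanics can be sketched in a few lines. The real weights and thresholds live in the mode files and aren't shown in the post, so the numbers below (pass threshold of 6, cap at 3.9) are illustrative assumptions:

```javascript
// Illustrative weights (High=3, Medium=2, Low=1); the real values are
// defined in the mode files and may differ.
const WEIGHTS = {
  seniority: 3, compensation: 3, interviewLikelihood: 3,            // High
  geographic: 2, companyStage: 2, productMarketFit: 2,
  growthTrajectory: 2,                                              // Medium
  timeline: 1,                                                      // Low
};

// scores: one 0-10 value per dimension, including the two gates.
function finalScore(scores) {
  // Gate-pass: failing either gate caps the result below 4.0,
  // no matter how strong the weighted dimensions are.
  const gatesPass = scores.roleMatch >= 6 && scores.skillsAlignment >= 6;
  const keys = Object.keys(WEIGHTS);
  const weighted = keys.reduce((sum, k) => sum + scores[k] * WEIGHTS[k], 0);
  const totalWeight = keys.reduce((sum, k) => sum + WEIGHTS[k], 0);
  const base = weighted / totalWeight;            // 0-10 weighted average
  return gatesPass ? base : Math.min(base, 3.9);  // gate failure => < 4.0
}
```

This is why a technically perfect offer in the wrong role family still lands in the reject pile.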

The Pipeline

auto-pipeline is the flagship mode. A URL goes in, and out comes:

  1. Extract JD — Playwright navigates to the URL, extracts structured content
  2. Evaluate 10D — Claude reads JD + CV + portfolio, generates scoring
  3. Generate report — Markdown with 6 blocks: summary, CV match, level, comp, personalization, interview probability
  4. Generate PDF — HTML template + keyword injection + Puppeteer render
  5. Register tracker — TSV auto-merge via Node.js script
  6. Dedup — Checks 680 URLs in scan-history.tsv. Zero re-evaluations

Batch Processing

For high volume, batch mode launches a conductor that orchestrates parallel workers:

```bash
# conductor spawns N workers, each an independent Claude Code process
./batch-runner.sh --input batch/batch-input.tsv --workers 4

# Each worker:
# 1. Claims a URL from the queue (lock file prevents doubles)
# 2. Runs auto-pipeline
# 3. Writes result to batch-state.tsv
# 4. Picks next URL
```

122 URLs processed in parallel. Fault-tolerant: a worker failure never blocks the rest. Resumable — reads state and skips completed items.

The AI Resume Builder

A generic PDF loses. Career-Ops generates a different ATS-optimized CV for each offer:

  1. Extract 15-20 keywords from the JD
  2. Detect language (English JD → English CV)
  3. Detect region (US → Letter, Europe → A4)
  4. Detect archetype (6 predefined: AI Platform, Agentic, PM, SA, FDE, Transformation)
  5. Select top 3-4 projects by relevance
  6. Reorder bullets — most relevant experience moves up
  7. Render PDF — Puppeteer, self-hosted fonts, single-column ATS-safe

Same CV. 6 different framings. All real — keywords get reformulated, never fabricated.
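Step 6 is the only purely deterministic part of that list, and it's easy to sketch. In the real system the keyword extraction (step 1) is done by the LLM; this hypothetical helper only shows the reordering: given the JD's keywords, rank the existing CV bullets by overlap, without rewriting any of them:

```javascript
// Reorder CV bullets by JD-keyword overlap. Bullets are only reordered,
// never rewritten — consistent with "reformulated, never fabricated".
function reorderBullets(bullets, jdKeywords) {
  const kws = jdKeywords.map(k => k.toLowerCase());
  const score = text =>
    kws.filter(k => text.toLowerCase().includes(k)).length;
  // sort() is stable in Node, so equally-scored bullets keep their order.
  return [...bullets].sort((a, b) => score(b) - score(a));
}
```

The same idea extends to step 5 (selecting the top 3-4 projects): score each project's description against the JD keywords and take the leaders.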

Results

2 months in production. Real numbers, not demos.

  • 631 reports generated
  • 68 applications sent
  • 354 PDFs generated
  • 680 URLs deduplicated
  • 0 re-evaluations

What I Learned

Automate analysis, not decisions. Career-Ops evaluates 631 offers. I decide which ones get my time. Human-in-the-loop (HITL) is not a limitation — it is the design.

Modes beat a long prompt. 12 modes with precise context outperform a 10,000-token system prompt. This was my biggest mistake early on — I started with one massive prompt and the quality was terrible.

Dedup is more valuable than scoring. 680 deduplicated URLs mean 680 evaluations I never had to repeat. Boring infrastructure, highest ROI.

A CV is an argument, not a document. A generic PDF convinces nobody. A CV that reorganizes proof points by relevance and adapts framing to the archetype — that converts.

The system IS the portfolio. Building a multi-agent system to search for multi-agent roles is the most direct proof of competence.

Stack

  • Claude Code — LLM agent: reasoning, evaluation, content generation
  • Playwright — Browser automation: portal scanning and form-filling
  • Puppeteer — PDF rendering from HTML templates
  • Node.js — Utility scripts: merge-tracker, cv-sync-check
  • tmux — Parallel sessions: conductor + workers in batch

Full case study: santifer.io/career-ops-system

Has anyone else built tooling for their job search? Curious about different approaches — especially around evaluation frameworks and dedup strategies.
