I sold my business after 16 years and went all-in on AI. Week one of the job search: read JDs, map skills, customize CV, fill forms. Everything manual, everything repetitive.
By week two I stopped applying. I was building the system that would do it for me.
631 evaluations later, Career-Ops makes more filtering decisions than I do.
## What I Built
A multi-agent system with 12 operational modes, each a Claude Code skill file with its own context and rules. Not a script — an agent that reasons about the problem domain.
The key architectural choice: modes over one long prompt.
```
career-ops/
├── modes/
│   ├── _shared.md        # North Star archetypes, proof points
│   ├── auto-pipeline.md  # Full pipeline: JD → eval → PDF → tracker
│   ├── oferta.md         # Single-offer evaluation (A-F)
│   ├── batch.md          # Parallel processing with workers
│   ├── pdf.md            # ATS-optimized CV per offer
│   ├── scan.md           # Portal discovery
│   ├── apply.md          # Playwright form-filling
│   └── ...               # (12 total)
├── reports/              # 631 evaluation files
├── output/               # Generated PDFs
├── applications.md       # Central tracker
└── scan-history.tsv      # 680 deduplicated URLs
```
Why modes? Each one loads only the context it needs. `auto-pipeline` skips contact rules; `apply` skips scoring logic. Less context = better decisions from the LLM.
## The 10-Dimension Scoring
Every offer runs through a weighted evaluation framework:
| Dimension | What It Measures | Weight |
|---|---|---|
| Role Match | Alignment with CV proof points | Gate-pass |
| Skills Alignment | Tech stack overlap | Gate-pass |
| Seniority | Stretch level | High |
| Compensation | Market rate vs target | High |
| Geographic | Remote/hybrid feasibility | Medium |
| Company Stage | Startup/growth/enterprise fit | Medium |
| Product-Market Fit | Problem domain resonance | Medium |
| Growth Trajectory | Career ladder visibility | Medium |
| Interview Likelihood | Callback probability | High |
| Timeline | Hiring urgency | Low |
Role Match and Skills Alignment are gate-pass — if they fail, the final score drops regardless of everything else. 74% of evaluated offers scored below 4.0.
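The gate-pass mechanic can be sketched like this. The numeric weights, thresholds, and the 3.9 cap are illustrative assumptions, not the system's actual values; only the High/Medium/Low tiers and the gate behavior come from the table above.

```javascript
// Illustrative gate-pass weighted score. Weights and cap are assumptions.
const WEIGHTS = {
  seniority: 3, compensation: 3, interviewLikelihood: 3,          // High
  geographic: 2, companyStage: 2, productMarketFit: 2,            // Medium
  growthTrajectory: 2,                                            // Medium
  timeline: 1,                                                    // Low
};

function score10d(dims) {
  // dims: one 0-10 value per dimension, including the two gate dimensions
  const totalWeight = Object.values(WEIGHTS).reduce((a, b) => a + b, 0);
  const weighted =
    Object.entries(WEIGHTS).reduce((sum, [k, w]) => sum + dims[k] * w, 0) /
    totalWeight;
  // Gate-pass: failing Role Match or Skills Alignment drags the final score
  // down no matter how strong the other dimensions are.
  const gatesPass = dims.roleMatch >= 5 && dims.skillsAlignment >= 5;
  return gatesPass ? weighted : Math.min(weighted, 3.9);
}
```

With a cap like this, an offer with a great salary but the wrong role can never climb above the rejection band, which is exactly why 74% of offers land below 4.0.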
## The Pipeline
`auto-pipeline` is the flagship mode. A URL goes in, and the pipeline runs end to end:

1. **Extract JD** — Playwright navigates to the URL and extracts structured content
2. **Evaluate 10D** — Claude reads the JD + CV + portfolio and generates the scoring
3. **Generate report** — markdown with 6 blocks: summary, CV match, level, comp, personalization, interview probability
4. **Generate PDF** — HTML template + keyword injection + Puppeteer render
5. **Register tracker** — TSV auto-merge via a Node.js script
6. **Dedup** — checks against 680 URLs in `scan-history.tsv`. Zero re-evaluations
## Batch Processing
For high volume, batch mode launches a conductor that orchestrates parallel workers:
```bash
# conductor spawns N workers, each an independent Claude Code process
./batch-runner.sh --input batch/batch-input.tsv --workers 4

# Each worker:
# 1. Claims a URL from the queue (lock file prevents doubles)
# 2. Runs auto-pipeline
# 3. Writes result to batch-state.tsv
# 4. Picks next URL
```
122 URLs processed in parallel. Fault-tolerant: a worker failure never blocks the rest. Resumable — reads state and skips completed items.
## The AI Resume Builder
A generic PDF loses. Career-Ops generates a different ATS-optimized CV for each offer:
- Extract 15-20 keywords from the JD
- Detect language (English JD → English CV)
- Detect region (US → Letter, Europe → A4)
- Detect archetype (6 predefined: AI Platform, Agentic, PM, SA, FDE, Transformation)
- Select top 3-4 projects by relevance
- Reorder bullets — most relevant experience moves up
- Render PDF — Puppeteer, self-hosted fonts, single-column ATS-safe
Same CV. 6 different framings. All real — keywords get reformulated, never fabricated.
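The keyword-driven reordering can be sketched as follows. The tokenizer, stop-word list, and overlap scoring here are deliberate simplifications; the real system uses the LLM for extraction:

```javascript
// Illustrative keyword extraction + bullet reorder (simplified assumptions).
function extractKeywords(jdText, limit = 20) {
  const stop = new Set(["the", "and", "for", "with", "you", "our", "will", "are"]);
  const counts = new Map();
  for (const w of jdText.toLowerCase().match(/[a-z][a-z+#]{2,}/g) || []) {
    if (!stop.has(w)) counts.set(w, (counts.get(w) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent JD terms first
    .slice(0, limit)
    .map(([w]) => w);
}

function reorderBullets(bullets, keywords) {
  // Score each bullet by keyword overlap; most relevant experience moves up.
  const score = (b) => keywords.filter((k) => b.toLowerCase().includes(k)).length;
  return [...bullets].sort((a, b) => score(b) - score(a));
}
```

The key property: nothing is invented. The bullets are the same true statements; only their order and emphasis change per offer.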
## Results
2 months in production. Real numbers, not demos.
- 631 reports generated
- 68 applications sent
- 354 PDFs generated
- 680 URLs deduplicated
- 0 re-evaluations
## What I Learned
**Automate analysis, not decisions.** Career-Ops evaluated 631 offers. I decide which ones get my time. HITL is not a limitation; it is the design.

**Modes beat a long prompt.** 12 modes with precise context outperform a 10,000-token system prompt. This was my biggest mistake early on: I started with one massive prompt, and the quality was terrible.

**Dedup is more valuable than scoring.** 680 deduplicated URLs mean 680 evaluations I never had to repeat. Boring infrastructure, highest ROI.

**A CV is an argument, not a document.** A generic PDF convinces nobody. A CV that reorganizes proof points by relevance and adapts its framing to the archetype is one that converts.

**The system IS the portfolio.** Building a multi-agent system to search for multi-agent roles is the most direct proof of competence.
## Stack
- **Claude Code** — LLM agent: reasoning, evaluation, content generation
- **Playwright** — browser automation: portal scanning and form-filling
- **Puppeteer** — PDF rendering from HTML templates
- **Node.js** — utility scripts: merge-tracker, cv-sync-check
- **tmux** — parallel sessions: conductor + workers in batch mode
Full case study: santifer.io/career-ops-system
Has anyone else built tooling for their job search? Curious about different approaches — especially around evaluation frameworks and dedup strategies.