DEV Community

Nathaniel Hamlett

Posted on • Originally published at nathanhamlett.com

What Breaks When You Let an AI Agent Run Your Job Search

I've been running an autonomous job search pipeline for about six weeks. Not "I use ChatGPT to polish my resume" — I mean a 42-cron-job, SQLite-backed, multi-LLM system that scans job boards, scores opportunities, generates tailored resumes and cover letters, and submits applications. At its peak it sent 44 applications in a single day without me touching a keyboard.

Here's what I learned when I looked at the actual data.

The Architecture

The system has a few moving parts:

  • Discovery layer: Nine job board integrations (Greenhouse, Lever, Adzuna, Jooble, HN Who's Hiring, CryptoJobsList, and a few others) pulling new listings every few hours via cron.
  • Scoring layer: An LLM (Claude Sonnet) reads each listing and scores 0-10 for fit, with reasoning stored to the DB. Anything above 7.0 enters the active pipeline.
  • Packet-building layer: For qualified opportunities, it generates a tailored resume and cover letter using the job description as input.
  • Submission layer: Browser automation (Playwright/browser-use) fills out Greenhouse/Lever/Workday forms. Email-based applications route through Resend.

Pipeline state lives in a SQLite WAL database with tables for opportunities, approvals, content, outreach history, and learnings. There's a nightly improvement cron that reads the day's activity and writes a structured daily note — basically a PM report on the agent's own performance.
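
For concreteness, a minimal version of that state store might look like this. The actual schema isn't published; table and column names below are my guesses from the stage names in the post.

```python
import sqlite3

conn = sqlite3.connect("pipeline.db")
conn.execute("PRAGMA journal_mode=WAL")  # WAL: cron jobs can read while one writes
conn.executescript("""
CREATE TABLE IF NOT EXISTS opportunities (
    id              INTEGER PRIMARY KEY,
    source          TEXT,     -- greenhouse / lever / jooble / ...
    title           TEXT,
    stage           TEXT,     -- discovered -> strategy_ready -> applied
    score           REAL,
    score_reasoning TEXT,
    apply_url       TEXT
);
CREATE TABLE IF NOT EXISTS content (
    opportunity_id  INTEGER REFERENCES opportunities(id),
    kind            TEXT,     -- 'resume' or 'cover_letter'
    body            TEXT
);
""")
```
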

After six weeks and 1,670 scraped opportunities, here's what the numbers said.

Bottleneck #1: The Funnel Was Backwards

I spent weeks optimizing discovery. I added more sources, tuned the scoring, built deduplication. The intake funnel was humming.

Then I looked at the actual pipeline distribution:

  • discovered: 616
  • strategy_ready: 495 (these have been researched and have a valid apply URL)
  • applied: 132

Of the 495 strategy_ready opportunities, only 10 had a generated cover letter. Only 10 had a tailored resume.

So 462 submission-ready opportunities — jobs where the system had already done the research, confirmed the URL, and identified the ATS type — were just sitting there. Not because submission was broken. Because the packet-building step was never put on a cron schedule.
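
A gap like that is easy to surface with one query once you look. A self-contained sketch, with the schema and stage names assumed:

```python
import sqlite3

# Hypothetical query over an assumed schema: count opportunities that are
# research-complete but have no generated cover letter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE opportunities (id INTEGER PRIMARY KEY, stage TEXT);
CREATE TABLE content (opportunity_id INTEGER, kind TEXT);
INSERT INTO opportunities VALUES (1, 'strategy_ready'), (2, 'strategy_ready'), (3, 'discovered');
INSERT INTO content VALUES (1, 'cover_letter');
""")
stuck = conn.execute("""
    SELECT COUNT(*) FROM opportunities o
    WHERE o.stage = 'strategy_ready'
      AND NOT EXISTS (SELECT 1 FROM content c
                      WHERE c.opportunity_id = o.id
                        AND c.kind = 'cover_letter')
""").fetchone()[0]
print(stuck)  # → 1
```
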

The classic mistake: automate intake, ignore outflow. The leaky bucket metaphor doesn't capture it — it's more like building a massive funnel over a pinhole.

Fix: The blitz_packet_builder.py script runs fine when invoked manually. It just needs a high-frequency cron job. Adding that immediately unblocks 462 applications.
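
Something like this is all the fix takes — the script name is from the post, but the path and schedule are placeholders:

```shell
# Hypothetical crontab entry: drain strategy_ready items every 30 minutes.
*/30 * * * * cd /path/to/pipeline && python blitz_packet_builder.py >> logs/packet_builder.log 2>&1
```
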

Bottleneck #2: The Scoring System Was Lying

One of the signals I track is score distribution across sources. When I looked at Jooble specifically, the scores were suspiciously high — 9.3, 9.8, 10.0 — for roles that were clearly garbage:

  • "Crypto Pro Network" (aggregator spam): 10.0
  • Tesla UK listing (location disqualified): 9.8
  • "Wing Assistant" content writer: 10.0
  • Head of Finance at a recruiter firm: 9.3

The root cause was that the batch scoring for aggregator imports was doing keyword matching without industry or seniority disqualifiers. Worse, no score_reasoning was being stored for these entries — so there was no audit trail, just inflated scores polluting the pipeline metrics.

This matters because I was making resource allocation decisions based on aggregate scores. If your quality metrics are wrong, your prioritization is wrong.

Fix: Jooble and Adzuna get quarantined until the scoring pipeline stores reasoning for every entry and enforces industry-level disqualifiers. Direct employer boards (Greenhouse/Lever scraped directly) have much higher signal-to-noise.

Bottleneck #3: Browser Automation Has a Ceiling

The submission layer uses browser-use backed by Gemini Flash for the LLM reasoning. It works well — fills out forms intelligently, handles multi-step ATS flows, adapts to different field structures.

But during the 3am batch runs, it started hitting 429 RESOURCE_EXHAUSTED errors after 5-6 submissions. Gemini Flash has aggressive rate limits on the free tier, and the agent was burning through its quota on the first few jobs of a batch.

This killed the overnight batch processing that was supposed to be the velocity engine.

Fix: Route browser-use's LLM backend through OpenRouter instead of direct Gemini API. OpenRouter gives access to multiple models behind a single API, so when one hits rate limits, fallback routing kicks in automatically. Should have done this at the start.
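
The request shape is worth showing. OpenRouter speaks the OpenAI wire protocol, and per its docs a `models` array in the request body enables fallback routing. The model IDs below are examples, and wiring this into browser-use depends on its LLM config — this just builds the payload:

```python
# Sketch of OpenRouter fallback routing. The `models` field and the model IDs
# are assumptions drawn from OpenRouter's docs, not from the post's code.
FALLBACK_MODELS = [
    "google/gemini-2.0-flash-001",   # primary
    "anthropic/claude-3.5-haiku",    # tried next on rate limits / errors
    "openai/gpt-4o-mini",
]

def openrouter_request(prompt: str) -> dict:
    """Build a chat request whose `models` list triggers fallback routing."""
    return {
        "model": FALLBACK_MODELS[0],
        "models": FALLBACK_MODELS,   # OpenRouter walks this list on failure
        "messages": [{"role": "user", "content": prompt}],
    }

payload = openrouter_request("Fill out this ATS form field...")
```

POST that payload to `https://openrouter.ai/api/v1/chat/completions` with an OpenRouter key (not a Gemini key) and the 429s become a routing decision instead of a dead batch.
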

The Lesson Nobody Talks About: Scoring Your Own Scoring

The most valuable output from this whole build isn't the 132 applications — it's the nightly improvement loop. Every night, a cron job reads the day's pipeline activity, application velocity, stage transition rates, and source performance, then writes a structured analysis. It's basically asking "what went wrong today and why?"

That loop is where I caught all three of the above issues. Not by looking at dashboards, but by having the system describe its own failure modes in plain language and commit that to a daily note file.
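
The loop itself is small. A hypothetical shape — `analyze_fn` stands in for the real LLM call, and the query assumes a schema like the one described above:

```python
import datetime
import pathlib
import sqlite3

# Hypothetical nightly meta-cron: summarize the day's pipeline state,
# ask an LLM "what is stuck and why?", and commit the answer to a note file.
def nightly_note(conn: sqlite3.Connection, analyze_fn, notes_dir: str = "notes") -> str:
    today = datetime.date.today().isoformat()
    rows = conn.execute(
        "SELECT stage, COUNT(*) FROM opportunities GROUP BY stage"
    ).fetchall()
    summary = "\n".join(f"{stage}: {n}" for stage, n in rows)
    note = analyze_fn(f"Pipeline stages on {today}:\n{summary}\n\nWhat is stuck, and why?")
    out = pathlib.Path(notes_dir)
    out.mkdir(exist_ok=True)
    (out / f"{today}.md").write_text(note)  # the structured daily note
    return note
```
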

If you're building any agentic workflow, instrument the meta-layer first. The system should be able to tell you where it's stuck.

The Positioning Insight the Data Forced

Six weeks of data made something obvious that I had been rationalizing away.

I'd been applying for Community Manager and Social Media Lead roles at $80-$100K because that's what my most recent title maps to. But the roles where I had genuine differentiation — the AI infrastructure work, the heterogeneous LLM routing, the 42-cron production agent — weren't getting surfaced because the scoring wasn't looking for them.

The system I built to automate my job search is itself a more compelling portfolio piece than most of the roles I was applying for. Running a production AI agent across 9 job boards with 42 scheduled jobs, SQLite pipeline state, multi-model routing, and browser automation isn't community management. It's applied AI systems work.

That realization changed the pipeline targeting entirely: AI consulting, AI-native companies hiring operators who can actually build, roles at the AI x crypto intersection where the two skill sets compound.

The agent uncovered the positioning pivot by generating the data that made the old positioning obviously wrong.


The code isn't public yet. If you're building something similar or have hit the same bottlenecks, I'm curious what your packet-generation approach looks like: the cover letter generation step is where most of the quality control has to happen, and it's the hardest part to parallelize without sacrificing output quality.

Nathaniel Hamlett — AI systems builder and ecosystem operator. Currently available for consulting and select full-time roles. nathanhamlett.com
