Part 1 of 5: The problem, the mental model, and what an autonomous AI agent actually is under the hood.
A few weeks ago, a friend of mine — let’s call him R — sent me a voice note at an unreasonable hour.
He’s 40. Smart guy. Spent the last decade running operations at two different companies, one of which exited. Legitimately good at what he does. But the kind of person who built his career through referrals and relationships and had never really needed to job-hunt the traditional way. Until now.
The note was about seven minutes long. I’ll spare you the full transcript, but the gist: he’d been applying for three months. Targeting AI-adjacent product and operations roles. Had a strong profile on paper. Was getting almost nowhere.
“I’m sending 15 applications a week,” he said. “Every resume looks basically the same because I don’t have time to rewrite it properly each time. I don’t know which ones to prioritize. I’m tracking everything in a spreadsheet that’s become a nightmare. And I feel like I’m just shouting into a void.”
Seven minutes of this. He ended with: “You build AI stuff. Is there something that could help?”
I sat with that for a day.
The honest answer was: nothing good enough exists yet. The tools in the space are either janky resume spinners with no real intelligence, or “AI job search” features bolted onto job boards that mostly just mean keyword matching with extra steps.
And I could see the actual problem clearly — not just “job searching is hard” but the specific mechanical reasons why his approach wasn’t working and why no amount of extra effort would fix it.
So I said: let me just build it.
This is Part 1 of a 5-part series documenting that build. JobFlow AI — an autonomous AI agent that finds roles, scores them against your profile, tailors your resume for each one, writes outreach emails, tracks follow-ups, and preps you for interviews. Built in public. Will be open sourced.
I’m sharing the full engineering process: the architecture decisions, the agent design patterns, the prompt engineering, the orchestration, the deployment, the things that went wrong. Everything.
But before we get into any of that — let’s talk about what’s actually broken with job searching right now. Because understanding the problem is what made the design obvious.
Part 1: The Actual Problem (It’s Not What You Think)
R’s instinct was that he needed to apply to more jobs, or write better cover letters, or optimize his LinkedIn profile.
All wrong. Not because those things don’t matter, but because he was trying to win a game whose rules had changed without anyone sending the memo.
Here’s what happened.
For about two decades, job applications were filtered by ATS software — Applicant Tracking Systems — that parsed resumes and matched them against keywords in job descriptions. The career advice was: “get the right keywords in.” Reasonable advice. You played the game. The rules were clear.
Then the rules changed.
Most serious ATS platforms now run semantic search — vector embeddings and similarity scoring, not keyword matching. The system understands meaning. “Led revenue operations overhaul” and “built CRM pipeline from scratch” can map to the same concept. The system knows.
The problem: the career advice industry hasn’t caught up.
Most resume coaches and LinkedIn experts are still giving keyword-stuffing advice. For an algorithm that doesn’t use keywords anymore. You’re being told to optimize for a game that no longer exists. And a resume that’s been obviously keyword-engineered can actively hurt you — it reads as manipulative to both the AI doing first-pass screening and the human reviewing finalists.
Simultaneously: AI tools made it trivially easy to spray-apply to 300 jobs. Which means recruiters are drowning in volume. Which means the actual human review window for your application — before someone moves on — might be 15 seconds.
You’re competing with 400 applicants. In 15 seconds. For a system using semantic matching that most of the advice you’ve received doesn’t account for.
R wasn’t doing anything wrong. He was doing the right things for the wrong era.
Solving this properly means: the right information, in the right format, for the right role, without spending three hours per application. A system that reads the job, understands your full profile, picks the right angle, and generates exactly what that specific opportunity requires.
Not one generic resume. Not manual tailoring that eats your evenings. An agent that handles the work so you can handle the judgment calls.
Part 2: The Mental Model That Makes Everything Else Click
I’ve been building agent systems for a while now — Praxiom, our AI product manager at Einstein Labs, runs 36 agent tools in production. Polaris is a spec-driven AI coding agent. agent-stream is a protocol we open sourced for streaming agent events. The agent architecture patterns in this series aren’t new territory for me, but I want to explain them from first principles because the framing matters.
The mental model that makes everything else make sense:
An AI agent = a brain + a harness.
Full stop. That’s the whole thing.
The brain is the Large Language Model — Claude, GPT-4, whatever you’re using. Text goes in, text comes out. Extraordinary at reasoning, synthesis, generation when given proper context.
The harness is everything you build around it. The database that gives it memory. The scrapers and APIs that feed it information. The orchestration layer that decides what runs and in what order. The validation layer that checks its output. The task queue that lets it run in the background. The UI that makes it usable. The retry logic. The logging. The error handling.
The brain: maybe 5% of what you build.
The harness: the other 95%.
This is the thing that gets underestimated, consistently, by everyone writing about “building AI products.” The content focuses on prompts — because prompts are visible and legible and feel like the interesting part. But the prompt is downstream of everything the harness has to do. If the harness fails to assemble the right context, the best prompt in the world produces noise.
Here’s why.
The LLM is stateless. Completely. Every call starts with no memory of anything that happened before. It doesn’t know what job you looked at yesterday. Doesn’t know what your resume says. Doesn’t know which companies you’ve already applied to. Doesn’t know anything about you at all.
Clean slate. Every time.
So when you want an agent to tailor a resume for a specific role, you can’t just say “tailor my resume for this job.” Before that call can produce anything useful, the harness needs to have assembled:
- Your full work history, projects, and skills
- Which of your resume variants best fits this role type
- The specific job requirements, extracted and structured
- Company context — size, stage, culture signals from the listing
- Your positioning for this type of role
- Any relevant achievements that map to what this company is looking for
All of that, formatted correctly, labeled clearly, delivered in a form the LLM can work with efficiently.
The harness does that. Not the LLM. The harness.
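To make that concrete, here is a rough sketch of what that assembled context might look like as a data structure plus a rendering step. The field names are illustrative placeholders, not the actual JobFlow AI schema.

```python
from dataclasses import dataclass

# Illustrative sketch: field names are placeholders, not the actual JobFlow AI schema.
@dataclass
class TailoringContext:
    profile_summary: str       # positioning + career narrative
    resume_variant: str        # the variant chosen for this role type
    job_requirements: str      # extracted, structured requirements
    company_signals: str       # size, stage, culture cues from the listing
    matched_achievements: str  # achievements mapped to what this company is looking for

def render_briefing(ctx: TailoringContext) -> str:
    """Package everything into a clearly labeled document the LLM can work with efficiently."""
    return "\n\n".join([
        "## Candidate profile\n" + ctx.profile_summary,
        "## Starting resume variant\n" + ctx.resume_variant,
        "## Job requirements\n" + ctx.job_requirements,
        "## Company context\n" + ctx.company_signals,
        "## Relevant achievements\n" + ctx.matched_achievements,
    ])
```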
The analogy I keep coming back to: you hire the sharpest consultant you’ve ever worked with. Extraordinary. But she has complete amnesia between every session. Comes in fresh every time, knows nothing about your company, your product, your history.
She’s still extraordinary. Brief her properly and she produces work that justifies every rupee. Show up and say “pick up where we left off” — she can’t. She literally cannot.
So you build systems around her. Structured briefing documents. Careful context curation. Preparation that translates accumulated knowledge into something she can act on in the session.
That entire preparation system? The briefing infrastructure?
That’s the harness. That’s the 95%.
Part 3: Pre-flight and Post-flight
Once this model is clear, the agent design pattern becomes obvious.
Every LLM call has two distinct phases that have nothing to do with the LLM. I call them pre-flight and post-flight.
Pre-flight is everything that happens before you call the LLM.
For the resume tailoring agent, pre-flight looks like:
- Fetch the job record from the database
- Scrape the live job listing for fresh, complete text
- Retrieve the user’s full profile — experiences, projects, skills, positioning
- Determine which of the 5 resume variants is the right starting point for this role type
- Pull that variant’s positioning statement and summary
- Compute skill overlap between job requirements and user background
- Format all of this into a structured context document
- Load the agent’s system prompt from its dedicated markdown file
Zero LLM involvement. All deterministic code. Testable. Debuggable. Fast.
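In code, the pre-flight for this agent looks roughly like the sketch below. Every helper here (get_job, scrape_listing, pick_resume_variant, and so on) is a hypothetical stand-in for the real repo functions, the prompt path is illustrative, and it reuses the render_briefing sketch from earlier.

```python
from pathlib import Path

def preflight_resume_tailor(job_id: int, user_id: int) -> dict:
    """Assemble everything the LLM call needs. No LLM involvement: plain, testable code."""
    job = get_job(job_id)                          # hypothetical helper: fetch the job record
    job.listing_text = scrape_listing(job.url)     # fresh, complete text from the live listing
    profile = get_profile(user_id)                 # experiences, projects, skills, positioning
    variant = pick_resume_variant(profile, job)    # best of the 5 variants for this role type
    overlap = skill_overlap(profile.skills, job.required_skills)  # deterministic computation

    ctx = TailoringContext(                        # from the earlier sketch
        profile_summary=profile.positioning,
        resume_variant=variant.text,
        job_requirements=job.requirements_text,
        company_signals=job.company_signals,
        matched_achievements=overlap.formatted(),  # hypothetical formatting helper
    )
    return {
        "system_prompt": Path("prompts/resume_tailor.md").read_text(),  # prompt file as an artifact (illustrative path)
        "briefing": render_briefing(ctx),
    }
```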
Post-flight is everything that happens after the LLM responds.
- Parse and validate output against the expected Pydantic schema
- If validation fails: retry with adjusted parameters
- Store the tailored resume in the database, linked to the correct job and profile
- Update the job’s pipeline status
- Trigger the next workflow step
- Log model used, tokens consumed, latency — for cost tracking and debugging
Again — not the LLM. Code.
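Roughly, in code. The schema fields and the persistence helpers below are illustrative stand-ins, not the repo's actual ones.

```python
from pydantic import BaseModel, ValidationError

class TailoredResume(BaseModel):   # illustrative schema, not the repo's actual one
    summary: str
    bullets: list[str]
    skills: list[str]

def postflight_resume_tailor(raw_output: str, job_id: int, attempt: int = 1) -> TailoredResume | None:
    try:
        resume = TailoredResume.model_validate_json(raw_output)     # validate against the expected schema
    except ValidationError:
        if attempt < 3:
            return retry_with_adjusted_params(job_id, attempt + 1)  # hypothetical retry helper
        return None
    save_tailored_resume(job_id, resume)            # persist, linked to the correct job and profile
    update_pipeline_status(job_id, "resume_ready")  # advance the job's pipeline status
    log_llm_call(job_id)                            # hypothetical: record model, tokens, latency
    return resume
```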
Here’s the thing that genuinely shifts how you design these systems: 80% of what makes an agent good is in the pre-flight and post-flight, not the prompt.
The prompt matters. Prompt engineering is real and worth doing well. But if the pre-flight is handing the LLM poorly structured, incomplete, or mislabeled context — no prompt technique saves you. And if the post-flight isn’t validating and correctly persisting the output — you’ll have brilliant LLM responses that produce nothing downstream.
The prompt is the 5 minutes with the consultant. The pre-flight is the two hours assembling her briefing. The post-flight is the three hours turning her output into action.
Optimize for the full process.
Part 4: What JobFlow AI Actually Looks Like
The system runs four services:
- FastAPI backend — ~40 endpoints, all business logic, 13 database tables
- Celery worker — picks up background tasks from Redis queue, runs the actual agents
- Redis — message broker and task state tracking
- Next.js 14 frontend — TanStack Query, Tailwind, shadcn/ui
The interaction flow:
Browser
↓
Next.js frontend
↓
FastAPI [accepts request, queues task, returns immediately]
↓
Redis queue
↓
Celery worker [picks up task]
↓
Agent orchestrator [determines what runs, in what sequence]
↓
Agent modules [10 specialized agents, each with one job]
↓
Services layer [Claude API, web scraper, PDF generator]
↓
SQLite [stores everything]
↑
Frontend polls for status updates
The reason the API returns immediately and the agent runs in the background: LLM calls take 3–15 seconds, sometimes longer. Synchronous request-response for AI workloads is the wrong pattern. Queue it, run it, poll for completion.
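Here is a minimal sketch of that queue-and-poll pattern with FastAPI and Celery. The route paths and the task name are placeholders, not the actual JobFlow AI endpoints.

```python
from celery import Celery
from celery.result import AsyncResult
from fastapi import FastAPI

api = FastAPI()
celery_app = Celery("jobflow", broker="redis://localhost:6379/0", backend="redis://localhost:6379/1")

@celery_app.task
def tailor_resume_task(job_id: int) -> dict:
    # Pre-flight -> LLM call -> post-flight all run here, inside the Celery worker.
    return {"job_id": job_id, "status": "done"}

@api.post("/jobs/{job_id}/tailor")
def start_tailoring(job_id: int):
    task = tailor_resume_task.delay(job_id)        # queue it and return immediately
    return {"task_id": task.id}

@api.get("/tasks/{task_id}")
def task_status(task_id: str):
    result = AsyncResult(task_id, app=celery_app)  # the frontend polls this for completion
    return {"state": result.state, "result": result.result if result.ready() else None}
```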
The 10 agent modules:
Profile Ingest — Takes raw LinkedIn text, resume, GitHub URL, and writing samples; extracts structured profile data
Profile Synthesis — Turns the structured profile into positioning, a career narrative, and 5 resume variants
Job Parser — Reads a listing; extracts requirements, skills, company signals, and red flags
Job Scorer — Scores the role against your profile on 7 weighted dimensions
Resume Tailor — Tailors the resume, starting from the right variant for this specific role
Cover Letter — Writes a cover letter that's actually specific to this company and this role, not generic
Outreach Writer — Drafts cold outreach to the hiring manager
Follow-up Writer — Drafts a follow-up email if there's no response after the right window
Interview Prep — Generates role-specific questions with suggested answers
Contact Finder — Attempts to identify the hiring manager's name and direct contact
Each agent does exactly one thing. One input type. One output type. No agent has any knowledge of the others.
This is deliberate. When output is wrong, I know which agent produced it. When I want to improve resume tailoring, I change one thing without touching cover letter generation. When I want to test a new prompt approach, I test it against one agent’s behavior in isolation. Specialization makes everything — debugging, iteration, testing, prompt engineering — tractable.
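Here is the shape of one of those agents, sketched with illustrative types. The real BaseAgent class gets its own treatment in Part 2, and call_llm here is a hypothetical stand-in for the actual LLM call.

```python
from pydantic import BaseModel

class JobParserInput(BaseModel):     # one input type...
    listing_text: str

class JobParserOutput(BaseModel):    # ...one output type
    required_skills: list[str]
    nice_to_have: list[str]
    red_flags: list[str]

class JobParserAgent:
    """Reads one listing, returns one structured result. Knows nothing about the other nine agents."""
    def run(self, payload: JobParserInput) -> JobParserOutput:
        briefing = f"Extract the requirements from this listing:\n\n{payload.listing_text}"
        raw = call_llm(briefing)                          # hypothetical stand-in for the LLM call
        return JobParserOutput.model_validate_json(raw)   # invalid output fails loudly, in one place
```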
One thing worth flagging: outreach emails and follow-ups sit in a human approval queue before they go anywhere. The agent drafts. You read, edit if needed, then approve or discard.
This isn’t a gap I’m planning to close by automating further.
An outreach email is a message from a real person to another real person reviewing a real career opportunity. That’s categorically different from generating a tailored resume. The agent should handle the work. The human should make the call on whether to send it and what it says.
Automate the labor. Keep the judgment.
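One plausible way to represent that approval gate in code is below. The status values and the model are assumptions for illustration, not the repo's actual schema.

```python
import enum
from dataclasses import dataclass

class DraftStatus(str, enum.Enum):
    DRAFTED = "drafted"        # the agent wrote it
    APPROVED = "approved"      # a human read it, edited if needed, and signed off
    DISCARDED = "discarded"    # a human rejected it
    SENT = "sent"              # only reachable from APPROVED

@dataclass
class OutreachDraft:
    job_id: int
    body: str
    status: DraftStatus = DraftStatus.DRAFTED

def send_outreach(draft: OutreachDraft) -> None:
    if draft.status is not DraftStatus.APPROVED:
        raise PermissionError("Outreach only goes out after explicit human approval.")
    # ...actual send happens here...
    draft.status = DraftStatus.SENT
```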
Part 5: Stack Choices and Why
The stack is boring on purpose. When something breaks in a production agent system at 2am, “boring and well-understood” is exactly what you want.
- FastAPI — async Python, type-safe, auto-generated docs I can actually test
- SQLAlchemy + Alembic — mature ORM with migration support; schema evolves without destroying data
- SQLite — zero infrastructure overhead, works perfectly for this use case, and one config line away from Postgres when needed
- Celery + Redis — industry-tested async task processing, battle-hardened tooling
- LangGraph — the one AI-specific tool in the stack; genuinely useful for defining state machines across multi-step agent pipelines
- Anthropic Claude API directly — raw SDK, no wrappers
On the last point: I use the raw Anthropic SDK across all our agent work at Einstein Labs. Wrappers add abstraction that costs you debugging clarity. When a call fails or produces unexpected output, I want to see exactly what request went out and exactly what came back. Every layer of indirection between me and the API makes production debugging slower. Not worth it.
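Calling the raw SDK looks like this. The model ID is a placeholder and the prompt strings are stand-ins for the real markdown prompt files and pre-flight briefing.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

system_prompt = "You tailor resumes. Return only JSON matching the given schema."  # stand-in for the prompt file
briefing = "<the context document assembled in pre-flight>"                        # stand-in for the real briefing

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model ID; chosen per task (more on that below)
    max_tokens=2048,
    system=system_prompt,
    messages=[{"role": "user", "content": briefing}],
)
print(response.content[0].text)  # exactly what came back, with nothing in between
```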
The interesting deliberate choice: different Claude models for different tasks.
Extraction tasks — parsing a job listing, pulling structured data from a resume — use Sonnet. Fast, economical, more than capable for well-defined extraction with clear schemas.
Synthesis tasks — generating career narrative, building positioning statements, creating the 5 resume variants — use Opus. Slower and more expensive. The quality difference is real and it compounds. Everything downstream of the profile synthesis — every tailored resume, every cover letter, every outreach draft — inherits from that synthesis. Getting it right at the foundation matters.
This is a general principle in agent system design: identify the load-bearing calls and resource them appropriately. The profile synthesis runs once per user setup. The job scorer runs on every job added. Model economics should reflect that asymmetry.
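One simple way to express that asymmetry is a task-to-model map. The model IDs and the exact agent-to-model assignments below are illustrative; the point is the routing, not the specific versions.

```python
# Placeholder model IDs; the repo's actual choices may differ.
EXTRACTION_MODEL = "claude-3-5-sonnet-20241022"   # fast, economical: parsing, structured pulls
SYNTHESIS_MODEL = "claude-3-opus-20240229"        # slower, stronger: the load-bearing calls

MODEL_BY_AGENT = {
    "job_parser": EXTRACTION_MODEL,         # runs on every job added, keep it cheap
    "job_scorer": EXTRACTION_MODEL,
    "profile_synthesis": SYNTHESIS_MODEL,   # runs once per user setup, worth the heavier model
}

def model_for(agent_name: str) -> str:
    return MODEL_BY_AGENT.get(agent_name, EXTRACTION_MODEL)  # default to the economical model
```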
What This Series Covers
Part 2: The Brain — Engineering LLM Integration That Actually Works in Production
The pattern that makes or breaks production LLM integration: structured output via tool-use forcing. The BaseAgent[InputT, OutputT] class. Prompt files as first-class artifacts. Context window management. Why I call different models for different tasks and how I decided which is which.
Part 3: The Swarm — Why 10 Specialized Agents Instead of One
Why a monolithic agent is worse than a swarm of small ones. The design principles behind each agent boundary. The two hardest agents I built (Profile Ingest + Synthesis). The job scoring formula. Human-in-the-loop as intentional design.
Part 4: The Harness — Orchestration, Persistence, and Async Processing
Database design for AI applications — it’s different from regular app databases in ways that matter. LangGraph for workflow state machines. Celery + Redis for background processing. Error handling when LLMs fail in production, which they do.
Part 5: Ship It — Frontend, Deployment, and What I’d Do Differently
Building a UI that makes AI output feel trustworthy. One-command deployment, genuinely no manual steps. Actual token cost numbers. What worked better than expected. What I over-engineered. What I’d do completely differently if I started today.
The full codebase is on GitHub: github.com/abhichat85/jobflow-ai
Every code snippet in this series links to the actual file in the actual repo. Not tutorial code written for the blog. The real thing.
The plan: build this fully, document it in public, then open source it. R gets a working job search agent. Everyone else gets the full architectural blueprint and the code behind it.
One last thing about R.
He’s spent two decades building real things — teams, operations, companies. He’s not behind. He’s just playing an old game on a board that’s been redesigned.
The people who are going to win this job market aren’t the ones who apply to the most jobs. They’re the ones who show up with the right signal for the right role — specific, positioned, clearly relevant — and do it consistently without burning themselves out in the process.
That’s what this agent is designed to do. Give the right people the infrastructure to compete on signal, not volume.
I’ll see you in Part 2. That’s where we get into the engineering.
Building something in the agent space? Disagree with a decision I made here? I’m genuinely interested in both — the agreements are nice, the disagreements are useful.