How I orchestrated 6 Gemini agents with FastAPI and AlloyDB to automate job hunting

Risheek Mittal — Thu, 25 Jun 2026 08:22:56 +0000

Most "AI agent" tutorials I see online are glorified single-prompt wrappers. They work for a demo, but they fall apart the second you introduce state, memory, or concurrent execution.

Last week, while pushing 30 commits to my core infrastructure, I realized that building a robust job-hunting engine isn't about the LLM—it’s about the orchestration layer. I built JobHunt.ai using a 6-agent Gemini architecture, and I learned that without a rock-solid database layer like AlloyDB to manage state, your agents are just expensive random number generators.

The Problem: Why Single-Prompt Agents Fail

If you’re building an agent to automate job applications, you aren't just sending a string to an API. You are managing:

Context: What did the previous agent decide?
Persistence: If the process crashes (and it will), can you resume?
Concurrency: Can you parse a resume, scrape a JD, and draft a cover letter simultaneously without blocking the event loop?

In my architecture, I moved away from simple script-based execution to a stateful, asynchronous pipeline.
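To make the concurrency point concrete, here is a minimal sketch of running two independent steps on one event loop with asyncio.gather. The function names, return shapes, and the sleep calls standing in for real I/O are all assumptions for illustration, not the JobHunt.ai code.

```python
import asyncio


async def parse_resume(path: str) -> dict:
    # Hypothetical stand-in for the OCR/Gemini call; sleep simulates I/O wait.
    await asyncio.sleep(0.1)
    return {"source": path, "skills": ["python", "sql"]}


async def scrape_jd(url: str) -> dict:
    # Hypothetical stand-in for the scraper's HTTP request.
    await asyncio.sleep(0.1)
    return {"url": url, "requirements": ["python"]}


async def prepare_inputs(resume_path: str, jd_url: str) -> dict:
    # gather() schedules both coroutines on the same event loop, so their
    # waits overlap instead of running back to back.
    resume, jd = await asyncio.gather(parse_resume(resume_path), scrape_jd(jd_url))
    return {"resume": resume, "jd": jd}


if __name__ == "__main__":
    print(asyncio.run(prepare_inputs("resume.pdf", "https://example.com/job/1")))
```

Steps that depend on each other (a cover letter needs both the parsed resume and the JD) would still await the gathered results before starting.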

The Architecture: 6 Agents, 1 Source of Truth

I designed JobHunt.ai around six specialized agents, each triggered by a FastAPI endpoint and coordinated via AlloyDB:

The Scraper: Fetches JD data.
The Resume Parser: Extracts entities via PaddleOCR/Gemini.
The Matching Agent: Scores the fit (Resume vs. JD).
The Tailoring Agent: Rewrites the bullet points.
The Cover Letter Agent: Generates the pitch.
The Validation Agent: Performs a final "sanity check" against the JD requirements.
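One way to make a pipeline like this resumable is to record the last completed stage and derive the next one from it. The sketch below assumes the six agents run in the order listed above, which is a simplification; the Stage names and the next_stage helper are hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(str, Enum):
    # Hypothetical stage names, one per agent in the list above.
    SCRAPE = "scrape"
    PARSE_RESUME = "parse_resume"
    MATCH = "match"
    TAILOR = "tailor"
    COVER_LETTER = "cover_letter"
    VALIDATE = "validate"


# Enum iteration follows definition order, so this is the assumed run order.
PIPELINE = list(Stage)


def next_stage(last_completed: Optional[Stage]) -> Optional[Stage]:
    """Return the stage to run next, or None when the pipeline is finished."""
    if last_completed is None:
        return PIPELINE[0]
    idx = PIPELINE.index(last_completed)
    return PIPELINE[idx + 1] if idx + 1 < len(PIPELINE) else None
```

After a crash, a worker could read the last completed stage from the database and call next_stage to pick up where the job stopped.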

The Stack

Engine: FastAPI (Asynchronous execution is non-negotiable).
Brain: Google Gemini (via Google ADK).
Database: AlloyDB (PostgreSQL-compatible, enterprise-grade handling of JSONB for agent states).
Orchestration: Python asyncio + n8n for workflow triggers.

Implementation: Managing State in AlloyDB

The secret sauce is how I handle agent handoffs. I don't pass massive objects in memory. I write the "Agent State" to AlloyDB and pass only its identifier (the transaction_id, which appears as job_id in the snippet below) to the next agent.

Here is how I structure the state transition in my FastAPI service:

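from fastapi import FastAPI, BackgroundTasks
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from models import AgentState

app = FastAPI()

# NOTE: gemini_client is assumed to be an async Gemini client configured elsewhere.

async def run_agent_workflow(job_id: str, db: AsyncSession):
    # Fetch current state from AlloyDB. execute() returns a Result, so unwrap
    # the single ORM row (scalar_one() raises if the job_id does not exist).
    result = await db.execute(select(AgentState).where(AgentState.job_id == job_id))
    state = result.scalar_one()

    # Execute Gemini Agent
    response = await gemini_client.generate_content(
        f"Analyze this JD: {state.jd_data}",
        model="gemini-3.5-flash"
    )

    # Update status and result together in a single commit
    state.status = "TAILORING_COMPLETE"
    state.result = response.text
    await db.commit()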

Because AlloyDB is PostgreSQL-compatible, I can store complex agent logs as JSONB and query exactly where an agent failed, which is crucial when you're debugging 6 concurrent agents.
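The handoff-by-ID idea can be sketched without any cloud setup. The example below swaps in the standard library's sqlite3 for AlloyDB and a JSON text column for JSONB, so it shows the pattern only; the table layout, status values, and helper names are assumptions.

```python
import json
import sqlite3


def init_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for AlloyDB in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent_state ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def save_state(conn: sqlite3.Connection, job_id: str, status: str, payload: dict) -> None:
    # Each agent writes its output here; only job_id is handed to the next agent.
    conn.execute(
        "INSERT OR REPLACE INTO agent_state (job_id, status, payload) VALUES (?, ?, ?)",
        (job_id, status, json.dumps(payload)),
    )
    conn.commit()


def load_state(conn: sqlite3.Connection, job_id: str) -> dict:
    # The next agent rebuilds its context from the stored row.
    row = conn.execute(
        "SELECT status, payload FROM agent_state WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no state for job {job_id}")
    return {"job_id": job_id, "status": row[0], "payload": json.loads(row[1])}


def failed_jobs(conn: sqlite3.Connection) -> list:
    # List (job_id, failed_agent) pairs for every job marked FAILED.
    rows = conn.execute(
        "SELECT job_id, payload FROM agent_state WHERE status = 'FAILED' ORDER BY job_id"
    ).fetchall()
    return [(job_id, json.loads(payload).get("failed_agent")) for job_id, payload in rows]
```

With a real JSONB column, the filter on the failed agent could move into SQL (for example with the `->>` operator) instead of being parsed in Python.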

The Gotchas

The "Hallucination Buffer": Even with Gemini 3.5, agents get creative. I implemented a strict schema validation step using Pydantic models. If the agent returns JSON that doesn't match the schema, the pipeline halts immediately.
Concurrency Limits: Running 6 agents at once will hit your Google Cloud rate limits faster than you think. I implemented a simple semaphore system in FastAPI to queue requests.
Webpack/Frontend headaches: While building the switchwithai-frontend this week, I ran into a massive js-yaml webpack error. The fix was adding a buffer polyfill. If you’re building AI tools, never underestimate the "glue code" required to make the frontend talk to your Python backend.
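For the concurrency-limit gotcha, a semaphore-based throttle might look like the sketch below. The limit of 2, the agent names, and the sleep standing in for a Gemini request are assumptions; the tracker dict exists only to show that the cap holds.

```python
import asyncio

MAX_CONCURRENT_CALLS = 2  # hypothetical cap; tune to your actual quota


async def call_model(agent_name: str, limiter: asyncio.Semaphore, tracker: dict) -> str:
    # Only MAX_CONCURRENT_CALLS coroutines can be inside this block at once;
    # the rest wait here until a slot frees up.
    async with limiter:
        tracker["active"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["active"])
        await asyncio.sleep(0.05)  # stands in for the model request
        tracker["active"] -= 1
    return f"{agent_name}: done"


async def run_all(agent_names: list) -> tuple:
    # Create the semaphore inside the running loop and share it across calls.
    limiter = asyncio.Semaphore(MAX_CONCURRENT_CALLS)
    tracker = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(call_model(name, limiter, tracker) for name in agent_names)
    )
    return list(results), tracker["peak"]


if __name__ == "__main__":
    names = ["scraper", "parser", "matcher", "tailor", "cover_letter", "validator"]
    print(asyncio.run(run_all(names)))
```

All six calls are scheduled immediately, but the semaphore keeps the number of in-flight requests at or below the cap.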

Why AlloyDB over standard Postgres?

I chose AlloyDB because of its integration with the Google ecosystem. Since I’m already using Google ADK and Gemini, having the database managed within the same VPC reduces latency between my FastAPI instances and the data store. When you are processing high-volume job data, that 20ms-50ms latency saving adds up across hundreds of API calls.

Moving Forward

My focus now is on closing the loop. I’m refining the OCR pipeline (using PaddleOCR v5) to handle document scanning for the job hunt engine, so that resume ingestion is as accurate as the generation.

This week, I also pushed updates to my rishh-website repo—specifically cleaning up service account debugging—because managing credentials across 6 agents is a nightmare if you don't have a clean workflow.

The takeaway: Don't just build an agent. Build a system that tracks what the agent did. If you don't have a database layer that acts as the "brain's memory," you aren't building an AI application; you're building a prototype.

What's your approach to managing agent state in production? Are you using a dedicated state machine or just relying on database rows? Drop it in the comments.

DEV Community: Risheek Mittal