DEV Community

Rahul Reddy Talatala


I Got Tired of Filling Out the Same Form 50 Times, So I Built an AI to Do It

Every time I applied for a job, I faced the same ritual. Open the application form. Type my full name. Type my email. Paste my LinkedIn URL. Type my phone number. Select my country from a dropdown. Answer "Are you authorized to work in the US?" for the fifteenth time that week.

The entire process takes about ten minutes per application, and roughly eight of those minutes are spent on fields I have answered hundreds of times before. The two minutes that actually matter (the cover letter, the portfolio link, the thoughtful answers to specific questions) get squeezed into whatever mental energy I have left.

I am a GenAI engineer. I spend my days building systems that make computers do repetitive cognitive work. The irony of manually typing my zip code into yet another Greenhouse form at midnight was not lost on me.

So I built ApplyAI: a Chrome extension that reads a job application form, sends the fields to an AI agent, gets back a fill plan, and applies it to the page in under ten seconds.

🚀 Try it on the Chrome Web Store | 🌐 Web App | 🔗 GitHub

🎥 Full product demo

This post walks through how it works, the actual architecture, the hard problems, and the design decisions that shaped the final system.


The Stack

  • Chrome Extension: Vanilla JS with a React popup (Vite + Tailwind)
  • Backend: FastAPI (Python)
  • AI Agent: LangGraph StateGraph with Gemini 2.5 Flash
  • Database and Auth: Supabase (PostgreSQL + Auth + Storage)
  • Frontend Dashboard: Next.js 14 with TypeScript

Three components. One clear job each. The extension is the hands. The backend is the brain. The frontend is the face.


The Architecture

Here is the data flow for a single autofill run:

  1. User navigates to a job application form (Lever, Ashby, Greenhouse, or any careers page)
  2. The Chrome extension extracts every form field from the live DOM, including dropdown options
  3. The extension sends those fields plus the raw DOM HTML to the FastAPI backend
  4. The backend runs a LangGraph agent that calls Gemini 2.5 Flash with the user's profile, resume, and the extracted job description
  5. Gemini returns a structured JSON answer for every field
  6. The backend assembles a fill plan and sends it back to the extension
  7. The extension applies each value to the correct form element using CSS selectors

The key insight that made the whole thing feasible: the browser is the only environment that can see a fully rendered React form. A server-side scraper sees the HTML skeleton. A real browser running JavaScript sees the actual dropdown options, the dynamic field states, and the ARIA attributes that React Select generates at runtime. So the extraction has to happen inside the browser, and the AI reasoning has to happen on the server where I have access to the user's data.
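To make that contract concrete, here is a hedged sketch of the two payload shapes flowing in each direction. The key names are illustrative assumptions, not the actual wire format:

```python
# Hypothetical shapes for one extracted field (extension -> backend)
# and one fill-plan entry (backend -> extension). Real keys may differ.
extracted_field = {
    "type": "select",
    "label": "Country",
    "selector": "#country",
    "required": True,
    "options": ["United States", "Canada"],
}

fill_plan_entry = {
    "selector": "#country",   # same selector computed at extraction time
    "value": "United States",
    "action": "autofill",
    "confidence": 0.95,
}
```

The selector is the thread that ties the two halves together: it is computed once in the browser and echoed back so the apply step can find the exact element again.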

[Figure: ApplyAI architecture diagram]


Part 1: Getting Form Fields Out of a Live Page

This turned out to be the hardest non-AI problem in the whole project.

Job application forms are not plain HTML. They are React components. The dropdowns are typically React Select, which renders options into a floating portal that only exists in the DOM when the dropdown is actually open. If you scrape the page while all dropdowns are closed, you get combobox elements with no options. The AI has no idea what values are valid.

My solution: open every dropdown programmatically before scraping anything.

The extension injects a script into the active tab using chrome.scripting.executeScript. Before touching a single field, it finds every [role="combobox"] element and simulates a real user opening it by dispatching mouse and keyboard events in sequence, then waits 300ms for React to render the options into the DOM.

```javascript
combobox.dispatchEvent(new MouseEvent('mousedown', { bubbles: true }));
combobox.dispatchEvent(new MouseEvent('click', { bubbles: true }));
combobox.dispatchEvent(new KeyboardEvent('keydown', { key: 'ArrowDown', bubbles: true }));

await sleep(300); // wait for React to paint the listbox
```

The 300ms wait is not arbitrary. I tested against Lever, Ashby, and Greenhouse. At 200ms the options were missing about 30% of the time. At 300ms the failure rate dropped to near zero.

Once the dropdown is open, options are extracted from the ARIA-controlled listbox. Each field is then serialized into a structured object containing its type, label, CSS selector, required status, and available options. That object is what gets sent to the backend.

The CSS selector is computed at extraction time using the element's id or name attribute. This same selector is used later during the apply step to locate the exact element on the page, so precision matters here.

Label detection uses multiple fallback strategies in order: the for attribute on an associated <label>, a parent <label> element, aria-label, and finally placeholder. When none of those exist, the field name or id becomes the label.
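The fallback chain can be modeled as a pure priority lookup over the attributes already pulled off the element. The attribute key names below are my assumptions for illustration, not the extension's actual field names:

```python
# Priority order mirrors the post: <label for>, parent <label>,
# aria-label, placeholder, then the field's name or id.
LABEL_SOURCES = ("label_for_text", "parent_label_text", "aria_label", "placeholder", "name", "id")

def detect_label(attrs):
    """Return the first non-empty label source in priority order."""
    for key in LABEL_SOURCES:
        value = (attrs.get(key) or "").strip()
        if value:
            return value
    return ""
```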


Part 2: The LangGraph Autofill Agent

Once the extension sends the extracted fields to the backend, a LangGraph StateGraph takes over. The autofill pipeline has four clearly separated concerns, and a DAG maps onto that structure naturally.

```
START -> initialize -> extract_form_fields -> generate_answers -> assemble_autofill_plan -> END
```

Each node receives the full shared state, does one job, returns its updates, and passes control to the next node.

Node 1: Initialize

Sets the run ID, page URL, and initial empty collections. Straightforward bookkeeping.

Node 2: Extract Form Fields

Converts the JavaScript field objects from the extension into typed Python FormField dictionaries. This handles type mapping (React Select comboboxes become select type), deduplication by field signature, and one non-obvious enrichment.
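As a sketch, deduplication by field signature might look like the following, assuming a signature of selector plus mapped type (the real signature may include more attributes):

```python
def field_signature(field):
    # Assumed signature: selector + mapped type. The actual key may differ.
    return (field.get("selector", ""), field.get("type", ""))

def dedupe_fields(fields):
    """Keep the first occurrence of each unique field signature."""
    seen, unique = set(), []
    for field in fields:
        sig = field_signature(field)
        if sig not in seen:
            seen.add(sig)
            unique.append(field)
    return unique
```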

If a select field has "country", "nationality", or "citizenship" in its label and has zero extracted options, the backend automatically injects the full list of 196 standard country names. This is a safety net for the cases where the browser-side dropdown opening fails. Some Greenhouse forms use a custom country component that does not respond to standard mouse events. The backend catches this gap and fills in the options so the LLM still has something to work with.
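A minimal sketch of that safety net, with the keyword list taken from the post and the function name invented for illustration:

```python
COUNTRY_HINTS = ("country", "nationality", "citizenship")

def inject_country_options(field, all_countries):
    """Backfill options for country-like selects that came back empty."""
    if (
        field.get("type") == "select"
        and not field.get("options")
        and any(hint in field.get("label", "").lower() for hint in COUNTRY_HINTS)
    ):
        field["options"] = list(all_countries)
    return field
```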

Node 3: Generate Answers

This is where Gemini does the work.

I spent a lot of time on the prompt design. The format that worked best is structured JSON as the prompt body rather than prose. The task description, rules, context, and output format are all keys in a JSON object. This consistently outperformed plain English paragraphs for precision tasks.

```python
prompt_obj = {
    "task": f"Generate answers for ALL {len(fields_spec)} form fields.",
    "critical_rules": [
        "MANDATORY: Set action='autofill' for ALL fields. Never use 'skip' or 'suggest'.",
        "If you don't know an answer, use action='autofill' with value='' and low confidence.",
    ],
    "context": {
        "user_ctx": user_ctx,    # profile fields
        "job_ctx": job_ctx,      # extracted job description
        "resume_ctx": resume_ctx # parsed resume data
    },
    "form_fields": fields_spec,
}
```

The model treats JSON keys as hard constraints, not suggestions. Prose prompts produced more hedging and more "I cannot fill this" responses. JSON prompts produced precise, consistent output.

Gemini is called with response_mime_type: "application/json" and a response_json_schema derived directly from a Pydantic model:

```python
response = llm.client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
    config={
        "response_mime_type": "application/json",
        "response_json_schema": LLMAnswersResponse.model_json_schema(),
    },
)
```

This eliminates an entire class of failure modes. No regex extraction. No JSON fence stripping. No trying to parse Gemini's explanation text alongside the output. The response is always valid JSON that maps directly to the Pydantic model.

The aggressive autofill rule

Early versions would skip EEO fields ("Race", "Veteran Status") and optional demographics because the LLM would flag them as sensitive. Users found this confusing: they had no idea why some fields were left blank.

The fix was a two-layer no-skip guarantee. The prompt says action='autofill' is mandatory. Then a _normalize_answer() post-processing step converts any skip or suggest action to autofill at plan assembly time, even if the LLM ignored the instruction. If the LLM has no idea what value to use, it returns an empty string with low confidence. The field gets pre-filled with nothing and the user can fix it manually. This is a much better experience than leaving fields blank silently.
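A sketch of what that second layer might look like; the real _normalize_answer() may track more metadata, and the 0.2 confidence cap is my assumption:

```python
def normalize_answer(answer):
    """Enforce the no-skip guarantee after the LLM responds."""
    if answer.get("action") in ("skip", "suggest"):
        answer["action"] = "autofill"
        answer.setdefault("value", "")
        # Demote confidence so the UI flags the field for manual review.
        answer["confidence"] = min(answer.get("confidence", 1.0), 0.2)
    return answer
```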

Option matching

The LLM might return "United States" when the dropdown says "USA". Or it might get the case right but add extra whitespace. After receiving the LLM response, each answer for a select, radio, or checkbox field goes through a normalizer that strips both the LLM value and each available option down to lowercase alphanumeric characters, then tries an exact match first and falls back to substring containment. This handles most real-world mismatches without any hardcoded synonym maps.
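The two-pass matcher described above can be sketched as follows (function names are illustrative):

```python
import re

def _norm(text):
    """Collapse to lowercase alphanumerics, as described above."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def match_option(llm_value, options):
    target = _norm(llm_value)
    # Pass 1: exact match on normalized strings.
    for option in options:
        if _norm(option) == target:
            return option
    # Pass 2: substring containment in either direction.
    for option in options:
        normalized = _norm(option)
        if target and (target in normalized or normalized in target):
            return option
    return None
```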

File inputs are handled separately, outside the LLM entirely

File upload fields are detected by input_type == "file". The code checks the label for "cover letter" keywords. If matched, the field is skipped because no one should submit an AI-generated cover letter without reviewing it. Everything else gets value: "resume" with maximum confidence. No LLM call needed.
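That routing rule fits in a plain function. This assumes file-type detection happened upstream; the keyword check and return shape are my sketch:

```python
def plan_file_field(field):
    """Decide how to handle a file input without calling the LLM."""
    label = field.get("label", "").lower()
    if "cover letter" in label:
        # Never auto-attach an unreviewed AI-generated cover letter.
        return {"action": "skip", "reason": "cover letter needs human review"}
    return {"action": "autofill", "value": "resume", "confidence": 1.0}
```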

Node 4: Assemble Autofill Plan

Combines form_fields and answers into the final plan structure, generates a summary with field counts, and writes the completed plan to the database. The plan contains each field's selector, value, action, and confidence score.


Part 3: Applying the Plan Back in the Browser

The backend returns the plan. Now the extension has to actually fill the form.

Another injected script iterates through every field in the plan and uses a different fill strategy based on input type:

  • Text and textarea: Set value through React's native property setter to trigger synthetic events, then dispatch input and change events so the React component knows the value changed.
  • Native <select>: Find the matching option by text content. Set selectedIndex. Fire a change event.
  • React Select (combobox): Type the value into the input, wait for the listbox, find the matching option, click it.
  • Radio and checkbox groups: Find the label whose text matches the target value and click it.
  • File inputs: This one deserves its own explanation.

The browser's security model prevents setting input.value on a file input for obvious reasons. The only legitimate way to attach a file programmatically is through the DataTransfer API:

```javascript
async function fillFileInput(el, fileUrl) {
    const response = await fetch(fileUrl);
    if (!response.ok) throw new Error(`Resume download failed: ${response.status}`);
    const arrayBuffer = await response.arrayBuffer();
    const file = new File([arrayBuffer], "resume.pdf", { type: "application/pdf" });

    // DataTransfer is the only sanctioned way to set files programmatically.
    const dataTransfer = new DataTransfer();
    dataTransfer.items.add(file);
    el.files = dataTransfer.files;
    el.dispatchEvent(new Event('change', { bubbles: true }));
}
```

The fileUrl is a Supabase Storage signed URL generated by the backend at plan time with a one-hour TTL. The extension fetches the user's resume as an ArrayBuffer, wraps it in a File object, and attaches it to the input through the DataTransfer API. This works reliably across Greenhouse, Lever, and Ashby.


Part 4: The Authentication Design

The system runs two completely separate auth systems side by side.

The web frontend uses Supabase's standard JWT (email/password and Google OAuth). The token lives in a cookie and is sent as a Bearer token on API calls.

The Chrome extension cannot participate in cookie-based sessions. It runs in a sandboxed context with no access to the frontend's cookies. So I built a separate auth flow using custom JWTs.

The connection process works like a one-time code exchange. When the user clicks "Connect" in the popup, the frontend generates a 32-character urlsafe random code, stores its SHA-256 hash in the database with a 10-minute expiry, and sends the plaintext code to the extension via window.postMessage(). The extension exchanges this code at POST /extension/connect/exchange for a 7-day JWT signed with a custom secret.
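The hash-then-exchange step boils down to a few lines. This is a sketch with invented function names, but it matches the scheme described above:

```python
import hashlib
import secrets

def new_connect_code():
    """Generate a one-time connect code and the hash that gets stored."""
    code = secrets.token_urlsafe(24)  # 24 random bytes -> 32 urlsafe characters
    return code, hashlib.sha256(code.encode()).hexdigest()

def verify_connect_code(code, stored_hash):
    # Only the hash ever touches the database, so a DB leak cannot replay codes.
    return hashlib.sha256(code.encode()).hexdigest() == stored_hash
```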

The JWT carries an audience claim set to applyai-extension. The backend validates this on every extension endpoint, so the extension token cannot be replayed against any other part of the API. It also carries an install_id (a UUID stored in chrome.storage.local) for device-level tracking without building a separate device registry.
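Once the signature is verified, the claim checks reduce to something like this (a sketch over an already-decoded claims dict, not the actual middleware):

```python
import time

def validate_extension_claims(claims, now=None):
    """Reject tokens that are expired or minted for a different audience."""
    now = time.time() if now is None else now
    if claims.get("aud") != "applyai-extension":
        return False
    if claims.get("exp", 0) <= now:
        return False
    return bool(claims.get("install_id"))
```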


Part 5: Three Hard Problems I Did Not Expect

1. Chrome extension popups throttle CSS animations

I built the popup with loading spinners and skeleton loaders. They worked fine during development in the browser. Once I loaded the extension and opened the actual popup, every animation was frozen solid.

Chrome throttles JavaScript execution in extension popup contexts to save battery. Tailwind's default animate-spin and animate-pulse classes get paused by this throttling.

The fix is to redefine the keyframes explicitly in CSS and force the play state:

```css
@keyframes spin {
  to { transform: rotate(360deg); }
}

.animate-spin {
  animation: spin 0.85s linear infinite;
  animation-play-state: running !important;
}
```

The !important on animation-play-state overrides whatever Chrome tries to set. Every animation class in the popup needs this treatment.

2. DOM hashing breaks plan caching

The first version of plan caching computed a SHA-256 hash of the full DOM HTML and used it as part of the cache key. If the hash matched a previous run, return the cached plan.

This broke constantly. Every Greenhouse page load includes a fresh CSRF token embedded in the HTML. Every page view produces a different hash even for the exact same form. The cache was useless.

The fix: cache by job_application_id + normalized_page_url. The DOM hash is still stored in the database but ignored for cache lookups. The same form at the same URL always returns the same cached plan regardless of what changed in the page source between visits.
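A sketch of the new cache key, assuming URL normalization means dropping the query string and trailing slash (the exact normalization rules are my guess):

```python
from urllib.parse import urlsplit

def plan_cache_key(job_application_id, page_url):
    """Cache key that ignores volatile page content like CSRF tokens."""
    parts = urlsplit(page_url)
    normalized = f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"
    return f"{job_application_id}:{normalized}"
```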

3. Lever and Ashby split one job across two URLs

On Lever, the job description lives at jobs.lever.co/company/slug. The application form lives at jobs.lever.co/company/slug/apply. These look like two different pages but they represent one job.

If a user extracts the job on the description page and then navigates to the apply page, the backend needs to match the apply URL back to the original job record. The extract_job_url_info() utility detects the /apply or /application suffix and strips it to get the canonical base URL before doing any database lookups.
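The suffix-stripping step is small enough to sketch in full (function name is illustrative, behavior matches the description above):

```python
def canonical_job_url(url):
    """Strip Lever/Ashby apply suffixes to recover the base job URL."""
    base = url.rstrip("/")
    for suffix in ("/apply", "/application"):
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base
```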

On the extension side, when the popup detects the user is on a job description page with a saved job, it shows an amber banner telling them to navigate to the application form. For Lever and Ashby, the banner includes a direct link constructed entirely client-side by appending /apply or /application to the current URL. No extra backend call needed.


Part 6: What Is Coming Next

Streaming the autofill plan

Right now the user sees a spinner for 5 to 10 seconds while the full plan generates. Gemini supports streaming and LangGraph supports streaming node outputs. The plan is to emit node completion events to the extension popup as each stage finishes, so users see live progress instead of one long blocking wait.

A retry node in the LangGraph DAG

On very long forms, Gemini occasionally returns fewer answers than there are fields. The current system fills missing fields with empty-string defaults. A better approach is a conditional retry edge in the graph: if the answer count is less than the field count, route back to generate_answers with only the missing field signatures. LangGraph's conditional edges make this a clean addition.
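The conditional edge could be sketched as a routing function over the shared state. Node names mirror the DAG above; the retry cap and state keys are assumptions:

```python
MAX_RETRIES = 2  # assumed cap to avoid looping forever

def route_after_answers(state):
    """Route back to generate_answers while fields are still unanswered."""
    missing = [
        field["signature"]
        for field in state["form_fields"]
        if field["signature"] not in state["answers"]
    ]
    if missing and state.get("retries", 0) < MAX_RETRIES:
        return "generate_answers"
    return "assemble_autofill_plan"
```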

Semantic resume matching

The current resume match feature does a keyword overlap check between the job's required skills and the user's resume skills list. This misses "REST APIs" matching "API development" or "React.js" matching "React". The plan is to replace this with embedding-based similarity using a small vector store per user, which gives a much more accurate match score and surfaces genuinely missing skills rather than just unmatched strings.

Playwright-based extraction fallback

The browser-side dropdown opening works on about 90% of forms. The remaining 10% use heavy custom focus trapping, animated dropdowns with delays over 300ms, or React Select versions that ignore standard mouse events. A Playwright-powered extraction service on the backend would handle these edge cases by running a full headless browser with complete control over the page lifecycle, without requiring any changes to the extension.


Closing Thoughts

The most interesting thing I learned building this is that the Chrome extension is not just a UI layer. It is the only component in the system with access to the live, JavaScript-executed version of the page. That makes it the source of truth for form structure. The AI cannot do its job without what the browser extracts first.

The second thing: structured JSON prompts beat prose prompts for precision tasks. When I need Gemini to return exactly N answers with specific fields and constrained action values, a JSON prompt with rules expressed as an array performs better than a paragraph of instructions. The model treats it like a spec, not a suggestion.

The combination of LangGraph for agent orchestration, Gemini's native JSON schema output mode, and the DataTransfer API for file uploads turned out to be a surprisingly complete toolkit for this problem.

If you are building something similar or have questions about any part of the architecture, drop a comment below.


Built with ❤️ by Rahul

LinkedIn | GitHub | Portfolio
