This blog post was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge
The Problem
I kept missing job opportunities — not because I wasn't qualified, but because I couldn't bring myself to fill out another application on my phone. Tiny fields, endless scrolling, and those dreaded open-ended questions: "Why do you want this role?" I'd open the form on the subway, stare at it, and close the tab.
I built Clara to fix this.
What is Clara?
Clara is a mobile-first AI form-filling companion. Snap a screenshot, upload a PDF, or paste a URL — Clara reads any form and guides you through filling it with a simple conversation.
How I Built It with Google AI and Google Cloud
Gemini Vision for Form Understanding
The core of Clara is Gemini 2.5 Flash and its vision capability. When you upload a form, Clara sends the image or PDF to Gemini, which extracts:
- Every field label
- Field types (text, dropdown, checkbox)
- Bounding box coordinates for each field
This lets Clara understand any form — job applications, medical intake, government paperwork — without pre-built templates.
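To make the extraction step concrete, here is a minimal sketch of how the model's reply could be validated before use. It assumes Clara asks Gemini for a JSON array whose entries carry `label`, `type`, and `box_2d` keys; those key names, the `FormField` class, and `parse_fields` are illustrative and not taken from Clara's code. The `[ymin, xmin, ymax, xmax]` order on a 0-1000 scale follows the convention described in Gemini's bounding box documentation.

```python
import json
from dataclasses import dataclass

ALLOWED_TYPES = {"text", "dropdown", "checkbox"}


@dataclass
class FormField:
    label: str
    type: str
    box: tuple  # (ymin, xmin, ymax, xmax), each on a 0-1000 scale


def parse_fields(response_text: str) -> list[FormField]:
    """Turn a JSON reply into validated FormField objects (hypothetical schema)."""
    fields = []
    for item in json.loads(response_text):
        label = str(item.get("label", "")).strip()
        box = item.get("box_2d", [])
        # Skip entries with no label or a malformed box.
        if not label or len(box) != 4:
            continue
        # Clamp every coordinate into the 0-1000 range.
        ymin, xmin, ymax, xmax = (max(0, min(1000, int(v))) for v in box)
        if ymin >= ymax or xmin >= xmax:
            continue
        ftype = item.get("type", "text")
        if ftype not in ALLOWED_TYPES:
            ftype = "text"  # unknown types degrade to a plain text field
        fields.append(FormField(label, ftype, (ymin, xmin, ymax, xmax)))
    return fields
```

Validating up front means a single malformed entry gets dropped instead of breaking the whole conversation.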
3-Layer Smart Prefill
Clara doesn't just read forms — it fills them intelligently using a 3-layer matching system:
- Learned aliases: If you previously confirmed "Mobile Phone" maps to your phone number, Clara remembers.
- Gemini semantic matching: Gemini matches unfamiliar field labels to entries in your profile and attaches a confidence score to each match.
- Keyword fallback: Static synonym matching as a safety net.
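The three layers can be sketched as a simple cascade. This is an illustration under assumptions, not Clara's actual code: the order in which layers are tried, the `0.8` confidence threshold, the `SYNONYMS` table, and the `match_field` signature are all hypothetical, and the Gemini call is replaced by an injected `semantic_match` function so the example runs on its own.

```python
from typing import Callable, Optional

# Layer 3 data: hypothetical static synonyms, keyed by profile field.
SYNONYMS = {
    "phone": ["phone", "mobile", "cell", "telephone"],
    "email": ["email", "e-mail"],
    "full_name": ["full name", "your name"],
}


def match_field(
    label: str,
    profile: dict,
    aliases: dict,
    semantic_match: Optional[Callable[[str, list], tuple]] = None,
    min_confidence: float = 0.8,  # assumed threshold
) -> Optional[tuple]:
    """Return (profile_key, layer_name) for a form label, or None."""
    normalized = label.strip().lower()

    # Layer 1: aliases the user has already confirmed.
    key = aliases.get(normalized)
    if key in profile:
        return key, "alias"

    # Layer 2: semantic matching (a Gemini call in Clara; injected here).
    if semantic_match is not None:
        key, confidence = semantic_match(label, list(profile))
        if key in profile and confidence >= min_confidence:
            return key, "semantic"

    # Layer 3: static keyword fallback.
    for key, words in SYNONYMS.items():
        if key in profile and any(w in normalized for w in words):
            return key, "keyword"
    return None
```

Returning the layer name alongside the key makes it easy to decide when to ask the user for confirmation, for example only on `semantic` and `keyword` matches.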
Open-Ended Answer Coaching
The hardest part of forms? Those open-ended questions. Clara uses Gemini to draft personalized answers based on your profile and the specific role, then lets you approve or edit.
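As a rough sketch of the drafting step, the prompt could be assembled from the profile and the role like this. The prompt wording, the word limit, and the function names `build_answer_prompt` and `resolve_answer` are assumptions for illustration; the actual prompt Clara sends is not shown in this post.

```python
from typing import Optional


def build_answer_prompt(question: str, profile: dict, role: str, max_words: int = 120) -> str:
    """Assemble a drafting prompt from non-empty profile facts (hypothetical wording)."""
    facts = "\n".join(f"- {key}: {value}" for key, value in profile.items() if value)
    return (
        f"You are helping a candidate apply for the role: {role}.\n"
        f"Draft a first-person answer of at most {max_words} words to the question below.\n"
        "Use only the facts listed; do not invent experience.\n\n"
        f"Question: {question}\n\n"
        f"Candidate facts:\n{facts}\n"
    )


def resolve_answer(draft: str, user_edit: Optional[str] = None) -> str:
    """Approve-or-edit step: a non-blank user edit always wins over the draft."""
    if user_edit and user_edit.strip():
        return user_edit.strip()
    return draft
```

Keeping the user's edit as the final word matches the approve-or-edit flow: the model only ever proposes.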
Voice I/O with Gemini TTS
Clara speaks. Using Gemini TTS, Clara reads questions aloud and supports barge-in — interrupt anytime by typing or tapping.
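One practical detail on the playback side: a browser cannot play raw audio samples directly, so they need a container. The sketch below assumes the TTS response arrives as raw 16-bit mono PCM at 24 kHz, which is the format Google's Gemini speech generation examples use; check this against the response you actually receive. The `pcm_to_wav` helper is hypothetical and uses only the standard library.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 24000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(rate)
        wav.writeframes(pcm)
    # wave leaves a caller-supplied buffer open, so it can still be read here.
    return buffer.getvalue()
```

With the audio in a standard container, barge-in on the client can be as simple as pausing the audio element when the user types or taps.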
Google Cloud Infrastructure
- Cloud Run: Serverless deployment with auto-scaling
- Firestore: Stores user profiles, sessions, and form progress
- Cloud Storage: Holds uploaded resumes, cover letters, and generated documents
Architecture Overview
Browser SPA (Vanilla JS)
↓ HTTP REST
Flask Backend (~3000 lines)
↓
┌─────────────┬─────────────┬─────────────┐
│ Gemini API  │ Firestore   │ Cloud       │
│ Vision+TTS  │ Profiles    │ Storage     │
└─────────────┴─────────────┴─────────────┘
Challenges
- Field matching ambiguity: Forms label the same field inconsistently (one asks for "Mobile Phone", another for "Cell"). The 3-layer prefill system was built to handle this.
- Mobile-first UX: Making a chat interface feel natural for complex forms took iteration.
- PDF annotation: Answers are rendered as overlays on the PDF, positioned using the bounding box coordinates Gemini returns.
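For the PDF annotation point, most of the work is coordinate conversion. A minimal sketch, assuming boxes arrive as `(ymin, xmin, ymax, xmax)` on a 0-1000 scale with a top-left origin, that the image sent to Gemini covered the whole page without cropping or rotation, and that the target uses PDF default user space (origin at the bottom-left, units in points). The `box_to_pdf_rect` name is hypothetical, and some PDF libraries use a top-left origin instead, in which case the flip is unnecessary.

```python
def box_to_pdf_rect(box: tuple, page_width: float, page_height: float) -> tuple:
    """Map a 0-1000 normalized box to (x0, y0, x1, y1) in PDF points."""
    ymin, xmin, ymax, xmax = box
    x0 = xmin / 1000 * page_width
    x1 = xmax / 1000 * page_width
    # Image coordinates grow downward, PDF user space grows upward, so flip y.
    y0 = page_height - ymax / 1000 * page_height
    y1 = page_height - ymin / 1000 * page_height
    return (x0, y0, x1, y1)


# Usage: a field on a US Letter page (612 x 792 points).
rect = box_to_pdf_rect((250, 100, 300, 600), 612, 792)
```

The returned rectangle gives the lower-left and upper-right corners, which is the form most PDF drawing APIs expect for placing text.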
What's Next
- Browser extension for direct website form-filling
- Clara Profile API for third-party integrations
- Expanded document generation (tax forms, visa applications)
Try It
Clara is live and deployed on Google Cloud Run. Any form, anywhere, from your phone.