TL;DR: I built FormPilot, a tool that analyzes any form screenshot and provides field-by-field fill instructions with suggested values — powered by Gemini Vision on Cloud Run.
The Problem
Everyone fills out forms. Government applications, insurance claims, tax documents, and HR onboarding paperwork are often confusing, with unclear labels, hidden requirements, and fine print. Small business owners can spend 45 minutes or more on a single form, googling terms and calling helplines. One wrong field can delay processing by weeks.
What FormPilot Does
Upload a screenshot of any form, describe your situation in plain English, and FormPilot:
- Detects every field in the form (text inputs, checkboxes, dropdowns, radio buttons)
- Generates fill instructions for each field based on your context
- Suggests values you should enter
- Warns about common mistakes (required fields, format requirements, legal implications)
- Shows field positions as numbered markers overlaid on your form image
How It Works
The pipeline is straightforward:
- User uploads a form screenshot (PNG, JPG, WebP, up to 10MB)
- User optionally describes their situation ("I'm a sole trader, earned $75K, single, no dependents")
- Gemini Vision analyzes the screenshot + context
- Returns structured field-by-field analysis
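from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment (e.g. GOOGLE_API_KEY)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        # image_data is the raw screenshot bytes; match mime_type to the upload
        types.Part.from_bytes(data=image_data, mime_type="image/png"),
        types.Part.from_text(text=analysis_prompt),
    ],
)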
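Before the image reaches that call, the upload limits from the first pipeline step (PNG, JPG, WebP, up to 10MB) could be enforced server-side. The sketch below is one possible approach, not FormPilot's actual code: `sniff_image_type` and `validate_upload` are hypothetical helper names, and treating "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from typing import Optional

# Assumption: "10MB" means 10 * 1024 * 1024 bytes
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type for PNG, JPEG, or WebP bytes, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(data: bytes) -> str:
    """Return the MIME type, or raise ValueError for a rejected upload."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 10MB limit")
    mime_type = sniff_image_type(data)
    if mime_type is None:
        raise ValueError("unsupported format: expected PNG, JPG, or WebP")
    return mime_type
```

Checking the leading bytes rather than the file extension means a renamed file is still rejected, and the returned MIME type can accompany the image bytes in the request to Gemini.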
The prompt instructs Gemini to return structured JSON with:
{
  "fields": [
    {
      "field_name": "ABN (Australian Business Number)",
      "field_type": "text",
      "suggested_value": "12 345 678 901",
      "instructions": "Enter your 11-digit ABN. Find it at abr.business.gov.au",
      "warning": "Must match your registered business name exactly",
      "position": {"x": 30, "y": 15}
    }
  ],
  "summary": "This is a BAS (Business Activity Statement) form..."
}
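Model output is not guaranteed to match the requested shape, so it helps to parse it defensively before it reaches the UI. The following is a minimal sketch under assumptions: the `FormField` dataclass and `parse_analysis` function are hypothetical names, and treating every key except `field_name` as optional is a guess rather than FormPilot's documented behavior.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FormField:
    field_name: str
    field_type: str
    instructions: str
    suggested_value: Optional[str] = None
    warning: Optional[str] = None
    x: Optional[float] = None
    y: Optional[float] = None


def parse_analysis(raw: str) -> Tuple[str, List[FormField]]:
    """Parse the model's JSON text into a summary and a list of fields."""
    payload = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    fields = []
    for item in payload.get("fields", []):
        position = item.get("position") or {}
        fields.append(
            FormField(
                field_name=item["field_name"],  # the only key treated as required
                field_type=item.get("field_type", "text"),
                instructions=item.get("instructions", ""),
                suggested_value=item.get("suggested_value"),
                warning=item.get("warning"),
                x=position.get("x"),
                y=position.get("y"),
            )
        )
    return payload.get("summary", ""), fields
```

A caller would pass the response text to `parse_analysis` and handle `json.JSONDecodeError` or `KeyError` by retrying or reporting the failure.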
Architecture
Browser (Next.js)                 Google Cloud
│                                 ┌──────────────────┐
├─ Upload screenshot              │  Cloud Run       │
│  + context text                 │  (FastAPI)       │
│  → POST /api/analyze ─────►     │       │          │
│                                 │  Gemini 2.5 Flash│
│                                 │  Vision API      │
│                                 │       │          │
│  ◄── field analysis ◄──────     │  Field detection │
│                                 │  + suggestions   │
├─ View annotated form            │  + warnings      │
│  (numbered markers)             │       │          │
├─ Step-by-step checklist         │  SQLite DB       │
└─ Analysis history               │  Uploads dir     │
                                  └──────────────────┘
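The SQLite box in the diagram backs the analysis history. A minimal sketch of how results might be stored and listed is shown here, using only the standard library; the `analyses` table, its columns, and the function names are assumptions for illustration, not FormPilot's actual schema.

```python
import json
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) analyses table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS analyses (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               image_path TEXT NOT NULL,
               result_json TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_analysis(conn: sqlite3.Connection, image_path: str, result: dict) -> int:
    """Store one analysis as JSON and return its row id."""
    cur = conn.execute(
        "INSERT INTO analyses (image_path, result_json) VALUES (?, ?)",
        (image_path, json.dumps(result)),
    )
    conn.commit()
    return cur.lastrowid


def list_history(conn: sqlite3.Connection, limit: int = 20) -> List[Tuple[int, str, dict]]:
    """Return the most recent analyses, newest first."""
    rows = conn.execute(
        "SELECT id, image_path, result_json FROM analyses ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(row_id, path, json.loads(blob)) for row_id, path, blob in rows]
```

Storing the whole result as one JSON column keeps the schema stable if the analysis format changes, at the cost of not being able to query individual fields in SQL.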
Frontend Features
- Drag-and-drop upload with image preview
- Position overlays — numbered markers on the form image showing where each field is
- Step-by-step checklist — check off fields as you fill them, with completion tracking
- Analysis history — browse previous analyses with thumbnails
- Warning highlights — fields with potential issues are flagged
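To place the numbered markers, each field's `position` has to be mapped onto the displayed image. The sketch below assumes `x` and `y` are percentages (0 to 100) of the image width and height, which the article does not state; `marker_pixels` is a hypothetical helper, written in Python for consistency with the backend code even though the real overlay logic lives in the Next.js frontend.

```python
from typing import Dict, List, Tuple


def marker_pixels(fields: List[Dict], width: int, height: int) -> List[Tuple[int, int, int]]:
    """Return (marker_number, px_x, px_y) for each field that has a position."""
    markers = []
    for number, field in enumerate(fields, start=1):
        position = field.get("position")
        if not position:
            continue  # skip the marker but keep numbering aligned with the field list
        # Clamp to 0..100 in case the model returns an out-of-range value
        x_pct = min(max(position["x"], 0), 100)
        y_pct = min(max(position["y"], 0), 100)
        markers.append(
            (number, round(width * x_pct / 100), round(height * y_pct / 100))
        )
    return markers
```

Keeping the marker number tied to the field's index means the overlay and the step-by-step checklist refer to the same field by the same number.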
Google Cloud Services
| Service | Purpose |
|---|---|
| Cloud Run | Backend hosting (auto-scaling, serverless) |
| Cloud Build | Container image building |
| Secret Manager | API key storage |
| Generative Language API | Gemini Vision form analysis |
Infrastructure as Code
One script deploys everything:
export GOOGLE_API_KEY="your-key"
export GOOGLE_CLOUD_PROJECT="your-project-id"
./deploy.sh
The script automates GCP API enablement, Secret Manager secret creation, the container build, the Cloud Run deploy, and an optional Vercel frontend deployment.
Mock Mode
Without a Gemini API key, FormPilot returns realistic mock analysis data — the full UI works for development. The UI labels mock results as "Sample data (no API key)."
Get a free Gemini API key at https://aistudio.google.com/apikey to unlock real analysis.
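The mock-mode fallback can be pictured as a small dispatch function. This is a sketch under assumptions rather than FormPilot's implementation: `analyze_form`, the `real_analyzer` callback, the `is_mock` flag, and the sample payload are all hypothetical, and `GOOGLE_API_KEY` is the variable name used by the deploy script.

```python
import copy
import os
from typing import Callable, Dict, Optional

# Hypothetical sample payload, shaped like the JSON the prompt asks for
MOCK_ANALYSIS = {
    "fields": [
        {
            "field_name": "Full name",
            "field_type": "text",
            "suggested_value": "Jane Example",
            "instructions": "Enter your name as it appears on official ID.",
            "warning": None,
            "position": {"x": 30, "y": 15},
        }
    ],
    "summary": "Sample data (no API key)",
    "is_mock": True,
}


def analyze_form(
    image_data: bytes,
    context: str,
    real_analyzer: Callable[[bytes, str], Dict],
    api_key: Optional[str] = None,
) -> Dict:
    """Return mock data when no key is configured, else call the real analyzer."""
    if api_key is None:
        api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        return copy.deepcopy(MOCK_ANALYSIS)  # copy so callers cannot mutate the template
    result = real_analyzer(image_data, context)
    result["is_mock"] = False
    return result
```

A flag such as `is_mock` gives the frontend something concrete to check before showing the "Sample data (no API key)" label.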
Try It
- GitHub: https://github.com/astraedus/formpilot
- Live Demo: https://formpilot-api-93135657352.us-central1.run.app
This article was created for the purposes of entering the Gemini Live Agent Challenge hackathon (#GeminiLiveAgentChallenge). The project demonstrates AI-powered form navigation with Gemini Vision, built on Google AI models and Google Cloud infrastructure.
Built with Gemini 2.5 Flash Vision API, FastAPI, Next.js, and Cloud Run.
If you're building AI agents for production, check out my book Production AI Agents on Amazon Kindle. It covers architecture patterns, tool design, multi-agent coordination, and deployment strategies.