DEV Community

Ahmad Garba Adamu


ClearForm — AI Form & Document Helper for Low-Literacy Users

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4.

What I Built

ClearForm is an offline-capable Progressive Web App (PWA) that helps people with low literacy navigate official forms and complex legal contracts through natural voice interaction, plain language, and real-time guidance, all powered by Gemma 4.

The Problem

Millions of people struggle with rental applications, medical intake forms, utility sign-ups, and dense terms & conditions. Traditional solutions rely on rigid OCR tools or heavy, cloud-dependent software that fails on older hardware or spotty mobile connections.

Our Solution

ClearForm acts as a compassionate, local digital assistant. It breaks down complex documents into a one-question-at-a-time conversational interface, reads text aloud, accepts voice inputs, and instantly translates dense legal jargon into language a 10-year-old can easily understand.

🔗 Live Link: https://formhelper-ten.vercel.app

🔗 Source Code: https://github.com/rufatronics/formhelper

Demo

Watch the walkthrough to see the app perform real-time form field extraction and natural language document comparisons.

How I Used Gemma 4

ClearForm doesn't treat AI as a thin wrapper: Gemma 4 sits directly in the application's pipeline and handles more than one modality, from photographed forms to document text.

🧠 Strategic Model Selection: gemma-4-26b-a4b-it (MoE)

For a real-time accessibility app, high latency breaks user trust immediately. We chose the Mixture-of-Experts (MoE) variant because it activates only a fraction of its parameters for each token: roughly 4B of its 26B total, as the "a4b" in the model name indicates. That gives us reasoning close to the 31B model's while keeping latency low enough for conversational voice loops on ordinary mobile networks.

👁️ Native Vision vs. Rigid OCR

Rather than relying on fragile client-side OCR engines that fail on handwriting or poorly lit smartphone photos, ClearForm passes photos of paper forms directly to Gemma 4 as inline_data. The model parses the unstructured image, identifies the form fields, and converts them into an interactive schema.
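A minimal sketch of what such a request could look like, assuming the Google AI Studio (Gemini API) REST format for `generateContent`. The helper names, the prompt text, and the data-URL handling are hypothetical illustrations, not ClearForm's actual code.

```typescript
// Hypothetical helper: builds a Google AI Studio generateContent request body
// that sends a photographed form as inline image data next to the prompt.
interface TextPart {
  text: string;
}
interface InlineDataPart {
  // Raw base64 payload without the "data:...;base64," prefix.
  inline_data: { mime_type: string; data: string };
}
type Part = TextPart | InlineDataPart;

interface GenerateContentBody {
  contents: { role: "user"; parts: Part[] }[];
  generationConfig: { temperature: number };
}

// FileReader.readAsDataURL yields "data:image/jpeg;base64,...";
// split it into the MIME type and the base64 data the API expects.
function splitDataUrl(dataUrl: string): { mimeType: string; base64: string } {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64 data URL");
  return { mimeType: match[1], base64: match[2] };
}

function buildFormExtractionBody(photoDataUrl: string): GenerateContentBody {
  const { mimeType, base64 } = splitDataUrl(photoDataUrl);
  return {
    contents: [
      {
        role: "user",
        parts: [
          { text: "List every fillable field on this form as a JSON array." },
          { inline_data: { mime_type: mimeType, data: base64 } },
        ],
      },
    ],
    generationConfig: { temperature: 0.1 }, // low temperature for structured extraction
  };
}
```

The body would be POSTed as JSON to `https://generativelanguage.googleapis.com/v1beta/models/MODEL_ID:generateContent` with the API key in the `x-goog-api-key` header, where `MODEL_ID` is a placeholder. Google's REST examples use the snake_case `inline_data`/`mime_type` spelling shown here; the camelCase forms `inlineData`/`mimeType` are also accepted.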

💭 Deep Document Reasoning with Thinking Mode

When analyzing complex documents like Terms & Conditions, the app enables Gemma 4's thinking mode through thinkingConfig with a strict 512-token budget. This lets the model reason through several steps, catching hidden clauses or predatory conditions, before it compiles a structured JSON diff for the UI.
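A minimal sketch of such a request in the Gemini API's REST shape, assuming the endpoint honors `thinkingConfig` for this model as described above. The function name, prompt wording, and clause schema are hypothetical.

```typescript
// Hypothetical request builder for the Terms & Conditions analysis call.
// thinkingConfig.thinkingBudget caps the tokens the model may spend on
// internal reasoning before it writes the visible answer.
interface ContractAnalysisRequest {
  systemInstruction: { parts: { text: string }[] };
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: { thinkingConfig: { thinkingBudget: number } };
}

const THINKING_BUDGET_TOKENS = 512; // the budget stated in the post

// Hypothetical output schema for flagged clauses.
const SYSTEM_PROMPT =
  "Find hidden, unusual, or one-sided clauses. Reply with JSON only: " +
  '{"clauses":[{"quote":string,"plain_english":string,"risk":"low"|"medium"|"high"}]}';

function buildContractAnalysisRequest(contractText: string): ContractAnalysisRequest {
  return {
    systemInstruction: { parts: [{ text: SYSTEM_PROMPT }] },
    contents: [{ role: "user", parts: [{ text: contractText }] }],
    generationConfig: {
      thinkingConfig: { thinkingBudget: THINKING_BUDGET_TOKENS },
    },
  };
}
```

A fixed budget trades some depth of reasoning for a predictable upper bound on latency, which matters more in a conversational app than squeezing out every last inference step.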

⚡ Technical Implementation Highlights

  • Streaming Responses (SSE): Chat responses stream token-by-token. On fluctuating 3G/4G connections, this ensures the app feels immediate and alive rather than stalled.
  • Strict JSON Structuring: Form-field extraction and structural breakdowns use a low temperature (0.1) plus strict JSON schemas embedded in the system prompt, which prevents the structural drift that would otherwise break the UI.
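A minimal sketch of consuming such a stream, assuming the OpenAI-compatible SSE format that OpenRouter returns for `"stream": true` requests: `data: {...}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`. The `ChatStreamParser` class is hypothetical; it buffers partial lines because a network chunk can end in the middle of one.

```typescript
// Minimal incremental parser for an OpenAI-style SSE chat stream.
// Incomplete text is buffered until the next newline arrives.
class ChatStreamParser {
  private buffer = "";
  done = false;

  // Feed decoded text; returns the content tokens found in complete lines.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last element may be an incomplete line
    const tokens: string[] = [];
    for (const raw of lines) {
      const line = raw.trim();
      if (line === "" || line.startsWith(":")) continue; // event separator or SSE comment
      if (!line.startsWith("data:")) continue; // ignore other SSE fields
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") {
        this.done = true;
        continue;
      }
      const json = JSON.parse(payload) as {
        choices?: { delta?: { content?: string } }[];
      };
      const token = json.choices?.[0]?.delta?.content;
      if (token) tokens.push(token);
    }
    return tokens;
  }
}
```

In the browser, each chunk from `response.body.getReader().read()` would be decoded with a `TextDecoder` (`decode(value, { stream: true })`) and passed to `push`, with the returned tokens appended to the chat bubble as they arrive.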

Example of a clean field object that Gemma 4 generates from a raw form photo:

```json
{
  "field_name": "Full Name",
  "field_type": "text",
  "conversational_prompt": "What is your full name as it appears on your ID?",
  "required": true
}
```
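Prompt-level schemas and a low temperature reduce malformed output but cannot rule it out, so a client-side guard is a common second layer. The following is a hypothetical sketch, not ClearForm's code: `FormField` mirrors the field object in the example above, and `parseFields` keeps only entries with the expected keys and types.

```typescript
// Hypothetical runtime guard for the field objects shown above.
interface FormField {
  field_name: string;
  field_type: string;
  conversational_prompt: string;
  required: boolean;
}

function isFormField(value: unknown): value is FormField {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.field_name === "string" &&
    typeof v.field_type === "string" &&
    typeof v.conversational_prompt === "string" &&
    typeof v.required === "boolean"
  );
}

// Extracts the outermost JSON array or object from the reply (models sometimes
// add prose or markdown code fences around it) and keeps only valid fields.
function parseFields(reply: string): FormField[] {
  const start = reply.search(/[\[{]/);
  const end = Math.max(reply.lastIndexOf("]"), reply.lastIndexOf("}"));
  if (start === -1 || end < start) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return []; // unparseable reply: render nothing rather than crash the UI
  }
  const items = Array.isArray(parsed) ? parsed : [parsed];
  return items.filter(isFormField);
}
```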

Technical Stack

  • Frontend: React 18 + Vite
  • Styling & Typography: Tailwind CSS, with the Syne and Instrument Sans typefaces chosen for readability
  • AI Orchestration: gemma-4-26b-a4b-it via OpenRouter (primary) + Google AI Studio (failover)
  • Voice & Audio Processing: Web Speech API in the browser, using speech recognition for speech-to-text and SpeechSynthesis for text-to-speech
  • Local Storage & Service Workers: IndexedDB (stores multi-megabyte documents that would exceed localStorage's size limits) + vite-plugin-pwa (Workbox) for offline resilience

Challenges and What I Learned

1. Beating the Vercel Serverless Timeout

The Issue: Rate limits on Google AI Studio's free tier occasionally delayed responses beyond the 10-second execution limit for functions on Vercel's Hobby plan.

The 'Street Smart' Fix: A resilient dual-routing setup. OpenRouter serves as the primary gateway because of its global edge routing, and if a request hangs, the client automatically and silently falls back to Google AI Studio. A live badge in the header shows which provider is currently in use, so the switch stays transparent.
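The primary/fallback routing can be sketched as follows. This is a hypothetical version, not ClearForm's code: `Provider`, `askWithFallback`, and the 8-second default are assumptions. Each provider's `call` is expected to pass the `AbortSignal` on to `fetch` so a hung request is actually cancelled; racing against a timer means the fallback still fires even if a provider ignores the signal.

```typescript
type Provider = {
  name: string;
  // Receives an AbortSignal so a hung request can be cancelled (e.g. fetch(url, { signal })).
  call: (prompt: string, signal: AbortSignal) => Promise<string>;
};

// Rejects if the provider has not answered within `ms` milliseconds.
async function callWithTimeout(p: Provider, prompt: string, ms: number): Promise<string> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // cancels fetch() if the provider passed the signal through
      reject(new Error(`${p.name} timed out after ${ms} ms`));
    }, ms);
  });
  try {
    return await Promise.race([p.call(prompt, controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function askWithFallback(
  primary: Provider,
  fallback: Provider,
  prompt: string,
  timeoutMs = 8000, // assumed value, kept under the 10 s function limit
): Promise<{ provider: string; text: string }> {
  try {
    return { provider: primary.name, text: await callWithTimeout(primary, prompt, timeoutMs) };
  } catch {
    // Timeout, network error, or rate limit: switch without user action.
    // The returned provider name can drive the header badge.
    return { provider: fallback.name, text: await callWithTimeout(fallback, prompt, timeoutMs) };
  }
}
```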

2. Taming the Internal Monologue Leaks

The Issue: During complex reasoning tasks, Gemma 4 occasionally leaked its internal thinking into the conversational text stream, cluttering the chat interface.

The Fix: The backend API layer filters each response and strips every part flagged with thought: true, while the system instructions strictly ban meta-commentary.
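A minimal sketch of that filter, assuming the Gemini API response shape in which each entry of `candidates[0].content.parts` may carry a boolean `thought` flag. `visibleText` is a hypothetical name; for a streamed response, the same filter would apply to the parts of each chunk.

```typescript
// Hypothetical server-side filter: keep only the answer text from a
// generateContent response, dropping parts marked with thought: true.
interface ResponsePart {
  text?: string;
  thought?: boolean;
}
interface GenerateContentResponse {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

function visibleText(res: GenerateContentResponse): string {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((p) => p.thought !== true && typeof p.text === "string")
    .map((p) => p.text as string)
    .join("");
}
```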

3. PWA Installation Across Operating Systems

The Issue: PWA installation works very differently across platforms: Chromium browsers on Android fire a beforeinstallprompt event, while iOS requires a manual "Add to Home Screen" step in Safari.

The Fix: A platform-detection modal. On iOS, the "Install App" button switches to a step-by-step visual overlay that shows exactly how to use Safari's "Add to Home Screen" option.
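The detection logic can be sketched as a pure function, which keeps it easy to test. This is a hypothetical version: `chooseInstallStrategy` and its inputs are assumptions, and user-agent sniffing is a heuristic rather than a guarantee.

```typescript
// Which install UI to show:
// "native-prompt": a beforeinstallprompt event was captured, so its prompt() can be called.
// "ios-instructions": iOS/iPadOS offers no install prompt; show the Add to Home Screen walkthrough.
type InstallStrategy = "native-prompt" | "ios-instructions" | "already-installed" | "unavailable";

function chooseInstallStrategy(opts: {
  userAgent: string;
  maxTouchPoints: number; // navigator.maxTouchPoints
  isStandalone: boolean; // app is already running from the home screen
  hasDeferredPrompt: boolean; // a beforeinstallprompt event was stored
}): InstallStrategy {
  if (opts.isStandalone) return "already-installed";
  if (opts.hasDeferredPrompt) return "native-prompt";
  const iOSDevice = /iPhone|iPad|iPod/.test(opts.userAgent);
  // iPadOS 13+ reports a desktop Mac user agent by default; touch support gives it away.
  const iPadWithMacUA = /Macintosh/.test(opts.userAgent) && opts.maxTouchPoints > 1;
  if (iOSDevice || iPadWithMacUA) return "ios-instructions";
  return "unavailable";
}
```

In the browser, `isStandalone` could come from `window.matchMedia("(display-mode: standalone)").matches` (plus `navigator.standalone` on iOS Safari), and `hasDeferredPrompt` from a `beforeinstallprompt` listener that calls `preventDefault()` and stores the event for later.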

Conclusion

Building ClearForm showed that Gemma 4's native multimodal capabilities can replace much of a standard document-processing pipeline. Swapping heavy OCR libraries, pre-processing servers, and rigid fixed templates for a single flexible, resource-efficient open model opens up new possibilities for building accessible, localized software.
