I Built a Voice Widget That Fills Any HTML Form — Here's How
Filling forms on mobile is still broken.
15 fields. Tiny keyboard. Autocomplete suggesting my ex's address. Date pickers that scroll to 1923.
So I built a widget that lets users speak — and maps that input into any HTML form automatically.
Baymard Institute estimates $260B in recoverable sales lost annually to poor checkout UX. Forms are a big part of that problem.
Picture this:
- A patient filling a 20-field intake form on a tablet — with people waiting behind them in line.
- Someone buying car insurance on their phone in the rain.
Maybe I'm not the only one who hates forms.
What I Built
TypelessForm — an npm package that adds voice input to any existing HTML form. It's a web component built with Lit. You add one custom element to your page, a mic button appears, users speak, AI fills the fields.
No form redesign. No backend changes. React, Vue, Angular, WordPress, plain HTML — doesn't matter.
Not sure if it's genuinely useful or just a cool gimmick. If it works, it might make filling long forms on mobile less painful. That's what I'm testing.
How It Works Under the Hood
The pipeline:
1. Audio capture. Browser's MediaRecorder API captures audio when the user clicks the mic button. Standard Web API — no plugins, no downloads. Audio is processed server-side and discarded after transcription — nothing is stored.
2. Speech-to-text. Audio is sent to OpenAI Whisper for transcription, which supports 25+ languages out of the box. Users can speak in English, Spanish, German, or Japanese with the same widget.
3. Field mapping. This is where it gets interesting. The widget scans the page and builds a schema of all form fields — reading labels, placeholders, input types, name attributes, and nearby text. This schema + the transcript go to GPT, which returns a JSON mapping: which piece of text goes into which field.
Example: user says "My name is John Smith, email john@example.com, I need a room for two nights starting March 15th."
GPT receives:
{
  "transcript": "My name is John Smith, email john@example.com, I need a room for two nights starting March 15th",
  "fields": [
    {"id": "first_name", "label": "First Name", "type": "text"},
    {"id": "last_name", "label": "Last Name", "type": "text"},
    {"id": "email", "label": "Email", "type": "email"},
    {"id": "nights", "label": "Number of Nights", "type": "number"},
    {"id": "checkin", "label": "Check-in Date", "type": "text"}
  ]
}
And returns:
{
  "first_name": "John",
  "last_name": "Smith",
  "email": "john@example.com",
  "nights": "2",
  "checkin": "March 15"
}
When the user gives a full name but the form has separate first-name and last-name fields, the model splits the value based on context.
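The field scan from step 3 could look roughly like the sketch below. This is not the widget's actual code: describeField and buildSchema are hypothetical names, and the label lookup order (associated label, then aria-label, then placeholder, then name) is an assumption. The functions accept element-like objects, so they work on real DOM nodes or on plain objects.

```javascript
// Hypothetical sketch: turn form controls into the schema sent with the transcript.
// Works on DOM elements or plain objects exposing the same properties.
function describeField(el) {
  // Assumed lookup order: associated <label>, aria-label, placeholder, name.
  const labelText =
    (el.labels && el.labels.length > 0 && el.labels[0].textContent) ||
    (el.getAttribute && el.getAttribute('aria-label')) ||
    el.placeholder ||
    el.name ||
    '';
  return {
    id: el.id || el.name || '',
    label: labelText.trim(),
    type: el.type || 'text',
  };
}

function buildSchema(elements) {
  return Array.from(elements)
    .map(describeField)
    // A field with no id or name cannot be addressed in the returned mapping.
    .filter((field) => field.id !== '');
}
```

In a browser this might be called as buildSchema(document.querySelectorAll('input, textarea')). Fields without an id or name are dropped because the JSON mapping has no key to address them by.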
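4. DOM injection. The widget sets input values and dispatches input, change, and blur events so that frameworks (React, Angular, Vue) pick up the changes correctly. This was one of the trickier parts: React tracks each input's value internally, so a plain .value = assignment followed by an event looks like no change and onChange never fires. Calling the native setter from the element's prototype sidesteps that tracking.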
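// Simplified version of how we trigger framework-compatible updates.
// Textareas have their own prototype, so pick the one that matches the element.
const proto = input instanceof window.HTMLTextAreaElement
  ? window.HTMLTextAreaElement.prototype
  : window.HTMLInputElement.prototype;
const nativeValueSetter = Object.getOwnPropertyDescriptor(proto, 'value').set;
// Calling the prototype setter bypasses React's per-element value tracking.
nativeValueSetter.call(input, newValue);
input.dispatchEvent(new Event('input', { bubbles: true }));
input.dispatchEvent(new Event('change', { bubbles: true }));
input.dispatchEvent(new Event('blur')); // blur does not bubble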
5. Confidence & fallback. The model returns confidence signals per field. If confidence is low, the field is highlighted for review instead of auto-filled silently. Users see exactly which fields were filled confidently and which ones need a manual check — before anything gets submitted.
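The fill-or-review split in step 5 can be sketched as a small pure function. The response shape here is an assumption, since the example above shows values only: each field is taken to come back as { value, confidence } with confidence between 0 and 1. The 0.8 threshold and the name partitionByConfidence are also hypothetical.

```javascript
// Hypothetical sketch: split a model response into auto-fill and review buckets.
// Assumed response shape: { fieldId: { value: string, confidence: number 0..1 } }.
function partitionByConfidence(mapping, knownIds, threshold = 0.8) {
  const known = new Set(knownIds);
  const autoFill = {};
  const needsReview = {};
  for (const [id, entry] of Object.entries(mapping)) {
    // Skip ids that are not in the page's schema, and malformed entries.
    if (!known.has(id) || entry == null || typeof entry.value !== 'string') continue;
    const confident =
      typeof entry.confidence === 'number' && entry.confidence >= threshold;
    // A missing confidence score is treated as low confidence.
    (confident ? autoFill : needsReview)[id] = entry.value;
  }
  return { autoFill, needsReview };
}
```

Checking ids against the schema also guards against the model returning a key that does not exist on the page.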
Security: Sensitive fields are auto-excluded. The widget detects fields with type="password" or labels containing "card number", "CVV", "SSN", "social security" — and never sends them to any API. This happens client-side before any data leaves the browser.
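A minimal sketch of that client-side exclusion, assuming the schema shape from the example above ({ id, label, type }). The function names are hypothetical, and the pattern list covers only the labels named here; a production list would likely be longer.

```javascript
// Hypothetical sketch: drop sensitive fields before the schema leaves the browser.
// Patterns mirror the labels named in the text; word boundaries avoid partial hits.
const SENSITIVE_LABEL = /card number|\bcvv\b|\bssn\b|social security/i;

function isSensitive(field) {
  return field.type === 'password' || SENSITIVE_LABEL.test(field.label || '');
}

function stripSensitive(fields) {
  return fields.filter((field) => !isSensitive(field));
}
```

Filtering the schema itself, rather than the model's response, means excluded fields never appear in any request.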
Cost: Each fill triggers 2 API calls (Whisper + GPT). Cost is roughly a few cents per form depending on speech length and model.
Live Demo
You can try it right now at typelessform.com — no signup needed. 5 demo forms in 4 languages:
- Contact form (6 fields)
- Hotel booking (10 fields)
- Dental appointment (15+ fields — the real stress test)
- Product review
- Registration
Click the mic button in the bottom-right corner, speak naturally, and watch.
Try It Yourself
npm install typelessform-widget
Then in your HTML:
<script type="module">
import 'typelessform-widget';
</script>
<typeless-form api-key="YOUR_KEY"></typeless-form>
That's it. The widget auto-detects all text inputs and textareas on the page.
Get a free API key (200 voice fills, no credit card) at webappski.com/en/portal.
The widget is ~50KB gzipped. It loads asynchronously and doesn't block your page render.
Honest Limitations
Text fields only. No selects, checkboxes, radio buttons, file uploads. The widget fills <input type="text">, <input type="email">, <input type="tel">, <textarea>, and similar. Dropdowns and toggles are faster to tap than to dictate anyway.
Web only. Mobile browsers — yes. Native iOS/Android apps — no. This is a web component that lives in the DOM.
Browser inconsistency. MediaRecorder API and microphone permissions vary across browsers. Safari on iOS has historically been problematic with audio capture. Chrome and Firefox on both desktop and mobile work reliably.
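One common way to soften the format differences is to ask the browser which recording format it supports before creating the recorder. The sketch below uses the standard MediaRecorder.isTypeSupported check; the candidate list and the name pickAudioMimeType are assumptions, not the widget's actual code. Safari has tended to offer MP4 where Chrome and Firefox offer WebM, which is why both appear.

```javascript
// Hypothetical helper: pick the first recording format the browser supports.
// `isSupported` defaults to MediaRecorder.isTypeSupported when that API exists.
function pickAudioMimeType(
  candidates = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4'],
  isSupported = (type) =>
    typeof MediaRecorder !== 'undefined' && MediaRecorder.isTypeSupported(type)
) {
  // Returning undefined lets the caller fall back to the browser default.
  return candidates.find((type) => isSupported(type));
}
```

A caller might then write const mimeType = pickAudioMimeType(); and pass { mimeType } to new MediaRecorder(stream, ...) only when a value came back.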
Latency. The full pipeline (audio → Whisper → GPT → DOM) takes roughly 3-5 seconds depending on the length of speech and number of fields. Not instant. But the user speaks everything at once — even a 15-field dental form is one round trip. Still faster than typing 15 fields on a phone, but the wait is noticeable.
It might just be a gimmick. I genuinely don't know yet if this solves a real problem at scale or just looks cool in demos. That's why I'm testing it on real sites with real users. But if the core idea is right — that voice is a better input method for forms on mobile — then accuracy and latency are engineering problems, not fundamental flaws. That's the bet.
Architecture Decisions & Trade-offs
Why Lit? Lightweight (~5KB), framework-agnostic web components. The widget needs to work on any site — React, Angular, Vue, WordPress, Shopify, plain HTML. A React component wouldn't help a WordPress site. A web component works everywhere.
Why not Web Speech API for transcription? Browser's built-in speech recognition is free but inconsistent across browsers, doesn't support all languages equally, and has no reliable way to handle long-form dictation. Whisper gives consistent results across 25+ languages with better accuracy on natural speech patterns like email addresses, phone numbers, and mixed-language input.
Why server-side processing? The alternative would be running Whisper locally in the browser (via WASM or WebGPU). We explored this, but model size (roughly 75MB for tiny, 140MB for base, and 460MB for small) makes initial load impractical for a widget that should be invisible until needed. Server-side also lets us upgrade models without requiring users to update anything.
Why GPT for field mapping instead of rule-based matching? Early prototypes used regex + fuzzy matching. It worked for English forms with clear labels. It broke on forms with ambiguous labels, multiple languages, or fields like "Additional notes" where the content could map to several fields. GPT handles these cases surprisingly well because it understands context, not just keywords.
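To make the trade-off concrete, here is what a rule-based mapper in that spirit might look like. It is an illustration only, not the prototype's code: the RULES table and mapWithRules are hypothetical, and the patterns are deliberately simple.

```javascript
// Hypothetical sketch of a rule-based mapper, not the prototype's actual code.
// Each rule pairs a label pattern with a pattern that pulls a value from speech.
const RULES = [
  { label: /e-?mail/i, value: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/ },
  { label: /first name/i, value: /my name is (\w+)/i },
  { label: /last name/i, value: /my name is \w+ (\w+)/i },
];

function mapWithRules(transcript, fields) {
  const result = {};
  for (const field of fields) {
    const rule = RULES.find((r) => r.label.test(field.label));
    const match = rule && transcript.match(rule.value);
    // Use the first capture group when the rule has one, else the whole match.
    if (match) result[field.id] = match[1] !== undefined ? match[1] : match[0];
  }
  return result;
}
```

Run against the hotel example, rules like these pick up the name and email but have nothing to say about "two nights" or "starting March 15th", and every new label wording or language needs another rule.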
What's Next
Looking for 5 websites with long forms to test this in production. Free pilot — I'll help integrate and measure completion rates before/after. Only 5 — I want to work closely with each one.
If interested — demo at typelessform.com or npm install typelessform-widget.
I'm also running a public challenge: $0 to first paying customers in 60 days, no ad budget. Every Tuesday I publish real numbers on LinkedIn — traffic, signups, revenue (or lack of it). No filters.
Would love feedback — especially around browser compatibility and the field-mapping approach. What breaks? What's annoying? What would you do differently? That's the useful stuff.
TypelessForm is built by Webappski. The npm package is typelessform-widget. Demo: typelessform.com.