Most developer tools start with a UI mockup. Ours started with a microphone.
We just launched VoiceTables -- a workspace where you describe what you need and the AI builds it. Say "I need a client tracker with status and revenue" and you get a structured table with the right columns, sample data, and connected docs. No forms, no drag-and-drop, no 20 minutes of manual setup.
Here's what we learned building it.
The core problem: voice is messy, tables are not
The hardest part wasn't the speech-to-text. That's basically a solved problem in 2026. The real challenge was turning fuzzy human language into clean, structured data.
When someone says "track my invoices," they don't specify column types, data formats, or relationships. They just... say it. And they expect the result to make sense.
So the actual engineering challenge is the middle layer: understanding intent, inferring schema, generating realistic sample data, and presenting it all in under 60 seconds.
How the pipeline works
At a high level:
Voice capture -- browser-native Web Speech API for real-time transcription. We tried a few third-party APIs early on but the latency was killing the experience. Native browser speech recognition is faster and good enough for workspace commands.
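For context, the capture wiring looks roughly like this. `continuous`, `interimResults`, and `onresult` are the standard Web Speech API surface; the `RecognitionLike` interface and `startDictation` wrapper are illustrative naming, not VoiceTables' actual code.

```typescript
// Minimal sketch of browser voice capture with the Web Speech API.
// `RecognitionLike` mirrors just the parts of SpeechRecognition we need,
// so the wiring can also be exercised with a mock outside the browser.
type SpeechResultEvent = {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
};

interface RecognitionLike {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: SpeechResultEvent) => void) | null;
  start: () => void;
}

function startDictation(
  recognition: RecognitionLike,
  onTranscript: (text: string) => void,
): void {
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // stream partial results for low latency
  recognition.lang = "en-US";
  recognition.onresult = (event) => {
    // Stitch all result chunks into one running transcript.
    let text = "";
    for (let i = 0; i < event.results.length; i++) {
      text += event.results[i][0].transcript;
    }
    onTranscript(text);
  };
  recognition.start();
}

// In the browser: startDictation(new (window as any).webkitSpeechRecognition(), console.log);
```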
Intent parsing -- the transcribed text goes through an LLM layer that extracts the workspace type, expected fields, and data relationships. This is where most of the prompt engineering lives. We tested dozens of system prompts before landing on one that reliably generates sensible schemas from vague descriptions.
Schema generation -- the parsed intent becomes a typed schema (column names, types, constraints). We generate sample data that actually looks real: not "lorem ipsum" rows, but things like "Acme Design, $2,400, Paid."
Real-time rendering -- the table appears progressively as the AI generates it. No loading spinner. You see the columns form, then the rows fill in. It feels like someone is building it in front of you.
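One way to picture the progressive rendering (a simplified sketch; in the real product the chunks arrive asynchronously from the model, not from a local generator):

```typescript
// Progressive rendering: the UI consumes the table as a stream of chunks
// (columns first, then rows) instead of one final payload, so something is
// on screen from the first chunk onward. A generator stands in for the
// model's streaming output here.
type Chunk =
  | { kind: "columns"; columns: string[] }
  | { kind: "row"; values: string[] };

function* streamTable(): Generator<Chunk> {
  yield { kind: "columns", columns: ["Client", "Status"] };
  yield { kind: "row", values: ["Acme Design", "Paid"] };
  yield { kind: "row", values: ["Globex", "Overdue"] };
}

function renderFrames(chunks: Iterable<Chunk>): string[] {
  const frames: string[] = [];
  for (const chunk of chunks) {
    // In the real UI each chunk updates the DOM in place; here we record
    // one text "frame" per chunk to show the incremental build-up.
    frames.push(
      chunk.kind === "columns" ? chunk.columns.join(" | ") : chunk.values.join(" | "),
    );
  }
  return frames;
}

// renderFrames(streamTable()) → ["Client | Status", "Acme Design | Paid", "Globex | Overdue"]
```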
Conversational refinement was the surprise hit
We built the initial "voice to table" flow and thought that was the product. But during testing, people kept talking to it. "Add a priority column." "Filter by overdue items." "Summarize my revenue."
So we leaned into it. VoiceTables now supports ongoing conversation with your workspace. You can restructure, filter, add docs, and ask questions about your data, all without touching a menu.
The technical bit here: we maintain a conversation context that includes the current schema state. Each new command is interpreted relative to what already exists. It's basically a stateful agent loop where the workspace IS the memory.
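The loop above can be sketched like this. The command shapes are invented for illustration (the real system parses them from free-form speech via the LLM layer); the point is that each command is applied against the current schema, and the result becomes the context for the next one.

```typescript
// Stateful agent loop in miniature: the workspace schema IS the memory.
// Each command transforms the current schema into the next one.
interface Schema {
  columns: string[];
}

type Command =
  | { kind: "addColumn"; name: string }
  | { kind: "removeColumn"; name: string };

function applyCommand(schema: Schema, cmd: Command): Schema {
  switch (cmd.kind) {
    case "addColumn":
      // Interpret relative to what already exists: no duplicate columns.
      return schema.columns.includes(cmd.name)
        ? schema
        : { columns: [...schema.columns, cmd.name] };
    case "removeColumn":
      return { columns: schema.columns.filter((c) => c !== cmd.name) };
  }
}

// "Add a priority column." ... "Drop the amount column."
const cmds: Command[] = [
  { kind: "addColumn", name: "Priority" },
  { kind: "removeColumn", name: "Amount" },
];
let schema: Schema = { columns: ["Client", "Amount", "Status"] };
for (const cmd of cmds) schema = applyCommand(schema, cmd);
// schema.columns → ["Client", "Status", "Priority"]
```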
Three-in-one was intentional
We ship tables, docs, and AI chat in one workspace. Not because "more features = better" but because in practice, data and context always live together.
When a freelancer tracks invoices, they also need a note about payment terms. When a team lead sets up a sprint board, they need a doc for the retrospective. Forcing people into three separate tools (Notion for docs, Airtable for tables, ChatGPT for questions) is a workflow tax nobody wants to pay.
What's next
We're in beta right now. Free tier, $19/mo Plus plan, custom Enterprise. No credit card needed to start.
The honest truth: we don't know yet which use cases will stick. Freelancers tracking clients? Small teams running projects? Field workers logging inventory by voice? We built it broad on purpose and we're watching what people actually do with it.
If you want to try it: voicetables.com
I'm Jakub, building VoiceTables and a portfolio of AI products at Inithouse. Other things we've shipped recently: Be Recommended (check if AI chatbots recommend your brand), Audit Vibecoding (security audit for AI-generated code), and Watching Agents (AI prediction platform). Follow along for more build logs.