From Brain Dump to Markdown: Structure Ideas as You Speak

#ai #agents #llm #memory

Written by Speech To Markdown

Voice input is faster than typing, but speed isn't the real problem. The real challenge is structure: it's surprisingly hard to organise your thoughts on the fly and say something coherent. AI assistants like Gemini have adopted voice input, yet much of what gets transcribed ends up as unstructured noise that takes longer to clean up than the time dictation saved.

That's the problem I set out to solve for myself.

Why I Built This

I wanted a way to speak freely — braindump style — and have an AI turn it into clean, structured Markdown in real time. No editing, no typing, no context switching. Just think out loud and get a document back.

The result is a Speech-to-Markdown (stmd) tool built into TaskSquad.

How It Works

When you start recording, stmd transcribes your speech locally using Whisper models (downloaded on the fly — I use the large variant). The transcript is buffered, aggregated, and sent chunk by chunk to a model of your choice. You can pause recording at any point to think before continuing.
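To make the buffering step concrete, here is a minimal sketch of how transcribed segments might be aggregated before being sent on. The `TranscriptBuffer` class and its `min_words` threshold are assumptions for illustration, not stmd's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptBuffer:
    """Collects transcribed segments and releases them as larger chunks.

    Hypothetical helper: the class name and word threshold are assumptions.
    """

    min_words: int = 40
    _segments: list = field(default_factory=list)

    def add(self, segment: str) -> Optional[str]:
        """Store a segment; return an aggregated chunk once enough words are buffered."""
        segment = segment.strip()
        if segment:
            self._segments.append(segment)
        buffered_words = sum(len(s.split()) for s in self._segments)
        if buffered_words >= self.min_words:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Return whatever is buffered (for example on pause) and reset the buffer."""
        if not self._segments:
            return None
        chunk = " ".join(self._segments)
        self._segments.clear()
        return chunk
```

Calling `flush()` when recording is paused would send any remaining words instead of holding them until the threshold is reached.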

You can connect it to:

A local harness available on TaskSquad (powered by a sub-agent)
A direct API or local model via a custom prompt (oMLX, Ollama, etc.)
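For the second option, a rough sketch of sending one chunk to a local model through Ollama's `/api/generate` endpoint might look like the following. The prompt wording and the function names are hypothetical, since stmd's real prompt is not shown here, and the URL assumes Ollama is running on its default port.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical custom prompt, written for illustration only.
CUSTOM_PROMPT = (
    "Rewrite the following speech transcript as clean Markdown. "
    "Keep the meaning and remove filler words.\n\nTranscript:\n{chunk}"
)


def build_request(chunk: str, model: str) -> dict:
    """Build the JSON body for a non-streaming generate call."""
    return {
        "model": model,
        "prompt": CUSTOM_PROMPT.format(chunk=chunk),
        "stream": False,
    }


def send_chunk(chunk: str, model: str) -> str:
    """POST one transcript chunk to the local server and return the reply text."""
    body = json.dumps(build_request(chunk, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

The `model` argument is whatever name your local server exposes. Keeping `build_request` separate from the network call makes the prompt easy to inspect without a running server.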

Screenshots: Model Selection, Session Setup, Ready State

I currently use Claude Code as my main harness and have also tested Gemma 4 with oMLX extensively — both work well at comparable speed.

Two Modes

stmd works in two modes: append and edit.

Append mode — each spoken chunk is cleaned up and appended to the existing Markdown document. Great for brain-dumping a first draft.

Edit mode — your spoken words become edit commands. Instead of adding content, the agent modifies what's already there. Say "make the intro shorter" or "replace the second bullet" — no keyboard required.
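The difference between the two modes can be sketched as a small dispatch over prompt and merge logic. This is an illustration under assumptions, not how stmd is implemented: the `Mode` enum, the prompt wording, and the idea that edit mode returns the full revised document are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    APPEND = "append"
    EDIT = "edit"


def build_prompt(mode: Mode, document: str, chunk: str) -> str:
    """Build a mode-specific prompt (wording is hypothetical)."""
    if mode is Mode.APPEND:
        return (
            "Clean up this spoken text and return it as Markdown "
            "that continues the document.\n\n"
            f"Document so far:\n{document}\n\nSpoken text:\n{chunk}"
        )
    return (
        "Apply this spoken instruction to the document and return "
        "the full revised Markdown.\n\n"
        f"Document:\n{document}\n\nInstruction:\n{chunk}"
    )


def apply_result(mode: Mode, document: str, model_output: str) -> str:
    """Merge the model's reply into the document according to the mode."""
    output = model_output.strip()
    if mode is Mode.APPEND and document.strip():
        # Append mode: keep existing content and add the new block after a blank line.
        return document.rstrip("\n") + "\n\n" + output + "\n"
    # Edit mode (or an empty document): the reply becomes the whole document.
    return output + "\n"
```

In this sketch, saying "make the intro shorter" in edit mode would pass that sentence as the instruction and replace the document with the model's revised version.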