Alexander Zuev

Posted on • Originally published at iamtypist.dev

Why I built Typist - lightning-fast AI audio transcription app

Let's talk about things. There are things you need. There are things you don't need. There are things you don't know you need. And there are things you don't know you don't need.

Now, of the things you need, I bet one thing that won't come to mind is lightning-fast AI audio transcription. So it falls squarely in the round hole of our needs.

Things you need

I recall when I was a student (ages ago, it seems 🤯), there were lots of lectures I recorded but never got around to actually revisiting. Or when I was working in consulting, recording an important event with a client (strictly with their permission, of course) allowed me to then re-listen and identify insights that would otherwise be missed. Recently, I was learning something new for ClaudePortable.dev, and needed to watch an hour-long YouTube video to understand how it works.

Naturally, my inclination was not to spend an hour of my precious time listening to the whole video, but to feed it into Claude and demand that it be used in our conversation. Imagine my surprise when I realized that downloading the video and getting an accurate transcription is somewhat of a struggle today (August 2025 AD)[1]. I did a brief search and realized that the service I was using still ran on the outdated whisper-v2 model. I then discovered that Groq hosts the latest whisper-v3-large model. At this point it all clicked. All the hours of lectures, client calls and unwatched YouTube guides flashed before my eyes like in a fever dream.

I realized I had to build something out of it.

Things you don't need

This was my 5th project this year, after building everything from the Supabase MCP server with 700+ GitHub stars to ClaudePortable.dev - an app that lets you run Claude Code from Slack. Experience taught me what I didn't need: multiple features, unfamiliar tech, or overcomplicated architecture.

Instead, I stuck to hard-earned principles. Every project needs a killer reason to exist. The stack should stay mostly unchanged. One core feature must work flawlessly. And the UX has to be good enough that I'd actually use it myself.

Looking back after launch, I'm satisfied with how it turned out. Typist transcribes an hour of audio in under 15 seconds - an order of magnitude faster than other services, with slightly better accuracy thanks to Groq's whisper-v3-large model. The stack remained familiar (more on that below). Both video and audio work reliably with decent error handling. The UI is clean enough, though I know I should make it prettier.
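
To put the headline number in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: `realTimeFactor` is a hypothetical helper, and the 10-minute figure stands in for a slower service rather than a measured benchmark.

```typescript
// How many times faster than real time (or than another duration) a job ran.
function realTimeFactor(audioSeconds: number, processingSeconds: number): number {
  if (processingSeconds <= 0) {
    throw new RangeError("processingSeconds must be positive");
  }
  return audioSeconds / processingSeconds;
}

const oneHour = 60 * 60;    // 3600 seconds of audio
const tenMinutes = 10 * 60; // assumed wait at a slower service

// One hour of audio transcribed in 15 seconds.
console.log(realTimeFactor(oneHour, 15)); // 240

// A 10-minute wait compared with a 15-second one.
console.log(realTimeFactor(tenMinutes, 15)); // 40
```

In other words, 15 seconds per hour of audio is 240 times faster than real time, and 40 times faster than a service that takes 10 minutes for the same file.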

Things you don't know you need

Zero To One

After transcribing 20 billion hours of audio by hand, typing 3 million lines of code blindfolded and writing 3 thousand lines of documentation with my hands tied behind my back, I can confidently say that coding is not the hardest part of building a product. Especially so with the wide adoption of LLMs.

Here's what you actually don't know you need until it's too late:

  • Distribution from Day One: I could build the fastest, most accurate transcription service in the world, but if no one knows about it, it might as well not exist.

  • Speed as THE Feature: Not just fast - so absurdly fast it changes behavior. Users don't know they need 15-second transcription until they experience it.

  • Time Constraints: I actually naively thought I could build it in 3 days. It took me 3 weeks.

Things you don't know you don't need

Now that Typist exists, here's what you can stop doing:

  • Waiting 10+ Minutes for Transcripts: Other services make you wait. And wait. And wait some more. You don't need patience anymore. Upload to Typist and get your transcript before your coffee gets cold.

  • Manually Transcribing "Just This One Part": We've all been there - "I'll just quickly type out this 2-minute section." Ten days later, you're still rewinding and typing. Stop torturing yourself. Let Typist handle it.

  • Expensive Subscriptions for Occasional Use: $100/month for transcribing one meeting every few weeks? You don't need another subscription eating away at your credit card. Typist gives you 3 free transcriptions daily - perfect for real humans with real needs.

The Stack That Makes It Possible

In 2025, you can build anything. The tech stack is no longer the moat - anyone can spin up the same infrastructure. The moat is speed, distribution, and actually shipping. The moat is knowing what NOT to build.

Here's the entire Typist stack:

  • βš›οΈ React SPA serves the UI
  • πŸ”₯ Hono API handling business logic
  • ☁️ Both SPA and API are served from Cloudflare Workers
  • πŸ”„ Cloudflare Workflows orchestrate the multi-step transcription process
  • πŸš€ Groq API does the transcription
  • πŸ—„οΈ D1 a primary relational database
  • πŸ“¦ R2 as object storage (for audio & media files)
  • πŸ” Better Auth for auth logic (and later Stripe integration)

Nothing revolutionary. You could copy this stack tomorrow. But you won't. Trust me. You are not that guy. You are not that guy, pal.[2]
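
For the curious, the Groq step of such a stack might look roughly like the sketch below. This is not Typist's actual code. It assumes Groq's OpenAI-compatible transcription endpoint and the model ID `whisper-large-v3`, and the function names are hypothetical. Building the request is kept separate from sending it, so the request can be inspected without a network call.

```typescript
// Assumptions, not taken from Typist's source: the endpoint URL and model ID.
const GROQ_TRANSCRIPTION_URL = "https://api.groq.com/openai/v1/audio/transcriptions";
const MODEL_ID = "whisper-large-v3";

interface TranscriptionRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: FormData;
}

// Build the multipart request without sending it.
function buildTranscriptionRequest(
  audio: Blob,
  fileName: string,
  apiKey: string,
): TranscriptionRequest {
  const body = new FormData();
  body.append("file", audio, fileName);
  body.append("model", MODEL_ID);
  body.append("response_format", "json");
  return {
    url: GROQ_TRANSCRIPTION_URL,
    method: "POST",
    // No Content-Type header here: fetch sets the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body,
  };
}

// Send the request and return the transcript text.
async function transcribe(audio: Blob, fileName: string, apiKey: string): Promise<string> {
  const { url, ...init } = buildTranscriptionRequest(audio, fileName, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Transcription failed with HTTP ${response.status}`);
  }
  const result = (await response.json()) as { text: string };
  return result.text;
}
```

In a Worker, `transcribe` would typically be called from a workflow step with a file read from R2 and an API key stored as a secret. Those wiring details are left out here because they depend on the project.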

What's Next

No one knows what's next for Typist. If anyone tells you they know - don't trust them. Even I don't know what's next for Typist. We'll see.

After launch comes the most fun part - learning what users actually do with that speed.

Some early patterns I'm seeing:

  • Students transcribing entire semesters of lectures in minutes
  • Podcasters creating searchable archives of their shows
  • Researchers processing interview data at scale
  • Non-native speakers using transcripts to better understand content

Each use case reveals new opportunities. But I'm resisting the urge to build everything. Typist will stay focused on its core promise: the fastest, most accurate transcription available.

Try It Yourself

The best way to understand Typist is to experience it. Upload any audio or video file and watch it transform into accurate, formatted text in seconds. No credit card, no complex setup - just pure speed.

Start transcribing at iamtypist.dev →

Questions, Feedback, Comments

What was the first thing that came to mind when reading this story? Write it down in the comments below.



  1. AD stands for Anno Domini, a Latin phrase meaning "in the year of our Lord." It's used to denote years after the birth of Jesus Christ in the Gregorian calendar. The counterpart is BC (Before Christ) for years before that point. ↩

  2. You have to know the classics. ↩
