<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shu</title>
    <description>The latest articles on DEV Community by Shu (@metsk-net).</description>
    <link>https://dev.to/metsk-net</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3044128%2F195e67cf-e95e-4dcf-ab7c-f084f1fd143f.JPG</url>
      <title>DEV Community: Shu</title>
      <link>https://dev.to/metsk-net</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/metsk-net"/>
    <language>en</language>
    <item>
      <title>Whisper + Gradio on Colab: Speech-to-Text in Minutes</title>
      <dc:creator>Shu</dc:creator>
      <pubDate>Tue, 28 Oct 2025 09:26:06 +0000</pubDate>
      <link>https://dev.to/metsk-net/whisper-gradio-on-colab-speech-to-text-in-minutes-2nlg</link>
      <guid>https://dev.to/metsk-net/whisper-gradio-on-colab-speech-to-text-in-minutes-2nlg</guid>
      <description>&lt;h1&gt;
  
  
  What You’ll Learn
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;How to transcribe speech into text using &lt;strong&gt;OpenAI Whisper&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;How to build a &lt;strong&gt;web-based transcription app&lt;/strong&gt; using &lt;strong&gt;Gradio&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;How to run everything for free on &lt;strong&gt;Google Colab’s GPU runtime&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Who This Article Is For
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Developers interested in ChatGPT’s audio mode
&lt;/li&gt;
&lt;li&gt;Anyone curious about building &lt;strong&gt;AI-powered audio tools&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Engineers who want to try &lt;strong&gt;Whisper&lt;/strong&gt; or &lt;strong&gt;Gradio&lt;/strong&gt; without local setup
&lt;/li&gt;
&lt;li&gt;Beginners looking to prototype an app quickly using &lt;strong&gt;free Colab GPU&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Environment
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Google Colab (Free Tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU&lt;/td&gt;
&lt;td&gt;NVIDIA T4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;3.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key Libraries&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;openai-whisper&lt;/code&gt;, &lt;code&gt;gradio&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup Time&lt;/td&gt;
&lt;td&gt;~5 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  Step 1: Set Up the Environment
&lt;/h1&gt;

&lt;p&gt;Run the following cell in Colab to install all required packages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;whisper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="n"&gt;gradio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invalidate_caches&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/usr/local/lib/python3.12/site-packages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;gradio&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Whisper loaded successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you see the line &lt;strong&gt;&lt;code&gt;Whisper loaded successfully&lt;/code&gt;&lt;/strong&gt;, you’re good to go. Even on Colab’s free &lt;strong&gt;T4 GPU&lt;/strong&gt;, Whisper performs smoothly for short recordings.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 2: Load the Whisper Model
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;"small"&lt;/code&gt; variant provides a good balance between &lt;strong&gt;accuracy&lt;/strong&gt; and &lt;strong&gt;speed&lt;/strong&gt;, and it works particularly well for Japanese. It downloads once (~460 MB) and then loads instantly from cache afterward.&lt;/p&gt;
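If you want to trade accuracy for speed (or the reverse), the Whisper README lists approximate VRAM needs per model size. The helper below is not from the article, just a sketch for picking the largest size that fits a given GPU:

```python
# Approximate VRAM requirements per Whisper model size (from the Whisper README).
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_fitting_model(available_gb: float) -> str:
    """Return the biggest Whisper model whose rough VRAM need fits the GPU."""
    for name in ["large", "medium", "small", "base", "tiny"]:
        if available_gb >= VRAM_GB[name]:
            return name
    return "tiny"  # fall back to the smallest model

print(largest_fitting_model(15))  # Colab's T4 reports ~15 GB → "large"
print(largest_fitting_model(4))   # → "small"
```

In practice, "small" remains a sensible default on the free tier: larger models transcribe noticeably slower even when they fit.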

&lt;h1&gt;
  
  
  Step 3: Create a Gradio Web App
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ja&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Interface&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filepath&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Whisper Test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Record or upload audio and get Japanese transcription&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running the code, a &lt;strong&gt;Gradio web interface&lt;/strong&gt; appears like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7a713pd9o2lgwvjhcju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7a713pd9o2lgwvjhcju.png" alt=" " width="800" height="261"&gt;&lt;/a&gt;&lt;br&gt;
You can record your voice directly from the microphone, or upload an audio file from your device. Then click &lt;strong&gt;Submit&lt;/strong&gt;, and your transcribed text will appear in the &lt;strong&gt;Output&lt;/strong&gt; box. When running on Colab, Gradio automatically provides a temporary &lt;code&gt;.gradio.live&lt;/code&gt; URL so you can test the app from your phone or another computer — free of charge.&lt;/p&gt;
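Beyond the plain text, the dict returned by model.transcribe() also carries a "segments" list with start and end timestamps. Here is a small sketch of formatting those into subtitle-style lines, using a mocked result (the segment values below are invented for illustration):

```python
def format_segments(result: dict) -> list[str]:
    """Render each Whisper segment as a timestamped subtitle-style line."""
    lines = []
    for seg in result["segments"]:
        start, end = seg["start"], seg["end"]
        lines.append(f"[{start:06.2f} → {end:06.2f}] {seg['text'].strip()}")
    return lines

# Mocked transcribe() output; real segments also include tokens, avg_logprob, etc.
mock_result = {
    "text": "こんにちは 今日は良い天気ですね",
    "segments": [
        {"start": 0.0, "end": 1.8, "text": " こんにちは"},
        {"start": 1.8, "end": 4.2, "text": " 今日は良い天気ですね"},
    ],
}
for line in format_segments(mock_result):
    print(line)  # e.g. "[000.00 → 001.80] こんにちは"
```

Returning format_segments(result) from the transcribe() function instead of result["text"] would give the web app timestamped output with no other changes.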

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Whisper&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Converts speech to text using transformer-based acoustic modeling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gradio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Creates a web UI and handles audio I/O&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Colab&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides free GPU compute for model inference&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Together, these form a lightweight, end-to-end &lt;strong&gt;speech-to-text pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;.gradio.live&lt;/code&gt; URL is &lt;strong&gt;temporary&lt;/strong&gt; and &lt;strong&gt;public&lt;/strong&gt; (no authentication). Don’t share it if your audio contains private data.&lt;/li&gt;
&lt;li&gt;Once the Colab runtime stops, the URL expires automatically.&lt;/li&gt;
&lt;li&gt;For a persistent deployment, consider using &lt;strong&gt;RunPod&lt;/strong&gt;, &lt;strong&gt;Hugging Face Spaces&lt;/strong&gt;, or &lt;strong&gt;Render&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
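If you do need to hand the temporary URL to someone, Gradio's launch() also accepts an auth=(username, password) tuple for basic access control. A quick way to generate a throwaway password (the "demo" username here is only an example):

```python
import secrets

# Throwaway credentials for Gradio's built-in auth; pass them as
# launch(share=True, auth=("demo", password)) to gate the public URL.
password = secrets.token_urlsafe(12)
print(password)  # 12 random bytes → a 16-character URL-safe string
```

This does not make the link private, but it stops casual visitors who stumble on the URL before it expires.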

&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;In just about &lt;strong&gt;20 lines of Python&lt;/strong&gt;, you now have a fully working &lt;strong&gt;Japanese speech-to-text web app&lt;/strong&gt;. This setup is ideal for experimenting with &lt;strong&gt;AI transcription&lt;/strong&gt;, audio notes, or even meeting summaries — all without spending a single dollar.&lt;/p&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I’m an &lt;strong&gt;SRE engineer&lt;/strong&gt; working mainly on infrastructure design and automation. Recently, I’ve been exploring the intersection of &lt;strong&gt;AI and speech technology&lt;/strong&gt;, focusing on how to develop &lt;strong&gt;custom speech-enabled LLMs&lt;/strong&gt;. My main stack includes &lt;strong&gt;Python, FastAPI, Next.js, and AWS&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Motivation
&lt;/h2&gt;

&lt;p&gt;I wrote this article because I want to &lt;strong&gt;develop my own custom speech-enabled LLM&lt;/strong&gt;. As a ChatGPT Plus user, I often rely on the &lt;strong&gt;audio mode&lt;/strong&gt;, but I wish I could use it freely for longer sessions throughout the day. Speaking helps me organize my thoughts and trigger new ideas — so I decided to recreate that experience myself. I’ll keep sharing articles about &lt;strong&gt;speech AI&lt;/strong&gt; and &lt;strong&gt;LLM integration&lt;/strong&gt;, so follow along if this project resonates with you.&lt;/p&gt;

</description>
      <category>python</category>
      <category>openai</category>
      <category>whisper</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
