Chris Pennington

OpenAI Whisper: Transcribe in the Terminal for free

Journaling can be beneficial for a lot of reasons—memory recall, processing, mental health, and more!

I’ve always aspired to be a “journaler,” but as a dev, I had a few preferences:

  • Audio input: I process better by talking and it's faster, so my preference is audio-first.
  • Text output: I am never going to listen to audio of myself talking, so I need to transcribe the audio to text.
  • Digital storage: ease of entry, mobility, the option to search, and backups. Need I say more?

The Problem and How to Fix it

I started using the built-in dictation on my computer, but found I spent all my time fixing errors—I just couldn’t help myself!

Recently, I ran across Whisper AI, a free machine-learning transcription tool built by OpenAI. Play with AI and spend 6 hours automating?! 👋 Sign me up!

In this article, I’ll show you how to:

  1. Record audio IN YOUR TERMINAL(!).
  2. Transcribe it as a journal entry.
  3. Back up the entry by pushing it to GitHub.

It’s the ultimate nerd journaling experience. And it's free.

(And for macOS users, we’ll set up a native macOS notification to confirm the journal entry.)

Step 1: Record Audio

To get started, we’ll need to install FFmpeg, the all-in-one tool for working with audio and video. We’ll use it to record audio and Whisper AI will use it to create a transcription.

Install ffmpeg

While you can install it in many ways, the easiest is to use a package manager such as Homebrew for macOS or Chocolatey for Windows.

Homebrew
brew install ffmpeg

Chocolatey
choco install ffmpeg

Note: You’ll need Homebrew (for macOS) or Chocolatey (for Windows) installed before using these commands.

Locate your inputs

FFmpeg can list all inputs on your machine. Since it uses local tools based on your operating system, you’ll need to enter the right command for your machine.

macOS

ffmpeg -f avfoundation -list_devices true -i ""

Windows

ffmpeg -f dshow -list_devices true -i dummy

Linux

arecord -l

Note: FFmpeg's ALSA input has no -list_devices option, so on Linux this uses arecord (from alsa-utils) to list the capture devices instead.

The commands above return a list of inputs, which may include both video and audio devices. Note the exact name of the audio input you need.

With this output, I’ll choose “MacBook Pro Microphone”:

AVFoundation audio devices:
[AVFoundation indev @ 0x14b004460] [0] MacBook Pro Microphone
[AVFoundation indev @ 0x14b004460] [2] Chris phone Microphone
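
If the listing is long, a small filter can pull out just the audio device names. This is a minimal sketch rather than part of the original workflow: audio_devices is a hypothetical helper, and it assumes the AVFoundation listing format shown above.

```shell
# Hypothetical helper: read an AVFoundation device listing on stdin
# and print only the names of the audio devices.
audio_devices() {
  awk '
    /AVFoundation video devices:/ { in_audio = 0; next }
    /AVFoundation audio devices:/ { in_audio = 1; next }
    # Device lines look like "[AVFoundation indev @ 0x...] [0] Name"
    in_audio && match($0, /\] \[[0-9]+\] /) {
      print substr($0, RSTART + RLENGTH)
    }
  '
}

# Usage (FFmpeg prints the listing to stderr, hence 2>&1):
#   ffmpeg -f avfoundation -list_devices true -i "" 2>&1 | audio_devices
```

The printed name is what goes after the colon in the -i argument when recording.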

Record audio

With the name of your desired input, you can tell FFmpeg to record audio. We’ll add a few other nice-to-haves:

  • Timing: The -t flag sets a predetermined number of seconds to record. I’ll set mine to 60 to record a minute.
  • File name: You can set an output filename based on today’s date with $(date +'%Y%m%d').mp3 to produce an MP3 named YYYYmmdd.mp3 (i.e., based on the day of the recording).
  • Location: Set the full file path for the output to make the command executable from any directory.

Here’s what we have so far:

macOS

ffmpeg -f avfoundation -i ":MacBook Pro Microphone" -t 60 "/Users/chris.pennington/journal/$(date +'%Y%m%d').mp3"

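Windows

ffmpeg -f dshow -i audio="Windows Microphone" -t 60 "/Users/chris.pennington/journal/$(date +'%Y%m%d').mp3"

Note: DirectShow takes the device as audio="device name". The $(date ...) substitution assumes a Bash-style shell such as Git Bash, and the output path needs adjusting for your machine.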

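Linux

ffmpeg -f alsa -i hw:0 -t 60 "/Users/chris.pennington/journal/$(date +'%Y%m%d').mp3"

Note: ALSA identifies inputs by card and device (for example hw:0 or default) rather than by display name, and the output path needs adjusting for your machine.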
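
To avoid retyping the path, the pieces above can be wrapped in two small functions. This is a sketch under assumptions: JOURNAL_DIR, audio_path, and record_entry are hypothetical names, and the FFmpeg call is the macOS variant shown above.

```shell
# Assumption: entries live in ~/journal unless JOURNAL_DIR is already set.
JOURNAL_DIR="${JOURNAL_DIR:-$HOME/journal}"

# Print the dated path for today's recording (YYYYmmdd.mp3).
audio_path() {
  printf '%s/%s.mp3\n' "$JOURNAL_DIR" "$(date +'%Y%m%d')"
}

# Record from a macOS input; both arguments are optional.
#   $1 = seconds to record (default 60)
#   $2 = AVFoundation input (default ":MacBook Pro Microphone")
record_entry() {
  local seconds="${1:-60}"
  local input="${2:-:MacBook Pro Microphone}"
  mkdir -p "$JOURNAL_DIR" &&
    ffmpeg -f avfoundation -i "$input" -t "$seconds" "$(audio_path)"
}
```

Calling record_entry 120 would record two minutes. On Windows or Linux, swap in the matching -f and -i values.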

Step 2: Transcribe

You’ve got an audio recording, so now it’s time to transcribe it.

Install Whisper AI

The magic tool here is the Whisper AI Python library, so you’ll need three items:

1. Install Python 3.8–3.11

You can check your version of Python with python3 -V. If you don’t have it, or your version is outside 3.8–3.11, head to python.org and download the latest 3.11 release.

Once it downloads, install Python like any other program on your machine. Both Windows and macOS require an extra step:

  • Windows: If you’re on Windows, make sure you check "Add Python.exe to PATH" during the installation process.
  • macOS: On macOS, you’ll also need to install the security certificates for Python to allow secure network requests. Finder should open automatically, showing the files associated with Python on your machine. Drag the Install Certificates file to your terminal and press Return to run the install command.
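
If you'd rather have a script check the version range than eyeball it, here is a minimal sketch. python_supported is a hypothetical helper, and the 3.8–3.11 range is simply the one stated above.

```shell
# Hypothetical helper: succeed when a "major.minor[.patch]" version
# string falls inside the 3.8-3.11 range mentioned above.
python_supported() {
  local major="${1%%.*}"   # text before the first dot
  local rest="${1#*.}"     # text after the first dot
  local minor="${rest%%.*}"
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -le 11 ]
}

# Usage: python3 -V prints "Python X.Y.Z", so keep only the second field.
#   python_supported "$(python3 -V | cut -d' ' -f2)" || echo "Install Python 3.8-3.11"
```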

2. Install pip

You’ll also need pip, the package manager for Python, as we’ll use it to install Whisper AI.

It should come installed with Python, but you can double-check with python3 -m pip --version.

If you don’t have pip, run this command to install it:
python3 -m ensurepip --upgrade

3. Install Whisper AI

Finally, the magic sauce, Whisper AI. This command installs both Whisper AI and the dependencies it needs to run.

pip install -U openai-whisper
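
Before moving on, it can help to confirm that everything installed so far is actually on your PATH. A minimal sketch, where missing_tools is a hypothetical helper name:

```shell
# Hypothetical helper: print each named command that is NOT on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: no output means all three tools were found.
#   missing_tools ffmpeg python3 whisper
```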

Transcribe your audio

Whisper makes audio transcription a breeze. Type whisper and the file name to transcribe the audio into several formats automatically. That’s it!

You can, however, provide more instructions to Whisper AI. Here are a few I’ll add:

  • Language: By default, Whisper detects your language, but you can provide it with the --language flag (dozens are supported!).
  • Model: Whisper offers several levels of transcription quality. By default, it uses the small model, but you can get slightly better (although slower) results with the --model medium flag (for English speakers, I’d recommend --model medium.en).
  • File type: While Whisper can output several file types (e.g., srt, json, etc.), for journaling, I’ll want a txt file, so I’ll add the --output_format txt flag.

Here’s the end result:

whisper "/Users/chris.pennington/journal/$(date +'%Y%m%d').mp3" --model medium.en --language English --output_format txt

Note: My MacBook Pro runs Whisper on the CPU, which does not support half-precision (FP16) inference, so I also add the --fp16 False flag to save Whisper AI the trouble of attempting it.

Putting it together

The full command will 1) record audio, 2) transcribe the audio, and 3) then delete the audio file. Since all three commands use the same file path, we can extract it into its own variable and reference it in each command to clean up the code slightly.

AUDIO_FILE="/Users/chris.pennington/journal/$(date +'%Y%m%d').mp3"

ffmpeg -f avfoundation -i ":MacBook Pro Microphone" -t 60 "$AUDIO_FILE"

whisper "$AUDIO_FILE" --model medium.en --language English --fp16 False --output_format txt

rm "$AUDIO_FILE"

Note: Remember to alter the first command for FFmpeg if you are on a Windows or Linux machine.
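
Two details are worth guarding against here. Whisper writes its output to the current directory unless you pass --output_dir, and the rm runs even if no transcript was produced. The sketch below is one way to handle both. transcribe_and_clean is a hypothetical helper, and it assumes the transcript is named after the audio file with a .txt extension.

```shell
# Hypothetical helper: transcribe next to the audio file and delete the
# audio only when a non-empty transcript exists afterwards.
transcribe_and_clean() {
  local audio="$1"
  local transcript="${audio%.*}.txt"   # same base name, .txt extension
  whisper "$audio" --model medium.en --language English --fp16 False \
    --output_format txt --output_dir "$(dirname "$audio")" || return 1
  if [ -s "$transcript" ]; then
    rm "$audio"
  else
    echo "No transcript at $transcript; keeping $audio" >&2
    return 1
  fi
}

# Usage: transcribe_and_clean "$AUDIO_FILE"
```

Checking for the file rather than relying only on the exit status is a deliberately cautious choice: a silent recording keeps its audio file too.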

Unless you want to type this every day, I’d recommend creating an alias. In my case, I’m using Warp, so I’ll right-click the command and choose Save as Workflow to save my script as a workflow. Warp AI will even help me autofill the title and description and detect variables.

[Image: Workflow in Warp with title, description, and parameters filled out]

Note: You can alternatively add an alias for your shell pointing to this shell script.
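
For the shell route, a function in your shell startup file (for example ~/.zshrc or ~/.bashrc) works like an alias and also forwards arguments. A minimal sketch, assuming the commands above are saved as a script. The journal.sh path and the JOURNAL_SCRIPT variable are hypothetical.

```shell
# Assumption: the commands above are saved in this (hypothetical) script file.
JOURNAL_SCRIPT="${JOURNAL_SCRIPT:-$HOME/journal/journal.sh}"

# Run the journaling script from any directory, passing along any arguments.
journal() {
  sh "$JOURNAL_SCRIPT" "$@"
}
```

After reloading the shell, typing journal starts a new entry.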

Step 3: Back Up to GitHub

Next, let’s push these files to a repository online automatically to save your journal. Create a local repo (git init) and commit the current files.

Next, create a remote and push the files.
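
The one-time setup described above could look roughly like this. It's a sketch, not the only way to do it: init_journal_repo is a hypothetical helper, the remote URL is whatever GitHub shows for your new, empty repository, and the commit step fails if the folder has no files yet.

```shell
# Hypothetical helper: turn a journal folder into a Git repo with a first
# commit, and optionally connect it to a remote. The body runs in a
# subshell, so the caller's working directory is left unchanged.
init_journal_repo() (
  dir="$1"
  remote_url="${2:-}"   # optional: URL of an empty remote repository
  cd "$dir" || exit 1
  git init -q . || exit 1
  git add -A || exit 1
  git commit -q -m "Initial journal commit" || exit 1
  if [ -n "$remote_url" ]; then
    git remote add origin "$remote_url" &&
      git push -u origin HEAD
  fi
)

# Usage (placeholder URL, replace with your own):
#   init_journal_repo ~/journal git@github.com:YOUR_USER/journal.git
```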

Then append the following to your script.

git add -A

git commit -m "$(date +'%Y%m%d') journal entry"

git push
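
One wrinkle: git commit exits with an error when there is nothing new to commit (say, you ran the script but no transcript was written). If that bothers you, a guarded version is sketched below. commit_entry is a hypothetical helper name.

```shell
# Hypothetical helper: stage everything, then commit only if something
# is actually staged. Run it from inside the journal repository.
commit_entry() {
  git add -A || return 1
  # git diff --cached --quiet exits 0 when the index has no changes
  if git diff --cached --quiet; then
    echo "Nothing to commit"
    return 0
  fi
  git commit -q -m "$(date +'%Y%m%d') journal entry"
}

# Usage: commit_entry && git push
```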

Step 4: macOS Notification

On macOS, you can show a notification using AppleScript. I’ll add the following at the end of my script:

osascript -e 'display notification "Transcription Complete!" with title "Whisper AI"'

[Image: macOS notification showing transcription complete]
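
If you want to reuse the notification with different text, the message and title can be passed as arguments instead of being baked into the AppleScript string, which also sidesteps quoting problems. A sketch, where notify is a hypothetical helper that falls back to printing when osascript isn't available (for example on Linux):

```shell
# Hypothetical helper: show a macOS notification, or print the text
# when osascript is not on PATH.
#   $1 = message, $2 = title (default "Whisper AI")
notify() {
  local message="$1"
  local title="${2:-Whisper AI}"
  if command -v osascript >/dev/null 2>&1; then
    # Arguments after the script are handed to the run handler as argv
    osascript -e 'on run argv' \
      -e 'display notification (item 1 of argv) with title (item 2 of argv)' \
      -e 'end run' "$message" "$title"
  else
    printf '%s: %s\n' "$title" "$message"
  fi
}

# Usage: notify "Transcription Complete!"
```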

Final Code

Here’s my final code on macOS. Since I’ve set up a repository, I’ll need to cd into the directory to commit, so I’ll alter the script to start by moving to the right directory. Remember to change the audio recording command if you are on Windows or Linux machines.

TODAY=$(date +'%Y%m%d')

cd /Users/chris.pennington/journal

ffmpeg -f avfoundation -i ":MacBook Pro Microphone" -t 60 "./$TODAY.mp3"

whisper "./$TODAY.mp3" --model medium.en --language English --fp16 False --output_format txt

rm "./$TODAY.mp3"

osascript -e 'display notification "Transcription Complete!" with title "Whisper AI"'

git add -A

git commit -m "$TODAY journal entry"

git push
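
One thing to be aware of: because the file name is based only on the date, a second entry on the same day replaces the first transcript. If you journal more than once a day, a small helper can pick an unused name. This is a sketch, not part of my script: next_entry_name is a hypothetical function, and the -2, -3 suffix scheme is just one possible convention.

```shell
# Hypothetical helper: print an unused base name for today's entry in the
# given directory: YYYYmmdd first, then YYYYmmdd-2, YYYYmmdd-3, and so on
# while a transcript with that name already exists.
next_entry_name() {
  local dir="${1:-.}"
  local today name n
  today=$(date +'%Y%m%d')
  name="$today"
  n=2
  while [ -e "$dir/$name.txt" ]; do
    name="$today-$n"
    n=$((n + 1))
  done
  printf '%s\n' "$name"
}

# Usage in the script above, after the cd:
#   TODAY=$(next_entry_name .)
```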