Tonya Sims for Deepgram

Posted on Nov 18, 2022 • Originally published at blog.deepgram.com

How to Loop Through a Podcast Episode List using Async IO with Python

#python #speechtotext #transcription #asyncio

After reading this brief tutorial, you’ll have a better understanding of how to loop through a podcast episode list with Python and transcribe each episode using a speech-to-text provider and Async IO. To see the full code sample, scroll down to the bottom of this post. Otherwise, let’s walk through step-by-step what you’ll accomplish.

Working with the asyncio library in Python can be tricky, but with some guidance, it’s less painful than one would imagine. With the help of Deepgram’s speech-to-text Python SDK, we can loop through all the podcast episode files and transcribe them using our prerecorded transcription.

Before doing any AI speech recognition, you might wonder why we need Asynchronous IO and those pesky async/await Python keywords.

In the next section, we’ll discover the use of Python’s Async IO and the difference between asynchronous and synchronous code.

High-Level Overview of Asynchronous and Synchronous Code in Python

Synchronous and asynchronous execution are two different programming concepts that are important to understand, especially when it comes to Async IO. Whether code is asynchronous or synchronous depends on how and when its tasks are executed in a program. To understand each method, let’s dive in a bit at a high level.

We can think of synchronous programming as running discrete tasks sequentially: step-by-step, one after another, with no overlap between them. Imagine we are baking a cake and we’re following the recipe instructions. The following steps would be executed in order, without skipping a step or jumping ahead:

1. Pre-heat the oven to 350 degrees
2. Mix flour, baking powder and salt in a large bowl
3. Beat butter and sugar in a bowl
4. Add in the eggs
5. Add in the vanilla extract
6. Mix all the ingredients
7. Pour cake batter into a sheet pan or spring mold
8. Bake the cake for 30 minutes until golden brown

With asynchronous programming, we can imagine we’re multitasking or doing more than one task at the same time, instead of doing things sequentially.

Following the same example above, here's what asynchronous cake baking could look like, stepwise:

1. Pre-heat the oven to 350 degrees
2. While the oven pre-heats, mix flour, baking powder, and salt in a large bowl AND beat butter and sugar in a bowl
3. Add in the eggs AND add in the vanilla extract
4. Mix all the ingredients
5. Pour cake batter into a sheet pan or spring mold
6. Bake the cake for 30 minutes until golden brown
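To make the recipe a little more concrete, here is a minimal sketch of the first few steps using asyncio. It is only an illustration: the step helper and the timings are made up, and asyncio.sleep stands in for the time we would spend waiting on the oven or the mixer.

```python
import asyncio

async def step(name, seconds, log):
    # asyncio.sleep is a stand-in for waiting (the oven heating, the mixer running)
    log.append(f"start {name}")
    await asyncio.sleep(seconds)
    log.append(f"done {name}")

async def bake():
    log = []
    # Start pre-heating in the background so it overlaps with the mixing steps
    oven = asyncio.create_task(step("pre-heat oven", 0.03, log))
    # Run both mixing steps concurrently while the oven heats up
    await asyncio.gather(
        step("mix dry ingredients", 0.01, log),
        step("beat butter and sugar", 0.01, log),
    )
    await oven  # the oven has to be hot before we bake
    await step("bake cake", 0.02, log)
    return log

log = asyncio.run(bake())
print("\n".join(log))
```

The log shows both mixing steps starting before the oven has finished pre-heating, which is the overlap described in the asynchronous recipe.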

As we can see, in steps 2 and 3 you are doing multiple tasks at once. You may have heard the term “concurrency” in programming. This is the basis for asynchronous programming in Python: tasks can run in an overlapping manner, so one task makes progress while another is waiting. (With asyncio this overlap happens on a single thread by switching between tasks, which is not quite the same as running them "in parallel" on several CPU cores.)

You probably also noticed that there are fewer steps in the asynchronous programming recipe example than in the synchronous one. Since multiple tasks can overlap, asynchronous code normally runs faster than its synchronous counterpart, at least when most of the time is spent waiting on something like the oven or a network response.
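Here is a small, hedged sketch of that speed difference. The fake_io coroutine is hypothetical and simply sleeps to imitate waiting on the network, so the numbers only illustrate the idea and are not a benchmark of any real service.

```python
import asyncio
import time

async def fake_io(seconds):
    # Pretend to wait on a slow network call
    await asyncio.sleep(seconds)
    return seconds

async def sequential(delays):
    # Await each task before starting the next one
    return [await fake_io(d) for d in delays]

async def concurrent(delays):
    # Start every task up front and wait for all of them together
    return await asyncio.gather(*(fake_io(d) for d in delays))

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

delays = [0.05, 0.05, 0.05]
seq_result, seq_time = timed(sequential(delays))
con_result, con_time = timed(concurrent(delays))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential version waits for the delays one after another, while the concurrent version only waits about as long as the slowest task.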

This is where Async IO splashes into the picture. We use the asyncio Python library to write concurrent code using async/await syntax in our asynchronous code.

In the next section, let's dive into the code for looping through a podcast episode list using the asyncio library with Python. You’ll see how to transcribe each of the episodes using a speech-to-text AI provider and have a clearer understanding of the async/await Python keywords.

Transcribing Podcast Audio with Python and Speech-to-Text Using AI

Here's how to use Deepgram to transcribe our prerecorded audio files. Deepgram is a speech recognition provider that can transcribe audio from real-time streaming sources or by batch-processing one or more pre-recorded files. Podcasts are generally distributed as pre-recorded audio files, so that's how we're going to proceed.

First off, we’ll need to grab a Deepgram API key to use the Python SDK. It’s super easy to sign up and create one. You can log in with Google, GitHub, or your email.

Once we have our API key, let’s open up one of our favorite code editors. That could be something like Visual Studio Code, PyCharm, or something else.

Next, make a directory called transcribe-audio-files. We’ll transcribe sports speeches from a podcast, so create a Python file inside it called transcribe_speeches.py.

Let’s also create a folder inside the project called speeches, which is where we’ll put our audio MP3 files. (Note: MP3s are the traditional audio format for podcasts. Deepgram works with over 100 different audio codecs and file formats.)

It’s also recommended that we create a virtual environment with the project so our Python dependencies are installed just for that environment, rather than globally. (Don't worry though: this is more of a "best practice" than a requirement. You do you.)

We’ll also need the Deepgram speech-to-text Python package. Install it with pip like this:

pip install deepgram-sdk

Let’s take a look at the code now.

The Python Code with Async IO Keywords (async/await)

Put the code below in your Python file:

from deepgram import Deepgram
import asyncio, json
import os

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"

async def get_audio_files():
   path_of_the_speeches = 'speeches'
   # Transcribe every regular file in the folder, one at a time
   for filename in os.listdir(path_of_the_speeches):
       audio_file = os.path.join(path_of_the_speeches, filename)
       if os.path.isfile(audio_file):
           await main(audio_file)



async def main(file):
   print(f"Speech Name: {file}")
   # Initializes the Deepgram SDK
   deepgram = Deepgram(DEEPGRAM_API_KEY)

   # Open the audio file
   with open(file, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
       print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())

The Python Code and Explanation with Async IO Keywords (async/await)

Let’s walk through the code step-by-step to understand what’s happening.

Here we are importing Deepgram so we can use its Python SDK. We’re also importing asyncio, json, and os. We need asyncio to tap into Async IO, json because later in the code we’ll convert a Python object into JSON using json.dumps, and os to list the files in the speeches folder.

from deepgram import Deepgram
import asyncio, json
import os

Next, we take the Deepgram key we created earlier and use it to replace the placeholder text, YOUR_DEEPGRAM_API_KEY.

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"

For example, if your API key is abcdefg1234, then your code should look like this:

DEEPGRAM_API_KEY="abcdefg1234"
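If you would rather not keep the key in the source file, one common alternative is to read it from an environment variable. The sketch below is only a suggestion: load_api_key is a hypothetical helper, and the variable name DEEPGRAM_API_KEY is an assumption chosen to match the constant above.

```python
import os

def load_api_key(env_var="DEEPGRAM_API_KEY"):
    # Hypothetical helper: the environment variable name is our own choice
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Possible usage in the script, instead of the hard-coded string:
# DEEPGRAM_API_KEY = load_api_key()
```

This way the key stays out of version control, and the script fails with a clear message if the variable is missing.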

In the Python code snippet below, we are just looping through the audio files in the speeches folder and passing them to the main function so they can be transcribed. Notice the use of the async/await keywords here.

async def get_audio_files():
   path_of_the_speeches = 'speeches'
   # Transcribe every regular file in the folder, one at a time
   for filename in os.listdir(path_of_the_speeches):
       audio_file = os.path.join(path_of_the_speeches, filename)
       if os.path.isfile(audio_file):
           await main(audio_file)

To make a function asynchronous in Python, we need to add async in the function definition. So instead of def get_audio_files(), which is synchronous, we use async def get_audio_files().

Inside an async function, we use await whenever we call another coroutine, such as main. The line await main(audio_file) calls the main function and passes in the audio file. The await pauses get_audio_files until main finishes and, in the meantime, hands control back to the event loop so any other scheduled tasks can run. Because the loop awaits each file before moving on to the next one, the files are transcribed one at a time.
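To see what await is doing, it can help to look at what calling an async function returns on its own. In this small sketch, main is a stand-in that does no transcription, and the file name is made up for illustration.

```python
import asyncio
import inspect

async def main(file):
    # Stand-in for the real main(); no transcription happens here
    await asyncio.sleep(0)
    return f"transcribed {file}"

async def get_audio_files():
    pending = main("speeches/a.mp3")  # creates a coroutine object; nothing runs yet
    is_coroutine = inspect.iscoroutine(pending)
    result = await pending  # now main() actually runs to completion
    return is_coroutine, result

is_coroutine, result = asyncio.run(get_audio_files())
print(is_coroutine, result)
```

Calling main(...) without await only builds a coroutine object. The work happens once that object is awaited, which is why forgetting await is such a common source of confusion.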

async def main(file):
   print(f"Speech Name: {file}")

   # Initializes the Deepgram SDK
   deepgram = Deepgram(DEEPGRAM_API_KEY)

   # Open the audio file
   with open(file, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
       print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())

Now to the speech-to-text Python transcription. We initialize Deepgram and pass in the API key.

Then we open each audio file in binary mode with the line with open(file, 'rb') as audio, so its contents are read as bytes. We create a Python dictionary called source that stores the open file under buffer and audio/mp3 under mimetype.
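If the speeches folder ever holds more than MP3s, the mimetype could be chosen from the file extension. This is a rough sketch under assumptions: build_source and the MIMETYPES table are hypothetical, and the exact mimetype strings the API expects should be checked against Deepgram's documentation.

```python
import os

# Hypothetical lookup table; extend it with the formats you actually use
MIMETYPES = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
}

def build_source(audio, filename):
    # Pick the mimetype from the file extension, ignoring upper/lower case
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MIMETYPES:
        raise ValueError(f"No mimetype configured for {ext!r}")
    return {"buffer": audio, "mimetype": MIMETYPES[ext]}

# The bytes here are a placeholder for an open audio file
source = build_source(b"placeholder", "speeches/Talk.MP3")
print(source["mimetype"])
```

Raising an error for unknown extensions keeps non-audio files in the folder from being sent off with the wrong mimetype.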

Next, we do the actual prerecorded transcription on this line response = await deepgram.transcription.prerecorded(source, {'punctuate': True}). We pass the source and the punctuate: True option, which adds punctuation to the transcript.

Now we can print out the response to see our transcript, using print(json.dumps(response, indent=4)).
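The printed JSON contains more than the transcript text. If we only want the text itself, a small helper can pull it out. The sketch below assumes the response follows the results, channels, alternatives layout of Deepgram's prerecorded responses; extract_transcript and the sample dictionary are made up for illustration.

```python
def extract_transcript(response):
    # Assumed layout: results -> channels -> alternatives -> transcript
    alternatives = response["results"]["channels"][0]["alternatives"]
    return alternatives[0]["transcript"]

# Hand-written sample with the assumed shape, not real API output
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Hello, sports fans.", "confidence": 0.99}]}
        ]
    }
}

print(extract_transcript(sample_response))
```

In the script, this could be called on the response inside main in place of, or alongside, the json.dumps print.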

Lastly, we run our program using asyncio.run(get_audio_files()).
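Since the loop awaits each file before starting the next, the episodes are processed one after another. If we wanted several of them in flight at once, one possible approach is asyncio.gather with a semaphore to cap how many run at a time. This is only a sketch: fake_transcribe is a stand-in that sleeps instead of calling Deepgram, and the limit of 2 is an arbitrary choice.

```python
import asyncio

async def fake_transcribe(file):
    # Stand-in for main(file); sleeps instead of calling a real API
    await asyncio.sleep(0.01)
    return f"transcript of {file}"

async def transcribe_all(files, limit=2):
    # Allow at most `limit` transcriptions to run at the same time
    semaphore = asyncio.Semaphore(limit)

    async def worker(file):
        async with semaphore:
            return await fake_transcribe(file)

    # gather returns the results in the same order as the input files
    return await asyncio.gather(*(worker(f) for f in files))

files = ["speeches/a.mp3", "speeches/b.mp3", "speeches/c.mp3"]
transcripts = asyncio.run(transcribe_all(files))
print(transcripts)
```

Capping the concurrency is a cautious default, since firing off every request at once may run into rate limits or use a lot of memory with large audio files.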

Conclusion

Hopefully, you have a better understanding of transcribing audio using speech-to-text and looping through a podcast episode list using Async IO with Python. If you have any questions or need some help, please feel free to reach out to us on our GitHub Discussions page.

Full Python Code Sample of Looping Through a Podcast Episode List

from deepgram import Deepgram
import asyncio, json
import os

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"

async def get_audio_files():
   path_of_the_speeches = 'speeches'
   # Transcribe every regular file in the folder, one at a time
   for filename in os.listdir(path_of_the_speeches):
       audio_file = os.path.join(path_of_the_speeches, filename)
       if os.path.isfile(audio_file):
           await main(audio_file)


async def main(file):
   print(f"Speech Name: {file}")
   # Initializes the Deepgram SDK

   deepgram = Deepgram(DEEPGRAM_API_KEY)

   # Open the audio file
   with open(file, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {'punctuate': True})

       print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())