After reading this brief tutorial, you’ll have a better understanding of how to transcribe a list of podcast episodes with Python: looping through the audio files, using Async IO, and sending each one to a speech-to-text provider. To see the full code sample, scroll down to the bottom of this post. Otherwise, let’s walk through what you’ll accomplish step by step.
Working with the `asyncio` library in Python can be tricky, but with some guidance, it’s less painful than one would imagine. With the help of Deepgram’s speech-to-text Python SDK, we can loop through all the podcast episode files and transcribe each one using our prerecorded transcription.
Before doing any AI speech recognition, you might wonder why we need Asynchronous IO and those pesky `async`/`await` Python keywords.
In the next section, we’ll discover the use of Python’s Async IO and the difference between asynchronous and synchronous code.
High-Level Overview of Asynchronous and Synchronous Code in Python
Running code synchronously or asynchronously are two different programming concepts that are important to understand, especially when it comes to Async IO. Whether a task is asynchronous or synchronous depends on how and when tasks are executed in a program. To understand each method, let’s dive in a bit at a high level.
We can think of synchronous programming as running discrete tasks sequentially: step by step, one after another, with each task finishing before the next one starts. Imagine we are baking a cake and following the recipe. The steps would be executed in order, without skipping a step or jumping ahead:
- Pre-heat the oven to 350 degrees
- Mix flour, baking powder and salt in a large bowl
- Beat butter and sugar in a bowl
- Add in the eggs
- Add in the vanilla extract
- Mix all the ingredients
- Pour cake batter into a sheet pan or spring mold
- Bake the cake for 30 minutes until golden brown
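To make the idea concrete, here is a minimal sketch that models the synchronous recipe in Python. The step durations are made-up stand-ins (fractions of a second instead of minutes), and `do_step` is a hypothetical helper. Because `time.sleep` blocks the whole program, each step has to finish before the next one can start.

```python
import time

def do_step(name: str, seconds: float, log: list) -> None:
    # time.sleep blocks the whole program, so nothing else happens during a step
    time.sleep(seconds)
    log.append(name)

def bake_cake_sync() -> list:
    log = []
    # Each call returns before the next one starts, so the steps run strictly in order
    do_step("preheat oven", 0.2, log)
    do_step("mix dry ingredients", 0.05, log)
    do_step("beat butter and sugar", 0.1, log)
    do_step("add eggs", 0.05, log)
    do_step("add vanilla", 0.02, log)
    do_step("mix everything", 0.05, log)
    do_step("pour batter", 0.05, log)
    do_step("bake", 0.1, log)
    return log

if __name__ == "__main__":
    start = time.perf_counter()
    print(bake_cake_sync())
    # Total time is roughly the sum of every step (about 0.62 s here)
    print(f"took {time.perf_counter() - start:.2f}s")
```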
With asynchronous programming, we can imagine we’re multitasking: instead of finishing each task before starting the next, we start a task and work on something else while it’s in progress.
Following the same example above, here's what asynchronous cake baking could look like, stepwise:
- Pre-heat the oven to 350 degrees
- While the oven pre-heats, mix flour, baking powder, and salt in a large bowl AND beat butter and sugar in a bowl
- Add in the eggs AND add in the vanilla extract
- Mix all the ingredients
- Pour cake batter into a sheet pan or spring mold
- Bake the cake for 30 minutes until golden brown
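As we can see, in steps 2 and 3 you are doing multiple tasks at once. You may have heard the term “concurrency” in programming. It is the basis for asynchronous programming in Python: tasks run in an overlapping manner, so one task can make progress while another is waiting (on the oven, a network request, or a disk read). Concurrency is not quite the same as parallelism, though. By default, `asyncio` runs all of its tasks on a single thread and switches between them whenever one of them is waiting.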
You probably also noticed that the asynchronous recipe has fewer steps than the synchronous one. Because the waiting periods overlap, asynchronous code often finishes sooner than its synchronous counterpart when the work is I/O-bound, such as uploading audio files and waiting for their transcripts.
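The asynchronous recipe can be sketched with `asyncio` as well. The step durations are made-up stand-ins, and `do_step` is a hypothetical helper; waiting is simulated with `asyncio.sleep`, which hands control back to the event loop so another step can proceed in the meantime. `asyncio.create_task` starts the preheat in the background, and `asyncio.gather` runs the paired steps concurrently.

```python
import asyncio
import time

async def do_step(name: str, seconds: float, log: list) -> None:
    # asyncio.sleep stands in for waiting; it yields control to the event loop
    await asyncio.sleep(seconds)
    log.append(name)

async def bake_cake_async() -> list:
    log = []
    # Step 1: start preheating as a background task instead of waiting for it
    preheat = asyncio.create_task(do_step("preheat oven", 0.2, log))
    # Step 2: while the oven heats, prepare both bowls concurrently
    await asyncio.gather(
        do_step("mix dry ingredients", 0.05, log),
        do_step("beat butter and sugar", 0.1, log),
    )
    # Step 3: add the eggs and the vanilla concurrently
    await asyncio.gather(
        do_step("add eggs", 0.05, log),
        do_step("add vanilla", 0.02, log),
    )
    await do_step("mix everything", 0.05, log)
    await do_step("pour batter", 0.05, log)
    # The oven has to be hot before baking, so wait for the preheat task here
    await preheat
    await do_step("bake", 0.1, log)
    return log

if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(bake_cake_async()))
    # Overlapping waits finish in about 0.35 s, versus about 0.62 s if run one by one
    print(f"took {time.perf_counter() - start:.2f}s")
```

Only one step’s code runs at any instant, yet the total time shrinks because the waits overlap.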
This is where Async IO splashes into the picture. We use Python’s `asyncio` library to write concurrent code with the `async`/`await` syntax.
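Here is a minimal sketch of the three pieces working together. `async def` defines a coroutine function, `await` pauses it until another awaitable finishes, and `asyncio.run` starts the event loop. The function names are illustrative only and are not part of the transcription script.

```python
import asyncio

async def fetch_greeting(name: str) -> str:
    # 'await' suspends this coroutine until the sleep finishes;
    # meanwhile the event loop is free to run other tasks
    await asyncio.sleep(0.1)
    return f"Hello, {name}!"

async def main() -> str:
    # Calling fetch_greeting() creates a coroutine; 'await' runs it and gets its result
    return await fetch_greeting("podcast listener")

if __name__ == "__main__":
    # asyncio.run starts the event loop and runs main() to completion
    print(asyncio.run(main()))  # Hello, podcast listener!
```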
In the next section, let’s dive into the code for looping through a podcast episode list using the `asyncio` library with Python. You’ll see how to transcribe each episode using a speech-to-text AI provider and get a clearer understanding of the `async`/`await` Python keywords.
Transcribing Podcast Audio with Python and Speech-to-Text Using AI
Here's how to use Deepgram to transcribe our prerecorded audio files. Deepgram is a speech recognition provider that can transcribe audio from real-time streaming sources or by batch-processing one or more pre-recorded files. Podcasts are generally distributed as pre-recorded audio files, so that's how we're going to proceed.
First off, we’ll need a Deepgram API key to use our Python SDK. It’s super easy to sign up and create one: you can log in with Google, GitHub, or your email.
Once we have our API key, let’s open up our favorite code editor, such as Visual Studio Code, PyCharm, or something else.
Next, we make a directory called `transcribe-audio-files`. We’ll transcribe sports speeches from a podcast, so create a Python file inside it called `transcribe_speeches.py`.
Let’s also create a folder inside the project called `speeches`, which is where we’ll put our MP3 audio files. (Note: MP3 is the traditional audio format for podcasts, but Deepgram works with over 100 different audio codecs and file formats.)
It’s also recommended that we create a virtual environment for the project so our Python dependencies are installed just for that environment, rather than globally. (Don’t worry though: this is more of a "best practice" than a requirement. You do you.)
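We’ll need to install the Deepgram speech-to-text Python package with `pip`. The code in this post uses the 2.x SDK interface (the `Deepgram` class and `deepgram.transcription.prerecorded`). Later major versions of `deepgram-sdk` replaced that interface, so pin the version if you want to follow along as written:

```shell
pip install "deepgram-sdk<3"
```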
Let’s take a look at the code now.
The Python Code with Async IO Keywords (async/await)
Use the below code and put it in your Python code file:
```python
from deepgram import Deepgram
import asyncio
import json
import os

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

async def get_audio_files():
    path_of_the_speeches = 'speeches'
    for filename in os.listdir(path_of_the_speeches):
        audio_file = os.path.join(path_of_the_speeches, filename)
        # Skip subdirectories; only regular files are sent for transcription
        if os.path.isfile(audio_file):
            await main(audio_file)

async def main(file):
    print(f"Speech Name: {file}")

    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(file, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())
```
The Python Code and Explanation with Async IO Keywords (async/await)
Let’s walk through the code step-by-step to understand what’s happening.
Here we are importing Deepgram so we can use its Python SDK. We’re also importing `asyncio`, `json`, and `os`. We need `asyncio` to tap into Async IO, `json` because later in the code we’ll convert a Python object into a JSON string using `json.dumps`, and `os` to list the audio files and build their paths.
```python
from deepgram import Deepgram
import asyncio
import json
import os
```
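If `json.dumps` is new to you, this standalone snippet shows what it does with the `indent=4` argument used later in the script. The dictionary is made-up sample data, not a real Deepgram response.

```python
import json

# Made-up Python dict standing in for an API response
response = {"metadata": {"duration": 12.5}, "results": {"channels": []}}

# json.dumps turns the Python object into a JSON-formatted string;
# indent=4 pretty-prints nested objects with four spaces per level
pretty = json.dumps(response, indent=4)
print(pretty)

# json.loads reverses the conversion, giving back an equal dict
print(json.loads(pretty) == response)  # True
```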
Next, we take the Deepgram key we created earlier and replace the placeholder text, `YOUR_DEEPGRAM_API_KEY`, with your API key.
```python
DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"
```

For example, if your API key is `abcdefg1234`, then your code should look like this:

```python
DEEPGRAM_API_KEY = "abcdefg1234"
```
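Hardcoding the key works for a quick experiment, but it is easy to commit it to version control by accident. A common alternative, sketched here as an assumption rather than part of the original tutorial, is to read it from an environment variable. The variable name `DEEPGRAM_API_KEY` and the helper `load_api_key` are our own choices.

```python
import os

def load_api_key(var_name: str = "DEEPGRAM_API_KEY") -> str:
    """Read the API key from an environment variable instead of hardcoding it (hypothetical helper)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message rather than sending an empty key to the API
        raise RuntimeError(f"Set the {var_name} environment variable to your Deepgram API key")
    return key
```

In the script, the hardcoded line would then become `DEEPGRAM_API_KEY = load_api_key()`.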
In the Python snippet below, we loop through the entries in the `speeches` folder and pass each audio file to the `main` function so it can be transcribed. Notice the use of the `async`/`await` keywords here.
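```python
async def get_audio_files():
    path_of_the_speeches = 'speeches'
    for filename in os.listdir(path_of_the_speeches):
        audio_file = os.path.join(path_of_the_speeches, filename)
        # Skip subdirectories; only regular files are sent for transcription
        if os.path.isfile(audio_file):
            # Wait for this file's transcription to finish before moving to the next
            await main(audio_file)
```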
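One caveat about the loop above: `os.listdir` returns every entry in the folder, in arbitrary order, so a stray file such as `.DS_Store` would also be sent to Deepgram with the `audio/mp3` MIME type. A small filtering helper, a hypothetical addition rather than part of the original script, could look like this:

```python
import os

# Hypothetical choice: only these extensions are treated as podcast audio
AUDIO_EXTENSIONS = (".mp3",)

def list_audio_files(folder: str = "speeches") -> list:
    """Return sorted paths of regular files in `folder` whose extension looks like audio."""
    paths = []
    # sorted() makes the processing order predictable; os.listdir order is arbitrary
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # Skip subdirectories and non-audio files such as notes.txt
        if os.path.isfile(path) and filename.lower().endswith(AUDIO_EXTENSIONS):
            paths.append(path)
    return paths
```

Inside `get_audio_files`, you could then loop over `list_audio_files()` instead of calling `os.listdir` directly.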
To make a function asynchronous in Python, we add `async` to the function definition. So instead of `def get_audio_files()`, which is synchronous, we use `async def get_audio_files()`.
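The difference `async` makes shows up when you call the function. In this sketch, `main` is a hypothetical stand-in for the tutorial’s `main`, used only to show that calling a coroutine function returns a coroutine object instead of running its body:

```python
import asyncio
import inspect

async def main(file: str) -> str:
    # Hypothetical stand-in for the tutorial's main(); it only echoes the file name
    await asyncio.sleep(0)
    return f"transcribed {file}"

# 'async def' makes main a coroutine function
print(inspect.iscoroutinefunction(main))  # True

# Calling it does not run the body yet; it just creates a coroutine object
coro = main("speeches/episode1.mp3")
print(inspect.iscoroutine(coro))  # True

# The body runs only when the coroutine is awaited or handed to the event loop
print(asyncio.run(coro))  # transcribed speeches/episode1.mp3
```

This is why `main(audio_file)` needs the `await` in front of it: without it, the transcription code would never run.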
Inside an `async` function, we use `await` when calling another coroutine function (one defined with `async def`), such as `main`; ordinary functions are called as usual. In the line `await main(audio_file)`, we call the `main` function and pass in the audio file. The `await` suspends `get_audio_files` until `main` finishes. While it is suspended, the event loop is free to run other tasks, but because this loop awaits each file before moving on to the next, the episodes are still transcribed one at a time.
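If you want the episodes to actually overlap, one option is to create all the coroutines up front and await them together with `asyncio.gather`. The sketch below is an illustration under assumptions, not the original script: `transcribe` is a hypothetical stand-in that sleeps instead of calling Deepgram, and the `asyncio.Semaphore` limit of 3 is an arbitrary cap so every file isn’t sent at once.

```python
import asyncio
import time

async def transcribe(path: str) -> str:
    # Hypothetical stand-in for the Deepgram call in main(); sleeps to mimic network latency
    await asyncio.sleep(0.2)
    return f"transcript of {path}"

async def transcribe_all(paths: list, limit: int = 3) -> list:
    # The semaphore caps how many "requests" are in flight at the same time
    semaphore = asyncio.Semaphore(limit)

    async def worker(path: str) -> str:
        async with semaphore:
            return await transcribe(path)

    # gather runs every worker concurrently and returns results in input order
    return await asyncio.gather(*(worker(p) for p in paths))

if __name__ == "__main__":
    files = [f"speeches/episode{i}.mp3" for i in range(1, 7)]
    start = time.perf_counter()
    print(asyncio.run(transcribe_all(files)))
    # Six 0.2 s calls, three at a time: roughly 0.4 s instead of 1.2 s sequentially
    print(f"took {time.perf_counter() - start:.2f}s")
```

To adapt this, you could move the body of `main` into `transcribe` and return the response. API providers commonly cap concurrent requests, so check your account’s limits before raising `limit`.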
```python
async def main(file):
    print(f"Speech Name: {file}")

    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(file, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())
```
Now to the speech-to-text Python transcription. We initialize Deepgram and pass in the API key.
Then we open the file in binary read mode with `with open(file, 'rb') as audio`. We create a Python dictionary called `source` that holds the open file object under the `buffer` key and the MIME type, `audio/mp3`, under the `mimetype` key.
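The script hardcodes `audio/mp3`, which is why the comment says to replace it as appropriate. If your `speeches` folder mixes formats, a small helper could pick the MIME type from the file extension. The mapping and the `build_source` name are hypothetical; only the dictionary shape comes from the `source` dict above.

```python
import os
from typing import BinaryIO

# Hypothetical extension-to-MIME-type mapping; adjust it for the formats you actually have
MIME_TYPES = {
    ".mp3": "audio/mp3",  # the value the tutorial uses
    ".wav": "audio/wav",
    ".flac": "audio/flac",
}

def build_source(audio: BinaryIO, path: str) -> dict:
    """Return a {'buffer': ..., 'mimetype': ...} dict shaped like the tutorial's `source`."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in MIME_TYPES:
        # Fail loudly instead of sending a file with a misleading MIME type
        raise ValueError(f"Unsupported audio file extension: {extension!r}")
    return {"buffer": audio, "mimetype": MIME_TYPES[extension]}
```

Inside `main`, the line that builds `source` would become `source = build_source(audio, file)`.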
Next, we do the actual prerecorded transcription on the line `response = await deepgram.transcription.prerecorded(source, {'punctuate': True})`. We pass in `source` and the `punctuate: True` option, which adds punctuation to the transcript.
Now we can print the response, which contains our transcript, as indented JSON: `print(json.dumps(response, indent=4))`.
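The full JSON response is long. If you only want the text, you can index into it. This sketch assumes the response layout Deepgram documents for prerecorded audio (`results` → `channels` → `alternatives` → `transcript`); verify it against an actual response from your SDK version. `extract_transcript` is a hypothetical helper, and the sample data is made up.

```python
def extract_transcript(response: dict) -> str:
    """Pull the transcript text out of a Deepgram-style prerecorded response.

    Assumes the results -> channels -> alternatives layout; returns "" if it is missing.
    """
    try:
        return response["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return ""

# Made-up sample trimmed to the fields this helper reads
sample = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Welcome back to the show.", "confidence": 0.98}]}
        ]
    }
}
print(extract_transcript(sample))  # Welcome back to the show.
```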
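Lastly, we run our program with `asyncio.run(get_audio_files())`, which starts an event loop, runs the `get_audio_files` coroutine to completion, and then closes the loop.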
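Two practical notes on `asyncio.run`: it returns whatever the coroutine returns, and wrapping the call in an `if __name__ == "__main__":` guard keeps the transcription from starting if you import the file from another module. A minimal sketch with a hypothetical stand-in coroutine:

```python
import asyncio

async def get_audio_files() -> list:
    # Hypothetical stand-in for the tutorial's coroutine; pretends one file was transcribed
    await asyncio.sleep(0)
    return ["speeches/episode1.mp3"]

def run() -> list:
    # asyncio.run creates a fresh event loop, runs the coroutine to completion,
    # closes the loop, and hands back the coroutine's return value
    return asyncio.run(get_audio_files())

if __name__ == "__main__":
    # The guard keeps this from running when the file is imported as a module
    print(run())
```

`asyncio.run` raises a `RuntimeError` if an event loop is already running, as it is in Jupyter notebooks; there you can `await get_audio_files()` directly in a cell instead.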
Conclusion
Hopefully, you now have a better understanding of transcribing audio with speech-to-text and looping through a podcast episode list using Async IO with Python. If you have any questions or need some help, please feel free to reach out to us on our GitHub Discussions page.
Full Python Code Sample of Looping Through a Podcast Episode List
```python
from deepgram import Deepgram
import asyncio
import json
import os

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

async def get_audio_files():
    path_of_the_speeches = 'speeches'
    for filename in os.listdir(path_of_the_speeches):
        audio_file = os.path.join(path_of_the_speeches, filename)
        # Skip subdirectories; only regular files are sent for transcription
        if os.path.isfile(audio_file):
            await main(audio_file)

async def main(file):
    print(f"Speech Name: {file}")

    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(file, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())
```