By observing the current market trend, anybody can see that AI is both the present and the future. Hence, you need to learn and understand AI as soon as possible to secure your future. Most people who want to enter this field try to learn AI by jumping directly to buzzwords like Deep Learning, Computer Vision, Natural Language Processing, and, especially at present, Generative AI. By following this path, however, they never answer some of the most important questions, which are as follows:
- How do the algorithms they are trying to learn actually work, and what are their use cases?
- What happens after the model is created?
- How do you retrain the model if it becomes obsolete or data drift is encountered?
- How do you deploy the model?
- etc…
To be irreplaceable, you need strong fundamentals and a solid understanding of the concepts. This blog walks you through the concrete steps of building an AI project, which will take you closer to being a real ML Engineer. Since a single technology cannot create even a basic product, this project also combines multiple technologies to create a proper product.
This blog covers batch transcription, i.e., generating text from a recorded audio file; my next blog in this series will cover generating text from audio in real time.
Let's begin!
Objective/Goal of the Blog
To create a container image that acts as an API, built with Flask in Python, to transcribe (convert to text) an audio file recorded in English.
To achieve this goal, a few prerequisites need attention; they are listed below.
Pre-requisites
- Experience in any container engine or container management tool like Docker, Kubernetes, Podman, OpenShift, etc.
- Experience in Python.
- Experience in Flask.
- Experience in AWS.
- Basic understanding of Machine Learning and Natural Language Processing.
Implementation
Implementing our goal involves multiple steps. To fulfill them, let's first understand the architecture of the system that we are going to develop. Below is the diagram of the system architecture, which contains the 6 steps needed to fulfill our goal.
6 Steps for our goal!
The 6 steps required to meet our goal must be completed in sequence. Each step is explained below with its code (the compiled code for the complete project is at the end of this blog).
Note: Proper exception handling is done in the complete code, and the link to the Docker image for this functionality is also mentioned at the end of this blog.
Note 2: Each step below shows only the code corresponding to it; the complete code (driver code) is given after the explanation of all 6 steps.
Step 1: Audio File sent to the API
In this step, the user sends the audio file that they want transcribed to the API. (It doesn't matter where the API is hosted; the only requirement is that the user can connect to it.)
The code for this step is:
# Importing the required libraries
from flask import Flask, request, jsonify
from utilitties import (
    upload_audio_file_to_s3,
    transcribe_audio_file,
    download_transcript_from_s3,
)
from datetime import datetime
# Defining the app with its name
app = Flask(__name__)
# Error handler for Internal Server Errors (HTTP 500)
@app.errorhandler(500)
def error_500(error):
    return "There is some problem with the application!", 500
# Default route that transcribes the uploaded audio file
@app.route("/", methods=["POST"])
def transcribe():
    try:
        # Step 1: Taking the Audio File from the User through our API
        audio_file = request.files.get("audio_file")
        if audio_file is None:
            return jsonify({"message": "Blank Audio File Uploaded", "status": 400}), 400
        # ... Steps 2 to 6 continue inside this try block (see the complete driver code)
The above code uses a few custom-made functions. They are included in the complete code in the GitHub repository (link at the end of the blog), and they are also shown in the code blocks of their corresponding steps below.
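The API expects the file in a multipart form field named `audio_file`. As a client-side illustration of Step 1, here is a minimal sketch using only the Python standard library; `build_multipart_body` and `send_audio` are hypothetical helper names, and the API URL and file path are assumptions you would replace with your own. With the third-party `requests` library, the same call is simply `requests.post(api_url, files={"audio_file": open(path, "rb")})`.

```python
import os
import urllib.request
import uuid


def build_multipart_body(field_name, filename, data, content_type="application/octet-stream"):
    """Encode a single file as a multipart/form-data request body."""
    boundary = uuid.uuid4().hex  # random boundary that will not appear in the data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def send_audio(api_url, path):
    """POST the audio file at `path` to the API under the `audio_file` field."""
    with open(path, "rb") as f:
        data = f.read()
    body, content_type = build_multipart_body("audio_file", os.path.basename(path), data)
    request = urllib.request.Request(
        api_url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `send_audio("http://localhost:80/", "sample.mp3")` would return the JSON body produced by the API. Note that `urlopen` raises `urllib.error.HTTPError` for the 400 responses the API sends on failure; the error object's `.read()` still gives you the JSON message.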
Step 2: Audio File uploaded to S3
The API uploads the audio file sent by the user to S3, because for batch transcriptions, AWS Transcribe reads input files only from AWS S3.
The code for this step is:
import boto3
import os
# Defining the AWS S3 Client to perform operations with S3!
s3 = boto3.client(
    service_name="s3",
    region_name=os.getenv("S3_REGION"),  # Remember to provide the AWS S3 Region in the environment variable!
    aws_access_key_id=os.getenv("aws_access_key"),  # Remember to provide the AWS Access Key in the environment variable!
    aws_secret_access_key=os.getenv("aws_secret_key"),  # Remember to provide the AWS Secret Key in the environment variable!
)
# Function to upload the audio file sent by the user to S3
def upload_audio_file_to_s3(audio_content):
    try:
        s3.put_object(
            Body=audio_content.read(),
            Bucket="harshitdawar-audio-files",  # You can replace the name of the bucket with the one you are going to use!
            Key=audio_content.filename,
        )
        return "File Uploaded Successfully", 200
    except Exception as e:
        print("Exception in Uploading the Audio File to S3: " + str(e))
        return "Exception in Uploading the Audio File to S3", 400
The above code defines a function that uploads a file to S3; this function is called in the main driver function (shown after the explanation of all 6 steps).
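The upload function accepts any file, but the next step passes `MediaFormat="mp3"` for every job, so a WAV or FLAC upload would reach S3 and then likely fail in Transcribe. A minimal sketch, assuming you want to reject unsupported files before uploading them, is to derive the format from the file extension. `media_format_for` is a hypothetical helper, and the set below reflects the `MediaFormat` values listed in the AWS API reference at the time of writing; check the current reference before relying on it.

```python
import os

# MediaFormat values accepted by Amazon Transcribe's StartTranscriptionJob (check the current API reference)
SUPPORTED_MEDIA_FORMATS = {"amr", "flac", "m4a", "mp3", "mp4", "ogg", "wav", "webm"}


def media_format_for(filename):
    """Return the Transcribe MediaFormat for `filename`, or None if it is unsupported."""
    # splitext keeps only the last extension: "talk.final.WAV" -> ".WAV"
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    return extension if extension in SUPPORTED_MEDIA_FORMATS else None
```

In the route, you could call `media_format_for(audio_file.filename)`, return a 400 response when it gives `None`, and otherwise pass the result as `MediaFormat` instead of the hardcoded `"mp3"`.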
Step 3: Transcribing Audio File using AWS Transcribe
AWS Transcribe picks the file from S3 and generates the text based on the selected settings. In the present case, only English is supported. In other blogs of this series, I will cover how to transcribe audio files in languages other than English, for one speaker as well as for multiple speakers, and how to transcribe an audio file that contains multiple languages.
The code for steps 3 & 4 is:
import boto3
import os
import time
# Defining the AWS Transcribe client to interact with this service!
transcribe_client = boto3.client(
    service_name="transcribe",
    region_name=os.getenv("S3_REGION"),  # Remember to provide the AWS S3 Region in the environment variable!
    aws_access_key_id=os.getenv("aws_access_key"),  # Remember to provide the AWS Access Key in the environment variable!
    aws_secret_access_key=os.getenv("aws_secret_key"),  # Remember to provide the AWS Secret Key in the environment variable!
)
def transcribe_audio_file(audio_filename, job_name):
    try:
        transcribe_client.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={"MediaFileUri": f"s3://harshitdawar-audio-files/{audio_filename}"},  # You can replace the bucket name with the one you are going to use!
            MediaFormat="mp3",  # This example handles MP3 files only; change it to match your input format!
            LanguageCode="en-US",
            OutputBucketName="harshitdawar-audio-transcriptions",  # The transcription of the audio file will be saved in this bucket; replace it with the one you are going to use!
            OutputKey=audio_filename.split(".")[0] + ".txt",  # Name of the transcription file; its content is JSON even though I explicitly gave it a .txt extension!
        )
        # This loop checks up to 100 times, at an interval of 10 seconds, whether the transcription job has finished; change the number of retries based on your requirement!
        max_tries = 100
        while max_tries > 0:
            max_tries -= 1
            job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
            job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
            if job_status in ["COMPLETED", "FAILED"]:
                print(f"Job {job_name} is {job_status}.")
                if job_status == "COMPLETED":
                    print(
                        f"Download the transcript from\n"
                        f"\t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}"
                    )
                    return "Transcription Successful", 200
                else:
                    print("Audio Transcription Job Failed!")
                    return "Transcription Failed", 400
            else:
                print(f"Waiting for {job_name}. Current status is {job_status}.")
                time.sleep(10)  # Interval of 10 seconds
        # All retries used up without a final status: report a timeout instead of implicitly returning None
        print(f"Job {job_name} did not finish in time.")
        return "Transcription Timed Out", 400
    except Exception as e:
        print("Exception in Transcribing the Audio File: " + str(e))
        return "Exception in Transcribing the Audio File", 400
The above code defines a function that transcribes the audio file, waits for the job to finish, and lets AWS Transcribe save the transcription to the specified S3 bucket.
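The polling loop inside `transcribe_audio_file` is the part most worth testing, because it decides how long a request can block. A minimal sketch of the same idea factored into a small helper follows; `wait_for_job` is a hypothetical name, and passing `get_status` and `sleep` as arguments is a design choice (an assumption, not the author's code) that lets the loop be exercised without calling AWS or actually sleeping.

```python
import time


def wait_for_job(get_status, max_tries=100, interval_seconds=10, sleep=time.sleep):
    """Poll get_status() until the job reaches a final state or the retries run out.

    Returns "COMPLETED", "FAILED", or "TIMED_OUT".
    """
    for _ in range(max_tries):
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        sleep(interval_seconds)  # wait before asking again
    return "TIMED_OUT"
```

With boto3, `get_status` could be `lambda: transcribe_client.get_transcription_job(TranscriptionJobName=job_name)["TranscriptionJob"]["TranscriptionJobStatus"]`. Returning an explicit `"TIMED_OUT"` keeps the caller from unpacking `None` when all retries are used up.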
Step 4: Saving Transcription to S3
AWS Transcribe saves the transcription of the audio file to the S3 bucket specified in the configuration (`OutputBucketName` and `OutputKey` in the code above).
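The transcript lands at whatever `OutputKey` was set to in Step 3, so Step 5 must request exactly the same key. A minimal sketch, assuming you keep the `.txt` naming from Step 3, is one shared helper that both steps call; `transcript_key_for` is a hypothetical name. It uses `os.path.splitext` rather than `split(".")[0]`, so a name like `my.talk.mp3` maps to `my.talk.txt` instead of `my.txt`.

```python
import os


def transcript_key_for(audio_filename):
    """Derive the S3 key of the transcript from the uploaded audio file's name."""
    stem, _extension = os.path.splitext(audio_filename)  # drop only the last extension
    return stem + ".txt"  # content is still Transcribe's JSON output
```

Passing `OutputKey=transcript_key_for(audio_filename)` in Step 3 and `filename=transcript_key_for(audio_file.filename)` in Step 5 keeps the two steps from drifting apart.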
Step 5: Downloading the Transcript from S3
The API downloads the transcript from S3 to local storage.
The code for this step is:
import boto3
import json
import os
# Defining the AWS S3 Client to perform operations with S3!
s3 = boto3.client(
    service_name="s3",
    region_name=os.getenv("S3_REGION"),  # Remember to provide the AWS S3 Region in the environment variable!
    aws_access_key_id=os.getenv("aws_access_key"),  # Remember to provide the AWS Access Key in the environment variable!
    aws_secret_access_key=os.getenv("aws_secret_key"),  # Remember to provide the AWS Secret Key in the environment variable!
)
# Function to download the transcript from S3 to local storage and return its text
def download_transcript_from_s3(filename):
    try:
        s3.download_file(
            "harshitdawar-audio-transcriptions",  # You can replace the name of the bucket with the one you are going to use!
            filename,  # This parameter corresponds to the file path in the S3 bucket (excluding the bucket name)!
            "./" + filename,  # This parameter corresponds to the file path in local storage!
        )
        with open("./" + filename, "r") as f:
            content = json.loads(f.read())
        return content["results"]["transcripts"][0]["transcript"], 200
    except Exception as e:
        print("Exception in Downloading the Transcript from S3: " + str(e))
        return "Exception in Downloading the Transcript from S3", 400
The above code defines a function that downloads the transcription of the audio file from the S3 bucket to local storage and returns the transcript text.
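The transcript file is JSON, and the text lives under `results` → `transcripts` → first entry → `transcript`. A minimal sketch of that parsing step on its own follows; `extract_transcript` is a hypothetical helper, and the sample document in the usage note is trimmed to the fields this code reads (real output also contains word-level `items` with timings). As a design alternative, you could skip the local file entirely with `s3.get_object(Bucket="harshitdawar-audio-transcriptions", Key=filename)["Body"].read()` and pass the bytes straight in, since `json.loads` accepts bytes.

```python
import json


def extract_transcript(raw_json):
    """Return the full transcript text from an Amazon Transcribe output document."""
    document = json.loads(raw_json)  # accepts str or bytes
    return document["results"]["transcripts"][0]["transcript"]
```

For example, `extract_transcript('{"results": {"transcripts": [{"transcript": "Hello world."}]}}')` returns `"Hello world."`.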
Step 6: Returning the response to the User
The API returns the transcription of the audio file (the text generated for it) to the user.
The code for this step is:
# Step 2: Uploading the Audio File to S3
message, status = upload_audio_file_to_s3(audio_content=audio_file)
if status == 200:
    # Step 3 & 4: Starting Audio File Transcription and saving that into the S3 Bucket
    transcription_message, transcription_status = transcribe_audio_file(
        audio_filename=audio_file.filename,
        job_name=datetime.now().strftime("%d-%m-%Y_%H-%M-%S"),
    )
    if transcription_status == 200:
        # Step 5: Downloading the Transcription from S3 (same key as the OutputKey set in Step 3)
        download_message, download_status = download_transcript_from_s3(
            filename=audio_file.filename.split(".")[0] + ".txt"
        )
        if download_status == 200:
            # Step 6: Returning the Transcribed content to the user
            return (
                jsonify({"message": download_message, "status": 200}),
                200,
            )
The above snippet runs the steps in sequence and returns the transcription to the user once every step has succeeded.
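Every helper above returns a `(message, status)` pair, which is why the driver needs one nested `if` per step. A minimal sketch of an alternative, assuming you prefer a flat structure, runs the steps in order and stops at the first failure; `run_steps` is a hypothetical helper, not part of the original project.

```python
def run_steps(steps):
    """Run (message, status)-returning steps in order and stop at the first failure.

    Each step is a zero-argument callable. Returns the first failing result,
    or the last step's result when every step succeeds.
    """
    message, status = "No steps to run", 400
    for step in steps:
        message, status = step()
        if status != 200:
            break  # later steps depend on this one, so skip them
    return message, status
```

In the route, the steps would be wrapped in lambdas, e.g. `run_steps([lambda: upload_audio_file_to_s3(audio_content=audio_file), lambda: transcribe_audio_file(...), lambda: download_transcript_from_s3(...)])`, and the result returned as `jsonify({"message": message, "status": status}), status`. The trade-off is that the per-step comments from the nested version move into the list.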
Complete Driver Code!
This section contains the complete driver code (the main API file) that provides this complete functionality. All the functions used here are shown above in their respective steps.
# Importing the required libraries
from flask import Flask, request, jsonify
from utilitties import (
    upload_audio_file_to_s3,
    transcribe_audio_file,
    download_transcript_from_s3,
)
from datetime import datetime
# Defining the app with its name
app = Flask(__name__)
# Error handler for Internal Server Errors (HTTP 500)
@app.errorhandler(500)
def error_500(error):
    return "There is some problem with the application!", 500
# Default route that transcribes the uploaded audio file
@app.route("/", methods=["POST"])
def transcribe():
    try:
        # Step 1: Taking the Audio File from the User through our API
        audio_file = request.files.get("audio_file")
        if audio_file is None:
            return jsonify({"message": "Blank Audio File Uploaded", "status": 400}), 400
        else:
            # Step 2: Uploading the Audio File to S3
            message, status = upload_audio_file_to_s3(audio_content=audio_file)
            if status == 200:
                # Step 3 & 4: Starting Audio File Transcription and saving that into the S3 Bucket
                transcription_message, transcription_status = transcribe_audio_file(
                    audio_filename=audio_file.filename,
                    job_name=datetime.now().strftime("%d-%m-%Y_%H-%M-%S"),
                )
                if transcription_status == 200:
                    # Step 5: Downloading the Transcription from S3 (same key as the OutputKey set in Step 3)
                    download_message, download_status = download_transcript_from_s3(
                        filename=audio_file.filename.split(".")[0] + ".txt"
                    )
                    if download_status == 200:
                        # Step 6: Returning the Transcribed content to the user
                        return (
                            jsonify({"message": download_message, "status": 200}),
                            200,
                        )
                    else:
                        return (
                            jsonify(
                                {
                                    "message": download_message,
                                    "status": download_status,
                                }
                            ),
                            400,
                        )
                else:
                    return (
                        jsonify(
                            {
                                "message": transcription_message,
                                "status": transcription_status,
                            }
                        ),
                        400,
                    )
            else:
                return jsonify({"message": message, "status": status}), 400
    except Exception as e:
        print(str(e))
        # Always return a response: a Flask view that returns None raises an error
        return jsonify({"message": "Unexpected error while transcribing the audio file", "status": 500}), 500
# Running the app on port 80 and listening on all interfaces (0.0.0.0) so it is reachable from outside the container
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
Sample Output of the API Call!
The screenshot below shows the output received from the API when a sample audio file recorded in English is sent.
GitHub Link for the code!
Docker Hub Link for this Image!
Make sure to read the description of the image before using it, as it requires you to set the values of some environment variables.
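The code in this blog reads three environment variables: `S3_REGION`, `aws_access_key`, and `aws_secret_key`; the image description may list others. If any of them is missing, `os.getenv` silently returns `None` and the failure only shows up on the first request. A minimal sketch of a startup check follows; `missing_env_vars` is a hypothetical helper, and taking `environ` as a parameter is an assumption made so it can be tested without touching the real environment.

```python
import os

# Environment variables read by the code in this blog
REQUIRED_ENV_VARS = ("S3_REGION", "aws_access_key", "aws_secret_key")


def missing_env_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]
```

Calling it before `app.run(...)` and raising `SystemExit` when the list is non-empty makes a misconfigured container fail immediately. When starting the container, the variables are passed with `-e`, e.g. `docker run -p 80:80 -e S3_REGION=... -e aws_access_key=... -e aws_secret_key=... <image-name>`, where `<image-name>` is the image from the Docker Hub link.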
I hope my article explains everything related to the topic, with all the concepts explained in detail. Thank you so much for investing your time in reading my blog and boosting your knowledge. If you like my work, please follow me on Medium, GitHub, dev.to, & LinkedIn for more content on multiple technologies and their integration!
Do let me know your views/questions in the comments!