Chirag Lunagariya for BoTreeTechnologies

Posted on Nov 19, 2019

Voice To Text Using AWS Transcribe With Python

#aws #python #speechtotext

Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python. It lets Python developers write software that uses AWS services such as S3 and EC2.

Amazon Transcribe is a fully managed, continuously trained automatic speech recognition (ASR) service that generates transcripts for audio files. It makes it easy for developers to add speech-to-text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

The IAM user or role whose credentials you use needs permission for the Amazon Transcribe actions shown in the following policy:

	{
	    "Version": "2012-10-17",
	    "Statement": [
	        {
	            "Action": [
	                "transcribe:*"
	            ],
	            "Resource": "*",
	            "Effect": "Allow"
	        }
	    ]
	}
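
The wildcard `transcribe:*` allows every Transcribe action. As a rough sketch, assuming the script only starts jobs and reads their status, the policy could be narrowed to the two operations this walkthrough calls. The helper name `build_transcribe_policy` is hypothetical and not part of any AWS library; the action names follow IAM's `service:OperationName` convention.

```python
import json

def build_transcribe_policy(actions=("transcribe:StartTranscriptionJob",
                                     "transcribe:GetTranscriptionJob")):
    """Return an IAM policy document that allows only the given actions.

    Hypothetical helper for illustration; the default actions cover the
    two API calls used in this article.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }

# Serialize the policy so it can be pasted into the IAM console or a file.
policy_json = json.dumps(build_transcribe_policy(), indent=2)
print(policy_json)
```

The same identity will likely also need read access to the S3 object that holds the audio file; that part is left out of this sketch.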


1. Initialize Client:

To run a transcription job, first initialize a client using boto3 with your AWS credentials.

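	import boto3
	import time
	import urllib.request  # Python 3 location of urlopen
	import json

	# Placeholders only; avoid committing real keys to source control.
	AWS_ACCESS_KEY_ID = 'your_aws_access_key_id'
	AWS_SECRET_ACCESS_KEY = 'your_aws_secret_access_key'

	# Job names may contain only letters, digits, '.', '_' and '-' (no spaces).
	job_name = 'job_name'
	job_uri = 'https://s3.amazonaws.com/bucket_name/file_name.mp3'

	# Lowercase variable name, matching the calls in the following steps.
	transcribe = boto3.client(
	    'transcribe',
	    aws_access_key_id=AWS_ACCESS_KEY_ID,
	    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
	    region_name='us-east-1',
	)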
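
Hard-coded keys are easy to leak. When no keys are passed, boto3 looks for credentials on its own (environment variables, the shared credentials file, or an attached IAM role). The following is a minimal sketch of that approach, assuming the region comes from the `AWS_DEFAULT_REGION` environment variable with `us-east-1` as a fallback; the helper names `transcribe_client_kwargs` and `make_transcribe_client` are hypothetical.

```python
import os

def transcribe_client_kwargs(env=None, default_region="us-east-1"):
    """Build keyword arguments for boto3.client without embedding secrets.

    Only the service and region are set; credentials are left to boto3's
    default lookup.
    """
    env = os.environ if env is None else env
    return {
        "service_name": "transcribe",
        "region_name": env.get("AWS_DEFAULT_REGION", default_region),
    }

def make_transcribe_client(env=None):
    """Create the Transcribe client from the arguments built above."""
    import boto3  # imported lazily so the helper above works without boto3
    return boto3.client(**transcribe_client_kwargs(env))

# Example: inspect the arguments for an environment with no region set.
print(transcribe_client_kwargs({}))
```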


2. Run Transcribe Job:

start_transcription_job(**kwargs) starts an asynchronous job to transcribe speech to text.
The parameters used to run the transcription job here are TranscriptionJobName, Media, MediaFormat ('mp3'|'mp4'|'wav'|'flac'), and LanguageCode ('en-US'|'es-US'|'en-AU'|'fr-CA'|'en-GB'|'de-DE'|'pt-BR'|'fr-FR'|'it-IT').

	transcribe.start_transcription_job(TranscriptionJobName=job_name, Media={'MediaFileUri': job_uri}, MediaFormat='mp3', LanguageCode='en-US')
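
A malformed job name or an unexpected file type only surfaces as an error once the request reaches AWS. One possible way to catch these earlier is to build the keyword arguments in a small function first. This is a sketch, not part of boto3: `build_job_request` is a hypothetical helper, the format set is just the list given in this article, and the name rule (letters, digits, `.`, `_`, `-`, up to 200 characters) is applied locally as an assumption about what the service accepts.

```python
import os
import re
from urllib.parse import urlparse

# Formats listed in this article; the service may accept others.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac"}
JOB_NAME_PATTERN = re.compile(r"[0-9a-zA-Z._-]{1,200}")

def build_job_request(job_name, media_uri, language_code="en-US"):
    """Return keyword arguments for start_transcription_job.

    The media format is inferred from the file extension in the URI.
    """
    if not JOB_NAME_PATTERN.fullmatch(job_name):
        raise ValueError("invalid job name: %r" % job_name)
    path = urlparse(media_uri).path
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError("unsupported media format: %r" % extension)
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": extension,
        "LanguageCode": language_code,
    }

request = build_job_request(
    "my-transcription-job",
    "https://s3.amazonaws.com/bucket_name/file_name.mp3",
)
print(request)
# With a client in hand: transcribe.start_transcription_job(**request)
```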


3. Check Job Status:

Because the transcription job runs asynchronously, we need to poll for its status. How long the job takes depends on the length and complexity of your recordings.

	while True:
	    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
	    # Stop polling once the job reaches a terminal state.
	    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
	        break
	    print("Not ready yet...")
	    time.sleep(2)
	print(status)
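
A bare `while True` loop never gives up if a job stalls. The sketch below shows one way to add a timeout to the same polling idea. The function `wait_for_job` and its parameters are hypothetical; the status lookup is passed in as a callable so the loop can be exercised without calling AWS.

```python
import time

def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the job is COMPLETED or FAILED.

    get_status must return a response shaped like the one from
    get_transcription_job. Raises TimeoutError if the job is still
    running once timeout_seconds have elapsed.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = get_status()
        state = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if state in ("COMPLETED", "FAILED"):
            return response
        if clock() >= deadline:
            raise TimeoutError("job still %s after %s seconds" % (state, timeout_seconds))
        sleep(poll_seconds)

# Example with canned responses standing in for the real API call.
canned = iter([
    {"TranscriptionJob": {"TranscriptionJobStatus": "IN_PROGRESS"}},
    {"TranscriptionJob": {"TranscriptionJobStatus": "COMPLETED"}},
])
final = wait_for_job(lambda: next(canned), poll_seconds=0)
print(final["TranscriptionJob"]["TranscriptionJobStatus"])
```

With a real client, the callable could be `lambda: transcribe.get_transcription_job(TranscriptionJobName=job_name)`.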


When the transcription job status is completed, the result links to an Amazon S3 presigned URL that contains the transcription in JSON format:

	{
	    "jobName":"job ID",
	    "accountId":"account ID",
	    "results": {
	        "transcripts":[
	            {
	                "transcript":" that's no answer",
	                "confidence":1.0
	            }
	        ],
	        "items":[
	            {
	                "start_time":"0.180",
	                "end_time":"0.470",
	                "alternatives":[
	                    {
	                        "confidence":0.84,
	                        "word":"that's"
	                    }
	                ]
	            },
	            {
	                "start_time":"0.470",
	                "end_time":"0.710",
	                "alternatives":[
	                    {
	                        "confidence":0.99,
	                        "word":"no"
	                    }
	                ]
	            },
	            {
	                "start_time":"0.710",
	                "end_time":"1.080",
	                "alternatives":[
	                    {
	                        "confidence":0.87,
	                        "word":"answer"
	                    }
	                ]
	            }
	        ]
	    },
	    "status":"COMPLETED"
	}
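
Beyond the full transcript, the `items` list carries per-word timing and confidence. As a rough illustration, assuming the result has the shape shown above (the real service output may name fields differently or include items without timestamps, which this sketch skips), the words can be pulled out like this. The function names are hypothetical.

```python
def word_timings(result):
    """Return (word, start, end, confidence) tuples from a result dict.

    Uses the first alternative of each item and skips items that lack
    timestamps.
    """
    rows = []
    for item in result["results"]["items"]:
        if "start_time" not in item or "end_time" not in item:
            continue
        best = item["alternatives"][0]
        # 'word' matches the sample above; 'content' is an assumed fallback.
        word = best.get("word", best.get("content"))
        rows.append((word, float(item["start_time"]),
                     float(item["end_time"]), float(best["confidence"])))
    return rows

def low_confidence_words(result, threshold=0.9):
    """Return the words whose confidence is below the threshold."""
    return [word for word, _, _, conf in word_timings(result) if conf < threshold]

# The sample result from this article, abbreviated to the fields used here.
sample = {
    "results": {
        "transcripts": [{"transcript": " that's no answer", "confidence": 1.0}],
        "items": [
            {"start_time": "0.180", "end_time": "0.470",
             "alternatives": [{"confidence": 0.84, "word": "that's"}]},
            {"start_time": "0.470", "end_time": "0.710",
             "alternatives": [{"confidence": 0.99, "word": "no"}]},
            {"start_time": "0.710", "end_time": "1.080",
             "alternatives": [{"confidence": 0.87, "word": "answer"}]},
        ],
    },
    "status": "COMPLETED",
}

print(low_confidence_words(sample))
```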


4. Retrieve Text:

When the transcription completes, the job result contains the transcription in JSON format. Load the JSON response with Python's json library and extract the text from the result.

	import urllib.request

	if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
	    # Download the transcript JSON from the presigned URL (Python 3).
	    response = urllib.request.urlopen(status['TranscriptionJob']['Transcript']['TranscriptFileUri'])
	    data = json.loads(response.read())
	    text = data['results']['transcripts'][0]['transcript']
	    print(text)
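
The snippet above does nothing when the job has failed. A slightly more defensive version might look like the following sketch, which raises an error for a failed job and reads the `FailureReason` field from the job description when it is present. The function names `fetch_json` and `transcript_text` are hypothetical, and the download step is injectable so the logic can be tried without network access.

```python
import json
import urllib.request

def fetch_json(url):
    """Download a URL and parse the body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def transcript_text(status, fetch=fetch_json):
    """Return the transcript text for a finished job description.

    Raises RuntimeError if the job failed or has not finished yet.
    """
    job = status["TranscriptionJob"]
    state = job["TranscriptionJobStatus"]
    if state == "FAILED":
        reason = job.get("FailureReason", "unknown reason")
        raise RuntimeError("transcription failed: " + reason)
    if state != "COMPLETED":
        raise RuntimeError("job is not finished (status %s)" % state)
    data = fetch(job["Transcript"]["TranscriptFileUri"])
    # Join all transcripts and trim the leading space seen in the sample.
    return " ".join(t["transcript"].strip() for t in data["results"]["transcripts"])

# Example with a stand-in for the download step.
fake_status = {
    "TranscriptionJob": {
        "TranscriptionJobStatus": "COMPLETED",
        "Transcript": {"TranscriptFileUri": "https://example.invalid/result.json"},
    }
}
fake_fetch = lambda url: {"results": {"transcripts": [{"transcript": " that's no answer"}]}}
print(transcript_text(fake_status, fetch=fake_fetch))
```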



Top comments (1)

Sobin Flex • Jul 17 '23

Text to voice download refers to the process of obtaining software or applications that can convert written text into spoken words. These downloads enable users to convert various forms of written content, such as articles, documents, or emails, into audio files. By utilizing text-to-voice download tools, individuals can listen to text-based information instead of reading it, which can be beneficial for those with visual impairments or those who prefer auditory learning. Examples of popular text-to-voice download options include applications like NaturalReader, Balabolka, and Google Text-to-Speech.
