Using Amazon Transcribe for audio transcriptions
Last year I worked on a small project whose goal was to use technology to help users who were doing this job by hand: listening to audio all day long, which in some cases caused them health issues.
For example, transcribing a 4-hour meeting recording can take a single person about 3 full days.
After some research, I found a few providers that make this task a lot easier for us developers:
Google Cloud Speech-to-Text, IBM Watson Speech to Text and Amazon Transcribe. We chose the last one because we already had an AWS account for other services, because of the price, and because the AWS SDK and its documentation are really good.
- First of all, you will need an AWS account, or an IAM (Identity and Access Management) user with access to Transcribe and S3
- AWS credentials: an ACCESS KEY ID and a SECRET ACCESS KEY
- One or two S3 buckets. I use one bucket for the audio files and another one for the results of the transcription jobs.
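If you prefer not to hard-code the keys in your scripts, you can keep them in environment variables. The sketch below assumes the conventional names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and uses a hypothetical helper, load_aws_keys, that fails early when either value is missing. It accepts any Hash-like object, so you can pass ENV or a plain Hash.

```ruby
# Hypothetical helper: read the two AWS keys from the environment
# (or from any Hash-like object, which makes it easy to test).
def load_aws_keys(env = ENV)
  access_key = env['AWS_ACCESS_KEY_ID'].to_s.strip
  secret_key = env['AWS_SECRET_ACCESS_KEY'].to_s.strip

  # Collect every missing variable so the error message lists all of them
  missing = []
  missing << 'AWS_ACCESS_KEY_ID' if access_key.empty?
  missing << 'AWS_SECRET_ACCESS_KEY' if secret_key.empty?
  raise ArgumentError, "missing #{missing.join(', ')}" unless missing.empty?

  { access_key_id: access_key, secret_access_key: secret_key }
end
```

You could then build the credentials with Aws::Credentials.new(keys[:access_key_id], keys[:secret_access_key]). If you leave the credentials out of Aws.config completely, the SDK also looks for these same environment variables on its own.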
gem install aws-sdk-s3
gem install aws-sdk-transcribeservice
Sending your audio file to Amazon Transcribe
require 'aws-sdk-transcribeservice'
# Read the keys from the environment instead of hard-coding them
ACCESS_KEY = ENV.fetch('AWS_ACCESS_KEY_ID')
SECRET_ACCESS_KEY = ENV.fetch('AWS_SECRET_ACCESS_KEY')
# Use a single region: the job must run in the same region as the S3 bucket holding the audio
REGION = 'us-east-2'
# Aws connection
Aws.config.update({
  region: REGION,
  credentials: Aws::Credentials.new(ACCESS_KEY, SECRET_ACCESS_KEY)
})
# Client connection (inherits the region and credentials configured above)
client = Aws::TranscribeService::Client.new
# The S3 location of the audio that you want to transcribe
s3_audio_file = 's3://your-s3-bucket-name/uploads/video/audio_original/7/audio-16000.mp3'
resp = client.start_transcription_job({
  transcription_job_name: "NameOfTheJob", # required, must be unique in your account and region
  language_code: "es-ES", # required; e.g. en-US, es-US, es-ES, fr-FR, de-DE, pt-BR (see the Amazon Transcribe docs for the full list)
  media_sample_rate_hertz: 16000, # must match the sample rate of the audio file
  media_format: "mp3", # e.g. mp3, mp4, wav, flac
  media: { # required
    media_file_uri: s3_audio_file,
  },
  output_bucket_name: "the-bucket-transcription-result"
})
# The job runs asynchronously; this prints its current status
puts resp.transcription_job.transcription_job_status
The code above creates a transcription job for the file you want to transcribe. When I wrote this, an audio file could be at most 2 hours long, and media_sample_rate_hertz has to match the sample rate of the file (16,000 Hz, i.e. 16 kHz, in my case).
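The job name above is hard-coded as "NameOfTheJob", but Amazon Transcribe will not start a job whose name is already in use in the same account and region, so running the script a second time fails. One possible workaround is to derive the name from the audio file's S3 key plus a timestamp. unique_job_name is a hypothetical helper, and the allowed characters (letters, digits, '.', '_', '-') and the 200-character limit are my reading of the API reference, so double-check them against the current docs.

```ruby
# Hypothetical helper: build a job name from an S3 key plus a UTC timestamp,
# so re-running the script does not reuse an existing job name.
# Assumption: names may only use letters, digits, '.', '_' and '-',
# and are limited to 200 characters.
def unique_job_name(s3_key, now: Time.now.utc)
  stamp = now.strftime('%Y%m%d-%H%M%S')
  # "uploads/.../audio-16000.mp3" -> "audio-16000"
  base = File.basename(s3_key, File.extname(s3_key))
  # Replace every character outside the allowed set with '-'
  safe = base.gsub(/[^0-9A-Za-z._-]/, '-')
  # Trim the base (not the timestamp) so the whole name fits in 200 characters
  "#{safe[0, 200 - stamp.length - 1]}-#{stamp}"
end
```

For example, unique_job_name('uploads/video/audio_original/7/audio-16000.mp3') returns a name like audio-16000-20200517-103000. Two runs within the same second would still collide; if that can happen, append a random suffix such as SecureRandom.hex(4).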
For a 2-hour audio file, Amazon Transcribe did the job in less than 10 minutes.
Pretty awesome, right?
Behind the scenes, Amazon uses machine learning to power the transcription service.
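start_transcription_job returns right away while the job keeps running in the background, so your script has to check on it until it finishes. Here is a minimal polling sketch: get_transcription_job and the QUEUED, IN_PROGRESS, COMPLETED and FAILED statuses come from the Transcribe API, while the helper name wait_for_transcription, the 30-second interval and the 40-attempt limit (about 20 minutes) are my own assumptions.

```ruby
# Hypothetical helper: poll a Transcribe job until it completes or fails.
# `client` is expected to respond to get_transcription_job the way
# Aws::TranscribeService::Client does.
def wait_for_transcription(client, job_name, interval: 30, max_attempts: 40)
  max_attempts.times do
    resp = client.get_transcription_job(transcription_job_name: job_name)
    job = resp.transcription_job
    case job.transcription_job_status
    when 'COMPLETED'
      # Location of the JSON result written by Transcribe
      return job.transcript.transcript_file_uri
    when 'FAILED'
      raise "Transcription job #{job_name} failed: #{job.failure_reason}"
    end
    # Still QUEUED or IN_PROGRESS: wait before asking again
    sleep(interval)
  end
  raise "Timed out waiting for transcription job #{job_name}"
end
```

With the client from the first script, uri = wait_for_transcription(client, "NameOfTheJob") blocks until the job is done and returns the location of the JSON result.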
Once the job has finished, Amazon Transcribe writes a JSON file with the full text of the transcription to your output bucket, and you can use it however you want.
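If you only need the plain text, it is stored under results.transcripts in that JSON file. Below is a small parsing sketch that uses only Ruby's standard library. The sample is a made-up, trimmed-down document that follows the layout of a Transcribe result (jobName, status, results.transcripts, results.items), and transcript_text is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper: return the full transcript text from the
# contents of a Transcribe result file.
def transcript_text(json_string)
  data = JSON.parse(json_string)
  transcripts = data.fetch('results').fetch('transcripts')
  # Usually a single entry; join in case there is more than one
  transcripts.map { |t| t['transcript'] }.join("\n")
end

# Made-up, trimmed example shaped like a Transcribe result file
sample = <<~JSON
  {
    "jobName": "NameOfTheJob",
    "status": "COMPLETED",
    "results": {
      "transcripts": [{ "transcript": "Hola a todos, gracias por venir." }],
      "items": []
    }
  }
JSON

puts transcript_text(sample)
```

The results.items array holds word-level entries (timestamps and confidence scores), which is useful if you later want to build subtitles instead of plain text.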
List all the results of your Amazon Transcribe jobs
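require 'aws-sdk-s3'
# Read the keys from the environment instead of hard-coding them
ACCESS_KEY = ENV.fetch('AWS_ACCESS_KEY_ID')
SECRET_ACCESS_KEY = ENV.fetch('AWS_SECRET_ACCESS_KEY')
# Same region as the buckets and the transcription job
REGION = 'us-east-2'
# You define this bucket: it is the output_bucket_name used when starting the job
bucket = 'the-bucket-transcription-result'
Aws.config.update({
  region: REGION,
  credentials: Aws::Credentials.new(ACCESS_KEY, SECRET_ACCESS_KEY)
})
client = Aws::S3::Client.new
# GET ALL THE FILES OF THE BUCKET
# list_objects_v2 returns at most 1,000 keys per call, so walk through every page
file_names = []
client.list_objects_v2({ bucket: bucket }).each_page do |page|
  file_names.concat(page.contents.map(&:key))
end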
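Once you have the keys, you can download each result. As far as I know, when you only set output_bucket_name, Transcribe stores the result as <job name>.json at the root of that bucket, so this sketch keeps only the .json keys and maps each job name to its raw JSON. read_transcripts is a hypothetical helper, and client is expected to respond to get_object the way Aws::S3::Client does.

```ruby
# Hypothetical helper: download every transcription result (.json) listed in keys.
# Returns a Hash of job name => raw JSON string.
def read_transcripts(client, bucket, keys)
  # Skip anything that is not a result file
  result_keys = keys.select { |key| key.end_with?('.json') }
  result_keys.each_with_object({}) do |key, results|
    # get_object returns the object body as an IO-like stream
    body = client.get_object(bucket: bucket, key: key).body.read
    results[File.basename(key, '.json')] = body
  end
end
```

For example, results = read_transcripts(client, bucket, file_names) reuses the client and the file_names array from the script above. It loads every result into memory, which is fine for a handful of transcripts but not for a very large bucket.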
Well, this is my first post here on DEV. I hope this code helps someone.
Cheers!