Using Amazon Transcribe with Ruby

#ruby #aws #amazon #webservices

Using Amazon Transcribe for audio transcriptions

Last year I worked on a small project. The requirement was to use technology to help users who were doing this job manually: they listened to audio recordings all day long, which in some cases was causing health issues.

For example, a 4-hour recording of a meeting can take
a single person about 3 full days to transcribe.

After some research, I found a few providers that make this task a little easier for us developers.

Google Cloud Speech-to-Text, IBM Watson Speech to Text and Amazon Transcribe. We chose the last one because we already had an AWS account for other services, because of the price, and because the AWS SDK and its documentation are really good.

First of all you will need:

An AWS account or an IAM (Identity and Access Management) user
AWS credentials: an ACCESS KEY ID and a SECRET ACCESS KEY
One or two S3 buckets. I have one bucket for the audio files and another for the results of the transcription jobs.

gem install aws-sdk-s3
gem install aws-sdk-transcribeservice
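
The snippets in this post use two constants, ACCESS_KEY and SECRET_ACCESS_KEY, for the credentials. One way to fill them without hard-coding secrets is to read them from environment variables. This is only a minimal sketch: the helper name `aws_credentials_from_env` is my own invention, while `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the variable names the AWS tools conventionally use.

```ruby
# Hypothetical helper (not part of the AWS SDK): reads the credentials from
# the environment and fails early with a clear message if one is missing.
def aws_credentials_from_env(env = ENV)
  access_key = env['AWS_ACCESS_KEY_ID'].to_s
  secret_key = env['AWS_SECRET_ACCESS_KEY'].to_s

  if access_key.empty? || secret_key.empty?
    raise ArgumentError, 'Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first'
  end

  { access_key_id: access_key, secret_access_key: secret_key }
end

# Usage (needs both environment variables to be set):
#   creds             = aws_credentials_from_env
#   ACCESS_KEY        = creds[:access_key_id]
#   SECRET_ACCESS_KEY = creds[:secret_access_key]
```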

Sending your audio file to Amazon Transcribe


require 'aws-sdk-transcribeservice'

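# AWS connection: set the region and credentials once for every client.
# ACCESS_KEY and SECRET_ACCESS_KEY must hold your own IAM credentials.
Aws.config.update({
  region: 'us-east-2', # use the region where your S3 buckets live
  credentials: Aws::Credentials.new(ACCESS_KEY, SECRET_ACCESS_KEY)
})

# Client connection (it picks up the region from Aws.config above)
client = Aws::TranscribeService::Client.new

# The URL of the audio file that you want to transcribe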
s3_audio_file = 'https://your-s3-bucket-name.s3.amazonaws.com/uploads/video/audio_original/7/audio-16000.mp3'

resp = client.start_transcription_job({
  transcription_job_name: "NameOfTheJob", # required
  language_code: "es-ES", # required, accepts en-US, es-US, en-AU, fr-CA, en-GB, de-DE, pt-BR, fr-FR, it-IT, ko-KR, es-ES, en-IN, hi-IN, ar-SA, ru-RU, zh-CN, nl-NL, id-ID, ta-IN, fa-IR, en-IE, en-AB, en-WL, pt-PT, te-IN, tr-TR, de-CH, he-IL, ms-MY, ja-JP, ar-AE
  media_sample_rate_hertz: 16000,
  media_format: "mp3", # accepts mp3, mp4, wav, flac
  media: { # required
    media_file_uri: s3_audio_file,
  },
  output_bucket_name: "the-bucket-transcription-result"
})

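The code above creates a transcription job for the file you want to transcribe. When I wrote this, the maximum duration of an audio file was 2 hours (check the current Amazon Transcribe quotas, because limits change), and the value of media_sample_rate_hertz (16,000 Hz here) has to match the sample rate of the file.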
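
One detail about transcription_job_name: it has to be unique within your AWS account, so a fixed name like "NameOfTheJob" only works once. A small sketch of one way to build a unique name is below. The helper `unique_job_name` is hypothetical (my own, not part of the SDK); it assumes the documented rule that job names may contain letters, digits, '.', '_' and '-' and be at most 200 characters long.

```ruby
require 'securerandom'

# Hypothetical helper: builds a job name that is unique per run by appending
# a UTC timestamp and a short random suffix to a sanitized prefix.
def unique_job_name(prefix)
  # Replace every character Amazon Transcribe does not accept in a job name.
  safe_prefix = prefix.gsub(/[^0-9a-zA-Z._-]/, '-')
  suffix = "#{Time.now.utc.strftime('%Y%m%d%H%M%S')}-#{SecureRandom.hex(4)}"

  # Trim the prefix so the whole name stays within 200 characters.
  "#{safe_prefix[0, 200 - suffix.length - 1]}-#{suffix}"
end

puts unique_job_name('meeting 2019/audio')
```

The result can be passed as transcription_job_name in the start_transcription_job call.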

For a 2-hour audio file, Amazon Transcribe finished the job in less than 10 minutes.

Pretty awesome, right?

Obviously, Amazon uses machine learning behind the transcription service.
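
Keep in mind that the job runs asynchronously: start_transcription_job returns right away, long before the transcription exists. A simple way to wait for it is to poll get_transcription_job until the status changes. The sketch below is one possible approach, not the only one. The helper name `wait_for_transcription` and its `interval` and `max_attempts` parameters are my own assumptions; get_transcription_job and the QUEUED, IN_PROGRESS, COMPLETED and FAILED statuses come from the SDK.

```ruby
# Hypothetical helper: polls a transcription job until it completes or fails.
# `client` is expected to respond to get_transcription_job the way
# Aws::TranscribeService::Client does.
def wait_for_transcription(client, job_name, interval: 30, max_attempts: 60)
  max_attempts.times do
    job = client.get_transcription_job(transcription_job_name: job_name).transcription_job
    status = job.transcription_job_status # QUEUED, IN_PROGRESS, COMPLETED or FAILED

    return job if status == 'COMPLETED'
    raise "Transcription job #{job_name} failed: #{job.failure_reason}" if status == 'FAILED'

    sleep interval
  end

  raise "Transcription job #{job_name} did not finish after #{max_attempts} checks"
end

# Usage with a real Aws::TranscribeService::Client:
#   job = wait_for_transcription(client, 'NameOfTheJob')
#   puts job.transcript.transcript_file_uri
```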

After the job is finished, Amazon Transcribe writes a JSON file with the full text of the transcription to the output bucket, and you can use it however you want.
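
To give an idea of what "use it however you want" can look like, here is a minimal sketch that pulls the full text out of that JSON file. The sample document is trimmed down and its values are made up for illustration; the helper name `transcript_text` is hypothetical. The keys (`results`, `transcripts`, `transcript`, `items`) follow the layout of the result file.

```ruby
require 'json'

# Trimmed-down, made-up sample of the JSON file that ends up in the output
# bucket. A real file has one entry in "items" per word or punctuation mark.
sample_output = <<~JSON
  {
    "jobName": "NameOfTheJob",
    "results": {
      "transcripts": [{ "transcript": "Hola a todos." }],
      "items": [
        { "type": "pronunciation", "start_time": "0.04", "end_time": "0.51",
          "alternatives": [{ "confidence": "0.98", "content": "Hola" }] }
      ]
    },
    "status": "COMPLETED"
  }
JSON

# Hypothetical helper: returns the full text of a transcription result.
def transcript_text(json_string)
  data = JSON.parse(json_string)
  data.fetch('results').fetch('transcripts').map { |t| t['transcript'] }.join(' ')
end

puts transcript_text(sample_output)
```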

List all the transcription result files in the S3 bucket


require 'aws-sdk-s3'

# The bucket you passed as output_bucket_name when starting the job
bucket = 'the-bucket-transcription-result'

Aws.config.update({
  region: 'us-east-2', # use the region where your S3 buckets live
  credentials: Aws::Credentials.new(ACCESS_KEY, SECRET_ACCESS_KEY)
})

# Client connection (it picks up the region from Aws.config above)
client = Aws::S3::Client.new

# Get the files in the bucket (a single call returns at most 1,000 objects)
resp = client.list_objects_v2({
  bucket: bucket
})

# Collect the key (file name) of every object; contents is empty for an empty bucket
file_names = resp.contents.map(&:key)
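
One caveat with the listing above: list_objects_v2 returns at most 1,000 keys per call, so a bucket with many results needs more than one request. The sketch below follows the continuation token until the listing is complete. The helper name `all_object_keys` is hypothetical; `is_truncated`, `next_continuation_token` and the `continuation_token` parameter are part of the S3 API.

```ruby
# Hypothetical helper: collects every key in a bucket, page by page.
# `client` is expected to behave like Aws::S3::Client.
def all_object_keys(client, bucket)
  keys = []
  token = nil

  loop do
    params = { bucket: bucket }
    params[:continuation_token] = token if token
    resp = client.list_objects_v2(params)

    keys.concat(resp.contents.map(&:key))
    break unless resp.is_truncated # no more pages to fetch

    token = resp.next_continuation_token
  end

  keys
end

# Usage with a real Aws::S3::Client:
#   file_names = all_object_keys(client, 'the-bucket-transcription-result')
```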