DEV Community

Revathi Joshi for AWS Community Builders

AWS service - Amazon Transcribe

In this article, I am going to show you how to use Amazon Transcribe, an automatic speech recognition service, to create a text transcript of a pre-recorded English speech file after uploading it to an S3 bucket using the AWS Management Console.

Amazon Transcribe

  • It is an easy-to-use tool that converts audio data, either a media file uploaded to an Amazon S3 bucket or a media stream, into text.

  • It can transcribe public speeches, business meetings, customer calls, broadcast TV, on-demand videos, and class lectures, and it can perform medical transcription in real time.

  • You can use Amazon Transcribe as a standalone service or to add speech-to-text capabilities to any application.

  • You can transcribe audio in any language on the supported languages list.

  • Transcription jobs are of two types:

      • Batch transcription jobs: media files stored in an Amazon S3 bucket

      • Streaming transcription jobs: media streams in real time

Please visit my GitHub repository for S3 articles on various topics, which I update regularly.

Let’s get started!

Objectives:

1. Create an S3 bucket

2. Upload an audio file into the S3 bucket

3. Create transcription job

4. Review transcription results

Pre-requisites:

  • An AWS user account with admin access (not the root account).
  • An IAM role with the AmazonS3FullAccess policy attached.

Resources Used:

Amazon Transcribe

IAM Access Policy

S3 Bucket

Steps to implement this project:

1. Create an S3 bucket

On Amazon S3 console / Create bucket / Under General configuration /

Bucket name: - oprah-audio

AWS Region: - US East (N. Virginia) us-east-1

  • Take all defaults and Create bucket
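For readers who prefer scripting this step, here is a minimal sketch of the same bucket creation with boto3. The function names are my own, not part of the article, and it assumes boto3 is installed and AWS credentials are configured. S3 bucket names are globally unique, so you will probably need a name other than oprah-audio.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for the S3 create_bucket call.

    us-east-1 is the default region and must not be sent as a
    LocationConstraint; any other region needs one.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name: str, region: str = "us-east-1") -> None:
    """Create a bucket with default settings (hypothetical helper)."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

Calling `create_bucket("oprah-audio")` would mirror the console step above, with all other settings left at their defaults.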

2. Upload an audio file into the S3 bucket

  • On the Buckets page, click on your bucket’s name to open the bucket / Select Upload / Add files / Choose the oprah-audio.mp3 file

Upload

  • Select the oprah-audio.mp3 file / Under Properties / In Object overview / Copy the S3 URI / Save it for future use

s3://oprah-audio/oprah-audio.mp3
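The upload can also be scripted. The sketch below assumes boto3 and configured credentials. The helper names `s3_uri` and `upload_audio` are illustrative and not from the article. It uploads the file and returns the same `s3://` address that the console shows.

```python
def s3_uri(bucket_name: str, key: str) -> str:
    """Return the s3:// address of an object, as shown in the console."""
    return f"s3://{bucket_name}/{key}"


def upload_audio(local_path: str, bucket_name: str, key: str) -> str:
    """Upload a local file to the bucket and return its S3 URI."""
    import boto3  # imported lazily; only needed for the actual upload

    boto3.client("s3").upload_file(local_path, bucket_name, key)
    return s3_uri(bucket_name, key)
```

For example, `upload_audio("oprah-audio.mp3", "oprah-audio", "oprah-audio.mp3")` would return the URI saved above.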

3. Create transcription job

  • From the top menu bar, select Services then begin typing Transcribe in the search bar and select Amazon Transcribe to open the service console.

  • On the Amazon Transcribe Console / Transcription jobs page, click Create job / Under Specify job details / Job settings /

Name: - oprah-audio-transcribe-job

Language: - English, US (en-US)

Input data / Input file location on S3: - s3://oprah-audio/oprah-audio.mp3

Output data location type: take the default - Service-managed S3 bucket.

Subtitle type format: -

  • Amazon Transcribe supports WebVTT (VTT) and SubRip (SRT) file types.

  • In the Subtitle file format field, you can choose either or both file types for output.

  • If you select both types, you get two files that are exported to the same S3 bucket.

  • I am not using either format.

Next

  • On the Configure page / Under Customization / Custom vocabulary

This feature helps Amazon Transcribe recognize words and phrases that are specific to your application. I am not choosing this feature because this audio file does not need any application-specific vocabulary.

  • Create job
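The job settings above map roughly onto a single `start_transcription_job` call in boto3. This is a sketch under assumptions, not the article's method. The helper names are mine, and it assumes boto3 and credentials are available. Leaving out an output bucket corresponds to the default service-managed S3 bucket, and subtitles are requested only if you pass formats.

```python
def build_job_request(job_name, media_uri, language_code="en-US",
                      subtitle_formats=None):
    """Build the parameters for a batch transcription job.

    No OutputBucketName is set, so the transcript goes to the
    service-managed bucket, matching the console default.
    """
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
    }
    if subtitle_formats:  # for example ["vtt", "srt"]
        request["Subtitles"] = {"Formats": list(subtitle_formats)}
    return request


def start_job(request, region="us-east-1"):
    """Submit the job (hypothetical helper; calls the real AWS API)."""
    import boto3  # imported lazily; only needed to submit the job

    transcribe = boto3.client("transcribe", region_name=region)
    return transcribe.start_transcription_job(**request)
```

With the values from this walkthrough, the request would be `build_job_request("oprah-audio-transcribe-job", "s3://oprah-audio/oprah-audio.mp3")`.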

4. Review transcription results

  • After the job's status on the Transcription jobs page shows Complete / Click on oprah-audio-transcribe-job / Under Transcription preview / Text

  • You can see the transcribed text there.

  • If the transcribed text is long, you have to scroll down in the Transcription preview panel to view the whole transcription job output.
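Outside the console, the same review step could be sketched as polling the job and reading the transcript JSON. The function names here are assumptions of mine. `transcribe` is expected to be a boto3 Transcribe client, and I am assuming the job used the service-managed bucket, where the returned transcript URI can be downloaded directly.

```python
import json
import time
import urllib.request


def wait_for_job(transcribe, job_name, poll_seconds=5.0, sleep=time.sleep):
    """Poll until the job reaches COMPLETED or FAILED, then return it."""
    while True:
        response = transcribe.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        sleep(poll_seconds)


def extract_text(transcript_json: str) -> str:
    """Pull the plain text out of a transcript JSON document."""
    data = json.loads(transcript_json)
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


def fetch_transcript_text(job) -> str:
    """Download the transcript file of a completed job and return its text."""
    uri = job["Transcript"]["TranscriptFileUri"]
    with urllib.request.urlopen(uri) as response:
        return extract_text(response.read().decode("utf-8"))
```

A caller would check that the returned job's status is `COMPLETED` before passing it to `fetch_transcript_text`.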

Cleanup

  • Delete the audio file - oprah-audio.mp3

  • Delete the S3 bucket

  • Delete the Transcription job
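The three cleanup steps could be scripted as in the sketch below. The `cleanup` function is a hypothetical helper, and `s3` and `transcribe` are assumed to be boto3 clients for S3 and Transcribe. The object is deleted first because S3 only deletes empty buckets.

```python
def cleanup(s3, transcribe, bucket_name, key, job_name):
    """Delete the audio file, the bucket, and the transcription job."""
    # The bucket must be empty before it can be deleted.
    s3.delete_object(Bucket=bucket_name, Key=key)
    s3.delete_bucket(Bucket=bucket_name)
    transcribe.delete_transcription_job(TranscriptionJobName=job_name)
```

With clients created through `boto3.client("s3")` and `boto3.client("transcribe")`, the call for this walkthrough would be `cleanup(s3, transcribe, "oprah-audio", "oprah-audio.mp3", "oprah-audio-transcribe-job")`.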

What we have done so far

We have used Amazon Transcribe to create a text transcript of a pre-recorded English speech file after uploading it to an S3 bucket using the AWS Management Console.
