Elizabeth Fuentes L for AWS

Posted on Jun 26, 2023 • Edited on Dec 6, 2023 • Originally published at buildon.aws

All the things that Comprehend, Rekognition, Textract, Polly, Transcribe, and Others Do

#machinelearning #aws #ai #python

Original: https://www.buildon.aws/posts/all-the-things-that-comprehend-rekognition-textract-polly-transcribe-and-others-do

Developers - those who provide solutions to computer problems, establish base procedures, program, and maintain solutions - are indeed programmers, but that doesn't automatically make them experts in everything related to code. Take, for example, the creation of an ML-dependent function: it necessitates familiarity with models and algorithm training, knowledge that isn't common among all programmers.

Fortunately, there are ready-to-use APIs that leverage existing, previously trained models to execute ML functions, and they can be used without the need for ML knowledge. Additionally, they ensure the security of the information shared with them. Up next, I'll introduce you to some specific ML API services and four use cases to get you familiar with them and let your imagination run wild.

How Do Ready-to-Use ML-Function APIs Work? Just Follow These 3 Simple Steps:

Define the input: either plain text or the location of an object in an Amazon S3 bucket.
Invoke the API with this input.
Receive the output in JSON format.
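To make the three steps concrete, here is a minimal sketch using Amazon Comprehend sentiment detection. It assumes the client you pass in is a boto3 Comprehend client; the function name `detect_sentiment` and its parameters are my own choices for illustration, not part of the original solution.

```python
import json


def detect_sentiment(comprehend_client, text, language_code="en"):
    """Walk through the three steps with Amazon Comprehend sentiment detection."""
    # Step 1: define the input (plain text here; other services take an S3 object location).
    request = {"Text": text, "LanguageCode": language_code}
    # Step 2: invoke the API with that input.
    response = comprehend_client.detect_sentiment(**request)
    # Step 3: the output is a dictionary that serializes directly to JSON.
    return {
        "Sentiment": response["Sentiment"],
        "SentimentScore": response["SentimentScore"],
    }


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   comprehend = boto3.client("comprehend")
#   print(json.dumps(detect_sentiment(comprehend, "I love this service!"), indent=2))
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before you spend anything on real API calls.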

Let's Take a Look at the APIs

AWS offers a variety of ML and AI services designed to expedite their implementation in your applications. These services range from those that equip you with the necessary infrastructure to train your own models to those that come as ready-to-use, pre-trained API calls. Let's now focus specifically on some examples of the latter:

API Type	What you can do	Service Name
🔎 Analysis of images (.png, .jpg) and videos (.mp4)	Label detection (predefined or custom). Image properties and moderation. Facial detection, comparison, and analysis. Face search. People paths. Personal protective equipment detection. Celebrity recognition. Text in images. Inappropriate or offensive content detection.	Amazon Rekognition
🔎 Detection and analysis of text in documents (PNG, JPG, PDF, or TIFF)	Process individual or bundled documents. Detect typed and handwritten text. Recognize documents such as financial reports, medical records, ID documents (driver's licenses and passports), and tax forms. Extract text, forms, and tables from documents with structured data.	Amazon Textract
🔎 Natural Language Processing (NLP) and text analysis	Process documents and extract information such as entities, events, key phrases, dominant language, sentiment, targeted sentiment, and syntax. Custom classification and entity recognition. Custom model management.	Amazon Comprehend
🔎 Text to speech	Supports multiple languages and includes a variety of lifelike voices. Includes a number of Neural Text-to-Speech (NTTS) voices, which use a machine learning approach to improve speech quality and sound more natural and human-like. Neural TTS also supports a Newscaster speaking style tailored to news narration use cases.	Amazon Polly
🔎 Speech to text	Convert audio in supported formats to text. Transcribe media in real time (streaming) or transcribe media files located in an Amazon S3 bucket (batch). Improve accuracy for your specific use case with language customization, filter content to ensure customer privacy or audience-appropriate language, analyze content in multi-channel audio, and partition the speech of individual speakers.	Amazon Transcribe
🔎 Translation	Translate unstructured text (UTF-8) documents or build applications that work in multiple languages.	Amazon Translate

🚀 Use Cases

The most effective way to learn programming is by solving problems through code development. The same principle applies when learning how to use a service: you need to actively use it to understand it. The following four use cases are examples of both real and hypothetical problems that I tackled during my learning process.

Use case 1: Create subtitles and translate them into the language you want ⏯️ 🍿.

If you're passionate about utilizing video as a tool for education, it would be ideal to reach as many people as possible. One common barrier to this is language. This application enables you to create subtitles and translate them into any desired language to remove this barrier.

Upload the .mp4 video to an Amazon S3 bucket.
A Lambda function calls the Transcribe API.
The subtitle file in the original language is saved to the S3 bucket.
A Lambda function calls the Translate API.
The subtitle file in the new language is saved to the S3 bucket.
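The flow above could be sketched roughly as follows. This is not the code from the linked solution, only an illustration: the function names, the choice of the `.srt` format, and the line-by-line translation strategy are my assumptions. The clients are expected to be boto3 `transcribe` and `translate` clients.

```python
def start_subtitle_job(transcribe_client, job_name, bucket, video_key, language_code="en-US"):
    """Ask Amazon Transcribe to transcribe a video in S3 and write .srt subtitles to the same bucket."""
    return transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode=language_code,
        Media={"MediaFileUri": f"s3://{bucket}/{video_key}"},
        OutputBucketName=bucket,
        Subtitles={"Formats": ["srt"]},
    )


def translate_srt(translate_client, srt_text, source_lang, target_lang):
    """Translate only the caption lines of an SRT file, keeping indices and timestamps as they are."""
    translated = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Blank lines, numeric cue indices, and timestamp lines are copied unchanged.
        if not stripped or stripped.isdigit() or "-->" in stripped:
            translated.append(line)
            continue
        result = translate_client.translate_text(
            Text=stripped,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        translated.append(result["TranslatedText"])
    return "\n".join(translated)


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   start_subtitle_job(boto3.client("transcribe"), "my-job", "my-bucket", "video.mp4")
#   spanish = translate_srt(boto3.client("translate"), srt_text, "en", "es")
```

Translating one caption line at a time keeps the timing intact, at the cost of one API call per line and less context for the translation. Translating whole cues together would be a reasonable alternative.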

Here's the code to create this solution.

Use case 2: Detect entities and sentiment from a document 🔎 📄.

Many people possess piles of documents at home, ranging from letters from past lovers to medical records, children's school memorabilia, and bank statements, etc. Wouldn't it be convenient to neatly store these in the cloud? Explore and learn about the functionalities of Textract and Comprehend with this app.

Upload the document (PNG, JPG, PDF or TIFF) to an S3 Bucket.
A Lambda Function makes the call to Textract API.
With the response from Textract, a Lambda function makes the call to the Comprehend API.
A Lambda Function makes the call to the Translate API.
The response is saved in an S3 bucket.
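The Textract and Comprehend steps could look roughly like this. It is a sketch under assumptions: the function names are mine, the clients are expected to be boto3 `textract` and `comprehend` clients, and the synchronous Textract call used here is aimed at single-page documents. Long documents may also exceed Comprehend's per-request size limit and need to be split into chunks, which this sketch does not do.

```python
def extract_lines(textract_client, bucket, document_key):
    """Return the lines of text that Amazon Textract detects in a document stored in S3."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": document_key}}
    )
    # Textract returns PAGE, LINE, and WORD blocks; LINE blocks carry whole lines of text.
    return [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]


def analyze_text(comprehend_client, text, language_code="en"):
    """Detect entities and overall sentiment in the extracted text with Amazon Comprehend."""
    entities = comprehend_client.detect_entities(Text=text, LanguageCode=language_code)
    sentiment = comprehend_client.detect_sentiment(Text=text, LanguageCode=language_code)
    return {
        "Entities": [{"Text": e["Text"], "Type": e["Type"]} for e in entities["Entities"]],
        "Sentiment": sentiment["Sentiment"],
    }


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   lines = extract_lines(boto3.client("textract"), "my-bucket", "letter.png")
#   print(analyze_text(boto3.client("comprehend"), "\n".join(lines)))
```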

Here's the code to create this solution

Use case 3: Make Polly Talk 🦜

I was curious how an Italian speaker would sound speaking Chinese, and since Polly has native voices for each language, I created this notebook to play around 😂.

From a Jupyter notebook, make the call to the Polly API.
Polly stores the result in an S3 bucket.
The notebook retrieves the audio.
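Here is a rough sketch of those notebook steps, not the notebook itself. It assumes a boto3 `polly` client and uses Polly's asynchronous synthesis task, which writes the audio to S3. The function names, the polling interval, and the retry limit are my own illustrative choices.

```python
import time


def start_speech_task(polly_client, text, voice_id, bucket, output_format="mp3"):
    """Start an asynchronous Polly synthesis task that stores its audio in an S3 bucket."""
    response = polly_client.start_speech_synthesis_task(
        Text=text,
        VoiceId=voice_id,
        OutputFormat=output_format,
        OutputS3BucketName=bucket,
    )
    task = response["SynthesisTask"]
    return task["TaskId"], task["OutputUri"]


def wait_for_task(polly_client, task_id, poll_seconds=5, max_polls=60):
    """Poll until the synthesis task completes or fails, then return its description."""
    for _ in range(max_polls):
        task = polly_client.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        if task["TaskStatus"] in ("completed", "failed"):
            return task
        time.sleep(poll_seconds)
    raise TimeoutError(f"Polly task {task_id} did not finish after {max_polls} polls")


# Usage, assuming boto3 is installed and AWS credentials are configured.
# "Bianca" is one of Polly's Italian voices; the text can be in any language you want to hear it attempt.
#   import boto3
#   polly = boto3.client("polly")
#   task_id, uri = start_speech_task(polly, "你好，世界", "Bianca", "my-bucket")
#   print(wait_for_task(polly, task_id)["TaskStatus"], uri)
```

Once the task reports `completed`, the audio file can be fetched from the S3 location in `OutputUri` and played in the notebook.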

Here's the code to create this solution

Use case 4: Video content moderation ⏯️ 🔫 🚬

I'm a fan of action movies and wanted to try Rekognition with the trailer of Die Hard, so I created this application and wow! Each dataframe is pure violence 🫣... I invite you to try it with a trailer of your favorite movie.

Upload the .mp4 video to an S3 bucket.
A Lambda function makes the call to the Rekognition API.
Once the video review is finished, a new Lambda function retrieves the result and stores it in an S3 bucket.
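The two Rekognition calls behind these steps could be sketched like this. It assumes a boto3 `rekognition` client, and the function names and the flattened output format are mine. The sketch also leaves out the optional notification channel on the start call, which is one way to trigger the second Lambda function when the job finishes; I'm assuming here that you call `collect_labels` only after the job is done.

```python
def start_moderation(rekognition_client, bucket, video_key, min_confidence=50.0):
    """Start an asynchronous content moderation job for a video stored in S3."""
    response = rekognition_client.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket, "Name": video_key}},
        MinConfidence=min_confidence,
    )
    return response["JobId"]


def collect_labels(rekognition_client, job_id):
    """Fetch every page of results for a finished job and flatten them into simple records."""
    labels, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = rekognition_client.get_content_moderation(**kwargs)
        if page["JobStatus"] != "SUCCEEDED":
            raise RuntimeError(f"Job {job_id} is {page['JobStatus']}, not SUCCEEDED")
        for item in page["ModerationLabels"]:
            labels.append({
                "TimestampMs": item["Timestamp"],  # milliseconds from the start of the video
                "Name": item["ModerationLabel"]["Name"],
                "Confidence": item["ModerationLabel"]["Confidence"],
            })
        token = page.get("NextToken")
        if not token:
            return labels


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   rekognition = boto3.client("rekognition")
#   job_id = start_moderation(rekognition, "my-bucket", "trailer.mp4")
#   ... later, once the job has finished ...
#   print(collect_labels(rekognition, job_id))
```

The flat list of records is easy to load into a pandas dataframe if you want to explore which moments of the trailer were flagged.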

Here's the code to create this solution

Conclusion

You've now learned that AI/ML can be used via an API call to perform a variety of tasks, such as analyzing images and videos, detecting and analyzing text in scanned documents, and using Natural Language Processing (NLP) to detect the dominant language and extract sentiment, among many other things. You can also convert text to speech and vice versa, and translate between languages, all within the reach of a single API call.

This just scratches the surface of what can be achieved with AI/ML through API calls.

No doubt, there's a real or hypothetical problem you'd like to address using one of these services. Even if you don't have one in mind, I've provided these links for you to continue experimenting and learning: