In this article, I will be discussing yet another wonderful AWS feature, Amazon Rekognition, which reduces the burden of computer vision tasks like object detection and image classification.
What really is Amazon Rekognition?
Amazon Rekognition is a service introduced by Amazon that lets you add image and video analysis to your applications.
You submit images and videos to the service, which analyzes them to identify objects, people, and other content.
Amazon Rekognition utilizes deep learning technology: convolutional neural networks trained on a large amount of labeled ground-truth images and videos. We all know that training a model from scratch is computationally expensive and requires a lot of computing resources and time, so with Amazon Rekognition these problems are taken off you and you can focus solely on the application side.
The Amazon Rekognition API is divided into two base APIs:
1) Image API
2) Video API
Amazon Rekognition operations are also divided into two categories based on storage:
1) Storage-based API operations
2) Non-storage API operations
Let us now discuss in detail the two APIs that we have in the Amazon Rekognition service.
1) The Image Processing API
The image processing APIs accept the input image either from an Amazon S3 bucket or as Base64-encoded bytes, and they comprise the following operations:
The facial detection API returns up to 100 of the largest faces detected in an input image, along with the facial composition and attributes of each face. The returned list includes the emotional state of the face (for example, whether the person is smiling or angry), whether the individual is wearing glasses, and the estimated age range of the individual, each with its respective confidence score.
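As a sketch (not a definitive implementation), here is how a face detection response might be consumed with boto3. The bucket name, object key, and trimmed sample response are illustrative placeholders; the live call is commented out because it requires AWS credentials:

```python
# Pull the attributes discussed above out of a DetectFaces response.
def summarize_faces(response):
    """Extract age range, smile, and confidence from each detected face."""
    return [
        {
            "age_range": (f["AgeRange"]["Low"], f["AgeRange"]["High"]),
            "smiling": f["Smile"]["Value"],
            "confidence": round(f["Confidence"], 1),
        }
        for f in response["FaceDetails"]
    ]

# Live call (requires AWS credentials; "my-bucket"/"photo.jpg" are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_faces(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}},
#     Attributes=["ALL"],  # also return emotions, glasses, age range, etc.
# )

# Trimmed, illustrative sample of the response shape:
sample = {
    "FaceDetails": [
        {"AgeRange": {"Low": 25, "High": 35},
         "Smile": {"Value": True, "Confidence": 97.2},
         "Confidence": 99.87}
    ]
}
print(summarize_faces(sample))
```

Requesting `Attributes=["ALL"]` is what makes the service return the richer attribute set; the default response contains only a minimal subset.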
The face comparison API determines whether the person in one image is the same person in another image. It returns an ordered list of up to 100 of the largest faces detected in the target image, compared against the face in the source image. The bounding boxes of the source and target faces are returned along with a similarity score for each match.
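A minimal sketch of picking the strongest match out of a face comparison response; the sample response is a trimmed, illustrative shape and the live call (with placeholder bucket/keys) is commented out:

```python
# Select the highest-similarity match from a CompareFaces response.
def best_match(response, threshold=90.0):
    """Return (similarity, bounding_box) of the strongest match, or None."""
    matches = [m for m in response["FaceMatches"] if m["Similarity"] >= threshold]
    if not matches:
        return None
    top = max(matches, key=lambda m: m["Similarity"])
    return top["Similarity"], top["Face"]["BoundingBox"]

# Live call (requires AWS credentials; bucket/keys are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.compare_faces(
#     SourceImage={"S3Object": {"Bucket": "my-bucket", "Name": "source.jpg"}},
#     TargetImage={"S3Object": {"Bucket": "my-bucket", "Name": "target.jpg"}},
#     SimilarityThreshold=90.0,
# )

# Trimmed, illustrative sample of the response shape:
sample = {"FaceMatches": [
    {"Similarity": 97.4,
     "Face": {"BoundingBox": {"Left": 0.1, "Top": 0.2,
                              "Width": 0.3, "Height": 0.4},
              "Confidence": 99.9}}
]}
print(best_match(sample))
```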
The celebrity API is used to detect which celebrity a certain input image matches. It returns the bounding box of the identified celebrity's face, the ID of the celebrity, the confidence score of the match, and a URL with extra information about the celebrity.
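A sketch of reading those fields from a celebrity recognition response; the name, ID, and URL in the sample are made-up placeholders, not real service output, and the live call is commented out:

```python
# Summarize a RecognizeCelebrities response.
def celebrity_summary(response):
    """Return (name, id, confidence, first URL) for each recognized celebrity."""
    out = []
    for c in response["CelebrityFaces"]:
        url = c["Urls"][0] if c["Urls"] else None
        out.append((c["Name"], c["Id"], c["MatchConfidence"], url))
    return out

# Live call (requires AWS credentials; bucket/key are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.recognize_celebrities(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "red-carpet.jpg"}}
# )

# Trimmed, illustrative sample (all values are placeholders):
sample = {"CelebrityFaces": [
    {"Name": "Jane Doe", "Id": "1abCDe", "MatchConfidence": 98.0,
     "Urls": ["www.imdb.com/name/nm0000000"]}
]}
print(celebrity_summary(sample))
```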
The text extraction API allows one to extract textual features from an image; for example, it can be used to extract a person's information from an identity card or the details from a train ticket. It returns the detected text along with the geometry of the bounding box where the text was located in the image and the confidence score of each detection.
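As a sketch, text detections come back at both line and word granularity, so a common first step is filtering to one level. The sample response below is trimmed and illustrative, and the live call is commented out:

```python
# Keep only line-level detections from a DetectText response.
def lines_only(response):
    """Return (text, confidence) for each detected LINE (skipping WORDs)."""
    return [
        (t["DetectedText"], t["Confidence"])
        for t in response["TextDetections"]
        if t["Type"] == "LINE"
    ]

# Live call (requires AWS credentials; bucket/key are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_text(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "ticket.jpg"}}
# )

# Trimmed, illustrative sample: one LINE plus one of its WORDs.
sample = {"TextDetections": [
    {"DetectedText": "TRAIN TICKET", "Type": "LINE", "Confidence": 99.1},
    {"DetectedText": "TRAIN", "Type": "WORD", "Confidence": 99.5},
]}
print(lines_only(sample))
```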
The content moderation API is used to identify whether an image contains explicit or inappropriate content, such as explicit nudity or suggestive material. This API returns whether the input image contains inappropriate content, along with the confidence score.
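A minimal sketch of acting on a moderation response, assuming you only want to flag labels above a chosen confidence threshold (the threshold and sample response are illustrative; the live call is commented out):

```python
# Filter a DetectModerationLabels response by confidence.
def flagged_labels(response, min_confidence=80.0):
    """Return the names of moderation labels at or above the threshold."""
    return [
        label["Name"]
        for label in response["ModerationLabels"]
        if label["Confidence"] >= min_confidence
    ]

# Live call (requires AWS credentials; bucket/key are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_moderation_labels(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "upload.jpg"}},
#     MinConfidence=80.0,
# )

# Trimmed, illustrative sample of the response shape:
sample = {"ModerationLabels": [
    {"Name": "Suggestive", "ParentName": "", "Confidence": 95.0}
]}
print(flagged_labels(sample))
```

An empty `ModerationLabels` list means nothing inappropriate was detected above the requested confidence.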
The feature extraction API is used to detect the objects contained in an input image. The different objects in the input image are returned with their names and the confidence score of each identified object.
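A sketch of keeping only high-confidence labels from a label detection response; the sample response is trimmed and illustrative, and the live call is commented out:

```python
# Filter a DetectLabels response to confidently identified objects.
def confident_labels(response, min_confidence=90.0):
    """Return (name, confidence) for labels at or above the threshold."""
    return [
        (label["Name"], round(label["Confidence"], 1))
        for label in response["Labels"]
        if label["Confidence"] >= min_confidence
    ]

# Live call (requires AWS credentials; bucket/key are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_labels(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "street.jpg"}},
#     MaxLabels=10,
# )

# Trimmed, illustrative sample of the response shape:
sample = {"Labels": [
    {"Name": "Car", "Confidence": 98.6},
    {"Name": "Tree", "Confidence": 55.2},
]}
print(confident_labels(sample))
```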
2) Video Processing API
The video processing API is used to process videos stored in an S3 bucket, and it works asynchronously because of the computing resources involved in processing video streams. Below are the various APIs involved in processing videos with Amazon Rekognition:
The person tracking API is utilized to track people in videos. The asynchronous process first involves StartPersonTracking, which initiates the job and sends a notification when it is done; GetPersonTracking then uses the job ID to retrieve the people detected in the video, including the facial features of each individual with a confidence score and the bounding box of the detected person at each timestamp within the video.
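The start/get flow above can be sketched with a simple polling loop. The live calls (with placeholder bucket/key) are commented out since they need AWS credentials, and the sample response is a trimmed, illustrative shape; in production you would typically subscribe to the completion notification (e.g. via Amazon SNS) instead of polling:

```python
import time

def wait_for_job(get_results, job_id, delay=5):
    """Poll a Rekognition Video Get* operation until the job finishes."""
    while True:
        response = get_results(JobId=job_id)
        if response["JobStatus"] != "IN_PROGRESS":
            return response
        time.sleep(delay)

# Live flow (requires AWS credentials; bucket/key are placeholders):
# import boto3
# client = boto3.client("rekognition")
# job = client.start_person_tracking(
#     Video={"S3Object": {"Bucket": "my-bucket", "Name": "clip.mp4"}}
# )
# response = wait_for_job(client.get_person_tracking, job["JobId"])

def timestamps_by_person(response):
    """Group detection timestamps (milliseconds) by tracked person index."""
    tracks = {}
    for item in response["Persons"]:
        tracks.setdefault(item["Person"]["Index"], []).append(item["Timestamp"])
    return tracks

# Trimmed, illustrative sample of a GetPersonTracking response:
sample = {"JobStatus": "SUCCEEDED",
          "Persons": [
              {"Timestamp": 0, "Person": {"Index": 0}},
              {"Timestamp": 500, "Person": {"Index": 0}},
              {"Timestamp": 500, "Person": {"Index": 1}},
          ]}
print(timestamps_by_person(sample))
```

The same `wait_for_job` helper works for every Rekognition Video operation pair below, since they all share the job-ID pattern.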
The face detection API is utilized to detect faces in videos. The asynchronous process first involves StartFaceDetection, which initiates the job and sends a notification when it is done; GetFaceDetection then uses the job ID to retrieve the faces detected in the video, including the facial features of each individual with a confidence score and the bounding box of the detected face at each timestamp within the video.
The label detection API is utilized to detect objects in videos. The asynchronous process first involves StartLabelDetection, which initiates the job and sends a notification when it is done; GetLabelDetection then uses the job ID to retrieve the objects detected in the video, including the name of each object, its bounding box, and the confidence score at each timestamp within the video.
The celebrity recognition API is used to detect celebrities in videos. The asynchronous process first involves StartCelebrityRecognition, which initiates the job and sends a notification when it is done; GetCelebrityRecognition then uses the job ID to retrieve the celebrities detected in the video, with a confidence score and bounding box at each timestamp within the video.
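As a sketch of consuming the celebrity recognition results, the response pairs each recognition with the millisecond timestamp where it occurs. The live calls are commented out, and the celebrity name in the trimmed sample is a made-up placeholder:

```python
# Build a (timestamp, name) timeline from a GetCelebrityRecognition response.
def celebrity_timeline(response):
    """Return (timestamp_ms, celebrity_name) for each recognition."""
    return [
        (item["Timestamp"], item["Celebrity"]["Name"])
        for item in response["Celebrities"]
    ]

# Live flow (requires AWS credentials; bucket/key are placeholders):
# import boto3
# client = boto3.client("rekognition")
# job = client.start_celebrity_recognition(
#     Video={"S3Object": {"Bucket": "my-bucket", "Name": "interview.mp4"}}
# )
# response = client.get_celebrity_recognition(JobId=job["JobId"])

# Trimmed, illustrative sample (name is a placeholder):
sample = {"JobStatus": "SUCCEEDED",
          "Celebrities": [
              {"Timestamp": 0, "Celebrity": {"Name": "Jane Doe", "Confidence": 99.0}},
              {"Timestamp": 1200, "Celebrity": {"Name": "Jane Doe", "Confidence": 97.5}},
          ]}
print(celebrity_timeline(sample))
```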
The content moderation API is used to detect explicit or inappropriate content in an input video. The asynchronous process first involves StartContentModeration, which initiates the job and sends a notification when it is done; GetContentModeration then uses the job ID to retrieve the inappropriate content detected in the video, with a moderation label and confidence score at each timestamp within the video.