Amazon SageMaker Ground Truth

#aws #machinelearning

This post reviews the workforce options and image labelling task types that Amazon SageMaker Ground Truth provides. Ground Truth is a service within Amazon SageMaker for labelling datasets that are later used to build machine learning models. Three workforce options are available when using this service:
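A labelling job reads its input from a manifest file in JSON Lines format, where each line points at one object in S3 through a `source-ref` key. Below is a minimal sketch of building such a manifest. The bucket and file names are hypothetical and only serve as an illustration.

```python
import json


def build_manifest(image_uris):
    """Return JSON Lines text with one {"source-ref": uri} object per image."""
    return "\n".join(json.dumps({"source-ref": uri}) for uri in image_uris)


# Hypothetical bucket and keys, used only for illustration.
uris = [
    "s3://example-bucket/images/img_001.jpg",
    "s3://example-bucket/images/img_002.jpg",
]
manifest_text = build_manifest(uris)
print(manifest_text)
```

The resulting text would typically be uploaded to S3 and referenced when the labelling job is created.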

Mechanical Turk
Private labelling workforce
Vendor

The Mechanical Turk workforce is a global, on-demand pool of workers, available through Amazon, who work around the clock on labelling and human review tasks. Because this is a public workforce, your data must be free of any personally identifiable information (PII). Use this workforce when the labelling work needs no special expertise, you want to save time, and your data contains no PII.

A private labelling workforce is a team of workers that you choose. They could be employees of your company or a group of subject matter experts. For example, if you have a dataset of X-ray images and want to classify whether each image shows a certain disease, you need workers with medical expertise. A private workforce is also the right choice when your data contains PII.

A vendor workforce is drawn from a selection of experienced vendors who specialise in providing data labelling services. They can be found on AWS Marketplace.
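In practice, the workforce choice ends up as a work team ARN in the labelling job configuration. The sketch below shows one way to pick that ARN. The helper `workteam_arn` is hypothetical, and the public-crowd ARN pattern (including the account id) is quoted from memory of the AWS documentation, so treat it as an assumption and verify it before use. Private and vendor team ARNs are specific to your account, so the helper expects you to pass them in.

```python
# Assumption: AWS-owned account id used in the Mechanical Turk work team ARN.
# Check the current AWS documentation before relying on it.
PUBLIC_CROWD_ACCOUNT = "394669845002"


def workteam_arn(workforce, region, custom_arn=None):
    """Return the work team ARN for a 'public', 'private' or 'vendor' workforce."""
    if workforce == "public":
        return (
            f"arn:aws:sagemaker:{region}:{PUBLIC_CROWD_ACCOUNT}"
            ":workteam/public-crowd/default"
        )
    if workforce in ("private", "vendor"):
        if not custom_arn:
            raise ValueError(f"a work team ARN is required for a {workforce} workforce")
        return custom_arn
    raise ValueError(f"unknown workforce type: {workforce}")


print(workteam_arn("public", "us-east-1"))
```

Keeping the choice in one function makes it harder to send data containing PII to the public workforce by accident, since the workforce type has to be named explicitly.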

Let us now take a look at the different types of labelling jobs available for the image data type:

Images

Image classification (Single label)

In this task, workers categorise each image into exactly one class (one class per image).
In this example, we choose either Basketball or Soccer as the label for the image.
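Once a single-label job finishes, each image gets one line in the output manifest. The sketch below reads such a line. The field layout follows the output format as I understand it from the AWS documentation, and the label attribute name `sport`, the S3 path and the confidence value are made up for illustration.

```python
import json


def read_single_label(line, label_attr):
    """Return (image URI, class name, confidence) from one output manifest line."""
    record = json.loads(line)
    metadata = record[f"{label_attr}-metadata"]
    return record["source-ref"], metadata["class-name"], metadata["confidence"]


# Hypothetical output line for the Basketball/Soccer example.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/img_001.jpg",
    "sport": 0,
    "sport-metadata": {
        "class-name": "Basketball",
        "confidence": 0.94,
        "type": "groundtruth/image-classification",
        "human-annotated": "yes",
    },
})
print(read_single_label(sample_line, "sport"))
```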

Image classification (Multi-label)

In this task, workers assign one or more classes to each image.
In this example, we choose all of the labels that are present in the image.
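For a multi-label job, the label attribute holds a list of class ids rather than a single id. The sketch below resolves those ids to class names. It assumes the metadata carries a `class-map` from id to name, which is how I recall the output format. The attribute name `objects` and the class names are hypothetical.

```python
import json


def read_multi_label(line, label_attr):
    """Return the list of class names attached to one image."""
    record = json.loads(line)
    class_map = record[f"{label_attr}-metadata"]["class-map"]
    # JSON object keys are strings, so the integer ids are converted before lookup.
    return [class_map[str(class_id)] for class_id in record[label_attr]]


# Hypothetical output line with two labels on the same image.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/img_002.jpg",
    "objects": [1, 3],
    "objects-metadata": {
        "class-map": {"1": "dog", "3": "ball"},
        "type": "groundtruth/image-classification-multilabel",
    },
})
print(read_multi_label(sample_line, "objects"))
```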

Bounding box

In this task, workers draw bounding boxes around specified objects in the images.
In this example, we mark the location of the birds in the image by drawing a bounding box around each of them.
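A bounding box is commonly stored as the position of its top-left corner plus a width and height, and to my knowledge Ground Truth output uses the keys `left`, `top`, `width` and `height` for this. Many training libraries expect corner coordinates instead, so a small conversion helper is often useful. The annotation values below are invented for illustration.

```python
def to_corners(annotation):
    """Convert a left/top/width/height box to (x_min, y_min, x_max, y_max)."""
    left, top = annotation["left"], annotation["top"]
    return (left, top, left + annotation["width"], top + annotation["height"])


# Hypothetical annotations for two birds in one image.
annotations = [
    {"class_id": 0, "left": 10, "top": 20, "width": 30, "height": 40},
    {"class_id": 0, "left": 100, "top": 50, "width": 25, "height": 35},
]
for annotation in annotations:
    print(to_corners(annotation))
```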

Semantic segmentation

In this task, workers draw pixel-level labels over specific objects and segments in the image.
In this example, we classify each pixel in the image: the pixels belonging to the plane are coloured red and the rest are black.
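To make the idea of per-pixel labels concrete, here is a toy sketch that treats a segmentation mask as a grid of class ids and maps each id to a colour. It is independent of how the service stores its results. The class ids and palette are assumptions chosen to mirror the example: the plane in red, everything else in black.

```python
# Hypothetical class ids and colours: 0 = background (black), 1 = plane (red).
PALETTE = {0: (0, 0, 0), 1: (255, 0, 0)}


def colourise(mask):
    """Map a 2-D grid of class ids to a 2-D grid of RGB tuples."""
    return [[PALETTE[class_id] for class_id in row] for row in mask]


def pixel_counts(mask):
    """Count how many pixels belong to each class id."""
    counts = {}
    for row in mask:
        for class_id in row:
            counts[class_id] = counts.get(class_id, 0) + 1
    return counts


# A tiny 3x4 mask with a 2x2 "plane" in the middle.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(colourise(mask)[1][1])
print(pixel_counts(mask))
```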

Label verification

In this task, workers verify existing labels in the dataset. This can be used to check prior work by human workers or by automated labelling jobs.
In this example, we verify whether the car's label is correct or incorrect.
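When several workers verify the same label, their answers have to be combined into one verdict. The sketch below uses a simple majority vote. This is an illustrative assumption, not necessarily the consolidation method Ground Truth applies, and the vote values are hypothetical.

```python
from collections import Counter


def majority_verdict(votes):
    """Combine 'correct'/'incorrect' votes into one verdict, or 'undecided' on a tie."""
    counts = Counter(votes)
    correct, incorrect = counts["correct"], counts["incorrect"]
    if correct == incorrect:
        return "undecided"
    return "correct" if correct > incorrect else "incorrect"


# Hypothetical votes from three workers on the car label.
print(majority_verdict(["correct", "correct", "incorrect"]))
```

Using an odd number of workers per image avoids ties in a two-way vote like this one.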
