Amazon Machine Learning Key Concepts
Data sources
| Term | Definition |
|---|---|
| Attribute | A unique, named property within an observation. In tabular-formatted data such as spreadsheets or CSV files, the column headings represent the attributes and the rows contain the values for each attribute. |
| Datasource Name | An optional, human-readable name that you give a datasource so that you can find and manage it. |
| Input Data | Collective name for all the observations that are referred to by a datasource. |
| Location | The location of the input data. Amazon ML can use data that is stored in Amazon S3 buckets, Amazon Redshift databases, or MySQL databases in Amazon Relational Database Service (RDS). |
| Observation | A single data point that is part of a datasource. |
| Schema | The information needed to interpret the input data, including attribute names and their assigned data types, and names of special attributes. |
| Statistics | Summary statistics for each attribute in the input data. |
| Status | Indicates the current state of the datasource, such as In Progress, Completed, or Failed. |
| Target Attribute | The attribute whose value a trained ML model predicts. |
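
To make the Schema and Target Attribute entries concrete, here is a minimal sketch of a schema for a CSV datasource, built as a Python dictionary and serialized to JSON. The attribute names (`customer_id`, `age`, and so on) are hypothetical and chosen only for illustration; the key names and the four attribute types are the ones I understand the Amazon ML schema format to use, so verify them against the official documentation before relying on them.

```python
import json

# Hypothetical attributes for an illustrative customer dataset.
# Assumption: key names and attribute types follow the Amazon ML schema format.
schema = {
    "version": "1.0",
    "targetAttributeName": "will_subscribe",  # the attribute the model predicts
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "customer_id", "attributeType": "CATEGORICAL"},
        {"attributeName": "age", "attributeType": "NUMERIC"},
        {"attributeName": "job_description", "attributeType": "TEXT"},
        {"attributeName": "will_subscribe", "attributeType": "BINARY"},
    ],
}

# The target attribute must be one of the declared attributes.
attribute_names = [a["attributeName"] for a in schema["attributes"]]
target_is_declared = schema["targetAttributeName"] in attribute_names

# Serialize the schema so it can be stored next to the input data.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

A `BINARY` target, as here, corresponds to a binary ML model; a `NUMERIC` target would correspond to regression and a `CATEGORICAL` target to multiclass.
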
ML Models
| Term | Definition |
|---|---|
| Regression | An ML model that predicts a numeric value. |
| Multiclass | An ML model that predicts values that belong to a limited, pre-defined set of permissible values. |
| Binary | An ML model that predicts values that can have only one of two states. |
| Model Size | ML models capture and store patterns. The more patterns an ML model stores, the bigger it will be. ML model size is measured in megabytes (MB). |
| Number of Passes | The number of times that you let Amazon ML use the same data records during training is called the number of passes. |
| Regularization | Regularization is a machine learning technique that you can use to obtain higher-quality models, typically by penalizing overly complex models to reduce overfitting. |
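
The roles of the number of passes and of regularization can be illustrated with a toy training loop. The sketch below is not Amazon ML's implementation; it is a plain stochastic gradient descent for a one-feature linear model, with an L2 penalty on the weight. The function name, the learning rate, and the data are all assumptions made for the example.

```python
def train_linear_sgd(xs, ys, passes=10, learning_rate=0.02, l2=0.0):
    """Fit y ~ w * x + b with per-record SGD and an L2 penalty on w."""
    w, b = 0.0, 0.0
    for _ in range(passes):          # each pass reuses every data record once
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            # Gradient of the squared error plus the L2 penalty on the weight.
            w -= learning_rate * (error * x + l2 * w)
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w_few, b_few = train_linear_sgd(xs, ys, passes=1)
w_many, b_many = train_linear_sgd(xs, ys, passes=500)
w_reg, b_reg = train_linear_sgd(xs, ys, passes=500, l2=0.5)

print("1 pass:           ", w_few, b_few)
print("500 passes:       ", w_many, b_many)
print("500 passes + L2:  ", w_reg, b_reg)
```

More passes let the model settle closer to the underlying pattern, while the L2 penalty pulls the weight toward zero. On noisy real-world data that shrinkage is what helps a model generalize instead of memorizing its training records.
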
Evaluations
| Term | Definition |
|---|---|
| Model Insights | Amazon ML provides you with a metric to evaluate the predictive performance of your model. |
| Precision | The fraction of positive-class predictions that actually belong to the positive class. |
| Recall | The fraction of all positive examples in the dataset that the model correctly predicts as positive. |
| AUC | Area Under the ROC Curve (AUC) measures the ability of a binary ML model to predict a higher score for positive examples as compared to negative examples. |
| Accuracy | Accuracy measures the percentage of correct predictions. |
| F1-score | The macro-averaged F1-score is used to evaluate the predictive performance of multiclass ML models. |
| RMSE | The Root Mean Square Error (RMSE) is a metric used to evaluate the predictive performance of regression ML models. |
| Cut-off | The cut-off is the score threshold that converts a binary ML model's numeric prediction score into a label: scores above the cut-off are predicted as 1 and scores below it as 0. |
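
The metrics in the table above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code Amazon ML runs; the function names and sample data are assumptions, and treating a score exactly equal to the cut-off as positive is a choice made here for simplicity.

```python
import math


def confusion_counts(y_true, scores, cutoff=0.5):
    """Apply the cut-off to the scores and count TP, FP, TN, FN."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= cutoff else 0  # assumption: ties are positive
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def f1_score(precision, recall):
    return safe_div(2 * precision * recall, precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (multiclass models)."""
    per_class = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        per_class.append(f1_score(safe_div(tp, tp + fp), safe_div(tp, tp + fn)))
    return sum(per_class) / len(per_class)


def rmse(actual, predicted):
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def auc(y_true, scores):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative binary example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
tp, fp, tn, fn = confusion_counts(y_true, scores, cutoff=0.5)
precision = safe_div(tp, tp + fp)
recall = safe_div(tp, tp + fn)
accuracy = safe_div(tp + tn, tp + fp + tn + fn)
print(precision, recall, accuracy, auc(y_true, scores))
```

Raising the cut-off generally trades recall for precision, because fewer examples are labeled positive. AUC does not depend on the cut-off at all, since it only compares how the scores rank positives against negatives.
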
Batch Predictions
| Term | Definition |
|---|---|
| Output Location | The results of a batch prediction are stored in an S3 bucket output location. |
| Manifest File | This file relates each input data file with its associated batch prediction results. It is stored in the S3 bucket output location. |
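
As a rough sketch of how the manifest file might be used, the snippet below assumes the manifest is a JSON object that maps each input file location to the location of its results. That layout, the bucket name, and the file names are assumptions for illustration; check a real manifest from your own output location before depending on this shape.

```python
import json

# Hypothetical manifest content; in practice you would download the manifest
# from the S3 output location first.
manifest_text = json.dumps({
    "s3://example-bucket/input/batch-1.csv":
        "s3://example-bucket/output/batch-prediction/result/batch-1.csv.gz",
    "s3://example-bucket/input/batch-2.csv":
        "s3://example-bucket/output/batch-prediction/result/batch-2.csv.gz",
})


def results_by_input(text):
    """Return a dict that maps each input file to its prediction results file."""
    mapping = json.loads(text)
    if not isinstance(mapping, dict):
        raise ValueError("expected the manifest to be a JSON object")
    return mapping


lookup = results_by_input(manifest_text)
for input_file, result_file in lookup.items():
    print(input_file, "->", result_file)
```
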
Real-time Predictions
Real-time predictions are for applications with a low latency requirement, such as interactive web, mobile, or desktop applications.
| Term | Definition |
|---|---|
| Real-time Prediction API | The Real-time Prediction API accepts a single input observation in the request payload and returns the prediction in the response. |
| Real-time Prediction Endpoint | To use an ML model with the real-time prediction API, you need to create a real-time prediction endpoint. Once created, the endpoint contains the URL that you can use to request real-time predictions. |
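
A real-time prediction request can be sketched as follows. The helper takes a client object as a parameter so that the example runs without AWS credentials; the stub class, model ID, endpoint URL, and record fields are all hypothetical. With boto3, the equivalent client would be `boto3.client("machinelearning")`, whose `predict` call accepts `MLModelId`, `Record`, and `PredictEndpoint`.

```python
def predict_one(client, model_id, endpoint_url, record):
    """Send a single observation and return the prediction from the response."""
    # Attribute values are sent as strings in the request payload.
    string_record = {name: str(value) for name, value in record.items()}
    response = client.predict(
        MLModelId=model_id,
        Record=string_record,
        PredictEndpoint=endpoint_url,
    )
    return response["Prediction"]


class StubClient:
    """Stand-in for a real client so the sketch runs offline (hypothetical)."""

    def predict(self, MLModelId, Record, PredictEndpoint):
        self.last_record = Record
        return {"Prediction": {"predictedLabel": "1",
                               "predictedScores": {"1": 0.87}}}


client = StubClient()
prediction = predict_one(
    client,
    model_id="ml-EXAMPLE",                        # hypothetical model ID
    endpoint_url="https://realtime.example.com",  # hypothetical endpoint URL
    record={"age": 42, "job_description": "engineer"},
)
print(prediction["predictedLabel"], prediction["predictedScores"])

# With a real client (not run here), the endpoint is created once and reused:
#   import boto3
#   client = boto3.client("machinelearning")
#   client.create_realtime_endpoint(MLModelId="ml-EXAMPLE")
#   url = client.get_ml_model(MLModelId="ml-EXAMPLE")["EndpointInfo"]["EndpointUrl"]
```

Passing the client in, instead of creating it inside the helper, keeps the request logic easy to test and makes the dependency on the endpoint URL explicit.
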
AWS WhitePaper Summary