Understanding the concepts underpinning machine learning is critical as technological advances put powerful decision-support tools within professionals' reach.
What is Supervised Learning?
Simply put, supervised learning involves teaching a model using labeled data so that it can later predict the labels for new, unseen data.
Labeled data comprises raw data annotated with informative labels that serve as the basis for training an ML model. The labels give the model the context it needs to learn the underlying patterns so it can correctly predict an output for new data fed into it. This process is known as model training.
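To make the idea of "training on labeled data" concrete, here is a minimal from-scratch sketch: a nearest-centroid classifier that learns one mean feature vector per label, then predicts by picking the closest centroid. The toy dataset and labels are made up for illustration.

```python
# Minimal sketch of training on labeled data: a nearest-centroid classifier.
# The (features, label) pairs below are hypothetical toy data.

def train(samples):
    """Learn one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in vec]
            for label, vec in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=dist2)

# Labeled training data: (features, label) pairs
labeled = [([1.0, 1.0], "cat"), ([1.2, 0.8], "cat"),
           ([5.0, 5.0], "dog"), ([4.8, 5.2], "dog")]
model = train(labeled)
print(predict(model, [1.1, 0.9]))  # a point near the "cat" cluster
```

The "model" here is just the dictionary of centroids; real algorithms learn richer representations, but the fit-then-predict workflow is the same.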
Understanding Classification in Supervised Learning
Supervised learning comprises two broad techniques: Classification and Regression.
Classification deals with data assigned to specific categories or classes, unlike regression, which deals with numerical data to make continuous predictions.
Types of Classification with Examples
- Binary classification - Sorts data into two categories or classes. For instance, email spam filtering ("Spam" or "Not Spam").
- Multiclass classification - Sorts data into more than two classes or categories. For example, an image recognition model classifying images using labels such as bus, car, and motorcycle.
- Multilabel classification - Assigns multiple labels to a single data point at the same time. For instance, content recommendation algorithms that tag a single song with multiple genres.
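The difference between the three types shows up most clearly in the shape of the output. The toy rule-based "classifiers" below are contrived stand-ins (the keyword rules, wheel counts, and genre list are all made up), but they illustrate one-of-two, one-of-many, and many-at-once predictions.

```python
# Hedged sketch contrasting the output shapes of the three classification
# types; all rules and labels here are hypothetical.

GENRE_KEYWORDS = {             # multilabel: a song can match several genres
    "rock":  {"guitar", "riff"},
    "jazz":  {"sax", "swing"},
    "dance": {"beat", "club"},
}

def is_spam(subject):          # binary: exactly one of two classes
    return "Spam" if "free money" in subject.lower() else "Not Spam"

def vehicle_class(wheels):     # multiclass: exactly one of many classes
    return {2: "motorcycle", 4: "car", 6: "bus"}.get(wheels, "unknown")

def song_genres(description):  # multilabel: zero or more labels at once
    words = set(description.lower().split())
    return sorted(g for g, kws in GENRE_KEYWORDS.items() if words & kws)

print(is_spam("FREE MONEY inside"))              # one of two labels
print(vehicle_class(4))                          # one of several labels
print(song_genres("swing beat with a sax solo")) # possibly several labels
```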
The most popular classification algorithms include logistic regression, decision tree, random forest, and K-nearest neighbors (KNN). Each model suits certain scenarios and data to provide useful predictions.
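Of the algorithms listed, K-nearest neighbors is simple enough to sketch from scratch: predict the majority label among the k training points closest to the query. The toy dataset below is hypothetical.

```python
# A from-scratch sketch of K-nearest neighbors (KNN), one of the
# algorithms mentioned above. The training data is made-up toy data.

from collections import Counter

def knn_predict(train_data, query, k=3):
    """train_data: list of (features, label) pairs; query: feature list."""
    # Sort training points by squared Euclidean distance to the query
    by_distance = sorted(
        train_data,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    # Take the labels of the k closest points and return the majority vote
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]

train_data = [([1, 1], "A"), ([2, 1], "A"), ([8, 8], "B"),
              ([9, 8], "B"), ([2, 2], "A")]
print(knn_predict(train_data, [1.5, 1.5], k=3))  # all 3 neighbors are "A"
```

KNN has no separate training step at all: the labeled data itself is the model, which is why data quality matters so much for it.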
Notes on Effectively Applying Supervised Learning & My Perspective on Classification
Labeled input data gives the model a reference point for associating inputs with a predicted output. A model's predictions are therefore only as good as the data one feeds into it. A model trained using incomplete, biased, or incorrectly labeled data will yield unreliable results. Such a model cannot achieve the same degree of prediction accuracy as one relying on clean data. Data cleaning and preprocessing can help detect anomalies before model training to optimize the model and improve reliability.
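A minimal sketch of that cleaning-and-preprocessing step might look like this: drop incomplete records, then min-max scale each feature column to the [0, 1] range so no single feature dominates training. The raw records are hypothetical.

```python
# Minimal sketch of cleaning and preprocessing before model training.
# The raw records below are hypothetical toy data.

def clean(rows):
    """Remove records that contain missing (None) values."""
    return [row for row in rows if None not in row]

def min_max_scale(rows):
    """Scale each feature column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) or 1 for c in cols]  # avoid division by zero
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]

raw = [[10, 200], [None, 300], [20, 400]]  # second record is incomplete
cleaned = clean(raw)
scaled = min_max_scale(cleaned)
print(scaled)
```

Real pipelines handle missing values more carefully (imputation rather than dropping, scaling fit on training data only), but the shape of the step is the same.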
Learning about classification and its use cases in real-world applications of supervised learning has been enlightening. However, finding suitable datasets to practice classification on was a major challenge: models fit to overly clean, near-perfect data tend to overfit, which leads to biased conclusions when their predictions are used in decision-making. Practicing applying metrics such as accuracy, precision, recall, F1 score, and the confusion matrix will be crucial going forward.
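All of the metrics just listed fall out of the confusion matrix. A from-scratch sketch for the binary case, using made-up label vectors:

```python
# From-scratch sketch of the binary evaluation metrics mentioned above,
# all derived from the confusion matrix. The label vectors are toy data.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) counts for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of predicted positives, how many were real
recall    = tp / (tp + fn)   # of real positives, how many were found
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```

Accuracy alone can mislead on imbalanced classes, which is exactly why precision, recall, and F1 are worth practicing alongside it.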