Charles

Posted on Apr 13

Machine Learning Basics: What is ML? Supervised vs Unsupervised, Features vs Labels

#datascience #machinelearning #beginners #tutorial

In this article, we will cover:

What ML really is
The difference between supervised and unsupervised learning
What features and labels are – and why they matter

What is machine learning?

Machine learning (ML) is the subset of artificial intelligence (AI) focused on algorithms that can learn the patterns of training data and, subsequently, make accurate inferences about new data. This pattern recognition ability enables machine learning models to make decisions or predictions without explicit, hard-coded instructions.

Examples of Machine Learning

1. Virtual and voice assistants.
ML powers popular virtual assistants like Amazon Alexa and Apple Siri. It enables speech recognition, natural language processing (NLP), and text-to-speech conversion. When you ask a question, ML not only understands your intent but also searches for relevant answers or recalls similar past interactions for more personalized responses.

2. Email Filtering and Management.
ML algorithms in Gmail automatically categorize emails into Primary, Social, and Promotions tabs while detecting and moving spam to the spam folder. Beyond basic rules, ML tools classify incoming emails, route them to the right team members, extract attachments, and enable automated personalized replies.

3. Transportation and Navigation.

Machine Learning has transformed modern transportation in several ways:

Google Maps uses ML to analyze real-time traffic conditions, calculate the fastest routes, suggest nearby places to explore, and provide accurate arrival time predictions.
Ride-sharing apps like Uber and Bolt apply ML to match riders with drivers, dynamically set pricing (surge pricing), optimize routes based on live traffic, and predict accurate ETAs.
Self-driving cars (e.g., Tesla) rely heavily on computer vision and ML models, many of them trained with supervised learning on labeled driving data. These systems process data from cameras and sensors in real time to understand their surroundings and make instant driving decisions.

Types of machine learning

This article covers the two most common learning paradigms: Supervised Learning and Unsupervised Learning (reinforcement learning, a third paradigm, is outside its scope). The two differ in the type of data they use and the objective they aim to achieve.

1. Supervised Learning

Supervised learning trains a model using labeled data — where every input example is paired with the correct output (label). The goal is to learn the mapping between inputs and outputs so the model can accurately predict outcomes on new, unseen data.

Common Tasks:

Classification — Predict discrete categories (e.g., spam/not spam, cat/dog, approve/reject loan)
Regression — Predict continuous values (e.g., house price, temperature, sales forecast)
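
To make the difference concrete, here is a minimal sketch using a nearest-neighbor approach in plain Python. The datasets, the function names (`classify`, `estimate`), and the choice of nearest neighbors are all assumptions made for illustration; the point is only that classification returns a category while regression returns a number.

```python
def nearest_neighbors(query, examples, k):
    """Return the k labeled examples whose feature value is closest to the query."""
    return sorted(examples, key=lambda pair: abs(pair[0] - query))[:k]

# Classification: the labels are discrete categories.
# Feature = number of exclamation marks in an email (made-up data).
emails = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
          (7, "spam"), (8, "spam"), (9, "spam")]

def classify(query, k=3):
    labels = [label for _, label in nearest_neighbors(query, emails, k)]
    return max(set(labels), key=labels.count)  # majority vote among neighbors

# Regression: the labels are continuous values.
# Feature = house size in square meters, label = price in millions of Kshs (made-up data).
houses = [(50, 4.0), (60, 5.0), (70, 6.0), (120, 11.0), (130, 12.0)]

def estimate(query, k=2):
    values = [value for _, value in nearest_neighbors(query, houses, k)]
    return sum(values) / len(values)  # average of the neighbors' labels

print(classify(6))    # spam
print(estimate(65))   # 5.5
```

Both functions use the same lookup; only the type of label, and how the neighbors' labels are combined, changes.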

How it works:
In supervised learning, the model learns from examples where the answers are already known. It is given inputs (features) together with the correct outputs (labels), and over time it identifies patterns in the data. As it trains, it continuously adjusts itself to reduce the difference between its predictions and the actual answers.
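
The "adjusts itself to reduce the difference" step can be sketched with a tiny linear model trained by gradient descent. This is only one of many training methods, and the data, learning rate, and number of steps below are assumptions chosen so the example is easy to follow.

```python
# Toy labeled data: feature = house size (hundreds of square feet),
# label = price (millions of Kshs). Invented values that follow price = 2 * size + 1.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

def mean_squared_error(w, b):
    """Average squared gap between the model's predictions and the known answers."""
    return sum((w * x + b - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)

w, b = 0.0, 0.0            # model parameters, starting with no knowledge
learning_rate = 0.05       # assumed step size
initial_error = mean_squared_error(w, b)

for _ in range(5000):
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(sizes, prices)) / len(sizes)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(sizes, prices)) / len(sizes)
    # Nudge each parameter in the direction that lowers the error
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_error = mean_squared_error(w, b)
predicted_price = w * 5.0 + b   # prediction for a size the model has not seen

print(round(w, 3), round(b, 3))     # 2.0 1.0
print(round(predicted_price, 3))    # 11.0
```

After training, the learned parameters match the pattern hidden in the labeled examples, and the model can price a house size it was never shown.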

Real-world examples:

Spam detection
Image classification
Credit risk scoring

Analogy:
Think of a student learning with a teacher. The teacher shows examples and clearly labels them — “this is a cat,” “this is a dog.” Over time, the student begins to recognize the differences and can correctly identify new animals on their own.

2. Unsupervised Learning

Unsupervised learning works with unlabeled data. The model must discover hidden patterns, structures, or groupings on its own — without any “correct answers” provided.

Common tasks:

Clustering — grouping similar data points together (e.g., customer segmentation)
Association — finding relationships in data (e.g., people who buy X also buy Y)
Dimensionality reduction — simplifying data while keeping the most important information
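
As a rough illustration of clustering, the sketch below implements a basic k-means loop in plain Python. The shopper data is invented, and starting from the first and last points is an assumption made to keep the result repeatable (real implementations usually pick starting points more carefully).

```python
# Unlabeled data: (visits per month, average spend) for eight shoppers. Invented values.
shoppers = [(1, 10), (2, 12), (1, 15), (2, 9), (9, 80), (10, 95), (8, 85), (9, 90)]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centroids, iterations=10):
    """Basic k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Assumed deterministic start: the first and last shoppers act as initial centroids
clusters, centroids = k_means(shoppers, [shoppers[0], shoppers[-1]])

print(centroids)   # [(1.5, 11.5), (9.0, 87.5)]
```

No labels were provided, yet the algorithm separates occasional low spenders from frequent high spenders on its own.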

Real-world examples:

Customer segmentation in retail (grouping shoppers based on buying habits)
Fraud detection in mobile money or banking (flagging unusual transactions)
Product recommendations on e-commerce sites (suggesting items similar to what you’ve viewed)
Music or movie suggestions based on what you like (Spotify, Prime Video)
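
To give a feel for how "flagging unusual transactions" can work without labels, here is a minimal sketch that scores each amount by its distance from the median. The transaction amounts and the threshold of 5 are assumptions for illustration; production fraud systems are far more involved.

```python
import statistics

# Unlabeled transaction amounts in Kshs (invented values)
amounts = [120, 95, 130, 110, 105, 98, 125, 5000, 115, 102]

# Typical value and typical spread, both based on the median so that
# a single extreme amount does not distort them
median = statistics.median(amounts)
mad = statistics.median([abs(a - median) for a in amounts])  # median absolute deviation

threshold = 5.0  # assumed cut-off: how many "typical spreads" away counts as unusual
flagged = [a for a in amounts if abs(a - median) / mad > threshold]

print(flagged)   # [5000]
```

Nobody told the code which transaction was suspicious; it was flagged only because it sits far from the rest of the data.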

Supervised vs Unsupervised Learning

Aspect	Supervised Learning	Unsupervised Learning
Data used	Labeled (features + answers)	Unlabeled (just features, no answers)
Goal	Predict an output / category	Find hidden patterns or groupings
Task types	Classification & regression	Clustering, association, dimensionality reduction
Evaluation	Easier – you have ground truth to compare against	Trickier – no "right answer" to check against
Real‑world examples	Spam detection, price prediction	Customer segments, fraud detection
Complexity	Generally simpler to train and validate	Often harder to interpret (no labels to guide learning)

Key Takeaway:

Use Supervised Learning when you have labeled historical data and want to make predictions.
Use Unsupervised Learning when you have lots of raw data and want to discover insights or patterns you didn’t already know.

Modern systems often combine both. For example, many Large Language Models (LLMs) use self‑supervised learning during pre‑training, followed by supervised fine‑tuning and RLHF (reinforcement learning from human feedback).

Features vs Labels

If you're doing supervised learning, you'll run into two terms constantly: features and labels. Here's what they actually mean.

What is a Feature?
A feature is any piece of information you feed the model – a clue that helps it make a prediction. Features are also called independent variables, predictors, or attributes.

Examples of Features:

In house price prediction: square footage, number of bedrooms
In spam detection: length of email, number of capital letters

Features can be numerical (age, price), categorical (gender, color), or text-based.
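
Most models expect numbers, so categorical features are usually converted before training. The sketch below shows one common approach, one-hot encoding, in plain Python. The field names (`size_sqft`, `bedrooms`, `location`) and the sample houses are hypothetical.

```python
# Raw records mixing numerical and categorical features (invented data)
houses = [
    {"size_sqft": 1200, "bedrooms": 3, "location": "Nairobi"},
    {"size_sqft": 800, "bedrooms": 2, "location": "Mombasa"},
    {"size_sqft": 950, "bedrooms": 2, "location": "Kisumu"},
]

# Fixed, sorted list of every category seen in the data
categories = sorted({house["location"] for house in houses})

def to_feature_vector(house):
    """Turn one record into a list of numbers the model can consume."""
    # One-hot encoding: 1.0 in the slot for this house's location, 0.0 elsewhere
    one_hot = [1.0 if house["location"] == c else 0.0 for c in categories]
    return [float(house["size_sqft"]), float(house["bedrooms"])] + one_hot

print(categories)                     # ['Kisumu', 'Mombasa', 'Nairobi']
print(to_feature_vector(houses[0]))   # [1200.0, 3.0, 0.0, 0.0, 1.0]
```

Numerical features pass through unchanged, while the single categorical feature becomes three yes/no columns.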

What is a Label?

A label is the output the model tries to predict: the correct answer for a given example. It is also called the target or dependent variable.

Examples of Labels:

In house price prediction: actual sale price (Kshs)
In spam detection: “Spam” or “Not Spam”

Labels are only used in supervised learning, where they serve as the ground truth the model's predictions are compared against.

Features vs Labels – Quick Comparison

Aspect	Features (the inputs)	Label (the answer)
What it is	What the model uses to learn	What the model tries to guess
Other names	Independent variables, predictors	Target variable, dependent variable
Do you always have it?	Yes – in any dataset	Only in supervised learning
House price example	Size, bedrooms, location	The price tag

Key Takeaway:

Features = clues. Label = the answer.
When preparing data for a supervised model, split it into X (features) and y (label).
Garbage in, garbage out: bad features or wrong labels will ruin your model.
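
Here is a minimal sketch of the X / y split in plain Python, including one simple guard against bad labels. The column names and values are hypothetical, and dropping rows with a missing label is just one possible way to handle them.

```python
# A small table of houses; the last column is the label (invented data)
rows = [
    {"size_sqft": 1200, "bedrooms": 3, "price_kshs": 8_500_000},
    {"size_sqft": 800, "bedrooms": 2, "price_kshs": 5_200_000},
    {"size_sqft": 1500, "bedrooms": 4, "price_kshs": None},   # label is missing
    {"size_sqft": 950, "bedrooms": 2, "price_kshs": 6_100_000},
]

feature_names = ["size_sqft", "bedrooms"]   # the clues
label_name = "price_kshs"                   # the answer

# Guard against "garbage in": drop rows that have no label to learn from
clean_rows = [row for row in rows if row[label_name] is not None]

# X holds one list of feature values per example; y holds the matching labels
X = [[row[name] for name in feature_names] for row in clean_rows]
y = [row[label_name] for row in clean_rows]

print(X)   # [[1200, 3], [800, 2], [950, 2]]
print(y)   # [8500000, 5200000, 6100000]
```

Each entry in X lines up with the entry at the same position in y, which is the shape most supervised learning libraries expect.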

Conclusion

Machine Learning lets computers learn from data without hard‑coded rules.
Supervised learning uses labeled data to predict outcomes (spam detection, prices).
Unsupervised learning finds hidden patterns in unlabeled data (customer segments, fraud).
Features are the clues you feed the model. Labels are the answers you want to predict.