Supervised learning is a type of machine learning that learns the relationship between inputs and outputs.
It is defined by its use of labeled data: a dataset containing many examples of features paired with their targets.
- Input: better known as the features, or x-variables
- Output: referred to as the target, or y-variables
Labeled data contains both the features and the target for every example in the dataset.
Labeled data is the key difference between supervised and unsupervised machine learning.
The process of learning the relationship between features and target from the dataset is called training or fitting.
Each input is paired with the correct output.
The goal is to make accurate predictions when given new unseen data.
The two types of supervised learning algorithms are Classification and Regression
Classification: Where the output is a categorical variable (such as spam vs. non-spam emails, yes vs. no).
Regression: Where the output is a continuous variable (such as predicting house prices, stock prices).
CLASSIFICATION
Here, algorithms learn from the data to predict an outcome or event in the future.
Consider a bank that wants to know whether a customer will default on a loan. The historical data on that customer would include:
- Features: attributes of the customer such as credit history, loans, and investments.
- Target: whether that customer has defaulted in the past.
Represented by |
---|
1 or 0 |
True or False |
Yes or No |
Classification algorithms are used to predict discrete outcomes.
If the outcome can take two possible values, it is Binary Classification.
If the outcome can take more than two possible values, it is Multiclass Classification.
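As a sketch, a binary classifier can be as simple as a hand-written threshold rule. The feature values and the 600-point cutoff below are invented for illustration; a real model would learn the rule from the labeled data rather than have it hard-coded.

```python
# Minimal sketch of binary classification with a hand-written threshold rule.
# The credit scores and the 600-point cutoff are made-up illustrative numbers.

def predict_default(credit_score: float) -> int:
    """Return 1 (will default) or 0 (will not) from a single feature."""
    return 1 if credit_score < 600 else 0

# Labeled historical data: (credit_score, actual_outcome) pairs.
history = [(720, 0), (580, 1), (650, 0), (540, 1)]

predictions = [predict_default(score) for score, _ in history]
print(predictions)  # [0, 1, 0, 1]
```

Because the outcome takes only two values (1 or 0), this is binary classification; adding a third label such as "late payer" would make it multiclass.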
REGRESSION
It is a type of supervised ML where algorithms learn from the data to predict continuous values like salary, temperature, or weight.
HOW SUPERVISED MACHINE LEARNING WORKS
It works through two key concepts: training data and the learning process.
CONCEPT | DESCRIPTION |
---|---|
Training Data | The model is provided with a training dataset that includes features (input data) and corresponding target variables (output data) |
Learning Process | The algorithm processes the training data, learning the relationships between the input features and the output labels |
After training, the model is evaluated on a test dataset to measure its accuracy and performance.
The model's performance is then optimized by adjusting parameters, using cross-validation techniques to balance bias and variance.
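The train-then-evaluate workflow can be sketched in plain Python with a toy "learned threshold" model. The data and the midpoint-between-class-means rule are invented for illustration; it only shows the shape of the workflow, not cross-validation itself.

```python
# Sketch of the supervised workflow: learn from training data, then
# measure accuracy on held-out test data. All numbers are synthetic.

train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
test = [(2.5, 0), (8.5, 1), (4.0, 0)]

# "Training": place a decision threshold midway between the class means.
mean0 = sum(x for x, y in train if y == 0) / 3  # mean of class-0 features = 2.0
mean1 = sum(x for x, y in train if y == 1) / 3  # mean of class-1 features = 8.0
threshold = (mean0 + mean1) / 2                 # = 5.0

def predict(x):
    return 1 if x > threshold else 0

# Evaluation: fraction of test examples predicted correctly.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)  # 1.0 on this tiny synthetic test set
```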
Different Model Algorithms under Supervised Learning.
LINEAR REGRESSION
Used to predict a continuous value (the dependent variable) based on the features (independent variables) in the training dataset.
Linear regression is best known for the line of best fit: the value of the dependent variable, which represents the effect, is influenced by changes in the values of the independent variables.
It is easy to understand, interpret and performs well for linearly separable data.
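A minimal least-squares sketch of the line of best fit, using made-up points that lie exactly on y = 2x so the recovered slope and intercept are easy to check:

```python
# Fit y = a*x + b by ordinary least squares, from scratch.
# The data points are invented and lie exactly on y = 2x.

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope = covariance(x, y) / variance(x); intercept from the means.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)  # 2.0 0.0
```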
LOGISTIC REGRESSION
Classification algorithm used to predict binary outcomes.
The target variable (y) is categorical such as (yes/no, 0/1, True or False)
Logistic regression uses the logit function to predict the probability that a binary event will occur.
It has difficulty capturing complex, non-linear relationships.
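A quick sketch of how the logistic (sigmoid) function turns a linear score into a probability. The coefficients w and b here are hypothetical, not fitted:

```python
import math

# The sigmoid squashes any real number into (0, 1), so the model's
# output can be read as the probability of the positive class.
def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients for a one-feature model: score z = w*x + b.
w, b = 1.5, -3.0

def predict_proba(x: float) -> float:
    return sigmoid(w * x + b)

print(predict_proba(2.0))         # z = 0, so probability is exactly 0.5
print(predict_proba(5.0) > 0.5)   # True: well past the decision boundary
```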
DECISION TREE
It is a supervised learning algorithm used for classification and regression tasks.
It continuously splits the data in order to categorize or make predictions, depending on the answers to the previous set of questions.
Decision trees do not require a lot of data preparation like some linear models require.
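A decision tree is essentially a cascade of yes/no questions. This hand-written two-level tree mirrors the bank's loan-default scenario; the feature names and thresholds are invented, whereas a trained tree would learn its splits from data:

```python
# A two-level decision "tree" written as nested if/else questions.
# Features and thresholds are illustrative, not learned.

def will_default(credit_score: int, existing_loans: int) -> bool:
    if credit_score < 600:        # first split
        return True
    if existing_loans > 3:        # second split, on another feature
        return True
    return False

print(will_default(550, 0))  # True  (low credit score)
print(will_default(700, 5))  # True  (too many existing loans)
print(will_default(700, 1))  # False
```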
K NEAREST NEIGHBORS
Statistical method that evaluates the proximity of one data point to another so as to decide whether or not the two data points can be grouped together.
The proximity of the data points represents the degree to which they are comparable to one another.
KNN is used in both classification and regression tasks, but it is sensitive to noisy data.
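A pure-Python sketch of KNN classification for one-dimensional points with k = 3; the training points and labels are synthetic:

```python
from collections import Counter

# Synthetic labeled points: class "A" clusters near 1-2, class "B" near 8-9.
train = [(1.0, "A"), (1.5, "A"), (2.0, "A"),
         (8.0, "B"), (8.5, "B"), (9.0, "B")]

def knn_predict(x: float, k: int = 3) -> str:
    # Sort training points by distance to x and keep the k closest.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(1.2))  # "A": its three nearest neighbors are all class A
print(knn_predict(8.7))  # "B": its three nearest neighbors are all class B
```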
RANDOM FOREST
A tree-based algorithm, similar to decision trees.
A decision tree consists of a single tree, whereas a random forest employs a number of decision trees to make its judgments.
Can be used for both regression and classification.
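The forest's judgment is a majority vote over its trees. In this sketch, three toy threshold rules (invented, not trained) stand in for trees that would normally be grown on different random subsets of the data:

```python
from collections import Counter

# Three toy "trees", each a different hand-written rule on invented features.
def tree1(score, loans): return score < 600
def tree2(score, loans): return loans > 3
def tree3(score, loans): return score < 650 and loans > 1

def forest_predict(score, loans) -> bool:
    # The forest's answer is the majority vote of its trees.
    votes = [t(score, loans) for t in (tree1, tree2, tree3)]
    return Counter(votes).most_common(1)[0][0]

print(forest_predict(550, 4))  # True: all three trees vote "default"
print(forest_predict(700, 0))  # False: all three vote "no default"
```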
GRADIENT BOOSTING
An ensemble learning method utilized for classification and regression tasks to improve prediction accuracy.
It utilizes a boosting algorithm: it combines multiple weak learners to create a strong predictive model, training models sequentially so that each model tries to correct the errors made by its predecessor.
BOOSTING is an ensemble learning technique that combines multiple weak classifiers to create a strong classifier.
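The sequential error-correction idea can be sketched for regression: stage one makes a crude prediction, and stage two fits the residuals (errors) that stage one left behind. The data and split point are invented:

```python
# Boosting sketch for regression: each stage fits the residuals of the
# stages before it. Data follows y = 2x + 1; learners are deliberately weak.

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

# Stage 1: a very weak learner that always predicts the mean.
f1 = sum(ys) / len(ys)                                          # 6.0

# Stage 2: fit the remaining errors with a simple split at x = 2.5.
residuals = [y - f1 for y in ys]                                # [-3, -1, 1, 3]
left = sum(r for x, r in zip(xs, residuals) if x <= 2.5) / 2    # -2.0
right = sum(r for x, r in zip(xs, residuals) if x > 2.5) / 2    #  2.0

def boosted(x):
    return f1 + (left if x <= 2.5 else right)

# [4.0, 4.0, 8.0, 8.0]: closer to the true ys than f1's flat 6.0 alone.
print([boosted(x) for x in xs])
```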
NAIVE BAYES
It is based on applying Bayes' theorem with the naive assumption that features are independent of each other given the class label.
BAYES' THEOREM is a mathematical formula used to determine the conditional probability of an event based on prior knowledge and new evidence.
- Bayes theorem adjusts probabilities when new information comes in and helps make better decisions in uncertain situations.
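A worked example of Bayes' theorem with invented spam-filter numbers, updating the prior probability of spam after seeing new evidence (the word "offer"):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# A = "email is spam", B = "email contains the word 'offer'".
# All probabilities below are invented for illustration.

p_spam = 0.2                # prior: 20% of all email is spam
p_offer_given_spam = 0.60   # 'offer' appears in 60% of spam
p_offer_given_ham = 0.05    # ...but in only 5% of non-spam

# Total probability of seeing 'offer' at all (law of total probability).
p_offer = (p_offer_given_spam * p_spam
           + p_offer_given_ham * (1 - p_spam))          # 0.16

# Posterior: probability the email is spam, given it contains 'offer'.
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(p_spam_given_offer)  # 0.75: the evidence raised 0.2 to 0.75
```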
FINAL THOUGHTS
In supervised machine learning, the expected solution to a problem may not be known for future data, but it may be known and captured in a historical dataset. The task of supervised learning algorithms is to learn that relationship from historical data in order to predict an outcome, event, or value in the future.