DEV Community: Oluwafemi Paul Adeyemi

Ensemble Models: A Comprehensive Overview

Oluwafemi Paul Adeyemi — Fri, 11 Jul 2025 04:08:27 +0000

Ensemble models are a class of machine learning algorithms that combine the predictions of multiple base models to improve overall performance and robustness. By leveraging the strengths of individual models, ensemble methods can often achieve better results than any single model.

Types of Ensemble Models

Bagging: Bagging involves training multiple instances of the same model on different subsets of the training data. The final prediction is typically made by averaging or voting the predictions of individual models.
Boosting: Boosting involves training models sequentially, with each subsequent model focusing on the errors of the previous model. The final prediction is made by combining the predictions of individual models.
Stacking: Stacking involves training a meta-model to make predictions based on the predictions of multiple base models.

Benefits of Ensemble Models

Improved accuracy: Ensemble models can often achieve better performance than individual models by reducing overfitting and improving generalization.
Robustness: Ensemble models can be more robust to outliers and noisy data by averaging out the predictions of individual models.
Handling complex data: Ensemble models can handle complex data by combining the strengths of individual models.

Popular Ensemble Algorithms

Random Forest: A bagging-based ensemble algorithm that combines multiple decision trees to improve performance and robustness.
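Gradient Boosting Machines (GBM): A boosting-based ensemble algorithm that adds weak models (typically shallow decision trees) one at a time, each fitted to the residual errors (the gradient of the loss) of the ensemble built so far.
AdaBoost: A boosting-based ensemble algorithm that reweights the training samples after each round so that the next weak model concentrates on the samples misclassified so far, then combines the weak models by a weighted vote.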

Applications of Ensemble Models

Classification: Ensemble models can be used for classification tasks, such as image classification, text classification, and sentiment analysis.
Regression: Ensemble models can be used for regression tasks, such as predicting continuous outcomes.
Feature selection: Ensemble models can be used for feature selection by evaluating the importance of individual features.

Challenges and Limitations

Computational complexity: Ensemble models can be computationally expensive to train and evaluate.
Overfitting: Ensemble models can suffer from overfitting if the individual models are overfitting to the training data.
Interpretability: Ensemble models can be difficult to interpret due to the complexity of the individual models and the ensemble structure.

Example Python Code

#Import necessary libraries
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

#Generate a sample classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=3, random_state=42)

#Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

#Define individual models
model1 = LogisticRegression()
model2 = RandomForestClassifier()
model3 = SVC(probability=True)

#Create an ensemble model using VotingClassifier
ensemble = VotingClassifier(estimators=[('lr', model1), ('rf', model2), ('svm', model3)])

#Train individual models and the ensemble model
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)
ensemble.fit(X_train, y_train)

#Make predictions using individual models and the ensemble model
y_pred1 = model1.predict(X_test)
y_pred2 = model2.predict(X_test)
y_pred3 = model3.predict(X_test)
y_pred_ensemble = ensemble.predict(X_test)

#Evaluate the performance of individual models and the ensemble model
print("Accuracy of Logistic Regression:", accuracy_score(y_test, y_pred1))
print("Accuracy of Random Forest:", accuracy_score(y_test, y_pred2))
print("Accuracy of SVM:", accuracy_score(y_test, y_pred3))
print("Accuracy of Ensemble Model:", accuracy_score(y_test, y_pred_ensemble))

This code demonstrates the use of ensemble models by combining the predictions of logistic regression, random forest, and support vector machine (SVM) models. The ensemble model is created using the VotingClassifier class from scikit-learn, which combines the predictions of individual models using voting.
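To make the voting step concrete, here is a minimal sketch of majority ("hard") voting in plain Python, which is the default mode of VotingClassifier. The function name hard_vote and the prediction lists are made up for illustration, and the tie-breaking shown here is simply whatever Counter does, so scikit-learn's own rule may differ.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Return the majority label for each sample.

    predictions_per_model: list of equal-length label lists, one per model.
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):
        # most_common(1) gives the label with the highest count for this sample
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted

# Hypothetical predictions from three models on four test samples
lr_preds = [0, 1, 1, 0]
rf_preds = [0, 1, 0, 0]
svm_preds = [1, 1, 0, 0]

ensemble_preds = hard_vote([lr_preds, rf_preds, svm_preds])
print(ensemble_preds)
```

Each sample gets the label that at least two of the three models agree on, which is why one model's mistake can be outvoted by the other two.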

Best Practices

Choose the right ensemble method: Select an ensemble method that is suitable for the problem and data.
Select diverse base models: Select base models that are diverse and complementary to improve the performance of the ensemble.
Tune hyperparameters: Tune the hyperparameters of individual models and the ensemble structure to optimize performance.

By following these best practices and understanding the benefits and limitations of ensemble models, practitioners can effectively use ensemble methods to improve the performance and robustness of their machine learning models.

Unsupervised Learning

Oluwafemi Paul Adeyemi — Sat, 27 Jul 2024 00:03:47 +0000

Unsupervised learning involves a set of algorithms that are used to learn patterns from data without targets (labels). Unsupervised learning does not require that each data point in the dataset be labelled. This is contrary to supervised learning, which requires that each data point have a label, meaning that the dataset consists of features and targets.
Basically, in unsupervised learning, the features are not labelled. That means that there is no target to be predicted, and the interest is in finding patterns in the features. Fundamentally, unsupervised learning involves clustering or dimension reduction, but there are other types of task as well. Dridi ^[1] discussed four types of task that can be carried out in unsupervised learning: clustering, association, anomaly detection and autoencoders. On examination, however, principal component analysis (PCA) and autoencoders are just two approaches to dimensionality reduction, which is itself a task that can be carried out using unsupervised learning. The tasks are therefore discussed below as clustering, association, anomaly detection and dimensionality reduction:

Clustering

The interest is to put the unlabeled data into categories called clusters, such that objects (data points) in a cluster are more similar to one another than to objects in other clusters. A cluster is thus a collection of objects (data points) with similar characteristics; the objects in one collection are dissimilar to objects in other collections. The types of clustering are: partition, hierarchical, overlapping (fuzzy set) and probabilistic clustering.
a. Partition: Here, an object can belong to one and only one cluster.
b. Hierarchical: This starts with every data point as a separate cluster (so a dataset with 1000 data points initially has 1000 clusters) and then iteratively merges clusters that are close to each other until the required number of clusters is reached.
c. Overlapping: Each data point can belong to two or more clusters, with some degree of membership in each cluster.
d. Probabilistic: This method uses probabilities to assign data points to clusters.

Types of partitioning clusters

K-means: This algorithm divides a group of data points into non-overlapping clusters using centroids. It assumes that the clusters are of equal size, that the joint distribution of the features has equal variance, and that the features are independent with similar cluster density^[2]. Basically, K centroids are chosen, and each point is put in the cluster of the centroid it is closest to. Closeness is determined by some measure of distance between points, such as the Euclidean distance. The two steps below then follow: (a) a new centroid is obtained for each cluster by finding the average of the points in the cluster; (b) using the distance measure again, each point is put in the cluster of the new centroid it is closest to ^[2].
Steps (a) and (b) are repeated until the difference between the centroids from one iteration and those from the next is lower than a threshold value.
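The two alternating steps can be sketched in plain Python as follows. This is a minimal illustration, not scikit-learn's implementation: the function name kmeans, the starting centroids, the tolerance and the sample points are all assumptions chosen for the example.

```python
import math

def kmeans(points, centroids, tol=1e-6, max_iter=100):
    """K-means on tuples of floats, starting from the given centroids."""
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: the new centroid is the mean of the points in the cluster
        new_centroids = []
        for k, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        # Stop when no centroid moved by more than the threshold
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, clusters

# Made-up data: two well-separated groups of three points each
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

Here the starting centroids are passed in by hand; in practice they are usually picked at random or by a seeding scheme, and the result can depend on that choice.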

For small K the computation is usually fast, but as K increases, the computation becomes more complex and slower. This algorithm is probably the most popular clustering algorithm in unsupervised learning. There are variants of this algorithm, such as mini-batch K-means ^[2].
Mini-Batch K-Means: This is a variant of K-means which uses subsets of the input data that are randomly sampled in every iteration; these subsets are called mini-batches. This technique saves computation time and may give only slightly worse results than traditional K-means^[2].
Bisecting K-Means: This algorithm starts with a single centroid, which implies a single cluster for the whole data, and then repeatedly splits a cluster into two until the desired number of clusters is reached^[2].
Mean Shift: Here, instead of taking all the centroids at once as in K-means, one centroid is taken together with a circle of radius r around it. Only the points within the radius r are used to compute a new centroid, and as the centroid shifts, the circle shifts with it until it converges, which is the birth of the first cluster. An entirely new centroid is then obtained and goes through the same process, and this continues until the clusters have been found; unlike K-means, the number of clusters does not have to be specified in advance^[2].

Association

The interest here is to find a rule that describes a relationship, e.g. if A is connected to B and B is connected to C, then A is connected to C; or customers who purchase item A also purchase item B, etc. Association includes:
1. Market basket analysis
2. Customer clustering
3. Price bundling
4. Assortment decisions
5. Cross selling e.t.c^[1]
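A rule such as "customers who purchase item A also purchase item B" is usually scored with two standard measures, support and confidence. The source does not name these measures, so treat the sketch below as one common way to quantify such a rule; the baskets are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market baskets, one set of items per customer
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(rule_support, rule_confidence)
```

Support says how often bread and butter appear together at all, while confidence says how often a basket with bread also holds butter.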

Anomaly detection

This is concerned with detecting outliers. It is useful for detecting hacked networks (through anomalous network traffic), military surveillance, data cleaning and crime detection^[1].

Dimensionality reduction

This can be carried out using PCA and autoencoders. Here, the data is reduced to a lower dimension ^[1].

Principal Components Analysis: Principal components analysis (PCA) is a data reduction technique used when there is multicollinearity between the features. The features are reduced to components using the eigenvectors of the variance-covariance matrix. The corresponding eigenvalues measure the variation in the data: the sum of the eigenvalues equals the trace of the variance-covariance matrix, which is the total variation in the data. Hence the proportion of variation explained by a component = eigenvalue of the component / sum of the eigenvalues. PCA can be used for facial recognition, image compression, movie recommendation systems, and optimizing power allocation in various communication systems, etc.
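The relationship between the eigenvalues, the trace and the explained variation can be checked by hand for two features, where the eigenvalues of the 2x2 variance-covariance matrix have a closed form. The helper names and data below are made up for illustration; real PCA code would use a linear algebra library.

```python
import math
from statistics import mean, variance

def covariance_matrix_2d(xs, ys):
    """Return (a, b, d) for the sample variance-covariance matrix [[a, b], [b, d]]."""
    mx, my = mean(xs), mean(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return variance(xs), cov_xy, variance(ys)

def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius

# Made-up, correlated features for illustration
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 5.0]

a, b, d = covariance_matrix_2d(xs, ys)
eigenvalues = eigenvalues_sym_2x2(a, b, d)
total_variation = a + d  # trace of the variance-covariance matrix
explained = [ev / sum(eigenvalues) for ev in eigenvalues]
print(explained)
```

Because the two features are strongly correlated, the first component carries most of the variation, which is the situation in which dropping the second component loses little information.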

Reference

Dridi, S. (2022, April 4). Unsupervised Learning - A Systematic Literature Review. https://doi.org/10.31219/osf.io/kpqr6
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., … Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning (pp. 108–122). (Check the documentation User Guide.)

R is Heavily Functional and Classy too

Oluwafemi Paul Adeyemi — Sat, 19 Aug 2023 22:58:25 +0000

I believe that R was primarily made functional because it was intended to be used by statisticians, who are used to functions. For instance, whereas Python gives a pandas DataFrame object a head method, base R uses a head function to get the first few rows of the dataset being worked on. By base R, I mean the basic code, the ground functionality of R upon which other programs (or packages) can be built.

# Python
data_head = data.head()

# R
data_head = head(data)

This is the case with most of base R's built-in features. However, R also allows for object-oriented programming, meaning that you can create classes and create objects from them. But suppose you wanted to make a class, how do you do that?

First, the easiest way to write a class is to use the reference class.

# R
library(methods)

Vehicle = setRefClass("Vehicle", 
  fields = list(
     name = "character", 
     engine = "character", 
     weight = "numeric"
  ), 
  methods = list(
     horn = function(){
        # some codes
     },
     move = function(){
        # some codes
     }
  )
)
car1 = Vehicle(name = "Toyota", engine="s8", weight=45)
# print the name of the vehicle
car1$name
# Let the vehicle horn
car1$horn()
# Let the vehicle move
car1$move()

Another variant of classes in R is the R6 class illustrated below.

# R
library(R6)

Vehicle <- R6Class("Vehicle",
  public = list(
    name = NULL,
    engine = NULL,
    weight = NULL,
    # initialize is the constructor; it runs when Vehicle$new() is called
    initialize = function(name = NA, engine = NA, weight = NA) {
      self$name <- name
      self$engine <- engine
      self$weight <- weight
      self$horn()
    },
    horn = function() {
      # some codes
    },
    move = function() {
      # some codes
    }
  )
)

car1 = Vehicle$new(name = "Toyota", engine="s8", weight=45)
# print the name of the vehicle
car1$name
# Let the vehicle horn
car1$horn()
# Let the vehicle move
car1$move()

I guess you noticed that in both cases, instead of using the . operator as in data.head() in Python, we use the $ operator in R, as in car1$name, car1$horn(), etc. The major differences are that, in the R6 class, Vehicle$new was used to create a new Vehicle object (a car named Toyota), a public access modifier was used, and an initializer (constructor) was added, while that was not the case with the reference class. An object-oriented version of Shiny (for building web apps) called Tidymodules uses the R6 class in its design. See 1 and 2 for more details on the classes available in R.

R may not have the Python, Java, C++ or C# type of classes, but similar concepts are addressed in its classes. R still very much supports object-oriented programming, even though it is heavily functional.

Scikit-Learn, from Python to JavaScript

Oluwafemi Paul Adeyemi — Sat, 19 Aug 2023 22:22:31 +0000

You may have found Scikit-learn to be a great package for your traditional Machine Learning tasks, especially if you are not creating your own implementations of the algorithms you use - going by a good tip in programming, which says: do not reinvent the wheel. Truth be told, you may feel you have a great tool, until you are about to switch to an entirely different programming language. The problem is how do you carry your Scikit-learn tool to another language?

If you are switching to R, there is a way around that: you just need to use a package called Reticulate to link Python and R, and then you can still use your Scikit-learn within R, if you really want to stick to Scikit-learn. But what if you were to switch to JavaScript? Well, fortunately, there are two solutions I can recommend.

1. Scikit.js

It takes very much after its mother, Scikit-learn from Python, except for four basic differences. With reference to the codes below, these differences are: (1) in JavaScript, while every other function takes in positional arguments (as in Python), class constructors take in objects (in the JavaScript code below, object = { fitIntercept: true }). Python, however, allows constructors (initializers) to take in positional arguments.

Python

from sklearn.linear_model import LinearRegression

x = [[3, 2, 5, 7],
     [1, 2, 5, 2], 
     [2, 8, 9, 7]]
y = [45, 32, 52]
model = LinearRegression(fit_intercept = True)
model.fit(x, y)
print(model.coef_)

JavaScript

import * as tf from '@tensorflow/tfjs'
import { LinearRegression, setBackend } from 'scikitjs'
setBackend(tf)

let x = [[3, 2, 5, 7],
         [1, 2, 5, 2], 
         [2, 8, 9, 7]]
let y = [45, 32, 52]
let model = new LinearRegression({ fitIntercept: true })
await model.fit(x, y)
console.log(model.coef)

(2) While camel case is used in JavaScript, snake case (underscores) is used in Python: you can see this from the fitIntercept (camel case) and fit_intercept (snake case) parameters of JavaScript and Python respectively. (3) In JavaScript, there is always an await keyword written before calls to the fit method, because fit is asynchronous in Scikit.js. In Python, Scikit-learn's fit is called without await. This can also be seen in the code above.
(4) JavaScript demands the new keyword when creating objects while Python does not. This is illustrated in the code above.

See Scikit.js for more details.

2. Mljs

This package may not offer the exact functionalities of Scikit-learn, however it is easy to see some similarities in fitting of models and prediction as illustrated below:

import SimpleLinearRegression from 'ml-regression-simple-linear';

// SimpleLinearRegression fits a single feature, so x is a flat array of numbers
const x = [3, 1, 2]
const y = [45, 32, 52]
const model = new SimpleLinearRegression(x, y);
console.log(model.coefficients)

See Mljs package for more details.

SL Classification using Python and R

Oluwafemi Paul Adeyemi — Wed, 09 Aug 2023 13:39:26 +0000

I will be considering the following Supervised Learning classification Algorithms: logistic regression, support vector machine (SVM), k-nearest neighbors(KNN), naive-bayes, decision tree, random forest and extremely randomized trees (also called extraTrees). I will be implementing them in Python and R.

Using Python

I will be using the package called Scikit-Learn which is easy to learn and also friendly to developers who do not major in ML.

First, I import the dataset I want to use - the iris dataset.

from sklearn.datasets import load_iris

Note that if you want to use a dataset that is not built in then you write

import pandas as pd
data = pd.read_csv('filename.csv', sep='symbol')

sep is usually a comma, ',' but it can also be a different symbol. You can do well to investigate the data.

Next, I import the classes that contain the models I want to use.

from sklearn.linear_model import LogisticRegression 
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

Finally, I import the function train_test_split used to split a dataset into the training and testing sets and the function cross_val_score used to find a cross validated model accuracy.

from sklearn.model_selection import train_test_split, cross_val_score

Now, I need to obtain the data

data = load_iris()

# You can check the structure of the data
print(data)

Then I separate the data to features (x) and the target (y)

x = data.data
y = data.target

Next, I split the data into the testing and training sets for x and y. I am taking 80% of the original dataset to be the training data and the remaining 20% to be the test data.

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8)

Now I can fit the models. Note that predicted_y is a one-dimensional NumPy array of the predicted values given x_test. Note also that test_accuracy is the accuracy of the model's predictions on the test data, while cross_validated_accuracy holds the accuracy of the model on each of a number of held-out subsets (folds) of the training data (I use 5 folds by setting cv = 5 in cross_val_score).
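To see what cv = 5 means, here is a minimal sketch of how k-fold cross validation divides the training rows: each fold is held out once for scoring while the model is trained on the rest. The function k_fold_indices is made up for illustration and uses simple contiguous folds; for classifiers, scikit-learn additionally stratifies the folds by class, which this sketch does not do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

One accuracy is computed per fold, which is why cross_val_score returns an array of five values rather than a single number.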

1. Logistic Regression

# logistic
lr_model = LogisticRegression(max_iter=500)
lr_model.fit(x_train, y_train)

predicted_y = lr_model.predict(x_test) 
print(predicted_y)

cross_validated_accuracy = cross_val_score(lr_model, X=x_train, y=y_train, cv=5)
print(cross_validated_accuracy)

test_accuracy = lr_model.score(x_test, y_test)
print(round(test_accuracy, 4))

2. SVM

svm_model = SVC()
svm_model.fit(x_train, y_train)

predicted_y = svm_model.predict(x_test)
print(predicted_y)

test_accuracy = svm_model.score(x_test, y_test)
print(round(test_accuracy, 4))

3. KNN

knn_model = KNeighborsClassifier()
knn_model.fit(x_train, y_train)

predicted_y = knn_model.predict(x_test)
print(predicted_y)

test_accuracy = knn_model.score(x_test, y_test)
print(round(test_accuracy, 4))

4. Naive Bayes

nb_model = MultinomialNB()
nb_model.fit(x_train, y_train)

predicted_y = nb_model.predict(x_test)
print(predicted_y)

test_accuracy = nb_model.score(x_test, y_test)
print(round(test_accuracy, 4))

5. Decision Tree

dt_model = DecisionTreeClassifier()
dt_model.fit(x_train, y_train)

predicted_y = dt_model.predict(x_test)
print(predicted_y)

test_accuracy = dt_model.score(x_test, y_test)
print(round(test_accuracy, 4))

6. Random Forest

rf_model = RandomForestClassifier()
rf_model.fit(x_train, y_train)

predicted_y = rf_model.predict(x_test)
print(predicted_y)

test_accuracy = rf_model.score(x_test, y_test)
print(round(test_accuracy, 4))

7. Extremely Randomized Trees

et_model = ExtraTreesClassifier()
et_model.fit(x_train, y_train)

predicted_y = et_model.predict(x_test)
print(predicted_y)

test_accuracy = et_model.score(x_test, y_test)
print(round(test_accuracy, 4))

Using R

The caret package will be used for this purpose. Note that the cross validation accuracy for each model will be displayed when the model is printed.

First, I import the dataset package which contains the iris dataset

library(datasets)

Metrics is a package containing functions that return certain metrics among which is a function called accuracy that returns the accuracy of a model's prediction.

library(Metrics)

Note that if you wanted to use a dataset that is not built in then you write

data = read.csv('filename.csv', sep ='symbol')

sep is usually a comma, ',' but it can also be a different symbol. You can do well to investigate the data

Next, I import the caret package, which uses a number of other packages (which you may not necessarily know) to do machine learning tasks.

library(caret)

Now, I need to obtain our data

data = iris

# You can check the structure of the data
print(data)

Using the function createDataPartition, I am taking 80% of the original dataset to be the training data and the remaining 20% to be the test data.

train_index = createDataPartition(y=data$Species, p=0.8, list=FALSE)

Then, I separate the data into training and testing sets.

train_data = data[train_index,]
test_data = data[-train_index,]

Then, I convert the target variable to a factor. In the built-in iris dataset Species is already a factor, so this changes nothing there; for data read from a file, where the target may be a string of characters, this step makes it categorical.

train_data$Species = factor(train_data$Species)
test_data$Species = factor(test_data$Species)

Next, I control the computational nuance of the train function which I will be using soon.

# The trainControl function uses 5 k_folds for cross validation, hence number = 5
control = trainControl(method = "cv", number=5)

Next, I fit the models. Note that in the train function, preProcess = c("center", "scale") standardizes the data: for each variable, "center" subtracts the mean from each data point and "scale" divides each data point by the standard deviation. Note also that tuneLength is an integer denoting the amount of granularity in the tuning parameter grid.
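The centering and scaling step is simple enough to spell out. The sketch below does it in plain Python for a single variable, using the sample standard deviation; the function name standardize and the values are made up for illustration and are not part of caret.

```python
from statistics import mean, stdev

def standardize(values):
    """Center to mean 0, then scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

sepal_lengths = [5.1, 4.9, 6.3, 5.8, 7.1]  # made-up values for illustration
z = standardize(sepal_lengths)
print(z)
```

After this transformation every variable is on the same scale, which matters for distance-based methods such as KNN and SVM.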

Note that predicted_y is a vector of the predicted values for test_data. Note also that test_accuracy is the accuracy of the model's predictions on the test data, while the cross-validated accuracy is the accuracy of the model's predictions based on some subsets of the training data.

1. Logistic Regression


logistic_model = train(Species ~.,
                       data = train_data,
                       method = "multinom",
                       trControl=control,
                       preProcess = c("center", "scale"),
                       tuneLength = 10)

print(logistic_model)

predicted_y = predict(logistic_model, test_data)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

2. Support Vector Machine

svm_model = train(Species ~., 
                  data = train_data, 
                  method = "svmLinear",
                  trControl=control,
                  preProcess = c("center", "scale"), 
                  tuneLength = 10)

print(svm_model)

predicted_y = predict(svm_model, test_data)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

3. KNN

knn_model = train(Species ~.,
                  data = train_data, 
                  method = "knn", 
                  trControl=control, 
                  preProcess = c("center", "scale"), 
                  tuneLength = 10)

print(knn_model)

predicted_y = predict(knn_model, test_data)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

4. Naive Bayes

nb_model = train(Species ~.,
                 data = train_data, 
                 method = "nb", 
                 trControl=control, 
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

print(nb_model)

predicted_y = predict(nb_model, test_data)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

5. Decision Tree

decision_tree_model = train(Species ~., 
                            data=train_data,  
                            method = "rpart",
                            trControl=control,  
                            preProcess = c("center", "scale"),
                            tuneLength = 10)

print(decision_tree_model)

predicted_y = predict(decision_tree_model, test_data)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

6. Random Forest

random_forest_model = train(Species ~., 
                            data=train_data,
                            method = "rf",
                            trControl=control,
                            preProcess = c("center", "scale"),
                            tuneLength = 10)

print(random_forest_model)

predicted_y =  predict(random_forest_model, test_data)
print(predicted_y)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

7. Extremely Randomized Trees

et_model = train(Species ~., 
                 data=train_data,
                 method = "ranger",  # ranger grows extremely randomized trees only when splitrule = "extratrees"
                 trControl=control,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

print(et_model)

predicted_y = predict(et_model, test_data)
print(predicted_y)

test_accuracy = accuracy(test_data[, ncol(test_data)], predicted_y)
print(sprintf('Test accuracy = %f', test_accuracy))

And that is it. I think it is super easy to write this code once you see the repetitive patterns in the code for both Python and R; each language has its own unique pattern. Thanks for reading.
New to R? See: R, beyond Statistical Programming

New to Machine Learning? See: Introduction to Machine Learning

Not sure which program to use for your ML? See: Best Programming Language for ML

Supervised Learning

Oluwafemi Paul Adeyemi — Mon, 07 Aug 2023 09:58:45 +0000

Supervised learning refers to machine learning methods that involve some features (input variables) and at least one target variable. A single feature can be used in machine learning, but the true picture is not likely to be seen. There are two types of supervised learning, based on whether the target is quantitative (also called numeric) or qualitative (also called categorical). The target variables are the output variables. If the target is qualitative, supervised learning is said to be classification; it is said to be regression when the target is quantitative ^[1]. There exist many types of supervised learning algorithms for classification and regression. A number of them will be discussed below.

1. Linear Regression:

The linear regression model is similar to the linear function $y = mx + c$ usually taught in high school, where y is the target (dependent) variable and x is the feature (independent variable). But in linear regression an error term is accounted for, so that the straight line equation above becomes $y = mx + c + e$, which is called simple linear regression. Hence, $y = m_1x_1 + m_2x_2 + \dots + m_kx_k + c + e$ is an extension of the same concept, called multiple linear regression, where y is the target as before and the $x_i$'s ($i = 1, \dots, k$) are the features (independent variables). Here the idea is that there is a linear relationship between the target and the features, i.e. as the independent variable(s) change in quantity or quality, there is an approximately linear increase or decrease in the target variable. If that is the case, a linear regression can be used to build a model for the given dataset ^[2]. This algorithm cannot be used with categorical targets.
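As a small worked example, the slope m and intercept c of a simple linear regression can be estimated by ordinary least squares. Least squares is an assumption here, since the text above does not say how the line is fitted, though it is the usual method. The function name and data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of slope m and intercept c in y = m*x + c + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar
    return m, c

# Made-up data that roughly follows a straight line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = fit_simple_linear_regression(xs, ys)
# The residuals are the estimated error terms e for each observation
residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
print(m, c)
```

The residuals show what the error term absorbs: the part of each observed y that the fitted straight line does not explain.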

2. Logistic Regression

Logistic regression predicts the probability that an object belongs to a class, given that there are at least two classes, using the supplied set of features. For instance, during an admission process, a school may use the gender, weight, intelligence quotient, hobby, etc. of the students as features (input variables) to predict whether a candidate should be admitted or not. Those admitted may be said to belong to class A while those not admitted may be said to belong to class B. Hence the target can be either A or B. Suppose that the probability that an object belongs to class A is $p_i$; then the probability that it belongs to class B is $q_i = 1 - p_i$. Logistic regression uses the formula $p_i = \frac{1}{1 + e^{-(\beta_0 + \sum_j \beta_j x_{ij})}}$, or equivalently $\ln\bigg( \frac{p_i}{1-p_i} \bigg) = \beta_0 + \sum_j \beta_j x_{ij}$. Hence, given a set of features, logistic regression predicts two probabilities, $p_i$ and $q_i = 1 - p_i$. The larger of the two probabilities determines the class of the object, so that class A is the predicted class if $p_i$ is larger than $q_i = 1 - p_i$; on the other hand, class B is the predicted class if $q_i = 1 - p_i$ is larger. This is clearly a two-class logistic regression, called binary logistic regression, which is a simple case of multinomial logistic regression, where there can be more than two classes ^[3]. This algorithm cannot be used when the target of interest is quantitative.
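The prediction rule can be sketched in a few lines. The coefficients below are hypothetical rather than fitted to any data, and the two features (an IQ score and a 0/1 hobby flag) are only there to echo the admission example.

```python
import math

def predict_proba(features, beta0, betas):
    """Probability of class A: 1 / (1 + exp(-(beta0 + sum(beta_j * x_j))))."""
    z = beta0 + sum(b * x for b, x in zip(betas, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, beta0, betas):
    """Pick the class with the larger of the two probabilities p and q = 1 - p."""
    p = predict_proba(features, beta0, betas)
    return "A" if p > 1 - p else "B"

# Hypothetical coefficients, not estimated from data
beta0, betas = -4.0, [0.05, 0.5]

p_admit = predict_proba([100, 1], beta0, betas)
print(p_admit, predict_class([100, 1], beta0, betas))
print(predict_class([60, 0], beta0, betas))
```

Taking the log of p / (1 - p) recovers the linear part beta0 + sum(beta_j * x_j), which is why the model is linear on the log-odds scale even though the probability itself is not.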

3. Support Vector Machine (SVM)

This uses hyperplanes to divide a given dataset into two categories in order to classify data points. A margin is obtained for each hyperplane as the distance from the hyperplane to the nearest points (which are called support vectors) on either side of the hyperplane; see ^[4]. The hyperplane with the maximum margin is called the optimal hyperplane. Such a hyperplane, which is generated iteratively, minimizes the classification error. SVM aims at obtaining a maximum marginal hyperplane (MMH), which maximizes the margin, by separating the dataset into categories ^[5]. In a two-class SVM, hyperplanes are decision boundaries ^[6].

A two-class SVM can be linear, in which case the data can be separated into two classes using a single straight line; the SVM is said to be non-linear if the data cannot be separated into two classes using a straight line. A kernel function is used with the SVM algorithm when the data is not linearly separable ^[4]. SVM can be used when the target variable has two or more classes. This algorithm can be used for regression and classification purposes.

4. Naive Bayes

This procedure uses the Bayes probability to predict classes. The Bayes probability is $p(y_i/x) = \frac{p(x/y_i)p(y_i)}{ \sum_{j=0}^{n}{p(x/y_j)p(y_j)}}$ where $p(y_i/x)$ is the probability that the target falls into the ith class given a set of features x. Hence $p(y_i/x)$ is calculated for all the classes and the data point is classified as $\hat{y} = argmax \ p(y_i/x)$ . There are three basic types of Naive Bayes, viz: Bernoulli, multinomial and Gaussian. The Bernoulli and multinomial Naive Bayes are used when we have categorical features. The Bernoulli is the simplest case of the multinomial Naive Bayes. The Gaussian Naive Bayes is used for continuous features; in this case $p(x/y_i) = \frac{1}{ \sigma_i \sqrt{2\pi} }e^{ - \frac{1}{ 2\sigma_i^2}( x- \mu_i)^2}$ is first determined for each feature, and the results are then multiplied together over all the features in x, since the features are assumed to be independent given the class ^[7]. This algorithm can only be used for classification purposes.
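The Gaussian case can be sketched in a few lines of plain Python. This is an illustrative toy, not the scikit-learn implementation: the function names and the data are assumptions made for this example, and a tiny constant is added to each standard deviation only to avoid division by zero.

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mu, sigma):
    """Gaussian density p(x / y_i) for a single feature."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fit(rows, labels):
    """Estimate the prior and the per-feature mean / std for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, math.sqrt(var) + 1e-9))  # avoid sigma = 0
        model[label] = (prior, stats)
    return model

def posteriors(model, x):
    """Return p(y_i / x) for every class using Bayes' rule."""
    joint = {}
    for label, (prior, stats) in model.items():
        likelihood = 1.0
        for value, (mu, sigma) in zip(x, stats):
            likelihood *= gaussian_pdf(value, mu, sigma)  # naive independence
        joint[label] = likelihood * prior
    total = sum(joint.values())  # the denominator in Bayes' rule
    return {label: value / total for label, value in joint.items()}

# Toy data (made up): two continuous features, two classes.
rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.1), (3.2, 3.9), (2.9, 4.0)]
labels = ["A", "A", "A", "B", "B", "B"]

model = fit(rows, labels)
post = posteriors(model, (1.1, 2.0))
prediction = max(post, key=post.get)  # argmax over the classes
print(post, prediction)
```

In practice the products are usually computed as sums of logarithms, because multiplying many small densities can underflow.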

5. Decision Tree

A Decision Tree is a tree which uses conditions (internal nodes or decision nodes) to split the data into branches (edges), the tree being drawn upside down with its root at the top, until the end of a branch does not split anymore, i.e. until a leaf (decision) is reached. The order of features for building the tree can be selected using information gain or the Gini index ^[5]. This algorithm can be used for classification and regression purposes.
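As a rough illustration of how a feature is chosen for a split, the sketch below computes the weighted Gini index and the information gain for two candidate splits of the same six labels. The labels and the two splits are made up; a lower Gini index or a higher information gain marks the better split.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_impurity(groups, impurity):
    """Impurity of a split: group impurities weighted by group size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups)

# Toy labels at a node and two hypothetical ways of splitting them.
parent = ["A", "A", "A", "B", "B", "B"]
split_on_feature_1 = [["A", "A", "A"], ["B", "B", "B"]]  # separates the classes
split_on_feature_2 = [["A", "A", "B"], ["A", "B", "B"]]  # barely helps

gini_1 = weighted_impurity(split_on_feature_1, gini)
gini_2 = weighted_impurity(split_on_feature_2, gini)
gain_1 = entropy(parent) - weighted_impurity(split_on_feature_1, entropy)
gain_2 = entropy(parent) - weighted_impurity(split_on_feature_2, entropy)
print(gini_1, gini_2, gain_1, gain_2)
```

Both criteria agree here that the first feature should be used for the split.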

6. Random Forest

This is an extension of the decision tree; as you know, a forest is made up of trees. Here, a number of decision trees are developed, and the average of the outcomes from the trees is taken for a regression task while the modal outcome is taken for a classification task. This algorithm uses a bagging scheme which can take one of two forms. The first kind, called bootstrap aggregating or bagging, repeatedly samples the training dataset with replacement and builds a tree from each sample. The second kind selects a random subset of the features at each candidate split of each tree to be built (where each tree is itself built from a sample of the training data); hence it is called feature bagging. The idea behind random forest is that sampling a subset of the features, of the training data, or both reduces the prediction variance of the model, since sub-sampling brings about less correlated trees. See ^[5], ^[7] and ^[8]. This algorithm can be used for classification and regression purposes.
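The sampling and aggregation steps can be sketched without building any actual trees. In the snippet below the helper names, the row indices and the per-tree outputs are all hypothetical; only the mechanics (sampling with replacement, choosing a feature subset, voting and averaging) are the point.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at a split."""
    return sorted(rng.sample(range(n_features), k))

def aggregate_classification(votes):
    """Modal outcome across the trees."""
    return Counter(votes).most_common(1)[0][0]

def aggregate_regression(outputs):
    """Average outcome across the trees."""
    return sum(outputs) / len(outputs)

rng = random.Random(0)            # fixed seed so the run is repeatable
rows = list(range(10))            # stand-in for ten training rows
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=6, k=2, rng=rng)

# Hypothetical outputs from already-trained trees.
tree_votes = ["A", "B", "A", "A", "B"]
tree_values = [2.0, 2.5, 3.0, 2.5]

print(sample, features)
print(aggregate_classification(tree_votes), aggregate_regression(tree_values))
```

Because the sample is drawn with replacement, some rows usually appear more than once and others not at all, which is what makes the trees differ from one another.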

7. Extremely Randomized Trees

This is an extension of the random forest algorithm. Here, each tree is typically built using all the training data (rather than a bootstrap sample), only a random subset of the features is considered at each split, and the split points of the nodes are chosen at random. See ^[7] and ^[8]. This algorithm can be used for classification and regression purposes.
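A minimal sketch of the random splitting step, under the assumption that a threshold is drawn uniformly between the smallest and largest value of each chosen feature. The function names and the data are made up for this example; in the usual formulation the best of these random candidates would then be kept, using a criterion such as the Gini index.

```python
import random

def candidate_splits(rows, k, rng):
    """For k randomly chosen features, draw one random threshold each."""
    n_features = len(rows[0])
    splits = []
    for feature in rng.sample(range(n_features), k):
        values = [row[feature] for row in rows]
        threshold = rng.uniform(min(values), max(values))
        splits.append((feature, threshold))
    return splits

def partition(rows, feature, threshold):
    """Send each row left or right of the threshold on one feature."""
    left = [row for row in rows if row[feature] < threshold]
    right = [row for row in rows if row[feature] >= threshold]
    return left, right

rng = random.Random(42)  # fixed seed so the run is repeatable
rows = [(1.0, 5.0, 0.2), (2.0, 3.0, 0.9), (4.0, 8.0, 0.5), (6.0, 1.0, 0.1)]

splits = candidate_splits(rows, k=2, rng=rng)
feature, threshold = splits[0]
left, right = partition(rows, feature, threshold)
print(splits, len(left), len(right))
```

Unlike an ordinary decision tree, no search over thresholds is done for a feature, which is where the extra randomness comes from.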

8. K-Nearest Neighbours

This classifies a data point by selecting the k neighbours which are nearest to it, based on the measured distance between each of them and the data point of interest; the smaller the distance, the closer the neighbour. Suppose that a of the neighbours belong to class A while k - a belong to class B and k - a > a, then the data point to be classified is assigned to class B. This algorithm is non-parametric and very slow on large datasets because it calculates the distance between the point and every other point in the training data. This algorithm can be used for classification and regression purposes ^[7].
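The voting rule above fits in a few lines of Python. This is a brute-force sketch on made-up points, using Euclidean distance as an assumed distance measure; it computes the distance to every training point, which is exactly why the method becomes slow on large datasets.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify query by a majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy training data (made up for illustration).
train_points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
train_labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(train_points, train_labels, (2, 2), k=3)
near_b = knn_predict(train_points, train_labels, (5, 4), k=3)
print(near_a, near_b)
```

An odd value of k is a common choice for two classes, since it rules out tied votes.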

Next: SL for Classification using Python and R

Previous: Introduction to Machine Learning

References

1. Liu, Q., & Wu, Y. (2012). Supervised Learning. In: Seel, N.M. (eds) Encyclopedia of the Sciences of Learning. Springer, Boston MA. https://doi.org/10.1007/978-1-4419-1428-6_451
2. Maulad, D.H., & Abdulazez, A.M. (2020). A Review of Linear Regression Comprehensive in Machine Learning. Journal of Applied Science and Technology Trends, 1(4), 140 - 147
3. Peng, J. (2002). An introduction to Logistic Regression Analysis and Reporting. The Journal of Educational Research, 96(1), 3 - 14, DOI:10.1080/00220670209598786
4. Ruscica, T. (2019, November 23). Python Machine Learning & AI Mega Course - Learn 4 Different Areas of ML and AI [Video]. YouTube. https://www.youtube.com/watch?v=WFr2WgN9_xE
5. Tutorialspoint (2019). Machine Learning with Python. Tutorialspoint.
6. Wikipedia (2023, April 29). Decision boundary. In Wikipedia. https://en.wikipedia.org/wiki/Decision_boundary#:~:text=A%20decision%20boundary%20is%20the,are%20not%20always%20clear%20cut.
7. Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., … Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning (pp. 108–122) (Check the Documentation User Guide)
8. Wikipedia (2023, July 25). Random forest. In Wikipedia. https://en.wikipedia.org/wiki/Random_forest

R, beyond Statistical Programming

Oluwafemi Paul Adeyemi — Sun, 06 Aug 2023 21:34:56 +0000

R was specifically created for Statistical Programming (for Data Analysis) and is currently great for Data Science and Machine Learning. However, what it can be used for is far beyond those mentioned above. The following are examples of what R can be used for.

1. Games

Wait a minute, R? Yes, R! R may look like the most unlikely language for such a thing, but it can be used to develop games. Well, what type of games? You can see some games developed in R here.

2. Web Applications

Ever heard of Shiny? Yes, with the R package called Shiny, you could build your own website. JavaScript frameworks like Angular and React (for frontend development) and Express.js (which is based on Node.js, for backend development) are great and specifically designed for web development, but you can take advantage of R to build a website that will serve a similar purpose. However, the drawback is that you may not be able to separate concerns, which was the main idea behind splitting development into frontend and backend, so that: full stack = frontend + backend. With the help of some other packages, you can also write web APIs and manage your database. Find out more about Shiny here.

3. Animations

While you may not be able to make animated movies like those from Disney, you can make a lot of real animations using R. You can animate graphs, charts, pictures etc. Find out more about R animations here.

Finally, while there are programming languages that were primarily designed to handle the three things discussed above, using R is of great advantage to an R programmer who may want to do such things, as the time that would have been spent learning a new language can be saved, provided the programmer is not under compulsion to use any particular tools. Now you know that R is really useful beyond Arithmetic.

If you find this article useful, please leave your comments and share it with other people. Thank you.

Best Programming Language for ML

Oluwafemi Paul Adeyemi — Sun, 06 Aug 2023 21:25:31 +0000

There is no single best programming language for Machine Learning. As with programming in general, whether you use R or Python or any other language, as a Machine Learning Engineer you first need to understand the concepts you want to work on and the underlying principles by which the algorithms you write or use work. Many new programmers get confused, learning language upon language just because some jobs out there demand two or more of these languages. Such learning is not necessary, at least not at the beginning. Just pick one and start the journey.
It may take a while before you master the required concepts, but learning such concepts is worth your time. Just as anyone can use a pair of scissors, any programmer can use a programming language. However, just as only a skilled tailor can do a tailor's job professionally with a pair of scissors, because he understands the tailoring concepts, you can do a professional job only when you understand the concepts in and the underlying principles behind Machine Learning.

As you master the concepts in programming and in Machine Learning, with time, it becomes easier to switch languages if need be. You might then be like the tailor who decides to use a different kind of scissors for the appropriate occasion. So, as a tailor defines his tools, you define yours. You use the language you want or the one that is required of you. Experience, nonetheless, is key, but start somewhere. Those who are called professionals today were novices yesterday.

On a general note, for a start, if you would like to be involved in things like web development, android app development etc. in future, apart from Machine Learning/Data Science, I strongly recommend that you go ahead with Python, but if you want to do Machine Learning/Data Science only, I strongly recommend R. See my Introduction to Machine Learning for more details on which programming language may serve you better.

Introduction to Machine Learning

Oluwafemi Paul Adeyemi — Sun, 06 Aug 2023 20:53:13 +0000

Machine Learning is a branch of Artificial Intelligence (AI) that deals essentially with computers learning from data. The idea is that computers should be able to learn from data, by finding patterns, remembering them and using them to make decisions when they are later fed with previously unseen data.

In Machine Learning, the computer is fed with some input from which it obtains a pattern, and based on this pattern it can make decisions from time to time; hence, the computer may be said to make its own decisions. In Traditional Programming, on the other hand, the computer is fed with a specific input for each desired output.

Five branches of Machine Learning are briefly discussed below.

1. Supervised Learning

In Supervised Learning, there is a target (output variable) for each set of features (input variables). Examples of Supervised Learning algorithms include: Linear Regression, Logistic Regression, Support Vector Machine.

2. Unsupervised Learning

In Unsupervised Learning, there is no target; instead, some underlying patterns or characteristics are obtained from the features.
Examples of Unsupervised Learning algorithms include: K-means, Principal Component Analysis and Hierarchical Clustering.

3. Semi-Supervised Learning

Semi-Supervised Learning lies somewhere between Supervised and Unsupervised Learning. Basically, it is used when the available dataset consists of a few labeled data points and many more unlabeled data points: you obtain labels for the unlabeled data points by some machine learning mechanism and then use the new dataset, which consists of the points that were originally labeled and those that have just been labeled, to make predictions on new data. Examples of Semi-Supervised Learning algorithms include: Self Training, Co-Training and Label Propagation.

4. Neural Networks

A neural network is a form of machine learning that mimics the working of the human brain. It is not a human brain, but, as with the networks of neurons in the brain, it is made up of connected neurons, so that data can be passed in and processed, and an output (information) obtained. Typically, a neural network has an input layer and an output layer. However, it may also have one or more hidden layers. The higher the number of layers a network has, the deeper the network; hence the name Deep Learning. You do Deep Learning when there are one or more hidden layers. Examples of Deep Learning algorithms include: Recurrent Neural Network, Convolutional Neural Network and Multi-Layer Perceptron.
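To show how data is passed in, processed and turned into an output, here is a minimal sketch of a forward pass through a network with one hidden layer. The weights and biases are arbitrary numbers chosen for illustration (a real network learns them from data), and the sigmoid activation is an assumption of this example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through every layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_layer = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output_layer = ([[0.7, -0.5, 0.2]], [0.05])

hidden = dense([1.0, 2.0], *hidden_layer)
output = forward([1.0, 2.0], [hidden_layer, output_layer])
print(hidden, output)
```

Adding more entries to the list of layers makes the network deeper without changing the forward function.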

5. Generative AI

This is one of the most recent developments in the field of AI, and of Machine Learning to be specific. Generative AI uses a set of specific Deep Learning models to find complex patterns. It can be used to generate new text, images, videos and audio. Some types of generative AI algorithms are: Generative Adversarial Network, Diffusion Model and Transformer-based models.

In Machine Learning, computers make decisions just like men do. Every man you have seen has two hands, two legs and so on. When you see a being and he has these characteristics, you immediately say, yes, I see a man. However, computers cannot think like we do. We remember things like sounds, pictures etc., but in order for the computer to even start working on them, this information (sound, videos, pictures, text, measurements etc.) must first be converted to numbers. Hence Machine Learning involves some Mathematics, Statistics and Computer Science (programming). You do well to say that the computer reasons in machine language, but you do not have to communicate with the computer using such a low level language.

There are many High Level Programming Languages that can be used for programming as a Machine Learning Engineer. They include:

1. Python:

This programming language is often described as one of the simplest programming languages for beginners because of the closeness of its syntax to everyday language. It can be used for many other purposes apart from Machine Learning or Data Analysis; it is a general purpose language. The following packages can be used for Machine Learning in Python:

scikit-learn (sklearn)
Keras
TensorFlow
Skflow
PyTorch
Theano

2. R:

This programming language uses syntax similar to that of the S and S-PLUS programming languages. R was specifically designed for Statistics and is now a great language for Data Scientists and Machine Learning Engineers. Its syntax may not look like Python's, or like that of other languages whose object oriented style resembles Python's (like Java and C++); however, it is also easy to learn. The following package can be used for Machine Learning in R:

caret

3. Java:

If you are not new to programming, then you have heard about the language called Java. It is a popular language like Python. However, it is generally faster than Python and is a great choice if you are already a Java programmer but are new to Machine Learning.

4. C++:

As with many things in life, change is inevitable. C++ used to be one of the most sought after languages. But, as with the production of automatic cars which brought about less preference for manual cars, other simpler languages have been used as substitutes. C++ is a great language for programmers who build operating systems and gaming systems, and for those who prefer to have more control or to do things from scratch. It is generally the fastest of the four languages discussed here.

5. Other Languages:

There are other languages like MATLAB, Scala, Rust and JavaScript that can be used for Machine Learning. However, it is better to select any of the four languages (R, Python, Java or C++) discussed above for a start in Machine Learning because of the availability of libraries that you can leverage, community support and employability.

If you are an absolute beginner, I suggest you use Python, which has a less steep learning curve. But if you are involved in Data Analysis and do not intend to go into other aspects of tech like web development, android app development etc., I suggest you learn R. If you would like your take off to be more challenging, either Java or C++ should do. You may also choose to stay with any other language you have learnt, even if it is not one of the languages discussed here, provided it has provisions for Machine Learning. Nonetheless, I recommend you learn any of these four giants: R, Python, Java or C++, regardless of your level of experience.

Next : Supervised Learning