Comparison of Machine Learning Algorithms...

Important: *For the Turkish version of this article, follow the link below.*

Turkish: https://dev.to/ertugrulmutlu/makine-ogrenme-algoritmalarinin-karsilastirilmasi-4o0d

In this article, we will compare the SVM, Decision Tree, and KNN algorithms.

The metrics we will compare (see the code sketch after this list):

  • Accuracy: The ratio of correct predictions to the total number of predictions.

  • Macro avg Precision Score: The unweighted average of the precision of each class. Precision is the ratio of true positive predictions to all positive predictions; it shows how reliably a class is identified.

  • Macro avg Recall Score: The unweighted average of the recall of each class. Recall is the ratio of true positive predictions to all actual positives; it shows how successfully a class is detected.

  • Macro avg F1 Score: The unweighted average of the F1 score of each class. The F1 score is the harmonic mean of precision and recall; it combines the model's classification ability into a single metric.

  • Weighted avg Precision Score: The average precision weighted by each class's support (number of samples). This gives a measure of precision that accounts for class imbalance.

  • Weighted avg Recall Score: The average recall weighted by each class's support. This gives a measure of recall that accounts for class imbalance.

  • Weighted avg F1 Score: The average F1 score weighted by each class's support. This gives a measure of the F1 score that accounts for class imbalance.
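
For concreteness, here is a minimal sketch (with made-up labels) of how all of these metrics can be read off scikit-learn's classification_report:

from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true labels and predictions, just to illustrate the metrics.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]

print(accuracy_score(y_true, y_pred))
# classification_report prints per-class precision/recall/F1 together with
# the "macro avg" and "weighted avg" rows described above.
print(classification_report(y_true, y_pred))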

First, the definitions of the algorithms.

Instead of writing definitions myself, I found it more appropriate to point you to sources that explain them properly.


  • KNN (K-Nearest Neighbors):

KNN Photo

Source:


Video: https://www.youtube.com/watch?v=v5CcxPiYSlA
Article: https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761


  • DT (Decision Tree):

Decision Tree Photo

Source:


Video: https://www.youtube.com/watch?v=ZVR2Way4nwQ
Article: https://medium.com/@MrBam44/decision-trees-91f61a42c724


  • SVM (Support Vector Machine):

SVM Photo

Source:


Video: https://www.youtube.com/watch?v=1NxnPkZM9bc
Article: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47

Now we can get started.

First, let's take a look at the dataset I will use.

Dataset features:


Here, we will analyze our CSV using the Pandas library.

import pandas as pd
csv = pd.read_csv("glass.csv")
print(csv.head())

To explain the code here in order:

  1. We import the Pandas library.
  2. We read the CSV file with the Pandas library.
  3. Finally, we call the head() method to get a preview of the CSV file.

The output of this code:

Output of the head command

As you can see, this gives us a preview of the contents of the CSV file.

This CSV file has:

- 214 rows
- 10 columns
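
If you only need the row and column counts, pandas also exposes them directly through the shape attribute:

print(csv.shape)  # (214, 10) -> (rows, columns)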


Now let's get the names of the columns:

import pandas as pd
csv = pd.read_csv("glass.csv")
print(csv.columns)

To explain the code here in order:

  1. We import the Pandas library.
  2. We read the CSV file with the Pandas library.
  3. Finally, we access the columns attribute to list the column names of the CSV file.

The output of this code:
Output of the columns script

As you can see, we got the names of the columns of the CSV file, and the output also shows their data type (dtype).

This CSV file contains the following columns:

- RI (refractive index)
- Na (sodium)
- Mg (magnesium)
- Al (aluminum)
- Si (silicon)
- K (potassium)
- Ca (calcium)
- Ba (barium)
- Fe (iron)
- Type (glass type)

Based on this data, different types of glass can be identified from the refractive index of the glass and the chemical elements it contains.
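
Since 'Type' is the target we will predict, it can also be useful to see how many samples each glass type has; a quick sketch using pandas' value_counts():

import pandas as pd

csv = pd.read_csv("glass.csv")
# Count how many samples belong to each glass type.
print(csv["Type"].value_counts())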

Note: For more detailed information, please visit the Source site.

Source

The site where I downloaded the CSV file:
https://www.kaggle.com/datasets/uciml/glass

Now let's move on to our plan:

What We Know

  • The data in the CSV file needs to be shaped for use in the algorithms.

  • The algorithms need to be written using a library.

  • The results need to be presented graphically.

Let's do the data preparation part.

Preparation of Data

First, let's list the libraries I will use:

  1. Sklearn
  2. Pandas
  3. Numpy

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

data = pd.read_csv("glass.csv", sep=",")
columns = data.columns
X = np.array(data.drop([columns[-1]], axis=1))  # all columns except 'Type'
y = np.array(data[columns[-1]])                 # the 'Type' column
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

To explain the code here in order:

  1. We read our CSV file, using ',' as the separator (we use the Pandas library for this operation).
  2. 'X' contains the features of the data, i.e. everything except the column we want to predict ('Type'). With this code, we drop the 'Type' column and turn the remaining data into an array using the Numpy library.
  3. 'y' is the data we want to predict (i.e. 'Type'). We turn it into an array using the Numpy library, just like 'X'.
  4. Finally, we split this data into train and test sets. In the simplest terms, we train the algorithms with the train data and use the test data to measure each algorithm's accuracy. (We set the test share to 20% with the test_size parameter, but you can change it if you wish.)

Note: With larger datasets or more complex algorithms you may also need validation data, but we don't need it here because this is a small and simple application. If you want the split to preserve class proportions, see the sketch below.
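
Since the glass classes are imbalanced, train_test_split's stratify argument can keep the class proportions of y in both subsets; a minimal sketch:

from sklearn.model_selection import train_test_split

# stratify=y keeps the class proportions of y in both subsets;
# random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)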

Yes, our data is ready...

Integration of Algorithms:

Here we will integrate our algorithms with the Sklearn library.

-KNN

from sklearn.neighbors import KNeighborsClassifier 
KNN = KNeighborsClassifier(n_neighbors=9)
KNN.fit(x_train,y_train) 

To explain the code here in order:

  1. We import the KNeighborsClassifier class from sklearn.neighbors.
  2. KNN is created. The n_neighbors parameter decides how many nearest neighbors to look at. (This value may vary according to the project and dataset; see the sketch after this list for a quick way to compare values.)
  3. We train the model with the .fit() method using the x_train and y_train data.
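
Since the best k depends on the data, a quick (hypothetical) loop for comparing a few values of n_neighbors on the test set might look like this:

from sklearn.neighbors import KNeighborsClassifier

# Try several odd k values and print the mean accuracy on the test set.
for k in range(1, 16, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(x_train, y_train)
    print(k, model.score(x_test, y_test))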

-SVM

from sklearn import svm 
Svm = svm.SVC(kernel="linear")
Svm.fit(x_train,y_train)

To explain the code here in order:

  1. We import the svm module from sklearn.
  2. We create an SVC (Support Vector Classification) model. (Briefly, this class lets you perform classification using the SVM infrastructure.) The kernel hyperparameter can be 'linear', 'poly', 'rbf', or 'sigmoid'; a quick comparison sketch follows after this list.
  3. With the .fit() method, the model is trained with the x_train and y_train data.
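
To see how the kernel choice affects the result, a similar sketch can loop over the four built-in kernels (scores will vary with the split):

from sklearn import svm

# Compare the four built-in kernels by their test-set accuracy.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = svm.SVC(kernel=kernel)
    model.fit(x_train, y_train)
    print(kernel, model.score(x_test, y_test))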

-Decision Tree

from sklearn.tree import DecisionTreeClassifier
Dt = DecisionTreeClassifier(random_state=9)
Dt.fit(x_train,y_train)

To explain the code here in order:

  1. We import the DecisionTreeClassifier class from sklearn.tree.
  2. The Decision Tree is created. The random_state parameter makes the results reproducible across runs. (A sketch after this list shows one extra thing a fitted tree gives you.)
  3. With the .fit() method, the model is trained with the x_train and y_train data.
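
One nice property of a fitted decision tree is that it exposes feature importances; a quick sketch using the columns variable from the data-preparation step:

# Print how much each feature contributed to the tree's splits.
for name, importance in zip(columns[:-1], Dt.feature_importances_):
    print(name, round(float(importance), 3))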

Now that we have integrated our algorithms, we can move on to visualization and comparison.

Visualization and Comparison:

First, let's list the library I will use:

  1. Matplotlib: In short, Matplotlib is a visualization library. It is simple to use and suitable for writing clean code. (A minimal plotting sketch follows below.)
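
As an illustration, here is a minimal sketch of how a comparison bar chart like the ones below could be drawn; the exact plotting code in the repository may differ, and the scores are placeholders taken from the accuracy results reported later:

import matplotlib.pyplot as plt

# Hypothetical placement of the accuracy scores reported below.
names = ["Decision Tree", "KNN", "SVM"]
scores = [0.698, 0.651, 0.651]

plt.bar(names, scores)
plt.ylabel("Accuracy")
plt.title("Algorithm Comparison")
plt.show()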

All algorithms need to be trained before we can compare them. The code we will use after training:

dt_report =dt.predict_report(3, dt_x_train, dt_x_test, dt_y_train, dt_y_test)
svm_report =Svc.predict_report(3, svc_x_train, svc_x_test, svc_y_train, svc_y_test)
knn_report =Knear.predict_report(3, knn_x_train, knn_x_test, knn_y_train, knn_y_test)

In short, predict_report is a helper method from the project's code repository (linked at the end of the article); it prints the metrics we want on the screen for each model.
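
For reference, a rough equivalent using only scikit-learn: classification_report with output_dict=True returns the metrics as a nested dict, which makes it easy to pull out the macro and weighted averages for plotting. (predict_report in the repository may work differently; this is just a sketch.)

from sklearn.metrics import classification_report

# Returns the report as a nested dict instead of a printed table.
report = classification_report(y_test, Dt.predict(x_test), output_dict=True)
print(report["macro avg"]["f1-score"])       # macro avg F1
print(report["weighted avg"]["precision"])   # weighted avg precision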

Sample output (taken from the internet):

Predict_score Photo

Now let's move on to the comparison:


-Accuracy

Acc Comp Graph

  1. Decision_Tree >> 0.6976744186046512
  2. KNN >> 0.6511627906976745
  3. SVM >> 0.6511627906976745

Here, the algorithm with the highest score was Decision Tree; KNN and SVM achieved the same accuracy.

-Macro avg precision Score

Macavg prec Comp Graph

  1. Decision_Tree >> 0.7226495726495727
  2. SVM >> 0.611111111111111
  3. KNN >> 0.5030501089324618

Here, the algorithm with the highest score was Decision Tree.

-Macro avg Recall Score

Macavg recall Comp Graph

  1. Decision_Tree >> 0.6472222222222223
  2. SVM >> 0.5863095238095238
  3. KNN >> 0.4795454545454545

Here, the algorithm with the highest score was Decision Tree.

-Macro avg F1 Score

Macavg F1 Comp Graph

  1. Decision_Tree >> 0.6738576238576238
  2. SVM >> 0.5548611111111111
  3. KNN >> 0.45506715506715506

Here, the algorithm with the highest score was Decision Tree.

-Weighted avg precision Score

Weiavg Prec Comp Graph

  1. Decision_Tree >> 0.7241502683363149
  2. SVM >> 0.6627906976744186
  3. KNN >> 0.6219182246542027

Here, the algorithm with the highest score was Decision Tree.

-Weighted avg Recall Score

Weiavg Recall Comp Graph

  1. Decision_Tree >> 0.6976744186046512
  2. SVM >> 0.6511627906976745
  3. KNN >> 0.6511627906976745

Here, the algorithm with the highest score was Decision Tree. (Note that these values are identical to the accuracy results above; this is expected, because recall weighted by class support works out to exactly the overall accuracy.)

-Weighted avg F1 Score

Weiavg F1 Comp Graph

  1. Decision_Tree >> 0.7030168797610657
  2. SVM >> 0.6397286821705426
  3. KNN >> 0.6020444671607461

Once again, the algorithm with the highest score was Decision Tree.

CONCLUSION

As a result, in this article we compared three machine learning algorithms and concluded that Decision Tree performs best on the dataset we have.

You can access the code here, and you can change and improve it as you wish.

-CODE : https://github.com/Ertugrulmutlu/Machine_Learning_Alg_Comp

If you have a suggestion, request, or question, please leave a comment or contact me via e-mail.
