
Ertugrul

Machine Learning Algorithms Comparison: KNN vs SVM vs Decision Tree

πŸ‡ΉπŸ‡· Turkish version


πŸ”„ What We Will Compare

This post compares three popular supervised ML algorithms:

  • K-Nearest Neighbors (KNN)
  • Support Vector Machine (SVM)
  • Decision Tree (DT)

We'll evaluate them based on:

  • Accuracy
  • Macro Avg Metrics (Precision, Recall, F1)
  • Weighted Avg Metrics (Precision, Recall, F1)

πŸ”Ή The Dataset: Glass Identification

We use the classic Glass Identification dataset, which contains:

  • 214 rows
  • 10 columns: refractive index + 8 chemical properties + glass type

```python
import pandas as pd

df = pd.read_csv("glass.csv")  # "df" avoids shadowing the name "csv"
print(df.columns)
```

πŸ” Features

  • RI (Refractive Index)
  • Na, Mg, Al, Si, K, Ca, Ba, Fe (Chemical elements)
  • Type (Target class)
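
Before training anything, it helps to check the class balance: the glass types are imbalanced, which is why the macro and weighted averages can diverge later. Below is a minimal sketch of that check, run here on a tiny hand-made sample with the same column layout (the values are illustrative, not the real data):

```python
import pandas as pd

# Tiny illustrative sample mirroring the glass.csv layout (not the real dataset)
sample = pd.DataFrame({
    "RI":   [1.52, 1.51, 1.52, 1.53],
    "Na":   [13.6, 13.9, 13.5, 13.2],
    "Type": [1, 1, 2, 7],
})

# Class counts per glass type; on the real data this reveals the imbalance
counts = sample["Type"].value_counts().sort_index()
print(counts)
```

On the real file, replace `sample` with `pd.read_csv("glass.csv")`.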

🌐 Resources for Learning Algorithms

Instead of long theoretical definitions, here are some great visual and written resources for each algorithm:

πŸ”’ KNN:

KNN

🌳 Decision Tree:

DT

βš–οΈ SVM:

SVM


πŸ“… Project Structure

```
.
β”œβ”€β”€ glass.csv   <- Dataset
β”œβ”€β”€ Dt.py       <- Decision Tree class
β”œβ”€β”€ KNN.py      <- KNN class
β”œβ”€β”€ SVM.py      <- SVM class
β”œβ”€β”€ tools.py    <- Visualization tools
β”œβ”€β”€ db.py       <- Data inspection helper
└── main.py     <- Main runner
```

πŸ“ˆ Data Preparation

Each class implements a common interface:

```python
x_train, x_test, y_train, y_test = model.data_preprocces()
```

Steps:

  1. Read the CSV
  2. Use every column except the last as the feature matrix X
  3. Use the last column (Type) as the target y
  4. Split into train and test sets (80/20)

Supported by: Dt.py, KNN.py, SVM.py
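
In the repo, `data_preprocces` is a method on each model class; as a rough standalone sketch of those four steps with pandas and scikit-learn (the free-function form, the path argument, and the `random_state` are assumptions), it could look like:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def data_preprocces(csv_path="glass.csv", test_size=0.2, random_state=42):
    """Sketch of the shared preprocessing: last column is the target."""
    df = pd.read_csv(csv_path)
    X = df.iloc[:, :-1]  # every column except the last -> features
    y = df.iloc[:, -1]   # last column (Type) -> labels
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```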


πŸ’‘ Model Integration

Each class supports a predict_report(choice, x_train, x_test, y_train, y_test) method with the following choices:

  • 0: Return predictions
  • 1: Return accuracy
  • 2: Return confusion matrix
  • 3: Return full classification report
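
The actual method bodies live in Dt.py, KNN.py, and SVM.py; a sketch of the `choice` dispatch, written here as a standalone function that takes the scikit-learn estimator explicitly (that signature is an assumption, since the repo's classes wrap their own estimators), might look like:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def predict_report(model, choice, x_train, x_test, y_train, y_test):
    """Fit the given estimator, then return the artifact selected by `choice`."""
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    if choice == 0:
        return y_pred                              # raw predictions
    if choice == 1:
        return accuracy_score(y_test, y_pred)      # accuracy
    if choice == 2:
        return confusion_matrix(y_test, y_pred)    # confusion matrix
    return classification_report(y_test, y_pred, output_dict=True)  # full report
```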

🌿 Visualization Tools

The tools.py module provides utility methods to plot each metric using matplotlib:

```python
Tools.Acc_table(report_list)
Tools.macro_prec(report_list)
Tools.macro_recall(report_list)
Tools.macro_f1(report_list)
Tools.wei_prec(report_list)
Tools.wei_recall(report_list)
Tools.wei_f1(report_list)
```

These will display stem charts for each metric and print scores for each model.
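
The helper bodies aren't shown in this post, but one such stem-chart helper could be sketched as below. `acc_stem_chart` is a hypothetical name, and reading `"accuracy"` out of `classification_report(..., output_dict=True)` dicts is an assumption about the `report_list` format:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def acc_stem_chart(report_list, labels=("KNN", "SVM", "DT")):
    """Stem-plot accuracy per model and print each score."""
    accs = [r["accuracy"] for r in report_list]
    plt.figure()
    plt.stem(range(len(accs)), accs)
    plt.xticks(range(len(accs)), labels)
    plt.ylabel("Accuracy")
    plt.title("Accuracy per model")
    for name, a in zip(labels, accs):
        print(f"{name}: {a:.3f}")
    return accs
```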


πŸ“Š Comparison Results

Decision Tree comes out on top on every stem chart produced by tools.py:

  • Accuracy: Decision Tree
  • Macro Avg Precision: Decision Tree
  • Macro Avg Recall: Decision Tree
  • Macro Avg F1: Decision Tree
  • Weighted Avg Precision: Decision Tree
  • Weighted Avg Recall: Decision Tree
  • Weighted Avg F1: Decision Tree


πŸ”Ή Conclusion

βœ… Decision Tree outperforms both SVM and KNN on every evaluation metric for this dataset.

This doesn’t mean it’s the best overall, but for Glass Type classification, DT gives the best results without any hyperparameter tuning.


πŸ“† Try it Yourself

Check out the full source code here:

All code is modularized for reuse. You can:

  • Change the dataset
  • Add new models
  • Extend the visualizations

✨ Star the Repo

If you found this project helpful, consider giving it a star on GitHub!

🌟 GitHub - Ertugrulmutlu/Machine_Learning_Alg_Comp

Thank you for reading! πŸš€
