petercour

Machine Learning Classification comparison

Classification is the task of assigning a sample to one of a fixed set of classes. Given a new sample, a classifier predicts which class it belongs to.

The algorithms discussed here are machine learning algorithms: they learn how to classify from examples rather than from hand-written rules.

These algorithms learn from data. The iris flower data set, for instance, contains measurements of flowers together with the class of each flower.

Once an algorithm has been trained on these measurements, it can classify a flower it has not seen before.
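To make this concrete, here is a minimal sketch that loads the iris data set, prints what the measurements look like, and classifies one new flower. The choice of `KNeighborsClassifier` and the values in `new_sample` are assumptions made for illustration only; any sklearn classifier could be used in the same way.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

# Load the iris flower data set that ships with sklearn.
iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (number of flowers, number of measurements per flower)
print(iris.feature_names)  # names of the measurements
print(iris.target_names)   # names of the classes

# Train a classifier on all measurements and their known classes.
clf = KNeighborsClassifier()
clf.fit(X, y)

# A hypothetical new flower (illustrative values, in cm):
# sepal length, sepal width, petal length, petal width
new_sample = [[5.0, 3.4, 1.5, 0.2]]
predicted = clf.predict(new_sample)
print('predicted class:', iris.target_names[predicted[0]])
```

Note that `predict` expects a list of samples, which is why the single flower is wrapped in an extra pair of brackets.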

These algorithms do not all perform equally well: on a given data set, some score higher than others. Ideally a classifier scores as close to 100% as possible, but classifiers don't always achieve that.
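The score of an sklearn classifier is its accuracy: the fraction of samples whose predicted class matches the true class. The sketch below computes it by hand and with `accuracy_score`. The two label arrays are made-up values for illustration, not results from a real classifier.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true classes and predictions for eight samples.
true_labels = np.array([0, 1, 2, 2, 1, 0, 1, 2])
predicted_labels = np.array([0, 1, 2, 1, 1, 0, 2, 2])

# Accuracy by hand: the share of positions where both arrays agree.
manual = np.mean(true_labels == predicted_labels)
print('manual accuracy :', manual)

# The same number from sklearn.
print('sklearn accuracy:', accuracy_score(true_labels, predicted_labels))
```

Six of the eight predictions are correct here, so both lines report 0.75, which is 75%.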

The Python module sklearn (scikit-learn) contains implementations of many classification algorithms, for example:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
```

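and many more. The program below trains three of them (a random forest, k-nearest neighbors and multinomial naive Bayes) on one half of the iris data set and prints the score each one reaches on the other half.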

```python
#!/usr/bin/python3
# coding=utf-8

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# load the iris flower data set
iris = datasets.load_iris()
X, y = iris.data, iris.target

# split data into training and test data.
train_x, test_x, train_y, test_y = train_test_split(X, y,
                                                    train_size=0.5,
                                                    test_size=0.5,
                                                    random_state=123)

# the classifiers to compare
clfs = {'random_forest': RandomForestClassifier(n_estimators=50),
        'knn': KNeighborsClassifier(),
        'bayes': MultinomialNB(alpha=0.01)}

def try_different_method(clf):
    # train on the training half, then print the accuracy on the test half
    clf.fit(train_x, train_y)
    score = clf.score(test_x, test_y)
    print('the score is :', score)

for clf_key, clf in clfs.items():
    print('the classifier is :', clf_key)
    try_different_method(clf)
```
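A single train/test split can flatter or penalize a classifier by chance. One common way to get a steadier comparison is cross-validation, sketched below with `cross_val_score`. This goes beyond the program above and is only a suggestion: the choice of 5 folds, the added `LogisticRegression` entry with `max_iter=1000`, and the fixed `random_state` for the random forest are all assumptions made for this example.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same classifiers as in the program above, plus logistic regression (an addition).
clfs = {
    'random_forest': RandomForestClassifier(n_estimators=50, random_state=123),
    'knn': KNeighborsClassifier(),
    'bayes': MultinomialNB(alpha=0.01),
    'logistic_regression': LogisticRegression(max_iter=1000),
}

mean_scores = {}
for name, clf in clfs.items():
    # Train and score the classifier 5 times, each time holding out a different fifth.
    scores = cross_val_score(clf, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(name, 'mean accuracy:', round(mean_scores[name], 3))
```

Each classifier is trained five times on different parts of the data, and the mean of the five scores is reported, so one lucky or unlucky split has less influence on the ranking.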