DEV Community


False positives and ML

petercour

Computers sometimes come to the wrong conclusions. This can happen for several reasons: the model (Machine Learning algorithm) may be a poor fit for the problem, there may not be enough training data, and so on.

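Let's take the simple Iris flower data set and count how many points the model gets wrong. Iris has three classes rather than one positive and one negative class, so strictly speaking the program below counts misclassified (mislabeled) points, not only false positives. It trains a Gaussian Naive Bayes classifier and counts the errors: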

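```python
#!/usr/bin/python3
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Load the Iris flower data set: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
gnb = GaussianNB()
# Train on all samples, then predict those same samples
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
# Count the samples whose predicted class differs from the true class
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))
```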

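That's 6 errors out of 150 points, or 96% accuracy. Because the model is evaluated on the same data it was trained on, this number is optimistic. You could say 6 errors is not much, but whether that's acceptable depends on what kind of data you are dealing with and how much fault tolerance is allowed.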
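To see which points the classifier gets wrong, the same setup can be extended to list every misclassified sample and compare the error rate with a tolerance. This is a minimal sketch assuming scikit-learn and NumPy; `MAX_ERROR_RATE` is a hypothetical threshold chosen for illustration, since the acceptable value depends entirely on the application.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
import numpy as np

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same setup as the program above: fit and predict on the full data set
y_pred = GaussianNB().fit(X, y).predict(X)

# Indices of the samples where the prediction and the true label disagree
wrong = np.flatnonzero(y != y_pred)
error_rate = len(wrong) / len(y)

for i in wrong:
    print("sample %3d: true=%s predicted=%s"
          % (i, iris.target_names[y[i]], iris.target_names[y_pred[i]]))
print("error rate: %.1f%%" % (100 * error_rate))

# Hypothetical tolerance: the acceptable error rate depends on the application
MAX_ERROR_RATE = 0.05
print("within tolerance" if error_rate <= MAX_ERROR_RATE else "too many errors")
```

Listing the individual errors is often more informative than the total: it shows which classes the model confuses with each other.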

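Predictive algorithms sometimes come to the wrong conclusion. For a classifier that decides between a positive and a negative class, there are 4 possible outcomes:

- True positive: the model predicts positive and the case is actually positive.
- False positive: the model predicts positive, but the case is actually negative.
- True negative: the model predicts negative and the case is actually negative.
- False negative: the model predicts negative, but the case is actually positive.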

In an ideal situation, the algorithm would score 100% and there would be only true positives and true negatives. In practice, algorithms are often less accurate.
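Iris has three classes, so there is no single "positive" class. One common way to still count false positives is one-vs-rest: treat each class in turn as the positive class and all other classes as negative. The sketch below assumes scikit-learn's `confusion_matrix` (rows are true classes, columns are predicted classes) and derives the four counts per class from it.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target
y_pred = GaussianNB().fit(X, y).predict(X)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y, y_pred)
total = cm.sum()

# One-vs-rest: treat each class in turn as the "positive" class
counts = {}
for k, name in enumerate(iris.target_names):
    tp = cm[k, k]                    # class k predicted as k
    fp = cm[:, k].sum() - tp         # other classes predicted as k
    fn = cm[k, :].sum() - tp         # class k predicted as another class
    tn = total - tp - fp - fn        # neither true nor predicted as k
    counts[name] = (tp, fp, fn, tn)
    print("%-10s TP=%3d FP=%3d FN=%3d TN=%3d" % (name, tp, fp, fn, tn))
```

Summed over the three classes, the false positives add up to the total number of mislabeled points, because each error is a false positive for the predicted class and a false negative for the true class.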

For a toy program that's not such a big deal, but what if the algorithm is driving your car or flying the plane you are sitting in?

Indeed, an error in a self-flying airplane could be a disaster.

That's why you want to optimize your algorithm as much as possible, training it on lots of data and evaluating it on separate test data it has never seen.
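A more honest error estimate than the one above comes from testing on data the model was not trained on. This is a minimal sketch assuming scikit-learn's `train_test_split` and `cross_val_score`; the 50/50 split, `random_state=0` and five folds are arbitrary choices for illustration, not part of the original program.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hold out half of the data; the model never sees it during training.
# stratify=y keeps the same class proportions in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)
test_errors = (y_test != y_pred).sum()
print("Mislabeled test points: %d out of %d" % (test_errors, len(y_test)))

# 5-fold cross-validation: every sample is used for testing exactly once
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print("Cross-validated accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

A single split depends on which points happen to land in the test set; cross-validation averages over several splits, which gives a steadier estimate on a data set as small as Iris.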

