Data classification in 8 lines of code

Classification comes naturally to people. Each time you look at something, you decide what group it belongs to. Is it a bird? Is it a plane?

So much for human classification. Can computers do the same?
Yes, they can!

The popular solution is Machine Learning, a branch of Artificial Intelligence.
The Python module sklearn (scikit-learn) is a good choice. (Why Python for Machine Learning)

So why Machine Learning?

Can't you just write a bunch of if statements?

Well, great programmers are very lazy. Imagine having to write a computer program each time a customer wants an "intelligent robot".

Too much work.

The code must learn by itself! How? Make the code learn from data.
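To make the if-statement route concrete, here is a minimal sketch of a hand-written rule. The function name `classify_by_hand`, the sample points, and the threshold of 2 are all made up for illustration; nothing here comes from a library.

```python
# Hypothetical hand-written classifier: every rule is typed in by a person.
def classify_by_hand(point):
    first, second = point
    # Assumed rule, picked by eye: a second measurement of 2 or more means class 1.
    if second >= 2:
        return 1
    return 0

# Made-up sample points for illustration.
for point in [[2, 0], [1, 1], [2, 3]]:
    print(point, "->", classify_by_hand(point))
```

The rule works for these three points, but a person had to pick the threshold. New data or a new customer means rewriting the rules by hand, which is the work a learning algorithm takes over.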

And so you code

To start, you need Python with sklearn installed. We'll use a classifier from sklearn's svm module.

from sklearn import svm

Then we need data. No problem, here is a little bit of data.

x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]

So what are x and y?

  • x holds the measurements, one row per sample.
  • y is the output, one class label per row of x.

Look at y, there are two possible outputs: 0 and 1.

Remember, svm is a classifier. So its output is either class 0 or class 1.
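A quick way to see how x and y line up is to pair each row with its label. This is a small sketch in plain Python; the variable names `n_features` and `classes` are only for illustration.

```python
x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]

# Every sample needs exactly one label.
assert len(x) == len(y)

# Each sample has the same number of measurements.
n_features = len(x[0])

# The distinct labels are the classes the classifier can output.
classes = sorted(set(y))

for sample, label in zip(x, y):
    print(sample, "->", label)
print("measurements per sample:", n_features)
print("classes:", classes)
```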

Train and predict

The next step: create the svm and train it by calling fit with the data. Like humans, Machine Learning algorithms need training or learning.

clf = svm.SVC(kernel='linear').fit(x, y)

Training time!
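If you are curious what training produced, the fitted classifier can be inspected. The sketch below assumes the same three data points and a linear kernel; `classes_`, `support_vectors_`, `coef_` and `intercept_` are attributes of a fitted `SVC` (`coef_` is only available with a linear kernel).

```python
from sklearn import svm

x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]

# fit() is the training step: it learns a separating line from x and y.
clf = svm.SVC(kernel='linear').fit(x, y)

# The classes seen during training.
print(clf.classes_)
# The training points that define the boundary (the "support vectors").
print(clf.support_vectors_)
# For a linear kernel, the learned line is coef_ . point + intercept_ = 0.
print(clf.coef_, clf.intercept_)
```

Only the points closest to the other class end up as support vectors; the rest of the training data does not change where the line sits.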

After training is completed, it can classify. Given new data [2, 0], which class is it?

print(clf.predict([[2, 0]]))
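Note the double brackets: predict takes a list of points, so several can be classified in one call. Here is a minimal sketch; the four points in `new_points` are made up for illustration.

```python
from sklearn import svm

x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]
clf = svm.SVC(kernel='linear').fit(x, y)

# predict() expects a list of points and returns one class per point.
new_points = [[2, 0], [2, 3], [0, 0], [3, 3]]
predictions = clf.predict(new_points)
print(predictions)

# decision_function() returns a signed score per point:
# for this two-class model, negative means class 0 and positive means class 1.
scores = clf.decision_function(new_points)
print(scores)
```

The size of a score hints at how far a point lies from the boundary, so scores near zero are the uncertain cases.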

The magical program in full:

from sklearn import svm

x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]
clf = svm.SVC(kernel='linear').fit(x, y)
print(clf.predict([[2, 0]]))

In case you were curious about the terminology:

  • svm = support vector machine
  • classifier = algorithm that outputs a class
  • fit = train the algorithm
  • artificial intelligence = computer code + data
  • machine learning = algorithms + data
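To illustrate "classifier" and "fit" as general terms: sklearn classifiers share the same fit and predict calls, so the algorithm can be swapped without touching the rest of the code. The sketch below is only an illustration; using `KNeighborsClassifier` as the second algorithm is my own choice, not something the program above relies on.

```python
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]

# Two different algorithms, both classifiers.
classifiers = {
    "svm": svm.SVC(kernel='linear'),
    "nearest neighbour": KNeighborsClassifier(n_neighbors=1),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(x, y)                                      # fit = train the algorithm
    results[name] = int(clf.predict([[2, 0]])[0])      # classifier outputs a class
    print(name, "->", results[name])
```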

With respect to code, it doesn't really matter what we call it all.
