DEV Community

Mukumbuta

Introduction to Scikit-Learn

In this article, we will discuss scikit-learn, an open-source Python library that provides a range of supervised and unsupervised machine learning algorithms. Scikit-learn does not bundle its own numerical or plotting tools; instead, it works alongside several powerful packages, which include:
 NumPy
 Matplotlib
 SciPy
NumPy and SciPy are dependencies of scikit-learn and must be installed (using the Terminal) before it can be used; Matplotlib is only needed for the plots in this article. Each package is then imported in the code that uses it, and scikit-learn itself is imported under the name sklearn. Scikit-learn is built on SciPy (Scientific Python).
To install SciPy, type the command below in the Terminal:
pip install scipy
Scikit-learn itself is installed the same way, with pip install scikit-learn.
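A quick way to confirm that the installation worked is to import each package and print its version. This is only a minimal sketch; the version numbers you see will depend on your own environment.

```python
import matplotlib
import numpy
import scipy
import sklearn

# Each package exposes its version as a string in __version__
for module in (numpy, scipy, matplotlib, sklearn):
    print(module.__name__, module.__version__)
```

If any of the imports fails with ModuleNotFoundError, that package still needs to be installed with pip.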
Scikit-learn comes with sample datasets, such as iris and digits. The datasets can be loaded on their own; to classify them, we also import svm, scikit-learn's Support Vector Machine module. An SVM is a supervised learning model used for classification and regression.
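As a brief illustration of the sample datasets, the sketch below loads iris, the other dataset mentioned above, and prints a few of its attributes. The variable name iris is our own choice.

```python
from sklearn import datasets

# Load the iris sample dataset that ships with scikit-learn
iris = datasets.load_iris()

print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.target_names)   # the three species names
print(iris.feature_names)  # what each of the 4 columns measures
```

The digits dataset used in the rest of this article exposes the same data and target attributes.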
We can take the digits dataset and train a classifier to categorize the handwritten numbers for us. Let’s consider the code below:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()
print(digits.data)

The output of the above code will be:
[[ 0. 0. 5. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 10. 0. 0.]
[ 0. 0. 0. ..., 16. 9. 0.]
...,
[ 0. 0. 1. ..., 6. 0. 0.]
[ 0. 0. 2. ..., 12. 0. 0.]
[ 0. 0. 10. ..., 12. 1. 0.]]

The digits.data attribute printed above gives us access to the features that can be used to classify the digit samples. The target labels and the original images are available in the same way. Let’s consider the following code:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()
print(digits.target)
print(digits.images[0])

OUTPUT:
[0 1 2 ..., 8 9 8]  # target of the data
[[ 0. 0. 5. 13. 9. 1. 0. 0.]  # image of the data
[ 0. 0. 13. 15. 10. 15. 5. 0.]
[ 0. 3. 15. 2. 0. 11. 8. 0.]
[ 0. 4. 12. 0. 0. 8. 8. 0.]
[ 0. 5. 8. 0. 0. 9. 8. 0.]
[ 0. 4. 11. 0. 1. 12. 7. 0.]
[ 0. 2. 14. 5. 10. 12. 0. 0.]
[ 0. 0. 6. 13. 10. 0. 0. 0.]]

In the output above, both the target labels and the first image of the digits dataset are printed.
digits.target gives the ground truth for the digits dataset, i.e., the number corresponding to each digit image. Note that data is always a 2D array of shape (n_samples, n_features), even though the original data may have had a different shape. In the case of digits, each original sample is an image of shape (8, 8), which can be accessed using digits.images.
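The relationship between the two shapes can be checked directly. The sketch below flattens each 8 by 8 image into a row of 64 values and compares the result with digits.data; the variable name flattened is our own.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

print(digits.images.shape)  # (1797, 8, 8): one 8x8 image per sample
print(digits.data.shape)    # (1797, 64): one row of 64 features per sample

# Flatten every image into a single row and compare with digits.data
flattened = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flattened, digits.data))  # True
```

In other words, digits.data is the same pixel information as digits.images, laid out as one row per sample.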
A look at learning and predicting
In the work above we loaded a dataset whose samples fall into 10 possible classes (the digits 0 to 9). The task now is to predict which digit a given image represents. To achieve this, we need an estimator, which learns from labelled samples and then predicts the classes to which unseen samples belong. In scikit-learn, an estimator for classification is a Python object that implements the methods fit(X, y) and predict(T). Let’s consider the following example:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()  # dataset
clf = svm.SVC(gamma=0.001, C=100)
print(len(digits.data))
# Train on every sample except the last one
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x, y)
# predict() expects a 2D array, so slice with [-1:] to keep the last sample as one row
print('Prediction:', clf.predict(digits.data[-1:]))
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()

OUTPUT:
1797
Prediction: [8]

Image Here…
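One detail about predict is worth a closer look: it expects a 2D array of shape (n_samples, n_features), and recent versions of scikit-learn reject a plain 1D array. The sketch below shows two ways to pass a single sample as one row; the names last and as_row are our own.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(digits.data[:-1], digits.target[:-1])

# Indexing with -1 drops the sample dimension
last = digits.data[-1]
print(last.shape)              # (64,)

# Option 1: reshape the 1D sample into a single row
as_row = last.reshape(1, -1)
print(as_row.shape)            # (1, 64)

# Option 2: slice instead of index, which keeps the row dimension
print(digits.data[-1:].shape)  # (1, 64)

prediction = clf.predict(as_row)
print(prediction.shape)        # (1,): one prediction per input row
```

Either form gives predict the single-row 2D array it needs.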

In the above example, we first printed the number of samples in the dataset and then used all but the last one (1796 samples) as training data. The slice [:-1] selects every element except the last, and the index -1 refers to the last element, which we held back to test the classifier. To check whether the machine predicted the right digit, we used Matplotlib to display the image of that last sample.
In a nutshell: we have the digits data, we get the targets, we fit the estimator, and we predict. That’s all.
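Checking a single held-back sample only tells us so much. A common next step, sketched below, is to hold back a larger test set and measure the fraction of correct predictions. The 25% test size and random_state=0 are arbitrary choices made for this illustration, not part of the example above.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold back 25% of the samples for testing (an arbitrary choice)
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Same estimator settings as in the example above
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(X_train, y_train)

# Compare predictions on unseen samples with the ground truth
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Test samples:', len(y_test))
print('Accuracy:', accuracy)
```

The accuracy is the share of test images whose predicted digit matches digits.target, so a value close to 1.0 means the classifier is doing well on images it has not seen.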
Now we can go ahead and visualize the images together with their target labels, as in the code below:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()
# Join the images and target labels in a list
images_and_labels = list(zip(digits.images, digits.target))

# For each of the first eight (image, label) pairs
for index, (image, label) in enumerate(images_and_labels[:8]):
    # Initialize a subplot in a 2x4 grid at position index + 1
    plt.subplot(2, 4, index + 1)
    # Display the image in its subplot
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    # Add a title to each subplot
    plt.title('Training: ' + str(label))

# Show the plot
plt.show()

OUTPUT:

Images here…

As can be seen in the code above, we used the zip function to pair each image with its target label, converted the result to a list, and saved it in the variable images_and_labels.
We then looped over the first eight pairs, placed each one at its position in a 2 by 4 grid of subplots, displayed the image with Matplotlib, and set each title to ‘Training: ’ followed by the label.
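To see what zip and enumerate actually hand to the loop, the small sketch below prints the pieces without plotting anything. The names first_image and first_label are our own.

```python
from sklearn import datasets

digits = datasets.load_digits()
images_and_labels = list(zip(digits.images, digits.target))

# Each list element is an (image, label) tuple
first_image, first_label = images_and_labels[0]
print(first_image.shape, first_label)  # (8, 8) 0

# enumerate adds a running index, which the plot uses as index + 1
for index, (image, label) in enumerate(images_and_labels[:3]):
    print('subplot position:', index + 1, 'label:', label)
```

Subplot positions in Matplotlib start at 1 while enumerate starts at 0, which is why the plotting code passes index + 1 to plt.subplot.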
