# Talking about Machine Learning (I): Setup

### Andrés Baamonde Lozano ・ Mar 12 ・ 2 min read

Part of "Talking about Machine Learning" series

The next couple of posts in this series will be a tutorial on machine learning, one of the most popular branches of AI.

### Environment

I will work with the following libraries: NumPy, SciPy, scikit-learn, and matplotlib. I built a tiny install script.

```bash
mkdir -p talkingaboutml/talkingaboutml
python3 -m venv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib
```

Now your talkingaboutml directory looks like this:

```
talkingaboutml/
├── talkingaboutml   (here we store our examples)
└── venv
```
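To verify the environment (this check is my addition, not part of the original script), you can run the venv's interpreter and print each library's version:

```python
# quick sanity check: import each library and print its version
import matplotlib
import numpy
import scipy
import sklearn

for mod in (numpy, scipy, sklearn, matplotlib):
    print(mod.__name__, mod.__version__)
```

Run it as `talkingaboutml/venv/bin/python check.py`; if any import fails, the install step did not complete.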

### First example

For our first example I will use the scikit-learn example datasets (available in `sklearn.datasets`); there are many of them. I chose iris, a multi-class classification dataset.

As a first example, I will train a simple classifier and run a prediction.

We need some imports: the datasets module, an accuracy metric, and a linear SVC:

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
```

Load the dataset; scikit-learn datasets come already split into data and target:

```python
iris = datasets.load_iris()
X = iris.data    # each row is an iris described by its features
y = iris.target  # the class of each row

n_samples, n_features = X.shape
```
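A quick way to see what was actually loaded (a small sketch, not in the original post):

```python
from sklearn import datasets

iris = datasets.load_iris()

print(iris.data.shape)     # (150, 4): 150 irises, 4 features each
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the three classes: setosa, versicolor, virginica
```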

Create the classifier, train it, and run a prediction.

```python
clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)  # linear SVC

clf.fit(X, y)  # train

y_pred = clf.predict(X)  # predict on the training data
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```
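Note that this measures accuracy on the same data the model was trained on. For a more honest estimate you could hold out a test set; here is a minimal sketch using `train_test_split` (my addition, not part of the original example, and the 30% split is arbitrary):

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# keep 30% of the samples aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = SVC(kernel='linear', C=1.0, random_state=0)
clf.fit(X_train, y_train)                            # train on 70% of the data
print(accuracy_score(y_test, clf.predict(X_test)))   # accuracy on unseen data
```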

So... let's do this: in this example I train the same classifier with different C (penalty) values. This parameter tells the SVM how much you want to avoid misclassifying each training example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

iris = datasets.load_iris()

X = iris.data    # each row is an iris described by its features
y = iris.target  # the class of each row

penalties = list(np.arange(0.5, 10.0, 0.1))

accs = []

for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0)  # linear SVC
    clf.fit(X, y)  # train
    y_pred = clf.predict(X)
    accs.append(accuracy_score(y, y_pred))

# plot training accuracy against the penalty value
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
plt.show()
```

As we can see, accuracy depends on the penalty factor. If C is too large, the margin becomes very strict and the model may overfit the training data.
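Since training accuracy alone can be misleading, one fairer way to compare C values is k-fold cross-validation. A short sketch (my addition; the C values and 5 folds are arbitrary choices):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

for C in (0.1, 1.0, 10.0):
    # 5-fold cross-validation: train on 4/5 of the data, score on the held-out fold
    scores = cross_val_score(SVC(kernel='linear', C=C),
                             iris.data, iris.target, cv=5)
    print(C, scores.mean())
```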

