This post is inspired by Kevin Markham's course Introduction to Machine Learning with scikit-learn on Data School. If you're a beginner looking to get started with Machine Learning using scikit-learn, I would highly recommend this course to gain all the required foundational skills.

In just a few hours, you'd be able to understand and build basic regression, and classification models with the optimal hyperparameters.

πIf this sounds interesting, please be sure to check the course at this link.

That said, we shall cover the following in this post.

- How to Use Cross-Validation to Evaluate a Machine Learning Model
- How to Use Cross-Validation to Search for the Best Model Hyperparameters
- How to Use Grid Search for Hyperparameter Search
- How to Search for Hyperparameters More Efficiently Using Randomized Search

## How to Use Cross-Validation to Evaluate a Machine Learning Model

### How do we validate a machine learning model?

When evaluating a machine learning model, training and testing on the same dataset is not a great idea. Why? Let us draw a relatable analogy.

Ever since school days, weβve been giving exams, and how are our exams designed?

- Well, theyβve been designed so as to test our understanding of the concepts rather than our ability to memorize!
- The same analogy can be transposed to our machine learning model as well!

Hereβs the answer to the question

βWhy can we not evaluate a model on the same data that it was trained on?β

It is because this process inherently encourages the model to **memorize** the training data. So, the model performs extremely well on the training data.

However, it **generalizes rather poorly** to data that it has never seen before.

###
What does `train_test_split`

do?

As a model's performance on data that it has never seen before is a more reliable estimate of its performance, we usually validate the model by checking how it performs on out-of-sample data, that is, on data that it has never seen before.

If you remember, it is for this reason, we use the `train_test_split`

method, in our very friendly and nifty library, scikit-learn.

This `train_test_split`

splits the available data into two sets: the train and test sets in certain proportions.

For example, train on 70% of the available data and test on the remaining 30% of the data. This way, we can ensure that every record in the dataset can either be in the training set or the test set but not both!

And in doing so, we are making sure that we test the modelβs performance on unseen data.

But, is this good enough? Or do we take this with a pinch of salt?

Let us train a simple `KNeighborsClassifier`

in scikit-learn on the iris dataset.

βΆ **Necessary imports**

As shown below, we import the necessary modules.

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
```

βΆ **Load the data**

Let us load the iris data and separate out the **features**(Sepal Length, Sepal Width, Petal Length and Petal Width) and the **target** variables which are the class labels indicating the iris type. (Setosa, Versicolour, and Virginica)

```
# read in the iris data
iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target
```

βΆ **Create train_test_split**

Letβs create the train and test sets with `random_state= 4`

. Setting the `random_state`

ensures reproducibility.

In this case, it ensures that the records that go into the train and test sets stay the same every time our code is run.

```
# use train/test split with different random_state values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 4)
```

βΆ **Check Classification Accuracy**

We now instantiate the `KNeighborsClassifier`

with `n_neighbors=9`

and fit the classifier on the training set and predict on the test set.

```
knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
# Output
0.9736842105263158
```

Now, let us change the `random_state`

to a different value. What do you think the accuracy score would be?

Well, please feel free to insert your favorite number in the `random_state`

and check for yourselves. It'd be a different accuracy score this time.

Setting the `random_state`

to another value, we would get another value for the accuracy score.

The evaluation metric thus obtained is therefore susceptible to

high variance.

- The accuracy score may depend heavily on which data points end up in the training set and which end up in the test set.
- Thus the accuracy score may be significantly different depending on how the division is made. Clearly, this doesnβt seem like the best way to validate our modelβs performance!

#### How do we reach a consensus on how to calculate the accuracy score?

One very natural thing to do would be to create **multiple train/test splits**, calculate the accuracy for each such split, and compute the average of all the accuracy scores thus obtained. This definitely seems like a better estimate of the accuracy, doesnβt it?

This is precisely the essence of cross-validation, which we shall see in the subsequent section.

### Understanding K-fold cross-validation

**Steps in K-fold cross-validation**

- Split the dataset into K equal partitions (or βfoldsβ).
- Use fold 1 for testing and the union of the other folds as the training set.
- Calculate accuracy on the test set.
- Repeat steps 2 and 3 K times, using a different fold for testing each time.
- Use the average accuracy on different test sets as the estimate of out-of-sample accuracy.

Let us try to visualize this by splitting a dataset of 25 observations into 5 equal folds as shown below.

The dataset contains 20 observations (numbered 0 through 24).

5-fold cross-validation runs for 5 iterations.

Let's see how the folds are created

```
# simulate splitting a dataset of 25 observations into 5 folds
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, shuffle = False).split(range(25))
# print the contents of each training and testing set
print('{} {:^61} {}'.format('Iteration', 'Training set observations', 'Testing set observations'))
for iteration, data in enumerate(kf, start=1):
print('{:^9} {} {:^25}'.format(iteration, data[0], str(data[1])))
# Output
Iteration Training set observations Testing set observations
1 [ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [0 1 2 3 4]
2 [ 0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [5 6 7 8 9]
3 [ 0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] [10 11 12 13 14]
4 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] [15 16 17 18 19]
5 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24]
```

We observe the following:

- For each iteration, every observation is either in the training set or the testing set, but not both.
- Every observation is in the test set exactly once.
- Each fold is used as the test set exactly once and in the training set (K-1) times.
- The average accuracy thus obtained is a more accurate estimate of out-of-sample accuracy.

This process uses data more efficiently as every observation is used for both training and testing.

It is recommended to use **stratified sampling** for creating the folds, as this ensures that **all class labels** are represented in **equal proportions** in **each fold**. And, scikit-learnβs `cross_val_score`

does this by default.

In practice, we can even do the following:

- βHold outβ a portion of the data before beginning the model building process.
- Find the best model using cross-validation on the remaining data, and test it using the hold-out set.
- This gives a more reliable estimate of out-of-sample performance since the hold-out set is truly out-of-sample.

## How to Use Cross-Validation to Search for the Best Model Hyperparameters

### Cross-validation for hyperparameter tuning

For the KNN classifier on the iris dataset, can we possibly use cross-validation to find the optimal value for `K`

? That is, to search for the optimal value of `n_neighbors`

?

Remember, `K`

in KNN classifier is the number of neighbors (`n_neighbors`

) that we take into account for predicting the class label of the test sample. Not to be confused with the `K`

in K-fold cross-validation.

```
from sklearn.model_selection import cross_val_score
# 10-fold cross-validation with K=5 for KNN (the n_neighbors parameter)
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
print(scores)
# Output
[1. 0.93333333 1. 1. 0.86666667 0.93333333
0.93333333 1. 1. 1. ]
# use average accuracy as an estimate of out-of-sample accuracy
print(scores.mean())
# Output
0.9666666666666668
```

Now, we shall run the K fold cross-validation for the models with different values of `n_neighbors`

, as shown below.

```
# search for an optimal value of K for KNN
k_range = list(range(1, 31))
k_scores = []
for k in k_range:
knn = KNeighborsClassifier(n_neighbors=k)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
k_scores.append(scores.mean())
print(k_scores)
# Output k_scores
[0.96, 0.9533333333333334, 0.9666666666666666, 0.9666666666666666, 0.9666666666666668, 0.9666666666666668, 0.9666666666666668, 0.9666666666666668, 0.9733333333333334, 0.9666666666666668, 0.9666666666666668, 0.9733333333333334, 0.9800000000000001, 0.9733333333333334, 0.9733333333333334, 0.9733333333333334, 0.9733333333333334, 0.9800000000000001, 0.9733333333333334, 0.9800000000000001, 0.9666666666666666, 0.9666666666666666, 0.9733333333333334, 0.96, 0.9666666666666666, 0.96, 0.9666666666666666, 0.9533333333333334, 0.9533333333333334, 0.9533333333333334]
```

Let us plot the values to get a better idea.

```
import matplotlib.pyplot as plt
%matplotlib inline
# plot the value of K for KNN (x-axis) versus the cross-validated accuracy (y-axis)
plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')
```

We see that `n_neighbors (K)`

values from 13 to 20 yield higher accuracy, especially `K`

=13,18 and 20.As a larger value of `K`

yields a less complex model, we choose `K`

=20.

This process of searching for the optimal values of hyperparameters is called

hyperparameter tuning.

In this example, we chose the values of `K`

that resulted in higher mean accuracy score under 10-fold cross validation.

This is how cross-validation can be used to search for the best hyperparameters and this can be done much more efficiently in scikit-learn.

In KNN classifiers, setting a very small value for

`K`

will make the model needlessly complex, and a very large value of`K`

would result in a model with high bias that yields suboptimal performance.

As `K`

=13,18 and 20 gave the highest accuracy score, close to 0.98, we decided to choose `K`

=20 as a larger value of `K`

would yield a less complex model.

While it is not particularly difficult to write the for loop, we do realize that we may have to do it often.

Therefore, it would be good to have a more convenient way of doing the hyperparameter search, without having to write a loop every time and identify the best parameter through inspection.

Grid Search class in scikit-learn helps us do this all the more easily. Let's learn about it in the next section.

## How to Use Grid Search for Hyperparameter Search

βΆ **Import GridSearchCV Class**

```
from sklearn.model_selection import GridSearchCV
```

βΆ **Define the Parameter Grid**

We now define the **parameter grid** (`param_grid`

), a Python dictionary, whose key is the name of the hyperparameter whose best value weβre trying to find and the value is the list of possible values that we would like to search over for the hyperparameter.

```
# define the parameter values that should be searched
k_range = list(range(1, 31))
# create a parameter grid: map the parameter names to the values that should be searched
param_grid = dict(n_neighbors=k_range)
print(param_grid)
# param_grid
{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}
```

We now instantiate `GridSearchCV`

. Note that we specify the `param_grid`

instead of the `n_neighbors`

argument that we had specified for `cross_val_score`

earlier.

**Why is this valid?**

Remember, the parameter grid, `param_grid`

is a dictionary whose key is `n_neighbors`

and the value is a list of possible values of `n_neighbors`

. Therefore, specifying the `param_grid`

ensures that the value at index `i`

is fetched as the value of `n_neighbors`

in the i_th run.

βΆ **Instantiate, fit the grid and view the results**

```
# instantiate the grid
grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)
```

We now go ahead and fit the grid with data, and access the `cv_results_`

attribute to get the mean accuracy score after 10-fold cross-validation, standard deviation and the parameter values.

For convenience, we may store the results in a pandas DataFrame. The mean and standard deviation of the accuracy scores for `n_neighbors`

=1 to 10 are shown below.

```
# fit the grid with data
grid.fit(X, y)
# view the results as a pandas DataFrame
import pandas as pd
pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]
# Output
mean_test_score std_test_score params
0 0.960000 0.053333 {'n_neighbors': 1}
1 0.953333 0.052068 {'n_neighbors': 2}
2 0.966667 0.044721 {'n_neighbors': 3}
3 0.966667 0.044721 {'n_neighbors': 4}
4 0.966667 0.044721 {'n_neighbors': 5}
5 0.966667 0.044721 {'n_neighbors': 6}
6 0.966667 0.044721 {'n_neighbors': 7}
7 0.966667 0.044721 {'n_neighbors': 8}
8 0.973333 0.032660 {'n_neighbors': 9}
9 0.966667 0.044721 {'n_neighbors': 10}
```

When using `cross_val_score`

, we tried eyeballing the accuracy scores to identify the best hyperparameters and to make it easier, we plotted the value of hyperparameters vs the respective cross-validated accuracy scores!

Sounds good but doesnβt seem to be a great option though!

Once weβve completed the grid search, the following attributes can be very useful! We can choose to examine:

β the `best_score_`

, the **highest cross-validated accuracy score**

β the `best_params_`

, the **optimal value for the hyperparameters**, and

β the `best_estimator_`

, which is the **best model** that has the **best hyperparameter**.

Let's now examine these for our example.

βΆ **Examine the best score and best hyperparameters**

```
# examine the best model
print(grid.best_score_)
print(grid.best_params_)
print(grid.best_estimator_)
# Output
0.9800000000000001
{'n_neighbors': 13}
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=13, p=2,
weights='uniform')
```

`K`

=13 has been chosen, remember, `K`

=13 was one of the values of `K`

that gave highest cross-validated accuracy score.β

So far so good!

### Searching for Multiple Hyperparameters

In this example, the only hyperparameter that we searched for was `n_neighbors`

.

What if there were many such hyperparameters?

We may think,βWhy not tune each hyperparameter independently?β

Well, we may independently search for the optimal values for each of the hyperparameters; but *the model may perform best at some values of the parameters that are very different from the individual best values*.

So, we have to search for the **combination of the parameters** that **optimizes performance** rather than the individual best parameters.

Let's build on the same example of KNNClassifier.

In addition to `n_neighbors`

, let's search for the optimal weighting strategy as well.

- The default weighting option is
`βuniformβ`

where all points are weighted equally and`βdistanceβ`

option weights points by the inverse of their distance.

In this case, **closer neighbors** of a query point will have a **greater influence** than neighbors which are far away.

βΆ **Define the Parameter Grid**

```
# define the parameter values that should be searched
k_range = list(range(1, 31))
weight_options = ['uniform', 'distance']
# create a parameter grid: map the parameter names to the values that should be searched
param_grid = dict(n_neighbors=k_range, weights=weight_options)
print(param_grid)
# param_grid
{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
'weights': ['uniform', 'distance']}
```

Let us instantiate and fit the grid and view results, as before.

```
# instantiate and fit the grid
grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)
grid.fit(X, y)
# view the results
pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]
# Results
mean_test_score std_test_score params
0 0.960000 0.053333 {'n_neighbors': 1, 'weights': 'uniform'}
1 0.960000 0.053333 {'n_neighbors': 1, 'weights': 'distance'}
2 0.953333 0.052068 {'n_neighbors': 2, 'weights': 'uniform'}
3 0.960000 0.053333 {'n_neighbors': 2, 'weights': 'distance'}
4 0.966667 0.044721 {'n_neighbors': 3, 'weights': 'uniform'}
5 0.966667 0.044721 {'n_neighbors': 3, 'weights': 'distance'}
6 0.966667 0.044721 {'n_neighbors': 4, 'weights': 'uniform'}
7 0.966667 0.044721 {'n_neighbors': 4, 'weights': 'distance'}
8 0.966667 0.044721 {'n_neighbors': 5, 'weights': 'uniform'}
9 0.966667 0.044721 {'n_neighbors': 5, 'weights': 'distance'}
10 0.966667 0.044721 {'n_neighbors': 6, 'weights': 'uniform'}
11 0.966667 0.044721 {'n_neighbors': 6, 'weights': 'distance'}
12 0.966667 0.044721 {'n_neighbors': 7, 'weights': 'uniform'}
13 0.966667 0.044721 {'n_neighbors': 7, 'weights': 'distance'}
14 0.966667 0.044721 {'n_neighbors': 8, 'weights': 'uniform'}
15 0.966667 0.044721 {'n_neighbors': 8, 'weights': 'distance'}
16 0.973333 0.032660 {'n_neighbors': 9, 'weights': 'uniform'}
17 0.973333 0.032660 {'n_neighbors': 9, 'weights': 'distance'}
18 0.966667 0.044721 {'n_neighbors': 10, 'weights': 'uniform'}
```

We see that we have 30*2=60 models.(As we had 30 possible values for `n_neighbors`

and 2 possible values for weights)

As we chose 10-fold cross-validation, there will be 60*10=600 predictions made!

Time to look at our modelβs best score and parameters that yielded the best score.

βΆ **Examine the best score and best hyperparameters**

```
# examine the best model
print(grid.best_score_)
print(grid.best_params_)
# best score and best parameters
0.9800000000000001
{'n_neighbors': 13, 'weights': 'uniform'}
```

We obtain the same best cross-validated accuracy score of 0.98, with `n_neighbors`

=13 and `weights`

= βuniformβ.

Now, let's suppose we have to tune 4 hyperparameters and we have a list of 10 possible values for each of the hyperparameters.

This process creates 10*10*10*10 =10,000 models and when we run 10 fold cross-validation, there are 100,000 predictions made.

Clearly, things do scale up very quickly and can soon become computationally infeasible.

## How to Search for Hyperparameters More Efficiently Using Randomized Search

We shall now try to rephrase the limitations of Grid Search better, in a more formal way.

- Say we have to search for
`M`

parameters; Let`p_1, p_2,p_3, β¦, p_M`

be the`M`

parameters. - Let the number of values that we would like to search over for
`p_1`

be`n1`

, for`p_2`

be`n2`

, and so on, with`nM`

values for`p_M`

.

Grid Search considers **all possible hyperparameter settings** (combinations) into account and creates a model for each possible setting to choose the best model with optimal hyperparameters.

- To understand it better, assume that out of
`M`

parameters, we decide to freeze the values of all hyperparameters except one, say the M_th parameter`p_M`

. So, Grid Search involves searching through the list of`nM`

values for the M_th hyperparameter; And`nM`

models are created. - Suppose we now freeze the values of all hyperparameters except two, say the last two (
`p_M`

and`p_(M-1)`

). We now have to search through all possible combinations of`p_M`

and`p_(M-1)`

, each having`nM`

and`n_(M-1)`

possible values that we could search over. - We now take a step back and freeze the value of
`p_M-1`

and search through all values for`p_M`

; To account for all possible combinations, we should repeat the procedure for all`n_M-1`

values for`p_M-1`

. So, this process would leave us with`n_(M-1) * nM`

models.

Hope itβs clear how the complexity scales with increase in the number of values each hyperparameter could take.

- For the above example with
`M`

hyperparameters, we would have`n1*n2*n3*β¦*n_M`

models. This is why we said that things could scale up quickly and become computationally intractable with Grid Search.

With this motivation to make hyperparameter search computationally more efficient, let us proceed to understand Randomized Search.

### Understanding RandomizedSearchCV

In contrast to `GridSearchCV`

, not all parameter values are tried out in `RandomizedSearchCV`

, but rather a fixed number of parameter settings is sampled from the specified distributions/ list of parameters.

If some of the hyperparameters that weβre searching for are continuous, then we should specify the distribution rather than the list of values, while defining the parameter grid. How do we define the fixed number of parameter settings?

The number of parameter settings that are tried is given by `n_iter`

. There's a quality vs computational cost trade-off in picking `n_iter`

.

A very small value of

`n_iter`

would imply that weβre more likely to find a suboptimal solution, because we are actually considering too few combinations.A very high value of n_iter would mean we can ideally get closer to finding the best hyperparameters that yield the best model, but this again comes with a high computation cost as before.

In fact, if we set

`n_iter= n1*n2*n3*β¦*n_M`

from the previous example, then, weβre essentially considering all possible hyperparameter combinations and now Randomized Search and Grid Search are equivalent.

Let us build on the same example of KNNClassifier from the previous section.

Let us search for the optimal weighting strategy and `n_neighbors`

. And now, let us implement Randomized Search in scikit-learn and do the following steps, as we did for Grid Search.

βΆ **Import RandomizedSearchCV class**

```
from sklearn.model_selection import RandomizedSearchCV
```

βΆ **Define the parameter grid**

```
# specify "parameter distributions" rather than a "parameter grid"
param_dist = dict(n_neighbors=k_range, weights=weight_options)
```

βΆ **Instantiate the grid; Set n_iter=10, Fit the grid & View the results**

```
# n_iter controls the number of searches
rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy', n_iter=10, random_state=5, return_train_score=False)
rand.fit(X, y)
pd.DataFrame(rand.cv_results_)[['mean_test_score', 'std_test_score', 'params']]
#DataFrame
mean_test_score std_test_score params
0 0.973333 0.032660 {'weights': 'distance', 'n_neighbors': 16}
1 0.966667 0.033333 {'weights': 'uniform', 'n_neighbors': 22}
2 0.980000 0.030551 {'weights': 'uniform', 'n_neighbors': 18}
3 0.966667 0.044721 {'weights': 'uniform', 'n_neighbors': 27}
4 0.953333 0.042687 {'weights': 'uniform', 'n_neighbors': 29}
5 0.973333 0.032660 {'weights': 'distance', 'n_neighbors': 10}
6 0.966667 0.044721 {'weights': 'distance', 'n_neighbors': 22}
7 0.973333 0.044222 {'weights': 'uniform', 'n_neighbors': 14}
8 0.973333 0.044222 {'weights': 'distance', 'n_neighbors': 12}
9 0.973333 0.032660 {'weights': 'uniform', 'n_neighbors': 15}
```

βΆ **Examine the best score and best hyperparameters**

```
# examine the best model
print(rand.best_score_)
print(rand.best_params_)
# Output
0.9800000000000001
{'weights': 'uniform', 'n_neighbors': 18}
```

βΆ **Parameters of the best model**

- Surprisingly, we see that the highest accuracy score obtained in this case, where we only looked at 10 different parameter settings instead of 60 in Grid Search, is the same as before: 0.98 β
- And the value for
`n_neighbors= 18`

, which is also one of the optimal values that we got when we initially searched for the optimal value of`n_neighbors`

.

Maybe we just got lucky?

What is the guarantee that we will always get the best results?

Ah, this question makes perfect sense, doesnβt it?

Let us do the following now: Let us run `RandomizedSearchCV`

for multiple times and see how many times we really end up getting lucky!

β‘οΈRun RandomizedSearchCV 20 times and see what happens; We log the best_score_ for every run.

```
# run RandomizedSearchCV 20 times (with n_iter=10) and record the best score
best_scores = []
for _ in range(20):
rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy', n_iter=10, return_train_score=False)
rand.fit(X, y)
best_scores.append(round(rand.best_score_, 3))
Let us examine all the 20 best scores now.
print(best_scores)
# Output: Best Scores
[0.973, 0.98, 0.98, 0.98, 0.973, 0.98, 0.98, 0.973, 0.98, 0.973, 0.973, 0.98, 0.98, 0.98, 0.98, 0.973, 0.98, 0.98, 0.98, 0.973]
```

Upon examining the best scores above for all the 20 runs, we see that we get the best accuracy score of 0.98 about 13 times.

Looks like weβre lucky indeed! What about the other 7 times when we didn't quite get the best accuracy score? These accuracy scores are around 0.973 which is pretty close to 0.98.

This observation convinces us that even though Randomized Search may not always give the hyperparameters of the best performing model, the models obtained by using these hyperparameters

do not perform much worse compared to the best modelobtained from Grid Search.

This means, the best models thus obtained, with the hyperparameters from randomized search are clearly very close to the optimal model.

In essence, these

may not be the best hyperparameters, but certainly close to the best hyperparameters, except that these are found under resource-constrained settings.

Hope you all understood how we could use Randomized Search for hyperparameter tuning.

Hope you all enjoyed reading this post.π

Happy Learning! Until next time!β¨

## References

[1]https://courses.dataschool.io/courses/introduction-to-machine-learning-with-scikit-learn

[2]http://scikitlearn.org/stable/modules/cross_validation.html

[3]http://scikitlearn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

[4]http://scikitlearn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

Cover Image: Photo by Ross Sneddon on Unsplash

**Note**: This post is a comprehensive post for understanding model evaluation, and hyperparameter search. This is a compiled, and improved version of a few of my earlier shorter posts.

## Top comments (0)