DEV Community: Bibek Chalise

Learning PyTorch fundamental Neural Network Structure

Bibek Chalise — Mon, 25 Apr 2022 16:57:27 +0000

First, let's import everything needed to build a basic neural network architecture in PyTorch. If you haven't installed PyTorch yet, I strongly suggest installing it from the PyTorch official site, because I won't be covering installation in detail.

We also won't talk much about PyTorch fundamentals such as tensors and other operations; if you want that, I will make a separate tutorial. In this series I assume you are reasonably familiar with basic PyTorch, NumPy and Python, and we will continue with that assumption. There is a lot to cover, so let's see how far we can go.

The first task is to import the libraries we need. PyTorch is a must from the start; if we need anything else later, we will import it in subsequent cells.

# importing libraries
import torch

I have taken a toy dataset to keep things simple; each target is just the sum of the two inputs. In the future we may use a real-life dataset, but for now let's stick with this one.

x  = [[1,2],[3,4],[5,6],[7,8]]
y = [ [3],[7],[11],[15]]

The next task is to convert it into tensors, the building block of the PyTorch library. A tensor is like a NumPy ndarray, but not exactly the same.

x = torch.tensor(x).float()
y = torch.tensor(y).float()

As mentioned above, tensors and NumPy ndarrays are similar but not identical, and the difference shows up at execution time. PyTorch tensors can be moved to a GPU and computed there, while NumPy arrays have no built-in GPU support. PyTorch also parallelizes many CPU operations across threads, which can make some tensor operations faster than their NumPy equivalents, though this depends on the operation and the hardware. So, if we have a GPU available, we will use it to the full extent.
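
To make the "similar but not identical" point concrete, here is a minimal sketch of moving data between the two on the CPU. The array values are made up for illustration; `torch.from_numpy` and `Tensor.numpy()` share memory with the original array rather than copying it.

```python
import numpy as np
import torch

# A small made-up array, only for illustration
arr = np.array([[1, 2], [3, 4]], dtype=np.float32)

# from_numpy creates a CPU tensor that shares memory with the array
t = torch.from_numpy(arr)

# Changing the array is visible through the tensor, because no copy was made
arr[0, 0] = 10.0
print(t[0, 0].item())

# Going back the other way also works for CPU tensors
back = t.numpy()
print(type(back))
```

A tensor that lives on the GPU has to be moved back with `.cpu()` before `.numpy()` can be called on it.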

A GPU is difficult for many of us to afford, so we will use the free tier of Google Colab and enable the GPU there.

# pick the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

print(device)

The default in Colab is CPU, so if you want to change to GPU, open the Runtime menu and change the runtime type to GPU.

X = x.to(device)
Y = y.to(device)

Now we come to the foundation of a neural network. Let's go through it step by step, with an explanation of each step.

import torch.nn as nn
# torch.nn is the module where the neural network building blocks live.

class Nnet(nn.Module):
  # Nnet inherits from torch.nn.Module. Inheriting from nn.Module is compulsory,
  # as it is the base class for all neural networks.
  def __init__(self): # initialize all components of the nn.Module
    super().__init__()
    # super().__init__() makes sure the class fully initializes nn.Module,
    # so we can take advantage of the pre-built functionalities of nn.Module


    #define Layers in the Neural Network

    self.input_to_hidden_layer = nn.Linear(2,8)
    self.hidden_layer_activation = nn.ReLU()
    self.hidden_to_output = nn.Linear(8,1)


  # defining forward propagation

  def forward (self, x):
      x = self.input_to_hidden_layer(x)
      x = self.hidden_layer_activation(x)
      x = self.hidden_to_output(x)
      return x

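Here we have defined a network with an input layer, a hidden layer and an output layer, with an activation applied in the hidden layer. In code, that is two nn.Linear layers (input to hidden, hidden to output) with a ReLU between them.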

If we look closely, we have used nn.Linear(2,8) and nn.Linear(8,1). The first parameter is the number of input features to the layer and the second is the number of output features from the layer. So, matching our dataset, we send 2 features into the first layer and it produces 8 output features for the hidden layer; the second layer maps those 8 features to 1 output.
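
As a quick check of what the two numbers mean, here is a small sketch that builds a single `nn.Linear(2, 8)` layer and looks at the shapes involved. The batch reuses the four toy inputs from above; the variable names are only for this example.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 8)  # 2 input features -> 8 output features

# The same four toy inputs used in this tutorial, one row per sample
batch = torch.tensor([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
out = layer(batch)

print(layer.weight.shape)  # one row of weights per output feature
print(layer.bias.shape)    # one bias per output feature
print(out.shape)           # one row of 8 features per input sample
```

The weight matrix is stored as (output features, input features), which is why it has 8 rows and 2 columns.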

The hidden layer also comes with an activation function. In brief, the activation decides how strongly each node fires, and it adds the non-linearity that lets the network learn more than a linear mapping.

Here, we have used the ReLU activation function, which stands for Rectified Linear Unit. Other popular activation functions are:

Sigmoid
Softmax
Tanh

The forward method defines forward propagation. The name forward is compulsory: it is not a Python reserved word, but nn.Module calls the method with exactly this name when we call the model, as in mynet(X). With any other name, calling the model would raise an error.
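
The sketch below shows what happens when the method is given another name. `WithoutForward` and `predict` are hypothetical names used only for this illustration; the point is that calling the module itself fails, while calling the renamed method directly still works.

```python
import torch
import torch.nn as nn

class WithoutForward(nn.Module):  # hypothetical class, for illustration only
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(2, 1)

    def predict(self, x):  # deliberately NOT named forward
        return self.layer(x)

net = WithoutForward()
x = torch.tensor([[1., 2.]])

try:
    net(x)  # nn.Module looks for a method called forward here
    called_ok = True
except NotImplementedError:
    called_ok = False

print(called_ok)
print(net.predict(x).shape)  # the renamed method still works when called directly
```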

Let's create an instance of the Nnet class as mynet.

We will also look at what the randomly initialized weights look like.

mynet = Nnet().to(device)
# moving the model to the device is compulsory if we want to utilize the GPU.

mynet.input_to_hidden_layer.weight

Note that every time you run the above code, the initialized weights will be different. If you want the same weights each time, set the seed with torch.manual_seed(42) before creating the model.
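
A minimal sketch of the effect of seeding, using a single `nn.Linear` layer instead of the full network to keep it short. The seed value 42 is the one mentioned above; any fixed integer would do.

```python
import torch
import torch.nn as nn

# Seed, then create a layer and keep a copy of its initial weights
torch.manual_seed(42)
first = nn.Linear(2, 8).weight.detach().clone()

# Seed again with the same value and create a fresh layer
torch.manual_seed(42)
second = nn.Linear(2, 8).weight.detach().clone()

# With the same seed set right before creation, the initial weights match
print(torch.equal(first, second))
```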

Now, let's define the loss function for our model. We will use mean squared error loss in our case. Other prominent loss functions available are:

CrossEntropyLoss (for multi-class classification)
BCELoss (binary cross-entropy loss, for binary classification)

But more on these in upcoming tutorials.

loss_func = nn.MSELoss()

model_output = mynet(X)
loss_value = loss_func(model_output, Y)
print(loss_value)

In PyTorch, the first argument to a loss function is the predicted output and the second is the actual (target) output.
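
To show what nn.MSELoss computes, here is a small sketch that compares it with the formula written out by hand. The predictions are made-up numbers, not output from the model above; the targets are the toy targets from this tutorial.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.5], [7.0], [10.0], [16.0]])    # made-up predictions
target = torch.tensor([[3.0], [7.0], [11.0], [15.0]])  # toy targets from above

# Built-in loss: predictions first, targets second
loss_builtin = nn.MSELoss()(pred, target)

# The same quantity by hand: mean of the squared differences
loss_manual = ((pred - target) ** 2).mean()

print(loss_builtin.item(), loss_manual.item())
```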

Now it's time to optimize the model using an optimizer, which tries to reduce the loss value. The inputs to the optimizer are the model parameters (weights and biases) and the learning rate used when updating them.

Here, we will use stochastic gradient descent (SGD); other optimizers may suit other use cases.

from torch.optim import SGD
opt = SGD(mynet.parameters(), lr=0.001)

Now we need to perform the following steps together in a single epoch, and repeat them for a number of epochs.

Calculate the loss value corresponding to the given input and output
Calculate the gradient corresponding to each parameter
Update the weights based on the learning rate and the gradient
Flush out the previous epoch's gradients

loss_history = []
for _ in range(50):
  opt.zero_grad() # flush out previous epochs gradients
  loss_value = loss_func(mynet(X), Y) #calculating loss value
  loss_value.backward() #performing back propagation
  opt.step() #update weights according to the gradients calculated
  loss_history.append(loss_value.cpu().detach().numpy())
  # The last step moves the loss tensor from the GPU to the CPU and converts it to NumPy, since NumPy doesn't support GPU tensors.
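
To see what opt.step() does with the gradients, here is a sketch that computes the plain SGD update by hand (parameter minus learning rate times gradient) and compares it with the optimizer's result. It uses a tiny stand-in model and two of the toy samples, not the network trained above, and assumes plain SGD with no momentum or weight decay.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)            # tiny stand-in model for this illustration
x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[3.], [7.]])
lr = 0.001

# Forward pass and back propagation fill in p.grad for every parameter
loss = nn.MSELoss()(model(x), y)
loss.backward()

# Plain SGD update computed by hand: new value = old value - lr * gradient
expected = [p.detach() - lr * p.grad for p in model.parameters()]

# The same update done by the optimizer
opt = torch.optim.SGD(model.parameters(), lr=lr)
opt.step()

matches = all(
    torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)
)
print(matches)
```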

Let's plot the result.

import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(loss_history)
plt.title('loss variation over increasing epochs')
plt.xlabel('epochs')
plt.ylabel('loss value')

Saving and loading the PyTorch model

#saving model
torch.save(mynet.state_dict(), 'mymodel.pth')

#loading model
mynet.load_state_dict(torch.load('mymodel.pth'))

GitHub Link

Impact of Scaling in Accuracy

Bibek Chalise — Mon, 25 Apr 2022 16:55:54 +0000

Scaling a dataset is one of the major steps of data pre-processing. It is done to reduce the range of the data variables. For 8-bit images, pixel values always lie in the range 0-255, so 255 is the maximum possible value. The simplest way to scale image arrays down is therefore to divide them by this maximum, so that the values always lie between 0 and 1.

Scaling the input variables down keeps them in a range that is easier for the model to work with, so training tends to take less time.
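
A minimal sketch of the scaling step on a handful of made-up 8-bit pixel values, using NumPy only. The array is illustrative and is not taken from the dataset.

```python
import numpy as np

# Made-up 8-bit pixel values, covering the full 0-255 range
pixels = np.array([[0, 64, 128], [191, 255, 32]], dtype=np.uint8)

# Convert to float first, then divide by the maximum possible value
scaled = pixels.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())
```

Converting to float before dividing matters: the result needs fractional values, which an integer array cannot hold.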

For this example we use the Fashion MNIST dataset, available in the torchvision.datasets module.

In brief, the training set has 10 classes of clothing items, with a total of 60,000 data points distributed equally as 6,000 per class. So the dataset is balanced, and we don't have to worry about class imbalance.

First, we train on the dataset without scaling it down. More than the code structure, we are interested in the difference that scaling the dataset makes.

I have also briefly commented the code structure.

# Importing all required libraries

from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
device = 'cuda' if torch.cuda.is_available() else 'cpu'
from torchvision import datasets

# Here we download the dataset to data_folder; train=True indicates that it is the training split.
data_folder = '/data/'

fmnist = datasets.FashionMNIST(data_folder, download=True, train=True) 
tr_images = fmnist.data
tr_targets = fmnist.targets


# Here we extend the Dataset class to define the dataset the way we want. More on this in upcoming tutorials.


class FMNISTDataset (Dataset):
  def __init__(self, x, y):
    x = x.float()
    x = x.view(-1, 28*28) # flatten each 28*28 input image
    self.x, self.y = x, y
  def __getitem__(self, ix):
    x, y = self.x[ix], self.y[ix]
    return x.to(device), y.to(device)

  def __len__(self):
    return len(self.x)

# Here we simply load the dataset defined above using DataLoader, with a batch_size of 32.

def get_data():
  train = FMNISTDataset(tr_images, tr_targets)
  trn_dl = DataLoader(train, batch_size=32, shuffle=True)
  return trn_dl


#   Defining the model

from torch.optim import SGD
def get_model():
  model = nn.Sequential(
      nn.Linear(28*28,1000),
      nn.ReLU(),
      nn.Linear(1000,10)
  ).to(device)

  loss_fn = nn.CrossEntropyLoss()
  optimizer = SGD(model.parameters(), lr = 1e-2)
  return model, loss_fn, optimizer

# Training the model on a single batch

def train_batch(x, y, model, opt, loss_fn):
  model.train()
  prediction = model(x)
  batch_loss = loss_fn(prediction, y)
  batch_loss.backward()
  opt.step()
  opt.zero_grad()
  return batch_loss.item()

# For calculating accuracy. @torch.no_grad() is used so that gradients are not computed during evaluation.

@torch.no_grad()
def accuracy(x, y, model):
  model.eval()
  prediction = model(x)
  max_values, argmaxes = prediction.max(-1)
  is_correct = argmaxes ==y
  return is_correct.cpu().numpy().tolist()

# Running the model for training and accuracy evaluation over a number of epochs


train_dl = get_data()
model, loss_fn, optimizer = get_model()
losses, accuricies = [], []
for epoch in range(5):
  print(epoch)
  epoch_losses, epoch_accuricies = [], []
  for ix, batch in enumerate(iter(train_dl)):
    x, y = batch
    batch_loss = train_batch(x, y, model, optimizer, loss_fn)
    epoch_losses.append(batch_loss)
  epoch_loss = np.array(epoch_losses).mean()
  for ix, batch in enumerate(iter(train_dl)):
    x, y  =batch
    is_correct = accuracy(x, y, model)
    epoch_accuricies.extend(is_correct)
  epoch_accuracy = np.mean(epoch_accuricies)
  losses.append(epoch_loss)
  accuricies.append(epoch_accuracy)

#For Plotting purpose. 

epochs = np.arange(5)+1
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Loss value over increasing epochs')
plt.plot(epochs, losses, label='Training Loss')
plt.legend()
plt.subplot(122)
plt.title('Accuracy value over increasing epochs')
plt.plot(epochs, accuricies, label='Training Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) \
 for x in plt.gca().get_yticks()])
plt.legend()

In the plot above, we can see that the loss has reduced to a point where it has saturated, but the accuracy is only around 13%, which is not how we want our model to perform. So we change just one thing: we scale down the range of the dataset. Okay, I agree, you may not want to call that hyperparameter tuning, and we won't call it that.

As mentioned above, we divide the input by its maximum possible value, i.e. 255.

The changed code will be

class FMNISTDataset (Dataset):
  def __init__(self, x, y):

    #The changed code starts here
    x = x.float()/255
    #The changed code ends here

    x = x.view(-1, 28*28)
    self.x, self.y = x, y
  def __getitem__(self, ix):
    x, y = self.x[ix], self.y[ix]
    return x.to(device), y.to(device)

Now, let's rerun the code and see how much difference it makes.

from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
device = 'cuda' if torch.cuda.is_available() else 'cpu'
from torchvision import datasets


data_folder = '/data/'

fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets


class FMNISTDataset (Dataset):
  def __init__(self, x, y):

    #The changed code starts here
    x = x.float()/255
    #The changed code ends here

    x = x.view(-1, 28*28)
    self.x, self.y = x, y
  def __getitem__(self, ix):
    x, y = self.x[ix], self.y[ix]
    return x.to(device), y.to(device)

  def __len__(self):
    return len(self.x)



def get_data():
  train = FMNISTDataset(tr_images, tr_targets)
  trn_dl = DataLoader(train, batch_size=32, shuffle=True)
  return trn_dl



from torch.optim import SGD
def get_model():
  model = nn.Sequential(
      nn.Linear(28*28,1000),
      nn.ReLU(),
      nn.Linear(1000,10)
  ).to(device)

  loss_fn = nn.CrossEntropyLoss()
  optimizer = SGD(model.parameters(), lr = 1e-2)
  return model, loss_fn, optimizer


def train_batch(x, y, model, opt, loss_fn):
  model.train()
  prediction = model(x)
  batch_loss = loss_fn(prediction, y)
  batch_loss.backward()
  opt.step()
  opt.zero_grad()
  return batch_loss.item()


@torch.no_grad()
def accuracy(x, y, model):
  model.eval()
  prediction = model(x)
  max_values, argmaxes = prediction.max(-1)
  is_correct = argmaxes ==y
  return is_correct.cpu().numpy().tolist()



train_dl = get_data()
model, loss_fn, optimizer = get_model()
losses, accuricies = [], []
for epoch in range(5):
  print(epoch)
  epoch_losses, epoch_accuricies = [], []
  for ix, batch in enumerate(iter(train_dl)):
    x, y = batch
    batch_loss = train_batch(x, y, model, optimizer, loss_fn)
    epoch_losses.append(batch_loss)
  epoch_loss = np.array(epoch_losses).mean()
  for ix, batch in enumerate(iter(train_dl)):
    x, y  =batch
    is_correct = accuracy(x, y, model)
    epoch_accuricies.extend(is_correct)
  epoch_accuracy = np.mean(epoch_accuricies)
  losses.append(epoch_loss)
  accuricies.append(epoch_accuracy)



epochs = np.arange(5)+1
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Loss value over increasing epochs')
plt.plot(epochs, losses, label='Training Loss')
plt.legend()
plt.subplot(122)
plt.title('Accuracy value over increasing epochs')
plt.plot(epochs, accuricies, label='Training Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) \
 for x in plt.gca().get_yticks()])
plt.legend()

Wow, look at the accuracy: it is around 85%. From 13% to 85% just by scaling down the input.

But what could be the reason for such a drastic increase in accuracy from merely scaling down the input? Let's see the math behind it.

We know,

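$\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$, where $x$ is the weighted input (weight × input).

[Figure: sigmoid output for a range of weight values, with input 255 on the left and input 1 on the right. Source: Modern Computer Vision with PyTorch]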

On the left-hand side, when the input is 255 and the weight is greater than or equal to 0.1, there is no change in the sigmoid output. Similarly, the change was very small when the weight was extremely low.

The reason is that the exponential of a large negative value is very close to 0. On the right-hand side, since the input is 1, we can see the sigmoid output change as the weight changes.
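
The saturation described above can be reproduced with a few lines of plain Python. The weight values below are illustrative choices, not the ones from the book's figure; the sketch simply evaluates the sigmoid of weight × input for an unscaled input (255) and a scaled input (1).

```python
import math

def sigmoid(z):
    # Standard logistic function 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.0001, 0.001, 0.01, 0.1, 1.0]  # illustrative weight values

rows = []
for w in weights:
    unscaled = sigmoid(w * 255)  # input left in the 0-255 range
    scaled = sigmoid(w * 1)      # input scaled down to 1
    rows.append((w, unscaled, scaled))
    print(w, round(unscaled, 4), round(scaled, 4))
```

With the input at 255, the output is already pinned near 1 once the weight reaches 0.1, so further changes in the weight barely move it. With the input at 1, the output keeps responding across the whole range of weights.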

Scaling the input dataset so that it contains a much smaller range of values generally helps in achieving better model accuracy.

GitHub

What is TF-IDF?

Bibek Chalise — Thu, 03 Jun 2021 01:01:43 +0000

Hi!, How are you?

Today, let's see how we can represent the text data of a corpus in array format.
As we know, computers only understand numbers, and when we run any machine learning algorithm, we have to encode the data into some numerical format so that the algorithm can find patterns in it and build a model.
And if we are into Natural Language Processing, and especially text-data analysis, we have to deal with text as data. So, in order to feed it to the algorithm, we must first change the raw textual data into numerical data.
There are various ways to do it. Let's discuss those.
The first is Bag of Words. It is just a way of counting the number of times each word appears in a corpus. (Here, corpus means the entire dataset of text.)
Let's take 3 sentences.

"It is going to rain today"
"I am going to drink coffee"
"I am going to capital today"

If we perform Bag of Words on the above example, we first count the number of times each individual term occurs in the corpus.

Term	Frequency
going	3
to	3
i	2
am	2
today	2
it	1
is	1
rain	1
drink	1
coffee	1
capital	1

Now if we represent it in the tabular form, the bag of words representation looks like this.

Term/document No	going	it	to	i	am	is	rain	today	drink	coffee	capital
1.	1	1	1	0	0	1	1	1	0	0	0
2.	1	0	1	1	1	0	0	0	1	1	0
3.	1	0	1	1	1	0	0	1	0	0	1

But we can already see a problem with this Bag of Words representation: all the words carry the same importance. In the given dataset, the word 'going' is present in every sentence, while words like rain, coffee and capital are each present in only one sentence and carry the main essence of that sentence. But when we represent them in the BoW model, all these words get the value 1. So the BoW representation does not capture the importance of individual words, which can be problematic.
Another problem is that no order is maintained, which means the semantic information is not preserved. Text is sequential data, so the order of the words is very important, but the BoW model doesn't care about it. This can cause problems for models that need the data in its proper order to learn from it.
If you want to perform Bag of Words in Python with sklearn, you can do it as follows.

    from sklearn.feature_extraction.text import CountVectorizer
    import pandas as pd
    vectorizer = CountVectorizer()
    doc = ["It is going to rain today",
        "I am going to drink coffee",
        "I am going to capital today"]
    X = vectorizer.fit_transform(doc)
    column = vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
    df = pd.DataFrame(X.toarray(), columns=column)
    df

In order to address the word-importance problem of the Bag of Words model, we use something called TF-IDF.
So what is TF-IDF?
TF-IDF stands for Term Frequency - Inverse Document Frequency.
Here, term frequency means the ratio of the number of occurrences of a word in a document to the number of words in that document.
Term frequency, tf(t,d), is the relative frequency of term t within document d,

$$\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$

where $f_{t,d}$ is the raw count of a term in a document, i.e., the number of times that term t occurs in document d. There are various other ways to define term frequency.

From the above example, the term frequency of the word going in document 1 is:
Here, going appears 1 time in document 1 and there are 6 words in that document, so
tf(going, doc1) = 1/6 = 0.1666
Similarly, the tf of the word today in document 1 is: tf(today, doc1) = 1/6 = 0.1666

So, let's calculate the term frequency for all the terms:

Term	TF value(doc1)	TF value(doc2)	Tf value(doc3)
going	0.1666	0.1666	0.1666
to	0.1666	0.1666	0.1666
i	0	0.1666	0.1666
am	0	0.1666	0.1666
it	0.1666	0	0
is	0.1666	0	0
rain	0.1666	0	0
today	0.1666	0	0.1666
drink	0	0.1666	0
coffee	0	0.1666	0
capital	0	0	0.1666
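
The table above can be checked with a short plain-Python sketch. It assumes the simplest possible tokenization (lowercase and split on spaces), and `term_frequency` is a helper name made up for this example.

```python
docs = [
    "It is going to rain today",
    "I am going to drink coffee",
    "I am going to capital today",
]

# Simplest possible tokenization: lowercase, then split on whitespace
tokenized = [doc.lower().split() for doc in docs]

def term_frequency(term, tokens):
    # Occurrences of the term in one document / number of words in that document
    return tokens.count(term) / len(tokens)

tf_going = [term_frequency("going", tokens) for tokens in tokenized]
tf_today = [term_frequency("today", tokens) for tokens in tokenized]

print([round(v, 4) for v in tf_going])
print([round(v, 4) for v in tf_today])
```

Every document here has six words, so any word that occurs once in a document gets 1/6, which the table shows truncated as 0.1666.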

Now that we have calculated the term frequency, let's discuss Inverse Document Frequency (IDF).
IDF is calculated as the log of the ratio of the total number of documents to the number of documents that contain the particular term. It measures how much information the word provides, i.e., how common or how rare the word is in the given corpus.

$$\mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$$

with

$N = |D|$: total number of documents in the corpus
$|\{d \in D : t \in d\}|$: number of documents where the term $t$ appears (i.e., $\mathrm{tf}(t,d) \neq 0$). If the term is not in the corpus, this will lead to a division-by-zero. It is therefore common to adjust the denominator to $1 + |\{d \in D : t \in d\}|$.

So, let's calculate the IDF value of some terms.
The IDF of 'going' can be calculated as follows (the values here use base-10 logarithms).
The word 'going' is present in all three documents, and there are 3 documents in total, so idf(going) = log(3/3) = log(1) = 0.
This tells us that since going is present in all 3 documents, it carries no importance at all.
Also, if we calculate the idf value of today, which appears in 2 documents, it becomes: idf(today) = log(3/2) = 0.17609
Also, if we calculate the idf value of coffee, it becomes: idf(coffee) = log(3/1) = 0.47712
So, let's see what IDF value of each term becomes.

Term	IDF value
going	0
to	0
i	0.17609
am	0.17609
today	0.17609
it	0.47712
is	0.47712
rain	0.47712
drink	0.47712
coffee	0.47712
capital	0.47712
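
The IDF values in the table can be reproduced with the sketch below. It assumes base-10 logarithms, which is what the numbers above correspond to, and the same whitespace tokenization as before; `inverse_document_frequency` is a helper name made up for this example.

```python
import math

docs = [
    "It is going to rain today",
    "I am going to drink coffee",
    "I am going to capital today",
]
tokenized = [doc.lower().split() for doc in docs]

def inverse_document_frequency(term, tokenized_docs):
    # log10(total documents / documents containing the term)
    # Note: this raises ZeroDivisionError for a term that appears nowhere
    n_docs = len(tokenized_docs)
    n_containing = sum(1 for tokens in tokenized_docs if term in tokens)
    return math.log10(n_docs / n_containing)

idf_going = inverse_document_frequency("going", tokenized)
idf_today = inverse_document_frequency("today", tokenized)
idf_coffee = inverse_document_frequency("coffee", tokenized)

print(round(idf_going, 5), round(idf_today, 5), round(idf_coffee, 5))
```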

Now it's time to do the magic and calculate TF-IDF. It is simply the product of term frequency and inverse document frequency.
If we calculate the TF-IDF value of the word today in document 1, we get: TFIDF(today) = TF(today) * IDF(today) = 0.1666 * 0.17609 = 0.02933

Term/document No	it	i	am	is	rain	today	drink	coffee	capital
1.	0.07948	0	0	0.07948	0.07948	0.02933	0	0	0
2.	0	0.02933	0.02933	0	0	0	0.07948	0.07948	0
3.	0	0.02933	0.02933	0	0	0.02933	0	0	0.07948

This is the final TF-IDF text representation for the example corpus. The columns for going and to are omitted because their TF-IDF value is 0 in every document. You can try TF-IDF in sklearn with the code given below.

    from sklearn.feature_extraction.text import TfidfVectorizer
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(doc)
    column = vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
    df = pd.DataFrame(X.toarray(), columns=column)

If you have tried TF-IDF in sklearn, you will see that the results are quite different. This is because sklearn's TfidfVectorizer by default uses a smoothed IDF with the natural logarithm and L2-normalizes each document vector, so its parameters are set up differently. The method described above is the root idea of TF-IDF, yet it needs such adjustments for large-scale use.
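
For comparison with the hand-computed table (not with sklearn's output), here is a plain-Python sketch that multiplies the two quantities for every term and document. It assumes base-10 logarithms and whitespace tokenization; `tf` and `idf` are helper names made up for this example. The last digits differ slightly from the table because the table truncates 1/6 to 0.1666.

```python
import math

docs = [
    "It is going to rain today",
    "I am going to drink coffee",
    "I am going to capital today",
]
tokenized = [doc.lower().split() for doc in docs]
vocabulary = sorted({term for tokens in tokenized for term in tokens})

def tf(term, tokens):
    # Relative frequency of the term within one document
    return tokens.count(term) / len(tokens)

def idf(term, tokenized_docs):
    # log10(total documents / documents containing the term)
    containing = sum(1 for tokens in tokenized_docs if term in tokens)
    return math.log10(len(tokenized_docs) / containing)

# One dictionary of term -> TF-IDF value per document
tfidf = [
    {term: tf(term, tokens) * idf(term, tokenized) for term in vocabulary}
    for tokens in tokenized
]

for number, row in enumerate(tfidf, start=1):
    nonzero = {term: round(value, 4) for term, value in row.items() if value > 0}
    print(number, nonzero)
```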

If you are still confused about TF-IDF, let me know in the comments. Until then, enjoy learning.
The code for this tutorial can also be found at this link.

originally published at : My Personal Site

Thank you!

A gentle Guide to HyperParameter Tuning.

Bibek Chalise — Sat, 08 May 2021 06:53:03 +0000

Hi!
How are you doing?
Today we will be doing hyperparameter tuning with the help of the RandomizedSearchCV algorithm.

What are Hyperparameters actually?

Let's look at it this way: when we use a machine learning algorithm, there are various parameters associated with the instance or method of that algorithm. Default values are provided, and they often give reasonably good results. However, if we want to increase the accuracy of the results, we have to make some tweaks to the default parameters. The process of tuning such parameters in the hope of better accuracy for a given model is called hyperparameter tuning.
If that sounds like jargon, let's look at an example: the default parameters of the Support Vector Machine classifier, SVC.
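
One way to inspect the defaults is the `get_params()` method that scikit-learn estimators provide. This is a small sketch; the selection of parameter names printed here is just the subset that matters later in this article.

```python
from sklearn.svm import SVC

# An SVC created without arguments uses the library defaults
defaults = SVC().get_params()

# Print a few of the parameters that are tuned later in this article
for name in ("C", "kernel", "gamma", "degree"):
    print(name, "=", defaults[name])
```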

When we look at the parameters of an SVC instance (for example with get_params()), we get the default parameters. So when we instantiate SVC without arguments, the defaults are used. But when we visit the official documentation of SVC, we see that many of these parameters can be set to different values. For tuning, we collect the candidate values for each parameter in a dictionary of lists, try different combinations, and find the set of parameters that gives the best results.

So, what we do is take a dataset, find the accuracy with the default parameters, and then tune a few parameters to increase the score.

For this task, we will be using a Jupyter Notebook. Doing it on a local machine is okay, but I highly suggest using an online Jupyter Notebook. Colab by Google is a very good resource that we can use for free, and Deepnote is an alternative to Google Colab.
Here, I will personally be using Deepnote.

    # importing required modules
    import pandas as pd # for tabular data frame analysis
    import numpy as np # for mathematical manipulation
    import matplotlib.pyplot as plt # for data visualization
    import seaborn as sns # seaborn is built on top of the matplotlib library

So, we need a dataset. There are various datasets available on Kaggle, and we take a simple one: the Kaggle Heart Disease Dataset.

#loading dataset
df = pd.read_csv('./heart.csv') #The dataset is downloaded and saved to root folder.
df.head()

After this, we get the first five rows of the dataset.

However, if we look closely at the target feature, we see all 1's, and if we look even more closely with df['target'], we see a pattern: the first half of the dataset has the value 1 in the target feature and the rest has 0. This can be a big problem and can hurt the training and testing phases. So, we shuffle the dataset using the pandas DataFrame.sample() method.

    df = df.sample(frac=1, random_state=42)  # frac=1 without replacement returns every row once, in shuffled order
    df.head()

After this, when we analyse the dataset, we see a random distribution of 0's and 1's in the target variable.
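
A minimal sketch of what sample() does, on a made-up ten-row frame rather than the heart dataset. With frac=1 and no replacement, every row comes back exactly once in a random order. With frac greater than 1 and replace=True, rows are drawn with repetition, so the result contains duplicates and is no longer a pure shuffle.

```python
import pandas as pd

# Made-up frame with the same "all 1s, then all 0s" pattern in the target
toy = pd.DataFrame({"feature": range(10), "target": [1] * 5 + [0] * 5})

# Pure shuffle: every row exactly once, in random order
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)

# Sampling with replacement: twice as many rows, some of them repeated
doubled = toy.sample(frac=2, replace=True, random_state=42)

print(len(shuffled), shuffled["feature"].is_unique)
print(len(doubled), doubled["feature"].duplicated().any())
```

Duplicated rows can end up on both sides of a later train/test split, which tends to make test accuracy look better than it is.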

Next, we check whether there are any null or missing values in the dataset.

    df.isnull().sum()

Here, we can see that there are no null values.

Now we are set for the machine learning tasks.

First, we import the required modules.

    #Import Machine Learning Libraries
    from sklearn.model_selection import train_test_split
    from sklearn import svm 
    from sklearn.model_selection import RandomizedSearchCV

The required modules are imported. train_test_split is for dividing the dataset into training and testing subsets. svm provides the Support Vector Machine algorithms. RandomizedSearchCV is for hyperparameter tuning. An alternative to RandomizedSearchCV is GridSearchCV; however, RandomizedSearchCV is likely to be faster because it tries only a fixed number of sampled combinations (n_iter) rather than every combination in the grid.

    X = df.drop(columns=['target'])  # drop the target column; keep only the feature columns
    y = df.target
    print(X.shape, y.shape)

Then we created a DataFrame X, which consists of the feature variables; target is dropped because it is not a feature but the target variable. y is defined as a pandas Series containing only the target column.

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = svm.SVC()
    clf.fit(X_train, y_train)
    print("The accuracy of the classifier is {}".format(clf.score(X_test, y_test)))

In this step, we divided X and y into train and test subsets. train_test_split returns four objects, which we stored in X_train, X_test, y_train and y_test. The parameters are X, y, and test_size=0.2, which defines the fraction of the dataset reserved for the test set, here X_test and y_test.
Then we instantiated SVC (Support Vector Classifier) as the variable clf and used the fit() method to fit it on X_train and y_train.
The accuracy of the classifier is found to be a mere 68.85%.

Sadly, an accuracy of 68.85% is quite low, so we try to tune certain parameters to improve it.

    # Let's try tuning some hyperparameters.
    param_dist = {'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf']
    }
    # svm.SVC matches the earlier `from sklearn import svm` import
    svc_hyper = RandomizedSearchCV(svm.SVC(), param_distributions=param_dist, verbose=2, cv=3, random_state=42, n_iter=10, scoring='accuracy')
    svc_hyper.fit(X_train, y_train)

Here, we used candidate values for the parameters C, gamma and kernel. The search tries combinations of these values and, at the end, reports which combination gives the best result.
Here C is Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.
The parameter gamma is Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
And the parameter kernel Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used.
Here, we used only 'rbf' because the other kernels take significantly longer to train. You can try the other kernels yourself and see if that changes the results.
To know more about SVC, go through this.
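
To see why the randomized search does less work than a full grid search, the sketch below counts the combinations each would try for the same candidate values. It uses `ParameterGrid` and `ParameterSampler` from scikit-learn, which enumerate and sample parameter combinations; no model is trained here.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_dist = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

# Every combination a grid search would evaluate: 5 * 5 * 1
all_combinations = list(ParameterGrid(param_dist))

# The combinations a randomized search with n_iter=10 would evaluate
sampled = list(ParameterSampler(param_dist, n_iter=10, random_state=42))

print(len(all_combinations), len(sampled))
print(sampled[0])
```

Each combination is then fitted once per cross-validation fold, so with cv=3 the randomized search runs 30 fits where a full grid would run 75.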

    svc_hyper.best_params_

We get the best parameters as {'kernel': 'rbf', 'gamma': 0.001, 'C': 1000}. So, let's use them to fit the data.

    best_svc = svm.SVC(C=1000, gamma=0.001, kernel='rbf')  # svm.SVC matches the earlier `from sklearn import svm` import
    best_svc.fit(X_train, y_train)
    print("The accuracy of the classifier is {}".format(best_svc.score(X_test, y_test)))

After fitting the data with SVC using the best parameters, we got an accuracy of 94.26%. That's a remarkable improvement over the 68.85% we observed at first. Your exact numbers will vary, since train_test_split here is called without a fixed random_state.

In this way, we can use RandomizedSearchCV to tune the parameters and increase the accuracy.
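
As a compact variation on the workflow above, the fitted search object can be used directly instead of copying the best parameters by hand: by default RandomizedSearchCV refits the best model on the whole training set and exposes it as `best_estimator_`. The sketch below uses a synthetic dataset from `make_classification` as a stand-in, since the heart dataset file is not available here, so its accuracy says nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data, only so that the example runs on its own
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_dist = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

search = RandomizedSearchCV(
    SVC(), param_distributions=param_dist,
    n_iter=10, cv=3, scoring="accuracy", random_state=42,
)
search.fit(X_train, y_train)

# The best model is already refitted on the full training set
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)

print(search.best_params_)
print(test_accuracy)
```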

GitHub Repo of the code: https://github.com/bibekebib/Hyperpramater-tuning-article-code

Deepnote Shared code: https://deepnote.com/@bibek-chalise/Hyperparameters-Tuning-Tutorial-j46REW6sTXaWqbz8APnolQ#

If you want to try this with other algorithms, here is a list of parameters that you can tune.

    #Random Forest 
    n_estimator = [int(x) for (x) in np.linspace(100, 1200, num=12)]
    max_depth = [int(x) for x in np.linspace(5, 30, num=6)]     
    min_samples_split = [2, 5, 10, 15, 100]
    min_samples_leaf = [1, 2, 5, 10]
    criterion = ['gini', 'entropy']
    param_dist = { "n_estimators" : n_estimator, "max_depth" : max_depth, "min_samples_leaf":min_samples_leaf, "criterion":criterion, "min_samples_split":min_samples_split }

    #KNN
    n_neighbors = [int(x) for x in np.linspace(start = 1, stop = 100, num = 50)]
    weights = ['uniform','distance'] 
    metric = ['euclidean','manhattan','chebyshev','seuclidean','minkowski'] 
    random_grid = { 'n_neighbors': n_neighbors, 'weights': weights, 'metric': metric, }

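    #Logistic Regression
    # Note: the 'l1' penalty needs a solver that supports it, e.g. solver='liblinear' or 'saga'
    param_dist = { 'penalty' : ['l1', 'l2'],
    'C' : [0.1, 1, 2, 3, 4]  # C must be strictly positive, so 0 is replaced with 0.1
    }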

    #Gaussian Naive Bayes
    params_NB = {'var_smoothing': np.logspace(0,-9, num=100)}