This is a basic computer vision example using PyTorch.
What you'll find here:
- What is computer vision
- A simple classification example using a deep learning computer vision model
Without further ado, let's begin... (but before jumping right into the code, please take a moment to understand what computer vision is)
What is Computer Vision
Computer vision is a field of computer science that focuses on building intelligent systems that can process, analyse and derive information from visual data such as images and videos. Lately, people have been achieving marvellous things with 3D point-cloud data as well...
FASHION MNIST CLASSIFICATION
Link to the dataset: Fashion MNIST
You don't need to download the dataset manually; it is included as part of torchvision.
It's better if you use a Jupyter notebook, as the code in this blog is a step-by-step process with data visualisation in between for better understanding.
import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import random_split
Let's download our training and test datasets. The data used for training is stored in dataset, and the test data is stored in test_dataset.
dataset = FashionMNIST(root='data/', download=True, transform=ToTensor())
test_dataset = FashionMNIST(root='data/', train=False, transform=ToTensor())
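As an optional sanity check, you can inspect the sizes, class names and the shape of a single sample; the values in the comments below are the standard ones for Fashion MNIST:
print(len(dataset))       # 60000 training images
print(len(test_dataset))  # 10000 test images
print(dataset.classes)    # the 10 class names, e.g. 'T-shirt/top', 'Trouser', ...
img, label = dataset[0]
print(img.shape)          # torch.Size([1, 28, 28]) -> 1 channel, 28x28 pixels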
The training process usually involves holding out a small subset of the training data for validation (this subset is not used for training itself). We hold out 10,000 data points for validating our model.
val_size = 10000
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print("Training dataset size : ", len(train_ds))
print("Validation dataset size : ", len(val_ds))
Let's define our batch size, which specifies how many data points are used in each iteration of the training process.
batch_size=128
DataLoader is a utility provided by PyTorch that creates an iterable over our dataset. The shuffle argument shuffles the data each epoch, num_workers sets the number of worker processes that load data in parallel, and pin_memory places batches in page-locked (pinned) memory, which speeds up CPU-to-GPU transfers.
train_loader = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size*2, num_workers=4, pin_memory=True)
Let's visualise our training dataset. In a Jupyter notebook, the code below displays a grid of 128 (our batch_size) images from the training subset.
for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=16).permute((1, 2, 0)))
    break
The accuracy function compares our model's outputs to the actual labels and returns an accuracy score; this helps us monitor the training process.
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
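As a quick illustration (with made-up logits, not real model outputs), here is how accuracy behaves on a tiny batch of 3 samples and 4 classes:
outputs = torch.tensor([[0.1, 0.9, 0.0, 0.0],
                        [0.8, 0.1, 0.1, 0.0],
                        [0.2, 0.2, 0.5, 0.1]])  # made-up logits for 3 samples
labels = torch.tensor([1, 0, 3])
print(accuracy(outputs, labels))  # tensor(0.6667): the first two predictions match, the third does not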
And finally... let's define our deep learning model:
__init__: the constructor, where the neural network layers are defined; here we use two hidden layers and one output layer, all built from linear layers.
forward: where the model architecture is applied; we first flatten the input tensor, since it has to pass through linear layers.
training_step: where the actual training computation happens; we pass the data to self(...), which nn.Module routes to our forward method, and get the predictions back as tensors.
validation_step: very similar to training_step, except that it is used for validating the training; this is where we use the accuracy function to evaluate the current state of the model.
validation_epoch_end and epoch_end: these aggregate and display the state of our model at the end of each epoch.
class MnistModel(nn.Module):
    """Feedforward neural network with two hidden layers"""
    def __init__(self, in_size, out_size):
        super().__init__()
        # hidden layer 1
        self.linear1 = nn.Linear(in_size, 16)
        # hidden layer 2
        self.linear2 = nn.Linear(16, 32)
        # output layer
        self.linear3 = nn.Linear(32, out_size)

    def forward(self, xb):
        # Flatten the image tensors
        out = xb.view(xb.size(0), -1)
        # Get intermediate outputs using hidden layer 1
        out = self.linear1(out)
        # Apply activation function
        out = F.relu(out)
        # Get intermediate outputs using hidden layer 2
        out = self.linear2(out)
        # Apply activation function
        out = F.relu(out)
        # Get predictions using output layer
        out = self.linear3(out)
        return out

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))
CPU or GPU, you can use either to train the model; this function automatically detects your available device so we can assign it for training.
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
device = get_default_device()
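You can print the result to confirm which device was picked:
print(device)  # prints 'cuda' on a machine with a GPU, otherwise 'cpu'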
Now let's move the data to the chosen device. To understand this better: you usually cannot fit the complete dataset into the available memory (RAM/GPU memory), so we transfer it batch by batch during training, which is an efficient way to train.
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)
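For example, moving a single tensor works like this (the tensor here is just a dummy for illustration):
t = torch.randn(2, 3)  # dummy tensor
print(to_device(t, device).device)  # now reports the chosen device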
Now let's define a class to handle all the data movement
class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)
test_loader = DeviceDataLoader(test_loader, device)
Now we define two functions to explicitly handle the training and validation of the model:
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
Each of our data points is a 28 x 28 image, i.e. 784 pixels, and we have 10 distinct labels (classes) to classify.
input_size = 784
num_classes = 10
Now let's initialise our model and move it to GPU/CPU memory:
model = MnistModel(input_size, out_size=num_classes)
to_device(model, device)
First, let's evaluate the model without any training, i.e. with the random weights and biases it was initialised with.
history = [evaluate(model, val_loader)]
print("Before Training : ",history)
Now let's train it for 10 epochs with a learning rate of 0.5.
history += fit(10, 0.5, model, train_loader, val_loader)
...and our training is done!
To visualise the loss and accuracy, the code given below could be very helpful; run it in a Jupyter notebook.
losses = [x['val_loss'] for x in history]
plt.plot(losses, '-x')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Loss vs. No. of epochs')
plt.show()  # render this figure before starting the next one

accuracies = [x['val_acc'] for x in history]
plt.plot(accuracies, '-x')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Accuracy vs. No. of epochs')
plt.show()
Let's do some predictions/classifications with our trained model.
def predict_image(img, model):
    # Add a batch dimension and move the image to the same device as the model
    xb = to_device(img.unsqueeze(0), device)
    yb = model(xb)
    # Pick the index of the highest logit as the predicted class
    _, preds = torch.max(yb, dim=1)
    return preds[0].item()
We take one image from the test_dataset to test our model:
img, label = test_dataset[0]
plt.imshow(img[0], cmap='gray')
print('Label:', dataset.classes[label], ', Predicted:', dataset.classes[predict_image(img, model)])
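You can try a few more test images the same way (the indices below are arbitrary):
for idx in (10, 100):  # arbitrary test indices
    img, label = test_dataset[idx]
    print('Label:', dataset.classes[label], ', Predicted:', dataset.classes[predict_image(img, model)])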
Let's look at the fruits of our work:
evaluate(model, test_loader)
This is what I got
Hope this blog helped you. By the way, this is my first blog, so do provide valuable feedback. Thanks!