praveenr

Computer Vision 101 - Fashion MNIST

This is a basic computer vision example using PyTorch.
What you'll find here:

  1. What is computer vision
  2. A simple classification example using a deep learning computer vision model

Without further ado, let's begin... (but before jumping right into the code, please take a moment to understand what computer vision is)

What is Computer Vision

Computer vision is a field of computer science that focuses on building intelligent systems that can process, analyse and derive information from visual data like images and videos. Lately, people are achieving marvellous things using 3D point cloud information...

FASHION MNIST CLASSIFICATION

Link to the dataset : Fashion MNIST
You don't need to download the dataset manually; it is included as part of torchvision.
It's better if you use a Jupyter notebook, as the code in this blog is a step-by-step process with data visualisation in between for better understanding.

import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import random_split

Let's download our training and test datasets. The data used for training is stored in dataset and the test data is stored in test_dataset.

dataset = FashionMNIST(root='data/', download=True, transform=ToTensor())
test_dataset = FashionMNIST(root='data/', train=False, transform=ToTensor())
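
As a quick optional sanity check (not in the original post), you can confirm the dataset sizes and the shape of a single image; with ToTensor(), each image is a 1x28x28 tensor:

print(len(dataset), len(test_dataset))    # 60000 10000
img, label = dataset[0]
print(img.shape, dataset.classes[label])  # torch.Size([1, 28, 28]) plus the class name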

The training process usually involves setting aside a small subset of the training data for validation (this subset is not used for training itself). We'll hold out 10,000 data points for validating our model.

val_size = 10000
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print("Training dataset size : ", len(train_ds)) 
print("Validation dataset size : ", len(val_ds))

Let's define our batch size, which specifies how many data points we use in each iteration of the training process.

batch_size=128
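
With 50,000 training samples and a batch size of 128, each epoch will consist of ceil(50000 / 128) = 391 batches. A quick way to check this (optional):

import math
print(math.ceil(len(train_ds) / batch_size))  # 391 batches per training epoch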

DataLoader is a utility provided by PyTorch that creates an iterable over our dataset. The shuffle argument shuffles the data each epoch, num_workers sets the number of subprocesses used to load data in parallel, and pin_memory places loaded batches in page-locked (pinned) host memory, which speeds up CPU-to-GPU transfers.

train_loader = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size*2, num_workers=4, pin_memory=True)

Let's visualise our training dataset. In a Jupyter notebook, the snippet below displays a grid of 128 (our batch_size) images from the training subset.

for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=16).permute((1, 2, 0)))
    break

[Grid of 128 sample training images]

The accuracy function compares our model's predictions to the actual labels and returns an accuracy score; this helps us monitor the training process.

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
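
A tiny illustrative check (the tensors here are made up for demonstration): with three samples and two classes, two correct predictions give an accuracy of about 0.67.

outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # logits for 3 samples, 2 classes
labels = torch.tensor([1, 0, 0])
print(accuracy(outputs, labels))  # tensor(0.6667): 2 of 3 predictions match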

And finally... let's define our deep learning model:

  • __init__ : the constructor where the neural network layers are defined; here we use two hidden layers and one output layer, all of them linear (fully connected) layers.

  • forward : this is where the model architecture is applied; we first flatten each image tensor, as it has to pass through linear layers.

  • training_step : this is where the training loss is computed. Calling self(images) invokes forward through the base class nn.Module and returns the predictions as tensors.

  • validation_step : very similar to training_step, the only difference being that it is used for validation; this is where we use the accuracy function to evaluate the current state of the model.

  • validation_epoch_end and epoch_end : these aggregate and display the state of our model at the end of each epoch.

class MnistModel(nn.Module):
    """Feedfoward neural network with 1 hidden layer"""
    def __init__(self, in_size, out_size):
        super().__init__()
        # hidden layer
        self.linear1 = nn.Linear(in_size, 16)
        # hidden layer 2
        self.linear2 = nn.Linear(16, 32)
        # output layer
        self.linear3 = nn.Linear(32, out_size)

    def forward(self, xb):
        # Flatten the image tensors
        out = xb.view(xb.size(0), -1)
        # Get intermediate outputs using hidden layer 1
        out = self.linear1(out)
        # Apply activation function
        out = F.relu(out)
        # Get intermediate outputs using hidden layer 2
        out = self.linear2(out)
        # Apply activation function
        out = F.relu(out)
        # Get predictions using output layer
        out = self.linear3(out)
        return out

    def training_step(self, batch):
        images, labels = batch 
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch 
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))


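Before moving on, here is a quick optional sanity check (not in the original post) that the forward pass produces one logit per class; tmp_model is just a throwaway instance:

tmp_model = MnistModel(in_size=784, out_size=10)       # throwaway instance for testing
out = tmp_model(torch.randn(2, 1, 28, 28))             # a fake batch of 2 images
print(out.shape)                                       # torch.Size([2, 10]): one logit per class
print(sum(p.numel() for p in tmp_model.parameters()))  # 13434 trainable parameters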

You can train the model on either a CPU or a GPU; the function below automatically detects the available device and assigns it for training.

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
device = get_default_device()

Now let's move the data to the chosen device. To understand why: you usually cannot fit the complete dataset into the available memory (RAM or GPU memory) at once, so we transfer it batch by batch at training time, which is a more efficient way of training.

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)
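
For example (illustrative only), moving a single tensor:

x = to_device(torch.randn(2, 3), device)
print(x.device)  # cuda:0 if a GPU is available, otherwise cpu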

Now let's define a class to handle all the data movement:

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

Now wrap the existing data loaders so that every batch is automatically moved to the chosen device:

train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)
test_loader = DeviceDataLoader(test_loader, device)
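
A quick optional check that the wrapper works: the first batch should already live on the chosen device.

for images, labels in train_loader:
    print(images.device)  # e.g. cuda:0 on a GPU machine
    break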

Now we define two functions to explicitly handle the training and validation of the model.

def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
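
Note that the opt_func argument lets you swap in a different optimizer. For example (my own variation, the post itself sticks with SGD), once the model is defined below you could train with Adam, which typically needs a much smaller learning rate:

# hypothetical alternative: Adam with a smaller learning rate
history = fit(10, 1e-3, model, train_loader, val_loader, opt_func=torch.optim.Adam)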

Each of our data points has dimension 28 * 28 = 784, and we have 10 distinct labels (classes) to classify.

input_size = 784
num_classes = 10

Now let's initialise our model and move it to the chosen device (GPU or CPU memory).

model = MnistModel(input_size, out_size=num_classes)
to_device(model, device)

First, let's evaluate the model without any training, i.e. with random weights and biases. Since the weights are random, we expect an accuracy of around 10% (random guessing across 10 classes).

history = [evaluate(model, val_loader)]
print("Before Training : ",history)

Now let's train it for 10 epochs with a learning rate of 0.5.

history += fit(10, 0.5, model, train_loader, val_loader)

...and our training is done!

To visualise the loss and accuracy, the code given below can be very helpful; run it in a Jupyter notebook.

losses = [x['val_loss'] for x in history]
plt.plot(losses, '-x')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Loss vs. No. of epochs');

[Plot: validation loss vs. number of epochs]

accuracies = [x['val_acc'] for x in history]
plt.plot(accuracies, '-x')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Accuracy vs. No. of epochs');

[Plot: validation accuracy vs. number of epochs]

Let's make some predictions/classifications with our trained model.

def predict_image(img, model):
    xb = to_device(img.unsqueeze(0), device)
    yb = model(xb)
    _, preds  = torch.max(yb, dim=1)
    return preds[0].item()

We take one image from the test dataset to test our model.

img, label = test_dataset[0]
plt.imshow(img[0], cmap='gray')
print('Label:', dataset.classes[label], ', Predicted:', dataset.classes[predict_image(img, model)])

Let's look at the fruit of our work: evaluating on the test set.

evaluate(model, test_loader)
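
If you're happy with the result, a common follow-up step (not covered in the original post) is to save the trained weights for later reuse; the filename below is just an example:

torch.save(model.state_dict(), 'fashion-mnist-model.pth')  # example filename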

This is what I got:

[Screenshot: test-set loss and accuracy output]

Hope this blog helped you. By the way, this is my first blog, so do provide valuable feedback. Thanks!
