DEV Community

Steven Mathew


A Simple Convolutional Neural Network (CNN) on Google Colab?

Here is a breakdown of running a simple convolutional neural network on Google Colab with its T4 GPU (free version).
We will be using PyTorch to train this convolutional neural network.

Basics First:
A Convolutional Neural Network (CNN) is a type of artificial neural network specifically designed for processing structured grid data, such as images.

A CNN is made up of the following building blocks:

Neurons: Basic units that process information.
Layers: Groups of neurons stacked together; information passes through these layers one after another and is transformed at each step.

Convolution: This is like sliding a small window (called a filter or kernel) over an image and looking for specific patterns.

Filters: These are the small windows that detect specific features like edges, textures, or shapes in the images.

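Pooling: This reduces the spatial size (height and width) of the data (image or feature map) while keeping the most important information, for example by keeping only the largest value in each small window (max pooling).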
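To make convolution and pooling concrete, here is a small sketch (not from the original post) that slides a 3x3 filter over a 5x5 input by hand, checks the result against PyTorch's `torch.nn.functional.conv2d`, and then applies 2x2 max pooling. The input values and the vertical-edge filter are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

# A 5x5 single-channel "image" with arbitrary values 0..24
image = torch.arange(25, dtype=torch.float32).reshape(5, 5)

# A 3x3 filter that responds to vertical edges (left column minus right column)
kernel = torch.tensor([[1.0, 0.0, -1.0],
                       [1.0, 0.0, -1.0],
                       [1.0, 0.0, -1.0]])

# Slide the 3x3 window over the image (stride 1, no padding)
out_size = image.shape[0] - kernel.shape[0] + 1  # 5 - 3 + 1 = 3
manual = torch.zeros(out_size, out_size)
for i in range(out_size):
    for j in range(out_size):
        window = image[i:i + 3, j:j + 3]
        manual[i, j] = (window * kernel).sum()  # multiply element-wise, then add up

# PyTorch expects input as (batch, channels, height, width)
# and the filter as (out_channels, in_channels, kernel_h, kernel_w)
torch_out = F.conv2d(image.view(1, 1, 5, 5), kernel.view(1, 1, 3, 3))
print(torch.allclose(manual, torch_out[0, 0]))  # True
# Every entry is -6 here, because the values rise steadily from left to right

# 2x2 max pooling keeps the largest value in each non-overlapping 2x2 block
feature_map = torch.tensor([[1.0, 3.0, 2.0, 0.0],
                            [4.0, 2.0, 1.0, 5.0],
                            [0.0, 1.0, 7.0, 2.0],
                            [6.0, 2.0, 3.0, 1.0]])
pooled = F.max_pool2d(feature_map.view(1, 1, 4, 4), kernel_size=2)
print(pooled[0, 0])  # values [[4., 5.], [6., 7.]]: a 4x4 map shrinks to 2x2
```

Strictly speaking, what PyTorch calls convolution is cross-correlation (the filter is not flipped), which is exactly the sliding-window multiply-and-sum written out in the loop.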

After going through multiple convolutional and pooling layers, the resulting feature maps are flattened into a single long vector and passed into fully connected layers.
These layers are the classic parts of a neural network that ultimately determine the outcome, such as recognizing the content of an image (for example, whether it's a cat, dog, or car).
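Putting these pieces together, a minimal CNN in PyTorch might look like the sketch below. The class name `SimpleCNN`, the layer sizes, and the assumption of 28x28 single-channel inputs with 10 classes (as in MNIST-style datasets, which fits the single-channel `Normalize((0.5,), (0.5,))` used in this post) are illustrative choices, not the post's exact model.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    """Hypothetical small CNN for 1x28x28 images and 10 classes."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolution + pooling blocks extract features
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # -> 16 x 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16 x 14 x 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32 x 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32 x 7 x 7
        )
        # Fully connected layers turn the flattened vector into class scores
        self.classifier = nn.Sequential(
            nn.Flatten(),                # -> 32 * 7 * 7 = 1568 values per image
            nn.Linear(32 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SimpleCNN()
dummy_batch = torch.randn(4, 1, 28, 28)  # 4 random stand-in "images"
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([4, 10]): one score per class per image
```

`padding=1` with a 3x3 kernel keeps height and width unchanged, so only the pooling layers shrink the maps (28 to 14 to 7), which is why the first `Linear` layer expects 32 * 7 * 7 inputs.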

STEPS:

1) Select the GPU in Colab: go to Runtime > Change runtime type and choose T4 GPU as the hardware accelerator.

2) Check whether a GPU is present and available:

```python
import torch

print("PyTorch version:", torch.__version__)
if torch.cuda.is_available():
    print("GPU is available for PyTorch!")
    print("GPU name:", torch.cuda.get_device_name(0))
else:
    print("No GPU found for PyTorch.")
```

This lets you know whether a GPU is available for use.
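Knowing that a GPU exists is only half the story: models and tensors also have to be moved onto it. A common PyTorch pattern (a sketch, not code from the original post) is to pick a `device` once and send everything there with `.to(device)`; it falls back to the CPU when no GPU is present. The tiny `nn.Linear` layer is just a stand-in for a real model.

```python
import torch
import torch.nn as nn

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# The model and its inputs must live on the same device before they interact
layer = nn.Linear(3, 2).to(device)
x = torch.ones(5, 3).to(device)
y = layer(x)
print(y.device, y.shape)  # output stays on the chosen device
```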

3) Load the libraries:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
```

Explanation:
import torch: The main PyTorch library
torch.nn: Neural network building blocks (layers, loss functions)
torch.optim: Optimization algorithms (e.g., SGD, Adam)
datasets, transforms (from torchvision): Ready-made datasets and image transformations
DataLoader: Efficiently loads data in (optionally shuffled) batches
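To see how `DataLoader`, `torch.nn` and `torch.optim` fit together, here is a minimal one-epoch training-loop sketch. It uses a synthetic `TensorDataset` of random tensors (an assumption, so the example runs without downloading anything) and a deliberately tiny model; in a real notebook you would swap in a torchvision dataset such as `datasets.MNIST(root="data", train=True, download=True, transform=transform)` and a proper CNN.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a real dataset: 100 single-channel 28x28 "images"
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# DataLoader yields shuffled batches of 32 (the last batch holds the remaining 4)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

batch_sizes = []
for batch_images, batch_labels in loader:  # one pass over the data (one epoch)
    batch_images = batch_images.to(device)
    batch_labels = batch_labels.to(device)

    optimizer.zero_grad()                    # clear gradients from the last step
    outputs = model(batch_images)            # forward pass
    loss = criterion(outputs, batch_labels)  # compare predictions with labels
    loss.backward()                          # compute gradients
    optimizer.step()                         # update the weights

    batch_sizes.append(len(batch_images))
    print(f"batch of {len(batch_images)}: loss = {loss.item():.4f}")
```

Because the labels here are random, the loss will not meaningfully improve; the point is only the shape of the loop (zero gradients, forward, loss, backward, step).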

```python
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
```

Explanation:
transforms.Compose: Chains several transforms together.
But what exactly are transforms, and what does composing them mean?

Transforms: These are operations applied to images, such as resizing, cropping, rotating, normalizing, converting to tensors, etc.
Compose: This is a class that chains several of these transform operations into a single callable, applying them in the order they are listed.

transforms.ToTensor(): Converts a PIL Image or NumPy array to a PyTorch tensor and scales the image pixel values to the range [0, 1].
What in heaven's name does that mean???

Converts a PIL Image or NumPy array to a PyTorch tensor:

PIL Image: This is the image object provided by the Python Imaging Library (PIL, maintained today as the Pillow package), often used for loading and processing images.

NumPy array: This is a format provided by the NumPy library, often used for numerical operations in Python.

PyTorch tensor: This is the *core data structure in PyTorch*, used for all operations; tensors can also be moved to a GPU for acceleration.

Example:

PIL Image -> Tensor

A PIL image might look like this: `<PIL.Image.Image image mode=RGB size=256x256 at 0x7F8B9C4CBB80>`
After applying transforms.ToTensor(), it converts the image to a PyTorch tensor: torch.Size([3, 256, 256])
The tensor shape [3, 256, 256] indicates that the image has 3 color channels (Red, Green, Blue), and each channel is 256x256 pixels.

NumPy Array -> Tensor
A NumPy array might look like this: array([[[255, 0, 0], ..., [0, 0, 255]]], dtype=uint8)
After applying transforms.ToTensor(), the array (in height x width x channels order) becomes a PyTorch tensor in channels x height x width order, with pixel values scaled to [0, 1].

Scales the image pixel values to the range [0, 1]:
Image pixel values in a PIL Image or NumPy array typically range from 0 to 255 for each color channel (Red, Green, Blue).

Example:
Before Scaling:
Pixel values in a typical image range from 0 to 255.
For example, a pixel value of 255 represents full intensity of that channel and 0 represents no intensity; in a grayscale image, 255 is white and 0 is black.

After Scaling:
Each pixel value is divided by 255, so the new range is [0, 1].
For example, a pixel value of 255 becomes 1.0, and a pixel value of 0 remains 0.0.

In short, transforms.ToTensor() divides each 8-bit pixel value by 255, converting the range to [0, 1].
This scaling matters for neural networks because inputs on a small, consistent scale keep activations and gradients in a well-behaved range, which helps stabilize training and improve convergence.

EXAMPLE CODE:
```python
from PIL import Image
import torchvision.transforms as transforms

# Load a PIL image and make sure it has 3 color channels (RGB)
pil_image = Image.open('path_to_image.jpg').convert('RGB')

# Convert to a PyTorch tensor and scale pixel values to [0, 1]
transform = transforms.ToTensor()
tensor_image = transform(pil_image)

print(tensor_image.shape)
print(tensor_image.min().item(), tensor_image.max().item())
```

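OUTPUT
torch.Size([3, height, width])
The second line prints the smallest and largest pixel values, which always lie within [0.0, 1.0]; they are exactly 0.0 and 1.0 only if the image contains pure black and pure white pixels.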
