Mostafa Gazar

Train a lines segmentation model using Pytorch

Let us start by identifying the problem we want to solve, which is inspired by this project.

Given an image containing lines of text, return a pixel-wise labeling of that image, with each pixel belonging either to the background or to a line of handwriting.


The project structure

It consists of five main directories: one for notebooks, one for the shared Python code, one for datasets, one for Google Cloud scripts, and one for saving the model weights.

In a production project, you will probably have more directories, such as web and api.

I also chose pipenv instead of conda or virtualenv to manage my Python environment. I only recently switched to pipenv from conda, and I have found that it consistently works as expected everywhere.

For GPU training, I used a Google Cloud instance with one NVIDIA T4 GPU. Bash scripts manage the instance lifecycle, from creating it initially to starting it, connecting to it, and stopping it.

Data

The dataset is described in a TOML file inside the raw directory; a TOML file basically consists of key-value pairs. The other directories under data are git-ignored because they will contain the actual full dataset downloads.
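For illustration, here is a minimal sketch of how such a description could be read from Python; the file path and the keys (url, filename, sha256) are assumptions, not the project's actual schema:

import toml  # pipenv install toml

# Hypothetical data/raw/iam/metadata.toml with key-value pairs such as:
#   url = "http://example.com/iam/forms.tgz"
#   filename = "forms.tgz"
#   sha256 = "..."
metadata = toml.load('data/raw/iam/metadata.toml')
print(metadata['url'], metadata['filename'])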

Notebooks

I use notebooks for exploration and as a high-level container for the code required to construct and clean the datasets and to build a basic training pipeline.

Python files

Under the src directory I keep the code that can be shared and reused between the various notebooks. Following good software engineering practices is key to getting things done quickly and correctly; finding and identifying bugs in ML code can be extremely hard. That is why you want to start small and iterate often.

The Python environment

You can install pipenv on Linux or macOS using Homebrew (or Linuxbrew) with the following command:

brew install pipenv

You can then install your dependencies by running pipenv install SOMETHING from your project directory.


The dataset

I will use this old academic dataset here as a base to build a lines segmentation dataset, which I will use to train a UNet mini-network to detect lines of handwriting.

The original images in the dataset look like the following; they also come with XML files that define the bounding boxes.

In notebooks/01-explore-iam-dataset.ipynb I downloaded the dataset, unzipped it, and then overlaid some random images with the data from the XML files.

Next, I cropped the images and generated mask images to match the new dimensions. The mask images are the ground-truth images that we will use for training the final model.
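As a rough sketch of the idea (not the notebook's exact code), a mask can be built by filling the line bounding boxes parsed from the XML files into a zeroed array with the same dimensions as the cropped image; the (x, y, w, h) box format here is an assumption:

import numpy as np

def build_line_mask(image_shape, boxes):
    # `boxes` is assumed to be a list of (x, y, w, h) tuples parsed from the
    # dataset's XML files; the real notebook may use a different format.
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = 255  # white pixels mark a line of handwriting
    return mask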

Finally, I split the data into train, validation, and test sets.
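A simple way to do such a split (the 80/10/10 ratios and the scikit-learn helper here are illustrative, not necessarily what the notebook uses):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data, then split that half and half into validation and test
train_images, rest_images, train_masks, rest_masks = train_test_split(
    images, masks, test_size=0.2, random_state=42)
valid_images, test_images, valid_masks, test_masks = train_test_split(
    rest_images, rest_masks, test_size=0.5, random_state=42)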


The Network

Because we do not have a lot of data available for training, I used a mini version of the UNet architecture based on this Keras implementation.
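The exact layers live in the shared src code, but a mini UNet in PyTorch looks roughly like the following sketch; the channel counts and depth are assumptions on my part, based on typical UNet-mini ports:

import torch
import torch.nn as nn

def double_conv(in_channels, out_channels):
    # Two 3x3 convolutions with ReLU, the basic UNet building block
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class UNetMini(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.down1 = double_conv(1, 32)     # grayscale input, 1 channel
        self.down2 = double_conv(32, 64)
        self.bottom = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.conv_up2 = double_conv(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.conv_up1 = double_conv(64, 32)
        self.out = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        d1 = self.down1(x)                  # kept as a skip connection
        d2 = self.down2(self.pool(d1))      # kept as a skip connection
        b = self.bottom(self.pool(d2))
        u2 = self.conv_up2(torch.cat([self.up2(b), d2], dim=1))
        u1 = self.conv_up1(torch.cat([self.up1(u2), d1], dim=1))
        return self.out(u1)                 # raw per-pixel class scores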

Using this great library, I can visualize the network by doing a forward pass with a specific input size.
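Assuming, for illustration, a library like pytorch-summary (torchsummary), the call would look like this for the UNetMini sketch above:

from torchsummary import summary

model = UNetMini(num_classes=2)
# Print a layer-by-layer summary by running a forward pass on a dummy 1x256x256 input
summary(model, input_size=(1, 256, 256), device='cpu')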


The Training Pipeline

Now that we have the data ready and the network that we want to train defined, it is time to build a basic training pipeline.

The first step is defining a torch Dataset and iterating through it using a DataLoader:

import numpy as np

from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils


class FormsDataset(Dataset):

    def __init__(self, images, masks, num_classes: int, transforms=None):
        self.images = images
        self.masks = masks
        self.num_classes = num_classes
        self.transforms = transforms

    def __getitem__(self, idx):
        image = self.images[idx]
        image = image.astype(np.float32)
        image = np.expand_dims(image, -1)
        image = image / 255
        if self.transforms:
            image = self.transforms(image)

        mask = self.masks[idx]
        mask = mask.astype(np.float32)
        mask = mask / 255
        mask[mask > .7] = 1
        mask[mask <= .7] = 0
        if self.transforms:
            mask = self.transforms(mask)

        return image, mask

    def __len__(self):
        return len(self.images)

# get_transformations (a helper presumably defined in src) builds the transforms passed to the dataset
train_dataset = FormsDataset(train_images, train_masks, number_of_classes, get_transformations(True))
train_data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
print(f'Train dataset has {len(train_data_loader)} batches of size {batch_size}')

Next, I define the training loop:

import torch
import torch.nn.functional as F

# Use the GPU for training if available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)  # make sure the network defined earlier is on the training device

# Loss and optimizer definition
criterion = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# The training loop
total_steps = len(train_data_loader)
print(f"{epochs} epochs, {total_steps} total_steps per epoch")

for epoch in range(epochs):
    for i, (images, masks) in enumerate(train_data_loader, 1):
        images = images.to(device)
        # NLLLoss expects class-index targets of shape (N, H, W), so drop the channel dimension
        masks = masks.type(torch.LongTensor)
        masks = masks.reshape(masks.shape[0], masks.shape[2], masks.shape[3])
        masks = masks.to(device)

        # Forward pass
        outputs = model(images)
        softmax = F.log_softmax(outputs, dim=1)
        loss = criterion(softmax, masks)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 100 == 0:
            print(f"Epoch [{epoch + 1}/{epochs}], Step [{i}/{total_steps}], Loss: {loss.item():.4f}")

Here are the final predictions
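To produce pixel-wise predictions from the trained model, run the same forward pass as in training and take an argmax over the class dimension; a minimal sketch, assuming a test_data_loader built the same way as the training one:

model.eval()
with torch.no_grad():
    images, masks = next(iter(test_data_loader))  # test_data_loader is assumed to mirror train_data_loader
    outputs = model(images.to(device))
    # Pick the most likely class per pixel: 0 = background, 1 = line of handwriting
    predictions = outputs.argmax(dim=1).cpu().numpy()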


You can check out a Keras (backed by TF2) implementation here.


Thanks for making it this far. The last thing I would like to say is that, unfortunately, most of the available online materials either offer bad advice or are so basic that they do not actually offer much value, and some are plain wrong. There are some great resources though, like PyTorch's 60-minute blitz series and its great API docs. There is also this cheat sheet and this great GitHub repo.


If you enjoyed reading this post and found it helpful, I would love to hear from you; my Twitter DMs are open.
