Stephen Nwankwo

Understanding the Basics of Optimization Loops in PyTorch

Cover image generated with Google's Gemini.

PyTorch is a powerful, easy-to-use Python deep learning library, used mainly in computer vision and natural language processing, two areas I have a huge interest in. With this article, I hope to help you understand how optimization loops are used in PyTorch.

When creating a model, we typically divide the dataset into three parts: the training set, which contains the majority of the data; the validation set, used for evaluation; and the test set, used for testing/inference. In some cases, however, we need only the training and testing sets, depending on the project specification. After obtaining these sets, they undergo a series of operations with the goal of minimizing the discrepancy between predictions and actual outcomes.
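As a minimal sketch of a two-way split: the tensors X and y below are hypothetical, describing points on a straight line (matching the kind of data the code example at the end of this article learns from), and the 80/20 ratio is an illustrative choice.

import torch

# Hypothetical data: 100 points on a straight line y = weight * x + bias
weight, bias = 0.7, 0.3  # illustrative values
X = torch.arange(0, 1, 0.01).unsqueeze(dim=1)
y = weight * X + bias

# 80/20 train/test split
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]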

With this foundation, we can choose the loss function used to compute the loss value, followed by the optimizer, both of which are vital for the optimization loop.
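In PyTorch this typically means picking a loss function from torch.nn and an optimizer from torch.optim. Here is a minimal sketch, assuming the simple linear-regression setting above; the specific choices of L1 loss and SGD (and the learning rate) are illustrative, not prescriptive.

import torch
from torch import nn

# A one-in, one-out linear model for illustration
model = nn.Linear(in_features=1, out_features=1)

# Loss function: mean absolute error, a common choice for regression
loss_fn = nn.L1Loss()

# Optimizer: stochastic gradient descent over the model's parameters
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)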

What is the optimization loop?

The optimization loop is an operation made up of two loops. The training loop learns the relationships or patterns in the training data by passing it through the model and updating the model's internal parameters. The testing loop passes the testing data through the trained model and evaluates how good the patterns the model learned on the training data are. These processes, as the title of this article suggests, are called "loops" because we want our model to look at (loop through) each sample in each dataset.

We now have a basic understanding of what the training and testing loops are, but how do we achieve them? There are a few steps involved in each of these processes, with the testing loop containing fewer steps for reasons we will see later on.

Steps involved in the training loop:

  1. Forward pass: the training data is passed through the model's forward() method to produce predictions; in this simple example, the entire training dataset is passed through the model at once.

  2. Loss value: using the loss function chosen earlier, compare the model's predictions against the training labels (the ground truth) to compute the loss value.

  3. Zero gradient: reset the gradients of the model parameters to zero. PyTorch accumulates gradients by default, so without this step the gradients from previous iterations would pile up (see the short demonstration after this list).

  4. Backpropagation: compute the gradient of the loss with respect to every model parameter that requires updating, working backwards through the model.

  5. Optimizer step: after performing backpropagation and obtaining the loss gradients, update the model parameters using those gradients.
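To see why the zero-gradient step matters, note that PyTorch accumulates gradients across backward passes by default. A standalone demonstration, independent of any model:

import torch

w = torch.tensor(2.0, requires_grad=True)

(w ** 2).backward()   # d(w^2)/dw = 2w
print(w.grad)         # tensor(4.)

(w ** 2).backward()   # without zeroing, the new gradient is added on top
print(w.grad)         # tensor(8.) -- accumulated, not replaced

w.grad.zero_()        # reset; optimizer.zero_grad() does this for all parameters
print(w.grad)         # tensor(0.)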

To help visualize what this might look like, take a look at Figure 1 below and the sketch that follows it, which give an overview of what the training loop could look like in code. It won't always look exactly like this; the details depend on the project.

Figure 1: PyTorch training loop [image source]

Steps involved in the testing loop:

  1. Forward pass: same as in step 1 above, except this time we use the test dataset.

  2. Loss value: compute the loss value in the same manner as in the training loop, using test data.

  3. Other evaluation computations: we often need other metrics to evaluate our model more fully, such as accuracy, precision, recall, and F1 score (an example follows Figure 2 below).

You may have noticed that the testing loop omits the zero-gradient, backpropagation, and optimizer steps. This is because no model parameters are altered during testing; they have already been determined during training. For testing purposes, we are interested solely in the output of the forward pass through the model. The annotated image (Figure 2) below illustrates the steps involved in the testing loop.

Figure 2: PyTorch testing loop [image source]
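Mirroring Figure 2, the testing steps might look like the sketch below. The accuracy_fn helper is hypothetical and only applies to classification models with integer class labels; the example later in this article is a regression task, so treat it purely as an illustration of step 3.

model.eval()                                # put the model in evaluation mode
with torch.inference_mode():                # disable gradient tracking
    test_pred = model(X_test)               # 1. Forward pass on the test data
    test_loss = loss_fn(test_pred, y_test)  # 2. Calculate the test loss

# 3. Other evaluation computations, e.g. accuracy for a hypothetical
# classification model whose output is one logit per class:
def accuracy_fn(y_true, y_logits):
    y_labels = y_logits.argmax(dim=1)            # predicted class per sample
    correct = (y_labels == y_true).sum().item()  # number of correct predictions
    return correct / len(y_true)                 # fraction classified correctly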

Code example

If we combine our understanding of the training loop and the testing loop, below is what our optimization loop might look like in a basic project.

link to code:
https://github.com/stenwire/playground/blob/main/pytorch/01_pytorch_workflow_exercises.ipynb

code snippet:

# Optimization loop

# Train model for 300 epochs
epochs = 300

# Send data to the target device
X_train = X_train.to(device)
y_train = y_train.to(device)
X_test = X_test.to(device)
y_test = y_test.to(device)

for epoch in range(epochs):
    ### Training

    # Put model in train mode
    model.train()

    # 1. Forward pass
    y_pred = model(X_train)

    # 2. Calculate loss
    loss = loss_fn(y_pred, y_train)

    # 3. Zero gradients
    optimizer.zero_grad()

    # 4. Backpropagation
    loss.backward()

    # 5. Step the optimizer
    optimizer.step()

    ### Perform testing every 20 epochs
    if epoch % 20 == 0:
        # Put model in evaluation mode and set up the inference context
        model.eval()
        with torch.inference_mode():
            # 1. Forward pass
            test_pred = model(X_test)
            # 2. Calculate test loss
            test_loss = loss_fn(test_pred, y_test)
        # Print out what's happening
        print(f"Epoch: {epoch} | Train loss: {loss:.4f} | Test loss: {test_loss:.4f}")

The above code sample covers a PyTorch model that learns the pattern of a straight line and matches it. I would love to see other use cases, so feel free to reach out to me on any of the social platforms I added below; let's learn together.

Connect with me on LinkedIn:
https://www.linkedin.com/in/stephen-nwankwo-9876b4196/

Connect with me on X (Twitter): https://x.com/Sage_Sten

Check out my GitHub: https://github.com/stenwire

References:

https://www.learnpytorch.io/01_pytorch_workflow/

https://www.dataquest.io/blog/pytorch-for-beginners/
