Akriva Labs
Introduction to PyTorch

Early deep-learning frameworks required defining the entire model structure upfront and could not use normal Python control flow. PyTorch's eager, define-by-run execution addressed this.
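A minimal sketch of what that flexibility looks like: ordinary Python `if`s and loops can steer the computation at run time (the function and values here are illustrative, not from the post):

```python
import torch

def scaled_sum(x, threshold=1.0):
    # Ordinary Python control flow decides the computation at runtime,
    # something static-graph frameworks could not express directly.
    if x.abs().max() > threshold:
        x = x / x.abs().max()  # normalize only when values are large
    total = torch.zeros(())
    for value in x:            # a plain Python loop over tensor elements
        total = total + value
    return total

print(scaled_sum(torch.tensor([2.0, 4.0])))  # branch taken: inputs normalized first
print(scaled_sum(torch.tensor([0.5, 0.2])))  # branch skipped: values summed as-is
```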

Single Neuron

A single neuron is a linear equation with a weight (W) and a bias (b). Even with multiple inputs, a single neuron is still a linear equation, with one weight per input and a single bias value.
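The multi-input neuron above can be sketched as a dot product (the weight and input values here are illustrative):

```python
import torch

# Illustrative values: three inputs, one weight per input, plus a bias.
x = torch.tensor([1.0, 2.0, 3.0])   # inputs
W = torch.tensor([0.5, -1.0, 2.0])  # one weight per input
b = torch.tensor(0.25)              # bias

# The neuron is still linear: y = W·x + b
y = torch.dot(W, x) + b
print(y)  # 0.5*1 - 1.0*2 + 2.0*3 + 0.25 = 4.75
```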

  • Higher-level API than TensorFlow and JAX.
  • Includes layers and optimizers, similar to the Keras APIs.
  • PyTorch tensors are assignable (elements can be modified in place).
  • A Parameter can only be created from a torch.Tensor value; NumPy arrays are not allowed.
import torch
torch.ones(size=(2, 1))
torch.zeros(size=(2, 1))
torch.tensor([1, 2, 3], dtype=torch.float32)

torch.normal(mean=torch.zeros(size=(3,1)),
    std=torch.ones(size=(3,1)))

x = torch.zeros(size=(2, 1))
x[0,0] = 1

x = torch.zeros(size=(2, 1))
p = torch.nn.parameter.Parameter(data=x)  # wrap a tensor as a trainable Parameter

a = torch.ones((2, 2))
b = torch.square(a)           # element-wise square
c = torch.sqrt(a)             # element-wise square root
d = b + c                     # element-wise addition
e = torch.matmul(a, b)        # matrix multiplication
f = torch.cat((a, b), dim=0)  # concatenate along dimension 0

def dense(inputs, W, b):
    return torch.relu(torch.matmul(inputs, W) + b)

input_var = torch.tensor(3.0, requires_grad=True)  # track gradients for this tensor
result = torch.square(input_var)
result.backward()              # compute gradients via backpropagation
gradient = input_var.grad      # d(result)/d(input_var) = 2 * 3.0 = 6.0


The general idea is to define a subclass of torch.nn.Module, which will:
- Hold some Parameters to store state variables. These are defined in the __init__() method.
- Implement the forward-pass computation in the forward() method.

Benefits:
- Debugging is easier, as PyTorch code runs eagerly and does not require compilation (though it can optionally be compiled). In TensorFlow or JAX, compilation is required at some point.
- Hugging Face has first-class support for PyTorch, so any model you would like to use from Hugging Face is likely available in PyTorch.

Drawback:
- PyTorch can be much slower than JAX; for larger models it could be 3-5 times slower than an equivalent JAX implementation.

Implementing a linear classifier in PyTorch:

input_dim = 2
output_dim = 1

class LinearModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        return torch.matmul(inputs, self.W) + self.b

model = LinearModel()
torch_inputs = torch.tensor(inputs)  # `inputs`: raw training data, e.g. a NumPy array of shape (n, input_dim)
output = model(torch_inputs)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def training_step(inputs, targets):
    predictions = model(inputs)
    loss = torch.nn.functional.mse_loss(predictions, targets)
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()        # compute gradients
    optimizer.step()       # update the parameters
    return loss

compiled_model = torch.compile(model)  # optional: compile the model for faster execution

ML Pipeline

  1. Gather raw data - Get raw dataset for ingestion
  2. Data prep - Clean the dataset to fix errors and missing values, transform the data into different formats, engineer new features (like converting an address to a distance), etc.
  3. Modeling
  4. Training the model with training set
  5. Evaluation of model using test set
  6. Deployment
import torch                 # core functionality of PyTorch
import torch.nn as nn        # components for building neural networks
import torch.optim as optim  # tools for training those networks
distances = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
times = torch.tensor([[6.96], [12.11], [16.77], [22.21]], dtype=torch.float32)
# Define the model
model = nn.Sequential(nn.Linear(1, 1))
# Define the loss function and the optimizer
loss_function = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

Activation Function

Adding an activation function to the output of a neuron converts the linear equation into a non-linear one. This enables neurons to learn non-linear patterns.

E.g. nn.ReLU()

model = nn.Sequential(nn.Linear(1, 1), nn.ReLU())
A ReLU layer is added to the model as shown above.


In a deeper variant, the first linear layer takes 1 input and produces 5 outputs, the ReLU layer applies the non-linear transform element-wise, and the last linear layer then takes those 5 inputs and produces the final 1 output.
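The 1 → 5 → 1 model described above can be sketched as follows (the hidden width of 5 is taken from the description and is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 5),  # first linear layer: 1 input -> 5 outputs
    nn.ReLU(),        # non-linear transform applied element-wise
    nn.Linear(5, 1)   # last linear layer: 5 inputs -> the final 1 output
)
```

Placing the activation between the two linear layers (rather than after the output) lets the hidden units learn non-linear patterns while keeping the final output unbounded.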

Code: https://github.com/akrivalabs/pytorch-course/blob/main/module-1/C1M1_Assignment.ipynb

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

# Load the dataset from the CSV file
file_path = './data_with_features.csv'
data_df = pd.read_csv(file_path)

def rush_hour_feature(hours_tensor, weekends_tensor):
    """
    Engineers a new binary feature indicating if a delivery is in a weekday rush hour.

    Args:
        hours_tensor (torch.Tensor): A tensor of delivery times of day.
        weekends_tensor (torch.Tensor): A tensor indicating if a delivery is on a weekend.

    Returns:
        torch.Tensor: A tensor of 0s and 1s indicating weekday rush hour.
    """

    ### START CODE HERE ###

    # Define rush hour and weekday conditions
    is_morning_rush = (hours_tensor >= 8.0) & (hours_tensor < 10.0)
    is_evening_rush = (hours_tensor >= 16.0) & (hours_tensor < 19.0)
    is_weekday = weekends_tensor == 0

    # Combine the conditions to create the final rush hour mask
    is_rush_hour_mask = is_weekday & (is_morning_rush | is_evening_rush)

    ### END CODE HERE ###

    # Convert the boolean mask to a float tensor to use as a numerical feature
    return is_rush_hour_mask.float()


def prepare_data(df):
    """
    Converts a pandas DataFrame into prepared PyTorch tensors for modeling.

    Args:
        df (pd.DataFrame): A pandas DataFrame containing the raw delivery data.

    Returns:
        prepared_features (torch.Tensor): The final 2D feature tensor for the model.
        prepared_targets (torch.Tensor): The final 2D target tensor.
        results_dict (dict): A dictionary of intermediate tensors for testing purposes.
    """

    # Extract the data from the DataFrame as a NumPy array
    # (There's no direct torch.from_dataframe(), so we use .values to get a NumPy array first)
    all_values = df.values

    ### START CODE HERE ###

    # Convert all the values from the DataFrame into a single PyTorch tensor
    full_tensor = torch.tensor(all_values, dtype=torch.float32)

    # Use tensor slicing to separate out each raw column
    raw_distances = full_tensor[:, 0]
    raw_hours = full_tensor[:, 1]
    raw_weekends = full_tensor[:, 2]
    raw_targets = full_tensor[:, 3]

    # Call your rush_hour_feature() function to engineer the new feature
    is_rush_hour_feature = rush_hour_feature(raw_hours, raw_weekends)

    # Use the .unsqueeze(1) method to reshape the four 1D feature tensors into 2D column vectors
    distances_col = raw_distances.unsqueeze(1)
    hours_col = raw_hours.unsqueeze(1)
    weekends_col = raw_weekends.unsqueeze(1)
    rush_hour_col = is_rush_hour_feature.unsqueeze(1)

    ### END CODE HERE ###

    # Normalize the continuous feature columns (distance and time)
    dist_mean, dist_std = distances_col.mean(), distances_col.std()
    hours_mean, hours_std = hours_col.mean(), hours_col.std()

    distances_norm = (distances_col - dist_mean) / dist_std
    hours_norm = (hours_col - hours_mean) / hours_std

    # Combine all prepared 2D features into a single tensor
    prepared_features = torch.cat([
        distances_norm,
        hours_norm,
        weekends_col,
        rush_hour_col
    ], dim=1) # dim=1 concatenates them column-wise, stacking features side by side

    # Prepare targets by ensuring they are the correct shape
    prepared_targets = raw_targets.unsqueeze(1)

    # Dictionary for Testing Purposes
    results_dict = {
        'full_tensor': full_tensor,
        'raw_distances': raw_distances,
        'raw_hours': raw_hours,
        'raw_weekends': raw_weekends,
        'raw_targets': raw_targets,
        'distances_col': distances_col,
        'hours_col': hours_col,
        'weekends_col': weekends_col,
        'rush_hour_col': rush_hour_col
    }


    return prepared_features, prepared_targets, results_dict

# Process the entire DataFrame to get the final feature and target tensors.
features, targets, _ = prepare_data(data_df)

def init_model():
    """
    Initializes the neural network model, optimizer, and loss function.

    Returns:
        model (nn.Sequential): The initialized PyTorch sequential model.
        optimizer (torch.optim.Optimizer): The initialized optimizer for training.
        loss_function: The initialized loss function.
    """

    # Set the random seed for reproducibility of results (DON'T MANIPULATE IT)
    torch.manual_seed(41)

    ### START CODE HERE ###

    # Define the model architecture using nn.Sequential
    model = nn.Sequential(
        # Input layer (Linear): 4 input features, 64 output features
        nn.Linear(4, 64),
        # First ReLU activation function
        nn.ReLU(),
        # Hidden layer (Linear): 64 inputs, 32 outputs
        nn.Linear(64, 32),
        # Second ReLU activation function
        nn.ReLU(),
        # Output layer (Linear): 32 inputs, 1 output (the prediction)
        nn.Linear(32, 1)
    ) 

    # Define the optimizer (Stochastic Gradient Descent)
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    # Define the loss function (Mean Squared Error for regression)
    loss_function = nn.MSELoss()

    ### END CODE HERE ###

    return model, optimizer, loss_function

model, optimizer, loss_function = init_model()

def train_model(features, targets, epochs, verbose=True):
    """
    Trains the model using the provided data for a number of epochs.

    Args:
        features (torch.Tensor): The input features for training.
        targets (torch.Tensor): The target values for training.
        epochs (int): The number of training epochs.
        verbose (bool): If True, prints training progress. Defaults to True.

    Returns:
        model (nn.Sequential): The trained model.
        losses (list): A list of loss values recorded every 5000 epochs.
    """

    # Initialize a list to store the loss
    losses = []

    ### START CODE HERE ###

    # Initialize the model, optimizer, and loss function using `init_model`
    model, optimizer, loss_function = init_model()

    # Loop through the specified number of epochs
    for epoch in range(epochs):

        # Forward pass: Make predictions
        outputs = model(features)

        # Calculate the loss
        loss = loss_function(outputs, targets)

        # Zero the gradients
        optimizer.zero_grad()

        # Backward pass: Compute gradients
        loss.backward()

        # Update the model's parameters
        optimizer.step()

    ### END CODE HERE ### 

        # Every 5000 epochs, record the loss and print the progress
        if (epoch + 1) % 5000 == 0:
            losses.append(loss.item())
            if verbose:
                print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

    return model, losses

# Training loop
model, losses = train_model(features, targets, 30000)

# Disable gradient calculation for efficient predictions
with torch.no_grad():
    # Perform a forward pass to get model predictions
    predicted_outputs = model(features)

# Change the values below to get an estimate for a different delivery
# Set distance for the delivery in miles
distance_miles = 10 
# Set time of day in 24-hour format (e.g., 9.5 for 9:30 AM)
time_of_day_hours = 15
# Use True/False or 1/0 to indicate if it's a weekend
is_weekend = 1

# Convert the raw inputs into a 2D tensor for the model
raw_input_tensor = torch.tensor([[distance_miles, time_of_day_hours, is_weekend]], dtype=torch.float32)
helper_utils.prediction(model, data_df, raw_input_tensor, rush_hour_feature)

Output:

+------------------------------------------+-----------------------+
|                         Model Prediction                         |
+------------------------------------------+-----------------------+
| Time of the Week                         | Weekend               |
| Distance                                 | 10.0 miles            |
| Time                                     | 15:00                 |
| Is this considered a rush hour period?   | No                    |
+------------------------------------------+-----------------------+
| Estimated Delivery Time                  | 27.50 minutes         |
+------------------------------------------+-----------------------+


Note: https://github.com/akrivalabs/pytorch-course/blob/main/module-1/helper_utils.py#L517 has the prediction() helper_utils function defined.
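Conceptually, such a helper has to re-apply the same preparation steps used during training before calling the model. A hypothetical sketch (the function name, normalization approach, and column positions below are my own assumptions, not the course's actual implementation):

```python
import torch

def predict_delivery_time(model, data_df, raw_input_tensor, rush_hour_fn):
    """Hypothetical sketch of such a helper: re-apply the training-time
    preparation to one raw sample, then run the model on it."""
    distance, hour, weekend = raw_input_tensor[0]

    # Normalize with the training data's statistics, not the sample's
    # (assumes columns 0 and 1 of data_df are distance and hour)
    dist_mean, dist_std = data_df.iloc[:, 0].mean(), data_df.iloc[:, 0].std()
    hour_mean, hour_std = data_df.iloc[:, 1].mean(), data_df.iloc[:, 1].std()

    rush_hour = rush_hour_fn(hour.unsqueeze(0), weekend.unsqueeze(0)).item()
    features = torch.tensor([[
        (distance.item() - dist_mean) / dist_std,
        (hour.item() - hour_mean) / hour_std,
        weekend.item(),
        rush_hour,
    ]], dtype=torch.float32)

    with torch.no_grad():  # inference only, no gradients needed
        return model(features).item()
```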

Resources to learn further
