Akriva Labs
Introduction to PyTorch

Early deep-learning frameworks required defining the entire model structure upfront and could not use normal Python control flow. PyTorch's eager, define-by-run execution addressed this.
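A minimal sketch of what that flexibility looks like: ordinary Python `if`s and loops can steer the computation at run time (the function and values here are illustrative, not from the post):

```python
import torch

def scaled_sum(x, threshold=1.0):
    # Ordinary Python control flow decides the computation at runtime,
    # something static-graph frameworks could not express directly.
    if x.abs().max() > threshold:
        x = x / x.abs().max()  # normalize only when values are large
    total = torch.zeros(())
    for value in x:            # a plain Python loop over tensor elements
        total = total + value
    return total

print(scaled_sum(torch.tensor([2.0, 4.0])))  # branch taken: inputs normalized first
print(scaled_sum(torch.tensor([0.5, 0.2])))  # branch skipped: values summed as-is
```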

Single Neuron

A single neuron is a linear equation with a weight (W) and a bias (b). Even with multiple inputs, a single neuron is still a linear equation, with one weight per input and a single bias value.
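The multi-input neuron above can be sketched as a dot product (the weight and input values here are illustrative):

```python
import torch

# Illustrative values: three inputs, one weight per input, plus a bias.
x = torch.tensor([1.0, 2.0, 3.0])   # inputs
W = torch.tensor([0.5, -1.0, 2.0])  # one weight per input
b = torch.tensor(0.25)              # bias

# The neuron is still linear: y = W·x + b
y = torch.dot(W, x) + b
print(y)  # 0.5*1 - 1.0*2 + 2.0*3 + 0.25 = 4.75
```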

  • Higher-level API than TensorFlow and JAX.
  • Includes layers and optimizers, similar to the Keras APIs.
  • PyTorch tensors are assignable (elements can be modified in place).
  • A Parameter can only be created from a torch.Tensor value; NumPy arrays are not allowed.
import torch
torch.ones(size=(2, 1))
torch.zeros(size=(2, 1))
torch.tensor([1, 2, 3], dtype=torch.float32)

torch.normal(mean=torch.zeros(size=(3,1)),
    std=torch.ones(size=(3,1)))

x = torch.zeros(size=(2, 1))
x[0,0] = 1

x = torch.zeros(size=(2, 1))
p = torch.nn.parameter.Parameter(data=x)  # wrap a tensor as a trainable Parameter

a = torch.ones((2, 2))
b = torch.square(a)           # element-wise square
c = torch.sqrt(a)             # element-wise square root
d = b + c                     # element-wise addition
e = torch.matmul(a, b)        # matrix multiplication
f = torch.cat((a, b), dim=0)  # concatenate along dimension 0

def dense(inputs, W, b):
    return torch.relu(torch.matmul(inputs, W) + b)

input_var = torch.tensor(3.0, requires_grad=True)  # track gradients for this tensor
result = torch.square(input_var)
result.backward()              # compute gradients via backpropagation
gradient = input_var.grad      # d(result)/d(input_var) = 2 * 3.0 = 6.0


The general idea is to define a subclass of torch.nn.Module, which will:
- Hold some Parameters to store state variables. These are defined in the __init__() method.
- Implement the forward-pass computation in the forward() method.

Benefits:
- Debugging is easier, as PyTorch code runs eagerly and does not require compilation (though it can optionally be compiled). In TensorFlow or JAX, compilation is required at some point.
- Hugging Face has first-class support for PyTorch, so any model you would like to use from Hugging Face is likely available in PyTorch.

Drawback:
- PyTorch can be much slower than JAX; for larger models it could be 3-5 times slower than an equivalent JAX implementation.

Implementing a linear classifier in PyTorch:

input_dim = 2
output_dim = 1

class LinearModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        return torch.matmul(inputs, self.W) + self.b

model = LinearModel()
torch_inputs = torch.tensor(inputs)  # `inputs`: raw training data, e.g. a NumPy array of shape (n, input_dim)
output = model(torch_inputs)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def training_step(inputs, targets):
    predictions = model(inputs)
    loss = torch.nn.functional.mse_loss(predictions, targets)
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()        # compute gradients
    optimizer.step()       # update the parameters
    return loss

compiled_model = torch.compile(model)  # optional: compile the model for faster execution

ML Pipeline

  1. Gather raw data - Get raw dataset for ingestion
  2. Data prep - Clean the dataset to fix errors and missing values, transform the data into different formats, engineer new features (like converting an address to a distance), etc.
  3. Modeling
  4. Training the model with training set
  5. Evaluation of model using test set
  6. Deployment
import torch                 # core functionality of PyTorch
import torch.nn as nn        # components for building neural networks
import torch.optim as optim  # tools for training those networks
distances = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
times = torch.tensor([[6.96], [12.11], [16.77], [22.21]], dtype=torch.float32)
# Define the model
model = nn.Sequential(nn.Linear(1, 1))
# Define the loss function and the optimizer
loss_function = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

Activation Function

Adding an activation function to the output of a neuron converts the linear equation into a non-linear one. This enables neurons to learn non-linear patterns.

E.g. nn.ReLU()

model = nn.Sequential(nn.Linear(1, 1), nn.ReLU())
A ReLU layer is added to the model as shown above.


In a deeper variant, the first linear layer takes 1 input and produces 5 outputs, the ReLU layer applies the non-linear transform element-wise, and the last linear layer then takes those 5 inputs and produces the final 1 output.
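The 1 → 5 → 1 model described above can be sketched as follows (the hidden width of 5 is taken from the description and is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 5),  # first linear layer: 1 input -> 5 outputs
    nn.ReLU(),        # non-linear transform applied element-wise
    nn.Linear(5, 1)   # last linear layer: 5 inputs -> the final 1 output
)
```

Placing the activation between the two linear layers (rather than after the output) lets the hidden units learn non-linear patterns while keeping the final output unbounded.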

Code: https://github.com/akrivalabs/pytorch-course/blob/main/module-1/C1M1_Assignment.ipynb

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

# Load the dataset from the CSV file
file_path = './data_with_features.csv'
data_df = pd.read_csv(file_path)

def rush_hour_feature(hours_tensor, weekends_tensor):
    """
    Engineers a new binary feature indicating if a delivery is in a weekday rush hour.

    Args:
        hours_tensor (torch.Tensor): A tensor of delivery times of day.
        weekends_tensor (torch.Tensor): A tensor indicating if a delivery is on a weekend.

    Returns:
        torch.Tensor: A tensor of 0s and 1s indicating weekday rush hour.
    """

    ### START CODE HERE ###

    # Define rush hour and weekday conditions
    is_morning_rush = (hours_tensor >= 8.0) & (hours_tensor < 10.0)
    is_evening_rush = (hours_tensor >= 16.0) & (hours_tensor < 19.0)
    is_weekday = weekends_tensor == 0

    # Combine the conditions to create the final rush hour mask
    is_rush_hour_mask = is_weekday & (is_morning_rush | is_evening_rush)

    ### END CODE HERE ###

    # Convert the boolean mask to a float tensor to use as a numerical feature
    return is_rush_hour_mask.float()


def prepare_data(df):
    """
    Converts a pandas DataFrame into prepared PyTorch tensors for modeling.

    Args:
        df (pd.DataFrame): A pandas DataFrame containing the raw delivery data.

    Returns:
        prepared_features (torch.Tensor): The final 2D feature tensor for the model.
        prepared_targets (torch.Tensor): The final 2D target tensor.
        results_dict (dict): A dictionary of intermediate tensors for testing purposes.
    """

    # Extract the data from the DataFrame as a NumPy array
    # (There's no direct torch.from_dataframe(), so we use .values to get a NumPy array first)
    all_values = df.values

    ### START CODE HERE ###

    # Convert all the values from the DataFrame into a single PyTorch tensor
    full_tensor = torch.tensor(all_values, dtype=torch.float32)

    # Use tensor slicing to separate out each raw column
    raw_distances = full_tensor[:, 0]
    raw_hours = full_tensor[:, 1]
    raw_weekends = full_tensor[:, 2]
    raw_targets = full_tensor[:, 3]

    # Call your rush_hour_feature() function to engineer the new feature
    is_rush_hour_feature = rush_hour_feature(raw_hours, raw_weekends)

    # Use the .unsqueeze(1) method to reshape the four 1D feature tensors into 2D column vectors
    distances_col = raw_distances.unsqueeze(1)
    hours_col = raw_hours.unsqueeze(1)
    weekends_col = raw_weekends.unsqueeze(1)
    rush_hour_col = is_rush_hour_feature.unsqueeze(1)

    ### END CODE HERE ###

    # Normalize the continuous feature columns (distance and time)
    dist_mean, dist_std = distances_col.mean(), distances_col.std()
    hours_mean, hours_std = hours_col.mean(), hours_col.std()

    distances_norm = (distances_col - dist_mean) / dist_std
    hours_norm = (hours_col - hours_mean) / hours_std

    # Combine all prepared 2D features into a single tensor
    prepared_features = torch.cat([
        distances_norm,
        hours_norm,
        weekends_col,
        rush_hour_col
    ], dim=1) # dim=1 concatenates them column-wise, stacking features side by side

    # Prepare targets by ensuring they are the correct shape
    prepared_targets = raw_targets.unsqueeze(1)

    # Dictionary for Testing Purposes
    results_dict = {
        'full_tensor': full_tensor,
        'raw_distances': raw_distances,
        'raw_hours': raw_hours,
        'raw_weekends': raw_weekends,
        'raw_targets': raw_targets,
        'distances_col': distances_col,
        'hours_col': hours_col,
        'weekends_col': weekends_col,
        'rush_hour_col': rush_hour_col
    }


    return prepared_features, prepared_targets, results_dict

# Process the entire DataFrame to get the final feature and target tensors.
features, targets, _ = prepare_data(data_df)

def init_model():
    """
    Initializes the neural network model, optimizer, and loss function.

    Returns:
        model (nn.Sequential): The initialized PyTorch sequential model.
        optimizer (torch.optim.Optimizer): The initialized optimizer for training.
        loss_function: The initialized loss function.
    """

    # Set the random seed for reproducibility of results (DON'T MANIPULATE IT)
    torch.manual_seed(41)

    ### START CODE HERE ###

    # Define the model architecture using nn.Sequential
    model = nn.Sequential(
        # Input layer (Linear): 4 input features, 64 output features
        nn.Linear(4, 64),
        # First ReLU activation function
        nn.ReLU(),
        # Hidden layer (Linear): 64 inputs, 32 outputs
        nn.Linear(64, 32),
        # Second ReLU activation function
        nn.ReLU(),
        # Output layer (Linear): 32 inputs, 1 output (the prediction)
        nn.Linear(32, 1)
    ) 

    # Define the optimizer (Stochastic Gradient Descent)
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    # Define the loss function (Mean Squared Error for regression)
    loss_function = nn.MSELoss()

    ### END CODE HERE ###

    return model, optimizer, loss_function

model, optimizer, loss_function = init_model()

def train_model(features, targets, epochs, verbose=True):
    """
    Trains the model using the provided data for a number of epochs.

    Args:
        features (torch.Tensor): The input features for training.
        targets (torch.Tensor): The target values for training.
        epochs (int): The number of training epochs.
        verbose (bool): If True, prints training progress. Defaults to True.

    Returns:
        model (nn.Sequential): The trained model.
        losses (list): A list of loss values recorded every 5000 epochs.
    """

    # Initialize a list to store the loss
    losses = []

    ### START CODE HERE ###

    # Initialize the model, optimizer, and loss function using `init_model`
    model, optimizer, loss_function = init_model()

    # Loop through the specified number of epochs
    for epoch in range(epochs):

        # Forward pass: Make predictions
        outputs = model(features)

        # Calculate the loss
        loss = loss_function(outputs, targets)

        # Zero the gradients
        optimizer.zero_grad()

        # Backward pass: Compute gradients
        loss.backward()

        # Update the model's parameters
        optimizer.step()

    ### END CODE HERE ### 

        # Every 5000 epochs, record the loss and print the progress
        if (epoch + 1) % 5000 == 0:
            losses.append(loss.item())
            if verbose:
                print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

    return model, losses

# Training loop
model, losses = train_model(features, targets, 30000)

# Disable gradient calculation for efficient predictions
with torch.no_grad():
    # Perform a forward pass to get model predictions
    predicted_outputs = model(features)

# Change the values below to get an estimate for a different delivery
# Set distance for the delivery in miles
distance_miles = 10 
# Set time of day in 24-hour format (e.g., 9.5 for 9:30 AM)
time_of_day_hours = 15
# Use True/False or 1/0 to indicate if it's a weekend
is_weekend = 1

# Convert the raw inputs into a 2D tensor for the model
raw_input_tensor = torch.tensor([[distance_miles, time_of_day_hours, is_weekend]], dtype=torch.float32)
helper_utils.prediction(model, data_df, raw_input_tensor, rush_hour_feature)

Output:

+------------------------------------------+-----------------------+
|                         Model Prediction                         |
+------------------------------------------+-----------------------+
| Time of the Week                         | Weekend               |
| Distance                                 | 10.0 miles            |
| Time                                     | 15:00                 |
| Is this considered a rush hour period?   | No                    |
+------------------------------------------+-----------------------+
| Estimated Delivery Time                  | 27.50 minutes         |
+------------------------------------------+-----------------------+


Note: https://github.com/akrivalabs/pytorch-course/blob/main/module-1/helper_utils.py#L517 has the prediction() helper_utils function defined.
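Conceptually, such a helper has to re-apply the same preparation steps used during training before calling the model. A hypothetical sketch (the function name, normalization approach, and column positions below are my own assumptions, not the course's actual implementation):

```python
import torch

def predict_delivery_time(model, data_df, raw_input_tensor, rush_hour_fn):
    """Hypothetical sketch of such a helper: re-apply the training-time
    preparation to one raw sample, then run the model on it."""
    distance, hour, weekend = raw_input_tensor[0]

    # Normalize with the training data's statistics, not the sample's
    # (assumes columns 0 and 1 of data_df are distance and hour)
    dist_mean, dist_std = data_df.iloc[:, 0].mean(), data_df.iloc[:, 0].std()
    hour_mean, hour_std = data_df.iloc[:, 1].mean(), data_df.iloc[:, 1].std()

    rush_hour = rush_hour_fn(hour.unsqueeze(0), weekend.unsqueeze(0)).item()
    features = torch.tensor([[
        (distance.item() - dist_mean) / dist_std,
        (hour.item() - hour_mean) / hour_std,
        weekend.item(),
        rush_hour,
    ]], dtype=torch.float32)

    with torch.no_grad():  # inference only, no gradients needed
        return model(features).item()
```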

Resources to learn further
