Python Dev Environment on Mac

Akriva Labs — Thu, 06 Nov 2025 12:40:54 +0000

Set up a Python development environment on Mac with uv

This is my favorite way to set up an environment. Alternatively, we could use conda (Anaconda or Miniconda), which has some advantages, but the uv approach is fast and thus preferred.

Install uv. uv is a fast Python package and project manager written in Rust, and one of the best ways to set up a virtual environment for each project. Unlike uv, Anaconda comes with several packages pre-installed. I personally use uv to set up the environment and VS Code as the editor for both Python files and notebooks.

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a project

uv init my_project
cd my_project

Create a virtual environment

uv python install 3.12
uv venv --python 3.12
source .venv/bin/activate
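To confirm that the interpreter you are running actually comes from the activated virtual environment, you can compare sys.prefix with sys.base_prefix: inside a venv, sys.prefix points at the venv directory, while sys.base_prefix still points at the base Python installation. A minimal sketch (the function name in_virtualenv is just illustrative):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment (e.g. the .venv created by uv)."""
    # In a venv, sys.prefix is the venv directory; sys.base_prefix is the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

Running this with .venv/bin/python should report True on the second line.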

For the IDE, one can use VS Code or PyCharm. Both offer good support for running and debugging Python code. VS Code also supports running notebooks locally, but for most practical purposes we will end up running notebooks in Google Colab, Kaggle, or a similar Jupyter environment with GPU/TPU infrastructure to run ML code.
In the IDE, select the Python interpreter from the virtual environment created above (.venv/bin/python).

Install a local LLM using Ollama

Install Ollama from https://ollama.com/download, then manage models from the terminal:
ollama run mistral - downloads mistral (if not already present) and runs it
ollama pull mistral - downloads the model without running it
/exit - exits the chat (typed inside the chat session)
ollama list - lists the downloaded models
ollama rm <model_name> - deletes a model to free up disk space
We can also use one of Ollama's cloud models, in which case inference runs on a remote server instead of locally.
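Besides the CLI, a running Ollama install also serves a local REST API, by default on http://localhost:11434. A minimal sketch using only the Python standard library that sends a prompt to the /api/generate endpoint with streaming turned off; it assumes the Ollama app is running and mistral has already been pulled, and the helper names build_generate_request and generate are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint (non-streaming)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running Ollama server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With stream=False, the complete answer comes back in the "response" field.
    return body["response"]
```

Calling generate("mistral", "Explain a Python virtual environment in one sentence.") returns the model's answer as a single string.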

Setting Up and Managing a Python Project with Poetry and Jupyter

Ensure you have Python installed (version 3.9 or later).
Install Poetry by running:

curl -sSL https://install.python-poetry.org | python3 -

Navigate to your project directory:

cd path/to/dspy-dev

Initialize a new Poetry project:

poetry init
poetry add dspy

Install the dependencies (this also creates the virtual environment), then add Jupyter and ipykernel:

poetry install
poetry add jupyter ipykernel

Register the Poetry environment as a new Jupyter kernel:

poetry run python -m ipykernel install --user --name dspy-dev --display-name "Python (dspy-dev)"

Launch Jupyter Notebook from the Poetry environment:

poetry run jupyter notebook
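Once the notebook is open with the "Python (dspy-dev)" kernel selected, a quick way to confirm it is really using the Poetry environment is to check which interpreter is running and whether the project's packages resolve. A small sketch using importlib.util.find_spec; the helper name missing_packages and the package list are illustrative:

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level module names that cannot be imported in this kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Kernel interpreter:", sys.executable)
print("Missing:", missing_packages(["dspy", "ipykernel"]))
```

An empty list means the kernel sees both packages; if dspy shows up as missing, the notebook is most likely running on a different interpreter than the Poetry virtual environment.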

Introduction to PyTorch

Akriva Labs — Mon, 03 Nov 2025 14:14:52 +0000

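Early frameworks required defining the entire model structure upfront (a static computation graph) and couldn't use normal Python control flow inside the model. PyTorch addressed this by building the graph on the fly as the code runs, so models are written as ordinary Python.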
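Because PyTorch records operations as they execute, forward() can use ordinary Python if statements and loops whose behavior depends on the data. A small illustrative module (the class name DynamicDepth and the repeat-count rule are made up for this example):

```python
import torch
from torch import nn


class DynamicDepth(nn.Module):
    """Applies the same linear layer a data-dependent number of times."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Plain Python control flow: the number of steps depends on the input values
        steps = 1 if x.sum() > 0 else 3
        for _ in range(steps):
            x = torch.relu(self.layer(x))
        return x


model = DynamicDepth()
out = model(torch.randn(2, 4))
out.sum().backward()  # gradients flow through whichever path actually ran
print(out.shape, model.layer.weight.grad.shape)
```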

A single neuron is a linear equation with a weight (W) and a bias (b): y = Wx + b.

Even with multiple inputs, a single neuron is still a linear equation, with one weight per input and a single bias: y = w1x1 + w2x2 + ... + wnxn + b.
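A multi-input neuron, y = w1x1 + ... + wnxn + b, is exactly what nn.Linear with a single output computes (nn.Linear stores its weights with shape out_features x in_features and computes x W^T + b). A sketch with three inputs and hand-picked, arbitrary weights, computing the neuron by hand and with nn.Linear:

```python
import torch
from torch import nn

x = torch.tensor([[1.0, 2.0, 3.0]])   # one sample with 3 input features
w = torch.tensor([[0.5, -1.0, 2.0]])  # one weight per input (out_features x in_features)
b = torch.tensor([0.25])              # a single bias

# Manual neuron: weighted sum of the inputs plus the bias
manual = x @ w.T + b  # 0.5*1 - 1.0*2 + 2.0*3 + 0.25 = 4.75

# The same neuron as an nn.Linear layer with the weights copied in
neuron = nn.Linear(in_features=3, out_features=1)
with torch.no_grad():
    neuron.weight.copy_(w)
    neuron.bias.copy_(b)

print(manual.item(), neuron(x).item())  # 4.75 4.75
```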

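Higher-level API than low-level TensorFlow and JAX code.
Includes layers and optimizers, similar to the Keras APIs.
PyTorch tensors are mutable: you can assign values to individual elements (TensorFlow tensors and JAX arrays are immutable).
A torch.nn.Parameter can only be created from a torch.Tensor value; NumPy arrays are not accepted, so convert them to tensors first.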
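The last two points in code: a tensor can be modified in place, and a NumPy array is first converted to a tensor (here with torch.from_numpy) before being wrapped in a Parameter. The array values are arbitrary:

```python
import numpy as np
import torch

# Tensors are mutable: assign directly to an element
t = torch.zeros(2, 2)
t[0, 1] = 5.0

# Parameters wrap tensors, so convert NumPy data to a tensor first
array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
param = torch.nn.Parameter(torch.from_numpy(array))

print(t)
print(param.requires_grad)  # True: Parameters track gradients by default
```

Note that torch.from_numpy shares memory with the array, so in-place changes to one are visible in the other; torch.tensor(array) makes a copy instead.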

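import torch

# Tensors filled with ones / zeros, shape (2, 1)
torch.ones(size=(2, 1))
torch.zeros(size=(2, 1))
# Tensor from a Python list with an explicit dtype
torch.tensor([1, 2, 3], dtype=torch.float32)

# Random tensor drawn from a normal distribution (mean 0, std 1)
torch.normal(mean=torch.zeros(size=(3, 1)),
    std=torch.ones(size=(3, 1)))

# Tensors are mutable: assign to an element in place
x = torch.zeros(size=(2, 1))
x[0, 0] = 1

# Wrap a tensor in a Parameter (trainable state, requires_grad=True)
x = torch.zeros(size=(2, 1))
p = torch.nn.parameter.Parameter(data=x)

a = torch.ones((2, 2))
b = torch.square(a)            # element-wise square
c = torch.sqrt(a)              # element-wise square root
d = b + c                      # element-wise addition
e = torch.matmul(a, b)         # matrix product
f = torch.cat((a, b), dim=0)   # concatenate along rows -> shape (4, 2)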

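def dense(inputs, W, b):
    # Fully connected layer followed by a ReLU activation
    return torch.relu(torch.matmul(inputs, W) + b)

input_var = torch.tensor(3.0, requires_grad=True)  # track gradients for this scalar
result = torch.square(input_var)
result.backward()          # compute d(result)/d(input_var)
gradient = input_var.grad  # 2 * 3.0 = tensor(6.)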
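The same backward() mechanism works for a dense layer like the one above: if W and b are created with requires_grad=True, a single backward call fills in gradients with the same shapes as the parameters. A self-contained sketch with arbitrary shapes (a batch of 4 samples, 3 inputs, 2 units):

```python
import torch


def dense(inputs, W, b):
    # Fully connected layer followed by a ReLU activation
    return torch.relu(torch.matmul(inputs, W) + b)


torch.manual_seed(0)
inputs = torch.randn(4, 3)                 # 4 samples, 3 features each
W = torch.randn(3, 2, requires_grad=True)  # weights: 3 inputs -> 2 units
b = torch.zeros(2, requires_grad=True)     # one bias per unit

outputs = dense(inputs, W, b)  # shape (4, 2)
loss = outputs.sum()           # reduce to a scalar before calling backward()
loss.backward()

print(W.grad.shape, b.grad.shape)  # torch.Size([3, 2]) torch.Size([2])
```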

The general idea is to define a subclass of torch.nn.Module that:
- holds some Parameters to store state variables; these are defined in the __init__() method.
- implements the forward-pass computation in the forward() method.

Benefits:
- Debugging is easier because PyTorch code runs eagerly and does not require compilation (compilation is optional, via torch.compile()). TensorFlow and JAX code, by contrast, is typically compiled at some point.
- Hugging Face has first-class support for PyTorch, so most models you would want to use from Hugging Face are likely available in PyTorch.

Drawback:
- PyTorch can be considerably slower than JAX; for larger models it can be 3-5 times slower than the same model implemented in JAX.

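Implementing a linear classifier in PyTorch:

import numpy as np
import torch

# Placeholder data for illustration: two 2D point clouds labeled 0 and 1
num_samples_per_class = 1000
negative_samples = np.random.multivariate_normal(
    mean=[0, 3], cov=[[1, 0.5], [0.5, 1]], size=num_samples_per_class)
positive_samples = np.random.multivariate_normal(
    mean=[3, 0], cov=[[1, 0.5], [0.5, 1]], size=num_samples_per_class)
inputs = np.vstack((negative_samples, positive_samples)).astype(np.float32)
targets = np.vstack((np.zeros((num_samples_per_class, 1), dtype=np.float32),
                     np.ones((num_samples_per_class, 1), dtype=np.float32)))
learning_rate = 0.1

input_dim = 2
output_dim = 1

class LinearModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Trainable state: weight matrix W and bias vector b
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        # Forward pass: inputs @ W + b
        return torch.matmul(inputs, self.W) + self.b

model = LinearModel()
torch_inputs = torch.tensor(inputs)
torch_targets = torch.tensor(targets)
output = model(torch_inputs)

def mean_squared_error(targets, predictions):
    return torch.mean(torch.square(targets - predictions))

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def training_step(inputs, targets):
    predictions = model(inputs)
    loss = mean_squared_error(targets, predictions)
    loss.backward()     # compute gradients of the loss w.r.t. W and b
    optimizer.step()    # update W and b
    model.zero_grad()   # reset gradients for the next step
    return loss

# Optional: compile the model for faster execution
compiled_model = torch.compile(model)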
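With these pieces in place, training is just repeated calls to training_step(). A self-contained sketch on hypothetical toy data (two unit-variance Gaussian clusters), running 40 steps with a learning rate of 0.1 and measuring accuracy by treating predictions above 0.5 as class 1:

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data: two 2D clusters with labels 0 and 1
num_per_class = 500
negatives = torch.randn(num_per_class, 2) + torch.tensor([0.0, 3.0])
positives = torch.randn(num_per_class, 2) + torch.tensor([3.0, 0.0])
inputs = torch.cat([negatives, positives], dim=0)
targets = torch.cat([torch.zeros(num_per_class, 1), torch.ones(num_per_class, 1)], dim=0)


class LinearModel(torch.nn.Module):
    def __init__(self, input_dim=2, output_dim=1):
        super().__init__()
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        return torch.matmul(inputs, self.W) + self.b


model = LinearModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def training_step(inputs, targets):
    predictions = model(inputs)
    loss = torch.mean(torch.square(targets - predictions))  # mean squared error
    loss.backward()
    optimizer.step()
    model.zero_grad()
    return loss.item()


losses = [training_step(inputs, targets) for _ in range(40)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")

# Threshold the regression output at 0.5 to get class predictions
with torch.no_grad():
    accuracy = ((model(inputs) > 0.5).float() == targets).float().mean().item()
print(f"accuracy: {accuracy:.2f}")
```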

A typical machine learning workflow:
Gather raw data: get the raw dataset for ingestion.
Data prep: clean the dataset to fix errors, missing values, etc.; transform the data into different formats; and engineer new features (like converting an address to a distance).
Modeling: build the model.
Training: train the model on the training set.
Evaluation: evaluate the model on the test set.
Deployment: put the trained model into use.

import torch                 # core functionality of PyTorch
import torch.nn as nn        # components for building neural networks
import torch.optim as optim  # tools for training those networks
distances = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
times = torch.tensor([[6.96], [12.11], [16.77], [22.21]], dtype=torch.float32)
# Define the model
model = nn.Sequential(nn.Linear(1, 1))
# Define the loss function and the optimizer
loss_function = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
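The snippet above only sets up the data, model, loss function, and optimizer. A sketch of the training loop that typically follows (zero the gradients, forward pass, loss, backward pass, optimizer step), plus a prediction for a new distance; the 1000 epochs and the 5-mile query are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

distances = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
times = torch.tensor([[6.96], [12.11], [16.77], [22.21]], dtype=torch.float32)

model = nn.Sequential(nn.Linear(1, 1))
loss_function = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    optimizer.zero_grad()                 # clear gradients from the previous step
    outputs = model(distances)            # forward pass
    loss = loss_function(outputs, times)  # mean squared error
    loss.backward()                       # compute gradients
    optimizer.step()                      # update the weight and bias

print(f"final loss: {loss.item():.4f}")

# Predict the time for a new distance without tracking gradients
with torch.no_grad():
    predicted_time = model(torch.tensor([[5.0]]))
print(f"predicted time for 5 miles: {predicted_time.item():.2f}")
```

Since the data is almost perfectly linear (roughly 5 units of time per mile), the prediction should land close to 27.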

Activation Function

Adding an activation function to the output of a neuron turns its linear equation into a non-linear function. This enables the network to learn non-linear patterns.

For example, nn.ReLU(), which computes max(0, x) element-wise.

model = nn.Sequential(nn.Linear(1, 1), nn.ReLU())
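The line above adds a ReLU layer after the linear layer.

A slightly larger model, nn.Sequential(nn.Linear(1, 5), nn.ReLU(), nn.Linear(5, 1)), works like this: the first linear layer takes 1 input and produces 5 outputs, the ReLU layer applies its transform to each of those 5 values, and the last linear layer takes the 5 values as inputs and produces the final single output.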
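A quick sketch of the 1 → 5 → 1 network, plus what ReLU does on its own (negative values become 0, positive values pass through unchanged). The input distances are arbitrary:

```python
import torch
import torch.nn as nn

# ReLU on its own: max(0, x), element-wise
relu = nn.ReLU()
print(relu(torch.tensor([-2.0, 0.0, 3.0])))  # tensor([0., 0., 3.])

# 1 input -> 5 hidden units -> ReLU -> 1 output
model = nn.Sequential(
    nn.Linear(1, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
)

distances = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
hidden = model[1](model[0](distances))  # output of the first linear layer + ReLU
print(hidden.shape)                     # torch.Size([4, 5])
print(model(distances).shape)           # torch.Size([4, 1])
```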

Code: https://github.com/akrivalabs/pytorch-course/blob/main/module-1/C1M1_Assignment.ipynb

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

import helper_utils  # course helper module (module-1/helper_utils.py); provides prediction()

# Load the dataset from the CSV file
file_path = './data_with_features.csv'
data_df = pd.read_csv(file_path)

def rush_hour_feature(hours_tensor, weekends_tensor):
    """
    Engineers a new binary feature indicating if a delivery is in a weekday rush hour.

    Args:
        hours_tensor (torch.Tensor): A tensor of delivery times of day.
        weekends_tensor (torch.Tensor): A tensor indicating if a delivery is on a weekend.

    Returns:
        torch.Tensor: A tensor of 0s and 1s indicating weekday rush hour.
    """

    ### START CODE HERE ###

    # Define rush hour and weekday conditions
    is_morning_rush = (hours_tensor >= 8.0) & (hours_tensor < 10.0)
    is_evening_rush = (hours_tensor >= 16.0) & (hours_tensor < 19.0)
    is_weekday = weekends_tensor == 0

    # Combine the conditions to create the final rush hour mask
    is_rush_hour_mask = is_weekday & (is_morning_rush | is_evening_rush)

    ### END CODE HERE ###

    # Convert the boolean mask to a float tensor to use as a numerical feature
    return is_rush_hour_mask.float()
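To see what the mask logic in rush_hour_feature() produces, here is the same combination of comparisons applied to a handful of hand-picked hours (the sample values are made up; only the last entry falls on a weekend):

```python
import torch

hours = torch.tensor([7.5, 8.0, 9.99, 12.0, 17.0, 17.0])
weekends = torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

# Same conditions as rush_hour_feature(): 8-10 AM or 4-7 PM, on a weekday
is_morning_rush = (hours >= 8.0) & (hours < 10.0)
is_evening_rush = (hours >= 16.0) & (hours < 19.0)
is_weekday = weekends == 0

rush_hour = (is_weekday & (is_morning_rush | is_evening_rush)).float()
print(rush_hour)  # tensor([0., 1., 1., 0., 1., 0.])
```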


def prepare_data(df):
    """
    Converts a pandas DataFrame into prepared PyTorch tensors for modeling.

    Args:
        df (pd.DataFrame): A pandas DataFrame containing the raw delivery data.

    Returns:
        prepared_features (torch.Tensor): The final 2D feature tensor for the model.
        prepared_targets (torch.Tensor): The final 2D target tensor.
        results_dict (dict): A dictionary of intermediate tensors for testing purposes.
    """

    # Extract the data from the DataFrame as a NumPy array
    # (There's no direct torch.from_dataframe(), so we use .values to get a NumPy array first)
    all_values = df.values

    ### START CODE HERE ###

    # Convert all the values from the DataFrame into a single PyTorch tensor
    full_tensor = torch.tensor(all_values, dtype=torch.float32)

    # Use tensor slicing to separate out each raw column
    raw_distances = full_tensor[:, 0]
    raw_hours = full_tensor[:, 1]
    raw_weekends = full_tensor[:, 2]
    raw_targets = full_tensor[:, 3]

    # Call your rush_hour_feature() function to engineer the new feature
    is_rush_hour_feature = rush_hour_feature(raw_hours, raw_weekends)

    # Use the .unsqueeze(1) method to reshape the four 1D feature tensors into 2D column vectors
    distances_col = raw_distances.unsqueeze(1)
    hours_col = raw_hours.unsqueeze(1)
    weekends_col = raw_weekends.unsqueeze(1)
    rush_hour_col = is_rush_hour_feature.unsqueeze(1)

    ### END CODE HERE ###

    # Normalize the continuous feature columns (distance and time)
    dist_mean, dist_std = distances_col.mean(), distances_col.std()
    hours_mean, hours_std = hours_col.mean(), hours_col.std()

    distances_norm = (distances_col - dist_mean) / dist_std
    hours_norm = (hours_col - hours_mean) / hours_std

    # Combine all prepared 2D features into a single tensor
    prepared_features = torch.cat([
        distances_norm,
        hours_norm,
        weekends_col,
        rush_hour_col
    ], dim=1) # dim=1 concatenates them column-wise, stacking features side by side

    # Prepare targets by ensuring they are the correct shape
    prepared_targets = raw_targets.unsqueeze(1)

    # Dictionary for Testing Purposes
    results_dict = {
        'full_tensor': full_tensor,
        'raw_distances': raw_distances,
        'raw_hours': raw_hours,
        'raw_weekends': raw_weekends,
        'raw_targets': raw_targets,
        'distances_col': distances_col,
        'hours_col': hours_col,
        'weekends_col': weekends_col,
        'rush_hour_col': rush_hour_col
    }


    return prepared_features, prepared_targets, results_dict

# Process the entire DataFrame to get the final feature and target tensors.
features, targets, _ = prepare_data(data_df)
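The unsqueeze(1) and z-score normalization steps in prepare_data() are easy to check on a tiny made-up column: unsqueeze(1) turns shape (n,) into (n, 1), and after normalization the column has mean 0 and standard deviation 1 (torch.std uses the unbiased estimator by default, and the check below uses the same default):

```python
import torch

raw_distances = torch.tensor([1.0, 2.5, 4.0, 7.5])  # hypothetical 1D column
distances_col = raw_distances.unsqueeze(1)           # shape (4,) -> (4, 1)

# z-score normalization, as in prepare_data()
mean, std = distances_col.mean(), distances_col.std()
distances_norm = (distances_col - mean) / std

print(distances_col.shape)                                        # torch.Size([4, 1])
print(distances_norm.mean().item(), distances_norm.std().item())  # approximately 0.0 and 1.0
```

One consequence worth keeping in mind: any new input at prediction time must be normalized with the same training-set mean and standard deviation, not with statistics computed from the new input itself.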

def init_model():
    """
    Initializes the neural network model, optimizer, and loss function.

    Returns:
        model (nn.Sequential): The initialized PyTorch sequential model.
        optimizer (torch.optim.Optimizer): The initialized optimizer for training.
        loss_function: The initialized loss function.
    """

    # Set the random seed for reproducibility of results (DON'T MANIPULATE IT)
    torch.manual_seed(41)

    ### START CODE HERE ###

    # Define the model architecture using nn.Sequential
    model = nn.Sequential(
        # Input layer (Linear): 4 input features, 64 output features
        nn.Linear(4, 64),
        # First ReLU activation function
        nn.ReLU(),
        # Hidden layer (Linear): 64 inputs, 32 outputs
        nn.Linear(64, 32),
        # Second ReLU activation function
        nn.ReLU(),
        # Output layer (Linear): 32 inputs, 1 output (the prediction)
        nn.Linear(32, 1)
    ) 

    # Define the optimizer (Stochastic Gradient Descent)
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    # Define the loss function (Mean Squared Error for regression)
    loss_function = nn.MSELoss()

    ### END CODE HERE ###

    return model, optimizer, loss_function

model, optimizer, loss_function = init_model()
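A quick sanity check on the architecture from init_model(): each nn.Linear(in, out) holds in × out weights plus out biases, so the network has 4·64 + 64 + 64·32 + 32 + 32·1 + 1 = 2,433 trainable parameters. The same count, computed on an identically shaped model:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Sum the number of elements in every trainable parameter tensor
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_params)  # 2433
```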

def train_model(features, targets, epochs, verbose=True):
    """
    Trains the model using the provided data for a number of epochs.

    Args:
        features (torch.Tensor): The input features for training.
        targets (torch.Tensor): The target values for training.
        epochs (int): The number of training epochs.
        verbose (bool): If True, prints training progress. Defaults to True.

    Returns:
        model (nn.Sequential): The trained model.
        losses (list): A list of loss values recorded every 5000 epochs.
    """

    # Initialize a list to store the loss
    losses = []

    ### START CODE HERE ###

    # Initialize the model, optimizer, and loss function using `init_model`
    model, optimizer, loss_function = init_model()

    # Loop through the specified number of epochs
    for epoch in range(epochs):

        # Forward pass: Make predictions
        outputs = model(features)

        # Calculate the loss
        loss = loss_function(outputs, targets)

        # Zero the gradients
        optimizer.zero_grad()

        # Backward pass: Compute gradients
        loss.backward()

        # Update the model's parameters
        optimizer.step()

    ### END CODE HERE ### 

        # Every 5000 epochs, record the loss and print the progress
        if (epoch + 1) % 5000 == 0:
            losses.append(loss.item())
            if verbose:
                print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

    return model, losses

# Training loop
model, losses = train_model(features, targets, 30000)
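The optimizer.zero_grad() call inside the training loop matters because PyTorch accumulates gradients: each backward() adds to .grad instead of overwriting it. A minimal demonstration with y = x² at x = 3, where the gradient is 6; the manual reset with zero_() is similar in effect to optimizer.zero_grad(), which in recent versions sets gradients to None by default:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
grads = []

(x ** 2).backward()  # d(x^2)/dx = 2x = 6
grads.append(x.grad.item())

(x ** 2).backward()  # no reset in between: the new 6 is added to the stored 6
grads.append(x.grad.item())

x.grad.zero_()       # reset by hand before the next backward pass
(x ** 2).backward()
grads.append(x.grad.item())

print(grads)  # [6.0, 12.0, 6.0]
```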

# Disable gradient calculation for efficient predictions
with torch.no_grad():
    # Perform a forward pass to get model predictions
    predicted_outputs = model(features)
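Inside torch.no_grad(), PyTorch does not record operations for autograd, so outputs come back with requires_grad=False and no computation graph is kept in memory. A small comparison using a throwaway linear layer:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
x = torch.randn(2, 3)

with_grad = layer(x)
with torch.no_grad():
    without_grad = layer(x)

print(with_grad.requires_grad, without_grad.requires_grad)  # True False
```

The values are identical either way; only the gradient bookkeeping differs.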

# Change the values below to get an estimate for a different delivery
# Set distance for the delivery in miles
distance_miles = 10 
# Set time of day in 24-hour format (e.g., 9.5 for 9:30 AM)
time_of_day_hours = 15
# Use True/False or 1/0 to indicate if it's a weekend
is_weekend = 1

# Convert the raw inputs into a 2D tensor for the model
raw_input_tensor = torch.tensor([[distance_miles, time_of_day_hours, is_weekend]], dtype=torch.float32)
helper_utils.prediction(model, data_df, raw_input_tensor, rush_hour_feature)

Output:

+------------------------------------------+-----------------------+
|                         Model Prediction                         |
+------------------------------------------+-----------------------+
| Time of the Week                         | Weekend               |
| Distance                                 | 10.0 miles            |
| Time                                     | 15:00                 |
| Is this considered a rush hour period?   | No                    |
+------------------------------------------+-----------------------+
| Estimated Delivery Time                  | 27.50 minutes         |
+------------------------------------------+-----------------------+

Note: the prediction() helper function is defined in https://github.com/akrivalabs/pytorch-course/blob/main/module-1/helper_utils.py#L517.