DEV Community: ThatMLGuy

Intro to Pytest

ThatMLGuy — Sun, 23 Nov 2025 12:40:09 +0000

The pytest framework makes it easy to write small, readable tests, and can scale to support complex functional testing for applications and libraries.

To install pytest, run:

pip install pytest

In this post, we’ll create a simple division function and then write tests that validate its behavior.

Step 1: Create Your Code File

Let’s create a file called methods.py with a simple function that divides two integers and returns a floating-point number:

# methods.py
def division(a: int, b: int) -> float:
    return a / b

Step 2: Create Your Test File

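Next, create a file named tests.py, which will contain all the tests. Because this name doesn't match pytest's default test_*.py or *_test.py discovery patterns, we'll pass the file to pytest explicitly in Step 3.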

We’ll use @pytest.mark.parametrize to run the test function multiple times with different inputs.

# tests.py
import pytest
from methods import division

@pytest.mark.parametrize(
    "a,b,expected",
    [
        (10, 20, 0.5),                  
        (20, 0, ZeroDivisionError),
        ("10", "hello", TypeError),
    ],
)
def test_division(a, b, expected):
    # If expected is an exception type, assert that the error is raised
    if isinstance(expected, type) and issubclass(expected, Exception):
        with pytest.raises(expected):
            division(a, b)
    else:
        # Otherwise, assert the result matches
        assert division(a, b) == expected

Our Test Cases:

a	b	expected result
10	20	0.5
20	0	ZeroDivisionError raised
"10"	"hello"	TypeError raised

Step 3: Run Your Tests

From your terminal, run:

pytest tests.py

Pytest will automatically:

execute all the test functions in the file
display which tests passed or failed
show helpful tracebacks when something goes wrong

Output:

collected 3 items                                                                                          

tests.py ...                                                                                         [100%]

============================================ 3 passed in 0.02s =============================================

All three test cases pass, including the two that expect an exception, because pytest.raises treats the anticipated error as the success condition.
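As an alternative sketch, the same three cases can be split into separate tests so that each one checks a single behavior. The test names here are illustrative, and division is repeated only to keep the snippet self-contained. pytest.approx compares floats within a tolerance, which tends to be safer than == when the quotient is not exactly representable.

```python
import pytest


def division(a: int, b: int) -> float:
    # Same function as in methods.py, repeated so the snippet runs on its own
    return a / b


def test_division_returns_quotient():
    # pytest.approx compares floats within a small relative tolerance
    assert division(1, 3) == pytest.approx(0.3333333, rel=1e-6)


def test_division_by_zero_raises():
    # match checks the error message against a regular expression
    with pytest.raises(ZeroDivisionError, match="division by zero"):
        division(20, 0)


def test_division_rejects_strings():
    with pytest.raises(TypeError):
        division("10", "hello")
```

With separate tests, a failure report points directly at the behavior that broke instead of at one row of a parameter list.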

I know this is a short post, but I felt it was cool and just wanted to share it with you all. If you know any other tools or frameworks worth checking out, feel free to share them in the comments.

If you want to explore more, check out the official pytest documentation.

Introduction to PyTorch

ThatMLGuy — Fri, 17 Oct 2025 11:41:02 +0000

What is PyTorch?

PyTorch is an open-source machine learning framework that lets you build models for applications such as computer vision and natural language processing.

How to install PyTorch

PyTorch can be installed with pip using the command pip install torch. If you would like PyTorch to use your GPU, follow the installation guide on the PyTorch website and pick the build that matches your CUDA setup. To check your setup, open a terminal and run nvidia-smi. The "CUDA Version" field in its header shows the highest CUDA version your installed driver supports, so choose that version or a lower one in the installation guide.

Tensors

The fundamental data type in PyTorch is the tensor, which is similar to a NumPy array. A list of values can be converted into a tensor using torch.tensor(list_of_values), as shown below:

import torch

list1 = [1,2,3]
list2 = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]


tensor1  = torch.tensor(list1)
print(tensor1)
"""
Output:

tensor([1, 2, 3])
"""
tensor2 = torch.tensor(list2)
print(tensor2)
"""
Output:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
"""

You can perform mathematical operations on tensors just as you would with NumPy arrays. For example:

tensor1 * tensor2

"""
tensor([[ 1,  4,  9],
        [ 4, 10, 18],
        [ 7, 16, 27]])
"""

tensor1 + tensor2

"""
tensor([[ 2,  4,  6],
        [ 5,  7,  9],
        [ 8, 10, 12]])
"""

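The results above rely on broadcasting: tensor1 has shape (3,) and tensor2 has shape (3, 3), so tensor1 is applied to every row of tensor2. The sketch below makes the shapes explicit and contrasts the element-wise product with a matrix-vector product. The variable names product and matvec are only illustrative.

```python
import torch

tensor1 = torch.tensor([1, 2, 3])            # shape (3,)
tensor2 = torch.tensor([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])          # shape (3, 3)

# tensor1 is broadcast across every row of tensor2
product = tensor1 * tensor2
print(tensor1.shape, tensor2.shape, product.shape)

# The * operator is element-wise; use @ for a matrix-vector product
matvec = tensor2 @ tensor1
print(matvec)  # tensor([14, 32, 50])
```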
Dataset and DataLoader

A Dataset defines how your data is accessed through two methods, __len__ and __getitem__. Because samples are fetched one index at a time, a Dataset can read each sample on demand instead of holding the entire dataset in memory.

DataLoader groups single samples into batches for efficient computation on the GPU. It also supports parallel data loading, shuffling, and sampling.

To use Dataset and DataLoader, we import these classes from torch.utils.data. Here is an example of creating a custom Dataset and wrapping it in a DataLoader.

import torch
from torch.utils.data import Dataset, DataLoader

class DATA(Dataset):
    def __init__(self,x,y):
        self.x = torch.tensor(x)
        self.y = torch.tensor(y)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]


x = [[1,2,3],
     [4,5,6],
     [7,8,9]]

Y = [1,2,3]


dataset = DATA(x,Y)

loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch_inputs, batch_targets in loader:
    print(f"Inputs: {batch_inputs}, Targets: {batch_targets}")

"""
Output

Inputs: tensor([[7, 8, 9],
        [1, 2, 3]]), Targets: tensor([3, 1])
Inputs: tensor([[4, 5, 6]]), Targets: tensor([2])
"""

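As the output above suggests, the last batch is smaller when the dataset size is not a multiple of batch_size. A minimal sketch of that behavior follows. It uses the built-in TensorDataset instead of a custom class purely for brevity, and the toy tensors are made up.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.arange(15).reshape(5, 3)   # 5 samples, 3 features each
targets = torch.arange(5)

dataset = TensorDataset(features, targets)

loader = DataLoader(dataset, batch_size=2, shuffle=False)
batch_sizes = [len(batch_targets) for _, batch_targets in loader]
print(batch_sizes)  # [2, 2, 1]

# drop_last=True discards the final incomplete batch
loader_drop = DataLoader(dataset, batch_size=2, shuffle=False, drop_last=True)
print(len(loader), len(loader_drop))  # 3 2
```

Without drop_last, the number of batches is the dataset size divided by the batch size, rounded up.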
Neural Networks

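Neural networks in PyTorch are built using the torch.nn module. Each model inherits from nn.Module and defines two main parts:

Layers: defined in __init__()
Forward pass: defined in forward()

The backward pass is not something you have to define on the module: autograd computes the gradients when you call loss.backward(). The backward() method in the class below is a custom convenience helper that bundles the zero_grad(), loss.backward(), and optimizer.step() calls.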


class ANN(nn.Module):
    def __init__(self):
        super().__init__()  # Uses the base Neural Network constructor from nn.Module
        self.layer1 = nn.Linear(3, 5)  # Creates the first layer: input of size 3, output of size 5
        self.layer2 = nn.Linear(5, 1)  # Creates the second layer: input of size 5, output of size 1
        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.01)  # Defines the optimizer (Stochastic Gradient Descent) with learning rate 0.01
        self.loss_fn = nn.BCELoss()  # Defines the loss function (Binary Cross Entropy Loss) for binary classification

    def forward(self, x):
        # Defines how data flows through the network:
        x = self.layer1(x)  # Pass input through first layer (3 → 5)
        x = torch.relu(x)   # Apply ReLU activation to introduce non-linearity
        x = self.layer2(x)  # Pass through second layer (5 → 1)
        x = torch.sigmoid(x)  # Apply Sigmoid activation to output probabilities between 0 and 1
        return x  # Return final prediction

    def backward(self, loss):
        # Defines the backward propagation step:
        self.optimizer.zero_grad()  # Clears previously stored gradients
        loss.backward()  # Computes gradients using backpropagation
        self.optimizer.step()  # Updates model parameters (weights and biases)
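A quick way to sanity-check the layer sizes is to push a random batch through the same stack of layers and inspect the output. The sketch below uses nn.Sequential for brevity, which is an assumption made for this example and not how the class above is written.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as the ANN class: 3 -> 5 -> 1
model = nn.Sequential(
    nn.Linear(3, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
    nn.Sigmoid(),
)

batch = torch.randn(4, 3)          # 4 samples, 3 features each
with torch.no_grad():              # no gradients needed for a shape check
    output = model(batch)

print(output.shape)                # torch.Size([4, 1])
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                    # 3*5 + 5 + 5*1 + 1 = 26
```

Each row of the output is one probability, because the final sigmoid squashes the single output unit into the range between 0 and 1.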

Code implementation


import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class DATA(Dataset):
    def __init__(self,x,y):
        # as_tensor accepts lists or existing tensors without a copy warning
        self.x = torch.as_tensor(x)
        self.y = torch.as_tensor(y)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

# Define your ANN class
class ANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(3, 5)
        self.layer2 = nn.Linear(5, 1)
        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
        self.loss_fn = nn.BCELoss()

    def forward(self, x):
        x = self.layer1(x)
        x = torch.relu(x)
        x = self.layer2(x)
        x = torch.sigmoid(x)
        return x

    def backward(self, loss):
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()



torch.manual_seed(42)

X = torch.randn(100, 3)

y = (X.sum(dim=1) > 0).float().unsqueeze(1)

dataset = DATA(X,y)

loader = DataLoader(dataset, batch_size=2, shuffle=True)

model = ANN()
epochs = 50

for epoch in range(epochs):
    total_loss = 0
    correct = 0
    total = 0

    for batch_X, batch_y in loader:
        y_pred = model(batch_X)
        loss = model.loss_fn(y_pred, batch_y)
        model.backward(loss)
        total_loss += loss.item()

        predicted = (y_pred >= 0.5).float()
        correct += (predicted == batch_y).sum().item()
        total += batch_y.size(0)

    accuracy = correct / total

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {total_loss/len(loader):.4f}, Accuracy: {accuracy:.4f}")

ML Learning #3: KNN - You are who you surround yourself with

ThatMLGuy — Tue, 14 Oct 2025 14:46:34 +0000

What is KNN?

K-Nearest Neighbors (KNN) is a non-parametric classification algorithm. Instead of learning explicit model parameters, it classifies a new sample by looking at the majority label among its K closest points in the training dataset, based on a chosen distance metric.

How do you determine the value of K?

Choosing a value of K can be tricky. If K is too small, the model overfits the training data; if it is too large, the model underfits. One common technique is the Elbow method: plot a performance metric (such as validation accuracy or error rate) against a range of K values, then choose the K with the best score, or the point where increasing K further stops improving it.
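A minimal sketch of such a sweep, assuming a simple hold-out validation split, is shown here. The toy clusters, the knn_predict helper, and the candidate K values are all made up for illustration.

```python
import numpy as np
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[nearest].tolist()).most_common(1)[0][0]


rng = np.random.default_rng(0)
# Two toy clusters (made-up data for illustration)
class0 = rng.normal(loc=0.0, scale=1.0, size=(40, 2))
class1 = rng.normal(loc=3.0, scale=1.0, size=(40, 2))
x = np.vstack([class0, class1])
y = np.array([0] * 40 + [1] * 40)

# Shuffle, then hold out the last 20 points for validation
order = rng.permutation(len(x))
x, y = x[order], y[order]
train_x, train_y = x[:60], y[:60]
val_x, val_y = x[60:], y[60:]

accuracy_by_k = {}
for k in (1, 3, 5, 7, 9, 15):
    predictions = np.array([knn_predict(train_x, train_y, q, k) for q in val_x])
    accuracy_by_k[k] = float((predictions == val_y).mean())

best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(accuracy_by_k, best_k)
```

Odd values of K are used here so that a two-class vote cannot end in a tie.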

How does it work?

1.1 Loading Your Data

KNN has no real training step: it simply stores the entire training dataset. This makes it computationally expensive at prediction time, because every prediction has to process all of the stored data. The larger your dataset, the more time and memory it requires.

1.2 Finding the Distance from the Data Point to Each Training Point

In this step, we calculate the distance between the new data point and each existing data point in the dataset. There are multiple ways to measure the distance between two vectors, including:

Euclidean Distance
Manhattan Distance
Cosine Similarity

The most commonly used distance is the Euclidean distance, given by the formula

$$\text{EuclideanDistance}(A, B) = \sqrt{(A_{x} - B_{x})^{2} + (A_{y} - B_{y})^{2}}$$

Where $A$ and $B$ are two data points and $x$ and $y$ are the x and y coordinates of the respective data points.

Or, in a more generalized form for $n$ dimensions:

$$\text{EuclideanDistance}(A, B) = \sqrt{\sum_{i=1}^{n} (A_{i} - B_{i})^{2}}$$

Note: Depending on your problem, you can also use other distance metrics such as Manhattan Distance or Cosine Similarity.
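To make the three options concrete, here is a small sketch computing each one with NumPy for a pair of made-up points. The function names are illustrative. Note that cosine similarity is a similarity and not a distance, so a larger value means the vectors are closer in direction.

```python
import numpy as np


def euclidean_distance(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan_distance(a, b):
    return float(np.sum(np.abs(a - b)))


def cosine_similarity(a, b):
    # 1.0 means same direction; larger means more similar
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

print(euclidean_distance(a, b))   # 5.0
print(manhattan_distance(a, b))   # 7.0
print(cosine_similarity(a, b))
```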

1.3 Finding the K Nearest Neighbours

We sort the distances in increasing order and we take the first K data points, as they are the K nearest data points.

1.4 Finding the most frequent class

Finally, we look at the class labels of these K nearest neighbors, count how many times each class appears, and assign the most frequent one to the new data point.

Code Implementation


import numpy as np
from collections import Counter

class KNN:
    def __init__(self, K):
        self.k = K
        self.x = self.y = None

    def fit(self, x, y):
        self.x = np.array(x)
        self.y = np.array(y)

    def predict(self, x):
        x = np.array(x)
        self.distances = []
        len_x = len(self.x)
        for i in range(len_x):
            distance = np.sqrt(np.sum((self.x[i] - x) ** 2))
            self.distances.append([distance, self.y[i]])

        self.distances.sort(key=lambda d: d[0])
        nearest_k = self.distances[:self.k]
        labels = [label for _, label in nearest_k]
        label_counts = Counter(labels)

        return label_counts.most_common(1)[0][0]

ML Learning #2: Logistic Regression

ThatMLGuy — Mon, 13 Oct 2025 15:34:36 +0000

What is Logistic Regression

Logistic Regression is a regression model used for classification. "How?" you may ask. Logistic regression is built on the sigmoid (logistic) function, which takes any value from $-\infty$ to $+\infty$ and maps it to a value between 0 and 1. So if you have two classes, a positive class and a negative class, the model predicts the probability that your data belongs to the positive class (i.e., 1).

How does it work

1.1 Sigmoid Function

As mentioned, the model is based on the sigmoid function. This function can be represented by the following formula:

$$g(X) = \frac{1}{1 + e^{-θ^{T} X}}$$

Where

$$θ^{T} X \in (-\infty, +\infty)$$

and

$$g(X) \in (0, 1)$$
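A few sample values help build intuition for this mapping. The sketch below evaluates the sigmoid on raw scores z, which stand in for $θ^{T} X$; the chosen inputs are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 5))
# sigmoid(0) is exactly 0.5; large |z| pushes the output toward 0 or 1
```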

After the model predicts the probability, we need to decide whether the data belongs to the positive class. To do that, we compare the output $g(X)$ with a threshold value: if $g(X)$ is greater than or equal to the threshold, the prediction is 1; otherwise it is 0.

Note:

In general, the threshold is set to 0.5, meaning that if the model predicts a probability greater than or equal to 0.5, the output is classified as 1 (positive class), otherwise 0 (negative class).

However, the optimal threshold depends on the problem you’re trying to solve. For instance, in a spam detection system, it’s often better to let a few spam emails slip through than to incorrectly mark an important email as spam. In such cases, you might set a higher threshold (e.g., 0.8) to be more confident before labeling an email as spam.

This trade-off between precision and recall is a key aspect of tuning classification models.
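The effect of the threshold can be seen on a handful of made-up spam scores. In this sketch, the probabilities, labels, and the precision_recall helper are all hypothetical.

```python
import numpy as np

# Made-up predicted probabilities and true labels (1 = spam)
probabilities = np.array([0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10])
labels = np.array([1, 1, 0, 1, 0, 1, 0])


def precision_recall(threshold):
    predicted = (probabilities >= threshold).astype(int)
    tp = int(np.sum((predicted == 1) & (labels == 1)))
    fp = int(np.sum((predicted == 1) & (labels == 0)))
    fn = int(np.sum((predicted == 0) & (labels == 1)))
    return tp / (tp + fp), tp / (tp + fn)


print(precision_recall(0.5))  # (0.75, 0.75)
print(precision_recall(0.8))  # (1.0, 0.5)
```

Raising the threshold from 0.5 to 0.8 removes the one false positive (higher precision) but misses more real spam (lower recall).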

1.2 Loss Function

As mentioned in the first article, every model uses a loss function to measure how well it is performing; the loss penalizes incorrect predictions more heavily. For classification, two loss functions are commonly used: binary cross-entropy (two-class classification) and categorical cross-entropy (multi-class classification). We'll use binary cross-entropy because our model solves a two-class problem. It is given by the formula

$$\text{BinaryCrossEntropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \log(\hat{y}_{i}) + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right]$$

Here $y_{i}$ is the actual label and $\hat{y}_{i}$ is the predicted probability for sample $i$. If you look at the formula, you will notice that the binary cross-entropy (BCE for short) is high when the predicted probability is far from the true label and low when it is close to it.
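A small numeric check of that behavior is sketched here with made-up labels and probabilities. The function name binary_cross_entropy is illustrative.

```python
import math


def binary_cross_entropy(y_true, y_prob):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0]
good = binary_cross_entropy(y_true, [0.9, 0.1])   # confident and correct
bad = binary_cross_entropy(y_true, [0.1, 0.9])    # confident and wrong
print(round(good, 4), round(bad, 4))  # 0.1054 2.3026
```

The confidently wrong predictions are penalized roughly twenty times more than the confidently correct ones.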

1.3 Gradient Descent

The weights are updated one training sample at a time using the gradient of the binary cross-entropy loss:

$$θ = θ - \text{LearningRate} \cdot (g(X_{i}) - y_{i}) X_{i}$$

Where $y_{i}$ is the actual class label, $g(X_{i})$ is the sigmoid output for sample $X_{i}$, and $θ$ holds the weights for the respective features.

This update is repeated for every sample in every epoch, and the aim is to adjust the weights ($θ$) so that the binary cross-entropy is minimized.
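To see the update rule in action, here is a single step worked through for one made-up sample, starting from zero weights. The learning rate of 0.1 is chosen only to keep the arithmetic easy to follow.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


theta = np.zeros(2)            # initial weights
x_i = np.array([1.0, 2.0])     # one training sample (made-up values)
y_i = 1.0                      # its true label
learning_rate = 0.1

prediction = sigmoid(np.dot(theta, x_i))     # sigmoid(0) = 0.5
gradient = (prediction - y_i) * x_i          # (0.5 - 1) * [1, 2] = [-0.5, -1.0]
theta = theta - learning_rate * gradient     # [0.05, 0.1]
print(theta)

# The updated weights give a higher probability for the true class
print(sigmoid(np.dot(theta, x_i)) > prediction)  # True
```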

Implementation

import numpy as np

class LogisticRegression:
    def __init__(self):
        self.theta = None
        self.LR = 0.001

    def sigmoid(self, z_input):
        z_input = np.clip(z_input, -300, 300)
        return 1 / (1 + np.exp(-z_input))

    def fit(self, x, y, epochs):
        x = np.array(x)
        y = np.array(y)
        self.theta = np.zeros(x.shape[1])

        for epoch in range(epochs):
            # Stochastic gradient descent: update theta after every sample
            for i in range(len(x)):
                z = self.sigmoid(np.dot(self.theta, x[i]))
                gradient = (z - y[i]) * x[i]
                self.theta -= self.LR * gradient

    def predict(self, x):
        x = np.array(x, dtype=np.float64)
        probs = self.sigmoid(np.dot(x, self.theta))
        return 1 if probs >= 0.5 else 0

ML Learning #1 : Linear Regression

ThatMLGuy — Sat, 11 Oct 2025 18:47:09 +0000

What is Linear Regression

Linear Regression is a fundamental statistical machine learning algorithm that models the linear relationship between a dependent variable ( $y$ ) and one or more independent variables ( $x$ ). The goal is to fit a straight line (or a hyperplane in multiple dimensions) that minimizes the overall prediction error on the training data.

Note 1.1: Linearly related features exhibit a correlation where a change in one variable results in a proportional change in the other (e.g., as $X$ increases, $Y$ also tends to increase, or vice versa). Linear Regression works best when this relationship is approximately linear.

Note 1.2: Regression is the process of predicting a continuous or real value (e.g., 265.34, 10.231).

How does it work

Linear Regression models the relationship by defining a linear function, often called the hypothesis $\hat{y}$, which calculates the predicted value.
For Multiple Linear Regression (more than one feature), this line is represented as:

$$\hat{y} = θ_{0} + θ_{1} x_{1} + θ_{2} x_{2} + \dots + θ_{n} x_{n}$$

$\hat{y}$ is the predicted value (the model’s output).
$θ_{0}$ is the y-intercept (the bias term).
$θ_{i}$ are the coefficients or weights for each feature $x_{i}$.

The model’s task is to find the optimal set of weights ( $θ$ ) that best fit the data.
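As a concrete instance of the hypothesis, the sketch below evaluates it for a sample with two features. The weights and inputs are arbitrary numbers chosen for easy arithmetic.

```python
import numpy as np

theta_0 = 1.0                       # bias (intercept)
theta = np.array([2.0, 3.0])        # one weight per feature
x = np.array([4.0, 5.0])            # a single sample with two features

# y_hat = theta_0 + theta_1 * x_1 + theta_2 * x_2
y_hat = theta_0 + np.dot(theta, x)
print(y_hat)  # 24.0

# For many samples at once, a matrix-vector product does the same thing
X = np.array([[4.0, 5.0],
              [0.0, 1.0]])
print((theta_0 + X @ theta).tolist())  # [24.0, 4.0]
```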

How to Measure the Model Performance?

The performance of a regression model is measured using a cost function (or loss function), which quantifies the “error” or “cost” for the model’s predictions. The most commonly used for Linear Regression is the Mean Squared Error (MSE).

Mean Squared Error (MSE) is calculated by averaging the squared differences between the predicted values and the actual values:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}$$

Where $\hat{y}_{i}$ is the predicted value, $y_{i}$ is the actual value, and $n$ is the number of samples.

MSE is popular because the squaring operation penalizes larger errors more heavily. The trade-off is that this also makes the model sensitive to outliers.
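A quick worked example of the effect of squaring, using made-up values and an illustrative mean_squared_error helper:

```python
def mean_squared_error(y_true, y_pred):
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


y_true = [1.0, 4.0, 8.0]

# Errors of 1, 0 and -2 give (1 + 0 + 4) / 3
small_errors = mean_squared_error(y_true, [2.0, 4.0, 6.0])
print(small_errors)

# One prediction that is off by 10 dominates the average: 100 / 3
one_outlier = mean_squared_error(y_true, [1.0, 4.0, 18.0])
print(one_outlier)
```

Two of the three predictions in the second call are perfect, yet its MSE is twenty times larger because of the single large miss.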

How does the Model Learn?

The model learns by iteratively adjusting its weights ( $θ$ ) to minimize the cost function using the Gradient Descent optimization algorithm.

Gradient Descent works by calculating the gradient (the slope) of the cost function with respect to each weight. This gradient indicates the direction of the steepest increase in error. The weights are then updated by moving in the opposite direction of the gradient. The weight update rules for Linear Regression using MSE are:

For each feature weight ( $θ_{i}$ , one for each feature $x_{i}$ ):

$$θ_{i} = θ_{i} - α \cdot \frac{2}{n} \sum_{j=1}^{n} (\hat{y}_{j} - y_{j}) x_{j,i}$$

For the intercept ( $θ_{0}$ ):

$$θ_{0} = θ_{0} - α \cdot \frac{2}{n} \sum_{j=1}^{n} (\hat{y}_{j} - y_{j})$$

$α$ (alpha) is the learning rate, a hyperparameter that controls the step size during each iteration.
$j = 1, \dots, n$ indexes the training samples, so $n$ here is the number of samples, as in the MSE formula.

This process is repeated over many iterations, called epochs, allowing the model to gradually converge on the optimal weights.
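The two update rules can also be written with NumPy array operations instead of explicit loops. The following is a vectorized sketch on made-up, noise-free data generated from y = 2x + 1; the learning rate and epoch count are assumptions chosen so that this small example converges.

```python
import numpy as np

# Noise-free toy data generated from y = 2x + 1 (made up for illustration)
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * x[:, 0] + 1.0

weights = np.zeros(x.shape[1])
bias = 0.0
alpha = 0.05
n = len(x)

for epoch in range(2000):
    error = x @ weights + bias - y              # y_hat - y for every sample
    weights -= alpha * (2 / n) * (x.T @ error)  # update for each theta_i
    bias -= alpha * (2 / n) * error.sum()       # update for theta_0

print(weights, bias)
```

On this data the weight should approach 2 and the bias should approach 1, matching the line the data was generated from.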

The code to implement Linear Regression from scratch is provided below.

import numpy as np

class linear_regression:
    def __init__(self):
        self.weights = []
        self.bias = 0.0
        self.learning_rate = 0.001

    def fit(self, x, y, epochs):
        data_size = len(x)
        number_of_features = len(x[0])
        x = np.array(x)
        y = np.array(y)
        self.weights = np.zeros(number_of_features)

        for epoch in range(epochs):
            derivatives = [0.0] * number_of_features
            bias_derivative = 0.0

            for pos in range(data_size):
                prediction = sum([self.weights[i] * x[pos][i] for i in range(number_of_features)]) + self.bias

                for i in range(number_of_features):
                    derivatives[i] += (2 / data_size) * (prediction - y[pos]) * x[pos][i]

                bias_derivative += (2 / data_size) * (prediction - y[pos])

            for i in range(number_of_features):
                self.weights[i] -= self.learning_rate * derivatives[i]

            self.bias -= self.learning_rate * bias_derivative

            # Safety check for numerical stability
            if any([np.isnan(w) or np.isinf(w) or abs(w) > 1e10 for w in self.weights]):
                return

    def predict(self, x):
        return sum([self.weights[i] * x[i] for i in range(len(self.weights))]) + self.bias