DEV Community: AlvBarros

Brief Introduction to CNNs

AlvBarros — Wed, 21 Jan 2026 13:32:43 +0000

This article is heavily based on An Introduction to Convolutional Neural Networks by Keiron O'Shea and Ryan Nash.

Let's begin by discussing some background knowledge.

Source: Application of Artificial Intelligence in Lung Cancer. A Venn diagram of fields inside Artificial Intelligence.

Deep Learning

As the diagram shows, deep learning is a subset of machine learning as a whole, and its key distinctions from other types of machine learning are automatic feature extraction and data dependency.

Feature engineering: Automatically learns features from raw data.
Data: Needs way more data than traditional ML.
Performance: Slower to train and requires significant computing power (GPUs), but achieves higher accuracy in complex tasks.
Interpretability: Often considered a "black box" due to its complexity and automatic feature extraction.

Neural Networks

Artificial Neural Networks are made up of a large number of interconnected computational nodes (referred to as neurons), which work together in a distributed fashion to collectively learn from the input in order to optimize the final output.

Source: An Introduction to Convolutional Neural Networks. A three-layered feedforward neural network, comprised of an input layer, a hidden layer and an output layer.

Convolutional Neural Networks

The main notable difference between CNNs and traditional ANNs is that CNNs are primarily used in the field of pattern recognition within images.

The layers within the CNN are comprised of neurons organized into three dimensions: the spatial dimensionality of the input (height and width) and the depth. This "depth" does not refer to the total number of layers within the ANN, but the third dimension of an activation volume.

Overall architecture

CNNs are comprised of three types of layers: convolutional layers, pooling layers and fully-connected layers.

Take for example this simplified CNN architecture for MNIST classification.

Source: An Introduction to Convolutional Neural Networks

The input layer will hold the pixel values of the image. These can be RGB colors.
The convolutional layer will determine the output of neurons that are connected to local regions of the input through the calculation of the scalar product between their weights and the region connected to the input volume. The rectified linear unit (commonly shortened to ReLU) then applies an elementwise activation function, max(0, x), to the output of the activation produced by the previous layer.
The pooling layer will then simply perform downsampling along the spatial dimensionality of the given input, reducing the image size and in turn the number of parameters.
The fully connected layers will then perform the same duties found in standard neural networks and attempt to produce class scores from the activations. It is also suggested that ReLU may be used between these layers to improve performance.

Convolutional layer

Convolution is a mathematical operation that measures how one function overlaps another across space or time.

In neural networks, it means sliding a small filter over the data to systematically detect local patterns (such as edges, textures, or shapes).

Here's a great video by 3Blue1Brown that goes more in-depth and offers some visualization.

Notice how a matrix goes through every pixel of the original image.
The convolution in the animation is just averaging out every neighboring pixel, so it results in a "blur" effect. Consider that every pixel is a vector of red, green and blue values (RGB) from 0 to 255. So, for example, a completely red pixel would be [255, 0, 0]. The kernel is a 3x3 matrix filled with 1/9, so that you average each pixel with its neighbors. I heavily encourage you to watch the video so it makes more sense.
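
To make the blur concrete, here is a minimal sketch in plain Python. It assumes a single-channel (grayscale) image stored as a list of lists and simply skips the border pixels; the function name box_blur and the tiny sample image are made up for illustration.

```python
def box_blur(image):
    """Blur a 2D grayscale image with a 3x3 kernel filled with 1/9."""
    kernel = [[1 / 9] * 3 for _ in range(3)]
    height, width = len(image), len(image[0])
    blurred = []
    # Skip the border so the kernel always fits inside the image.
    for row in range(1, height - 1):
        blurred_row = []
        for col in range(1, width - 1):
            total = 0.0
            # Weighted sum of the pixel and its eight neighbors.
            for dy in range(3):
                for dx in range(3):
                    total += kernel[dy][dx] * image[row + dy - 1][col + dx - 1]
            blurred_row.append(total)
        blurred.append(blurred_row)
    return blurred


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
blurred = box_blur(image)
print(blurred)
```

Every 3x3 window in this sample covers all four bright pixels, so each output value comes out at roughly 4 (36 / 9): the sharp square gets smeared into its surroundings.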

Kernel

The "filter", or the matrix of 1/9's in the example above, is called a kernel. Kernels are usually small in spatial dimensionality, but are applied across the entire input.

When the data hits a convolutional layer, the layer convolves each filter across the spatial dimensionality of the input to produce a 2D activation map.

Source: An Introduction to Convolutional Neural Networks. A visual representation of a convolutional layer. The center element of the kernel is placed over the input vector (image), which is then calculated and replaced with a weighted sum of itself and any nearby pixels.

Training neural networks on inputs such as images results in models that are too big to train effectively. Consider an image 800 pixels high and 600 pixels wide. This would mean 800 x 600 = 480.000 pixels. Bear in mind that each pixel holds three RGB values, as we explained earlier, so it would mean 480.000 x 3 = 1.440.000 values. So just for this image, the number of weights on a single neuron would be almost 1.5 million, and these networks usually have way more than one neuron.

To mitigate this, the convolutional layer must be connected to small regions of the input, referred to as the receptive field size of the neuron. To visualize this, if the input of the network is an image of size 64 x 64 x 3 and we set the receptive field size as 6 x 6, we would have a total of 108 weights on each neuron in the layer. To put this into perspective, a standard neuron seen in other forms of neural networks would contain 12.288 weights each.
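
The arithmetic above is easy to check. This is only a sketch; the helper name weights_per_neuron is made up for the example.

```python
def weights_per_neuron(height, width, depth):
    """Number of weights for a neuron connected to a height x width x depth region."""
    return height * width * depth


full = weights_per_neuron(64, 64, 3)   # connected to the whole 64 x 64 x 3 image
local = weights_per_neuron(6, 6, 3)    # connected to a 6 x 6 receptive field only
print(full, local)  # 12288 108
```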

Optimization

Convolutional layers are also able to significantly reduce the complexity of the model through three hyperparameters: depth, stride and zero-padding.

Depth: The depth produced can be manually set through the number of neurons within the layer. This can be seen with other forms of neural networks, where all of the neurons in the hidden layer are connected to every single neuron beforehand. Reducing this can significantly minimize the total number of neurons, but it can also significantly reduce the pattern recognition capabilities.
Stride: You can think of this as the number of "steps" we take when applying the convolution kernel. A stride of one would mean that every pixel would be put through the convolution. A stride of two would mean that 1 in every 2 pixels would be put through it, so every other pixel would be skipped.
Zero-padding: It's the simple process of padding the border of the input, and is an effective method to give further control as to the dimensionality of the output volumes.

With these, we can calculate the size of the 2D output of the convolutional layer.

Source: An Introduction to Convolutional Neural Networks. V represents the input volume size (height x width x depth), R represents the receptive field size, Z is the amount of zero padding set and S referring to the stride. If the calculated result is not an integer, then the stride has been incorrectly set.
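
As a rough sketch, a commonly used form of this calculation is (V - R + 2Z) / S + 1, applied to one spatial dimension at a time. The code below uses the symbols from the caption; the function name and the choice to raise an error on a non-integer result are assumptions made for this example.

```python
def conv_output_size(v, r, z, s):
    """Spatial output size along one dimension: (V - R + 2Z) / S + 1."""
    span = v - r + 2 * z
    if span % s != 0:
        # A non-integer result means the stride does not fit the input.
        raise ValueError("stride does not fit the input evenly")
    return span // s + 1


print(conv_output_size(v=64, r=6, z=0, s=2))  # 30
print(conv_output_size(v=32, r=3, z=1, s=1))  # 32
```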

Parameter sharing, the idea that the kernel is the same for the entire image, works on the assumption that if one region feature is useful to compute at a set spatial region, then it is likely to be useful in another region.

Pooling layer

The objective is to gradually reduce the dimensionality of the representation.

It operates over each activation map in the input, and scales its dimensionality using the "MAX" function. In most CNNs, these come in the form of max-pooling layers with kernels of a dimensionality of 2 x 2 applied with a stride of 2 along the spatial dimensions of the input. This scales the activation map down to 25% of the original size - whilst maintaining the depth volume to its standard size.
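
Here is a small sketch of that 2 x 2, stride 2 max-pooling on a single activation map, using plain lists. The function name and the sample values are made up for illustration.

```python
def max_pool_2x2(activation_map):
    """Max-pool a 2D activation map with a 2 x 2 kernel and a stride of 2."""
    pooled = []
    for row in range(0, len(activation_map) - 1, 2):
        pooled_row = []
        for col in range(0, len(activation_map[0]) - 1, 2):
            window = [
                activation_map[row][col], activation_map[row][col + 1],
                activation_map[row + 1][col], activation_map[row + 1][col + 1],
            ]
            # Keep only the strongest activation in each window.
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


activation_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(activation_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 16 input values become 4 output values, which is the 25% mentioned above.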

Due to its destructive nature, there are only two generally observed methods of max-pooling. The one mentioned previously, which allows the layer to extend through the entirety of the image, or overlapping pooling where the stride is set to 2 with a kernel size set to 3. Having a kernel size of the pooling layer above 3 will usually greatly decrease the performance of the model.

It's important to note that beyond max-pooling, some CNNs may contain general pooling. These layers are comprised of pooling neurons that are able to perform a multitude of common operations including L1/L2-normalisation, and average pooling.

Fully-connected layer

This layer contains neurons that are directly connected to the neurons in the two adjacent layers, without being connected to any layers within them. This is analogous to the way that neurons are arranged in traditional forms of neural networks.

Recipes

Despite the relatively small number of layers required for a CNN, there is no set way of formulating a CNN architecture. That being said, they follow a common architecture.

This common architecture consists of stacked convolutional layers, followed by pooling layers in a repeated manner before feeding forward to fully-connected layers, as shown in the Overall architecture section.

Another way is to stack two convolutional layers before each pooling layer, as illustrated below. This is strongly recommended as stacking multiple convolutional layers allows for more complex features of the input vector to be selected.

Source: An Introduction to Convolutional Neural Networks. A common form of CNN architecture in which convolutional layers are stacked between ReLus continuously before being passed through the pooling layer, before going between one or many fully connected ReLus.

Also, it is advised to split up large convolutional layers into smaller ones in order to reduce the amount of computational complexity within a given layer.

For example, imagine you were to stack three layers on top of each other with a receptive field of 3 x 3. Each neuron on the first layer would have a 3 x 3 view of the input vector. The second layer neuron, however, is acting on the output of the first layer. So, even though its kernel size is also 3 x 3, effectively the second neuron also depends on everything the first layer saw. Removing the overlapping pixels and assuming a stride of 1, we have a 5 x 5 dimensionality (since one column overlaps). If you do it again for a third layer, effectively the receptive field is now 7 x 7, and so on.
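
The growth of the receptive field can be sketched in a few lines. This assumes a stride of 1 and the same kernel size on every layer; the function name is made up for the example.

```python
def effective_receptive_field(kernel_size, num_layers):
    """Effective receptive field of stacked convolutional layers with stride 1."""
    field = 1
    for _ in range(num_layers):
        # Each extra layer widens the view by (kernel_size - 1) pixels.
        field += kernel_size - 1
    return field


for layers in (1, 2, 3):
    print(layers, effective_receptive_field(3, layers))
# 1 3
# 2 5
# 3 7
```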

The input layer should be recursively divisible by two. Common numbers include 32 x 32, 64 x 64, 96 x 96, 128 x 128 and 224 x 224.

With small filters, set stride to one and make use of zero-padding as to ensure that the convolutional layers do not reconfigure any of the dimensionality of the input. The amount of zero-padding to be used should be calculated by taking one away from the receptive field size and dividing by two.
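
A quick sketch of that rule, assuming odd filter sizes and the usual (V - R + 2Z) / S + 1 output-size calculation. Both function names are made up for this example.

```python
def same_padding(receptive_field):
    """Zero-padding that keeps the spatial size unchanged at stride 1."""
    return (receptive_field - 1) // 2


def output_size(v, r, z, s=1):
    """Spatial output size along one dimension."""
    return (v - r + 2 * z) // s + 1


for r in (3, 5, 7):
    z = same_padding(r)
    print(r, z, output_size(32, r, z))
# 3 1 32
# 5 2 32
# 7 3 32
```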

CNNs can be horrendously resource-heavy. An example of this problem could be in filtering a large image (anything over 128 x 128 could be considered large), so if the input is 227 x 227 (as seen with ImageNet) and we're filtering with 64 kernels each with a zero padding, then the result will be three activation vectors of size 227 x 227 x 64 - which calculates to roughly 10 million activations - or an enormous 70 megabytes of memory per image.

In this case there are two options.

First, you can reduce the spatial dimensionality of the input images by resizing the raw images to something a little less heavy.

Alternatively, you can go against everything we stated earlier and opt for larger filter sizes with a larger stride (2, as opposed to 1).

Conclusion

CNNs differ from other forms of neural networks in that instead of focusing on the entirety of the problem domain, knowledge about the specific type of input is exploited. This in turn allows for a much simpler network architecture to be set up.

Gradient Descent from Scratch

AlvBarros — Sat, 09 Aug 2025 20:12:27 +0000

In your quest to learn machine learning, this is probably the first and simplest prediction model you will learn. Each one of these words has a meaning! Let's break it down:

Linear Regression

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data.

So, if you have two variables and they have a relationship you can use this to create a prediction model.

The classic example is Housing Prices. The bigger the house is, the pricier it gets. So one variable could be Area Size and the other Price. We can use Linear Regression to predict the price of a house based on its size!

Disclaimer
Of course, houses have way more variables than that.
Things like the number of bedrooms, number of bathrooms, the neighborhood and city, year of construction, and many other parameters influence the price.
This is not the best model for this prediction - it is just the simplest!

When we say linear, what we mean is that it is going to fit in a linear equation. On one axis you have the house area, and on the other the price!

Imagine that, by the end, we have a mathematical function where you just have to provide the house area and you get the price. Something like this:

Y = aX + b

Where:

Y is the house price
X is the house area
a is the slope of the line
b is the intercept (value of Y when X=0)

In code it will look like this:

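a = 1000.0   # slope (placeholder value; finding it is the goal of this article)
b = 50000.0  # intercept (placeholder value)

def predict_house_price(area):
    price = area * a + b
    return price

price = predict_house_price(50)  # area in m²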

Seems easy!

Calm down. We have to understand another thing: cost functions!!

Let's begin with an easy example:

Consider that a house with area 50m² is priced at US$ 100.000. We wish!
Now consider that another house with area 75m² is priced at US$ 125.000.
And last, another one that has area 100m² is priced at US$ 150.000.

This is easy: the function would be Price = Area * 1000 + 50.000. This is the graph:

But data in the real world isn't as easy, and house prices are not influenced only by area. For example, a bigger area could mean more bedrooms and more bathrooms, or it could mean a pool! These details are what complicate this prediction. Consider the following graph:

Now a simple straight line can't really match and predict correctly every single dot in the graph.

This is where we get more technical.

We can try to quantify the accuracy of this function by measuring the distance from every dot to the line. This is the cost function. The function for the line is called the hypothesis.

Cost Function

It is the average difference of all the results of the hypothesis with inputs from X and the actual output Y.

This function gives us the average error across every prediction we make with the straight line.

For this article, we'll implement a type of Mean Squared Error, but keep in mind there are other types of cost functions.

For every X we'll do the following:

Run the hypothesis > (Y = aX + b)
Get the difference > predictedY - actualY
Square it > difference*difference
Add it to all errors

At the end, we divide by the amount of data points (e.g. the number of houses/prices) in order to get the average. You can also do it by half of the average in cases where we're going to use gradient descent (more on this later).

In Python, this is what it would look like:

def cost_function(
    y,              # Target variable
    predictions,    # Model predictions
    m               # Number of training examples
):
    """
        Calculate the cost function for linear regression.
        J(theta) = (1/(2*m)) * sum(errors^2)
    """
    sumErrorsSquared = 0.0
    for i in range(m):
        sumErrorsSquared += (predictions[i] - y[i]) ** 2.0

    return (1.0 / (2.0 * m)) * sumErrorsSquared

We square the difference for two reasons:

Negative errors cancel out positive ones

Sometimes we predict the value Y as being higher than the real Y, and other times we predict Y as being lower. By squaring it, we make sure it does not matter if the difference is positive or negative.

Penalize bigger mistakes more

Squaring makes large errors grow much faster than small ones, so a few big misses hurt the cost far more than many small ones.
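
Both reasons are easy to see with a handful of made-up errors. This is just an illustrative sketch; the numbers are not from any real dataset.

```python
errors = [-2.0, 2.0, -1.0, 1.0]

# Without squaring, positive and negative errors cancel out.
raw_total = sum(errors)
# With squaring, every error counts, no matter its sign.
squared_total = sum(e ** 2 for e in errors)
print(raw_total, squared_total)  # 0.0 10.0

# An error 10 times bigger costs 100 times more once squared.
small_cost = 1.0 ** 2
big_cost = 10.0 ** 2
print(small_cost, big_cost)  # 1.0 100.0
```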

Okay, but why are you telling me about this?

Gradient Descent

Gradient descent is a method for unconstrained mathematical optimization. It is an iterative algorithm to minimize a function.

Can you see where we're going with this?

We're going to start out with a random arbitrary hypothesis, run our cost function, and then run gradient descent in order to minimize the value of this cost function. This will increase the accuracy of our hypothesis!

Gradient descent works by the following:

It begins by having arbitrary values
It calculates the loss (cost function)
It decides which way to go - in this case, descent means it wants to minimize the value
It takes a step in this direction, with the step size controlled by the learning rate
It calculates again until it runs out of steps

If you (for some reason?) want the mathematical formula:

repeat until convergence {
    θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)    (for j = 0 and j = 1)
}

This basically means:

repeat until convergence means that we'll repeat the following steps until the values of the parameters stop changing significantly (convergence is reached).
θⱼ is the parameter being updated in every repetition
α is the learning rate. A small positive value that controls the size of the update step.
∂ is a partial derivative. It means we're taking the derivative of a function with respect to one variable, while keeping the others constant.
∂/∂θⱼ J(θ₀, θ₁) is the partial derivative of the cost function J(θ₀, θ₁) with respect to θⱼ. In other words, how much the cost would change if we nudged θⱼ slightly.

I find code easier to understand, so here it is:

def gradient_descent(
    X,                              # Input features
    y,                              # Target variable
    alpha,                          # Learning rate
    steps                           # Number of iterations
):
    """
    Perform gradient descent to find the best fitting line for linear regression.
    """

    theta0 = 0.0
    theta1 = 0.0
    m = len(X)

    for s in range(steps):
        theta0, theta1 = gradient_descent_step(X, y, m, (theta0, theta1), alpha)

    return theta0, theta1

And now for each step:

def gradient_descent_step(
        X,          # Input features
        y,          # Target variable
        m,          # Number of training examples
        thetas,     # Tuple of (theta0, theta1)
        alpha       # Learning rate
):
    """
        Perform a single step of gradient descent.
        theta0 = theta0 - alpha * (1/m) * sum(errors)
        theta1 = theta1 - alpha * (1/m) * sum(errors * X)
    """
    theta0, theta1 = thetas

    sumHypothesisMinusValue = 0
    sumHypothesisMinusValueTimesX = 0
    for i in range(m):
        hypothesis_value = theta0 + theta1 * X[i]  # The hypothesis: Y = theta0 + theta1 * X
        error = hypothesis_value - y[i]
        sumHypothesisMinusValue += error
        sumHypothesisMinusValueTimesX += error * X[i]

    theta0 = theta0 - (alpha * 1 / m) * sumHypothesisMinusValue
    theta1 = theta1 - (alpha * 1 / m) * sumHypothesisMinusValueTimesX

    return theta0, theta1
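
To see the idea converge, here is a compact, self-contained sketch of the same update rule run on a tiny made-up dataset generated from y = 2x + 1. The function name fit_line, the data, and the chosen alpha and steps are all assumptions for this example, not values from my repository.

```python
def fit_line(X, y, alpha, steps):
    """Compact batch gradient descent for y = theta0 + theta1 * x."""
    theta0, theta1 = 0.0, 0.0
    m = len(X)
    for _ in range(steps):
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(X, y)]
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, X)) / m
        # Update both parameters at the same time.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


X = [1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0]  # generated from y = 2x + 1
theta0, theta1 = fit_line(X, y, alpha=0.1, steps=5000)
print(round(theta0, 4), round(theta1, 4))
```

With these settings the result should land very close to an intercept of 1 and a slope of 2. A much bigger learning rate would make the values blow up instead of settling.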

Real world example

In this repository, I have a script that gets the data from a local real estate broker in my hometown (along with all the code shared in this article!).

This results in a bunch of real-world data from houses in the market at the time of recording.

First, we begin by importing this data:

import pandas as pd
data = pd.read_csv(
X = data['area'].tolist()
y = data['price'].tolist()

When working with data, it is always good to run some data processing and cleaning.

First, we remove the outliers

def remove_outliers(X, y, threshold=3.0):
    def z_scores(values):
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        return [(v - mean) / std for v in values], mean, std

    x_z, x_mean, x_std = z_scores(X)
    y_z, y_mean, y_std = z_scores(y)

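    # Pair each x with its own y directly; looking y up through X.index(xi)
    # returns the wrong price whenever two houses share the same area.
    filtered = [
        (xi, yi)
        for xi, yi, zi_x, zi_y in zip(X, y, x_z, y_z)
        if abs(zi_x) <= threshold and abs(zi_y) <= threshold
    ]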

    if not filtered:
        raise ValueError("All data removed as outliers!")

    X_filtered, y_filtered = zip(*filtered)

    return X_filtered, y_filtered

Then, because the values are big (upwards of hundreds of thousands), we can run into numerical problems in Python (especially since we're squaring some values).

To fix this, we scale the numbers down.

def minmax(X, y):
    x_min, x_max = min(X), max(X)
    y_min, y_max = min(y), max(y)

    X_scaled = [(xi - x_min) / (x_max - x_min) for xi in X]
    y_scaled = [(yi - y_min) / (y_max - y_min) for yi in y]

    return X_scaled, y_scaled, x_min, x_max, y_min, y_max

After the data is cleaned, we can run gradient descent.

theta0_scaled, theta1_scaled = gradient_descent(
    X_scaled, 
    y_scaled, 
    alpha=alpha, 
    steps=steps
)

The theta0 and theta1 mentioned in the code are, respectively, the b (intercept) and the a (slope) we discussed previously.

We must scale them back to normal if we want to use these values in a prediction manner.

Scaling the thetas back:

def unscale_thetas(theta0_scaled, theta1_scaled, x_min, x_max, y_min, y_max):
    x_range = x_max - x_min
    y_range = y_max - y_min

    theta1_unscaled = (y_range / x_range) * theta1_scaled
    theta0_unscaled = y_min + y_range * (theta0_scaled - theta1_scaled * (x_min / x_range))
    return theta0_unscaled, theta1_unscaled
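
A quick way to convince yourself the unscaling is right is a round trip on the easy example from earlier (Price = Area * 1000 + 50.000). This sketch re-implements the scaling in small helpers with made-up names; in scaled space those three points sit exactly on y = 0 + 1 * x, so we can skip gradient descent and plug those thetas in directly.

```python
def minmax_scale(values):
    """Scale values to the 0..1 range and return the bounds used."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def unscale(theta0_s, theta1_s, x_min, x_max, y_min, y_max):
    """Convert thetas fitted on scaled data back to the original units."""
    x_range, y_range = x_max - x_min, y_max - y_min
    theta1 = (y_range / x_range) * theta1_s
    theta0 = y_min + y_range * (theta0_s - theta1_s * (x_min / x_range))
    return theta0, theta1


X = [50.0, 75.0, 100.0]
y = [100000.0, 125000.0, 150000.0]
X_s, x_min, x_max = minmax_scale(X)
y_s, y_min, y_max = minmax_scale(y)

# In scaled space the points are (0, 0), (0.5, 0.5), (1, 1): intercept 0, slope 1.
theta0, theta1 = unscale(0.0, 1.0, x_min, x_max, y_min, y_max)
print(theta0, theta1)  # 50000.0 1000.0
```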

And that's it! 🎉🍾👏

This is an example of the plot I got running my code:

And these are the thetas (b and a):

theta0: 104727.42003321546
theta1: 7699.98612392038

Now, if you want to implement the predict_house_price function we had at the beginning of this article:

def predict_house_price(area):
    theta0 = 104727.42003321546
    theta1 = 7699.98612392038

    price = theta0 + theta1 * area
    return price

Conclusion

Take a moment to look at my repository if you want to run this code on your own dataset.

Keep in mind, this process is super iterative and a different amount of steps and a bigger or smaller learning rate may provide better or worse results.

In the end, house prices cannot be predicted with only one variable, so this is more of a thought experiment than a real model for prediction.

Keep learning!

The Kth factor of N - an O(sqrt n) algorithm

AlvBarros — Tue, 17 Dec 2024 22:34:43 +0000

Introduction

Recently I wrote the post Learn Big O Notation once and for all. In that post I go over all of the types of Big O time notation that are listed on the Big-O cheatsheet. And I did not think there would be any more time notations possible outside of those seven.

As if the universe itself was humbling me and mocking my ignorance, I encountered a LeetCode problem with a solution of O(√n) time, which could also be written as O(n^(1/2)), if you're crazy.

The problem

You are given two positive integers n and k. A factor of an integer n is defined as an integer i where n % i == 0.

Consider a list of all factors of n sorted in ascending order, return the kth factor in this list or return -1 if n has less than k factors.

The obvious solution

Well, if you're like me your first thought was to go through every number from 1 to n, check if it is a factor, and if it is in the desired k index, return it.

The code looks like this:

def getkthFactorOfN(n, k):
    result = 0
    for i in range(1, n + 1):
        if n % i == 0:
            result = result + 1
            if result == k:
                return i
    return -1

This is all fine and dandy, but it is "only" O(n). After all, there is only one loop and it goes up to n + 1.
Every other operation is discarded when considering the time notation.

But, my friend, there's a catch.

Understanding factors

If you think about it, factors are "mirrored" after a certain point.

Take, for example, the number 81. Its factors are [1, 3, 9, 27, 81], where:

1 * 81 = 81
3 * 27 = 81
9 * 9 = 81
27 * 3 = 81
81 * 1 = 81

If you don't count the number 9, the operations are simply repeated and flipped. If you divide n by one of its factors, you get another factor.
Except for the square root of n, which is paired with itself (duh).

Armed with this knowledge, we now know that we don't need to iterate through the loop up to n times (with range(1, n + 1)), but simply up to math.sqrt(n). After that, we've got every factor we need!
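
Here is a tiny sketch of that mirroring. It only loops up to the square root and pairs every factor with its "mirror". The function name factor_pairs is made up, and math.isqrt needs Python 3.8 or newer.

```python
import math


def factor_pairs(n):
    """Return (i, n // i) for every factor i up to the square root of n."""
    pairs = []
    for i in range(1, math.isqrt(n) + 1):
        if n % i == 0:
            pairs.append((i, n // i))
    return pairs


print(factor_pairs(81))  # [(1, 81), (3, 27), (9, 9)]
```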

The not-so-obvious solution

Now that we have everything we need, we need to transform this loop from 1 -> n to 1 -> sqrt n.

I'll just throw the code here and we'll go over the lines one by one.

def getkthFactorOfN(n, k):
    i = 1
    factors_asc = []
    factors_desc = []
    while i * i <= n:
        if n % i == 0:
            factors_asc.append(i)
            if i != n // i:
                factors_desc.append(n // i)
        i += 1
    if k <= len(factors_asc):
        return factors_asc[k-1]
    k -= len(factors_asc)
    if k <= len(factors_desc):
        return factors_desc[-k]
    return -1

Oof, it's way more complex. Let's break it down:

First, we initialize i = 1. This variable will be used as the "number we're currently at" while searching for factors.

Second, we'll create two arrays: factors_asc and factors_desc. The magic here is that we are going to add factors to factors_asc - they're named like this because they'll be automatically in ascending order.
Whenever we add something to factors_asc, we'll divide n by it and add it to factors_desc. Similar logic here; they'll be conveniently added in descending order.

Then, we begin our loop. Here I've changed it to be while i * i <= n, since we stop when we hit the root of n.

We begin by checking if the current number is a factor (n % i == 0). If so, we can append it to our factors_asc array.

Next, we get the "reverse factor" of i. We can do this by checking if i != n // i, or in other words, if it is not the root. This is because the root must not be duplicated in both arrays. If it isn't, we get the reversed factor by running n // i and appending the result in factors_desc.

After that, we add 1 to i and continue our loop.

After the loop is done, we must have every factor we need.

We begin by checking if k is in the first half including the root (which can be interpreted as the middle) with if k <= len(factors_asc). If so, get the index from this array (remember: arrays begin at zero!).

If not, we must subtract the amount of factors found from k and check again - with k -= len(factors_asc) and if k <= len(factors_desc).

If k is inside factors_desc, get its value with factors_desc[-k] (from last to first).

If all fails, return -1.
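
If you don't trust the index juggling (I didn't), a cheap sanity check is to compare both approaches on a bunch of small inputs. This sketch re-implements the two solutions under made-up names so it can run on its own.

```python
def kth_factor_linear(n, k):
    """O(n) reference: walk through every number from 1 to n."""
    count = 0
    for i in range(1, n + 1):
        if n % i == 0:
            count += 1
            if count == k:
                return i
    return -1


def kth_factor_sqrt(n, k):
    """O(sqrt n) version: collect small factors and their mirrored partners."""
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:
                large.append(n // i)
        i += 1
    if k <= len(small):
        return small[k - 1]
    k -= len(small)
    return large[-k] if k <= len(large) else -1


# Both versions should agree for every small n and k.
for n in range(1, 200):
    for k in range(1, 20):
        assert kth_factor_linear(n, k) == kth_factor_sqrt(n, k)

print(kth_factor_sqrt(81, 4))  # 27
```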

The curve

If you're wondering where in the curves graph it lands, it would be between O(n) and O(log n), being better than the former and worse than the latter. Here's a graph:

Available at Mathspace
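
If you prefer numbers over curves, this small sketch prints n next to its square root and its base-2 logarithm for a few input sizes. The helper name growth is made up for the example.

```python
import math


def growth(n):
    """Return (n, sqrt n, log2 n) so the three growth rates can be compared."""
    return n, math.isqrt(n), round(math.log2(n), 1)


for n in (100, 10_000, 1_000_000):
    print(growth(n))
# (100, 10, 6.6)
# (10000, 100, 13.3)
# (1000000, 1000, 19.9)
```

For a million items, O(n) means a million steps, O(√n) a thousand, and O(log n) about twenty.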

Conclusion

This was a ride to uncover and research. Thank you so much for reading up to this point.

If you want to be more optimized, you can create factors_asc_len and factors_desc_len variables and add +1 every time you append a value to these arrays, so that len() doesn't have to be called. In Python this is only a micro-optimization, since len() on a list is O(1) and does not affect the time complexity.

Good luck in your studies and until next time!

Learn Big O Notation once and for all

AlvBarros — Tue, 12 Nov 2024 23:01:12 +0000

Introduction

Recently I was doing a job interview for a position that I really wanted in a very cool company, and one of the steps was the dreaded code interview where we solve LeetCode problems live.

I got the solution, and when asked the big O function for my solution, I answered correctly, but I was very confused and probably stumbled my way into it by simply counting the loops.

In order to not fail any more job interviews in the future, I'm revisiting this topic some years after first learning about it in college.

The main objective behind this post is to provide a quick summary and a refresher for me to read before a coding interview. While I learn by writing, it is also important to store this somewhere I can always revisit when I need. And hey, maybe it can work for you too.

Big thanks to NeetCode for providing so much material and teaching all of this stuff for free.

What is Big O time complexity?

In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows. [...] [It] characterizes functions according to their growth rates: different functions with the same asymptotic growth rate may be represented using the same O notation.

Source: Wikipedia

Or, in other words, it's a way to analyze how the amount of time our algorithm takes to run grows as the input grows. The O stands for the order of growth, and n for the size of the input.

Let's look at some examples and it will make more sense.

O(n) - Sure, give me an example

Perhaps the easiest example to understand is of O(n), where the growth rate is linear.

Given an unsorted array n, write a function that will return the biggest value.

To solve this, we need to go through every item in the array n and store it whenever we find a value bigger than the previous found.

n = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] # Initialize the array of n
def find_max_value(arr):
    # Initialize the maximum value with the first element of the array
    max_value = arr[0]

    # Iterate through the array to find the maximum value
    for num in arr:
        if num > max_value:
            max_value = num

    return max_value
print(find_max_value(n)) # Output: 9

The previous algorithm will always run through every item inside n at least once - it has to, because the array is unsorted.

Because of this, we say this algorithm has time complexity of O(n), because as the array size (n) grows, the runtime grows in a linear fashion.

It also does not care about the constant factors of your algorithm. Imagine that your algorithm iterates through every item in n exactly twice, resulting in a time complexity of O(2n). We simplify it by saying it is O(n), because the priority of the Big O notation is to convey the shape of the growth in run time.

O(1) - First and only exception

After telling you that we shouldn't care about constant values in the notation, we have to discuss O(1), where n is not even present in the classification. It is perhaps the most desirable rate, where the time does not grow with the input, staying constant. For example:

Given a non-empty array, return the first element.

n = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] # Initialize the array of n
def first_element(arr):
    # Return the first element of the array
    return arr[0]
print(first_element(n)) # Output: 3

Because we don't actually iterate through any items in the array, the notation for this operation would be O(1).

Some other examples of this include appending items to the end of an array, removing the last item (pop), or using hash maps (or dictionaries), where we simply look up a value by its key - much like the index lookup in the algorithm above.

O(n^2) - This seems easy enough

The simplest case for this notation is when you have nested loops, or a two-dimensional array and you have to go through them to find what you're looking for.

Given the number of sides on a die, calculate every possible combination when rolling two dice of that size.

def dice_combinations(sides):
    # Initialize combinations array
    combinations = []
    # Iterate through first side
    for i in range(1, sides + 1):
        # Add every combination possible
        for j in range(1, sides + 1):
            combinations.append((i, j))
    return combinations

sides = 6  # Example for a 6-sided dice
print(dice_combinations(sides))
# Output: An array with 36 items (6 * 6)
# [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]

But what if the dice have different sides?

O(n*m) - Okay, now you're just adding letters

If instead you had to calculate the possible combinations using two dice with different numbers of sides, it would work as follows.

Given the number of sides on each of two dice, calculate every possible combination when rolling them.

def two_dice_combinations(sides1, sides2):
    # Initialize combinations array
    combinations = []
    # Iterate through first side
    for i in range(1, sides1 + 1):
        # Add every combination possible
        for j in range(1, sides2 + 1):
            combinations.append((i, j))
    return combinations

sides1 = 6  # Example for a 6-sided die
sides2 = 8  # Example for an 8-sided die
print(two_dice_combinations(sides1, sides2))
# Output: An array with 48 items (6 * 8)
# [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7), (5, 8), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (6, 7), (6, 8)]

Keep in mind that this pattern extends indefinitely. You could have an O(n^3) algorithm with three dice of the same size, or even O(n^5); math does not impose a limit.

O(log n) - What?

Most people don't even understand what log means, and simply memorize that this notation is used when doing some sort of binary search.

This is the case when, on every iteration of the loop, we cut the remaining input in half for the next iteration. The log n part is then how many times we can divide n by 2 before reaching 1, which is essentially the definition of the logarithm when the base is 2.

When working with binary trees we have to traverse the nodes, and on each node we make a decision: go "left" or go "right". In a balanced tree this already cuts the remaining work in half, since we only go in one direction (never mind that some nodes may have a different number of child nodes).

This is one of the best complexities to have, since the run time grows very slowly. For really big input sizes, the curve looks almost like a flat line.
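To make the "how many times can we divide n by 2" idea concrete, here is a minimal sketch that simply counts the halvings. The helper name `halvings` is made up for this example; it uses integer division, so it returns the base-2 logarithm rounded down.

```python
def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

for size in (8, 1024, 1_000_000):
    print(size, halvings(size))
# 8 3
# 1024 10
# 1000000 19
```

Going from a thousand items to a million only adds about nine more steps, which is why the curve looks so flat.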

The most common example of O(log n) is when we're doing a binary search.

I won't get into too much detail, but basically a binary search can be used when we have a sorted array where we want to find the index of a specific value.

Given a sorted array, find the index of a target value

def binary_search(arr, target):
    # Initialize the left and right pointers as the first and last
    left, right = 0, len(arr) - 1

    # Continue searching while the left pointer is less than or equal to the right pointer
    while left <= right:
        # Calculate the middle index
        mid = (left + right) // 2

        # Check if the middle element is the target
        if arr[mid] == target:
            return mid
        # If the middle element is less than the target, adjust the left pointer
        elif arr[mid] < target:
            left = mid + 1
        # If the middle element is greater than the target, adjust the right pointer
        else:
            right = mid - 1

    # Return -1 if the target is not found in the array
    return -1

# Example usage:
sorted_array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target = 7
print(binary_search(sorted_array, target)) # Output: 6 - sorted_array[6] == 7

Note that the loop does not use the familiar for i in range(1, n); instead, each iteration jumps to the middle between the left and right indexes.

O(n log n) - Now you're just making stuff up

The only reason this is here is because it is very hard to intuitively figure this out.

This notation is commonly found in sorting algorithms and in fact is the most common for built-in sorting functions in modern languages.

Take, for example, merge sort. Without getting into too much detail, it works by recursively dividing the array into two halves (about log n levels of division) and then merging the halves back together, which takes linear time in total at each level (O(n)). Combining the two, we get O(n * log n).

Given an unsorted array, sort it by using merge sort.

def merge_sort(arr):
    if len(arr) <= 1:
        return arr

    # Find the middle point and divide the array into two halves
    mid = len(arr) // 2
    left_half = arr[:mid]
    right_half = arr[mid:]

    # Recursively sort the two halves
    left_sorted = merge_sort(left_half)
    right_sorted = merge_sort(right_half)

    # Merge the sorted halves
    return merge(left_sorted, right_sorted)

def merge(left, right):
    sorted_array = []
    left_index, right_index = 0, 0

    # Merge the two arrays while maintaining order
    while left_index < len(left) and right_index < len(right):
        if left[left_index] < right[right_index]:
            sorted_array.append(left[left_index])
            left_index += 1
        else:
            sorted_array.append(right[right_index])
            right_index += 1

    # Append any remaining elements from the left or right array
    sorted_array.extend(left[left_index:])
    sorted_array.extend(right[right_index:])

    return sorted_array

# Example usage:
unsorted_array = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(merge_sort(unsorted_array)) # Output: [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]

We can see in the code above that every time merge_sort is called, it ends by calling merge. And every time merge_sort is called, it also calls itself twice, one for each half of the array.

O(2^n) - I should have seen it coming

This notation is usually found when we have a recursive algorithm that branches in two on every call.

Since we can't talk about algorithms without talking about Fibonacci, let's finally do it. But remember: there are more efficient ways to solve this problem.

Given an index, find the number in the Fibonacci sequence.

def fibonacci(n):
    if n <= 1:
        return n
    branch1 = fibonacci(n-1)
    branch2 = fibonacci(n-2)
    return branch1 + branch2

# Example usage:
print(fibonacci(5))  # Output: 5

It is clear in the implementation above that two branches are created by every recursive call that does not hit the base case.

For example, for n = 5, fibonacci(4) and fibonacci(3) are called, which in turn call fibonacci(3) again (we have not implemented memoization in the above algorithm), fibonacci(2) twice and fibonacci(1). You can visualize it as an "upside down binary tree" whose height is n.
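For comparison, here is one possible memoized version, sketched with the standard library's functools.lru_cache. The name `fibonacci_memo` is made up for this example; the recurrence is the same, but each distinct n is computed only once, so the repeated branches disappear.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci_memo(n):
    # Same recurrence as above, but results are cached per value of n
    if n <= 1:
        return n
    return fibonacci_memo(n - 1) + fibonacci_memo(n - 2)

print(fibonacci_memo(5))   # 5
print(fibonacci_memo(50))  # 12586269025
```

With the cache in place the number of calls grows linearly with n instead of exponentially.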

Theoretically we could have any number being raised to the power of n, such as O(3^n) and O(5^n).

O(n!) - Make it end, please!

If you don't know what the ! means, it is the factorial: we multiply the number by every positive integer below it, down to 1 (which we can ignore).

For example:
5! = 5 * 4 * 3 * 2 = 120

You can think of this as an algorithm that, on every iteration, removes one item and runs again on the rest. It mainly comes up in permutations or, perhaps more famously, in the brute-force solution to the Traveling Salesman Problem.

For this one, I won't be adding any code, since it can get very complicated and it is extremely rare anyway: if you have an O(n!) algorithm, you most likely don't have the optimal solution.

So there you have it!

You can refer to the graph below to see how the algorithms compare. The vertical axis means the number of operations (or also the time) and the horizontal axis means the amount of elements (or the value of n). Special thanks to Eric Rowell for the cheatsheet!

Available at https://www.bigocheatsheet.com/

I hope you've found this post useful, and good luck in your future studies! 🤞

EDIT: O(sqrt n)

After this was written, I encountered a LeetCode problem that has an O(sqrt n) solution. Here's another blog post if you're curious: The Kth factor of N - an O(sqrt n) algorithm
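To give a feel for where O(sqrt n) comes from, here is a small sketch that lists the factors of a number while only testing divisors up to its square root. This is not necessarily the solution from the linked post; the function name `factors` is made up for this example. The trick is that every factor i below the square root has a partner n // i above it.

```python
def factors(n):
    """Return all factors of a positive integer n in ascending order,
    testing candidate divisors only up to sqrt(n)."""
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            # Add the partner factor, unless i is exactly the square root
            if i != n // i:
                large.append(n // i)
        i += 1
    return small + large[::-1]

print(factors(12))  # [1, 2, 3, 4, 6, 12]
print(factors(16))  # [1, 2, 4, 8, 16]
```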

Toxicity in Tweets using a BERT model

AlvBarros — Thu, 11 Apr 2024 12:41:47 +0000

The goal

The goal of this project is to create a model that can accurately classify a piece of text as toxic or not; in other words, predict whether toxicity is 1 or 0.

In principle this is a simple problem to set up: all you need is a dataset of texts labeled as toxic or not, and then you can train your model on it.

The dataset

The competition specifies that the model must be able to predict texts written in Brazilian Portuguese, so the dataset is in Portuguese as well.

The dataset is based on ToLD-Br, a large dataset of tweets ~~(or is it Xeets now?)~~ that contains additional labels indicating whether the text contains homophobia, obscenity, insults, racism, misogyny or xenophobia. The dataset for the competition, however, has a single toxicity column.

On the left, the 'Text' column contains the tweet in question; the 'Toxicity' column indicates whether the text is toxic or not (1 or 0).

Classification problem

Whenever you think about classification, your first guess would be that you need some kind of neural network.

As you may guess from the title of the article, BERT was chosen: it is more recent, and it's built on a transformer-based neural network architecture, which is well suited to Natural Language Processing (NLP).

How does BERT work?

Recurrent and convolutional neural networks use sequential computation to generate predictions. Once trained on huge datasets, they can predict which word will follow a sequence of given words; this behavior is nicknamed a unidirectional algorithm.

BERT, however, uses a mechanism called self-attention, which lets it base its prediction on the words that precede and also on the words that follow; in other words, it is a bidirectional algorithm.

Source: Javier Canales Luna @ DataCamp

The training

First of all, the training data must be cleaned up so that fewer characters need to be processed by our model. There's some theory on which characters matter and which don't, but I've decided on this final function for format_text:

import re

from nltk.corpus import stopwords  # needs nltk.download('stopwords') to have been run once

def format_text(text):
    # Convert text to lowercase
    text = text.lower()
    # Remove words that begin with @ such as tagging @user
    text = re.sub(r'@\w+', '', text)
    # Remove words that begin with # such as #happy
    # (no leading \b: there is no word boundary between a space and '#')
    text = re.sub(r'#\w+', '', text)
    # Remove URLs
    text = re.sub(r'http\S+', '', text)
    # Remove punctuation and emojis
    text = re.sub(r'[^\w\s]', '', text)
    # Remove stop words
    pt_stp_words = stopwords.words('portuguese')
    text = ' '.join([word for word in text.split() if word not in pt_stp_words])
    # Remove double spaces
    text = re.sub(r'\s+', ' ', text)

    return text

The comments are all self-explanatory. All but one: stopwords.

What are stopwords?

Stopwords are words that appear very frequently in sentences but don't add much meaning.

Such words are "i", "my", "myself", "you", "your". More words can be found here.

For this project, however, I've used stopwords for the Portuguese language available in the nltk.corpus package.
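To show the idea without downloading the NLTK corpus, here is a tiny sketch that filters a sentence against a hand-picked set of Portuguese stop words. The set `SAMPLE_STOP_WORDS` and the helper `remove_stop_words` are assumptions made for illustration only; the real list from nltk.corpus is much longer.

```python
# A handful of Portuguese stop words, hand-picked for illustration only
SAMPLE_STOP_WORDS = {"o", "a", "de", "que", "e", "um", "para"}

def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    # Keep only the words that are not in the stop word set
    return " ".join(word for word in text.split() if word not in stop_words)

print(remove_stop_words("o filme que eu vi era um absurdo"))
# filme eu vi era absurdo
```

Using a set rather than a list keeps each membership check constant time, which helps when cleaning many tweets.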

The model

Next, to feed the texts into our model, we'll create a TextClassificationDataset class that handles storing and encoding our texts.

import torch
from torch.utils.data import Dataset

class TextClassificationDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]
        # Tokenize, then pad or truncate to exactly max_length tokens
        encoding = self.tokenizer(text, return_tensors='pt', max_length=self.max_length, padding='max_length', truncation=True)
        return {'input_ids': encoding['input_ids'].flatten(), 'attention_mask': encoding['attention_mask'].flatten(), 'label': torch.tensor(label)}

We begin by defining that this class is a PyTorch Dataset.
The __init__ method takes the arguments texts and labels, which are the values in the train dataset in the format of a list. So, for example, the row #3 would have the content of the tweet at texts[2] and the classification at labels[2].
The argument tokenizer is used to convert the texts into a numeric format that the model can understand, since it cannot work with raw text.
The argument max_length is used to limit the length of the tokenized sequences.
The method __len__ returns the number of samples.
The method __getitem__ is used to retrieve the specific item given an index idx. This will retrieve the item from the lists of texts and labels, as well as encoding the value using the tokenizer from __init__.
This encoding is split into two parts: input_ids and attention_mask. input_ids are the tokenized text, and attention_mask is a binary mask that indicates which tokens are actual words versus padding.
Everything is transformed into a PyTorch Tensor.
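To visualize what input_ids and attention_mask look like, here is a toy sketch in plain Python. The vocabulary `VOCAB` and the function `toy_encode` are hypothetical and word-based; a real BERT tokenizer splits text into subword tokens and also adds special tokens such as [CLS] and [SEP], so the actual ids will differ.

```python
# Hypothetical toy vocabulary, for illustration only
VOCAB = {"[PAD]": 0, "[UNK]": 1, "bom": 2, "dia": 3, "mundo": 4}

def toy_encode(text, max_length):
    # Map each word to its id, falling back to [UNK] for unknown words
    ids = [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]
    ids = ids[:max_length]             # truncation
    mask = [1] * len(ids)              # 1 marks a real token
    padding = max_length - len(ids)
    ids += [VOCAB["[PAD]"]] * padding  # pad up to max_length
    mask += [0] * padding              # 0 marks padding
    return {"input_ids": ids, "attention_mask": mask}

print(toy_encode("Bom dia mundo", max_length=5))
# {'input_ids': [2, 3, 4, 0, 0], 'attention_mask': [1, 1, 1, 0, 0]}
```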

With the data cleaned up, it was time to create the BERT Classifier. For this project, I used BERTimbau Base, a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment.

These people are so creative.

In the end, this is what our BERTClassifier looked like:

import torch
from torch import nn
from transformers import BertModel

class BERTClassifier(nn.Module):
    def __init__(self, bert_model_name, num_classes):
        super(BERTClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        x = self.dropout(pooled_output)
        logits = self.fc(x)
        return logits

# Example of initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BERTClassifier('neuralmind/bert-base-portuguese-cased', 2).to(device)

# If there's a .pth file to load
model.load_state_dict(torch.load('bert_classifier.pth', map_location=device))

Breaking this stuff into parts:

The __init__ function acts as a constructor. It loads the pretrained BertModel from the given bert_model_name, adds a dropout layer to reduce overfitting and a linear layer that maps BERT's output to num_classes; in our case, 2 polar opposites.
The forward function defines how an input flows through the network: BERT's pooled output goes through the dropout layer and then the linear layer, which returns one score (logit) per class.

Please note that I didn't tinker a lot with these, since they were kind of default from the sources that I was studying.

Given all of that, now we need our train function. We'll need a lot of things, though:

from sklearn.model_selection import train_test_split
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import BertTokenizer, get_linear_schedule_with_warmup

# Set up parameters
bert_model_name = 'neuralmind/bert-base-portuguese-cased'
num_classes = 2
max_length = 128
batch_size = 16
num_epochs = 2
learning_rate = 2e-5

def train(model, data_loader, optimizer, scheduler, device):
    model.train()
    for batch in data_loader:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['label'].to(device)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        loss = nn.CrossEntropyLoss()(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()

## Begin training

# texts and labels are the lists loaded from the competition dataset (loading not shown here)
tokenizer = BertTokenizer.from_pretrained(bert_model_name)

# Split into train and validation datasets
train_texts, val_texts, train_labels, val_labels = train_test_split(texts, labels, test_size=0.2, random_state=42)
train_dataset = TextClassificationDataset(train_texts, train_labels, tokenizer, max_length)
val_dataset = TextClassificationDataset(val_texts, val_labels, tokenizer, max_length)

# Create DataLoader for batch processing
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size)

# Additional steps
optimizer = AdamW(model.parameters(), lr=learning_rate)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)

A lot to unpack here:

First, we define some parameters that are going to be used in the model.
num_classes is simple: either toxic, or not.
max_length as already described is the max length of the encoded text.
batch_size is the number of samples to work through before the model's internal parameters are updated. This value balances reasonable memory requirements against training performance.
learning_rate is 2e-5, which is 0.00002. If the learning rate is too high, the model might overshoot the minimum of the loss function and fail to converge. If the rate is too low, the model might get stuck in a suboptimal solution. The value of 2e-5 is commonly used since it is small enough to allow the model to make gradual progress without overshooting or converging too slowly.

Let's skip the train method for now and explain the items below:

The optimizer adjusts the parameters of our model to minimize the error, or loss function. It changes the weights and biases of the neurons in response to the error the model produced in its predictions during training. AdamW is a variation of the Adam optimizer that decouples weight decay from the gradient update.
The total_steps are the total number of optimization steps that will be run. Each epoch goes through the entire dataset once, so it is "number of epochs" times "number of batches in each epoch".
The learning rate scheduler, scheduler, adjusts the learning rate during training. Scheduling the learning rate can help the model converge faster, escape saddle points and reduce overfitting.
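To see what a linear schedule with warmup does to the learning rate, here is a plain Python sketch of the idea. The function `linear_schedule_lr` and the step counts are assumptions for illustration, not the library's code: the rate grows linearly during warmup (none in our case, since num_warmup_steps=0) and then shrinks linearly to zero by the last step.

```python
def linear_schedule_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < num_warmup_steps:
        # Warmup: grow linearly from 0 up to base_lr
        return base_lr * (step / max(1, num_warmup_steps))
    # Decay: shrink linearly from base_lr down to 0 at the last step
    remaining = max(0, num_training_steps - step)
    return base_lr * (remaining / max(1, num_training_steps - num_warmup_steps))

total_steps = 1000  # hypothetical: 500 batches per epoch * 2 epochs
for step in (0, 500, 1000):
    print(step, linear_schedule_lr(step, 2e-5, 0, total_steps))
# 0 2e-05
# 500 1e-05
# 1000 0.0
```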

Given everything that was said (and I know that it's too much!), now let's break down the train method:

First, it sets the model to training mode.
Then, it enters a loop over each batch of the data loader.
In this loop, it clears the gradients, since PyTorch accumulates them by default. They need to be reset for each batch.
It retrieves the input IDs, attention masks and labels from the batch and moves them to the device being used for training, such as the CPU or GPU.
The input IDs and attention masks are then used as input to the model.
Then, with whatever the model outputs, the loss is calculated with the CrossEntropyLoss function.
It performs backpropagation by calling loss.backward().
The optimizer.step() applies the gradients computed in the previous step to update the model's parameters.
Finally, the learning rate is adjusted with scheduler.step().
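To make the loss step less abstract, here is a worked sketch of cross-entropy for a single sample in plain Python. The logits are made-up numbers and `cross_entropy` is a hypothetical helper; PyTorch's CrossEntropyLoss does the same softmax-plus-negative-log computation and, by default, averages it over the batch.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy loss for one sample: -log(softmax(logits)[label])."""
    # Subtract the max logit for numerical stability before exponentiating
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    prob = exps[label] / sum(exps)
    return -math.log(prob)

# Hypothetical logits for the classes [not toxic, toxic]
print(round(cross_entropy([2.0, 0.0], label=0), 4))  # 0.1269
print(round(cross_entropy([2.0, 0.0], label=1), 4))  # 2.1269
```

The loss is small when the model puts a high score on the correct class and large when it favors the wrong one, which is the signal backpropagation uses.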

Phew! A lot of things to uncover.

In the end, we can just call the train function for each epoch, and then save the model as a .pth file.

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    train(model, train_dataloader, optimizer, scheduler, device)
    # evaluate is not shown in this post; it is expected to return
    # the validation accuracy and a classification report
    accuracy, report = evaluate(model, val_dataloader, device)
    print(f"Validation Accuracy: {accuracy:.4f}")
    print(report)

torch.save(model.state_dict(), "bert_classifier.pth")

The saved model will be available at that path, and can be loaded and used to predict the toxicity of texts! Here's one example:

def predict_sentiment(text, model, tokenizer, device, max_length=128):
    model.eval()
    encoding = tokenizer(text, return_tensors='pt', max_length=max_length, padding='max_length', truncation=True)
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        # Pick the class with the highest logit
        _, preds = torch.max(outputs, dim=1)
    return 1 if preds.item() == 1 else 0

# Load the model from the .pth file saved after training
# (tokenizer and device are the same objects used during training)
model = BERTClassifier('neuralmind/bert-base-portuguese-cased', 2).to(device)
model.load_state_dict(torch.load('bert_classifier.pth', map_location=device))
print(predict_sentiment('Hello world!', model, tokenizer, device)) # Returns 0

Conclusion

And that's it! If you want to check it out and train/test this model yourself, feel free to check the code in my GitHub repository!

This post was born out of my first Kaggle competition!

Despite not winning the competition, I'm still very close to the top, with 0.00952 setting me apart from the first place, so I hope my experience can also teach other beginners something useful!

I'm already a software engineer at work, but artificial intelligence has always been a source of curiosity for me. When I was in college, I had a brief exposure to computer vision and even ended up publishing some scientific articles. Now, I'm trying to make up for lost time studying and learning AI again. Follow me to join me on my journey!

Special thanks

First of all, special thanks to Pedro Gengo and the folks over at Tensorflow User Group São Paulo for creating the Kaggle Competition and inspiring this project!

Also, huge thanks to Kang Pham for writing this tutorial where I got most of this code!

And finally, thanks to Pedro Henrique Vieira de Lima, whose work on Detecção de Comentários Tóxicos em Chats e Redes Sociais com Deep Learning was crucial for hitting a higher score on the leaderboard.

Automate Flutter app delivery to AppCenter with GitHub Actions

AlvBarros — Fri, 06 Oct 2023 22:31:42 +0000

Introduction

In this post, you'll see how to automate the delivery of your Flutter app to the testing team. For this to work, your setup must meet these conditions:

You're developing a Flutter app;
Your code is hosted on GitHub;
Your testing team is using AppCenter.

Got it? Let's begin.

What is Continuous Integration?

According to Atlassian's Max Rehkopf, it's about automatically running builds and tests with every change. This speeds up development: since it happens automatically, the developer can focus on building new features and leave validation and testing to automated tools.

Ok, so what's Continuous Delivery then?

Taking inspiration from Pittet's post, CD is the automatic deployment of the code to a testing and/or production environment. It's often confused with Continuous Deployment, which is the automated release to the end customers.

In this post we'll focus on Continuous Delivery, since we'll be automating the deployment of your app to AppCenter.

Step 1 - Setting up a Runner for GitHub Actions

If your code is hosted on GitHub, the best approach to DevOps is to write GitHub Actions. You can read more about Actions here, but everything that you need to know will be explained later on. For now, you have to set up a Runner.

A runner is where your action's jobs will be run. It can be a hosted virtual environment, or you can self-host a runner on your own machine.

To register your own machine as a runner, you must go to the Settings of your repository and follow the instructions on Settings > Actions > Runner.

There you'll see a green button "New self-hosted runner". Click on it and then follow the steps. It simply involves running a few commands in your terminal, so I won't cover it here.

In the end, your runner should be running and it will be shown on the same Settings page, such as below.

If you have any issues on this part, feel free to leave a comment below.

Step 2 - Set up your AppCenter app

On your AppCenter app's settings, go to Settings > App API Tokens. There, you can create a token that'll be used on our Actions workflow.

Click on "New API Token". You'll be asked to set up a name and access for your token. Set whatever you'd like, but the token must have Full access. Don't worry, only you can see this token.

When the pop-up with the token appears - such as the one below, copy the token and save it somewhere safe. You cannot see it again, and if you happen to lose it, you must delete it and create a new one.

With your AppCenter API Token, go back to your repository on GitHub. On Settings > Secrets and variables > Actions, we'll create a Secret for our repository.


A secret is a variable that can be used by our GitHub Actions workflows but is never displayed, not even to people with access to the repository. The automated workflow can use the value, but no one is able to see what it is. This lets you keep values such as AppCenter's API Token in your workflows without them being compromised or leaked. For more information on secrets, refer to the documentation.

Now, click on "New repository secret", create a secret named APPCENTER_API_TOKEN and paste the value of the token previously created. If you've done everything correctly, this should be what you see on your page.

Now we can finally start creating workflows for our repository.

Step 3 - Creating your GitHub Actions Workflow

First of all, your repository needs a .github folder in its root folder. Inside it, create a workflows folder. There, we'll create a main.yml file. The end result should be something like this:

Note that the file doesn't have to be named main.yml; this is just for this example.

In the main.yml file, we must add some code so that it does what we want. Start by writing this:

name: Deploy App on AppCenter
on:
  push:
    branches:
      - main
      - 'releases/**'

name: Self explanatory. Sets the name of this workflow.
on: Defines the triggers of this workflow. In our case, we want this workflow to be triggered on every push that happens on the main branch, or any other branch that starts with releases/, such as releases/1.0.0, releases/hotfix-login, etc.

For more information on the on tag, see the documentation.

But still, it's not doing anything. So now we need to add Jobs. Jobs are what make up workflows, and each job contains the steps you want to execute.

For this post, we'll set up these steps:

Set up Flutter;
Set up AppCenter CLI;
Build the app's package;
Deploy the package to AppCenter.

In this post, only Android will be covered. Follow me to know when the post for iOS is released!

Step 3.1 - Set up Flutter

Now that you have your main.yml file configured, we can add jobs to achieve what we want. Start by adding a jobs tag and a "Setup Flutter" job, such as the code below.

jobs:
  setup-flutter:
    name: Setup Flutter
    runs-on: self-hosted
    steps:
      - name: Check out repository code
        uses: actions/checkout@v3
      - name: Install jq
        uses: dcarbone/install-jq-action@v2.0.2
      - uses: subosito/flutter-action@v2
        with:
          channel: master
          architecture: x64
      - run: flutter pub get

Let's take this apart:

jobs: Defines jobs that must be run for this workflow. These can be run in parallel or sequentially. By default, they are run in parallel. More information later.
runs-on: Defines on what kind of Runner this job can be run. Since we've defined our self-hosted runner in the previous steps, we just add this value.
steps: Defines the steps for this job. Each will be run sequentially.

If you pay attention, we already created four steps. Most of them have two tags: name and uses. Name is self explanatory.

Uses defines what action is used on this step. In the first case, actions/checkout@v3 is used. This format says which repository and which version of the action is used. Below is the list of actions used:

actions/checkout: Checks out the code in the runner. This makes sure that the workflow is being run on updated code.
dcarbone/install-jq-action: This action installs jq on the runner machine. It's a lightweight command-line JSON processor. For more information, check its page. This is not necessary to build Flutter, but it's used in other actions later on.
subosito/flutter-action: This action installs Flutter on the runner, so that later steps can run Flutter commands. It's what we'll use to get dependencies and build our package. It can also be used to run our tests and check for coverage, but that is not implemented in this guide. The run step that follows it defines the specific command we want to be run.

If you pay attention, all these actions are actually public repositories that we can use. GitHub Actions has a public marketplace where you can find actions that have been developed by the community. To check what the code is doing, you can go to the repository and read it. It's best to do so, to make sure that you're not using anything with a known vulnerability.

So to summarize, we've set up a workflow that, on every push that happens on main or releases/**, installs Flutter and the Pub dependencies on the runner.

Now, we can add the next step.

Step 3.2 - Set up AppCenter CLI

This step is going to be easier since we already know what we're doing.

  setup-appcenter-cli: # must have node and npm installed!
    name: Setup AppCenter CLI
    runs-on: self-hosted
    steps:
      - name: Install appcenter-cli
        uses: charliealbright/appcenter-cli-action@v1.0.1
        with:
          token: '${{secrets.APPCENTER_API_TOKEN}}'
          command: 'appcenter help'

We now have a new tag to go over:

with: This sets some parameters to be given to the action as an input. In this case, we add token and command. If you pay attention, we're using secrets.APPCENTER_API_TOKEN.

secrets is an object that contains every secret we've set in the repository. Since we've set APPCENTER_API_TOKEN, this token can now be used in this action. Again, make sure that you know what your actions are doing.
We then add "${{" and "}}" around the variable so that Actions knows that it must replace this expression with the value set in the secrets.

So now we have two jobs: setting up Flutter and AppCenter CLI.

Step 3.3 - Call Deploy Android job on the main workflow

We can now use one of the superpowers of GitHub Actions: splitting jobs across different workflow files. Add this specific job below setup-appcenter-cli:

  deploy_android:
    name: Deploy Android
    needs: [setup-flutter, setup-appcenter-cli]
    uses: ./.github/workflows/deploy_android.yml
    with:
      file: './build/app/outputs/apk/release/app-release.apk'
      name: 'AlvaroBarrosC/GithubAction-Android'
    # Pass the repository secrets (such as APPCENTER_API_TOKEN) on to the called workflow
    secrets: inherit

We now have two tags to go over:

needs: This sets some requirements for this job. In our case, it needs Flutter and AppCenter CLI to be set up. This also means that this Deploy job will be run sequentially.
uses: We've already seen this tag, but now we're giving it a local path. This will use a deploy_android.yml file that we'll create in the workflows folder. So let's create the file.

As described previously, the with tag adds some inputs. In this case, we've added file and name.

file: This must be the path to the file that is generated by the Flutter build command. Above is the default output directory, but this can be changed.
name: This one is the username, slash and the app’s name on AppCenter. Mine is AlvaroBarrosC and GithubAction-Android, but you must change it to your own.

Step 4 - Create Deploy Android workflow

As said in the previous step, create the file at .github/workflows/deploy_android.yml, the same folder where your main.yml file is located. Then, you can paste this code:

name: Deploy Android App on AppCenter
on: 
  workflow_call:
    inputs:
      file:
        description: 'The path to the file to be released'
        required: true
        type: string
      name:
        description: 'The name of the app'
        required: true
        type: string
      group:
        description: 'The group that will have access to the version released'
        required: false
        type: string
        default: '"Collaborators"'
jobs:
  build:
    name: Build .apk file
    runs-on: self-hosted
    steps:
      - run: flutter build apk --release --verbose
  Deploy:
    name: Deploy file to AppCenter
    needs: [build]
    runs-on: self-hosted
    steps:
      - name: AppCenter CLI Action
        uses: charliealbright/appcenter-cli-action@v1.0.1
        with:
          token: ${{secrets.APPCENTER_API_TOKEN}}
          command: 'appcenter distribute release -f ${{inputs.file}} --app ${{inputs.name}} --group ${{inputs.group}}'

You can now see how the inputs are defined at the top of the file. They have some metadata, such as required, type and description. You can also see them being used in the command parameter of the AppCenter CLI Action. By using ${{inputs.[field]}}, you insert the value given when the workflow was called.

We can now push any change to the affected branches and see the workflow being run. Make sure your runner is running.

Step 5 - Deploy!

If you followed every step correctly, whenever you go to your Repository’s Actions tab, you can see your previous runs there. Also, you can debug and see the log of the steps.

Note that iOS deployment was not covered on this post.

Thanks for reading!

If you've missed any of the steps or are encountering any problems, you can check the code on this repo. Feel free to comment on this post with any questions that you have.

Dependency Injection in Flutter

AlvBarros — Wed, 16 Aug 2023 17:14:00 +0000

In this article I'll attempt to teach you what dependency injection is, how to do it and why you would do it. I'll also provide examples and a link to a repo on GitHub where you can check the code and try it for yourself. Now, moving on.

According to Wikipedia:

In software engineering, dependency injection is a design pattern in which an object or function receives other objects or functions that it depends on. A form of inversion of control, dependency injection aims to separate the concerns of constructing objects and using them, leading to loosely coupled programs. The pattern ensures that an object or function which wants to use a given service should not have to know how to construct those services. Instead, the receiving 'client' (object or function) is provided with its dependencies by external code (an 'injector'), which it is not aware of.

So, in other words:

Instead of creating objects inside a class or method, those objects are "injected" from outside;
The class does not need to know how to create the objects it depends on, it just needs to know how to use them;
This generates code that is easier to test and is more maintainable.

Like anything in life, DI comes with some Pros and Cons.

Pros:

Makes your code easier to test, since you can just inject mocks in your classes;
Makes your code easier to maintain, as changes to the implementation of the injected objects can be made without affecting the class or method that depends on them.

Cons:

DI can add more complexity to your project, especially if done improperly;
Injecting dependencies can introduce performance overhead;
DI can introduce runtime errors, such as null pointer exceptions, if dependencies are not properly managed or injected.

The Car example

So, let's start with some code. Suppose you have a Car class, that has an Engine.

class Car {
    Engine? engine;
    Car();

    void start() {
        engine!.start(); // Throws at runtime: engine was never set
    }
}

For this Car to work, you need a working Engine. That, however, is another class that has a bunch of complexities and other requirements that do not concern the car itself.

Following the principles of dependency injection, this is what we can do:

Constructor injection

The dependencies are passed to a class through its constructor.

This pattern makes it clear which dependencies a class requires to function, and it ensures that the dependencies are available as soon as the class is created.

If we implement constructor injection in our Car class:

class Car {
    final Engine engine;
    const Car(this.engine);

    void start() {
        engine.start(); // engine is not null!
    }
}

Since Car.engine is final and also required in the constructor, we make sure that it will never be null.

void main() {
    final engine = Engine();
    final car = Car(engine);
    car.start();
}

Adding more parts

Now, let's imagine that you're a car manufacturer and you are creating parts of a car. Since cars are not only made of engines, you now have this class structure:

Please note that I'm not a car manufacturer and this is not all the parts a car needs.

class Car {
    final Engine engine;
    final List<Wheel> wheels;
    final List<Door> doors;
    final List<Window> windows;
    Car(this.engine, this.wheels, this.doors, this.windows);

    void start() {
        engine.start();
    }

    void rollDownAllWindows() {
        for (var w in windows) {
            w.rollDown();
        }
    }

    void openAllDoors() {
        for (var d in doors) {
            d.open();
        }
    }

    // ...
}

Since the engine is final and must be passed in through the constructor, you can't create a Car until you have a working engine. It doesn't make sense that your doors don't work until you have a working engine.

With the constructor injection approach, you can only have a Car instance once all the pieces are ready, and you cannot have an "incomplete" Car.

Setter injection

The dependencies are set on a class through setter methods.

This pattern allows for more flexibility as the dependencies can be set or changed after the class is created.

Whenever you have an instance of Car, you can just use setEngine to set an engine to the car. This fixes the previous problem and we can now have a Car and later give it an engine.

class Car {
    Engine? engine;
    List<Wheel> wheels;
    List<Door> doors;
    List<Window> windows;
    Car(this.wheels, this.doors, this.windows, {this.engine});

    void setEngine(Engine newEngine) {
        engine = newEngine;
    }

    void start() {
        engine?.start();
    }

    // ...
}

Now all you have to do is call setEngine whenever your engine is ready to be placed in the car. You also must add some validation so that you don't have runtime errors happening in your code. For more information on how to properly prevent these issues, take a look at Null safety in Dart.
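The validation mentioned above could look something like the sketch below. It is only one possible approach: the Engine class is a minimal stand-in, and the hasEngine getter and the StateError are assumptions made for this example, not something taken from the original project.

```dart
// Minimal stand-in for the article's Engine class (assumed shape).
class Engine {
  bool running = false;
  void start() => running = true;
}

class Car {
  Engine? _engine;

  // True once an engine has been injected.
  bool get hasEngine => _engine != null;

  void setEngine(Engine newEngine) {
    _engine = newEngine;
  }

  void start() {
    // Copy the field to a local so Dart can promote it to non-null.
    final engine = _engine;
    if (engine == null) {
      throw StateError('Cannot start: no engine has been set.');
    }
    engine.start();
  }
}

void main() {
  final car = Car();
  try {
    car.start();
  } on StateError catch (e) {
    print(e.message); // the car has no engine yet
  }

  car.setEngine(Engine());
  car.start(); // succeeds now that the dependency is in place
}
```

Failing loudly here is a design choice. The engine?.start() version silently does nothing when the engine is missing, which may or may not be what you want.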

Other types of dependency injection

These other types will not be covered in this example, so these are just introductions.

Interface injection

The class implements an interface which defines the methods for injecting the dependencies.

This pattern allows for more abstraction and decoupling of the code, as the class does not have to depend on a specific implementation of the interface.
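As a rough illustration of the idea, here is a minimal sketch. The names EngineInjectable, injectEngine and installEngine are hypothetical and exist only for this example.

```dart
// Minimal stand-in for the article's Engine class (assumed shape).
class Engine {
  bool running = false;
  void start() => running = true;
}

// The injection interface: any class that accepts an Engine implements it.
abstract class EngineInjectable {
  void injectEngine(Engine engine);
}

class Car implements EngineInjectable {
  Engine? _engine;

  @override
  void injectEngine(Engine engine) {
    _engine = engine;
  }

  bool get isRunning => _engine?.running ?? false;

  void start() => _engine?.start();
}

// The injector only knows about the interface, not about Car.
void installEngine(EngineInjectable target, Engine engine) {
  target.injectEngine(engine);
}

void main() {
  final car = Car();
  installEngine(car, Engine());
  car.start();
  print(car.isRunning); // true
}
```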

Ambient context

You may be familiar with the provider pub package.

A shared context is used to provide the dependencies to the classes that require them.

This pattern can be useful in situations where multiple classes need access to the same dependencies.
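To give a feel for the pattern without pulling in Flutter, the sketch below uses Dart's Zone values as the shared context. This is an assumption made for illustration only: the provider package works through the widget tree instead, and the currentEngine and describeCar functions are hypothetical.

```dart
import 'dart:async';

class Engine {
  final String name;
  Engine(this.name);
}

// Looks the Engine up in the ambient context (the current Zone)
// instead of receiving it as a parameter.
Engine currentEngine() {
  final engine = Zone.current[#engine];
  if (engine is Engine) return engine;
  throw StateError('No Engine available in the current context.');
}

String describeCar() => 'Car running on ${currentEngine().name}';

void main() {
  // Everything called inside this zone sees the same Engine.
  final description = runZoned(
    describeCar,
    zoneValues: {#engine: Engine('V8')},
  );
  print(description); // Car running on V8
}
```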

Service locator

You may be familiar with the get_it pub package.

A central registry is used to manage and provide the dependencies to the classes that require them.

This pattern can make it easier to manage dependencies in large applications, but it can also make the code more complex and harder to test.
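A hand-rolled registry is enough to show the idea. The ServiceLocator class below is a hypothetical, deliberately tiny sketch and does not reflect how get_it is implemented.

```dart
class Engine {
  final String name;
  Engine(this.name);
}

// A deliberately tiny registry keyed by type.
class ServiceLocator {
  final Map<Type, Object> _instances = {};

  void register<T extends Object>(T instance) {
    _instances[T] = instance;
  }

  T get<T extends Object>() {
    final instance = _instances[T];
    if (instance is T) return instance;
    throw StateError('No instance registered for type $T.');
  }
}

class Car {
  final ServiceLocator locator;
  Car(this.locator);

  // The Engine is looked up on demand, so it does not appear
  // in the constructor signature.
  String describe() => 'Car running on ${locator.get<Engine>().name}';
}

void main() {
  final locator = ServiceLocator()..register<Engine>(Engine('V8'));
  print(Car(locator).describe()); // Car running on V8
}
```

Note how Car's constructor no longer tells you that it needs an Engine. That hidden dependency is one reason this pattern can make testing harder.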

Ok, but why tho?

In one of my projects I needed to create an authentication layer so that my users can create accounts and authenticate themselves.

I was still deciding which authentication service to use, since it needed to be free and easy to scale. So I created a dependency injection structure that lets me easily swap the service out whenever I'd like to test another one.

This is the structure that I've got:

class AuthenticationRepository {
    final AuthenticationProvider provider;
    AuthenticationRepository(this.provider);

    Future<UserSession?> signIn(String email, String password) async {
        final session = await provider.signIn(email, password);
        if (session == null) {
            // A null session means the provider could not authenticate the user.
            throw 'Failed to authenticate';
        }
        return session;
    }

    // ...
}

This class has a signIn method that takes a user's email and password and passes them to the injected provider. It returns a UserSession, the class responsible for storing the current user's data and authentication token.

class UserSession {
  final String username;
  final String email;
  final String sessionToken;

  UserSession({
    required this.username,
    required this.email,
    this.sessionToken = "", // empty until a provider supplies a token
  });
}

Take notice of AuthenticationRepository.provider. It's an instance of the class AuthenticationProvider. Here's the configuration:

abstract class AuthenticationProvider {
    Future<UserSession?> signIn(String email, String password);
}

Since this class is abstract, in order to create a repository that actually works, you need to give it an implementation.

So I have created two classes: FirebaseProvider and CognitoProvider. These classes are responsible for managing user authentication with Firebase's and Cognito's APIs respectively.

There's a pub package for Firebase integration and also one for Cognito integration.
These packages, however, do not seamlessly fit into the AuthenticationProvider abstract class shown in this example.
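Bridging that gap usually means writing a small adapter. The sketch below shows the general shape under loud assumptions: FakeSdkClient and its login method are hypothetical stand-ins for a third-party SDK, not the real Firebase or Cognito APIs, and UserSession is reduced to two fields.

```dart
class UserSession {
  final String username;
  final String email;
  UserSession({required this.username, required this.email});
}

abstract class AuthenticationProvider {
  Future<UserSession?> signIn(String email, String password);
}

// HYPOTHETICAL stand-in for a third-party SDK client.
// Real SDKs expose different types and method names.
class FakeSdkClient {
  Future<Map<String, String>?> login(String user, String secret) async {
    if (secret != 'correct') return null;
    return {'name': 'demo', 'mail': user};
  }
}

// Adapter: translates the SDK's result into the shape the repository expects.
class FakeSdkProvider implements AuthenticationProvider {
  final FakeSdkClient client;
  FakeSdkProvider(this.client);

  @override
  Future<UserSession?> signIn(String email, String password) async {
    final result = await client.login(email, password);
    if (result == null) return null;
    return UserSession(username: result['name']!, email: result['mail']!);
  }
}

Future<void> main() async {
  final AuthenticationProvider provider = FakeSdkProvider(FakeSdkClient());
  final session = await provider.signIn('demo@mail.com', 'correct');
  print(session?.email); // demo@mail.com
}
```

Because the repository only sees AuthenticationProvider, all SDK-specific details stay inside the adapter.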

So now, in order to authenticate, we just need to decide which one we want to use. Imagine you have your AuthenticationRepository stored in a service locator such as GetIt:

// setting up
GetIt.instance.registerSingleton<AuthenticationRepository>(
    AuthenticationRepository(CognitoProvider()));

// authenticating a user
final auth = GetIt.instance<AuthenticationRepository>();
auth.signIn(email, password);

Testing example

To showcase how you can use DI to make better tests and mock classes easily, here's an example of MockAuthenticationProvider that enables testing on AuthenticationRepository.

You can begin by creating the mocked provider:

class MockAuthenticationProvider implements AuthenticationProvider {
  static String successPassword = "123";

  UserSession? userSession;
  MockAuthenticationProvider({this.userSession});

  @override
  Future<UserSession?> signIn(String email, String password) {
    if (password == successPassword) {
      return Future.value(userSession);
    } else {
      return Future.value(null);
    }
  }
}

Note that the class above has a static successPassword property. This is so that we can exercise both the success and the failure cases, but it is in no way necessary. Feel free to implement any logic that you'd like.

Now you can create the mock factory:

AuthenticationRepository mockRepository() {
  final mockUserSession = UserSession(
    username: "mock",
    email: "mock@mail.com",
    sessionToken: "token",
  );
  final mockProvider = MockAuthenticationProvider(userSession: mockUserSession);
  return AuthenticationRepository(mockProvider);
}

By using this AuthenticationRepository, we can easily test its methods without needing to integrate with either Cognito or Firebase. Here's an example of a successful unit test:

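// Assumes the test and expect functions from package:flutter_test (or package:test).
test('Should return a valid UserSession', () async {
    final repo = mockRepository();
    final result = await repo.signIn(
        "email", MockAuthenticationProvider.successPassword);
    expect(result, isNotNull);
    expect(result?.sessionToken, isNotNull);
});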

Note that we're trying to sign in with an "email" and MockAuthenticationProvider.successPassword, which is a way to force the provider to return a UserSession.

Now, testing for failures:

test('Should throw if UserSession comes null from provider', () async {
    final repo = mockRepository();
    // signIn throws the String "Failed to authenticate" when the provider returns null.
    await expectLater(
        repo.signIn("email", "incorrect password"),
        throwsA("Failed to authenticate"),
    );
});

Ending

And that's it!

Thanks for reading this article through to the end. This is my first one here on dev.to, so feel free to leave any feedback.

Here's the source code once again. Feel free to open an issue or comment down below.

See ya!