DEV Community: Ziad Alezzi

I made an AI that detects landmines

Ziad Alezzi — Thu, 19 Feb 2026 20:38:49 +0000

Machine Learning is transforming industries worldwide. Yet in mine action, where lives are at stake, we are barely seeing any implementation even though the potential is so high.

I'm Ziad Alezzi, a 16-year-old developer from Lebanon. My dream is to pursue my education in Japan on a scholarship and to work towards citizenship. To contribute to that goal, I made this video to show that, even with only fundamental programming knowledge and publicly available tools, I was able to build two machine learning models that demonstrate:

"If I can do this, Mine Action Organizations with Professional Developers absolutely can."

Part 1: The Problem

The moment I began this project, I faced difficulty immediately. Landmines are sensitive military objects. Public image datasets are not available. And with insufficient labeled data, training an image classification model is impossible.

Instead of stopping there, I generated my own dataset.

Part 2: The Solution

In three days, I learned the basics of 3D modeling using Blender.

I downloaded freely available 3D landmine models, assembled them, added rusty textures to make them look realistic, and placed them in different terrains. By changing camera angles, positions, and soil textures, I was able to generate as many synthetic training images as I wanted.

This approach, called synthetic data generation, is widely used in AI research but not in mine action.
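To make the idea concrete, here is a minimal sketch of the randomization step only: drawing a different camera angle, position, and soil texture for every render. It does not call Blender at all, and every name and range in it (`sample_scene`, `SOIL_TEXTURES`, the distances) is a made-up assumption for illustration, not the setup actually used for this dataset.

```python
import random

# Hypothetical parameter ranges; the real values depend on the Blender scene.
SOIL_TEXTURES = ["sand", "gravel", "dry_soil", "grass"]

def sample_scene(rng):
    """Draw one random scene configuration for a synthetic render."""
    return {
        "camera_azimuth_deg": rng.uniform(0, 360),    # angle around the mine
        "camera_elevation_deg": rng.uniform(20, 80),  # height of the viewpoint
        "camera_distance_m": rng.uniform(0.5, 3.0),   # distance from the mine
        "mine_offset_m": (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)),
        "soil_texture": rng.choice(SOIL_TEXTURES),
    }

rng = random.Random(42)  # fixed seed so the same dataset can be regenerated
scenes = [sample_scene(rng) for _ in range(100)]
print(len(scenes), scenes[0]["soil_texture"])
```

Each dictionary would then drive one render, so the number of training images is limited only by how many configurations you sample.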

With my handmade dataset ready, I fine-tuned a pretrained convolutional neural network using the fast.ai library.

To prove how accessible experimenting with AI has become: fast.ai lets anyone build their own image model with only 17 lines of code!

from fastai.vision.all import * 
from pathlib import Path

path = Path('data')
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.1, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(224, method='squish')]
).dataloaders(path, bs=6)
speed_learn = vision_learner(dls, 
                             'beit_base_patch16_224', 
                             metrics=error_rate, 
                             pretrained=True, 
                             pretrained_cfg=None, 
                             img_size=224)
speed_learn.fine_tune(5)

In 2015, making an image model of this scale required a research team and many years of development, but advancements in GPU capabilities have made it easy for even the average developer on a regular laptop.

The training process took only 10 seconds, yet the model was able to classify a photo of a real landmine, which my mother had taken herself while touring a minefield with Lebanon's MAG (Mines Advisory Group).

Remember, the model was trained only on synthetic Blender-generated images I created myself. Yet it still managed to classify a real landmine correctly with over 70% confidence.

My model was trained on images I made in Blender, like this:

And was able to detect the same type of landmine taken in real life:

The process went like this:

1- Get the freely available landmine parts

2- Assemble the parts and add rusty textures

3- Put the landmine model in different scenes and take images to use as training data

The goal here was proof of concept, not deployment readiness. However, this demonstrated how accessible experimenting with AI has become.

Why image models alone aren't enough

In reality, most landmines are hidden underground. An image model is useful in cases like detecting butterfly landmines (antipersonnel mines that usually lie above the ground) using thermal imaging from drones, which has already been done by researchers in Ukraine, but it cannot be generalized to every mine.

They used simple low-cost drones with thermal sensors attached:

Flew them over PFM-1 mines (Russian butterfly mines) to collect data:

And were able to detect these mines through thermal sensing:

Citation: "Nikulin, Alex & Smet, Timothy & Baur, Jasper & Frazer, William & Abramowitz, Jacob. (2018). Detection and Identification of Remnant PFM-1 'Butterfly Mines' with a UAV-Based Thermal-Imaging Protocol. Remote Sensing. 10. 10.3390/rs10111672."

This is where sensor-based detection becomes important.

In 2018, students at Gazi University in Turkey wrote a PhD thesis where they used drones equipped with sensors to collect the following environmental variables:

Soil type
Voltage detected
Drone altitude

These inputs were fed into a machine learning model capable of predicting the presence and type of buried mines.

Citation: "Yilmaz, C., Kahraman, H. T., & Söyler, S. (2018). Passive mine detection and classification method based on hybrid model. IEEE Access, 6, 47870-47888."

Inspired by this, I built a simplified tabular classification model using PyTorch which achieved 80% accuracy in detecting underground mines in a simulated dataset.

I used textbook applied Machine Learning with the PyTorch library, with nothing fancy:

import numpy as np
import torch
import pandas as pd
import torch.nn.functional as F


data = pd.read_csv('data2/mines.csv')
t_dep = torch.tensor(data['label'], dtype=torch.long)
t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)
t_indep = (t_indep - t_indep.mean(dim=0)) / t_indep.std(dim=0)
n_coeffs = t_indep.shape[1]

def init_coeffs():
    hiddens = [10, 8] # Update for each hidden layer
    sizes = [n_coeffs] + hiddens + [5]
    n = len(sizes)
    layers, consts = [], []
    for i in range(n-1):
        w = torch.empty(sizes[i], sizes[i+1])
        torch.nn.init.kaiming_uniform_(w, a=0)  # stable for ReLU
        layers.append(w.requires_grad_())

        b = torch.zeros(1, sizes[i+1], requires_grad=True)  # biases start at 0
        consts.append(b)
    return layers, consts
def update_coeffs(coeffs, lr):
    layers, consts = coeffs
    for layer in layers + consts:
        if layer.grad is not None:
            layer.sub_(layer.grad * lr)
            layer.grad.zero_()
    return layers, consts

def calc_preds(coeffs, indeps):
    layers, consts = coeffs
    n = len(layers)
    y_pred = indeps

    for i, layer in enumerate(layers):
        y_pred = y_pred @ layer + consts[i]
        if i != n-1: y_pred = F.relu(y_pred)
    preds = F.softmax(y_pred, dim=1)
    logits = y_pred
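    return preds, logits

# Fraction of rows whose highest-probability class matches the label
# (train_model below calls this, so it has to be defined in this script too)
def accuracy(coeffs, t_dep, t_indep):
    preds, logits = calc_preds(coeffs, t_indep)
    return (torch.argmax(preds, dim=1) == t_dep).float().mean().item()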

def one_epoch(t_dep, t_indep, coeffs, lr):
    preds, logits = calc_preds(coeffs, t_indep)
    loss = torch.nn.CrossEntropyLoss()(logits, t_dep)
    loss.backward()
    with torch.no_grad(): return update_coeffs(coeffs, lr)

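def train_model(t_dep, t_indep, epochs=699, lr=1, loss_arr=None, acc_arr=None):
    # Fresh lists per call; mutable default arguments would be shared across calls
    loss_arr = [] if loss_arr is None else loss_arr
    acc_arr = [] if acc_arr is None else acc_arr
    torch.manual_seed(777)
    coeffs = init_coeffs()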
    for i in range(epochs):
        coeffs = one_epoch(t_dep, t_indep, coeffs, lr)
        preds, logits = calc_preds(coeffs, t_indep)
        loss = torch.nn.CrossEntropyLoss()(logits, t_dep)
        acc = accuracy(coeffs, t_dep, t_indep) * 100
        print(f"Iteration: {i:03d} | Loss: {loss:.4f} | Accuracy: {acc:.2f}%")
        loss_arr.append(loss.item())
        acc_arr.append(acc)

    return coeffs, loss_arr, acc_arr

Again, this was not industrial-grade research, but a demonstration of accessibility.

In the Context of Lebanon

Lebanon continues to suffer from landmines and unexploded ordnance. Areas affected by conflict remain potentially dangerous even after ceasefires, and when mine action teams take too long to sweep them, returning citizens are put in danger. This is exactly what happened in Lebanon after the ceasefire a year ago: evacuated citizens returned to their potentially landmine-contaminated land, not knowing of the dangers that awaited them.

Instead of this reality, imagine this scenario:

On day one after the ceasefire, drone swarms are deployed
Equipped with simple voltage and soil sensors
AI model analyzes terrain in real time
Minefields are mapped quickly and precisely
Safe and dangerous areas become known to returning civilians
Demining teams know exactly where each mine is and what type it is before entering

This level of operational efficiency could substantially decrease potential casualties and make recovery way faster.

Again, while my models alone certainly cannot achieve that:

If a high school student with limited resources can prototype these models, the barrier to mine action groups innovating in this field is clearly not technical inability.

And really, an important question comes up:

"Why must there be more developers working on making mines, but not against them?"

This Was ML

lucirie (Ziad Alezzi) · GitHub

lucirie has 27 repositories available. Follow their code on GitHub.

github.com

Star Multi-Class Classification Neural Network With Pytorch

Ziad Alezzi — Sun, 16 Nov 2025 14:14:20 +0000

Introduction:

Whenever I hear stories about stars far, far away from Earth (I'm talking millions of light years away, impossible to make out visually through any telescope), I always wonder:

"How on earth (pun intended) do they find out what typa star it is from this far away?!"

And to satisfy this curiosity, I did a little Google search and found out that every star has its own life cycle, ranging from a few million to trillions of years, and that its properties change as it ages. By measuring these properties, we can deduce the type of star.

But I'm not satisfied until I do it myself. So in this notebook I'll be writing a model that classifies a star's type based on its features. (It also serves as good practice for PyTorch.)

Dataset

The dataset consists of 6 columns.

The Temperature of the star
The Luminosity of the star
Its Radius
Its Absolute Magnitude
Its General Color of Spectrum
The Spectral Class

Everything should be obvious; however, the Spectral Class might be news.

A star's spectral class (O, B, A, F, G, K, or M) is assigned based on the absorption lines in its spectrum, which mostly track its surface temperature: O stars are the hottest and M stars the coolest.

import os
from pathlib import Path

iskaggle = os.environ.get("KAGGLE_KERNEL_RUN_TYPE", '')
if iskaggle: path = Path('../input/star-type-classification')
else:
    path = Path('data')

Here's a quick look at the data:

import pandas as pd

data = pd.read_csv(path/'Stars.csv')
data

output:

	temperature	luminosity	radius	magnitude	color	class	label
0	3068	0.002400	0.1700	16.12	Red	M	0
1	3042	0.000500	0.1542	16.60	Red	M	0
2	2600	0.000300	0.1020	18.70	Red	M	0
3	2800	0.000200	0.1600	16.65	Red	M	0
4	1939	0.000138	0.1030	20.06	Red	M	0
...	...	...	...	...	...	...	...
235	38940	374830.000000	1356.0000	-9.93	Blue	O	5
236	30839	834042.000000	1194.0000	-10.63	Blue	O	5
237	8829	537493.000000	1423.0000	-10.73	White	A	5
238	9235	404940.000000	1112.0000	-11.23	White	A	5
239	37882	294903.000000	1783.0000	-7.80	Blue	O	5

240 rows × 7 columns

Sampling the dataset is cool and all, but we wanna see some more important information

data.describe()

output:

	temperature	luminosity	radius	magnitude	label
count	240.000000	240.000000	240.000000	240.000000	240.000000
mean	10497.462500	107188.361635	237.157781	4.382396	2.500000
std	9552.425037	179432.244940	517.155763	10.532512	1.711394
min	1939.000000	0.000080	0.008400	-11.920000	0.000000
25%	3344.250000	0.000865	0.102750	-6.232500	1.000000
50%	5776.000000	0.070500	0.762500	8.313000	2.500000
75%	15055.500000	198050.000000	42.750000	13.697500	4.000000
max	40000.000000	849420.000000	1948.500000	20.060000	5.000000

Quick Overview

Some values differ a lot, and these outliers are not very good for writing a model that can generalize to new data.

The smallest value for luminosity is as low as 0.00008 (That's 4 zeros!!) whereas the largest gets up to ~850,000

Now comes a dilemma:

"Should I normalize the data, or preserve their differences?"

The best way to know is to check whether these overly small/huge values are truly outliers that must be normalized.

Almost half of the dataset has its luminosity in the decimals!! With the rest being in the hundreds of thousands.

Alright, we'll have to normalize this. I'll use the Z-Score, which is very simple: you subtract the mean of the column from the original value, and divide the result by the standard deviation.

z = (x - μ) / σ

import torch
import matplotlib.pyplot as plt

def z_score(x): return (x - torch.mean(x)) / torch.std(x)

normalized_luminosity = z_score(torch.tensor(data.luminosity))



plt.subplot(2, 1, 1)
plt.scatter(data['luminosity'].to_numpy(), range(240))
plt.title('Before')
plt.subplot(2, 1, 2)
plt.scatter(normalized_luminosity.numpy(), range(240))
plt.title('After')
plt.tight_layout()
plt.show()

From hundreds of thousands to single digits, while still preserving the data's structure! Perfect!

Preparation

Now I'll convert the dataframe into a tensor and prepare it to be chucked into a neural network.

This dataset is already very well put together. There are no null values; however, there is a need for dummy columns.

So let's make this quick, and get it over with!

First up is defining the dependent and independent variables as tensors

import numpy as np

t_dep = torch.tensor(data['label'])
t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[22], line 4
      1 import numpy as np
      3 t_dep = torch.tensor(data['label'])
----> 4 t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\generic.py:6643, in NDFrame.astype(self, dtype, copy, errors)
   6637     results = [
   6638         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6639     ]
   6641 else:
   6642     # else, only a single dtype is given
-> 6643     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6644     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6645     return res.__finalize__(self, method="astype")


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
    427 elif using_copy_on_write():
    428     copy = False
--> 430 return self.apply(
    431     "astype",
    432     dtype=dtype,
    433     copy=copy,
    434     errors=errors,
    435     using_cow=using_copy_on_write(),
    436 )


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    755         raise ValueError("Can not squeeze with more than one column.")
    756     values = values[0, :]  # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    760 new_values = maybe_coerce_values(new_values)
    762 refs = None


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
    179     values = values.astype(dtype, copy=copy)
    181 else:
--> 182     values = _astype_nansafe(values, dtype, copy=copy)
    184 # in pandas we don't store numpy str dtypes, so convert to object
    185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
    129     raise ValueError(msg)
    131 if copy or arr.dtype == object or dtype == object:
    132     # Explicit copy, or required since NumPy can't view from / to object.
--> 133     return arr.astype(dtype, copy=True)
    135 return arr.astype(dtype, copy=copy)


ValueError: could not convert string to float: 'Red'

Oh golly jargon!! Damn, what went wrong?!
Let's see here..

ValueError: could not convert string to float: 'Red'

We tried to convert a word into a number!

Ah.. almost forgot math needs numbers..

Dummy Columns

data['color'].unique()

output:

array(['Red', 'Blue White', 'White', 'Yellowish White', 'Blue white',
       'Pale yellow orange', 'Blue', 'Blue-white', 'Whitish',
       'yellow-white', 'Orange', 'White-Yellow', 'white', 'yellowish',
       'Yellowish', 'Orange-Red', 'Blue-White'], dtype=object)

As we can see, the color column has 17 possible colors a star can be.

However, since we can only accept numbers, an easy fix is using Dummy Columns

I'll give you an example. Say we had data for a group of people. One of the columns is "Gender", where each person could have the value "Female" or "Male". To turn these into numbers, we'll create 2 new columns:

"Female?"
"Male?"

Say you have Bob, a professional male specializing in being a man. His female column would be "0" for false, and his male column would be "1" for true.

Boom!! Problem solved.
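Here is the Bob example written out by hand, as a small sketch of what dummy columns are doing under the hood. The helper `dummy_columns` and the four rows are made up for illustration; they are not part of the notebook.

```python
def dummy_columns(values, prefix):
    """Turn a list of category labels into one 0/1 column per category."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

genders = ["Male", "Female", "Female", "Male"]  # made-up rows; Bob is row 0
columns = dummy_columns(genders, "Gender")
print(columns)
# {'Gender_Female': [0, 1, 1, 0], 'Gender_Male': [1, 0, 0, 1]}
```

Every row has exactly one "1" across the new columns, which is why this is also known as one-hot encoding.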

So to apply this here, we'll just make one column for each of the 17 star colors.

data = pd.get_dummies(data, columns=['color', 'class'])
data[["color_Red", "color_Blue White", "color_White", "color_Yellowish White", "color_Blue white", "color_Pale yellow orange", "color_Blue", "color_Blue-white", "color_Whitish", "color_yellow-white", "color_Orange", "color_White-Yellow", "color_white", "color_yellowish", "color_Yellowish", "color_Orange-Red", "color_Blue-White"]].sample(5)

output:

	color_Red	color_Blue White	color_White	color_Yellowish White	color_Blue white	color_Pale yellow orange	color_Blue	color_Blue-white	color_Whitish	color_yellow-white	color_Orange	color_White-Yellow	color_white	color_yellowish	color_Yellowish	color_Orange-Red	color_Blue-White
202	False	False	False	False	False	False	True	False	False	False	False	False	False	False	False	False	False
56	True	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
103	False	False	False	False	False	False	True	False	False	False	False	False	False	False	False	False	False
176	False	False	False	False	False	False	True	False	False	False	False	False	False	False	False	False	False
8	True	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
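A side note on those 17 columns: several of the labels differ only in capitalization or hyphenation ('Blue White', 'Blue white', 'Blue-white', 'Blue-White'). The notebook keeps them as separate columns. As an optional sketch, and assuming those spellings really do mean the same color, a small cleanup function (`normalize_color` is a hypothetical helper) would collapse the 17 labels into 12:

```python
raw_colors = ['Red', 'Blue White', 'White', 'Yellowish White', 'Blue white',
              'Pale yellow orange', 'Blue', 'Blue-white', 'Whitish',
              'yellow-white', 'Orange', 'White-Yellow', 'white', 'yellowish',
              'Yellowish', 'Orange-Red', 'Blue-White']

def normalize_color(label):
    """Lowercase the label and treat hyphens as spaces."""
    return " ".join(label.lower().replace("-", " ").split())

unique_colors = sorted({normalize_color(c) for c in raw_colors})
print(len(raw_colors), "->", len(unique_colors))  # 17 -> 12
```

Applying something like this to the color column before creating the dummy columns would give the network fewer, denser inputs. It is not what produced the results shown in this notebook.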

Phew, let's try this again now.

import numpy as np
t_dep = torch.tensor(data['label'])
t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)
t_indep = (t_indep - t_indep.mean()) / t_indep.std()

No errors! Eureka!
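One thing worth knowing about the normalization line above: calling mean() and std() with no arguments uses a single mean and standard deviation for the whole tensor, not one per column. The sketch below, on a made-up 4x2 tensor, compares that with the per-column version (dim=0). It is only an illustration of the difference, not a claim about how it would change the accuracy reported later.

```python
import torch

# Toy feature matrix: column 0 is temperature-like, column 1 is a 0/1 dummy column
x = torch.tensor([[3000.0, 0.0],
                  [6000.0, 1.0],
                  [9000.0, 0.0],
                  [30000.0, 1.0]])

# Global statistics: one mean and one std shared by every column
global_norm = (x - x.mean()) / x.std()

# Per-column statistics: dim=0 reduces over rows, giving one mean/std per column
column_norm = (x - x.mean(dim=0)) / x.std(dim=0)

print(global_norm)
print(column_norm)
```

With global statistics the large column dominates, and the 0/1 column ends up as two almost identical values. With dim=0, every column gets mean 0 and standard deviation 1.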

Defining the Neural Network

For the math-focused nerds, you can imagine a neural network simply as one big ol' composite function: multiple layers, with each layer containing multiple units, and each layer taking in a matrix as input and multiplying it by a matrix of coefficients.

The main important thing to remember about the coefficients is their matrix dimensions. Let's say the shape of a matrix of coefficients is [x, y]:

x: Number of units of previous layer
y: Number of units of next layer
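As a quick sanity check of that rule, this sketch lists the shape of every coefficient matrix for a given architecture. The helper `layer_shapes` is hypothetical, and the input count of 28 is an assumed number chosen only for illustration.

```python
def layer_shapes(n_inputs, hiddens, n_outputs):
    """Return the [x, y] shape of each coefficient matrix in order."""
    sizes = [n_inputs] + hiddens + [n_outputs]
    return [(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

# n_inputs=28 is an assumed feature count, used only for illustration
for shape in layer_shapes(n_inputs=28, hiddens=[10, 10], n_outputs=6):
    print(shape)
# (28, 10)
# (10, 10)
# (10, 6)
```

Notice how the y of one matrix is always the x of the next one. That is what lets the output of a layer be multiplied by the coefficients of the following layer.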

The first function we must define is one to initialize the coeffs.
We'll set the number of hidden layers and the number of units in each layer. Then we'll loop over each layer to create randomized matrices of parameters and constants in the correct shape.

Quick note:

The values in the matrices of parameters MUST be randomized. If all parameters are initialized to 0, every unit in a layer computes the same output and receives an identical gradient during backpropagation, which effectively cancels the training out.
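You can see this for yourself with a tiny experiment. The sketch below is a made-up two-layer network (random data, no constants, not the star model) that runs one backward pass with all-zero weights and one with small random weights, then prints the total size of the weight gradients.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 4)            # 8 made-up samples with 4 features
y = torch.randint(0, 3, (8,))    # made-up labels for 3 classes

def weight_grads(w1, w2):
    """Run one forward/backward pass and return the weight gradients."""
    w1.requires_grad_()
    w2.requires_grad_()
    hidden = F.relu(x @ w1)
    loss = F.cross_entropy(hidden @ w2, y)
    loss.backward()
    return w1.grad, w2.grad

zero_g1, zero_g2 = weight_grads(torch.zeros(4, 5), torch.zeros(5, 3))
rand_g1, rand_g2 = weight_grads(torch.randn(4, 5) * 0.1, torch.randn(5, 3) * 0.1)

print("zero init  :", zero_g1.abs().sum().item(), zero_g2.abs().sum().item())
print("random init:", rand_g1.abs().sum().item(), rand_g2.abs().sum().item())
```

In this particular setup the zero-initialized weights get gradients of exactly zero, so gradient descent would never move them, while the random ones get non-zero gradients and can start learning.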

Initializing the coeffs

n_coeffs = t_indep.shape[1]

def init_coeffs():
    hiddens = [10, 10] # Update for each hidden layer
    sizes = [n_coeffs] + hiddens + [6]
    n = len(sizes)
    layers = [(torch.randn(sizes[i], sizes[i+1])) * 0.1 for i in range(n-1)]
    consts = [torch.randn(1, sizes[i+1]) for i in range(n-1)]
    for layer in layers + consts: layer.requires_grad_()
    return layers, consts

If this was NumPy, I would be forced to calculate the derivatives for all parameters in the neural network. Yes, I'd automate it in a loop. But since I really only have an intuitive understanding of derivatives, I'd never fully understand what's going on.

Luckily for us, in PyTorch, the derivatives are calculated automatically as long as we define the cost function!! While initializing the coeffs, simply adding the line layer.requires_grad_() told PyTorch to start tracking that layer so it can calculate its gradients.

import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    layers, consts = coeffs
    n = len(layers)
    y_pred = indeps

    for i, layer in enumerate(layers):
        y_pred = y_pred @ layer + consts[i]
        if i != n-1: y_pred = F.relu(y_pred)
    preds = F.softmax(y_pred, dim=1)
    logits = y_pred
    return preds, logits

So what exactly did we do here? To answer, not much.

We simply iterated over each layer (containing the randomized coeffs) and matrix-multiplied the input independents by the coeffs, adding the constants.
The output of one layer becomes the input to the next. Each time, except after the last layer, we apply the ReLU (Rectified Linear Unit) to the output.

The ReLU is simply a linear equation that's cut off at zero, meaning that any number less than 0 is turned into zero.
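In plain Python, without any library, the whole idea fits in one line. This is only a scalar sketch; F.relu in the code above does the same thing to every element of a tensor at once.

```python
def relu(value):
    """Return the input unchanged if it is positive, otherwise zero."""
    return max(0.0, value)

inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(v) for v in inputs])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```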

The ReLU and Tanh activation functions are the most common for neural network hidden layers. I picked ReLU for this example because it's the one I've used the most. We then finish it off with a sigmoid activation to get our binary prediction.

Ah ah ah!! Stop right there. If we were doing binary classification, we'd use a Sigmoid function. But since we have multiple outputs (multiple types of stars) this is a multiclass classification problem! We must use the SoftMAX
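To see the difference, here is a small sketch of both functions in plain Python. The six scores are made up; in the real model they would be the raw outputs (logits) of the last layer.

```python
import math

def sigmoid(z):
    """Squash one score into a probability for a yes/no question."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1, -1.0, 0.5, 0.0]  # made-up scores for 6 star types
probs = softmax(scores)
print([round(p, 3) for p in probs], sum(probs))
print(sigmoid(0.0))  # 0.5
```

The sigmoid answers one yes/no question at a time, while the softmax spreads a total probability of 1 across all six classes, so the largest score becomes the predicted star type.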

Updating The Parameters

PyTorch automatically tracks gradients, so subtracting the gradients is as easy as calling sub_() with each tensor's .grad :D

def update_coeffs(coeffs, lr):
    layers, consts = coeffs
    for layer in layers + consts:
        if layer.grad is not None:
            layer.sub_(layer.grad * lr)
            layer.grad.zero_()
    return layers, consts
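To watch that update rule work in isolation, here is a toy sketch with a single made-up parameter and a made-up loss, (w - 3)^2, whose minimum we already know is at w = 3. The same three moves appear as in the functions above: backward(), sub_(), and zero_().

```python
import torch

w = torch.tensor(0.0, requires_grad=True)  # a single made-up parameter
lr = 0.1

for step in range(50):
    loss = (w - 3.0) ** 2      # minimized at w = 3
    loss.backward()            # fills w.grad with d(loss)/dw
    with torch.no_grad():      # the update itself must not be tracked
        w.sub_(w.grad * lr)    # same in-place step as update_coeffs
        w.grad.zero_()         # gradients accumulate, so reset them

print(w.item())
```

After 50 steps w ends up very close to 3. If you remove the zero_() call, the gradients from earlier steps keep piling up and the updates stop matching the current slope.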

Crucial step Needed

We've talked about PyTorch automatically tracking the gradients.. But it can't do any of that without the loss function!

So, let's refresh our memory:

This is a multiclass classification model that uses a softmax output

Thus, an appropriate loss function would be the Categorical Cross-Entropy. However, CCE uses one-hot encoded labels. Frankly, I feel too lazy to add in one-hot encoding. So instead, we'll use Sparse CCE, which works with basic integer labels. In PyTorch, that's torch.nn.CrossEntropyLoss, which takes the raw logits (it applies the softmax internally) and the integer labels.

Sparse Categorical Cross Entropy

Here's something about me, I HAAAATE jargon.

The loss function's name is "Sparse Categorical Cross-Entropy" and I think that's the stupidest thing ever.

All this is, is fancy-worded jargon meant to boost the egos of those who use it.

The downside is that this deters so many people from Machine Learning because of how complicated it sounds.

In reality, "Sparse Categorical Cross-Entropy" is defined as:

L = -log(P(y))

wow. how complicated.
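Here P(y) is the probability the model gives to the correct class. To show there's really nothing more to it, this sketch computes the loss for one sample by hand in plain Python. The three scores are made up, and `sparse_cross_entropy` is a hypothetical helper, not a PyTorch function.

```python
import math

def sparse_cross_entropy(logits, label):
    """Loss for one sample: -log of the softmax probability of the true class."""
    exps = [math.exp(z - max(logits)) for z in logits]
    prob_true = exps[label] / sum(exps)
    return -math.log(prob_true)

logits = [2.0, 1.0, 0.1]  # made-up scores for 3 classes
print(sparse_cross_entropy(logits, label=0))  # small loss: class 0 has the top score
print(sparse_cross_entropy(logits, label=2))  # larger loss: class 2 has the lowest score
```

torch.nn.CrossEntropyLoss does this for every row and, by default, averages the results over the batch.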

Remainder of the code used:

def accuracy(coeffs, t_dep, t_indep):
    preds, logits = calc_preds(coeffs, t_indep)
    predicted_classes = torch.argmax(preds, dim=1)
    correct = (predicted_classes == t_dep).float()
    return correct.mean().item()

def one_epoch(t_dep, t_indep, coeffs, lr):
    preds, logits = calc_preds(coeffs, t_indep)
    loss = torch.nn.CrossEntropyLoss()(logits, t_dep)
    loss.backward()
    with torch.no_grad(): return update_coeffs(coeffs, lr)

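def train_model(t_dep, t_indep, epochs=300000, lr=0.00055, loss_arr=None, acc_arr=None):
    # Fresh lists per call; mutable default arguments would be shared across calls
    loss_arr = [] if loss_arr is None else loss_arr
    acc_arr = [] if acc_arr is None else acc_arr
    torch.manual_seed(777)
    coeffs = init_coeffs()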
    for i in range(epochs):
        coeffs = one_epoch(t_dep, t_indep, coeffs, lr)

        if i % 1000 == 0:
            preds, logits = calc_preds(coeffs, t_indep)
            loss = torch.nn.CrossEntropyLoss()(logits, t_dep)
            acc = accuracy(coeffs, t_dep, t_indep) * 100
            print(f"Iteration: {i:03d} | Loss: {loss:.4f} | Accuracy: {acc:.2f}%")
            loss_arr.append(loss.item())
            acc_arr.append(acc)

    return coeffs, loss_arr, acc_arr

coeffs, loss_arr, acc_arr = train_model(t_dep, t_indep)

Iteration: 299000 | Loss: 0.6482 | Accuracy: 65.83%

Showing the results:

import matplotlib.pyplot as plt

plt.plot(range(300), loss_arr)
plt.xlabel("Iterations (in thousands)")
plt.ylabel("Loss")
plt.title("Variation of Loss During Training")
plt.show()

Output:

plt.plot(range(300), acc_arr)
plt.xlabel("Iterations (in thousands)")
plt.ylabel("Accuracy")
plt.title("Variation of Accuracy During Training")
plt.show()

Output:

This Was PyTorch


Creating a Twitter Sentiment Analysis NLP Model For Video Games

Ziad Alezzi — Tue, 02 Sep 2025 08:03:00 +0000

Before we jump into the data and code, I want to first clear up what "Sentiment Analysis" even IS. If you don't know what it means, it feels like fancy jargon. If you know what it means, it feels as if it's useless.

So to clear up:

What is Sentiment Analysis?

It means using a Language Model to analyze text (usually reviews) and deduce whether the message is positive or negative.

Why use Sentiment Analysis?

It allows you to gain insight on public opinion or customer feedback on a large scale. Super useful for when you have a product you're trying to sell.

Writing a Sentiment Model using Transformers

Here's an overview of all the libraries used in this implementation:

import pandas as pd
from datasets import Dataset, DatasetDict
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import Trainer, TrainingArguments

Before we get to any code, we need to understand the magical data that will allow us to fine tune a language model to do what we want.

The data consists of Twitter posts discussing different video games. Some of them are happy (they loved the game), some were pissed (they hated the game), and some were neutral.

Here's an example of two rows of data:

Phew! One person's such a fan they made a whole wallpaper, while the other's day is flat out ruined!!

I always LOVE using Excel to visualize my data and play around with it. Some labels in this dataset were "Irrelevant", so I used Excel's find feature to find and delete all rows with that label. I also used Excel to convert the labels from text to numbers (Negative --> -1 | Neutral --> 0 | Positive --> 1)
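One caveat if you try this yourself: classification losses generally expect class labels to be indices running from 0 to num_labels - 1, so a label of -1 may need to be shifted before training. The post doesn't show whether that happened, so the mapping below is only a sketch with hypothetical names and made-up rows.

```python
# Hypothetical mapping from sentiment names to class indices 0..2
LABEL_TO_ID = {"Negative": 0, "Neutral": 1, "Positive": 2}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}

raw_labels = ["Positive", "Irrelevant", "Negative", "Neutral"]  # made-up rows

# Drop "Irrelevant" rows and convert the rest to integers
kept = [label for label in raw_labels if label in LABEL_TO_ID]
encoded = [LABEL_TO_ID[label] for label in kept]
print(encoded)  # [2, 0, 1]
```

Keeping the reverse dictionary around makes it easy to turn the model's numeric predictions back into readable sentiment names later.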

So the data contains 3 columns:

Game (The name of the game that the Twitter post is referring to)
Text (The content of the Twitter post)
Labels (The sentiment: can be Positive, Negative or Neutral)

Using the pandas library we're able to turn a csv file into a DataFrame.

Code:

df = pd.read_csv('data/training.csv').dropna()
df.describe(include=object)

Output:

	Game	Text
count	61121	61121
unique	32	57294
top	TomClancysGhostRecon
freq	2297	139

With 60,000+ rows, this dataset is HUGE!! Which is perfect for an NLP task, but not so much for my poor GPU. So in training I instead took a smaller subset of 5,000 shuffled rows.

Next, we should make an extra column in the df that'd be the actual input to the model.

df['input'] = 'Game: ' + df.Game + '; Text: ' + df.Text + ';'

This allows you to format your input like this:
'Game: Borderlands; Text: I am coming to the borders and I will kill you all,;'

Of course, Transformers takes a Dataset object as input, not a DataFrame.

--> ds = Dataset.from_pandas(df)

Dataset({
    features: ['Game', 'labels', 'Text', 'input', '__index_level_0__'],
    num_rows: 61121
})

Using a pretrained model

Every deep learning model is simply just a big fancy math function. And of course, you can't do math on text. (spoilers: YOU NEED NUMBERS!)

So how do you convert text to numbers?
Using tokenization of course! (ah.. What's that?)
Tokenization means cutting up text into smaller, more digestible pieces.

This could mean cutting up the text by words (each token is a word), or cutting it into sub-words.

After Tokenization, there is Numericalization (yay more jargon)

Numericalization is turning those tokens into numbers. It does this with a really big dictionary of tokens (the vocabulary), where the number representing each word/token is its index in the dictionary.
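Both steps can be sketched in a few lines of plain Python. This toy version splits on spaces and builds its own tiny vocabulary from two made-up posts. Real tokenizers like the one used below work on sub-words and ship with a fixed vocabulary, so treat this purely as an illustration of the idea.

```python
def tokenize(text):
    """Toy tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()

def build_vocab(texts):
    """Assign each distinct token the next free index."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab

def numericalize(text, vocab):
    """Replace each token with its index in the vocabulary."""
    return [vocab[token] for token in tokenize(text)]

corpus = ["I love this game", "I hate this game"]  # made-up posts
vocab = build_vocab(corpus)
print(vocab)
print(numericalize("I hate this game", vocab))  # [0, 4, 2, 3]
```

The same word always maps to the same number, which is exactly the property we'll check on the real tokenizer further down.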

For this NLP task, we'll use a pretrained Language Model by Microsoft called DeBERTa.

Here I used the xsmall model, which contains 22 million parameters. I tried using the large (300 million parameters) and base (100+ million parameters) models but kept getting CUDA: Out Of Memory

tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-xsmall')

Here we used AutoTokenizer from the Transformers library. It allowed us to get the tokenizer that matches the pretrained model. Here's an example of what a tokenized sentence looks like:

--> tokz.tokenize(df['input'][1])

['▁Game',
 ':',
 '▁Borderlands',
 ';',
 '▁Text',
 ':',
 '▁I',
 '▁am',
 '▁coming',
 '▁to',
 '▁the',
 '▁borders',
 '▁and',
 '▁I',
 '▁will',
 '▁kill',
 '▁you',
 '▁all',
 ',',
 ';']

Now we write a basic tokenization function and map it to the entire dataset.

def tok_func(x): return tokz(x["input"], truncation=True, padding=True, max_length=512)
tok_ds = ds.map(tok_func, batched=True)

This adds new columns to the dataset, including input_ids.
Comparing input with input_ids will show you how tokenization really works:

--> tok_ds[1]["input"], tok_ds[1]["input_ids"][0:10]

('Game: Borderlands; Text: I am coming to the borders and I will kill you all;'
 [3179, 294, 72459, 346, 7655, 294, 273, 481, 882])

Observing an Input alongside its corresponding ID, something becomes clear

Focus on the first word in the input, "Game", and the first number in the input_ids, 3179.
Hmmm, now let's look up the vocabulary for the tokenized word "Game"..

--> tokz.vocab[tokz.tokenize("Game")[0]]

3179

I hope you just had your very own Eureka moment.

For the last step before actually training the model, we'll split our data into training and test sets.

--> dds = tok_ds.train_test_split(0.25, seed=42)

DatasetDict({
    train: Dataset({
        features: ['Game', 'labels', 'Text', 'input', '__index_level_0__', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 45840
    })
    test: Dataset({
        features: ['Game', 'labels', 'Text', 'input', '__index_level_0__', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 15281
    })
})

Creating the Training Arguments, and the Trainer

Now's where it gets serious, get ready for a lot of jargon!!

batch_size = 16
epochs = 4
lr = 1e-4

arg = TrainingArguments(
    'outputs',
    learning_rate=lr,
    warmup_ratio=0.1,
    lr_scheduler_type='cosine',
    fp16=True,
    evaluation_strategy='epoch',
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size*2,
    num_train_epochs=epochs,
    weight_decay=0.01,
    report_to='none'
)

model = AutoModelForSequenceClassification.from_pretrained('microsoft/deberta-v3-xsmall', num_labels=3)

trainer = Trainer(
    model,
    arg,
    train_dataset=dds['train'].shuffle(seed=42).select(range(5000)),
    eval_dataset=dds['test'].shuffle(seed=42).select(range(1000)),
    tokenizer=tokz
)

trainer.train()

A lot of this is boilerplate code, so to understand what's going on, think of it as 3 layers:

Creating the Training Arguments (The batch size, learning rate, and number of epochs are the only things you really adjust. The rest is mostly boilerplate)
Creating the Model (Using the Transformers library, we load the Microsoft model into a variable)
Creating the Trainer (Simply pass in the Model variable, Training Arguments, Datasets, and the tokenizer tokz from earlier)

Epoch	Training Loss	Validation Loss
1	No log	0.859168
2	0.894000	0.744544
3	0.894000	0.714117
4	0.588000	0.779513

Of course, if I had let it train a bit longer it would probably have performed better, but I'm content with this for experimental purposes.
Our model got a total Accuracy of 74% :D

For fun, I decided to make an Excel sheet showing the Twitter posts alongside my model's prediction and the actual label. Let's have some fun with these!

A lot of the time my model was spot on

Sometimes my model was actually smarter than the mislabeled data!

I barely understand this Twitter user.. But it's so obvious he's damn pissed!

Of course, my model also made some mistakes (awww my poor baby's learning)

But cmon man.. How is THAT positive?!!

Conclusion

NLP is a field of Deep Learning with uses in almost any field! In fact, if you've heard a little about "ChatGPT" you'll realize that NLP has already taken over your entire life ;]

The "GPT" in ChatGPT itself stands for "Generative Pretrained Transformer".
It's a Pretrained model that Generates text using Transformers (No mom, not the "Bumblebee" transformers)

They all work the same way, with the only difference being the layout of your data (and how many GPUs you have..)

Thank you for reading into my nerdy little program, and as a man from CS50 once said:

This Was NLP

lucirie (Ziad Alezzi) · GitHub

lucirie has 27 repositories available. Follow their code on GitHub.

github.com

Answering Some Common Questions In Deep Learning Foundations

Ziad Alezzi — Fri, 15 Aug 2025 08:29:01 +0000

Based On FastAI Book Chapter 4 Questionnaire

At the end of every FastAI Chapter, there's a series of questions to encourage further research. Even without taking FastAI, some of these answers may surprise you!

Q1: How is a grayscale image represented on a computer? How about a color image?

A1: A grayscale image, for computer vision models, is represented as a matrix of values ranging from 0 to 255. Each element in the matrix represents a pixel, and its number shows its color (pixel intensity value).
The closer to 0, the darker the pixel
The closer to 255, the lighter the pixel

Here's an example:

Can you guess what number that is? ;]

To explain how a color image is represented on a computer, we first need to understand how a color image even works. Any color can be made by mixing shades of Red, Green, and Blue. That's why a pixel has those three colors. Therefore, each of a color image's pixels is simply a combination of three numbers representing how much Red, Green, or Blue is in that pixel.

Now how is it represented in computer vision? Well, you might have a hint already. It is represented not as one matrix, but as 3 matrices stacked on top of each other. The first matrix could represent how much red is in the image, the second green, and the third blue. Again, each element of each matrix contains a number from 0-255 for the intensity.

Here's an example:
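In code, a tiny 2×2 color image could be sketched like this. The pixel values are made up, and I'm assuming the usual red, green, blue channel order:

```python
# Three 2x2 matrices, one per color channel (values 0-255, made up).
red   = [[255,   0], [  0, 255]]
green = [[  0, 255], [  0, 255]]
blue  = [[  0,   0], [255, 255]]

image = [red, green, blue]  # 3 channels x 2 rows x 2 columns

def pixel(img, row, col):
    """Return the (R, G, B) intensities of one pixel."""
    return tuple(channel[row][col] for channel in img)

print(pixel(image, 0, 0))  # (255, 0, 0)     -> pure red
print(pixel(image, 1, 1))  # (255, 255, 255) -> white
```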

Q2: Explain how the "pixel similarity" approach to classifying digits works.

In the book, instead of taking the traditional approach to computer vision (Training Data --> Train), a simpler method was used to introduce how computer vision works.

They took hundreds of grayscale images of, say, the number 3, and for each pixel location they averaged the pixel intensities at that location across every image to construct a new image. This new image is kinda like an "Average" of every other image.
The "classification" was just comparing the pixel intensities in the input image to the pixel intensities in each digit's "Average" image, and picking the closest one.
So there are no parameters, and no optimization.
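Here's a rough sketch of that idea. The 2×2 "images" are made-up stand-ins for real 28×28 digits, and I'm assuming the mean absolute difference as the way to compare two images:

```python
import torch

# Made-up 2x2 "images" standing in for 28x28 digit images.
threes = torch.tensor([[[0., 200.], [0., 220.]],
                       [[0., 240.], [0., 180.]]])
sevens = torch.tensor([[[250., 250.], [0., 0.]],
                       [[230., 210.], [0., 0.]]])

# The "average" image: mean over the image axis, per pixel location.
mean3 = threes.mean(dim=0)
mean7 = sevens.mean(dim=0)

def distance(a, b):
    """Mean absolute difference between two images."""
    return (a - b).abs().mean()

def is_three(img):
    """Classify by whichever average image is closer."""
    return bool(distance(img, mean3) < distance(img, mean7))

new_img = torch.tensor([[0., 210.], [0., 190.]])
print(is_three(new_img))  # True
```

Nothing here is learned: the only "training" is taking an average.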

Q3: What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.

A list comprehension is a concise way to iterate over data and build a new list from it.
While this would usually require a couple of lines of code, a list comprehension is like merging a for loop and append() into one line.
Here's how it looks:

doubled_list = [i*2 for i in numbers_array]

You can optionally add an if statement at the end to filter which items are kept.
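For the actual question (select the odd numbers, then double them), a minimal version with a made-up list could look like this:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Keep only the odd numbers, then double each one.
doubled_odds = [n * 2 for n in numbers if n % 2 == 1]
print(doubled_odds)  # [2, 6, 10]
```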

Q4: What is a "rank-3 tensor"?

A 3D cube.
Rank 1: Vector
Rank 2: Matrix
Rank 3: Cube

Q5: What is the difference between tensor rank and shape? How do you get the rank from the shape?

A tensor's rank is simply its number of dimensions (axes). An image, as previously stated, is a matrix of pixel intensities. A grayscale image is 2-dimensional (2D), thus it is rank-2.
A shape is the number of elements along each axis of the data. In a matrix, that'd be its rows and columns.
The rank is the number of axes in the shape, in other words the length of the shape.
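A quick way to check this in PyTorch (the sizes below are just examples):

```python
import torch

vector = torch.zeros(5)          # rank 1
matrix = torch.zeros(28, 28)     # rank 2, like a grayscale image
cube   = torch.zeros(3, 28, 28)  # rank 3, like three stacked matrices

for t in (vector, matrix, cube):
    # The rank is the length of the shape; ndim reports the same number.
    print(tuple(t.shape), len(t.shape), t.ndim)
```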

Q6: What are RMSE and L1 norm?

Both of these measure how far the model's predictions are from the labels (lower is better). RMSE (Root Mean Square Error) is the square root of the regular MSE (mean squared error), which is the prediction of the model minus the actual label, squared, then averaged over all elements.
((Pred - Label)**2).mean().sqrt()

L1 norm, also called the mean absolute difference, is the average of the absolute differences between the prediction and the label. Both are useful for telling how "off" the model is; RMSE punishes large mistakes more heavily.
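Both fit in one line each in PyTorch. The predictions and labels below are made up:

```python
import torch

preds  = torch.tensor([2.0, 4.0, 6.0])
labels = torch.tensor([1.0, 4.0, 9.0])

diff = preds - labels             # tensor([ 1.,  0., -3.])

l1   = diff.abs().mean()          # mean absolute difference
rmse = (diff ** 2).mean().sqrt()  # root of the mean squared difference

print(l1.item(), rmse.item())
```

RMSE comes out larger than L1 here because squaring makes the one big miss (off by 3) count for more.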

Q7: How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?

Using parallel computation (it's as it sounds), ideally on a GPU. For this, it's best not to use Python loops, and to always go for built-in alternatives like the ones in PyTorch, since they're implemented in C/CUDA and can use GPU acceleration.
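As a small sketch of the difference (the calculation itself, times 2 plus 1, is arbitrary), here is the same work done with a Python loop and with one vectorized call:

```python
import torch

values = torch.arange(10_000, dtype=torch.float32)

# Slow: one Python-level operation per element.
looped = torch.tensor([v * 2 + 1 for v in values.tolist()])

# Fast: one vectorized call that runs in compiled code.
vectorized = values * 2 + 1

print(torch.equal(looped, vectorized))

# Move to the GPU only if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
on_device = (values.to(device) * 2 + 1).cpu()
```

Both give the same numbers; only the vectorized version avoids running Python code for every element.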

Q8: Create a 3×3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom-right four numbers.

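import torch

# torch.tensor takes one nested list, not three separate lists
tensor_numbers = torch.tensor([[1, 2, 3],
                               [4, 5, 6],
                               [7, 8, 9]])
tensor_numbers * 2        # doubled copy; tensor_numbers itself is unchanged
tensor_numbers[-2:, -2:]  # bottom-right 2x2 block of the original

[-2:, -2:] means the final 2 rows and final 2 columns, which selects the numbers [5, 6, 8, 9] (or [10, 12, 16, 18] if you index the doubled tensor instead)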

Q9: What is broadcasting?

When we did tensor_numbers * 2 on the 3×3 matrix, what PyTorch did was treat the number 2 as if it were a 3×3 matrix of twos.

2 --> [2, 2, 2]
      [2, 2, 2]
      [2, 2, 2]

That's broadcasting in a nutshell.
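You can make that stretching visible with expand_as. This is only a sketch of how broadcasting behaves; PyTorch doesn't need to build a real copy of the twos in memory:

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# What broadcasting behaves like: the scalar 2 stretched to t's shape.
twos = torch.tensor(2).expand_as(t)

print(twos.shape)                    # torch.Size([3, 3])
print(torch.equal(t * 2, t * twos))  # True
```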

Q10: Are metrics generally calculated using the training set, or the validation set? Why?

Metrics are calculated using the validation set, to avoid falsely thinking the model would do well on new data when it's actually just overfitting the training set.

Q11: What is SGD?

Stochastic Gradient Descent. A very simple optimization method used to update the parameters of a model according to a loss function.

Q12: Why does SGD use mini-batches?

During training, you can use a batch the size of the training set or a size of 1, or in-between.

What's the difference? Well using the entire training set would take very long to do 1 single step of gradient descent. Sometimes an epoch (training on entire dataset) could take weeks or months. This is impractical for any changes you might want to do (changing learning rate, finding mislabeled data, testing different optimization techniques or model architectures) since you'd have to wait a whole month for every test.

Training on a batch of size 1 (a single training example) gives too little information to estimate the gradient properly, so the updates are noisy and inconsistent and the model may take longer to converge (successfully train)

Thus, it's best to go in the middle, and use mini-batches of the training data.

Q13: What are the seven steps in SGD for machine learning?

Initialize Parameters --> Select Mini-Batch --> Compute Predictions --> Compute the Loss --> Compute Gradients --> Update Parameters --> Repeat From Step 2
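The steps above can be sketched as a tiny training loop. Everything here is a toy assumption: the data follows y = 3x + 2, the model is a single weight and bias, and the learning rate and batch size are picked by hand.

```python
import torch

torch.manual_seed(0)

# Made-up data following y = 3x + 2.
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 2

# Step 1: initialize the parameters.
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr, batch_size = 0.1, 10

for epoch in range(50):
    perm = torch.randperm(len(x))
    for i in range(0, len(x), batch_size):
        idx = perm[i:i + batch_size]           # Step 2: select a mini-batch
        preds = x[idx] * w + b                 # Step 3: compute predictions
        loss = ((preds - y[idx]) ** 2).mean()  # Step 4: compute the loss
        loss.backward()                        # Step 5: compute gradients
        with torch.no_grad():                  # Step 6: update parameters
            w -= lr * w.grad
            b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
        # Step 7: repeat from step 2

print(w.item(), b.item())
```

After training, w and b should land close to the 3 and 2 used to generate the data.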

Q14: How do we initialize the weights in a model?

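import torch
import torch.nn as nn

# nn.Linear(in_features, out_features) starts with random weights and bias
linear_layer = nn.Linear(28*28, 1)  # example shape: a 28x28 image in, 1 output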

Q15: Why can't we always use a high learning rate?

This image will explain:
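The same idea in numbers, as a toy sketch: gradient descent on f(x) = x**2, once with a small learning rate and once with one that is too high. The function and both learning rates are chosen just for illustration.

```python
def descend(lr, steps=20, start=1.0):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = start
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(abs(descend(lr=0.1)))  # shrinks toward the minimum at 0
print(abs(descend(lr=1.1)))  # grows: every step overshoots further
```

With the high learning rate each step jumps past the minimum and lands further away than where it started, so the loss goes up instead of down.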

Q16: Do you need to know how to calculate gradients yourself?

If you're using NumPy, yes, you'll have to do the math yourself. If you're using PyTorch then no: PyTorch tracks gradients automatically when you pass requires_grad=True while creating a tensor. You then perform some calculation (for example, a loss function) and call .backward() on the result, like:

x = torch.tensor(arr, requires_grad=True)  # arr must contain floats
loss = ((x - y) ** 2).mean()  # backward() needs a single number
loss.backward()

You can access the gradients using x.grad and zero them with x.grad.zero_()

Q17: Why can't we use accuracy as a loss function?

Accuracy shows the performance of the model to a human; a loss function shows the performance to a computer.
Accuracy only changes when a prediction flips from wrong to right (or back). The loss can go from 0.1 to 0.2 (model getting worse), but the accuracy would still show as 60% (for example).
Thus, the gradient of accuracy is zero almost everywhere, and the model would be stuck.

Basically: The loss is a finer and more precise measure of performance
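Here's a toy sketch of that, with made-up data and a classifier that has a single weight w. Nudging w changes the loss, but the accuracy doesn't move at all:

```python
import math

# Made-up inputs and labels for a one-weight classifier.
xs     = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def accuracy(w):
    """Fraction of examples where the thresholded prediction matches."""
    preds = [1 if sigmoid(w * x) > 0.5 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(xs)

def loss(w):
    """Binary cross-entropy averaged over the examples."""
    total = 0.0
    for x, y in zip(xs, labels):
        p = sigmoid(w * x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

for w in (0.5, 0.6):
    print(w, accuracy(w), round(loss(w), 4))
```

Since accuracy is identical for both values of w, it gives gradient descent no hint about which direction is better. The loss does.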

Q18: What does the `DataLoader` class do?

Say you have some data: data = [[1, 2], [4, 5], [6, 7], [8, 9]]
A DataLoader class will give you an easy way to organize this into a dataset for your model. Offering features like mini-batches and shuffling the data.

data_set = DataLoader(data, batch_size=2, shuffle=True)

Q19: Create a function that, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?

Alright, so first I wanna show you a bit more of what the DataLoader looks like:

data = range(10)
loader = DataLoader(data, batch_size=5)
print(list(loader))

--> [tensor([0, 1, 2, 3, 4]), 
     tensor([5, 6, 7, 8, 9]),]

That's the basic idea of a dataloader. However when training a model, we also need labels for the data (x = data, y = labels)
Therefore we introduce a very basic dataset, which is basically just a collection of tuples:

dataset = L(enumerate(string.ascii_lowercase))
print(dataset)

--> [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g')...

Now, passing this into the DataLoader, we get mini-batches of these tuples with their labels attached

data_loader = DataLoader(dataset, batch_size=3)
print(list(data_loader))

--> [(tensor([0, 1, 2]), ('a', 'b', 'c')), 
     (tensor([3, 4, 5]), ('d', 'e', 'f')), 
     (tensor([6, 7, 8]), ('g', 'h', 'i')),]

What's special about this, is well, we have a functioning paired dataset now.
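As for the function the question asks for, a minimal version could use Python's built-in zip (the name pair_up is just my choice):

```python
def pair_up(xs, ys):
    """Pair the i-th item of xs with the i-th item of ys."""
    return list(zip(xs, ys))

pairs = pair_up([1, 2, 3, 4], 'abcd')
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

# Each tuple is one (input, label) example, so the list indexes like a dataset.
x, y = pairs[0]
```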

Q20: What does `view` do in PyTorch?

view allows you to reshape your tensors.

Example:

tensors = torch.arange(6)  # tensor([0, 1, 2, 3, 4, 5])
tensors.shape  # torch.Size([6])

tensors.view(2, 3) # Reshaping to 2 rows and 3 columns

Output:

tensor([[0, 1, 2],
        [3, 4, 5]])

Q21: Why do we have to zero the gradients?

In PyTorch, gradients saved in .grad are not overwritten but rather accumulated.
Meaning that the gradients from training step 1 will be added to the gradients from training step 2.

You might ask why? Well, it's because your GPU only has a limited amount of memory.
Another reason mini-batches exist is that sometimes you don't have enough RAM/VRAM to process the entire dataset at once. Thus you break it down into mini-batches.
However, you can simulate a larger batch if you let the gradients from several mini-batches accumulate before taking one large training step.

But what if you don't want to simulate a batch larger than your GPU can handle? Well, in that case you must zero your gradients after every training step using .zero_grad()
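You can watch the accumulation happen with a tiny tensor. The values are made up, and the "loss" is just sum(x**2) so the gradient is easy to check by hand:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

(x ** 2).sum().backward()
first = x.grad.clone()        # gradient of sum(x**2) is 2x

(x ** 2).sum().backward()     # no zeroing in between
accumulated = x.grad.clone()  # the new gradient was added to the old one

x.grad.zero_()                # reset before the next step
print(first, accumulated, x.grad)
```

Calling backward twice without zeroing leaves double the gradient in x.grad, which is exactly what you don't want in a normal training step.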

Q22: What information do we have to pass to `Learner`?

To create a Learner that allows you to call easy functions like .fit(), you must pass the following information:

Your DataLoader
Your Model Architecture (activation)
Your Optimizer (Gradient Descent)
Your Loss Function
Optionally: A Metric

Here's an example:

learn = Learner(dataloader, nn.Linear(28*28,1), opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)

Each of these was defined beforehand.

We can then easily run:

learn.fit(10, lr=0.01)

10 means how many epochs/training rounds, lr means the learning rate.

Q23: The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?

Wowie, praise the jargon. Let's break down this mouthful.
The Universal Approximation Theorem (how fancy) roughly states: "You can approximate any function using just one reaaally big layer and a non-linearity"

Basically, instead of having a neural network that's 100 layers deep, you'd have a neural network with just 1 really big layer with a lot of neurons/units, and a non-linearity (sigmoid, tanh, etc)

Here's the issue with this: it is EXTREMELY inefficient. To "approximate" (solve the problem), it might need millions of neurons whereas a deeper neural network can solve the same problem with far fewer neurons.
Especially in Computer Vision, where each layer detects different features, with more complex features the deeper you go.
Here's an example of the different features that a deep neural network with multiple layers can learn:

As you can see, the deeper the neural network goes, the more complex features you can extract.

So yes, a neural network with 1 absurdly big layer can in theory approximate any function, but it is so inefficient that you're better off just slapping in a couple more layers.

It'll save you time, space, and sanity.

Conclusion

Thank you so much for reading this far into my nerdy little blog!! At the end of every fastai chapter, there's a questionnaire. So I decided to solve it in a blog as maybe it can help teach or simply intrigue someone out there!!

As a wise man from CS50 once said: "This Was Deep Learning"

Check out my github, where you'll see many more of my nerdy projects. Like the time I made a Cancer Classification Neural Network from scratch with just Numpy and Math! ;]

lucirie (Ziad Alezzi) · GitHub

lucirie has 27 repositories available. Follow their code on GitHub.

github.com

How Milliseconds Become Hours at Scale

Ziad Alezzi — Sun, 10 Aug 2025 10:40:15 +0000

In the first 30 minutes of Lesson 3 of fast.ai's course, Jeremy Howard introduced a neat graph he has showing how different models vary by accuracy and speed:

With the x-axis being the time taken by the model and the y-axis being its accuracy, you want a balance: as far left on the x-axis and as high on the y-axis as possible.

However, I thought: "Why not just use the most accurate?"

If the difference in time taken is only milliseconds, why bother considering it that much?

So I conducted an experiment where I trained 3 models on the same data. Only difference is their architecture. The 3 models are:

Control: A balanced model (LeViT 192)
Accuracy Only: The highest accuracy, regardless of time (BEiT Large Patch)
Speed Only: The fastest model, regardless of accuracy (LeViT 128S)

My hypothesis is that the Accuracy Only model will have the most advantages of the three. The true test is how well each one scales.

Step 1: The Data

For the training data, I decided to use images of my orange cat "Fluffy" and images of random cats from the internet to make a "Fluffy Classifier".
Here's a sample of the training data:

path = Path('fluffy or not')
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(224, method='squish')]
).dataloaders(path, bs=6)
dls.show_batch(max_n=6)

Step 2: The Models

To test the models without much code, I'll use timm (import timm), which lets me look up the full names of these models, and the high-level deep learning library fast.ai (from fastai.vision.all import *).

With those two, I will be able to easily implement these models with just 2 lines of code.

First, getting the model names with the timm module:

--> timm.list_models('Levit*')

['levit_128',
 'levit_128s',
 'levit_192',
 'levit_256',
 'levit_256d',
 'levit_384',
 'levit_384_s8',
 'levit_512'
 ...]

Then we take the name levit_192, put it into a vision learner, and train it, completing our implementation in just 2 lines of code:

(all models are trained on the same data, for 3 epochs)

control_learn = vision_learner(dls, 'levit_192', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
control_learn.fine_tune(3)

accuracy_learn = vision_learner(dls, 'beit_large_patch16_512', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
accuracy_learn.fine_tune(3)

speed_learn = vision_learner(dls, 'levit_128s', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
speed_learn.fine_tune(3)

Step 3: Comparing and Concluding!!

Now that we've finished training all 3 models, it's time to test them!! Here are a couple of tests I wanna try:

Compare their accuracy on the dev set
Compare their accuracy on a picture of Fluffy (my cat)
Compare their speed on a picture of Fluffy
Compare their speed when run 100 times in a row

First up is their accuracy on the dev set!!
While training, fast.ai automatically separated the dataset into a training and a development set, with 20% going to the dev set, as set in the DataBlock: splitter=RandomSplitter(valid_pct=0.2, seed=42)

The Control Model:

epoch	train_loss	valid_loss	error_rate	time
0	0.615277	1.333486	0.625000	00:05
1	0.548994	0.619186	0.375000	00:05
2	0.515302	0.512149	0.375000	00:05
Final Accuracy	62.5%

The Accuracy Model:

epoch	train_loss	valid_loss	error_rate	time
0	0.133967	0.564495	0.250000	04:18
1	0.146496	0.214864	0.250000	04:00
2	0.149206	0.329258	0.250000	03:58
Final Accuracy	75.0%

The Speed Model:

epoch	train_loss	valid_loss	error_rate	time
0	0.535390	0.778340	0.375000	00:07
1	0.538943	0.307765	0.250000	00:07
2	0.469959	0.237307	0.125000	00:06
Final Accuracy	87.5%

Hmmm wait.. Why is the Accuracy Model, which is supposed to outperform the Speed Model, doing so much worse?
The Accuracy Model took so much VRAM that I got an OutOfMemoryError: CUDA error. While the other models only took ~20 seconds to train, this one took 15 minutes!!
So.. why is it so much worse?
Well, this is something I learned while writing this blog. Picture the BEiT Large Patch model (Accuracy Model) as a jack-of-all-trades that does a large number of complex tasks like Semantic Segmentation. Semantic segmentation is a computer vision technique that assigns a class label to each pixel in an image. For example, self-driving cars use Semantic Segmentation:

Whereas the LeViT model (a hybrid convolution and transformer model) focuses on just one skill: Image Classification. Thus it outperforms it in this specific task.

So I decided to go for the next best, using the "BEiT Base Patch" model I got these results:

epoch	train_loss	valid_loss	error_rate	time
0	0.248595	0.017972	0.000000	00:05
1	0.163321	0.006299	0.000000	00:05
2	0.120323	0.007245	0.000000	00:05
Final Accuracy	100%

Wow!! 100% accuracy, I'll always be able to find Fluffy this way >:D

Next I'm going to test all these models with the same picture of my cat Fluffy, and see:

How long does it take to classify
How confident is the model

Here's the image I timed and used for all 3 models: (What a good table)

Model	Prediction	Confidence	Time (ms)
Control	fluffy	76.20%	65.83
Accuracy	fluffy	90.91%	170.16
Speed	fluffy	93.56%	42.95

Now this is very interesting. What I have to admit is how shocked I was with the Speed Model. I mean, it outperformed the Control and the Accuracy models in every field!! It had the lowest runtime at only 42.95 ms and was most confident at 93.56%
The Accuracy model, while yes it was indeed "Accurate", it took WAY too damn long at 170 ms!! Which is baffling.
The Control model was average all the way around.

Now, how would it scale? Let's test it with:

100 users
10,000 users
100,000 users

This is very simple multiplication, and it will show you how even the slightest difference in milliseconds can have a much larger effect when scaled.

Model	100 users	10,000 users	100,000 users
Control	6.58 sec	10.97 min	1.83 hr
Accuracy	17.02 sec	28.36 min	4.73 hr
Speed	4.30 sec	7.16 min	1.19 hr
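The numbers in this table can be reproduced with a few lines of arithmetic. This sketch assumes the simplest possible setup, where requests are handled one after another with no batching or parallelism:

```python
# Per-image latency in milliseconds, taken from the timing table earlier.
latency_ms = {"Control": 65.83, "Accuracy": 170.16, "Speed": 42.95}

def total_seconds(ms_per_request, users):
    """Total time if requests are handled one after another."""
    return ms_per_request * users / 1000

for name, ms in latency_ms.items():
    sec = total_seconds(ms, 100)
    minutes = total_seconds(ms, 10_000) / 60
    hours = total_seconds(ms, 100_000) / 3600
    print(f"{name}: {sec:.2f} sec, {minutes:.2f} min, {hours:.2f} hr")
```

A real service would process many requests in parallel, so treat these as a worst case that shows how the gap between models grows.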

Now you understand why Scalability is such an important matter to consider. Because when you're working on a level close to OpenAI, serving hundreds of thousands of users all at once, you need speed.
This also shows you the downside of the Accuracy model. While a ~100 ms difference doesn't seem that bad, when scaled to 100,000 users it makes a difference of around 3 hours!! Imagine your users waiting 3 hours just to classify their Fluffy images. Personally, I'd be pissed.

Conclusion:

As a great man at CS50 once said: "This was-" Optimization

From this we can learn that a model's accuracy isn't tied to how long it takes to run; it depends on the model. You can find a model that takes half as long as others but does twice as well. As we innovate, and the deep learning community creates newer models, we will get better results with even more optimized models.

Thank you for reading my nerdy little article! I'm currently progressing through the fast.ai course, and wanted to practice a little with a question I had in mind.

Check out my github to see more of my nerdy projects, like the time I made a Neural Network to classify cancer using just Numpy and Math!! >:D

lucirie (Ziad Alezzi) · GitHub

lucirie has 27 repositories available. Follow their code on GitHub.

github.com
