In the first 30 minutes of Lesson 3 of fast.ai's course, Jeremy Howard introduces a neat graph showing how different models vary in accuracy and speed:
The x-axis is the time the model takes and the y-axis is its accuracy, so ideally you want a balance: a model that sits as far to the left and as far up as possible.
However, I thought: "Why not just use the most accurate?"
If the difference in time taken is just a few microseconds, why bother considering it that much?
So I conducted an experiment: I trained 3 models on the same data, the only difference being their architecture. The 3 models are:
- Control: A balanced model (LeViT-192)
- Accuracy Only: The most accurate model, regardless of speed (BEiT Large Patch16)
- Speed Only: The fastest model, regardless of accuracy (LeViT-128S)
My hypothesis is that the Accuracy Only model will come out ahead overall. And the true test is how well they scale.
Step 1: The Data
For the training data, I decided to use images of my orange cat "Fluffy" and images of random cats from the internet to make a "Fluffy Classifier".
Here's a sample of the training data:
from fastai.vision.all import *  # high-level library used throughout (provides Path, DataBlock, etc.)

path = Path('fluffy or not')
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),                 # image in, category label out
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),    # 80/20 train/dev split
    get_y=parent_label,                                 # label = name of the parent folder
    item_tfms=[Resize(224, method='squish')]            # squish everything to 224x224
).dataloaders(path, bs=6)
dls.show_batch(max_n=6)                                 # show a sample of the training data
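Quick sanity check: parent_label means the class comes from each image's parent folder, so the folder names below are my assumption about the layout rather than something shown above.
# Assumed folder layout (illustrative names):
#   fluffy or not/
#       fluffy/        <- photos of Fluffy
#       other_cats/    <- random cats from the internet
print(dls.vocab)                              # the category labels fastai found
print(len(dls.train_ds), len(dls.valid_ds))   # sizes of the 80/20 split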
Step 2: The Models
To test out the models without much code, I will use two things: the timm library (import timm), which lets me look up the exact model names, and a high-level deep-learning library called fast.ai (from fastai.vision.all import *).
With those two, I will be able to implement each model with just 2 lines of code.
First, getting the model names with the timm module:
--> timm.list_models('levit*')
['levit_128',
'levit_128s',
'levit_192',
'levit_256',
'levit_256d',
'levit_384',
'levit_384_s8',
'levit_512'
...]
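The same lookup works for the other architectures too. For example (the exact output depends on your timm version, so this is just illustrative):
timm.list_models('beit*', pretrained=True)   # only BEiT variants that ship pretrained weights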
Then I take the name levit_192, put it into a vision learner, and train it, completing the implementation with just 2 lines of code (all models are trained on the same data, for 3 epochs):
# Control: LeViT-192, the balanced model
control_learn = vision_learner(dls, 'levit_192', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
control_learn.fine_tune(3)

# Accuracy Only: BEiT Large Patch16, the most accurate (and heaviest) model
accuracy_learn = vision_learner(dls, 'beit_large_patch16_512', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
accuracy_learn.fine_tune(3)

# Speed Only: LeViT-128S, the fastest model
speed_learn = vision_learner(dls, 'levit_128s', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
speed_learn.fine_tune(3)
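Before comparing results, here's a quick sanity check you could run to see how different these three architectures really are in size; it simply counts the parameters of the learners defined above:
# Compare model sizes via raw parameter counts
for name, learn in [('control', control_learn),
                    ('accuracy', accuracy_learn),
                    ('speed', speed_learn)]:
    n_params = sum(p.numel() for p in learn.model.parameters())
    print(f'{name}: {n_params / 1e6:.1f}M parameters')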
Step 3: Comparing and Concluding!!
Now that we've finished training all 3 models, it's time to test them!! Here are a couple of tests I wanna try:
- Compare their accuracy on the dev set
- Compare their accuracy on a picture of Fluffy (my cat)
- Compare their speed on a picture of Fluffy
- Compare their speed when run 100 times in a row
First up is their accuracy on the dev set!!
While training, fastai automatically separated the dataset into a training set and a development set, with 20% going to the dev set, as specified in the DataBlock: splitter=RandomSplitter(valid_pct=0.2, seed=42)
The Control Model:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.615277 | 1.333486 | 0.625000 | 00:05 |
1 | 0.548994 | 0.619186 | 0.375000 | 00:05 |
2 | 0.515302 | 0.512149 | 0.375000 | 00:05 |
Final Accuracy | 62.5% |
The Accuracy Model:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.133967 | 0.564495 | 0.250000 | 04:18 |
1 | 0.146496 | 0.214864 | 0.250000 | 04:00 |
2 | 0.149206 | 0.329258 | 0.250000 | 03:58 |
Final Accuracy | 75.0% |
The Speed Model:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.535390 | 0.778340 | 0.375000 | 00:07 |
1 | 0.538943 | 0.307765 | 0.250000 | 00:07 |
2 | 0.469959 | 0.237307 | 0.125000 | 00:06 |
Final Accuracy | 87.5% |
Hmmm wait.. why is the Accuracy Model, which is supposed to outperform the Speed Model, doing so much worse?
The Accuracy Model took so much VRAM that I got a CUDA OutOfMemoryError. And while the other models only took ~20 seconds to train, this one took 15 minutes!!
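In hindsight, there are a couple of standard fastai/PyTorch knobs for cutting GPU memory. I didn't verify them on this exact run, but they're the usual first things to try:
# Standard ways to reduce GPU memory use (untested on this exact run):
dls_small = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(224, method='squish')]
).dataloaders(path, bs=2)              # smaller batches -> less VRAM per step

accuracy_learn = vision_learner(dls_small, 'beit_large_patch16_512',
                                metrics=error_rate, pretrained=True, img_size=224)
accuracy_learn.to_fp16()               # mixed precision roughly halves activation memory
accuracy_learn.fine_tune(3)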
So.. why is it so much worse?
Well, this is something I learned while writing this blog. Picture the BEiT Large Patch model (the Accuracy Model) as a jack-of-all-trades that handles a large number of complex tasks, like semantic segmentation. Semantic segmentation is a computer vision technique that assigns a class label to each pixel in an image; self-driving cars, for example, use it to understand road scenes.
The LeViT model, on the other hand, focuses on just one skill: image classification (it's a hybrid of convolutions and attention built for fast inference). That's why it outperforms BEiT on this specific task.
So I decided to go for the next best thing and used the "BEiT Base Patch" model instead. I got these results:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.248595 | 0.017972 | 0.000000 | 00:05 |
1 | 0.163321 | 0.006299 | 0.000000 | 00:05 |
2 | 0.120323 | 0.007245 | 0.000000 | 00:05 |
Final Accuracy | 100% |
Wow!! 100% accuracy, I'll always be able to find Fluffy this way >:D
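For completeness, swapping models is just a matter of changing the model string. Assuming the 224-pixel base variant (the exact timm id below is my assumption; you can confirm it with timm.list_models('beit*')):
# 'beit_base_patch16_224' is my assumed timm id for the "BEiT Base Patch" model
accuracy_learn = vision_learner(dls, 'beit_base_patch16_224',
                                metrics=error_rate, pretrained=True, img_size=224)
accuracy_learn.fine_tune(3)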
Next, I'm going to test all three models on the same picture of my cat Fluffy, and see:
- How long it takes to classify
- How confident the model is
Here's the image I timed and used for all 3 models: (What a good table)
Model | Prediction | Confidence | Time (ms) |
---|---|---|---|
Control | fluffy | 76.20% | 65.83 |
Accuracy | fluffy | 90.91% | 170.16 |
Speed | fluffy | 93.56% | 42.95 |
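For reference, here's roughly how a per-prediction timing like this can be measured; the image path and helper function below are placeholders, not my exact notebook code:
import time

fluffy_img = PILImage.create('fluffy_test.jpg')    # placeholder path to the test photo

def time_predict(learn, img, n=1):
    # Average wall-clock time over n calls to fastai's learn.predict
    start = time.perf_counter()
    for _ in range(n):
        pred, idx, probs = learn.predict(img)
    elapsed_ms = (time.perf_counter() - start) / n * 1000
    print(f'{pred} ({probs[idx] * 100:.2f}% confident) in {elapsed_ms:.2f} ms')

time_predict(speed_learn, fluffy_img)              # one prediction
time_predict(speed_learn, fluffy_img, n=100)       # averaged over 100 runs in a row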
Now this is very interesting. I have to admit how shocked I was by the Speed Model. I mean, it outperformed both the Control and the Accuracy models across the board!! It had the lowest runtime at only 42.95 ms and was the most confident at 93.56%.
The Accuracy Model, while indeed "accurate", took WAY too damn long at 170 ms!! Which is baffling.
The Control Model was average all the way around.
Now, how would they scale? Let's test it with:
- 100 users
- 10,000 users
- 100,000 users
This is very simple multiplication, and it shows how even the slightest difference in milliseconds can have a much larger effect when scaled.
Model | 100 users | 10,000 users | 100,000 users |
---|---|---|---|
Control | 6.58 sec | 10.97 min | 1.83 hr |
Accuracy | 17.02 sec | 28.36 min | 4.73 hr |
Speed | 4.30 sec | 7.16 min | 1.19 hr |
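The numbers above are just the per-prediction time multiplied by the number of requests, assuming each request is handled one at a time (no batching, single machine); a quick sketch of the arithmetic:
# Per-prediction time (ms) x number of requests, served sequentially
times_ms = {'Control': 65.83, 'Accuracy': 170.16, 'Speed': 42.95}
for name, ms in times_ms.items():
    for users in (100, 10_000, 100_000):
        total_sec = ms * users / 1000
        print(f'{name}: {users:,} users -> {total_sec:,.2f} s '
              f'({total_sec / 60:.2f} min, {total_sec / 3600:.2f} hr)')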
Now you understand why scalability is such an important thing to consider. When you're working at a level close to OpenAI's, serving hundreds of thousands of users at once, you need speed.
This also shows the downside of the Accuracy Model. A ~130 ms difference per prediction doesn't seem that bad, but scaled to 100,000 users it becomes a difference of about 3.5 hours!! Imagine your users waiting 3.5 hours just to classify their Fluffy images. Personally, I'd be pissed.
Conclusion:
As a great man at CS50 once said: "This was Optimization"
From this we can learn that a model's accuracy isn't simply tied to how long it takes to run; it depends on the model and the task. You can find a model that takes half as long as the others yet performs twice as well. And as the deep learning community keeps innovating and releasing newer models, we'll get better results from even more optimized ones.
Thank you for reading my nerdy little article! I'm currently progressing through the fast.ai course and wanted to practice with a question I had in mind.
Check out my GitHub to see more of my nerdy projects, like the time I made a neural network to classify cancer using just NumPy and math!! >:D