Ziad Alezzi

How Milliseconds Become Hours at Scale

In the first 30 minutes of Lesson 3 of fast.ai's course, Jeremy Howard introduced a neat graph showing how different models vary in accuracy and speed:

Plot of different models varying in accuracy and speed

With the x-axis showing the time a model takes and the y-axis its accuracy, you want a model that strikes a balance: as far left as possible on the x-axis and as high as possible on the y-axis.

However, I thought: "Why not just use the most accurate?"

If the difference in time is just a few milliseconds, why bother considering it that much?

So I conducted an experiment where I trained 3 models on the same data; the only difference between them is their architecture. The 3 models are:

  • Control: A balanced model (LeViT 192)
  • Accuracy Only: The highest accuracy, regardless of time (BEiT Large Patch 16)
  • Speed Only: The fastest model, regardless of accuracy (LeViT 128S)

My hypothesis is that the Accuracy Only model will come out on top overall. And the true test is their ability to scale.

Step 1: The Data

For the training data, I decided to use images of my orange cat "Fluffy" and images of random cats from the internet to make a "Fluffy Classifier".
Here's a sample of the training data:

path = Path('fluffy or not')
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),                # images in, a single category label out
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),   # hold out 20% of the images as the dev set
    get_y=parent_label,                                # the label is the parent folder name
    item_tfms=[Resize(224, method='squish')]           # squish every image to 224x224
).dataloaders(path, bs=6)
dls.show_batch(max_n=6)

Fluffy vs Not Fluffy Training Images
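
As a quick sanity check (not part of the original notebook), you can confirm how many images landed in each split and what the two labels are, assuming the dls built above:

len(dls.train_ds), len(dls.valid_ds)   # images in the training set vs. the 20% dev set
dls.vocab                              # the two class labels, taken from the folder names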

Step 2: The Models

To test the models without writing much code, I'll use the timm library (import timm), which lets me look up the exact model names, and the high-level deep learning library fastai (from fastai.vision.all import *).

With those two, I can implement each model in just 2 lines of code.

First, getting the model names with the timm module:

timm.list_models('levit*')

['levit_128',
 'levit_128s',
 'levit_192',
 'levit_256',
 'levit_256d',
 'levit_384',
 'levit_384_s8',
 'levit_512'
 ...]

Then I take a name like levit_192, drop it into a vision learner, and fine-tune it, completing the implementation in just 2 lines of code:

(all models are trained on the same data, for 3 epochs)

# Control model
control_learn = vision_learner(dls, 'levit_192', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
control_learn.fine_tune(3)

# Accuracy model
accuracy_learn = vision_learner(dls, 'beit_large_patch16_512', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
accuracy_learn.fine_tune(3)

# Speed model
speed_learn = vision_learner(dls, 'levit_128s', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
speed_learn.fine_tune(3)

Step 3: Comparing and Concluding!!

Now that we've finished training all 3 models, it's time to test them!! Here are a couple of tests I want to try:

  1. Compare their accuracy on the dev set
  2. Compare their accuracy on a picture of Fluffy (my cat)
  3. Compare their speed on a picture of Fluffy
  4. Compare their speed when run 100 times in a row

First up is their accuracy on the dev set!!
While training, fastai automatically separated the dataset into a training set and a development set, with 20% going to the dev set, as specified in the DataBlock: splitter=RandomSplitter(valid_pct=0.2, seed=42)
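
For reference, here's one way the dev-set numbers below can be read off in fastai. This is a minimal sketch, assuming the learners from Step 2 are still in memory:

valid_loss, err = control_learn.validate()   # runs the trained model over the 20% dev set
print(f'dev-set accuracy: {1 - err:.1%}')    # error_rate -> accuracy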

The Control Model:

epoch   train_loss   valid_loss   error_rate   time
0       0.615277     1.333486     0.625000     00:05
1       0.548994     0.619186     0.375000     00:05
2       0.515302     0.512149     0.375000     00:05

Final Accuracy: 62.5%

The Accuracy Model:

epoch   train_loss   valid_loss   error_rate   time
0       0.133967     0.564495     0.250000     04:18
1       0.146496     0.214864     0.250000     04:00
2       0.149206     0.329258     0.250000     03:58

Final Accuracy: 75.0%

The Speed Model:

epoch   train_loss   valid_loss   error_rate   time
0       0.535390     0.778340     0.375000     00:07
1       0.538943     0.307765     0.250000     00:07
2       0.469959     0.237307     0.125000     00:06

Final Accuracy: 87.5%

Hmmm wait... Why is the Accuracy Model, which is supposed to outperform the Speed Model, doing so much worse?
The Accuracy Model took so much VRAM that I got an OutOfMemoryError (CUDA). And while the other models only took ~20 seconds to train, this one took 15 minutes!!
So... why is it so much worse?
Well, this is something I learned while writing this blog. Picture the BEiT Large Patch model (the Accuracy Model) as a jack-of-all-trades that handles a large number of complex tasks, like semantic segmentation. Semantic segmentation is a computer vision technique that assigns a class label to each pixel in an image; self-driving cars use it, for example:
Semantic Segmentation Example

Whereas the LeViT model focuses on just one skill, image classification (it's a convolution-based model), so it outperforms BEiT Large on this specific task.
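
Side note, not from my original run: a common way to make a heavyweight model like BEiT Large fit in VRAM is simply a smaller batch size, for example rebuilding the same DataBlock from Step 1 with bs=2:

dls_small = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(224, method='squish')]
).dataloaders(path, bs=2)   # smaller batches use less VRAM per step, at the cost of slower training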

So I decided to go for the next best thing. Using the "BEiT Base Patch" model, I got these results:

epoch   train_loss   valid_loss   error_rate   time
0       0.248595     0.017972     0.000000     00:05
1       0.163321     0.006299     0.000000     00:05
2       0.120323     0.007245     0.000000     00:05

Final Accuracy: 100%

Wow!! 100% accuracy, I'll always be able to find Fluffy this way >:D


Next, I'm going to test all these models with the same picture of my cat Fluffy, and see:

  1. How long it takes to classify
  2. How confident the model is

Here's the image I timed and used for all 3 models: (What a good table)
Fluffy as a table

Model      Prediction   Confidence   Time (ms)
Control    fluffy       76.20%       65.83
Accuracy   fluffy       90.91%       170.16
Speed      fluffy       93.56%       42.95
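
For context, per-image latency like this can be measured with a simple timer around predict. Here's a minimal sketch (the file name fluffy.jpg and the warm-up call are placeholders, not the exact code I ran):

import time

img = 'fluffy.jpg'                 # placeholder path to the Fluffy test image
speed_learn.predict(img)           # warm-up call so one-off setup cost isn't timed

start = time.perf_counter()
pred, pred_idx, probs = speed_learn.predict(img)
elapsed_ms = (time.perf_counter() - start) * 1000

print(pred, f'{probs[pred_idx].item():.2%} confidence, {elapsed_ms:.2f} ms')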

Now this is very interesting. I have to admit how shocked I was by the Speed Model: it outperformed both the Control and the Accuracy models in every category!! It had the lowest runtime at only 42.95 ms and was the most confident at 93.56%.
The Accuracy Model, while indeed "accurate", took WAY too long at 170 ms, which is baffling.
The Control Model was average all around.


Now, how would this scale? Let's test it with:

  1. 100 users
  2. 10,000 users
  3. 100,000 users

This is simple multiplication, and it shows how even the slightest difference in milliseconds has a much larger effect when scaled.
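
Here's a quick sketch of that multiplication, using the per-image latencies measured above and assuming one request per user, served one after another:

latency_ms = {'Control': 65.83, 'Accuracy': 170.16, 'Speed': 42.95}

for name, ms in latency_ms.items():
    for users in (100, 10_000, 100_000):
        total_sec = ms * users / 1000   # total sequential serving time in seconds
        print(f'{name:>8}: {users:>7,} users -> {total_sec/60:.2f} min ({total_sec/3600:.2f} hr)')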

Model      100 users   10,000 users   100,000 users
Control    6.58 sec    10.97 min      1.83 hr
Accuracy   17.02 sec   28.36 min      4.73 hr
Speed      4.30 sec    7.16 min       1.19 hr

Now you understand why scalability is such an important thing to consider. When you're working at a level close to OpenAI's, serving hundreds of thousands of users at once, you need speed.
This also shows the downside of the Accuracy Model: a ~100 ms difference per image doesn't seem that bad, but scaled to 100,000 users it becomes a difference of about 3 hours!! Imagine your users waiting 3 extra hours just to classify their Fluffy images. Personally, I'd be pissed.


Conclusion:

As a great man at CS50 once said: "This was Optimization"

From this we can learn that a model's accuracy is not tied to how long it takes to run: you can find a model that takes half as long as another and still does twice as well. And as the deep learning community keeps innovating and creating newer models, we'll get better results from even more optimized models.

Thank you for reading my nerdy little article! I'm currently working through the fast.ai course and wanted to practice a little with a question I had on my mind.

Check out my GitHub to see more of my nerdy projects, like the time I made a neural network to classify cancer using just NumPy and math!! >:D
