In the first 30 minutes of Lesson 3 of fast.ai's course, Jeremy Howard introduces a neat graph showing how different models vary in accuracy and speed:
The x-axis is the time the model takes and the y-axis is its accuracy, so ideally you want a balance: a model that sits as far to the left and as far up as possible.
However, I thought: "Why not just use the most accurate?"
If the difference in time taken is just a few microseconds, why bother considering it that much?
So I conducted an experiment: I trained 3 models on the same data, the only difference being their architecture. The 3 models are:
- Control: A balanced model (LeViT-192)
- Accuracy Only: The most accurate model, regardless of speed (BEiT Large Patch16)
- Speed Only: The fastest model, regardless of accuracy (LeViT-128S)
My hypothesis is that the Accuracy Only model will come out ahead overall. And the true test is how well they scale.
Step 1: The Data
For the training data, I decided to use images of my orange cat "Fluffy" and images of random cats from the internet to make a "Fluffy Classifier".
Here's a sample of the training data:
from fastai.vision.all import *  # high-level library used throughout (provides Path, DataBlock, etc.)

path = Path('fluffy or not')
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),                 # image in, category label out
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),    # 80/20 train/dev split
    get_y=parent_label,                                 # label = name of the parent folder
    item_tfms=[Resize(224, method='squish')]            # squish everything to 224x224
).dataloaders(path, bs=6)
dls.show_batch(max_n=6)                                 # show a sample of the training data
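Quick sanity check: parent_label means the class comes from each image's parent folder, so the folder names below are my assumption about the layout rather than something shown above.
# Assumed folder layout (illustrative names):
#   fluffy or not/
#       fluffy/        <- photos of Fluffy
#       other_cats/    <- random cats from the internet
print(dls.vocab)                              # the category labels fastai found
print(len(dls.train_ds), len(dls.valid_ds))   # sizes of the 80/20 split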
Step 2: The Models
To test out the models without much code, I will use two things: the timm library (import timm), which lets me look up the exact model names, and a high-level deep-learning library called fast.ai (from fastai.vision.all import *).
With those two, I will be able to implement each model with just 2 lines of code.
First, getting the model names with the timm module:
--> timm.list_models('levit*')
['levit_128',
'levit_128s',
'levit_192',
'levit_256',
'levit_256d',
'levit_384',
'levit_384_s8',
'levit_512'
...]
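The same lookup works for the other architectures too. For example (the exact output depends on your timm version, so this is just illustrative):
timm.list_models('beit*', pretrained=True)   # only BEiT variants that ship pretrained weights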
Then I take the name levit_192, put it into a vision learner, and train it, completing the implementation with just 2 lines of code (all models are trained on the same data, for 3 epochs):
# Control: LeViT-192, the balanced model
control_learn = vision_learner(dls, 'levit_192', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
control_learn.fine_tune(3)

# Accuracy Only: BEiT Large Patch16, the most accurate (and heaviest) model
accuracy_learn = vision_learner(dls, 'beit_large_patch16_512', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
accuracy_learn.fine_tune(3)

# Speed Only: LeViT-128S, the fastest model
speed_learn = vision_learner(dls, 'levit_128s', metrics=error_rate, pretrained=True, pretrained_cfg=None, img_size=224)
speed_learn.fine_tune(3)
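Before comparing results, here's a quick sanity check you could run to see how different these three architectures really are in size; it simply counts the parameters of the learners defined above:
# Compare model sizes via raw parameter counts
for name, learn in [('control', control_learn),
                    ('accuracy', accuracy_learn),
                    ('speed', speed_learn)]:
    n_params = sum(p.numel() for p in learn.model.parameters())
    print(f'{name}: {n_params / 1e6:.1f}M parameters')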
Step 3: Comparing and Concluding!!
Now that we've finished training all 3 models, it's time to test them!! Here are a couple of tests I wanna try:
- Compare their accuracy on the dev set
- Compare their accuracy on a picture of Fluffy (my cat)
- Compare their speed on a picture of Fluffy
- Compare their speed when run 100 times in a row
First up is their accuracy on the dev set!!
While training, fastai automatically separated the dataset into a training set and a development set, with 20% going to the dev set, as specified in the DataBlock: splitter=RandomSplitter(valid_pct=0.2, seed=42)
The Control Model:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.615277 | 1.333486 | 0.625000 | 00:05 |
1 | 0.548994 | 0.619186 | 0.375000 | 00:05 |
2 | 0.515302 | 0.512149 | 0.375000 | 00:05 |
Final Accuracy | 62.5% |
The Accuracy Model:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.133967 | 0.564495 | 0.250000 | 04:18 |
1 | 0.146496 | 0.214864 | 0.250000 | 04:00 |
2 | 0.149206 | 0.329258 | 0.250000 | 03:58 |
Final Accuracy | 75.0% |
The Speed Model:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.535390 | 0.778340 | 0.375000 | 00:07 |
1 | 0.538943 | 0.307765 | 0.250000 | 00:07 |
2 | 0.469959 | 0.237307 | 0.125000 | 00:06 |
Final Accuracy | 87.5% |
Hmmm wait.. why is the Accuracy Model, which is supposed to outperform the Speed Model, doing so much worse?
The Accuracy Model took so much VRAM that I got a CUDA OutOfMemoryError. And while the other models only took ~20 seconds to train, this one took 15 minutes!!
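In hindsight, there are a couple of standard fastai/PyTorch knobs for cutting GPU memory. I didn't verify them on this exact run, but they're the usual first things to try:
# Standard ways to reduce GPU memory use (untested on this exact run):
dls_small = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(224, method='squish')]
).dataloaders(path, bs=2)              # smaller batches -> less VRAM per step

accuracy_learn = vision_learner(dls_small, 'beit_large_patch16_512',
                                metrics=error_rate, pretrained=True, img_size=224)
accuracy_learn.to_fp16()               # mixed precision roughly halves activation memory
accuracy_learn.fine_tune(3)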
So.. why is it so much worse?
Well, this is something I learned while writing this blog. Picture the BEiT Large Patch model (the Accuracy Model) as a jack-of-all-trades that handles a large number of complex tasks, like semantic segmentation. Semantic segmentation is a computer vision technique that assigns a class label to each pixel in an image; self-driving cars, for example, use it to understand road scenes.
The LeViT model, on the other hand, focuses on just one skill: image classification (it's a hybrid of convolutions and attention built for fast inference). That's why it outperforms BEiT on this specific task.
So I decided to go for the next best thing and used the "BEiT Base Patch" model instead. I got these results:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.248595 | 0.017972 | 0.000000 | 00:05 |
1 | 0.163321 | 0.006299 | 0.000000 | 00:05 |
2 | 0.120323 | 0.007245 | 0.000000 | 00:05 |
Final Accuracy | 100% |
Wow!! 100% accuracy, I'll always be able to find Fluffy this way >:D
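For completeness, swapping models is just a matter of changing the model string. Assuming the 224-pixel base variant (the exact timm id below is my assumption; you can confirm it with timm.list_models('beit*')):
# 'beit_base_patch16_224' is my assumed timm id for the "BEiT Base Patch" model
accuracy_learn = vision_learner(dls, 'beit_base_patch16_224',
                                metrics=error_rate, pretrained=True, img_size=224)
accuracy_learn.fine_tune(3)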
Next, I'm going to test all three models on the same picture of my cat Fluffy, and see:
- How long it takes to classify
- How confident the model is
Here's the image I timed and used for all 3 models: (What a good table)
Model | Prediction | Confidence | Time (ms) |
---|---|---|---|
Control | fluffy | 76.20% | 65.83 |
Accuracy | fluffy | 90.91% | 170.16 |
Speed | fluffy | 93.56% | 42.95 |
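For reference, here's roughly how a per-prediction timing like this can be measured; the image path and helper function below are placeholders, not my exact notebook code:
import time

fluffy_img = PILImage.create('fluffy_test.jpg')    # placeholder path to the test photo

def time_predict(learn, img, n=1):
    # Average wall-clock time over n calls to fastai's learn.predict
    start = time.perf_counter()
    for _ in range(n):
        pred, idx, probs = learn.predict(img)
    elapsed_ms = (time.perf_counter() - start) / n * 1000
    print(f'{pred} ({probs[idx] * 100:.2f}% confident) in {elapsed_ms:.2f} ms')

time_predict(speed_learn, fluffy_img)              # one prediction
time_predict(speed_learn, fluffy_img, n=100)       # averaged over 100 runs in a row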
Now this is very interesting. I have to admit how shocked I was by the Speed Model. I mean, it outperformed both the Control and the Accuracy models across the board!! It had the lowest runtime at only 42.95 ms and was the most confident at 93.56%.
The Accuracy Model, while indeed "accurate", took WAY too damn long at 170 ms!! Which is baffling.
The Control Model was average all the way around.
Now, how would they scale? Let's test it with:
- 100 users
- 10,000 users
- 100,000 users
This is very simple multiplication, and it shows how even the slightest difference in milliseconds can have a much larger effect when scaled.
Model | 100 users | 10,000 users | 100,000 users |
---|---|---|---|
Control | 6.58 sec | 10.97 min | 1.83 hr |
Accuracy | 17.02 sec | 28.36 min | 4.73 hr |
Speed | 4.30 sec | 7.16 min | 1.19 hr |
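The numbers above are just the per-prediction time multiplied by the number of requests, assuming each request is handled one at a time (no batching, single machine); a quick sketch of the arithmetic:
# Per-prediction time (ms) x number of requests, served sequentially
times_ms = {'Control': 65.83, 'Accuracy': 170.16, 'Speed': 42.95}
for name, ms in times_ms.items():
    for users in (100, 10_000, 100_000):
        total_sec = ms * users / 1000
        print(f'{name}: {users:,} users -> {total_sec:,.2f} s '
              f'({total_sec / 60:.2f} min, {total_sec / 3600:.2f} hr)')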
Now you understand why scalability is such an important thing to consider. When you're working at a level close to OpenAI's, serving hundreds of thousands of users at once, you need speed.
This also shows the downside of the Accuracy Model. A ~130 ms difference per prediction doesn't seem that bad, but scaled to 100,000 users it becomes a difference of about 3.5 hours!! Imagine your users waiting 3.5 hours just to classify their Fluffy images. Personally, I'd be pissed.
Conclusion:
As a great man at CS50 once said: "This was Optimization"
From this we can learn that a model's accuracy isn't simply tied to how long it takes to run; it depends on the model and the task. You can find a model that takes half as long as the others yet performs twice as well. And as the deep learning community keeps innovating and releasing newer models, we'll get better results from even more optimized ones.
Thank you for reading my nerdy little article! I'm currently progressing through the fast.ai course and wanted to practice with a question I had in mind.
Check out my GitHub to see more of my nerdy projects, like the time I made a neural network to classify cancer using just NumPy and math!! >:D