Haseeb Mohammed
I learned about Microsoft's Hummingbird library today. 1000x performance??

tl;dr Hummingbird is fast, but I didn't see the 1000x speedup they claimed today at MS Ignite, even with their own sample code and dataset (only slightly modified).

My Colab notebook for your cloning purposes.


1000x Faster!!!11!oneone!

This scrolled past today during MS Ignite (which you should totally be watching).

It seemed too good to be true (spoilers spoiled), and I wanted to try it out ASAP.

I took their sample code from GitHub and tweaked it to print the time for each model's predictions, and to increase the number of rows to 5 million. I used Google Colab and selected GPU as the hardware accelerator. That gives the option to run code on a GPU; it doesn't mean every computation automatically happens there.
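For context, here is roughly what the setup looks like before the timing code below. This is a minimal sketch based on Hummingbird's README example; the feature count and model parameters are assumptions, so check my notebook for the real values.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert
import time

# Random binary-classification data: 5 million rows, 28 features
# (feature count assumed from the Hummingbird example)
X = np.random.rand(5000000, 28)
y = np.random.randint(2, size=5000000)

# Train the scikit-learn model that Hummingbird will convert later
skl_model = RandomForestClassifier(n_estimators=10, max_depth=10)
skl_model.fit(X, y)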

A snippet of the code is below; the entire code is in my notebook linked above.

start_time = time.time()
# Time predictions with the original scikit-learn model on CPU
skl_model.predict(X)
print("--- %s seconds to predict on CPU w/o HB ---" % (time.time() - start_time))

# Use Hummingbird to convert the model to PyTorch
model = convert(skl_model, 'pytorch')

start_time = time.time()
# Run predictions on CPU
model.predict(X)
print("--- %s seconds to predict on CPU w/ HB ---" % (time.time() - start_time))

start_time = time.time()
# Run predictions on GPU
model.to('cuda')
model.predict(X)
print("--- %s seconds to predict on GPU w/ HB ---" % (time.time() - start_time))
--- 3.707867383956909 seconds to predict on CPU w/o HB ---
--- 8.640001058578491 seconds to predict on CPU w/ HB ---
--- 0.5877633094787598 seconds to predict on GPU w/ HB ---

Hummingbird converts scikit-learn's trained RandomForestClassifier into a PyTorch neural network (NN). Running the regular scikit-learn model on 5 million rows took 3.7 seconds on the CPU. Running the converted NN on the same 5 million rows took 8.6 seconds on the CPU. On the GPU, the NN took about 0.6 seconds for the 5 million predictions.

Could anyone help me understand where I went wrong? Under what scenario would I see the claimed 5x on CPU and 1000x on GPU?

I'm still amazed it went from 3.7 to 0.6 seconds (about 6x), but that's still far from 1000x.
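Doing the arithmetic on the printed times above (not part of the notebook, just a sanity check):

cpu_skl = 3.707867383956909   # scikit-learn on CPU
cpu_hb  = 8.640001058578491   # Hummingbird (PyTorch) on CPU
gpu_hb  = 0.5877633094787598  # Hummingbird (PyTorch) on GPU

print(cpu_skl / cpu_hb)  # ~0.43x -- the converted model is actually slower on CPU here
print(cpu_skl / gpu_hb)  # ~6.3x  -- the GPU speedup over the original CPU model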

I'll try out a couple more datasets and see if I can't reproduce the claim.

Top comments (1)

Matteo Interlandi

Hi Haseeb, thanks for trying Hummingbird! You did everything correctly; for trees we don't get a 1000x speedup. We saw the 1000x speedup when we translated end-to-end ML pipelines. I suggest you check the paper; we have more information there.

We are also planning to open source the scripts we used for the experiments in the paper, so that it will be easier for everyone to reproduce our results!

We welcome contributions and feedback!