Summary posted by: Sangam SwadiK
Intro to serving models with PyTorch
Across the industry, machine learning and deep learning models rarely make it into production. Polls by VentureBeat and KDnuggets indicate that roughly 80-90 percent of the models developed are never deployed. This can happen for various reasons, such as ROI concerns, ineffective leadership, shifting business needs, or a failure to incorporate MLOps.
As a data scientist or ML engineer, one of your responsibilities is to ensure a well-designed and functional pipeline, and model deployment is an important part of that pipeline.
This is where the PyTorch ecosystem comes to the rescue! PyTorch, TorchServe, and many other projects built on PyTorch can be used all the way from model development through deployment. PyTorch adoption has also been trending upward thanks to its ease of use.
Intro to Event
This talk is for data scientists and ML engineers looking to serve their PyTorch models in production. It covers post-training steps for optimizing a model, such as quantization and TorchScript, and walks through packaging and serving the model with Facebook's TorchServe.
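As a rough illustration of the post-training steps the talk covers, the sketch below applies dynamic quantization and then TorchScript to a small placeholder model. The layer sizes, example input, and file name are assumptions for illustration, not the BERT/Bi-LSTM models from the session.

```python
import torch
import torch.nn as nn

# Hypothetical classifier standing in for the talk's models.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
model.eval()

# Post-training dynamic quantization: weights of nn.Linear layers are
# converted to int8, which typically shrinks the model and speeds up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# TorchScript: trace the quantized model with an example input so it can be
# serialized and later loaded by a serving runtime such as TorchServe.
example_input = torch.randn(1, 128)
scripted_model = torch.jit.trace(quantized_model, example_input)
scripted_model.save("model_quantized_scripted.pt")
```

The saved TorchScript module can be loaded without the original Python class definition, which is what makes it convenient to package for serving.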
Video
Resources
- Repo: GitHub Repository
Section Timestamps of Video
- 00:00:00 About session
- 00:00:47 About Data Umbrella
- 00:04:18 Introduction
- 00:05:16 Session agenda
- 00:06:01 Machine learning at Walmart
- 00:12:11 Review of some deep learning concepts
- 00:15:24 BERT: Different architectures
- 00:16:07 Bi-LSTM vs BERT
- 00:21:59 Model inference
- 00:24:21 Load the model
- 00:25:21 Test prediction
- 00:28:01 Inference review (inference time vs accuracy tradeoff)
- 00:29:17 BERT large
- 00:30:03 DistilBERT
- 00:33:54 Optimizing model for production
- 00:34:03 Post training optimization: Quantization
- 00:35:50 Types of Quantization
- 00:37:35 Quantization results
- 00:38:23 Post training optimization: Distillation
- 00:39:44 Distillation results
- 00:40:35 Eager execution vs Script mode
- 00:42:02 TorchScript JIT: Tracing vs Scripting
- 00:43:11 TorchScript Timing
- 00:45:21 Optimizing the model (Hands On)
- 00:47:36 Quantization (Hands On)
- 00:52:00 TorchScript (Hands On)
- 00:56:33 Deploying the model
- 00:57:13 Options for deploying a PyTorch model
- 00:57:42 Benefits of TorchServe
- 00:59:41 Packaging a model/MAR
- 01:00:00 PyTorch BaseHandler (see the handler sketch after this list)
- 01:03:00 Built-in handlers
- 01:04:15 Serving
- 01:05:10 APIs
- 01:05:32 Deploying the Model (Hands On)
- 01:22:11 Lessons Learned
- 01:23:50 Q/A
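For the packaging and serving portion of the timestamps above, here is a minimal sketch of a custom TorchServe handler built on `BaseHandler`. The label names and the placeholder preprocessing are assumptions for illustration, not the handler from the talk's repository.

```python
import torch
from ts.torch_handler.base_handler import BaseHandler


class TextClassifierHandler(BaseHandler):
    """Custom handler: BaseHandler loads the model; we override the I/O steps."""

    def preprocess(self, data):
        # TorchServe passes a batch of requests; each row carries "data" or "body".
        texts = [row.get("data") or row.get("body") for row in data]
        # Real tokenization/vectorization would go here; a zero tensor stands in for it.
        return torch.zeros(len(texts), 128)

    def inference(self, inputs):
        # self.model is loaded by BaseHandler.initialize() from the model archive.
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Map the highest-scoring class index to a label, one result per request.
        labels = ["negative", "positive"]
        return [labels[int(idx)] for idx in outputs.argmax(dim=1)]
```

A handler like this is typically bundled with the serialized model into a `.mar` archive using `torch-model-archiver` (passing `--model-name`, `--serialized-file`, and `--handler`) and then served with `torchserve --start`, which exposes the inference and management APIs mentioned in the timestamps.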
About the Speaker
Bio
Nidhin Pattaniyil is a Machine Learning Engineer at Walmart Search.
Connect with the Speaker
- Nidhin's LinkedIn: Nidhin Pattaniyil
- Nidhin's GitHub: @npatta01