Summary posted by: Sangam SwadiK
Intro to serving models with PyTorch
Across the industry, machine learning and deep learning models rarely make it into production. Polls by VentureBeat and KDnuggets indicate that roughly 80-90 percent of the models developed are never deployed. This can happen for various reasons, such as ROI concerns, ineffective leadership, shifting business needs, or a failure to incorporate MLOps.
As a data scientist or ML engineer, one of your responsibilities is to ensure a well-designed and functional pipeline, and model deployment is an important part of that pipeline.
This is where the PyTorch ecosystem comes to the rescue! PyTorch, TorchServe, and many other projects built on PyTorch can be used all the way from model development through deployment. PyTorch adoption has also been trending upward thanks to its ease of use.
Intro to Event
This talk is for data scientists and ML engineers looking to serve their PyTorch models in production. It covers post-training steps for optimizing a model, such as quantization and TorchScript, and walks through packaging and serving the model with Facebook's TorchServe.
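As a rough illustration of the post-training steps the talk covers, the sketch below applies dynamic quantization and then TorchScript to a small placeholder model. The layer sizes, example input, and file name are assumptions for illustration, not the BERT/Bi-LSTM models from the session.

```python
import torch
import torch.nn as nn

# Hypothetical classifier standing in for the talk's models.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
model.eval()

# Post-training dynamic quantization: weights of nn.Linear layers are
# converted to int8, which typically shrinks the model and speeds up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# TorchScript: trace the quantized model with an example input so it can be
# serialized and later loaded by a serving runtime such as TorchServe.
example_input = torch.randn(1, 128)
scripted_model = torch.jit.trace(quantized_model, example_input)
scripted_model.save("model_quantized_scripted.pt")
```

The saved TorchScript module can be loaded without the original Python class definition, which is what makes it convenient to package for serving.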
Video
Resources
- Repo: GitHub Repository
Section Timestamps of Video
- 00:00:00 About session
- 00:00:47 About Data Umbrella
- 00:04:18 Introduction
- 00:05:16 Session agenda
- 00:06:01 Machine learning at Walmart
- 00:12:11 Review of some deep learning concepts
- 00:15:24 BERT: Different architectures
- 00:16:07 Bi-LSTM vs BERT
- 00:21:59 Model inference
- 00:24:21 Load the model
- 00:25:21 Test prediction
- 00:28:01 Inference review (inference time vs accuracy tradeoff)
- 00:29:17 BERT large
- 00:30:03 DistilBERT
- 00:33:54 Optimizing model for production
- 00:34:03 Post training optimization: Quantization
- 00:35:50 Types of Quantization
- 00:37:35 Quantization results
- 00:38:23 Post training optimization: Distillation
- 00:39:44 Distillation results
- 00:40:35 Eager execution vs Script mode
- 00:42:02 TorchScript JIT: Tracing vs Scripting
- 00:43:11 TorchScript Timing
- 00:45:21 Optimizing the model (Hands On)
- 00:47:36 Quantization (Hands On)
- 00:52:00 TorchScript (Hands On)
- 00:56:33 Deploying the model
- 00:57:13 Options for deploying a PyTorch model
- 00:57:42 Benefits of TorchServe
- 00:59:41 Packaging a model/MAR
- 01:00:00 PyTorch BaseHandler (see the handler sketch after this list)
- 01:03:00 Built-in handlers
- 01:04:15 Serving
- 01:05:10 APIs
- 01:05:32 Deploying the Model (Hands On)
- 01:22:11 Lessons Learned
- 01:23:50 Q/A
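For the packaging and serving portion of the timestamps above, here is a minimal sketch of a custom TorchServe handler built on `BaseHandler`. The label names and the placeholder preprocessing are assumptions for illustration, not the handler from the talk's repository.

```python
import torch
from ts.torch_handler.base_handler import BaseHandler


class TextClassifierHandler(BaseHandler):
    """Custom handler: BaseHandler loads the model; we override the I/O steps."""

    def preprocess(self, data):
        # TorchServe passes a batch of requests; each row carries "data" or "body".
        texts = [row.get("data") or row.get("body") for row in data]
        # Real tokenization/vectorization would go here; a zero tensor stands in for it.
        return torch.zeros(len(texts), 128)

    def inference(self, inputs):
        # self.model is loaded by BaseHandler.initialize() from the model archive.
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Map the highest-scoring class index to a label, one result per request.
        labels = ["negative", "positive"]
        return [labels[int(idx)] for idx in outputs.argmax(dim=1)]
```

A handler like this is typically bundled with the serialized model into a `.mar` archive using `torch-model-archiver` (passing `--model-name`, `--serialized-file`, and `--handler`) and then served with `torchserve --start`, which exposes the inference and management APIs mentioned in the timestamps.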
About the Speaker
Bio
Nidhin Pattaniyil is a Machine Learning Engineer at Walmart Search.
Connect with the Speaker
- Nidhin's LinkedIn: Nidhin Pattaniyil
- Nidhin's GitHub: @npatta01