From Jupyter Notebook to Production: Complete Guide to ML Model Deployment

As a best-selling author, I invite you to explore my books on Amazon. Don't forget to follow me on Medium and show your support. Thank you! Your support means the world!

When I first started working with machine learning, I thought building an accurate model was the hardest part. I spent weeks tuning algorithms and celebrating high accuracy scores. Then I tried to put my model into a real application, and everything fell apart. The model that worked perfectly on my laptop crashed when others tried to use it. Predictions were slow, and I had no idea why performance dropped over time. That's when I discovered the importance of proper deployment techniques. Getting a model from a Jupyter notebook to a production system requires a different set of skills. I want to share the methods that helped me turn my experimental code into reliable services.

Saving your trained model is the first step toward deployment. Without this, you would need to retrain the model every time you want to make a prediction, which is impractical. I use joblib for this because it handles large numerical arrays efficiently. Pickle is another option, but joblib is optimized for scikit-learn models. Here is a simple way to save and load a model.

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save the model to a file
joblib.dump(model, 'random_forest_model.joblib')

# Later, load the model and make predictions
loaded_model = joblib.load('random_forest_model.joblib')
predictions = loaded_model.predict(X_test)
print(f"Predictions: {predictions[:5]}")  # Show first 5 predictions

I remember once losing a model because I didn't save it properly after training. I had to rerun a day's worth of computation. Now I always save models immediately after training. This habit has saved me countless hours.

Containers solve the problem of inconsistent environments. Your model might work on your machine but fail on a server due to different library versions. Docker packages your code, model, and dependencies into a single unit that runs anywhere. I started using Docker after a deployment failed because the production server had an older version of NumPy.

Here is a basic Dockerfile for a machine learning application. It sets up a Python environment, installs dependencies, and runs a simple web app.

# Dockerfile
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and application code
COPY random_forest_model.joblib .
COPY app.py .

# Expose port and run the application
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

The requirements.txt file might look like this:

# requirements.txt
scikit-learn==1.0.2
joblib==1.1.0
fastapi==0.68.0
uvicorn==0.15.0

Building and running the container is straightforward with Docker commands. This consistency means I can develop locally and deploy to cloud servers without worrying about environment differences.
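For reference, the build-and-run sequence is only a couple of commands; the model-app image tag is just a placeholder name I use here:

# Build the image from the Dockerfile in the current directory
docker build -t model-app .
# Map the container's port 8000 to the host
docker run -p 8000:8000 model-app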

Creating a web API allows other applications to use your model. FastAPI is my go-to framework because it's fast and provides automatic documentation. I built my first API with Flask, but switching to FastAPI reduced boilerplate code and improved performance.

Here is a complete example of a FastAPI app that serves predictions.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Model API", version="1.0")

# Define request model
class PredictionRequest(BaseModel):
    features: list[float]

# Load model at startup
model = joblib.load('random_forest_model.joblib')

@app.post("/predict")
def predict(request: PredictionRequest):
    try:
        # Convert features to numpy array and reshape for prediction
        features_array = np.array(request.features).reshape(1, -1)
        prediction = model.predict(features_array)
        return {"prediction": int(prediction[0])}
    except Exception as e:
        # Return a clear 400 error instead of a silent failure
        raise HTTPException(status_code=400, detail=str(e))

@app.get("/")
def read_root():
    return {"message": "ML Model API is running"}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000

I once deployed an API without proper error handling, and it crashed when users sent malformed data. Now, I always include try-except blocks to return helpful errors.

Monitoring your model in production is crucial. Models can degrade over time as data changes. I set up monitoring to track prediction latency and accuracy. Early on, I missed a performance drop that cost our team significant resources. Now, I use custom decorators to log metrics.

Here is a Python decorator that measures prediction time and logs it.

import time
import logging
import joblib
from functools import wraps

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def monitor_performance(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - start_time
        logger.info(f"Function {func.__name__} took {duration:.3f} seconds")
        # In practice, send this to a monitoring system like Prometheus
        return result
    return wrapper

@monitor_performance
def make_prediction(features):
    # Loading the model here keeps the example self-contained;
    # in a real service, load it once at startup instead of per call
    model = joblib.load('random_forest_model.joblib')
    return model.predict([features])

# Example usage
features = [0.5, -0.2, 0.1, 0.7, -0.5, 0.3, -0.1, 0.4, -0.3, 0.6, 0.2, -0.4, 0.8, -0.6, 0.9, -0.7, 0.1, -0.8, 0.3, -0.9]
prediction = make_prediction(features)
print(f"Prediction: {prediction}")

I integrate this with tools like Prometheus for alerting when latency exceeds thresholds.
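Here is a minimal sketch of that integration using the prometheus_client library, with a histogram in place of the logging decorator. The metric name and the port 9000 metrics endpoint are my own choices, not part of the service above.

from prometheus_client import Histogram, start_http_server
import joblib

# Histogram records the distribution of prediction latencies
PREDICTION_LATENCY = Histogram(
    'model_prediction_latency_seconds',
    'Time spent generating a prediction'
)

# Expose a /metrics endpoint for Prometheus to scrape
start_http_server(9000)

model = joblib.load('random_forest_model.joblib')

@PREDICTION_LATENCY.time()  # times each call and records it in the histogram
def make_prediction(features):
    return model.predict([features])

An alerting rule in Prometheus can then fire when a latency quantile stays above the threshold.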

Keeping track of model versions prevents confusion and ensures reproducibility. I use MLflow to log experiments, parameters, and models. Once, I couldn't reproduce a model that performed well in testing because I didn't record the hyperparameters. MLflow solved that problem.

Here is how I use MLflow in a training script.

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Set up MLflow
mlflow.set_experiment("my_experiment")

with mlflow.start_run():
    # Log parameters
    n_estimators = 100
    mlflow.log_param("n_estimators", n_estimators)

    # Generate and split data
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate and log metrics
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(model, "model")

    print(f"Logged model with accuracy: {accuracy}")

# To load a model later
model_uri = "runs:/<run_id>/model"  # Replace <run_id> with actual run ID
loaded_model = mlflow.pyfunc.load_model(model_uri)

MLflow's UI lets me compare runs and choose the best model for deployment.
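The same comparison works in code as well. This small sketch uses mlflow.search_runs, assuming a recent MLflow version that accepts experiment_names:

import mlflow

# Returns a pandas DataFrame of runs, best accuracy first
runs = mlflow.search_runs(
    experiment_names=["my_experiment"],
    order_by=["metrics.accuracy DESC"]
)
best_run_id = runs.iloc[0]["run_id"]
best_model = mlflow.pyfunc.load_model(f"runs:/{best_run_id}/model")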

Automating deployment saves time and reduces errors. I use GitHub Actions to trigger model training and deployment when code changes. My first manual deployments were error-prone; automation made the process reliable.

Here is a GitHub Actions workflow that trains, validates, and deploys a model.

# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [ main ]

jobs:
  train-and-deploy:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'

    - name: Install dependencies
      run: |
        pip install -r requirements.txt

    - name: Train model
      run: python train.py

    - name: Validate model
      run: python validate.py

    - name: Deploy if improved
      run: |
        if [ -f "model_improved" ]; then
          python deploy.py
        else
          echo "Model did not improve, skipping deployment"
        fi

The train.py script might include MLflow logging, and validate.py checks if the new model is better than the current one. If so, it deploys.
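Here is a hedged sketch of what validate.py could look like. The production_accuracy.txt file is my own convention for tracking the current model's score, and the model_improved marker matches the check in the workflow above; the synthetic data mirrors the training example for illustration.

# validate.py - a minimal sketch; file names and baseline handling are illustrative
import os
import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Recreate the held-out split used in training
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
_, X_test, _, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

new_model = joblib.load('random_forest_model.joblib')
new_accuracy = accuracy_score(y_test, new_model.predict(X_test))

# Read the accuracy of the currently deployed model, defaulting to 0.0
baseline = 0.0
if os.path.exists('production_accuracy.txt'):
    with open('production_accuracy.txt') as f:
        baseline = float(f.read().strip())

if new_accuracy > baseline:
    # Create the marker file the GitHub Actions workflow checks for
    open('model_improved', 'w').close()
    with open('production_accuracy.txt', 'w') as f:
        f.write(str(new_accuracy))

print(f"New accuracy: {new_accuracy:.4f}, baseline: {baseline:.4f}")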

Handling many users requires scalable infrastructure. Kubernetes manages multiple instances of your application. I learned this when a sudden traffic spike overwhelmed my single server. With a HorizontalPodAutoscaler, Kubernetes can add or remove replicas automatically based on load.

Here is a Kubernetes deployment configuration for a model service.

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-app
  template:
    metadata:
      labels:
        app: model-app
    spec:
      containers:
      - name: model-container
        image: your-username/model-app:latest
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: model-service
spec:
  selector:
    app: model-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer

I use kubectl to apply this configuration, and Kubernetes handles the rest. It ensures that if one pod fails, others take over.
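The Deployment above uses a fixed replica count; to get the load-based scaling I mentioned, I add a HorizontalPodAutoscaler. This is a sketch, and the CPU target and replica bounds are starting values I pick, not prescriptions:

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70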

Testing new models before full rollout avoids bad updates. A/B testing routes traffic between model versions. I once deployed a model that seemed better in tests but hurt user experience. Now, I always test with a small group first.

Here is how I implement A/B testing using feature flags with the Unleash client.

from UnleashClient import UnleashClient
import joblib

# Initialize Unleash client
client = UnleashClient(url="http://unleash-server:4242/api", app_name="ml-service")
client.initialize_client()

def get_model_for_user(user_id):
    if client.is_enabled("new-model-rollout", context={"userId": user_id}):
        # Load new model version
        return joblib.load('model_v2.joblib')
    else:
        # Load old model version
        return joblib.load('model_v1.joblib')

# Example usage
user_id = "user123"
model = get_model_for_user(user_id)
features = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, -0.1, -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.8, -0.9, -1.0]
prediction = model.predict([features])
print(f"Prediction for user {user_id}: {prediction[0]}")

I gradually increase traffic to the new model while monitoring key metrics. If performance drops, I roll back easily.

These techniques form a foundation for robust machine learning operations. I combine them based on project needs. For high-throughput applications, I focus on scalable serving and monitoring. For iterative projects, version control and continuous deployment are key. Every deployment teaches me something new. Start with serialization and APIs, then add complexity as needed. The goal is to make your models reliable and valuable in production.

📘 Check out my latest ebook for free on my channel!

Be sure to like, share, comment, and subscribe to the channel!


101 Books

101 Books is an AI-driven publishing company co-founded by author Aarav Joshi. By leveraging advanced AI technology, we keep our publishing costs incredibly low—some books are priced as low as $4—making quality knowledge accessible to everyone.

Check out our book Golang Clean Code available on Amazon.

Stay tuned for updates and exciting news. When shopping for books, search for Aarav Joshi to find more of our titles. Use the provided link to enjoy special discounts!

Our Creations

Be sure to check out our creations:

Investor Central | Investor Central Spanish | Investor Central German | Smart Living | Epochs & Echoes | Puzzling Mysteries | Hindutva | Elite Dev | Java Elite Dev | Golang Elite Dev | Python Elite Dev | JS Elite Dev | JS Schools


We are on Medium

Tech Koala Insights | Epochs & Echoes World | Investor Central Medium | Puzzling Mysteries Medium | Science & Epochs Medium | Modern Hindutva
