Deploying Your AI/ML Models: A Practical Guide from Training to Production

In the fast-evolving world of AI and machine learning, training models is just the beginning. To make your research impactful, you need to deploy them so others can interact with your work—whether it's for real-world applications, demos, or further experimentation. This tutorial focuses on a streamlined workflow for deploying ML/deep learning models to the cloud, wrapped in a user-friendly API. We'll keep things general so you can apply this to any AI/ML project, but I'll use my own computer vision research on fish species classification as a concrete example.

The process breaks down into these key steps:

  1. Train your model on a platform like Kaggle.
  2. Download the trained model.
  3. Wrap the model in a FastAPI application for API access.
  4. Dockerize the app for easy portability.
  5. Deploy to the cloud.

By the end, you'll have a production-ready setup that lets users interact with your model via simple HTTP requests. Let's dive in!

Why This Workflow?

  1. Training on Kaggle: Free GPUs/TPUs, easy dataset management, and version control for notebooks.
  2. FastAPI: Modern, fast, and auto-generates interactive docs (Swagger UI) for your API.
  3. Docker: Ensures consistency across environments.
  4. Cloud Deployment: Scalable, accessible from anywhere.

This setup is ideal for researchers: Train once, deploy anywhere, and focus on innovation rather than infrastructure.

Diagram: High-Level Workflow

Here's a simple flowchart of the process:

[Image: Flow-Chart]

Step 1: Training Your Model on Kaggle

Kaggle is perfect for training, especially for compute-intensive tasks like computer vision. It handles datasets, notebooks, and hardware for free.
General Steps:

  • Upload or Find a Dataset: Go to Kaggle Datasets and upload your data or use an existing one. Ensure it's structured (e.g., images in folders by class for classification).
  • Create a Notebook: Start a new notebook in Kaggle Notebooks.
  • Install Libraries: Use !pip install for any extras (e.g., timm for pretrained vision models).
  • Load and Preprocess Data: Use libraries like PyTorch or TensorFlow.
  • Train Models: Experiment with architectures. Save the best model using torch.save or similar.
  • Version and Run: Commit your notebook to save versions. Run it with GPU/TPU acceleration.
  • Output Artifacts: Save models, weights, and metadata (e.g., class labels as JSON).
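
To make the last few steps concrete, here's a minimal sketch of what a Kaggle training cell might look like, assuming a timm model and an ImageFolder-style dataset; the dataset path, hyperparameters, and filenames are placeholders to adapt to your project:

import json
import timm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
NUM_CLASSES = 32  # adjust to your dataset
DATA_DIR = "/kaggle/input/your-dataset/train"  # placeholder path

# Keep these transforms consistent with what the API will use later
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
dataset = datasets.ImageFolder(DATA_DIR, transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=2)

# Fine-tune a pretrained backbone on your classes
model = timm.create_model("deit_tiny_distilled_patch16_224", pretrained=True, num_classes=NUM_CLASSES).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")

# Output artifacts: weights plus class labels for the API
torch.save(model.state_dict(), "deit_tiny_fish_model.pth")
with open("classes.json", "w") as f:
    json.dump(dataset.classes, f)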

Example: Fish Species Classification

For my research, I used a dataset of fish images (e.g., from Kaggle's "Fish Species" datasets). I trained computer vision models like DeiT (Data-efficient Image Transformer), ViT (Vision Transformer), and VGG16.

Dataset: ~12,000 images across 32 fish species
Models Trained:

  • DeiT: Lightweight transformer, great for efficiency.
  • ViT: Standard vision transformer for baseline.
  • VGG16: CNN classic for comparison.

Step 2: Downloading the Trained Model

Once training is done:

  • In your Kaggle notebook, commit and run to generate outputs.
  • Go to the notebook's "Output" tab and download the files (e.g., model weights and JSON).
  • For sharing large files, upload them to Google Drive (as in my example) or GitHub.

In my case, I uploaded deit_tiny_fish_model.pth to Google Drive and used its ID for downloading in the API.
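
If you host the weights on Drive, gdown can fetch them by file ID; a one-liner sketch (the ID below is a placeholder):

# Download model weights from Google Drive by file ID (placeholder ID)
import gdown
gdown.download(id="YOUR_GOOGLE_DRIVE_ID", output="deit_tiny_fish_model.pth", quiet=False)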

Step 3: Wrapping the Model with FastAPI

Now, turn your model into an API. FastAPI is async, type-safe, and includes auto-docs.
General Steps:

  • Set Up Project: Create a folder, install deps: pip install fastapi uvicorn torch timm pillow gdown.
  • Load Model: Handle device (CPU/GPU), download if needed.
  • Preprocess Inputs: Match training transforms.
  • Create Endpoints: /predict for inference, health checks, etc.
  • Add CORS: For web apps.
  • Run Locally: uvicorn main:app --reload.

Example Code: Generalizable API Wrapper
Here's the full code from my fish classifier, adapted to be general (replace "fish" with your domain, adjust classes/models).

import os
import io
import json
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
import timm
import gdown

# Define device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Model configuration (customize these)
NUM_CLASSES = 32  # Your number of classes
MODEL_PATH = os.path.join(os.path.dirname(__file__), "your_model.pth")  # e.g., deit_tiny_fish_model.pth
CLASSES_PATH = os.path.join(os.path.dirname(__file__), "classes.json")

# Download model if not exists (e.g., from Google Drive)
if not os.path.exists(MODEL_PATH):
    print("Downloading model...")
    gdown.download(id="YOUR_GOOGLE_DRIVE_ID", output=MODEL_PATH, quiet=False)

# Load class names
try:
    with open(CLASSES_PATH, "r") as f:
        class_names = json.load(f)
except Exception:
    class_names = ["class1", "class2"]  # Fallback

# Initialize model (customize for your architecture)
def create_model():
    model = timm.create_model('deit_tiny_distilled_patch16_224', pretrained=False)
    model.head = nn.Linear(model.head.in_features, NUM_CLASSES)
    model.head_dist = nn.Linear(model.head_dist.in_features, NUM_CLASSES)
    return model

# Load model
model = create_model()
model.load_state_dict(torch.load(MODEL_PATH, map_location=device, weights_only=False))  # weights_only=False allows fully-pickled/older checkpoints
model.to(device)
model.eval()

# Preprocessing (match your training)
transform = Compose([
    Resize(224),
    CenterCrop(224),
    ToTensor(),
    Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# FastAPI app
app = FastAPI(title="Your ML Model API", description="API for your AI/ML model")

# CORS
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"])

# Response model
class Prediction(BaseModel):
    filename: str
    predicted_class: str
    confidence: float
    top_3_predictions: list

# Preprocess function
def preprocess_image(image: Image.Image) -> torch.Tensor:
    image = image.convert("RGB")
    return transform(image).unsqueeze(0).to(device)

# Health check
@app.get("/")
async def root():
    return {"message": "API is running", "classes": len(class_names)}

# Prediction endpoint
@app.post("/predict", response_model=Prediction)
async def predict(file: UploadFile = File(...)):
    try:
        if not file.content_type or not file.content_type.startswith('image/'):
            raise HTTPException(400, "Must be an image")
        image_data = await file.read()
        image = Image.open(io.BytesIO(image_data))
        input_tensor = preprocess_image(image)
        with torch.no_grad():
            outputs = model(input_tensor)
            probabilities = torch.nn.functional.softmax(outputs, dim=1)
            confidence, predicted_idx = torch.max(probabilities, 1)
            top_3_prob, top_3_idx = torch.topk(probabilities, 3, dim=1)
        predicted_class = class_names[predicted_idx.item()]
        top_3 = [{"class": class_names[top_3_idx[0][i].item()], "confidence": top_3_prob[0][i].item()} for i in range(3)]
        return Prediction(filename=file.filename, predicted_class=predicted_class, confidence=confidence.item(), top_3_predictions=top_3)
    except HTTPException:
        # Re-raise intended HTTP errors (e.g., the 400 above) instead of masking them as 500s
        raise
    except Exception as e:
        raise HTTPException(500, f"Error: {str(e)}")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

For my fish example, this API takes an image upload and returns the predicted species with confidence scores. Test it locally by uploading an image to /predict with a tool like Postman, or via the Swagger UI at /docs.
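
You can also script the test; here's a quick sketch using the requests library, assuming the server is running on port 8000 and you have a local test.jpg:

# Send a local image to the running API and print the JSON response
import requests

with open("test.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict",
        files={"file": ("test.jpg", f, "image/jpeg")},
    )
print(response.json())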

Diagram: API Flow

[Image: User-Flow]

Step 4: Dockerizing the API

Containerize for easy deployment.
General Steps:

  1. Create Dockerfile and requirements.txt listing deps like fastapi, torch, etc. (see the sample requirements.txt after the Dockerfile below).
  2. Build: docker build -t your-app .
  3. Run Locally: docker run -p 8000:8000 your-app
  4. Push to Registry: e.g., Docker Hub.

Example Dockerfile:

FROM python:3.9-slim

WORKDIR /app

RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir gdown

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
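
And a matching requirements.txt might look like the following; it mirrors the imports in the API code above, and you should pin versions to whatever you trained with:

fastapi
uvicorn[standard]
torch
torchvision
timm
pillow
gdown
numpy
pydantic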

Step 5: Deploying to Render or Railway

Both Render and Railway are excellent for deploying ML models due to their simplicity and free tiers. The steps are similar on both, with slight differences; in broad strokes:

On Render:

  1. Push your Dockerized project to GitHub.
  2. Create a new Web Service, connect the repo, and pick the Docker runtime; Render builds the image from your Dockerfile.
  3. Check the port: Render injects a PORT environment variable, so it's safest to switch the Dockerfile CMD to the shell form, e.g. CMD uvicorn main:app --host 0.0.0.0 --port ${PORT:-8000}, so the variable gets expanded.
  4. Deploy. Note that free-tier services spin down when idle, so the first request after a pause can be slow while the model reloads.

On Railway:

  1. Create a new project from your GitHub repo; Railway detects the Dockerfile and builds it automatically.
  2. Handle the port the same way (Railway also injects PORT).
  3. Generate a public domain in the service settings, and your API is live.
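
If you prefer the terminal, Railway also ships a CLI; a minimal sketch, assuming the CLI is installed and you're in the project folder:

railway login   # authenticate via the browser
railway init    # create or link a project
railway up      # build the Dockerfile and deploy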

Diagram: Deployment Pipeline

[Image: Deploy-Flow]

Conclusion

This workflow makes deploying AI/ML models straightforward, letting you focus on research. For my fish classification, it turned a Kaggle notebook into a live API in hours. Adapt it to your project—swap models, datasets, or endpoints as needed. If you build something cool, share it!
Questions? Drop a comment below. Happy deploying! 🚀
