The Stack That Actually Works
I'll cut to the chase: MLflow tracking + FastAPI serving + DigitalOcean droplet + Stripe webhooks. That's the blueprint. The model is a gradient boosting classifier predicting customer churn for small SaaS companies at $0.03 per prediction. Monthly revenue hovers around $2,100 with 70k API calls.
Why this stack? Because it's boring, well-documented, and doesn't break at 3am. I tried the "modern" approach first—containerized everything with Kubernetes, set up auto-scaling, added Prometheus metrics. I burned through my first month's revenue on infrastructure costs before I had a single paying customer.
Here's the actual production setup running right now.
```python
# app/main.py
from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel, Field
import mlflow
import numpy as np
from typing import Optional
import hashlib
import time
from functools import lru_cache

app = FastAPI(title="Churn Prediction API", version="1.2.3")

# Load model once at startup - not on every request
@lru_cache(maxsize=1)
def get_model():
    model_uri = "models:/churn-predictor/production"
    model = mlflow.pyfunc.load_model(model_uri)
    return model

class ChurnRequest(BaseModel):
    customer_id: str
    monthly_charges: float = Field(..., gt=0, description="MRR in USD")
    tenure_days: int = Field(..., ge=0)
    support_tickets: int = Field(default=0, ge=0)
    feature_usage_pct: float = Field(..., ge=0, le=100)
```
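The `@lru_cache(maxsize=1)` decorator is what keeps the model load off the request path: the first call does the expensive work, and every later call returns the same cached object. Here's a stdlib-only sketch of that behavior, with a counter standing in for the MLflow load:

```python
from functools import lru_cache

load_count = 0

@lru_cache(maxsize=1)
def get_model():
    # Stand-in for mlflow.pyfunc.load_model(...) - expensive, so run once
    global load_count
    load_count += 1
    return {"name": "churn-predictor", "loads": load_count}

m1 = get_model()
m2 = get_model()
assert m1 is m2          # every call returns the same cached object
assert load_count == 1   # the "load" ran exactly once
```

One caveat: `lru_cache` caches per process, so each uvicorn worker loads its own copy of the model.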
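The snippet above cuts off before the endpoint itself. As a rough sketch of how the request model and cached loader plug together (the `/predict` route, response fields, and feature ordering are my assumptions, not from the article, and a stub replaces the MLflow model so the example runs on its own):

```python
from functools import lru_cache
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Churn Prediction API", version="1.2.3")

class ChurnRequest(BaseModel):
    customer_id: str
    monthly_charges: float = Field(..., gt=0, description="MRR in USD")
    tenure_days: int = Field(..., ge=0)
    support_tickets: int = Field(default=0, ge=0)
    feature_usage_pct: float = Field(..., ge=0, le=100)

class _StubModel:
    # Stand-in for mlflow.pyfunc.load_model("models:/churn-predictor/production")
    def predict(self, rows):
        return [0.42 for _ in rows]

@lru_cache(maxsize=1)
def get_model():
    return _StubModel()

@app.post("/predict")  # hypothetical route name - the article doesn't show it
def predict(req: ChurnRequest):
    # Feature order must match whatever the model was trained on
    features = [[req.monthly_charges, req.tenure_days,
                 req.support_tickets, req.feature_usage_pct]]
    prob = float(get_model().predict(features)[0])
    return {"customer_id": req.customer_id, "churn_probability": prob}
```

The nice part of this shape is that Pydantic rejects bad payloads (negative charges, usage over 100%) with a 422 before the model ever runs.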
---
*Continue reading the full article on [TildAlice](https://tildalice.io/mlflow-fastapi-model-serving/)*