The Stack That Actually Works
I'll cut to the chase: MLflow tracking + FastAPI serving + DigitalOcean droplet + Stripe webhooks. That's the blueprint. The model is a gradient boosting classifier predicting customer churn for small SaaS companies at $0.03 per prediction. Monthly revenue hovers around $2,100 with 70k API calls.
Why this stack? Because it's boring, well-documented, and doesn't break at 3am. I tried the "modern" approach first—containerized everything with Kubernetes, set up auto-scaling, added Prometheus metrics. I burned through my first month's revenue on infrastructure costs before I had a single paying customer.
Here's the actual production setup running right now.
```python
# app/main.py
from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel, Field
import mlflow
import numpy as np
from typing import Optional
import hashlib
import time
from functools import lru_cache

app = FastAPI(title="Churn Prediction API", version="1.2.3")

# Load model once at startup - not on every request
@lru_cache(maxsize=1)
def get_model():
    model_uri = "models:/churn-predictor/production"
    model = mlflow.pyfunc.load_model(model_uri)
    return model

class ChurnRequest(BaseModel):
    customer_id: str
    monthly_charges: float = Field(..., gt=0, description="MRR in USD")
    tenure_days: int = Field(..., ge=0)
    support_tickets: int = Field(default=0, ge=0)
    feature_usage_pct: float = Field(..., ge=0, le=100)
```
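The `@lru_cache(maxsize=1)` decorator is what keeps the model load off the request path: the first call does the expensive work, and every later call returns the same cached object. Here's a stdlib-only sketch of that behavior, with a counter standing in for the MLflow load:

```python
from functools import lru_cache

load_count = 0

@lru_cache(maxsize=1)
def get_model():
    # Stand-in for mlflow.pyfunc.load_model(...) - expensive, so run once
    global load_count
    load_count += 1
    return {"name": "churn-predictor", "loads": load_count}

m1 = get_model()
m2 = get_model()
assert m1 is m2          # every call returns the same cached object
assert load_count == 1   # the "load" ran exactly once
```

One caveat: `lru_cache` caches per process, so each uvicorn worker loads its own copy of the model.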
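The snippet above cuts off before the endpoint itself. As a rough sketch of how the request model and cached loader plug together (the `/predict` route, response fields, and feature ordering are my assumptions, not from the article, and a stub replaces the MLflow model so the example runs on its own):

```python
from functools import lru_cache
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Churn Prediction API", version="1.2.3")

class ChurnRequest(BaseModel):
    customer_id: str
    monthly_charges: float = Field(..., gt=0, description="MRR in USD")
    tenure_days: int = Field(..., ge=0)
    support_tickets: int = Field(default=0, ge=0)
    feature_usage_pct: float = Field(..., ge=0, le=100)

class _StubModel:
    # Stand-in for mlflow.pyfunc.load_model("models:/churn-predictor/production")
    def predict(self, rows):
        return [0.42 for _ in rows]

@lru_cache(maxsize=1)
def get_model():
    return _StubModel()

@app.post("/predict")  # hypothetical route name - the article doesn't show it
def predict(req: ChurnRequest):
    # Feature order must match whatever the model was trained on
    features = [[req.monthly_charges, req.tenure_days,
                 req.support_tickets, req.feature_usage_pct]]
    prob = float(get_model().predict(features)[0])
    return {"customer_id": req.customer_id, "churn_probability": prob}
```

The nice part of this shape is that Pydantic rejects bad payloads (negative charges, usage over 100%) with a 422 before the model ever runs.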
---
*Continue reading the full article on [TildAlice](https://tildalice.io/mlflow-fastapi-model-serving/)*