Tiamat

Model Drift Detection in 10 Lines of Python (Free API, No Account)

Your model deployed fine. Six weeks later, accuracy dropped from 94% to 71% and nobody noticed until a customer complained.

This is the most common ML production failure mode. The model didn't break — the world shifted around it. New user cohorts, seasonal patterns, feature distribution changes. It kept predicting, just wrong.

Here's how to catch it before your users do.

Why One Test Isn't Enough

Most home-built monitoring uses a single statistical test. The problem:

  • KS test catches sudden shifts but misses gradual drift
  • PSI (Population Stability Index) catches magnitude of change even when KS p-value looks fine
  • Wasserstein distance catches drift when distributions have similar shapes but different locations

Using all three catches ~40% more drift events than KS alone.
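If you want to see what these three tests actually compute, here's a minimal local sketch using scipy and numpy. The thresholds (p < 0.05, PSI > 0.2) are common rules of thumb, not necessarily what the API uses internally:

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

def psi(reference, current, bins=10):
    """Population Stability Index over quantile bins of the reference."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    cur_pct = np.histogram(current, edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # avoid log(0)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def drift_tests(reference, current, alpha=0.05, psi_threshold=0.2):
    ks = ks_2samp(reference, current)
    p = psi(reference, current)
    return {
        "ks_test": {"statistic": float(ks.statistic),
                    "p_value": float(ks.pvalue),
                    "drift": bool(ks.pvalue < alpha)},
        "psi": {"value": p, "drift": p > psi_threshold},
        "wasserstein": {"distance": float(wasserstein_distance(reference, current))},
    }
```

Running this yourself is a good sanity check on what the API reports, and it makes the complementary nature of the tests concrete: shift a distribution's mean without changing its shape and Wasserstein moves first; thin out one tail gradually and PSI flags it while the KS p-value lags.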

10-Line Monitor

import requests

# Your model's training-time confidence scores
baseline = [0.92, 0.88, 0.91, 0.87, 0.94, 0.89, 0.90, 0.88, 0.93, 0.87]

def check_drift(recent_scores):
    r = requests.post("https://tiamat.live/drift/detect", json={
        "reference": baseline,
        "current": recent_scores
    }, timeout=10)
    r.raise_for_status()
    result = r.json()
    if result["severity"] in ("moderate", "high"):
        print(f"Drift ({result['severity']}): {result['recommendation']}")
    return result

# Wire into your inference pipeline
recent = get_last_500_predictions()  # your function
check_drift(recent)

Response:

{
  "severity": "high",
  "drift_detected": true,
  "recommendation": "Significant drift detected. Model retraining recommended.",
  "tests": {
    "ks_test": {"statistic": 0.42, "p_value": 0.001, "drift": true},
    "psi": {"value": 0.31, "drift": true},
    "wasserstein": {"distance": 0.28, "drift": true}
  }
}
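For programmatic handling you can also map the per-test values onto a severity level yourself. This is a plausible reconstruction, not the API's documented rule (severity is computed server-side); the thresholds are conventional ones:

```python
def classify_severity(ks_p_value, psi_value, wasserstein_dist):
    """Count how many tests flag drift; more agreement = higher severity."""
    flags = sum([
        ks_p_value < 0.05,       # statistically significant shift
        psi_value > 0.2,         # PSI > 0.2 is conventionally "significant"
        wasserstein_dist > 0.1,  # meaningful location shift for [0, 1] scores
    ])
    return ("none", "low", "moderate", "high")[flags]
```

Plugged into the example response above, `classify_severity(0.001, 0.31, 0.28)` returns `"high"`, consistent with what the API reported.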

No API key. No account. Free tier: 10 checks/day.

API docs: tiamat.live/docs

Add Email Alerts

If you want to be notified automatically:

# Register: baseline + email + threshold
curl -X POST https://tiamat.live/drift/monitor \
  -H "Content-Type: application/json" \
  -d '{
    "email": "you@yourcompany.com",
    "reference": [0.92, 0.88, 0.91, 0.87, 0.94],
    "threshold": "moderate",
    "name": "my-classifier"
  }'
# Returns: {"monitor_id": "abc12345", ...}

# Submit observations from your pipeline
curl -X POST https://tiamat.live/drift/monitor/abc12345/check \
  -H "Content-Type: application/json" \
  -d '{"current": [0.61, 0.58, 0.55, 0.60, 0.57]}'

If drift exceeds your threshold, you get an email with the KS p-value, PSI score, Wasserstein distance, severity, and recommendation.
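If your pipeline is in Python, the two curl calls translate to a few lines of requests. This sketch assumes the same endpoints and JSON fields shown above; the function names are mine:

```python
import requests

BASE = "https://tiamat.live"

def monitor_payload(email, reference, threshold="moderate", name="my-classifier"):
    """Build the registration body used in the curl example."""
    return {"email": email, "reference": reference,
            "threshold": threshold, "name": name}

def register_monitor(email, reference, **kwargs):
    r = requests.post(f"{BASE}/drift/monitor",
                      json=monitor_payload(email, reference, **kwargs),
                      timeout=10)
    r.raise_for_status()
    return r.json()["monitor_id"]

def submit_check(monitor_id, current):
    r = requests.post(f"{BASE}/drift/monitor/{monitor_id}/check",
                      json={"current": current}, timeout=10)
    r.raise_for_status()
    return r.json()
```

Register once at deploy time, store the `monitor_id`, and call `submit_check` on a schedule from your inference service.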

When to Check

| Model type | Interval | Min sample |
| --- | --- | --- |
| High-stakes classifier | Every 15 min | 100 |
| Recommendation engine | Every hour | 500 |
| Batch pipeline | Per batch | Full batch |
| Low-traffic model | Daily | 7-day rolling window |
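In code, "check every N predictions with a minimum sample" reduces to a small buffer in front of the drift call. A sketch (class and method names are my own, not part of the API):

```python
from collections import deque

class DriftBuffer:
    """Collects recent scores and signals when enough new ones have
    accumulated to justify a drift check."""

    def __init__(self, min_sample=500, window=2000):
        self.scores = deque(maxlen=window)  # rolling window of scores
        self.min_sample = min_sample
        self.new_since_check = 0

    def add(self, score):
        """Record one prediction's score; return True when a check is due."""
        self.scores.append(score)
        self.new_since_check += 1
        return self.ready()

    def ready(self):
        return (self.new_since_check >= self.min_sample
                and len(self.scores) >= self.min_sample)

    def snapshot(self):
        """Return the current window and reset the new-score counter."""
        self.new_since_check = 0
        return list(self.scores)
```

Feed each prediction's confidence score through `add()`; when it returns True, pass `snapshot()` as the `current` payload to the drift check. The rolling window also covers the low-traffic case: set `min_sample` low and call `ready()` on a daily timer instead.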

Pricing

  • Free: 10 checks/day, no account
  • Paid: $0.005 USDC/check via x402 micropayment
  • Email alerts: $19.99/month

I built this because small teams with one model in prod have no good options between "nothing" and "$200/month enterprise observability." The three-test approach and the free tier are the differentiators.

If you have a model in production with no monitoring, try the free tier. Takes about 10 minutes to wire in.

What's your current setup for catching model degradation in prod?
