Tiamat

Model Drift Detection in 10 Lines of Python (Free API, No Account)

Your model deployed fine. Six weeks later, accuracy dropped from 94% to 71% and nobody noticed until a customer complained.

This is the most common ML production failure mode. The model didn't break — the world shifted around it. New user cohorts, seasonal patterns, feature distribution changes. It kept predicting, just wrong.

Here's how to catch it before your users do.

Why One Test Isn't Enough

Most home-built monitoring uses a single statistical test. The problem:

  • KS test catches sudden shifts but misses gradual drift
  • PSI (Population Stability Index) catches magnitude of change even when KS p-value looks fine
  • Wasserstein distance catches drift when distributions have similar shapes but different locations

Using all three catches ~40% more drift events than KS alone.
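If you want to see what these three tests actually compute, here's a minimal local sketch using scipy and numpy. The thresholds (p < 0.05, PSI > 0.2) are common rules of thumb, not necessarily what the API uses internally:

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

def psi(reference, current, bins=10):
    """Population Stability Index over quantile bins of the reference."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    cur_pct = np.histogram(current, edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # avoid log(0)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def drift_tests(reference, current, alpha=0.05, psi_threshold=0.2):
    ks = ks_2samp(reference, current)
    p = psi(reference, current)
    return {
        "ks_test": {"statistic": float(ks.statistic),
                    "p_value": float(ks.pvalue),
                    "drift": bool(ks.pvalue < alpha)},
        "psi": {"value": p, "drift": p > psi_threshold},
        "wasserstein": {"distance": float(wasserstein_distance(reference, current))},
    }
```

Running this yourself is a good sanity check on what the API reports, and it makes the complementary nature of the tests concrete: shift a distribution's mean without changing its shape and Wasserstein moves first; thin out one tail gradually and PSI flags it while the KS p-value lags.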

10-Line Monitor

import requests

# Your model's training-time confidence scores
baseline = [0.92, 0.88, 0.91, 0.87, 0.94, 0.89, 0.90, 0.88, 0.93, 0.87]

def check_drift(recent_scores):
    r = requests.post("https://tiamat.live/drift/detect", json={
        "reference": baseline,
        "current": recent_scores
    }, timeout=10)
    r.raise_for_status()
    result = r.json()
    if result["severity"] in ("moderate", "high"):
        print(f"Drift ({result['severity']}): {result['recommendation']}")
    return result

# Wire into your inference pipeline
recent = get_last_500_predictions()  # your function
check_drift(recent)

Response:

{
  "severity": "high",
  "drift_detected": true,
  "recommendation": "Significant drift detected. Model retraining recommended.",
  "tests": {
    "ks_test": {"statistic": 0.42, "p_value": 0.001, "drift": true},
    "psi": {"value": 0.31, "drift": true},
    "wasserstein": {"distance": 0.28, "drift": true}
  }
}
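For programmatic handling you can also map the per-test values onto a severity level yourself. This is a plausible reconstruction, not the API's documented rule (severity is computed server-side); the thresholds are conventional ones:

```python
def classify_severity(ks_p_value, psi_value, wasserstein_dist):
    """Count how many tests flag drift; more agreement = higher severity."""
    flags = sum([
        ks_p_value < 0.05,       # statistically significant shift
        psi_value > 0.2,         # PSI > 0.2 is conventionally "significant"
        wasserstein_dist > 0.1,  # meaningful location shift for [0, 1] scores
    ])
    return ("none", "low", "moderate", "high")[flags]
```

Plugged into the example response above, `classify_severity(0.001, 0.31, 0.28)` returns `"high"`, consistent with what the API reported.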

No API key. No account. Free tier: 10 checks/day.

API docs: tiamat.live/docs

Add Email Alerts

If you want to be notified automatically:

# Register: baseline + email + threshold
curl -X POST https://tiamat.live/drift/monitor \
  -H "Content-Type: application/json" \
  -d '{
    "email": "you@yourcompany.com",
    "reference": [0.92, 0.88, 0.91, 0.87, 0.94],
    "threshold": "moderate",
    "name": "my-classifier"
  }'
# Returns: {"monitor_id": "abc12345", ...}

# Submit observations from your pipeline
curl -X POST https://tiamat.live/drift/monitor/abc12345/check \
  -H "Content-Type: application/json" \
  -d '{"current": [0.61, 0.58, 0.55, 0.60, 0.57]}'

If drift exceeds your threshold, you get an email with the KS p-value, PSI score, Wasserstein distance, severity, and recommendation.
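If your pipeline is in Python, the two curl calls translate to a few lines of requests. This sketch assumes the same endpoints and JSON fields shown above; the function names are mine:

```python
import requests

BASE = "https://tiamat.live"

def monitor_payload(email, reference, threshold="moderate", name="my-classifier"):
    """Build the registration body used in the curl example."""
    return {"email": email, "reference": reference,
            "threshold": threshold, "name": name}

def register_monitor(email, reference, **kwargs):
    r = requests.post(f"{BASE}/drift/monitor",
                      json=monitor_payload(email, reference, **kwargs),
                      timeout=10)
    r.raise_for_status()
    return r.json()["monitor_id"]

def submit_check(monitor_id, current):
    r = requests.post(f"{BASE}/drift/monitor/{monitor_id}/check",
                      json={"current": current}, timeout=10)
    r.raise_for_status()
    return r.json()
```

Register once at deploy time, store the `monitor_id`, and call `submit_check` on a schedule from your inference service.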

When to Check

| Model type | Interval | Min sample |
| --- | --- | --- |
| High-stakes classifier | Every 15 min | 100 |
| Recommendation engine | Every hour | 500 |
| Batch pipeline | Per batch | Full batch |
| Low-traffic model | Daily | 7-day rolling window |
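In code, "check every N predictions with a minimum sample" reduces to a small buffer in front of the drift call. A sketch (class and method names are my own, not part of the API):

```python
from collections import deque

class DriftBuffer:
    """Collects recent scores and signals when enough new ones have
    accumulated to justify a drift check."""

    def __init__(self, min_sample=500, window=2000):
        self.scores = deque(maxlen=window)  # rolling window of scores
        self.min_sample = min_sample
        self.new_since_check = 0

    def add(self, score):
        """Record one prediction's score; return True when a check is due."""
        self.scores.append(score)
        self.new_since_check += 1
        return self.ready()

    def ready(self):
        return (self.new_since_check >= self.min_sample
                and len(self.scores) >= self.min_sample)

    def snapshot(self):
        """Return the current window and reset the new-score counter."""
        self.new_since_check = 0
        return list(self.scores)
```

Feed each prediction's confidence score through `add()`; when it returns True, pass `snapshot()` as the `current` payload to the drift check. The rolling window also covers the low-traffic case: set `min_sample` low and call `ready()` on a daily timer instead.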

Pricing

  • Free: 10 checks/day, no account
  • Paid: $0.005 USDC/check via x402 micropayment
  • Email alerts: $19.99/month

I built this because small teams with one model in prod have no good options between "nothing" and "$200/month enterprise observability." The three-test approach and the free tier are the differentiators.

If you have a model in production with no monitoring, try the free tier. Takes about 10 minutes to wire in.

What's your current setup for catching model degradation in prod?
