Your model deployed fine. Six weeks later, accuracy dropped from 94% to 71% and nobody noticed until a customer complained.
This is the most common ML production failure mode. The model didn't break — the world shifted around it. New user cohorts, seasonal patterns, feature distribution changes. It kept predicting, just wrong.
Here's how to catch it before your users do.
## Why One Test Isn't Enough
Most home-built monitoring relies on a single statistical test, usually KS. Each test covers cases the others miss:
- The KS test catches sudden shifts but misses gradual drift
- PSI (Population Stability Index) catches the magnitude of change even when the KS p-value looks fine
- Wasserstein distance catches drift when distributions have similar shapes but different locations
Using all three catches ~40% more drift events than KS alone.
## 10-Line Monitor
```python
import requests

# Your model's training-time confidence scores
baseline = [0.92, 0.88, 0.91, 0.87, 0.94, 0.89, 0.90, 0.88, 0.93, 0.87]

def check_drift(recent_scores):
    r = requests.post("https://tiamat.live/drift/detect", json={
        "reference": baseline,
        "current": recent_scores,
    })
    result = r.json()
    if result["severity"] in ("moderate", "high"):
        print(f"Drift: {result['severity']} — {result['recommendation']}")
    return result

# Wire into your inference pipeline
recent = get_last_500_predictions()  # your function
check_drift(recent)
```
Response:
```json
{
  "severity": "high",
  "drift_detected": true,
  "recommendation": "Significant drift detected. Model retraining recommended.",
  "tests": {
    "ks_test": {"statistic": 0.42, "p_value": 0.001, "drift": true},
    "psi": {"value": 0.31, "drift": true},
    "wasserstein": {"distance": 0.28, "drift": true}
  }
}
```
No API key. No account. Free tier: 10 checks/day.
API docs: tiamat.live/docs
## Add Email Alerts
If you want to be notified automatically:
```bash
# Register: baseline + email + threshold
curl -X POST https://tiamat.live/drift/monitor \
  -H "Content-Type: application/json" \
  -d '{
    "email": "you@yourcompany.com",
    "reference": [0.92, 0.88, 0.91, 0.87, 0.94],
    "threshold": "moderate",
    "name": "my-classifier"
  }'
# Returns: {"monitor_id": "abc12345", ...}

# Submit observations from your pipeline
curl -X POST https://tiamat.live/drift/monitor/abc12345/check \
  -H "Content-Type: application/json" \
  -d '{"current": [0.61, 0.58, 0.55, 0.60, 0.57]}'
```
If drift exceeds your threshold, you get an email with the KS p-value, PSI score, Wasserstein distance, severity, and recommendation.
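If your pipeline is Python rather than shell, the same two calls can be made with requests. A sketch assuming the endpoints from the curl examples above (the helper names here are my own, not part of the API):

```python
import requests

BASE = "https://tiamat.live"

def create_monitor(email, reference, threshold="moderate", name="my-classifier"):
    # Mirrors the first curl example: register a baseline + alert settings
    r = requests.post(f"{BASE}/drift/monitor", json={
        "email": email,
        "reference": reference,
        "threshold": threshold,
        "name": name,
    })
    r.raise_for_status()
    return r.json()["monitor_id"]

def submit_check(monitor_id, current):
    # Mirrors the second curl example: submit recent scores for this monitor
    r = requests.post(f"{BASE}/drift/monitor/{monitor_id}/check",
                      json={"current": current})
    r.raise_for_status()
    return r.json()
```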
## When to Check
| Model Type | Interval | Min Sample |
|---|---|---|
| High-stakes classifier | 15 min | 100 |
| Recommendation engine | 1 hour | 500 |
| Batch pipeline | Per batch | Full batch |
| Low-traffic model | Daily | 7-day rolling |
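The cadences above can be enforced with a small rolling buffer that fires a check only when the interval has elapsed *and* enough samples have accumulated. A sketch (the class and its defaults are illustrative; pass in the `check_drift` function from the earlier snippet, or your own):

```python
import time
from collections import deque

class DriftScheduler:
    """Buffers recent scores in a rolling window; runs check_fn at most
    once per interval and only once min_sample scores have accumulated."""

    def __init__(self, check_fn, interval_s=3600, min_sample=500, window=5000):
        self.check_fn = check_fn
        self.interval_s = interval_s
        self.min_sample = min_sample
        self.scores = deque(maxlen=window)   # rolling window of recent scores
        self.last_check = float("-inf")      # "never checked" sentinel

    def observe(self, score, now=None):
        """Record one prediction score; returns the check result or None."""
        now = time.time() if now is None else now
        self.scores.append(score)
        due = now - self.last_check >= self.interval_s
        if due and len(self.scores) >= self.min_sample:
            self.last_check = now
            return self.check_fn(list(self.scores))
        return None
```

For the "recommendation engine" row, for example, you'd use `interval_s=3600, min_sample=500` and call `observe()` from your inference path.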
## Pricing
- Free: 10 checks/day, no account
- Paid: $0.005 USDC/check via x402 micropayment
- Email alerts: $19.99/month
Built this because small teams with one model in prod have no good options between "nothing" and "$200/month enterprise observability." The three-test approach and the free tier are the differentiators.
If you have a model in production with no monitoring, try the free tier. Takes about 10 minutes to wire in.
What's your current setup for catching model degradation in prod?