# Model Monitoring Dashboard
Models don't fail with exceptions — they fail silently. Accuracy degrades gradually as data distributions shift, features go stale, or upstream systems change schemas. By the time someone notices, you've been serving bad predictions for weeks. This toolkit gives you drift detection algorithms, real-time performance monitoring, and pre-built Grafana dashboards with alerting rules that catch degradation early. You get the complete observability stack for deployed ML models: data drift, prediction drift, feature importance shifts, and business metric correlation.
## Key Features

- **Data Drift Detection** — Population Stability Index (PSI), Kolmogorov-Smirnov tests, and Jensen-Shannon divergence computed per feature on configurable schedules.
- **Prediction Drift Monitoring** — Track prediction distribution changes even when ground truth labels aren't available yet.
- **Grafana Dashboards** — Pre-built dashboard JSON with panels for model performance, feature drift, latency percentiles, and error rates.
- **Alerting Rules** — Prometheus alerting rules for drift thresholds, latency spikes, error rate increases, and data pipeline staleness.
- **Performance Tracking** — Accuracy, precision, recall, and custom metrics computed on incoming labeled data with sliding window aggregations.
- **Feature Importance Monitoring** — SHAP-based feature importance tracking that alerts when the model starts relying on different features than it did during training.
- **Outlier Detection** — Flag individual predictions made on out-of-distribution inputs so downstream systems can handle them appropriately.
- **Report Generator** — Weekly and monthly model health reports in HTML and PDF with trend analysis and recommendations.
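PSI, the first of the drift statistics above, compares binned frequencies between reference and production data. As a self-contained illustration of the idea (not the toolkit's implementation; the `psi` helper is a made-up name), it can be computed with plain NumPy:

```python
import numpy as np

def psi(reference, production, bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples.

    Bin edges come from reference quantiles so each reference bin holds
    roughly equal mass; production values outside the reference range
    are clipped into the outer bins.
    """
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    production = np.clip(production, edges[0], edges[-1])
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    prod_frac = np.histogram(production, edges)[0] / len(production)
    ref_frac = np.clip(ref_frac, eps, None)    # avoid log(0) on empty bins
    prod_frac = np.clip(prod_frac, eps, None)
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))

rng = np.random.default_rng(0)
stable = psi(rng.normal(0, 1, 10_000), rng.normal(0, 1, 10_000))
shifted = psi(rng.normal(0, 1, 10_000), rng.normal(0.5, 1, 10_000))
print(f"same distribution: {stable:.3f}, mean shifted by 0.5: {shifted:.3f}")
```

With identical distributions PSI stays near zero; a half-standard-deviation mean shift typically lands above the 0.2 "significant shift" rule of thumb used later in this document.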
## Quick Start

```bash
unzip model-monitoring-dashboard.zip && cd model-monitoring-dashboard
pip install -r requirements.txt

# Start the monitoring stack (Prometheus + Grafana + monitoring service)
docker compose up -d

# Import Grafana dashboards
python src/model_monitoring/setup.py import-dashboards \
  --grafana-url http://localhost:3000 \
  --api-key YOUR_GRAFANA_API_KEY_HERE
```
```yaml
# config.example.yaml
monitoring:
  model_name: churn_predictor_v3
  check_interval_minutes: 60
  reference_dataset: ./data/training_reference.parquet

drift_detection:
  features:
    numerical: { method: ks_test, threshold: 0.05 }
    categorical: { method: chi_squared, threshold: 0.05 }
  prediction: { method: psi, threshold: 0.1 }
  schedule: "*/60 * * * *"

performance:
  metrics: [accuracy, precision, recall, f1, auc_roc]
  window_size: 1000               # predictions
  sliding_step: 100
  ground_truth_delay_hours: 24    # how long until labels arrive

alerting:
  channels: [slack, pagerduty]
  rules:
    - { name: feature_drift_alert, condition: "psi > 0.2 for any feature", severity: warning }
    - { name: performance_drop, condition: "accuracy < 0.80", severity: critical }
    - { name: latency_spike, condition: "p99_latency_ms > 200", severity: warning }
    - { name: data_staleness, condition: "last_prediction_age > 30m", severity: critical }

grafana:
  url: http://localhost:3000
  dashboards_dir: ./dashboards/
  datasource: prometheus
```
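The `latency_spike` entry above maps naturally onto a Prometheus alerting rule. The sketch below is an illustration, not a file shipped with the toolkit; the histogram series name `model_prediction_latency_ms_bucket` is an assumption about what the metrics collector exposes:

```yaml
groups:
  - name: model_monitoring
    rules:
      - alert: LatencySpike
        expr: histogram_quantile(0.99, sum(rate(model_prediction_latency_ms_bucket[5m])) by (le)) > 200
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 prediction latency above 200 ms"
```

The `for: 5m` clause keeps a single slow scrape interval from paging anyone; the alert fires only after the condition holds continuously.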
## Architecture

```
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│   Prediction   │────>│    Metrics     │────>│   Prometheus   │
│    Service     │     │   Collector    │     │                │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                      │
┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐
│   Reference    │────>│     Drift      │────>│    Grafana     │
│    Dataset     │     │    Detector    │     │   Dashboards   │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                      │
                       ┌────────────────┐     ┌───────▼────────┐
                       │     Report     │<────│     Alert      │
                       │   Generator    │     │    Manager     │
                       └────────────────┘     └────────────────┘
```
## Usage Examples

### Set Up Drift Detection

```python
from model_monitoring.core import DriftDetector
import pandas as pd

# Load reference data (your training distribution)
reference = pd.read_parquet("./data/training_reference.parquet")

detector = DriftDetector(
    reference_data=reference,
    numerical_method="ks_test",
    categorical_method="chi_squared",
)

# Check drift on new production data
production_batch = pd.read_parquet("./data/production_batch_20260323.parquet")
report = detector.check_drift(production_batch)

for feature, result in report.items():
    status = "DRIFT" if result["drifted"] else "OK"
    print(f"{feature}: {status} (p={result['p_value']:.4f}, PSI={result['psi']:.4f})")
```
### Configure Prometheus Metrics

```python
from model_monitoring.core import MetricsCollector
from prometheus_client import start_http_server

# Expose a /metrics endpoint for Prometheus to scrape
start_http_server(port=8000)

collector = MetricsCollector(model_name="churn_predictor_v3")

# Log predictions from your serving code
collector.log_prediction(
    features={"age": 35, "tenure_months": 24},
    prediction=0.73,
    latency_ms=12.5,
)

# Log ground truth when labels arrive (async, hours or days later)
collector.log_ground_truth(prediction_id="pred_abc123", actual_label=1)
```
### Generate a Model Health Report

```python
from model_monitoring.core import ReportGenerator

generator = ReportGenerator.from_config("config.example.yaml")
report = generator.generate(period="weekly", start_date="2026-03-16", end_date="2026-03-23")
report.save_html("./reports/model_health_week_12.html")

print(f"Health: {report.health_score}/100 | Drifted: {report.drifted_features} | Trend: {report.performance_trend}")
```
## Configuration Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `drift_detection.features.numerical.method` | str | `ks_test` | Statistical test for numeric drift |
| `drift_detection.features.numerical.threshold` | float | `0.05` | p-value threshold for drift detection |
| `drift_detection.prediction.threshold` | float | `0.1` | PSI threshold for prediction drift |
| `performance.window_size` | int | `1000` | Sliding window size for metrics |
| `performance.ground_truth_delay_hours` | int | `24` | Expected delay for label arrival |
| `alerting.rules.*.severity` | str | `warning` | Alert severity: `info`, `warning`, `critical` |
## Best Practices

- **Monitor prediction distributions, not just performance** — Ground truth labels can take days or weeks to arrive. Prediction drift is your early warning system.
- **Set per-feature drift thresholds** — Not all features matter equally. Set tighter thresholds on high-importance features and looser ones on low-importance features.
- **Use PSI for business stakeholder communication** — PSI is more intuitive than KS test p-values. PSI < 0.1 = stable, 0.1-0.2 = moderate shift, > 0.2 = significant.
- **Track feature importance over time** — If the model starts relying heavily on a feature that was unimportant during training, something has changed fundamentally.
- **Establish a retraining trigger** — Define explicit criteria: "retrain when 3+ features show PSI > 0.2 or accuracy drops below 0.80 for 3 consecutive windows."
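The retraining criterion in the last bullet is easy to encode as a plain function. This is a sketch of the policy, not part of the toolkit; the `should_retrain` name and its inputs are made up for illustration:

```python
def should_retrain(feature_psis: dict[str, float],
                   recent_accuracies: list[float],
                   psi_threshold: float = 0.2,
                   drifted_features_min: int = 3,
                   accuracy_floor: float = 0.80,
                   consecutive_windows: int = 3) -> bool:
    """True when either explicit criterion from the best practice is met:
    3+ features with PSI > 0.2, or accuracy < 0.80 for 3 consecutive windows.
    """
    drifted = sum(1 for v in feature_psis.values() if v > psi_threshold)
    if drifted >= drifted_features_min:
        return True
    tail = recent_accuracies[-consecutive_windows:]
    return (len(tail) == consecutive_windows
            and all(a < accuracy_floor for a in tail))

flagged = should_retrain({"age": 0.25, "tenure": 0.31, "plan": 0.22}, [0.85, 0.84])
print(flagged)  # True: three features exceed the PSI threshold
```

Wiring such a check into the scheduler that already runs drift detection keeps the retraining decision auditable: the trigger is a readable function, not tribal knowledge.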
## Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| All features show drift | Reference dataset is too old | Refresh reference dataset with recent training data |
| Grafana dashboard shows no data | Prometheus scrape target not configured | Verify Prometheus config has the metrics endpoint URL and port |
| Drift alerts firing too frequently | Threshold too sensitive for noisy features | Increase threshold for noisy features, or use longer aggregation windows |
| Performance metrics show NaN | Ground truth labels not arriving | Check ground_truth_delay_hours, verify label pipeline is running |
This is 1 of 11 resources in the ML Engineer Toolkit. Get the complete Model Monitoring Dashboard with all files, templates, and documentation for $39.
Or grab the entire ML Engineer Toolkit bundle (11 products) for $149 — save 30%.