
Thesius Code

Originally published at datanest-stores.pages.dev

Model Monitoring Dashboard

Models don't fail with exceptions — they fail silently. Accuracy degrades gradually as data distributions shift, features go stale, or upstream systems change schemas. By the time someone notices, you've been serving bad predictions for weeks. This toolkit gives you drift detection algorithms, real-time performance monitoring, and pre-built Grafana dashboards with alerting rules that catch degradation early. You get the complete observability stack for deployed ML models: data drift, prediction drift, feature importance shifts, and business metric correlation.

Key Features

  • Data Drift Detection — Population Stability Index (PSI), Kolmogorov-Smirnov tests, and Jensen-Shannon divergence computed per-feature on configurable schedules.
  • Prediction Drift Monitoring — Track prediction distribution changes even when ground truth labels aren't available yet.
  • Grafana Dashboards — Pre-built dashboard JSON with panels for model performance, feature drift, latency percentiles, and error rates.
  • Alerting Rules — Prometheus alerting rules for drift thresholds, latency spikes, error rate increases, and data pipeline staleness.
  • Performance Tracking — Accuracy, precision, recall, and custom metrics computed on incoming labeled data with sliding window aggregations.
  • Feature Importance Monitoring — SHAP-based feature importance tracking that alerts when the model starts relying on different features than during training.
  • Outlier Detection — Flag individual predictions made on out-of-distribution inputs so downstream systems can handle them appropriately.
  • Report Generator — Weekly and monthly model health reports in HTML and PDF with trend analysis and recommendations.
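PSI, the workhorse metric in the feature list above, is simple enough to compute by hand. Here is a minimal sketch in plain numpy (an illustration of the formula, not the toolkit's implementation): bin the reference distribution, compute the share of each sample falling in each bin, and sum `(prod - ref) * ln(prod / ref)` across bins.

```python
import numpy as np

def psi(reference, production, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges come from the reference distribution; a small epsilon
    avoids division by zero for empty bins.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    eps = 1e-6
    ref_pct = np.clip(ref_pct, eps, None)
    prod_pct = np.clip(prod_pct, eps, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

rng = np.random.default_rng(42)
stable = psi(rng.normal(0, 1, 10_000), rng.normal(0, 1, 10_000))
shifted = psi(rng.normal(0, 1, 10_000), rng.normal(0.5, 1, 10_000))
print(f"stable: {stable:.3f}, shifted: {shifted:.3f}")
```

An identical distribution lands near 0, while a half-standard-deviation mean shift pushes PSI well past the 0.2 "significant" line from the best-practices section.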

Quick Start

unzip model-monitoring-dashboard.zip && cd model-monitoring-dashboard
pip install -r requirements.txt

# Start the monitoring stack
docker compose up -d  # Starts Prometheus + Grafana + monitoring service

# Import Grafana dashboards
python src/model_monitoring/setup.py import-dashboards \
  --grafana-url http://localhost:3000 \
  --api-key YOUR_GRAFANA_API_KEY_HERE
# config.example.yaml
monitoring:
  model_name: churn_predictor_v3
  check_interval_minutes: 60
  reference_dataset: ./data/training_reference.parquet

drift_detection:
  features:
    numerical: { method: ks_test, threshold: 0.05 }
    categorical: { method: chi_squared, threshold: 0.05 }
  prediction: { method: psi, threshold: 0.1 }
  schedule: "0 * * * *"  # hourly

performance:
  metrics: [accuracy, precision, recall, f1, auc_roc]
  window_size: 1000  # predictions
  sliding_step: 100
  ground_truth_delay_hours: 24  # how long until labels arrive

alerting:
  channels: [slack, pagerduty]
  rules:
    - { name: feature_drift_alert, condition: "psi > 0.2 for any feature", severity: warning }
    - { name: performance_drop, condition: "accuracy < 0.80", severity: critical }
    - { name: latency_spike, condition: "p99_latency_ms > 200", severity: warning }
    - { name: data_staleness, condition: "last_prediction_age > 30m", severity: critical }

grafana:
  url: http://localhost:3000
  dashboards_dir: ./dashboards/
  datasource: prometheus
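The alerting rules in the config would typically be materialized as a Prometheus rule file. A hedged sketch of what two of them might compile to (the metric names `model_feature_psi` and `model_prediction_latency_ms_bucket` are assumptions for illustration, not the toolkit's actual exported names):

```yaml
groups:
  - name: model_monitoring
    rules:
      - alert: FeatureDriftAlert
        expr: model_feature_psi{model="churn_predictor_v3"} > 0.2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "PSI above 0.2 for feature {{ $labels.feature }}"
      - alert: LatencySpike
        expr: histogram_quantile(0.99, rate(model_prediction_latency_ms_bucket[5m])) > 200
        labels:
          severity: warning
```

The `for: 10m` clause suppresses one-off blips so the drift alert only fires on sustained shift.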

Architecture

┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│  Prediction    │────>│  Metrics       │────>│  Prometheus    │
│  Service       │     │  Collector     │     │                │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                       │
┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐
│  Reference     │────>│  Drift         │────>│  Grafana       │
│  Dataset       │     │  Detector      │     │  Dashboards    │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                       │
                       ┌────────────────┐     ┌───────▼────────┐
                       │  Report        │<────│  Alert         │
                       │  Generator     │     │  Manager       │
                       └────────────────┘     └────────────────┘

Usage Examples

Set Up Drift Detection

from model_monitoring.core import DriftDetector
import pandas as pd

# Load reference data (your training distribution)
reference = pd.read_parquet("./data/training_reference.parquet")

detector = DriftDetector(
    reference_data=reference,
    numerical_method="ks_test",
    categorical_method="chi_squared",
)

# Check drift on new production data
production_batch = pd.read_parquet("./data/production_batch_20260323.parquet")
report = detector.check_drift(production_batch)

for feature, result in report.items():
    status = "DRIFT" if result["drifted"] else "OK"
    print(f"{feature}: {status} (p={result['p_value']:.4f}, PSI={result['psi']:.4f})")
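For numeric features, the `ks_test` method presumably reduces to a two-sample Kolmogorov–Smirnov test; the same check can be reproduced directly with scipy to sanity-check the toolkit's output (an illustration, not the `DriftDetector` internals):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(50, 10, 5_000)   # e.g. a training-time numeric column
production = rng.normal(55, 10, 5_000)  # production sample with a mean shift

# KS statistic is the max gap between the two empirical CDFs
stat, p_value = ks_2samp(reference, production)
drifted = p_value < 0.05                # same threshold as config.example.yaml
print(f"KS={stat:.3f}, p={p_value:.2e}, drifted={drifted}")
```

Note the caveat from the best-practices section: with large samples the KS test flags even tiny shifts as statistically significant, which is one reason PSI thresholds are easier to communicate.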

Configure Prometheus Metrics

from model_monitoring.core import MetricsCollector
from prometheus_client import start_http_server

start_http_server(port=8000)
collector = MetricsCollector(model_name="churn_predictor_v3")

# Log predictions from your serving code
collector.log_prediction(features={"age": 35, "tenure_months": 24}, prediction=0.73, latency_ms=12.5)

# Log ground truth when labels arrive (async, hours/days later)
collector.log_ground_truth(prediction_id="pred_abc123", actual_label=1)
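The sliding-window aggregation configured earlier (`window_size: 1000`, recomputed every `sliding_step` predictions) can be sketched with a bounded deque. This is an illustration of the mechanism, not the toolkit's internal code:

```python
from collections import deque

class SlidingAccuracy:
    """Accuracy over the most recent `window_size` labeled predictions."""

    def __init__(self, window_size=1000):
        self.window = deque(maxlen=window_size)  # old outcomes fall off the back

    def update(self, predicted_label, actual_label):
        self.window.append(predicted_label == actual_label)

    @property
    def accuracy(self):
        if not self.window:
            return float("nan")  # mirrors the NaN seen before any labels arrive
        return sum(self.window) / len(self.window)

acc = SlidingAccuracy(window_size=4)
for pred, actual in [(1, 1), (0, 1), (1, 1), (1, 1), (0, 0)]:
    acc.update(pred, actual)
print(acc.accuracy)  # only the last 4 outcomes count
```

The NaN-before-labels behavior also explains the "Performance metrics show NaN" row in the troubleshooting table.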

Generate a Model Health Report

from model_monitoring.core import ReportGenerator

generator = ReportGenerator.from_config("config.example.yaml")
report = generator.generate(period="weekly", start_date="2026-03-16", end_date="2026-03-23")
report.save_html("./reports/model_health_week_12.html")
print(f"Health: {report.health_score}/100 | Drifted: {report.drifted_features} | Trend: {report.performance_trend}")

Configuration Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| drift_detection.features.numerical.method | str | ks_test | Statistical test for numeric drift |
| drift_detection.features.numerical.threshold | float | 0.05 | p-value threshold for drift detection |
| drift_detection.prediction.threshold | float | 0.1 | PSI threshold for prediction drift |
| performance.window_size | int | 1000 | Sliding window size for metrics |
| performance.ground_truth_delay_hours | int | 24 | Expected delay for label arrival |
| alerting.rules.*.severity | str | warning | Alert severity: info, warning, critical |

Best Practices

  1. Monitor prediction distributions, not just performance — Ground truth labels can take days or weeks to arrive. Prediction drift is your early warning system.
  2. Set per-feature drift thresholds — Not all features matter equally. Set tighter thresholds on high-importance features and looser ones on low-importance features.
  3. Use PSI for business stakeholder communication — PSI is more intuitive than KS test p-values. PSI < 0.1 = stable, 0.1-0.2 = moderate shift, > 0.2 = significant.
  4. Track feature importance over time — If the model starts relying heavily on a feature that was unimportant during training, something has changed fundamentally.
  5. Establish a retraining trigger — Define explicit criteria: "retrain when 3+ features show PSI > 0.2 or accuracy drops below 0.80 for 3 consecutive windows."
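The retraining trigger in practice #5 can be encoded as a small, testable check. A sketch under the stated thresholds (the function and argument names here are hypothetical, not part of the toolkit's API):

```python
def should_retrain(feature_psi, accuracy_windows,
                   psi_threshold=0.2, min_drifted=3,
                   accuracy_floor=0.80, consecutive=3):
    """True when the explicit criteria are met: 3+ features above the PSI
    threshold, or accuracy below the floor for 3 consecutive windows."""
    drifted = sum(1 for v in feature_psi.values() if v > psi_threshold)
    recent = accuracy_windows[-consecutive:]
    degraded = len(recent) == consecutive and all(a < accuracy_floor for a in recent)
    return drifted >= min_drifted or degraded

# Three features past PSI 0.2 triggers retraining even with healthy accuracy
print(should_retrain({"age": 0.25, "tenure": 0.3, "plan": 0.22}, [0.9, 0.9, 0.9]))
```

Codifying the rule this way makes the retraining decision auditable instead of a judgment call buried in a dashboard screenshot.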

Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| All features show drift | Reference dataset is too old | Refresh the reference dataset with recent training data |
| Grafana dashboard shows no data | Prometheus scrape target not configured | Verify the Prometheus config has the metrics endpoint URL and port |
| Drift alerts firing too frequently | Threshold too sensitive for noisy features | Increase the threshold for noisy features, or use longer aggregation windows |
| Performance metrics show NaN | Ground truth labels not arriving | Check ground_truth_delay_hours and verify the label pipeline is running |

This is 1 of 11 resources in the ML Engineer Toolkit. Get the complete Model Monitoring Dashboard with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire ML Engineer Toolkit bundle (11 products) for $149 — save 30%.

Get the Complete Bundle →

