Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Experiment Tracking Pack

Production-ready experiment tracking with Weights & Biases and MLflow. Stop losing track of what you tried — log every hyperparameter, metric, and artifact automatically. Compare runs side-by-side, reproduce any experiment, and share results with your team.

Key Features

  • Dual-backend tracking — log to W&B and MLflow simultaneously with a unified API
  • Custom comparison dashboards — pre-built templates for metric visualization across runs
  • Hyperparameter sweep tracking — structured logging for grid, random, and Bayesian searches
  • Artifact versioning — automatically version model checkpoints, datasets, and configs
  • Reproducibility configs — capture environment, git hash, and random seeds per experiment
  • Team collaboration — shared project dashboards with role-based access patterns
  • Alerting on metric regression — configurable thresholds that flag degraded runs early
  • Export and reporting — generate PDF/HTML reports from tracked experiments
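
The reproducibility feature above amounts to snapshotting everything that affects a run before training starts. A minimal sketch of that idea, using only the standard library (`capture_reproducibility_info` is a hypothetical helper, not part of the pack's API; real code would also seed NumPy/PyTorch and record `pip freeze`):

```python
import random
import subprocess
import sys

def capture_reproducibility_info(seed: int = 42) -> dict:
    """Snapshot the facts needed to rerun an experiment exactly."""
    random.seed(seed)  # seed the stdlib RNG; seed numpy/torch the same way
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        git_hash = "unknown"  # not inside a git repo, or git not installed
    return {
        "git_hash": git_hash,
        "python_version": sys.version.split()[0],
        "random_seed": seed,
    }
```

Log this dict alongside your hyperparameters so every run carries its own provenance.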

Quick Start

# 1. Copy and edit the config
cp config.example.yaml config.yaml

# 2. Set credentials
export WANDB_API_KEY=YOUR_API_KEY_HERE
export MLFLOW_TRACKING_URI=http://localhost:5000

# 3. Run your first tracked experiment
python examples/tracked_experiment.py
"""Minimal tracked training loop."""
import wandb
import mlflow
from tracker import ExperimentTracker

config = {
    "learning_rate": 0.001,
    "epochs": 50,
    "batch_size": 32,
    "model": "resnet18",
}

tracker = ExperimentTracker(
    project="image-classification",
    backends=["wandb", "mlflow"],
    config=config,
)

with tracker.start_run(run_name="baseline-v1"):
    for epoch in range(config["epochs"]):
        train_loss = train_one_epoch(model, dataloader)
        val_acc = evaluate(model, val_loader)

        tracker.log({
            "train/loss": train_loss,
            "val/accuracy": val_acc,
            "epoch": epoch,
        })

    tracker.log_artifact("model.pt", artifact_type="model")
    tracker.log_artifact("config.yaml", artifact_type="config")

Architecture

experiment-tracking-pack/
├── config.example.yaml          # Tracking backend configuration
├── templates/
│   ├── tracker.py               # Unified ExperimentTracker class
│   ├── callbacks.py             # Training framework callbacks (PyTorch, sklearn)
│   ├── dashboards/              # Pre-built W&B dashboard JSON exports
│   │   ├── training_overview.json
│   │   └── hyperparam_comparison.json
│   └── reports/                 # Report generation templates
├── docs/
│   ├── overview.md              # Full architecture walkthrough
│   ├── patterns/                # Tracking patterns for common scenarios
│   └── checklists/
│       └── pre-deployment.md    # Go-live checklist
└── examples/
    ├── tracked_experiment.py    # Basic usage
    └── sweep_tracking.py        # Hyperparameter sweep logging

The ExperimentTracker wraps both W&B and MLflow behind a single interface. You call tracker.log() once and metrics flow to both backends. Switch backends by editing config.yaml — zero code changes.
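
The fan-out pattern behind this is simple: one `log()` call dispatches the same payload to every enabled backend. A stripped-down sketch (`MultiBackendLogger` is a hypothetical stand-in for the pack's `ExperimentTracker`; the real class would call `wandb.log()` and `mlflow.log_metrics()` inside the loop):

```python
class MultiBackendLogger:
    """Dispatch one metrics payload to every registered backend."""

    def __init__(self, backends):
        self.backends = backends  # callables that accept a metrics dict

    def log(self, metrics: dict) -> None:
        for backend in self.backends:
            backend(metrics)  # each backend receives the same payload

# Plain lists stand in for the W&B and MLflow clients here.
wandb_sink, mlflow_sink = [], []
logger = MultiBackendLogger([wandb_sink.append, mlflow_sink.append])
logger.log({"train/loss": 0.42})
print(wandb_sink == mlflow_sink)  # → True
```

Disabling a backend in `config.yaml` just drops it from the list, which is why no training code has to change.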

Usage Examples

PyTorch Lightning Callback

from tracker import ExperimentTracker
import pytorch_lightning as pl

class TrackingCallback(pl.Callback):
    def __init__(self, tracker: ExperimentTracker):
        super().__init__()
        self.tracker = tracker

    def on_train_epoch_end(self, trainer, pl_module):
        metrics = trainer.callback_metrics
        self.tracker.log({
            "train/loss": metrics["train_loss"].item(),
            # callback_metrics values are tensors; coerce to float
            "val/accuracy": float(metrics.get("val_acc", 0.0)),
            "epoch": trainer.current_epoch,
        })

Comparing Runs Programmatically

from tracker import ExperimentTracker

tracker = ExperimentTracker(project="image-classification")
runs = tracker.get_runs(filters={"tag": "baseline"}, order_by="val/accuracy")

for run in runs[:5]:
    print(f"{run.name}: acc={run.metrics['val/accuracy']:.4f}, "
          f"lr={run.config['learning_rate']}")

Configuration

# config.example.yaml
project_name: "my-ml-project"

backends:
  wandb:
    enabled: true
    entity: "your-team"         # W&B team or username
    log_model: true             # Upload model artifacts
    log_code: true              # Snapshot source code

  mlflow:
    enabled: true
    tracking_uri: "http://localhost:5000"
    registry_uri: "sqlite:///mlflow.db"
    auto_log: true              # Enable MLflow autologging

logging:
  log_frequency: 10             # Log every N steps
  log_system_metrics: true      # GPU utilization, memory
  capture_git_hash: true        # Record git commit
  capture_env: true             # Record pip freeze
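
Once parsed (e.g. with `yaml.safe_load`), the `backends` section drives which clients the tracker initializes. A small sketch of that selection logic, operating on a plain dict that mirrors the YAML above (`enabled_backends` is a hypothetical helper, not the pack's API):

```python
def enabled_backends(config: dict) -> list:
    """Return the backend names whose config sets enabled: true."""
    return [
        name
        for name, settings in config.get("backends", {}).items()
        if settings.get("enabled", False)
    ]

# Mirrors config.example.yaml, with mlflow switched off.
config = {
    "backends": {
        "wandb": {"enabled": True, "entity": "your-team"},
        "mlflow": {"enabled": False, "tracking_uri": "http://localhost:5000"},
    }
}
print(enabled_backends(config))  # → ['wandb']
```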

Best Practices

  1. Log config at run start — always pass your full hyperparameter dict to the tracker before training begins
  2. Use tags, not names, for filtering — run names should be human-readable; use tags like ["baseline", "v2", "augmented"] for programmatic queries
  3. Set metric summary modes — configure W&B to track min(val/loss) and max(val/accuracy) for leaderboard views
  4. Version your tracking config — commit config.yaml to git so experiment setup is reproducible
  5. Use run groups for sweeps — group related hyperparameter search runs for cleaner dashboards
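
Practice #2 is easy to see with data: tag membership is a set test, while names are free-form strings. A minimal sketch with plain dicts standing in for run objects (`filter_runs_by_tag` and the run names are illustrative, not part of the pack):

```python
def filter_runs_by_tag(runs, tag):
    """Select runs carrying a given tag, regardless of their names."""
    return [run for run in runs if tag in run.get("tags", [])]

runs = [
    {"name": "sunny-sweep-1", "tags": ["baseline"]},
    {"name": "brisk-run-2", "tags": ["baseline", "augmented"]},
    {"name": "bold-run-3", "tags": ["v2"]},
]
print([r["name"] for r in filter_runs_by_tag(runs, "baseline")])
# → ['sunny-sweep-1', 'brisk-run-2']
```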

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| `wandb: ERROR Run initialization failed` | Invalid API key or network issue | Verify `WANDB_API_KEY` with `wandb login --verify` |
| Metrics not appearing in MLflow UI | Wrong `tracking_uri` or MLflow server down | Check the `mlflow server` process is running; test with `curl $MLFLOW_TRACKING_URI/api/2.0/mlflow/experiments/list` |
| Duplicate runs on resume | Missing resume flag | Pass `tracker.start_run(resume="must")` for resumed training |
| Slow logging with large artifacts | Synchronous upload blocking training | Enable `async_upload: true` in the config, or log artifacts only at the end of the run |

This is 1 of 10 resources in the ML Starter Kit toolkit. Get the complete Experiment Tracking Pack with all files, templates, and documentation for $29.

Get the Full Kit →

Or grab the entire ML Starter Kit bundle (10 products) for $149 — save 30%.

Get the Complete Bundle →

