Experiment Tracking Pack
Production-ready experiment tracking with Weights & Biases and MLflow. Stop losing track of what you tried — log every hyperparameter, metric, and artifact automatically. Compare runs side-by-side, reproduce any experiment, and share results with your team.
Key Features
- Dual-backend tracking — log to W&B and MLflow simultaneously with a unified API
- Custom comparison dashboards — pre-built templates for metric visualization across runs
- Hyperparameter sweep tracking — structured logging for grid, random, and Bayesian searches
- Artifact versioning — automatically version model checkpoints, datasets, and configs
- Reproducibility configs — capture environment, git hash, and random seeds per experiment
- Team collaboration — shared project dashboards with role-based access patterns
- Alerting on metric regression — configurable thresholds that flag degraded runs early
- Export and reporting — generate PDF/HTML reports from tracked experiments
Quick Start
```bash
# 1. Copy and edit the config
cp config.example.yaml config.yaml

# 2. Set credentials
export WANDB_API_KEY=YOUR_API_KEY_HERE
export MLFLOW_TRACKING_URI=http://localhost:5000

# 3. Run your first tracked experiment
python examples/tracked_experiment.py
```
"""Minimal tracked training loop."""
import wandb
import mlflow
from tracker import ExperimentTracker
config = {
"learning_rate": 0.001,
"epochs": 50,
"batch_size": 32,
"model": "resnet18",
}
tracker = ExperimentTracker(
project="image-classification",
backends=["wandb", "mlflow"],
config=config,
)
with tracker.start_run(run_name="baseline-v1"):
for epoch in range(config["epochs"]):
train_loss = train_one_epoch(model, dataloader)
val_acc = evaluate(model, val_loader)
tracker.log({
"train/loss": train_loss,
"val/accuracy": val_acc,
"epoch": epoch,
})
tracker.log_artifact("model.pt", artifact_type="model")
tracker.log_artifact("config.yaml", artifact_type="config")
Architecture
```text
experiment-tracking-pack/
├── config.example.yaml            # Tracking backend configuration
├── templates/
│   ├── tracker.py                 # Unified ExperimentTracker class
│   ├── callbacks.py               # Training framework callbacks (PyTorch, sklearn)
│   ├── dashboards/                # Pre-built W&B dashboard JSON exports
│   │   ├── training_overview.json
│   │   └── hyperparam_comparison.json
│   └── reports/                   # Report generation templates
├── docs/
│   ├── overview.md                # Full architecture walkthrough
│   ├── patterns/                  # Tracking patterns for common scenarios
│   └── checklists/
│       └── pre-deployment.md      # Go-live checklist
└── examples/
    ├── tracked_experiment.py      # Basic usage
    └── sweep_tracking.py          # Hyperparameter sweep logging
```
The `ExperimentTracker` wraps both W&B and MLflow behind a single interface. You call `tracker.log()` once and metrics flow to both backends. Switch backends by editing `config.yaml` — zero code changes.
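The fan-out pattern behind this can be sketched in plain Python. The `Backend` and `UnifiedTracker` classes below are stand-ins for illustration, not the pack's actual implementation:

```python
class Backend:
    """Minimal backend interface; a real one would wrap wandb or mlflow."""
    def __init__(self, name):
        self.name = name
        self.logged = []  # stand-in for the real backend's API call

    def log(self, metrics, step=None):
        self.logged.append((step, dict(metrics)))


class UnifiedTracker:
    """Dispatch a single log() call to every enabled backend."""
    def __init__(self, backends):
        self.backends = list(backends)

    def log(self, metrics, step=None):
        for backend in self.backends:
            backend.log(metrics, step=step)


wandb_like = Backend("wandb")
mlflow_like = Backend("mlflow")
tracker = UnifiedTracker([wandb_like, mlflow_like])
tracker.log({"train/loss": 0.42}, step=1)
# Both backends now hold the same metrics record.
```

Disabling a backend is then just leaving it out of the list the tracker is built with, which is what the `enabled` flags in `config.yaml` control.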
Usage Examples
PyTorch Lightning Callback
```python
from tracker import ExperimentTracker
import pytorch_lightning as pl


class TrackingCallback(pl.Callback):
    def __init__(self, tracker: ExperimentTracker):
        super().__init__()
        self.tracker = tracker

    def on_train_epoch_end(self, trainer, pl_module):
        metrics = trainer.callback_metrics
        self.tracker.log({
            "train/loss": metrics["train_loss"].item(),
            "val/accuracy": float(metrics.get("val_acc", 0.0)),
            "epoch": trainer.current_epoch,
        })
```
Comparing Runs Programmatically
```python
from tracker import ExperimentTracker

tracker = ExperimentTracker(project="image-classification")
runs = tracker.get_runs(filters={"tag": "baseline"}, order_by="val/accuracy")
for run in runs[:5]:
    print(f"{run.name}: acc={run.metrics['val/accuracy']:.4f}, "
          f"lr={run.config['learning_rate']}")
```
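Under the hood, a query like this amounts to a filter-then-sort over stored run records. A minimal in-memory stand-in (the `Run` shape here is assumed for illustration, not the pack's real class):

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    name: str
    tags: list
    metrics: dict
    config: dict = field(default_factory=dict)


def get_runs(runs, tag, order_by):
    """Keep runs carrying `tag`, best value of `order_by` first."""
    matching = [r for r in runs if tag in r.tags]
    return sorted(
        matching,
        key=lambda r: r.metrics.get(order_by, float("-inf")),
        reverse=True,
    )


runs = [
    Run("baseline-v1", ["baseline"], {"val/accuracy": 0.81}),
    Run("augmented-v1", ["augmented"], {"val/accuracy": 0.85}),
    Run("baseline-v2", ["baseline"], {"val/accuracy": 0.84}),
]
top = get_runs(runs, tag="baseline", order_by="val/accuracy")
# "augmented-v1" is filtered out; "baseline-v2" sorts first on accuracy.
```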
Configuration
```yaml
# config.example.yaml
project_name: "my-ml-project"

backends:
  wandb:
    enabled: true
    entity: "your-team"          # W&B team or username
    log_model: true              # Upload model artifacts
    log_code: true               # Snapshot source code
  mlflow:
    enabled: true
    tracking_uri: "http://localhost:5000"
    registry_uri: "sqlite:///mlflow.db"
    auto_log: true               # Enable MLflow autologging

logging:
  log_frequency: 10              # Log every N steps
  log_system_metrics: true       # GPU utilization, memory
  capture_git_hash: true         # Record git commit
  capture_env: true              # Record pip freeze
```
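The `log_frequency` setting implies step-based throttling on the tracker side. One way that could work, sketched in plain Python (this is an illustration of the idea, not the pack's code):

```python
class ThrottledLogger:
    """Forward only every Nth step's metrics, mirroring `log_frequency: 10`."""
    def __init__(self, log_frequency=10):
        self.log_frequency = log_frequency
        self.step = 0
        self.emitted = []  # stand-in for a real backend call

    def log(self, metrics):
        self.step += 1
        if self.step % self.log_frequency == 0:
            self.emitted.append((self.step, dict(metrics)))


logger = ThrottledLogger(log_frequency=10)
for step in range(35):
    logger.log({"train/loss": 1.0 / (step + 1)})
# Only steps 10, 20, and 30 reach the backend; the rest are dropped.
```

Throttling like this keeps dashboard payloads small on long runs; per-epoch metrics such as validation accuracy would typically bypass it and be logged every time.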
Best Practices
- Log config at run start — always pass your full hyperparameter dict to the tracker before training begins
- Use tags, not names, for filtering — run names should be human-readable; use tags like `["baseline", "v2", "augmented"]` for programmatic queries
- Set metric summary modes — configure W&B to track `min(val/loss)` and `max(val/accuracy)` for leaderboard views
- Version your tracking config — commit `config.yaml` to git so experiment setup is reproducible
- Use run groups for sweeps — group related hyperparameter search runs for cleaner dashboards
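A summary mode boils down to keeping a single best value per metric across the run's history. W&B exposes this via `wandb.define_metric(..., summary=...)`; the underlying idea can be sketched in plain Python:

```python
def summarize(history, modes):
    """Collapse a metric history into one summary value per metric.

    history: list of dicts, one per logged step
    modes:   metric name -> "min" or "max"
    """
    summary = {}
    for name, mode in modes.items():
        values = [row[name] for row in history if name in row]
        if values:
            summary[name] = min(values) if mode == "min" else max(values)
    return summary


history = [
    {"val/loss": 0.9, "val/accuracy": 0.60},
    {"val/loss": 0.4, "val/accuracy": 0.83},
    {"val/loss": 0.5, "val/accuracy": 0.79},
]
best = summarize(history, {"val/loss": "min", "val/accuracy": "max"})
# best holds the lowest loss (0.4) and the highest accuracy (0.83).
```

This is why summary modes matter for leaderboards: without them, a run is ranked by its *last* logged value, which may be worse than its best.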
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| `wandb: ERROR Run initialization failed` | Invalid API key or network issue | Verify `WANDB_API_KEY` with `wandb login --verify` |
| Metrics not appearing in MLflow UI | Wrong `tracking_uri` or MLflow server down | Check that the MLflow server is running; test with `curl $MLFLOW_TRACKING_URI/api/2.0/mlflow/experiments/list` |
| Duplicate runs on resume | Missing resume flag | Set `tracker.start_run(resume="must")` for resumed training |
| Slow logging with large artifacts | Synchronous upload blocking training | Enable `async_upload: true` in config, or log artifacts only at end of run |
This is 1 of 10 resources in the ML Starter Kit toolkit. Get the complete Experiment Tracking Pack with all files, templates, and documentation for $29.
Or grab the entire ML Starter Kit bundle (10 products) for $149 — save 30%.