# MLflow Starter Kit
Production-ready MLflow setup with experiment tracking, model registry, and deployment configurations. Go from ad-hoc notebook experiments to a structured, reproducible ML workflow with a self-hosted tracking server you control.
## Key Features
- Self-hosted tracking server — Docker Compose with PostgreSQL and S3-compatible artifact storage
- Model registry workflows — stage transitions (Staging → Production → Archived)
- Autologging — one-line setup for PyTorch, sklearn, XGBoost, LightGBM
- Experiment organization — naming conventions, tagging, project structuring
- Deployment configs — batch inference, REST API, and container deployment
- CI/CD model promotion — automated pipelines based on metric thresholds
## Quick Start
```bash
# 1. Copy the config
cp config.example.yaml config.yaml

# 2. Start the MLflow tracking server
docker-compose -f templates/docker-compose.yaml up -d

# 3. Verify the server is running
curl http://localhost:5000/health

# 4. Run the example experiment
python examples/train_example.py
```
"""Log a sklearn training run to MLflow."""
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("classification-baseline")
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
with mlflow.start_run(run_name="random-forest-v1"):
params = {"n_estimators": 200, "max_depth": 10, "min_samples_leaf": 5}
mlflow.log_params(params)
model = RandomForestClassifier(**params, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("n_samples", len(X_train))
# Log model with signature for serving
signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, "model", signature=signature)
print(f"Run logged. Accuracy: {accuracy:.4f}")
## Architecture
```
mlflow-starter-kit/
├── config.example.yaml           # MLflow server and client configuration
├── templates/
│   ├── docker-compose.yaml       # MLflow + PostgreSQL + MinIO stack
│   ├── Dockerfile.mlflow         # Custom MLflow server image
│   ├── registry/
│   │   ├── promote_model.py      # Stage transition automation
│   │   └── compare_models.py     # Compare candidate vs production
│   ├── deployment/
│   │   ├── batch_predict.py      # Batch inference from registry
│   │   ├── serve_model.sh        # MLflow model serving CLI
│   │   └── export_model.py       # Export for external deployment
│   └── ci/
│       └── model_promotion.yaml  # GitHub Actions auto-promotion
├── docs/
│   ├── overview.md
│   └── patterns/
│       └── registry_workflow.md  # Model registry best practices
└── examples/
    ├── train_example.py          # Basic experiment logging
    └── autolog_example.py        # Framework autologging
```
## Usage Examples

### Model Registry: Promote to Production
"""Promote a model from Staging to Production."""
import mlflow
from mlflow.tracking import MlflowClient
client = MlflowClient(tracking_uri="http://localhost:5000")
model_name = "fraud-detector"
# Get the latest staging model
staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
if not staging_versions:
raise ValueError("No model in Staging")
staging = staging_versions[0]
prod_versions = client.get_latest_versions(model_name, stages=["Production"])
if prod_versions:
prod_acc = client.get_run(prod_versions[0].run_id).data.metrics.get("accuracy", 0)
staging_acc = client.get_run(staging.run_id).data.metrics.get("accuracy", 0)
if staging_acc <= prod_acc:
raise SystemExit(f"Staging ({staging_acc:.4f}) not better than prod ({prod_acc:.4f})")
# Promote
client.transition_model_version_stage(
name=model_name,
version=staging.version,
stage="Production",
archive_existing_versions=True,
)
print(f"Model {model_name} v{staging.version} promoted to Production")
### Autologging
```python
import mlflow
from sklearn.ensemble import GradientBoostingClassifier

mlflow.autolog()  # Auto-logs params, metrics, and model for sklearn/pytorch/xgboost

# X_train, y_train as prepared in examples/train_example.py
model = GradientBoostingClassifier(n_estimators=100)
model.fit(X_train, y_train)  # Everything logged automatically
```
## Configuration
```yaml
# config.example.yaml
server:
  tracking_uri: "http://localhost:5000"
  artifact_root: "s3://mlflow-artifacts/"
  backend_store_uri: "postgresql://mlflow:password@localhost:5432/mlflow"

client:
  experiment_name: "my-project"
  auto_log: true
  log_system_metrics: true

registry:
  model_name: "my-model"
  promotion_metric: "accuracy"  # Metric for promotion decisions
  promotion_threshold: 0.01     # Must beat production by this margin
```
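How client code consumes these values is up to you; one hedged sketch, assuming the YAML has already been parsed into a nested dict (the `tracking_settings` helper is ours, not part of MLflow):

```python
"""Extract MLflow client settings from a parsed config dict (sketch)."""
from typing import Any, Dict

def tracking_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    # Pull out the fields client code needs from the parsed YAML.
    return {
        "tracking_uri": config["server"]["tracking_uri"],
        "experiment_name": config["client"]["experiment_name"],
        "auto_log": config["client"].get("auto_log", False),
    }

# With mlflow installed, applying the settings looks like:
#   import mlflow
#   s = tracking_settings(config)
#   mlflow.set_tracking_uri(s["tracking_uri"])
#   mlflow.set_experiment(s["experiment_name"])
#   if s["auto_log"]:
#       mlflow.autolog()
```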
## Best Practices
- Use experiments for projects, runs for iterations — one experiment per project, each training is a run
- Always log a model signature — enables input validation when serving
- Tag runs for filtering — use tags like `{"team": "ml-platform"}` for organizational queries
- Archive, don't delete — move old versions to Archived; you may need to roll back
## Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| `ConnectionRefusedError` on `set_tracking_uri` | MLflow server not running | Run `docker-compose up -d` and verify with `curl http://localhost:5000/health` |
| Artifacts not saving | S3/MinIO credentials missing | Check `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` env vars |
| `RESOURCE_ALREADY_EXISTS` on experiment | Experiment name collision | Use unique names, or call `mlflow.set_experiment()`, which creates or reuses |
| Model registry empty | Models logged but not registered | Add `registered_model_name="my-model"` to the `log_model()` call |
This is 1 of 10 resources in the ML Starter Kit toolkit. Get the complete [MLflow Starter Kit] with all files, templates, and documentation for $39.
Or grab the entire ML Starter Kit bundle (10 products) for $149 — save 30%.