
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

MLflow Starter Kit

Production-ready MLflow setup with experiment tracking, model registry, and deployment configurations. Go from ad-hoc notebook experiments to a structured, reproducible ML workflow with a self-hosted tracking server you control.

Key Features

  • Self-hosted tracking server — Docker Compose with PostgreSQL and S3-compatible artifact storage
  • Model registry workflows — stage transitions (Staging → Production → Archived)
  • Autologging — one-line setup for PyTorch, sklearn, XGBoost, LightGBM
  • Experiment organization — naming conventions, tagging, project structuring
  • Deployment configs — batch inference, REST API, and container deployment
  • CI/CD model promotion — automated pipelines based on metric thresholds
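
The first two features come together in templates/docker-compose.yaml. As a rough sketch of what such a stack looks like (service names, images, ports, and credentials here are illustrative assumptions, not the kit's actual file; the kit's Dockerfile.mlflow presumably installs the Postgres and S3 client libraries the stock MLflow image does not ship with):

```yaml
version: "3.8"
services:
  postgres:                       # backend store for runs, params, metrics
    image: postgres:16
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mlflow
  minio:                          # S3-compatible artifact storage
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: minio123
    ports: ["9000:9000", "9001:9001"]
  mlflow:
    build: { context: ., dockerfile: Dockerfile.mlflow }
    depends_on: [postgres, minio]
    environment:
      AWS_ACCESS_KEY_ID: minio
      AWS_SECRET_ACCESS_KEY: minio123
      MLFLOW_S3_ENDPOINT_URL: http://minio:9000
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:password@postgres:5432/mlflow
      --artifacts-destination s3://mlflow-artifacts/
      --host 0.0.0.0 --port 5000
    ports: ["5000:5000"]
```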

Quick Start

# 1. Copy the config
cp config.example.yaml config.yaml

# 2. Start the MLflow tracking server
docker-compose -f templates/docker-compose.yaml up -d

# 3. Verify the server is running
curl http://localhost:5000/health

# 4. Run the example experiment
python examples/train_example.py

examples/train_example.py:
"""Log a sklearn training run to MLflow."""
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("classification-baseline")

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="random-forest-v1"):
    params = {"n_estimators": 200, "max_depth": 10, "min_samples_leaf": 5}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_param("n_samples", len(X_train))  # dataset size is a fixed input, so log it as a param

    # Log model with signature for serving
    signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(model, "model", signature=signature)

    print(f"Run logged. Accuracy: {accuracy:.4f}")

Architecture

mlflow-starter-kit/
├── config.example.yaml              # MLflow server and client configuration
├── templates/
│   ├── docker-compose.yaml          # MLflow + PostgreSQL + MinIO stack
│   ├── Dockerfile.mlflow            # Custom MLflow server image
│   ├── registry/
│   │   ├── promote_model.py         # Stage transition automation
│   │   └── compare_models.py        # Compare candidate vs production
│   ├── deployment/
│   │   ├── batch_predict.py         # Batch inference from registry
│   │   ├── serve_model.sh           # MLflow model serving CLI
│   │   └── export_model.py          # Export for external deployment
│   └── ci/
│       └── model_promotion.yaml     # GitHub Actions auto-promotion
├── docs/
│   ├── overview.md
│   └── patterns/
│       └── registry_workflow.md     # Model registry best practices
└── examples/
    ├── train_example.py             # Basic experiment logging
    └── autolog_example.py           # Framework autologging

Usage Examples

Model Registry: Promote to Production

"""Promote a model from Staging to Production."""
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5000")

model_name = "fraud-detector"

# Get the latest staging model
staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
if not staging_versions:
    raise ValueError("No model in Staging")

staging = staging_versions[0]
prod_versions = client.get_latest_versions(model_name, stages=["Production"])

if prod_versions:
    prod_acc = client.get_run(prod_versions[0].run_id).data.metrics.get("accuracy", 0)
    staging_acc = client.get_run(staging.run_id).data.metrics.get("accuracy", 0)
    if staging_acc <= prod_acc:
        raise SystemExit(f"Staging ({staging_acc:.4f}) not better than prod ({prod_acc:.4f})")

# Promote
client.transition_model_version_stage(
    name=model_name,
    version=staging.version,
    stage="Production",
    archive_existing_versions=True,
)
print(f"Model {model_name} v{staging.version} promoted to Production")
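
The script above promotes on any improvement, while the kit's config defines a promotion_threshold safety margin. That decision rule is small enough to isolate and unit-test; a minimal sketch (the function name is mine, not the kit's):

```python
from typing import Optional


def should_promote(staging_metric: float,
                   prod_metric: Optional[float],
                   threshold: float = 0.01) -> bool:
    """Decide whether a Staging model should replace Production.

    Promote when nothing is in Production yet, or when the candidate
    beats the Production metric by at least `threshold`.
    """
    if prod_metric is None:  # first deployment: promote unconditionally
        return True
    return staging_metric - prod_metric >= threshold
```

Using one shared helper for both the manual script and the CI pipeline keeps the two promotion gates from ever disagreeing about the margin.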

Autologging

import mlflow
mlflow.autolog()  # Auto-logs params, metrics, and model for sklearn/pytorch/xgboost

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(n_estimators=100)
model.fit(X_train, y_train)  # Everything logged automatically

Configuration

# config.example.yaml
server:
  tracking_uri: "http://localhost:5000"
  artifact_root: "s3://mlflow-artifacts/"
  backend_store_uri: "postgresql://mlflow:password@localhost:5432/mlflow"

client:
  experiment_name: "my-project"
  auto_log: true
  log_system_metrics: true

registry:
  model_name: "my-model"
  promotion_metric: "accuracy"       # Metric for promotion decisions
  promotion_threshold: 0.01          # Must beat production by this margin
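
If your training scripts load config.yaml themselves, a small validation step catches typos before a run starts. A sketch using only the stdlib (the helper name and defaults mirror the example config above but are my own invention):

```python
def validate_registry_config(cfg: dict) -> dict:
    """Validate the registry section of a loaded config.yaml, applying defaults."""
    registry = dict(cfg.get("registry") or {})
    if not registry.get("model_name"):
        raise ValueError("registry.model_name is required")
    registry.setdefault("promotion_metric", "accuracy")
    threshold = registry.setdefault("promotion_threshold", 0.01)
    if not 0 <= threshold < 1:
        raise ValueError("promotion_threshold must be in [0, 1)")
    return registry

# Typical usage after parsing the file (PyYAML assumed):
#   cfg = yaml.safe_load(open("config.yaml"))
#   registry = validate_registry_config(cfg)
```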

Best Practices

  1. Use experiments for projects, runs for iterations — one experiment per project, one run per training iteration
  2. Always log a model signature — enables input validation when serving
  3. Tag runs for filtering — use tags like {"team": "ml-platform"} for organizational queries
  4. Archive, don't delete — move old versions to Archived; you may need to roll back
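
Practices 1 and 3 are easy to standardize with a tiny helper that every training script imports. A sketch (the naming scheme and tag keys are one possible convention, not the kit's):

```python
from datetime import datetime, timezone


def run_metadata(model: str, version: int, team: str, stage: str = "dev"):
    """Build a conventional run name and a standard tag set for filtering."""
    run_name = f"{model}-v{version}"
    tags = {
        "team": team,                 # enables queries like tags.team = 'ml-platform'
        "stage": stage,
        "date_utc": datetime.now(timezone.utc).strftime("%Y-%m-%d"),
    }
    return run_name, tags

# With MLflow (assumed usage):
#   name, tags = run_metadata("random-forest", 1, team="ml-platform")
#   with mlflow.start_run(run_name=name, tags=tags):
#       ...
```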

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| `ConnectionRefusedError` on the first API call | MLflow server not running | Run `docker-compose up -d` and verify with `curl http://localhost:5000/health` |
| Artifacts not saving | S3/MinIO credentials missing | Check the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` env vars |
| `RESOURCE_ALREADY_EXISTS` on experiment creation | Experiment name collision | Use unique names, or call `mlflow.set_experiment()`, which creates or reuses |
| Model registry empty | Models logged but not registered | Pass `registered_model_name="my-model"` to `log_model()` |

This is 1 of 10 resources in the ML Starter Kit toolkit. Get the complete [MLflow Starter Kit] with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire ML Starter Kit bundle (10 products) for $149 — save 30%.

Get the Complete Bundle →

