Sanskriti Harmukh for Vultr

With Aashish Chaurasiya • Originally published at docs.vultr.com

Deploying MLflow Open-Source Machine Learning Experiment Tracking on Ubuntu 24.04

MLflow is an open-source platform for managing the machine learning lifecycle — experiment tracking, model registry, and reproducible runs. This guide deploys MLflow using Docker Compose with a PostgreSQL backend, S3-compatible artifact storage, basic-auth, and Traefik handling automatic HTTPS, then logs a sample scikit-learn run. By the end, you'll have MLflow recording experiments at your domain over HTTPS.

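Prerequisites: An Ubuntu 24.04 server with Docker and the Docker Compose plugin installed, a DNS A record for your domain pointing to the server's IP address (Traefik's HTTP challenge needs it to obtain the certificate), and an S3-compatible bucket (e.g. Vultr Object Storage) with its access key, secret key, region, and endpoint URL.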


Set Up the Directory Structure

1. Create the project directory:

$ mkdir -p ~/mlflow
$ cd ~/mlflow

2. Create the environment file:

$ nano .env
Add the following variables, replacing each value with your own domain, email address, passwords, and object storage details:
DOMAIN=mlflow.example.com
LETSENCRYPT_EMAIL=admin@example.com

POSTGRES_USER=mlflow
POSTGRES_PASSWORD=StrongDatabasePassword123

MLFLOW_AUTH_CONFIG_PATH=/app/basic_auth.ini
MLFLOW_FLASK_SERVER_SECRET_KEY=GENERATED_SECRET_KEY

S3_BUCKET=mlflow-artifacts
S3_ACCESS_KEY=YOUR_ACCESS_KEY
S3_SECRET_KEY=YOUR_SECRET_KEY
S3_REGION=YOUR_REGION
S3_ENDPOINT=https://YOUR_OBJECT_STORAGE_ENDPOINT
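
The GENERATED_SECRET_KEY and password placeholders need real random values. One way to produce them is Python's standard secrets module. This is a minimal sketch: generate_secret_key and generate_password are hypothetical helper names introduced here, not part of MLflow.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


def generate_password(num_bytes: int = 24) -> str:
    """Return a random password using only A-Z, a-z, 0-9, '-' and '_'."""
    return secrets.token_urlsafe(num_bytes)
```

Printing generate_secret_key() yields a 64-character value that could be used for MLFLOW_FLASK_SERVER_SECRET_KEY. A URL-safe password is convenient for POSTGRES_PASSWORD because the value is later embedded in a database connection URI.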

3. Create the basic-auth configuration:

$ nano basic_auth.ini
Add the following configuration, replacing ADMIN_PASSWORD with a strong password:
[mlflow]
default_permission = READ
database_uri = sqlite:///basic_auth.db
admin_username = admin
admin_password = ADMIN_PASSWORD
authorization_function = mlflow.server.auth:authenticate_request_basic_auth
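
Before building the image, it can help to confirm that the file parses and that the placeholder password was replaced. The sketch below reads the INI text with Python's configparser; check_auth_config is a hypothetical helper, and the list of keys it checks is simply the set used in this guide, not an official MLflow schema.

```python
import configparser

PLACEHOLDER_PASSWORD = "ADMIN_PASSWORD"  # the placeholder used in this guide


def check_auth_config(ini_text: str) -> list[str]:
    """Return a list of problems found in the [mlflow] section."""
    # interpolation=None keeps '%' characters in passwords literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("mlflow"):
        return ["missing [mlflow] section"]

    section = parser["mlflow"]
    problems = []
    for key in ("default_permission", "database_uri",
                "admin_username", "admin_password"):
        if not section.get(key, "").strip():
            problems.append(f"missing value for {key}")
    if section.get("admin_password", "") == PLACEHOLDER_PASSWORD:
        problems.append("admin_password is still the placeholder")
    return problems
```

Calling check_auth_config(open("basic_auth.ini").read()) returns an empty list when nothing looks wrong.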

4. Create a Dockerfile that adds the auth-server extras and Postgres/S3 clients to the official image:

$ nano Dockerfile
Add the following instructions:
FROM ghcr.io/mlflow/mlflow:v3.10.1

RUN pip install --no-cache-dir psycopg2-binary boto3 'mlflow[auth]'

Deploy with Docker Compose

1. Create the Compose manifest:

$ nano docker-compose.yml
Add the following configuration:
services:
  traefik:
    image: traefik:v3.6
    container_name: traefik
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.letsencrypt.acme.email=${LETSENCRYPT_EMAIL}"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./letsencrypt:/letsencrypt"
    restart: unless-stopped

  postgres:
    image: postgres:15
    container_name: mlflow-postgres
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: mlflow
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER}"]
      interval: 10s
      retries: 5
    restart: unless-stopped

  mlflow:
    build: .
    container_name: mlflow
    expose:
      - "5000"
    environment:
      AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY}
      AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY}
      AWS_DEFAULT_REGION: ${S3_REGION}
      AWS_S3_FORCE_PATH_STYLE: "true"
      MLFLOW_S3_ENDPOINT_URL: ${S3_ENDPOINT}
      MLFLOW_AUTH_CONFIG_PATH: ${MLFLOW_AUTH_CONFIG_PATH}
      MLFLOW_FLASK_SERVER_SECRET_KEY: ${MLFLOW_FLASK_SERVER_SECRET_KEY}
    volumes:
      - ./basic_auth.ini:/app/basic_auth.ini:ro
      - ./mlflow_auth:/app
    command: >
      mlflow server
      --backend-store-uri "postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/mlflow"
      --default-artifact-root "s3://${S3_BUCKET}/"
      --serve-artifacts
      --host 0.0.0.0
      --port 5000
      --allowed-hosts "${DOMAIN},https://${DOMAIN}"
      --app-name basic-auth
    depends_on:
      postgres:
        condition: service_healthy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.mlflow.rule=Host(`${DOMAIN}`)"
      - "traefik.http.routers.mlflow.entrypoints=websecure"
      - "traefik.http.routers.mlflow.tls.certresolver=letsencrypt"
    restart: unless-stopped
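
The --backend-store-uri value embeds the database password directly in a URL, so a password containing characters such as @, / or : would be misread as part of the URL structure. A minimal sketch of percent-encoding the credentials with Python's standard library follows. build_backend_store_uri is a hypothetical helper, and the default host, port, and database name are copied from the Compose file in this guide.

```python
from urllib.parse import quote


def build_backend_store_uri(user, password, host="postgres",
                            port=5432, database="mlflow"):
    """Build a PostgreSQL URI with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including "/"
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return (f"postgresql://{encoded_user}:{encoded_password}"
            f"@{host}:{port}/{database}")
```

A password made only of letters and digits passes through unchanged. If you do use special characters, the encoded form would need its own variable, since the postgres service itself expects the raw password.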

2. Build and start the stack:

$ docker compose up -d --build

3. Verify that the services are running and review the logs (add -f to the logs command to follow them live):

$ docker compose ps
$ docker compose logs
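
Beyond reading logs, you can probe the server over HTTPS. The sketch below assumes the MLflow server answers GET /health with HTTP 200 without requiring credentials; treat that as an assumption and confirm it for your version. is_healthy and its opener parameter are hypothetical names introduced for this example.

```python
import urllib.request


def is_healthy(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so DNS failures,
        # refused connections, TLS errors, and HTTP error codes land here
        return False
```

For example, is_healthy("https://mlflow.example.com") should return True once Traefik has obtained the certificate and the mlflow container is up.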

Sign In and Log a Sample Experiment

1. Open https://mlflow.example.com (substitute your own domain) in a browser and sign in with the username admin and the admin_password value from basic_auth.ini.

2. On the workstation, create a virtualenv and install dependencies:

$ sudo apt install python3-venv -y
$ python3 -m venv mlflow-env
$ source mlflow-env/bin/activate
$ pip install mlflow scikit-learn pandas numpy boto3

3. Save a demo experiment script:

$ nano mlflow_demo.py
Add the following code, replacing mlflow.example.com with your own domain:
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import load_diabetes
import pandas as pd
import numpy as np

diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.Series(diabetes.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_tracking_uri("https://mlflow.example.com")
mlflow.set_experiment("official_demo_experiment")

with mlflow.start_run():
    alpha, l1_ratio = 0.5, 0.5
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)

    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)

    mlflow.sklearn.log_model(model, "model")
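
For reference, the two metrics the script logs can be computed by hand. The sketch below uses only the standard library and made-up sample values. It illustrates the formulas (RMSE as the square root of the mean squared error, R² as one minus the ratio of the residual to the total sum of squares) and is not how scikit-learn implements them.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

With y_true = [3, 5, 7] and y_pred = [2, 5, 9], the squared errors are 1, 0 and 4, so RMSE is sqrt(5/3), roughly 1.29, and R² is 1 - 5/8 = 0.375.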

4. Export credentials and run the script:

$ export MLFLOW_TRACKING_USERNAME=admin
$ export MLFLOW_TRACKING_PASSWORD=ADMIN_PASSWORD
$ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
$ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
$ export MLFLOW_S3_ENDPOINT_URL=https://YOUR_OBJECT_STORAGE_ENDPOINT
$ python3 mlflow_demo.py
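
The script relies on these variables being present in the shell that launches it. A small pre-flight check, sketched here with a hypothetical missing_variables helper, can catch an unset variable before the run fails partway through. The variable list mirrors the exports in this step.

```python
import os

REQUIRED_VARIABLES = (
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "MLFLOW_S3_ENDPOINT_URL",
)


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Calling missing_variables() at the top of mlflow_demo.py and exiting when the list is non-empty gives a clearer error than a failed upload at the end of the run.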

Refresh the MLflow UI — the run appears under official_demo_experiment with logged parameters, metrics, and the persisted model artifact.


Next Steps

MLflow is running with PostgreSQL persistence and S3 artifacts. From here you can:

  • Register top runs in the Model Registry for staged promotion
  • Wire mlflow.autolog() into your training code for hands-off tracking
  • Add team users and per-experiment permissions in the basic-auth database
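
As a rough sketch of the last item, the basic-auth app exposes a REST endpoint for creating users. The path /api/2.0/mlflow/users/create and its username and password JSON fields are taken from MLflow's authentication REST API for recent releases; treat them as an assumption and verify them against the documentation for your version. build_create_user_request is a hypothetical helper that only constructs the request and does not send it.

```python
import base64
import json
import urllib.request


def build_create_user_request(base_url, admin_user, admin_password,
                              new_user, new_password):
    """Build (but do not send) a POST request that creates an MLflow user."""
    # HTTP basic auth: base64 of "username:password"
    token = base64.b64encode(
        f"{admin_user}:{admin_password}".encode()).decode()
    body = json.dumps(
        {"username": new_user, "password": new_password}).encode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/2.0/mlflow/users/create",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it to the server.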

For the full guide with additional tips, visit the original article on Vultr Docs.
