
Ayomide olofinsawe

From Local Model to Production API: A Practical CI/CD Workflow for Machine Learning

Machine learning projects rarely fail because the model is bad; they fail when it hits production. A model might run perfectly on your laptop, but moving it into a stable, repeatable environment is a different story. Ad-hoc Docker commands, manual server setup, repeated SSH sessions, and fragile deployment processes turn every code change into a risky, time-consuming operation, slowing iteration until infrastructure management overshadows actual model improvement.

Most ML deployment guides assume access to GPUs, Kubernetes, or specialized MLOps platforms. They often gloss over the real-world headaches of shipping a model that just works. This tutorial takes a different approach: you’ll deploy a CPU-optimized ML API on a single cloud server, bake the model into a Docker container at build time, and automate updates with GitHub Actions. The result is a repeatable, reliable, low-cost workflow that works in real production, not just in a notebook or local environment.

In this guide, you’ll build a production-ready deployment pipeline for a lightweight sentiment analysis API using FastAPI and Hugging Face, provision a Vultr Cloud Compute instance, and set up a CI/CD workflow that automatically updates your deployment on every push. By the end, you’ll have a workflow capable of taking any ML model from local development to a live, continuously deployed API.

Prerequisites

To follow along with this tutorial, ensure you have the following in place:

  • Access to an Ubuntu 24.04 server as a non-root user with sudo privileges

  • Docker installed on both your local machine and the server

  • A Docker Hub account for storing container images

  • A GitHub account with GitHub Actions enabled

  • Basic familiarity with FastAPI, Docker, and Git-based workflows

This guide focuses on automating machine learning deployments rather than covering server provisioning or Docker installation. Refer to the Vultr documentation if you need assistance preparing your environment.

NOTE
This tutorial focuses on CPU-based deployment. The FastAPI service and Hugging Face model are configured to run on CPU only, making it suitable for cost-effective Vultr Cloud Compute instances without GPU support.

Build a Sample ML Model API

Before automating deployment, you need a machine learning service that behaves predictably in production. In this section, you’ll build a CPU-only sentiment analysis API using FastAPI and Hugging Face Transformers, designed specifically for containerized deployment on Vultr.

The application performs inference only. The model is downloaded once, stored locally, and loaded at startup to ensure fast and reliable requests.

Project Structure

Use the following project layout:

deployment_ml/
├─ app/
│  ├─ main.py
│  ├─ model.py
├─ download_model.py
├─ requirements.txt
└─ models/

This structure keeps the API logic, model artifacts, and dependency management clearly separated.

Define Python Dependencies

Create a requirements.txt file with the following contents:

--extra-index-url https://download.pytorch.org/whl/cpu

fastapi==0.115.6
uvicorn[standard]==0.30.6

torch==2.9.1
transformers>=4.46.3

pydantic>=2.7,<3
python-dotenv==1.0.1

This configuration ensures:

  • CPU-only PyTorch wheels are used
  • Compatibility with FastAPI and Pydantic v2
  • Predictable dependency resolution during Docker builds
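As a quick sanity check after installing these dependencies (the installation steps come later in this section), you can confirm that the CPU-only PyTorch build is in use. This is an optional, minimal check:

 python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

A CPU-only install typically reports a version suffixed with +cpu and prints False for CUDA availability.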

Download the Model Locally

Instead of downloading the model at runtime, the application uses locally stored model artifacts. This improves startup time and avoids network dependencies during deployment.

Create a script named download_model.py:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import os

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
LOCAL_PATH = "./models/distilbert-sst2"

os.makedirs(LOCAL_PATH, exist_ok=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

model.save_pretrained(LOCAL_PATH)
tokenizer.save_pretrained(LOCAL_PATH)

print(f"Model downloaded and saved to {LOCAL_PATH}")

You’ll run this script once (after installing the dependencies in a later step) to populate the models/ directory.
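Once it has run, the models/distilbert-sst2 directory should contain the model weights and tokenizer files. The exact filenames can vary slightly between transformers versions, but the listing will look roughly like this:

 ls models/distilbert-sst2

config.json  model.safetensors  special_tokens_map.json  tokenizer.json  tokenizer_config.json  vocab.txt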

Implement the Model Wrapper

Create app/model.py to load the model and handle inference:

from transformers import pipeline
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

DEVICE = -1  # Force CPU usage

LOCAL_MODEL_PATH = os.path.join(
    os.path.dirname(__file__),
    "../models/distilbert-sst2"
)

class SentimentModel:
    def __init__(self):
        logger.info(f"Loading sentiment model from {LOCAL_MODEL_PATH} ...")
        self.pipeline = pipeline(
            task="sentiment-analysis",
            model=LOCAL_MODEL_PATH,
            tokenizer=LOCAL_MODEL_PATH,
            device=DEVICE
        )
        logger.info("Sentiment model loaded successfully.")

    def predict(self, text: str):
        result = self.pipeline(text)[0]
        return {
            "sentiment": result["label"].lower(),
            "confidence": round(result["score"], 4)
        }

sentiment_model = SentimentModel()

def predict_sentiment(text: str):
    return sentiment_model.predict(text)

The model is initialized once at application startup, ensuring consistent performance under load.
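Once the dependencies are installed and the model has been downloaded (both covered in the next steps), you can exercise the wrapper on its own from the project root. This is an optional one-off check:

 python -c "from app.model import predict_sentiment; print(predict_sentiment('This works great'))"

It should print a dictionary containing the sentiment label and confidence score.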

Create the FastAPI Application

Create app/main.py:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from app.model import predict_sentiment
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="ML Sentiment Analysis API",
    description="CPU-only ML API deployed with CI/CD on Vultr",
    version="1.0.0"
)

class PredictionRequest(BaseModel):
    text: str = Field(..., min_length=1, example="I love this product")

class PredictionResponse(BaseModel):
    sentiment: str
    confidence: float

@app.get("/")
def health_check():
    return {"status": "Hello from the automated ML deployment!"}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        return predict_sentiment(request.text)
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(
            status_code=500,
            detail="Sentiment prediction failed."
        )

Install Dependencies and Run the API

  1. Create a virtual environment:
 python -m venv .venv
 source .venv/bin/activate
  2. Install the dependencies and upgrade pip:
 pip install --upgrade pip
 pip install -r requirements.txt
  3. Download the model:
 python download_model.py

This will create a ./models/distilbert-sst2 folder containing the pretrained Hugging Face model and tokenizer.

  4. Start the API:
 uvicorn app.main:app --host 0.0.0.0 --port 8000
  5. Test the service:
  • Health check: Confirm the API is running
curl http://localhost:8000/

Output:

{
  "status": "Hello from the automated ML deployment!"
}
  • Sentiment prediction: Send a test request
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "This deployment workflow is impressive"}'

Expected outcome: The API should return a JSON object with the predicted sentiment and confidence score, for example:

{
  "sentiment": "positive",
  "confidence": 0.9987
}

A successful response confirms the API is ready for containerization.

Outcome

At this point, you have:

  • A CPU-optimized ML inference API
  • Locally packaged Hugging Face model artifacts
  • Deterministic dependency management
  • A service ready to be containerized and deployed automatically

Dockerize the ML API

To deploy your ML service reliably on Vultr, you need to containerize the API. Docker ensures that the environment is consistent, reproducible, and isolated, which is essential for CI/CD pipelines and automated deployments.

This section will show you how to:

  • Create a Docker image that includes your API and model
  • Bake in the Hugging Face model to avoid runtime downloads
  • Expose the service port for external access
  1. Create the Dockerfile

In the root of your project directory, create a file named Dockerfile and paste the following content:

# ---- Base image ----
FROM python:3.14-slim

# ---- Environment variables ----
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/.hf_cache

# ---- Set working directory ----
WORKDIR /app

# ---- System dependencies ----
RUN apt-get update && apt-get install -y \
    git \
    && rm -rf /var/lib/apt/lists/*

# ---- Install Python dependencies ----
COPY requirements.txt ./
RUN pip install --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt

# ---- Copy application code ----
COPY app ./app
COPY download_model.py ./download_model.py

# ---- Download model at build time ----
RUN python download_model.py

# ---- Expose port ----
EXPOSE 8000

# ---- Run the app ----
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Key Notes:

  • TRANSFORMERS_CACHE points to a local folder for Hugging Face caching.
  • download_model.py runs during build, so the container already includes the model.
  • Port 8000 is exposed for API access.
  • --no-cache-dir ensures Python packages don’t increase image size unnecessarily.
  2. Create the .dockerignore File

In the root of your project directory, create a file named .dockerignore and paste the following content:

.venv
__pycache__
*.pyc
*.pyo
*.pyd
.git
.gitignore
.env
.cache
models/

NOTE
models/ is ignored because download_model.py ensures the model is downloaded inside the container at build time.
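It is also worth adding a .gitignore so the virtual environment and downloaded model artifacts are never committed to the repository; the CI pipeline re-downloads the model during docker build, so only the code needs to live in Git. A minimal example:

.venv
__pycache__
*.pyc
models/
.env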

  3. Build the Docker Image

Run the following command in your project root:

 docker build -t sentiment-api .

NOTE
The Docker image name sentiment-api is used throughout this tutorial for simplicity. You can rename it to anything you like; just make sure to use the same name consistently in subsequent commands, including the GitHub Actions workflow and Docker Hub pushes.

Output

  • The image is created with your Python dependencies and ML model
  • Logs during the build show the model download progress and the message Model downloaded and saved to ./models/distilbert-sst2
  • The image size is optimized due to .dockerignore and --no-cache-dir
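To confirm the model artifacts really were baked in at build time, you can list them inside a throwaway container (an optional check, using the sentiment-api image name from the build step):

 docker run --rm sentiment-api ls /app/models/distilbert-sst2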
  4. Run the Docker Container

Start the API in a container:

 docker run -p 8000:8000 sentiment-api

Output

  • The API runs inside the container
  • Health check:
 curl http://localhost:8000/

Output:

{"status": "Hello from the automated ML deployment!"}
  • Test prediction:
 curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Dockerized deployment is smooth!"}'

Output:

{
  "sentiment": "positive",
  "confidence": 0.9981
}

Provision a Vultr Instance

Before deploying your Dockerized ML API, ensure you have access to an Ubuntu 24.04 server as a non-root user with sudo privileges (as noted in the prerequisites).

This section shows how to add your SSH key to Vultr when creating or managing an instance and how to install Docker on your server.

  1. Create an SSH Key (If You Don’t Have One)

If you don’t already have an SSH key, generate one on your local machine:

 ssh-keygen -t ed25519 -C "your_email@example.com"
  • Press Enter to accept the default file location.
  • Optionally, set a passphrase for added security.

Your public key will be saved as:

~/.ssh/id_ed25519.pub
  2. Copy Your Public Key

Run:

cat ~/.ssh/id_ed25519.pub
  • The public key will appear in the terminal.
  • Select and copy the entire line (starts with ssh-ed25519 and ends with your email or username).
  • This is the key you will add to your Vultr server for secure SSH access.
  3. Add SSH Key to Your Vultr Instance

When creating a new Vultr Cloud Compute instance or managing an existing one:

  • Log in to your Vultr dashboard.
  • Navigate to Account.
  • Under SSH Keys, click Add SSH Key.
  • Paste your public key from Step 2.

Once the server is deployed, note the IP address — you’ll use it to connect via SSH.

  4. Connect via SSH

Use your existing sudo-enabled user to log in:

 ssh username@YOUR_VULTR_IP
  • Replace username with your sudo-enabled user.
  • You are now ready to install Docker and deploy your ML API.
  5. Install Docker

Run each command line by line as your sudo user:

Update package lists:

 sudo apt update

Install Docker and Docker Compose:

 sudo apt install -y docker.io docker-compose

Enable Docker to start on boot:

 sudo systemctl enable docker

Start Docker:

 sudo systemctl start docker

Verify installation:

 docker --version

Output:

Docker version 29.1.4, build 0e6fee6

Set up firewall rules to allow SSH and traffic on port 8000 (allow SSH before enabling the firewall so you don't lock yourself out of the session):

 sudo ufw allow OpenSSH
 sudo ufw allow 8000/tcp
 sudo ufw enable
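The GitHub Actions deployment step set up later in this tutorial runs docker commands over SSH as this user without sudo, so add the user to the docker group and log out and back in for the change to take effect:

 sudo usermod -aG docker $USER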
  6. Optional: Test Docker

Run a quick test container:

 docker run hello-world

You should see a confirmation message that Docker is installed and running correctly.

Set Up GitHub Actions Workflow

We’ll automate deployment using GitHub Actions. The workflow will:

  • Build the Docker image
  • Push it to Docker Hub
  • SSH into your Vultr server and deploy the container

Before setting up the workflow, make sure you can push code to GitHub via SSH.

  1. First-Time SSH Setup for GitHub

To push your project over SSH, GitHub needs your SSH public key added to your account (only required once per machine).

  • Copy your public key:
 cat ~/.ssh/id_ed25519.pub
  • Copy the entire output (starts with ssh-ed25519 and ends with your email).

  • Add the SSH key to GitHub:

Log in to GitHub → click your profile → Settings → SSH and GPG keys → New SSH key.
Give it a descriptive title (e.g., “Laptop key”).
Paste your public key in the Key field and click Add SSH key.

  • Test the connection:
 ssh -T git@github.com

Output:

Hi your-username! You've successfully authenticated, but GitHub does not provide shell access.

You’re now ready to push code to GitHub securely.

  2. Create a GitHub Repository
  • Go to GitHub and create a new repository for your project.
  • In your local project folder, initialize Git (if not already done):
 git init
  3. Push Your Project to GitHub

Add your files, commit, and push to the new repository:

 git add .
 git commit -m "Initial commit"
 git branch -M main
 git remote add origin git@github.com:your-username/ml-api.git
 git push -u origin main

Replace your-username and ml-api with your GitHub username and repository name.

  4. Add Repository Secrets

GitHub Actions needs secrets for Docker Hub and your Vultr server:

  • DOCKER_USERNAME: Your Docker Hub username
  • DOCKER_PASSWORD: Your Docker Hub password or access token
  • VULTR_HOST: Vultr server IP address
  • VULTR_USER: SSH username (sudo-enabled user)
  • VULTR_SSH_KEY: Private SSH key corresponding to the public key on the server
  • VULTR_PORT: SSH port (default 22)

Add them via Settings → Secrets and Variables → Actions → New repository secret.
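For VULTR_SSH_KEY, paste the entire private key that matches the public key authorized on the server, including the BEGIN and END lines. Assuming the key pair generated earlier, you can print it with:

 cat ~/.ssh/id_ed25519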

  5. Create the Workflow File

In your project root, create the directory and file:

.github/workflows/deploy.yml

Paste the following workflow code:

name: CI/CD Deploy to Vultr

on:
  push:
    branches:
      - main

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Verify Docker username
        run: |
          if [ -z "${{ secrets.DOCKER_USERNAME }}" ]; then
            echo "Error: DOCKER_USERNAME secret is empty!"
            exit 1
          fi
          echo "Docker username is set ✅"

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and Push Docker Image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
            ${{ secrets.DOCKER_USERNAME }}/sentiment-api:${{ github.run_number }}

      - name: Deploy to Vultr
        uses: appleboy/ssh-action@v0.1.6
        with:
          host: ${{ secrets.VULTR_HOST }}
          username: ${{ secrets.VULTR_USER }}
          key: ${{ secrets.VULTR_SSH_KEY }}
          port: ${{ secrets.VULTR_PORT }}
          script: |
            docker pull ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
            docker stop sentiment-api || true
            docker rm sentiment-api || true
            docker run -d \
              --name sentiment-api \
              -p 8000:8000 \
              --restart always \
              ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
  6. Push the Workflow to Trigger Deployment
 git add .
 git commit -m "Add CI/CD workflow"
 git push origin main
  • GitHub Actions will automatically run the workflow.
  • Your Dockerized ML API will be built, pushed, and deployed to your Vultr server.
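Once the run finishes, you can confirm the container is up by SSHing into the server and checking for the sentiment-api container:

 docker ps --filter name=sentiment-api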

Test Automatic Deployment

With the GitHub Actions workflow in place, you can now verify that your Dockerized ML API deploys automatically to your Vultr server whenever you push changes.

  1. Make a Code Change

For example, you can update the health check message in app/main.py:

@app.get("/", tags=["Health"])
def health_check():
    return {"status": "Hello from the automated ML deployment! tested and trusted ✅"}
  • Save your changes locally.
  • This minor update is enough to trigger the CI/CD workflow.
  2. Commit and Push
 git add .
 git commit -m "Update health check message"
 git push origin main
  • GitHub Actions will automatically detect the push.
  • The workflow will build the Docker image, push it to Docker Hub, and deploy it to your Vultr server.
  3. Monitor Deployment
  • Go to your repository → Actions tab.
  • Click on the latest workflow run.
  • You’ll see step-by-step logs:
  • Checkout repository ✅
  • Set up Docker Buildx ✅
  • Log in to Docker Hub ✅
  • Build and push Docker image ✅
  • Deploy to Vultr ✅

Any errors will appear in the logs, making debugging straightforward.
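If the workflow succeeds but the API itself misbehaves, the container logs on the server are the next place to look:

 docker logs --tail 50 sentiment-api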

  4. Verify on Vultr

Once the workflow completes, check your API:

 curl http://YOUR_VULTR_IP:8000/

Output

{
  "status": "Hello from the automated ML deployment! ✅"
}

Next, test the prediction endpoint:

 curl -X POST http://YOUR_VULTR_IP:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "I love this product!"}'

Output:

{
  "sentiment": "positive",
  "confidence": 0.9987
}
  • This confirms the latest code changes are live on your Vultr server.
  • Each subsequent push to main will automatically deploy updated containers.

Pro Tips:

  • If the workflow fails, check the Actions logs for errors in Docker build, push, or SSH deployment.
  • You can rename your container in the workflow if you want multiple APIs on the same server.
  • For safety, consider adding a rollback strategy (optional) if a deployment introduces a bug.

Add a Rollback Strategy (Optional)

In case a deployment introduces an issue, you can quickly roll back to a previous version of the API using Docker image tags.

Each deployment pushes two tags to Docker Hub:

  • latest – the most recent deployment
  • A numbered tag (for example, sentiment-api:42) representing a specific build

If the latest deployment fails, you can redeploy a previous version directly on your Vultr server.

  1. SSH into your server
  2. Stop and remove the current container
  3. Run a previous image tag

Stop and remove the current container:

docker stop sentiment-api
docker rm sentiment-api

Run the previous stable image:

docker run -d \
  --name sentiment-api \
  -p 8000:8000 \
  --restart always \
  your-docker-username/sentiment-api:42

This immediately restores the last stable version without rebuilding or modifying the CI/CD pipeline.
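The numbered tags live on Docker Hub rather than on the server; to find the right number, check the run number of the last known-good run in the GitHub Actions tab. Docker pulls the requested tag automatically the first time you run it, or you can pull it explicitly beforehand:

 docker pull your-docker-username/sentiment-api:42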

Conclusion

Automating machine learning deployments is often the missing link between experimentation and production. By combining Docker, GitHub Actions, and Vultr Cloud Compute, you can turn a local ML model into a continuously deployed, production-ready API with minimal operational overhead.

In this tutorial, you:

  • Built a CPU-efficient ML API using FastAPI and Hugging Face
  • Containerized the application for consistent deployments
  • Prepared a Vultr server for hosting Dockerized workloads
  • Implemented a CI/CD pipeline that deploys automatically on every push
  • Verified live updates and added a lightweight rollback option

Vultr’s straightforward infrastructure and predictable pricing make it an excellent platform for deploying ML services—whether you’re shipping a prototype, running internal tools, or serving real users in production.

With this foundation in place, you can confidently extend the workflow to support additional models, environments, or scaling strategies as your MLOps needs grow.
