
Ayomide olofinsawe

From Local Model to Production API: A Practical CI/CD Workflow for Machine Learning

Machine learning projects rarely fail because the model is bad; they fail when it hits production. A model might run perfectly on your laptop, but moving it into a stable, repeatable environment is a different story. Ad-hoc Docker commands, manual server setup, repeated SSH sessions, and fragile deployment processes turn every code change into a risky, time-consuming operation, slowing iteration until infrastructure management overshadows actual model improvement.

Most ML deployment guides assume access to GPUs, Kubernetes, or specialized MLOps platforms. They often gloss over the real-world headaches of shipping a model that just works. This tutorial takes a different approach: you’ll deploy a CPU-optimized ML API on a single cloud server, bake the model into a Docker container at build time, and automate updates with GitHub Actions. The result is a repeatable, reliable, low-cost workflow that works in real production, not just in a notebook or local environment.

In this guide, you’ll build a production-ready deployment pipeline for a lightweight sentiment analysis API using FastAPI and Hugging Face, provision a Vultr Cloud Compute instance, and set up a CI/CD workflow that automatically updates your deployment on every push. By the end, you’ll have a workflow capable of taking any ML model from local development to a live, continuously deployed API.

Prerequisites

To follow along with this tutorial, ensure you have the following in place:

  • Access to an Ubuntu 24.04 server as a non-root user with sudo privileges

  • Docker installed on both your local machine and the server

  • A Docker Hub account for storing container images

  • A GitHub account with GitHub Actions enabled

  • Basic familiarity with FastAPI, Docker, and Git-based workflows

This guide focuses on automating machine learning deployments rather than covering server provisioning or Docker installation. Refer to the Vultr documentation if you need assistance preparing your environment.

NOTE
This tutorial focuses on CPU-based deployment. The FastAPI service and Hugging Face model are configured to run on CPU only, making it suitable for cost-effective Vultr Cloud Compute instances without GPU support.

Build a Sample ML Model API

Before automating deployment, you need a machine learning service that behaves predictably in production. In this section, you’ll build a CPU-only sentiment analysis API using FastAPI and Hugging Face Transformers, designed specifically for containerized deployment on Vultr.

The application performs inference only. The model is downloaded once, stored locally, and loaded at startup to ensure fast and reliable requests.

Project Structure

Use the following project layout:

deployment_ml/
├─ app/
│  ├─ main.py
│  ├─ model.py
├─ download_model.py
├─ requirements.txt
└─ models/

This structure keeps the API logic, model artifacts, and dependency management clearly separated.

Define Python Dependencies

Create a requirements.txt file with the following contents:

--extra-index-url https://download.pytorch.org/whl/cpu

fastapi==0.115.6
uvicorn[standard]==0.30.6

torch==2.9.1
transformers>=4.46.3

pydantic>=2.7,<3
python-dotenv==1.0.1

This configuration ensures:

  • CPU-only PyTorch wheels are used
  • Compatibility with FastAPI and Pydantic v2
  • Predictable dependency resolution during Docker builds
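As a quick sanity check after installing these dependencies (the installation steps come later in this section), you can confirm that the CPU-only PyTorch build is in use. This is an optional, minimal check:

 python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

A CPU-only install typically reports a version suffixed with +cpu and prints False for CUDA availability.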

Download the Model Locally

Instead of downloading the model at runtime, the application uses locally stored model artifacts. This improves startup time and avoids network dependencies during deployment.

Create a script named download_model.py:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import os

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
LOCAL_PATH = "./models/distilbert-sst2"

os.makedirs(LOCAL_PATH, exist_ok=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

model.save_pretrained(LOCAL_PATH)
tokenizer.save_pretrained(LOCAL_PATH)

print(f"Model downloaded and saved to {LOCAL_PATH}")

You’ll run this script once (after installing the dependencies in a later step) to populate the models/ directory.
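Once it has run, the models/distilbert-sst2 directory should contain the model weights and tokenizer files. The exact filenames can vary slightly between transformers versions, but the listing will look roughly like this:

 ls models/distilbert-sst2

config.json  model.safetensors  special_tokens_map.json  tokenizer.json  tokenizer_config.json  vocab.txt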

Implement the Model Wrapper

Create app/model.py to load the model and handle inference:

from transformers import pipeline
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

DEVICE = -1  # Force CPU usage

LOCAL_MODEL_PATH = os.path.join(
    os.path.dirname(__file__),
    "../models/distilbert-sst2"
)

class SentimentModel:
    def __init__(self):
        logger.info(f"Loading sentiment model from {LOCAL_MODEL_PATH} ...")
        self.pipeline = pipeline(
            task="sentiment-analysis",
            model=LOCAL_MODEL_PATH,
            tokenizer=LOCAL_MODEL_PATH,
            device=DEVICE
        )
        logger.info("Sentiment model loaded successfully.")

    def predict(self, text: str):
        result = self.pipeline(text)[0]
        return {
            "sentiment": result["label"].lower(),
            "confidence": round(result["score"], 4)
        }

sentiment_model = SentimentModel()

def predict_sentiment(text: str):
    return sentiment_model.predict(text)

The model is initialized once at application startup, ensuring consistent performance under load.
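Once the dependencies are installed and the model has been downloaded (both covered in the next steps), you can exercise the wrapper on its own from the project root. This is an optional one-off check:

 python -c "from app.model import predict_sentiment; print(predict_sentiment('This works great'))"

It should print a dictionary containing the sentiment label and confidence score.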

Create the FastAPI Application

Create app/main.py:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from app.model import predict_sentiment
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="ML Sentiment Analysis API",
    description="CPU-only ML API deployed with CI/CD on Vultr",
    version="1.0.0"
)

class PredictionRequest(BaseModel):
    text: str = Field(..., min_length=1, example="I love this product")

class PredictionResponse(BaseModel):
    sentiment: str
    confidence: float

@app.get("/")
def health_check():
    return {"status": "Hello from the automated ML deployment!"}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        return predict_sentiment(request.text)
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(
            status_code=500,
            detail="Sentiment prediction failed."
        )

Install Dependencies and Run the API

  1. Create a virtual environment:
 python -m venv .venv
 source .venv/bin/activate
  2. Install the dependencies and upgrade pip:
 pip install --upgrade pip
 pip install -r requirements.txt
  3. Download the model:
 python download_model.py

This will create a ./models/distilbert-sst2 folder containing the pretrained Hugging Face model and tokenizer.

  4. Start the API:
 uvicorn app.main:app --host 0.0.0.0 --port 8000
  5. Test the service:
  • Health check: Confirm the API is running
curl http://localhost:8000/

Output:

{
  "status": "Hello from the automated ML deployment!"
}
  • Sentiment prediction: Send a test request
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "This deployment workflow is impressive"}'

Expected outcome: The API should return a JSON object with the predicted sentiment and confidence score, for example:

{
  "sentiment": "positive",
  "confidence": 0.9987
}

A successful response confirms the API is ready for containerization.

Outcome

At this point, you have:

  • A CPU-optimized ML inference API
  • Locally packaged Hugging Face model artifacts
  • Deterministic dependency management
  • A service ready to be containerized and deployed automatically

Dockerize the ML API

To deploy your ML service reliably on Vultr, you need to containerize the API. Docker ensures that the environment is consistent, reproducible, and isolated, which is essential for CI/CD pipelines and automated deployments.

This section will show you how to:

  • Create a Docker image that includes your API and model
  • Bake in the Hugging Face model to avoid runtime downloads
  • Expose the service port for external access
  1. Create the Dockerfile

In the root of your project directory, create a file named Dockerfile and paste the following content:

# ---- Base image ----
FROM python:3.14-slim

# ---- Environment variables ----
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/.hf_cache

# ---- Set working directory ----
WORKDIR /app

# ---- System dependencies ----
RUN apt-get update && apt-get install -y \
    git \
    && rm -rf /var/lib/apt/lists/*

# ---- Install Python dependencies ----
COPY requirements.txt ./
RUN pip install --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt

# ---- Copy application code ----
COPY app ./app
COPY download_model.py ./download_model.py

# ---- Download model at build time ----
RUN python download_model.py

# ---- Expose port ----
EXPOSE 8000

# ---- Run the app ----
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Key Notes:

  • TRANSFORMERS_CACHE points to a local folder for Hugging Face caching.
  • download_model.py runs during build, so the container already includes the model.
  • Port 8000 is exposed for API access.
  • --no-cache-dir ensures Python packages don’t increase image size unnecessarily.
  2. Create the .dockerignore File

In the root of your project directory, create a file named .dockerignore and paste the following content:

.venv
__pycache__
*.pyc
*.pyo
*.pyd
.git
.gitignore
.env
.cache
models/

NOTE
models/ is ignored because download_model.py ensures the model is downloaded inside the container at build time.
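It is also worth adding a .gitignore so the virtual environment and downloaded model artifacts are never committed to the repository; the CI pipeline re-downloads the model during docker build, so only the code needs to live in Git. A minimal example:

.venv
__pycache__
*.pyc
models/
.env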

  3. Build the Docker Image

Run the following command in your project root:

 docker build -t sentiment-api .

NOTE
The Docker image name sentiment-api is used throughout this tutorial for simplicity. You can rename it to anything you like; just make sure to use the same name consistently in subsequent commands, including the GitHub Actions workflow and Docker Hub pushes.

Output

  • The image is created with your Python dependencies and ML model
  • Logs during the build show the model download progress and the message Model downloaded and saved to ./models/distilbert-sst2
  • The image size is optimized due to .dockerignore and --no-cache-dir
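To confirm the model artifacts really were baked in at build time, you can list them inside a throwaway container (an optional check, using the sentiment-api image name from the build step):

 docker run --rm sentiment-api ls /app/models/distilbert-sst2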
  4. Run the Docker Container

Start the API in a container:

 docker run -p 8000:8000 sentiment-api

Output

  • The API runs inside the container
  • Health check:
 curl http://localhost:8000/

Output:

{"status": "Hello from the automated ML deployment!"}
  • Test prediction:
 curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Dockerized deployment is smooth!"}'

Output:

{
  "sentiment": "positive",
  "confidence": 0.9981
}

Provision a Vultr Instance

Before deploying your Dockerized ML API, ensure you have access to an Ubuntu 24.04 server as a non-root user with sudo privileges (as noted in the prerequisites).

This section shows how to add your SSH key to Vultr when creating or managing an instance and how to install Docker on your server.

  1. Create an SSH Key (If You Don’t Have One)

If you don’t already have an SSH key, generate one on your local machine:

 ssh-keygen -t ed25519 -C "your_email@example.com"
  • Press Enter to accept the default file location.
  • Optionally, set a passphrase for added security.

Your public key will be saved as:

~/.ssh/id_ed25519.pub
  2. Copy Your Public Key

Run:

cat ~/.ssh/id_ed25519.pub
  • The public key will appear in the terminal.
  • Select and copy the entire line (starts with ssh-ed25519 and ends with your email or username).
  • This is the key you will add to your Vultr server for secure SSH access.
  3. Add SSH Key to Your Vultr Instance

When creating a new Vultr Cloud Compute instance or managing an existing one:

  • Log in to your Vultr dashboard.
  • Navigate to Account.
  • Under SSH Keys, click Add SSH Key.
  • Paste your public key from Step 2.

Once the server is deployed, note the IP address — you’ll use it to connect via SSH.

  4. Connect via SSH

Use your existing sudo-enabled user to log in:

 ssh username@YOUR_VULTR_IP
  • Replace username with your sudo-enabled user.
  • You are now ready to install Docker and deploy your ML API.
  5. Install Docker

Run each command line by line as your sudo user:

Update package lists:

 sudo apt update

Install Docker and Docker Compose:

 sudo apt install -y docker.io docker-compose

Enable Docker to start on boot:

 sudo systemctl enable docker

Start Docker:

 sudo systemctl start docker

Verify installation:

 docker --version

Output:

Docker version 29.1.4, build 0e6fee6

Set up firewall rules to allow SSH and traffic on port 8000 (allow SSH before enabling the firewall so you don't lock yourself out of the session):

 sudo ufw allow OpenSSH
 sudo ufw allow 8000/tcp
 sudo ufw enable
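The GitHub Actions deployment step set up later in this tutorial runs docker commands over SSH as this user without sudo, so add the user to the docker group and log out and back in for the change to take effect:

 sudo usermod -aG docker $USER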
  6. Optional: Test Docker

Run a quick test container:

 docker run hello-world

You should see a confirmation message that Docker is installed and running correctly.

Set Up GitHub Actions Workflow

We’ll automate deployment using GitHub Actions. The workflow will:

  • Build the Docker image
  • Push it to Docker Hub
  • SSH into your Vultr server and deploy the container

Before setting up the workflow, make sure you can push code to GitHub via SSH.

  1. First-Time SSH Setup for GitHub

To push your project over SSH, GitHub needs your SSH public key added to your account (only required once per machine).

  • Copy your public key:
 cat ~/.ssh/id_ed25519.pub
  • Copy the entire output (starts with ssh-ed25519 and ends with your email).

  • Add the SSH key to GitHub:

Log in to GitHub → click your profile → Settings → SSH and GPG keys → New SSH key.
Give it a descriptive title (e.g., “Laptop key”).
Paste your public key in the Key field and click Add SSH key.

  • Test the connection:
 ssh -T git@github.com

Output:

Hi your-username! You've successfully authenticated, but GitHub does not provide shell access.

You’re now ready to push code to GitHub securely.

  2. Create a GitHub Repository
  • Go to GitHub and create a new repository for your project.
  • In your local project folder, initialize Git (if not already done):
 git init
  3. Push Your Project to GitHub

Add your files, commit, and push to the new repository:

 git add .
 git commit -m "Initial commit"
 git branch -M main
 git remote add origin git@github.com:your-username/ml-api.git
 git push -u origin main

Replace your-username and ml-api with your GitHub username and repository name.

  4. Add Repository Secrets

GitHub Actions needs secrets for Docker Hub and your Vultr server:

  • DOCKER_USERNAME: Your Docker Hub username
  • DOCKER_PASSWORD: Your Docker Hub password or access token
  • VULTR_HOST: Vultr server IP address
  • VULTR_USER: SSH username (sudo-enabled user)
  • VULTR_SSH_KEY: Private SSH key corresponding to the public key on the server
  • VULTR_PORT: SSH port (default 22)

Add them via Settings → Secrets and Variables → Actions → New repository secret.
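For VULTR_SSH_KEY, paste the entire private key that matches the public key authorized on the server, including the BEGIN and END lines. Assuming the key pair generated earlier, you can print it with:

 cat ~/.ssh/id_ed25519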

  5. Create the Workflow File

In your project root, create the directory and file:

.github/workflows/deploy.yml

Paste the following workflow code:

name: CI/CD Deploy to Vultr

on:
  push:
    branches:
      - main

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Verify Docker username
        run: |
          if [ -z "${{ secrets.DOCKER_USERNAME }}" ]; then
            echo "Error: DOCKER_USERNAME secret is empty!"
            exit 1
          fi
          echo "Docker username is set ✅"

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and Push Docker Image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
            ${{ secrets.DOCKER_USERNAME }}/sentiment-api:${{ github.run_number }}

      - name: Deploy to Vultr
        uses: appleboy/ssh-action@v0.1.6
        with:
          host: ${{ secrets.VULTR_HOST }}
          username: ${{ secrets.VULTR_USER }}
          key: ${{ secrets.VULTR_SSH_KEY }}
          port: ${{ secrets.VULTR_PORT }}
          script: |
            docker pull ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
            docker stop sentiment-api || true
            docker rm sentiment-api || true
            docker run -d \
              --name sentiment-api \
              -p 8000:8000 \
              --restart always \
              ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
  6. Push the Workflow to Trigger Deployment
 git add .
 git commit -m "Add CI/CD workflow"
 git push origin main
  • GitHub Actions will automatically run the workflow.
  • Your Dockerized ML API will be built, pushed, and deployed to your Vultr server.
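Once the run finishes, you can confirm the container is up by SSHing into the server and checking for the sentiment-api container:

 docker ps --filter name=sentiment-api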

Test Automatic Deployment

With the GitHub Actions workflow in place, you can now verify that your Dockerized ML API deploys automatically to your Vultr server whenever you push changes.

  1. Make a Code Change

For example, you can update the health check message in app/main.py:

@app.get("/", tags=["Health"])
def health_check():
    return {"status": "Hello from the automated ML deployment! tested and trusted ✅"}
  • Save your changes locally.
  • This minor update is enough to trigger the CI/CD workflow.
  2. Commit and Push
 git add .
 git commit -m "Update health check message"
 git push origin main
  • GitHub Actions will automatically detect the push.
  • The workflow will build the Docker image, push it to Docker Hub, and deploy it to your Vultr server.
  3. Monitor Deployment
  • Go to your repository → Actions tab.
  • Click on the latest workflow run.
  • You’ll see step-by-step logs:
  • Checkout repository ✅
  • Set up Docker Buildx ✅
  • Log in to Docker Hub ✅
  • Build and push Docker image ✅
  • Deploy to Vultr ✅

Any errors will appear in the logs, making debugging straightforward.
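If the workflow succeeds but the API itself misbehaves, the container logs on the server are the next place to look:

 docker logs --tail 50 sentiment-api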

  4. Verify on Vultr

Once the workflow completes, check your API:

 curl http://YOUR_VULTR_IP:8000/

Output

{
  "status": "Hello from the automated ML deployment! ✅"
}

Next, test the prediction endpoint:

 curl -X POST http://YOUR_VULTR_IP:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "I love this product!"}'

Output:

{
  "sentiment": "positive",
  "confidence": 0.9987
}
  • This confirms the latest code changes are live on your Vultr server.
  • Each subsequent push to main will automatically deploy updated containers.

Pro Tips:

  • If the workflow fails, check the Actions logs for errors in Docker build, push, or SSH deployment.
  • You can rename your container in the workflow if you want multiple APIs on the same server.
  • For safety, consider adding a rollback strategy (optional) if a deployment introduces a bug.

Add a Rollback Strategy (Optional)

In case a deployment introduces an issue, you can quickly roll back to a previous version of the API using Docker image tags.

Each deployment pushes two tags to Docker Hub:

  • latest – the most recent deployment
  • A numbered tag (for example, sentiment-api:42) representing a specific build

If the latest deployment fails, you can redeploy a previous version directly on your Vultr server.

  1. SSH into your server
  2. Stop and remove the current container
  3. Run a previous image tag

Stop and remove the current container:

docker stop sentiment-api
docker rm sentiment-api

Run the previous stable image:

docker run -d \
  --name sentiment-api \
  -p 8000:8000 \
  --restart always \
  your-docker-username/sentiment-api:42

This immediately restores the last stable version without rebuilding or modifying the CI/CD pipeline.
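The numbered tags live on Docker Hub rather than on the server; to find the right number, check the run number of the last known-good run in the GitHub Actions tab. Docker pulls the requested tag automatically the first time you run it, or you can pull it explicitly beforehand:

 docker pull your-docker-username/sentiment-api:42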

Conclusion

Automating machine learning deployments is often the missing link between experimentation and production. By combining Docker, GitHub Actions, and Vultr Cloud Compute, you can turn a local ML model into a continuously deployed, production-ready API with minimal operational overhead.

In this tutorial, you:

  • Built a CPU-efficient ML API using FastAPI and Hugging Face
  • Containerized the application for consistent deployments
  • Prepared a Vultr server for hosting Dockerized workloads
  • Implemented a CI/CD pipeline that deploys automatically on every push
  • Verified live updates and added a lightweight rollback option

Vultr’s straightforward infrastructure and predictable pricing make it an excellent platform for deploying ML services—whether you’re shipping a prototype, running internal tools, or serving real users in production.

With this foundation in place, you can confidently extend the workflow to support additional models, environments, or scaling strategies as your MLOps needs grow.
