Machine learning projects rarely fail because the model is bad; they fail when the model hits production. A model might run perfectly on your laptop, but moving it into a stable, repeatable environment is a different story. Ad-hoc Docker commands, manual server setup, repeated SSH sessions, and fragile deployment processes turn every code change into a risky, time-consuming operation, slowing iteration and letting infrastructure management overshadow actual model improvement.
Most ML deployment guides assume access to GPUs, Kubernetes, or specialized MLOps platforms. They often gloss over the real-world headaches of shipping a model that just works. This tutorial takes a different approach: you’ll deploy a CPU-optimized ML API on a single cloud server, bake the model into a Docker container at build time, and automate updates with GitHub Actions. The result is a repeatable, reliable, low-cost workflow that works in real production, not just in a notebook or local environment.
In this guide, you’ll build a production-ready deployment pipeline for a lightweight sentiment analysis API using FastAPI and Hugging Face, provision a Vultr Cloud Compute instance, and set up a CI/CD workflow that automatically updates your deployment on every push. By the end, you’ll have a workflow capable of taking any ML model from local development to a live, continuously deployed API.
Prerequisites
To follow along with this tutorial, ensure you have the following in place:
Access to an Ubuntu 24.04 server as a non-root user with sudo privileges
Docker installed on both your local machine and the server
A Docker Hub account for storing container images
A GitHub account with GitHub Actions enabled
Basic familiarity with FastAPI, Docker, and Git-based workflows
This guide focuses on automating machine learning deployments rather than covering server provisioning or Docker installation. Refer to the Vultr documentation if you need assistance preparing your environment.
NOTE
This tutorial focuses on CPU-based deployment. The FastAPI service and Hugging Face model are configured to run on CPU only, making it suitable for cost-effective Vultr Cloud Compute instances without GPU support.
Build a Sample ML Model API
Before automating deployment, you need a machine learning service that behaves predictably in production. In this section, you’ll build a CPU-only sentiment analysis API using FastAPI and Hugging Face Transformers, designed specifically for containerized deployment on Vultr.
The application performs inference only. The model is downloaded once, stored locally, and loaded at startup to ensure fast and reliable requests.
Project Structure
Use the following project layout:
deployment_ml/
├─ app/
│ ├─ main.py
│ ├─ model.py
├─ download_model.py
├─ requirements.txt
└─ models/
This structure keeps the API logic, model artifacts, and dependency management clearly separated.
Define Python Dependencies
Create a requirements.txt file with the following contents:
--extra-index-url https://download.pytorch.org/whl/cpu
fastapi==0.115.6
uvicorn[standard]==0.30.6
torch==2.9.1
transformers>=4.46.3
pydantic>=2.7,<3
python-dotenv==1.0.1
This configuration ensures:
- CPU-only PyTorch wheels are used
- Compatibility with FastAPI and Pydantic v2
- Predictable dependency resolution during Docker builds
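Once you have installed these dependencies (covered later in this section), you can verify that the CPU-only PyTorch build was actually selected. A quick check, assuming the virtual environment is active:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

The version string should typically carry a +cpu suffix and the CUDA check should print False, confirming that no GPU runtime is bundled into your environment or the eventual Docker image.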
Download the Model Locally
Instead of downloading the model at runtime, the application uses locally stored model artifacts. This improves startup time and avoids network dependencies during deployment.
Create a script named download_model.py:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import os
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
LOCAL_PATH = "./models/distilbert-sst2"
os.makedirs(LOCAL_PATH, exist_ok=True)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model.save_pretrained(LOCAL_PATH)
tokenizer.save_pretrained(LOCAL_PATH)
print(f"Model downloaded and saved to {LOCAL_PATH}")
Run the script once to populate the models/ directory.
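If you want to confirm the artifacts are in place, list the target directory. The exact file names depend on your Transformers version, but you should see a model config, tokenizer files, and the model weights:

ls models/distilbert-sst2

Typical contents include config.json, model.safetensors (or pytorch_model.bin), tokenizer_config.json, and vocab.txt.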
Implement the Model Wrapper
Create app/model.py to load the model and handle inference:
from transformers import pipeline
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

DEVICE = -1  # Force CPU usage
LOCAL_MODEL_PATH = os.path.join(
    os.path.dirname(__file__),
    "../models/distilbert-sst2"
)


class SentimentModel:
    def __init__(self):
        logger.info(f"Loading sentiment model from {LOCAL_MODEL_PATH} ...")
        self.pipeline = pipeline(
            task="sentiment-analysis",
            model=LOCAL_MODEL_PATH,
            tokenizer=LOCAL_MODEL_PATH,
            device=DEVICE
        )
        logger.info("Sentiment model loaded successfully.")

    def predict(self, text: str):
        result = self.pipeline(text)[0]
        return {
            "sentiment": result["label"].lower(),
            "confidence": round(result["score"], 4)
        }


sentiment_model = SentimentModel()


def predict_sentiment(text: str):
    return sentiment_model.predict(text)
The model is initialized once at application startup, ensuring consistent performance under load.
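Before wiring the wrapper into FastAPI, you can smoke-test it directly from the project root. This assumes the dependencies are installed and the model has already been downloaded:

python -c "from app.model import predict_sentiment; print(predict_sentiment('This works great'))"

You should see a dictionary with a sentiment label and a confidence score; the exact score may vary slightly between environments.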
Create the FastAPI Application
Create app/main.py:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from app.model import predict_sentiment
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="ML Sentiment Analysis API",
    description="CPU-only ML API deployed with CI/CD on Vultr",
    version="1.0.0"
)


class PredictionRequest(BaseModel):
    text: str = Field(..., min_length=1, example="I love this product")


class PredictionResponse(BaseModel):
    sentiment: str
    confidence: float


@app.get("/")
def health_check():
    return {"status": "Hello from the automated ML deployment!"}


@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        return predict_sentiment(request.text)
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(
            status_code=500,
            detail="Sentiment prediction failed."
        )
Install Dependencies and Run the API
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
- Upgrade pip and install the dependencies:
pip install --upgrade pip
pip install -r requirements.txt
- Download the model:
python download_model.py
This will create a ./models/distilbert-sst2 folder containing the pretrained Hugging Face model and tokenizer.
- Start the API:
uvicorn app.main:app --host 0.0.0.0 --port 8000
- Test the service:
- Health check: Confirm the API is running
curl http://localhost:8000/
Output:
{
"status": "Hello from the automated ML deployment!"
}
- Sentiment prediction: Send a test request
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This deployment workflow is impressive"}'
Expected outcome: The API should return a JSON object with the predicted sentiment and confidence score, for example:
{
  "sentiment": "positive",
  "confidence": 0.9987
}
A successful response confirms the API is ready for containerization.
Outcome
At this point, you have:
- A CPU-optimized ML inference API
- Locally packaged Hugging Face model artifacts
- Deterministic dependency management
- A service ready to be containerized and deployed automatically
Dockerize the ML API
To deploy your ML service reliably on Vultr, you need to containerize the API. Docker ensures that the environment is consistent, reproducible, and isolated, which is essential for CI/CD pipelines and automated deployments.
This section will show you how to:
- Create a Docker image that includes your API and model
- Bake in the Hugging Face model to avoid runtime downloads
- Expose the service port for external access
- Create the Dockerfile
In the root of your project directory, create a file named Dockerfile and paste the following content:
# ---- Base image ----
FROM python:3.14-slim
# ---- Environment variables ----
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/.hf_cache
# ---- Set working directory ----
WORKDIR /app
# ---- System dependencies ----
RUN apt-get update && apt-get install -y \
git \
&& rm -rf /var/lib/apt/lists/*
# ---- Install Python dependencies ----
COPY requirements.txt ./
RUN pip install --upgrade pip \
&& pip install --no-cache-dir -r requirements.txt
# ---- Copy application code ----
COPY app ./app
COPY download_model.py ./download_model.py
# ---- Download model at build time ----
RUN python download_model.py
# ---- Expose port ----
EXPOSE 8000
# ---- Run the app ----
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Key Notes:
- `TRANSFORMERS_CACHE` points to a local folder for Hugging Face caching.
- `download_model.py` runs during build, so the container already includes the model.
- Port `8000` is exposed for API access.
- `--no-cache-dir` ensures Python packages don’t increase image size unnecessarily.
- Create the `.dockerignore` File
In the root of your project directory, create a file named .dockerignore and paste the following content:
.venv
__pycache__
*.pyc
*.pyo
*.pyd
.git
.gitignore
.env
.cache
models/
[!NOTE]
`models/` is ignored because `download_model.py` ensures the model is downloaded inside the container at build time.
- Build the Docker Image
Run the following command in your project root:
docker build -t sentiment-api .
[!NOTE]
The Docker image name `sentiment-api` is used throughout this tutorial for simplicity. You can rename it to anything you like, just make sure to use the same name consistently in subsequent commands, including GitHub Actions workflows and Docker Hub pushes.
Output
- The image is created with your Python dependencies and ML model
- Logs during the build show the model being downloaded and the final message `Model downloaded and saved to ./models/distilbert-sst2`
- The image size is optimized due to `.dockerignore` and `--no-cache-dir`
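You can also inspect the image locally to confirm the tag and get a sense of its size; the exact size depends on the base image and dependency versions:

docker images sentiment-api

The output lists the repository, the latest tag, the image ID, and the size, which is dominated by the PyTorch CPU wheels and the baked-in model weights.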
- Run the Docker Container
Start the API in a container:
docker run -p 8000:8000 sentiment-api
Output
- The API runs inside the container
- Health check:
curl http://localhost:8000/
Output:
{"status": "Hello from the automated ML deployment!"}
- Test prediction:
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Dockerized deployment is smooth!"}'
Output:
{
"sentiment": "positive",
"confidence": 0.9981
}
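If you want a closer match to how the CI/CD pipeline will run the container later, you can also start it detached with a name and a restart policy, then follow the logs. This is a local sketch of the same flags used in the deployment step:

docker run -d \
  --name sentiment-api \
  -p 8000:8000 \
  --restart always \
  sentiment-api
docker logs -f sentiment-api

Stop and remove it with docker stop sentiment-api and docker rm sentiment-api before moving on.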
Provision a Vultr Instance
Before deploying your Dockerized ML API, ensure you have access to an Ubuntu 24.04 server as a non-root user with sudo privileges (as noted in the prerequisites).
This section shows how to add your SSH key to Vultr when creating or managing an instance and how to install Docker on your server.
- Create an SSH Key (If You Don’t Have One)
If you don’t already have an SSH key, generate one on your local machine:
ssh-keygen -t ed25519 -C "your_email@example.com"
- Press Enter to accept the default file location.
- Optionally, set a passphrase for added security.
Your public key will be saved as:
~/.ssh/id_ed25519.pub
- Copy Your Public Key
Run:
cat ~/.ssh/id_ed25519.pub
- The public key will appear in the terminal.
- Select and copy the entire line (it starts with `ssh-ed25519` and ends with your email or username).
- This is the key you will add to your Vultr server for secure SSH access.
- Add SSH Key to Your Vultr Instance
When creating a new Vultr Cloud Compute instance or managing an existing one:
- Log in to your Vultr dashboard.
- Navigate to Account.
- Under SSH Keys, click Add SSH Key.
- Paste your public key from Step 2.
Once the server is deployed, note the IP address — you’ll use it to connect via SSH.
- Connect via SSH
Use your existing sudo-enabled user to log in:
ssh username@YOUR_VULTR_IP
- Replace `username` with your sudo-enabled user.
- You are now ready to install Docker and deploy your ML API.
- Install Docker
Run each command line by line as your sudo user:
Update package lists:
sudo apt update
Install Docker and Docker Compose:
sudo apt install -y docker.io docker-compose
Enable Docker to start on boot:
sudo systemctl enable docker
Start Docker:
sudo systemctl start docker
Verify installation:
docker --version
Output:
Docker version 29.1.4, build 0e6fee6
Set up firewall rules. Allow SSH first so enabling the firewall does not drop your session, then open port 8000:
sudo ufw allow OpenSSH
sudo ufw allow 8000/tcp
sudo ufw enable
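To confirm the firewall is active and the rules are in place:

sudo ufw status

The output should show Status: active with ALLOW entries for OpenSSH and 8000/tcp.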
- Optional — Test Docker
Run a quick test container:
docker run hello-world
You should see a confirmation message that Docker is installed and running correctly.
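The deployment step in the GitHub Actions workflow later in this guide runs docker commands over SSH without sudo. To allow that, add your deploy user to the docker group, then log out and back in for the change to take effect:

sudo usermod -aG docker $USER

Alternatively, you can prefix the docker commands in the deployment script with sudo, provided your user can run sudo non-interactively.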
Set Up GitHub Actions Workflow
We’ll automate deployment using GitHub Actions. The workflow will:
- Build the Docker image
- Push it to Docker Hub
- SSH into your Vultr server and deploy the container
Before setting up the workflow, make sure you can push code to GitHub via SSH.
- First-Time SSH Setup for GitHub
To push your project over SSH, GitHub needs your SSH public key added to your account (only required once per machine).
- Copy your public key:
cat ~/.ssh/id_ed25519.pub
Copy the entire output (it starts with `ssh-ed25519` and ends with your email).
- Add the SSH key to GitHub:
Log in to GitHub → click your profile → Settings → SSH and GPG keys → New SSH key.
Give it a descriptive title (e.g., “Laptop key”).
Paste your public key in the Key field and click Add SSH key.
- Test the connection:
ssh -T git@github.com
Output:
Hi your-username! You've successfully authenticated, but GitHub does not provide shell access.
You’re now ready to push code to GitHub securely.
- Create a GitHub Repository
- Go to GitHub and create a new repository for your project.
- In your local project folder, initialize Git (if not already done):
git init
- Push Your Project to GitHub
Add your files, commit, and push to the new repository:
git add .
git commit -m "Initial commit"
git branch -M main
git remote add origin git@github.com:your-username/ml-api.git
git push -u origin main
Replace `your-username` and `ml-api` with your GitHub username and repository name.
- Add Repository Secrets
GitHub Actions needs secrets for Docker Hub and your Vultr server:
| Secret | Description |
|---|---|
| `DOCKER_USERNAME` | Your Docker Hub username |
| `DOCKER_PASSWORD` | Your Docker Hub password |
| `VULTR_HOST` | Vultr server IP |
| `VULTR_USER` | SSH username (sudo-enabled user) |
| `VULTR_SSH_KEY` | Private SSH key corresponding to the public key on the server |
| `VULTR_PORT` | SSH port (default 22) |
Add them via Settings → Secrets and Variables → Actions → New repository secret.
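For VULTR_SSH_KEY, one option is to generate a dedicated key pair for CI instead of reusing your personal key. A sketch of that setup (the file name `vultr_deploy` is just an example):

ssh-keygen -t ed25519 -f ~/.ssh/vultr_deploy -C "github-actions-deploy" -N ""
ssh-copy-id -i ~/.ssh/vultr_deploy.pub username@YOUR_VULTR_IP
cat ~/.ssh/vultr_deploy

Paste the full private key output, including the BEGIN and END lines, into the VULTR_SSH_KEY secret.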
- Create the Workflow File
In your project root, create the directory and file:
.github/workflows/deploy.yml
Paste the following workflow code:
name: CI/CD Deploy to Vultr

on:
  push:
    branches:
      - main

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Verify Docker username
        run: |
          if [ -z "${{ secrets.DOCKER_USERNAME }}" ]; then
            echo "Error: DOCKER_USERNAME secret is empty!"
            exit 1
          fi
          echo "Docker username is set ✅"

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and Push Docker Image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
            ${{ secrets.DOCKER_USERNAME }}/sentiment-api:${{ github.run_number }}

      - name: Deploy to Vultr
        uses: appleboy/ssh-action@v0.1.6
        with:
          host: ${{ secrets.VULTR_HOST }}
          username: ${{ secrets.VULTR_USER }}
          key: ${{ secrets.VULTR_SSH_KEY }}
          port: ${{ secrets.VULTR_PORT }}
          script: |
            docker pull ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
            docker stop sentiment-api || true
            docker rm sentiment-api || true
            docker run -d \
              --name sentiment-api \
              -p 8000:8000 \
              --restart always \
              ${{ secrets.DOCKER_USERNAME }}/sentiment-api:latest
- Push Workflow to Trigger Deployment
git add .
git commit -m "Add CI/CD workflow"
git push origin main
- GitHub Actions will automatically run the workflow.
- Your Dockerized ML API will be built, pushed, and deployed to your Vultr server.
Test Automatic Deployment
With the GitHub Actions workflow in place, you can now verify that your Dockerized ML API deploys automatically to your Vultr server whenever you push changes.
- Make a Code Change
For example, you can update the health check message in app/main.py:
@app.get("/", tags=["Health"])
def health_check():
    return {"status": "Hello from the automated ML deployment! tested and trusted ✅"}
- Save your changes locally.
- This minor update is enough to trigger the CI/CD workflow.
- Commit and Push
git add .
git commit -m "Update health check message"
git push origin main
- GitHub Actions will automatically detect the push.
- The workflow will build the Docker image, push it to Docker Hub, and deploy it to your Vultr server.
- Monitor Deployment
- Go to your repository → Actions tab.
- Click on the latest workflow run.
You’ll see step-by-step logs:
Checkout repository ✅
Set up Docker Buildx ✅
Log in to Docker Hub ✅
Build and push Docker image ✅
Deploy to Vultr ✅
Any errors will appear in the logs, making debugging straightforward.
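If the Deploy to Vultr step succeeds but the API still misbehaves, you can inspect the container directly on the server:

ssh username@YOUR_VULTR_IP
docker ps --filter name=sentiment-api
docker logs --tail 50 sentiment-api

docker ps confirms the container is running and shows the port mapping, while the logs reveal startup or inference errors.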
- Verify on Vultr
Once the workflow completes, check your API:
curl http://YOUR_VULTR_IP:8000/
Output:
{
  "status": "Hello from the automated ML deployment! tested and trusted ✅"
}
Next, test the prediction endpoint:
curl -X POST http://YOUR_VULTR_IP:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "I love this product!"}'
Output:
{
"sentiment": "positive",
"confidence": 0.9987
}
- This confirms the latest code changes are live on your Vultr server.
- Each subsequent push to `main` will automatically deploy updated containers.
Pro Tips:
- If the workflow fails, check the Actions logs for errors in Docker build, push, or SSH deployment.
- You can rename your container in the workflow if you want multiple APIs on the same server.
- For safety, consider adding a rollback strategy (optional) if a deployment introduces a bug.
Add a Rollback Strategy (Optional)
In case a deployment introduces an issue, you can quickly roll back to a previous version of the API using Docker image tags.
Each deployment pushes two tags to Docker Hub:
- `latest` – the most recent deployment
- A numbered tag (for example, `sentiment-api:42`) representing a specific build
If the latest deployment fails, you can redeploy a previous version directly on your Vultr server.
- SSH into your server
- Stop and remove the current container
- Run a previous image tag
Stop and remove the current container:
docker stop sentiment-api
docker rm sentiment-api
Run the previous stable image:
docker run -d \
--name sentiment-api \
-p 8000:8000 \
--restart always \
your-docker-username/sentiment-api:42
This immediately restores the last stable version without rebuilding or modifying the CI/CD pipeline.
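Each numbered tag corresponds to the GitHub Actions run number from the workflow (github.run_number), which you can see in the Actions tab. If you are unsure which builds are already available on the server, list the local images; tags not present locally can be pulled from Docker Hub first:

docker images your-docker-username/sentiment-api

Replace your-docker-username with your Docker Hub username, as in the run command above.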
Conclusion
Automating machine learning deployments is often the missing link between experimentation and production. By combining Docker, GitHub Actions, and Vultr Cloud Compute, you can turn a local ML model into a continuously deployed, production-ready API with minimal operational overhead.
In this tutorial, you:
- Built a CPU-efficient ML API using FastAPI and Hugging Face
- Containerized the application for consistent deployments
- Prepared a Vultr server for hosting Dockerized workloads
- Implemented a CI/CD pipeline that deploys automatically on every push
- Verified live updates and added a lightweight rollback option
Vultr’s straightforward infrastructure and predictable pricing make it an excellent platform for deploying ML services—whether you’re shipping a prototype, running internal tools, or serving real users in production.
With this foundation in place, you can confidently extend the workflow to support additional models, environments, or scaling strategies as your MLOps needs grow.