A hands-on, beginner-friendly guide to deploying, monitoring, and optimizing a Machine Learning model in production — from API creation to data drift detection.
Last Saturday morning, I was scrolling through freelance gig postings on Fiverr when I stumbled upon something that caught my attention. A small fintech startup was looking for someone to "take our trained credit scoring model and make it production-ready — API, Docker, CI/CD, the whole shebang." The budget was decent, the deadline was two weeks, and I thought: "How hard can it be?"
Spoiler: it was more involved than I expected. But by Sunday evening, I had a working end-to-end MLOps pipeline running, and I learned an incredible amount in the process. This article is the tutorial I wish I had before starting.
Whether you're a data science student, a junior ML engineer, or someone curious about what happens after a model is trained, this guide will walk you through every step — from serving predictions through an API to catching silent model failures in production.
What we'll build together:
- A prediction API using FastAPI that serves a credit scoring model
- Automated tests to make sure our API doesn't break
- A Docker container to package everything for deployment
- A CI/CD pipeline with GitHub Actions to automate testing and deployment
- A data drift analysis to monitor model health over time
- Performance optimizations to speed up inference
Let's dive in.
Table of Contents
- The Big Picture: What Is MLOps?
- Setting Up the Project
- Training a Simple Credit Scoring Model
- Creating a Prediction API with FastAPI
- Writing Automated Tests
- Containerizing with Docker
- Building a CI/CD Pipeline
- Logging Production Data
- Data Drift Detection
- Performance Optimization
- The Final Architecture
- Key Takeaways
1. The Big Picture: What Is MLOps?
Before we write a single line of code, let's understand the landscape.
The "Last Mile" Problem
Here's a reality check that most online courses don't tell you: training a model is only about 20% of the work in a real ML project. The remaining 80% is everything that happens around it — data pipelines, deployment, monitoring, maintenance.
This is what MLOps (Machine Learning Operations) is about. Think of it as DevOps, but specifically designed for the unique challenges of machine learning systems.

The MLOps lifecycle — training is just one piece of a much larger puzzle. (source: ml-ops.org)
What We'll Cover
| Pillar | What It Means | Tool We'll Use |
|---|---|---|
| Model Serving | Making predictions available via an API | FastAPI |
| Containerization | Packaging code + dependencies together | Docker |
| CI/CD | Automating tests and deployment | GitHub Actions |
| Monitoring | Watching model behavior in production | Evidently AI |
| Optimization | Making inference faster | cProfile, ONNX |
| Version Control | Tracking every change | Git + GitHub |
Each pillar solves a real problem. Without containerization, your code works on your machine but breaks on the server. Without CI/CD, every deployment is a manual, error-prone process. Without monitoring, your model silently degrades for months.
Let's start building.
2. Setting Up the Project
Good MLOps starts with good organization. Here's the structure we'll use:
credit-scoring-mlops/
│
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI application
│ ├── model_loader.py # Model loading logic
│ └── schemas.py # Input/output validation
│
├── model/
│ └── credit_model.pkl # Trained model
│
├── tests/
│ ├── __init__.py
│ ├── test_api.py # API tests
│ └── test_model.py # Model tests
│
├── notebooks/
│ └── data_drift_analysis.ipynb
│
├── monitoring/
│ └── logger.py # Logging setup
│
├── .github/
│ └── workflows/
│ └── ci-cd.yml # CI/CD pipeline
│
├── Dockerfile
├── requirements.txt
├── .gitignore
└── README.md
Why this structure? When someone new joins the project (or when future-you comes back in 6 months), they immediately understand where everything lives. app/ = API code. tests/ = tests. model/ = model files. Simple.
Step by step — initialize the repo
First, create the folder structure:
mkdir credit-scoring-mlops && cd credit-scoring-mlops
mkdir -p app model tests notebooks monitoring .github/workflows
Then, initialize Git:
git init
Now let's create the .gitignore. This file tells Git which files to NOT track. We don't want to accidentally push passwords, large data files, or Python cache files:
cat > .gitignore << 'EOF'
# Python cache files
__pycache__/
*.py[cod]
# Virtual environments
.venv/
venv/
# Data files (too large for Git)
*.csv
*.parquet
data/
# Environment secrets (NEVER commit these)
.env
*.secret
# IDE files
.vscode/
.idea/
# OS files
.DS_Store
Thumbs.db
# Log files
*.log
logs/
EOF
Finally, make the first commit:
git add .gitignore
git commit -m "Initial commit: project structure and .gitignore"
Why does this matter? Every commit is a snapshot. If something breaks later, you can always go back to a working version. Commit messages should describe what changed — this creates a readable project history.
3. Training a Simple Credit Scoring Model
We need a trained model to deploy. In a real scenario, this would come from an earlier modeling phase (tracked with a tool like MLflow). We'll build a quick one here so the tutorial is self-contained.
3.1 — Generate synthetic data
We'll create fake credit data. Each row represents a loan applicant:
import pandas as pd
import numpy as np
np.random.seed(42) # For reproducibility
n_samples = 5000
data = pd.DataFrame({
'age': np.random.randint(21, 70, n_samples),
'annual_income': np.random.lognormal(mean=10.5, sigma=0.8, size=n_samples).astype(int),
'debt_to_income_ratio': np.random.uniform(0, 1.5, n_samples).round(3),
'credit_history_length': np.random.randint(0, 30, n_samples),
'num_open_accounts': np.random.randint(1, 20, n_samples),
'num_late_payments': np.random.poisson(lam=1.5, size=n_samples),
'loan_amount': np.random.randint(1000, 50000, n_samples),
})
What's happening here? We're generating 5,000 fake applicants with 7 features each. np.random.seed(42) ensures you get the exact same data every time you run this — reproducibility is key in ML.
3.2 — Create the target variable
Now we need to decide who defaults (1) and who doesn't (0). We simulate this based on common-sense rules: more debt, more late payments → higher chance of default.
default_probability = (
0.15 * data['debt_to_income_ratio']
+ 0.1 * (data['num_late_payments'] / 10)
- 0.05 * (data['credit_history_length'] / 30)
+ 0.05 * (data['loan_amount'] / 50000)
- 0.05 * (data['annual_income'] / data['annual_income'].max())
)
# Keep probabilities in a reasonable range
default_probability = default_probability.clip(0.05, 0.95)
# Generate binary outcomes from these probabilities
data['default'] = np.random.binomial(1, default_probability)
print(f"Dataset shape: {data.shape}")
print(f"Default rate: {data['default'].mean():.2%}")
3.3 — Split into training and test sets
We need two sets: one to train on, one to evaluate with. stratify=y ensures both sets have the same proportion of defaults.
from sklearn.model_selection import train_test_split
feature_columns = [
'age', 'annual_income', 'debt_to_income_ratio',
'credit_history_length', 'num_open_accounts',
'num_late_payments', 'loan_amount'
]
X = data[feature_columns]
y = data['default']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
3.4 — Build a scikit-learn Pipeline
Here's a key decision: we use a Pipeline that bundles preprocessing (scaling) and the model together into a single object. Why? Because when we deploy, we only need to load one file that handles everything.
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
('scaler', StandardScaler()), # Step 1: Normalize features
('classifier', GradientBoostingClassifier(
n_estimators=100,
max_depth=4,
learning_rate=0.1,
random_state=42
)) # Step 2: Classify
])
What's a Pipeline? Think of it like an assembly line. Data goes in one end → gets scaled → gets classified → prediction comes out. The beauty is that pipeline.predict(new_data) automatically applies scaling first, then prediction. No extra steps needed.
3.5 — Train and evaluate
pipeline.fit(X_train, y_train)
That single line does all the work. Now let's see how good it is:
from sklearn.metrics import classification_report, roc_auc_score
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")
3.6 — Save the model and reference data
Two things to save:
- The model — this is what we'll deploy
- The reference data (training data) — we'll need this later to detect drift
import joblib
import os
os.makedirs('model', exist_ok=True)
os.makedirs('data', exist_ok=True)
# Save the trained pipeline
joblib.dump(pipeline, 'model/credit_model.pkl')
# Save reference data for drift analysis later
X_train.to_csv('data/reference_data.csv', index=False)
X_test.to_csv('data/test_data.csv', index=False)
print("Model saved to model/credit_model.pkl")
print("Reference data saved to data/")
Commit this progress:
git add train_model.py
git commit -m "feat: add model training script and initial model artifact"
3.7 — Full training script (copy-paste ready)
Here's everything from section 3 combined into a single runnable file:
Click to expand: train_model.py (complete)
# train_model.py — Complete model training script
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, roc_auc_score
import joblib
import os
# --- Generate synthetic data ---
np.random.seed(42)
n_samples = 5000
data = pd.DataFrame({
'age': np.random.randint(21, 70, n_samples),
'annual_income': np.random.lognormal(mean=10.5, sigma=0.8, size=n_samples).astype(int),
'debt_to_income_ratio': np.random.uniform(0, 1.5, n_samples).round(3),
'credit_history_length': np.random.randint(0, 30, n_samples),
'num_open_accounts': np.random.randint(1, 20, n_samples),
'num_late_payments': np.random.poisson(lam=1.5, size=n_samples),
'loan_amount': np.random.randint(1000, 50000, n_samples),
})
# --- Create target ---
default_probability = (
0.15 * data['debt_to_income_ratio']
+ 0.1 * (data['num_late_payments'] / 10)
- 0.05 * (data['credit_history_length'] / 30)
+ 0.05 * (data['loan_amount'] / 50000)
- 0.05 * (data['annual_income'] / data['annual_income'].max())
)
default_probability = default_probability.clip(0.05, 0.95)
data['default'] = np.random.binomial(1, default_probability)
print(f"Dataset shape: {data.shape}")
print(f"Default rate: {data['default'].mean():.2%}")
# --- Split ---
feature_columns = [
'age', 'annual_income', 'debt_to_income_ratio',
'credit_history_length', 'num_open_accounts',
'num_late_payments', 'loan_amount'
]
X = data[feature_columns]
y = data['default']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# --- Train ---
pipeline = Pipeline([
('scaler', StandardScaler()),
('classifier', GradientBoostingClassifier(
n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42
))
])
pipeline.fit(X_train, y_train)
# --- Evaluate ---
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")
# --- Save ---
os.makedirs('model', exist_ok=True)
os.makedirs('data', exist_ok=True)
joblib.dump(pipeline, 'model/credit_model.pkl')
X_train.to_csv('data/reference_data.csv', index=False)
X_test.to_csv('data/test_data.csv', index=False)
print("\nModel and reference data saved.")
4. Creating a Prediction API with FastAPI
Now we get to the real MLOps work. We have a trained model sitting in a .pkl file. How do we let other people use it?
4.1 — What is an API?
An API (Application Programming Interface) is like a waiter in a restaurant. You (the client) tell the waiter what you want, the waiter goes to the kitchen (the model), and brings back your food (the prediction).
Without an API, anyone who wants a prediction needs to install Python, install all dependencies, download the model, and write code to use it. That doesn't scale.

A REST API acts as an intermediary between clients and your model. (source: SmartBear)
FastAPI is a modern Python framework for building APIs. It's fast, auto-generates documentation, and uses Python type hints for automatic validation, which makes it a popular choice for serving ML models. (FastAPI docs)
4.2 — The critical rule: load your model ONCE
This is a mistake I see beginners make constantly. Watch:
# BAD — loads the model on EVERY request
@app.post("/predict")
async def predict(data):
model = joblib.load("model/credit_model.pkl") # SLOW! Every. Single. Time.
return model.predict(data)
If your model file is 50MB and you get 100 requests per second, you're loading 50MB from disk 100 times per second. The API will grind to a halt.
# GOOD — load once, reuse forever
model = None
def load_model():
global model
model = joblib.load("model/credit_model.pkl") # Loaded ONCE at startup
@app.post("/predict")
async def predict(data):
return model.predict(data) # Uses the already-loaded model
Let's build this properly.
4.3 — The model loader module
Create app/model_loader.py. This module has one job: load the model once and provide access to it.
# app/model_loader.py
import joblib
import os
First, we create a global variable to hold the model. It starts as None (nothing loaded yet):
_model = None
Now the function that loads the model from disk:
def load_model():
"""Load the model ONCE at startup."""
global _model
# Allow the path to be configured via environment variable
model_path = os.environ.get("MODEL_PATH", "model/credit_model.pkl")
if not os.path.exists(model_path):
raise FileNotFoundError(
f"Model not found at {model_path}. "
f"Run train_model.py first."
)
_model = joblib.load(model_path)
print(f"Model loaded from {model_path}")
return _model
Why os.environ.get()? This makes the path flexible. On your laptop, the model is at model/credit_model.pkl. Inside a Docker container, it might be somewhere else. Environment variables let you configure this without changing code.
And a function to retrieve the loaded model:
def get_model():
"""Get the model that was loaded at startup."""
if _model is None:
raise RuntimeError("Model not loaded! Call load_model() first.")
return _model
4.4 — Input validation with Pydantic
In a Jupyter notebook, you control the data. In production, you have no idea what's coming in. Someone might send:
- Text where a number is expected
- Negative ages
- Missing fields entirely
Pydantic solves this. You define a schema (a "shape") for your data, and FastAPI automatically rejects anything that doesn't match. Let's build it piece by piece.
Create app/schemas.py:
from pydantic import BaseModel, Field, field_validator
Now define what a valid credit application looks like:
class CreditApplication(BaseModel):
age: int = Field(
..., # The ... means "this field is required"
ge=18, # ge = "greater than or equal to"
le=120, # le = "less than or equal to"
description="Applicant's age in years"
)
What's Field(..., ge=18, le=120)? It says: "This field is required, must be an integer, at least 18, at most 120." If someone sends age: -5, FastAPI automatically returns an error without you writing a single if statement.
Let's add the remaining fields the same way:
annual_income: int = Field(
..., gt=0, # gt = "greater than" (strictly positive)
description="Annual income in dollars"
)
debt_to_income_ratio: float = Field(
..., ge=0.0, le=10.0,
description="Monthly debt / monthly income"
)
credit_history_length: int = Field(
..., ge=0, le=80,
description="Credit history in years"
)
num_open_accounts: int = Field(
..., ge=0, le=100,
description="Number of open credit accounts"
)
num_late_payments: int = Field(
..., ge=0,
description="Number of late payments"
)
loan_amount: int = Field(
..., gt=0,
description="Requested loan amount in dollars"
)
We can also add custom business logic validation. For example, you can't have 30 years of credit history if you're 25 years old:
@field_validator('credit_history_length')
@classmethod
def history_cannot_exceed_age(cls, v, info):
if 'age' in info.data and v > info.data['age'] - 18:
raise ValueError(
f"Credit history ({v}y) can't exceed age minus 18"
)
return v
Now define what the API returns:
class PredictionResponse(BaseModel):
prediction: int = Field(description="0 = No Default, 1 = Default")
probability_of_default: float = Field(description="Probability from 0.0 to 1.0")
risk_category: str = Field(description="Low, Medium, or High")
class HealthResponse(BaseModel):
status: str
model_loaded: bool
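You can watch the validation work without even starting the API by instantiating the schema directly. A small sketch (run it once app/schemas.py is assembled, see section 4.7; the field values are made up):
from pydantic import ValidationError
from app.schemas import CreditApplication

try:
    CreditApplication(
        age=-5,  # violates ge=18
        annual_income=50000, debt_to_income_ratio=0.3,
        credit_history_length=10, num_open_accounts=3,
        num_late_payments=1, loan_amount=10000,
    )
except ValidationError as e:
    print(e)  # pinpoints which field failed and why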
4.5 — The FastAPI application
Now we wire everything together. Create app/main.py:
Start with imports:
from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
import pandas as pd
import time
import logging
import json
from datetime import datetime, timezone
from app.schemas import CreditApplication, PredictionResponse, HealthResponse
from app.model_loader import load_model, get_model
Set up logging (we'll use this for monitoring later):
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_scoring_api")
Define what happens when the app starts up. This is where we load the model once:
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Runs at startup (before yield) and shutdown (after yield)."""
logger.info("Starting up — loading model...")
load_model()
logger.info("Model loaded. Ready to serve predictions.")
yield
logger.info("Shutting down.")
What's @asynccontextmanager? It's a decorator from Python's contextlib that FastAPI's lifespan hook expects: everything before yield runs once at startup, everything after runs at shutdown. The model gets loaded once, right at startup.
Create the app:
app = FastAPI(
title="Credit Scoring API",
description="Predict loan default probability",
version="1.0.0",
lifespan=lifespan,
)
Add a health check endpoint. Every production API needs one — it's how load balancers and monitoring tools know the service is alive:
@app.get("/health", response_model=HealthResponse)
async def health_check():
try:
model = get_model()
return HealthResponse(status="healthy", model_loaded=True)
except RuntimeError:
return HealthResponse(status="unhealthy", model_loaded=False)
Now the star of the show — the prediction endpoint:
@app.post("/predict", response_model=PredictionResponse)
async def predict(application: CreditApplication):
start_time = time.time()
try:
model = get_model()
Convert the validated input into a DataFrame (what scikit-learn expects):
input_data = pd.DataFrame([{
'age': application.age,
'annual_income': application.annual_income,
'debt_to_income_ratio': application.debt_to_income_ratio,
'credit_history_length': application.credit_history_length,
'num_open_accounts': application.num_open_accounts,
'num_late_payments': application.num_late_payments,
'loan_amount': application.loan_amount,
}])
Get the prediction and probability:
prediction = int(model.predict(input_data)[0])
probability = float(model.predict_proba(input_data)[0][1])
What's predict_proba()? It returns probabilities instead of just 0/1. For example, [0.82, 0.18] means 82% chance of no default, 18% chance of default. The [0][1] grabs the probability of default (class 1).
Map the probability to a human-readable risk level:
if probability < 0.3:
risk_category = "Low"
elif probability < 0.6:
risk_category = "Medium"
else:
risk_category = "High"
Calculate how long the prediction took, and log everything:
inference_time_ms = (time.time() - start_time) * 1000
log_entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"event": "prediction",
"inputs": application.model_dump(),
"outputs": {
"prediction": prediction,
"probability_of_default": round(probability, 4),
"risk_category": risk_category,
},
"inference_time_ms": round(inference_time_ms, 2),
}
logger.info(json.dumps(log_entry))
Why log all of this? These logs are gold. They'll be used later for drift detection (comparing production inputs to training data), performance monitoring (is the API getting slower?), and debugging (what happened when a wrong prediction was made?).
Return the response:
return PredictionResponse(
prediction=prediction,
probability_of_default=round(probability, 4),
risk_category=risk_category,
)
except Exception as e:
logger.error(json.dumps({
"timestamp": datetime.now(timezone.utc).isoformat(),
"event": "prediction_error",
"error": str(e),
"inputs": application.model_dump(),
}))
raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")
4.6 — Test it locally
pip install fastapi uvicorn scikit-learn joblib pandas pydantic
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Open http://localhost:8000/docs — FastAPI auto-generates interactive Swagger documentation:

FastAPI generates this Swagger UI automatically. You can test your API right from the browser.
Test with curl:
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"age": 35,
"annual_income": 55000,
"debt_to_income_ratio": 0.35,
"credit_history_length": 12,
"num_open_accounts": 5,
"num_late_payments": 2,
"loan_amount": 15000
}'
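If you'd rather call the API from Python (say, from a batch job or another service), the same request looks like this: a minimal sketch using the requests library, assuming the server above is running on port 8000:
import requests

payload = {
    "age": 35, "annual_income": 55000, "debt_to_income_ratio": 0.35,
    "credit_history_length": 12, "num_open_accounts": 5,
    "num_late_payments": 2, "loan_amount": 15000,
}
response = requests.post("http://localhost:8000/predict", json=payload, timeout=5)
response.raise_for_status()  # raise if the API returned an error status
print(response.json())       # {'prediction': ..., 'probability_of_default': ..., 'risk_category': ...}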
4.7 — Full API code (copy-paste ready)
Click to expand: app/model_loader.py (complete)
# app/model_loader.py
import joblib
import os
_model = None
def load_model():
global _model
model_path = os.environ.get("MODEL_PATH", "model/credit_model.pkl")
if not os.path.exists(model_path):
raise FileNotFoundError(f"Model not found at {model_path}")
_model = joblib.load(model_path)
print(f"Model loaded from {model_path}")
return _model
def get_model():
if _model is None:
raise RuntimeError("Model not loaded! Call load_model() first.")
return _model
Click to expand: app/schemas.py (complete)
# app/schemas.py
from pydantic import BaseModel, Field, field_validator
class CreditApplication(BaseModel):
age: int = Field(..., ge=18, le=120, description="Applicant's age")
annual_income: int = Field(..., gt=0, description="Annual income ($)")
debt_to_income_ratio: float = Field(..., ge=0.0, le=10.0, description="DTI ratio")
credit_history_length: int = Field(..., ge=0, le=80, description="Credit history (years)")
num_open_accounts: int = Field(..., ge=0, le=100, description="Open accounts")
num_late_payments: int = Field(..., ge=0, description="Late payments")
loan_amount: int = Field(..., gt=0, description="Loan amount ($)")
@field_validator('credit_history_length')
@classmethod
def history_cannot_exceed_age(cls, v, info):
if 'age' in info.data and v > info.data['age'] - 18:
raise ValueError(f"Credit history ({v}y) can't exceed age minus 18")
return v
class PredictionResponse(BaseModel):
prediction: int = Field(description="0=No Default, 1=Default")
probability_of_default: float = Field(description="Probability 0.0 to 1.0")
risk_category: str = Field(description="Low, Medium, or High")
class HealthResponse(BaseModel):
status: str
model_loaded: bool
Click to expand: app/main.py (complete)
# app/main.py
from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
import pandas as pd
import time
import logging
import json
from datetime import datetime, timezone
from app.schemas import CreditApplication, PredictionResponse, HealthResponse
from app.model_loader import load_model, get_model
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_scoring_api")
@asynccontextmanager
async def lifespan(app: FastAPI):
logger.info("Starting up — loading model...")
load_model()
logger.info("Model loaded. Ready.")
yield
logger.info("Shutting down.")
app = FastAPI(
title="Credit Scoring API",
description="Predict loan default probability",
version="1.0.0",
lifespan=lifespan,
)
@app.get("/health", response_model=HealthResponse)
async def health_check():
try:
model = get_model()
return HealthResponse(status="healthy", model_loaded=True)
except RuntimeError:
return HealthResponse(status="unhealthy", model_loaded=False)
@app.post("/predict", response_model=PredictionResponse)
async def predict(application: CreditApplication):
start_time = time.time()
try:
model = get_model()
input_data = pd.DataFrame([{
'age': application.age,
'annual_income': application.annual_income,
'debt_to_income_ratio': application.debt_to_income_ratio,
'credit_history_length': application.credit_history_length,
'num_open_accounts': application.num_open_accounts,
'num_late_payments': application.num_late_payments,
'loan_amount': application.loan_amount,
}])
prediction = int(model.predict(input_data)[0])
probability = float(model.predict_proba(input_data)[0][1])
if probability < 0.3:
risk_category = "Low"
elif probability < 0.6:
risk_category = "Medium"
else:
risk_category = "High"
inference_time_ms = (time.time() - start_time) * 1000
log_entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"event": "prediction",
"inputs": application.model_dump(),
"outputs": {
"prediction": prediction,
"probability_of_default": round(probability, 4),
"risk_category": risk_category,
},
"inference_time_ms": round(inference_time_ms, 2),
}
logger.info(json.dumps(log_entry))
return PredictionResponse(
prediction=prediction,
probability_of_default=round(probability, 4),
risk_category=risk_category,
)
except Exception as e:
logger.error(json.dumps({
"timestamp": datetime.now(timezone.utc).isoformat(),
"event": "prediction_error",
"error": str(e),
"inputs": application.model_dump(),
}))
raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")
git add app/
git commit -m "feat: implement FastAPI prediction API with validation and logging"
5. Writing Automated Tests
Why test?
Imagine you change one line of code that accidentally breaks input validation. Without tests, this bug goes to production. With tests in your CI/CD pipeline, the bug gets caught before it ever reaches users.
We'll write two kinds:
- Unit tests: test individual functions ("does the model return a valid prediction?")
- Integration tests: test the full flow ("does the API endpoint respond correctly?")
5.1 — Setting up the test client
FastAPI provides a TestClient that simulates HTTP requests without starting a real server:
# tests/test_api.py
import pytest
from fastapi.testclient import TestClient
from app.main import app
client = TestClient(app)
What's TestClient? It's a fake browser. You can send GET and POST requests to your API and check the responses, all without starting a server. Tests run in milliseconds.
5.2 — Test the health endpoint
The simplest test first:
def test_health_returns_200():
response = client.get("/health")
assert response.status_code == 200
def test_health_reports_model_loaded():
response = client.get("/health")
data = response.json()
assert data["status"] == "healthy"
assert data["model_loaded"] is True
What's assert? It means "this must be true, or the test fails." assert response.status_code == 200 says "the server must return HTTP 200 (OK)."
5.3 — Test valid predictions
Now let's test the actual prediction endpoint with good data:
def test_valid_prediction_returns_200():
payload = {
"age": 35, "annual_income": 55000,
"debt_to_income_ratio": 0.35, "credit_history_length": 12,
"num_open_accounts": 5, "num_late_payments": 2,
"loan_amount": 15000
}
response = client.post("/predict", json=payload)
assert response.status_code == 200
Check that the response contains all expected fields:
def test_response_has_required_fields():
payload = {
"age": 45, "annual_income": 80000,
"debt_to_income_ratio": 0.20, "credit_history_length": 20,
"num_open_accounts": 3, "num_late_payments": 0,
"loan_amount": 10000
}
response = client.post("/predict", json=payload)
data = response.json()
assert "prediction" in data
assert "probability_of_default" in data
assert "risk_category" in data
Check that values are in expected ranges:
def test_prediction_is_binary():
payload = {
"age": 30, "annual_income": 40000,
"debt_to_income_ratio": 0.50, "credit_history_length": 5,
"num_open_accounts": 8, "num_late_payments": 4,
"loan_amount": 20000
}
response = client.post("/predict", json=payload)
data = response.json()
assert data["prediction"] in [0, 1]
def test_probability_between_0_and_1():
payload = {
"age": 28, "annual_income": 35000,
"debt_to_income_ratio": 0.60, "credit_history_length": 3,
"num_open_accounts": 6, "num_late_payments": 5,
"loan_amount": 25000
}
response = client.post("/predict", json=payload)
data = response.json()
assert 0.0 <= data["probability_of_default"] <= 1.0
def test_risk_category_is_valid():
payload = {
"age": 50, "annual_income": 100000,
"debt_to_income_ratio": 0.10, "credit_history_length": 25,
"num_open_accounts": 2, "num_late_payments": 0,
"loan_amount": 5000
}
response = client.post("/predict", json=payload)
data = response.json()
assert data["risk_category"] in ["Low", "Medium", "High"]
5.4 — Test invalid inputs
This is equally important. Our API should reject bad data with a clear error (HTTP 422):
def test_negative_age_rejected():
"""Age of -5 should be rejected."""
payload = {
"age": -5, # INVALID
"annual_income": 50000, "debt_to_income_ratio": 0.30,
"credit_history_length": 10, "num_open_accounts": 3,
"num_late_payments": 1, "loan_amount": 10000
}
response = client.post("/predict", json=payload)
assert response.status_code == 422 # 422 = Unprocessable Entity
def test_zero_income_rejected():
"""Income must be strictly positive."""
payload = {
"age": 30, "annual_income": 0, # INVALID
"debt_to_income_ratio": 0.30, "credit_history_length": 10,
"num_open_accounts": 3, "num_late_payments": 1, "loan_amount": 10000
}
response = client.post("/predict", json=payload)
assert response.status_code == 422
def test_missing_field_rejected():
"""Omitting a required field should fail."""
payload = {
"age": 30,
# annual_income is MISSING
"debt_to_income_ratio": 0.30, "credit_history_length": 10,
"num_open_accounts": 3, "num_late_payments": 1, "loan_amount": 10000
}
response = client.post("/predict", json=payload)
assert response.status_code == 422
def test_wrong_type_rejected():
"""Sending text where a number is expected should fail."""
payload = {
"age": "thirty", # WRONG TYPE
"annual_income": 50000, "debt_to_income_ratio": 0.30,
"credit_history_length": 10, "num_open_accounts": 3,
"num_late_payments": 1, "loan_amount": 10000
}
response = client.post("/predict", json=payload)
assert response.status_code == 422
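These invalid-input tests repeat almost the same payload. One idiomatic way to trim the duplication is pytest.parametrize. A sketch (the make_payload helper is hypothetical, not part of the repo; it would live in tests/test_api.py next to the client defined in 5.1):
def make_payload(**overrides):
    """Hypothetical helper: a known-valid payload with selective overrides."""
    payload = {
        "age": 30, "annual_income": 50000, "debt_to_income_ratio": 0.30,
        "credit_history_length": 10, "num_open_accounts": 3,
        "num_late_payments": 1, "loan_amount": 10000,
    }
    payload.update(overrides)
    return payload

@pytest.mark.parametrize("overrides", [
    {"age": -5},            # below minimum age
    {"annual_income": 0},   # must be strictly positive
    {"age": "thirty"},      # wrong type
])
def test_invalid_payloads_rejected(overrides):
    response = client.post("/predict", json=make_payload(**overrides))
    assert response.status_code == 422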
5.5 — Test the model directly
We also test the model itself, separately from the API:
# tests/test_model.py
import joblib
import pandas as pd
def test_model_loads():
model = joblib.load("model/credit_model.pkl")
assert model is not None
def test_model_has_predict():
model = joblib.load("model/credit_model.pkl")
assert hasattr(model, 'predict')
assert hasattr(model, 'predict_proba')
def test_model_returns_one_prediction():
model = joblib.load("model/credit_model.pkl")
test_input = pd.DataFrame([{
'age': 35, 'annual_income': 55000,
'debt_to_income_ratio': 0.35, 'credit_history_length': 12,
'num_open_accounts': 5, 'num_late_payments': 2,
'loan_amount': 15000
}])
prediction = model.predict(test_input)
assert len(prediction) == 1
5.6 — Run the tests
pip install pytest httpx
pytest tests/ -v
You should see all tests passing with green check marks.
git add tests/
git commit -m "feat: add unit and integration tests for API and model"
6. Containerizing with Docker
What is Docker?
Your API works on your laptop. But will it work on a server? Maybe the server has a different Python version, or a missing library.
Docker packages your application with its entire environment — OS, Python, libraries, everything — into a self-contained unit called a container. Think of it like shipping your laptop inside the package instead of just the code.

Docker containers package your app with everything it needs. (source: docker.com)
6.1 — The requirements file
First, list all Python dependencies with version constraints. For fully reproducible builds you'd pin exact versions with == (or use a lock file); without any constraint, a library update could silently break things months later:
# requirements.txt
fastapi>=0.104.0
uvicorn>=0.24.0
scikit-learn>=1.3.0
joblib>=1.3.0
pandas>=2.0.0
pydantic>=2.0.0
numpy>=1.24.0
6.2 — The Dockerfile, line by line
A Dockerfile is a recipe. Each line is one instruction:
FROM python:3.11-slim
What this does: Start from a minimal Python 3.11 image. The -slim variant is ~150MB instead of ~900MB. Less bloat = faster builds.
WORKDIR /app
What this does: All subsequent commands run inside /app in the container. Like doing cd /app.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
What this does: Copy the requirements file and install dependencies. Why copy this file separately? Docker caches each step. If requirements.txt hasn't changed, Docker skips the slow pip install on rebuilds. This can save minutes.
COPY app/ ./app/
COPY model/ ./model/
What this does: Copy our application code and model into the container.
EXPOSE 8000
What this does: Documents which port the container uses. It doesn't actually open the port — that's done at runtime.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
What this does: The command that runs when the container starts. --host 0.0.0.0 means "listen on all network interfaces" — required inside a container (using localhost won't work from outside).
6.3 — Build and run
# Build the image (give it a name with -t)
docker build -t credit-scoring-api .
# Run the container
# -p 8000:8000 maps your machine's port 8000 to the container's port 8000
docker run -p 8000:8000 credit-scoring-api
# Test it
curl http://localhost:8000/health
6.4 — Full Dockerfile (copy-paste ready)
Click to expand: Dockerfile (complete)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
COPY model/ ./model/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
git add Dockerfile requirements.txt
git commit -m "feat: add Dockerfile for containerized deployment"
7. Building a CI/CD Pipeline with GitHub Actions
What is CI/CD?
- Continuous Integration (CI): Every time you push code, tests run automatically
- Continuous Deployment (CD): If tests pass, the code gets deployed automatically
Without CI/CD: you manually run tests (if you remember), manually build Docker, manually deploy. With CI/CD: push code → everything happens automatically. (GitHub Actions docs)
7.1 — The pipeline structure
Our pipeline has 3 stages that run in sequence:
PUSH to main
│
▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ TEST │ ──▶ │ BUILD │ ──▶ │ DEPLOY │
│ pytest │ │ docker │ │ push to │
│ │ │ build │ │ registry │
└──────────┘ └──────────┘ └──────────┘
│ │ │
If FAIL: If FAIL: If FAIL:
STOP HERE STOP HERE STOP HERE
Key insight: If tests fail, we don't waste time building. If building fails, we don't try to deploy. Each stage only runs if the previous one succeeded.
7.2 — The YAML file, block by block
Create .github/workflows/ci-cd.yml:
First, define when the pipeline runs:
name: CI/CD Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
This means: "Run on every push to main and on every pull request targeting main."
Stage 1 — TEST:
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest httpx
- name: Generate model
run: python train_model.py
- name: Run tests
run: pytest tests/ -v --tb=short
What's happening? GitHub spins up a fresh Ubuntu machine, installs Python, installs our dependencies, trains the model (in a real project you'd download it from a model registry), and runs pytest.
Stage 2 — BUILD (only runs if Stage 1 passes):
build:
runs-on: ubuntu-latest
needs: test # <-- This is the key: "needs" means "wait for test"
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Generate model
run: |
pip install scikit-learn pandas joblib numpy
python train_model.py
- name: Build Docker image
run: docker build -t credit-scoring-api:${{ github.sha }} .
- name: Smoke test the container
run: |
docker run -d -p 8000:8000 --name test-api credit-scoring-api:${{ github.sha }}
sleep 10
curl --fail http://localhost:8000/health || exit 1
docker stop test-api
What's a smoke test? We start the container and hit the health endpoint. If it doesn't respond, something is broken. curl --fail returns an error if the HTTP response indicates failure.
Stage 3 — DEPLOY (only from main branch, only if build passed):
deploy:
runs-on: ubuntu-latest
needs: build
if: github.ref == 'refs/heads/main' # Only deploy from main
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Generate model
run: |
pip install scikit-learn pandas joblib numpy
python train_model.py
- name: Log in to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Build and push
run: |
docker build -t ${{ secrets.DOCKER_USERNAME }}/credit-scoring-api:latest .
docker push ${{ secrets.DOCKER_USERNAME }}/credit-scoring-api:latest
What are secrets? Credentials stored securely in GitHub Settings → Secrets. They're never visible in logs. Never hardcode passwords in your code or YAML files.
git add .github/
git commit -m "feat: add CI/CD pipeline with GitHub Actions"
8. Logging Production Data
Why log everything?
Once your model is live, you're flying blind unless you collect data. Logging serves:
- Debugging: When something breaks at 2 AM, logs tell you what happened
- Drift detection: Compare production inputs against training data
- Performance: Track how fast (or slow) predictions are
- Auditing: In finance, you need records of every decision
8.1 — What to store
| What | Why | Example |
|---|---|---|
| Input features | Drift detection | age: 35, income: 55000, ... |
| Prediction | Performance monitoring | prediction: 0 |
| Probability | Score distribution | probability: 0.18 |
| Inference time | Latency monitoring | 12.3 ms |
| Timestamp | Time-based analysis | 2024-03-15T14:30:00Z |
| Errors | Debugging | ValueError: ... |
8.2 — Structured logging
We already added logging in our main.py (section 4.5). The key is using JSON format — it's machine-parseable, so monitoring tools (ELK Stack, Datadog, CloudWatch) can automatically index every field.
Here's a dedicated logger with file rotation (prevents logs from filling up the disk):
# monitoring/logger.py
import logging
import os
from logging.handlers import RotatingFileHandler
def setup_production_logger(name="credit_scoring_api", log_dir="logs"):
os.makedirs(log_dir, exist_ok=True)
logger = logging.getLogger(name)
logger.setLevel(logging.INFO)
if logger.handlers: # Avoid duplicates
return logger
The file handler rotates logs when they reach 10 MB, keeping 5 old files:
file_handler = RotatingFileHandler(
filename=os.path.join(log_dir, "predictions.log"),
maxBytes=10_000_000, # 10 MB per file
backupCount=5, # Keep 5 rotated files
)
console_handler = logging.StreamHandler()
# Both handlers use the same format
formatter = logging.Formatter('%(asctime)s - %(name)s - %(message)s')
file_handler.setFormatter(formatter)
console_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.addHandler(console_handler)
return logger
In production, these log files would be shipped to a centralized logging platform (Elasticsearch, Datadog, CloudWatch). For our tutorial, local files are enough — the important thing is that we're capturing the data.
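When it's time for the drift analysis in the next section, you'll want those logged inputs back as a DataFrame. Here's a minimal sketch that parses the JSON payload out of each prediction line, assuming the '%(asctime)s - %(name)s - %(message)s' format set up above:
import json
import pandas as pd

def load_logged_inputs(log_path="logs/predictions.log"):
    """Rebuild a DataFrame of production inputs from the JSON prediction logs."""
    rows = []
    with open(log_path) as f:
        for line in f:
            payload = line.split(" - ", 2)[-1].strip()  # the JSON payload comes after the last separator
            try:
                entry = json.loads(payload)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines (startup messages, stack traces, ...)
            if entry.get("event") == "prediction":
                rows.append(entry["inputs"])
    return pd.DataFrame(rows)

# current_data = load_logged_inputs()  # ready to feed into the drift report in section 9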
git add monitoring/
git commit -m "feat: add production logging with file rotation"
9. Data Drift Detection with Evidently
What is data drift?
Your model learned patterns from training data. But the real world changes. People's financial behavior shifts due to economic events, policy changes, or seasonal patterns.
Data drift = the data your model receives in production starts looking significantly different from what it was trained on. This is your early warning system — it tells you when the model might need retraining.
When production data drifts away from training data, model performance can degrade. (source: Evidently AI)
Types of drift
| Type | What changes | Example |
|---|---|---|
| Data drift | Input distributions | Average income of applicants increases |
| Concept drift | Feature-target relationship | Same income now means higher default risk |
| Prediction drift | Output distribution | Model starts predicting more defaults |
9.1 — Load the reference data
The reference is our training data — what the model was built on:
import pandas as pd
import numpy as np
reference_data = pd.read_csv('data/reference_data.csv')
print(f"Reference data: {reference_data.shape}")
9.2 — Simulate production data with drift
In real life, this would come from your API logs. Here we simulate production data that has intentional drift — so we can see the detection in action:
np.random.seed(123)
n_production = 1000
current_data = pd.DataFrame({
# DRIFT: younger applicants (new customer segment)
'age': np.random.randint(18, 55, n_production),
# DRIFT: higher incomes (economic growth)
'annual_income': np.random.lognormal(
mean=10.8, sigma=0.7, size=n_production # Was 10.5
).astype(int),
# DRIFT: higher debt ratios (inflation effect)
'debt_to_income_ratio': np.random.uniform(
0.1, 1.8, n_production # Was 0 to 1.5
).round(3),
# STABLE: no significant change
'credit_history_length': np.random.randint(0, 30, n_production),
'num_open_accounts': np.random.randint(1, 20, n_production),
# SLIGHT DRIFT: more late payments
'num_late_payments': np.random.poisson(lam=2.2, size=n_production),
# DRIFT: higher loan amounts
'loan_amount': np.random.randint(5000, 65000, n_production),
})
print(f"Production data: {current_data.shape}")
9.3 — Visualize the distributions
Before running statistical tests, always look at the data:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3, 3, figsize=(16, 12))
fig.suptitle('Reference (blue) vs Production (orange)', fontsize=14)
for idx, feature in enumerate(reference_data.columns):
row, col = idx // 3, idx % 3
ax = axes[row][col]
ax.hist(reference_data[feature], bins=30, alpha=0.5,
label='Reference', color='steelblue', density=True)
ax.hist(current_data[feature], bins=30, alpha=0.5,
label='Production', color='darkorange', density=True)
ax.set_title(feature)
ax.legend(fontsize=8)
# Hide the two unused subplots (7 features on a 3x3 grid)
axes[2][1].set_visible(False)
axes[2][2].set_visible(False)
plt.tight_layout()
plt.show()
You should clearly see the distributions shifting for several features.
9.4 — Run the Evidently drift report
Evidently AI (docs) compares reference vs. current data using statistical tests and tells you which features have drifted:
from evidently.report import Report
from evidently.metrics import DatasetDriftMetric, DataDriftTable
Create and run the report:
drift_report = Report(metrics=[
DatasetDriftMetric(), # Overall: is there drift?
DataDriftTable(), # Per-feature: which ones drifted?
])
drift_report.run(
reference_data=reference_data,
current_data=current_data,
)
Save the interactive HTML report:
import os
os.makedirs('monitoring', exist_ok=True)
drift_report.save_html("monitoring/drift_report.html")
print("Report saved — open monitoring/drift_report.html in your browser")
9.5 — Extract results programmatically
Pretty charts are nice, but for automated monitoring you need to extract results as data:
report_dict = drift_report.as_dict()
# Overall drift result
dataset_drift = report_dict['metrics'][0]['result']
print(f"Drift detected: {'YES' if dataset_drift['dataset_drift'] else 'NO'}")
print(f"Drifted features: {dataset_drift['number_of_drifted_columns']}"
f" / {dataset_drift['number_of_columns']}")
Per-feature breakdown:
drift_table = report_dict['metrics'][1]['result']
print(f"\n{'Feature':<28} {'Drifted?':<10} {'Score':<12} {'Test'}")
print("-" * 70)
for col, info in drift_table['drift_by_columns'].items():
status = "YES" if info['drift_detected'] else "no"
score = info['drift_score']
test = info['stattest_name']
flag = " << ALERT" if info['drift_detected'] else ""
print(f"{col:<28} {status:<10} {score:<12.6f} {test}{flag}")
9.6 — Interpret the results
Running a tool is easy. The real skill is interpreting what the results mean:
if dataset_drift['dataset_drift']:
print("""
ACTION REQUIRED:
1. Check model accuracy on recent production data
2. Investigate root cause (market shift? data bug?)
3. Retrain if performance has degraded
4. Set up automated alerts for future drift
""")
else:
print("No significant drift. Continue monitoring weekly.")
And a statistical comparison table:
comparison = pd.DataFrame({
'Feature': reference_data.columns,
'Training Mean': reference_data.mean().round(2).values,
'Production Mean': current_data.mean().round(2).values,
})
comparison['Shift %'] = (
(comparison['Production Mean'] - comparison['Training Mean'])
/ comparison['Training Mean'] * 100
).round(1)
print("\n", comparison.to_string(index=False))
git add notebooks/ monitoring/
git commit -m "feat: add data drift analysis with Evidently AI"
10. Performance Optimization
The Fiverr client wanted a fast API. Let's measure and improve.
10.1 — Profile with cProfile
cProfile is Python's built-in profiler. It tells you exactly where time is spent:
import cProfile
import pstats
import io
import time
from statistics import mean, stdev
import joblib
import numpy as np
import pandas as pd
model = joblib.load("model/credit_model.pkl")
test_input = pd.DataFrame([{
'age': 35, 'annual_income': 55000,
'debt_to_income_ratio': 0.35, 'credit_history_length': 12,
'num_open_accounts': 5, 'num_late_payments': 2,
'loan_amount': 15000
}])
Profile 100 predictions to get meaningful data:
profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
model.predict_proba(test_input)
profiler.disable()
# Print the top 10 slowest functions
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(10)
print(stream.getvalue())
10.2 — Establish a baseline
Run 1,000 predictions and measure the distribution of times:
n_iterations = 1000
times_sklearn = []
for _ in range(n_iterations):
start = time.perf_counter()
model.predict_proba(test_input)
end = time.perf_counter()
times_sklearn.append((end - start) * 1000) # Convert to ms
print(f"Baseline (scikit-learn) over {n_iterations} iterations:")
print(f" Mean: {mean(times_sklearn):.3f} ms")
print(f" Std: {stdev(times_sklearn):.3f} ms")
print(f" p95: {np.percentile(times_sklearn, 95):.3f} ms")
10.3 — Optimize with ONNX Runtime
ONNX (Open Neural Network Exchange) is a standard format for ML models. ONNX Runtime is a highly optimized engine that can run models faster than native scikit-learn. (ONNX Runtime docs)
First, convert our sklearn pipeline to ONNX format:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort
import onnx
# Define input shape: [any batch size, 7 features]
initial_type = [('float_input', FloatTensorType([None, 7]))]
# Convert
onnx_model = convert_sklearn(model, initial_types=initial_type, target_opset=12)
# Save
onnx.save_model(onnx_model, "model/credit_model.onnx")
print("ONNX model saved!")
Now benchmark it:
# Create ONNX session
session = ort.InferenceSession("model/credit_model.onnx")
input_name = session.get_inputs()[0].name
# ONNX expects numpy arrays, not DataFrames
test_np = test_input.values.astype(np.float32)
times_onnx = []
for _ in range(n_iterations):
start = time.perf_counter()
session.run(None, {input_name: test_np})
end = time.perf_counter()
times_onnx.append((end - start) * 1000)
print(f"ONNX Runtime over {n_iterations} iterations:")
print(f" Mean: {mean(times_onnx):.3f} ms")
print(f" Std: {stdev(times_onnx):.3f} ms")
print(f" p95: {np.percentile(times_onnx, 95):.3f} ms")
10.4 — Compare and verify
speedup = mean(times_sklearn) / mean(times_onnx)
improvement = (1 - mean(times_onnx) / mean(times_sklearn)) * 100
print(f"\nComparison:")
print(f" scikit-learn: {mean(times_sklearn):.3f} ms")
print(f" ONNX Runtime: {mean(times_onnx):.3f} ms")
print(f" Speedup: {speedup:.2f}x")
print(f" Improvement: {improvement:.1f}%")
Critical step: verify the optimization doesn't change predictions. Speed is worthless if accuracy drops:
sklearn_proba = model.predict_proba(test_input)[0]
onnx_proba = session.run(None, {input_name: test_np})[1][0]  # second output holds the class probabilities
print(f"\nsklearn proba: {sklearn_proba}")
print(f"ONNX proba:    {onnx_proba}")
print("If the probabilities agree to a few decimal places, it's safe to deploy the optimized version.")
10.5 — Visualize the improvement
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.hist(times_sklearn, bins=50, alpha=0.6, label='scikit-learn', color='steelblue')
ax.hist(times_onnx, bins=50, alpha=0.6, label='ONNX Runtime', color='darkorange')
ax.set_xlabel('Inference Time (ms)')
ax.set_ylabel('Frequency')
ax.set_title('Inference Time: scikit-learn vs ONNX Runtime')
ax.legend()
plt.tight_layout()
plt.show()
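If the numbers justify switching, the API's model loader could serve the ONNX session instead of the pickled pipeline. A minimal sketch of what that might look like; it's not part of the repo above, and the output handling assumes the default skl2onnx export (label first, class-probability map second):
import numpy as np
import onnxruntime as ort

_session = ort.InferenceSession("model/credit_model.onnx")
_input_name = _session.get_inputs()[0].name

def predict_proba_onnx(features):
    """Return the probability of default (class 1) for one applicant.

    `features` must follow the training column order:
    age, annual_income, debt_to_income_ratio, credit_history_length,
    num_open_accounts, num_late_payments, loan_amount
    """
    x = np.asarray([features], dtype=np.float32)
    probabilities = _session.run(None, {_input_name: x})[1][0]
    return float(probabilities[1])

# Same applicant as the curl example in section 4.6
print(predict_proba_onnx([35, 55000, 0.35, 12, 5, 2, 15000]))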
git add optimization/
git commit -m "feat: add performance profiling and ONNX optimization"
11. The Final Architecture
Here's what we built:
┌──────────────────────┐
│ Developer │
│ (pushes to Git) │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ GitHub + CI/CD │
│ Test → Build → Push │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Docker Container │
│ ┌────────────────┐ │
│ │ FastAPI API │ │
│ │ + ML Model │ │
│ │ + Logging │ │
│ └────────────────┘ │
└──────────┬───────────┘
│
┌──────────┴───────────┐
▼ ▼
┌─────────────┐ ┌───────────────┐
│ Predictions │ │ Log Storage │
│ to clients │ │ (for drift) │
└─────────────┘ └───────┬───────┘
▼
┌───────────────┐
│ Drift Analysis │
│ + Performance │
│ Monitoring │
└───────────────┘
Every component solves a real problem:
| Component | Problem It Solves |
|---|---|
| Git + GitHub | "What changed and when?" |
| FastAPI + Pydantic | "How do others use the model safely?" |
| Pytest | "Will this change break something?" |
| Docker | "It works on my machine" → works everywhere |
| GitHub Actions | "Did someone forget to run tests?" |
| JSON Logging | "What's happening in production?" |
| Evidently AI | "Is the model still relevant?" |
| ONNX Runtime | "Can we make it faster?" |
12. Key Takeaways
After spending a weekend on this (and impressing the Fiverr client), here's what stuck:
Architecture: Start simple, iterate. Get a basic API working before adding Docker and CI/CD. Each layer builds on the previous one.
Performance: Load the model once at startup, never per request. This single choice can make your API 100x faster.
Testing: Test for both valid AND invalid inputs. Your API will receive data you never imagined.
Deployment: Docker eliminates "works on my machine." CI/CD makes it impossible to deploy broken code accidentally.
Monitoring: A deployed model without monitoring is a ticking time bomb. Data drift is real and it's silent.
Optimization: Always profile before optimizing. Measure, don't guess. And always verify that optimization doesn't change predictions.
Resources
- FastAPI Documentation
- Docker Get Started
- GitHub Actions
- Evidently AI
- ONNX Runtime
- Pydantic V2
- Pytest
- scikit-learn Pipelines
If you found this useful, feel free to clap or share. I'm always happy to chat about MLOps and the messy reality of putting ML models into production.
The complete code is available on GitHub.