DEV Community

Wu DAO


Weekend Project: I Built a Full MLOps Pipeline for a Credit Scoring Model (And You Can Too)

A hands-on, beginner-friendly guide to deploying, monitoring, and optimizing a Machine Learning model in production — from API creation to data drift detection.


Last Saturday morning, I was scrolling through freelance gig postings on Fiverr when I stumbled upon something that caught my attention. A small fintech startup was looking for someone to "take our trained credit scoring model and make it production-ready — API, Docker, CI/CD, the whole shebang." The budget was decent, the deadline was two weeks, and I thought: "How hard can it be?"

Spoiler: it was more involved than I expected. But by Sunday evening, I had a working end-to-end MLOps pipeline running, and I learned an incredible amount in the process. This article is the tutorial I wish I had before starting.

Whether you're a data science student, a junior ML engineer, or someone curious about what happens after a model is trained, this guide will walk you through every step — from serving predictions through an API to catching silent model failures in production.

What we'll build together:

  • A prediction API using FastAPI that serves a credit scoring model
  • Automated tests to make sure our API doesn't break
  • A Docker container to package everything for deployment
  • A CI/CD pipeline with GitHub Actions to automate testing and deployment
  • A data drift analysis to monitor model health over time
  • Performance optimizations to speed up inference

Let's dive in.


Table of Contents

  1. The Big Picture: What Is MLOps?
  2. Setting Up the Project
  3. Training a Simple Credit Scoring Model
  4. Creating a Prediction API with FastAPI
  5. Writing Automated Tests
  6. Containerizing with Docker
  7. Building a CI/CD Pipeline
  8. Logging Production Data
  9. Data Drift Detection
  10. Performance Optimization
  11. The Final Architecture
  12. Key Takeaways

1. The Big Picture: What Is MLOps?

Before we write a single line of code, let's understand the landscape.

The "Last Mile" Problem

Here's a reality check that most online courses don't tell you: training a model is only about 20% of the work in a real ML project. The remaining 80% is everything that happens around it — data pipelines, deployment, monitoring, maintenance.

This is what MLOps (Machine Learning Operations) is about. Think of it as DevOps, but specifically designed for the unique challenges of machine learning systems.

MLOps Lifecycle
The MLOps lifecycle — training is just one piece of a much larger puzzle. (source: ml-ops.org)

What We'll Cover

Pillar           | What It Means                            | Tool We'll Use
---------------- | ---------------------------------------- | ---------------
Model Serving    | Making predictions available via an API  | FastAPI
Containerization | Packaging code + dependencies together   | Docker
CI/CD            | Automating tests and deployment          | GitHub Actions
Monitoring       | Watching model behavior in production    | Evidently AI
Optimization     | Making inference faster                  | cProfile, ONNX
Version Control  | Tracking every change                    | Git + GitHub

Each pillar solves a real problem. Without containerization, your code works on your machine but breaks on the server. Without CI/CD, every deployment is a manual, error-prone process. Without monitoring, your model silently degrades for months.

Let's start building.


2. Setting Up the Project

Good MLOps starts with good organization. Here's the structure we'll use:

credit-scoring-mlops/
│
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application
│   ├── model_loader.py      # Model loading logic
│   └── schemas.py           # Input/output validation
│
├── model/
│   └── credit_model.pkl     # Trained model
│
├── tests/
│   ├── __init__.py
│   ├── test_api.py          # API tests
│   └── test_model.py        # Model tests
│
├── notebooks/
│   └── data_drift_analysis.ipynb
│
├── monitoring/
│   └── logger.py            # Logging setup
│
├── .github/
│   └── workflows/
│       └── ci-cd.yml        # CI/CD pipeline
│
├── Dockerfile
├── requirements.txt
├── .gitignore
└── README.md

Why this structure? When someone new joins the project (or when future-you comes back in 6 months), they immediately understand where everything lives. app/ = API code. tests/ = tests. model/ = model files. Simple.

Step by step — initialize the repo

First, create the folder structure:

mkdir credit-scoring-mlops && cd credit-scoring-mlops
mkdir -p app model tests notebooks monitoring .github/workflows

Then, initialize Git:

git init

Now let's create the .gitignore. This file tells Git which files to NOT track. We don't want to accidentally push passwords, large data files, or Python cache files:

cat > .gitignore << 'EOF'
# Python cache files
__pycache__/
*.py[cod]

# Virtual environments
.venv/
venv/

# Data files (too large for Git)
*.csv
*.parquet
data/

# Environment secrets (NEVER commit these)
.env
*.secret

# IDE files
.vscode/
.idea/

# OS files
.DS_Store
Thumbs.db

# Log files
*.log
logs/
EOF

Finally, make the first commit:

git add .gitignore
git commit -m "Initial commit: project structure and .gitignore"

Why does this matter? Every commit is a snapshot. If something breaks later, you can always go back to a working version. Commit messages should describe what changed — this creates a readable project history.


3. Training a Simple Credit Scoring Model

We need a trained model to deploy. In a real scenario, this would come from an earlier modeling phase (tracked with a tool like MLflow). We'll build a quick one here so the tutorial is self-contained.

3.1 — Generate synthetic data

We'll create fake credit data. Each row represents a loan applicant:

import pandas as pd
import numpy as np

np.random.seed(42)  # For reproducibility
n_samples = 5000

data = pd.DataFrame({
    'age': np.random.randint(21, 70, n_samples),
    'annual_income': np.random.lognormal(mean=10.5, sigma=0.8, size=n_samples).astype(int),
    'debt_to_income_ratio': np.random.uniform(0, 1.5, n_samples).round(3),
    'credit_history_length': np.random.randint(0, 30, n_samples),
    'num_open_accounts': np.random.randint(1, 20, n_samples),
    'num_late_payments': np.random.poisson(lam=1.5, size=n_samples),
    'loan_amount': np.random.randint(1000, 50000, n_samples),
})

What's happening here? We're generating 5,000 fake applicants with 7 features each. np.random.seed(42) ensures you get the exact same data every time you run this — reproducibility is key in ML.
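As a quick standalone check of that claim (NumPy only; the numbers are illustrative), the same seed reproduces exactly the same "random" draws:

```python
import numpy as np

# Seeding makes the generator deterministic: two runs, identical draws.
np.random.seed(42)
first = np.random.randint(21, 70, 5)

np.random.seed(42)  # reset to the same seed
second = np.random.randint(21, 70, 5)

print((first == second).all())  # True — the sequence is fully reproducible
```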

3.2 — Create the target variable

Now we need to decide who defaults (1) and who doesn't (0). We simulate this based on common-sense rules: more debt, more late payments → higher chance of default.

default_probability = (
    0.15 * data['debt_to_income_ratio']
    + 0.1 * (data['num_late_payments'] / 10)
    - 0.05 * (data['credit_history_length'] / 30)
    + 0.05 * (data['loan_amount'] / 50000)
    - 0.05 * (data['annual_income'] / data['annual_income'].max())
)

# Keep probabilities in a reasonable range
default_probability = default_probability.clip(0.05, 0.95)

# Generate binary outcomes from these probabilities
data['default'] = np.random.binomial(1, default_probability)
print(f"Dataset shape: {data.shape}")
print(f"Default rate: {data['default'].mean():.2%}")

3.3 — Split into training and test sets

We need two sets: one to train on, one to evaluate with. stratify=y ensures both sets have the same proportion of defaults.

from sklearn.model_selection import train_test_split

feature_columns = [
    'age', 'annual_income', 'debt_to_income_ratio',
    'credit_history_length', 'num_open_accounts',
    'num_late_payments', 'loan_amount'
]

X = data[feature_columns]
y = data['default']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
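Here's a tiny sketch (toy labels, invented numbers) of what stratify guarantees — the class proportion is identical in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 samples, exactly 10% positives
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# Stratification preserves the 10% positive rate on both sides
print(y_tr.mean(), y_te.mean())  # 0.1 0.1
```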

3.4 — Build a scikit-learn Pipeline

Here's a key decision: we use a Pipeline that bundles preprocessing (scaling) and the model together into a single object. Why? Because when we deploy, we only need to load one file that handles everything.


from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('scaler', StandardScaler()),           # Step 1: Normalize features
    ('classifier', GradientBoostingClassifier(
        n_estimators=100,
        max_depth=4,
        learning_rate=0.1,
        random_state=42
    ))                                       # Step 2: Classify
])

What's a Pipeline? Think of it like an assembly line. Data goes in one end → gets scaled → gets classified → prediction comes out. The beauty is that pipeline.predict(new_data) automatically applies scaling first, then prediction. No extra steps needed.
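Here's a minimal sketch of that equivalence, using a toy dataset and LogisticRegression (not our actual model) so it runs instantly: calling the fitted pipeline is the same as applying the scaler and then the classifier by hand.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Manually: scale first, then classify — same result as pipe.predict(X)
manual = pipe.named_steps['clf'].predict(
    pipe.named_steps['scaler'].transform(X)
)
print((pipe.predict(X) == manual).all())  # True
```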

3.5 — Train and evaluate

pipeline.fit(X_train, y_train)

That single line does all the work. Now let's see how good it is:

from sklearn.metrics import classification_report, roc_auc_score

y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]

print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")

3.6 — Save the model and reference data

Two things to save:

  1. The model — this is what we'll deploy
  2. The reference data (training data) — we'll need this later to detect drift

import joblib
import os

os.makedirs('model', exist_ok=True)
os.makedirs('data', exist_ok=True)

# Save the trained pipeline
joblib.dump(pipeline, 'model/credit_model.pkl')

# Save reference data for drift analysis later
X_train.to_csv('data/reference_data.csv', index=False)
X_test.to_csv('data/test_data.csv', index=False)

print("Model saved to model/credit_model.pkl")
print("Reference data saved to data/")
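A sanity check worth knowing (toy model, temporary path — not our credit pipeline): a joblib dump/load round-trip restores a fitted model that predicts identically.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit a trivially small model
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

# Serialize to disk and load it back
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(clf, path)
restored = joblib.load(path)

# The restored model behaves exactly like the original
print((restored.predict(X) == clf.predict(X)).all())  # True
```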

Commit this progress:

git add train_model.py model/credit_model.pkl
git commit -m "feat: add model training script and initial model artifact"

3.7 — Full training script (copy-paste ready)

Here's everything from section 3 combined into a single runnable file:

Click to expand: train_model.py (complete)

# train_model.py — Complete model training script

import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, roc_auc_score
import joblib
import os

# --- Generate synthetic data ---
np.random.seed(42)
n_samples = 5000

data = pd.DataFrame({
    'age': np.random.randint(21, 70, n_samples),
    'annual_income': np.random.lognormal(mean=10.5, sigma=0.8, size=n_samples).astype(int),
    'debt_to_income_ratio': np.random.uniform(0, 1.5, n_samples).round(3),
    'credit_history_length': np.random.randint(0, 30, n_samples),
    'num_open_accounts': np.random.randint(1, 20, n_samples),
    'num_late_payments': np.random.poisson(lam=1.5, size=n_samples),
    'loan_amount': np.random.randint(1000, 50000, n_samples),
})

# --- Create target ---
default_probability = (
    0.15 * data['debt_to_income_ratio']
    + 0.1 * (data['num_late_payments'] / 10)
    - 0.05 * (data['credit_history_length'] / 30)
    + 0.05 * (data['loan_amount'] / 50000)
    - 0.05 * (data['annual_income'] / data['annual_income'].max())
)
default_probability = default_probability.clip(0.05, 0.95)
data['default'] = np.random.binomial(1, default_probability)

print(f"Dataset shape: {data.shape}")
print(f"Default rate: {data['default'].mean():.2%}")

# --- Split ---
feature_columns = [
    'age', 'annual_income', 'debt_to_income_ratio',
    'credit_history_length', 'num_open_accounts',
    'num_late_payments', 'loan_amount'
]
X = data[feature_columns]
y = data['default']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# --- Train ---
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', GradientBoostingClassifier(
        n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42
    ))
])
pipeline.fit(X_train, y_train)

# --- Evaluate ---
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")

# --- Save ---
os.makedirs('model', exist_ok=True)
os.makedirs('data', exist_ok=True)
joblib.dump(pipeline, 'model/credit_model.pkl')
X_train.to_csv('data/reference_data.csv', index=False)
X_test.to_csv('data/test_data.csv', index=False)
print("\nModel and reference data saved.")

4. Creating a Prediction API with FastAPI

Now we get to the real MLOps work. We have a trained model sitting in a .pkl file. How do we let other people use it?

4.1 — What is an API?

An API (Application Programming Interface) is like a waiter in a restaurant. You (the client) tell the waiter what you want, the waiter goes to the kitchen (the model), and brings back your food (the prediction).

Without an API, anyone who wants a prediction needs to install Python, install all dependencies, download the model, and write code to use it. That doesn't scale.

REST API concept
A REST API acts as an intermediary between clients and your model. (source: SmartBear)

FastAPI is a modern Python framework for building APIs. It's fast, auto-generates documentation, and uses Python type hints for automatic validation. It's the go-to choice for ML engineers. (FastAPI docs)

4.2 — The critical rule: load your model ONCE

This is a mistake I see beginners make constantly. Watch:

# BAD — loads the model on EVERY request
@app.post("/predict")
async def predict(data):
    model = joblib.load("model/credit_model.pkl")  # SLOW! Every. Single. Time.
    return model.predict(data)

If your model file is 50MB and you get 100 requests per second, you're loading 50MB from disk 100 times per second. The API will grind to a halt.

# GOOD — load once, reuse forever
model = None

def load_model():
    global model
    model = joblib.load("model/credit_model.pkl")  # Loaded ONCE at startup

@app.post("/predict")
async def predict(data):
    return model.predict(data)  # Uses the already-loaded model

Let's build this properly.

4.3 — The model loader module

Create app/model_loader.py. This module has one job: load the model once and provide access to it.

# app/model_loader.py

import joblib
import os

First, we create a global variable to hold the model. It starts as None (nothing loaded yet):

_model = None

Now the function that loads the model from disk:

def load_model():
    """Load the model ONCE at startup."""
    global _model

    # Allow the path to be configured via environment variable
    model_path = os.environ.get("MODEL_PATH", "model/credit_model.pkl")

    if not os.path.exists(model_path):
        raise FileNotFoundError(
            f"Model not found at {model_path}. "
            f"Run train_model.py first."
        )

    _model = joblib.load(model_path)
    print(f"Model loaded from {model_path}")
    return _model

Why os.environ.get()? This makes the path flexible. On your laptop, the model is at model/credit_model.pkl. Inside a Docker container, it might be somewhere else. Environment variables let you configure this without changing code.
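A quick sketch of that fallback behavior (the /srv/... path is invented for illustration):

```python
import os

# With the variable unset, get() returns the default
os.environ.pop("MODEL_PATH", None)
print(os.environ.get("MODEL_PATH", "model/credit_model.pkl"))
# → model/credit_model.pkl

# With the variable set (e.g. inside a container), the default is ignored
os.environ["MODEL_PATH"] = "/srv/models/credit.pkl"  # hypothetical container path
print(os.environ.get("MODEL_PATH", "model/credit_model.pkl"))
# → /srv/models/credit.pkl
```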

And a function to retrieve the loaded model:

def get_model():
    """Get the model that was loaded at startup."""
    if _model is None:
        raise RuntimeError("Model not loaded! Call load_model() first.")
    return _model

4.4 — Input validation with Pydantic

In a Jupyter notebook, you control the data. In production, you have no idea what's coming in. Someone might send:

  • Text where a number is expected
  • Negative ages
  • Missing fields entirely

Pydantic solves this. You define a schema (a "shape") for your data, and FastAPI automatically rejects anything that doesn't match. Let's build it piece by piece.

Create app/schemas.py:

from pydantic import BaseModel, Field, field_validator

Now define what a valid credit application looks like:

class CreditApplication(BaseModel):
    age: int = Field(
        ...,        # The ... means "this field is required"
        ge=18,      # ge = "greater than or equal to"
        le=120,     # le = "less than or equal to"
        description="Applicant's age in years"
    )

What's Field(..., ge=18, le=120)? It says: "This field is required, must be an integer, at least 18, at most 120." If someone sends age: -5, FastAPI automatically returns an error without you writing a single if statement.

Let's add the remaining fields the same way:

    annual_income: int = Field(
        ..., gt=0,           # gt = "greater than" (strictly positive)
        description="Annual income in dollars"
    )

    debt_to_income_ratio: float = Field(
        ..., ge=0.0, le=10.0,
        description="Monthly debt / monthly income"
    )

    credit_history_length: int = Field(
        ..., ge=0, le=80,
        description="Credit history in years"
    )

    num_open_accounts: int = Field(
        ..., ge=0, le=100,
        description="Number of open credit accounts"
    )

    num_late_payments: int = Field(
        ..., ge=0,
        description="Number of late payments"
    )

    loan_amount: int = Field(
        ..., gt=0,
        description="Requested loan amount in dollars"
    )

We can also add custom business logic validation. For example, you can't have 30 years of credit history if you're 25 years old:

    @field_validator('credit_history_length')
    @classmethod
    def history_cannot_exceed_age(cls, v, info):
        if 'age' in info.data and v > info.data['age'] - 18:
            raise ValueError(
                f"Credit history ({v}y) can't exceed age minus 18"
            )
        return v

Now define what the API returns:

class PredictionResponse(BaseModel):
    prediction: int = Field(description="0 = No Default, 1 = Default")
    probability_of_default: float = Field(description="Probability from 0.0 to 1.0")
    risk_category: str = Field(description="Low, Medium, or High")


class HealthResponse(BaseModel):
    status: str
    model_loaded: bool
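To see this validation in action outside FastAPI, here's a minimal sketch (the Applicant class is a cut-down stand-in for CreditApplication, not the real schema):

```python
from pydantic import BaseModel, Field, ValidationError

class Applicant(BaseModel):
    age: int = Field(..., ge=18, le=120)

# Valid input passes through untouched
print(Applicant(age=35).age)  # 35

# Invalid input raises ValidationError — no manual if-statements needed
try:
    Applicant(age=-5)  # violates ge=18
    outcome = "accepted"
except ValidationError:
    outcome = "rejected"
print(outcome)  # rejected
```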

4.5 — The FastAPI application

Now we wire everything together. Create app/main.py:

Start with imports:

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
import pandas as pd
import time
import logging
import json
from datetime import datetime, timezone

from app.schemas import CreditApplication, PredictionResponse, HealthResponse
from app.model_loader import load_model, get_model

Set up logging (we'll use this for monitoring later):

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_scoring_api")

Define what happens when the app starts up. This is where we load the model once:

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Runs at startup (before yield) and shutdown (after yield)."""
    logger.info("Starting up — loading model...")
    load_model()
    logger.info("Model loaded. Ready to serve predictions.")
    yield
    logger.info("Shutting down.")

What's @asynccontextmanager? It's a decorator from Python's contextlib. FastAPI uses it to define the app's lifespan: everything before yield runs at startup, everything after yield runs at shutdown. The model gets loaded once, right at startup.
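Stripped of FastAPI, the same pattern looks like this (all names here are invented for the demo):

```python
import asyncio
from contextlib import asynccontextmanager

events = []

@asynccontextmanager
async def lifespan_demo():
    events.append("startup")    # before yield: runs once, before serving anything
    yield
    events.append("shutdown")   # after yield: runs once, when the app stops

async def main():
    async with lifespan_demo():
        events.append("serving")  # the app handles requests in here

asyncio.run(main())
print(events)  # ['startup', 'serving', 'shutdown']
```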

Create the app:

app = FastAPI(
    title="Credit Scoring API",
    description="Predict loan default probability",
    version="1.0.0",
    lifespan=lifespan,
)

Add a health check endpoint. Every production API needs one — it's how load balancers and monitoring tools know the service is alive:

@app.get("/health", response_model=HealthResponse)
async def health_check():
    try:
        model = get_model()
        return HealthResponse(status="healthy", model_loaded=True)
    except RuntimeError:
        return HealthResponse(status="unhealthy", model_loaded=False)

Now the star of the show — the prediction endpoint:

@app.post("/predict", response_model=PredictionResponse)
async def predict(application: CreditApplication):
    start_time = time.time()

    try:
        model = get_model()

Convert the validated input into a DataFrame (what scikit-learn expects):

        input_data = pd.DataFrame([{
            'age': application.age,
            'annual_income': application.annual_income,
            'debt_to_income_ratio': application.debt_to_income_ratio,
            'credit_history_length': application.credit_history_length,
            'num_open_accounts': application.num_open_accounts,
            'num_late_payments': application.num_late_payments,
            'loan_amount': application.loan_amount,
        }])

Get the prediction and probability:

        prediction = int(model.predict(input_data)[0])
        probability = float(model.predict_proba(input_data)[0][1])

What's predict_proba()? It returns probabilities instead of just 0/1. For example, [0.82, 0.18] means 82% chance of no default, 18% chance of default. The [0][1] grabs the probability of default (class 1).
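As a standalone illustration of that indexing (using the example numbers above, not real model output):

```python
import numpy as np

# predict_proba returns one row of class probabilities per sample:
# columns are [P(no default), P(default)]
proba = np.array([[0.82, 0.18]])  # shape (n_samples, n_classes)

print(proba[0][1])  # 0.18 — probability of default for the first (only) sample
```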

Map the probability to a human-readable risk level:

        if probability < 0.3:
            risk_category = "Low"
        elif probability < 0.6:
            risk_category = "Medium"
        else:
            risk_category = "High"

Calculate how long the prediction took, and log everything:

        inference_time_ms = (time.time() - start_time) * 1000

        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction",
            "inputs": application.model_dump(),
            "outputs": {
                "prediction": prediction,
                "probability_of_default": round(probability, 4),
                "risk_category": risk_category,
            },
            "inference_time_ms": round(inference_time_ms, 2),
        }
        logger.info(json.dumps(log_entry))

Why log all of this? These logs are gold. They'll be used later for drift detection (comparing production inputs to training data), performance monitoring (is the API getting slower?), and debugging (what happened when a wrong prediction was made?).
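As a taste of what's coming, structured JSON logs parse straight back into a DataFrame. A sketch with two made-up log lines in the same shape as log_entry:

```python
import json

import pandas as pd

# Two invented log lines, shaped like the log_entry dict above
lines = [
    '{"event": "prediction", "inputs": {"age": 35, "loan_amount": 15000}, "inference_time_ms": 4.2}',
    '{"event": "prediction", "inputs": {"age": 52, "loan_amount": 8000}, "inference_time_ms": 3.8}',
]

# Parse each line, then flatten inputs + latency into one row per prediction
records = [json.loads(line) for line in lines]
df = pd.DataFrame(
    [{**r["inputs"], "latency_ms": r["inference_time_ms"]} for r in records]
)
print(df.shape)  # (2, 3)
```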

Return the response:

        return PredictionResponse(
            prediction=prediction,
            probability_of_default=round(probability, 4),
            risk_category=risk_category,
        )

    except Exception as e:
        logger.error(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction_error",
            "error": str(e),
            "inputs": application.model_dump(),
        }))
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")

4.6 — Test it locally

pip install fastapi uvicorn scikit-learn joblib pandas pydantic

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Open http://localhost:8000/docs — FastAPI auto-generates interactive Swagger documentation:

Swagger UI
FastAPI generates this Swagger UI automatically. You can test your API right from the browser.

Test with curl:

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "age": 35,
    "annual_income": 55000,
    "debt_to_income_ratio": 0.35,
    "credit_history_length": 12,
    "num_open_accounts": 5,
    "num_late_payments": 2,
    "loan_amount": 15000
  }'

4.7 — Full API code (copy-paste ready)

Click to expand: app/model_loader.py (complete)

# app/model_loader.py

import joblib
import os

_model = None

def load_model():
    global _model
    model_path = os.environ.get("MODEL_PATH", "model/credit_model.pkl")
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"Model not found at {model_path}")
    _model = joblib.load(model_path)
    print(f"Model loaded from {model_path}")
    return _model

def get_model():
    if _model is None:
        raise RuntimeError("Model not loaded! Call load_model() first.")
    return _model

Click to expand: app/schemas.py (complete)

# app/schemas.py

from pydantic import BaseModel, Field, field_validator

class CreditApplication(BaseModel):
    age: int = Field(..., ge=18, le=120, description="Applicant's age")
    annual_income: int = Field(..., gt=0, description="Annual income ($)")
    debt_to_income_ratio: float = Field(..., ge=0.0, le=10.0, description="DTI ratio")
    credit_history_length: int = Field(..., ge=0, le=80, description="Credit history (years)")
    num_open_accounts: int = Field(..., ge=0, le=100, description="Open accounts")
    num_late_payments: int = Field(..., ge=0, description="Late payments")
    loan_amount: int = Field(..., gt=0, description="Loan amount ($)")

    @field_validator('credit_history_length')
    @classmethod
    def history_cannot_exceed_age(cls, v, info):
        if 'age' in info.data and v > info.data['age'] - 18:
            raise ValueError(f"Credit history ({v}y) can't exceed age minus 18")
        return v

class PredictionResponse(BaseModel):
    prediction: int = Field(description="0=No Default, 1=Default")
    probability_of_default: float = Field(description="Probability 0.0 to 1.0")
    risk_category: str = Field(description="Low, Medium, or High")

class HealthResponse(BaseModel):
    status: str
    model_loaded: bool

Click to expand: app/main.py (complete)

# app/main.py

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
import pandas as pd
import time
import logging
import json
from datetime import datetime, timezone

from app.schemas import CreditApplication, PredictionResponse, HealthResponse
from app.model_loader import load_model, get_model

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_scoring_api")

@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("Starting up — loading model...")
    load_model()
    logger.info("Model loaded. Ready.")
    yield
    logger.info("Shutting down.")

app = FastAPI(
    title="Credit Scoring API",
    description="Predict loan default probability",
    version="1.0.0",
    lifespan=lifespan,
)

@app.get("/health", response_model=HealthResponse)
async def health_check():
    try:
        model = get_model()
        return HealthResponse(status="healthy", model_loaded=True)
    except RuntimeError:
        return HealthResponse(status="unhealthy", model_loaded=False)

@app.post("/predict", response_model=PredictionResponse)
async def predict(application: CreditApplication):
    start_time = time.time()
    try:
        model = get_model()
        input_data = pd.DataFrame([{
            'age': application.age,
            'annual_income': application.annual_income,
            'debt_to_income_ratio': application.debt_to_income_ratio,
            'credit_history_length': application.credit_history_length,
            'num_open_accounts': application.num_open_accounts,
            'num_late_payments': application.num_late_payments,
            'loan_amount': application.loan_amount,
        }])
        prediction = int(model.predict(input_data)[0])
        probability = float(model.predict_proba(input_data)[0][1])

        if probability < 0.3:
            risk_category = "Low"
        elif probability < 0.6:
            risk_category = "Medium"
        else:
            risk_category = "High"

        inference_time_ms = (time.time() - start_time) * 1000

        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction",
            "inputs": application.model_dump(),
            "outputs": {
                "prediction": prediction,
                "probability_of_default": round(probability, 4),
                "risk_category": risk_category,
            },
            "inference_time_ms": round(inference_time_ms, 2),
        }
        logger.info(json.dumps(log_entry))

        return PredictionResponse(
            prediction=prediction,
            probability_of_default=round(probability, 4),
            risk_category=risk_category,
        )
    except Exception as e:
        logger.error(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction_error",
            "error": str(e),
            "inputs": application.model_dump(),
        }))
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")
Commit the API code:

git add app/
git commit -m "feat: implement FastAPI prediction API with validation and logging"

5. Writing Automated Tests

Why test?

Imagine you change one line of code that accidentally breaks input validation. Without tests, this bug goes to production. With tests in your CI/CD pipeline, the bug gets caught before it ever reaches users.

We'll write two kinds:

  • Unit tests: test individual functions ("does the model return a valid prediction?")
  • Integration tests: test the full flow ("does the API endpoint respond correctly?")

5.1 — Setting up the test client

FastAPI provides a TestClient that simulates HTTP requests without starting a real server:

# tests/test_api.py

import pytest
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

What's TestClient? It's a fake browser. You can send GET and POST requests to your API and check the responses, all without starting a server. Tests run in milliseconds.

5.2 — Test the health endpoint

The simplest test first:

def test_health_returns_200():
    response = client.get("/health")
    assert response.status_code == 200

def test_health_reports_model_loaded():
    response = client.get("/health")
    data = response.json()
    assert data["status"] == "healthy"
    assert data["model_loaded"] is True

What's assert? It means "this must be true, or the test fails." assert response.status_code == 200 says "the server must return HTTP 200 (OK)."
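A minimal standalone sketch of that behavior (the status codes here are just example values):

```python
# A passing assert does nothing — execution simply continues
assert 2 + 2 == 4

# A failing assert raises AssertionError, optionally with a message
try:
    assert 200 == 404, "wrong status code"
    failure = None
except AssertionError as e:
    failure = str(e)

print(failure)  # wrong status code
```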

5.3 — Test valid predictions

Now let's test the actual prediction endpoint with good data:

def test_valid_prediction_returns_200():
    payload = {
        "age": 35, "annual_income": 55000,
        "debt_to_income_ratio": 0.35, "credit_history_length": 12,
        "num_open_accounts": 5, "num_late_payments": 2,
        "loan_amount": 15000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 200

Check that the response contains all expected fields:

def test_response_has_required_fields():
    payload = {
        "age": 45, "annual_income": 80000,
        "debt_to_income_ratio": 0.20, "credit_history_length": 20,
        "num_open_accounts": 3, "num_late_payments": 0,
        "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    data = response.json()

    assert "prediction" in data
    assert "probability_of_default" in data
    assert "risk_category" in data

Check that values are in expected ranges:

def test_prediction_is_binary():
    payload = {
        "age": 30, "annual_income": 40000,
        "debt_to_income_ratio": 0.50, "credit_history_length": 5,
        "num_open_accounts": 8, "num_late_payments": 4,
        "loan_amount": 20000
    }
    response = client.post("/predict", json=payload)
    data = response.json()
    assert data["prediction"] in [0, 1]

def test_probability_between_0_and_1():
    payload = {
        "age": 28, "annual_income": 35000,
        "debt_to_income_ratio": 0.60, "credit_history_length": 3,
        "num_open_accounts": 6, "num_late_payments": 5,
        "loan_amount": 25000
    }
    response = client.post("/predict", json=payload)
    data = response.json()
    assert 0.0 <= data["probability_of_default"] <= 1.0

def test_risk_category_is_valid():
    payload = {
        "age": 50, "annual_income": 100000,
        "debt_to_income_ratio": 0.10, "credit_history_length": 25,
        "num_open_accounts": 2, "num_late_payments": 0,
        "loan_amount": 5000
    }
    response = client.post("/predict", json=payload)
    data = response.json()
    assert data["risk_category"] in ["Low", "Medium", "High"]

5.4 — Test invalid inputs

This is equally important. Our API should reject bad data with a clear error (HTTP 422):

def test_negative_age_rejected():
    """Age of -5 should be rejected."""
    payload = {
        "age": -5,  # INVALID
        "annual_income": 50000, "debt_to_income_ratio": 0.30,
        "credit_history_length": 10, "num_open_accounts": 3,
        "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422  # 422 = Unprocessable Entity
def test_zero_income_rejected():
    """Income must be strictly positive."""
    payload = {
        "age": 30, "annual_income": 0,  # INVALID
        "debt_to_income_ratio": 0.30, "credit_history_length": 10,
        "num_open_accounts": 3, "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422
def test_missing_field_rejected():
    """Omitting a required field should fail."""
    payload = {
        "age": 30,
        # annual_income is MISSING
        "debt_to_income_ratio": 0.30, "credit_history_length": 10,
        "num_open_accounts": 3, "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422
def test_wrong_type_rejected():
    """Sending text where a number is expected should fail."""
    payload = {
        "age": "thirty",  # WRONG TYPE
        "annual_income": 50000, "debt_to_income_ratio": 0.30,
        "credit_history_length": 10, "num_open_accounts": 3,
        "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422

5.5 — Test the model directly

We also test the model itself, separately from the API:

# tests/test_model.py

import joblib
import pandas as pd

def test_model_loads():
    model = joblib.load("model/credit_model.pkl")
    assert model is not None

def test_model_has_predict():
    model = joblib.load("model/credit_model.pkl")
    assert hasattr(model, 'predict')
    assert hasattr(model, 'predict_proba')

def test_model_returns_one_prediction():
    model = joblib.load("model/credit_model.pkl")
    test_input = pd.DataFrame([{
        'age': 35, 'annual_income': 55000,
        'debt_to_income_ratio': 0.35, 'credit_history_length': 12,
        'num_open_accounts': 5, 'num_late_payments': 2,
        'loan_amount': 15000
    }])
    prediction = model.predict(test_input)
    assert len(prediction) == 1

5.6 — Run the tests

pip install pytest httpx

pytest tests/ -v

You should see all tests passing with green check marks.

git add tests/
git commit -m "feat: add unit and integration tests for API and model"

6. Containerizing with Docker

What is Docker?

Your API works on your laptop. But will it work on a server? Maybe the server has a different Python version, or a missing library.

Docker packages your application with its entire environment (OS, Python, libraries, everything) into a self-contained unit called a container. Think of it as shipping your whole laptop instead of just the code.

Docker containers package your app with everything it needs. (source: docker.com)

6.1 — The requirements file

First, list all Python dependencies. Pin the versions — without this, a library update could silently break things months later:

# requirements.txt

fastapi==0.104.0
uvicorn==0.24.0
scikit-learn==1.3.0
joblib==1.3.0
pandas==2.0.0
pydantic==2.0.0
numpy==1.24.0

6.2 — The Dockerfile, line by line

A Dockerfile is a recipe. Each line is one instruction:


FROM python:3.11-slim

What this does: Start from a minimal Python 3.11 image. The -slim variant is ~150MB instead of ~900MB. Less bloat = faster builds.

WORKDIR /app

What this does: All subsequent commands run inside /app in the container. Like doing cd /app.

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

What this does: Copy the requirements file and install dependencies. Why copy this file separately? Docker caches each step. If requirements.txt hasn't changed, Docker skips the slow pip install on rebuilds. This can save minutes.
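A companion tip, not part of the original brief: a `.dockerignore` file keeps files the image doesn't need (git history, tests, logs, caches) out of the build context, which makes `docker build` faster and keeps unrelated file changes from busting the cache. A hypothetical version for a repo laid out like this one:

```
# .dockerignore (adjust to your actual repo layout)
.git
__pycache__/
*.pyc
tests/
logs/
notebooks/
.venv/
```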

COPY app/ ./app/
COPY model/ ./model/

What this does: Copy our application code and model into the container.

EXPOSE 8000

What this does: Documents which port the container uses. It doesn't actually open the port — that's done at runtime.

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

What this does: The command that runs when the container starts. --host 0.0.0.0 means "listen on all network interfaces" — required inside a container (using localhost won't work from outside).
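One instruction our Dockerfile doesn't include, sketched here as an optional extra: a `HEALTHCHECK` lets Docker itself probe the `/health` endpoint and mark the container unhealthy if it stops responding. The probe below uses Python's stdlib, since the slim image ships without curl:

```dockerfile
# Optional: Docker marks the container unhealthy after 3 failed probes
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
```

Orchestrators like Docker Compose and Swarm can then restart or stop routing traffic to unhealthy containers automatically.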

6.3 — Build and run

# Build the image (give it a name with -t)
docker build -t credit-scoring-api .

# Run the container
# -p 8000:8000 maps your machine's port 8000 to the container's port 8000
docker run -p 8000:8000 credit-scoring-api

# Test it
curl http://localhost:8000/health

6.4 — Full Dockerfile (copy-paste ready)

Click to expand: Dockerfile (complete)

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
COPY model/ ./model/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
git add Dockerfile requirements.txt
git commit -m "feat: add Dockerfile for containerized deployment"

7. Building a CI/CD Pipeline with GitHub Actions

What is CI/CD?

  • Continuous Integration (CI): Every time you push code, tests run automatically
  • Continuous Deployment (CD): If tests pass, the code gets deployed automatically

Without CI/CD: you manually run tests (if you remember), manually build Docker, manually deploy. With CI/CD: push code → everything happens automatically. (GitHub Actions docs)

7.1 — The pipeline structure

Our pipeline has 3 stages that run in sequence:

PUSH to main
    │
    ▼
┌──────────┐     ┌──────────┐     ┌──────────┐
│   TEST   │ ──▶ │  BUILD   │ ──▶ │  DEPLOY  │
│  pytest  │     │  docker  │     │ push to  │
│          │     │  build   │     │ registry │
└──────────┘     └──────────┘     └──────────┘
     │                │                │
  If FAIL:         If FAIL:        If FAIL:
  STOP HERE        STOP HERE       STOP HERE

Key insight: If tests fail, we don't waste time building. If building fails, we don't try to deploy. Each stage only runs if the previous one succeeded.

7.2 — The YAML file, block by block

Create .github/workflows/ci-cd.yml:

First, define when the pipeline runs:

name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

This means: "Run on every push to main and on every pull request targeting main."

Stage 1 — TEST:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest httpx

      - name: Generate model
        run: python train_model.py

      - name: Run tests
        run: pytest tests/ -v --tb=short

What's happening? GitHub spins up a fresh Ubuntu machine, installs Python, installs our dependencies, trains the model (in a real project you'd download it from a model registry), and runs pytest.

Stage 2 — BUILD (only runs if Stage 1 passes):

  build:
    runs-on: ubuntu-latest
    needs: test            # <-- This is the key: "needs" means "wait for test"
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Generate model
        run: |
          pip install scikit-learn pandas joblib numpy
          python train_model.py

      - name: Build Docker image
        run: docker build -t credit-scoring-api:${{ github.sha }} .

      - name: Smoke test the container
        run: |
          docker run -d -p 8000:8000 --name test-api credit-scoring-api:${{ github.sha }}
          sleep 10
          curl --fail http://localhost:8000/health || exit 1
          docker stop test-api

What's a smoke test? We start the container and hit the health endpoint. If it doesn't respond, something is broken. curl --fail exits with a non-zero status on any HTTP error code (4xx/5xx), which fails the CI step.
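The `sleep 10` above is a guess: too short on a slow runner, wasted seconds on a fast one. A small polling helper — my own addition, not part of the original pipeline — makes the wait deterministic. In a real smoke test the probe would be a `urllib` call against `http://localhost:8000/health`:

```python
import time

def wait_until(probe, timeout=30.0, interval=0.5):
    """Call `probe()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Example: a probe that succeeds on its third attempt
attempts = {"n": 0}

def flaky_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until(flaky_probe, timeout=5, interval=0.01))  # True
```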

Stage 3 — DEPLOY (only from main branch, only if build passed):

  deploy:
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'  # Only deploy from main
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Generate model
        run: |
          pip install scikit-learn pandas joblib numpy
          python train_model.py

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}

      - name: Build and push
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/credit-scoring-api:latest .
          docker push ${{ secrets.DOCKER_USERNAME }}/credit-scoring-api:latest

What are secrets? Credentials stored securely in GitHub Settings → Secrets. They're never visible in logs. Never hardcode passwords in your code or YAML files.

git add .github/
git commit -m "feat: add CI/CD pipeline with GitHub Actions"

8. Logging Production Data

Why log everything?

Once your model is live, you're flying blind unless you collect data. Logging serves:

  1. Debugging: When something breaks at 2 AM, logs tell you what happened
  2. Drift detection: Compare production inputs against training data
  3. Performance: Track how fast (or slow) predictions are
  4. Auditing: In finance, you need records of every decision

8.1 — What to store

| What | Why | Example |
|---|---|---|
| Input features | Drift detection | age: 35, income: 55000, ... |
| Prediction | Performance monitoring | prediction: 0 |
| Probability | Score distribution | probability: 0.18 |
| Inference time | Latency monitoring | 12.3 ms |
| Timestamp | Time-based analysis | 2024-03-15T14:30:00Z |
| Errors | Debugging | ValueError: ... |

8.2 — Structured logging

We already added logging in our main.py (section 4.5). The key is using JSON format — it's machine-parseable, so monitoring tools (ELK Stack, Datadog, CloudWatch) can automatically index every field.

Here's a dedicated logger with file rotation (prevents logs from filling up the disk):

# monitoring/logger.py

import logging
import os
from logging.handlers import RotatingFileHandler
def setup_production_logger(name="credit_scoring_api", log_dir="logs"):
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    if logger.handlers:  # Avoid duplicates
        return logger

The file handler rotates logs when they reach 10 MB, keeping 5 old files:

    file_handler = RotatingFileHandler(
        filename=os.path.join(log_dir, "predictions.log"),
        maxBytes=10_000_000,  # 10 MB per file
        backupCount=5,        # Keep 5 rotated files
    )
    console_handler = logging.StreamHandler()

    # Both handlers use the same format
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(message)s')
    file_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    return logger

In production, these log files would be shipped to a centralized logging platform (Elasticsearch, Datadog, CloudWatch). For our tutorial, local files are enough — the important thing is that we're capturing the data.
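Section 8.2 stresses JSON logs, but the formatter above emits plain text. One way to bridge the gap is a custom formatter that serializes each record as a JSON object — a minimal sketch, where the field names `ts` and `logger` are my own choices, not a standard:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        entry = {"ts": self.formatTime(record), "logger": record.name}
        # Allow logger.info({...}) for structured payloads, plain strings otherwise
        if isinstance(record.msg, dict):
            entry.update(record.msg)
        else:
            entry["message"] = record.getMessage()
        return json.dumps(entry)

# Demo: log to an in-memory stream so we can inspect the output
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("demo_json")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info({"prediction": 0, "probability_of_default": 0.18, "inference_ms": 12.3})

parsed = json.loads(stream.getvalue().strip())  # machine-parseable: round-trips cleanly
print(parsed["prediction"], parsed["inference_ms"])
```

Swap the in-memory stream for the `RotatingFileHandler` above and every line in `predictions.log` becomes directly indexable by ELK, Datadog, or CloudWatch.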

git add monitoring/
git commit -m "feat: add production logging with file rotation"

9. Data Drift Detection with Evidently

What is data drift?

Your model learned patterns from training data. But the real world changes. People's financial behavior shifts due to economic events, policy changes, or seasonal patterns.

Data drift = the data your model receives in production starts looking significantly different from what it was trained on. This is your early warning system — it tells you when the model might need retraining.

When production data drifts away from training data, model performance can degrade. (source: Evidently AI)

Types of drift

| Type | What changes | Example |
|---|---|---|
| Data drift | Input distributions | Average income of applicants increases |
| Concept drift | Feature-target relationship | Same income now means higher default risk |
| Prediction drift | Output distribution | Model starts predicting more defaults |

9.1 — Load the reference data

The reference is our training data — what the model was built on:

import pandas as pd
import numpy as np

reference_data = pd.read_csv('data/reference_data.csv')
print(f"Reference data: {reference_data.shape}")

9.2 — Simulate production data with drift

In real life, this would come from your API logs. Here we simulate production data that has intentional drift — so we can see the detection in action:

np.random.seed(123)
n_production = 1000

current_data = pd.DataFrame({
    # DRIFT: younger applicants (new customer segment)
    'age': np.random.randint(18, 55, n_production),

    # DRIFT: higher incomes (economic growth)
    'annual_income': np.random.lognormal(
        mean=10.8, sigma=0.7, size=n_production  # Was 10.5
    ).astype(int),

    # DRIFT: higher debt ratios (inflation effect)
    'debt_to_income_ratio': np.random.uniform(
        0.1, 1.8, n_production  # Was 0 to 1.5
    ).round(3),

    # STABLE: no significant change
    'credit_history_length': np.random.randint(0, 30, n_production),
    'num_open_accounts': np.random.randint(1, 20, n_production),

    # SLIGHT DRIFT: more late payments
    'num_late_payments': np.random.poisson(lam=2.2, size=n_production),

    # DRIFT: higher loan amounts
    'loan_amount': np.random.randint(5000, 65000, n_production),
})

print(f"Production data: {current_data.shape}")

9.3 — Visualize the distributions

Before running statistical tests, always look at the data:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 3, figsize=(16, 12))
fig.suptitle('Reference (blue) vs Production (orange)', fontsize=14)

for idx, feature in enumerate(reference_data.columns):
    row, col = idx // 3, idx % 3
    ax = axes[row][col]
    ax.hist(reference_data[feature], bins=30, alpha=0.5,
            label='Reference', color='steelblue', density=True)
    ax.hist(current_data[feature], bins=30, alpha=0.5,
            label='Production', color='darkorange', density=True)
    ax.set_title(feature)
    ax.legend(fontsize=8)

# Hide the unused subplots (7 features in a 3x3 grid)
axes[2][1].set_visible(False)
axes[2][2].set_visible(False)
plt.tight_layout()
plt.show()

You should clearly see the distributions shifting for several features.

9.4 — Run the Evidently drift report

Evidently AI (docs) compares reference vs. current data using statistical tests and tells you which features have drifted:

from evidently.report import Report
from evidently.metrics import DatasetDriftMetric, DataDriftTable

Create and run the report:

drift_report = Report(metrics=[
    DatasetDriftMetric(),   # Overall: is there drift?
    DataDriftTable(),       # Per-feature: which ones drifted?
])

drift_report.run(
    reference_data=reference_data,
    current_data=current_data,
)

Save the interactive HTML report:

import os
os.makedirs('monitoring', exist_ok=True)
drift_report.save_html("monitoring/drift_report.html")
print("Report saved — open monitoring/drift_report.html in your browser")

9.5 — Extract results programmatically

Pretty charts are nice, but for automated monitoring you need to extract results as data:

report_dict = drift_report.as_dict()

# Overall drift result
dataset_drift = report_dict['metrics'][0]['result']
print(f"Drift detected: {'YES' if dataset_drift['dataset_drift'] else 'NO'}")
print(f"Drifted features: {dataset_drift['number_of_drifted_columns']}"
      f" / {dataset_drift['number_of_columns']}")

Per-feature breakdown:

drift_table = report_dict['metrics'][1]['result']

print(f"\n{'Feature':<28} {'Drifted?':<10} {'Score':<12} {'Test'}")
print("-" * 70)

for col, info in drift_table['drift_by_columns'].items():
    status = "YES" if info['drift_detected'] else "no"
    score = info['drift_score']
    test = info['stattest_name']
    flag = " << ALERT" if info['drift_detected'] else ""
    print(f"{col:<28} {status:<10} {score:<12.6f} {test}{flag}")
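The stattest_name column shows which test Evidently chose per feature; for numeric columns with samples this size it is commonly the two-sample Kolmogorov–Smirnov test (exact defaults depend on the Evidently version). To demystify the drift score, here is the KS statistic computed from scratch with NumPy alone — the maximum gap between the two empirical CDFs:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: max distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    all_vals = np.concatenate([a, b])
    # ECDF of each sample evaluated at every observed point
    cdf_a = np.searchsorted(a, all_vals, side="right") / len(a)
    cdf_b = np.searchsorted(b, all_vals, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
same = ks_statistic(rng.normal(0, 1, 1000), rng.normal(0, 1, 1000))
shifted = ks_statistic(rng.normal(0, 1, 1000), rng.normal(0.5, 1, 1000))
print(f"no drift: {same:.3f}, drifted: {shifted:.3f}")
```

A statistic near 0 means the distributions overlap; the shifted sample produces a clearly larger value, which is exactly the signal the drift table flags.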

9.6 — Interpret the results

Running a tool is easy. The real skill is interpreting what the results mean:

if dataset_drift['dataset_drift']:
    print("""
    ACTION REQUIRED:
    1. Check model accuracy on recent production data
    2. Investigate root cause (market shift? data bug?)
    3. Retrain if performance has degraded
    4. Set up automated alerts for future drift
    """)
else:
    print("No significant drift. Continue monitoring weekly.")

And a statistical comparison table:

comparison = pd.DataFrame({
    'Feature': reference_data.columns,
    'Training Mean': reference_data.mean().round(2).values,
    'Production Mean': current_data.mean().round(2).values,
})
comparison['Shift %'] = (
    (comparison['Production Mean'] - comparison['Training Mean'])
    / comparison['Training Mean'] * 100
).round(1)

print("\n", comparison.to_string(index=False))
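To act on drift automatically instead of reading printouts, a scheduled job can turn the report into an exit code. This gate and its 50% threshold are my own illustration, not an Evidently feature — feed it the counts extracted in 9.5:

```python
import sys

def drift_gate(n_drifted, n_total, max_share=0.5):
    """Return True when the share of drifted features is acceptable."""
    return (n_drifted / n_total) <= max_share

# In practice: drift_gate(dataset_drift['number_of_drifted_columns'],
#                         dataset_drift['number_of_columns'])
ok = drift_gate(n_drifted=3, n_total=7)
print("drift gate:", "pass" if ok else "FAIL")
if not ok:
    sys.exit(1)  # non-zero exit fails the cron job / CI step and triggers an alert
```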
git add notebooks/ monitoring/
git commit -m "feat: add data drift analysis with Evidently AI"

10. Performance Optimization

The Fiverr client wanted a fast API. Let's measure and improve.

10.1 — Profile with cProfile

cProfile is Python's built-in profiler. It tells you exactly where time is spent:

import cProfile
import pstats
import io
import time
from statistics import mean, stdev

import joblib
import numpy as np
import pandas as pd
model = joblib.load("model/credit_model.pkl")

test_input = pd.DataFrame([{
    'age': 35, 'annual_income': 55000,
    'debt_to_income_ratio': 0.35, 'credit_history_length': 12,
    'num_open_accounts': 5, 'num_late_payments': 2,
    'loan_amount': 15000
}])

Profile 100 predictions to get meaningful data:

profiler = cProfile.Profile()
profiler.enable()

for _ in range(100):
    model.predict_proba(test_input)

profiler.disable()

# Print the top 10 slowest functions
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(10)
print(stream.getvalue())

10.2 — Establish a baseline

Run 1,000 predictions and measure the distribution of times:

n_iterations = 1000
times_sklearn = []

for _ in range(n_iterations):
    start = time.perf_counter()
    model.predict_proba(test_input)
    end = time.perf_counter()
    times_sklearn.append((end - start) * 1000)  # Convert to ms

print(f"Baseline (scikit-learn) over {n_iterations} iterations:")
print(f"  Mean:  {mean(times_sklearn):.3f} ms")
print(f"  Std:   {stdev(times_sklearn):.3f} ms")
print(f"  p95:   {np.percentile(times_sklearn, 95):.3f} ms")

10.3 — Optimize with ONNX Runtime

ONNX (Open Neural Network Exchange) is a standard format for ML models. ONNX Runtime is a highly optimized engine that can run models faster than native scikit-learn. (ONNX Runtime docs)

First, convert our sklearn pipeline to ONNX format:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort
import onnx
# Define input shape: [any batch size, 7 features]
initial_type = [('float_input', FloatTensorType([None, 7]))]

# Convert
onnx_model = convert_sklearn(model, initial_types=initial_type, target_opset=12)

# Save
onnx.save_model(onnx_model, "model/credit_model.onnx")
print("ONNX model saved!")

Now benchmark it:

# Create ONNX session
session = ort.InferenceSession("model/credit_model.onnx")
input_name = session.get_inputs()[0].name

# ONNX expects numpy arrays, not DataFrames
test_np = test_input.values.astype(np.float32)

times_onnx = []
for _ in range(n_iterations):
    start = time.perf_counter()
    session.run(None, {input_name: test_np})
    end = time.perf_counter()
    times_onnx.append((end - start) * 1000)

print(f"ONNX Runtime over {n_iterations} iterations:")
print(f"  Mean:  {mean(times_onnx):.3f} ms")
print(f"  Std:   {stdev(times_onnx):.3f} ms")
print(f"  p95:   {np.percentile(times_onnx, 95):.3f} ms")

10.4 — Compare and verify

speedup = mean(times_sklearn) / mean(times_onnx)
improvement = (1 - mean(times_onnx) / mean(times_sklearn)) * 100

print(f"\nComparison:")
print(f"  scikit-learn: {mean(times_sklearn):.3f} ms")
print(f"  ONNX Runtime: {mean(times_onnx):.3f} ms")
print(f"  Speedup:      {speedup:.2f}x")
print(f"  Improvement:  {improvement:.1f}%")

Critical step: verify the optimization doesn't change predictions. Speed is worthless if accuracy drops:

sklearn_proba = model.predict_proba(test_input)[0]
onnx_labels, onnx_probas = session.run(None, {input_name: test_np})

# skl2onnx wraps classifier probabilities in a ZipMap (a list of dicts) by default
onnx_proba = [onnx_probas[0][k] for k in sorted(onnx_probas[0])]

print(f"\nsklearn proba: {sklearn_proba}")
print(f"ONNX proba:    {onnx_proba}")
assert np.allclose(sklearn_proba, onnx_proba, atol=1e-4), "Outputs diverge!"
print("Predictions match: safe to deploy the optimized version.")

10.5 — Visualize the improvement

fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.hist(times_sklearn, bins=50, alpha=0.6, label='scikit-learn', color='steelblue')
ax.hist(times_onnx, bins=50, alpha=0.6, label='ONNX Runtime', color='darkorange')
ax.set_xlabel('Inference Time (ms)')
ax.set_ylabel('Frequency')
ax.set_title('Inference Time: scikit-learn vs ONNX Runtime')
ax.legend()
plt.tight_layout()
plt.show()
git add optimization/
git commit -m "feat: add performance profiling and ONNX optimization"

11. The Final Architecture

Here's what we built:

                     ┌──────────────────────┐
                     │     Developer        │
                     │  (pushes to Git)     │
                     └──────────┬───────────┘
                                │
                                ▼
                     ┌──────────────────────┐
                     │   GitHub + CI/CD     │
                     │  Test → Build → Push │
                     └──────────┬───────────┘
                                │
                                ▼
                     ┌──────────────────────┐
                     │  Docker Container    │
                     │  ┌────────────────┐  │
                     │  │  FastAPI API   │  │
                     │  │  + ML Model    │  │
                     │  │  + Logging     │  │
                     │  └────────────────┘  │
                     └──────────┬───────────┘
                                │
                     ┌──────────┴───────────┐
                     ▼                      ▼
              ┌─────────────┐      ┌────────────────┐
              │ Predictions │      │  Log Storage   │
              │ to clients  │      │  (for drift)   │
              └─────────────┘      └───────┬────────┘
                                           ▼
                                  ┌────────────────┐
                                  │ Drift Analysis │
                                  │ + Performance  │
                                  │   Monitoring   │
                                  └────────────────┘

Every component solves a real problem:

| Component | Problem It Solves |
|---|---|
| Git + GitHub | "What changed and when?" |
| FastAPI + Pydantic | "How do others use the model safely?" |
| Pytest | "Will this change break something?" |
| Docker | "It works on my machine" → works everywhere |
| GitHub Actions | "Did someone forget to run tests?" |
| JSON Logging | "What's happening in production?" |
| Evidently AI | "Is the model still relevant?" |
| ONNX Runtime | "Can we make it faster?" |

12. Let's conclude

After spending a weekend on this (and impressing the Fiverr client), here's what stuck:

Architecture: Start simple, iterate. Get a basic API working before adding Docker and CI/CD. Each layer builds on the previous one.

Performance: Load the model once at startup, never per request. This single choice can make your API 100x faster.

Testing: Test for both valid AND invalid inputs. Your API will receive data you never imagined.

Deployment: Docker eliminates "works on my machine." CI/CD makes it impossible to deploy broken code accidentally.

Monitoring: A deployed model without monitoring is a ticking time bomb. Data drift is real and it's silent.

Optimization: Always profile before optimizing. Measure, don't guess. And always verify that optimization doesn't change predictions.


Resources


If you found this useful, feel free to clap or share. I'm always happy to chat about MLOps and the messy reality of putting ML models into production.

The complete code is available on GitHub.
