DEV Community

Wu DAO


Weekend Project: I Built a Full MLOps Pipeline for a Credit Scoring Model (And You Can Too)

A hands-on, beginner-friendly guide to deploying, monitoring, and optimizing a Machine Learning model in production — from API creation to data drift detection.


Last Saturday morning, I was scrolling through freelance gig postings on Fiverr when I stumbled upon something that caught my attention. A small fintech startup was looking for someone to "take our trained credit scoring model and make it production-ready — API, Docker, CI/CD, the whole shebang." The budget was decent, the deadline was two weeks, and I thought: "How hard can it be?"

Spoiler: it was more involved than I expected. But by Sunday evening, I had a working end-to-end MLOps pipeline running, and I learned an incredible amount in the process. This article is the tutorial I wish I had before starting.

Whether you're a data science student, a junior ML engineer, or someone curious about what happens after a model is trained, this guide will walk you through every step — from serving predictions through an API to catching silent model failures in production.

What we'll build together:

  • A prediction API using FastAPI that serves a credit scoring model
  • Automated tests to make sure our API doesn't break
  • A Docker container to package everything for deployment
  • A CI/CD pipeline with GitHub Actions to automate testing and deployment
  • A data drift analysis to monitor model health over time
  • Performance optimizations to speed up inference

Let's dive in.


Table of Contents

  1. The Big Picture: What Is MLOps?
  2. Setting Up the Project
  3. Training a Simple Credit Scoring Model
  4. Creating a Prediction API with FastAPI
  5. Writing Automated Tests
  6. Containerizing with Docker
  7. Building a CI/CD Pipeline
  8. Logging Production Data
  9. Data Drift Detection
  10. Performance Optimization
  11. The Final Architecture
  12. Key Takeaways

1. The Big Picture: What Is MLOps?

Before we write a single line of code, let's understand the landscape.

The "Last Mile" Problem

Here's a reality check that most online courses don't tell you: training a model is only about 20% of the work in a real ML project. The remaining 80% is everything that happens around it — data pipelines, deployment, monitoring, maintenance.

This is what MLOps (Machine Learning Operations) is about. Think of it as DevOps, but specifically designed for the unique challenges of machine learning systems.

MLOps Lifecycle
The MLOps lifecycle — training is just one piece of a much larger puzzle. (source: ml-ops.org)

What We'll Cover

Pillar           | What It Means                            | Tool We'll Use
---------------- | ---------------------------------------- | ---------------
Model Serving    | Making predictions available via an API  | FastAPI
Containerization | Packaging code + dependencies together   | Docker
CI/CD            | Automating tests and deployment          | GitHub Actions
Monitoring       | Watching model behavior in production    | Evidently AI
Optimization     | Making inference faster                  | cProfile, ONNX
Version Control  | Tracking every change                    | Git + GitHub

Each pillar solves a real problem. Without containerization, your code works on your machine but breaks on the server. Without CI/CD, every deployment is a manual, error-prone process. Without monitoring, your model silently degrades for months.

Let's start building.


2. Setting Up the Project

Good MLOps starts with good organization. Here's the structure we'll use:

credit-scoring-mlops/
│
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application
│   ├── model_loader.py      # Model loading logic
│   └── schemas.py           # Input/output validation
│
├── model/
│   └── credit_model.pkl     # Trained model
│
├── tests/
│   ├── __init__.py
│   ├── test_api.py          # API tests
│   └── test_model.py        # Model tests
│
├── notebooks/
│   └── data_drift_analysis.ipynb
│
├── monitoring/
│   └── logger.py            # Logging setup
│
├── .github/
│   └── workflows/
│       └── ci-cd.yml        # CI/CD pipeline
│
├── Dockerfile
├── requirements.txt
├── .gitignore
└── README.md

Why this structure? When someone new joins the project (or when future-you comes back in 6 months), they immediately understand where everything lives. app/ = API code. tests/ = tests. model/ = model files. Simple.

Step by step — initialize the repo

First, create the folder structure:

mkdir credit-scoring-mlops && cd credit-scoring-mlops
mkdir -p app model tests notebooks monitoring .github/workflows

Then, initialize Git:

git init

Now let's create the .gitignore. This file tells Git which files to NOT track. We don't want to accidentally push passwords, large data files, or Python cache files:

cat > .gitignore << 'EOF'
# Python cache files
__pycache__/
*.py[cod]

# Virtual environments
.venv/
venv/

# Data files (too large for Git)
*.csv
*.parquet
data/

# Environment secrets (NEVER commit these)
.env
*.secret

# IDE files
.vscode/
.idea/

# OS files
.DS_Store
Thumbs.db

# Log files
*.log
logs/
EOF

Finally, make the first commit:

git add .gitignore
git commit -m "Initial commit: project structure and .gitignore"

Why does this matter? Every commit is a snapshot. If something breaks later, you can always go back to a working version. Commit messages should describe what changed — this creates a readable project history.


3. Training a Simple Credit Scoring Model

We need a trained model to deploy. In a real scenario, this would come from an earlier modeling phase (tracked with a tool like MLflow). We'll build a quick one here so the tutorial is self-contained.

3.1 — Generate synthetic data

We'll create fake credit data. Each row represents a loan applicant:

import pandas as pd
import numpy as np

np.random.seed(42)  # For reproducibility
n_samples = 5000

data = pd.DataFrame({
    'age': np.random.randint(21, 70, n_samples),
    'annual_income': np.random.lognormal(mean=10.5, sigma=0.8, size=n_samples).astype(int),
    'debt_to_income_ratio': np.random.uniform(0, 1.5, n_samples).round(3),
    'credit_history_length': np.random.randint(0, 30, n_samples),
    'num_open_accounts': np.random.randint(1, 20, n_samples),
    'num_late_payments': np.random.poisson(lam=1.5, size=n_samples),
    'loan_amount': np.random.randint(1000, 50000, n_samples),
})

What's happening here? We're generating 5,000 fake applicants with 7 features each. np.random.seed(42) ensures you get the exact same data every time you run this — reproducibility is key in ML.
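As a quick standalone check of that claim (NumPy only; the numbers are illustrative), the same seed reproduces exactly the same "random" draws:

```python
import numpy as np

# Seeding makes the generator deterministic: two runs, identical draws.
np.random.seed(42)
first = np.random.randint(21, 70, 5)

np.random.seed(42)  # reset to the same seed
second = np.random.randint(21, 70, 5)

print((first == second).all())  # True — the sequence is fully reproducible
```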

3.2 — Create the target variable

Now we need to decide who defaults (1) and who doesn't (0). We simulate this based on common-sense rules: more debt, more late payments → higher chance of default.

default_probability = (
    0.15 * data['debt_to_income_ratio']
    + 0.1 * (data['num_late_payments'] / 10)
    - 0.05 * (data['credit_history_length'] / 30)
    + 0.05 * (data['loan_amount'] / 50000)
    - 0.05 * (data['annual_income'] / data['annual_income'].max())
)

# Keep probabilities in a reasonable range
default_probability = default_probability.clip(0.05, 0.95)

# Generate binary outcomes from these probabilities
data['default'] = np.random.binomial(1, default_probability)
print(f"Dataset shape: {data.shape}")
print(f"Default rate: {data['default'].mean():.2%}")

3.3 — Split into training and test sets

We need two sets: one to train on, one to evaluate with. stratify=y ensures both sets have the same proportion of defaults.

from sklearn.model_selection import train_test_split

feature_columns = [
    'age', 'annual_income', 'debt_to_income_ratio',
    'credit_history_length', 'num_open_accounts',
    'num_late_payments', 'loan_amount'
]

X = data[feature_columns]
y = data['default']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
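Here's a tiny sketch (toy labels, invented numbers) of what stratify guarantees — the class proportion is identical in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 samples, exactly 10% positives
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# Stratification preserves the 10% positive rate on both sides
print(y_tr.mean(), y_te.mean())  # 0.1 0.1
```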

3.4 — Build a scikit-learn Pipeline

Here's a key decision: we use a Pipeline that bundles preprocessing (scaling) and the model together into a single object. Why? Because when we deploy, we only need to load one file that handles everything.


from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('scaler', StandardScaler()),           # Step 1: Normalize features
    ('classifier', GradientBoostingClassifier(
        n_estimators=100,
        max_depth=4,
        learning_rate=0.1,
        random_state=42
    ))                                       # Step 2: Classify
])

What's a Pipeline? Think of it like an assembly line. Data goes in one end → gets scaled → gets classified → prediction comes out. The beauty is that pipeline.predict(new_data) automatically applies scaling first, then prediction. No extra steps needed.
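Here's a minimal sketch of that equivalence, using a toy dataset and LogisticRegression (not our actual model) so it runs instantly: calling the fitted pipeline is the same as applying the scaler and then the classifier by hand.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Manually: scale first, then classify — same result as pipe.predict(X)
manual = pipe.named_steps['clf'].predict(
    pipe.named_steps['scaler'].transform(X)
)
print((pipe.predict(X) == manual).all())  # True
```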

3.5 — Train and evaluate

pipeline.fit(X_train, y_train)

That single line does all the work. Now let's see how good it is:

from sklearn.metrics import classification_report, roc_auc_score

y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]

print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")

3.6 — Save the model and reference data

Two things to save:

  1. The model — this is what we'll deploy
  2. The reference data (training data) — we'll need this later to detect drift

import joblib
import os

os.makedirs('model', exist_ok=True)
os.makedirs('data', exist_ok=True)

# Save the trained pipeline
joblib.dump(pipeline, 'model/credit_model.pkl')

# Save reference data for drift analysis later
X_train.to_csv('data/reference_data.csv', index=False)
X_test.to_csv('data/test_data.csv', index=False)

print("Model saved to model/credit_model.pkl")
print("Reference data saved to data/")
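A sanity check worth knowing (toy model, temporary path — not our credit pipeline): a joblib dump/load round-trip restores a fitted model that predicts identically.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit a trivially small model
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

# Serialize to disk and load it back
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(clf, path)
restored = joblib.load(path)

# The restored model behaves exactly like the original
print((restored.predict(X) == clf.predict(X)).all())  # True
```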

Commit this progress:

git add train_model.py model/credit_model.pkl
git commit -m "feat: add model training script and initial model artifact"

3.7 — Full training script (copy-paste ready)

Here's everything from section 3 combined into a single runnable file:

Click to expand: train_model.py (complete)

# train_model.py — Complete model training script

import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, roc_auc_score
import joblib
import os

# --- Generate synthetic data ---
np.random.seed(42)
n_samples = 5000

data = pd.DataFrame({
    'age': np.random.randint(21, 70, n_samples),
    'annual_income': np.random.lognormal(mean=10.5, sigma=0.8, size=n_samples).astype(int),
    'debt_to_income_ratio': np.random.uniform(0, 1.5, n_samples).round(3),
    'credit_history_length': np.random.randint(0, 30, n_samples),
    'num_open_accounts': np.random.randint(1, 20, n_samples),
    'num_late_payments': np.random.poisson(lam=1.5, size=n_samples),
    'loan_amount': np.random.randint(1000, 50000, n_samples),
})

# --- Create target ---
default_probability = (
    0.15 * data['debt_to_income_ratio']
    + 0.1 * (data['num_late_payments'] / 10)
    - 0.05 * (data['credit_history_length'] / 30)
    + 0.05 * (data['loan_amount'] / 50000)
    - 0.05 * (data['annual_income'] / data['annual_income'].max())
)
default_probability = default_probability.clip(0.05, 0.95)
data['default'] = np.random.binomial(1, default_probability)

print(f"Dataset shape: {data.shape}")
print(f"Default rate: {data['default'].mean():.2%}")

# --- Split ---
feature_columns = [
    'age', 'annual_income', 'debt_to_income_ratio',
    'credit_history_length', 'num_open_accounts',
    'num_late_payments', 'loan_amount'
]
X = data[feature_columns]
y = data['default']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# --- Train ---
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', GradientBoostingClassifier(
        n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42
    ))
])
pipeline.fit(X_train, y_train)

# --- Evaluate ---
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")

# --- Save ---
os.makedirs('model', exist_ok=True)
os.makedirs('data', exist_ok=True)
joblib.dump(pipeline, 'model/credit_model.pkl')
X_train.to_csv('data/reference_data.csv', index=False)
X_test.to_csv('data/test_data.csv', index=False)
print("\nModel and reference data saved.")

4. Creating a Prediction API with FastAPI

Now we get to the real MLOps work. We have a trained model sitting in a .pkl file. How do we let other people use it?

4.1 — What is an API?

An API (Application Programming Interface) is like a waiter in a restaurant. You (the client) tell the waiter what you want, the waiter goes to the kitchen (the model), and brings back your food (the prediction).

Without an API, anyone who wants a prediction needs to install Python, install all dependencies, download the model, and write code to use it. That doesn't scale.

REST API concept
A REST API acts as an intermediary between clients and your model. (source: SmartBear)

FastAPI is a modern Python framework for building APIs. It's fast, auto-generates documentation, and uses Python type hints for automatic validation. It's the go-to choice for ML engineers. (FastAPI docs)

4.2 — The critical rule: load your model ONCE

This is a mistake I see beginners make constantly. Watch:

# BAD — loads the model on EVERY request
@app.post("/predict")
async def predict(data):
    model = joblib.load("model/credit_model.pkl")  # SLOW! Every. Single. Time.
    return model.predict(data)

If your model file is 50MB and you get 100 requests per second, you're loading 50MB from disk 100 times per second. The API will grind to a halt.

# GOOD — load once, reuse forever
model = None

def load_model():
    global model
    model = joblib.load("model/credit_model.pkl")  # Loaded ONCE at startup

@app.post("/predict")
async def predict(data):
    return model.predict(data)  # Uses the already-loaded model

Let's build this properly.

4.3 — The model loader module

Create app/model_loader.py. This module has one job: load the model once and provide access to it.

# app/model_loader.py

import joblib
import os

First, we create a global variable to hold the model. It starts as None (nothing loaded yet):

_model = None

Now the function that loads the model from disk:

def load_model():
    """Load the model ONCE at startup."""
    global _model

    # Allow the path to be configured via environment variable
    model_path = os.environ.get("MODEL_PATH", "model/credit_model.pkl")

    if not os.path.exists(model_path):
        raise FileNotFoundError(
            f"Model not found at {model_path}. "
            f"Run train_model.py first."
        )

    _model = joblib.load(model_path)
    print(f"Model loaded from {model_path}")
    return _model

Why os.environ.get()? This makes the path flexible. On your laptop, the model is at model/credit_model.pkl. Inside a Docker container, it might be somewhere else. Environment variables let you configure this without changing code.
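A quick sketch of that fallback behavior (the /srv/... path is invented for illustration):

```python
import os

# With the variable unset, get() returns the default
os.environ.pop("MODEL_PATH", None)
print(os.environ.get("MODEL_PATH", "model/credit_model.pkl"))
# → model/credit_model.pkl

# With the variable set (e.g. inside a container), the default is ignored
os.environ["MODEL_PATH"] = "/srv/models/credit.pkl"  # hypothetical container path
print(os.environ.get("MODEL_PATH", "model/credit_model.pkl"))
# → /srv/models/credit.pkl
```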

And a function to retrieve the loaded model:

def get_model():
    """Get the model that was loaded at startup."""
    if _model is None:
        raise RuntimeError("Model not loaded! Call load_model() first.")
    return _model

4.4 — Input validation with Pydantic

In a Jupyter notebook, you control the data. In production, you have no idea what's coming in. Someone might send:

  • Text where a number is expected
  • Negative ages
  • Missing fields entirely

Pydantic solves this. You define a schema (a "shape") for your data, and FastAPI automatically rejects anything that doesn't match. Let's build it piece by piece.

Create app/schemas.py:

from pydantic import BaseModel, Field, field_validator

Now define what a valid credit application looks like:

class CreditApplication(BaseModel):
    age: int = Field(
        ...,        # The ... means "this field is required"
        ge=18,      # ge = "greater than or equal to"
        le=120,     # le = "less than or equal to"
        description="Applicant's age in years"
    )

What's Field(..., ge=18, le=120)? It says: "This field is required, must be an integer, at least 18, at most 120." If someone sends age: -5, FastAPI automatically returns an error without you writing a single if statement.

Let's add the remaining fields the same way:

    annual_income: int = Field(
        ..., gt=0,           # gt = "greater than" (strictly positive)
        description="Annual income in dollars"
    )

    debt_to_income_ratio: float = Field(
        ..., ge=0.0, le=10.0,
        description="Monthly debt / monthly income"
    )

    credit_history_length: int = Field(
        ..., ge=0, le=80,
        description="Credit history in years"
    )

    num_open_accounts: int = Field(
        ..., ge=0, le=100,
        description="Number of open credit accounts"
    )

    num_late_payments: int = Field(
        ..., ge=0,
        description="Number of late payments"
    )

    loan_amount: int = Field(
        ..., gt=0,
        description="Requested loan amount in dollars"
    )

We can also add custom business logic validation. For example, you can't have 30 years of credit history if you're 25 years old:

    @field_validator('credit_history_length')
    @classmethod
    def history_cannot_exceed_age(cls, v, info):
        if 'age' in info.data and v > info.data['age'] - 18:
            raise ValueError(
                f"Credit history ({v}y) can't exceed age minus 18"
            )
        return v

Now define what the API returns:

class PredictionResponse(BaseModel):
    prediction: int = Field(description="0 = No Default, 1 = Default")
    probability_of_default: float = Field(description="Probability from 0.0 to 1.0")
    risk_category: str = Field(description="Low, Medium, or High")


class HealthResponse(BaseModel):
    status: str
    model_loaded: bool
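To see this validation in action outside FastAPI, here's a minimal sketch (the Applicant class is a cut-down stand-in for CreditApplication, not the real schema):

```python
from pydantic import BaseModel, Field, ValidationError

class Applicant(BaseModel):
    age: int = Field(..., ge=18, le=120)

# Valid input passes through untouched
print(Applicant(age=35).age)  # 35

# Invalid input raises ValidationError — no manual if-statements needed
try:
    Applicant(age=-5)  # violates ge=18
    outcome = "accepted"
except ValidationError:
    outcome = "rejected"
print(outcome)  # rejected
```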

4.5 — The FastAPI application

Now we wire everything together. Create app/main.py:

Start with imports:

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
import pandas as pd
import time
import logging
import json
from datetime import datetime, timezone

from app.schemas import CreditApplication, PredictionResponse, HealthResponse
from app.model_loader import load_model, get_model

Set up logging (we'll use this for monitoring later):

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_scoring_api")

Define what happens when the app starts up. This is where we load the model once:

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Runs at startup (before yield) and shutdown (after yield)."""
    logger.info("Starting up — loading model...")
    load_model()
    logger.info("Model loaded. Ready to serve predictions.")
    yield
    logger.info("Shutting down.")

What's @asynccontextmanager? It's a decorator from Python's contextlib. FastAPI uses it to define the app's lifespan: everything before yield runs at startup, everything after yield runs at shutdown. The model gets loaded once, right at startup.
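Stripped of FastAPI, the same pattern looks like this (all names here are invented for the demo):

```python
import asyncio
from contextlib import asynccontextmanager

events = []

@asynccontextmanager
async def lifespan_demo():
    events.append("startup")    # before yield: runs once, before serving anything
    yield
    events.append("shutdown")   # after yield: runs once, when the app stops

async def main():
    async with lifespan_demo():
        events.append("serving")  # the app handles requests in here

asyncio.run(main())
print(events)  # ['startup', 'serving', 'shutdown']
```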

Create the app:

app = FastAPI(
    title="Credit Scoring API",
    description="Predict loan default probability",
    version="1.0.0",
    lifespan=lifespan,
)

Add a health check endpoint. Every production API needs one — it's how load balancers and monitoring tools know the service is alive:

@app.get("/health", response_model=HealthResponse)
async def health_check():
    try:
        model = get_model()
        return HealthResponse(status="healthy", model_loaded=True)
    except RuntimeError:
        return HealthResponse(status="unhealthy", model_loaded=False)

Now the star of the show — the prediction endpoint:

@app.post("/predict", response_model=PredictionResponse)
async def predict(application: CreditApplication):
    start_time = time.time()

    try:
        model = get_model()

Convert the validated input into a DataFrame (what scikit-learn expects):

        input_data = pd.DataFrame([{
            'age': application.age,
            'annual_income': application.annual_income,
            'debt_to_income_ratio': application.debt_to_income_ratio,
            'credit_history_length': application.credit_history_length,
            'num_open_accounts': application.num_open_accounts,
            'num_late_payments': application.num_late_payments,
            'loan_amount': application.loan_amount,
        }])

Get the prediction and probability:

        prediction = int(model.predict(input_data)[0])
        probability = float(model.predict_proba(input_data)[0][1])

What's predict_proba()? It returns probabilities instead of just 0/1. For example, [0.82, 0.18] means 82% chance of no default, 18% chance of default. The [0][1] grabs the probability of default (class 1).
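As a standalone illustration of that indexing (using the example numbers above, not real model output):

```python
import numpy as np

# predict_proba returns one row of class probabilities per sample:
# columns are [P(no default), P(default)]
proba = np.array([[0.82, 0.18]])  # shape (n_samples, n_classes)

print(proba[0][1])  # 0.18 — probability of default for the first (only) sample
```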

Map the probability to a human-readable risk level:

        if probability < 0.3:
            risk_category = "Low"
        elif probability < 0.6:
            risk_category = "Medium"
        else:
            risk_category = "High"

Calculate how long the prediction took, and log everything:

        inference_time_ms = (time.time() - start_time) * 1000

        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction",
            "inputs": application.model_dump(),
            "outputs": {
                "prediction": prediction,
                "probability_of_default": round(probability, 4),
                "risk_category": risk_category,
            },
            "inference_time_ms": round(inference_time_ms, 2),
        }
        logger.info(json.dumps(log_entry))

Why log all of this? These logs are gold. They'll be used later for drift detection (comparing production inputs to training data), performance monitoring (is the API getting slower?), and debugging (what happened when a wrong prediction was made?).
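As a taste of what's coming, structured JSON logs parse straight back into a DataFrame. A sketch with two made-up log lines in the same shape as log_entry:

```python
import json

import pandas as pd

# Two invented log lines, shaped like the log_entry dict above
lines = [
    '{"event": "prediction", "inputs": {"age": 35, "loan_amount": 15000}, "inference_time_ms": 4.2}',
    '{"event": "prediction", "inputs": {"age": 52, "loan_amount": 8000}, "inference_time_ms": 3.8}',
]

# Parse each line, then flatten inputs + latency into one row per prediction
records = [json.loads(line) for line in lines]
df = pd.DataFrame(
    [{**r["inputs"], "latency_ms": r["inference_time_ms"]} for r in records]
)
print(df.shape)  # (2, 3)
```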

Return the response:

        return PredictionResponse(
            prediction=prediction,
            probability_of_default=round(probability, 4),
            risk_category=risk_category,
        )

    except Exception as e:
        logger.error(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction_error",
            "error": str(e),
            "inputs": application.model_dump(),
        }))
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")

4.6 — Test it locally

pip install fastapi uvicorn scikit-learn joblib pandas pydantic

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Open http://localhost:8000/docs — FastAPI auto-generates interactive Swagger documentation:

Swagger UI
FastAPI generates this Swagger UI automatically. You can test your API right from the browser.

Test with curl:

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "age": 35,
    "annual_income": 55000,
    "debt_to_income_ratio": 0.35,
    "credit_history_length": 12,
    "num_open_accounts": 5,
    "num_late_payments": 2,
    "loan_amount": 15000
  }'

4.7 — Full API code (copy-paste ready)

Click to expand: app/model_loader.py (complete)

# app/model_loader.py

import joblib
import os

_model = None

def load_model():
    global _model
    model_path = os.environ.get("MODEL_PATH", "model/credit_model.pkl")
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"Model not found at {model_path}")
    _model = joblib.load(model_path)
    print(f"Model loaded from {model_path}")
    return _model

def get_model():
    if _model is None:
        raise RuntimeError("Model not loaded! Call load_model() first.")
    return _model

Click to expand: app/schemas.py (complete)

# app/schemas.py

from pydantic import BaseModel, Field, field_validator

class CreditApplication(BaseModel):
    age: int = Field(..., ge=18, le=120, description="Applicant's age")
    annual_income: int = Field(..., gt=0, description="Annual income ($)")
    debt_to_income_ratio: float = Field(..., ge=0.0, le=10.0, description="DTI ratio")
    credit_history_length: int = Field(..., ge=0, le=80, description="Credit history (years)")
    num_open_accounts: int = Field(..., ge=0, le=100, description="Open accounts")
    num_late_payments: int = Field(..., ge=0, description="Late payments")
    loan_amount: int = Field(..., gt=0, description="Loan amount ($)")

    @field_validator('credit_history_length')
    @classmethod
    def history_cannot_exceed_age(cls, v, info):
        if 'age' in info.data and v > info.data['age'] - 18:
            raise ValueError(f"Credit history ({v}y) can't exceed age minus 18")
        return v

class PredictionResponse(BaseModel):
    prediction: int = Field(description="0=No Default, 1=Default")
    probability_of_default: float = Field(description="Probability 0.0 to 1.0")
    risk_category: str = Field(description="Low, Medium, or High")

class HealthResponse(BaseModel):
    status: str
    model_loaded: bool

Click to expand: app/main.py (complete)

# app/main.py

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
import pandas as pd
import time
import logging
import json
from datetime import datetime, timezone

from app.schemas import CreditApplication, PredictionResponse, HealthResponse
from app.model_loader import load_model, get_model

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_scoring_api")

@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("Starting up — loading model...")
    load_model()
    logger.info("Model loaded. Ready.")
    yield
    logger.info("Shutting down.")

app = FastAPI(
    title="Credit Scoring API",
    description="Predict loan default probability",
    version="1.0.0",
    lifespan=lifespan,
)

@app.get("/health", response_model=HealthResponse)
async def health_check():
    try:
        model = get_model()
        return HealthResponse(status="healthy", model_loaded=True)
    except RuntimeError:
        return HealthResponse(status="unhealthy", model_loaded=False)

@app.post("/predict", response_model=PredictionResponse)
async def predict(application: CreditApplication):
    start_time = time.time()
    try:
        model = get_model()
        input_data = pd.DataFrame([{
            'age': application.age,
            'annual_income': application.annual_income,
            'debt_to_income_ratio': application.debt_to_income_ratio,
            'credit_history_length': application.credit_history_length,
            'num_open_accounts': application.num_open_accounts,
            'num_late_payments': application.num_late_payments,
            'loan_amount': application.loan_amount,
        }])
        prediction = int(model.predict(input_data)[0])
        probability = float(model.predict_proba(input_data)[0][1])

        if probability < 0.3:
            risk_category = "Low"
        elif probability < 0.6:
            risk_category = "Medium"
        else:
            risk_category = "High"

        inference_time_ms = (time.time() - start_time) * 1000

        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction",
            "inputs": application.model_dump(),
            "outputs": {
                "prediction": prediction,
                "probability_of_default": round(probability, 4),
                "risk_category": risk_category,
            },
            "inference_time_ms": round(inference_time_ms, 2),
        }
        logger.info(json.dumps(log_entry))

        return PredictionResponse(
            prediction=prediction,
            probability_of_default=round(probability, 4),
            risk_category=risk_category,
        )
    except Exception as e:
        logger.error(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "prediction_error",
            "error": str(e),
            "inputs": application.model_dump(),
        }))
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")
Commit the API code:

git add app/
git commit -m "feat: implement FastAPI prediction API with validation and logging"

5. Writing Automated Tests

Why test?

Imagine you change one line of code that accidentally breaks input validation. Without tests, this bug goes to production. With tests in your CI/CD pipeline, the bug gets caught before it ever reaches users.

We'll write two kinds:

  • Unit tests: test individual functions ("does the model return a valid prediction?")
  • Integration tests: test the full flow ("does the API endpoint respond correctly?")

5.1 — Setting up the test client

FastAPI provides a TestClient that simulates HTTP requests without starting a real server:

# tests/test_api.py

import pytest
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

What's TestClient? It's a fake browser. You can send GET and POST requests to your API and check the responses, all without starting a server. Tests run in milliseconds.

5.2 — Test the health endpoint

The simplest test first:

def test_health_returns_200():
    response = client.get("/health")
    assert response.status_code == 200

def test_health_reports_model_loaded():
    response = client.get("/health")
    data = response.json()
    assert data["status"] == "healthy"
    assert data["model_loaded"] is True

What's assert? It means "this must be true, or the test fails." assert response.status_code == 200 says "the server must return HTTP 200 (OK)."
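A minimal standalone sketch of that behavior (the status codes here are just example values):

```python
# A passing assert does nothing — execution simply continues
assert 2 + 2 == 4

# A failing assert raises AssertionError, optionally with a message
try:
    assert 200 == 404, "wrong status code"
    failure = None
except AssertionError as e:
    failure = str(e)

print(failure)  # wrong status code
```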

5.3 — Test valid predictions

Now let's test the actual prediction endpoint with good data:

def test_valid_prediction_returns_200():
    payload = {
        "age": 35, "annual_income": 55000,
        "debt_to_income_ratio": 0.35, "credit_history_length": 12,
        "num_open_accounts": 5, "num_late_payments": 2,
        "loan_amount": 15000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 200

Check that the response contains all expected fields:

def test_response_has_required_fields():
    payload = {
        "age": 45, "annual_income": 80000,
        "debt_to_income_ratio": 0.20, "credit_history_length": 20,
        "num_open_accounts": 3, "num_late_payments": 0,
        "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    data = response.json()

    assert "prediction" in data
    assert "probability_of_default" in data
    assert "risk_category" in data

Check that values are in expected ranges:

def test_prediction_is_binary():
    payload = {
        "age": 30, "annual_income": 40000,
        "debt_to_income_ratio": 0.50, "credit_history_length": 5,
        "num_open_accounts": 8, "num_late_payments": 4,
        "loan_amount": 20000
    }
    response = client.post("/predict", json=payload)
    data = response.json()
    assert data["prediction"] in [0, 1]

def test_probability_between_0_and_1():
    payload = {
        "age": 28, "annual_income": 35000,
        "debt_to_income_ratio": 0.60, "credit_history_length": 3,
        "num_open_accounts": 6, "num_late_payments": 5,
        "loan_amount": 25000
    }
    response = client.post("/predict", json=payload)
    data = response.json()
    assert 0.0 <= data["probability_of_default"] <= 1.0

def test_risk_category_is_valid():
    payload = {
        "age": 50, "annual_income": 100000,
        "debt_to_income_ratio": 0.10, "credit_history_length": 25,
        "num_open_accounts": 2, "num_late_payments": 0,
        "loan_amount": 5000
    }
    response = client.post("/predict", json=payload)
    data = response.json()
    assert data["risk_category"] in ["Low", "Medium", "High"]

5.4 — Test invalid inputs

This is equally important. Our API should reject bad data with a clear error (HTTP 422):

def test_negative_age_rejected():
    """Age of -5 should be rejected."""
    payload = {
        "age": -5,  # INVALID
        "annual_income": 50000, "debt_to_income_ratio": 0.30,
        "credit_history_length": 10, "num_open_accounts": 3,
        "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422  # 422 = Unprocessable Entity
def test_zero_income_rejected():
    """Income must be strictly positive."""
    payload = {
        "age": 30, "annual_income": 0,  # INVALID
        "debt_to_income_ratio": 0.30, "credit_history_length": 10,
        "num_open_accounts": 3, "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422
def test_missing_field_rejected():
    """Omitting a required field should fail."""
    payload = {
        "age": 30,
        # annual_income is MISSING
        "debt_to_income_ratio": 0.30, "credit_history_length": 10,
        "num_open_accounts": 3, "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422
def test_wrong_type_rejected():
    """Sending text where a number is expected should fail."""
    payload = {
        "age": "thirty",  # WRONG TYPE
        "annual_income": 50000, "debt_to_income_ratio": 0.30,
        "credit_history_length": 10, "num_open_accounts": 3,
        "num_late_payments": 1, "loan_amount": 10000
    }
    response = client.post("/predict", json=payload)
    assert response.status_code == 422

5.5 — Test the model directly

We also test the model itself, separately from the API:

# tests/test_model.py

import joblib
import pandas as pd

def test_model_loads():
    model = joblib.load("model/credit_model.pkl")
    assert model is not None

def test_model_has_predict():
    model = joblib.load("model/credit_model.pkl")
    assert hasattr(model, 'predict')
    assert hasattr(model, 'predict_proba')

def test_model_returns_one_prediction():
    model = joblib.load("model/credit_model.pkl")
    test_input = pd.DataFrame([{
        'age': 35, 'annual_income': 55000,
        'debt_to_income_ratio': 0.35, 'credit_history_length': 12,
        'num_open_accounts': 5, 'num_late_payments': 2,
        'loan_amount': 15000
    }])
    prediction = model.predict(test_input)
    assert len(prediction) == 1

5.6 — Run the tests

pip install pytest httpx

pytest tests/ -v

You should see all tests passing with green check marks.

git add tests/
git commit -m "feat: add unit and integration tests for API and model"

6. Containerizing with Docker

What is Docker?

Your API works on your laptop. But will it work on a server? Maybe the server has a different Python version, or a missing library.

Docker packages your application with its entire environment (OS, Python, libraries, everything) into a self-contained unit called a container. Think of it as shipping your whole laptop instead of just the code.

Docker containers package your app with everything it needs. (source: docker.com)

6.1 — The requirements file

First, list all Python dependencies. Pin the versions — without this, a library update could silently break things months later:

# requirements.txt

fastapi==0.104.0
uvicorn==0.24.0
scikit-learn==1.3.0
joblib==1.3.0
pandas==2.0.0
pydantic==2.0.0
numpy==1.24.0

6.2 — The Dockerfile, line by line

A Dockerfile is a recipe. Each line is one instruction:


FROM python:3.11-slim

What this does: Start from a minimal Python 3.11 image. The -slim variant is ~150MB instead of ~900MB. Less bloat = faster builds.

WORKDIR /app

What this does: All subsequent commands run inside /app in the container. Like doing cd /app.

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

What this does: Copy the requirements file and install dependencies. Why copy this file separately? Docker caches each step. If requirements.txt hasn't changed, Docker skips the slow pip install on rebuilds. This can save minutes.
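A companion tip, not part of the original brief: a `.dockerignore` file keeps files the image doesn't need (git history, tests, logs, caches) out of the build context, which makes `docker build` faster and keeps unrelated file changes from busting the cache. A hypothetical version for a repo laid out like this one:

```
# .dockerignore (adjust to your actual repo layout)
.git
__pycache__/
*.pyc
tests/
logs/
notebooks/
.venv/
```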

COPY app/ ./app/
COPY model/ ./model/

What this does: Copy our application code and model into the container.

EXPOSE 8000

What this does: Documents which port the container uses. It doesn't actually open the port — that's done at runtime.

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

What this does: The command that runs when the container starts. --host 0.0.0.0 means "listen on all network interfaces" — required inside a container (using localhost won't work from outside).
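One instruction our Dockerfile doesn't include, sketched here as an optional extra: a `HEALTHCHECK` lets Docker itself probe the `/health` endpoint and mark the container unhealthy if it stops responding. The probe below uses Python's stdlib, since the slim image ships without curl:

```dockerfile
# Optional: Docker marks the container unhealthy after 3 failed probes
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
```

Orchestrators like Docker Compose and Swarm can then restart or stop routing traffic to unhealthy containers automatically.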

6.3 — Build and run

# Build the image (give it a name with -t)
docker build -t credit-scoring-api .

# Run the container
# -p 8000:8000 maps your machine's port 8000 to the container's port 8000
docker run -p 8000:8000 credit-scoring-api

# Test it
curl http://localhost:8000/health

6.4 — Full Dockerfile (copy-paste ready)

Click to expand: Dockerfile (complete)

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
COPY model/ ./model/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
git add Dockerfile requirements.txt
git commit -m "feat: add Dockerfile for containerized deployment"

7. Building a CI/CD Pipeline with GitHub Actions

What is CI/CD?

  • Continuous Integration (CI): Every time you push code, tests run automatically
  • Continuous Deployment (CD): If tests pass, the code gets deployed automatically

Without CI/CD: you manually run tests (if you remember), manually build Docker, manually deploy. With CI/CD: push code → everything happens automatically. (GitHub Actions docs)

7.1 — The pipeline structure

Our pipeline has 3 stages that run in sequence:

PUSH to main
    │
    ▼
┌──────────┐     ┌──────────┐     ┌──────────┐
│   TEST   │ ──▶ │  BUILD   │ ──▶ │  DEPLOY  │
│  pytest  │     │  docker  │     │ push to  │
│          │     │  build   │     │ registry │
└──────────┘     └──────────┘     └──────────┘
     │                │                │
  If FAIL:         If FAIL:        If FAIL:
  STOP HERE        STOP HERE       STOP HERE

Key insight: If tests fail, we don't waste time building. If building fails, we don't try to deploy. Each stage only runs if the previous one succeeded.

7.2 — The YAML file, block by block

Create .github/workflows/ci-cd.yml:

First, define when the pipeline runs:

name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

This means: "Run on every push to main and on every pull request targeting main."

Stage 1 — TEST:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest httpx

      - name: Generate model
        run: python train_model.py

      - name: Run tests
        run: pytest tests/ -v --tb=short

What's happening? GitHub spins up a fresh Ubuntu machine, installs Python, installs our dependencies, trains the model (in a real project you'd download it from a model registry), and runs pytest.

Stage 2 — BUILD (only runs if Stage 1 passes):

  build:
    runs-on: ubuntu-latest
    needs: test            # <-- This is the key: "needs" means "wait for test"
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Generate model
        run: |
          pip install scikit-learn pandas joblib numpy
          python train_model.py

      - name: Build Docker image
        run: docker build -t credit-scoring-api:${{ github.sha }} .

      - name: Smoke test the container
        run: |
          docker run -d -p 8000:8000 --name test-api credit-scoring-api:${{ github.sha }}
          sleep 10
          curl --fail http://localhost:8000/health || exit 1
          docker stop test-api

What's a smoke test? We start the container and hit the health endpoint. If it doesn't respond, something is broken. curl --fail exits with a non-zero status on any HTTP error code (4xx/5xx), which fails the CI step.
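The `sleep 10` above is a guess: too short on a slow runner, wasted seconds on a fast one. A small polling helper — my own addition, not part of the original pipeline — makes the wait deterministic. In a real smoke test the probe would be a `urllib` call against `http://localhost:8000/health`:

```python
import time

def wait_until(probe, timeout=30.0, interval=0.5):
    """Call `probe()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Example: a probe that succeeds on its third attempt
attempts = {"n": 0}

def flaky_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until(flaky_probe, timeout=5, interval=0.01))  # True
```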

Stage 3 — DEPLOY (only from main branch, only if build passed):

  deploy:
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'  # Only deploy from main
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Generate model
        run: |
          pip install scikit-learn pandas joblib numpy
          python train_model.py

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}

      - name: Build and push
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/credit-scoring-api:latest .
          docker push ${{ secrets.DOCKER_USERNAME }}/credit-scoring-api:latest

What are secrets? Credentials stored securely in GitHub Settings → Secrets. They're never visible in logs. Never hardcode passwords in your code or YAML files.

git add .github/
git commit -m "feat: add CI/CD pipeline with GitHub Actions"

8. Logging Production Data

Why log everything?

Once your model is live, you're flying blind unless you collect data. Logging serves:

  1. Debugging: When something breaks at 2 AM, logs tell you what happened
  2. Drift detection: Compare production inputs against training data
  3. Performance: Track how fast (or slow) predictions are
  4. Auditing: In finance, you need records of every decision

8.1 — What to store

| What | Why | Example |
|---|---|---|
| Input features | Drift detection | age: 35, income: 55000, ... |
| Prediction | Performance monitoring | prediction: 0 |
| Probability | Score distribution | probability: 0.18 |
| Inference time | Latency monitoring | 12.3 ms |
| Timestamp | Time-based analysis | 2024-03-15T14:30:00Z |
| Errors | Debugging | ValueError: ... |

8.2 — Structured logging

We already added logging in our main.py (section 4.5). The key is using JSON format — it's machine-parseable, so monitoring tools (ELK Stack, Datadog, CloudWatch) can automatically index every field.

Here's a dedicated logger with file rotation (prevents logs from filling up the disk):

# monitoring/logger.py

import logging
import os
from logging.handlers import RotatingFileHandler
def setup_production_logger(name="credit_scoring_api", log_dir="logs"):
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    if logger.handlers:  # Avoid duplicates
        return logger

The file handler rotates logs when they reach 10 MB, keeping 5 old files:

    file_handler = RotatingFileHandler(
        filename=os.path.join(log_dir, "predictions.log"),
        maxBytes=10_000_000,  # 10 MB per file
        backupCount=5,        # Keep 5 rotated files
    )
    console_handler = logging.StreamHandler()

    # Both handlers use the same format
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(message)s')
    file_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    return logger

In production, these log files would be shipped to a centralized logging platform (Elasticsearch, Datadog, CloudWatch). For our tutorial, local files are enough — the important thing is that we're capturing the data.
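Section 8.2 stresses JSON logs, but the formatter above emits plain text. One way to bridge the gap is a custom formatter that serializes each record as a JSON object — a minimal sketch, where the field names `ts` and `logger` are my own choices, not a standard:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        entry = {"ts": self.formatTime(record), "logger": record.name}
        # Allow logger.info({...}) for structured payloads, plain strings otherwise
        if isinstance(record.msg, dict):
            entry.update(record.msg)
        else:
            entry["message"] = record.getMessage()
        return json.dumps(entry)

# Demo: log to an in-memory stream so we can inspect the output
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("demo_json")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info({"prediction": 0, "probability_of_default": 0.18, "inference_ms": 12.3})

parsed = json.loads(stream.getvalue().strip())  # machine-parseable: round-trips cleanly
print(parsed["prediction"], parsed["inference_ms"])
```

Swap the in-memory stream for the `RotatingFileHandler` above and every line in `predictions.log` becomes directly indexable by ELK, Datadog, or CloudWatch.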

git add monitoring/
git commit -m "feat: add production logging with file rotation"

9. Data Drift Detection with Evidently

What is data drift?

Your model learned patterns from training data. But the real world changes. People's financial behavior shifts due to economic events, policy changes, or seasonal patterns.

Data drift = the data your model receives in production starts looking significantly different from what it was trained on. This is your early warning system — it tells you when the model might need retraining.

When production data drifts away from training data, model performance can degrade. (source: Evidently AI)

Types of drift

| Type | What changes | Example |
|---|---|---|
| Data drift | Input distributions | Average income of applicants increases |
| Concept drift | Feature-target relationship | Same income now means higher default risk |
| Prediction drift | Output distribution | Model starts predicting more defaults |

9.1 — Load the reference data

The reference is our training data — what the model was built on:

import pandas as pd
import numpy as np

reference_data = pd.read_csv('data/reference_data.csv')
print(f"Reference data: {reference_data.shape}")

9.2 — Simulate production data with drift

In real life, this would come from your API logs. Here we simulate production data that has intentional drift — so we can see the detection in action:

np.random.seed(123)
n_production = 1000

current_data = pd.DataFrame({
    # DRIFT: younger applicants (new customer segment)
    'age': np.random.randint(18, 55, n_production),

    # DRIFT: higher incomes (economic growth)
    'annual_income': np.random.lognormal(
        mean=10.8, sigma=0.7, size=n_production  # Was 10.5
    ).astype(int),

    # DRIFT: higher debt ratios (inflation effect)
    'debt_to_income_ratio': np.random.uniform(
        0.1, 1.8, n_production  # Was 0 to 1.5
    ).round(3),

    # STABLE: no significant change
    'credit_history_length': np.random.randint(0, 30, n_production),
    'num_open_accounts': np.random.randint(1, 20, n_production),

    # SLIGHT DRIFT: more late payments
    'num_late_payments': np.random.poisson(lam=2.2, size=n_production),

    # DRIFT: higher loan amounts
    'loan_amount': np.random.randint(5000, 65000, n_production),
})

print(f"Production data: {current_data.shape}")

9.3 — Visualize the distributions

Before running statistical tests, always look at the data:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 3, figsize=(16, 12))
fig.suptitle('Reference (blue) vs Production (orange)', fontsize=14)

for idx, feature in enumerate(reference_data.columns):
    row, col = idx // 3, idx % 3
    ax = axes[row][col]
    ax.hist(reference_data[feature], bins=30, alpha=0.5,
            label='Reference', color='steelblue', density=True)
    ax.hist(current_data[feature], bins=30, alpha=0.5,
            label='Production', color='darkorange', density=True)
    ax.set_title(feature)
    ax.legend(fontsize=8)

# Hide the unused subplots (7 features in a 3x3 grid)
axes[2][1].set_visible(False)
axes[2][2].set_visible(False)
plt.tight_layout()
plt.show()

You should clearly see the distributions shifting for several features.

9.4 — Run the Evidently drift report

Evidently AI (docs) compares reference vs. current data using statistical tests and tells you which features have drifted:

from evidently.report import Report
from evidently.metrics import DatasetDriftMetric, DataDriftTable

Create and run the report:

drift_report = Report(metrics=[
    DatasetDriftMetric(),   # Overall: is there drift?
    DataDriftTable(),       # Per-feature: which ones drifted?
])

drift_report.run(
    reference_data=reference_data,
    current_data=current_data,
)

Save the interactive HTML report:

import os
os.makedirs('monitoring', exist_ok=True)
drift_report.save_html("monitoring/drift_report.html")
print("Report saved — open monitoring/drift_report.html in your browser")

9.5 — Extract results programmatically

Pretty charts are nice, but for automated monitoring you need to extract results as data:

report_dict = drift_report.as_dict()

# Overall drift result
dataset_drift = report_dict['metrics'][0]['result']
print(f"Drift detected: {'YES' if dataset_drift['dataset_drift'] else 'NO'}")
print(f"Drifted features: {dataset_drift['number_of_drifted_columns']}"
      f" / {dataset_drift['number_of_columns']}")

Per-feature breakdown:

drift_table = report_dict['metrics'][1]['result']

print(f"\n{'Feature':<28} {'Drifted?':<10} {'Score':<12} {'Test'}")
print("-" * 70)

for col, info in drift_table['drift_by_columns'].items():
    status = "YES" if info['drift_detected'] else "no"
    score = info['drift_score']
    test = info['stattest_name']
    flag = " << ALERT" if info['drift_detected'] else ""
    print(f"{col:<28} {status:<10} {score:<12.6f} {test}{flag}")
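The stattest_name column shows which test Evidently chose per feature; for numeric columns with samples this size it is commonly the two-sample Kolmogorov–Smirnov test (exact defaults depend on the Evidently version). To demystify the drift score, here is the KS statistic computed from scratch with NumPy alone — the maximum gap between the two empirical CDFs:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: max distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    all_vals = np.concatenate([a, b])
    # ECDF of each sample evaluated at every observed point
    cdf_a = np.searchsorted(a, all_vals, side="right") / len(a)
    cdf_b = np.searchsorted(b, all_vals, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
same = ks_statistic(rng.normal(0, 1, 1000), rng.normal(0, 1, 1000))
shifted = ks_statistic(rng.normal(0, 1, 1000), rng.normal(0.5, 1, 1000))
print(f"no drift: {same:.3f}, drifted: {shifted:.3f}")
```

A statistic near 0 means the distributions overlap; the shifted sample produces a clearly larger value, which is exactly the signal the drift table flags.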

9.6 — Interpret the results

Running a tool is easy. The real skill is interpreting what the results mean:

if dataset_drift['dataset_drift']:
    print("""
    ACTION REQUIRED:
    1. Check model accuracy on recent production data
    2. Investigate root cause (market shift? data bug?)
    3. Retrain if performance has degraded
    4. Set up automated alerts for future drift
    """)
else:
    print("No significant drift. Continue monitoring weekly.")

And a statistical comparison table:

comparison = pd.DataFrame({
    'Feature': reference_data.columns,
    'Training Mean': reference_data.mean().round(2).values,
    'Production Mean': current_data.mean().round(2).values,
})
comparison['Shift %'] = (
    (comparison['Production Mean'] - comparison['Training Mean'])
    / comparison['Training Mean'] * 100
).round(1)

print("\n", comparison.to_string(index=False))
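To act on drift automatically instead of reading printouts, a scheduled job can turn the report into an exit code. This gate and its 50% threshold are my own illustration, not an Evidently feature — feed it the counts extracted in 9.5:

```python
import sys

def drift_gate(n_drifted, n_total, max_share=0.5):
    """Return True when the share of drifted features is acceptable."""
    return (n_drifted / n_total) <= max_share

# In practice: drift_gate(dataset_drift['number_of_drifted_columns'],
#                         dataset_drift['number_of_columns'])
ok = drift_gate(n_drifted=3, n_total=7)
print("drift gate:", "pass" if ok else "FAIL")
if not ok:
    sys.exit(1)  # non-zero exit fails the cron job / CI step and triggers an alert
```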
git add notebooks/ monitoring/
git commit -m "feat: add data drift analysis with Evidently AI"

10. Performance Optimization

The Fiverr client wanted a fast API. Let's measure and improve.

10.1 — Profile with cProfile

cProfile is Python's built-in profiler. It tells you exactly where time is spent:

import cProfile
import pstats
import io
import time
from statistics import mean, stdev

import joblib
import numpy as np
import pandas as pd
model = joblib.load("model/credit_model.pkl")

test_input = pd.DataFrame([{
    'age': 35, 'annual_income': 55000,
    'debt_to_income_ratio': 0.35, 'credit_history_length': 12,
    'num_open_accounts': 5, 'num_late_payments': 2,
    'loan_amount': 15000
}])

Profile 100 predictions to get meaningful data:

profiler = cProfile.Profile()
profiler.enable()

for _ in range(100):
    model.predict_proba(test_input)

profiler.disable()

# Print the top 10 slowest functions
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(10)
print(stream.getvalue())

10.2 — Establish a baseline

Run 1,000 predictions and measure the distribution of times:

n_iterations = 1000
times_sklearn = []

for _ in range(n_iterations):
    start = time.perf_counter()
    model.predict_proba(test_input)
    end = time.perf_counter()
    times_sklearn.append((end - start) * 1000)  # Convert to ms

print(f"Baseline (scikit-learn) over {n_iterations} iterations:")
print(f"  Mean:  {mean(times_sklearn):.3f} ms")
print(f"  Std:   {stdev(times_sklearn):.3f} ms")
print(f"  p95:   {np.percentile(times_sklearn, 95):.3f} ms")

10.3 — Optimize with ONNX Runtime

ONNX (Open Neural Network Exchange) is a standard format for ML models. ONNX Runtime is a highly optimized engine that can run models faster than native scikit-learn. (ONNX Runtime docs)

First, convert our sklearn pipeline to ONNX format:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort
import onnx
# Define input shape: [any batch size, 7 features]
initial_type = [('float_input', FloatTensorType([None, 7]))]

# Convert
onnx_model = convert_sklearn(model, initial_types=initial_type, target_opset=12)

# Save
onnx.save_model(onnx_model, "model/credit_model.onnx")
print("ONNX model saved!")

Now benchmark it:

# Create ONNX session
session = ort.InferenceSession("model/credit_model.onnx")
input_name = session.get_inputs()[0].name

# ONNX expects numpy arrays, not DataFrames
test_np = test_input.values.astype(np.float32)

times_onnx = []
for _ in range(n_iterations):
    start = time.perf_counter()
    session.run(None, {input_name: test_np})
    end = time.perf_counter()
    times_onnx.append((end - start) * 1000)

print(f"ONNX Runtime over {n_iterations} iterations:")
print(f"  Mean:  {mean(times_onnx):.3f} ms")
print(f"  Std:   {stdev(times_onnx):.3f} ms")
print(f"  p95:   {np.percentile(times_onnx, 95):.3f} ms")

10.4 — Compare and verify

speedup = mean(times_sklearn) / mean(times_onnx)
improvement = (1 - mean(times_onnx) / mean(times_sklearn)) * 100

print(f"\nComparison:")
print(f"  scikit-learn: {mean(times_sklearn):.3f} ms")
print(f"  ONNX Runtime: {mean(times_onnx):.3f} ms")
print(f"  Speedup:      {speedup:.2f}x")
print(f"  Improvement:  {improvement:.1f}%")

Critical step: verify the optimization doesn't change predictions. Speed is worthless if accuracy drops:

sklearn_proba = model.predict_proba(test_input)[0]
onnx_labels, onnx_probas = session.run(None, {input_name: test_np})

# skl2onnx wraps classifier probabilities in a ZipMap (a list of dicts) by default
onnx_proba = [onnx_probas[0][k] for k in sorted(onnx_probas[0])]

print(f"\nsklearn proba: {sklearn_proba}")
print(f"ONNX proba:    {onnx_proba}")
assert np.allclose(sklearn_proba, onnx_proba, atol=1e-4), "Outputs diverge!"
print("Predictions match: safe to deploy the optimized version.")

10.5 — Visualize the improvement

fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.hist(times_sklearn, bins=50, alpha=0.6, label='scikit-learn', color='steelblue')
ax.hist(times_onnx, bins=50, alpha=0.6, label='ONNX Runtime', color='darkorange')
ax.set_xlabel('Inference Time (ms)')
ax.set_ylabel('Frequency')
ax.set_title('Inference Time: scikit-learn vs ONNX Runtime')
ax.legend()
plt.tight_layout()
plt.show()
git add optimization/
git commit -m "feat: add performance profiling and ONNX optimization"

11. The Final Architecture

Here's what we built:

                     ┌──────────────────────┐
                     │     Developer        │
                     │  (pushes to Git)     │
                     └──────────┬───────────┘
                                │
                                ▼
                     ┌──────────────────────┐
                     │   GitHub + CI/CD     │
                     │  Test → Build → Push │
                     └──────────┬───────────┘
                                │
                                ▼
                     ┌──────────────────────┐
                     │  Docker Container    │
                     │  ┌────────────────┐  │
                     │  │  FastAPI API   │  │
                     │  │  + ML Model    │  │
                     │  │  + Logging     │  │
                     │  └────────────────┘  │
                     └──────────┬───────────┘
                                │
                     ┌──────────┴───────────┐
                     ▼                      ▼
              ┌─────────────┐      ┌────────────────┐
              │ Predictions │      │  Log Storage   │
              │ to clients  │      │  (for drift)   │
              └─────────────┘      └───────┬────────┘
                                           ▼
                                  ┌────────────────┐
                                  │ Drift Analysis │
                                  │ + Performance  │
                                  │   Monitoring   │
                                  └────────────────┘

Every component solves a real problem:

| Component | Problem It Solves |
|---|---|
| Git + GitHub | "What changed and when?" |
| FastAPI + Pydantic | "How do others use the model safely?" |
| Pytest | "Will this change break something?" |
| Docker | "It works on my machine" → works everywhere |
| GitHub Actions | "Did someone forget to run tests?" |
| JSON Logging | "What's happening in production?" |
| Evidently AI | "Is the model still relevant?" |
| ONNX Runtime | "Can we make it faster?" |

12. Let's conclude

After spending a weekend on this (and impressing the Fiverr client), here's what stuck:

Architecture: Start simple, iterate. Get a basic API working before adding Docker and CI/CD. Each layer builds on the previous one.

Performance: Load the model once at startup, never per request. This single choice can make your API 100x faster.

Testing: Test for both valid AND invalid inputs. Your API will receive data you never imagined.

Deployment: Docker eliminates "works on my machine." CI/CD makes it impossible to deploy broken code accidentally.

Monitoring: A deployed model without monitoring is a ticking time bomb. Data drift is real and it's silent.

Optimization: Always profile before optimizing. Measure, don't guess. And always verify that optimization doesn't change predictions.


Resources


If you found this useful, feel free to clap or share. I'm always happy to chat about MLOps and the messy reality of putting ML models into production.

The complete code is available on GitHub.
