Erick Mwangi Muguchia
Lending's Old Faithful: How a 1958 Breakthrough Still Holds Off the AI Rush

Series: Building an Explainable AI Underwriter

This is Part 2.

↜ Part 1: Why Explainable AI Matters in Underwriting

↝ Part 3: Human Intervention Thresholds in AI Decisions (coming)

In 1958, David Cox published the logistic regression model. That same year, the first FICO score went into production. Sixty‑six years later, a loan officer in Des Moines and a regulator in Frankfurt still trust that same equation more than any neural network. This is not nostalgia. It is a verdict on what lending actually needs.

This post continues the series on why generalized linear models (GLMs) with a logit link remain the workhorse of consumer credit, and where they finally break.


Why This Post?

Credit scoring demands interpretability and stability over raw accuracy. While neural networks can push AUC past 0.92, GLMs with a logit link still power most production systems at top banks. This post explains why, and when to break the rule. GLM is not "less AI". It is a different flavor of AI, one that prioritizes transparency.


For Everyone

The problem

Your credit decision must be explainable: to regulators (e.g. the CFPB or ECB), to applicants (adverse action notices), and to your own risk committee. A black box that says "denied" is a lawsuit waiting to happen.

The tradeoff

  • Neural net: 92% AUC (better separation)
  • GLM (logistic regression): 87% AUC (worse but good enough)

So why do top banks still use GLM?

Because in credit, a 5‑point AUC gain is worthless if you cannot:

  • Explain why a specific applicant was rejected
  • Prove no disparate impact (ECOA / Fair Lending)
  • Audit coefficient stability over 5+ years
  • Run on a single CPU core for $0.0001 per inference

What This Series Is Not

I am not against black boxes. Neural networks and gradient boosting excel at fraud detection, anomaly spotting, and computer vision. This series argues that for probability of default estimation under regulatory oversight, a transparent GLM is often the better engineering choice. Not the only choice. The better choice for that specific problem.


For Data Scientists

Training a logistic regression

From training data (X, y) with y ∈ {0,1} (default / no default):

import statsmodels.api as sm

# y: 1 = default, 0 = non-default
# statsmodels does not add an intercept automatically
X = sm.add_constant(X)
model = sm.GLM(y, X, family=sm.families.Binomial()).fit()

The model estimates:

log(p/(1-p)) = β₀ + β₁·credit_score + β₂·debt_ratio + ...

Link function: why logit, not probit?

Logit gives odds ratios – directly interpretable for business. Probit is fine but less intuitive. On real credit data (10K loans), logit consistently yields better calibration near 0% and 100% default probabilities.

Coefficient interpretation

A coefficient of 0.15 on credit_score means:

A one-point increase in credit score multiplies the odds of default by exp(0.15) ≈ 1.16.

So a 50‑point jump → odds multiplied by exp(0.15 × 50) = exp(7.5) ≈ 1808 (huge, but plausible at the risky end of the scale).

⚠️ Note: In production credit scoring models, the coefficient on credit score is usually negative, because higher scores mean lower risk of default.

This example uses a positive coefficient purely to illustrate how odds ratios work. The sign depends on how the variable is defined:

  • If credit_score is coded as a risk score (higher = riskier), the coefficient will be positive.
  • If it’s coded as a traditional credit score (higher = safer), the coefficient will be negative.

Feature engineering for credit

  • Income: log(income+1) – turns multiplicative effects into additive
  • Debt ratio: bin into [0‑10%], [10‑30%], [30%+] – captures non‑linearity
  • Age: sin(2π·age/78) + cos(2π·age/78) – cyclical life‑stage effects (78 approximates a human lifespan in years, setting the cycle length)
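A minimal sketch of these transforms, taking the bin edges and 78-year cycle from the bullets above (the input values are illustrative):

```python
import numpy as np

def engineer_features(income, debt_ratio, age):
    """Transforms from the list above; edges and cycle length are illustrative."""
    log_income = np.log(income + 1)                   # multiplicative -> additive
    debt_bin = np.digitize(debt_ratio, [0.10, 0.30])  # 0: <10%, 1: 10-30%, 2: 30%+
    age_sin = np.sin(2 * np.pi * age / 78)            # cyclical life-stage terms
    age_cos = np.cos(2 * np.pi * age / 78)
    return log_income, debt_bin, age_sin, age_cos

features = engineer_features(income=45_000, debt_ratio=0.22, age=35)
```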

For Actuaries

From probability to expected loss

If GLM gives p_default for a loan of exposure E and loss given default LGD:

EL = p_default × E × LGD
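The formula vectorizes directly over a portfolio. A sketch with a hypothetical three-loan book (all numbers illustrative):

```python
import numpy as np

p_default = np.array([0.02, 0.05, 0.10])      # PDs from the GLM
exposure  = np.array([10_000, 5_000, 2_000])  # E, in currency units
lgd       = np.array([0.45, 0.45, 0.60])      # loss given default

el = p_default * exposure * lgd               # per-loan expected loss
portfolio_el = el.sum()                       # book-level expected loss
```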

Calibration: Brier score + calibration curves

Brier score = mean squared error of probabilities. For a well‑calibrated GLM on credit data: Brier ≈ 0.03–0.08.
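One way to compute it is scikit-learn's `brier_score_loss`; the labels and probabilities below are hypothetical:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true  = np.array([0, 0, 1, 0, 1, 0, 0, 0])
y_proba = np.array([0.05, 0.10, 0.80, 0.20, 0.60, 0.05, 0.15, 0.10])

# Mean squared error between outcomes and predicted probabilities
brier = brier_score_loss(y_true, y_proba)
```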

from sklearn.calibration import calibration_curve
prob_true, prob_pred = calibration_curve(y_true, y_proba, n_bins=10)

Plot prob_true vs. prob_pred – should lie near the diagonal.

Bootstrap confidence intervals on coefficients

import numpy as np
import statsmodels.api as sm

n_boot = 1000
coefs = []
for _ in range(n_boot):
    # Resample rows with replacement (X, y as NumPy arrays)
    idx = np.random.choice(len(X), len(X), replace=True)
    boot_model = sm.GLM(y[idx], X[idx], family=sm.families.Binomial()).fit()
    coefs.append(boot_model.params)
ci_lower, ci_upper = np.percentile(coefs, [2.5, 97.5], axis=0)

Adverse selection drift detection

  • PSI (Population Stability Index) > 0.1 → feature drift
  • AUC decay > 5% over 3 months → score drift
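A possible PSI implementation, decile-binned on the baseline sample (the 0.1 threshold is the rule of thumb above; the data here is simulated):

```python
import numpy as np

def psi(expected, actual, n_bins=10):
    """Population Stability Index between baseline and current samples (sketch)."""
    edges = np.percentile(expected, np.linspace(0, 100, n_bins + 1))
    edges[0] -= 1e-9  # include the baseline minimum in the first bin
    e_cnt = np.histogram(expected, bins=edges)[0]
    # Clip out-of-range current values into the end bins
    a_cnt = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0]
    e_pct = np.clip(e_cnt / len(expected), 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_cnt / len(actual), 1e-6, None)
    return np.sum((a_pct - e_pct) * np.log(a_pct / e_pct))

rng = np.random.default_rng(42)
baseline = rng.normal(size=5_000)
drifted  = rng.normal(loc=0.5, size=5_000)  # simulated half-sigma shift
# psi(baseline, baseline) ~ 0; psi(baseline, drifted) clearly above 0.1
```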

For Executives

Regulatory alignment

  • ECOA / Fair Lending: GLM coefficients directly show whether protected class variables (or proxies like zip code) drive decisions.
  • FCRA adverse action: Must provide "specific reasons" – GLM top coefficients give those reasons (e.g. "debt ratio too high").
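One common way to derive those reason codes is to rank each feature's contribution to the applicant's log-odds of default; the feature names, coefficients, and applicant values below are hypothetical:

```python
import numpy as np

feature_names = ["credit_score", "debt_ratio", "recent_inquiries"]
coefs     = np.array([-0.8, 1.2, 0.4])   # illustrative: positive pushes toward default
applicant = np.array([-1.5, 2.0, 0.5])   # standardized values for one applicant

# Per-feature contribution to this applicant's log-odds of default
contrib = coefs * applicant
top = np.argsort(contrib)[::-1]          # most adverse contributions first

# Keep only features that actually pushed toward rejection
reasons = [feature_names[i] for i in top[:2] if contrib[i] > 0]
```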

Operational cost

  • GLM prediction: ~100 microseconds per loan on a CPU → 10,000 predictions/sec on a single core.
  • Neural net (even small): ~5‑10ms on CPU or GPU → 200‑1000/sec. At scale (millions of applications), that is infrastructure cost ×50.

Risk quantification under stress

Stress scenarios: apply shocked coefficients (e.g. double the debt ratio coefficient) and recompute portfolio expected loss.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

portfolio_el = np.sum(p_default * exposure * lgd)
# log_odds_shocked: log-odds recomputed with the shocked coefficients
stressed_el = np.sum(sigmoid(log_odds_shocked) * exposure * lgd)

Code Example

import numpy as np

# GLM coefficient interpretation (illustrative values)
intercept = 2.5
coef_credit = 0.15
coef_debt = -0.002

def odds(credit_score, debt_ratio):
    # Odds of the modeled outcome: exp(log-odds)
    log_odds = intercept + coef_credit * credit_score + coef_debt * debt_ratio
    return np.exp(log_odds)

# Approval probability
def prob_approve(credit_score, debt_ratio):
    log_odds = intercept + coef_credit * credit_score + coef_debt * debt_ratio
    return 1 / (1 + np.exp(-log_odds))

# If credit_score increases by 50 points:
# log-odds increase by 0.15 * 50 = 7.5
# odds are multiplied by exp(7.5) ≈ 1808

Artifacts to Show

  • Training data summary: feature distributions, target rate (e.g. 2% default)
  • Coefficient table with bootstrap 95% CIs:
  Feature        Coef    2.5% CI  97.5% CI
  credit_score    0.15    0.12     0.18
  debt_ratio     -0.002  -0.003    0.001
  • Calibration curve – actual vs. predicted default rate (plot)
  • Feature importance via |coef| × feature standard deviation (model‑specific rather than model‑agnostic, but cheap and production‑friendly)
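That importance measure is a one-liner; the coefficients and the tiny feature matrix below are illustrative:

```python
import numpy as np

coefs = np.array([0.15, -0.002])            # credit_score, debt_ratio
X = np.array([[620, 0.35],
              [710, 0.10],
              [580, 0.45],
              [690, 0.20]], dtype=float)    # columns in the same order as coefs

# |coef| x feature standard deviation: effect size on the log-odds scale
importance = np.abs(coefs) * X.std(axis=0)
ranking = np.argsort(importance)[::-1]      # most important feature first
```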

Where This Fits in the Series

  • Human Intervention Thresholds – When should the AI say "I'm not sure, let a human decide"?
  • Cox Proportional Hazards – Moving from static PD to time‑to‑default modeling.
  • Auditability and Traceability – Every decision logged, every coefficient justified.
  • Ethical AI in African Financial Systems – No credit bureau? Alternative data, but carefully.
  • Simulations and What‑If Scenarios – Stress testing the underwriter.
  • Limits of Fully Automated Underwriting – Where algorithms must stop.
  • Designing Human‑Centered Risk Intelligence Systems – The final synthesis.

A Note from the Author

I wrote this on a laptop with portable WiFi, far from my room, without pen or paper. No familiar aura. Only the machine and memory. If I need to capture a thought, I open another tab. The 1958 algorithm does not care about your setup. It only cares that you can explain your decision.
