If you build credit models, you probably treat FICO as your primary signal. You are not wrong, exactly, but you are almost certainly missing the highest-value improvement available to you. For your best borrowers, the ones at 720 and above, FICO is already priced in. The risk that matters in that segment is somewhere else entirely.
That somewhere else is debt-to-income ratio. And the way to see it is SHAP.
The Setup
The analysis runs on 50,000 synthetic consumer installment loans, calibrated to Lending Club historical distributions with a fixed seed (42) and a roughly 15% default rate. That calibration matters: the synthetic book mirrors real-world portfolio shapes rather than toy data, although it is still simulated and not a live portfolio.
Three models were compared: Logistic Regression, Random Forest, and Gradient Boosting. The comparison is deliberate. Any single model can produce feature importance numbers that look plausible but are artifacts of that model's structure. Running three models side by side, and then applying SHAP values across all three, lets you distinguish genuine signal from modeling quirks.
SHAP (SHapley Additive exPlanations) is the right tool here, not standard feature importance. Feature importance tells you which features the model uses most. SHAP tells you how each feature pushes each individual prediction higher or lower, with a sign. You can segment the SHAP values by any borrower characteristic, which is the only way to surface the finding described below.
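To make the "with a sign" point concrete, here is a minimal sketch using a toy linear score, where SHAP values have a closed form (weight times the feature's distance from its background mean, assuming independent features). The weights, intercept, and three background borrowers are made up for illustration and are not taken from the article's models.

```python
from statistics import mean

# Toy linear risk score; weights and data are invented for illustration only.
WEIGHTS = {"fico_score": -0.004, "dti": 0.03}
INTERCEPT = 2.0

background = [
    {"fico_score": 680, "dti": 30.0},
    {"fico_score": 720, "dti": 25.0},
    {"fico_score": 760, "dti": 35.0},
]

def score(row):
    """Linear risk score: intercept plus weighted features."""
    return INTERCEPT + sum(WEIGHTS[f] * row[f] for f in WEIGHTS)

def linear_shap(row, background):
    """SHAP values for a linear model with independent features:
    weight * (feature value - background mean of that feature)."""
    means = {f: mean(b[f] for b in background) for f in WEIGHTS}
    return {f: WEIGHTS[f] * (row[f] - means[f]) for f in WEIGHTS}

base_value = mean(score(b) for b in background)
borrower = {"fico_score": 740, "dti": 44.0}
contrib = linear_shap(borrower, background)

# Signed contributions: negative pushes the score down, positive pushes it up.
print(contrib)
# Additivity: base value plus contributions reproduces the prediction.
print(base_value + sum(contrib.values()), score(borrower))
```

For this borrower the FICO contribution is negative and the DTI contribution is positive, which is exactly the per-prediction, signed view that a single importance number cannot give you.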
The Finding
Across the full lending book, FICO dominates. It has the highest mean absolute SHAP value. This is expected. FICO is a compressed summary of payment history, credit utilization, length of history, and several other factors. Of course it predicts default.
But segment the portfolio to FICO 720 and above, and the picture changes. In that segment, DTI ratio becomes the dominant predictor, explaining roughly 38% of default variance. FICO falls behind it.
The SHAP beeswarm plot makes this concrete. For the full population, FICO values fan out widely on both sides of zero, meaning high and low FICO scores are both doing significant explanatory work. In the 720+ segment, those FICO dots compress toward zero. The DTI dots spread out instead.
Why does this happen? Prime borrowers have already passed a FICO floor. The lender screened on FICO, so FICO variance in the approved pool is low. When variance in a feature is low, that feature cannot explain much of the outcome variance. What is left? Income and debt load. A borrower with a 740 FICO and a 44% DTI is meaningfully different from a borrower with a 740 FICO and a 22% DTI, but FICO cannot see that distinction. DTI can.
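The range-restriction effect is easy to see in a quick simulation. This is a sketch under stated assumptions: the applicant pool is a hypothetical normal distribution (mean 690, standard deviation 60, clipped to 300-850), not the article's dataset. The only point is that filtering at a floor shrinks the variance of the filtered feature.

```python
import random
from statistics import pvariance

random.seed(42)  # any fixed seed works; 42 simply mirrors the article's seed

# Hypothetical applicant pool: FICO roughly normal, clipped to the 300-850 range.
applicants = [min(850, max(300, random.gauss(690, 60))) for _ in range(50_000)]

# Apply the prime floor, as a lender screening on FICO would.
prime = [fico for fico in applicants if fico >= 720]

print(f"all applicants: n = {len(applicants):,}, variance = {pvariance(applicants):,.0f}")
print(f"FICO >= 720:    n = {len(prime):,}, variance = {pvariance(prime):,.0f}")
```

The prime group's FICO variance comes out well below the full pool's, so within that group FICO has less room to separate good borrowers from bad ones.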
The practical implication is that lenders who screen only on FICO are systematically underestimating the tail risk sitting above their prime cutoff.
The Dollar Math
For a $100M consumer installment portfolio, the numbers are specific enough to put in a business case.
Tightening DTI thresholds in the 720-760 FICO band from 45% to 38% reduces annual expected losses by an estimated $600K-$900K. The assumptions: 15% portfolio default rate, 40% loss given default, and approximately 38% of defaults in this band being DTI-driven rather than FICO-driven.
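As a sanity check on scale, the stated assumptions can be turned into a back-of-envelope calculation. This sketch assumes the 15% default rate is an annual figure applied to the whole balance. It does not reproduce the $600K-$900K estimate itself, which also depends on inputs not given here (the share of balances in the 720-760 band and how many defaults the tighter threshold avoids). It only shows what fraction of total expected loss that estimate represents.

```python
PORTFOLIO_BALANCE = 100_000_000  # $100M book, from the article
DEFAULT_RATE = 0.15              # portfolio default rate, from the article (assumed annual)
LOSS_GIVEN_DEFAULT = 0.40        # loss given default, from the article

# Expected loss = exposure x probability of default x loss given default.
expected_loss = PORTFOLIO_BALANCE * DEFAULT_RATE * LOSS_GIVEN_DEFAULT

# The article's estimated annual saving from tightening DTI in the 720-760 band.
claimed_low, claimed_high = 600_000, 900_000
share_low = claimed_low / expected_loss
share_high = claimed_high / expected_loss

print(f"expected annual loss: ${expected_loss:,.0f}")
print(f"claimed saving as share of expected loss: {share_low:.0%} to {share_high:.0%}")
```

On these inputs the whole book carries about $6M of expected annual loss, so the estimated saving is on the order of 10% to 15% of it.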
The action does not require changing the FICO cutoff, and it does not decline additional applications. It re-weights the approval decision within the existing prime segment. This is a policy parameter change, not a model change. It can go into effect without a model validation cycle.
That combination (six-figure loss reduction with no new model, no approval volume impact, and no model governance overhead) is rare. Most credit improvement levers require tradeoffs.
The Regulatory Angle
There is a fair lending dimension worth paying attention to.
FICO score can act as a proxy for protected class characteristics. This is well documented. ECOA (implemented by Regulation B) and the Fair Housing Act prohibit discrimination on protected bases, and Regulation B requires lenders to give specific reasons for adverse action, so neutral, income-related factors are the easiest to defend. DTI is precisely that: it measures a borrower's actual debt load relative to income. It is not a protected characteristic, and it does not have the proxy concerns that FICO carries.
A DTI-first screening approach in the prime band is also more defensible under a fair lending examination. If a regulator asks why you denied a 725 FICO borrower, "DTI of 46% exceeds our prime-band threshold of 38%" is a cleaner answer than any explanation that depends on the FICO composite. Examiners know what goes into FICO.
This is not a hypothetical regulatory posture. The CFPB has consistently indicated in supervisory guidance that income-based factors are preferred when they are genuinely predictive. The SHAP analysis confirms that DTI is genuinely predictive in this segment.
How to Reproduce It
The full analysis runs on the Credit Risk Explorer page of the live dashboard at finance-analytics-portfolio.streamlit.app. The source is at github.com/ChunkyTortoise/finance-analytics-portfolio.
To run SHAP on the credit model locally:
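import shap
from analysis.credit_models import train_models

# df is the loan-level DataFrame from the repo, loaded beforehand
models, X_test, y_test = train_models(df)
rf = models["random_forest"]

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)

# Older shap releases return a list with one array per class; newer ones return
# a single (samples, features, classes) array. Keep the default class (1).
default_shap = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

# Segment to prime borrowers only
prime_mask = (X_test["fico_score"] >= 720).to_numpy()
shap.summary_plot(default_shap[prime_mask], X_test[prime_mask])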
The beeswarm plot that results from the segmented call is where the DTI flip becomes visible.
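If you want a number to go with the plot, you can rank features by mean absolute SHAP value for the full book and again for the prime segment. The helper below is a dependency-free sketch. `mean_abs_shap` is a hypothetical name, not part of shap or the repository, and the toy rows are invented purely to show the mechanics.

```python
def mean_abs_shap(shap_rows, feature_names, mask=None):
    """Mean |SHAP| per feature, optionally restricted to rows where mask is True.

    Returns a dict ordered from most to least influential feature.
    """
    if mask is None:
        mask = [True] * len(shap_rows)
    kept = [row for row, keep in zip(shap_rows, mask) if keep]
    if not kept:
        raise ValueError("mask selects no rows")
    totals = [0.0] * len(feature_names)
    for row in kept:
        for i, value in enumerate(row):
            totals[i] += abs(value)
    averages = {name: total / len(kept) for name, total in zip(feature_names, totals)}
    return dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))

# Made-up SHAP rows for illustration only: columns are fico_score, dti.
toy_shap = [[-0.30, 0.05], [0.25, -0.04], [-0.02, 0.20], [0.01, -0.15]]
toy_prime = [False, False, True, True]

full_rank = mean_abs_shap(toy_shap, ["fico_score", "dti"])
prime_rank = mean_abs_shap(toy_shap, ["fico_score", "dti"], toy_prime)
print(full_rank)
print(prime_rank)
```

On real output you would pass the SHAP matrix for the default class, `list(X_test.columns)`, and `prime_mask`. The function only iterates rows, so it accepts a NumPy array as readily as a list of lists.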
The Takeaway
Every analyst building a credit model should run SHAP on their high-FICO segment separately, not just on the full portfolio. The full-population feature importance is almost always dominated by FICO, which makes it easy to conclude that FICO is the only signal worth tracking. That conclusion is wrong for your prime borrowers.
DTI is the actionable variable in the segment where it matters most. The improvement is a policy change, not a modeling exercise. And it happens to be the more defensible choice under fair lending scrutiny.
The finding generalizes: any model trained on a population that has already been filtered on a key feature will systematically underweight the variables that matter within that filtered population. SHAP, segmented by the filter criterion, is the fastest way to find those blind spots.
Cayman Roden is a data analyst specializing in financial services analytics. Full analysis at Finance Analytics Portfolio | GitHub