From RFM Models to Credit Scoring: Python Implementation for Banking AI
Credit approval isn't about "approve or deny" — it's about "how much can this person afford to repay." Traditional approval relies on loan officers' experience: subjective, slow, inconsistent. The first step in AI transformation is turning repayment capacity assessment from a gut call into a calculation.
I've implemented this end-to-end pipeline across multiple retail banking projects. Here's every algorithm with working Python code.
Repayment Capacity Assessment: The Foundation
The core logic chain: monthly income → subtract fixed obligations → disposable income → match monthly payment cap → reverse-calculate credit limit.
Disposable Monthly Income = After-tax Monthly Income - Rent/Mortgage - Social Insurance/Housing Fund - Minimum Living Expenses
Monthly Payment Cap = Disposable Monthly Income × DTI Threshold
Credit Limit = Monthly Payment Cap × Annuity Present Value Factor(Term, Rate)
Let me walk through an example. A customer has after-tax monthly income of 18,000 CNY, existing mortgage payment of 5,000, social insurance and housing fund deduction of 1,200, and minimum living expenses of 3,500. With a DTI threshold of 50%:
- Disposable monthly income = 18,000 - 5,000 - 1,200 - 3,500 = 8,300 CNY
- Monthly payment cap = 8,300 × 50% = 4,150 CNY
- At 5.2% annual rate, 3-year term (36 installments), annuity present value factor ≈ 33.27
- Credit limit ≈ 4,150 × 33.27 ≈ 138,050 CNY (computed with the unrounded factor)
This calculation is fully codable and auditable — the starting point for intelligent credit approval.
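The chain above can be sketched in a few lines. This is a minimal illustration, not a production rule engine: the function names `annuity_pv_factor` and `credit_limit` are placeholders introduced here, and the sketch assumes payments fall at the end of each month and that a negative disposable income yields a zero limit.

```python
def annuity_pv_factor(annual_rate, months):
    """Present value of 1 CNY paid at the end of each month for `months` months."""
    r = annual_rate / 12
    if r == 0:
        return float(months)  # zero rate: no discounting
    return (1 - (1 + r) ** -months) / r


def credit_limit(after_tax_income, housing_payment, social_insurance,
                 min_living_expense, dti_threshold, annual_rate, months):
    """Reverse-calculate a credit limit from disposable monthly income."""
    disposable = (after_tax_income - housing_payment
                  - social_insurance - min_living_expense)
    # Assumption: a customer with no disposable income gets a zero limit
    payment_cap = max(disposable, 0) * dti_threshold
    return payment_cap * annuity_pv_factor(annual_rate, months)


# The worked example: 18,000 income, 5,000 mortgage, 1,200 insurance, 3,500 living
limit = credit_limit(18000, 5000, 1200, 3500, 0.50, 0.052, 36)
print(f"PV factor: {annuity_pv_factor(0.052, 36):.4f}")
print(f"Credit limit: {limit:,.0f} CNY")
```

Computing the factor in code rather than looking it up keeps the limit reproducible for any rate and term, which is what makes the rule auditable.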
Equal Installment Algorithm: From Formula to Code
Equal installment (等额本息) is the dominant repayment method in retail lending. Understanding its mathematics is a prerequisite for rate pricing, prepayment calculation, and default cost analysis.
Formula Derivation
Let $P$ be the loan principal, $r$ the monthly rate (annual rate / 12), $n$ the number of installments, and $M$ the monthly payment.
At the end of the $k$-th period, the remaining balance is:
$$P(1+r)^k - M[(1+r)^{k-1} + (1+r)^{k-2} + \cdots + 1]$$
Setting the balance at period $n$ to zero:
$$M = \frac{P \cdot r \cdot (1+r)^n}{(1+r)^n - 1}$$
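The step between the two formulas is a geometric-series sum. Writing $B_k$ for the balance after period $k$ (a symbol introduced here for convenience) and assuming $r > 0$:

$$B_k = P(1+r)^k - M\sum_{j=0}^{k-1}(1+r)^j = P(1+r)^k - M\,\frac{(1+r)^k - 1}{r}$$

$$B_n = 0 \;\Longrightarrow\; M\,\frac{(1+r)^n - 1}{r} = P(1+r)^n \;\Longrightarrow\; M = \frac{P \cdot r \cdot (1+r)^n}{(1+r)^n - 1}$$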
Code Implementation
```python
def calc_equal_installment(principal, annual_rate, months):
    """
    Calculate monthly payment for equal installment loans.
    principal: loan amount
    annual_rate: annual interest rate (e.g. 0.052 for 5.2%)
    months: number of installments
    """
    r = annual_rate / 12
    if r == 0:
        return principal / months  # Zero-rate: simple split
    monthly_payment = (principal * r * (1 + r) ** months) / ((1 + r) ** months - 1)
    return monthly_payment


def amortization_schedule(principal, annual_rate, months):
    """
    Generate full amortization schedule.
    Returns principal, interest, and remaining balance for each period.
    """
    r = annual_rate / 12
    monthly_payment = calc_equal_installment(principal, annual_rate, months)
    balance = principal
    schedule = []
    for period in range(1, months + 1):
        interest = balance * r  # Current interest = remaining balance × monthly rate
        principal_paid = monthly_payment - interest  # Current principal = payment - interest
        balance -= principal_paid  # Remaining balance decreases
        schedule.append({
            'period': period,
            'payment': round(monthly_payment, 2),
            'principal': round(principal_paid, 2),
            'interest': round(interest, 2),
            'remaining_balance': round(max(balance, 0), 2)
        })
    return schedule


# Example: 500K consumer loan, 5.2% annual rate, 36 months
result = amortization_schedule(500000, 0.052, 36)
print(f"Monthly payment: {result[0]['payment']} CNY")
print(f"Period 1 - Interest: {result[0]['interest']}, Principal: {result[0]['principal']}")
print(f"Total interest: {sum(row['interest'] for row in result):.2f} CNY")
```
Prepayment Calculation
```python
def prepayment_savings(principal, annual_rate, months, prepay_month, prepay_amount):
    """
    Calculate interest savings from prepayment.
    prepay_month: the period when prepayment occurs (1 <= prepay_month < months)
    prepay_amount: additional principal paid
    The original end date is kept and the monthly payment is recalculated.
    """
    if not 1 <= prepay_month < months:
        raise ValueError("prepay_month must be between 1 and months - 1")
    schedule = amortization_schedule(principal, annual_rate, months)
    # Original total interest
    total_interest_original = sum(row['interest'] for row in schedule)
    # After prepayment, remaining balance decreases; recalculate subsequent payments
    balance_before = schedule[prepay_month - 1]['remaining_balance']
    if prepay_amount > balance_before:
        raise ValueError("prepay_amount exceeds the remaining balance")
    balance_at_prepay = balance_before - prepay_amount
    remaining_months = months - prepay_month
    # New schedule for the remaining term
    new_schedule = amortization_schedule(balance_at_prepay, annual_rate, remaining_months)
    total_interest_new = (sum(row['interest'] for row in schedule[:prepay_month])
                          + sum(row['interest'] for row in new_schedule))
    savings = total_interest_original - total_interest_new
    return round(savings, 2)


# Example: Prepay 50K at period 12
saving = prepayment_savings(500000, 0.052, 36, 12, 50000)
print(f"Prepayment interest savings: {saving} CNY")
```
Key insight: With equal installment loans, interest proportion is high early on and principal proportion is low. Prepayment savings diminish over time — maximum benefit in the first period, negligible in the last few.
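To see the timing effect in numbers, here is a self-contained sketch that compares the same prepayment made at different periods. It assumes a positive rate and that the remaining balance is re-amortized over the original remaining term; the helper names are introduced here for illustration only.

```python
def monthly_payment(principal, monthly_rate, months):
    """Level payment for an equal installment loan (monthly_rate must be > 0)."""
    growth = (1 + monthly_rate) ** months
    return principal * monthly_rate * growth / (growth - 1)


def total_interest(principal, monthly_rate, months):
    """Total interest paid over the life of the loan."""
    return monthly_payment(principal, monthly_rate, months) * months - principal


def balance_after(principal, monthly_rate, months, k):
    """Outstanding balance right after the k-th scheduled payment."""
    m = monthly_payment(principal, monthly_rate, months)
    growth = (1 + monthly_rate) ** k
    return principal * growth - m * (growth - 1) / monthly_rate


def savings_if_prepaid_at(principal, annual_rate, months, k, prepay_amount):
    """Interest saved when prepay_amount is paid right after period k."""
    r = annual_rate / 12
    remaining = months - k
    bal = balance_after(principal, r, months, k)
    interest_without = total_interest(bal, r, remaining)
    interest_with = total_interest(bal - prepay_amount, r, remaining)
    return interest_without - interest_with


# Same 50K prepayment on a 500K, 5.2%, 36-month loan at different periods
savings_by_month = {
    k: savings_if_prepaid_at(500000, 0.052, 36, k, 50000)
    for k in (1, 12, 24, 30)
}
for k, s in savings_by_month.items():
    print(f"Prepay after period {k:>2}: saves {s:,.2f} CNY")
```

Under these assumptions the saving equals the interest the prepaid amount would have accrued over the remaining term, so it shrinks steadily as the prepayment moves later.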
Credit Scoring Model: The Engine for Automated Decisions
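Credit scoring compresses multi-dimensional risk features into a single number that drives approval automation. When the score exceeds a threshold (e.g., 88 on a 0-100 scale), the system automatically approves loan applications within a certain limit, with no human intervention needed. The pipeline below produces a 300-850 style score, so the two scales need an explicit mapping between them.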
Scorecard Modeling Pipeline
Raw Features → WOE Encoding → Logistic Regression → Probability Mapping → Standard Score (300-850)
WOE (Weight of Evidence) transforms the good/bad ratio within each feature bin into a log-likelihood ratio:
$$WOE_i = \ln\left(\frac{Good_i / Good_{total}}{Bad_i / Bad_{total}}\right)$$
IV (Information Value) measures a feature's discriminative power:
$$IV = \sum_{i} (Good_i/Good_{total} - Bad_i/Bad_{total}) \times WOE_i$$
A common rule of thumb: IV > 0.3 = strong feature, 0.1-0.3 = moderate, 0.02-0.1 = weak, < 0.02 = can be dropped.
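A small worked example makes the two formulas concrete. The bin counts below are invented for illustration (they are not real portfolio data), and no smoothing is applied because every bin has both goods and bads.

```python
import math

# Hypothetical bin counts for one feature (e.g. an income band); not real data
bins = {
    "low":    {"good": 200, "bad": 60},
    "medium": {"good": 500, "bad": 30},
    "high":   {"good": 300, "bad": 10},
}

total_good = sum(b["good"] for b in bins.values())
total_bad = sum(b["bad"] for b in bins.values())

woe = {}
iv = 0.0
for name, b in bins.items():
    good_share = b["good"] / total_good   # Good_i / Good_total
    bad_share = b["bad"] / total_bad      # Bad_i / Bad_total
    woe[name] = math.log(good_share / bad_share)
    iv += (good_share - bad_share) * woe[name]

for name, value in woe.items():
    print(f"{name:>6}: WOE = {value:+.3f}")
print(f"IV = {iv:.3f}")
```

The "low" bin gets a negative WOE because it holds a larger share of the bads than of the goods; each bin's contribution to IV is non-negative, so IV only grows as bins separate the two populations.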
Code Implementation
```python
import numpy as np


def calc_woe_iv(df, feature, target):
    """
    Calculate WOE and IV for a feature.
    df: DataFrame, feature: column name (already binned), target: label column (1=bad, 0=good)
    """
    total_good = (df[target] == 0).sum()
    total_bad = (df[target] == 1).sum()
    woe_dict = {}
    iv = 0.0
    for bucket in df[feature].unique():
        in_bucket = df[feature] == bucket
        n_good = (in_bucket & (df[target] == 0)).sum()
        n_bad = (in_bucket & (df[target] == 1)).sum()
        # Add 0.5 to each count so an empty cell never produces log(0)
        ratio_good = (n_good + 0.5) / total_good
        ratio_bad = (n_bad + 0.5) / total_bad
        woe = np.log(ratio_good / ratio_bad)
        woe_dict[bucket] = woe
        iv += (ratio_good - ratio_bad) * woe
    return woe_dict, iv


def score_from_probability(prob_bad, base_score=650, pdo=50):
    """
    Map default probability to standard credit score.
    base_score: score at 1:1 odds
    pdo: points to double the odds (PDO)
    """
    factor = pdo / np.log(2)
    offset = base_score  # base odds are 1:1, so factor * ln(1) = 0
    prob_bad = np.clip(prob_bad, 1e-6, 1 - 1e-6)  # avoid division by zero and log(0)
    odds = (1 - prob_bad) / prob_bad  # good:bad odds
    score = offset + factor * np.log(odds)
    return round(score)


def auto_approve(score, threshold=88, max_amount=500000):
    """
    Automated approval decision.
    score: credit score (0-100 scale); a score from score_from_probability
        must be rescaled to 0-100 before it is passed in
    threshold: auto-approval threshold
    max_amount: maximum auto-approved amount
    """
    if score >= threshold:
        # High-score customer: auto-approve, amount linearly mapped from score
        approved_amount = max_amount * (score / 100)
        return {'decision': 'auto_approved', 'amount': round(approved_amount), 'manual_review': False}
    elif score >= 60:
        return {'decision': 'manual_review', 'amount': None, 'manual_review': True}
    else:
        return {'decision': 'rejected', 'amount': 0, 'manual_review': False}


# Example: 88-point customer, auto-approved for 500000 × 0.88 = 440000
result = auto_approve(88)
print(f"Decision: {result['decision']}, Amount: {result['amount']} CNY")
```
When a credit score of 88 clears auto-approval under a 500K cap (the linear mapping above yields 440K), the essence is decision front-loading based on quantified risk: when model confidence is high enough, human intervention can actually increase operational risk.
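The probability-to-score mapping in this section is easier to trust once the "points to double the odds" property is checked directly. The sketch below uses the same defaults (`base_score=650`, `pdo=50`); the `base_odds` parameter and the function names are additions made here for illustration.

```python
import math


def odds_to_score(odds_good_to_bad, base_score=650, base_odds=1.0, pdo=50):
    """Score for given good:bad odds; base_score is the score at base_odds."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds_good_to_bad)


def prob_to_score(prob_bad, **kwargs):
    """Convert a default probability (0 < prob_bad < 1) to a score."""
    return odds_to_score((1 - prob_bad) / prob_bad, **kwargs)


# Each doubling of the good:bad odds should add exactly `pdo` points
for odds in (1, 2, 4, 8):
    prob_bad = 1 / (1 + odds)
    print(f"odds {odds}:1  prob_bad={prob_bad:.3f}  score={prob_to_score(prob_bad):.1f}")
```

These scores sit on a scale centred at 650, not on the 0-100 scale that the auto-approval threshold of 88 assumes, so a production system would need an agreed rescaling step between the two.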
RFM + AI: Engineering Customer Segmentation
The RFM model is the classic framework for retail banking customer segmentation, quantifying customer value across three dimensions:
- R (Recency): Days since last transaction — smaller = more active
- F (Frequency): Transaction count in a period — higher = more engaged
- M (Monetary): Transaction amount in a period — higher = more valuable
Traditional RFM Limitations and AI Enhancement
Traditional RFM uses median splits on each dimension (above = 1, below = 0) to form 8 groups. Three problems: threshold choice is too crude, equal-weight across dimensions ignores business context, and it can't discover non-linear segmentation patterns.
AI enhancement: K-Means clustering replaces hard binning, feature weighting embeds business priors, and silhouette scores automatically determine optimal cluster count.
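For contrast, the traditional median split can be written in a few lines of standard-library Python. The customer rows are invented for illustration, and the sketch follows the "above median = 1" rule literally, so for Recency a 0 means the customer transacted more recently.

```python
from statistics import median

# Hypothetical customers: (id, recency_days, frequency, monetary)
customers = [
    ("C1", 3, 42, 180000.0),
    ("C2", 45, 5, 12000.0),
    ("C3", 10, 30, 9000.0),
    ("C4", 120, 2, 300000.0),
    ("C5", 7, 18, 60000.0),
    ("C6", 60, 12, 45000.0),
]


def median_split_rfm(rows):
    """Assign each customer a 3-bit R/F/M code: 1 if above the median, else 0."""
    r_med = median(r for _, r, _, _ in rows)
    f_med = median(f for _, _, f, _ in rows)
    m_med = median(m for _, _, _, m in rows)
    codes = {}
    for cid, r, f, m in rows:
        flags = (r > r_med, f > f_med, m > m_med)
        codes[cid] = "".join("1" if flag else "0" for flag in flags)
    return codes


codes = median_split_rfm(customers)
for cid, code in codes.items():
    print(cid, code)
```

C1 and C5 land in the same group although their Monetary values differ by a factor of three, which is the crude-threshold problem described above.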
Code Implementation
```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def compute_rfm(transactions, customer_id_col='customer_id',
                date_col='date', amount_col='amount', reference_date=None):
    """
    Compute RFM metrics from transaction records.
    transactions: transaction DataFrame (date_col must be a datetime column)
    reference_date: reference date (defaults to max transaction date + 1 day)
    """
    if reference_date is None:
        reference_date = transactions[date_col].max() + pd.Timedelta(days=1)
    rfm = transactions.groupby(customer_id_col).agg(
        Recency=(date_col, lambda x: (reference_date - x.max()).days),
        Frequency=(date_col, 'count'),
        Monetary=(amount_col, 'sum')
    ).reset_index()
    return rfm


def rfm_cluster(rfm_df, n_clusters=5, weights=None):
    """
    K-Means-based RFM customer segmentation.
    weights: business weights [R, F, M], default equal weight
    """
    rfm_df = rfm_df.copy()  # do not mutate the caller's DataFrame
    features = rfm_df[['Recency', 'Frequency', 'Monetary']]
    # Standardize
    scaler = StandardScaler()
    features_scaled = scaler.fit_transform(features)
    # Apply business weights ("is not None" also works for NumPy arrays)
    if weights is not None:
        features_scaled = features_scaled * np.array(weights)
    # Cluster
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    rfm_df['Cluster'] = kmeans.fit_predict(features_scaled)
    # Generate business labels for each cluster
    cluster_stats = rfm_df.groupby('Cluster')[['Recency', 'Frequency', 'Monetary']].mean()
    medians = cluster_stats.median()
    labels = {}
    for idx, row in cluster_stats.iterrows():
        if row['Recency'] < medians['Recency'] and row['Monetary'] > medians['Monetary']:
            labels[idx] = 'High-Value Active'
        elif row['Frequency'] > medians['Frequency']:
            labels[idx] = 'High-Freq Low-Value'
        elif row['Recency'] > medians['Recency']:
            labels[idx] = 'Churn Risk'
        else:
            labels[idx] = 'Growth Potential'
    rfm_df['Segment'] = rfm_df['Cluster'].map(labels)
    return rfm_df, cluster_stats


def silhouette_score_optimal(rfm_df, k_range=range(3, 9), weights=None):
    """
    Automatically select optimal cluster count using silhouette score.
    weights: pass the same [R, F, M] weights used in rfm_cluster so that
        k is chosen in the same feature space
    """
    scaler = StandardScaler()
    features = scaler.fit_transform(
        rfm_df[['Recency', 'Frequency', 'Monetary']]
    )
    if weights is not None:
        features = features * np.array(weights)
    scores = {}
    for k in k_range:
        km = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = km.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Business weight example: Banking M (assets) matters more
# weights = [0.8, 1.0, 1.5]  # R=0.8, F=1.0, M=1.5
```
RFM + K-Means isn't just swapping binning methods. It lets segmentation be driven by data shape — silhouette score auto-selects k, feature weights embed business priors. Together: "data speaks, business steers."
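How a business weight actually "steers" the clustering is worth one concrete check. K-Means groups points by squared Euclidean distance, so multiplying a standardized dimension by a weight `w` scales that dimension's contribution by `w²`. The sketch below is a standalone illustration with made-up points, not part of the pipeline above.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance between two standardized RFM vectors after
    multiplying each dimension by its business weight."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical customers in standardized (z-score) R, F, M space
origin = (0.0, 0.0, 0.0)
one_std_monetary = (0.0, 0.0, 1.0)  # differs by one std dev in Monetary only
one_std_recency = (1.0, 0.0, 0.0)   # differs by one std dev in Recency only

weights = [0.8, 1.0, 1.5]  # R, F, M weights from the example above
d_m = weighted_distance(origin, one_std_monetary, weights)
d_r = weighted_distance(origin, one_std_recency, weights)

print(f"Monetary gap: distance {d_m:.2f}, squared {d_m ** 2:.2f}")
print(f"Recency gap:  distance {d_r:.2f}, squared {d_r ** 2:.2f}")
```

With these weights a one-standard-deviation gap in Monetary counts about 3.5 times as much as the same gap in Recency in the clustering objective (2.25 versus 0.64), so modest-looking weights have a larger effect than their face value suggests.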
Retail Marketing Intelligence: Precision Recommendation and AUM Growth
Customer segmentation is the means; precision marketing and AUM growth are the goal. Linking segmentation results with a recommendation engine creates a closed loop: segment → recommend → reach → feedback.
Segment-Driven Differentiated Strategies
| Segment | Core Strategy | Recommended Products | Channel | AUM Target |
|---|---|---|---|---|
| High-Value Active | Retain + upgrade | Private banking, family trust | Dedicated relationship manager | Maintain stability |
| High-Freq Low-Value | Drive AUM up | Fund SIP, large CDs | App popup + SMS | +30% |
| Growth Potential | Activate + cross-sell | Credit cards, consumer loans | In-app recommendations | +50% |
| Churn Risk | Recall + retain | High-yield deposits, exclusive perks | Phone + WeChat | Stop decline |
Item-Based Collaborative Filtering
```python
from sklearn.metrics.pairwise import cosine_similarity


def item_based_recommend(customer_segment, product_matrix, product_index,
                         segment_seed_products, top_n=3):
    """
    Item-similarity-based product recommendation.
    customer_segment: segment label
    product_matrix: one row per product. With product features as columns this is
        content-based similarity; with customer holdings as columns it becomes
        item-based collaborative filtering.
    product_index: dict mapping product name -> row index in product_matrix
    segment_seed_products: dict mapping segment label -> list of seed product names
    top_n: number of products to recommend
    """
    # Compute product-to-product similarity
    sim_matrix = cosine_similarity(product_matrix)
    # Get seed products (highest holding rate for this segment)
    segment_products = segment_seed_products[customer_segment]
    scores = {}
    for seed in segment_products:
        seed_idx = product_index[seed]
        # Use the stored row index rather than the dict's iteration order
        for prod, j in product_index.items():
            if prod not in segment_products:
                scores[prod] = scores.get(prod, 0) + sim_matrix[seed_idx][j]
    # Sort and return Top N
    recommended = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]
    return recommended


def aum_uplift_estimate(current_aum, segment, treatment_effect):
    """
    AUM uplift estimation.
    treatment_effect: incremental effect per segment (from historical A/B tests)
    """
    expected_uplift = current_aum * treatment_effect.get(segment, 0)
    return round(expected_uplift, 2)


# Per-segment estimated incremental effects (based on historical A/B tests)
segment_effects = {
    'High-Value Active': 0.03,    # 3% stabilization
    'High-Freq Low-Value': 0.30,  # 30% uplift
    'Growth Potential': 0.50,     # 50% uplift
    'Churn Risk': -0.10           # 10% decline the campaign aims to stop (yields a negative uplift)
}
```
AUM growth isn't about doing the same thing for everyone — it's about doing different things for different segments. Growth potential customers: lift AUM. High-frequency customers: drive large tickets. Churn-risk customers: stabilize first.
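A quick roll-up shows how per-segment effects translate into a book-level number. The effect sizes are the ones listed in this section; the customer balances are invented purely for illustration, and `uplift_by_segment` is a helper name introduced here.

```python
from collections import defaultdict

# Effect sizes from this section; a negative value means expected decline
segment_effects = {
    "High-Value Active": 0.03,
    "High-Freq Low-Value": 0.30,
    "Growth Potential": 0.50,
    "Churn Risk": -0.10,
}

# Hypothetical book: (segment, current AUM in CNY)
book = [
    ("High-Value Active", 5_000_000),
    ("High-Freq Low-Value", 200_000),
    ("High-Freq Low-Value", 100_000),
    ("Growth Potential", 80_000),
    ("Churn Risk", 400_000),
]


def uplift_by_segment(customers, effects):
    """Sum expected AUM change per segment; unknown segments contribute zero."""
    totals = defaultdict(float)
    for segment, aum in customers:
        totals[segment] += aum * effects.get(segment, 0.0)
    return dict(totals)


totals = uplift_by_segment(book, segment_effects)
for segment, change in totals.items():
    print(f"{segment:<20} {change:>+12,.0f} CNY")
print(f"{'Net':<20} {sum(totals.values()):>+12,.0f} CNY")
```

In this made-up book the small percentage on the high-value segment still dominates in absolute terms, which is why the strategy table treats that segment as "maintain stability" rather than a growth target.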
Wealth Management AI: Asset Allocation to Retirement Planning
Markowitz Mean-Variance Optimization
```python
import numpy as np
from scipy.optimize import minimize


def portfolio_optimize(expected_returns, cov_matrix, target_risk=None,
                       target_return=None):
    """
    Mean-variance portfolio optimization.
    expected_returns: expected return for each asset
    cov_matrix: return covariance matrix
    target_risk: target volatility; if given, maximize return under this cap
    target_return: required return; used only when target_risk is None,
        in which case volatility is minimized
    """
    n = len(expected_returns)

    def portfolio_volatility(w):
        return np.sqrt(w @ cov_matrix @ w)

    def portfolio_return(w):
        return w @ expected_returns

    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(0, 1)] * n  # No short selling
    if target_risk is not None:
        # Maximize return at given risk level
        constraints.append({
            'type': 'ineq',
            'fun': lambda w: target_risk - portfolio_volatility(w)
        })
        objective = lambda w: -portfolio_return(w)
    else:
        # Minimize risk
        objective = lambda w: portfolio_volatility(w)
        if target_return is not None:
            constraints.append({
                'type': 'eq',
                'fun': lambda w: portfolio_return(w) - target_return
            })
    result = minimize(objective, np.ones(n) / n, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    if not result.success:
        raise ValueError(f"Optimization failed: {result.message}")
    return result.x


# Example: Stock/Bond/Commodity three-asset allocation
returns = np.array([0.08, 0.04, 0.05])
cov = np.array([
    [0.04, 0.005, 0.01],
    [0.005, 0.01, 0.003],
    [0.01, 0.003, 0.02]
])
weights = portfolio_optimize(returns, cov, target_risk=0.10)
print(f"Optimal allocation: Equity {weights[0]:.1%}, Bonds {weights[1]:.1%}, Commodities {weights[2]:.1%}")
```
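Written out in symbols, the problem the code above hands to the solver looks like this. The notation is introduced here: $w$ is the weight vector, $\mu$ the expected returns, $\Sigma$ the covariance matrix, $\sigma_{target}$ the volatility cap and $\mu_{target}$ the required return. When `target_risk` is given:

$$\max_{w} \; w^\top \mu \quad \text{s.t.} \quad \sqrt{w^\top \Sigma w} \le \sigma_{target}, \quad \sum_i w_i = 1, \quad 0 \le w_i \le 1$$

Otherwise volatility is minimized, with the return constraint applied only if `target_return` is supplied:

$$\min_{w} \; \sqrt{w^\top \Sigma w} \quad \text{s.t.} \quad w^\top \mu = \mu_{target}, \quad \sum_i w_i = 1, \quad 0 \le w_i \le 1$$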
Retirement Planning Model
```python
def retirement_projection(current_age, retire_age, current_savings,
                          monthly_contribution, annual_return,
                          inflation_rate, monthly_expense_at_retire):
    """
    Retirement planning calculator.
    monthly_expense_at_retire: desired monthly spending in today's money;
        it is inflated to the retirement date and then held flat
    Returns total assets at retirement and years sustainable (capped at 50).
    """
    months_to_retire = (retire_age - current_age) * 12
    r = annual_return / 12
    monthly_inflation = inflation_rate / 12
    growth = (1 + r) ** months_to_retire
    # Accumulated savings: compound growth + regular contributions
    if r == 0:
        savings_at_retire = current_savings + monthly_contribution * months_to_retire
    else:
        savings_at_retire = current_savings * growth + \
            monthly_contribution * (growth - 1) / r
    # Monthly expense at the retirement date (inflated from today's money)
    expense_at_retire = monthly_expense_at_retire * (1 + monthly_inflation) ** months_to_retire
    # Calculate sustainable years
    balance = savings_at_retire
    months_sustained = 0
    while balance > 0 and months_sustained < 12 * 50:
        balance = balance * (1 + r) - expense_at_retire
        months_sustained += 1
    return {
        'total_at_retirement': round(savings_at_retire, 0),
        'monthly_expense_retirement': round(expense_at_retire, 0),
        'years_sustainable': round(months_sustained / 12, 1)
    }


# Example: 35-year-old customer, retiring at 60
# With these inputs the monthly return on the balance exceeds the monthly
# expense, so years_sustainable reaches the 50-year cap.
proj = retirement_projection(
    current_age=35, retire_age=60,
    current_savings=300000,
    monthly_contribution=8000,
    annual_return=0.06,
    inflation_rate=0.03,
    monthly_expense_at_retire=8000
)
print(proj)
```
Wealth management AI doesn't make decisions for customers — it turns the vague "will I have enough to retire?" into a quantifiable "at current pace, your savings sustain X years." Visualization beats prediction.
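The accumulation phase relies on a closed-form future-value formula, and a month-by-month loop is a cheap way to confirm it. The sketch below is standalone and assumes contributions are made at the end of each month; both function names are introduced here for illustration.

```python
def future_value_closed_form(current_savings, monthly_contribution,
                             monthly_rate, months):
    """Closed-form future value with end-of-month contributions."""
    if monthly_rate == 0:
        return current_savings + monthly_contribution * months
    growth = (1 + monthly_rate) ** months
    return (current_savings * growth
            + monthly_contribution * (growth - 1) / monthly_rate)


def future_value_simulated(current_savings, monthly_contribution,
                           monthly_rate, months):
    """Same quantity, computed by compounding one month at a time."""
    balance = current_savings
    for _ in range(months):
        balance = balance * (1 + monthly_rate) + monthly_contribution
    return balance


# The example inputs: 300K saved, 8K per month, 6% annual return, 25 years
closed = future_value_closed_form(300000, 8000, 0.06 / 12, 25 * 12)
simulated = future_value_simulated(300000, 8000, 0.06 / 12, 25 * 12)
print(f"Closed form: {closed:,.0f} CNY")
print(f"Simulated:   {simulated:,.0f} CNY")
```

The loop version is also the easier one to extend, for example to contributions that grow with salary, where no simple closed form applies.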
The Complete Picture: From Algorithms to Business Closed Loops
| Business Scenario | Core Algorithm | Key Output | Business Value |
|---|---|---|---|
| Repayment capacity | DTI + annuity PV | Credit limit | Approval standardization |
| Equal installment | Annuity formula | Payment schedule | Rate pricing foundation |
| Credit scoring | WOE + logistic regression | Standard score → auto-approval | Score ≥ 88 auto-approves within a 500K cap |
| Customer segmentation | RFM + K-Means | Segment labels | Differentiated management |
| Precision recommendation | Collaborative filtering + segments | Product Top N | Targeted 30-50% AUM uplift |
| Asset allocation | Mean-variance optimization | Portfolio weights | Max return at controlled risk |
| Retirement planning | Compound + inflation model | Sustainable years | Long-term financial security |
These seven algorithms form three business threads:
- Credit Thread: repayment assessment → equal installment → credit scoring → auto-approval
- Retail Thread: RFM segmentation → precision recommendation → AUM uplift
- Wealth Thread: asset allocation → retirement planning → long-term companionship
All three threads share a unified data foundation: consolidated customer profiles and transaction flows. That's banking's natural AI advantage — complete data, closed business loops, clear scenarios.
The algorithms aren't the point. The point is that each algorithm serves a specific business node, and the nodes connect through shared data and features. That's what turns "AI demos" into "AI that ships."