兆鹏于

Posted on Jun 29

Banking Retail & Credit AI in Practice: From RFM Models to Credit Algorithms — A Complete Code Walkthrough

#machinelearning #tutorial #ai #python

From RFM Models to Credit Scoring: Python Implementation for Banking AI

Credit approval isn't about "approve or deny" — it's about "how much can this person afford to repay." Traditional approval relies on loan officers' experience: subjective, slow, inconsistent. The first step in AI transformation is turning repayment capacity assessment from a gut call into a calculation.

I've implemented this end-to-end pipeline across multiple retail banking projects. Here's every algorithm with working Python code.

Repayment Capacity Assessment: The Foundation

The core logic chain: monthly income → subtract fixed obligations → disposable income → match monthly payment cap → reverse-calculate credit limit.

Disposable Monthly Income = After-tax Monthly Income - Rent/Mortgage - Social Insurance/Housing Fund - Minimum Living Expenses
Monthly Payment Cap = Disposable Monthly Income × DTI Threshold
Credit Limit = Monthly Payment Cap × Annuity Present Value Factor(Term, Rate)

Let me walk through an example. A customer has after-tax monthly income of 18,000 CNY, existing mortgage payment of 5,000, social insurance and housing fund deduction of 1,200, and minimum living expenses of 3,500. With a DTI threshold of 50%:

Disposable monthly income = 18,000 - 5,000 - 1,200 - 3,500 = 8,300 CNY
Monthly payment cap = 8,300 × 50% = 4,150 CNY
At 5.2% annual rate, 3-year term (36 installments), annuity present value factor ≈ 33.266
Credit limit ≈ 4,150 × 33.266 ≈ 138,054 CNY

This calculation is fully codable and auditable — the starting point for intelligent credit approval.
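To make the chain concrete, here is a minimal sketch of the calculation. The function names `annuity_pv_factor` and `credit_limit` are my own, not from any library, and the sketch assumes end-of-month payments with a monthly rate of annual rate / 12.

```python
def annuity_pv_factor(annual_rate, months):
    """Present value of 1 CNY paid at the end of each month for `months` months."""
    r = annual_rate / 12
    if r == 0:
        return float(months)  # Zero rate: no discounting
    return (1 - (1 + r) ** -months) / r


def credit_limit(income, housing, social_insurance, living,
                 dti, annual_rate, months):
    """Reverse-calculate a credit limit from after-tax monthly income."""
    disposable = income - housing - social_insurance - living
    payment_cap = max(disposable, 0) * dti  # No capacity if obligations exceed income
    return payment_cap * annuity_pv_factor(annual_rate, months)


# The customer from the example above
limit = credit_limit(18000, 5000, 1200, 3500, 0.50, 0.052, 36)
print(f"PV factor: {annuity_pv_factor(0.052, 36):.3f}")
print(f"Credit limit: {limit:,.0f} CNY")
```

The result lands at roughly 138,000 CNY. Every input is an explicit argument, so the policy levers (DTI threshold, term, rate) can be logged alongside each decision.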

Equal Installment Algorithm: From Formula to Code

Equal installment (等额本息) is the dominant repayment method in retail lending. Understanding its mathematics is a prerequisite for rate pricing, prepayment calculation, and default cost analysis.

Formula Derivation

Let $P$ be the loan principal, $r$ the monthly rate (annual rate / 12), $n$ the number of installments, and $M$ the monthly payment.

At the end of the $k$-th period, the remaining balance is:

$$P(1+r)^k - M[(1+r)^{k-1} + (1+r)^{k-2} + \cdots + 1]$$

Setting the balance at period $n$ to zero:

$$M = \frac{P \cdot r \cdot (1+r)^n}{(1+r)^n - 1}$$
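The step from the balance expression to the payment formula is the geometric series sum. Writing $B_k$ for the balance after period $k$ (a symbol introduced here for convenience) and assuming $r > 0$:

$$(1+r)^{k-1} + (1+r)^{k-2} + \cdots + 1 = \frac{(1+r)^k - 1}{r}$$

$$B_k = P(1+r)^k - M \cdot \frac{(1+r)^k - 1}{r}$$

$$B_n = 0 \;\Rightarrow\; M \cdot \frac{(1+r)^n - 1}{r} = P(1+r)^n$$

Solving the last line for $M$ gives the payment formula above. The closed form for $B_k$ is also useful on its own: it gives the outstanding balance at any period without building the full schedule.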

Code Implementation

def calc_equal_installment(principal, annual_rate, months):
    """
    Calculate monthly payment for equal installment loans.
    principal: loan amount
    annual_rate: annual interest rate (e.g. 0.052 for 5.2%)
    months: number of installments
    """
    r = annual_rate / 12
    if r == 0:
        return principal / months  # Zero-rate: simple split

    monthly_payment = (principal * r * (1 + r) ** months) / ((1 + r) ** months - 1)
    return monthly_payment

def amortization_schedule(principal, annual_rate, months):
    """
    Generate full amortization schedule.
    Returns principal, interest, and remaining balance for each period.
    """
    r = annual_rate / 12
    monthly_payment = calc_equal_installment(principal, annual_rate, months)
    balance = principal
    schedule = []

    for period in range(1, months + 1):
        interest = balance * r                    # Current interest = remaining balance × monthly rate
        principal_paid = monthly_payment - interest  # Current principal = payment - interest
        balance -= principal_paid                  # Remaining balance decreases
        schedule.append({
            'period': period,
            'payment': round(monthly_payment, 2),
            'principal': round(principal_paid, 2),
            'interest': round(interest, 2),
            'remaining_balance': round(max(balance, 0), 2)
        })
    return schedule

# Example: 500K consumer loan, 5.2% annual rate, 36 months
result = amortization_schedule(500000, 0.052, 36)
print(f"Monthly payment: {result[0]['payment']} CNY")
print(f"Period 1 - Interest: {result[0]['interest']}, Principal: {result[0]['principal']}")
print(f"Total interest: {sum(r['interest'] for r in result):.2f} CNY")

Prepayment Calculation

def prepayment_savings(principal, annual_rate, months, prepay_month, prepay_amount):
    """
    Calculate interest savings from prepayment.
    prepay_month: the period when prepayment occurs
    prepay_amount: additional principal paid
    """
    schedule = amortization_schedule(principal, annual_rate, months)
    # Original total interest
    total_interest_original = sum(r['interest'] for r in schedule)
    # After prepayment the balance drops; the term is kept and the payment is recalculated
    balance_at_prepay = schedule[prepay_month - 1]['remaining_balance'] - prepay_amount
    remaining_months = months - prepay_month
    if balance_at_prepay <= 0 or remaining_months == 0:
        # Loan is fully repaid at prepayment: no further interest accrues
        new_schedule = []
    else:
        new_schedule = amortization_schedule(balance_at_prepay, annual_rate, remaining_months)
    total_interest_new = sum(r['interest'] for r in schedule[:prepay_month]) + \
                         sum(r['interest'] for r in new_schedule)
    savings = total_interest_original - total_interest_new
    return round(savings, 2)

# Example: Prepay 50K at period 12
saving = prepayment_savings(500000, 0.052, 36, 12, 50000)
print(f"Prepayment interest savings: {saving} CNY")

Key insight: With equal installment loans, interest proportion is high early on and principal proportion is low. Prepayment savings diminish over time — maximum benefit in the first period, negligible in the last few.
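A quick way to see the timing effect is to hold the prepayment amount fixed and vary the period. The sketch below is self-contained and uses closed-form balances rather than the schedule, so its figures can differ by a few cents from the schedule-based function. The helper names are illustrative, and it assumes the original end date is kept and the prepayment happens before the final period.

```python
def level_payment(balance, r, n):
    """Monthly payment that repays `balance` over n periods at monthly rate r."""
    if r == 0:
        return balance / n
    growth = (1 + r) ** n
    return balance * r * growth / (growth - 1)


def balance_after(principal, r, n, k):
    """Outstanding balance after k scheduled payments (closed form)."""
    m = level_payment(principal, r, n)
    if r == 0:
        return principal - m * k
    growth_k = (1 + r) ** k
    return principal * growth_k - m * (growth_k - 1) / r


def interest_saved(principal, annual_rate, months, prepay_month, prepay_amount):
    """Interest saved by a lump-sum prepayment, keeping the original end date."""
    r = annual_rate / 12
    remaining = months - prepay_month
    old_balance = balance_after(principal, r, months, prepay_month)
    new_balance = old_balance - prepay_amount
    # Interest over the remaining term = total payments - principal repaid
    old_interest = level_payment(old_balance, r, remaining) * remaining - old_balance
    new_interest = level_payment(new_balance, r, remaining) * remaining - new_balance
    return old_interest - new_interest


for month in (1, 12, 24, 30):
    saved = interest_saved(500_000, 0.052, 36, month, 50_000)
    print(f"Prepay 50K at period {month:>2}: saves {saved:,.2f} CNY")
```

The savings shrink as the prepayment moves later, because the prepaid principal has fewer remaining periods in which it would have accrued interest.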

Credit Scoring Model: The Engine for Automated Decisions

Credit scoring compresses multi-dimensional risk features into a single number, driving approval automation. When the score exceeds a threshold (e.g., 88 on a 0-100 scale), the system automatically approves loan applications within a preset limit, with no human intervention needed.

Scorecard Modeling Pipeline

Raw Features → WOE Encoding → Logistic Regression → Probability Mapping → Standard Score (300-850)

WOE (Weight of Evidence) transforms the good/bad ratio within each feature bin into a log-likelihood ratio:

$$WOE_i = \ln\left(\frac{Good_i / Good_{total}}{Bad_i / Bad_{total}}\right)$$

IV (Information Value) measures a feature's discriminative power:

$$IV = \sum_{i} (Good_i/Good_{total} - Bad_i/Bad_{total}) \times WOE_i$$

IV > 0.3 = strong feature, 0.1-0.3 = moderate, 0.02-0.1 = weak, < 0.02 = can be dropped.
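A small hand-checkable example helps before running this on real data. The sketch below applies the two formulas to made-up bin counts for a single feature. The counts and the `iv_strength` helper are illustrative assumptions, and the 0.02 to 0.1 band is labelled weak here, following common scorecard practice.

```python
import math

# Hypothetical counts per income bin: (good, bad)
bins = {"low": (200, 60), "mid": (500, 30), "high": (300, 10)}


def woe_iv(bins):
    """Return per-bin WOE and the feature's total IV (no smoothing)."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for name, (good, bad) in bins.items():
        dist_good = good / total_good
        dist_bad = bad / total_bad
        woe[name] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[name]
    return woe, iv


def iv_strength(iv):
    """Map IV to a rough strength label."""
    if iv < 0.02:
        return "can be dropped"
    if iv < 0.1:
        return "weak"
    if iv <= 0.3:
        return "moderate"
    return "strong"


woe, iv = woe_iv(bins)
for name, value in woe.items():
    print(f"{name}: WOE = {value:+.3f}")
print(f"IV = {iv:.3f} ({iv_strength(iv)})")
```

Negative WOE marks a bin where bad customers are over-represented (the low bin here), and positive WOE marks the opposite.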

Code Implementation

import numpy as np

def calc_woe_iv(df, feature, target):
    """
    Calculate WOE and IV for a feature.
    df: DataFrame, feature: column name, target: label column (1=bad, 0=good)
    """
    total_good = (df[target] == 0).sum()
    total_bad = (df[target] == 1).sum()

    woe_dict = {}
    iv = 0.0

    for bucket in df[feature].unique():
        n_good = ((df[feature] == bucket) & (df[target] == 0)).sum()
        n_bad = ((df[feature] == bucket) & (df[target] == 1)).sum()

        # Smoothing factor to prevent division by zero
        ratio_good = (n_good + 0.5) / total_good
        ratio_bad = (n_bad + 0.5) / total_bad
        woe = np.log(ratio_good / ratio_bad)
        woe_dict[bucket] = woe
        iv += (ratio_good - ratio_bad) * woe

    return woe_dict, iv

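def score_from_probability(prob_bad, base_score=650, pdo=50):
    """
    Map default probability to standard credit score.
    base_score: score at 1:1 odds
    pdo: points to double the odds (PDO)
    Note: the result is not clipped to the 300-850 range.
    """
    factor = pdo / np.log(2)
    offset = base_score  # Base odds are 1:1, so factor * log(1) = 0
    # Clip to avoid division by zero or log(0) at prob_bad = 0 or 1
    prob_bad = np.clip(prob_bad, 1e-6, 1 - 1e-6)
    odds = (1 - prob_bad) / prob_bad  # Good:bad odds
    score = offset + factor * np.log(odds)
    return round(score)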

def auto_approve(score, threshold=88, max_amount=500000):
    """
    Automated approval decision.
    score: credit score (0-100 scale)
    threshold: auto-approval threshold
    max_amount: maximum auto-approved amount
    """
    if score >= threshold:
        # High-score customer: auto-approve, amount linearly mapped from score
        approved_amount = max_amount * (score / 100)
        return {'decision': 'auto_approved', 'amount': round(approved_amount), 'manual_review': False}
    elif score >= 60:
        return {'decision': 'manual_review', 'amount': None, 'manual_review': True}
    else:
        return {'decision': 'rejected', 'amount': 0, 'manual_review': False}

# Example: 88-point customer, auto-approved
result = auto_approve(88)
print(f"Decision: {result['decision']}, Amount: {result['amount']} CNY")

When a score of 88 triggers auto-approval under a 500K cap (440K with the linear mapping in auto_approve), the essence is decision front-loading based on quantified risk: when model confidence is high enough, human intervention can add operational risk rather than reduce it.
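The points-to-double-odds scaling is easy to verify in isolation: each doubling of good:bad odds should add exactly `pdo` points. The sketch below is a standard-library variant of the mapping with a clip to 300-850. The `floor`, `cap`, and `eps` parameters are my assumptions, not part of the original function. The `auto_approve` example works on a separate 0-100 scale, so a 300-850 score would need its own threshold or a rescaling step, which this sketch does not pick.

```python
import math


def score_from_odds(odds_good_to_bad, base_score=650, base_odds=1.0, pdo=50,
                    floor=300, cap=850):
    """Log-odds scorecard scaling, clipped to [floor, cap]."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    raw = offset + factor * math.log(odds_good_to_bad)
    return min(max(round(raw), floor), cap)


def score_from_pd(prob_bad, eps=1e-6, **kwargs):
    """Convert a default probability to a score; eps guards against p = 0 or 1."""
    p = min(max(prob_bad, eps), 1 - eps)
    return score_from_odds((1 - p) / p, **kwargs)


for p in (0.5, 1 / 3, 0.2, 0.0):
    print(f"PD = {p:.3f} -> score {score_from_pd(p)}")
```

With these defaults the cap is reached at 16:1 odds (a PD of about 5.9%), so a production scorecard would likely anchor `base_odds` at a higher ratio to keep resolution among good customers.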

RFM + AI: Engineering Customer Segmentation

The RFM model is the classic framework for retail banking customer segmentation, quantifying customer value across three dimensions:

R (Recency): Days since last transaction — smaller = more active
F (Frequency): Transaction count in a period — higher = more engaged
M (Monetary): Transaction amount in a period — higher = more valuable

Traditional RFM Limitations and AI Enhancement

Traditional RFM uses median splits on each dimension (above = 1, below = 0) to form 8 groups. Three problems: threshold choice is too crude, equal-weight across dimensions ignores business context, and it can't discover non-linear segmentation patterns.

AI enhancement: K-Means clustering replaces hard binning, feature weighting embeds business priors, and silhouette scores automatically determine optimal cluster count.
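For reference, the traditional median-split baseline fits in a few lines, which also shows why it is crude. The customer tuples below are made up, and I flip the Recency bit so that 1 always means better, which is an assumption about how the split is usually coded.

```python
from statistics import median

# Hypothetical customers: (id, recency_days, frequency, monetary)
customers = [
    ("A", 5, 40, 120000.0),
    ("B", 90, 3, 8000.0),
    ("C", 12, 25, 15000.0),
    ("D", 200, 1, 300000.0),
    ("E", 30, 10, 50000.0),
]


def median_split_rfm(rows):
    """Assign each customer a 3-bit code (R, F, M), giving up to 8 groups."""
    r_med = median(r for _, r, _, _ in rows)
    f_med = median(f for _, _, f, _ in rows)
    m_med = median(m for _, _, _, m in rows)
    codes = {}
    for cid, r, f, m in rows:
        r_bit = 1 if r < r_med else 0  # Lower recency (more recent) is better
        f_bit = 1 if f > f_med else 0
        m_bit = 1 if m > m_med else 0
        codes[cid] = f"{r_bit}{f_bit}{m_bit}"
    return codes


codes = median_split_rfm(customers)
for cid, code in codes.items():
    print(cid, code)
```

Customers B and E land in the same group (000) despite a sixfold gap in Monetary, because anything at or below the median is treated alike. That loss of information is what clustering on the continuous values avoids.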

Code Implementation

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def compute_rfm(transactions, customer_id_col='customer_id',
                date_col='date', amount_col='amount', reference_date=None):
    """
    Compute RFM metrics from transaction records.
    transactions: transaction DataFrame (date_col must be a datetime column)
    reference_date: reference date (defaults to max transaction date + 1 day)
    """
    if reference_date is None:
        reference_date = transactions[date_col].max() + pd.Timedelta(days=1)

    rfm = transactions.groupby(customer_id_col).agg(
        Recency=(date_col, lambda x: (reference_date - x.max()).days),
        Frequency=(date_col, 'count'),
        Monetary=(amount_col, 'sum')
    ).reset_index()

    return rfm

def rfm_cluster(rfm_df, n_clusters=5, weights=None):
    """
    K-Means-based RFM customer segmentation.
    weights: business weights [R, F, M], default equal weight
    """
    rfm_df = rfm_df.copy()  # Avoid mutating the caller's DataFrame
    features = rfm_df[['Recency', 'Frequency', 'Monetary']]

    # Standardize
    scaler = StandardScaler()
    features_scaled = scaler.fit_transform(features)

    # Apply business weights (explicit None check so NumPy arrays are accepted too)
    if weights is not None:
        features_scaled = features_scaled * np.array(weights)

    # Cluster
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    rfm_df['Cluster'] = kmeans.fit_predict(features_scaled)

    # Generate business labels for each cluster
    cluster_stats = rfm_df.groupby('Cluster')[['Recency', 'Frequency', 'Monetary']].mean()
    labels = {}
    for idx, row in cluster_stats.iterrows():
        if row['Recency'] < cluster_stats['Recency'].median() and \
           row['Monetary'] > cluster_stats['Monetary'].median():
            labels[idx] = 'High-Value Active'
        elif row['Frequency'] > cluster_stats['Frequency'].median():
            labels[idx] = 'High-Freq Low-Value'
        elif row['Recency'] > cluster_stats['Recency'].median():
            labels[idx] = 'Churn Risk'
        else:
            labels[idx] = 'Growth Potential'

    rfm_df['Segment'] = rfm_df['Cluster'].map(labels)
    return rfm_df, cluster_stats

def silhouette_score_optimal(rfm_df, k_range=range(3, 9)):
    """
    Automatically select optimal cluster count using silhouette score.
    """
    from sklearn.metrics import silhouette_score
    scaler = StandardScaler()
    features = scaler.fit_transform(
        rfm_df[['Recency', 'Frequency', 'Monetary']]
    )

    scores = {}
    for k in k_range:
        km = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = km.fit_predict(features)
        scores[k] = silhouette_score(features, labels)

    best_k = max(scores, key=scores.get)
    return best_k, scores

# Business weight example: Banking M (assets) matters more
# weights = [0.8, 1.0, 1.5]  # R=0.8, F=1.0, M=1.5

RFM + K-Means isn't just swapping binning methods. It lets segmentation be driven by data shape — silhouette score auto-selects k, feature weights embed business priors. Together: "data speaks, business steers."
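What the weights actually do is stretch or shrink each axis before distances are measured. The sketch below shows the effect on two pairs of made-up customers in standardized (R, F, M) space. The points and the `weighted_distance` helper are illustrative only.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance after scaling each standardized feature by its weight."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical customers in standardized (R, F, M) space
cust_1 = (0.0, 0.0, 0.0)
cust_2 = (1.0, 0.0, 0.0)  # Differs from cust_1 only in Recency
cust_3 = (0.0, 0.0, 1.0)  # Differs from cust_1 only in Monetary

equal_weights = (1.0, 1.0, 1.0)
bank_weights = (0.8, 1.0, 1.5)  # The weights from the comment above

for label, w in (("equal", equal_weights), ("bank", bank_weights)):
    d_r = weighted_distance(cust_1, cust_2, w)
    d_m = weighted_distance(cust_1, cust_3, w)
    print(f"{label}: recency gap = {d_r:.2f}, monetary gap = {d_m:.2f}")
```

With equal weights the two gaps are the same. With the banking weights a one-standard-deviation difference in Monetary counts for nearly twice a Recency difference, so clusters split on asset size first. K-Means works on squared distances, so the effective emphasis is the square of each weight.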

Retail Marketing Intelligence: Precision Recommendation and AUM Growth

Customer segmentation is the means; precision marketing and AUM growth are the goal. Linking segmentation results with a recommendation engine creates a closed loop: segment → recommend → reach → feedback.

Segment-Driven Differentiated Strategies

Segment	Core Strategy	Recommended Products	Channel	AUM Target
High-Value Active	Retain + upgrade	Private banking, family trust	Dedicated relationship manager	Maintain stability
High-Freq Low-Value	Drive AUM up	Fund SIP, large CDs	App popup + SMS	+30%
Growth Potential	Activate + cross-sell	Credit cards, consumer loans	In-app recommendations	+50%
Churn Risk	Recall + retain	High-yield deposits, exclusive perks	Phone + WeChat	Stop decline

Item-Based Collaborative Filtering

def item_based_recommend(customer_segment, product_matrix, product_index,
                         segment_seed_products, top_n=3):
    """
    Item-similarity-based product recommendation.
    customer_segment: segment label
    product_matrix: product-feature matrix (products × features)
    product_index: dict mapping product name -> row index in product_matrix
    segment_seed_products: dict mapping segment label -> list of seed products
    top_n: number of products to recommend
    """
    from sklearn.metrics.pairwise import cosine_similarity

    # Compute product-to-product similarity
    sim_matrix = cosine_similarity(product_matrix)

    # Get seed products (highest holding rate for this segment)
    segment_products = segment_seed_products[customer_segment]

    scores = {}
    for seed in segment_products:
        seed_idx = product_index[seed]
        for prod, j in product_index.items():
            if prod not in segment_products:
                scores[prod] = scores.get(prod, 0) + sim_matrix[seed_idx][j]

    # Sort and return Top N
    recommended = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]
    return recommended

def aum_uplift_estimate(current_aum, segment, treatment_effect):
    """
    AUM uplift estimation.
    treatment_effect: incremental effect per segment (from historical A/B tests)
    """
    expected_uplift = current_aum * treatment_effect.get(segment, 0)
    return round(expected_uplift, 2)

# Per-segment estimated incremental effects (based on historical A/B tests)
segment_effects = {
    'High-Value Active': 0.03,   # 3% stabilization
    'High-Freq Low-Value': 0.30, # 30% uplift
    'Growth Potential': 0.50,    # 50% uplift
    'Churn Risk': -0.10          # -10%: expected decline still to be stopped (yields a negative estimate)
}

AUM growth isn't about doing the same thing for everyone — it's about doing different things for different segments. Growth potential customers: lift AUM. High-frequency customers: drive large tickets. Churn-risk customers: stabilize first.
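To see the similarity scoring on something small, here is a dependency-free sketch with a made-up product catalogue. The feature vectors, product names, and the `recommend` helper are all hypothetical. Scoring on product features like this is closer to content-based similarity; a holdings-based variant would use customer-by-product columns instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical product features: (risk level, liquidity, minimum ticket), scaled 0-1
products = {
    "fund_sip":     (0.6, 0.7, 0.1),
    "large_cd":     (0.1, 0.2, 0.8),
    "money_market": (0.1, 0.9, 0.1),
    "equity_fund":  (0.8, 0.6, 0.2),
}


def recommend(seeds, products, top_n=2):
    """Rank non-seed products by summed similarity to the seed products."""
    scores = {}
    for name, vec in products.items():
        if name in seeds:
            continue
        scores[name] = sum(cosine(products[s], vec) for s in seeds)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


top = recommend(["fund_sip"], products)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Seeding with the fund SIP ranks the equity fund first and the money market fund second, since both sit close to it on risk and liquidity, while the large CD scores lowest.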

Wealth Management AI: Asset Allocation to Retirement Planning

Markowitz Mean-Variance Optimization

import numpy as np
from scipy.optimize import minimize

def portfolio_optimize(expected_returns, cov_matrix, target_risk=None,
                       target_return=None):
    """
    Mean-variance portfolio optimization.
    expected_returns: expected return for each asset
    cov_matrix: return covariance matrix
    target_risk: target volatility (alternative to target_return)
    """
    n = len(expected_returns)

    def portfolio_volatility(w):
        return np.sqrt(w @ cov_matrix @ w)

    def portfolio_return(w):
        return w @ expected_returns

    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(0, 1)] * n  # No short selling

    if target_risk is not None:
        # Maximize return at given risk level
        constraints.append({
            'type': 'ineq',
            'fun': lambda w: target_risk - portfolio_volatility(w)
        })
        objective = lambda w: -portfolio_return(w)
    else:
        # Minimize risk
        objective = lambda w: portfolio_volatility(w)
        if target_return is not None:
            constraints.append({
                'type': 'eq',
                'fun': lambda w: portfolio_return(w) - target_return
            })

    result = minimize(objective, np.ones(n) / n, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    if not result.success:
        raise ValueError(f"Optimization failed: {result.message}")
    return result.x

# Example: Stock/Bond/Commodity three-asset allocation
returns = np.array([0.08, 0.04, 0.05])
cov = np.array([
    [0.04, 0.005, 0.01],
    [0.005, 0.01, 0.003],
    [0.01, 0.003, 0.02]
])
weights = portfolio_optimize(returns, cov, target_risk=0.10)
print(f"Optimal allocation: Equity {weights[0]:.1%}, Bonds {weights[1]:.1%}, Commodities {weights[2]:.1%}")
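Whatever weights the optimizer returns, it is worth recomputing return and volatility by hand before showing them to a customer. The sketch below does that in plain Python for a candidate allocation. The `candidate` weights are an arbitrary illustration, not the optimizer's output.

```python
import math


def portfolio_stats(weights, expected_returns, cov):
    """Return (expected return, volatility) for a weight vector."""
    n = len(weights)
    ret = sum(w * mu for w, mu in zip(weights, expected_returns))
    var = sum(weights[i] * cov[i][j] * weights[j]
              for i in range(n) for j in range(n))
    return ret, math.sqrt(var)


returns = [0.08, 0.04, 0.05]
cov = [
    [0.04, 0.005, 0.01],
    [0.005, 0.01, 0.003],
    [0.01, 0.003, 0.02],
]

# Hypothetical candidate: 20% equity, 60% bonds, 20% commodities
candidate = [0.2, 0.6, 0.2]
ret, vol = portfolio_stats(candidate, returns, cov)

print(f"Expected return: {ret:.2%}, volatility: {vol:.2%}")
print(f"Fully invested: {abs(sum(candidate) - 1) < 1e-9}")
print(f"Within 10% risk cap: {vol <= 0.10}")
```

This candidate sits inside the 10% volatility cap with a 5% expected return, so the constrained optimum has to do at least that well. Running the same check on the optimizer's weights catches silent convergence failures.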

Retirement Planning Model

def retirement_projection(current_age, retire_age, current_savings,
                          monthly_contribution, annual_return,
                          inflation_rate, monthly_expense_at_retire):
    """
    Retirement planning calculator.
    Returns total assets at retirement and years sustainable.
    """
    months_to_retire = (retire_age - current_age) * 12
    r = annual_return / 12
    inf = inflation_rate / 12

    # Accumulated savings: compound growth + regular end-of-month contributions
    growth = (1 + r) ** months_to_retire
    if r == 0:
        savings_at_retire = current_savings + monthly_contribution * months_to_retire
    else:
        savings_at_retire = current_savings * growth + \
            monthly_contribution * (growth - 1) / r

    # Monthly expense at retirement: today's-money figure inflated to the retirement date
    real_monthly_expense = monthly_expense_at_retire * (1 + inf) ** months_to_retire

    # Calculate sustainable years (expense held flat after retirement; capped at 50 years)
    balance = savings_at_retire
    months_sustained = 0
    while balance > 0 and months_sustained < 12 * 50:
        balance = balance * (1 + r) - real_monthly_expense
        months_sustained += 1

    return {
        'total_at_retirement': round(savings_at_retire, 0),
        'monthly_expense_retirement': round(real_monthly_expense, 0),
        'years_sustainable': round(months_sustained / 12, 1)
    }

# Example: 35-year-old customer, retiring at 60
proj = retirement_projection(
    current_age=35, retire_age=60,
    current_savings=300000,
    monthly_contribution=8000,
    annual_return=0.06,
    inflation_rate=0.03,
    monthly_expense_at_retire=8000
)
print(proj)

Wealth management AI doesn't make decisions for customers — it turns the vague "will I have enough to retire?" into a quantifiable "at current pace, your savings sustain X years." Visualization beats prediction.
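The accumulation formula in the projection is the future value of an annuity, and it can be cross-checked against a month-by-month simulation. The sketch below assumes contributions land at the end of each month, and the function names are illustrative.

```python
def future_value(current_savings, monthly_contribution, annual_return, months):
    """Closed form: lump-sum growth plus future value of level contributions."""
    r = annual_return / 12
    if r == 0:
        return current_savings + monthly_contribution * months
    growth = (1 + r) ** months
    return current_savings * growth + monthly_contribution * (growth - 1) / r


def future_value_simulated(current_savings, monthly_contribution,
                           annual_return, months):
    """Same quantity, built one month at a time."""
    r = annual_return / 12
    balance = current_savings
    for _ in range(months):
        balance = balance * (1 + r) + monthly_contribution  # Grow, then contribute
    return balance


# The 35-year-old from the example: 25 years = 300 months
closed = future_value(300_000, 8_000, 0.06, 300)
simulated = future_value_simulated(300_000, 8_000, 0.06, 300)
print(f"Closed form: {closed:,.0f} CNY")
print(f"Simulated:   {simulated:,.0f} CNY")
```

The two agree to floating-point precision. The simulated version is the one to extend when contributions or returns vary over time, since the closed form only holds for constant inputs.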

The Complete Picture: From Algorithms to Business Closed Loops

Business Scenario	Core Algorithm	Key Output	Business Value
Repayment capacity	DTI + annuity PV	Credit limit	Approval standardization
Equal installment	Annuity formula	Payment schedule	Rate pricing foundation
Credit scoring	WOE + logistic regression	Standard score → auto-approval	Scores ≥ 88 auto-approve within a 500K cap
Customer segmentation	RFM + K-Means	Segment labels	Differentiated management
Precision recommendation	Collaborative filtering + segments	Product Top N	Targeted 30-50% AUM uplift
Asset allocation	Mean-variance optimization	Portfolio weights	Max return at controlled risk
Retirement planning	Compound + inflation model	Sustainable years	Long-term financial security

These seven algorithms form three business threads:

Credit Thread: repayment assessment → equal installment → credit scoring → auto-approval
Retail Thread: RFM segmentation → precision recommendation → AUM uplift
Wealth Thread: asset allocation → retirement planning → long-term companionship

All three threads share a unified data foundation: consolidated customer profiles and transaction flows. That's banking's natural AI advantage — complete data, closed business loops, clear scenarios.

The algorithms aren't the point. The point is that each algorithm serves a specific business node, and the nodes connect through shared data and features. That's what turns "AI demos" into "AI that ships."
