Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Customer Segmentation Toolkit

Production-ready Python toolkit for RFM analysis, customer lifetime value (CLV) calculation, churn prediction, and targeted campaign generation. Transform raw transaction data into actionable customer segments.

Key Features

  • RFM Scoring Engine — Recency, Frequency, Monetary scoring with configurable quintile boundaries
  • CLV Calculator — Historical and predictive lifetime value using BG/NBD-inspired models
  • Churn Prediction — Rule-based churn risk scoring with configurable thresholds
  • Campaign Targeting — Auto-generate segment-specific campaign lists with export support
  • SQL-First Approach — Includes ready-to-run SQL queries for common data warehouses
  • Configurable Segments — Define custom segment labels, score weights, and tier boundaries

Quick Start

# 1. Extract and enter the project
unzip customer-segmentation-toolkit.zip
cd customer-segmentation-toolkit

# 2. Copy and edit configuration
cp config.example.yaml config.yaml

# 3. Run the segmentation pipeline
python -m customer_segmentation_toolkit.core --config config.yaml

Architecture

src/customer_segmentation_toolkit/
├── __init__.py          # Package init, version info
├── core.py              # Main pipeline: load → score → segment → export
└── utils.py             # Date math, percentile bucketing, CSV I/O helpers

Data Flow: Raw Transactions → RFM Scores → Segment Assignment → Campaign Lists
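
The last stage of that flow, turning segment assignments into campaign lists, can be sketched in a few lines. This is only an illustration: `build_campaign_lists` and `campaign_list_to_csv` are hypothetical helper names, not functions the toolkit is known to expose, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def build_campaign_lists(scored_customers):
    """Group customer IDs by segment label (hypothetical helper)."""
    lists = defaultdict(list)
    for row in scored_customers:
        lists[row["segment_label"]].append(row["customer_id"])
    return dict(lists)


def campaign_list_to_csv(segment_label, customer_ids):
    """Render one campaign list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["segment_label", "customer_id"])
    for customer_id in customer_ids:
        writer.writerow([segment_label, customer_id])
    return buffer.getvalue()


# Made-up sample rows shaped like the scorer output shown in this post
scored = [
    {"customer_id": "C-1001", "segment_label": "Champions"},
    {"customer_id": "C-1002", "segment_label": "At Risk"},
    {"customer_id": "C-1004", "segment_label": "Champions"},
]
campaigns = build_campaign_lists(scored)
print(campaign_list_to_csv("Champions", campaigns["Champions"]))
```

Writing to an in-memory buffer keeps the sketch free of file-system side effects; swapping `io.StringIO()` for an open file handle would produce the CSV files an email platform can import.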

Usage Examples

RFM Scoring

from customer_segmentation_toolkit.core import RFMScorer

scorer = RFMScorer(
    recency_bins=5,
    frequency_bins=5,
    monetary_bins=5,
    reference_date="2026-03-23"
)

# transactions: list of dicts with customer_id, order_date, order_total
segments = scorer.score(transactions)

for segment in segments[:3]:
    print(f"Customer {segment['customer_id']}: "
          f"RFM={segment['r_score']}{segment['f_score']}{segment['m_score']} "
          f"→ {segment['segment_label']}")
# Customer C-1001: RFM=544 → Champions
# Customer C-1002: RFM=155 → At Risk
# Customer C-1003: RFM=311 → Needs Attention
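
To make the binning idea concrete, here is a minimal rank-based sketch of how a metric could be turned into a 1-to-N score. It is an assumption about the general technique, not the toolkit's actual implementation; `quantile_scores` and its `higher_is_better` flag are hypothetical names.

```python
def quantile_scores(values, bins=5, higher_is_better=True):
    """Assign each value a score from 1 to `bins` based on its rank.

    Hypothetical helper: equal-sized rank buckets, not the toolkit's code.
    """
    n = len(values)
    # Order indices so the least desirable value comes first (lowest score).
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * bins // n + 1
    return scores


# Recency: fewer days since the last order is better, so invert the ordering.
recency_days = [3, 40, 200, 10, 95]
print(quantile_scores(recency_days, bins=5, higher_is_better=False))
# [5, 3, 1, 4, 2]

# Frequency: more orders is better.
order_counts = [12, 2, 5, 8, 1]
print(quantile_scores(order_counts, bins=5))
# [5, 2, 3, 4, 1]
```

Because this sketch ranks rather than cuts at value thresholds, tied values can land in different buckets. That is one reason small or low-cardinality datasets tend to behave better with fewer bins.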

CLV Calculation

from customer_segmentation_toolkit.core import CLVCalculator

calc = CLVCalculator(
    margin_rate=0.35,
    discount_rate=0.10,
    horizon_months=12
)

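clv_results = calc.compute(transactions)
# Pick the highest CLV explicitly rather than relying on result order
top = max(clv_results, key=lambda row: row["predicted_clv"])
print(f"Top customer CLV: ${top['predicted_clv']:.2f}")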
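
For intuition about how `margin_rate`, `discount_rate`, and `horizon_months` interact, the sketch below discounts a flat monthly margin over the horizon. It is a deliberate simplification, not the BG/NBD-inspired model the toolkit describes: `simple_discounted_clv` is a hypothetical function, and it assumes revenue stays constant at the customer's historical monthly average.

```python
def simple_discounted_clv(monthly_revenue, margin_rate=0.35,
                          discount_rate=0.10, horizon_months=12):
    """Net present value of a flat monthly margin stream (hypothetical helper)."""
    monthly_margin = monthly_revenue * margin_rate
    # Convert the annual discount rate to an equivalent monthly rate.
    monthly_rate = (1 + discount_rate) ** (1 / 12) - 1
    return sum(
        monthly_margin / (1 + monthly_rate) ** month
        for month in range(1, horizon_months + 1)
    )


# A customer averaging 100.00 in revenue per month (made-up figure)
clv = simple_discounted_clv(100.0)
print(f"Simplified 12-month CLV: ${clv:.2f}")
```

With `discount_rate=0` the result collapses to revenue times margin times months, which is a quick sanity check when CLV values look too high.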

SQL: Extract RFM Base Data

-- Pull RFM base metrics from your data warehouse.
-- DATEDIFF/DATEADD are written in Snowflake/Redshift style; adapt them for other dialects.
SELECT
    customer_id,
    DATEDIFF(DAY, MAX(order_date), CURRENT_DATE) AS recency_days,
    COUNT(DISTINCT order_id)                       AS frequency,
    SUM(order_total)                               AS monetary_total
FROM orders
WHERE order_status = 'completed'
  AND order_date >= DATEADD(YEAR, -2, CURRENT_DATE)
GROUP BY customer_id
HAVING COUNT(DISTINCT order_id) >= 2;
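
If the data lives in files rather than a warehouse, the same aggregation can be approximated in plain Python. This is a sketch under stated assumptions: `rfm_base_metrics` is a hypothetical helper, the two-year window is approximated as 730 days, and the sample orders are made up. The field names follow the query above.

```python
from datetime import date


def rfm_base_metrics(orders, reference_date, lookback_days=730, min_orders=2):
    """Mirror the SQL query: completed orders, recent window, 2+ distinct orders."""
    per_customer = {}
    for order in orders:
        if order["order_status"] != "completed":
            continue
        if (reference_date - order["order_date"]).days > lookback_days:
            continue
        entry = per_customer.setdefault(
            order["customer_id"],
            {"order_ids": set(), "last_order": order["order_date"], "total": 0.0},
        )
        entry["order_ids"].add(order["order_id"])
        entry["last_order"] = max(entry["last_order"], order["order_date"])
        entry["total"] += order["order_total"]
    return [
        {
            "customer_id": customer_id,
            "recency_days": (reference_date - entry["last_order"]).days,
            "frequency": len(entry["order_ids"]),
            "monetary_total": entry["total"],
        }
        for customer_id, entry in per_customer.items()
        if len(entry["order_ids"]) >= min_orders
    ]


# Made-up sample orders
orders = [
    {"customer_id": "C-1", "order_id": "O-1", "order_date": date(2026, 3, 1),
     "order_total": 50.0, "order_status": "completed"},
    {"customer_id": "C-1", "order_id": "O-2", "order_date": date(2026, 3, 13),
     "order_total": 30.0, "order_status": "completed"},
    {"customer_id": "C-2", "order_id": "O-3", "order_date": date(2026, 2, 1),
     "order_total": 20.0, "order_status": "completed"},
    {"customer_id": "C-1", "order_id": "O-4", "order_date": date(2026, 3, 20),
     "order_total": 99.0, "order_status": "cancelled"},
]
base_metrics = rfm_base_metrics(orders, reference_date=date(2026, 3, 23))
print(base_metrics)
```

As in the SQL, one-time buyers are dropped by the minimum-order filter, so they will not appear in any downstream segment.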

Churn Risk Scoring

from customer_segmentation_toolkit.core import ChurnPredictor

predictor = ChurnPredictor(
    inactivity_threshold_days=90,
    frequency_drop_pct=0.50,
    risk_tiers={"high": 0.7, "medium": 0.4, "low": 0.0}
)

at_risk = predictor.predict(segments)
high_risk_count = sum(1 for c in at_risk if c["risk_tier"] == "high")
print(f"High-risk customers: {high_risk_count}")
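
One way a rule-based score like this could work is sketched below. Everything beyond the three parameter names is an assumption: the 0.6/0.4 weighting, the linear inactivity ramp, and the helper names `churn_risk_score` and `risk_tier` are illustrative choices, not the toolkit's documented logic.

```python
def churn_risk_score(days_inactive, recent_orders, prior_orders,
                     inactivity_threshold_days=90, frequency_drop_pct=0.50):
    """Blend inactivity and frequency drop into a 0-1 score (hypothetical rule)."""
    # Inactivity ramps linearly and saturates at twice the threshold.
    inactivity = min(days_inactive / (2 * inactivity_threshold_days), 1.0)
    # Relative drop in order count between two comparable periods.
    if prior_orders > 0:
        drop = max(0.0, (prior_orders - recent_orders) / prior_orders)
    else:
        drop = 0.0
    drop_flag = 1.0 if drop >= frequency_drop_pct else 0.0
    # Assumed weights: inactivity counts for 60%, frequency drop for 40%.
    return 0.6 * inactivity + 0.4 * drop_flag


def risk_tier(score, risk_tiers):
    """Return the tier with the highest threshold that the score meets."""
    for name, threshold in sorted(risk_tiers.items(), key=lambda kv: kv[1], reverse=True):
        if score >= threshold:
            return name
    return None


tiers = {"high": 0.7, "medium": 0.4, "low": 0.0}
print(risk_tier(churn_risk_score(180, recent_orders=1, prior_orders=4), tiers))  # high
print(risk_tier(churn_risk_score(135, recent_orders=4, prior_orders=4), tiers))  # medium
print(risk_tier(churn_risk_score(90, recent_orders=3, prior_orders=4), tiers))   # low
```

Reading `risk_tiers` as minimum score thresholds, checked from highest to lowest, means every score of 0.0 or above falls into exactly one tier.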

Configuration

# config.yaml — all options documented
rfm:
  recency_bins: 5              # Number of recency bins (3-10); 5 = quintiles
  frequency_bins: 5            # Number of frequency bins
  monetary_bins: 5             # Number of monetary bins
  reference_date: "auto"       # "auto" = today, or "YYYY-MM-DD"

clv:
  margin_rate: 0.35            # Gross margin applied to revenue
  discount_rate: 0.10          # Annual discount rate for NPV
  horizon_months: 12           # Prediction horizon

churn:
  inactivity_threshold_days: 90
  frequency_drop_pct: 0.50     # Flag if frequency drops by 50%+

segments:
  labels:                      # Custom segment names mapped to RFM ranges
    Champions: { r: [4,5], f: [4,5], m: [4,5] }
    Loyal: { r: [3,5], f: [3,5], m: [3,5] }
    At Risk: { r: [1,2], f: [3,5], m: [3,5] }
    Lost: { r: [1,2], f: [1,2], m: [1,2] }

export:
  format: "csv"                # csv | json
  output_dir: "./output"
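
The `segments.labels` ranges overlap (a 5-5-5 customer satisfies both Champions and Loyal), so evaluation order matters. The sketch below shows one plausible reading, assuming each `[low, high]` pair is an inclusive range and the first matching label in declaration order wins. `assign_segment` and the `"Unlabeled"` fallback are hypothetical; the toolkit may resolve overlaps and unmatched scores differently.

```python
# Same ranges as the segments.labels block above, written as a Python dict
SEGMENT_RULES = {
    "Champions": {"r": (4, 5), "f": (4, 5), "m": (4, 5)},
    "Loyal": {"r": (3, 5), "f": (3, 5), "m": (3, 5)},
    "At Risk": {"r": (1, 2), "f": (3, 5), "m": (3, 5)},
    "Lost": {"r": (1, 2), "f": (1, 2), "m": (1, 2)},
}


def assign_segment(r, f, m, rules=SEGMENT_RULES, default="Unlabeled"):
    """Return the first label whose inclusive ranges contain all three scores."""
    scores = {"r": r, "f": f, "m": m}
    for label, ranges in rules.items():
        if all(low <= scores[key] <= high for key, (low, high) in ranges.items()):
            return label
    return default


print(assign_segment(5, 4, 4))  # Champions
print(assign_segment(1, 5, 5))  # At Risk
print(assign_segment(3, 1, 1))  # Unlabeled (no rule in this config matches)
```

Under this reading, listing the narrowest segments first keeps broader ones such as Loyal from shadowing them.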

Best Practices

  1. Use 2+ years of data — Shorter windows skew frequency and monetary scores
  2. Exclude outliers — Filter bulk/wholesale orders that distort monetary quintiles
  3. Refresh weekly — Customer segments drift; automate with cron or Airflow
  4. Combine with campaign tools — Export segment CSVs into your email platform
  5. Monitor segment migration — Track how customers move between tiers over time
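
Segment migration (practice 5) can be tracked by comparing two labeled snapshots. The following is a minimal sketch with made-up data; `segment_migration` is a hypothetical helper, and it only counts customers who appear in both snapshots and changed label.

```python
from collections import Counter


def segment_migration(previous, current):
    """Count (old_label, new_label) moves between two snapshots."""
    moves = Counter()
    for customer_id, old_label in previous.items():
        new_label = current.get(customer_id)
        if new_label is not None and new_label != old_label:
            moves[(old_label, new_label)] += 1
    return moves


# Made-up snapshots mapping customer_id to segment label
last_week = {"C-1": "Champions", "C-2": "Loyal", "C-3": "Loyal"}
this_week = {"C-1": "Champions", "C-2": "At Risk", "C-3": "At Risk", "C-4": "Lost"}

for (old_label, new_label), count in segment_migration(last_week, this_week).items():
    print(f"{old_label} -> {new_label}: {count}")
# Loyal -> At Risk: 2
```

Storing these counts after each weekly refresh gives a simple time series for spotting a segment that is steadily losing customers.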

Troubleshooting

Issue                        | Cause                     | Fix
All customers in one segment | Too few transactions      | Reduce recency_bins, frequency_bins, and monetary_bins to 3
CLV values seem too high     | Missing margin adjustment | Set margin_rate to your actual gross margin (for example, 0.35)
Empty output file            | No data matching filters  | Check the date range and order status filters
Score ties in quintiles      | Low-cardinality data      | Use 3 bins for smaller datasets

This is 1 of 11 resources in the Retail Automation Pro toolkit. Get the complete Customer Segmentation Toolkit with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire Retail Automation Pro bundle (11 products) for $139 — save 30%.

Get the Complete Bundle →

