Thesius Code

Posted on Mar 23 • Originally published at datanest-stores.pages.dev

Customer Segmentation Toolkit

#ecommerce #python #analytics #business

Production-ready Python toolkit for RFM analysis, customer lifetime value (CLV) calculation, churn prediction, and targeted campaign generation. Transform raw transaction data into actionable customer segments.

Key Features

RFM Scoring Engine — Recency, Frequency, Monetary scoring with a configurable number of score bins (5 bins, i.e. quintiles, in the example config)
CLV Calculator — Historical and predictive lifetime value using BG/NBD-inspired models
Churn Prediction — Rule-based churn risk scoring with configurable thresholds
Campaign Targeting — Auto-generate segment-specific campaign lists with export support
SQL-First Approach — Includes ready-to-run SQL queries for common data warehouses
Configurable Segments — Define custom segment labels, score weights, and tier boundaries

Quick Start

# 1. Extract and enter the project
unzip customer-segmentation-toolkit.zip
cd customer-segmentation-toolkit

# 2. Copy and edit configuration
cp config.example.yaml config.yaml

# 3. Run the segmentation pipeline
#    (the package lives under src/, so install it first or put src/ on PYTHONPATH)
python -m customer_segmentation_toolkit.core --config config.yaml

Architecture

src/customer_segmentation_toolkit/
├── __init__.py          # Package init, version info
├── core.py              # Main pipeline: load → score → segment → export
└── utils.py             # Date math, percentile bucketing, CSV I/O helpers

Data Flow: Raw Transactions → RFM Scores → Segment Assignment → Campaign Lists

Usage Examples

RFM Scoring

from customer_segmentation_toolkit.core import RFMScorer

scorer = RFMScorer(
    recency_bins=5,
    frequency_bins=5,
    monetary_bins=5,
    reference_date="2026-03-23"
)

# transactions: list of dicts with customer_id, order_date, order_total
segments = scorer.score(transactions)

for segment in segments[:3]:
    print(f"Customer {segment['customer_id']}: "
          f"RFM={segment['r_score']}{segment['f_score']}{segment['m_score']} "
          f"→ {segment['segment_label']}")
# Customer C-1001: RFM=544 → Champions
# Customer C-1002: RFM=155 → At Risk
# Customer C-1003: RFM=311 → Needs Attention
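
To make the scoring step less of a black box, here is a minimal sketch of rank-based RFM bucketing in plain Python. It is not the toolkit's `RFMScorer` implementation; `bucket` and `rfm_scores` are hypothetical helpers, and the input is assumed to be the same list of dicts with `customer_id`, `order_date` (ISO string), and `order_total`.

```python
import bisect
from collections import defaultdict
from datetime import date


def bucket(values, bins=5):
    """Return a 1..bins score per value; larger values get higher scores."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left counts how many values are strictly smaller than v.
    return [1 + (bisect.bisect_left(ordered, v) * bins) // n for v in values]


def rfm_scores(transactions, reference_date, bins=5):
    """Hypothetical helper: aggregate per customer, then bucket each metric."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for t in transactions:
        cid = t["customer_id"]
        d = date.fromisoformat(t["order_date"])
        last_order[cid] = max(last_order.get(cid, d), d)
        frequency[cid] += 1
        monetary[cid] += t["order_total"]

    ids = sorted(last_order)
    ref = date.fromisoformat(reference_date)
    # Negate recency so that fewer days since the last order scores higher.
    r = bucket([-(ref - last_order[c]).days for c in ids], bins)
    f = bucket([frequency[c] for c in ids], bins)
    m = bucket([monetary[c] for c in ids], bins)
    return [
        {"customer_id": c, "r_score": rs, "f_score": fs, "m_score": ms}
        for c, rs, fs, ms in zip(ids, r, f, m)
    ]


sample = [
    {"customer_id": "C-1", "order_date": "2026-03-20", "order_total": 120.0},
    {"customer_id": "C-1", "order_date": "2026-02-11", "order_total": 80.0},
    {"customer_id": "C-2", "order_date": "2025-06-01", "order_total": 40.0},
    {"customer_id": "C-3", "order_date": "2025-12-15", "order_total": 300.0},
]
scores = rfm_scores(sample, reference_date="2026-03-23", bins=3)
for row in scores:
    print(row)
```

In this sketch, tied values share the lower score. That is one reason low-cardinality data (for example, many customers with exactly one order) can leave some score levels unused.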

CLV Calculation

from customer_segmentation_toolkit.core import CLVCalculator

calc = CLVCalculator(
    margin_rate=0.35,
    discount_rate=0.10,
    horizon_months=12
)

clv_results = calc.compute(transactions)
print(f"Top customer CLV: ${clv_results[0]['predicted_clv']:.2f}")
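
To show how the three parameters interact, the sketch below uses a much simpler stand-in than the BG/NBD-inspired model the toolkit describes: a flat monthly revenue, multiplied by margin and discounted month by month. `simple_clv` and `monthly_revenue` are hypothetical names, not part of the toolkit's API.

```python
def simple_clv(monthly_revenue, margin_rate=0.35, discount_rate=0.10, horizon_months=12):
    """Discounted margin over a fixed horizon, assuming revenue stays flat each month."""
    # Convert the annual discount rate to an equivalent monthly rate.
    monthly_rate = (1 + discount_rate) ** (1 / 12) - 1
    monthly_margin = monthly_revenue * margin_rate
    return sum(
        monthly_margin / (1 + monthly_rate) ** t
        for t in range(1, horizon_months + 1)
    )


undiscounted = simple_clv(100.0, discount_rate=0.0)  # 100 * 0.35 * 12 months
discounted = simple_clv(100.0)                       # same inputs at a 10% annual rate
print(f"Undiscounted: ${undiscounted:.2f}, discounted: ${discounted:.2f}")
```

With `margin_rate` left at 1.0 the result would be revenue rather than profit, which is the "CLV values seem too high" case in the troubleshooting table.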

SQL: Extract RFM Base Data

-- Pull RFM base metrics from your data warehouse.
-- DATEDIFF/DATEADD are written in Snowflake/Redshift style; adjust for other dialects.
SELECT
    customer_id,
    DATEDIFF(DAY, MAX(order_date), CURRENT_DATE) AS recency_days,
    COUNT(DISTINCT order_id)                       AS frequency,
    SUM(order_total)                               AS monetary_total
FROM orders
WHERE order_status = 'completed'
  AND order_date >= DATEADD(YEAR, -2, CURRENT_DATE)
GROUP BY customer_id
-- Repeat buyers only; drop the HAVING clause to include one-time customers.
HAVING COUNT(DISTINCT order_id) >= 2;

Churn Risk Scoring

from customer_segmentation_toolkit.core import ChurnPredictor

predictor = ChurnPredictor(
    inactivity_threshold_days=90,
    frequency_drop_pct=0.50,
    risk_tiers={"high": 0.7, "medium": 0.4, "low": 0.0}
)

at_risk = predictor.predict(segments)
print(f"High-risk customers: {len([c for c in at_risk if c['risk_tier'] == 'high'])}")
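
One way a rule-based score like this could work is sketched below. The rules, the 0.6/0.4 weights, the `churn_risk` function, and its `days_inactive`, `prev_orders`, and `recent_orders` inputs are all assumptions for illustration; the toolkit's `ChurnPredictor` may combine its thresholds differently.

```python
def churn_risk(days_inactive, prev_orders, recent_orders,
               inactivity_threshold_days=90, frequency_drop_pct=0.50,
               risk_tiers=None):
    """Hypothetical rule: blend an inactivity signal with a frequency-drop signal."""
    tiers = risk_tiers or {"high": 0.7, "medium": 0.4, "low": 0.0}

    # Inactivity: 0 right after an order, saturating at 1 once the threshold is reached.
    inactivity = min(days_inactive / inactivity_threshold_days, 1.0)

    # Frequency: relative drop in order count, scaled so frequency_drop_pct maps to 1.
    drop = 0.0 if prev_orders == 0 else max(prev_orders - recent_orders, 0) / prev_orders
    frequency = min(drop / frequency_drop_pct, 1.0)

    score = 0.6 * inactivity + 0.4 * frequency  # weights are an assumption

    # Pick the highest tier whose floor the score reaches.
    ranked = sorted(tiers.items(), key=lambda kv: kv[1], reverse=True)
    tier = next(name for name, floor in ranked if score >= floor)
    return {"risk_score": round(score, 2), "risk_tier": tier}


print(churn_risk(days_inactive=120, prev_orders=6, recent_orders=2))
print(churn_risk(days_inactive=45, prev_orders=4, recent_orders=3))
print(churn_risk(days_inactive=9, prev_orders=4, recent_orders=4))
```

Because each tier value acts as a floor, the `"low": 0.0` entry guarantees that every customer lands in some tier.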

Configuration

# config.yaml — all options documented
rfm:
  recency_bins: 5              # Number of recency score bins (3-10); 5 = quintiles
  frequency_bins: 5            # Number of frequency score bins
  monetary_bins: 5             # Number of monetary score bins
  reference_date: "auto"       # "auto" = today, or "YYYY-MM-DD"

clv:
  margin_rate: 0.35            # Gross margin applied to revenue
  discount_rate: 0.10          # Annual discount rate for NPV
  horizon_months: 12           # Prediction horizon

churn:
  inactivity_threshold_days: 90
  frequency_drop_pct: 0.50     # Flag if frequency drops by 50%+

segments:
  labels:                      # Custom segment names mapped to RFM ranges
    Champions: { r: [4,5], f: [4,5], m: [4,5] }
    Loyal: { r: [3,5], f: [3,5], m: [3,5] }
    At Risk: { r: [1,2], f: [3,5], m: [3,5] }
    Lost: { r: [1,2], f: [1,2], m: [1,2] }

export:
  format: "csv"                # csv | json
  output_dir: "./output"
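
The `segments.labels` ranges above overlap (a 5/5/5 customer fits both Champions and Loyal), so the order of evaluation matters. The sketch below assumes first match wins, in the order the labels are listed; that ordering, the `assign_segment` helper, and the `"Unassigned"` fallback are assumptions, not documented toolkit behavior.

```python
# Mirrors the segments.labels block from config.yaml as inclusive (low, high) ranges.
SEGMENT_RULES = {
    "Champions": {"r": (4, 5), "f": (4, 5), "m": (4, 5)},
    "Loyal": {"r": (3, 5), "f": (3, 5), "m": (3, 5)},
    "At Risk": {"r": (1, 2), "f": (3, 5), "m": (3, 5)},
    "Lost": {"r": (1, 2), "f": (1, 2), "m": (1, 2)},
}


def assign_segment(r, f, m, rules=SEGMENT_RULES, default="Unassigned"):
    """Return the first label whose ranges contain all three scores."""
    scores = {"r": r, "f": f, "m": m}
    for label, ranges in rules.items():  # dicts preserve insertion order
        if all(low <= scores[key] <= high for key, (low, high) in ranges.items()):
            return label
    return default


for rfm in [(5, 4, 4), (3, 4, 4), (1, 5, 5), (1, 1, 1), (3, 1, 1)]:
    print(rfm, assign_segment(*rfm))
```

These four rules do not cover every score combination (3/1/1 matches none of them), so a custom label set needs either a catch-all rule or a sensible fallback.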

Best Practices

Use 2+ years of data — Shorter windows skew frequency and monetary scores
Exclude outliers — Filter bulk/wholesale orders that distort monetary quintiles
Refresh weekly — Customer segments drift; automate with cron or Airflow
Combine with campaign tools — Export segment CSVs into your email platform
Monitor segment migration — Track how customers move between tiers over time
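
Segment migration can be tracked with very little code once each run's labels are saved. The sketch below assumes two snapshots stored as `customer_id -> segment_label` dicts; `segment_migration` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def segment_migration(previous, current):
    """Count (old_label, new_label) transitions for customers present in both snapshots."""
    shared = previous.keys() & current.keys()
    return Counter((previous[c], current[c]) for c in shared)


last_week = {"C-1001": "Champions", "C-1002": "At Risk", "C-1003": "Loyal"}
this_week = {
    "C-1001": "Loyal",
    "C-1002": "At Risk",
    "C-1003": "Loyal",
    "C-1004": "Champions",  # new customer, ignored because there is no prior label
}

moves = segment_migration(last_week, this_week)
for (old, new), count in sorted(moves.items()):
    marker = "" if old == new else "  <- moved"
    print(f"{old} -> {new}: {count}{marker}")
```

A rising count on downward transitions such as Champions to Loyal is the kind of drift a weekly refresh is meant to surface.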

Troubleshooting

Issue	Cause	Fix
All customers in one segment	Too few transactions	Set `recency_bins`, `frequency_bins`, and `monetary_bins` to 3
CLV values seem too high	Missing margin adjustment	Set `margin_rate` correctly
Empty output file	No data matching filters	Check date range and status filters
Score ties across bins	Low cardinality data	Use 3 bins (`recency_bins`, `frequency_bins`, `monetary_bins`) for smaller datasets

This is 1 of 11 resources in the Retail Automation Pro toolkit. Get the complete Customer Segmentation Toolkit with all files, templates, and documentation for $39.

Or grab the entire Retail Automation Pro bundle (11 products) for $139 — save 30%.
