Customer Segmentation Toolkit
Production-ready Python toolkit for RFM analysis, customer lifetime value (CLV) calculation, churn prediction, and targeted campaign generation. Transform raw transaction data into actionable customer segments.
Key Features
- RFM Scoring Engine — Recency, Frequency, Monetary scoring with configurable quintile boundaries
- CLV Calculator — Historical and predictive lifetime value using BG/NBD-inspired models
- Churn Prediction — Rule-based churn risk scoring with configurable thresholds
- Campaign Targeting — Auto-generate segment-specific campaign lists with export support
- SQL-First Approach — Includes ready-to-run SQL queries for common data warehouses
- Configurable Segments — Define custom segment labels, score weights, and tier boundaries
Quick Start
# 1. Extract and enter the project
unzip customer-segmentation-toolkit.zip
cd customer-segmentation-toolkit
# 2. Copy and edit configuration
cp config.example.yaml config.yaml
# 3. Run the segmentation pipeline
python -m customer_segmentation_toolkit.core --config config.yaml
Architecture
src/customer_segmentation_toolkit/
├── __init__.py # Package init, version info
├── core.py # Main pipeline: load → score → segment → export
└── utils.py # Date math, percentile bucketing, CSV I/O helpers
Data Flow: Raw Transactions → RFM Scores → Segment Assignment → Campaign Lists
Usage Examples
RFM Scoring
from customer_segmentation_toolkit.core import RFMScorer
scorer = RFMScorer(
    recency_bins=5,
    frequency_bins=5,
    monetary_bins=5,
    reference_date="2026-03-23"
)
# transactions: list of dicts with customer_id, order_date, order_total
segments = scorer.score(transactions)
for segment in segments[:3]:
    print(f"Customer {segment['customer_id']}: "
          f"RFM={segment['r_score']}{segment['f_score']}{segment['m_score']} "
          f"→ {segment['segment_label']}")
# Customer C-1001: RFM=544 → Champions
# Customer C-1002: RFM=155 → At Risk
# Customer C-1003: RFM=311 → Needs Attention
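For readers who want a feel for what quantile-based scoring involves, here is a minimal sketch that ranks customers on one metric and splits the ranking into equal-sized bins. The helper name `quantile_scores` and the sample numbers are hypothetical; this is not the toolkit's `RFMScorer` implementation, which may bucket differently.

```python
def quantile_scores(values, bins=5, higher_is_better=True):
    """Assign each value a score from 1..bins based on its rank.

    Hypothetical helper for illustration only.
    Ties are broken by input order, so equal values can land in adjacent bins.
    """
    n = len(values)
    # Order positions from worst to best so the best values get the top score.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * bins // n + 1
    return scores


# Made-up sample metrics for five customers.
recency_days = [3, 40, 200, 15, 90]
frequency = [12, 4, 1, 8, 2]

# Recency: fewer days since the last order is better, so flip the direction.
r_scores = quantile_scores(recency_days, bins=5, higher_is_better=False)
f_scores = quantile_scores(frequency, bins=5, higher_is_better=True)
print(r_scores)  # [5, 3, 1, 4, 2]
print(f_scores)  # [5, 3, 1, 4, 2]
```

Rank-based binning keeps bin sizes even, at the cost of sometimes splitting tied values across neighbouring scores.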
CLV Calculation
from customer_segmentation_toolkit.core import CLVCalculator
calc = CLVCalculator(
    margin_rate=0.35,
    discount_rate=0.10,
    horizon_months=12
)
clv_results = calc.compute(transactions)
print(f"Top customer CLV: ${clv_results[0]['predicted_clv']:.2f}")
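To show how the three parameters interact, the sketch below computes a simple discounted CLV: expected monthly margin, discounted back to today over the horizon. It assumes constant monthly revenue and no churn, so it is a deliberate simplification and not the BG/NBD-inspired model mentioned above. The function name `simple_discounted_clv` is hypothetical.

```python
def simple_discounted_clv(avg_monthly_revenue, margin_rate=0.35,
                          discount_rate=0.10, horizon_months=12):
    """Net present value of expected margin over the horizon.

    Assumption: revenue is flat each month and the customer never churns.
    """
    # Convert the annual discount rate to an equivalent monthly rate.
    monthly_rate = (1 + discount_rate) ** (1 / 12) - 1
    monthly_margin = avg_monthly_revenue * margin_rate
    return sum(
        monthly_margin / (1 + monthly_rate) ** month
        for month in range(1, horizon_months + 1)
    )


clv = simple_discounted_clv(avg_monthly_revenue=100.0)
print(f"Predicted 12-month CLV: ${clv:.2f}")
```

Leaving `margin_rate` at 1.0 would report discounted revenue rather than margin, which is one way CLV figures end up looking too high.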
SQL: Extract RFM Base Data
-- Pull RFM base metrics from your data warehouse
SELECT
    customer_id,
    DATEDIFF(DAY, MAX(order_date), CURRENT_DATE) AS recency_days,
    COUNT(DISTINCT order_id) AS frequency,
    SUM(order_total) AS monetary_total
FROM orders
WHERE order_status = 'completed'
  AND order_date >= DATEADD(YEAR, -2, CURRENT_DATE)
GROUP BY customer_id
HAVING COUNT(DISTINCT order_id) >= 2;
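If your orders live in a file rather than a warehouse, the same aggregation can be sketched in plain Python. The function `rfm_base_metrics`, its `window_start` parameter, and the sample orders are hypothetical; the field names are taken from the query above.

```python
from collections import defaultdict
from datetime import date


def rfm_base_metrics(orders, reference_date, window_start, min_orders=2):
    """Aggregate completed orders per customer, mirroring the query's logic."""
    grouped = defaultdict(list)
    for o in orders:
        # Same filters as the WHERE clause.
        if o["order_status"] == "completed" and o["order_date"] >= window_start:
            grouped[o["customer_id"]].append(o)

    rows = []
    for customer_id, items in grouped.items():
        order_ids = {o["order_id"] for o in items}
        if len(order_ids) < min_orders:
            continue  # same effect as the HAVING clause
        last_order = max(o["order_date"] for o in items)
        rows.append({
            "customer_id": customer_id,
            "recency_days": (reference_date - last_order).days,
            "frequency": len(order_ids),
            "monetary_total": sum(o["order_total"] for o in items),
        })
    return rows


# Made-up sample data for illustration.
orders = [
    {"customer_id": "C-1001", "order_id": "O-1", "order_date": date(2026, 1, 10),
     "order_total": 120.0, "order_status": "completed"},
    {"customer_id": "C-1001", "order_id": "O-2", "order_date": date(2026, 3, 1),
     "order_total": 80.0, "order_status": "completed"},
    {"customer_id": "C-1002", "order_id": "O-3", "order_date": date(2025, 6, 5),
     "order_total": 45.0, "order_status": "completed"},
    {"customer_id": "C-1002", "order_id": "O-4", "order_date": date(2025, 7, 1),
     "order_total": 30.0, "order_status": "cancelled"},
]

base = rfm_base_metrics(orders, reference_date=date(2026, 3, 23),
                        window_start=date(2024, 3, 23))
print(base)
```

Note that requiring at least two orders drops one-time buyers entirely, so they will not appear in any segment downstream.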
Churn Risk Scoring
from customer_segmentation_toolkit.core import ChurnPredictor
predictor = ChurnPredictor(
    inactivity_threshold_days=90,
    frequency_drop_pct=0.50,
    risk_tiers={"high": 0.7, "medium": 0.4, "low": 0.0}
)
at_risk = predictor.predict(segments)
print(f"High-risk customers: {len([c for c in at_risk if c['risk_tier'] == 'high'])}")
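One plausible way rule-based scoring with these parameters could work is sketched below: two signals (inactivity and frequency drop) are each scaled to 0..1, averaged, and mapped to a tier. The function `churn_risk`, its inputs, and the equal weighting are assumptions made for illustration, not the toolkit's documented formula.

```python
def churn_risk(days_inactive, prior_frequency, recent_frequency,
               inactivity_threshold_days=90, frequency_drop_pct=0.50,
               risk_tiers=None):
    """Return (score, tier) from two simple rules. Illustrative weighting only."""
    if risk_tiers is None:
        risk_tiers = {"high": 0.7, "medium": 0.4, "low": 0.0}

    # Rule 1: inactivity relative to the threshold, capped at 1.0.
    inactivity = min(days_inactive / inactivity_threshold_days, 1.0)

    # Rule 2: relative drop in order frequency, scaled so that a drop of
    # frequency_drop_pct or more counts as 1.0.
    if prior_frequency > 0:
        drop = max(prior_frequency - recent_frequency, 0) / prior_frequency
    else:
        drop = 0.0
    frequency_signal = min(drop / frequency_drop_pct, 1.0)

    # Assumption: both rules carry equal weight.
    score = 0.5 * inactivity + 0.5 * frequency_signal

    # Pick the tier with the highest floor the score reaches.
    # Assumes the lowest tier has a floor of 0.0, as in the example above.
    tier = max(
        (name for name, floor in risk_tiers.items() if score >= floor),
        key=lambda name: risk_tiers[name],
    )
    return score, tier


print(churn_risk(days_inactive=120, prior_frequency=6, recent_frequency=2)[1])  # high
print(churn_risk(days_inactive=45, prior_frequency=4, recent_frequency=3)[1])   # medium
print(churn_risk(days_inactive=9, prior_frequency=5, recent_frequency=5)[1])    # low
```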
Configuration
# config.yaml — all options documented
rfm:
  recency_bins: 5              # Number of recency bins (3-10); 5 = quintiles
  frequency_bins: 5            # Number of frequency bins
  monetary_bins: 5             # Number of monetary bins
  reference_date: "auto"       # "auto" = today, or "YYYY-MM-DD"
clv:
  margin_rate: 0.35            # Gross margin applied to revenue
  discount_rate: 0.10          # Annual discount rate for NPV
  horizon_months: 12           # Prediction horizon
churn:
  inactivity_threshold_days: 90
  frequency_drop_pct: 0.50     # Flag if frequency drops by 50%+
segments:
  labels:                      # Custom segment names mapped to RFM ranges
    Champions: { r: [4,5], f: [4,5], m: [4,5] }
    Loyal: { r: [3,5], f: [3,5], m: [3,5] }
    At Risk: { r: [1,2], f: [3,5], m: [3,5] }
    Lost: { r: [1,2], f: [1,2], m: [1,2] }
export:
  format: "csv"                # csv | json
  output_dir: "./output"
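The segment ranges above overlap (a 5-5-5 customer fits both Champions and Loyal), so the order of the labels matters. The sketch below shows one way such rules could be applied, assuming each pair is an inclusive `[min, max]` range and that the first matching label wins. The `assign_segment` function and the `"Unassigned"` fallback are hypothetical; the toolkit's own matching and fallback label may differ.

```python
# Same ranges as the config above, in the same order.
SEGMENT_RULES = {
    "Champions": {"r": [4, 5], "f": [4, 5], "m": [4, 5]},
    "Loyal":     {"r": [3, 5], "f": [3, 5], "m": [3, 5]},
    "At Risk":   {"r": [1, 2], "f": [3, 5], "m": [3, 5]},
    "Lost":      {"r": [1, 2], "f": [1, 2], "m": [1, 2]},
}


def assign_segment(r, f, m, rules=SEGMENT_RULES, default="Unassigned"):
    """Return the first label whose inclusive ranges contain all three scores."""
    scores = {"r": r, "f": f, "m": m}
    # Dicts preserve insertion order, so earlier labels take priority.
    for label, ranges in rules.items():
        if all(ranges[k][0] <= scores[k] <= ranges[k][1] for k in scores):
            return label
    return default


print(assign_segment(5, 4, 4))  # Champions
print(assign_segment(3, 4, 4))  # Loyal
print(assign_segment(1, 5, 5))  # At Risk
print(assign_segment(3, 1, 1))  # Unassigned
```

Score combinations that no range covers fall through to the default, so it is worth checking how many customers end up there after changing the labels.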
Best Practices
- Use 2+ years of data — Shorter windows skew frequency and monetary scores
- Exclude outliers — Filter bulk/wholesale orders that distort monetary quintiles
- Refresh weekly — Customer segments drift; automate with cron or Airflow
- Combine with campaign tools — Export segment CSVs into your email platform
- Monitor segment migration — Track how customers move between tiers over time
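Segment migration can be tracked by comparing two snapshots of customer-to-segment assignments. The sketch below counts each (before, after) pair. The `segment_migration` function, the `"(absent)"` placeholder, and the sample snapshots are hypothetical and not part of the toolkit.

```python
from collections import Counter


def segment_migration(previous, current):
    """Count customers moving between segments across two snapshots.

    Both arguments map customer_id -> segment label. Customers missing from
    one snapshot are counted under the placeholder label "(absent)".
    """
    moves = Counter()
    for customer_id in previous.keys() | current.keys():
        before = previous.get(customer_id, "(absent)")
        after = current.get(customer_id, "(absent)")
        moves[(before, after)] += 1
    return moves


# Made-up snapshots from two consecutive weekly runs.
last_week = {"C-1001": "Champions", "C-1002": "Loyal", "C-1003": "At Risk"}
this_week = {"C-1001": "Champions", "C-1002": "At Risk",
             "C-1003": "Lost", "C-1004": "Loyal"}

moves = segment_migration(last_week, this_week)
for (before, after), count in sorted(moves.items()):
    print(f"{before} -> {after}: {count}")
```

Pairs where the two labels differ are the ones worth watching, for example a rising count of Loyal to At Risk moves.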
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| All customers in one segment | Too few transactions to fill every bin | Set recency_bins, frequency_bins, and monetary_bins to 3 |
| CLV values seem too high | Missing margin adjustment | Set margin_rate to your actual gross margin (for example, 0.35) |
| Empty output file | No data matching filters | Check date range and status filters |
| Score ties across bins | Low-cardinality data (few distinct values) | Use 3 bins for smaller datasets |
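A quick way to check whether low cardinality is the problem is to compare the number of distinct values in a metric with the requested bin count. The helper `usable_bins` below is a hypothetical diagnostic, not a toolkit function.

```python
def usable_bins(values, requested_bins):
    """Cap the bin count at the number of distinct values (minimum 1)."""
    distinct = len(set(values))
    return max(1, min(requested_bins, distinct))


# Made-up example: most customers ordered two or three times.
frequency = [2, 2, 2, 3, 2, 3, 2, 2]
print(usable_bins(frequency, requested_bins=5))  # 2
```

When the result is lower than the configured bin count, that metric cannot meaningfully separate customers into that many scores.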
This is 1 of 11 resources in the Retail Automation Pro toolkit. Get the complete Customer Segmentation Toolkit with all files, templates, and documentation for $39.
Or grab the entire Retail Automation Pro bundle (11 products) for $139 — save 30%.