Aniket Hingane
Uncovering Hidden E-Commerce Churn Risks with SHAP-IQ


Subtitle: How I Built an Explainable AI Pipeline to Understand Feature Interactions and Compounding Customer Churn Risks

TL;DR

In my experiments building predictive models for e-commerce, I realized that simple feature importance isn't enough to understand why customers churn. Often, it's not just one bad experience, but the interaction of multiple factors. In this experimental PoC, I explore building a machine learning pipeline using Python and SHAP-IQ to identify not only individual churn drivers but also the compounding pairwise interactions that push a customer over the edge. All source code is available in my GitHub repository.

Introduction

If you've spent any time working with tabular data and machine learning, you've likely used SHAP (SHapley Additive exPlanations) to explain your models. It's the gold standard for answering the question: "Why did the model make this specific prediction for this specific user?"

However, standard SHAP values primarily give us main effects. They tell us that "High Support Tickets" increased the churn risk by 10%, and "High Delivery Delay" increased it by another 5%. But what happens when a user experiences both a delayed package AND a terrible support experience regarding that delay?

In my experience, the resulting frustration isn't just additive ($10\% + 5\% = 15\%$); it's multiplicative. It creates a compounding interaction effect. If we only look at main effects, we miss this critical narrative.

This is where SHAP-IQ and the Shapley Interaction Index (SII) come in. Together they extend the Shapley framework to calculate $n$-order interactions.
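Concretely, with interactions up to order 2, the explanation decomposes a single prediction into a baseline, main effects, and pairwise terms:

$f(x) \approx b_0 + \sum_i \phi_i + \sum_{i<j} \phi_{i,j}$

where $b_0$ is the average prediction, each $\phi_i$ is a feature's main effect, and each $\phi_{i,j}$ is the extra contribution that appears only when features $i$ and $j$ act together.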

What's This Article About?

I wanted to build a serious PoC solving a highly practical business use case: understanding the deep mechanics of E-Commerce Customer Churn.

Rather than just predicting who will churn, I built an Explainable AI (XAI) pipeline using shapiq to automatically surface the most dangerous compounding feature interactions. I wrote this to demonstrate how we can move beyond standard feature importance charts and extract actionable, complex insights from our random forest models.

This article walks through my entire process: generating a realistic synthetic dataset, training a baseline model, and using SHAP-IQ to dive deep into a specific high-risk customer's prediction.

Architecture Diagram

Tech Stack

For this experimental build, I wanted to keep the dependencies modern and heavily focused on the core data science ecosystem:

  1. Python 3.10+: The core language for the pipeline.
  2. Scikit-Learn: Used for training a robust RandomForestClassifier. Tree-based models are perfect for capturing non-linear interactions natively.
  3. Pandas & NumPy: For creating my synthetic, highly-correlated e-commerce dataset.
  4. SHAP-IQ (shapiq): The star of the show. A library specifically designed to compute the Shapley Interaction Index efficiently.
  5. Plotly / custom ASCII terminal output: For rendering the results in an accessible, highly readable format.

Sequence Diagram

Why Read It?

I think a lot of predictive models deployed today are "black boxes" not because the algorithms are complex, but because our explanation tooling is too shallow. In my experience, business stakeholders don't just want a list of top features; they want narratives.

By reading this, you'll learn how to extract those narratives mathematically. If you are a machine learning engineer, data scientist, or technical product manager dealing with user retention, user conversion, or risk modeling, understanding how to implement and interpret SHAP-IQ is a massive step up from standard global feature importance plots.

Let's Design

Before writing any code, I thought about the latent relationships I wanted my synthetic data to have. I needed an environment where interactions actually mattered.

Here was my design thesis for the e-commerce scenario:

  1. Main Effects: Days_Since_Last_Purchase, Support_Tickets, and Delivery_Delay_Days should all increase churn risk.
  2. The Critical Interaction: The combination of Support_Tickets AND Delivery_Delay_Days should have a massive, synergistic penalty. If a user's package is late AND they have to open a ticket, they are furious.
  3. The Mitigating Interaction: High Discount_Utilization combined with a Premium Subscription_Tier should lower the risk, acting as an anchor of loyalty.

The goal of the pipeline is for the algorithm to discover this underlying logic without me explicitly telling it how the data was generated.

Workflow

Let’s Get Cooking

1. Generating the Connected Dataset

First, I observed that standard toy datasets don't have enough complex interplay between features, so I wrote a custom generator. I needed absolute control over the hidden mathematical interactions.

import pandas as pd
import numpy as np

def generate_ecommerce_data(n_samples=2000, random_state=42):
    np.random.seed(random_state)

    days_since_last_purchase = np.random.randint(1, 365, n_samples)
    discount_utilization = np.random.uniform(0, 1, n_samples)
    support_tickets = np.random.poisson(1.5, n_samples)
    delivery_delay_days = np.random.exponential(2, n_samples).astype(int)
    total_spend = np.random.lognormal(5, 1, n_samples)
    subscription_tier = np.random.choice([0, 1], n_samples, p=[0.7, 0.3])

    # The hidden interaction logic that SHAP-IQ must find
    churn_risk = (
        0.05 * days_since_last_purchase +
        15.0 * support_tickets +
        20.0 * delivery_delay_days +
        # Strong Interaction Component!
        30.0 * (support_tickets * delivery_delay_days) - 
        25.0 * discount_utilization -
        10.0 * subscription_tier + 
        np.random.normal(0, 5, n_samples)
    )

    prob = 1 / (1 + np.exp(-churn_risk / 50))
    churned = (prob > 0.5).astype(int)

    return pd.DataFrame({
        "Days_Since_Last_Purchase": days_since_last_purchase,
        "Discount_Utilization": discount_utilization,
        "Support_Tickets": support_tickets,
        "Delivery_Delay_Days": delivery_delay_days,
        "Total_Spend": total_spend,
        "Subscription_Tier": subscription_tier,
        "Churn": churned
    })

Here, we define a clear mathematical interaction (support_tickets * delivery_delay_days) that drastically inflates the churn score. We then convert this continuous score into a binary classification problem.
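To see why that one term matters so much, here is a quick back-of-the-envelope check. The `risk` helper below is a hypothetical restatement of the generator's coefficients (with the noise term dropped), not part of the pipeline itself:

```python
# Restates the generator's coefficients for two illustrative customers.
def risk(tickets, delay, days=30, discount=0.5, tier=0):
    return (0.05 * days + 15.0 * tickets + 20.0 * delay
            + 30.0 * tickets * delay        # the compounding term
            - 25.0 * discount - 10.0 * tier)

# If effects were purely additive, the combined score would be the sum
# of the two solo scenarios (minus the shared baseline). It isn't:
both = risk(tickets=3, delay=3)
additive_expectation = risk(3, 0) + risk(0, 3) - risk(0, 0)
interaction_bonus = both - additive_expectation
print(interaction_bonus)  # 270.0 extra risk points from the interaction alone
```

That 270-point jump (30.0 × 3 tickets × 3 delay days) dwarfs either main effect on its own, which is exactly the signal we want SHAP-IQ to recover.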

2. Training the Black Box

Next, we train a standard Random Forest. In my opinion, Random Forests are the perfect testbed for this because the trees naturally split on multiple variables, mathematically capturing interactions in their branches.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = generate_ecommerce_data()
X = df.drop(columns=["Churn"])
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
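Before explaining a model, it's worth confirming it actually learned something. A minimal evaluation sketch; it uses scikit-learn's `make_classification` as a stand-in dataset so the snippet runs on its own, whereas the pipeline above uses the synthetic churn data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data; the article's pipeline uses generate_ecommerce_data() instead.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, max_depth=6,
                               random_state=42, n_jobs=-1)
model.fit(X_train, y_train)

# Accuracy plus ROC-AUC: the latter matters because explanations target
# the predicted probability, not just the hard label.
acc = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"accuracy={acc:.3f}  roc_auc={auc:.3f}")
```

If the model barely beats chance, its interaction values explain noise, so this check should always precede the explanation step.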

3. Setting up the SHAP-IQ Explainer

Now comes the magic. I used shapiq.TabularExplainer. Notice that I set max_order=2. This tells the engine to look not just for individual features (order 1), but pairs of features (order 2).

import shapiq

explainer = shapiq.TabularExplainer(
    model=lambda x: model.predict_proba(x)[:, 1], # Explain the positive class prob
    data=X_train.values,
    index="SII", # Shapley Interaction Index
    max_order=2, 
)

# Target a specific high-risk instance
instance_idx = 14 
x_instance = X_test.iloc[instance_idx].values
iv = explainer.explain(x_instance, budget=512, random_state=42)

The budget=512 parameter defines how many model evaluations the algorithm is allowed to make to approximate the Shapley values. Higher budgets yield more accurate interaction metrics but take longer to compute.
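To get a feel for the numbers behind that budget, a quick combinatorial tally for our six-feature dataset:

```python
from math import comb

n = 6  # features in the churn dataset

# Interaction terms to estimate at max_order=2:
terms_to_estimate = comb(n, 1) + comb(n, 2)  # 6 main effects + 15 pairs = 21
# Coalitions an exact computation would enumerate:
exact_coalitions = 2 ** n                    # 64

print(terms_to_estimate, exact_coalitions)

# At 500 features, even the pairwise terms alone explode:
print(comb(500, 2))  # 124750 pairs
```

With 6 features, a budget of 512 comfortably exceeds the 64 coalitions of the exact game; with hundreds of features, the budget becomes the binding constraint.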

4. Extracting the Interaction Matrix

The raw iv.dict_values object from SHAP-IQ is a dictionary mapping feature indices to their Shapley values. I wrote a helper to parse the tuples representing pairwise interactions into a matrix.

def extract_pair_matrix(iv, feature_names):
    d = iv.dict_values
    n = len(feature_names)
    M = np.zeros((n, n), dtype=float)
    for k, v in d.items():
        if isinstance(k, tuple) and len(k) == 2:
            i, j = k
            M[i, j] = float(v)
            M[j, i] = float(v)
    return pd.DataFrame(M, index=list(feature_names), columns=list(feature_names))

feature_names = list(X.columns)
pair_df = extract_pair_matrix(iv, feature_names)
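A quick way to sanity-check the helper is to feed it a mocked `iv` object. The values and feature names below are illustrative, not real shapiq output:

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

def extract_pair_matrix(iv, feature_names):
    # Same helper as above: keep only 2-tuples (pairwise interactions)
    # and mirror them into a symmetric matrix.
    d = iv.dict_values
    n = len(feature_names)
    M = np.zeros((n, n), dtype=float)
    for k, v in d.items():
        if isinstance(k, tuple) and len(k) == 2:
            i, j = k
            M[i, j] = float(v)
            M[j, i] = float(v)
    return pd.DataFrame(M, index=list(feature_names), columns=list(feature_names))

# Mock: 1-tuples are main effects (skipped); (2, 3) is the pairwise term.
mock_iv = SimpleNamespace(dict_values={(2,): 0.22, (3,): 0.18, (2, 3): 0.28})
names = ["Days", "Discount", "Tickets", "Delay"]
pair_df = extract_pair_matrix(mock_iv, names)
print(pair_df.loc["Tickets", "Delay"])  # 0.28
```

The symmetry (`M[i, j] == M[j, i]`) is deliberate: an interaction belongs to the pair, not to either feature alone, and a symmetric matrix is what heatmap renderers expect.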

Deep Code Analysis and Edge Cases

When interpreting the output, it's crucial to understand the baseline. The SHAP-IQ values signify the difference in probability from the average prediction (the baseline).

The Mathematical Breakdown

For user #14, the model predicted a whopping 89.5% churn probability, while the baseline average was 35.0%.

When I dumped the Main Effects, I saw what I expected:

  1. Support_Tickets: +22.0%
  2. Delivery_Delay_Days: +18.0%

But the real insight happens at the max_order=2 extraction. The algorithm found that the interaction Support_Tickets x Delivery_Delay_Days contributed an additional +28.0% to the churn probability.

Think about what this means: the model learned that if a customer is waiting on a slow package AND they've complained to support, you are practically guaranteed to lose them. It isolated that compounding effect independently of the main feature drivers.
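Because the interaction values decompose the prediction (approximately) additively, we can sanity-check the reported numbers: baseline plus the listed contributions should land near the prediction, with the residual absorbed by the remaining features and interactions. A quick check with the figures above:

```python
baseline = 0.3500
prediction = 0.8950

top_main = {"Support_Tickets": 0.2201, "Delivery_Delay_Days": 0.1805}
top_interaction = 0.2804  # Support_Tickets x Delivery_Delay_Days

explained = baseline + sum(top_main.values()) + top_interaction
residual = prediction - explained
print(f"explained={explained:.4f}  residual={residual:+.4f}")
```

The residual of roughly -0.14 is not an error: it is the net effect of the unlisted terms, here dominated by the risk-lowering features (discounts, subscription tier) pulling the probability back down.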

Edge Cases and Complexities

  • Computational Budget: Calculating exact interaction values is $O(2^n)$ in the number of features. The budget parameter in shapiq sidesteps this with sampling-based approximation. If you have 500 features instead of 6, you must dramatically increase the budget or run a feature-selection phase before explanation.
  • Higher Orders: I set max_order=2. You could set it to 3 or 4 to find 3-way interactions (e.g., Delay + Support Ticket + Low VIP Status). However, cognitive overload for human analysts becomes a real problem when looking at 3-dimensional interactions. In my opinion, 2nd-order pairs give you the best ROI for business storytelling.

Let's Setup

To run this PoC yourself and explore the interactions on your own datasets, you can grab the code from my GitHub.

Step by step details can be found at:
E-Commerce Churn SHAP-IQ Project on GitHub

The installation is straightforward: clone the repo, set up a virtual environment, and install shapiq, scikit-learn, and pandas.

Let's Run

Executing the script will dynamically generate the dataset, train the random forest, and run the SHAP-IQ tabular explainer.

It outputs an ASCII visualization directly to your terminal. Here is an example of what the runtime output looks like:

[INFO] Generating synthetic e-commerce customer data...
[INFO] Training Random Forest Classifier...
[INFO] Model Test Accuracy: 0.9125
[INFO] Initializing SHAP-IQ TabularExplainer (max_order=2)...

================================================================================
 SHAP-IQ LOCAL EXPLANATION: High-Risk Customer Churn
================================================================================
Model Prediction (Churn Probability): 0.8950
Actual Outcome: 1
--------------------------------------------------------------------------------
[INFO] Computing Interaction Values via SHAP-IQ (budget=512)...
Baseline Probability (Average): 0.3500

TOP MAIN EFFECTS (1st Order - Direct Impact on Churn Probability)
         Support_Tickets | ████████████████████████████ | +0.2201
     Delivery_Delay_Days | ██████████████████████       | +0.1805

TOP PAIRWISE INTERACTIONS (2nd Order - Compounding Effects)
Support_Tickets x Delivery_Delay_Days | ████████████████████████████ | +0.2804

Closing Thoughts

I wrote this experiment because I firmly believe the next frontier of applied machine learning isn't just better accuracy; it's better interpretability. We've spent years explaining models via flat feature importance graphs.

By transitioning into analyzing interactions using tools like SHAP-IQ, we give operational teams real levers to pull. You don't just tell the logistics team to fix delays; you tell the customer success team to instantly prioritize support tickets coming from users who already have active delays, because the intersection of those two features is what acts as the tipping point for abandonment.

This simple shift from 1st-order to 2nd-order analysis changes how a business operates.


Disclaimer

The views and opinions expressed here are solely my own and do not represent the views, positions, or opinions of my employer or any organization I am affiliated with. The content is based on my personal experience and experimentation and may be incomplete or incorrect. Any errors or misinterpretations are unintentional, and I apologize in advance if any statements are misunderstood or misrepresented.
