
Building an AI-Powered Customer Churn Prediction Pipeline on AWS (Step-by-Step)

Hey folks! πŸ‘‹

I recently built a customer churn prediction system that not only predicts who will leave, but also explains why in plain English using Amazon Bedrock.

In this tutorial, I'll walk you through building the entire pipeline from scratch.

What we achieved:

  • βœ… 84.2% AUC on validation data
  • βœ… Real-time predictions via SageMaker endpoint
  • βœ… Natural language explanations powered by Claude (Bedrock)

Let's dive in!


🎯 What We're Building

An end-to-end ML pipeline that:

  1. Ingests customer data into S3
  2. Trains a churn prediction model with SageMaker XGBoost
  3. Deploys a real-time inference endpoint
  4. Explains predictions using Amazon Bedrock (Claude)
  5. Exposes everything via API Gateway + Lambda

Prerequisites: AWS account, basic Python knowledge


πŸ—οΈ Architecture Overview

AWS Churn Prediction Architecture

The pipeline consists of 5 tiers:

Tier             Services                      Purpose
Data Ingestion   S3                            Store raw customer data
ML Training      SageMaker Training            Train XGBoost model
Model Storage    S3                            Store model artifacts
Inference & AI   SageMaker Endpoint, Bedrock   Real-time predictions + NL explanations
API Layer        API Gateway, Lambda           Expose REST API

Step 1: Set Up S3 and Upload Data

First, create an S3 bucket and upload the dataset.

# Set bucket name with your account ID
export BUCKET_NAME=churn-prediction-$(aws sts get-caller-identity --query Account --output text)

# Create bucket
aws s3 mb s3://$BUCKET_NAME

# Upload your data
aws s3 cp WA_Fn-UseC_-Telco-Customer-Churn.csv s3://$BUCKET_NAME/raw/

πŸ“₯ Dataset: Download the Telco Customer Churn dataset from Kaggle.
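If you have the Kaggle CLI configured, something like this should grab it (double-check the slug on the dataset page):

kaggle datasets download -d blastchar/telco-customer-churn --unzip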


Step 2: Create SageMaker IAM Role

In AWS Console:

  1. Go to IAM β†’ Roles β†’ Create role
  2. Select SageMaker - Execution
  3. Add policies: AmazonSageMakerFullAccess + AmazonS3FullAccess
  4. Name it: SageMakerChurnRole
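Prefer the CLI? An equivalent sketch:

# Trust policy so SageMaker can assume the role
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "sagemaker.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role --role-name SageMakerChurnRole \
  --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name SageMakerChurnRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name SageMakerChurnRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess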

Step 3: Train the Model

Create train_churn.py:

import boto3
import sagemaker
import pandas as pd
import os
from sklearn.model_selection import train_test_split
from sagemaker.inputs import TrainingInput

# Config
BUCKET = os.environ['BUCKET_NAME']
ROLE = os.environ['ROLE_ARN']
PREFIX = 'churn-prediction'

session = sagemaker.Session()
region = session.boto_region_name

# Load and prepare data
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0)
df['Churn'] = (df['Churn'] == 'Yes').astype(int)

# Encode categorical columns
cat_cols = ['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 
            'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 
            'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 
            'PaperlessBilling', 'PaymentMethod']

for col in cat_cols:
    df[col] = df[col].astype('category').cat.codes

# Features
feature_cols = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges'] + cat_cols
X = df[feature_cols]
y = df['Churn']

# Split and save
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

train_df = pd.concat([y_train.reset_index(drop=True), X_train.reset_index(drop=True)], axis=1)
test_df = pd.concat([y_test.reset_index(drop=True), X_test.reset_index(drop=True)], axis=1)
train_df.to_csv('train.csv', index=False, header=False)
test_df.to_csv('test.csv', index=False, header=False)

# Upload to S3
s3 = boto3.client('s3')
s3.upload_file('train.csv', BUCKET, f'{PREFIX}/train/train.csv')
s3.upload_file('test.csv', BUCKET, f'{PREFIX}/test/test.csv')

# Train XGBoost
container = sagemaker.image_uris.retrieve('xgboost', region, '1.7-1')

xgb = sagemaker.estimator.Estimator(
    image_uri=container,
    role=ROLE,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{BUCKET}/{PREFIX}/output',
    sagemaker_session=session
)

xgb.set_hyperparameters(
    objective='binary:logistic',  # output a churn probability
    num_round=100,                # boosting rounds
    max_depth=5,                  # tree depth
    eta=0.2,                      # learning rate
    eval_metric='auc'             # reported on the validation channel
)

xgb.fit({
    'train': TrainingInput(f's3://{BUCKET}/{PREFIX}/train', content_type='csv'),
    'validation': TrainingInput(f's3://{BUCKET}/{PREFIX}/test', content_type='csv')
})

print('βœ… Training complete!')
print(f'Model artifact: {xgb.model_data}')

# Deploy a real-time inference endpoint
print('Deploying endpoint (3-5 min)...')
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    endpoint_name='churn-prediction-endpoint',
    serializer=sagemaker.serializers.CSVSerializer()
)
print(f'βœ… Endpoint deployed: {predictor.endpoint_name}')

Run it:

export BUCKET_NAME=churn-prediction-YOUR_ACCOUNT_ID
export ROLE_ARN=arn:aws:iam::YOUR_ACCOUNT_ID:role/SageMakerChurnRole
python3 train_churn.py

Training output:

2026-01-01 00:24:27 Uploading - Uploading generated training model
2026-01-01 00:24:27 Completed - Training job completed
Training seconds: 103
Billable seconds: 103

βœ… Training complete!
Model artifact: s3://churn-prediction-905418352184/churn-prediction/output/sagemaker-xgboost-2026-01-01-00-22-03-339/output/model.tar.gz

Deploying endpoint (3-5 min)...
INFO:sagemaker:Creating model with name: sagemaker-xgboost-2026-01-01-00-24-53-959
INFO:sagemaker:Creating endpoint-config with name churn-prediction-endpoint
INFO:sagemaker:Creating endpoint with name churn-prediction-endpoint
---------------!
βœ… Endpoint deployed: churn-prediction-endpoint
Test prediction: 0.4% churn probability
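The final "Test prediction" line comes from a quick smoke test of the live endpoint. A minimal sketch you could append to train_churn.py (assuming X_test is still in scope):

# Smoke-test the endpoint with one held-out row (CSV string, no label column)
sample = X_test.iloc[[0]].to_csv(header=False, index=False).strip()
prob = float(predictor.predict(sample).decode())
print(f'Test prediction: {prob:.1%} churn probability')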

Step 4: Create Lambda with Bedrock Integration

Create a Lambda function ChurnPredictionAPI with this code:

import json
import boto3
import os

sagemaker_runtime = boto3.client('sagemaker-runtime')
bedrock = boto3.client('bedrock-runtime')

ENDPOINT_NAME = os.environ.get('SAGEMAKER_ENDPOINT', 'churn-prediction-endpoint')

def lambda_handler(event, context):
    body = json.loads(event['body']) if isinstance(event.get('body'), str) else event

    # Get prediction from SageMaker
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='text/csv',
        Body=body['features']
    )

    churn_prob = float(response['Body'].read().decode())

    # Generate explanation with Bedrock Claude
    prompt = f"""A customer has {churn_prob:.1%} churn probability.
Customer: Tenure {body.get('tenure', 'N/A')} months, ${body.get('monthly_charges', 'N/A')}/month, {body.get('contract', 'N/A')} contract.
In 2 sentences, explain the risk and suggest one retention action."""

    bedrock_response = bedrock.invoke_model(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": prompt}]
        })
    )

    explanation = json.loads(bedrock_response['body'].read())['content'][0]['text']
    risk = "High" if churn_prob > 0.7 else "Medium" if churn_prob > 0.4 else "Low"

    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({
            'churn_probability': f"{churn_prob:.1%}",
            'risk_level': risk,
            'explanation': explanation
        })
    }
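Because the handler falls back to the raw event when there's no body string, you can smoke-test it straight from the Lambda console with a test event like this (same shape as the API payload):

{
  "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
  "tenure": 24,
  "monthly_charges": 65.5,
  "contract": "Month-to-month"
}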


Lambda configuration:

  • Runtime: Python 3.11
  • Timeout: 30 seconds
  • Role: LambdaChurnRole (with SageMaker + Bedrock permissions)
  • Environment variable: SAGEMAKER_ENDPOINT=churn-prediction-endpoint
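Two gotchas here: LambdaChurnRole needs sagemaker:InvokeEndpoint and bedrock:InvokeModel on top of the basic execution policy, and Anthropic model access must be enabled in the Bedrock console (Model access page) or the invoke_model call will be denied. A minimal inline policy sketch (the policy name is illustrative):

aws iam put-role-policy --role-name LambdaChurnRole \
  --policy-name ChurnInvokePolicy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["sagemaker:InvokeEndpoint", "bedrock:InvokeModel"],
      "Resource": "*"
    }]
  }'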

Step 5: Create API Gateway

  1. Create an HTTP API in API Gateway
  2. Add Lambda integration β†’ ChurnPredictionAPI
  3. Create POST route: /predict
  4. Deploy and get your invoke URL
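Prefer scripting this step? API Gateway's quick-create can wire up the route and Lambda proxy integration in one call; a sketch (region and account ID are placeholders):

# Quick-create an HTTP API with a POST /predict route backed by the Lambda
aws apigatewayv2 create-api --name ChurnAPI --protocol-type HTTP \
  --route-key "POST /predict" \
  --target arn:aws:lambda:us-east-1:YOUR_ACCOUNT_ID:function:ChurnPredictionAPI

# Allow API Gateway to invoke the function
aws lambda add-permission --function-name ChurnPredictionAPI \
  --statement-id apigw-invoke --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com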



πŸ§ͺ Test the API

The features string must follow the training column order from train_churn.py: SeniorCitizen, tenure, MonthlyCharges, TotalCharges, then the 15 label-encoded categorical columns.

curl -X POST "https://YOUR_API_URL/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month"
  }'


Response:


(.venv) server@DLG lambda_package % curl -X POST "https://jxairjovmi.execute-api.us-east-1.amazonaws.com/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month"
  }'

{"churn_probability": "0.6%", "risk_level": "Low", "explanation": "The customer's high churn probability of 0.6% and the month-to-month contract indicate a significant risk of losing the customer. To mitigate this risk, a retention action could be to offer the customer a longer-term contract with a discounted monthly rate or additional benefits, which may help increase their loyalty and reduce the likelihood of churn."}%     

🧹 Cleanup

Don't forget to delete resources to avoid charges:

# Delete SageMaker endpoint (most expensive!)
aws sagemaker delete-endpoint --endpoint-name churn-prediction-endpoint
aws sagemaker delete-endpoint-config --endpoint-config-name churn-prediction-endpoint

# Delete Lambda
aws lambda delete-function --function-name ChurnPredictionAPI

# Delete S3 bucket
aws s3 rb s3://$BUCKET_NAME --force
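
# (Optional) also delete the HTTP API - replace YOUR_API_ID with your API's id
aws apigatewayv2 delete-api --api-id YOUR_API_ID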

πŸ’‘ Key Lessons Learned

  1. SageMaker XGBoost is production-ready β€” achieved 84% AUC with minimal tuning.
  2. Bedrock adds real business value β€” converting predictions to actionable insights makes ML accessible to non-technical stakeholders.
  3. IAM permissions are tricky β€” create roles via Console if CLI gives explicit deny errors.
  4. Cost awareness matters β€” always delete endpoints when not in use (~$0.05/hour adds up!)


Thanks for reading! If this helped you, follow me for more AWS + Data Engineering content.

Questions? Leave a comment below!
