You open the Playground Series S6E3 competition, see 250k+ rows of customer data, and think: “Where do I even start?”
I’ve been there. This post is exactly the first notebook I wish I had when I jumped in: a dead-simple, copy-paste-ready pipeline that takes you from raw CSV to a solid submission. No theory overload, just the steps that actually work (and why they matter). Let’s go!
1. Grab the Tools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
import warnings
warnings.filterwarnings("ignore")
These are my go-to imports for every tabular comp. LightGBM will be your hero later.
2. Load & Quick Look
df = pd.read_csv("/kaggle/input/competitions/playground-series-s6e3/train.csv")
X = df.drop(columns=["Churn", "id"])
y = df["Churn"]
Run df.shape, df.head(), df.info(). Clean data, zero missing values — we’re lucky today!
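Don’t just take the “zero missing values” claim on faith; check it. Here’s the quick sketch on a toy frame (swap in your real df):

```python
import pandas as pd

# Toy stand-in for the competition frame; swap in your real df
df = pd.DataFrame({
    "tenure": [1, 24, 60],
    "Contract": ["Month-to-month", "Two year", "One year"],
})

print(df.shape)         # (rows, columns)
print(df.isna().sum())  # per-column missing counts; all zeros means clean
print(df.dtypes)        # spot object columns hiding numbers
```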
3. Tiny Cleanup (Just in Case)
X["TotalCharges"] = pd.to_numeric(X["TotalCharges"], errors="coerce")
X["TotalCharges"] = X["TotalCharges"].fillna(0)
Always make sure numbers are actually numbers. In the original Telco data, TotalCharges hides blank strings; errors="coerce" turns anything unparseable into NaN, and the fillna matters because RandomForest can’t handle NaN.
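Here’s what errors="coerce" actually does, on a toy Series (the blank-string values are the classic TotalCharges gotcha):

```python
import pandas as pd

s = pd.Series(["29.85", " ", "108.15"])   # blank strings sneak into TotalCharges
nums = pd.to_numeric(s, errors="coerce")  # unparseable entries become NaN
print(nums.isna().sum())                  # -> 1
nums = nums.fillna(0)                     # fill so RandomForest doesn't choke on NaN
```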
4. Know Your Columns
- Numbers: tenure, MonthlyCharges, TotalCharges, SeniorCitizen
- Categories: gender, Contract, PaymentMethod, streaming stuff, etc.
Models only understand numbers, so categories need love.
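You can let pandas sort the columns into those two buckets for you. A sketch on a toy frame (column names follow the list above):

```python
import pandas as pd

X = pd.DataFrame({
    "tenure": [1, 24],
    "MonthlyCharges": [29.85, 56.95],
    "gender": ["Female", "Male"],
    "Contract": ["Month-to-month", "Two year"],
})

# dtype-based split: numeric columns go straight to the model,
# object columns are the ones that need encoding
num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = X.select_dtypes(include="object").columns.tolist()
print(num_cols)  # ['tenure', 'MonthlyCharges']
print(cat_cols)  # ['gender', 'Contract']
```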
5. My Secret Weapon: Merge Columns
This one trick makes everything faster and cleaner:
X['StreamingAny'] = ((X['StreamingTV'] == 'Yes') | (X['StreamingMovies'] == 'Yes')).astype(int)
X = X.drop(columns=['StreamingTV', 'StreamingMovies'])
Why I do this every time:
- Fewer columns → noticeably faster training (and the savings add up when you merge several groups)
- Saves RAM (huge on big datasets)
- Removes confusing duplicate signals
- Model learns real customer habits instead of memorizing noise
Feels like decluttering your code: suddenly everything runs smoother.
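The same trick generalizes to any group of Yes/No columns. Here’s a small helper sketch (merge_any is my name for it, not something from a library):

```python
import pandas as pd

def merge_any(df, cols, new_col):
    """Collapse several Yes/No columns into one 0/1 'any of them' flag."""
    df[new_col] = (df[cols] == "Yes").any(axis=1).astype(int)
    return df.drop(columns=cols)

X = pd.DataFrame({
    "StreamingTV":     ["Yes", "No", "No"],
    "StreamingMovies": ["No", "No", "Yes"],
})
X = merge_any(X, ["StreamingTV", "StreamingMovies"], "StreamingAny")
print(X["StreamingAny"].tolist())  # [1, 0, 1]
```

Same helper works for the add-on group (OnlineSecurity, OnlineBackup, TechSupport, …) if you want to merge more.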
6. Turn Words into Numbers
Easy Yes/No first:
binary_cols = ['Partner', 'Dependents', 'PhoneService', 'PaperlessBilling']
for col in binary_cols:
    X[col] = X[col].map({'Yes': 1, 'No': 0})
Then the rest:
X = pd.get_dummies(X, drop_first=True)
All numeric now. Boom.
7. Split Smart
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
Stratify keeps the churn ratio the same in both splits, which is critical for this competition.
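You can see stratify doing its job on toy labels: build an imbalanced vector and check that both splits keep the same positive rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 20% positives, like a churn column
y = np.array([1] * 20 + [0] * 80)
X = np.arange(100).reshape(-1, 1)

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(y_tr.mean(), y_val.mean())  # both 0.2: the churn ratio survives the split
```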
8. Train Two Models (Quick Check + Real Deal)
Baseline (Random Forest):
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
print("RF ROC-AUC:", roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1]))
The one that actually scores well (LightGBM):
lgb = LGBMClassifier(random_state=42)
lgb.fit(X_train, y_train)
print("LGB ROC-AUC:", roc_auc_score(y_val, lgb.predict_proba(X_val)[:, 1]))
LightGBM usually jumps ahead — this is your starting leaderboard model.
9. Test Set (Same Steps, No Leaks!)
test = pd.read_csv("/kaggle/input/competitions/playground-series-s6e3/test.csv")
test_X = test.drop(columns=['id'])
# Same cleanup
test_X['TotalCharges'] = pd.to_numeric(test_X['TotalCharges'], errors='coerce').fillna(0)
# Same merge
test_X['StreamingAny'] = ((test_X['StreamingTV'] == 'Yes') | (test_X['StreamingMovies'] == 'Yes')).astype(int)
test_X = test_X.drop(columns=['StreamingTV', 'StreamingMovies'])
# Same binary mapping
for col in binary_cols:
    test_X[col] = test_X[col].map({'Yes': 1, 'No': 0})
# Same encoding
test_X = pd.get_dummies(test_X, drop_first=True)
test_X = test_X.reindex(columns=X.columns, fill_value=0)
preds = lgb.predict_proba(test_X)[:, 1]
submission = pd.DataFrame({"id": test["id"], "Churn": preds})
submission.to_csv("submission.csv", index=False)
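The reindex line is quietly doing the heavy lifting: it forces the test frame into the exact column layout the model was trained on, adding any dummy column the test split happened to miss. A toy illustration:

```python
import pandas as pd

# Columns the model saw at training time (hypothetical dummy names)
train_cols = ["tenure", "Contract_Two year", "Contract_One year"]

# Imagine the test split never contained a 'Two year' contract
test_X = pd.DataFrame({"tenure": [5], "Contract_One year": [1]})

test_X = test_X.reindex(columns=train_cols, fill_value=0)
print(test_X.columns.tolist())  # matches the training layout exactly
print(test_X.iloc[0].tolist())  # [5, 0, 1] -- the missing dummy filled with 0
```

Without it, LightGBM would refuse the prediction (or worse, silently misalign features), so never skip this step.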
Want to Level Up Later?
- Add cross-validation
- Merge more groups (add-ons, contract type)
- Tune LightGBM with Optuna
- Try CatBoost (zero encoding needed)
One-Sentence Recap
Start with clean loading → merge redundant columns → encode → split → train LGB → apply exact same steps to test → submit.
That’s the real starting point every Kaggler needs.
Copy this notebook, run it, and you’re already ahead.
Got a score? Hit a bug? Drop it in the comments or tag me; I reply to every one.
Happy starting!
Girma Wakeyo
Kaggle → https://www.kaggle.com/girmawakeyo
GitHub → https://github.com/Girma35
X → https://x.com/Girma880731631
Follow for more quick-start notebooks and competition tips. Let’s climb those leaderboards together!