🛡️ Building FraudShield: Credit Card Fraud Detection with Imbalanced Data

Mahira Banu — Tue, 28 Apr 2026 18:23:32 +0000

Fraud detection is one of those problems that looks simple on the surface — classify transactions as “fraud” or “not fraud”. But once you look at real data, it becomes a completely different challenge.

In this project, I built FraudShield, an end-to-end machine learning system to detect fraudulent credit card transactions using both supervised and unsupervised approaches, along with a live dashboard.

📊 The Problem

The dataset I used contains over 284,000 transactions, but only:

👉 0.17% are fraud

This creates a highly imbalanced dataset, where a model can achieve about 99.8% accuracy just by predicting everything as “not fraud”.

So the real question becomes:

How do we detect fraud when it’s so rare?

🔍 Dataset Overview

The dataset contains real-world credit card transactions made by European cardholders, anonymised using PCA transformation to protect sensitive information. It includes 284,807 transactions, of which only 492 are fraudulent (~0.17%), making it a highly imbalanced classification problem.

🧠 What are V1–V28?

These are PCA-transformed features.

In simple terms:

The original features are hidden
Data is transformed into mathematical components
We can’t interpret them directly

👉 This makes the problem harder — models must learn patterns without human-readable features.

📈 Exploratory Data Analysis (EDA)

Some key observations:

The dataset is extremely imbalanced
Most transactions are low value
Fraud doesn’t follow obvious patterns
The V1–V28 features are nearly uncorrelated with one another, as expected after a PCA transformation

One important realization early on:

Accuracy is NOT a useful metric here

⚠️ Why Accuracy is Misleading

If a model predicts:

All transactions = Normal

It gets:

👉 99.8% accuracy

…but detects zero fraud

So instead, I focused on:

Precision
Recall
F1 Score
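To make the accuracy trap concrete, here is a minimal sketch that scores the "predict everything as normal" baseline using only the class counts quoted above (284,807 transactions, 492 frauds). The helper name `confusion_metrics` is my own, and treating an undefined precision as 0.0 is a convention chosen for this illustration.

```python
# Class counts taken from the dataset description above.
TOTAL = 284_807
FRAUD = 492

def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision and recall are reported as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A "model" that labels every transaction as normal:
# no true positives, no false positives, every fraud is a false negative.
baseline = confusion_metrics(tp=0, fp=0, fn=FRAUD, tn=TOTAL - FRAUD)

for name, value in baseline.items():
    print(f"{name:>9}: {value:.4f}")
```

Accuracy comes out at roughly 99.83% while precision, recall and F1 all sit at zero, which is why the three metrics above are the ones worth tracking.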

🤖 Model 1 — XGBoost (Supervised Learning)

I trained an XGBoost classifier, which is well-suited for tabular data and imbalanced problems.

Key setup:

scale_pos_weight to handle imbalance
Stratified train/test split
Feature scaling
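The first two setup items can be sketched in plain Python. This is an illustration under assumptions, not the project's actual code: the 80/20 split, the seed, and the helper names `stratified_split` and `scale_pos_weight` are all hypothetical, and the labels are synthetic placeholders that only reproduce the dataset's class counts.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices into train/test so each class keeps the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def scale_pos_weight(labels):
    """Ratio of negative to positive examples (1 = fraud, 0 = normal)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives

# Synthetic labels with the dataset's class counts (284,315 normal, 492 fraud).
labels = [0] * 284_315 + [1] * 492
train_idx, test_idx = stratified_split(labels, test_size=0.2)  # assumed 80/20 split
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

print("fraud in train:", sum(train_labels), "| fraud in test:", sum(test_labels))
print("scale_pos_weight (train):", round(scale_pos_weight(train_labels), 1))
```

A ratio computed this way on the training labels is the kind of value typically passed to XGBoost's `scale_pos_weight` parameter, so that each rare fraud example carries proportionally more weight during training.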

📊 Results:

Precision: 0.71
Recall: 0.87 🔥
F1 Score: 0.78

🧠 Insight:

The model catches 87% of the fraud cases it is evaluated on, which is critical in real-world systems where every missed fraud is a direct loss.
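As a quick sanity check, the reported F1 can be recomputed from the rounded precision and recall, since F1 is their harmonic mean. The snippet below is only a sketch working from the rounded figures above, so the last digit could differ slightly from a calculation on the unrounded values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded figures reported for XGBoost above.
precision, recall = 0.71, 0.87

print(f"F1 = {f1_score(precision, recall):.2f}")

# Precision read the other way round: the share of fraud alerts that are false alarms.
print(f"false-alarm share of alerts = {1 - precision:.2f}")
```

So the 0.78 is consistent with the other two numbers, and a precision of 0.71 means that a little under a third of flagged transactions turn out to be legitimate.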

🧪 Model 2 — Isolation Forest (Unsupervised)

To compare approaches, I also used Isolation Forest, an anomaly detection model.

📊 Results:

Precision: 0.29
Recall: 0.30
F1 Score: 0.30

🧠 Insight:

Unsupervised models struggle to detect subtle fraud patterns without labelled data.
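The intuition behind Isolation Forest is that anomalies can be separated from the rest of the data with fewer random splits. The sketch below is a simplified, pure-Python illustration of that idea only, not the library implementation used in the project: the function names, the 2-D synthetic data, and the lack of subsampling and score normalisation are all simplifications of mine.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=50):
    """Count random splits until `point` (a member of `data`) sits alone."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))
    values = [row[feature] for row in data]
    low, high = min(values), max(values)
    if low == high:
        return depth  # cannot split further on this feature
    split = rng.uniform(low, high)
    # Keep only the rows that fall on the same side of the split as `point`.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return path_length(point, same_side, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=100, seed=0):
    """Average isolation depth over many random trees; shorter means more anomalous."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Synthetic data: 200 "normal" points around the origin plus one obvious outlier.
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = normal + [outlier]

typical = min(normal, key=lambda p: p[0] ** 2 + p[1] ** 2)  # point nearest the centre
print("typical point:", average_path_length(typical, data))
print("outlier      :", average_path_length(outlier, data))
```

The far-away point is isolated in only a few splits, while the central point needs many more. Fraud that sits close to normal transactions in feature space would not stand out this way, which is one plausible reason for the low recall above.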

⚖️ Model Comparison

Model	Precision	Recall	F1
XGBoost	0.71	0.87	0.78
Isolation Forest	0.29	0.30	0.30

🚀 Key takeaway:

Supervised learning significantly outperforms unsupervised anomaly detection when labelled data is available.

🔍 Explainability with SHAP

To understand how the model makes decisions, I used SHAP (SHapley Additive exPlanations).

This helps answer:

Which features influence predictions?
Why was a transaction classified as fraud?

👉 This adds transparency and trust to the system.
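To show what a Shapley value actually measures, here is a minimal sketch that computes exact Shapley values for a tiny, made-up scoring function by averaging each feature's marginal contribution over every feature ordering. It does not use the SHAP library or the FraudShield model: `toy_fraud_score`, its coefficients, and the all-zero baseline are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orders."""
    n = len(x)
    contributions = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            current[feature] = x[feature]  # switch this feature to its real value
            output = predict(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / len(orders) for c in contributions]

# Hypothetical scoring function standing in for a trained model (not FraudShield's).
def toy_fraud_score(features):
    v1, v2, amount = features
    return 0.3 * v1 + 0.5 * v2 * amount + 0.1 * amount

x = [2.0, 1.0, 3.0]         # transaction to explain
baseline = [0.0, 0.0, 0.0]  # assumed reference transaction

phi = shapley_values(toy_fraud_score, x, baseline)
explained = toy_fraud_score(x) - toy_fraud_score(baseline)

print("contributions:", [round(p, 3) for p in phi])
print("sum:", round(sum(phi), 3), "| f(x) - f(baseline):", round(explained, 3))
```

The contributions always add up to the gap between the prediction and the baseline output, which is the property that makes per-transaction explanations possible. Enumerating every ordering only works for a handful of features, so real tools rely on faster algorithms or approximations.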

🖥️ Deployment — Streamlit Dashboard

To make the system usable, I built a Streamlit dashboard.

Features:

Input transaction data
Predict fraud probability
Display risk level
Show model metrics
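The "display risk level" step boils down to mapping a predicted probability onto a small set of labels. The sketch below shows one way this could look. The function name `risk_level` and the 0.3 / 0.7 thresholds are assumptions for illustration, not the values used in the live dashboard.

```python
def risk_level(fraud_probability, medium_threshold=0.3, high_threshold=0.7):
    """Map a fraud probability in [0, 1] to a coarse risk label.

    The thresholds are illustrative assumptions, not FraudShield's actual values.
    """
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= high_threshold:
        return "High"
    if fraud_probability >= medium_threshold:
        return "Medium"
    return "Low"

for p in (0.05, 0.45, 0.92):
    print(f"probability={p:.2f} -> risk={risk_level(p)}")
```

Keeping the mapping in a small pure function like this makes the thresholds easy to tune and test separately from the dashboard code.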

🌐 Live Demo & Code

💻 GitHub: https://github.com/mahira-code/fraudshield-ml
🌍 Live Demo: https://fraudshield-ml-mahira.streamlit.app/

🧠 What I Learned

This project taught me a lot about real-world machine learning:

Handling imbalanced datasets
Choosing the right evaluation metrics
Comparing supervised vs unsupervised models
Using SHAP for explainability
Building and deploying end-to-end ML systems

🚀 What’s Next

Hyperparameter tuning
Model monitoring (drift detection)
API deployment (FastAPI)
MLOps integration

👩‍💻 About Me

I’m Mahira Banu, a Data Scientist and AI Engineer focused on building practical, real-world AI systems.

🌐 Portfolio: https://mahirabanu.website
💻 GitHub: https://github.com/mahira-code
🔗 LinkedIn: https://www.linkedin.com/in/mahira-banu

💬 Final Thoughts

Fraud detection isn’t just about building a model — it’s about understanding data, handling imbalance, and making reliable decisions in high-risk scenarios.

If you’re working on similar problems, I’d love to hear your thoughts.

Starting My DSA Prep Journey: Join Me!

Mahira Banu — Sat, 28 Jun 2025 16:43:23 +0000

Hi everyone!

I'm excited to share that I've started working on beginner-level Data Structures & Algorithms (DSA) questions — not just solving them, but explaining each one step by step. My goal is to strengthen my fundamentals, build consistency, and grow through community feedback and collaboration.

Whether you're preparing for coding interviews or brushing up on your basics, feel free to check out my work and suggest improvements.

GitHub Repository:
https://github.com/mahira-code/Interview-Prep

What I’m including:

Problem statements
Step-by-step logic
Python solutions
Time & space complexity breakdowns
Notebook-style clarity

I’d love to connect with fellow learners, mentors, and tech enthusiasts! Your feedback or collaboration would mean a lot.

Daily practice is helping me stay sharp and focused, and it's one of the key habits supporting my future goals, including career growth and international tech opportunities.

Let’s keep learning and building — together!

DEV Community: Mahira Banu
