<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mahira Banu</title>
    <description>The latest articles on DEV Community by Mahira Banu (@mahira_banu).</description>
    <link>https://dev.to/mahira_banu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3293587%2F957e1fb7-cef5-44e5-bf03-a1f8ed810c94.jpeg</url>
      <title>DEV Community: Mahira Banu</title>
      <link>https://dev.to/mahira_banu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mahira_banu"/>
    <language>en</language>
    <item>
      <title>🛡️ Building FraudShield: Credit Card Fraud Detection with Imbalanced Data</title>
      <dc:creator>Mahira Banu</dc:creator>
      <pubDate>Tue, 28 Apr 2026 18:23:32 +0000</pubDate>
      <link>https://dev.to/mahira_banu/building-fraudshield-credit-card-fraud-detection-with-imbalanced-data-4628</link>
      <guid>https://dev.to/mahira_banu/building-fraudshield-credit-card-fraud-detection-with-imbalanced-data-4628</guid>
      <description>&lt;p&gt;Fraud detection is one of those problems that looks simple on the surface — classify transactions as “fraud” or “not fraud”. But once you look at real data, it becomes a completely different challenge.&lt;/p&gt;

&lt;p&gt;In this project, I built FraudShield, an end-to-end machine learning system to detect fraudulent credit card transactions using both supervised and unsupervised approaches, along with a live dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 The Problem
&lt;/h2&gt;

&lt;p&gt;The dataset I used contains over 284,000 transactions, but only:&lt;/p&gt;

&lt;p&gt;👉 0.17% are fraud&lt;/p&gt;

&lt;p&gt;This creates a highly imbalanced dataset, where a model can achieve over 99.8% accuracy just by predicting every transaction as “not fraud”.&lt;/p&gt;

&lt;p&gt;So the real question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do we detect fraud when it’s so rare?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🔍 Dataset Overview
&lt;/h2&gt;

&lt;p&gt;The dataset contains real-world credit card transactions made by European cardholders, anonymised using PCA transformation to protect sensitive information. It includes 284,807 transactions, of which only 492 are fraudulent (~0.17%), making it a highly imbalanced classification problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 What are V1–V28?
&lt;/h3&gt;

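&lt;p&gt;These are PCA-transformed features. Only &lt;code&gt;Time&lt;/code&gt; and &lt;code&gt;Amount&lt;/code&gt; are kept in their original form.&lt;/p&gt;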

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The original features are hidden
&lt;/li&gt;
&lt;li&gt;The data is projected onto principal components, each a weighted combination of the original features
&lt;/li&gt;
&lt;li&gt;We can’t interpret them directly
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This makes the problem harder — models must learn patterns without human-readable features.&lt;/p&gt;
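
&lt;p&gt;To make the idea concrete, here is a minimal NumPy sketch of PCA on synthetic data. The raw features below are hypothetical, since the dataset's original features are not published. It shows two things: each principal component is a weighted mix of all raw features (so it has no single human meaning), and the resulting components are uncorrelated with one another.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical raw features: two strongly correlated columns plus one independent column
amount = rng.normal(100, 30, n)
amount_converted = 0.8 * amount + rng.normal(0, 5, n)
hour = rng.normal(12, 4, n)
raw = np.column_stack([amount, amount_converted, hour])

# PCA via SVD of the mean-centred data matrix
centred = raw - raw.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = centred @ vt.T  # column k plays the role of "V(k+1)"

raw_corr = np.corrcoef(raw, rowvar=False)
comp_corr = np.corrcoef(components, rowvar=False)

print(np.round(raw_corr, 2))   # large off-diagonal entry between the first two columns
print(np.round(comp_corr, 2))  # identity matrix: the components are uncorrelated
print(np.round(vt, 2))         # each row mixes the raw features with different weights
```

&lt;p&gt;The rows of &lt;code&gt;vt&lt;/code&gt; hold the mixing weights; for the real dataset those weights are unknown, which is why V1–V28 cannot be read directly.&lt;/p&gt;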




&lt;h2&gt;
  
  
  📈 Exploratory Data Analysis (EDA)
&lt;/h2&gt;

&lt;p&gt;Some key observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dataset is extremely imbalanced&lt;/li&gt;
&lt;li&gt;Most transactions are low value&lt;/li&gt;
&lt;li&gt;Fraud doesn’t follow obvious patterns&lt;/li&gt;
&lt;li&gt;The V1–V28 features are essentially uncorrelated with one another, as expected from a PCA transformation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important realization early on:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Accuracy is NOT a useful metric here&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  ⚠️ Why Accuracy is Misleading
&lt;/h2&gt;

&lt;p&gt;If a model predicts:&lt;/p&gt;

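&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;All transactions = Normal
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;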

&lt;p&gt;It gets:&lt;/p&gt;

&lt;p&gt;👉 99.8% accuracy&lt;/p&gt;

&lt;p&gt;…but detects zero fraud&lt;/p&gt;

&lt;p&gt;So instead, I focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precision&lt;/li&gt;
&lt;li&gt;Recall&lt;/li&gt;
&lt;li&gt;F1 Score&lt;/li&gt;
&lt;/ul&gt;
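
&lt;p&gt;A small plain-Python sketch of how these metrics are computed from confusion-matrix counts, applied to the “predict everything as Normal” baseline with the dataset counts quoted earlier (284,807 transactions, 492 fraud). The function name is illustrative.&lt;/p&gt;

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # of the flagged transactions, how many were fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the fraud cases, how many were flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Dataset counts quoted above
n_total, n_fraud = 284_807, 492
n_normal = n_total - n_fraud

# Baseline that labels every transaction "Normal": every fraud case becomes a false negative
tp, fp, fn, tn = 0, 0, n_fraud, n_normal
accuracy = (tp + tn) / n_total

print(f"accuracy = {accuracy:.4f}")     # accuracy = 0.9983
print(precision_recall_f1(tp, fp, fn))  # (0.0, 0.0, 0.0)
```

&lt;p&gt;The baseline scores 99.83% accuracy yet 0 on precision, recall and F1, which is why these three metrics are the ones reported for the models below.&lt;/p&gt;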




&lt;h2&gt;
  
  
  🤖 Model 1 — XGBoost (Supervised Learning)
&lt;/h2&gt;

&lt;p&gt;I trained an XGBoost classifier, a gradient-boosted tree model that works well on tabular data and can up-weight the rare fraud class during training.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key setup:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;scale_pos_weight&lt;/code&gt; to up-weight the rare fraud class
&lt;/li&gt;
&lt;li&gt;Stratified train/test split, so both sets keep the same fraud rate
&lt;/li&gt;
&lt;li&gt;Feature scaling
&lt;/li&gt;
&lt;/ul&gt;
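
&lt;p&gt;A minimal sketch of the first two setup steps in plain Python: a stratified split and the &lt;code&gt;scale_pos_weight&lt;/code&gt; value that the XGBoost documentation suggests as a starting point (number of negative instances divided by number of positive instances). The helper names, the 20% test fraction and the seed are assumptions for illustration, not the project's exact settings.&lt;/p&gt;

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # split each class separately
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


def scale_pos_weight(labels):
    """Suggested starting value: count of negatives (0) divided by count of positives (1)."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Labels with the same class counts as the full dataset (1 = fraud, 0 = normal)
labels = [1] * 492 + [0] * (284_807 - 492)
train_idx, test_idx = stratified_split(labels)
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

print(sum(test_labels), len(test_idx))           # fraud cases in test set, test set size
print(round(scale_pos_weight(train_labels), 1))  # roughly 577 normal transactions per fraud
```

&lt;p&gt;In a real pipeline, scikit-learn's &lt;code&gt;train_test_split(X, y, stratify=y)&lt;/code&gt; performs the stratified split, and the computed ratio is what you would pass as &lt;code&gt;XGBClassifier(scale_pos_weight=...)&lt;/code&gt;. Computing it on the training split only keeps the test set untouched.&lt;/p&gt;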

&lt;h3&gt;
  
  
  📊 Results:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Precision: 0.71
&lt;/li&gt;
&lt;li&gt;Recall: 0.87 🔥
&lt;/li&gt;
&lt;li&gt;F1 Score: 0.78
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 Insight:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The model catches 87% of fraud cases (recall 0.87), while 71% of the transactions it flags are actually fraud (precision 0.71). Catching most fraud is critical in real-world systems.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧪 Model 2 — Isolation Forest (Unsupervised)
&lt;/h2&gt;

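&lt;p&gt;To compare approaches, I also used Isolation Forest, an unsupervised anomaly detection model that flags transactions which are easy to separate from the rest, without using the fraud labels during training.&lt;/p&gt;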
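
&lt;p&gt;Isolation Forest rests on a simple idea: random axis-aligned splits separate an unusual point from the rest in fewer steps than a typical point. The sketch below is a minimal pure-Python illustration of that idea, following the original algorithm by Liu, Ting and Zhou rather than the project's code; the function names and the synthetic 2-D data are hypothetical. In practice you would use &lt;code&gt;sklearn.ensemble.IsolationForest&lt;/code&gt;.&lt;/p&gt;

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or 1 >= len(points):
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points identical on this feature: stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if split > p[dim]], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the expected extra depth for the leaf's size."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if node["split"] > x[node["dim"]] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Scores near 1 mean 'isolated quickly' (anomalous); around 0.5 or lower means normal."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))


# Synthetic 2-D data: a normal cluster plus one far-away point
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
trees, psi = fit_forest(normal + [outlier])
print(round(anomaly_score(outlier, trees, psi), 2))     # high score
print(round(anomaly_score((0.0, 0.0), trees, psi), 2))  # noticeably lower score
```

&lt;p&gt;The labels never enter the model, so it can only flag what looks unusual, not what looks fraudulent.&lt;/p&gt;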

&lt;h3&gt;
  
  
  📊 Results:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Precision: 0.29
&lt;/li&gt;
&lt;li&gt;Recall: 0.30
&lt;/li&gt;
&lt;li&gt;F1 Score: 0.30
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 Insight:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Without labelled data, the unsupervised model struggled to detect subtle fraud patterns, catching only about 30% of fraud cases.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  ⚖️ Model Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Recall&lt;/th&gt;
&lt;th&gt;F1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;XGBoost&lt;/td&gt;
&lt;td&gt;0.71&lt;/td&gt;
&lt;td&gt;0.87&lt;/td&gt;
&lt;td&gt;0.78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Isolation Forest&lt;/td&gt;
&lt;td&gt;0.29&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  🚀 Key takeaway:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;In this experiment, supervised learning clearly outperformed unsupervised anomaly detection (F1 0.78 vs 0.30), which makes it the better choice when labelled data is available.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🔍 Explainability with SHAP
&lt;/h2&gt;

&lt;p&gt;To understand how the model makes decisions, I used SHAP (SHapley Additive exPlanations).&lt;/p&gt;

&lt;p&gt;This helps answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which features influence predictions?&lt;/li&gt;
&lt;li&gt;Why was a transaction classified as fraud?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This adds transparency and trust to the system.&lt;/p&gt;
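
&lt;p&gt;SHAP assigns each feature a Shapley value: its average marginal contribution to the prediction over all orderings of the features, so that the contributions add up to the difference between the prediction and a baseline. The brute-force sketch below computes exact Shapley values for a tiny hypothetical scoring function, replacing “absent” features with a single baseline vector (a simplification of how SHAP averages over background data). It enumerates every coalition, so it only works for a handful of features; the &lt;code&gt;shap&lt;/code&gt; library's &lt;code&gt;TreeExplainer&lt;/code&gt; computes these values efficiently for tree ensembles such as XGBoost.&lt;/p&gt;

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # size of the coalition S that excludes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if j in coalition or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi += weight * (f(with_i) - f(without_i))
        values.append(phi)
    return values


def risk_score(v):
    """Hypothetical model: feature 0 acts alone, features 1 and 2 interact."""
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk_score, x, baseline)
print([round(p, 6) for p in phi])                                # [2.0, 3.0, 3.0]
print(round(sum(phi), 6), risk_score(x) - risk_score(baseline))  # 8.0 8.0
```

&lt;p&gt;Feature 0 contributes on its own and receives exactly its term (2.0), while the interaction term is split equally between features 1 and 2; the three values sum to the gap between the prediction and the baseline.&lt;/p&gt;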




&lt;h2&gt;
  
  
  🖥️ Deployment — Streamlit Dashboard
&lt;/h2&gt;

&lt;p&gt;To make the system usable, I built a Streamlit dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Input transaction data
&lt;/li&gt;
&lt;li&gt;Predict fraud probability
&lt;/li&gt;
&lt;li&gt;Display risk level
&lt;/li&gt;
&lt;li&gt;Show model metrics
&lt;/li&gt;
&lt;/ul&gt;
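
&lt;p&gt;Displaying a risk level amounts to bucketing the predicted fraud probability. Here is a minimal sketch; the function name and the 0.3 and 0.7 cut-offs are hypothetical placeholders, not the dashboard's actual values, and in practice they would be tuned on validation data against the cost of false alarms versus missed fraud.&lt;/p&gt;

```python
def risk_level(probability, medium_threshold=0.3, high_threshold=0.7):
    """Map a fraud probability in [0, 1] to a label (thresholds are hypothetical)."""
    if probability > 1.0 or 0.0 > probability:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability >= high_threshold:
        return "High"
    if probability >= medium_threshold:
        return "Medium"
    return "Low"


for p in (0.05, 0.45, 0.92):
    print(p, risk_level(p))  # 0.05 Low, 0.45 Medium, 0.92 High
```

&lt;p&gt;In a Streamlit app, the probability would typically come from the classifier's &lt;code&gt;predict_proba&lt;/code&gt; output and the label could be shown with &lt;code&gt;st.metric&lt;/code&gt;; this wiring is an assumption about the dashboard.&lt;/p&gt;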




&lt;h2&gt;
  
  
  🌐 Live Demo &amp;amp; Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;💻 GitHub: &lt;a href="https://github.com/mahira-code/fraudshield-ml" rel="noopener noreferrer"&gt;https://github.com/mahira-code/fraudshield-ml&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌍 Live Demo: &lt;a href="https://fraudshield-ml-mahira.streamlit.app/" rel="noopener noreferrer"&gt;https://fraudshield-ml-mahira.streamlit.app/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 What I Learned
&lt;/h2&gt;

&lt;p&gt;This project taught me a lot about real-world machine learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling imbalanced datasets&lt;/li&gt;
&lt;li&gt;Choosing the right evaluation metrics&lt;/li&gt;
&lt;li&gt;Comparing supervised vs unsupervised models&lt;/li&gt;
&lt;li&gt;Using SHAP for explainability&lt;/li&gt;
&lt;li&gt;Building and deploying end-to-end ML systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 What’s Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Hyperparameter tuning
&lt;/li&gt;
&lt;li&gt;Model monitoring (drift detection)
&lt;/li&gt;
&lt;li&gt;API deployment (FastAPI)
&lt;/li&gt;
&lt;li&gt;MLOps integration
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  👩‍💻 About Me
&lt;/h2&gt;

&lt;p&gt;I’m Mahira Banu, a Data Scientist and AI Engineer focused on building practical, real-world AI systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌐 Portfolio: &lt;a href="https://mahirabanu.website" rel="noopener noreferrer"&gt;https://mahirabanu.website&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 GitHub: &lt;a href="https://github.com/mahira-code" rel="noopener noreferrer"&gt;https://github.com/mahira-code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔗 LinkedIn: &lt;a href="https://www.linkedin.com/in/mahira-banu" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/mahira-banu&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💬 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Fraud detection isn’t just about building a model — it’s about understanding data, handling imbalance, and making reliable decisions in high-risk scenarios.&lt;/p&gt;

&lt;p&gt;If you’re working on similar problems, I’d love to hear your thoughts!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Starting My DSA Prep Journey: Join Me!</title>
      <dc:creator>Mahira Banu</dc:creator>
      <pubDate>Sat, 28 Jun 2025 16:43:23 +0000</pubDate>
      <link>https://dev.to/mahira_banu/starting-my-dsa-prep-journey-join-me-1pp5</link>
      <guid>https://dev.to/mahira_banu/starting-my-dsa-prep-journey-join-me-1pp5</guid>
      <description>&lt;p&gt;Hi everyone! &lt;/p&gt;

&lt;p&gt;I'm excited to share that I've started working on beginner-level Data Structures &amp;amp; Algorithms (DSA) questions — not just solving them, but explaining each one step by step. My goal is to strengthen my fundamentals, build consistency, and grow through community feedback and collaboration.&lt;/p&gt;

&lt;p&gt;Whether you're preparing for coding interviews or brushing up on your basics, feel free to check out my work and suggest improvements.&lt;/p&gt;

&lt;p&gt;GitHub Repository:&lt;br&gt;
&lt;a href="https://github.com/mahira-code/Interview-Prep" rel="noopener noreferrer"&gt;https://github.com/mahira-code/Interview-Prep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I’m including:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem statements&lt;/li&gt;
&lt;li&gt;Step-by-step logic&lt;/li&gt;
&lt;li&gt;Python solutions&lt;/li&gt;
&lt;li&gt;Time &amp;amp; space complexity breakdowns&lt;/li&gt;
&lt;li&gt;Notebook-style clarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d love to connect with fellow learners, mentors, and tech enthusiasts! Your feedback or collaboration would mean a lot. &lt;/p&gt;

&lt;p&gt;Daily practice is helping me stay sharp and focused, and it's one of the key pillars supporting my future goals, including career growth and international tech opportunities. &lt;/p&gt;

&lt;p&gt;Let’s keep learning and building — together! &lt;/p&gt;

</description>
      <category>dsa</category>
      <category>100daysofcode</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
