
Hanan


How I Found 1,370 Fraudsters Hiding in Our Data (And Saved My Company $51,000)

Tuesday, 9:14 AM

My manager's message was short: "We need to talk about fraud losses. My office."

That's how it started—three CSV files, millions of transactions, and a sinking feeling that somewhere in those rows, fraudsters were stealing from us while we watched.

I didn't know then that by Friday, I'd discover something so obvious we'd kick ourselves for not seeing it sooner.

The First Clue: When Numbers Tell a Story

Opening the data felt like looking at two different worlds. Our credit card transactions showed fraud in just 0.5% of cases—tiny red dots in a sea of green. But our e-commerce platform? Nearly 1 in 3 transactions were fraudulent.

I remember thinking: "How are we even still in business?"

That's when I built my first visualization—side-by-side bars showing the stark difference. Seeing it visually made the problem real. It wasn't just numbers anymore; it was a pattern screaming for attention.

The Breakthrough: The 1-Hour Rule

It started as a hunch. "What if fraudsters work fast?"

I created a simple calculation: hours between account creation and first purchase. When I plotted it, my coffee went cold.

There it was: a massive spike at the beginning. Transactions within the first hour had a 99.5% fraud rate. That was 6,685 cases of "sign up, steal, disappear."

The visualization looked like a mountain with the peak shoved all the way to the left. It was so clear, so obvious. How had we missed this?
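Here is a minimal sketch of that calculation, kept to the Python standard library so it runs anywhere. The field names (`signup_time`, `purchase_time`, `is_fraud`) and the toy rows are assumptions made up for illustration; they are not the real data or its results.

```python
from datetime import datetime


def hours_since_signup(signup_time, purchase_time):
    """Hours elapsed between account creation and the purchase."""
    return (purchase_time - signup_time).total_seconds() / 3600


def fraud_rate_within(rows, max_hours=1.0):
    """Share of fraudulent purchases among those made within max_hours of signup."""
    early = [
        r for r in rows
        if hours_since_signup(r["signup_time"], r["purchase_time"]) <= max_hours
    ]
    if not early:
        return None  # no purchases in the window, so there is no rate to report
    return sum(r["is_fraud"] for r in early) / len(early)


# Made-up rows for illustration only (hypothetical field names and values).
toy_rows = [
    {"signup_time": datetime(2024, 1, 1, 9, 0),
     "purchase_time": datetime(2024, 1, 1, 9, 0, 30), "is_fraud": 1},
    {"signup_time": datetime(2024, 1, 1, 10, 0),
     "purchase_time": datetime(2024, 1, 1, 10, 45), "is_fraud": 1},
    {"signup_time": datetime(2024, 1, 2, 8, 0),
     "purchase_time": datetime(2024, 1, 20, 8, 0), "is_fraud": 0},
    {"signup_time": datetime(2024, 1, 3, 8, 0),
     "purchase_time": datetime(2024, 2, 3, 8, 0), "is_fraud": 1},
]

print(fraud_rate_within(toy_rows))
```

The same idea carries over to a dataframe: subtract the two timestamp columns, convert to hours, and group by a time bucket.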

Building the Fraud Catchers

For credit cards, I chose XGBoost—it's like training a team of detectives who get smarter together. The results surprised me: 76 fraudsters caught, only 15 false alarms. A precision machine.
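To put a number on "precision machine", here is the arithmetic as a short sketch. It assumes the 76 catches are true positives and the 15 false alarms are false positives. Recall is left out because the number of missed credit card frauds isn't stated here.

```python
def precision(true_positives, false_positives):
    """Fraction of flagged transactions that really were fraud."""
    return true_positives / (true_positives + false_positives)


# Counts taken from the credit card results above.
credit_card_precision = precision(true_positives=76, false_positives=15)
print(f"{credit_card_precision:.1%}")  # 83.5%
```

In other words, roughly five out of every six alerts pointed at real fraud.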

For e-commerce, I went simpler with Logistic Regression. Why? Because when we flag Grandma's Christmas purchase, we need to explain why. The trade-off: slightly fewer catches (1,370 versus a potential 1,409) in exchange for much better explainability.

My model comparison chart told the story—different problems need different tools.
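Why is Logistic Regression easier to defend? Each feature adds its coefficient times its value to the log-odds, so a flag can be broken down line by line. The sketch below shows that breakdown. The feature names, coefficients, and intercept are hypothetical values chosen for illustration, not the fitted model.

```python
import math


def explain_logistic(intercept, coefficients, features):
    """Return the fraud probability and each feature's contribution to the log-odds."""
    contributions = {
        name: coefficients[name] * features[name] for name in coefficients
    }
    log_odds = intercept + sum(contributions.values())
    probability = 1 / (1 + math.exp(-log_odds))
    return probability, contributions


# Hypothetical model: these numbers are illustrative assumptions only.
intercept = -2.0
coefficients = {"signup_within_hour": 3.0, "purchase_value_scaled": 0.5}
transaction = {"signup_within_hour": 1, "purchase_value_scaled": 0.0}

probability, contributions = explain_logistic(intercept, coefficients, transaction)
print(f"fraud probability: {probability:.2f}")
for name, value in contributions.items():
    print(f"  {name}: {value:+.2f} log-odds")
```

A support agent can read that output directly: the flag came almost entirely from the purchase landing within an hour of signup.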

The Most Fascinating Part: Asking "Why?"

Using SHAP felt like putting on X-ray glasses. Suddenly I could see what the model was thinking.

The top predictors weren't what I expected. Some anonymized "V4" feature mattered most, followed by our custom anomaly score. The model was finding patterns in places I hadn't even looked.

But the real magic was in the individual cases. Looking at a force plot for a caught $257 fraud, I could trace exactly why—the timing, the weird V14 value, the new account. It wasn't magic; it was math we could explain.
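To make "math we could explain" concrete, here is a toy sketch of the idea a force plot draws: a base value plus one contribution per feature adds up to the model's output. It computes exact Shapley values by brute force against a single baseline, which is a simplification of what the SHAP library does on real models. The scoring function and feature names are made up for illustration and are not the project's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one instance, relative to a single baseline."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` take the instance's value; the rest stay at baseline.
        x = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return (0.1
            + 0.5 * x["new_account"]
            + 0.2 * x["unusual_v14"]
            + 0.15 * x["new_account"] * x["unusual_v14"]
            + 0.05 * x["night_time"])


instance = {"new_account": 1, "unusual_v14": 1, "night_time": 0}
baseline = {"new_account": 0, "unusual_v14": 0, "night_time": 0}

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the gap between the score for this transaction and the score for the baseline, which is exactly the property that lets you trace a single flag back to its causes.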

From Insights to Action: Three Changes We're Making

1. The 1-Hour Checkpoint

Starting Monday, any purchase within an hour of signup gets a gentle extra verification step. Not a block—just a "Hey, confirm this is you?" Based on our data, this alone could stop thousands of fraudulent attempts.

2. Smarter Geography

We found countries with shockingly high fraud rates (looking at you, Turkmenistan at 100%!). But we're not blocking nations—we're adding intelligent scrutiny. Legitimate customers get through; fraudsters hit roadblocks.

3. Dynamic Decisions

Our confusion matrices showed we need different approaches. Credit cards? Be super sure. E-commerce? Catch more, explain better. It's not one-size-fits-all.
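The three changes above can be sketched as one small decision policy. Every number here is a hypothetical placeholder (the thresholds, the minimum sample size, the 0.10 adjustment), and the function names are invented for illustration. The minimum-count guard is an added assumption: a 100% fraud rate can come from a handful of transactions, so the sketch only reports a country's rate once it has enough data.

```python
from collections import defaultdict

# Hypothetical settings: stricter certainty for credit cards, wider net for e-commerce.
THRESHOLDS = {"credit_card": 0.90, "ecommerce": 0.50}
MIN_COUNTRY_TRANSACTIONS = 50


def country_fraud_rates(rows, min_count=MIN_COUNTRY_TRANSACTIONS):
    """Fraud rate per country, skipping countries with too few transactions."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for country, is_fraud in rows:
        totals[country] += 1
        frauds[country] += int(is_fraud)
    return {c: frauds[c] / totals[c] for c in totals if totals[c] >= min_count}


def decide(channel, fraud_score, hours_since_signup, country_is_high_risk=False):
    """Return 'review', 'verify', or 'allow' for one transaction."""
    threshold = THRESHOLDS[channel]
    if country_is_high_risk:
        threshold -= 0.10  # extra scrutiny, not a block
    if fraud_score >= threshold:
        return "review"
    if hours_since_signup <= 1.0:
        return "verify"  # the gentle "confirm this is you?" step
    return "allow"


print(decide("ecommerce", fraud_score=0.2, hours_since_signup=0.5))
print(decide("credit_card", fraud_score=0.85, hours_since_signup=30))
```

Keeping the rule, the geography signal, and the per-channel thresholds in one place makes it easier to retune them as fraud patterns shift.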

The Business Impact (Or: How I Justified My Salary)

Let's talk numbers:

  • Test data impact: $51,000 saved
  • Monthly projection: $200,000+
  • Annual potential: Millions

But more than money? Trust. We can now tell customers exactly why their transaction was flagged. No more "the system says so" black boxes.

The financial impact visualization made my case to management in 10 seconds flat.

What I Wish I Knew Then

  1. Simple beats complex: The 1-hour rule required no machine learning to discover
  2. Explainability matters: Logistic Regression won for e-commerce because we could defend it
  3. Fraudsters adapt: Today's patterns are tomorrow's history

The Big Realization

The most valuable insight wasn't in the fancy algorithms. It was in asking a simple question: "What happens right after someone signs up?"

Sometimes the most powerful data science is asking obvious questions and having the courage to believe the answers, even when they seem too simple to be true.

Want to see how we did it? The code, the struggles, and the celebrations are all here: https://github.com/hann2004/fraud-detection.git

Question for you: What's the most surprising pattern you've found in your data?


Coffee consumption during this project: 47 cups ☕

Regrets: Zero
