Temidayo Akindahunsi

Posted on Jun 5

Building a Simple Fraud Detection Baseline in Python

#machinelearning #datascience #ai #python

A practical first step for turning transaction data into useful fraud risk signals

If you work in fintech or financial services, fraud detection eventually becomes more than a data science problem.

It becomes a question of how fast you can spot unusual behaviour, how well you can explain it, and how useful your signal is to the people reviewing cases.

In this article, I want to show a simple way to start building a fraud detection baseline in Python using transaction data. Nothing overly fancy. Just a clean, practical approach that helps you move from raw transactions to risk signals you can actually use.

The goal is not to build the perfect fraud model.

The goal is to build a useful starting point.

What we are trying to solve

Imagine you have a transaction table like this:

customer_id
transaction_time
amount
merchant_category
device_id
location
is_fraud
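
If you do not have such a table to hand, a small synthetic one is enough to follow along. The sketch below builds one with the same columns. Everything in it is made up for illustration: the function name make_toy_transactions, the category values, and the roughly 5% fraud rate are assumptions, not properties of any real dataset.

```python
import numpy as np
import pandas as pd

def make_toy_transactions(n_rows=200, n_customers=10, seed=0):
    """Build a synthetic transaction table with the columns listed above."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2024-01-01")
    return pd.DataFrame({
        "customer_id": rng.integers(1, n_customers + 1, size=n_rows),
        # Random offsets (in minutes) within a 30-day period
        "transaction_time": start + pd.to_timedelta(
            rng.integers(0, 30 * 24 * 60, size=n_rows), unit="min"
        ),
        "amount": rng.lognormal(mean=3.0, sigma=1.0, size=n_rows).round(2),
        "merchant_category": rng.choice(
            ["grocery", "travel", "electronics"], size=n_rows
        ),
        "device_id": rng.choice(["d1", "d2", "d3", "d4"], size=n_rows),
        "location": rng.choice(["Lagos", "London", "Nairobi"], size=n_rows),
        # Labels are random here; in real data they come from confirmed cases
        "is_fraud": (rng.random(n_rows) < 0.05).astype(int),
    })

toy = make_toy_transactions()
print(toy.head())
```

Because the labels are random, a model trained on this table will not learn anything meaningful. It is only useful for checking that the code runs end to end.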

Your job is to identify suspicious behaviour before it becomes a loss.

A good first step is to create features that describe behaviour over time, such as:

how often a customer transacts
whether amounts are changing suddenly
whether the transaction pattern looks unusual
whether a customer is behaving differently from their own history

That is where the value starts.

Step 1: Load the data

import pandas as pd

df = pd.read_csv("transactions.csv")

df["transaction_time"] = pd.to_datetime(df["transaction_time"])
df = df.sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

print(df.head())

Before doing anything fancy, make sure your timestamps are in datetime format and your data is sorted correctly.

That matters because most fraud features depend on order.
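
Real exports often contain a few timestamps that do not parse. Here is a minimal sketch of a more defensive loader, assuming you are happy to drop such rows. The helper name load_transactions and the in-memory sample CSV are hypothetical stand-ins for transactions.csv.

```python
import io
import pandas as pd

def load_transactions(source):
    """Load and sort transactions, dropping rows with unparseable timestamps."""
    df = pd.read_csv(source)
    # errors="coerce" turns bad timestamps into NaT instead of raising
    df["transaction_time"] = pd.to_datetime(df["transaction_time"], errors="coerce")
    n_bad = int(df["transaction_time"].isna().sum())
    if n_bad:
        print(f"Dropping {n_bad} row(s) with invalid timestamps")
    df = df.dropna(subset=["transaction_time"])
    return df.sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

# Tiny in-memory CSV standing in for transactions.csv
sample_csv = io.StringIO(
    "customer_id,transaction_time,amount\n"
    "2,2024-01-02 10:00:00,50.0\n"
    "1,2024-01-03 09:00:00,20.0\n"
    "1,not-a-date,15.0\n"
    "1,2024-01-01 08:00:00,30.0\n"
)
clean = load_transactions(sample_csv)
print(clean)
```

Whether to drop, repair, or quarantine bad rows is a judgment call. Dropping is simply the easiest option to show here.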

Step 2: Create basic behavioural features

A transaction on its own does not say much.

The useful question is: how different is this transaction from normal behaviour?

Here are a few simple features you can build.

df["hour"] = df["transaction_time"].dt.hour
df["dayofweek"] = df["transaction_time"].dt.dayofweek

df["tx_count"] = df.groupby("customer_id")["amount"].transform("count")
df["avg_amount"] = df.groupby("customer_id")["amount"].transform("mean")
df["std_amount"] = df.groupby("customer_id")["amount"].transform("std").fillna(0)

df["amount_deviation"] = df["amount"] - df["avg_amount"]
df["amount_zscore"] = df["amount_deviation"] / (df["std_amount"] + 1e-6)

These are simple, but they are often surprisingly useful.

A fraudster may not always spend huge amounts. Sometimes they just behave differently from the customer’s usual pattern.
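
One caveat with the aggregates above: they are computed over each customer's full history, so every row's features also reflect transactions that happened later, which would not be known at scoring time. A minimal sketch of a history-only variant is shown below, assuming the data is already sorted as in Step 1. The prior_ column names are hypothetical, and the sample rows are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b"],
    "transaction_time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01", "2024-01-02",
    ]),
    "amount": [10.0, 20.0, 90.0, 5.0, 7.0],
}).sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

by_customer = df.groupby("customer_id")["amount"]

# Number of earlier transactions by the same customer (0 for the first one)
df["prior_tx_count"] = df.groupby("customer_id").cumcount()

# shift(1) removes the current row, so each statistic only sees the past
df["prior_avg_amount"] = by_customer.transform(
    lambda s: s.shift(1).expanding().mean()
)
prior_std = by_customer.transform(lambda s: s.shift(1).expanding().std())
df["prior_std_amount"] = prior_std.fillna(0)

# Leave the z-score at 0 when there is too little history to define a spread
safe_std = prior_std.where(prior_std > 0)
df["prior_amount_zscore"] = (
    (df["amount"] - df["prior_avg_amount"]) / safe_std
).fillna(0)

print(df)
```

The trade-off is that a customer's first few transactions carry little or no signal, because there is no history to compare against yet.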

Step 3: Add rolling window features

This is where things start to get more interesting.

Rolling features help capture recent behaviour. For fraud detection, recency matters.

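# Note: despite its name, this counts rows among the customer's last 5
# transactions (so it is capped at 5); it is not a true 24-hour window.
df["tx_count_24h"] = (
    df.groupby("customer_id")["amount"]
      .transform(lambda x: x.rolling(window=5, min_periods=1).count())
)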

df["avg_amount_5"] = (
    df.groupby("customer_id")["amount"]
      .transform(lambda x: x.rolling(window=5, min_periods=1).mean())
)

df["recent_amount_change"] = df["amount"] - df["avg_amount_5"]

In a production setup, you would usually define rolling windows by time rather than by row count, but this is a good simple baseline.

The idea is to compare each transaction with the customer’s recent history.

That usually gives you a much better signal than using the raw amount alone.
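
For comparison, here is a sketch of what a window defined by time could look like. It assumes the data is sorted by customer and time, and it loops over customers for clarity rather than speed. The helper rolling_by_time and the _last_24h column names are hypothetical, and the sample rows are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["a", "a", "a", "a", "b", "b"],
    "transaction_time": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 12:00",
        "2024-01-02 08:00", "2024-01-03 10:00",
        "2024-01-01 10:00", "2024-01-01 10:30",
    ]),
    "amount": [10.0, 20.0, 30.0, 40.0, 5.0, 15.0],
}).sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

def rolling_by_time(frame, window, func):
    """Apply a time-based rolling aggregate to `amount` within each customer."""
    out = pd.Series(index=frame.index, dtype="float64")
    for _, group in frame.groupby("customer_id"):
        # A time-based window needs a sorted DatetimeIndex
        amounts = group.set_index("transaction_time")["amount"]
        out.loc[group.index] = getattr(amounts.rolling(window), func)().to_numpy()
    return out

# Each window covers the 24 hours up to and including the current transaction
df["tx_count_last_24h"] = rolling_by_time(df, "24h", "count")
df["amount_sum_last_24h"] = rolling_by_time(df, "24h", "sum")

print(df)
```

Unlike the row-based version, a burst of three transactions in one day and three transactions spread over a month now produce different counts.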

Step 4: Encode categorical variables

Fraud data often contains useful categories like merchant type, device, or location.

Machine learning models usually need these converted into numeric form.

categorical_cols = ["merchant_category", "location"]
df_encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

This is a simple one-hot encoding approach. It works well as a baseline.

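If your categories have many distinct values (thousands of locations or devices, for example), one-hot encoding produces a very wide, sparse table, and you may later want to try target encoding or frequency encoding. For a handful of categories, one-hot encoding is a good place to start.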
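
As a rough illustration of frequency encoding, the sketch below replaces each device_id with the share of training rows in which it appears. The train and new frames are invented, and treating unseen devices as frequency 0 is an assumption you may want to handle differently.

```python
import pandas as pd

# Invented data: a training set and some new transactions to score
train = pd.DataFrame(
    {"device_id": ["d1", "d1", "d1", "d2", "d3", "d1", "d2", "d1"]}
)
new = pd.DataFrame({"device_id": ["d1", "d3", "d9"]})

# Learn the frequencies from the training data only
freq = train["device_id"].value_counts(normalize=True)

train["device_freq"] = train["device_id"].map(freq)
# Devices never seen in training get 0.0 (an assumption, not a rule)
new["device_freq"] = new["device_id"].map(freq).fillna(0.0)

print(new)
```

This keeps the feature to a single numeric column no matter how many devices exist. Computing the frequencies on the training split alone avoids leaking information from the rows you evaluate on.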

Step 5: Train a simple model

Let us build a basic classification model.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score

target = "is_fraud"

feature_cols = [
    "amount",
    "hour",
    "dayofweek",
    "tx_count",
    "avg_amount",
    "std_amount",
    "amount_deviation",
    "amount_zscore",
    "tx_count_24h",
    "avg_amount_5",
    "recent_amount_change"
]

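feature_cols = [col for col in feature_cols if col in df_encoded.columns]

# Also include the one-hot columns created in Step 4, which would otherwise go unused
feature_cols += [
    col for col in df_encoded.columns
    if col.startswith(("merchant_category_", "location_"))
]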

X = df_encoded[feature_cols]
y = df_encoded[target]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

pred_proba = model.predict_proba(X_test)[:, 1]
pred = (pred_proba >= 0.5).astype(int)

print("ROC AUC:", roc_auc_score(y_test, pred_proba))
print(classification_report(y_test, pred))

I like starting with logistic regression because it is simple, fast, and easy to explain.

That matters in fraud detection.

A model that is easy to explain is often more useful than a complicated one nobody trusts.
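
Two variations are worth trying on top of this baseline: splitting by time instead of at random, so the model is always tested on transactions later than the ones it learned from, and scaling the features while weighting the rare fraud class more heavily. The sketch below shows both on synthetic data. The data, the 80/20 cut-off, and the choice of average precision as the metric are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: fraud is more likely when amount_zscore is high
rng = np.random.default_rng(42)
n = 2000
data = pd.DataFrame({
    "transaction_time": pd.Timestamp("2024-01-01")
    + pd.to_timedelta(np.arange(n), unit="h"),
    "amount": rng.lognormal(3.0, 1.0, n),
    "amount_zscore": rng.normal(0.0, 1.0, n),
})
fraud_prob = 1 / (1 + np.exp(-(1.5 * data["amount_zscore"] - 3.0)))
data["is_fraud"] = (rng.random(n) < fraud_prob).astype(int)

# Time-ordered split: train on the earliest 80%, test on the latest 20%
data = data.sort_values("transaction_time").reset_index(drop=True)
cutoff = int(len(data) * 0.8)
train, test = data.iloc[:cutoff], data.iloc[cutoff:]

features = ["amount", "amount_zscore"]
model = make_pipeline(
    StandardScaler(),
    # class_weight="balanced" upweights the rare fraud class during fitting
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(train[features], train["is_fraud"])

scores = model.predict_proba(test[features])[:, 1]
ap = average_precision_score(test["is_fraud"], scores)
print("Average precision:", round(ap, 3))
print("Fraud rate in test set:", round(test["is_fraud"].mean(), 3))
```

Average precision is useful to read next to the fraud rate, since a model that scores at random would land close to that rate. Note that class weighting shifts the predicted probabilities upward, so any threshold should be chosen after this step, not before.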

Step 6: Tune the threshold

This is one of the most important parts of fraud detection.

A model score is not the same as a business decision.

You may want to review cases only when the risk score is high enough.

threshold = 0.7
df_test = X_test.copy()
df_test["risk_score"] = pred_proba
df_test["decision"] = df_test["risk_score"].apply(
    lambda x: "review" if x >= threshold else "approve"
)

print(df_test[["risk_score", "decision"]].head())

In real operations, threshold selection depends on:

review team capacity
fraud loss tolerance
customer friction
false positive cost
compliance requirements

That means thresholding is not just a technical step. It is a business decision.
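
Two of these constraints translate directly into code. The sketch below picks a threshold either from review capacity (how many cases the team can look at) or from a minimum acceptable precision. Both helper names and the example labels and scores are hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_capacity(scores, max_reviews):
    """Score of the max_reviews-th riskiest case; flag everything at or above it."""
    ranked = np.sort(np.asarray(scores))[::-1]
    k = min(max_reviews, len(ranked))
    return float(ranked[k - 1])

def threshold_for_precision(y_true, scores, min_precision):
    """Lowest threshold whose precision meets the target, or None if none does."""
    precision, _, thresholds = precision_recall_curve(y_true, scores)
    # precision has one more entry than thresholds; drop the final placeholder
    meets_target = precision[:-1] >= min_precision
    if not meets_target.any():
        return None
    return float(thresholds[meets_target].min())

# Invented labels and model scores for eight transactions
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90])

capacity_threshold = threshold_for_capacity(scores, max_reviews=2)
precision_threshold = threshold_for_precision(y_true, scores, min_precision=0.75)

print("Threshold for 2 reviews:", capacity_threshold)
print("Threshold for 75% precision:", precision_threshold)
```

In this toy example the capacity rule sends only the two riskiest cases to review, while the precision rule flags four cases and accepts one false positive among them. Choose the threshold on a validation set, not on the data used for the final evaluation.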

Step 7: Check feature importance

If you want to understand what is driving the model, look at the coefficients.

feature_importance = pd.DataFrame({
    "feature": X.columns,
    "coefficient": model.coef_[0]
}).sort_values("coefficient", key=abs, ascending=False)

print(feature_importance)

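This gives you a quick view of which signals push the risk score up (positive coefficients) or down (negative coefficients). Keep in mind that the features here are not scaled, so coefficient sizes are only directly comparable between features measured on similar scales.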

For example, if amount_zscore or recent_amount_change is important, that tells you the model is picking up unusual behaviour rather than just raw transaction size.

That is exactly what you want in fraud work.
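
If you want coefficients that can be ranked against each other, one option is to standardise the features before fitting. The sketch below does that on synthetic data in which only amount_zscore drives the label and avg_amount is pure noise on a much larger scale. The data and the odds_ratio_per_std column name are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "amount_zscore": rng.normal(0.0, 1.0, n),
    "avg_amount": rng.normal(5000.0, 2000.0, n),  # unrelated to the label
})
# Synthetic labels that depend only on amount_zscore
fraud_prob = 1 / (1 + np.exp(-(2.0 * X["amount_zscore"] - 3.0)))
y = (rng.random(n) < fraud_prob).astype(int)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# After scaling, each coefficient is the change in log-odds per one
# standard deviation of that feature, so magnitudes can be compared
coefs = pipe.named_steps["logisticregression"].coef_[0]
importance = pd.DataFrame({
    "feature": X.columns,
    "coefficient": coefs,
    "odds_ratio_per_std": np.exp(coefs),
}).sort_values("coefficient", key=abs, ascending=False)

print(importance)
```

The odds ratio column is often easier to discuss with reviewers: a value of 2 means the odds of fraud roughly double for a one standard deviation increase in that feature, holding the others fixed.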

What this baseline teaches us

This simple workflow gives you a lot already:

transaction-level risk signals
customer behaviour context
a first-pass model
explainable outputs
a decision threshold

It is not production-grade yet, but it is a strong foundation.

And in many teams, that foundation is what unlocks the next step.

What I would improve next

If I were taking this further, I would add:

time-based rolling windows
device change features
merchant change features
location distance features
class imbalance handling
model calibration
monitoring for drift
analyst feedback loops
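
As an example of the device change features in that list, here is a small sketch, assuming the data is sorted by customer and time as in Step 1. The column names device_changed and new_device are hypothetical, and the sample rows are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["a", "a", "a", "a", "b", "b"],
    "transaction_time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
        "2024-01-01", "2024-01-02",
    ]),
    "device_id": ["d1", "d1", "d2", "d1", "d7", "d7"],
}).sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

# Device used in the same customer's previous transaction (NaN for the first)
prev_device = df.groupby("customer_id")["device_id"].shift(1)
has_history = prev_device.notna()

# True when the device differs from the one used last time
df["device_changed"] = has_history & (prev_device != df["device_id"])

# True the first time a customer uses a device, ignoring their very first
# transaction, where every device is trivially new
first_use = ~df.duplicated(["customer_id", "device_id"])
df["new_device"] = has_history & first_use

print(df)
```

The two flags capture different things: switching back to a familiar device sets device_changed but not new_device, which is usually the less worrying case.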

I would also test tree-based models like Random Forest, XGBoost, or LightGBM once the feature set is more mature.

But I would always keep the same principle in mind:

build something the business can actually use

Final thoughts

Fraud detection works best when data science is tied to operations.

It is not enough to build a model that looks good in a notebook. The model must help people make better decisions in the real world.

That is why behavioural features, explainability, and threshold design matter so much.

If you are just starting out, do not wait for the perfect system.

Start with a clean baseline.
Add behaviour-based features.
Evaluate carefully.
Then improve from there.

That is how useful fraud systems are built.
