A practical first step for turning transaction data into useful fraud risk signals
If you work in fintech or financial services, fraud detection eventually becomes more than a data science problem.
It becomes a question of how fast you can spot unusual behaviour, how well you can explain it, and how useful your signal is to the people reviewing cases.
In this article, I want to show a simple way to start building a fraud detection baseline in Python using transaction data. Nothing overly fancy. Just a clean, practical approach that helps you move from raw transactions to risk signals you can actually use.
The goal is not to build the perfect fraud model.
The goal is to build a useful starting point.
What we are trying to solve
Imagine you have a transaction table like this:
customer_id | transaction_time | amount | merchant_category | device_id | location | is_fraud
Your job is to identify suspicious behaviour before it becomes a loss.
A good first step is to create features that describe behaviour over time, such as:
- how often a customer transacts
- whether amounts are changing suddenly
- whether the transaction pattern looks unusual
- whether a customer is behaving differently from their own history
That is where the value starts.
Step 1: Load the data
import pandas as pd
df = pd.read_csv("transactions.csv")
df["transaction_time"] = pd.to_datetime(df["transaction_time"])
df = df.sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)
print(df.head())
Before doing anything fancy, make sure your timestamps are in datetime format and your data is sorted correctly.
That matters because most fraud features depend on order.
Step 2: Create basic behavioural features
A transaction on its own does not say much.
The useful question is: how different is this transaction from normal behaviour?
Here are a few simple features you can build.
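# Calendar features taken from the timestamp
df["hour"] = df["transaction_time"].dt.hour
df["dayofweek"] = df["transaction_time"].dt.dayofweek
# Per-customer summary statistics. Note: these use each customer's full history,
# including transactions that happen after the current row.
df["tx_count"] = df.groupby("customer_id")["amount"].transform("count")
df["avg_amount"] = df.groupby("customer_id")["amount"].transform("mean")
df["std_amount"] = df.groupby("customer_id")["amount"].transform("std").fillna(0)
# Distance from the customer's average, in raw units and in standard deviations
df["amount_deviation"] = df["amount"] - df["avg_amount"]
df["amount_zscore"] = df["amount_deviation"] / (df["std_amount"] + 1e-6)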
These are simple, but they are often surprisingly useful.
A fraudster may not always spend huge amounts. Sometimes they just behave differently from the customer’s usual pattern.
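One thing to watch with per-customer averages is that a statistic computed over the whole history also sees transactions that happen later than the row being scored. Here is a minimal sketch of a past-only variant, assuming the same column names as above. The small `demo` frame and the `prior_*` column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, already using the article's column names
demo = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2],
    "transaction_time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
        "2024-01-01", "2024-01-02",
    ]),
    "amount": [10.0, 20.0, 30.0, 400.0, 50.0, 70.0],
})
demo = demo.sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)


def past_mean(s):
    # Expanding mean includes the current row, so shift by one to keep only earlier rows
    return s.expanding().mean().shift(1)


def past_std(s):
    # Same idea for the standard deviation
    return s.expanding().std().shift(1)


grouped = demo.groupby("customer_id")["amount"]
demo["prior_avg_amount"] = grouped.transform(past_mean)
demo["prior_std_amount"] = grouped.transform(past_std)
# Number of earlier transactions for the same customer
demo["prior_tx_count"] = demo.groupby("customer_id").cumcount()
demo["prior_zscore"] = (
    (demo["amount"] - demo["prior_avg_amount"]) / (demo["prior_std_amount"] + 1e-6)
)

print(demo)
```

The first transactions of each customer come out as NaN because there is no history yet. How to fill those (zero, a global average, or a separate "new customer" flag) is a judgment call that depends on your data.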
Step 3: Add rolling window features
This is where things start to get more interesting.
Rolling features help capture recent behaviour. For fraud detection, recency matters.
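# Despite the name, this counts the customer's last 5 rows (including the
# current one), not the transactions in the last 24 hours.
df["tx_count_24h"] = (
    df.groupby("customer_id")["amount"]
    .transform(lambda x: x.rolling(window=5, min_periods=1).count())
)
# Average amount over the customer's last 5 transactions
df["avg_amount_5"] = (
    df.groupby("customer_id")["amount"]
    .transform(lambda x: x.rolling(window=5, min_periods=1).mean())
)
# How far the current amount is from that recent average
df["recent_amount_change"] = df["amount"] - df["avg_amount_5"]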
In a production setup, you would usually define rolling windows by time rather than by row count, but this is a good simple baseline.
The idea is to compare each transaction with the customer’s recent history.
That usually gives you a much better signal than using the raw amount alone.
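For a window defined by time rather than by row count, pandas accepts an offset such as "24h" when the series has a datetime index. The sketch below is one possible way to do it, looping over customers to keep things explicit. The `tx` frame and the `*_last_24h` column names are assumptions for illustration, not part of the dataset above.

```python
import pandas as pd

# Hypothetical toy data using the article's column names
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2],
    "transaction_time": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 12:00", "2024-01-02 08:00",
        "2024-01-03 10:00", "2024-01-01 10:00", "2024-01-05 10:00",
    ]),
    "amount": [20.0, 35.0, 500.0, 15.0, 60.0, 80.0],
})
tx = tx.sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

parts = []
for _, group in tx.groupby("customer_id"):
    # Time-based rolling needs a sorted datetime index
    amounts = group.set_index("transaction_time")["amount"]
    window = amounts.rolling("24h")  # the 24 hours up to and including this row
    parts.append(pd.DataFrame({
        "tx_count_last_24h": window.count().to_numpy(),
        "amount_sum_last_24h": window.sum().to_numpy(),
    }, index=group.index))

# Align the new columns back to the original rows by index
tx = tx.join(pd.concat(parts))
print(tx)
```

Note that the window includes the current transaction, so a count of 1 means "nothing else in the previous 24 hours".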
Step 4: Encode categorical variables
Fraud data often contains useful categories like merchant type, device, or location.
Machine learning models usually need these converted into numeric form.
categorical_cols = ["merchant_category", "location"]
df_encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)
This is a simple one-hot encoding approach. It works well as a baseline.
If your dataset is large, you may later want to try target encoding or frequency encoding, but one-hot encoding is a good place to start.
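As a rough illustration of frequency encoding, each category is replaced by how often it appears. This is a minimal sketch on a made-up `demo` frame. In a real pipeline you would likely compute the frequencies on the training data only and reuse them at scoring time.

```python
import pandas as pd

# Hypothetical toy data
demo = pd.DataFrame({
    "merchant_category": ["grocery", "grocery", "travel", "grocery", "electronics"],
})

# Share of rows per category
freq = demo["merchant_category"].value_counts(normalize=True)

# Replace each category with its share; one numeric column instead of many dummies
demo["merchant_category_freq"] = demo["merchant_category"].map(freq)
print(demo)
```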
Step 5: Train a simple model
Let us build a basic classification model.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
target = "is_fraud"
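feature_cols = [
    "amount",
    "hour",
    "dayofweek",
    "tx_count",
    "avg_amount",
    "std_amount",
    "amount_deviation",
    "amount_zscore",
    "tx_count_24h",
    "avg_amount_5",
    "recent_amount_change",
]
# Add the one-hot columns created in Step 4 so the model actually uses them
feature_cols += [
    col for col in df_encoded.columns
    if col.startswith(("merchant_category_", "location_"))
]
# Keep only the columns that exist in the encoded frame
feature_cols = [col for col in feature_cols if col in df_encoded.columns]
X = df_encoded[feature_cols].astype(float)
y = df_encoded[target]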
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
pred_proba = model.predict_proba(X_test)[:, 1]
pred = (pred_proba >= 0.5).astype(int)
print("ROC AUC:", roc_auc_score(y_test, pred_proba))
print(classification_report(y_test, pred))
I like starting with logistic regression because it is simple, fast, and easy to explain.
That matters in fraud detection.
A model that is easy to explain is often more useful than a complicated one nobody trusts.
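Two variations are worth trying on top of this baseline: splitting by time, so the model is tested on transactions that come after the ones it was trained on, and weighting the classes, since fraud is usually rare. The sketch below uses synthetic data (`toy`) invented for illustration. Only the `LogisticRegression` and `average_precision_score` calls are real scikit-learn APIs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

# Synthetic data: one behavioural feature and a rare fraud label
rng = np.random.default_rng(0)
n = 2000
toy = pd.DataFrame({
    "transaction_time": pd.date_range("2024-01-01", periods=n, freq="min"),
    "amount_zscore": rng.normal(size=n),
})
# Fraud is made more likely when amount_zscore is high (an assumption of this toy setup)
fraud_prob = 1 / (1 + np.exp(-(2.0 * toy["amount_zscore"] - 4.0)))
toy["is_fraud"] = (rng.random(n) < fraud_prob).astype(int)

# Time-ordered split: train on the earliest 80%, test on the latest 20%
toy = toy.sort_values("transaction_time")
cutoff = int(len(toy) * 0.8)
train, test = toy.iloc[:cutoff], toy.iloc[cutoff:]

# class_weight="balanced" reweights the rare fraud class during fitting
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(train[["amount_zscore"]], train["is_fraud"])

scores = model.predict_proba(test[["amount_zscore"]])[:, 1]
print("Average precision:", average_precision_score(test["is_fraud"], scores))
```

Balanced class weights shift the predicted probabilities upwards, so any threshold you pick later should be chosen on scores from the same model configuration.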
Step 6: Tune the threshold
This is one of the most important parts of fraud detection.
A model score is not the same as a business decision.
You may want to review cases only when the risk score is high enough.
threshold = 0.7
df_test = X_test.copy()
df_test["risk_score"] = pred_proba
df_test["decision"] = df_test["risk_score"].apply(
    lambda x: "review" if x >= threshold else "approve"
)
print(df_test[["risk_score", "decision"]].head())
In real operations, threshold selection depends on:
- review team capacity
- fraud loss tolerance
- customer friction
- false positive cost
- compliance requirements
That means thresholding is not just a technical step. It is a business decision.
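To make the review-capacity point concrete, here is a small sketch that derives a threshold from the number of cases a team can review, then checks precision and recall at that threshold. The function name `threshold_for_capacity`, the scores, and the labels are all hypothetical.

```python
import numpy as np


def threshold_for_capacity(scores, max_reviews):
    """Return the score of the k-th highest case, where k = max_reviews.

    Flagging every case with score >= this value sends roughly max_reviews
    cases to review. Ties at the threshold can push the count slightly higher.
    """
    scores = np.asarray(scores, dtype=float)
    if max_reviews <= 0:
        return float("inf")  # nothing gets flagged
    if max_reviews >= len(scores):
        return float(scores.min())  # everything gets flagged
    ranked = np.sort(scores)[::-1]  # highest score first
    return float(ranked[max_reviews - 1])


# Made-up risk scores and known outcomes for eight transactions
scores = np.array([0.05, 0.92, 0.40, 0.71, 0.10, 0.88, 0.33, 0.64])
labels = np.array([0, 1, 0, 1, 0, 0, 1, 0])

threshold = threshold_for_capacity(scores, max_reviews=3)
flagged = scores >= threshold

precision = labels[flagged].mean()               # share of flagged cases that are fraud
recall = labels[flagged].sum() / labels.sum()    # share of fraud that gets flagged
print(threshold, int(flagged.sum()), precision, recall)
```

Running the same check for a few capacity levels gives the review team a simple table of workload against fraud caught, which is usually an easier conversation than a raw probability cut-off.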
Step 7: Check feature importance
If you want to understand what is driving the model, look at the coefficients.
feature_importance = pd.DataFrame({
    "feature": X.columns,
    "coefficient": model.coef_[0]
}).sort_values("coefficient", key=abs, ascending=False)
print(feature_importance)
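This gives you a quick view of which signals push the risk score up (positive coefficients) or down (negative coefficients). Because the features here are not standardised, coefficient sizes depend on each feature's scale, so treat the ranking as a rough guide rather than a strict ordering.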
For example, if amount_zscore or recent_amount_change is important, that tells you the model is picking up unusual behaviour rather than just raw transaction size.
That is exactly what you want in fraud work.
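One way to make the coefficients easier to compare is to standardise the features before fitting. The sketch below uses synthetic data in which `amount` has a large scale but no effect on the label, while `amount_zscore` drives it. The data and the names `X_toy` and `y_toy` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000
X_toy = pd.DataFrame({
    "amount": rng.gamma(shape=2.0, scale=100.0, size=n),  # large scale, unrelated to the label
    "amount_zscore": rng.normal(size=n),                   # small scale, drives the label
})
fraud_prob = 1 / (1 + np.exp(-(1.5 * X_toy["amount_zscore"] - 2.0)))
y_toy = (rng.random(n) < fraud_prob).astype(int)

# Scale every feature to mean 0 and standard deviation 1, then fit the model
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_toy, y_toy)

# Coefficients now describe the effect of a one-standard-deviation change
coefs = pd.Series(
    pipe.named_steps["logisticregression"].coef_[0],
    index=X_toy.columns,
)
print(coefs.sort_values(key=abs, ascending=False))
```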
What this baseline teaches us
This simple workflow gives you a lot already:
- transaction-level risk signals
- customer behaviour context
- a first-pass model
- explainable outputs
- a decision threshold
It is not production-grade yet, but it is a strong foundation.
And in many teams, that foundation is what unlocks the next step.
What I would improve next
If I were taking this further, I would add:
- time-based rolling windows
- device change features
- merchant change features
- location distance features
- class imbalance handling
- model calibration
- monitoring for drift
- analyst feedback loops
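As an example of what a device change feature could look like, here is a minimal sketch that compares each transaction's `device_id` with the customer's previous one. The `demo` frame and the column names `device_changed` and `new_device` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data using the article's column names
demo = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2],
    "transaction_time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
        "2024-01-01", "2024-01-02",
    ]),
    "device_id": ["a", "a", "b", "a", "x", "x"],
})
demo = demo.sort_values(["customer_id", "transaction_time"]).reset_index(drop=True)

# Device used in the same customer's previous transaction (missing for the first one)
prev_device = demo.groupby("customer_id")["device_id"].shift(1)
has_history = prev_device.notna()

# 1 when the device differs from the previous transaction's device
demo["device_changed"] = (has_history & (demo["device_id"] != prev_device)).astype(int)

# 1 when the customer has never used this device before (first transaction excluded)
first_time_pair = ~demo.duplicated(["customer_id", "device_id"])
demo["new_device"] = (has_history & first_time_pair).astype(int)

print(demo)
```

The same shift-and-compare pattern should carry over to merchant change features by swapping `device_id` for `merchant_category`.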
I would also test tree-based models like Random Forest, XGBoost, or LightGBM once the feature set is more mature.
But I would always keep the same principle in mind:
Build something the business can actually use.
Final thoughts
Fraud detection works best when data science is tied to operations.
It is not enough to build a model that looks good in a notebook. The model must help people make better decisions in the real world.
That is why behavioural features, explainability, and threshold design matter so much.
If you are just starting out, do not wait for the perfect system.
- Start with a clean baseline.
- Add behaviour-based features.
- Evaluate carefully.
- Then improve from there.
That is how useful fraud systems are built.