Introduction
Credit card fraud poses a significant threat to the financial industry, leading to billions of dollars in losses every year. To combat this, machine learning models have been developed to detect and prevent fraudulent transactions in real time. In this article, we'll walk through the process of building a real-time credit card fraud detection system using FastAPI, a modern web framework for Python, and a Random Forest classifier trained on the popular Credit Card Fraud Detection Dataset from Kaggle.
Overview of the Project
The goal of this project is to create a web service that predicts the likelihood of a credit card transaction being fraudulent. The service accepts transaction data, preprocesses it, and returns a prediction along with the probability of fraud. This system is designed to be fast, scalable, and easy to integrate into existing financial systems.
Key Components
- Machine Learning Model: A Random Forest classifier trained to distinguish between fraudulent and legitimate transactions.
- Data Preprocessing: Standardization of transaction features so they share a comparable scale, plus oversampling of the rare fraud class in the training data.
- API: A RESTful API built with FastAPI to handle prediction requests in real time.
Step 1: Preparing the Dataset
The dataset used in this project is the Credit Card Fraud Detection Dataset from Kaggle, which contains 284,807 transactions, of which only 492 (about 0.17%) are fraudulent. This extreme class imbalance means a model can look highly accurate while missing most fraud cases. Here it is addressed by oversampling the minority class in the training data.
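To put the imbalance in numbers, the short sketch below uses only the counts quoted above to compute the fraud rate and the accuracy of a trivial baseline that labels every transaction as legitimate. The function name is illustrative; the point is that such a baseline already reaches about 99.83% accuracy while catching no fraud at all.

```python
# Counts quoted for the Kaggle Credit Card Fraud Detection Dataset
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def imbalance_summary(total: int, fraud: int) -> dict:
    """Return the fraud rate and the accuracy of an 'always legitimate' baseline."""
    legit = total - fraud
    return {
        "legitimate": legit,
        "fraud_rate": fraud / total,
        # A model that never predicts fraud is correct on every legitimate transaction
        "baseline_accuracy": legit / total,
        "legit_per_fraud": legit / fraud,
    }


summary = imbalance_summary(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
print(f"fraud rate: {summary['fraud_rate']:.4%}")
print(f"always-legit accuracy: {summary['baseline_accuracy']:.4%}")
```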
Data Preprocessing
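The dataset is first split into training and testing sets. The features are then standardized with scikit-learn's StandardScaler, fitted on the training set only and reused unchanged on the test set. (Tree-based models such as Random Forest do not strictly need scaling, but a fitted scaler keeps the pipeline reusable for scale-sensitive models.) Given the imbalance, the RandomOverSampler technique from imbalanced-learn is applied to the training set only, so the classes are balanced before training while the test set keeps the real-world class distribution.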
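import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler
# Load the Kaggle data ("Class" is 1 for fraud) and split before any fitting (20% test split assumed)
df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
# Standardize features: fit on the training set only, then reuse the same scaler for the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Balance the training set only; the test set keeps its real class ratio
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X_train_scaled, y_train)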
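Random oversampling itself is simple: rows of the minority class are duplicated at random, with replacement, until both classes have the same count. The plain-Python sketch below illustrates that idea on a toy training set. `random_oversample` is a hypothetical helper, not imbalanced-learn's implementation; in the pipeline above, `RandomOverSampler` does this job. It should only ever see the training split: if duplicated fraud rows ended up on both sides of a train/test split, the test score would be inflated.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest.

    Illustrative only: a plain-Python version of the idea behind random oversampling.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    # Keep every original row, in its original order
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw, with replacement, as many extra copies as the class is short
        for i in rng.choices(idx, k=target - count):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy training data: three legitimate rows, one fraud row
X_train = [[0.1, 5.0], [0.3, 2.0], [0.2, 7.5], [0.9, 120.0]]
y_train = [0, 0, 0, 1]
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```

Because the added rows are exact copies, a Random Forest can overfit to them; an alternative worth comparing is passing `class_weight="balanced"` to `RandomForestClassifier` instead of resampling.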
Step 2: Training the Machine Learning Model
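We train a Random Forest classifier, an ensemble of decision trees that captures non-linear interactions between features and is relatively robust to noise. Like most classifiers, it tends to favor the majority class when trained on heavily imbalanced data, which is why it is trained on the oversampled training set. Performance is then measured on the untouched test set using precision, recall, and F1-score (via classification_report) and the area under the ROC curve (AUC-ROC). Accuracy on its own is not informative here, because a model that never predicts fraud is already about 99.8% accurate.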
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
# Train the model on the balanced training data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_resampled, y_resampled)
# Evaluate on the untouched, imbalanced test set
y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
print("AUC-ROC:", roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1]))
# Save the artifacts that the API loads in Step 3
joblib.dump(model, "random_forest_model.pkl")
joblib.dump(scaler, "scaler.pkl")
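The metrics printed above can be reproduced by hand, which makes the role of the decision threshold visible. In the sketch below, `precision_recall_at` flags a transaction when its fraud score reaches a chosen threshold, and `roc_auc` computes AUC as the probability that a randomly chosen fraud case scores higher than a randomly chosen legitimate one. Both are illustrative helpers run on toy data; with the real model, the scores would come from `model.predict_proba(X_test_scaled)[:, 1]`.

```python
def precision_recall_at(y_true, scores, threshold=0.5):
    """Precision and recall when transactions with score >= threshold are flagged as fraud."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # fraud correctly caught
        elif flagged and label == 0:
            fp += 1  # false alarm on a legitimate transaction
        elif not flagged and label == 1:
            fn += 1  # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as P(random fraud score > random legitimate score), counting ties as half."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and fraud scores (not real model output)
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.05, 0.20, 0.40, 0.70, 0.35, 0.90]
for t in (0.5, 0.3):
    p, r = precision_recall_at(y_true, scores, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
print("AUC:", roc_auc(y_true, scores))
```

In this toy data, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 1.0 at the same precision. On real data the trade-off is rarely that kind, and the threshold becomes a business decision about how many false alarms are acceptable per caught fraud. The pairwise AUC loop is quadratic, so `roc_auc_score` remains the right tool for the full test set.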
Step 3: Building the FastAPI Application
With the trained model and scaler saved using joblib, we move on to building the FastAPI application. FastAPI is chosen for its speed, its automatic request validation through Pydantic models, and its ease of use, which make it a good fit for a low-latency prediction service.
Creating the API
The FastAPI application defines a POST endpoint, /predict/, that accepts transaction data, processes it, and returns the model's prediction and probability.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import pandas as pd
# Load the trained model and scaler saved in Step 2
model = joblib.load("random_forest_model.pkl")
scaler = joblib.load("scaler.pkl")
app = FastAPI()
class Transaction(BaseModel):
    V1: float
    V2: float
    # Include all other features used in your model, in the same order as the training columns
    Amount: float
@app.post("/predict/")
def predict(transaction: Transaction):
    try:
        # One-row DataFrame whose column names must match those the scaler saw during fit
        # (.dict() works on Pydantic v1 and v2; v2 prefers .model_dump())
        data = pd.DataFrame([transaction.dict()])
        scaled_data = scaler.transform(data)
        prediction = model.predict(scaled_data)
        prediction_proba = model.predict_proba(scaled_data)
        # Column 1 of predict_proba holds the probability of class 1 (fraud)
        return {"fraud_prediction": int(prediction[0]), "probability": float(prediction_proba[0][1])}
    except Exception as e:
        # Malformed or mismatched input (e.g. missing features) is reported as a 400
        raise HTTPException(status_code=400, detail=str(e))
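Because the endpoint builds a DataFrame straight from the request body, the request fields must match the columns the scaler was fitted on, in the same order. A small helper like the one below can make that contract explicit. It assumes, hypothetically, that the model was trained on every column of the Kaggle file except `Class` (`Time`, `V1` to `V28`, `Amount`); `FEATURE_COLUMNS` and `to_feature_row` are illustrative names, not part of FastAPI or scikit-learn.

```python
# Assumption: training used every column of creditcard.csv except "Class",
# in the file's order: Time, V1..V28, Amount
FEATURE_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def to_feature_row(payload: dict) -> list:
    """Order incoming JSON fields to match the training columns; reject missing or extra keys."""
    missing = [c for c in FEATURE_COLUMNS if c not in payload]
    extra = sorted(set(payload) - set(FEATURE_COLUMNS))
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    return [float(payload[c]) for c in FEATURE_COLUMNS]


# Hypothetical request body with every feature present
example = {c: 0.0 for c in FEATURE_COLUMNS}
example["Amount"] = 149.62
row = to_feature_row(example)
print(len(row), row[-1])
```

Inside `predict`, the ordered row could be wrapped as `pd.DataFrame([row], columns=FEATURE_COLUMNS)` before calling `scaler.transform`, so a missing feature yields a clear error message instead of a scikit-learn feature-name mismatch.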
Step 4: Deploying the Application
To test the application locally, run the FastAPI server with uvicorn (the command below assumes the code above is saved as main.py) and send POST requests to the /predict/ endpoint. For each request, the service validates the input, scales it with the saved scaler, and returns whether the transaction is predicted to be fraudulent.
Running the API Locally
uvicorn main:app --reload
You can then test the API using curl or a tool like Postman:
curl -X POST http://127.0.0.1:8000/predict/ \
-H "Content-Type: application/json" \
-d '{"V1": -1.359807134, "V2": -0.072781173, ..., "Amount": 149.62}'
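The API returns a JSON object with two fields: fraud_prediction (1 for fraud, 0 for legitimate) and probability, the model's fraud score for the transaction. Because the model was trained on oversampled data in which fraud is as common as legitimate activity, this score is not calibrated to the real-world fraud rate of about 0.17% and is best read as a relative risk score.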
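For calling the service from Python rather than curl, the standard library is enough. The sketch below builds the same JSON POST request and parses the response. `build_predict_request` and `parse_prediction` are hypothetical helper names, and a real payload must contain every feature the model expects; the payloads here are truncated, just like the curl example.

```python
import json
import urllib.request


def build_predict_request(features: dict, url: str = "http://127.0.0.1:8000/predict/") -> urllib.request.Request:
    """Build a JSON POST request for the /predict/ endpoint (not sent yet)."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw: bytes) -> tuple:
    """Extract (is_fraud, probability) from the API's JSON response body."""
    result = json.loads(raw)
    return bool(result["fraud_prediction"]), float(result["probability"])


# Usage against a running server (uvicorn must be up; payload is a hypothetical full feature dict):
# with urllib.request.urlopen(build_predict_request(payload)) as resp:
#     print(parse_prediction(resp.read()))
```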
Conclusion
In this article, we've built a real-time credit card fraud detection system that combines machine learning with a modern web framework. The full code is available on GitHub. The system scores each transaction as it arrives and returns a prediction within a single request, making it a useful starting point for financial institutions looking to combat fraud.
Serving the model with FastAPI lets the service handle multiple requests concurrently: because predict is a regular def function, FastAPI runs it in a thread pool. Scaling beyond a single process means running several Uvicorn workers or service replicas behind a load balancer. This project can be further extended with more sophisticated models, improved feature engineering, or integration with a production environment.
Next Steps
To enhance the system further, consider the following:
- Model Improvements: Experiment with more advanced models like XGBoost or neural networks.
- Feature Engineering: Explore additional features that might improve precision and recall on the fraud class.
- Real-World Deployment: Deploy the application on cloud platforms like AWS or GCP for production use.
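As an example of the kind of feature engineering worth exploring, the sketch below derives two hypothetical features from the dataset's non-anonymized columns. In the Kaggle data, `Time` counts seconds since the first transaction in the file, so the derived hour is relative to that starting point rather than a wall-clock hour. `engineer_features` is an illustrative name; any new feature would have to be added both to training and to the API's request schema, and whether it helps must be checked on the test set.

```python
import math


def engineer_features(time_seconds: float, amount: float) -> dict:
    """Hypothetical extra features derived from the Time and Amount columns."""
    return {
        # Time counts seconds since the first transaction in the dataset,
        # so this is an hour relative to that start, not a wall-clock hour
        "hour_since_start": int(time_seconds // 3600) % 24,
        # Compress the long right tail of transaction amounts
        "log_amount": math.log1p(amount),
    }


print(engineer_features(time_seconds=90_000, amount=149.62))
```

A log transform of `Amount` changes nothing for a Random Forest, whose splits are unaffected by monotonic transformations; it matters mainly for scale-sensitive models such as neural networks.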