<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joseph Tobi</title>
    <description>The latest articles on DEV Community by Joseph Tobi (@joseph_tobi_b7ccf5406909f).</description>
    <link>https://dev.to/joseph_tobi_b7ccf5406909f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2707303%2F8fd1abf1-57b3-479a-8be2-1f7ae0a4b435.jpg</url>
      <title>DEV Community: Joseph Tobi</title>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joseph_tobi_b7ccf5406909f"/>
    <language>en</language>
    <item>
      <title>Why Full-Stack ML Engineers Are More Valuable Than Pure Data Scientists</title>
      <dc:creator>Joseph Tobi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:09:48 +0000</pubDate>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f/why-full-stack-ml-engineers-are-more-valuable-than-pure-data-scientists-5h53</link>
      <guid>https://dev.to/joseph_tobi_b7ccf5406909f/why-full-stack-ml-engineers-are-more-valuable-than-pure-data-scientists-5h53</guid>
      <description>&lt;p&gt;There is a conversation happening in every tech company right now.&lt;br&gt;
A data scientist presents a model. It has 94% accuracy. The AUC-ROC is excellent. The confusion matrix looks clean. Everyone is impressed.&lt;br&gt;
Then someone asks: "How do we use this in our product?"&lt;br&gt;
Silence.&lt;br&gt;
The model lives in a Jupyter notebook. It has never seen real user input. It has no API. It cannot be called from a frontend. It cannot be deployed. It exists purely as a demonstration of what could be — not what is.&lt;br&gt;
This is the gap that costs companies millions of dollars in delayed products and wasted engineering time. And it is the gap that makes full-stack ML engineers the most valuable technical hire in the market right now.&lt;br&gt;
The Myth of the Pure Data Scientist&lt;br&gt;
The traditional data science role was defined by a clear boundary. Data scientists build models. Software engineers deploy them. These are separate disciplines requiring separate people.&lt;br&gt;
This made sense in 2015. It makes much less sense in 2026.&lt;br&gt;
The tools have changed. PyTorch makes model building accessible to software engineers. FastAPI makes serving models accessible to data scientists. Docker makes deployment consistent across both worlds. The boundary between building a model and shipping a product has never been thinner.&lt;br&gt;
Yet most hiring pipelines still recruit as if that boundary is a wall.&lt;br&gt;
What Pure Data Scientists Cannot Do&lt;br&gt;
I want to be precise here because this is not an attack on data scientists. Many are exceptional at what they do. The limitation is not skill — it is scope.&lt;br&gt;
A pure data scientist typically cannot:&lt;br&gt;
Build a production API. Training a model in a notebook and serving it to real users via a REST endpoint are completely different skills. FastAPI, request validation, error handling, response formatting — these are engineering concerns that most data science curricula never cover.&lt;br&gt;
Handle preprocessing consistency. This is the silent killer of ML products. A model trained on standardized features must receive standardized features at inference time — using the exact same scaler fitted on training data. Pure data scientists often understand this conceptually but struggle to implement it reliably in a production codebase.&lt;br&gt;
Build the interface users interact with. A fraud detection model is useless without a dashboard showing fraud alerts. A house price estimator is useless without a form users can fill in. The last mile between model and user requires frontend engineering skills that pure data science roles never develop.&lt;br&gt;
Debug production failures. When a model returns unexpected predictions in production, the bug could be in the model, the preprocessing pipeline, the API layer, or the frontend. A data scientist can only debug one of these four places.&lt;br&gt;
What Full-Stack ML Engineers Can Do&lt;br&gt;
A full-stack ML engineer closes every one of these gaps.&lt;br&gt;
They train the model. They save the model weights and preprocessing artifacts. They build the FastAPI inference endpoint. They containerize everything with Docker. They deploy the backend to a cloud platform. They build the React frontend that users interact with. And when something breaks in production they can trace the failure from the user interface all the way back to the model weights.&lt;br&gt;
This is not a theoretical advantage. It is a direct business advantage.&lt;br&gt;
A full-stack ML engineer ships a complete AI feature in the time it takes a traditional team to finish the handoff meeting between data science and engineering.&lt;br&gt;
The Handoff Problem&lt;br&gt;
In organizations that separate data science from engineering, every ML project has a handoff problem.&lt;br&gt;
The data scientist finishes the model and hands it to the engineering team. The engineering team rebuilds the preprocessing pipeline from scratch because the data scientist wrote it in notebook code that cannot run in production. The preprocessing is slightly different. The model underperforms. Debugging takes weeks. Nobody knows whose fault it is.&lt;br&gt;
I have spoken to engineers at multiple companies who have lived this exact experience. The handoff is where ML projects go to die.&lt;br&gt;
A full-stack ML engineer eliminates the handoff entirely. The person who trained the model is the person who deploys it. Preprocessing consistency is guaranteed because there is only one person and one codebase.&lt;br&gt;
The Career Argument&lt;br&gt;
Beyond organizational value, full-stack ML engineering is a stronger career position than pure data science for one simple reason.&lt;br&gt;
It is harder to replace.&lt;br&gt;
A pure data scientist who builds models in notebooks can be replaced by AutoML tools, foundation models, and increasingly capable AI assistants. The model building step — the part that used to require years of expertise — is becoming commoditized.&lt;br&gt;
But the engineer who understands how to integrate a model into a real product, ensure preprocessing consistency, serve predictions at low latency, monitor model drift in production, and build the interface users actually interact with — that person is not being replaced by any tool available today.&lt;br&gt;
Full-stack ML engineers operate at the intersection of two disciplines. Replacing them requires replacing two people. Companies rarely have the budget or patience for that.&lt;br&gt;
What This Means For You&lt;br&gt;
If you are a data scientist, learn to deploy. Pick up FastAPI. Understand Docker. Build one complete end-to-end project — model to API to frontend. Put it on GitHub with a live demo link. You will immediately separate yourself from 90% of data science candidates who have only ever submitted Kaggle notebooks.&lt;br&gt;
If you are a software engineer, learn ML fundamentals. Understand how models are trained and evaluated. Learn PyTorch or scikit-learn. Build one ML-powered feature in a real application. You will immediately become relevant to every company investing in AI — which at this point is every company.&lt;br&gt;
If you are starting from scratch, skip the specialization entirely. Build both skills simultaneously. The combination is rarer and more valuable than either skill alone.&lt;br&gt;
The Honest Caveat&lt;br&gt;
I want to be clear about what full-stack ML engineering is not.&lt;br&gt;
It is not being the best data scientist in the room. Research scientists at top AI labs have depth of expertise that full-stack ML engineers cannot match. If your goal is to publish papers and advance the frontier of machine learning, specialize deeply in ML research.&lt;br&gt;
It is not being the best software engineer in the room either. Senior engineers with ten years of systems programming experience will outperform a full-stack ML engineer on pure engineering tasks.&lt;br&gt;
Full-stack ML engineering is being the person who can ship AI products. That is a different goal from being the best at any single discipline. It is also currently the most in-demand goal in the industry.&lt;br&gt;
Conclusion&lt;br&gt;
The most valuable technical hire at an AI-focused company in 2026 is not the person who builds the best model. It is the person who ships the complete product.&lt;br&gt;
Data science produced brilliant models that lived in notebooks. ML engineering ships those models to users. Full-stack ML engineering does both — and eliminates every bottleneck in between.&lt;br&gt;
The boundary between data science and software engineering is not a wall. It is an opportunity.&lt;br&gt;
The engineers who cross it are the ones building the products everyone else is still planning.&lt;br&gt;
Joseph Tobi Mayokun is a full-stack developer and ML engineer, founder of Microlink — an AI-focused tech startup building intelligent software for African markets.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>career</category>
      <category>webdev</category>
      <category>python</category>
    </item>
    <item>
      <title>Handling Class Imbalance in Fraud Detection with scikit-learn</title>
      <dc:creator>Joseph Tobi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:08:06 +0000</pubDate>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f/handling-class-imbalance-in-fraud-detection-with-scikit-learn-aa3</link>
      <guid>https://dev.to/joseph_tobi_b7ccf5406909f/handling-class-imbalance-in-fraud-detection-with-scikit-learn-aa3</guid>
      <description>&lt;p&gt;Handling Class Imbalance in Fraud Detection with scikit-learn&lt;br&gt;
Every fraud detection tutorial I've seen makes the same mistake. They train a model, print the accuracy score — 99.8% — and declare success.&lt;br&gt;
That model is useless.&lt;br&gt;
In a dataset where 0.17% of transactions are fraudulent, a model that predicts "legitimate" for every single transaction achieves 99.83% accuracy. It has never detected a single fraud case in its life.&lt;br&gt;
This is the class imbalance problem and it's the most important thing to understand before building any fraud detection system.&lt;br&gt;
In this tutorial I'll show you exactly how to handle it correctly using scikit-learn. By the end you'll have a working fraud detection pipeline that actually catches fraud.&lt;br&gt;
Prerequisites&lt;br&gt;
Python 3.8+&lt;br&gt;
Basic understanding of classification&lt;br&gt;
pip installed&lt;br&gt;
The Dataset&lt;br&gt;
We'll use the Credit Card Fraud Detection dataset from Kaggle. It contains 284,807 transactions with only 492 fraud cases — a fraud rate of 0.17%. This is a real-world class imbalance problem.&lt;br&gt;
Download it from Kaggle and save it as creditcard.csv.&lt;br&gt;
Step 1 — Explore the Data First&lt;br&gt;
Never start modeling without understanding your data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import numpy as np

df = pd.read_csv("creditcard.csv")

# Always check this first
print(f"Dataset shape: {df.shape}")
print(f"\nClass distribution:")
print(df["Class"].value_counts())
print(f"\nFraud rate: {df['Class'].mean():.4%}")
print(f"\nMissing values: {df.isnull().sum().sum()}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dataset shape: (284807, 31)

Class distribution:
0    284315
1       492

Fraud rate: 0.1727%

Missing values: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This tells us everything we need to know. 492 fraud cases against 284,315 legitimate transactions. This is severe class imbalance.&lt;/p&gt;
&lt;p&gt;Step 2 — Why Accuracy Is the Wrong Metric&lt;br&gt;
Before we build anything, let's prove why accuracy is meaningless here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = df.drop("Class", axis=1)
y = df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A model that predicts the majority class every time
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
y_pred = dummy.predict(X_test)

print(f"Dummy model accuracy: {accuracy_score(y_test, y_pred):.4%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dummy model accuracy: 99.8274%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A model that has learned absolutely nothing achieves 99.83% accuracy. This is why you must never use accuracy as your primary metric for imbalanced classification.&lt;/p&gt;
&lt;p&gt;Step 3 — Use the Right Metrics&lt;br&gt;
The correct metrics for fraud detection are computed in this helper:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import (
    classification_report,
    roc_auc_score,
    confusion_matrix,
    precision_score,
    recall_score,
    f1_score
)

def evaluate_model(model, X_test, y_test, model_name):
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    print(f"\n{'='*50}")
    print(f"Model: {model_name}")
    print(f"{'='*50}")
    print(f"\nAUC-ROC:   {roc_auc_score(y_test, y_prob):.4f}")
    print(f"Precision: {precision_score(y_test, y_pred):.4f}")
    print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
    print(f"F1 Score:  {f1_score(y_test, y_pred):.4f}")
    print(f"\nClassification Report:")
    print(classification_report(y_test, y_pred,
          target_names=["Legitimate", "Fraud"]))
    print(f"\nConfusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here is what each metric means in a fraud context:&lt;br&gt;
AUC-ROC — measures how well the model separates fraud from legitimate transactions across all thresholds. 1.0 is perfect, 0.5 is random guessing. This is your primary metric.&lt;br&gt;
Recall — of all actual fraud cases, how many did we catch? Missing real fraud is the most costly mistake. Prioritize this.&lt;br&gt;
Precision — of all predicted fraud cases, how many were real? Low precision means too many false alarms blocking legitimate customers.&lt;br&gt;
F1 Score — the harmonic mean of precision and recall. A good overall measure when you need to balance both.&lt;/p&gt;
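&lt;p&gt;To make the precision/recall distinction concrete, here is a tiny sketch with made-up labels (not from the fraud model): ten transactions, two of which are really fraud, and a model that flags three of them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # two real fraud cases
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0]  # three flagged, one of them correct

print(precision_score(y_true, y_pred))  # 1 of 3 flagged were real fraud = 0.33
print(recall_score(y_true, y_pred))     # 1 of 2 real frauds were caught = 0.50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;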
&lt;p&gt;Step 4 — Preprocess the Data&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

X = df.drop("Class", axis=1)
y = df["Class"]

# Stratify ensures both splits maintain
# the same fraud ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y  # Critical for imbalanced data
)

# Scale features
# Fit only on training data — never on test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set fraud rate: {y_train.mean():.4%}")
print(f"Test set fraud rate: {y_test.mean():.4%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Training set fraud rate: 0.1727%
Test set fraud rate: 0.1727%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Stratify ensures both splits have the same fraud rate. Without it you might accidentally create a test set with no fraud cases at all.&lt;/p&gt;
&lt;p&gt;Step 5 — Approach 1: Class Weights&lt;br&gt;
The simplest approach. Tell the model to penalize misclassifying fraud cases more heavily.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.linear_model import LogisticRegression

# Without class weights — baseline
lr_baseline = LogisticRegression(
    random_state=42,
    max_iter=1000
)
lr_baseline.fit(X_train_scaled, y_train)
evaluate_model(lr_baseline, X_test_scaled,
               y_test, "Logistic Regression (No Weights)")

# With class weights — handles imbalance
lr_weighted = LogisticRegression(
    class_weight="balanced",  # This is the key change
    random_state=42,
    max_iter=1000
)
lr_weighted.fit(X_train_scaled, y_train)
evaluate_model(lr_weighted, X_test_scaled,
               y_test, "Logistic Regression (Balanced)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;class_weight="balanced" automatically calculates weights inversely proportional to class frequencies. Fraud cases get much higher weight so misclassifying them costs more.&lt;/p&gt;
&lt;p&gt;Step 6 — Approach 2: Random Forest with Class Weights&lt;br&gt;
Tree-based models handle imbalance better than linear models and support class weighting too.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    random_state=42,
    n_jobs=-1  # Use all CPU cores
)
rf.fit(X_train_scaled, y_train)
evaluate_model(rf, X_test_scaled,
               y_test, "Random Forest (Balanced)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Random Forest typically outperforms Logistic Regression on fraud detection because fraud patterns are highly nonlinear.&lt;/p&gt;
&lt;p&gt;Step 7 — Approach 3: SMOTE Oversampling&lt;br&gt;
SMOTE (Synthetic Minority Oversampling Technique) creates synthetic fraud samples to balance the dataset.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install: pip install imbalanced-learn
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(
    X_train_scaled, y_train
)

print(f"Before SMOTE: {y_train.value_counts().to_dict()}")
print(f"After SMOTE: {pd.Series(y_train_resampled).value_counts().to_dict()}")

rf_smote = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
    n_jobs=-1
)
rf_smote.fit(X_train_resampled, y_train_resampled)
evaluate_model(rf_smote, X_test_scaled,
               y_test, "Random Forest + SMOTE")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Important — apply SMOTE only to the training data, never to the test data. You want to evaluate on the real distribution, not on synthetic data.&lt;/p&gt;
&lt;p&gt;Step 8 — Tune the Classification Threshold&lt;br&gt;
By default scikit-learn uses 0.5 as the fraud threshold. This is almost never optimal for imbalanced problems.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import precision_recall_curve

y_prob = rf.predict_proba(X_test_scaled)[:, 1]
precisions, recalls, thresholds = precision_recall_curve(
    y_test, y_prob
)

# Find threshold that maximizes F1
f1_scores = 2 * (precisions * recalls) / (precisions + recalls + 1e-8)
best_threshold = thresholds[np.argmax(f1_scores)]

print(f"Default threshold (0.5) results:")
y_pred_default = (y_prob &amp;gt;= 0.5).astype(int)
print(f"Recall: {recall_score(y_test, y_pred_default):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_default):.4f}")

print(f"\nOptimal threshold ({best_threshold:.3f}) results:")
y_pred_optimal = (y_prob &amp;gt;= best_threshold).astype(int)
print(f"Recall: {recall_score(y_test, y_pred_optimal):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_optimal):.4f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In fraud detection you usually want to lower the threshold to catch more fraud at the cost of more false alarms. The right threshold depends on the business cost of each error type.&lt;/p&gt;
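&lt;p&gt;If you can attach rough costs to each type of error, you can go one step further and pick the threshold that minimizes expected cost instead of maximizing F1. A minimal sketch with made-up cost figures, evaluated here on the test set purely for illustration — in practice you would tune the threshold on a separate validation set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical business costs — replace with your own numbers
COST_MISSED_FRAUD = 500  # cost of a false negative
COST_FALSE_ALARM = 5     # cost of a false positive

candidate_thresholds = np.linspace(0.01, 0.99, 99)
costs = []
for t in candidate_thresholds:
    preds = (y_prob &amp;gt;= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    costs.append(fn * COST_MISSED_FRAUD + fp * COST_FALSE_ALARM)

best_cost_threshold = candidate_thresholds[int(np.argmin(costs))]
print(f"Cost-optimal threshold: {best_cost_threshold:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;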
&lt;p&gt;Step 9 — Feature Importance&lt;br&gt;
Understanding which features drive fraud predictions helps you build better models and explain decisions to stakeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import matplotlib.pyplot as plt

feature_importance = pd.DataFrame({
    "feature": X.columns,
    "importance": rf.feature_importances_
}).sort_values("importance", ascending=False)

print("Top 10 most important features:")
print(feature_importance.head(10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
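&lt;p&gt;The matplotlib import above is only useful if you actually plot the ranking. A minimal sketch of a horizontal bar chart for the top 10 features:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;top10 = feature_importance.head(10).iloc[::-1]  # reverse so the largest bar sits on top

plt.figure(figsize=(8, 5))
plt.barh(top10["feature"], top10["importance"])
plt.xlabel("Importance")
plt.title("Top 10 features driving fraud predictions")
plt.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;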
&lt;p&gt;Step 10 — Save the Model for Production&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import joblib

# Save model, scaler and threshold
joblib.dump(rf, "fraud_model.pkl")
joblib.dump(scaler, "scaler.pkl")
joblib.dump(best_threshold, "threshold.pkl")

print("Model, scaler and threshold saved")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Save the threshold too — you'll need it when serving predictions in production to apply the same optimal cutoff.&lt;/p&gt;
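&lt;p&gt;At serving time the same three artifacts are loaded back and the saved cutoff is applied to the predicted probability. A minimal sketch, where new_transaction is a placeholder for one incoming row with the same 30 raw feature columns:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import joblib
import numpy as np

model = joblib.load("fraud_model.pkl")
scaler = joblib.load("scaler.pkl")
threshold = joblib.load("threshold.pkl")

# Placeholder for one incoming transaction (30 feature columns in this dataset)
new_transaction = np.zeros((1, 30))

scaled = scaler.transform(new_transaction)
fraud_probability = model.predict_proba(scaled)[0, 1]
is_fraud = bool(fraud_probability &amp;gt;= threshold)
print(fraud_probability, is_fraud)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;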
&lt;p&gt;Summary — What To Always Do&lt;br&gt;
Here's your checklist for any imbalanced classification problem:&lt;br&gt;
Never use accuracy alone — use AUC-ROC, Recall, F1.&lt;br&gt;
Always stratify your splits — use stratify=y in train_test_split.&lt;br&gt;
Always handle class imbalance — at minimum use class_weight="balanced".&lt;br&gt;
Always tune your threshold — 0.5 is almost never optimal.&lt;br&gt;
Always save preprocessing artifacts — scaler, encoder, threshold together with the model.&lt;br&gt;
Conclusion&lt;br&gt;
Class imbalance is not a data problem — it is a modeling problem. The solution is not to collect more data. The solution is to choose the right metrics, handle the imbalance explicitly, and tune your decision threshold for your specific business context.&lt;br&gt;
A fraud detection model is not measured by how often it is right. It is measured by how much fraud it catches and how many legitimate customers it wrongly blocks. Keep that in mind every time you evaluate a model.&lt;br&gt;
The complete code for this tutorial is available on my GitHub at github.com/josephtobimayokun&lt;br&gt;
Joseph Tobi Mayokun is a full-stack developer and ML engineer, founder of Microlink — an AI-focused tech startup building intelligent software for African markets.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>fraud</category>
    </item>
    <item>
      <title>How to Serve a PyTorch Model with FastAPI: A Complete Guide</title>
      <dc:creator>Joseph Tobi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:01:44 +0000</pubDate>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f/how-to-serve-a-pytorch-model-with-fastapi-a-complete-guide-e7h</link>
      <guid>https://dev.to/joseph_tobi_b7ccf5406909f/how-to-serve-a-pytorch-model-with-fastapi-a-complete-guide-e7h</guid>
      <description>&lt;p&gt;How to Serve a PyTorch Model with FastAPI: A Complete Guide&lt;br&gt;
Most machine learning tutorials stop at model training. You get a trained model, a good validation score, and then — nothing. No one tells you how to actually use that model in a real application.&lt;br&gt;
In this tutorial I'll show you exactly how to take a trained PyTorch model and serve it as a REST API using FastAPI. By the end you'll have a working inference endpoint that any frontend or application can call to get predictions.&lt;br&gt;
I built this exact pipeline for my house price estimator project — a PyTorch MLP model served via FastAPI with a React frontend. Everything in this tutorial comes from real production experience.&lt;br&gt;
Prerequisites&lt;br&gt;
Python 3.8+&lt;br&gt;
Basic PyTorch knowledge&lt;br&gt;
Basic understanding of REST APIs&lt;br&gt;
pip installed&lt;br&gt;
What We're Building&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trained PyTorch Model (.pth file)
          ↓
    FastAPI Server
          ↓
  POST /predict endpoint
          ↓
Returns prediction as JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A client sends input features as JSON. FastAPI preprocesses them, runs inference through the model, and returns the prediction. Simple, clean, production-ready.&lt;br&gt;
Step 1 — Train and Save Your Model&lt;br&gt;
First let's define a simple MLP model and save it after training.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# model.py
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, output_dim)
        )

    def forward(self, x):
        return self.net(x)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After training, save the model weights and scaler:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# train.py
import torch
import joblib
from model import MLP

# --- your training loop here ---

# Save model weights
torch.save(model.state_dict(), "model.pth")

# Save scaler — critical for consistent preprocessing
joblib.dump(scaler, "scaler.pkl")

print("Model and scaler saved successfully")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two things are saved — the model weights and the scaler. Both are required for consistent inference.&lt;/p&gt;

&lt;p&gt;Step 2 — Understand Why You Save Both&lt;br&gt;
This is the most important concept in production ML and the one most tutorials skip.&lt;br&gt;
During training you fit a StandardScaler on your training data. This scaler learns the mean and standard deviation of each feature. During inference you must apply the exact same transformation using the exact same statistics.&lt;br&gt;
If you refit the scaler on new data during inference, your features will be scaled differently from how the model was trained. The model receives input it has never seen before and predictions become unreliable.&lt;br&gt;
Always save your fitted scaler. Always load it at inference time. Never refit it.&lt;/p&gt;
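&lt;p&gt;A minimal sketch of the difference at inference time, assuming scaler.pkl was saved as above and new_data stands in for one incoming row of raw features:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import joblib
import numpy as np

new_data = np.array([[1500.0, 2005.0, 7.0, 3.0, 2.0, 2.0, 850.0, 1.0]])  # placeholder input

# Correct — reuse the statistics learned on the training set
scaler = joblib.load("scaler.pkl")
features = scaler.transform(new_data)

# Wrong — refitting computes new statistics from the inference data,
# so the model receives inputs on a different scale than it was trained on:
# features = StandardScaler().fit_transform(new_data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;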
&lt;p&gt;Step 3 — Install Dependencies&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install fastapi uvicorn torch joblib numpy scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 4 — Build the FastAPI Application&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py
import torch
import joblib
import numpy as np
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from model import MLP

app = FastAPI(title="PyTorch Model API")

# Allow frontend applications to call this API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# --- Load model and scaler once on startup ---
# Loading inside the predict function would reload
# on every request — slow and inefficient
INPUT_DIM = 8
HIDDEN_DIM = 128
OUTPUT_DIM = 1

model = MLP(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM)
model.load_state_dict(
    torch.load("model.pth", map_location="cpu")
)
model.eval()  # Disables dropout during inference

scaler = joblib.load("scaler.pkl")

# --- Input schema ---
class PredictionInput(BaseModel):
    feature1: float
    feature2: float
    feature3: float
    feature4: float
    feature5: float
    feature6: float
    feature7: float
    feature8: float

# --- Prediction endpoint ---
@app.get("/")
def root():
    return {"status": "Model API is running"}

@app.post("/predict")
def predict(data: PredictionInput):
    try:
        # Build feature array
        features = np.array([[
            data.feature1,
            data.feature2,
            data.feature3,
            data.feature4,
            data.feature5,
            data.feature6,
            data.feature7,
            data.feature8,
        ]])

        # Preprocess using saved scaler
        features_scaled = scaler.transform(features)

        # Convert to tensor
        tensor = torch.tensor(
            features_scaled,
            dtype=torch.float32
        )

        # Run inference
        with torch.no_grad():
            # torch.no_grad() tells PyTorch not to
            # track gradients — faster and uses less memory
            prediction = model(tensor)

            # If you trained on log(target),
            # reverse the transformation
            result = torch.exp(prediction).item()

        return {
            "prediction": round(result, 2),
            "status": "success"
        }

    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=str(e)
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 5 — Run the Server&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uvicorn main:app --reload --host 0.0.0.0 --port 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Your API is now running at &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;br&gt;
Visit &lt;a href="http://localhost:8000/docs" rel="noopener noreferrer"&gt;http://localhost:8000/docs&lt;/a&gt; to see the automatic interactive documentation FastAPI generates. You can test your endpoint directly from the browser.&lt;/p&gt;

&lt;p&gt;Step 6 — Test Your Endpoint&lt;br&gt;
Using curl:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "feature1": 1500,
    "feature2": 2005,
    "feature3": 7,
    "feature4": 3,
    "feature5": 2,
    "feature6": 2,
    "feature7": 850,
    "feature8": 1
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Expected response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "prediction": 185432.50,
  "status": "success"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
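&lt;p&gt;If you prefer testing from Python instead of curl, a minimal sketch using the requests library (pip install requests); the payload mirrors the curl example above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

payload = {
    "feature1": 1500, "feature2": 2005, "feature3": 7, "feature4": 3,
    "feature5": 2, "feature6": 2, "feature7": 850, "feature8": 1
}

response = requests.post("http://localhost:8000/predict", json=payload)
response.raise_for_status()
print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;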
&lt;p&gt;Step 7 — Connect a React Frontend&lt;br&gt;
In your React component, call the API like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const getPrediction = async (formData) =&amp;gt; {
  const response = await fetch("http://localhost:8000/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(formData)
  });

  const data = await response.json();
  return data.prediction;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 8 — Deploy to Production&lt;br&gt;
Frontend → Deploy to Vercel (free)&lt;br&gt;
Backend → Deploy to Render.com (free tier available)&lt;br&gt;
On Render, set your start command to:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uvicorn main:app --host 0.0.0.0 --port $PORT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Make sure your model.pth and scaler.pkl files are included in your GitHub repository so Render can access them during deployment.&lt;br&gt;
Update your React frontend to use the Render URL instead of localhost:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const API_URL = "https://your-app.onrender.com/predict";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
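&lt;p&gt;Render also needs to know what to install. A minimal requirements.txt sketch that mirrors the pip install command from Step 3 — pin the versions you actually trained and tested with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# requirements.txt
fastapi
uvicorn
torch
joblib
numpy
scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;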
&lt;p&gt;Key Concepts to Remember&lt;br&gt;
model.eval() — Always call this after loading your model. It switches off dropout and batch normalization layers which behave differently during training versus inference.&lt;br&gt;
torch.no_grad() — Always wrap inference in this context manager. It disables gradient tracking which saves memory and speeds up inference significantly.&lt;br&gt;
Scaler consistency — Save your fitted scaler during training and load the same artifact during inference. Never refit on new data.&lt;br&gt;
Load once on startup — Load your model and scaler at the top of main.py, not inside the predict function. Loading on every request is slow and wasteful.&lt;br&gt;
Conclusion&lt;br&gt;
Serving a PyTorch model with FastAPI follows a consistent pattern regardless of your model architecture or problem type. Train your model, save both the weights and preprocessing artifacts, load them once on server startup, and expose a clean prediction endpoint.&lt;br&gt;
This pattern is what separates ML engineers who build demo notebooks from those who build production systems. The model is only half the job — getting it into a working API that real applications can consume is the other half.&lt;br&gt;
The complete code for this tutorial is available on my GitHub at github.com/josephtobimayokun&lt;br&gt;
Joseph Tobi Mayokun is a full-stack developer and ML engineer, founder of Microlink — an AI-focused tech startup building intelligent software.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>fastapi</category>
      <category>pytorch</category>
    </item>
  </channel>
</rss>
