<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joseph Tobi</title>
    <description>The latest articles on DEV Community by Joseph Tobi (@joseph_tobi_b7ccf5406909f).</description>
    <link>https://dev.to/joseph_tobi_b7ccf5406909f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2707303%2F8fd1abf1-57b3-479a-8be2-1f7ae0a4b435.jpg</url>
      <title>DEV Community: Joseph Tobi</title>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joseph_tobi_b7ccf5406909f"/>
    <language>en</language>
    <item>
      <title>Why Full-Stack ML Engineers Are More Valuable Than Pure Data Scientists</title>
      <dc:creator>Joseph Tobi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:09:48 +0000</pubDate>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f/why-full-stack-ml-engineers-are-more-valuable-than-pure-data-scientists-5h53</link>
      <guid>https://dev.to/joseph_tobi_b7ccf5406909f/why-full-stack-ml-engineers-are-more-valuable-than-pure-data-scientists-5h53</guid>
      <description>&lt;p&gt;There is a conversation happening in every tech company right now.&lt;br&gt;
A data scientist presents a model. It has 94% accuracy. The AUC-ROC is excellent. The confusion matrix looks clean. Everyone is impressed.&lt;br&gt;
Then someone asks: "How do we use this in our product?"&lt;br&gt;
Silence.&lt;br&gt;
The model lives in a Jupyter notebook. It has never seen real user input. It has no API. It cannot be called from a frontend. It cannot be deployed. It exists purely as a demonstration of what could be — not what is.&lt;br&gt;
This is the gap that costs companies millions of dollars in delayed products and wasted engineering time. And it is the gap that makes full-stack ML engineers the most valuable technical hire in the market right now.&lt;br&gt;
The Myth of the Pure Data Scientist&lt;br&gt;
The traditional data science role was defined by a clear boundary. Data scientists build models. Software engineers deploy them. These are separate disciplines requiring separate people.&lt;br&gt;
This made sense in 2015. It makes much less sense in 2026.&lt;br&gt;
The tools have changed. PyTorch makes model building accessible to software engineers. FastAPI makes serving models accessible to data scientists. Docker makes deployment consistent across both worlds. The boundary between building a model and shipping a product has never been thinner.&lt;br&gt;
Yet most hiring pipelines still recruit as if that boundary is a wall.&lt;br&gt;
What Pure Data Scientists Cannot Do&lt;br&gt;
I want to be precise here because this is not an attack on data scientists. Many are exceptional at what they do. The limitation is not skill — it is scope.&lt;br&gt;
A pure data scientist typically cannot:&lt;br&gt;
Build a production API. Training a model in a notebook and serving it to real users via a REST endpoint are completely different skills. FastAPI, request validation, error handling, response formatting — these are engineering concerns that most data science curricula never cover.&lt;br&gt;
Handle preprocessing consistency. This is the silent killer of ML products. A model trained on standardized features must receive standardized features at inference time — using the exact same scaler fitted on training data. Pure data scientists often understand this conceptually but struggle to implement it reliably in a production codebase.&lt;br&gt;
Build the interface users interact with. A fraud detection model is useless without a dashboard showing fraud alerts. A house price estimator is useless without a form users can fill in. The last mile between model and user requires frontend engineering skills that pure data science roles never develop.&lt;br&gt;
Debug production failures. When a model returns unexpected predictions in production, the bug could be in the model, the preprocessing pipeline, the API layer, or the frontend. A data scientist can only debug one of these four places.&lt;br&gt;
What Full-Stack ML Engineers Can Do&lt;br&gt;
A full-stack ML engineer closes every one of these gaps.&lt;br&gt;
They train the model. They save the model weights and preprocessing artifacts. They build the FastAPI inference endpoint. They containerize everything with Docker. They deploy the backend to a cloud platform. They build the React frontend that users interact with. And when something breaks in production they can trace the failure from the user interface all the way back to the model weights.&lt;br&gt;
This is not a theoretical advantage. It is a direct business advantage.&lt;br&gt;
A full-stack ML engineer ships a complete AI feature in the time it takes a traditional team to finish the handoff meeting between data science and engineering.&lt;br&gt;
The Handoff Problem&lt;br&gt;
In organizations that separate data science from engineering, every ML project has a handoff problem.&lt;br&gt;
The data scientist finishes the model and hands it to the engineering team. The engineering team rebuilds the preprocessing pipeline from scratch because the data scientist wrote it in notebook code that cannot run in production. The preprocessing is slightly different. The model underperforms. Debugging takes weeks. Nobody knows whose fault it is.&lt;br&gt;
I have spoken to engineers at multiple companies who have lived this exact experience. The handoff is where ML projects go to die.&lt;br&gt;
A full-stack ML engineer eliminates the handoff entirely. The person who trained the model is the person who deploys it. Preprocessing consistency is guaranteed because there is only one person and one codebase.&lt;br&gt;
The Career Argument&lt;br&gt;
Beyond organizational value, full-stack ML engineering is a stronger career position than pure data science for one simple reason.&lt;br&gt;
It is harder to replace.&lt;br&gt;
A pure data scientist who builds models in notebooks can be replaced by AutoML tools, foundation models, and increasingly capable AI assistants. The model building step — the part that used to require years of expertise — is becoming commoditized.&lt;br&gt;
But the engineer who understands how to integrate a model into a real product, ensure preprocessing consistency, serve predictions at low latency, monitor model drift in production, and build the interface users actually interact with — that person is not being replaced by any tool available today.&lt;br&gt;
Full-stack ML engineers operate at the intersection of two disciplines. Replacing them requires replacing two people. Companies rarely have the budget or patience for that.&lt;br&gt;
What This Means For You&lt;br&gt;
If you are a data scientist, learn to deploy. Pick up FastAPI. Understand Docker. Build one complete end-to-end project — model to API to frontend. Put it on GitHub with a live demo link. You will immediately separate yourself from 90% of data science candidates who have only ever submitted Kaggle notebooks.&lt;br&gt;
If you are a software engineer, learn ML fundamentals. Understand how models are trained and evaluated. Learn PyTorch or scikit-learn. Build one ML-powered feature in a real application. You will immediately become relevant to every company investing in AI — which at this point is every company.&lt;br&gt;
If you are starting from scratch, skip the specialization entirely. Build both skills simultaneously. The combination is rarer and more valuable than either skill alone.&lt;br&gt;
The Honest Caveat&lt;br&gt;
I want to be clear about what full-stack ML engineering is not.&lt;br&gt;
It is not being the best data scientist in the room. Research scientists at top AI labs have depth of expertise that full-stack ML engineers cannot match. If your goal is to publish papers and advance the frontier of machine learning, specialize deeply in ML research.&lt;br&gt;
It is not being the best software engineer in the room either. Senior engineers with ten years of systems programming experience will outperform a full-stack ML engineer on pure engineering tasks.&lt;br&gt;
Full-stack ML engineering is being the person who can ship AI products. That is a different goal from being the best at any single discipline. It is also currently the most in-demand goal in the industry.&lt;br&gt;
Conclusion&lt;br&gt;
The most valuable technical hire at an AI-focused company in 2026 is not the person who builds the best model. It is the person who ships the complete product.&lt;br&gt;
Data science produced brilliant models that lived in notebooks. ML engineering ships those models to users. Full-stack ML engineering does both — and eliminates every bottleneck in between.&lt;br&gt;
The boundary between data science and software engineering is not a wall. It is an opportunity.&lt;br&gt;
The engineers who cross it are the ones building the products everyone else is still planning.&lt;br&gt;
Joseph Tobi Mayokun is a full-stack developer and ML engineer, founder of Microlink — an AI-focused tech startup building intelligent software for African markets.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>career</category>
      <category>webdev</category>
      <category>python</category>
    </item>
    <item>
      <title>Handling Class Imbalance in Fraud Detection with scikit-learn</title>
      <dc:creator>Joseph Tobi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:08:06 +0000</pubDate>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f/handling-class-imbalance-in-fraud-detection-with-scikit-learn-aa3</link>
      <guid>https://dev.to/joseph_tobi_b7ccf5406909f/handling-class-imbalance-in-fraud-detection-with-scikit-learn-aa3</guid>
      <description>&lt;p&gt;Handling Class Imbalance in Fraud Detection with scikit-learn&lt;br&gt;
Every fraud detection tutorial I've seen makes the same mistake. They train a model, print the accuracy score — 99.8% — and declare success.&lt;br&gt;
That model is useless.&lt;br&gt;
In a dataset where 0.17% of transactions are fraudulent, a model that predicts "legitimate" for every single transaction achieves 99.83% accuracy. It has never detected a single fraud case in its life.&lt;br&gt;
This is the class imbalance problem and it's the most important thing to understand before building any fraud detection system.&lt;br&gt;
In this tutorial I'll show you exactly how to handle it correctly using scikit-learn. By the end you'll have a working fraud detection pipeline that actually catches fraud.&lt;br&gt;
Prerequisites&lt;br&gt;
Python 3.8+&lt;br&gt;
Basic understanding of classification&lt;br&gt;
pip installed&lt;br&gt;
The Dataset&lt;br&gt;
We'll use the Credit Card Fraud Detection dataset from Kaggle. It contains 284,807 transactions with only 492 fraud cases — a fraud rate of 0.17%. This is a real-world class imbalance problem.&lt;br&gt;
Download it from Kaggle and save it as creditcard.csv.&lt;br&gt;
Step 1 — Explore the Data First&lt;br&gt;
Never start modeling without understanding your data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import numpy as np

df = pd.read_csv("creditcard.csv")

# Always check this first
print(f"Dataset shape: {df.shape}")
print(f"\nClass distribution:")
print(df["Class"].value_counts())
print(f"\nFraud rate: {df['Class'].mean():.4%}")
print(f"\nMissing values: {df.isnull().sum().sum()}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dataset shape: (284807, 31)

Class distribution:
0    284315
1       492

Fraud rate: 0.1727%

Missing values: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This tells us everything we need to know. 492 fraud cases against 284,315 legitimate transactions. This is severe class imbalance.&lt;/p&gt;
&lt;p&gt;Step 2 — Why Accuracy Is the Wrong Metric&lt;br&gt;
Before we build anything, let's prove why accuracy is meaningless here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = df.drop("Class", axis=1)
y = df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A model that predicts the majority class every time
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
y_pred = dummy.predict(X_test)

print(f"Dummy model accuracy: {accuracy_score(y_test, y_pred):.4%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dummy model accuracy: 99.8274%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A model that has learned absolutely nothing achieves 99.83% accuracy. This is why you must never use accuracy as your primary metric for imbalanced classification.&lt;/p&gt;
&lt;p&gt;Step 3 — Use the Right Metrics&lt;br&gt;
The correct metrics for fraud detection are computed in this helper:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import (
    classification_report,
    roc_auc_score,
    confusion_matrix,
    precision_score,
    recall_score,
    f1_score
)

def evaluate_model(model, X_test, y_test, model_name):
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    print(f"\n{'='*50}")
    print(f"Model: {model_name}")
    print(f"{'='*50}")
    print(f"\nAUC-ROC:   {roc_auc_score(y_test, y_prob):.4f}")
    print(f"Precision: {precision_score(y_test, y_pred):.4f}")
    print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
    print(f"F1 Score:  {f1_score(y_test, y_pred):.4f}")
    print(f"\nClassification Report:")
    print(classification_report(y_test, y_pred,
          target_names=["Legitimate", "Fraud"]))
    print(f"\nConfusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here is what each metric means in a fraud context:&lt;br&gt;
AUC-ROC — measures how well the model separates fraud from legitimate transactions across all thresholds. 1.0 is perfect, 0.5 is random guessing. This is your primary metric.&lt;br&gt;
Recall — of all actual fraud cases, how many did we catch? Missing real fraud is the most costly mistake. Prioritize this.&lt;br&gt;
Precision — of all predicted fraud cases, how many were real? Low precision means too many false alarms blocking legitimate customers.&lt;br&gt;
F1 Score — the harmonic mean of precision and recall. A good overall measure when you need to balance both.&lt;/p&gt;
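&lt;p&gt;To make the precision/recall distinction concrete, here is a tiny sketch with made-up labels (not from the fraud model): ten transactions, two of which are really fraud, and a model that flags three of them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # two real fraud cases
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0]  # three flagged, one of them correct

print(precision_score(y_true, y_pred))  # 1 of 3 flagged were real fraud = 0.33
print(recall_score(y_true, y_pred))     # 1 of 2 real frauds were caught = 0.50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;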
&lt;p&gt;Step 4 — Preprocess the Data&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

X = df.drop("Class", axis=1)
y = df["Class"]

# Stratify ensures both splits maintain
# the same fraud ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y  # Critical for imbalanced data
)

# Scale features
# Fit only on training data — never on test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set fraud rate: {y_train.mean():.4%}")
print(f"Test set fraud rate: {y_test.mean():.4%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Training set fraud rate: 0.1727%
Test set fraud rate: 0.1727%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Stratify ensures both splits have the same fraud rate. Without it you might accidentally create a test set with no fraud cases at all.&lt;/p&gt;
&lt;p&gt;Step 5 — Approach 1: Class Weights&lt;br&gt;
The simplest approach. Tell the model to penalize misclassifying fraud cases more heavily.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.linear_model import LogisticRegression

# Without class weights — baseline
lr_baseline = LogisticRegression(
    random_state=42,
    max_iter=1000
)
lr_baseline.fit(X_train_scaled, y_train)
evaluate_model(lr_baseline, X_test_scaled,
               y_test, "Logistic Regression (No Weights)")

# With class weights — handles imbalance
lr_weighted = LogisticRegression(
    class_weight="balanced",  # This is the key change
    random_state=42,
    max_iter=1000
)
lr_weighted.fit(X_train_scaled, y_train)
evaluate_model(lr_weighted, X_test_scaled,
               y_test, "Logistic Regression (Balanced)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;class_weight="balanced" automatically calculates weights inversely proportional to class frequencies. Fraud cases get much higher weight so misclassifying them costs more.&lt;/p&gt;
&lt;p&gt;Step 6 — Approach 2: Random Forest with Class Weights&lt;br&gt;
Tree-based models handle imbalance better than linear models and support class weighting too.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    random_state=42,
    n_jobs=-1  # Use all CPU cores
)
rf.fit(X_train_scaled, y_train)
evaluate_model(rf, X_test_scaled,
               y_test, "Random Forest (Balanced)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Random Forest typically outperforms Logistic Regression on fraud detection because fraud patterns are highly nonlinear.&lt;/p&gt;
&lt;p&gt;Step 7 — Approach 3: SMOTE Oversampling&lt;br&gt;
SMOTE (Synthetic Minority Oversampling Technique) creates synthetic fraud samples to balance the dataset.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install: pip install imbalanced-learn
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(
    X_train_scaled, y_train
)

print(f"Before SMOTE: {y_train.value_counts().to_dict()}")
print(f"After SMOTE: {pd.Series(y_train_resampled).value_counts().to_dict()}")

rf_smote = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
    n_jobs=-1
)
rf_smote.fit(X_train_resampled, y_train_resampled)
evaluate_model(rf_smote, X_test_scaled,
               y_test, "Random Forest + SMOTE")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Important — apply SMOTE only to the training data, never to the test data. You want to evaluate on the real distribution, not on synthetic data.&lt;/p&gt;
&lt;p&gt;Step 8 — Tune the Classification Threshold&lt;br&gt;
By default scikit-learn uses 0.5 as the fraud threshold. This is almost never optimal for imbalanced problems.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import precision_recall_curve

y_prob = rf.predict_proba(X_test_scaled)[:, 1]
precisions, recalls, thresholds = precision_recall_curve(
    y_test, y_prob
)

# Find threshold that maximizes F1
f1_scores = 2 * (precisions * recalls) / (precisions + recalls + 1e-8)
best_threshold = thresholds[np.argmax(f1_scores)]

print(f"Default threshold (0.5) results:")
y_pred_default = (y_prob &amp;gt;= 0.5).astype(int)
print(f"Recall: {recall_score(y_test, y_pred_default):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_default):.4f}")

print(f"\nOptimal threshold ({best_threshold:.3f}) results:")
y_pred_optimal = (y_prob &amp;gt;= best_threshold).astype(int)
print(f"Recall: {recall_score(y_test, y_pred_optimal):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_optimal):.4f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In fraud detection you usually want to lower the threshold to catch more fraud at the cost of more false alarms. The right threshold depends on the business cost of each error type.&lt;/p&gt;
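&lt;p&gt;If you can attach rough costs to each type of error, you can go one step further and pick the threshold that minimizes expected cost instead of maximizing F1. A minimal sketch with made-up cost figures, evaluated here on the test set purely for illustration — in practice you would tune the threshold on a separate validation set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical business costs — replace with your own numbers
COST_MISSED_FRAUD = 500  # cost of a false negative
COST_FALSE_ALARM = 5     # cost of a false positive

candidate_thresholds = np.linspace(0.01, 0.99, 99)
costs = []
for t in candidate_thresholds:
    preds = (y_prob &amp;gt;= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    costs.append(fn * COST_MISSED_FRAUD + fp * COST_FALSE_ALARM)

best_cost_threshold = candidate_thresholds[int(np.argmin(costs))]
print(f"Cost-optimal threshold: {best_cost_threshold:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;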
&lt;p&gt;Step 9 — Feature Importance&lt;br&gt;
Understanding which features drive fraud predictions helps you build better models and explain decisions to stakeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import matplotlib.pyplot as plt

feature_importance = pd.DataFrame({
    "feature": X.columns,
    "importance": rf.feature_importances_
}).sort_values("importance", ascending=False)

print("Top 10 most important features:")
print(feature_importance.head(10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
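&lt;p&gt;The matplotlib import above is only useful if you actually plot the ranking. A minimal sketch of a horizontal bar chart for the top 10 features:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;top10 = feature_importance.head(10).iloc[::-1]  # reverse so the largest bar sits on top

plt.figure(figsize=(8, 5))
plt.barh(top10["feature"], top10["importance"])
plt.xlabel("Importance")
plt.title("Top 10 features driving fraud predictions")
plt.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;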
&lt;p&gt;Step 10 — Save the Model for Production&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import joblib

# Save model, scaler and threshold
joblib.dump(rf, "fraud_model.pkl")
joblib.dump(scaler, "scaler.pkl")
joblib.dump(best_threshold, "threshold.pkl")

print("Model, scaler and threshold saved")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Save the threshold too — you'll need it when serving predictions in production to apply the same optimal cutoff.&lt;/p&gt;
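&lt;p&gt;At serving time the same three artifacts are loaded back and the saved cutoff is applied to the predicted probability. A minimal sketch, where new_transaction is a placeholder for one incoming row with the same 30 raw feature columns:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import joblib
import numpy as np

model = joblib.load("fraud_model.pkl")
scaler = joblib.load("scaler.pkl")
threshold = joblib.load("threshold.pkl")

# Placeholder for one incoming transaction (30 feature columns in this dataset)
new_transaction = np.zeros((1, 30))

scaled = scaler.transform(new_transaction)
fraud_probability = model.predict_proba(scaled)[0, 1]
is_fraud = bool(fraud_probability &amp;gt;= threshold)
print(fraud_probability, is_fraud)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;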
&lt;p&gt;Summary — What To Always Do&lt;br&gt;
Here's your checklist for any imbalanced classification problem:&lt;br&gt;
Never use accuracy alone — use AUC-ROC, Recall, F1.&lt;br&gt;
Always stratify your splits — use stratify=y in train_test_split.&lt;br&gt;
Always handle class imbalance — at minimum use class_weight="balanced".&lt;br&gt;
Always tune your threshold — 0.5 is almost never optimal.&lt;br&gt;
Always save preprocessing artifacts — scaler, encoder, threshold together with the model.&lt;br&gt;
Conclusion&lt;br&gt;
Class imbalance is not a data problem — it is a modeling problem. The solution is not to collect more data. The solution is to choose the right metrics, handle the imbalance explicitly, and tune your decision threshold for your specific business context.&lt;br&gt;
A fraud detection model is not measured by how often it is right. It is measured by how much fraud it catches and how many legitimate customers it wrongly blocks. Keep that in mind every time you evaluate a model.&lt;br&gt;
The complete code for this tutorial is available on my GitHub at github.com/josephtobimayokun&lt;br&gt;
Joseph Tobi Mayokun is a full-stack developer and ML engineer, founder of Microlink — an AI-focused tech startup building intelligent software for African markets.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>fraud</category>
    </item>
    <item>
      <title>How to Serve a PyTorch Model with FastAPI: A Complete Guide</title>
      <dc:creator>Joseph Tobi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:01:44 +0000</pubDate>
      <link>https://dev.to/joseph_tobi_b7ccf5406909f/how-to-serve-a-pytorch-model-with-fastapi-a-complete-guide-e7h</link>
      <guid>https://dev.to/joseph_tobi_b7ccf5406909f/how-to-serve-a-pytorch-model-with-fastapi-a-complete-guide-e7h</guid>
      <description>&lt;p&gt;How to Serve a PyTorch Model with FastAPI: A Complete Guide&lt;br&gt;
Most machine learning tutorials stop at model training. You get a trained model, a good validation score, and then — nothing. No one tells you how to actually use that model in a real application.&lt;br&gt;
In this tutorial I'll show you exactly how to take a trained PyTorch model and serve it as a REST API using FastAPI. By the end you'll have a working inference endpoint that any frontend or application can call to get predictions.&lt;br&gt;
I built this exact pipeline for my house price estimator project — a PyTorch MLP model served via FastAPI with a React frontend. Everything in this tutorial comes from real production experience.&lt;br&gt;
Prerequisites&lt;br&gt;
Python 3.8+&lt;br&gt;
Basic PyTorch knowledge&lt;br&gt;
Basic understanding of REST APIs&lt;br&gt;
pip installed&lt;br&gt;
What We're Building&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trained PyTorch Model (.pth file)
          ↓
    FastAPI Server
          ↓
  POST /predict endpoint
          ↓
Returns prediction as JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A client sends input features as JSON. FastAPI preprocesses them, runs inference through the model, and returns the prediction. Simple, clean, production-ready.&lt;br&gt;
Step 1 — Train and Save Your Model&lt;br&gt;
First let's define a simple MLP model and save it after training.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# model.py
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, output_dim)
        )

    def forward(self, x):
        return self.net(x)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After training, save the model weights and scaler:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# train.py
import torch
import joblib
from model import MLP

# --- your training loop here ---

# Save model weights
torch.save(model.state_dict(), "model.pth")

# Save scaler — critical for consistent preprocessing
joblib.dump(scaler, "scaler.pkl")

print("Model and scaler saved successfully")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two things are saved — the model weights and the scaler. Both are required for consistent inference.&lt;/p&gt;

&lt;p&gt;Step 2 — Understand Why You Save Both&lt;br&gt;
This is the most important concept in production ML and the one most tutorials skip.&lt;br&gt;
During training you fit a StandardScaler on your training data. This scaler learns the mean and standard deviation of each feature. During inference you must apply the exact same transformation using the exact same statistics.&lt;br&gt;
If you refit the scaler on new data during inference, your features will be scaled differently from how the model was trained. The model receives input it has never seen before and predictions become unreliable.&lt;br&gt;
Always save your fitted scaler. Always load it at inference time. Never refit it.&lt;/p&gt;
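&lt;p&gt;A minimal sketch of the difference at inference time, assuming scaler.pkl was saved as above and new_data stands in for one incoming row of raw features:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import joblib
import numpy as np

new_data = np.array([[1500.0, 2005.0, 7.0, 3.0, 2.0, 2.0, 850.0, 1.0]])  # placeholder input

# Correct — reuse the statistics learned on the training set
scaler = joblib.load("scaler.pkl")
features = scaler.transform(new_data)

# Wrong — refitting computes new statistics from the inference data,
# so the model receives inputs on a different scale than it was trained on:
# features = StandardScaler().fit_transform(new_data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;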
&lt;p&gt;Step 3 — Install Dependencies&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install fastapi uvicorn torch joblib numpy scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 4 — Build the FastAPI Application&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py
import torch
import joblib
import numpy as np
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from model import MLP

app = FastAPI(title="PyTorch Model API")

# Allow frontend applications to call this API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# --- Load model and scaler once on startup ---
# Loading inside the predict function would reload
# on every request — slow and inefficient
INPUT_DIM = 8
HIDDEN_DIM = 128
OUTPUT_DIM = 1

model = MLP(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM)
model.load_state_dict(
    torch.load("model.pth", map_location="cpu")
)
model.eval()  # Disables dropout during inference

scaler = joblib.load("scaler.pkl")

# --- Input schema ---
class PredictionInput(BaseModel):
    feature1: float
    feature2: float
    feature3: float
    feature4: float
    feature5: float
    feature6: float
    feature7: float
    feature8: float

# --- Prediction endpoint ---
@app.get("/")
def root():
    return {"status": "Model API is running"}

@app.post("/predict")
def predict(data: PredictionInput):
    try:
        # Build feature array
        features = np.array([[
            data.feature1,
            data.feature2,
            data.feature3,
            data.feature4,
            data.feature5,
            data.feature6,
            data.feature7,
            data.feature8,
        ]])

        # Preprocess using saved scaler
        features_scaled = scaler.transform(features)

        # Convert to tensor
        tensor = torch.tensor(
            features_scaled,
            dtype=torch.float32
        )

        # Run inference
        with torch.no_grad():
            # torch.no_grad() tells PyTorch not to
            # track gradients — faster and uses less memory
            prediction = model(tensor)

            # If you trained on log(target),
            # reverse the transformation
            result = torch.exp(prediction).item()

        return {
            "prediction": round(result, 2),
            "status": "success"
        }

    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=str(e)
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 5 — Run the Server&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uvicorn main:app --reload --host 0.0.0.0 --port 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Your API is now running at &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;&lt;br&gt;
Visit &lt;a href="http://localhost:8000/docs" rel="noopener noreferrer"&gt;http://localhost:8000/docs&lt;/a&gt; to see the automatic interactive documentation FastAPI generates. You can test your endpoint directly from the browser.&lt;/p&gt;

&lt;p&gt;Step 6 — Test Your Endpoint&lt;br&gt;
Using curl:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "feature1": 1500,
    "feature2": 2005,
    "feature3": 7,
    "feature4": 3,
    "feature5": 2,
    "feature6": 2,
    "feature7": 850,
    "feature8": 1
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Expected response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "prediction": 185432.50,
  "status": "success"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
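&lt;p&gt;If you prefer testing from Python instead of curl, a minimal sketch using the requests library (pip install requests); the payload mirrors the curl example above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

payload = {
    "feature1": 1500, "feature2": 2005, "feature3": 7, "feature4": 3,
    "feature5": 2, "feature6": 2, "feature7": 850, "feature8": 1
}

response = requests.post("http://localhost:8000/predict", json=payload)
response.raise_for_status()
print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;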
&lt;p&gt;Step 7 — Connect a React Frontend&lt;br&gt;
In your React component, call the API like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const getPrediction = async (formData) =&amp;gt; {
  const response = await fetch("http://localhost:8000/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(formData)
  });

  const data = await response.json();
  return data.prediction;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 8 — Deploy to Production&lt;br&gt;
Frontend → Deploy to Vercel (free)&lt;br&gt;
Backend → Deploy to Render.com (free tier available)&lt;br&gt;
On Render, set your start command to:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uvicorn main:app --host 0.0.0.0 --port $PORT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Make sure your model.pth and scaler.pkl files are included in your GitHub repository so Render can access them during deployment.&lt;br&gt;
Update your React frontend to use the Render URL instead of localhost:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const API_URL = "https://your-app.onrender.com/predict";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
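&lt;p&gt;Render also needs to know what to install. A minimal requirements.txt sketch that mirrors the pip install command from Step 3 — pin the versions you actually trained and tested with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# requirements.txt
fastapi
uvicorn
torch
joblib
numpy
scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;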
&lt;p&gt;Key Concepts to Remember&lt;br&gt;
model.eval() — Always call this after loading your model. It switches off dropout and batch normalization layers which behave differently during training versus inference.&lt;br&gt;
torch.no_grad() — Always wrap inference in this context manager. It disables gradient tracking which saves memory and speeds up inference significantly.&lt;br&gt;
Scaler consistency — Save your fitted scaler during training and load the same artifact during inference. Never refit on new data.&lt;br&gt;
Load once on startup — Load your model and scaler at the top of main.py, not inside the predict function. Loading on every request is slow and wasteful.&lt;br&gt;
Conclusion&lt;br&gt;
Serving a PyTorch model with FastAPI follows a consistent pattern regardless of your model architecture or problem type. Train your model, save both the weights and preprocessing artifacts, load them once on server startup, and expose a clean prediction endpoint.&lt;br&gt;
This pattern is what separates ML engineers who build demo notebooks from those who build production systems. The model is only half the job — getting it into a working API that real applications can consume is the other half.&lt;br&gt;
The complete code for this tutorial is available on my GitHub at github.com/josephtobimayokun&lt;br&gt;
Joseph Tobi Mayokun is a full-stack developer and ML engineer, founder of Microlink — an AI-focused tech startup building intelligent software.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>fastapi</category>
      <category>pytorch</category>
    </item>
  </channel>
</rss>
