<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sheikh Sadi Asif</title>
    <description>The latest articles on DEV Community by Sheikh Sadi Asif (@gradienninja).</description>
    <link>https://dev.to/gradienninja</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907216%2F5263a8bb-e87c-4dc9-a0d0-d47fbc9e0acb.jpeg</url>
      <title>DEV Community: Sheikh Sadi Asif</title>
      <link>https://dev.to/gradienninja</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gradienninja"/>
    <language>en</language>
    <item>
      <title>I Built an AI That Detects Pneumonia From Chest X-Rays: Here's Exactly How I Did It</title>
      <dc:creator>Sheikh Sadi Asif</dc:creator>
      <pubDate>Fri, 01 May 2026 08:36:02 +0000</pubDate>
      <link>https://dev.to/gradienninja/i-built-an-ai-that-detects-pneumonia-from-chest-x-rays-heres-exactly-how-i-did-it-52cp</link>
      <guid>https://dev.to/gradienninja/i-built-an-ai-that-detects-pneumonia-from-chest-x-rays-heres-exactly-how-i-did-it-52cp</guid>
      <description>&lt;p&gt;A few weeks ago, I shipped &lt;strong&gt;PneumoScan AI&lt;/strong&gt;, a deep learning model that analyzes chest X-ray images and detects pneumonia in seconds, with 90%+ accuracy. It's live, it's free, and anyone can use it right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔗 &lt;a href="https://pneumonia-scan-ai.netlify.app/" rel="noopener noreferrer"&gt;pneumonia-scan-ai.netlify.app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the story of how I built it — and everything I learned along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Pneumonia?
&lt;/h2&gt;

&lt;p&gt;Pneumonia kills over 2 million people annually. A huge portion of those deaths happen in low-resource areas where radiologists are scarce and diagnosis is slow.&lt;/p&gt;

&lt;p&gt;I'm not claiming to solve that problem. But I wanted to build something that &lt;em&gt;mattered&lt;/em&gt;, not just another MNIST classifier or iris flower predictor. Medical imaging felt real.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dataset
&lt;/h2&gt;

&lt;p&gt;I used the &lt;strong&gt;&lt;a href="https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia" rel="noopener noreferrer"&gt;Chest X-Ray Images (Pneumonia)&lt;/a&gt;&lt;/strong&gt; dataset from Kaggle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5,800+ clinical chest X-ray images&lt;/li&gt;
&lt;li&gt;Two classes: &lt;code&gt;NORMAL&lt;/code&gt; and &lt;code&gt;PNEUMONIA&lt;/code&gt; (Viral &amp;amp; Bacterial)&lt;/li&gt;
&lt;li&gt;Real hospital data from Guangzhou Women and Children's Medical Center&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing I learned immediately: &lt;strong&gt;the dataset is imbalanced&lt;/strong&gt;. There are significantly more pneumonia images than normal ones. This is something you have to think about carefully in medical AI, because a model that just predicts "pneumonia" on everything could still hit decent accuracy numbers while being completely useless.&lt;/p&gt;
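&lt;p&gt;One common mitigation is inverse-frequency class weighting, so the loss penalizes mistakes on the rarer class more heavily. A sketch of the idea (the counts below are illustrative, and the post doesn't confirm this exact technique was used):&lt;/p&gt;

```python
# Sketch: inverse-frequency class weights for an imbalanced binary dataset.
# The counts used below are illustrative, not the exact Kaggle split.
def class_weights(counts):
    """Return one weight per class: total / (n_classes * count)."""
    total = sum(counts.values())
    n = len(counts)
    return {label: total / (n * c) for label, c in counts.items()}

weights = class_weights({"NORMAL": 1341, "PNEUMONIA": 3875})
# The rarer NORMAL class gets the larger weight, so every missed
# normal image costs the model proportionally more during training.
```

&lt;p&gt;The resulting dictionary can be passed (keyed by integer label) as the &lt;code&gt;class_weight&lt;/code&gt; argument of Keras's &lt;code&gt;model.fit&lt;/code&gt;.&lt;/p&gt;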




&lt;h2&gt;
  
  
  The Architecture — MobileNetV2 + Custom Head
&lt;/h2&gt;

&lt;p&gt;I chose &lt;strong&gt;MobileNetV2&lt;/strong&gt; as my base model for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It's lightweight (14 MB), making it perfect for deployment on free-tier infrastructure&lt;/li&gt;
&lt;li&gt;It was pre-trained on ImageNet, so it already knows how to extract visual features&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key idea is &lt;strong&gt;transfer learning&lt;/strong&gt;: instead of training a CNN from scratch on 5,800 images (which isn't enough), I used MobileNetV2 as a frozen feature extractor and added my own classification head on top.&lt;/p&gt;

&lt;p&gt;Here's the full pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input (any size RGB image)
    ↓
Resizing Layer      → 224 × 224 (built into model)
Rescaling Layer     → pixel values ÷ 255
    ↓
MobileNetV2         → FROZEN (trainable = False)
16 inverted residual blocks
Final feature map: 7 × 7 × 1280
    ↓
GlobalAveragePooling2D  → collapses to 1280 values
    ↓
Dense(128, ReLU)        → learns pneumonia-specific patterns
    ↓
Dropout(0.3)            → prevents overfitting
    ↓
Dense(1, Sigmoid)       → outputs probability (0 = Normal, 1 = Pneumonia)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
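&lt;p&gt;In Keras, that pipeline might look roughly like this. It's a sketch assembled from the diagram and the training config below, not the repo's exact code, and &lt;code&gt;build_model&lt;/code&gt; is my own name:&lt;/p&gt;

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(weights="imagenet"):
    # Frozen MobileNetV2 backbone with preprocessing baked into the model,
    # mirroring the diagram above. weights=None skips the ImageNet download.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    base.trainable = False  # use it purely as a feature extractor

    inputs = layers.Input(shape=(None, None, 3))  # any size RGB image
    x = layers.Resizing(224, 224)(inputs)         # resize inside the model
    x = layers.Rescaling(1.0 / 255)(x)            # pixel values / 255
    x = base(x, training=False)                   # 7x7x1280 feature map
    x = layers.GlobalAveragePooling2D()(x)        # collapse to 1280 values
    x = layers.Dense(128, activation="relu")(x)   # task-specific bottleneck
    x = layers.Dropout(0.3)(x)                    # regularization
    outputs = layers.Dense(1, activation="sigmoid")(x)  # P(pneumonia)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```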



&lt;p&gt;&lt;strong&gt;Why freeze MobileNetV2?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because it already knows how to see. The ImageNet weights encode knowledge about edges, textures, and shapes that transfer surprisingly well to X-rays. Fine-tuning all those layers on a small dataset would just cause overfitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why add Dense(128) before the output?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The raw GlobalAveragePooling output is 1280 features, most of which are irrelevant to pneumonia. The Dense(128) layer acts as a bottleneck, forcing the model to compress what it learned into the most useful 128 features for this specific task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training config:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimizer: Adam (lr=0.001)&lt;/li&gt;
&lt;li&gt;Loss: Binary Crossentropy&lt;/li&gt;
&lt;li&gt;Final accuracy: &lt;strong&gt;90%+&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem I Didn't Expect: The Dog Test
&lt;/h2&gt;

&lt;p&gt;After I deployed the first version, I decided to test it.&lt;/p&gt;

&lt;p&gt;I uploaded a photo of a dog sitting in a bathroom smoking a cigarette.&lt;/p&gt;

&lt;p&gt;The result?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PNEUMONIA DETECTED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Confidence Score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100.00%"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;100% confident. That dog had pneumonia.&lt;/p&gt;

&lt;p&gt;This is a classic failure mode of CNNs: &lt;strong&gt;the model has no concept of "this isn't even an X-ray."&lt;/strong&gt; It was trained only on X-rays, so it forced every single image into one of two buckets regardless of what it actually was.&lt;/p&gt;

&lt;p&gt;I needed an input validation layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: The Saturation Gate
&lt;/h2&gt;

&lt;p&gt;The solution I came up with is what I call a &lt;strong&gt;Saturation Gate&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The logic is simple: chest X-rays are grayscale. Any real X-ray will have near-zero color saturation. A photo of a dog, a selfie, a meme: these all have high saturation.&lt;/p&gt;

&lt;p&gt;So before the image ever reaches the model, I convert it to &lt;strong&gt;HSV color space&lt;/strong&gt; and measure the mean saturation value. If it exceeds a threshold, the image is rejected outright.&lt;/p&gt;
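&lt;p&gt;A minimal version of the gate in plain NumPy (the 0.1 threshold and the function name are illustrative; the deployed app may compute saturation differently, e.g. via OpenCV):&lt;/p&gt;

```python
import numpy as np

def passes_saturation_gate(rgb, threshold=0.1):
    """Return True if an image is close enough to grayscale to be an X-ray.

    rgb: H x W x 3 array with values in [0, 1]. Per-pixel HSV saturation
    is (max - min) / max; real chest X-rays sit near zero. The 0.1
    threshold is illustrative, not the deployed value.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    cmax = rgb.max(axis=2)
    cmin = rgb.min(axis=2)
    # Guard against division by zero on pure-black pixels.
    sat = np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-12), 0.0)
    if float(sat.mean()) > threshold:
        return False  # too colorful: reject before the model ever sees it
    return True
```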

&lt;p&gt;The dog photo gets rejected now. The model only sees what it was trained to see.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Deployment Stack
&lt;/h2&gt;

&lt;p&gt;I wanted this to be completely free to host — no cloud bills, no server management.&lt;/p&gt;

&lt;p&gt;Here's what I ended up with:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model Inference&lt;/td&gt;
&lt;td&gt;Hugging Face Spaces (Gradio + TensorFlow-CPU)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Netlify (custom Tailwind CSS portal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bridge&lt;/td&gt;
&lt;td&gt;iframe&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Gradio app on Hugging Face handles all the heavy lifting: loading the &lt;code&gt;.h5&lt;/code&gt; model, running inference, returning JSON results. The Netlify frontend is just a clean portal that embeds the Gradio Space via iframe.&lt;/p&gt;

&lt;p&gt;Total hosting cost: &lt;strong&gt;$0.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Accuracy alone is a lie in medical AI.&lt;/strong&gt;&lt;br&gt;
A model can hit 90% accuracy while still missing dangerous cases. Always look at your confusion matrix. False negatives, missed pneumonia cases, are the ones that matter most.&lt;/p&gt;
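&lt;p&gt;As a quick sketch, recall on the pneumonia class and the raw false-negative count can be computed directly from labels and predictions (the helper below is my own illustration, not code from the repo):&lt;/p&gt;

```python
def recall_and_false_negatives(y_true, y_pred):
    """Recall (sensitivity) for the positive class, plus the raw FN count.

    y_true / y_pred: iterables of 0 (normal) and 1 (pneumonia).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fn

# A model can look accurate overall while still missing positives:
# recall_and_false_negatives([1, 1, 1, 1, 0], [1, 1, 1, 0, 0]) -> (0.75, 1)
```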

&lt;p&gt;&lt;strong&gt;2. Input validation is not optional.&lt;/strong&gt;&lt;br&gt;
Any model you deploy in the real world needs to handle unexpected inputs gracefully. The Saturation Gate wasn't in any tutorial I followed. I had to think of it myself after seeing the model fail in a funny but revealing way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Transfer learning is magic for small datasets.&lt;/strong&gt;&lt;br&gt;
If you're working with fewer than 50,000 images, you almost certainly shouldn't be training a CNN from scratch. Use pretrained weights. Freeze the base. Train only the head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Ship first, improve later.&lt;/strong&gt;&lt;br&gt;
The first version was broken in obvious ways. But shipping it is what revealed those problems. The dog test only happened because I deployed it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grad-CAM heatmap overlays&lt;/strong&gt;: showing &lt;em&gt;which part&lt;/em&gt; of the X-ray triggered the detection. This is the standard in medical AI and would make the tool genuinely useful for educational purposes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision/Recall analysis&lt;/strong&gt;: properly evaluating the false negative rate on the test set&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More pathologies&lt;/strong&gt;: tuberculosis, pleural effusion, cardiomegaly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Repo
&lt;/h2&gt;

&lt;p&gt;The trained model is open source. You can download it, use it, build on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/GradienNinja/PneumoScan-AI" rel="noopener noreferrer"&gt;github.com/GradienNinja/PneumoScan-AI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;I fix cars for a living. I don't have a degree. I didn't take a course that handed me this project.&lt;/p&gt;

&lt;p&gt;I just built it.&lt;/p&gt;

&lt;p&gt;If you're self-taught and reading this wondering whether you're "ready" to build something real: you're not going to get ready by waiting. You get ready by shipping.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Sheikh Sadi Asif — &lt;a href="https://github.com/GradienNinja" rel="noopener noreferrer"&gt;@GradienNinja&lt;/a&gt; | AstroLabSoft AI Lab&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: This is a research and educational project. Not a certified medical device. Always consult a qualified healthcare professional for medical decisions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faufslf4zi23mw26fe7bl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faufslf4zi23mw26fe7bl.jpg" alt=" " width="800" height="1422"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
