In 2021, I deployed a machine learning model that diagnosed 5 eye conditions with 87% accuracy at a Nigerian clinic with 1 ophthalmologist serving 200,000 people.
The waiting time dropped. Not by minutes. By days. But getting there? That’s the story worth telling.
The Problem Nobody Talks About
Nigeria has roughly 200 million people. We have about 1,000 ophthalmologists. Do the math.
I was working as a Clinical Analyst at Lagos State Primary Healthcare Board when I saw the pattern. Patients would show up at the eye hospital with symptoms. Then they’d wait. Sometimes weeks. By the time they got a diagnosis, preventable conditions had progressed too far.
Diabetic retinopathy. Age-related macular degeneration. Glaucoma. These aren’t abstract conditions in a textbook. They’re my neighbors. My family members. People who lose their sight because there aren’t enough specialists to see them in time.
I was also building MedVendorHub, a healthtech startup connecting patients to medical services. Every day, I saw the gap between what patients needed and what the system could deliver.
The frustrating part? Many of these conditions are detectable in fundus images. The expertise exists. The specialists just can’t be everywhere at once.
The gap isn’t always about knowledge. Sometimes it’s about access.
Why Machine Learning Actually Made Sense Here
The technical case for ML was clear. Retinal fundus images follow detectable patterns. Diabetic retinopathy presents with microaneurysms and hemorrhages. Glaucoma shows characteristic optic nerve cupping. If neural networks could read X-rays for radiologists, why not fundus images for ophthalmologists?
The goal wasn’t to replace ophthalmologists. It was to triage. To give clinicians a tool that could say: “This patient needs urgent attention” vs. “This can wait a week.”
In a system where every specialist minute counts, that distinction saves sight.
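The triage logic can be sketched as a simple rule on top of the model's output. The urgency classes and thresholds below are hypothetical, chosen for illustration; the real policy was set with the clinicians:

```python
# Illustrative triage rule. The urgent classes and thresholds here are
# hypothetical; the actual clinic policy was agreed with the ophthalmologists.
URGENT_CONDITIONS = {"diabetic_retinopathy", "glaucoma"}
CONFIDENCE_THRESHOLD = 0.7

def triage(predicted_condition, confidence):
    """Map a model prediction and its softmax confidence to a queue category."""
    if predicted_condition in URGENT_CONDITIONS and confidence >= CONFIDENCE_THRESHOLD:
        return "urgent"           # specialist within days
    if confidence < 0.5:
        return "manual_review"    # low confidence: a clinician reviews the image
    return "routine"              # normal queue

print(triage("glaucoma", 0.91))             # urgent
print(triage("congenital_cataract", 0.45))  # manual_review
```

The point isn't the thresholds. It's that the model's output feeds a decision rule clinicians can inspect and override.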
The Technical Architecture: What Actually Worked
Choosing The Stack: I built the model in TensorFlow and deployed it on Azure ML. The choice wasn’t about trends. It was about constraints. Here’s why Azure ML mattered in Lagos:
Cloud-based meant no expensive on-premise infrastructure
Auto-scaling (critical when a busy clinic processes 200 patients in a day)
Supported Python and R (our team had mixed backgrounds)
Cost optimization tools (when you’re working on a tight budget, this isn’t optional)
The Model: Multi-Class Classification
The technical approach was straightforward: multi-class classification.
One model. Five possible outputs:
- Diabetic retinopathy
- Age-related macular degeneration
- Glaucoma
- Retinopathy of prematurity
- Congenital cataract

The basic neural network architecture used convolutional layers to detect patterns in fundus images, then classified them into one of five conditions:
# Simplified architecture (not production code)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(5, activation='softmax')  # one output per condition
])
The real complexity wasn’t in the model architecture. It was in everything else.
The Dataset Challenge: Or, Why Nothing is Ever Easy
Acquiring Real-World Medical Data: The hardest part wasn’t the model. It was the data. Retinal fundus images needed:
Patient consent (took 3 months to navigate privacy protocols)
Ophthalmologist validation (labels from 4 specialists to ensure consistency)
Quality control (40% of collected images were too blurry to use)
Demographic diversity (Western datasets failed on Nigerian patients)
I needed retinal fundus images. Thousands of them. Labeled by actual ophthalmologists. From Nigerian patients (because training on Western datasets and deploying in Africa is… problematic).
First, I hit up the major ophthalmology facilities in Lagos. I explained the project. Showed them the potential impact. Most were interested. But:
- Patient consent processes took months
- Image quality varied wildly between clinics
- Storage formats were inconsistent
- Labels needed validation by multiple specialists

I couldn’t rely solely on Lagos datasets. So I supplemented with open-source datasets where I could confirm their authenticity, like the Ocular Disease Intelligent Recognition (ODIR) dataset. But even these needed preprocessing to match the image characteristics from Lagos clinics.
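Label validation across multiple specialists reduces to a consensus rule. The agreement policy below is hypothetical (keep an image only when at least 3 of the 4 validating ophthalmologists agree), but it shows the shape of the check:

```python
from collections import Counter

# Hypothetical consensus policy: an image keeps its label only when at least
# `min_agreement` of the validating specialists assigned the same condition.
def consensus_label(labels, min_agreement=3):
    condition, votes = Counter(labels).most_common(1)[0]
    return condition if votes >= min_agreement else None

print(consensus_label(["glaucoma", "glaucoma", "glaucoma", "amd"]))  # glaucoma
print(consensus_label(["glaucoma", "amd", "dr", "glaucoma"]))        # None: send back for review
```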
Building the Preprocessing Pipeline
The preprocessing pipeline normalized images from different camera types so the model could compare them fairly:
# Image preprocessing steps
import cv2

def preprocess_fundus_image(image_path):
    # Load and resize to the model's input size
    img = cv2.imread(image_path)
    img = cv2.resize(img, (224, 224))
    # Normalize pixel values to [0, 1]
    img = img / 255.0
    # Color normalization (fundus images vary by camera)
    img = normalize_color_channels(img)
    # Contrast enhancement
    img = enhance_contrast(img)
    return img
Every image went through quality checks. Every label got validated. It was tedious. It was necessary.
The Azure ML Deployment: 7 Stages of Getting It Wrong
The ML development cycle is clean on paper. In reality, it looked like this:
Stage 1–2: Data Sourcing and Prep (3 months)
Already covered. It was brutal.
Stage 3: Data Wrangling (2 weeks)
Dealing with class imbalance. Glaucoma images outnumbered retinopathy of prematurity by 10:1. Used augmentation and weighted loss functions to balance.
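The exact transforms in the original pipeline aren't documented; the sketch below uses typical label-preserving augmentations for fundus images (small rotations, flips, mild zoom and contrast jitter), which is how the rarer classes got synthetic variety:

```python
import tensorflow as tf

# Augmentation pipeline for the rarer classes. These specific transforms are
# assumptions: typical, label-preserving choices for retinal images.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),   # small rotations only
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.2),
])

images = tf.random.uniform((4, 224, 224, 3))  # stand-in batch
augmented = augment(images, training=True)
print(augmented.shape)  # (4, 224, 224, 3)
```

Augmentation handled the variety; the weighted loss (the class_weights dictionary later in this post) handled the imbalance during training.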
Stage 4: Training and Testing (6 weeks)
First model: 62% accuracy. Terrible.
Iterated on:
- Architecture depth
- Learning rate schedules
- Data augmentation strategies
- Transfer learning with ImageNet weights

By iteration 7: 87% accuracy. Good enough to be useful. Not perfect. Good enough.
Stage 5–7: Deployment and Monitoring
Azure ML made deployment relatively painless once we configured the web service endpoint with authentication and resource limits:
# Deploy as a web service endpoint
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Scoring script and environment ('score.py' and env are defined elsewhere)
inference_config = InferenceConfig(entry_script='score.py', environment=env)

# Container configuration
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True
)

# Deploy
service = Model.deploy(
    workspace=ws,
    name='eye-diagnosis-service',
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config
)
But here’s what the tutorial doesn’t show:
Cost optimization was critical. We started with always-on instances. Burned through budget in week one. Switched to auto-scaling based on clinic hours. Costs dropped 60%.
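The scaling logic amounted to a schedule. The actual rule lived in Azure's autoscale configuration, but the hours and days below (hypothetical) show the idea, scale to zero outside clinic hours:

```python
from datetime import datetime, time

# Hypothetical scaling schedule; the real one was an Azure autoscale rule.
# Assumed clinic hours: 07:00-16:00, Monday through Saturday.
CLINIC_OPEN = time(7, 0)
CLINIC_CLOSE = time(16, 0)

def desired_instances(now):
    """Return how many inference instances to run at this moment."""
    is_clinic_day = now.weekday() < 6  # Monday (0) through Saturday (5)
    is_open = CLINIC_OPEN <= now.time() <= CLINIC_CLOSE
    return 1 if (is_clinic_day and is_open) else 0

print(desired_instances(datetime(2021, 6, 7, 9, 30)))  # Monday morning -> 1
print(desired_instances(datetime(2021, 6, 6, 9, 30)))  # Sunday -> 0
```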
Integration with clinical workflow. The model wasn’t standalone. It had to fit into existing patient management systems. That meant APIs, authentication, and building a simple interface that nurses could use without training.
Validation in production. Every model prediction got flagged for review by an ophthalmologist. We tracked accuracy over time. Monitored for drift. Adjusted when needed.
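A minimal drift check, not the production monitor, can compare the class mix of recent predictions against the mix seen during validation. The baseline proportions below are hypothetical:

```python
import numpy as np

# Hypothetical baseline class mix observed during validation
# (DR, glaucoma, AMD, ROP, congenital cataract).
BASELINE = np.array([0.40, 0.25, 0.18, 0.07, 0.10])

def kl_divergence(p, q):
    p = p + 1e-9
    q = q + 1e-9
    return float(np.sum(p * np.log(p / q)))

def drift_alert(recent_predictions, threshold=0.1):
    """recent_predictions: array of class indices (0..4) from recent cases.
    Alerts when the recent class mix diverges from the validation baseline."""
    counts = np.bincount(recent_predictions, minlength=5)
    recent = counts / counts.sum()
    return kl_divergence(recent, BASELINE) > threshold

# Sanity check: predictions drawn from the baseline mix shouldn't alert.
rng = np.random.default_rng(0)
sample = rng.choice(5, size=2000, p=BASELINE)
print(drift_alert(sample))  # False
```

Distribution shift in the predictions was one signal; the ophthalmologist-review loop above was the other, catching drift in accuracy rather than just in inputs.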
The Real Impact: What Changed on the Ground
Quantified Outcomes: Three months after deployment at Eye Foundation Nigeria, here’s what happened:
Patient waiting time: Down from 2–3 weeks to 3–5 days for initial assessment
Triage accuracy: The model correctly flagged 94% of urgent cases in the first week
Specialist time: Ophthalmologists could focus on complex cases instead of routine screening
False positives: 13%. Acceptable. We’d rather over-refer than miss a case
The Human Side of Adoption
But here’s the thing nobody puts in case studies: adoption was hard.
Some clinicians were skeptical. “How does it know?” Some patients didn’t trust a computer diagnosis. We had to prove it, one case at a time. Show the fundus image. Explain the model’s confidence score. Let the ophthalmologist make the final call.
The model wasn’t replacing judgment. It was augmenting it.
What I Got Wrong (And What I’d Do Differently)
Underestimated data prep time
I thought 3 months. It took 6. Medical data is messy. Plan accordingly.
Assumed clinical integration would be straightforward
It wasn’t. Should have involved nurses and front-desk staff from day one.
Didn’t build enough monitoring from the start
Added it later. Should have been there from deployment. Model drift in medical imaging is real.
Cost optimization came too late
Week one budget panic taught me to plan for scaling down, not just up.
Technical Deep-Dive: Performance Metrics and Architecture
For those interested in the numbers:
| Metric | Value |
| --- | --- |
| Overall Accuracy | 87% |
| Diabetic Retinopathy Precision | 91% |
| Glaucoma Recall | 89% |
| AMD F1-Score | 0.86 |
| False Positive Rate | 13% |
| Inference Time | < 2 seconds |
| Model Size | 23 MB |
| Training Time | 48 hours |
The Architecture That Actually Shipped
I tried several approaches before landing on what worked:
Final: Custom architecture with transfer learning
- Started with MobileNetV2 pretrained on ImageNet
- Removed top layers
- Added custom classification head
- Fine-tuned last 20 layers on our dataset
- Result: 87% accuracy, 23 MB model size

The final architecture combined a pretrained MobileNetV2 base with custom classification layers, freezing early layers while fine-tuning the later ones:
# Final architecture
import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)

# Freeze everything except the last 20 layers
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

# Custom classification head
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(5, activation='softmax')
])
Training Configuration
The training setup used a decaying learning rate and weighted the rare conditions more heavily to compensate for dataset imbalance:
# Optimizer with learning rate schedule
import tensorflow as tf

initial_lr = 0.001
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_lr,
    decay_steps=1000,
    decay_rate=0.96,
    staircase=True
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Class weights for the imbalanced dataset (rarer conditions weighted higher)
class_weights = {
    0: 1.0,  # Diabetic retinopathy
    1: 2.3,  # Glaucoma
    2: 1.8,  # AMD
    3: 4.5,  # Retinopathy of prematurity (rare)
    4: 3.2   # Congenital cataract
}

model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    metrics=['accuracy', 'precision', 'recall']
)
The training took 48 hours on Azure ML compute (Standard_NC6 instance with NVIDIA Tesla K80). Total compute cost: $127. That’s important. When you’re working with limited funding, you track every dollar.
Validation Strategy
We used stratified k-fold cross-validation (k=5) to ensure robust performance estimates across different patient subsets. The model’s performance remained consistent across folds, with accuracy ranging from 85% to 89%.
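The split itself is a standard scikit-learn call. Here's the shape of it, with synthetic labels standing in for the real dataset (the class proportions below are illustrative):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels stand in for the real dataset; the class mix is illustrative.
rng = np.random.default_rng(42)
labels = rng.choice(5, size=1000, p=[0.40, 0.25, 0.18, 0.07, 0.10])
X = np.arange(len(labels))  # placeholder for image indices

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, labels)):
    # Stratification preserves the class mix per fold, so rare conditions like
    # retinopathy of prematurity (class 3) appear in every validation set.
    val_mix = np.bincount(labels[val_idx], minlength=5) / len(val_idx)
    print(f"fold {fold}: val size={len(val_idx)}, rare-class share={val_mix[3]:.2f}")
```

Without stratification, a random split can leave a fold with almost no examples of the rarest class, which makes the per-fold accuracy numbers meaningless for exactly the conditions you care most about.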
The model wasn’t state-of-art. It didn’t need to be. It needed to be good enough and deployable in a real clinic with real constraints.
Comparison with Existing Solutions
Most commercial tools required:
- Expensive on-premise hardware ($50,000+)
- High-bandwidth internet for cloud processing
- Annual licensing fees ($10,000+)
- Specialized training for operators

Our solution:
- Cloud-based (no hardware investment)
- Works on 3G connections (compressed API requests)
- Cost per prediction: <£0.02
- Nurses trained in 2 hours
The total cost to process 10,000 patients in the first year? $1,843. That includes Azure compute, storage, and API calls.
Compare that to hiring one additional ophthalmologist at $60,000/year, and the economics make sense even before you factor in the waiting time reduction.
What This Means for Healthcare ML: The Real Challenge Isn’t the Technology
This wasn’t revolutionary tech. It was appropriate tech.
TensorFlow and Azure ML are mature platforms. The challenge wasn’t building a model. It was building a model that worked in context:
- Resource constraints
- Unreliable internet
- Cost sensitivity
- Clinical workflow integration
- Cultural acceptance

The lesson: The technology is ready. The hard part is deployment in real-world conditions.
Looking Forward: NHS, Global Health, and What’s Next
Why the NHS Needs This (And What’s Stopping It)
NHS ophthalmology waiting lists have tripled since 2019. The average patient waits 4 months for an initial assessment. Many conditions progress irreversibly during that wait.
The solution exists. The Lagos model can triage NHS patients. Here’s what needs to change:
Regulatory burden is crushing innovation
CE marking + MHRA approval takes 18–24 months. By the time a tool is approved, the technology is outdated. We need fast-track pathways for proven, low-risk AI tools.
NHS data silos prevent model training
Every trust owns its data. Building a national model requires negotiating with 200+ organizations. Compare this to Estonia, where centralized health data enabled nationwide AI deployment in 2 years.
Procurement processes favor big vendors
A startup with a working solution can’t compete with a multinational offering vaporware. NHS procurement needs “innovation tracks” for validated small-scale deployments.
The irony: We built this in Lagos with 5% of the NHS budget per capita. The constraint isn’t money. It’s bureaucracy.
I’m not saying regulation is bad. I’m saying it’s calibrated for pharmaceuticals, not software. We need proportionate regulation that matches the risk profile of decision-support tools.
Adapting for the NHS Context
The model needs retraining. UK fundus cameras differ from Lagos ones (different manufacturers, different image characteristics). The patient population is different (different prevalence rates, different age distributions). But the principles hold.
What needs to change:
Regulatory approval: CE marking for medical devices, MHRA approval
GDPR compliance: Critical in UK, wasn’t an issue in Nigeria
NHS integration standards: Different APIs, HL7 FHIR messages, NHS Digital requirements
Multi-site validation: Testing across at least 5 NHS trusts before wider deployment
Explainability: NHS clinicians need to understand why the model flagged a case
What stays the same:
The core architecture (transfer learning approach works)
The triage philosophy (augment, don’t replace)
The cost optimization focus (NHS budgets are tight too)
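For the explainability requirement, one standard option for convolutional classifiers is Grad-CAM: highlight which image regions drove a prediction, so a clinician can check whether the model looked at the optic disc or at an artifact. A sketch, using a toy convolutional model as a stand-in for the real one (this wasn't part of the Lagos deployment):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, img, last_conv_name):
    """Grad-CAM heatmap for the model's top-predicted class."""
    conv_layer = model.get_layer(last_conv_name)
    grad_model = tf.keras.Model(model.input, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img[None, ...])
        score = tf.reduce_max(preds, axis=1)  # top-class score
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # per-channel importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1))
    cam = cam[0] / (tf.reduce_max(cam) + 1e-8)   # normalize to [0, 1]
    return cam.numpy()

# Toy stand-in model (the real model's base was MobileNetV2)
inp = tf.keras.Input((224, 224, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu", name="last_conv")(inp)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
out = tf.keras.layers.Dense(5, activation="softmax")(x)
toy_model = tf.keras.Model(inp, out)

heatmap = grad_cam(toy_model, np.random.rand(224, 224, 3).astype("float32"), "last_conv")
print(heatmap.shape)
```

The heatmap gets overlaid on the fundus image in the review interface; the clinician sees where the evidence is, not just a confidence score.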
The Global Model Question
Here’s the bigger opportunity: Building a global model trained on diverse datasets that works across countries.
This requires intentional effort.
The challenge: Most medical imaging datasets come from Western hospitals. Models trained on these fail when deployed in Africa or Asia. It’s not just about skin tone in fundus images (though that matters). It’s about:
Disease prevalence differences
Camera equipment variations
Image quality standards
Lighting conditions
Patient positioning protocols
The opportunity: Build a model trained on data from:
- Lagos clinics (done)
- London hospitals (in progress)
- Rural India (partnerships forming)
- Sub-Saharan Africa (expanding)
- Southeast Asia (next phase)
If AI tools only work for Western patients, we’re building inequality into the technology.
Real-World Scaling Lessons
After 18 months of deployment, here’s what I’ve learned about scaling medical AI:
Technical scaling is the easy part. Going from 1 clinic to 10 is mostly DevOps. Azure handles it.
Human scaling is hard. Training staff. Building trust. Changing workflows. Each clinic is different. Each requires customization.
Regulatory scaling is slowest. Getting approval in Nigeria took 6 months. UK will take longer. Each country has different requirements.
Financial scaling requires proof. Funders want ROI. “It saves time” isn’t enough. You need: “It reduced waiting times by X days, saved $Y per patient, and here’s the peer-reviewed study proving it.”
We’re working on that study now.
If You’re Building Medical Imaging ML
Here’s what I’d tell you:
Start with the clinical workflow, not the model — Talk to nurses. Watch patient intake. Understand where your tool fits before you build it.
Data quality > data quantity — 1,000 well-labeled, validated images beat 10,000 messy ones.
Plan for the unsexy parts — Deployment, monitoring, and cost optimization take longer than training.
Build trust with end-users — Your model’s accuracy doesn’t matter if clinicians won’t use it.
Local context matters — A model trained on Asian patients might fail on African patients. Diversity in training data isn’t optional.
Start simple — My first architecture was overly complex. The simpler version performed better.
The Takeaway
Technology doesn’t save people. People save people. Technology just gives them better tools.
The Opportunity Nobody’s Talking About
There are 2.7 billion people in regions with fewer than 1 ophthalmologist per 100,000. Most will never see a specialist in their lifetime.
But they have smartphones. They have internet (even if it’s slow). And they have fundus cameras (getting cheaper every year).
The technology to diagnose 5 major eye conditions exists. It costs £0.02 per scan. It works on 3G networks. The question isn’t “can we build this?”
We already did.
The question is: “Who’s going to deploy it at scale?”