AI-in-the-Loop Healthcare: Engineering Pregnancy Fit-to-Fly
Pregnancy Fit-to-Fly is not just another full-stack app—it’s a production-ready demonstration of how machine learning can be embedded as a first-class system component, orchestrating user experience in real time.
At its essence, the platform automates the issuance of medical fitness-to-fly certificates for pregnant passengers, but beneath the surface it showcases a robust AI/ML architecture engineered for low-latency inference, modular scalability, and safety-critical decision-making.
AI-Driven System Design
Unlike traditional healthcare apps that bolt AI onto the side, this project was built around the principle of AI-in-the-loop workflows. The models are not background advisors—they are the control plane.
The system is composed of three independent but tightly integrated layers:
- Frontend (clients/): A tab-based user workflow built on modern JavaScript modules.
- API Backend (server/): Node.js/Express service handling authentication, certificate issuance, and integration with doctors and payments.
- ML Backend (ml/): A FastAPI-powered inference engine serving lightweight predictive models.
This modular separation ensures that models evolve independently from application logic, enabling rapid iteration and scalable deployment across microservices.
Machine Learning Core
The ML service processes structured health data spanning demographics, physiological measures, behavioral scores, and binary symptoms. Incoming fields are assembled dynamically into fixed-order numeric feature vectors before inference, keeping the pipeline flexible about inputs while remaining consistent for the models.
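As a rough sketch of what that assembly step might look like (the field names below are illustrative, not the project's actual schema), structured health inputs can be flattened into a fixed-order numeric vector:

```python
# Illustrative feature assembly: flatten a structured health payload into a
# fixed-order numeric vector. Field names are hypothetical, not the app's schema.

FEATURE_ORDER = [
    "age", "gestational_weeks", "resting_hr", "activity_score",
    "stress_score", "has_hypertension", "has_diabetes",
]

DEFAULTS = {name: 0.0 for name in FEATURE_ORDER}

def assemble_features(payload: dict) -> list:
    """Map a JSON payload to a numeric vector in a stable feature order."""
    merged = {**DEFAULTS, **payload}
    # Booleans (binary symptoms) become 0/1; everything else is cast to float.
    return [float(merged[name]) for name in FEATURE_ORDER]
```

Keeping the feature order in one place means the serving code and the training code can share a single definition, which is what makes "data flexibility and consistency" more than a slogan.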
The models include:
- Heart Rate Predictor (Regression) – fast BPM estimation adjusted for age, activity, and stress.
- Blood Pressure Predictor (Regression) – estimates systolic/diastolic levels from feature interactions.
- Eligibility Classifier (Binary Classification) – outputs probability of “fit-to-fly,” tuned for conservative, safety-first thresholds.
All models are optimized for sub-100ms inference, containerized for cloud portability, and exposed via stateless API endpoints—making them inherently horizontally scalable under load.
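The conservative, safety-first thresholding mentioned above can be sketched as a thin wrapper around the classifier's probability output. The 0.85 cutoff here is an illustrative value, not the project's tuned threshold:

```python
# Conservative eligibility gating: require a high "fit-to-fly" probability
# before approving. The 0.85 threshold is illustrative, not the tuned value.

FIT_THRESHOLD = 0.85  # err on the side of refusing certification

def gate_eligibility(fit_probability: float) -> dict:
    """Turn a classifier probability into a safety-first decision."""
    if not 0.0 <= fit_probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    eligible = fit_probability >= FIT_THRESHOLD
    return {
        "eligible": eligible,
        "probability": round(fit_probability, 3),
        # Borderline scores fail closed: the workflow halts rather than guesses.
        "action": "unlock_doctor_selection" if eligible else "halt_workflow",
    }
```

The design choice worth noting is that the gate fails closed: anything short of high confidence halts the workflow instead of approving it.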
ML Serving as a Microservice
The ML backend operates as a stateless FastAPI microservice, designed around the following principles:
- JSON schema contracts → Frontend remains model-agnostic.
- Stateless inference → Seamless horizontal scaling via Kubernetes or Swarm.
- API-first extensibility → New models or endpoints can be added without breaking downstream integrations.
- Safety-first gating → Eligibility decisions halt unsafe workflows at runtime.
This design abstracts away complexity for the frontend, while maintaining the flexibility of plugging in new predictive engines (e.g., gestational risk models, wearable-device streams).
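At its core, a stateless inference endpoint is just a pure request-to-response function that a FastAPI route can wrap. The sketch below uses a simplified stand-in for both the schema and the model, so the contract idea is visible without the framework machinery:

```python
# Stateless inference handler: a pure request -> response function that a
# FastAPI route could wrap. Schema fields and scoring are simplified stand-ins.

REQUIRED_FIELDS = {"age", "gestational_weeks", "resting_hr"}

def predict_handler(request: dict) -> dict:
    """Validate the JSON contract, run inference, return a stable response shape."""
    missing = REQUIRED_FIELDS - request.keys()
    if missing:
        # Contract violation: report exactly which fields are absent.
        return {"error": "missing fields: %s" % sorted(missing), "status": 422}
    # Stand-in scoring; a real deployment would call model.predict_proba here.
    score = 0.9 if request["resting_hr"] < 100 else 0.4
    return {"fit_to_fly_probability": score, "status": 200}
```

Because the handler holds no per-request state, any number of replicas can serve it behind a load balancer, which is exactly the property that makes horizontal scaling trivial.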
Production-Grade Stack
The technical stack reflects an engineering-for-scale mindset:
- ML Backend: Python 3.8+, FastAPI, scikit-learn/XGBoost, NumPy, Pandas
- API Backend: Node.js/Express, JWT-based authentication, pluggable data layer (MongoDB/Postgres)
- Frontend: Modular ES modules, esbuild for bundling
- DevOps: Docker-first, .env-based configuration, ready for cloud CI/CD pipelines
Every layer is decoupled but orchestrated, enabling teams to iterate on AI models, backend services, or frontend workflows independently—without regressions.
Workflow Orchestration
The application flow illustrates real-time AI integration:
- Passenger logs in (JWT-authenticated).
- Health data collected via form-driven interface.
- Features sent to ML API → model inference controls navigation.
- If “fit-to-fly” → system unlocks doctor selection.
- If not → workflow halts, preventing unsafe certification.
- Payment handled via external gateway (Paystack).
- Certificate issued as a secure, verifiable PDF.
Here, model predictions aren’t just insights—they are gatekeepers, dynamically shaping user experience.
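The gatekeeping flow above can be sketched as a small state machine in which the model's verdict decides the next transition. Step names and the call boundaries here are illustrative, not the app's actual route names:

```python
# Workflow gate sketch: the ML decision controls which step comes next.
# Step names are hypothetical, not the application's actual routes.

def next_step(current: str, ml_eligible: bool = False) -> str:
    """Advance the certification workflow; ineligible predictions halt it."""
    transitions = {
        "login": "health_form",
        "health_form": "ml_inference",
        "doctor_selection": "payment",
        "payment": "certificate_issued",
    }
    if current == "ml_inference":
        # The model is the gatekeeper: only a positive result unlocks doctors.
        return "doctor_selection" if ml_eligible else "halted"
    return transitions.get(current, "halted")
```

Encoding the flow as explicit transitions makes the safety property auditable: there is no path from the health form to certificate issuance that does not pass through the inference gate.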
Scalability & Reliability
The architecture is microservice-native:
- Independent deployments → ML and API can scale separately.
- Observability hooks → Request-level logging for monitoring inference performance.
- Security → JWT gating ensures only authenticated requests reach the ML backend.
- Future extensibility → Designed to incorporate explainability, wearables, or federated updates.
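The JWT gating in front of the ML backend reduces, at its core, to verifying an HMAC signature over the token's header and payload. A deliberately simplified HS256 check is sketched below; a real deployment would use a maintained library (e.g. PyJWT) and also validate expiry and claims:

```python
import base64
import hashlib
import hmac

# Simplified JWT (HS256) signature check, stdlib only. Real services should use
# a maintained library (e.g. PyJWT) and also validate expiry and claims.

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the HS256 signature over header.payload and compare."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return False  # malformed token
    signing_input = ("%s.%s" % (header_b64, payload_b64)).encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, sig_b64)
```

Placing this check in middleware ahead of the inference route is what guarantees the "only authenticated requests reach the ML backend" property.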
In short, this isn’t a prototype—it’s production-ready AI infrastructure applied to healthcare.
Future AI-Centric Extensions
The roadmap pushes beyond predictive gating into next-gen applied AI systems engineering:
- LLM Integration: Natural-language medical explanations (“why eligible/ineligible”).
- Edge Deployment: Model exports via ONNX/TensorRT for airline devices or on-prem clinics.
- Explainability Layer: SHAP/LIME to expose feature importance in real time.
- Federated Learning: Decentralized, privacy-preserving model improvements.
- Wearable Integration: Real-time vitals streaming for continuous eligibility tracking.
This trajectory aligns with xAI’s ethos: building real-world AI systems that adapt, explain, and scale.
Why This Project Matters
- AI-first architecture – models are core, not add-ons.
- Safety-critical deployment – predictions gate life-impacting workflows.
- Production engineering – decoupled microservices, containerized, and scalable.
- Vision alignment – AI deployed for real human impact in aviation and healthcare.
Pregnancy Fit-to-Fly isn’t just a health app. It’s a blueprint for applied AI systems at scale—a showcase of how machine learning, microservices, and modern deployment pipelines converge into a single mission-critical product.