A Minimal Case Study of Bias in Clinical Machine Learning
Repository: https://github.com/lemind/clinical-ml-bias-demo
Background
This project was done as part of a university course on machine learning in clinical settings.
The course focused on bias and fairness in ML, clinical risk, interpretability, and the limits of technical mitigation in real-world systems.
One of the key inspirations was Ntoutsi et al. (2020), “Bias in Data-Driven AI Systems”[1], which emphasizes that bias can appear at all stages of an ML pipeline and that mitigation is usually partial and involves trade-offs.
The goal of this project was not to build a strong predictive model, but to:
- demonstrate bias,
- measure it explicitly,
- and apply a simple mitigation strategy.
Problem Setup
We simulate a basic clinical prediction task using synthetic data:
- Features: age, sex, severity score
- Target: binary clinical outcome
Sex is treated as a protected attribute.
Bias is intentionally introduced so that female patients are more likely to be missed by the model.
This setup reflects real clinical scenarios where historical or systemic biases are encoded in data.
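The exact data-generating process lives in the repository; the snippet below is only a rough, self-contained sketch of how such a bias could be simulated. The coefficients, group coding, and under-reporting rate are illustrative assumptions, not the repository's actual values, and other mechanisms (e.g., noisier features for one group) would produce a similar effect.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 2000

# Basic covariates: age, sex (the protected attribute), and a severity score.
age = rng.normal(60, 12, n)
sex = rng.integers(0, 2, n)            # 0 = male, 1 = female (illustrative coding)
severity = rng.normal(0, 1, n)

# Latent clinical risk driven by age and severity.
logit = 0.04 * (age - 60) + 1.5 * severity
y_true = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Inject bias: positive outcomes for female patients are under-recorded,
# so the labels the model learns from systematically miss cases in that group.
under_recorded = (sex == 1) & (y_true == 1) & (rng.random(n) < 0.5)
outcome = np.where(under_recorded, 0, y_true)

df = pd.DataFrame({"age": age, "sex": sex, "severity": severity, "outcome": outcome})
```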
Method
Tools
- Python
- NumPy, Pandas
- scikit-learn
- Logistic Regression (interpretable baseline)
Steps
- Generate biased synthetic clinical data
- Train a logistic regression model
- Evaluate performance by subgroup
- Measure False Negative Rate (FNR) per group
- Apply post-processing mitigation
False negatives were chosen as the primary metric because of their clinical relevance: a missed positive case typically means a patient who needed care is not flagged.
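A minimal sketch of the training and subgroup evaluation steps, assuming the DataFrame `df` from the sketch above; the column names and the helper function are illustrative, not taken from the repository.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = df[["age", "sex", "severity"]]
y = df["outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)      # default decision threshold

def false_negative_rate(y_true, y_pred):
    # Share of actual positives predicted as negative: FN / (FN + TP).
    positives = y_true == 1
    return ((y_pred == 0) & positives).sum() / positives.sum()

for code, label in [(0, "male"), (1, "female")]:
    mask = (X_test["sex"] == code).to_numpy()
    print(label, false_negative_rate(y_test.to_numpy()[mask], pred[mask]))
```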
Results
Baseline (default threshold = 0.5)
- Overall accuracy: 0.593
- False Negative Rate (male): 0.468
- False Negative Rate (female): 0.925
The model misses the vast majority of positive cases for female patients, a disparity that the single overall accuracy figure does not reveal.
After mitigation (group-specific threshold)
- False Negative Rate (male): 0.468
- False Negative Rate (female): 0.094
The disparity is substantially reduced without retraining the model.
Interpretation
Mitigation was applied at decision time, not during training.
A more sensitive threshold was used for the disadvantaged group.
This improves recall for female patients at the cost of additional false positives in that group, illustrating a real clinical trade-off.
The outcome aligns with the arguments of Ntoutsi et al. (2020): bias mitigation redistributes errors rather than eliminating them.
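As a rough illustration of the decision-time adjustment described above, continuing the earlier sketch: the female-group threshold of 0.25 is an assumed value, and the repository's manually chosen thresholds may differ.

```python
import numpy as np

# Group-specific decision thresholds: a more sensitive cutoff for the group
# with the higher false negative rate, the default cutoff for the other group.
thresholds = {0: 0.50, 1: 0.25}        # manually chosen, illustrative values

sex_test = X_test["sex"].to_numpy()
cutoffs = np.array([thresholds[s] for s in sex_test])
pred_mitigated = (proba >= cutoffs).astype(int)

for code, label in [(0, "male"), (1, "female")]:
    mask = sex_test == code
    print(label, false_negative_rate(y_test.to_numpy()[mask], pred_mitigated[mask]))
```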
Limitations
- Synthetic data lacks real clinical complexity
- Only one protected attribute was considered
- Thresholds were manually chosen
- Not all fairness metrics were evaluated
Despite these limitations, the example effectively demonstrates the key concepts from the course.
Conclusion
This project shows that:
- bias can exist even in simple, interpretable models
- overall accuracy can hide clinically important disparities
- mitigation strategies reduce harm but introduce trade-offs
Bias in clinical ML is not purely a modeling issue, but a system-level problem, consistent with current research.
[1] - Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M.-E., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder-Kurlanda, K., Wagner, C., Karimi, F., Fernandez, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., Broelemann, K., Kasneci, G., Tiropanis, T., & Staab, S. (2020). Bias in data-driven artificial intelligence systems—An introductory survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), e1356. https://doi.org/10.1002/widm.1356