Demonstrating Bias and Mitigation in a Simple Clinical ML Pipeline

A Minimal Case Study of Bias in Clinical Machine Learning

Repository: https://github.com/lemind/clinical-ml-bias-demo


Background

This project was done as part of a university course on machine learning in clinical settings.

The course focused on bias and fairness in ML, clinical risk, interpretability, and the limits of technical mitigation in real-world systems.

One of the key inspirations was Ntoutsi et al. (2020), “Bias in Data-Driven AI Systems”[1], which emphasizes that bias can appear at all stages of an ML pipeline and that mitigation is usually partial and involves trade-offs.

The goal of this project was not to build a strong predictive model, but to:

  • demonstrate bias,
  • measure it explicitly,
  • and apply a simple mitigation strategy.

Problem Setup

We simulate a basic clinical prediction task using synthetic data:

  • Features: age, sex, severity score
  • Target: binary clinical outcome

Sex is treated as a protected attribute.

Bias is intentionally introduced so that female patients are more likely to be missed by the model.

This setup reflects real clinical scenarios where historical or systemic biases are encoded in data.
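
A minimal sketch of how such biased synthetic data could be generated is shown below. The column names, sample size, and bias mechanism (the severity score being systematically understated for female patients) are illustrative assumptions, not the exact code from the repository.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 2000

# Illustrative features: age, sex (protected attribute), severity score
age = rng.normal(60, 12, n)
sex = rng.integers(0, 2, n)            # 0 = male, 1 = female (assumed encoding)
severity_true = rng.normal(0, 1, n)

# The true outcome depends on age and true severity ...
logit = 0.03 * (age - 60) + 1.2 * severity_true
outcome = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# ... but the *recorded* severity is systematically understated for female
# patients, so their positive cases look less severe to the model.
severity = severity_true - 0.8 * sex

df = pd.DataFrame({"age": age, "sex": sex, "severity": severity, "outcome": outcome})
```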


Method

Tools

  • Python
  • NumPy, Pandas
  • scikit-learn
  • Logistic Regression (interpretable baseline)

Steps

  1. Generate biased synthetic clinical data
  2. Train a logistic regression model
  3. Evaluate performance by subgroup
  4. Measure False Negative Rate (FNR) per group
  5. Apply post-processing mitigation

False negatives were chosen as the primary metric because, in a clinical setting, missing a true positive case (an undetected at-risk patient) is typically the most harmful error.
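
As a rough illustration of steps 2 to 4, a logistic regression can be trained on a frame like the one sketched above and the FNR computed separately for each sex. Variable and column names carry over from the previous sketch and are assumptions, not the repository's actual code.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = df[["age", "sex", "severity"]]
y = df["outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
preds = (probs >= 0.5).astype(int)      # default decision threshold

def false_negative_rate(y_true, y_pred):
    # FNR = FN / (FN + TP): the share of true positive cases the model misses
    positives = y_true == 1
    return ((y_pred == 0) & positives).sum() / positives.sum()

for group, label in [(0, "male"), (1, "female")]:
    mask = (X_test["sex"] == group).to_numpy()
    fnr = false_negative_rate(y_test.to_numpy()[mask], preds[mask])
    print(f"FNR ({label}): {fnr:.3f}")
```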


Results

Baseline (default threshold = 0.5)

  • Overall accuracy: 0.593
  • False Negative Rate (male): 0.468
  • False Negative Rate (female): 0.925

The model misses most positive cases for female patients, despite acceptable overall accuracy.

After mitigation (group-specific threshold)

  • False Negative Rate (male): 0.468
  • False Negative Rate (female): 0.094

The disparity is substantially reduced without retraining the model.


Interpretation

Mitigation was applied at decision time, not during training.

A more sensitive threshold was used for the disadvantaged group.

This improves recall for female patients but also increases false positives for that group, illustrating a real clinical trade-off.
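
A minimal sketch of this kind of decision-time post-processing is shown below, reusing the variables and the false_negative_rate helper from the earlier sketch. The group-specific threshold values are illustrative, since in the project they were chosen manually.

```python
import numpy as np

# Group-specific thresholds applied at decision time (illustrative values):
# a lower threshold for the disadvantaged group makes the model more sensitive.
thresholds = {0: 0.5, 1: 0.3}           # 0 = male, 1 = female

sex_test = X_test["sex"].to_numpy()
group_thresholds = np.where(sex_test == 1, thresholds[1], thresholds[0])
preds_mitigated = (probs >= group_thresholds).astype(int)

def false_positive_rate(y_true, y_pred):
    # FPR = FP / (FP + TN): the share of true negatives incorrectly flagged
    negatives = y_true == 0
    return ((y_pred == 1) & negatives).sum() / negatives.sum()

# The trade-off: FNR drops for the female group, but FPR rises there,
# because more borderline cases are now flagged as positive.
for group, label in [(0, "male"), (1, "female")]:
    mask = sex_test == group
    yt = y_test.to_numpy()[mask]
    print(f"{label}: FNR={false_negative_rate(yt, preds_mitigated[mask]):.3f}, "
          f"FPR={false_positive_rate(yt, preds_mitigated[mask]):.3f}")
```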

The outcome aligns with the arguments of Ntoutsi et al. (2020): bias mitigation redistributes errors rather than eliminating them.


Limitations

  • Synthetic data lacks real clinical complexity
  • Only one protected attribute was considered
  • Thresholds were manually chosen
  • Not all fairness metrics were evaluated

Despite these limitations, the example effectively demonstrates key concepts from the course.


Conclusion

This project shows that:

  • bias can exist even in simple, interpretable models
  • overall accuracy can hide clinically important disparities
  • mitigation strategies reduce harm but introduce trade-offs

Bias in clinical ML is not purely a modeling issue, but a system-level problem, consistent with current research.


[1] Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M.-E., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder-Kurlanda, K., Wagner, C., Karimi, F., Fernandez, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., Broelemann, K., Kasneci, G., Tiropanis, T., & Staab, S. (2020). Bias in data-driven artificial intelligence systems—An introductory survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), e1356. https://doi.org/10.1002/widm.1356
