I Built a Zero-Miss Cancer Screening Model Using Routine Blood Tests

#devchallenge #muxchallenge #showandtell #video

This is a submission for the DEV's Worldwide Show and Tell Challenge Presented by Mux

What I Built

I developed the MEN2 Predictor, a machine learning screening tool for a rare hereditary cancer syndrome called MEN2, which is caused by mutations in the RET gene.

MEN2 often leads to medullary thyroid cancer if it's not caught early. In India, confirmation typically requires genetic testing, which costs around ₹20,000; as a result, many families delay or skip it.

This project explores whether routine blood markers and basic clinical features can act as a first screening layer before sequencing.

My Pitch Video

Demo

GitHub repository: https://github.com/ArjunCodess/men2-predictor
Live demo (Gradio / Hugging Face): https://huggingface.co/spaces/arjuncodess/men2-predictor

The demo lets you enter clinical features like calcitonin, CEA, age, and RET risk level and see the predicted cancer risk.

The Story Behind It

This started with a simple question: why does a life-saving diagnosis cost so much?

MEN2 is rare, but missing it has serious consequences. Most screening pipelines assume access to genetic sequencing, which isn't realistic in many settings. Meanwhile, clinicians already collect blood markers like calcitonin and CEA and track basic clinical features.

I wanted to see if that existing data could be used better.

I built this project by aggregating real patient data from published studies and testing whether a model could prioritise safety over performance optics. The key requirement was clear: missing a cancer case was unacceptable.
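To make the gap between "performance optics" and safety concrete, here is a minimal sketch using made-up labels (not the project's data or code). The helper names `confusion_counts`, `accuracy`, and `recall` are illustrative assumptions. It shows two hypothetical screeners with identical accuracy, only one of which catches every cancer.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all patients classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true cancer cases that were flagged (sensitivity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative labels only: 2 cancers among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical screener A never flags anyone.
always_negative = [0] * 10

# Hypothetical screener B flags both cancers plus two healthy patients.
cautious = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for name, y_pred in [("always_negative", always_negative), ("cautious", cautious)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "recall:", recall(y_true, y_pred))
```

Both screeners score the same accuracy on this toy set, but the first misses every cancer while the second misses none. That is why a screening tool is better judged on recall, with the extra false positives treated as the price of not missing a case.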

Technical Highlights

Trained on 152 real patients from 20 peer-reviewed studies across 24 RET variants
Implemented five different ML models and compared them using cross-validation
Focused on recall (sensitivity) instead of headline accuracy
Achieved 100% recall on real patient data, with zero documented cancers missed
Explicitly tested how synthetic data augmentation affects recall
End-to-end reproducible pipeline in Python with clear evaluation artefacts
Deployed an interactive demo using Gradio for transparency

The project intentionally favours interpretability and safety over complexity.
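One common way to push a classifier towards zero missed cases is to lower its decision threshold until every known positive is flagged. The sketch below illustrates that idea with invented scores. It is an assumption about how such a trade-off can be made, not a description of this project's pipeline, and the function names are hypothetical.

```python
def highest_zero_miss_threshold(y_true, scores):
    """Largest threshold at which every positive case is still flagged.

    A case is flagged when score >= threshold, so the answer is the
    lowest score among the true positives.
    """
    positive_scores = [s for t, s in zip(y_true, scores) if t == 1]
    if not positive_scores:
        raise ValueError("need at least one positive case")
    return min(positive_scores)


def predict(scores, threshold):
    """Turn risk scores into 0/1 predictions."""
    return [1 if s >= threshold else 0 for s in scores]


def false_negatives(y_true, y_pred):
    """Count cancers that were not flagged."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


def false_positives(y_true, y_pred):
    """Count healthy patients that were flagged."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)


# Invented example: 1 = cancer, scores are model risk estimates.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.92, 0.40, 0.35, 0.10, 0.55, 0.71]

default_pred = predict(scores, 0.5)
zero_miss_threshold = highest_zero_miss_threshold(y_true, scores)
safe_pred = predict(scores, zero_miss_threshold)

print("default 0.5 -> missed:", false_negatives(y_true, default_pred),
      "false alarms:", false_positives(y_true, default_pred))
print("threshold", zero_miss_threshold, "-> missed:", false_negatives(y_true, safe_pred),
      "false alarms:", false_positives(y_true, safe_pred))
```

In this toy case the lower threshold removes the missed cancer at the cost of one extra false alarm. A threshold chosen this way should be picked on held-out validation data rather than on the data used for final evaluation. Otherwise zero misses is guaranteed by construction and says little about new patients.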

Participants

This project was not built in isolation, and I want to acknowledge the people who contributed meaningfully to it.

Harnoor Kaur
A class 12 student from my school. She is not on dev[dot]to.
Harnoor helped extensively with the research side of this project, including locating and compiling relevant peer-reviewed studies and assisting in sourcing and organising the clinical data used for model development.

Shashwat Misra
Mentor on this project.
Shashwat provided guidance throughout the development process, helped review the approach, and offered feedback on both the technical and research decisions that shaped the final pipeline.

Their support played an important role in turning this from an idea into a working, validated project.

By submitting this project, I confirm that my video adheres to Mux's terms of service: https://www.mux.com/terms