DEV Community

Manognya Lokesh Reddy

🧬 Predicting Cancer Cells with Machine Learning: My Internship Project

Hey Devs! 👋

I’m Manognya Lokesh Reddy, currently pursuing my Master’s in Artificial Intelligence. During my internship at Prinston Smart Engineer, I built a Cancer Cell Prediction model that achieved 92–95% accuracy and helped reduce false positives by 15%.

In this post, I’ll walk through the problem, my approach, and the lessons I learned from applying ML to medical diagnosis.

⚕️ The Problem
Cancer diagnosis often involves analyzing biopsy and cell sample data to identify malignant (cancerous) vs. benign cells.
Manual analysis is time-consuming and prone to errors, especially when the cell features are subtle.

Our goal was to:

Use machine learning to classify cells with high accuracy

Reduce false positives to avoid unnecessary stress and medical procedures

Make the model interpretable for doctors

🛠️ Tech Stack
Python

Pandas + NumPy – for data handling

Scikit-learn – for ML modeling

Matplotlib + Seaborn – for visualizations

Jupyter Notebook – for experimentation

🧪 Workflow Breakdown

1. 📊 Data Preparation

Loaded the publicly available Breast Cancer Wisconsin dataset

Checked for missing values and handled them appropriately

Standardized features for better model performance
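
Here is a minimal sketch of what these preparation steps can look like. It assumes the copy of the Wisconsin (Diagnostic) data that ships with scikit-learn via `load_breast_cancer`; the post does not say which copy or variant the project used, and the 80/20 split and `random_state` are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wisconsin Diagnostic dataset stands in
# for whichever copy of the data the project actually loaded.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target  # in this copy: 0 = malignant, 1 = benign

# Count missing values per column before deciding how to handle them.
missing_per_column = X.isna().sum()
print("Total missing values:", int(missing_per_column.sum()))

# Split before scaling so test-set statistics do not leak into the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Fitting the scaler on the training split only keeps the test set untouched until evaluation.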

2. 🔍 Exploratory Data Analysis

Visualized distributions of features like cell size, texture, and clump thickness

Used correlation heatmaps to identify important predictors
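
The correlation step can be sketched roughly like this. It again assumes scikit-learn's bundled copy of the dataset, and `plot_heatmap` is a hypothetical helper name, not something from the project.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the dataset.
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 feature columns plus a "target" column

# Pearson correlation between every pair of columns, target included.
corr = df.corr()

# Rank features by the strength of their linear relationship with the label.
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr.head(10))


def plot_heatmap(corr_matrix):
    """Draw a correlation heatmap (hypothetical helper; needs matplotlib and seaborn)."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(12, 10))
    sns.heatmap(corr_matrix, cmap="coolwarm", center=0, ax=ax)
    ax.set_title("Feature correlation heatmap")
    fig.tight_layout()
    return fig
```

In a notebook, calling `plot_heatmap(corr)` renders the figure. A strong correlation with the target only hints at a useful predictor, since it captures linear relationships and ignores interactions between features.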

3. 🧠 Model Selection

Tested Logistic Regression, Random Forest, and SVM

Tuned hyperparameters using GridSearchCV

Chose the best model based on accuracy, recall, and F1-score
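
A rough sketch of this comparison is below. The parameter grids are hypothetical and deliberately small, since the post does not list the ones actually searched, and the dataset is scikit-learn's bundled copy. Scaling sits inside a `Pipeline` so each cross-validation fold fits its own scaler.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical grids for illustration; the project's real grids are not stated.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=5000),
        {"model__C": [0.1, 1, 10]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"model__n_estimators": [100, 200], "model__max_depth": [None, 5]},
    ),
    "svm": (
        SVC(),
        {"model__C": [0.1, 1, 10], "model__kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    pipeline = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    # 5-fold cross-validated search; F1 is computed for the class labelled 1.
    search = GridSearchCV(pipeline, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)
    results[name] = search
    print(f"{name}: best CV F1 = {search.best_score_:.3f}, params = {search.best_params_}")

# Keep the candidate with the highest cross-validated score.
best_name = max(results, key=lambda n: results[n].best_score_)
best_model = results[best_name].best_estimator_
```

Note that in scikit-learn's copy the label 1 means benign, so `scoring="f1"` here scores the benign class; swapping in `"recall"` or a custom scorer is one way to weight the comparison toward the malignant class instead.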

4. 📈 Evaluation

Achieved 92–95% accuracy on the test set

Reduced false positives by 15%

Presented feature importance graphs to make results explainable for medical teams
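
The evaluation steps above can be sketched as follows. This is an illustration, not the project's code: a Random Forest stands in for the selected model (the post does not say which one won), the data is scikit-learn's bundled copy, and the numbers it prints will not match the figures reported here.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

data = load_breast_cancer(as_frame=True)
X = data.data
# The bundled copy encodes malignant as 0; flip it so "positive" means malignant.
y = (data.target == 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Assumption: a Random Forest stands in for the selected model.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predictions; ravel() yields tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"False positives: {fp} (benign flagged as malignant)")
print(f"False negatives: {fn} (malignant missed)")
print(classification_report(y_test, y_pred, target_names=["benign", "malignant"]))

# Impurity-based importances from the forest, largest first.
importances = pd.Series(model.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(10)
print(top_features)
```

Calling `top_features.sort_values().plot.barh()` in a notebook turns the ranking into the kind of feature importance chart described above.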

📊 Results
✅ High accuracy & balanced recall/precision

⚡ Reduced misdiagnosis risk

🩺 Model outputs interpretable for non-technical users

💡 What I Learned
In healthcare, false positives and false negatives carry very different risks, so the metric you optimize has to reflect that trade-off

Model explainability matters just as much as accuracy

Collaborating with medical professionals gives context that purely technical work lacks
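
To make the first lesson concrete, here is a small sketch of how moving the decision threshold trades false positives against false negatives. The logistic regression model and the three thresholds are illustrative assumptions, not the project's settings, and the data is scikit-learn's bundled copy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y_raw = load_breast_cancer(return_X_y=True)
y = (y_raw == 0).astype(int)  # relabel so that 1 = malignant

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
clf.fit(X_train, y_train)
malignant_prob = clf.predict_proba(X_test)[:, 1]  # P(malignant) per sample

# Illustrative thresholds: lower values flag more samples as malignant.
threshold_counts = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = (malignant_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    threshold_counts[threshold] = (fp, fn)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the threshold can only keep or reduce false positives, and can only keep or increase false negatives, which is why the right operating point depends on which error is more costly in the clinical setting.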

🌍 Real-World Potential
Could be deployed in diagnostic labs to support decision-making

Useful in rural healthcare centers where expert pathologists aren’t available

Could be extended to detect other diseases with similar datasets
