Pneumonia Detection

Overview

After some time working in data science, I decided to dive into its biomedical side. Machine learning is playing a growing role in the analytics behind medical devices, and it is opening up many opportunities in that field.

In this hypothetical scenario, I developed a deep learning model that classifies X-ray images as Pneumonia or Normal with 88% accuracy on the test set. This post walks through the work that went into building that model.

Data Understanding

The X-ray images used in this git repo are of pediatric patients. The task is binary classification: does the patient have Pneumonia or not. The dataset comes from Kermany et al. on Mendeley. The copy on Kaggle is taken from the original source, using Version 2, which was published on January 01, 2018. The validation folder consisted of only 16 images. In my opinion, that is too few to tell whether the deep learning model is overfitting. Instead, I built a custom validation folder by moving the 16 images in the val folder back into the train folder and then randomly selecting 20% of the train set from each category.
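The re-split described above could be sketched roughly as follows. This is a minimal sketch, not the code from the repository: the function name `rebuild_validation_split`, the class folder names `NORMAL` and `PNEUMONIA`, the `train`/`val` layout, and the fixed seed are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def rebuild_validation_split(data_dir, classes=("NORMAL", "PNEUMONIA"),
                             val_fraction=0.2, seed=42):
    """Merge val/ back into train/, then move a random fraction of each class to val/.

    Assumes a layout of data_dir/train/<class>/ and data_dir/val/<class>/
    (hypothetical folder names). Returns the number of images moved per class.
    """
    data_dir = Path(data_dir)
    rng = random.Random(seed)
    moved = {}
    for cls in classes:
        train_cls = data_dir / "train" / cls
        val_cls = data_dir / "val" / cls
        val_cls.mkdir(parents=True, exist_ok=True)

        # Step 1: return the original validation images to the training pool.
        for img in sorted(val_cls.iterdir()):
            shutil.move(str(img), str(train_cls / img.name))

        # Step 2: sample this class on its own so both categories
        # contribute the same fraction to the new validation set.
        images = sorted(p for p in train_cls.iterdir() if p.is_file())
        n_val = int(len(images) * val_fraction)
        for img in rng.sample(images, n_val):
            shutil.move(str(img), str(val_cls / img.name))
        moved[cls] = n_val
    return moved
```

Sampling each category separately keeps the Normal-to-Pneumonia ratio of the validation set close to that of the training set, which is the point of taking 20% "of each category" rather than 20% of the pooled images.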

Best Model

Base Convolutional Model

The model architecture was simple: one Convolutional Layer with 16 filters, one hidden Dense Layer with 256 neurons, and one output layer consisting of 1 neuron for binary classification. The model did well in training and appeared to generalize, as seen in the Epoch vs Loss graph. On unseen data, however, it classifies Pneumonia images very well but struggles with Normal images. The metrics show this: Precision and Recall were both high on the training and validation sets, but on the test set only Recall stayed high at about 99%, while Precision dropped to about 71%. In other words, many Normal images were labeled as Pneumonia. This is due to the small number of Normal images in the training set compared to Pneumonia images.
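For readers who want to picture the architecture, here is a minimal Keras sketch of a model with that shape. Only the layer counts and sizes (16 filters, 256 neurons, 1 output neuron) come from the description above. The input size, kernel size, pooling layer, activations, optimizer, and the function name `build_base_model` are assumptions, so the real model in the repository may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_base_model(input_shape=(150, 150, 1)):
    """One Conv2D layer (16 filters), one Dense layer (256 units), one sigmoid output."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),               # image size is an assumption
        layers.Conv2D(16, (3, 3), activation="relu"),  # kernel size is an assumption
        layers.MaxPooling2D((2, 2)),                   # pooling is an assumption
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),         # probability of the positive class
    ])
    model.compile(
        optimizer="adam",                              # optimizer is an assumption
        loss="binary_crossentropy",
        metrics=[
            "accuracy",
            tf.keras.metrics.Precision(name="precision"),
            tf.keras.metrics.Recall(name="recall"),
        ],
    )
    return model
```

A single sigmoid output with binary cross-entropy is the usual pairing for a two-class problem like this one, and tracking Precision and Recall alongside Accuracy is what makes the imbalance between the classes visible.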

Set	Loss	Precision	Recall	Accuracy
Train	0.020	100.00%	100.00%	100.00%
Test	1.336	71.85%	99.84%	75.32%
Validation	0.097	98.56%	97.68%	97.22%
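To see how Recall can stay near 99% while Precision falls, it helps to write the three metrics out from a confusion matrix. The sketch below treats Pneumonia as the positive class, which is my reading of the results above. The counts are hypothetical and chosen only to reproduce the pattern; they are not the actual test-set counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of everything flagged Pneumonia, how much really was
    recall = tp / (tp + fn)      # of all real Pneumonia cases, how many were flagged
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical counts: almost every Pneumonia case is caught (fn is tiny),
# but many Normal images are also flagged as Pneumonia (fp is large).
precision, recall, accuracy = binary_metrics(tp=99, fp=40, fn=1, tn=60)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
```

A model that leans toward predicting Pneumonia rarely misses a true case, so Recall stays high, but every Normal image it mislabels counts as a false positive and pulls Precision and Accuracy down.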

Augmentation Model

To combat the low availability of Normal images, I used data augmentation to create synthetic examples of Normal images for the model to train on. Starting from the pre-trained model, I introduced data augmentation generators whose outputs act as brand new images to learn from. Data augmentation resulted in better performance on unseen data. The training metrics on the augmented data show that the model began to do better on Normal images, as seen in the high Precision scores on the train and validation sets. The test set used a data generator that supplied unaltered images, so that the reported numbers reflect performance on the actual test set.

Set	Loss	Precision	Recall	Accuracy
Train	0.228	98.64%	88.79%	90.78%
Test	0.360	88.43%	94.10%	88.62%
Validation	0.217	98.85%	89.04%	91.10%
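The idea behind the augmentation generators can be illustrated with a small NumPy sketch. The post does not list which transformations were used, so the random flip and shift below are assumptions, and plain NumPy stands in here for whatever generator utility the project relies on. The names `augment_image` and `augmented_batches` are hypothetical, and this version augments every image in the batch, not only Normal ones.

```python
import numpy as np


def augment_image(image, rng, max_shift=10):
    """Return a randomly flipped and shifted copy of a 2-D grayscale image."""
    out = image.copy()
    # Random horizontal flip (assumed transformation).
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random translation; pixels shifted out of frame are dropped and
    # the vacated area is filled with zeros.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = out.shape
    src = out[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted = np.zeros_like(out)
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted


def augmented_batches(images, labels, batch_size, rng, augment=True):
    """Yield (images, labels) batches forever.

    With augment=False the images pass through unaltered and in order,
    which mirrors how the test set is fed to the model.
    """
    n = len(images)
    while True:
        order = rng.permutation(n) if augment else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = [augment_image(images[i], rng) if augment else images[i]
                     for i in idx]
            yield np.stack(batch), labels[idx]
```

Because a new random transformation is drawn every time an image is served, the model rarely sees exactly the same picture twice during training, while evaluation with `augment=False` still measures performance on the real, untouched images.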

Conclusion

The best model was the augmented model. On the test set it raised Accuracy from 75.32% to 88.62% and Precision from 71.85% to 88.43%, at the cost of some Recall (99.84% down to 94.10%). It is slightly more complex than the base version, but learning from augmented data led to better performance on Normal images. Please check out my entire project in the GitHub repository, where I try many more iterations such as Transfer Learning!
