Day 1: Tried building my first CNN (and learned why 8 validation images are useless)

Accidentally Public: My First CNN for Pneumonia Detection

Three days ago, I made my first repo public on GitHub by accident. Panic mode: activated. But instead of hiding it, I decided to share the journey.

I’m a final-year medical student diving into the tech side of healthcare, and I just built my first Convolutional Neural Network (CNN) to detect pneumonia from chest X-rays. Spoiler: it works… but my data setup was very questionable.


What I Built

  • Goal: Classify chest X-rays as normal or pneumonia
  • Dataset: Chest X-Ray Images (Pneumonia) – 5,863 pediatric X-rays
  • Tech: Python, TensorFlow/Keras, Google Colab, LLMs to support my pathetic coding skills
  • Model: Simple CNN (~5.4M parameters, 3 conv blocks + pooling + dense head)

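Here's roughly what that "3 conv blocks + pooling + dense head" model looks like in Keras. This is a minimal sketch: the input size and filter counts are my assumptions for illustration, not the exact layers from the repo.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal sketch of a 3-conv-block CNN for binary chest X-ray classification.
# Input shape and filter counts are illustrative assumptions, not the exact repo architecture.
model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # almost all of the parameters live here
    layers.Dense(1, activation="sigmoid"),  # probability of pneumonia
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
model.summary()
```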
In short: teaching the model to tell a normal chest X-ray apart from a pneumonia one.

This was my way of bridging medicine (I see more than 50 shades of grey in an X-ray by now, though I’m not a radiologist) with my growing interest in AI.

I didn’t know what a CNN really was at first. For anyone new, this video is a helpful, very basic introduction:

Watch on YouTube: "What is a Convolutional Neural Network? (CNNs)"

For a deeper dive, here’s a great in-depth playlist provided by the channel Futurology - An Optimistic Future:

Deep Learning Playlist


Lessons That Humbled Me

1. Tiny validation set = chaos 🎢

The dataset’s “validation” split? Just 8 normal + 8 pneumonia images: 16 in total, roughly 0.27% of the data.

My metrics looked wild:

  • Epoch 1 → val_acc: 0.56, val_auc: 0.95
  • Epoch 3 → val_acc: 0.75, val_auc: 0.98

Lesson: Data splitting isn’t just about having folders—it’s about meaningful sample sizes.
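To put those swings in perspective, here's a quick back-of-the-envelope check (my arithmetic, not anything from the repo):

```python
# With a 16-image validation set, one flipped prediction moves val_accuracy
# by 1/16 = 6.25 percentage points, so a jump like 0.56 -> 0.75 between
# epochs can come down to roughly three images changing side.
n_val = 8 + 8
step = 1 / n_val
print(f"accuracy granularity: {step:.4f}")                       # 0.0625
print(f"images behind the jump: {round((0.75 - 0.56) / step)}")  # ~3
```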


2. The lazy pneumonia-predicting machine 🤖

Class imbalance:

  • Normal → 1,583
  • Pneumonia → 4,273
  • Ratio ~1:2.7

My model started predicting “pneumonia” most of the time, scoring decent accuracy but missing normals. The only purpose of the confusion matrix was to confuse me (ngl, ChatGPT had to interpret it), but it showed the truth: strong on pneumonia, weak on normals.

Accuracy ≠ whole story—precision/recall matter.
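For anyone who wants to run the same check, here's a minimal sketch using scikit-learn. The y_true / y_pred arrays below are toy placeholders, not my actual predictions:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels/predictions just to show the shape of the check;
# 0 = normal, 1 = pneumonia. Swap in your real arrays from model.predict().
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([1, 0, 1, 1, 1, 1, 1, 1])   # almost everything called "pneumonia"

print(confusion_matrix(y_true, y_pred))        # rows = truth, columns = prediction
print(classification_report(y_true, y_pred, target_names=["normal", "pneumonia"]))
```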


3. Model summaries aren’t magic 📋

The model.summary() output, with its ~5.4M parameters, intimidated me until I broke it down:

  • Conv2D → edge/pattern detectors
  • Dense layers → where parameter count explodes (~5.3M of 5.4M)
  • Each “neuron” = input → math → output
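The "parameter explosion" in the dense layers is just multiplication. Here's the rough arithmetic, assuming a feature map shaped like the one in my sketch above (illustrative numbers, the real shape may differ):

```python
# Why the dense head dominates: if the last pooling layer outputs a
# 17x17x128 feature map, flattening it and feeding a Dense(128) layer
# already costs millions of weights.
flattened = 17 * 17 * 128              # 36,992 values
dense_params = flattened * 128 + 128   # one weight per connection + one bias per neuron
print(f"{dense_params:,}")             # 4,735,104
```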

From a medical lens, it felt like tracing how biological neurons fire.


4. Data quality > fancy architecture 💎

My biggest realization: no clever CNN can fix broken training splits or imbalance. I spent more time fixing the pipeline than tweaking hyperparameters—and that was the right call.


What’s Next

  • Proper 70/15/15 train/val/test splits
  • Class weights to balance mistakes
  • Better metrics (precision/recall per class)
  • Augmentation for underrepresented normal cases
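Here's a rough sketch of how I plan to tackle the class-weight and augmentation items. None of this is in the repo yet; the counts come from the dataset above, but the weights and layer choices are assumptions:

```python
import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight

# Class weights from the training counts, so missed "normal" cases cost more.
counts = {"normal": 1583, "pneumonia": 4273}
labels = np.array([0] * counts["normal"] + [1] * counts["pneumonia"])
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=labels)
class_weight = {0: weights[0], 1: weights[1]}
print(class_weight)   # roughly {0: 1.85, 1: 0.69}

# Light augmentation, mainly to stretch the underrepresented normal class.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomTranslation(0.05, 0.05),
])

# Later, once the new 70/15/15 splits exist:
# model.fit(train_ds, validation_data=val_ds, epochs=10, class_weight=class_weight)
```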

Follow Along

The full learning process (project research and notes, confusions, breakthroughs, even “I don’t fully get this yet” moments) is on GitHub:

👉 chest-xray-cnn

Because learning isn’t just about polished results—it’s about the messy middle too.


I’ll keep sharing as I go, because this is just the start of my journey into AI + healthcare. Any input or feedback is very welcome 🚀

Acknowledgements

  • The Kaggle dataset authors for making this resource freely available
  • TensorFlow/Keras docs + tutorials that kept me afloat
  • The open-source community on GitHub
  • ChatGPT (yes, I used it heavily!) for debugging, explaining confusion matrices, emotional support, and not judging my messy code
