The universe is vast - home to billions of galaxies, each with its own unique shape and story. For over a century, astronomers have relied on human eyes to sort galaxies into categories like spiral (with elegant arms) and elliptical (smooth, featureless blobs).
In this project, I set out to build a Convolutional Neural Network (CNN) that can classify galaxies as either spiral or elliptical. What started as a curious experiment soon evolved into a full-fledged pipeline: from data preprocessing, to model training, to deploying a real-time Streamlit web app that anyone can try.
The Problem
At first glance, spirals and ellipticals might seem easy to tell apart - spirals flaunt their arms while ellipticals glow like round smudges. But astronomy datasets often contain noisy, faint, and low-resolution images where even experts can disagree.
The question:
👾 Can a machine learning model learn these subtle differences and classify galaxies automatically?
The Dataset
I worked with a labeled dataset of galaxy images resized to 256×256 pixels in RGB. The dataset included:
- Spiral galaxies 🌀
- Elliptical galaxies 🔵
The data was split into training and test sets, ensuring that the model was tested on images it had never seen before.
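A minimal sketch of what such a split could look like. It assumes a stratified split (each class is divided in the same ratio), which the post does not confirm; the 80/20 ratio, the seed, and the label list are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), splitting each class in the same ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Made-up label list standing in for the real dataset
labels = ["spiral"] * 70 + ["elliptical"] * 30
train_idx, test_idx = stratified_split(labels)
```

Because the two index lists never overlap, no test image can leak into training.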
To prepare the data:
- Images were normalized (pixel values scaled to the 0–1 range).
- Augmentation techniques (flips, rotations) were tested to make the model robust to changes in orientation.
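The two preparation steps above can be sketched with NumPy. This is an illustration, not the exact pipeline from the project: it assumes images arrive as `uint8` arrays, restricts rotations to multiples of 90 degrees to keep the code short, and uses a random batch as a stand-in for real galaxy images.

```python
import numpy as np


def normalize(images):
    """Scale uint8 pixel values (0-255) to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0


def augment(image, rng):
    """Randomly flip one H x W x C image and rotate it by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)  # horizontal flip
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)  # vertical flip
    quarter_turns = int(rng.integers(0, 4))  # 0, 1, 2, or 3
    return np.rot90(image, k=quarter_turns, axes=(0, 1))


rng = np.random.default_rng(0)
# Stand-in batch: 4 random 256x256 RGB images (not real galaxy data)
batch = rng.integers(0, 256, size=(4, 256, 256, 3), dtype=np.uint8)
scaled = normalize(batch)
augmented = np.stack([augment(img, rng) for img in scaled])
```

A galaxy has no preferred "up" direction on the sky, which is why flips and rotations are safe augmentations here: they change the pixels but not the label.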
The Neural Network
The backbone of this project was a Convolutional Neural Network (CNN).
The architecture looked like this:
- Convolution + ReLU layers: extract features like edges, curves, and textures.
- MaxPooling layers: reduce dimensionality while preserving key features.
- Dense layers: combine learned features to make a decision.
- Softmax output: produces a probability for each class (spiral or elliptical); the higher one is the prediction.
Built with TensorFlow/Keras, the model was trained for 10 epochs with a batch size of 32.
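A Keras sketch of an architecture with this shape. The layer list follows the description above, but the number of blocks, the filter counts, the dense layer size, the optimizer, and the loss are assumptions, since the post does not state them.

```python
import numpy as np
import tensorflow as tf


def build_model(input_shape=(256, 256, 3), num_classes=2):
    """Small CNN: conv/pool blocks, then dense layers and a softmax output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # low-level edges
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),  # curves, textures
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",  # assumed; not stated in the post
        loss="sparse_categorical_crossentropy",  # expects integer labels 0/1
        metrics=["accuracy"],
    )
    return model


model = build_model()

# With real data loaded as x_train / y_train (hypothetical names), training
# with the settings from the post would look like:
# model.fit(x_train, y_train, epochs=10, batch_size=32)
```

Each row of the model's output holds two probabilities that sum to 1, one per class.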
Training & Results
At first, the model struggled - it predicted spiral for almost everything. After a few tweaks (adding a validation split, balancing the classes, and tuning hyperparameters), performance improved.
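One common way to balance classes is to weight the loss so that mistakes on the rarer class cost more. The sketch below uses the usual "balanced" heuristic, `n_samples / (n_classes * class_count)`. Whether this project used class weights or resampling is not stated, and the label counts are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up imbalance: 300 spirals (label 1) vs 100 ellipticals (label 0)
labels = [1] * 300 + [0] * 100
weights = balanced_class_weights(labels)
# The rarer class (0) receives the larger weight.
```

In Keras, a dictionary like this can be passed to `model.fit` through its `class_weight` argument, and a validation set can be held out with `validation_split`.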
Final results:
- Accuracy on test set: ~XX% (replace with your actual score once stable)
- Confusion matrix: showed the model could distinguish spirals from ellipticals, though some borderline cases still tripped it up.
Here's a sneak peek at predictions:
- Galaxy 1 → Predicted: Spiral ✅
- Galaxy 2 → Predicted: Elliptical ✅
- Galaxy 3 → Predicted: Spiral ❌ (actually elliptical)
It was fascinating to see where the model got it right - and where it revealed the same challenges human astronomers face.
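To show how a confusion matrix is built, here is a small sketch that tallies just the three sample predictions above. The helper names are illustrative, and the resulting numbers describe only these three galaxies, not the model's real test score.

```python
def confusion_matrix(y_true, y_pred, classes=("spiral", "elliptical")):
    """Rows are the true class, columns are the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# The three sample galaxies from the list above
y_true = ["spiral", "elliptical", "elliptical"]
y_pred = ["spiral", "elliptical", "spiral"]

matrix = confusion_matrix(y_true, y_pred)
print(matrix)            # [[1, 0], [1, 1]]
print(accuracy(matrix))  # 2 of 3 correct
```

The off-diagonal entry in the second row is Galaxy 3: a true elliptical predicted as spiral. A model that leans toward "spiral" piles up counts in that cell.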
Key Learnings
- Balanced datasets matter - my model initially leaned toward "spiral" because of class imbalance.
- Validation is critical - without a proper split, the model seemed perfect during training but failed miserably on new data.
- Deployment completes the story - building the app made the project tangible, not just code hidden in notebooks.