Muskan Aman

From Stardust to Code: Building a Neural Network to Classify Galaxies

The universe is vast - home to billions of galaxies, each with its own unique shape and story. For over a century, astronomers have relied on human eyes to sort galaxies into categories like spiral (with elegant arms) and elliptical (smooth, featureless blobs).

In this project, I set out to build a Convolutional Neural Network (CNN) that can classify galaxies as either spiral or elliptical. What started as a curious experiment soon evolved into a full-fledged pipeline: from data preprocessing, to model training, to deploying a real-time Streamlit web app that anyone can try.

The Problem
At first glance, spirals and ellipticals might seem easy to tell apart - spirals flaunt their arms while ellipticals glow like round smudges. But astronomy datasets often contain noisy, faint, and low-resolution images where even experts can disagree.
The question:

👾 Can a machine learning model learn to capture these subtle differences and classify galaxies automatically?

The Dataset
I worked with a labeled dataset of galaxy images resized to 256×256 pixels in RGB. The dataset included:

  • Spiral galaxies 🌀
  • Elliptical galaxies 🔵

The data was split into training and test sets, ensuring that the model was tested on images it had never seen before.
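
To make the split concrete, here is a minimal sketch of a stratified train/test split in plain Python. The 80/20 ratio, the fixed seed, the class counts, and the helper name stratified_split are all assumptions for illustration; the post does not say how the split was actually done.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split sample indices into train/test, keeping class proportions similar.

    test_fraction and seed are illustrative defaults, not values from the post.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical label list: 80 spirals and 20 ellipticals.
labels = ["spiral"] * 80 + ["elliptical"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

Splitting per class keeps the spiral-to-elliptical ratio roughly the same in both sets, which matters when one class is much larger than the other.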

To prepare the data:

  • Images were normalized (pixel values scaled between 0–1).
  • Augmentation techniques (flips, rotations) were tested to make the model robust to variations.
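
The two preparation steps above could look roughly like this in NumPy. This is only a sketch: the random array stands in for a real galaxy image, the helper names normalize and augment are made up here, and limiting rotations to multiples of 90 degrees is an assumption (the post only says flips and rotations were tested).

```python
import numpy as np

def normalize(image_uint8):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return image_uint8.astype(np.float32) / 255.0

def augment(image, rng):
    """Return a randomly flipped and rotated copy of an H x W x C image.

    Rotations are restricted to multiples of 90 degrees (an assumption),
    so no interpolation is needed and the pixel values are unchanged.
    """
    if rng.random() < 0.5:
        image = np.fliplr(image)   # horizontal flip
    if rng.random() < 0.5:
        image = np.flipud(image)   # vertical flip
    k = int(rng.integers(0, 4))    # 0, 1, 2, or 3 quarter turns
    return np.rot90(image, k=k, axes=(0, 1))

rng = np.random.default_rng(seed=0)
# Stand-in for a real 256x256 RGB galaxy image.
raw = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
img = normalize(raw)
aug = augment(img, rng)
print(img.dtype, img.min() >= 0.0, img.max() <= 1.0, aug.shape)
```

Galaxies have no preferred orientation on the sky, which is why flips and rotations are a natural augmentation choice for this kind of data.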

The Neural Network
The backbone of this project was a Convolutional Neural Network (CNN).
The architecture looked like this:

  • Convolution + ReLU layers: extract features like edges, curves, and textures.
  • MaxPooling layers: reduce dimensionality while preserving key features.
  • Dense layers: combine learned features to make a decision.
  • Softmax output: produces a probability for each class (spiral or elliptical); the higher one is the prediction.

Built with TensorFlow/Keras, the model was trained for 10 epochs with a batch size of 32.
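
A Keras model with the layer types listed above might be sketched as follows. The number of blocks, the filter counts (16/32/64), the Dense size, the Adam optimizer, and the integer-label loss are assumptions chosen for illustration, not the exact architecture used in this project; build_galaxy_cnn is a hypothetical helper name.

```python
import tensorflow as tf

def build_galaxy_cnn(input_shape=(256, 256, 3), num_classes=2):
    """Small CNN: Conv+ReLU -> MaxPooling (x3) -> Dense -> Softmax.

    Layer sizes are illustrative guesses, not the author's exact values.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        # Convolution + ReLU blocks extract edges, curves, and textures.
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),   # 256 -> 128
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),   # 128 -> 64
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),   # 64 -> 32
        # Dense layers combine the learned features into a decision.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        # Softmax gives one probability per class (spiral, elliptical).
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",                        # assumed optimizer
        loss="sparse_categorical_crossentropy",  # assumes integer labels 0/1
        metrics=["accuracy"],
    )
    return model

model = build_galaxy_cnn()
model.summary()

# Training with the settings mentioned in the post (x_train and y_train are
# hypothetical arrays; the 0.2 validation fraction is an assumption):
# model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```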

Training & Results
At first, the model struggled - often predicting spiral for almost everything. With a few tweaks (adding a validation split, balancing the classes, and tuning hyperparameters), performance improved.
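
One common way to balance classes is to weight the loss so that mistakes on the rarer class count more. The sketch below computes such weights in plain Python; it is not necessarily how balancing was done here, and the 300/100 class counts and the helper name balanced_class_weights are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Hypothetical imbalance: 300 spirals (label 0) vs 100 ellipticals (label 1).
labels = [0] * 300 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)  # class 0 gets about 0.667, class 1 gets 2.0
```

In Keras, a dictionary like this can be passed to model.fit through its class_weight argument, so each elliptical example contributes more to the loss than each spiral example.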

Final results:

  • Accuracy on test set: ~XX% (placeholder; the final score is still to be filled in once results are stable)
  • Confusion matrix: showed the model could distinguish spirals from ellipticals, though some borderline cases still tripped it up.

Here's a sneak peek at predictions:
Galaxy 1 → Predicted: Spiral ✅
Galaxy 2 → Predicted: Elliptical ✅
Galaxy 3 → Predicted: Spiral ❌ (actually elliptical)
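
To show how a confusion matrix is built, the three sample predictions above can be tallied in a few lines of Python. This is a toy illustration on those three examples only, not the real test-set results; the label encoding (0 = spiral, 1 = elliptical) and the helper names are assumptions.

```python
def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Assumed encoding: 0 = spiral, 1 = elliptical.
y_true = [0, 1, 1]   # Galaxy 3 is actually elliptical
y_pred = [0, 1, 0]   # ...but was predicted as spiral
cm = confusion_matrix(y_true, y_pred)
print(cm)            # [[1, 0], [1, 1]]
print(accuracy(cm))  # 2 of 3 correct
```

The off-diagonal entry in the second row is the elliptical galaxy mistaken for a spiral, the same kind of error the model made most often before the classes were balanced.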

It was fascinating to see where the model got it right - and where it revealed the same challenges human astronomers face.

Key Learnings

  • Balanced datasets matter - my model initially leaned toward "spiral" because of class imbalance.
  • Validation is critical - without a proper split, the model seemed perfect during training but failed miserably on new data.
  • Deployment completes the story - building the app made the project tangible, not just code hidden in notebooks.
