Ziad Alezzi

Star Multi-Class Classification Neural Network With Pytorch

Introduction:

Whenever I hear stories about stars far, far away from Earth, and I'm talking millions of light years away, far too distant to be seen in any detail through a telescope, I always wonder:

"How on earth (pun intended) do they find out what typa star it is from this far away?!"


To satisfy this curiosity, I did a little Google search and found out that every star has its own life cycle, ranging from a few million to trillions of years, and that its properties change as it ages. By measuring those properties, we can deduce what type of star it is.

But I'm not satisfied until I do it myself. So in this notebook I'll be writing a model that classifies a star's type based on its features. (It will also serve as good practice for PyTorch.)

Dataset

The dataset consists of 6 feature columns (plus a label):

  • The Temperature of the star
  • The Luminosity of the star
  • Its Radius
  • Its Absolute Magnitude
  • Its General Color of Spectrum
  • The Spectral Class

Everything should be obvious, however the Spectral Class might be new to you.

A spectral class is assigned to stars based on their spectrum, essentially the characteristics of the light they emit, which is closely tied to their surface temperature.

import os
from pathlib import Path

# Use the Kaggle input path when running as a Kaggle kernel, otherwise a local data folder
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')
if iskaggle:
    path = Path('../input/star-type-classification')
else:
    path = Path('data')

Here's a quick look at the data:

import pandas as pd

data = pd.read_csv(path/'Stars.csv')
data

output:

temperature luminosity radius magnitude color class label
0 3068 0.002400 0.1700 16.12 Red M 0
1 3042 0.000500 0.1542 16.60 Red M 0
2 2600 0.000300 0.1020 18.70 Red M 0
3 2800 0.000200 0.1600 16.65 Red M 0
4 1939 0.000138 0.1030 20.06 Red M 0
... ... ... ... ... ... ... ...
235 38940 374830.000000 1356.0000 -9.93 Blue O 5
236 30839 834042.000000 1194.0000 -10.63 Blue O 5
237 8829 537493.000000 1423.0000 -10.73 White A 5
238 9235 404940.000000 1112.0000 -11.23 White A 5
239 37882 294903.000000 1783.0000 -7.80 Blue O 5

240 rows × 7 columns

Sampling the dataset is cool and all, but we wanna see some more important information, like summary statistics:

data.describe()

output:

temperature luminosity radius magnitude label
count 240.000000 240.000000 240.000000 240.000000 240.000000
mean 10497.462500 107188.361635 237.157781 4.382396 2.500000
std 9552.425037 179432.244940 517.155763 10.532512 1.711394
min 1939.000000 0.000080 0.008400 -11.920000 0.000000
25% 3344.250000 0.000865 0.102750 -6.232500 1.000000
50% 5776.000000 0.070500 0.762500 8.313000 2.500000
75% 15055.500000 198050.000000 42.750000 13.697500 4.000000
max 40000.000000 849420.000000 1948.500000 20.060000 5.000000

Quick Overview

Some values differ by a lot, and these extremes are not great for writing a model that can generalize to new data.

The smallest value for luminosity is as low as 0.00008 (that's 4 zeros!!), whereas the largest gets up to ~850,000.

Now comes a dilemma:

"Should I normalize the data, or preserve their differences?"


The best way to know is to check whether these overly small/huge values really are outliers that need normalizing.

Almost half of the dataset has its luminosity in the decimals, while much of the rest is in the hundreds of thousands!

Alright, we'll have to normalize this. I'll use the Z-score, which is very simple: you subtract the column's mean from the original value, and divide that by the standard deviation.

z = (x - μ) / σ

import torch
import matplotlib.pyplot as plt

def z_score(x): return (x - torch.mean(x)) / torch.std(x)

normalized_luminosity = z_score(torch.tensor(data['luminosity'].to_numpy()))

plt.subplot(2, 1, 1)
plt.scatter(data['luminosity'].to_numpy(), range(240))
plt.title('Before')
plt.subplot(2, 1, 2)
plt.scatter(normalized_luminosity.numpy(), range(240))
plt.title('After')
plt.tight_layout()
plt.show()

From hundreds of thousands down to single digits, while still preserving the data's structure. Perfect!


Preparation

Now I'll convert the dataframe into tensors and prepare them to be chucked into a neural network.

This dataset is already very well put together. There are no null values; however, there is a need for dummy columns.

So let's make this quick, and get it over with!

First up is defining the dependent and independent variables as tensors.

import numpy as np

t_dep = torch.tensor(data['label'])
t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[22], line 4
      1 import numpy as np
      3 t_dep = torch.tensor(data['label'])
----> 4 t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\generic.py:6643, in NDFrame.astype(self, dtype, copy, errors)
   6637     results = [
   6638         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6639     ]
   6641 else:
   6642     # else, only a single dtype is given
-> 6643     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6644     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6645     return res.__finalize__(self, method="astype")


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
    427 elif using_copy_on_write():
    428     copy = False
--> 430 return self.apply(
    431     "astype",
    432     dtype=dtype,
    433     copy=copy,
    434     errors=errors,
    435     using_cow=using_copy_on_write(),
    436 )


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    755         raise ValueError("Can not squeeze with more than one column.")
    756     values = values[0, :]  # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    760 new_values = maybe_coerce_values(new_values)
    762 refs = None


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
    179     values = values.astype(dtype, copy=copy)
    181 else:
--> 182     values = _astype_nansafe(values, dtype, copy=copy)
    184 # in pandas we don't store numpy str dtypes, so convert to object
    185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):


File c:\Users\user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
    129     raise ValueError(msg)
    131 if copy or arr.dtype == object or dtype == object:
    132     # Explicit copy, or required since NumPy can't view from / to object.
--> 133     return arr.astype(dtype, copy=True)
    135 return arr.astype(dtype, copy=copy)


ValueError: could not convert string to float: 'Red'

Oh golly jargon!! Damn, what went wrong?!
Let's see here..

ValueError: could not convert string to float: 'Red'

We tried to convert a word into a number!

Ah.. almost forgot math needs numbers..

Dummy Columns

data['color'].unique()

output:

array(['Red', 'Blue White', 'White', 'Yellowish White', 'Blue white',
       'Pale yellow orange', 'Blue', 'Blue-white', 'Whitish',
       'yellow-white', 'Orange', 'White-Yellow', 'white', 'yellowish',
       'Yellowish', 'Orange-Red', 'Blue-White'], dtype=object)

As we can see, the color column has 17 unique values, several of which are really the same color written with different capitalization or hyphenation.

However, since we can only accept numbers, an easy fix is using Dummy Columns

I'll give you an example: say we have data for a group of people, and one of the columns is "Gender", where each person has the value "Female" or "Male". To turn these into numbers, we create two new columns:

  • "Female?"
  • "Male?"

Say you have Bob, a professional male specializing in being a man. His "Female?" column would be "0" for false, and his "Male?" column would be "1" for true.

Boom!! Problem solved.
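Here's a tiny sketch of that exact idea with pandas, using a hypothetical two-person table (not our star data):

import pandas as pd

# Hypothetical toy table, just to illustrate dummy columns (not our star data)
people = pd.DataFrame({'name': ['Bob', 'Alice'], 'gender': ['Male', 'Female']})
dummies = pd.get_dummies(people, columns=['gender'])
print(dummies)  # Bob gets gender_Female = False, gender_Male = True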

So to apply this here, we'll make a column for every color a star can be (and do the same for the spectral class, since it's also text).

data = pd.get_dummies(data, columns=['color', 'class'])
data[["color_Red", "color_Blue White", "color_White", "color_Yellowish White", "color_Blue white", "color_Pale yellow orange", "color_Blue", "color_Blue-white", "color_Whitish", "color_yellow-white", "color_Orange", "color_White-Yellow", "color_white", "color_yellowish", "color_Yellowish", "color_Orange-Red", "color_Blue-White"]].sample(5)

output:

color_Red color_Blue White color_White color_Yellowish White color_Blue white color_Pale yellow orange color_Blue color_Blue-white color_Whitish color_yellow-white color_Orange color_White-Yellow color_white color_yellowish color_Yellowish color_Orange-Red color_Blue-White
202 False False False False False False True False False False False False False False False False False
56 True False False False False False False False False False False False False False False False False
103 False False False False False False True False False False False False False False False False False
176 False False False False False False True False False False False False False False False False False
8 True False False False False False False False False False False False False False False False False

Phew, let's try this again now.

import numpy as np

t_dep = torch.tensor(data['label'].to_numpy())
t_indep = torch.tensor(data.drop(columns=['label']).astype(np.float32).values, dtype=torch.float)
t_indep = (t_indep - t_indep.mean()) / t_indep.std()  # z-score normalize using the global mean and std of the tensor

No errors! Eureka!

Defining the Neural Network

For the math-focused nerds, you can imagine a neural network simply as one big ol' composite function: multiple layers, each containing multiple units, with each layer taking a matrix as input and multiplying it by a matrix of coefficients.

The main important thing to remember about the coefficients is their matrix dimensions. Say a matrix of coefficients has shape [x, y] (see the quick shape check after the list):

  • x: Number of units in the previous layer
  • y: Number of units in the next layer
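A quick shape check (with made-up sizes, not the real feature count) shows why the dimensions have to line up this way:

import torch

# Hypothetical sizes: 240 rows with 23 features, two layers of coefficients
x  = torch.randn(240, 23)   # [rows, features]
w1 = torch.randn(23, 10)    # [units in previous layer, units in next layer]
w2 = torch.randn(10, 6)     # [10 hidden units, 6 output classes]
print((x @ w1 @ w2).shape)  # torch.Size([240, 6])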

The first function we must define is one to initialize the coeffs.
We'll set the number of hidden layers and the number of units in each, then loop over the layers to create randomized matrices of parameters and constants (the biases) of the correct shape.

Quick note:

The values in the matrices of parameters MUST be randomized. During backpropagation, initializing all parameters to 0 (or any other single constant) gives every unit in a layer an identical gradient, so the units never differentiate and training is effectively cancelled out.
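Here's a minimal sketch of that problem. I use a constant 0.5 instead of exactly 0 (zeros plus ReLU would just make every gradient zero outright), but the effect is the same: every unit in the layer gets the exact same gradient.

import torch

# Toy network with every weight set to the same constant
x  = torch.randn(8, 4)
w1 = torch.full((4, 3), 0.5, requires_grad=True)
w2 = torch.full((3, 2), 0.5, requires_grad=True)
loss = (torch.relu(x @ w1) @ w2).sum()
loss.backward()
print(w1.grad)  # all three columns are identical, so the three hidden units
                # receive identical updates and never learn different features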

Initializing the coeffs:

n_coeffs = t_indep.shape[1]

def init_coeffs():
    hiddens = [10, 10]  # number of units in each hidden layer
    sizes = [n_coeffs] + hiddens + [6]  # input features -> hidden layers -> 6 output classes
    n = len(sizes)
    layers = [(torch.randn(sizes[i], sizes[i+1])) * 0.1 for i in range(n-1)]
    consts = [torch.randn(1, sizes[i+1]) for i in range(n-1)]
    for layer in layers + consts: layer.requires_grad_()
    return layers, consts

If this were NumPy, I would be forced to calculate the derivatives for all the parameters in the neural network myself. Yes, I'd automate it in a loop, but since I only have an intuitive understanding of derivatives, I'd never fully understand what was going on.

Luckily for us, PyTorch calculates the derivatives automatically, as long as we define the loss function and call backward() on it. While initializing the coeffs, simply adding the line layer.requires_grad_() told PyTorch to start tracking that tensor so it can later calculate its gradients.
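Here's the whole idea in miniature (a toy example, nothing to do with our star model):

import torch

x = torch.tensor(3.0, requires_grad=True)  # tell PyTorch to track this tensor
y = x ** 2                                 # build any computation from it
y.backward()                               # backpropagate
print(x.grad)                              # tensor(6.), i.e. d(x^2)/dx at x = 3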

import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    layers, consts = coeffs
    n = len(layers)
    y_pred = indeps

    for i, layer in enumerate(layers):
        y_pred = y_pred @ layer + consts[i]
        if i != n-1: y_pred = F.relu(y_pred)  # ReLU on every layer except the last
    preds = F.softmax(y_pred, dim=1)  # probabilities over the 6 classes
    logits = y_pred                   # raw scores, kept for the loss function
    return preds, logits

So what exactly did we do here? To be honest, not much.

We simply iterated over each layer (containing the randomized coeffs) and matrix-multiplied the input by the coeffs, with the output of one layer becoming the input to the next. Each time (except the last), we apply the ReLU (Rectified Linear Unit) to the output.

The ReLU is simply a linear function that's cut off at zero: any number less than 0 is turned into 0.
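You can see it clamp the negatives in one line:

import torch
import torch.nn.functional as F

t = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(F.relu(t))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])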

The ReLU and Tanh activation functions are the most common choices for hidden layers in a neural network. I picked ReLU for this example because it's the one I've used the most. We then finish it off with a sigmoid activation to get our binary prediction.

Ah ah ah!! Stop right there. If we were doing binary classification, we'd use a sigmoid function. But since we have multiple outputs (multiple types of stars), this is a multi-class classification problem! We must use softmax.
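A quick sketch with made-up logits for one star shows the difference: softmax turns the six raw scores into six probabilities that sum to 1, one per star type.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1, -1.0, 0.5, 0.0]])  # made-up scores for 6 classes
probs = F.softmax(logits, dim=1)
print(probs.sum())           # sums to (approximately) 1
print(probs.argmax(dim=1))   # tensor([0]), the predicted star type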

Updating The Parameters

PyTorch automatically tracks gradients, so subtracting them from the parameters is as easy as calling layer.sub_(layer.grad * lr) :D

def update_coeffs(coeffs, lr):
    layers, consts = coeffs
    for layer in layers + consts:
        if layer.grad is not None:
            layer.sub_(layer.grad * lr)  # gradient descent step
            layer.grad.zero_()           # reset gradients for the next epoch
    return layers, consts

Crucial Step Needed

We've talked about PyTorch automatically tracking the gradients... but it can't do any of that without the loss function!

So, let's refresh our memory:

This is a multiclass classification model that uses a softmax output


Thus, an appropriate loss function would be Categorical Cross-Entropy (CCE). However, CCE expects one-hot encoded labels, and frankly, I feel too lazy to add one-hot encoding. So instead, we'll use Sparse CCE, which works with plain integer labels.
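Conveniently, PyTorch's torch.nn.CrossEntropyLoss already works this way: it takes raw logits plus plain integer class labels (no one-hot needed), which is exactly how the training code below uses it. A tiny sketch with made-up values:

import torch

logits = torch.randn(4, 6)            # 4 samples, 6 star classes (random scores)
labels = torch.tensor([0, 5, 2, 3])   # integer labels, no one-hot encoding
loss = torch.nn.CrossEntropyLoss()(logits, labels)
print(loss)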

Sparse Categorical Cross Entropy

Here's something about me, I HAAAATE jargon.

The loss function's name is "Sparse Categorical Cross-Entropy" and I think that's the stupidest thing ever.

All this is, is fancy worded jargon meant to boost the egos of those who use it.

The downside is that this deters so many people from machine learning because of how complicated it sounds.

In reality, "Sparse Categorical Cross-Entropy" is defined as:

L = -log(P(y))

where P(y) is simply the probability the model assigned to the true class y.

wow. how complicated.
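To see it really is that simple, here's the formula checked by hand against PyTorch with toy numbers: if the model gives the true class a probability of 0.7, the loss for that sample is just -log(0.7) ≈ 0.357.

import torch

probs = torch.tensor([0.7, 0.2, 0.1])              # made-up predicted probabilities
print(-torch.log(probs[0]))                        # tensor(0.3567)

logits = torch.log(probs).unsqueeze(0)             # logits whose softmax equals probs
label = torch.tensor([0])                          # the true class
print(torch.nn.CrossEntropyLoss()(logits, label))  # tensor(0.3567), same number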

Remainder of the code used:

def accuracy(coeffs, t_dep, t_indep):
    preds, logits = calc_preds(coeffs, t_indep)
    predicted_classes = torch.argmax(preds, dim=1)  # pick the class with the highest probability
    correct = (predicted_classes == t_dep).float()
    return correct.mean().item()

def one_epoch(t_dep, t_indep, coeffs, lr):
    preds, logits = calc_preds(coeffs, t_indep)
    loss = torch.nn.CrossEntropyLoss()(logits, t_dep)  # expects raw logits + integer labels
    loss.backward()
    with torch.no_grad(): return update_coeffs(coeffs, lr)

def train_model(t_dep, t_indep, epochs=300000, lr=0.00055):
    torch.manual_seed(777)
    loss_arr, acc_arr = [], []  # create fresh lists here to avoid mutable default arguments
    coeffs = init_coeffs()
    for i in range(epochs):
        coeffs = one_epoch(t_dep, t_indep, coeffs, lr)

        if i % 1000 == 0:
            preds, logits = calc_preds(coeffs, t_indep)
            loss = torch.nn.CrossEntropyLoss()(logits, t_dep)
            acc = accuracy(coeffs, t_dep, t_indep) * 100
            print(f"Iteration: {i:03d} | Loss: {loss:.4f} | Accuracy: {acc:.2f}%")
            loss_arr.append(loss.item())
            acc_arr.append(acc)

    return coeffs, loss_arr, acc_arr

coeffs, loss_arr, acc_arr = train_model(t_dep, t_indep)

output (only the final printout shown):

Iteration: 299000 | Loss: 0.6482 | Accuracy: 65.83%

Showing the results:

import matplotlib.pyplot as plt

plt.plot(range(300), loss_arr)
plt.xlabel("Iterations (in thousands)")
plt.ylabel("Loss")
plt.title("Variation of Loss During Training")
plt.show()

Output:


plt.plot(range(300), acc_arr)
plt.xlabel("Iterations (in thousands)")
plt.ylabel("Accuracy")
plt.title("Variation of Accuracy During Training")
plt.show()

Output:

This Was PyTorch

lucirie (Ziad Alezzi) · GitHub