DEV Community

David Regordosa
Autoencoders to add exposure to Galaxy Images

Hello everybody! :)

To understand this post, it is important to understand what autoencoders are. I have been working with autoencoders for a long time, and I'm absolutely in love with the properties they have.

An autoencoder is just a neural network with a specific topology, used to learn data encodings in an unsupervised manner. An autoencoder can learn a set of data, producing a dimensionality reduction, while the network is trained to ignore noise.

In other words, reducing the autoencoder to the minimum possible configuration, the autoencoder has an input layer, a hidden layer called the "bottleneck" (because it's smaller than the input one), and an output layer with the same size as the input one (check the image below. Credits:

[Image: autoencoder topology with input, bottleneck, and output layers]

Now let's imagine that we train this autoencoder to reproduce its input. We can feed the neural network with galaxy images and train it to produce, as output, a reconstruction of the input. Note: the dataset used comes from

[Image: galaxy images from the dataset]

At this point we'll have a neural network that is able to reconstruct a galaxy image from an input galaxy image... maybe not so spectacular, but it has some interesting features.

If we split our autoencoder into two parts, the encoder and the decoder, we get the following features.
With the encoder we'll be able to generate a dimensionality reduction of each galaxy. Note that the bottleneck layer is also known as the latent space.

[Image: encoder projecting a galaxy image into the latent space]

And the other part of the autoencoder, the decoder, can be used to choose a point in our latent space and reconstruct the corresponding galaxy.

[Image: decoder reconstructing a galaxy from a point in the latent space]
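The encoder/decoder split can be illustrated with a toy numpy sketch (random, untrained weights; the 106x106 image size and 1000-dimensional latent space match the numbers used later in this post, but the code itself is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
img = rng.random((106, 106))  # stand-in for a galaxy image

# Toy linear encoder: flatten the image and project it into the latent space
W_enc = rng.standard_normal((106 * 106, 1000)) * 0.01
code = img.flatten() @ W_enc  # 1000-dimensional latent vector

# Toy linear decoder: map a latent-space point back to image shape
W_dec = rng.standard_normal((1000, 106 * 106)) * 0.01
reconstruction = (code @ W_dec).reshape(106, 106)

print(code.shape, reconstruction.shape)  # (1000,) (106, 106)
```

A real encoder and decoder learn these projections from data, but the shapes flowing through them are exactly these.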

Ok, now that we understand what an autoencoder can do for us, let's try something different.
Imagine that we create a dataset of galaxy images with some noise added, and train the network to generate the same galaxies without noise.
That is, we train the autoencoder with the noisy data as the inputs and the clean data as the outputs.
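Creating such a noisy dataset can be as simple as adding Gaussian noise and clipping back to the valid pixel range (a sketch; the `add_noise` helper and the 0.1 noise level are my own illustrative choices, not from the original code):

```python
import numpy as np

def add_noise(images, sigma=0.1, seed=0):
    # Add Gaussian noise and clip back to the [0, 1] pixel range
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.random.default_rng(1).random((8, 106, 106))  # fake batch of images
noisy = add_noise(clean)
print(noisy.shape)  # (8, 106, 106)
```

The noisy batch becomes the autoencoder input and the clean batch becomes the training target.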
Some examples I did:

[Images: noisy galaxy inputs and denoised reconstructions]

Note that these are test images, separated from the training set in order to test the autoencoder once training was finished. So the autoencoder had never seen those images, and it was still able to reproduce a version without noise. Nice.

Now, let's think about another use of the autoencoder.
Some amateur astronomers (I'm one of them) have small telescopes which give very faint galaxy images. To generate good galaxy images with an amateur telescope you need a good CCD, long exposure times, and a very good telescope calibration (polar alignment). In some cases, it is very difficult to take long exposure images.

So, why not train our autoencoder with galaxy images manipulated to have low exposure as inputs, and the original galaxy images as outputs?
We are going to train the autoencoder with more than 61k galaxy images (106x106 pixels each), and the autoencoder will learn how to generate a "normal" galaxy image from a low exposure one.
[Image: low exposure inputs and reconstructed galaxy images]

The result is not perfect, but it looks nice.

  • Original: the original image from the dataset.
  • Low exposure: the image modified to force a low exposure version; this is the input of the autoencoder.
  • Reconstructed: the resulting autoencoder output.

[Image: original, low exposure, and reconstructed galaxy images]

Note that the point here is to get a reconstructed image as similar as possible to the original one. Also keep in mind that these are galaxy images never seen by our autoencoder, so the trick is that when the autoencoder receives a low exposure image of a galaxy it has never seen, it is able to reproduce a galaxy image without the low exposure.
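To make "as similar as possible" concrete, a simple way to score a reconstruction against the original is mean squared error or PSNR (a sketch of standard image metrics, not something used in the original post; the example arrays are synthetic):

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two images
    return np.mean((a - b) ** 2)

def psnr(a, b, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher means the images are closer
    return 10 * np.log10(max_val ** 2 / mse(a, b))

rng = np.random.default_rng(0)
original = rng.random((106, 106))
reconstructed = np.clip(original + rng.normal(0, 0.05, original.shape), 0, 1)

print(mse(original, reconstructed), psnr(original, reconstructed))
```

Comparing these scores between the low exposure input and the reconstruction shows how much closer the network gets to the original.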

And lastly, with Keras, the definition of this autoencoder is pretty easy.
First, we read the dataset and split it into a training set and a test set (10% of the dataset for testing).

def simulate_low_exposure(x, min_val=0.0, perc=0.3):
    # Subtract a fixed brightness from every pixel, clipping at min_val
    # (the default values here are illustrative)
    return np.where(x - perc > min_val, x - perc, min_val)

x_train, x_test = train_test_split(x_train, test_size=0.1, random_state=42)
x_train_noise = simulate_low_exposure(x_train)
x_test_noise = simulate_low_exposure(x_test)

The simulate_low_exposure function is just a noise function, not exactly low exposure (to be honest), but it does the trick.
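To see what this clipping does, here is the same logic applied to a few example pixel values (restated self-contained; the 0.3 offset and 0.0 floor are illustrative assumptions):

```python
import numpy as np

def simulate_low_exposure(x, min_val=0.0, perc=0.3):
    # Subtract a fixed brightness, never going below min_val
    return np.where(x - perc > min_val, x - perc, min_val)

pixels = np.array([0.9, 0.5, 0.2, 0.0])
print(simulate_low_exposure(pixels))  # bright pixels darken, dim ones clip to 0
```

Bright regions of the galaxy survive (darkened), while faint regions disappear into the floor, which is roughly what an underexposed image looks like.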
And finally we define the autoencoder, a very simple one:

def build_one_layer_autoencoder(img_shape, code_size):
    # The encoder: flatten the image and project it into the bottleneck
    encoder = Sequential()
    encoder.add(InputLayer(img_shape))
    encoder.add(Flatten())
    encoder.add(Dense(code_size))

    # The decoder: map the code back to a flat image and reshape it
    decoder = Sequential()
    decoder.add(InputLayer((code_size,)))
    decoder.add(Dense(np.prod(img_shape)))
    decoder.add(Reshape(img_shape))

    return encoder, decoder

The function parameters are the image shape and the size of the bottleneck layer (number of nodes), and it returns the encoder and the decoder separately in order to allow us to play :)

#The size of the first layer will be width x height of IMG_SHAPE, and the bottleneck layer will have, for example, 1000 nodes.
encoder, decoder = build_one_layer_autoencoder(IMG_SHAPE, 1000)
inp = Input(IMG_SHAPE)
code = encoder(inp)
reconstruction = decoder(code)
autoencoder = Model(inp, reconstruction)
autoencoder.compile(optimizer='adamax', loss='mse')

#And now we are ready to train the autoencoder with the noisy galaxies (x_train_noise) as input and the original galaxies (x_train) as output.
#Also x_test_noise and x_test are the test dataset.
history = autoencoder.fit(x=x_train_noise, y=x_train, epochs=10, validation_data=[x_test_noise, x_test])

And that's all. I just posted a little bit of the code, the most interesting parts. I promise I will clean the code and publish it on GitHub. :)

There are a lot of posts about noise reduction using autoencoders; I got the idea and some code snippets from reading some of them.

I like the approach of treating low exposure as noise.

Thanks for reading!
