DEV Community

Adam

Image super-resolution using GAN (SRGAN)

Hi, and welcome to another post in my series on image super-resolution, this time using a GAN. This post is heavily based on this research paper, which proposed a better way to solve the image super-resolution problem than the other methods available at the time.

Before we start, I will assume you have a basic understanding of super-resolution using a CNN; if you don't, you can check out this series I made explaining it. I will also assume that you already have a basic understanding of GANs (Generative Adversarial Networks). If not, you can check my previous post on it here.

The problem with SRCNN

One big problem that the paper sets out to solve is the loss function used in SRCNN, the Mean Squared Error (MSE) loss. Although it works fine as a loss function, it doesn't preserve the way we humans perceive images; it just tries to make the pixel values match the label as closely as possible. But the human eye doesn't view an image pixel by pixel, so using this loss function makes the model miss the factors that matter most to a person looking at a high-resolution image. In practice, minimizing MSE tends to produce overly smooth results that lack fine texture detail.
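To make the problem concrete, here is a minimal sketch in plain Python. The 1-D "images" below are made-up values for illustration, not data from the paper. It shows that a blurry guess can score a lower MSE than a sharp guess whose texture is shifted by one pixel, even though the sharp one looks more like the target to a person.

```python
def mse(prediction, target):
    """Mean squared error between two equally sized lists of pixel values."""
    if len(prediction) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


# Hypothetical 1-D "image": a sharp alternating texture.
target = [0.0, 1.0, 0.0, 1.0]

# A sharp guess with the same texture, shifted by one pixel.
sharp_but_shifted = [1.0, 0.0, 1.0, 0.0]

# A blurry guess that averages everything to grey.
blurry = [0.5, 0.5, 0.5, 0.5]

print(mse(sharp_but_shifted, target))  # 1.0
print(mse(blurry, target))             # 0.25
```

The blurry guess wins on MSE, which is the behavior the perceptual loss is meant to counter.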

The paper proposes a new loss function, called the perceptual loss, which is a combination of an adversarial loss and a content loss. The exact equation used in the paper is here, although we won't dive into the math:

Perceptual loss function
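Written out in the paper's notation, as far as I can reproduce it here, the perceptual loss is a weighted sum, where $l^{SR}_{X}$ is the content loss and $l^{SR}_{Gen}$ is the adversarial loss:

```latex
l^{SR} = l^{SR}_{X} + 10^{-3} \, l^{SR}_{Gen}
```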

Although the perceptual loss is a combination of content loss and adversarial loss, the content loss is given 1000x more weight than the adversarial loss. You can think of it like this: if the adversarial loss were given too much weight, the generator would mostly care about fooling the discriminator, and it could produce images that look realistic but no longer match the content of the original image, which misses the point of super-resolving that particular image.
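Here is a small sketch of how that weighting plays out. The function names and the loss values are hypothetical, chosen only for illustration; the one number taken from the paper is the 1e-3 weight on the adversarial term.

```python
def perceptual_loss(content_loss, adversarial_loss, adversarial_weight=1e-3):
    """Weighted sum of the two terms; the content term has weight 1."""
    return content_loss + adversarial_weight * adversarial_loss


def adversarial_share(content_loss, adversarial_loss, adversarial_weight=1e-3):
    """Fraction of the total loss that comes from the adversarial term."""
    total = perceptual_loss(content_loss, adversarial_loss, adversarial_weight)
    return adversarial_weight * adversarial_loss / total


# Made-up loss values for a single training step.
content, adversarial = 0.02, 0.7

print(perceptual_loss(content, adversarial))
print(adversarial_share(content, adversarial))                          # small share
print(adversarial_share(content, adversarial, adversarial_weight=1.0))  # dominates
```

With the small weight, the adversarial term only nudges the result; with a weight of 1.0 it would dominate these example values.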

Content loss

Basically, this loss function is a form of MSE loss, modified so that it doesn't compare raw pixel values and instead gives more priority to how the image is perceived. This is achieved with the VGG loss: both the super-resolved image and the original image are passed through VGG (the paper uses VGG19), a model pre-trained on millions of images that has a good understanding of what an image is made up of, and the MSE is computed between the resulting feature maps. This way, the model has a better sense of image quality than it would with a pixel-wise loss function.
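The idea of measuring the error on features instead of pixels can be sketched like this. Note that `toy_features` is a hypothetical stand-in I made up so the example runs in plain Python; it is not VGG, and a real implementation would use feature maps from a pre-trained VGG network instead.

```python
def mse(a, b):
    """Mean squared error between two equally sized lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(pixels):
    """Hypothetical stand-in for a VGG feature map: neighbor differences."""
    return [right - left for left, right in zip(pixels, pixels[1:])]


def content_loss(super_resolved, original, features=toy_features):
    """MSE computed on features of both images rather than on raw pixels."""
    return mse(features(super_resolved), features(original))


# Made-up 1-D "images": same texture, one is uniformly brighter.
original = [0.0, 1.0, 0.0, 1.0]
brighter = [0.25, 1.25, 0.25, 1.25]

print(mse(brighter, original))           # 0.0625 (pixel-wise error)
print(content_loss(brighter, original))  # 0.0 (same structure in feature space)
```

The pixel-wise loss penalizes the brightness offset, while the feature-space loss only looks at the structure the feature extractor captures.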

Adversarial loss

This loss is added to the perceptual loss equation to favor images that the discriminator cannot tell apart from real high-resolution images. It is the adversarial portion of the GAN, where the output of the generator acts as the input of the discriminator.
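As a rough sketch, the generator's adversarial loss can be computed from the discriminator's outputs on generated images as -log(D(G(x))). The paper sums this over the training samples; I take the mean here, which is my own simplification, and the scores below are made up.

```python
import math


def adversarial_loss(discriminator_scores):
    """Mean of -log(D(G(x))) over discriminator outputs in the range (0, 1]."""
    losses = [-math.log(score) for score in discriminator_scores]
    return sum(losses) / len(losses)


# Made-up discriminator outputs for a batch of generated images.
# Values near 1 mean the discriminator believes the image is real.
fooled = [0.9, 0.95, 0.85]
caught = [0.1, 0.05, 0.2]

print(adversarial_loss(fooled))  # low loss: the generator is convincing
print(adversarial_loss(caught))  # high loss: the generator is being caught
```

The loss shrinks as the discriminator's scores for generated images approach 1, which is what pushes the generator toward realistic-looking output.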

Discriminator

The job of the discriminator here is to determine whether an image is an original image or a super-resolved image. The architecture of the discriminator is as follows:

[Figure: discriminator architecture]

Note that k = kernel size, n = number of feature maps, and s = stride.
BN is batch normalization, which normalizes the activations across a batch so that an outlier value during training doesn't affect the network as much. Note that the last 4 layers work on the previous layer's output flattened into a single dimension, like a normal ANN: a dense layer, a Leaky ReLU that introduces non-linearity, a single-output dense layer that combines everything, and finally a sigmoid function whose output classifies whether an image is real or super-resolved.
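To see how the k/n/s notation translates into tensor sizes, here is a shape walk-through in plain Python. The layer list is my reading of the discriminator figure in the paper; the padding of 1 and the 96x96 input are assumptions on my part, so treat the numbers as a sketch.

```python
def conv_output_size(size, kernel, stride, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel size k, feature maps n, stride s) for each convolution,
# as I read them from the paper's figure.
DISCRIMINATOR_CONVS = [
    (3, 64, 1), (3, 64, 2),
    (3, 128, 1), (3, 128, 2),
    (3, 256, 1), (3, 256, 2),
    (3, 512, 1), (3, 512, 2),
]


def trace_shapes(input_size=96, padding=1):
    """Return the (channels, height, width) shape after each convolution."""
    shapes = []
    size = input_size
    for kernel, channels, stride in DISCRIMINATOR_CONVS:
        size = conv_output_size(size, kernel, stride, padding)
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
channels, height, width = shapes[-1]
flattened = channels * height * width  # input size of the first dense layer

print(shapes[-1])  # (512, 6, 6)
print(flattened)   # 18432
```

Each stride-2 convolution halves the spatial size, so under these assumptions the dense layers receive a flattened vector of 18432 values.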

Generator

Meanwhile, the generator's job is to convince the discriminator that the image it produced is a real image, not a super-resolved one. The architecture of the generator as proposed in the paper is here:

[Figure: generator architecture]

Also note that the actual number of residual blocks is 16, which is mentioned in the paper but omitted from the image for simplicity. The generator is trained with the perceptual loss, as proposed by the researchers.
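The core idea of a residual block is that its output is the input plus whatever the block's inner layers compute. Here is a minimal sketch of stacking 16 of them. The `branch` functions are hypothetical toy stand-ins; in the paper the branch is built from convolution and batch normalization layers.

```python
NUM_RESIDUAL_BLOCKS = 16  # the count stated in the paper


def residual_block(x, branch):
    """Return x + branch(x), element by element (the skip connection)."""
    return [xi + fi for xi, fi in zip(x, branch(x))]


def residual_stack(x, branch, num_blocks=NUM_RESIDUAL_BLOCKS):
    """Apply the same kind of residual block num_blocks times in a row."""
    for _ in range(num_blocks):
        x = residual_block(x, branch)
    return x


def zero_branch(x):
    """Toy branch that has learned nothing: contributes zeros."""
    return [0.0 for _ in x]


def small_update(x):
    """Toy branch that adds a small correction to each value."""
    return [0.1 * v for v in x]


features = [1.0, -2.0, 0.5]  # made-up feature values

print(residual_stack(features, zero_branch))  # [1.0, -2.0, 0.5]
print(residual_stack(features, small_update))
```

Because of the skip connection, a block whose branch outputs zeros simply passes its input through, which is part of why deep stacks like this are easier to train.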

Conclusion

So that's the overview of super-resolution using a GAN. I hope you liked it and learned something new. If you would like to learn more, I've linked video resources from YouTube in the resources section below, which give a more detailed explanation as well as how to implement the model in code. That's all from me for now, and see you next time!

Resources

If you want the complete walkthrough of the paper, I found these videos helpful:
