Hello everyone! Now that I've explained the basics of CNNs in the last post of this series, I want to expand on their usage, specifically in super-resolution, or SR for short. If you haven't read that post yet, you can do so here
Super-resolution is a type of problem where a low-resolution image needs to be converted into a high-resolution one. There are many ways to solve it, but in this post I specifically want to explain how a CNN can be used, as that is the method the paper uses to solve SR.
Types of methods in solving SR
There are two main ways SR can be achieved: internal-based SR and external-based SR.
Internal-based SR means that the high-resolution (HR) image is constructed directly from the input image itself. For example, we could add an extra pixel between every pair of existing pixels and fill each 'gap' pixel with a predicted value, using an interpolation technique such as bicubic interpolation.
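As a toy illustration, here is a minimal bilinear upscaler in NumPy (bicubic works the same way, but fits a cubic instead of a linear function through the neighboring pixels). This is a sketch of the idea, not an optimized implementation:

```python
import numpy as np

def bilinear_upscale(img, scale):
    """Upscale a 2D grayscale image by `scale` using bilinear interpolation."""
    h, w = img.shape
    new_h, new_w = h * scale, w * scale
    # Map each output pixel back to a (fractional) coordinate in the input grid
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four surrounding input pixels for every output pixel
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

lr = np.array([[0.0, 1.0],
               [1.0, 0.0]])      # a tiny "low resolution" image
hr = bilinear_upscale(lr, 2)
print(hr.shape)  # (4, 4)
```

Notice that no information outside the input image is used, which is exactly why this is "internal-based": the new pixels are purely predicted from their neighbors.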
Meanwhile, external-based SR uses external data, meaning images other than the input image. We look for characteristics and features that exist across many other images, and use those collected patterns to reconstruct a higher-resolution version of the input. One type of solution that falls into this category is sparse-coding-based SR.
The CNN-based SR method we're discussing is another type of external-based SR. It essentially improves on sparse-coding-based SR by optimizing the parts that sparse coding leaves unoptimized. To understand this further, let's first understand the sparse-coding-based solution.
Sparse coding
This method is divided into four stages: patch extraction and representation, sparse representation, dictionary learning, and reconstruction.
1. Patch extraction and representation
Let's say we have an 8x8-pixel image of the number 2, and a patch window of, say, 2x2 pixels.
This window overlaps part of our image; we copy the overlapped pixel values and store them as a patch vector. Then we slide the window over by a few pixels, let's say with a step of 1, and repeat the process until the whole image is covered.
A patch could, for example, represent the horizontal straight line at the bottom of the number 2.
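The sliding-window extraction described above can be sketched in NumPy like this (the 8x8 image of arbitrary values here is just a stand-in for the picture of the number 2):

```python
import numpy as np

def extract_patches(img, patch=2, step=1):
    """Slide a patch x patch window over img and flatten each window
    into a patch vector."""
    h, w = img.shape
    patches = []
    for i in range(0, h - patch + 1, step):
        for j in range(0, w - patch + 1, step):
            patches.append(img[i:i+patch, j:j+patch].ravel())
    return np.array(patches)

img = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for the 8x8 "2"
p = extract_patches(img)
print(p.shape)  # (49, 4): 7x7 window positions, each a 4-dim patch vector
```

With step 1, consecutive patches overlap, which is what lets features that span patch boundaries still be captured.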
2. Sparse representation
The patches are then passed through a function that makes them sparse, meaning most of their values become zeros. This forces each patch to focus only on the feature it extracted.
In our example, a sparse representation of the horizontal-line patch would represent the line more cleanly: any extra strokes connecting the lower part to the upper curve of the 2 would drop out, leaving just the horizontal line.
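To make this concrete, here is a simplified greedy sparse-coding sketch (one step of matching pursuit). The dictionary atoms are made up for illustration and assumed to be unit-norm; a real system would learn them from data:

```python
import numpy as np

def sparse_code(patch, dictionary, k=1):
    """Greedy matching pursuit: keep only the k dictionary atoms that best
    explain the patch; every other coefficient stays zero (sparsity)."""
    coeffs = np.zeros(dictionary.shape[1])
    residual = patch.astype(float).copy()
    for _ in range(k):
        scores = dictionary.T @ residual       # correlation with each atom
        best = np.argmax(np.abs(scores))
        coeffs[best] += scores[best]           # atoms assumed unit-norm
        residual -= scores[best] * dictionary[:, best]
    return coeffs

# Toy dictionary: two unit-norm 4-dim atoms (think "flat" vs "alternating")
D = np.array([[0.5,  0.5, 0.5,  0.5],
              [0.5, -0.5, 0.5, -0.5]]).T
patch = np.array([1.0, 1.0, 1.0, 1.0])
c = sparse_code(patch, D, k=1)
print(c)  # only one non-zero coefficient
```

The output vector has mostly zeros, which is the "sparse" part: the patch is described by just the one atom that matters for it.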
3. Dictionary learning
Here, we update our dictionary with the sparse patch vectors, so we can use them to reconstruct new images later.
So we can add the horizontal line to our dictionary, making it available when reconstructing another image later.
4. Reconstruction
Now that we have the patch in our dictionary, when we receive a new image as input, we try to represent that image using our dictionary.
For example, suppose we now get the number 7. We can reuse the horizontal-line feature we extracted earlier from the number 2 and place it to help form the image of the 7.
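Reconstruction is then just a weighted sum of dictionary atoms: multiply each atom by its sparse coefficient and add them up. The dictionary and coefficients below are illustrative, not from a real training run:

```python
import numpy as np

# The same toy dictionary of two unit-norm atoms as before
D = np.array([[0.5,  0.5, 0.5,  0.5],
              [0.5, -0.5, 0.5, -0.5]]).T
coeffs = np.array([2.0, 0.0])   # sparse code found for a new patch
patch_hat = D @ coeffs          # reconstructed patch = weighted sum of atoms
print(patch_hat)  # [1. 1. 1. 1.]
```

Doing this for every patch position and averaging the overlapping regions yields the full reconstructed HR image.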
Advantages of using CNN for this problem
The CNN solution for SR shares some similarity with the sparse coding method, but instead of optimizing the algorithm through the dictionary learning step, a CNN directly learns a mapping between low-resolution and high-resolution images, essentially eliminating the dictionary altogether.
This is possible because of the nature of neural networks, which can learn complex patterns via their weights and biases, numbers of layers, activation functions and so on.
As a result, the CNN method for SR is much more lightweight, making it faster and more suitable for production. Furthermore, with more training data, more complex patterns can be found and used during reconstruction, making the model better overall. This method can also be easily tuned to find the most suitable combination of advantages and trade-offs by adjusting hyperparameters such as the learning rate, the number of layers, the kernel dimensions, the convolution stride, and so on.
How it works
Now that we know why SR using a CNN can be better than other existing external-based SR methods, let's dive deeper into how it works.
1. Patch extraction and representation
Unlike in sparse coding, where we extract the patches manually, here features are extracted automatically as the image is fed through the network's convolution operations.
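This is what a single convolution does. The edge-detecting kernel below is hand-made for illustration; in a real network the kernel values are learned during training:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution (really cross-correlation, as in CNN frameworks)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# A hand-made kernel that responds to horizontal edges
edge = np.array([[ 1.0,  1.0],
                 [-1.0, -1.0]])
img = np.zeros((8, 8)); img[4, :] = 1.0   # an image containing a horizontal line
fmap = conv2d(img, edge)
print(fmap.shape)  # (7, 7)
```

The resulting feature map has strong responses exactly where the horizontal line sits, which plays the same role as the patch vectors in sparse coding.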
2. Non-linear mapping
This step is important, as it introduces non-linearity into our features. A feature can then be formed from more than one type of pattern, giving us more possibilities to pick and choose features when reconstructing the image later. This is achieved through activation functions; commonly used ones are the rectified linear unit (ReLU), the sigmoid, and the hyperbolic tangent.
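All three of these activation functions are simple element-wise operations:

```python
import numpy as np

def relu(x):    return np.maximum(0.0, x)       # clamps negatives to zero
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x)) # squashes to (0, 1)
def tanh(x):    return np.tanh(x)               # squashes to (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))     # [0. 0. 2.]
print(sigmoid(0))  # 0.5
```

Without one of these between the convolution layers, stacking convolutions would collapse into a single linear operation, and the network could only learn linear mappings.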
3. Reconstruction
After training is complete, once the loss has been minimized and the weights, biases, and hyperparameters have been optimized, we can feed a low-resolution image forward through the network and get a higher-resolution image as the output.
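Putting the three steps together, a forward pass has roughly this shape. The weights below are random and untrained, and the kernel sizes and filter counts are toy-sized; for comparison, if the paper in question is SRCNN, its reference network uses 9x9, 1x1, and 5x5 kernels with 64 and 32 filters:

```python
import numpy as np

def conv(x, w):
    """Valid 2D cross-correlation over a stack of channels.
    x: (C_in, H, W), w: (C_out, C_in, kH, kW)."""
    co, ci, kh, kw = w.shape
    _, h, wid = x.shape
    out = np.zeros((co, h - kh + 1, wid - kw + 1))
    for o in range(co):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o])
    return out

rng = np.random.default_rng(0)
# Untrained random weights, purely to show how data flows through the layers
w1 = rng.normal(0, 0.1, (4, 1, 3, 3))   # 1. patch extraction
w2 = rng.normal(0, 0.1, (2, 4, 1, 1))   # 2. non-linear mapping
w3 = rng.normal(0, 0.1, (1, 2, 3, 3))   # 3. reconstruction

x = rng.random((1, 12, 12))             # toy upscaled low-resolution input
f1 = np.maximum(0.0, conv(x, w1))       # ReLU after layer 1
f2 = np.maximum(0.0, conv(f1, w2))      # ReLU after layer 2
sr = conv(f2, w3)                       # final HR estimate
print(sr.shape)  # (1, 8, 8)
```

Training consists of adjusting w1, w2, and w3 so that `sr` matches the ground-truth high-resolution image; at inference time only this cheap feed-forward pass remains, which is where the speed advantage comes from.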
Conclusion
We have discussed what super-resolution is and how the problem is tackled via internal-based and external-based methods. We also expanded on the external-based approach by explaining the details of sparse coding and relating it to CNN-based SR, which improves on it by directly mapping the low-resolution image to the high-resolution one.
I hope this post has been helpful to anyone reading this. If I made some mistakes or overlooked some important points, please do point it out on the comments below so we all can learn more about it together. Thank you and see you guys next time!