In machine learning, researchers around the world propose all kinds of models and architectures every year, each designed to solve a particular problem. One such architecture is the Generative Adversarial Network, or GAN for short. Today we are going to dive into it and learn what it is, how it works, and how it is applied in the real world.
What is it?
First of all, I assume you are familiar with Convolutional Neural Networks (CNNs), because the GAN described in this post is built on top of one with a few modifications. If you don't know what a CNN is yet, you can read my blog post series about it here.
Now that that's out of the way, let's dive in. What does GAN stand for? It's Generative Adversarial Network. To put it simply, the network contains adversarial, or in other words competing, parts that generate something. For there to be a competition, there need to be at least two people or things, right? In this context, the two things are the Generator and Discriminator models.
Discriminator
Let's start with the discriminator. Its main job is to differentiate fake data from real data. What does that mean? Suppose we're using a GAN to generate fake human faces. The discriminator's job is simple: it tells whether an image is a real human face or not. You can imagine the structure of this model similar to a normal CNN with a sigmoid function on the output layer that gives the probability that the image is a human face. Pretty simple, right?
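To make the "sigmoid on the output layer" idea concrete, here is a minimal sketch in plain Python. It is not a CNN: I'm assuming a single weighted sum over a made-up 4-pixel "image" as a stand-in for the convolutional layers, and the names `discriminator`, `weights` and `bias` are only for illustration. The point is just that the sigmoid turns any score into a probability between 0 and 1.

```python
import math


def sigmoid(t):
    """Squash any real number into the range (0, 1), avoiding overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator(pixels, weights, bias):
    """Toy discriminator: a weighted sum of the pixels followed by a sigmoid.

    Returns the estimated probability that `pixels` is a real sample.
    """
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return sigmoid(score)


# Hypothetical 4-pixel "image" and hand-picked (untrained) parameters.
image = [0.2, 0.8, 0.5, 0.1]
weights = [0.5, -0.3, 0.8, 0.1]
bias = 0.0

p_real = discriminator(image, weights, bias)
print(f"P(real) = {p_real:.3f}")
```

In a real GAN the weighted sum would be replaced by a stack of convolutional layers, but the last step would look much the same: one number goes in, one probability comes out.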
Generator
Now, the generator's job is to create fake images that it inputs into the discriminator to fool it. Returning to the human face example, the generator's role is to create fake human faces, starting from random noise and outputting an image to pass as input to the discriminator. Structurally, the generator is similar to a CNN that outputs pixel values of an image.
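Here is a rough sketch of the "noise in, image out" idea, again in plain Python. I'm assuming a single linear layer instead of a real CNN, and the sizes (`noise_dim`, `num_pixels`) and the `generator` function are made up for the example. A sigmoid keeps every output pixel between 0 and 1.

```python
import math
import random


def generator(noise, weights, biases):
    """Toy generator: map a noise vector to pixel values in (0, 1).

    weights[i][j] connects noise[j] to output pixel i.
    """
    pixels = []
    for row, b in zip(weights, biases):
        t = sum(w * z for w, z in zip(row, noise)) + b
        pixels.append(1.0 / (1.0 + math.exp(-t)))  # sigmoid keeps pixels in (0, 1)
    return pixels


rng = random.Random(0)  # fixed seed so the run is repeatable
noise_dim, num_pixels = 3, 4  # hypothetical sizes, far smaller than a real image

# Untrained, randomly initialised parameters.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(noise_dim)] for _ in range(num_pixels)]
biases = [0.0] * num_pixels

noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
fake_image = generator(noise, weights, biases)
print(fake_image)
```

Feeding in a different noise vector gives a different "image", which is how a trained generator can produce many different faces.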
Combining them together
First, we train the discriminator model. We train it with a combination of real and fake human face images, labeled so it can learn to differentiate them. It does this by extracting features of the images, like recognizing that human faces have two eyes and a nose. Once the discriminator gets good at its job, we start training the generator. Initially, the generator produces random images that don't look like faces at all. These images are passed to the discriminator, which correctly identifies them as fake.
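The discriminator training step described above can be sketched on toy data. This is only an illustration under my own assumptions: instead of face images I use single numbers, with "real" samples drawn around 4.0 and "fake" samples drawn around 0.0 to mimic an untrained generator. The discriminator is `sigmoid(w * x + b)`, and `train_discriminator` is a hypothetical helper that runs plain gradient descent on the binary cross-entropy loss.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_discriminator(real, fake, steps=300, lr=0.1):
    """Fit D(x) = sigmoid(w * x + b) with label 1 for real and 0 for fake."""
    w, b = 0.0, 0.0
    data = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, label in data:
            # Gradient of the cross-entropy loss with respect to the score.
            error = sigmoid(w * x + b) - label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


rng = random.Random(0)
real_samples = [rng.gauss(4.0, 0.5) for _ in range(64)]  # stand-in for real faces
fake_samples = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for an untrained generator

w, b = train_discriminator(real_samples, fake_samples)
print(f"D(4.0) = {sigmoid(w * 4.0 + b):.3f}")  # a typical "real" value
print(f"D(0.0) = {sigmoid(w * 0.0 + b):.3f}")  # a typical "fake" value
```

After training, the discriminator should give a typical real value a higher probability than a typical fake one, which is the one-dimensional version of "correctly identifies them as fake".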
Based on the results, the model that loses the round (the discriminator that misclassifies an image, or the generator whose fake gets caught) updates itself to improve. For example, if the discriminator correctly identifies a fake face, the generator learns from this feedback and adjusts to produce more realistic faces. Conversely, if the discriminator mistakenly identifies a real face as fake, it updates itself to improve its accuracy. This process continues iteratively until the generator produces convincingly realistic images.
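The back-and-forth can be sketched as a loop on one-dimensional toy data. Everything here is an assumption for illustration: the generator is `G(z) = a * z + c`, the discriminator is `D(x) = sigmoid(w * x + b)`, "real" data are numbers drawn around 4.0, and the gradients are worked out by hand. I also update both models on every step, which is a common simplification. In this sketch the side that lost the round still receives the larger gradient, so it changes the most.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator_step(w, b, x_real, x_fake, lr):
    """One gradient step on -[log D(x_real) + log(1 - D(x_fake))]."""
    d_real = sigmoid(w * x_real + b)
    d_fake = sigmoid(w * x_fake + b)
    grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
    grad_b = (d_real - 1.0) + d_fake
    return w - lr * grad_w, b - lr * grad_b


def generator_step(a, c, w, b, z, lr):
    """One gradient step on -log D(G(z)), where G(z) = a * z + c."""
    x_fake = a * z + c
    d_fake = sigmoid(w * x_fake + b)
    grad_x = -(1.0 - d_fake) * w  # how the loss changes as the fake sample moves
    return a - lr * grad_x * z, c - lr * grad_x


rng = random.Random(0)
w, b = 0.0, 0.0  # discriminator parameters
a, c = 1.0, 0.0  # generator parameters
lr = 0.02

for step in range(2000):
    x_real = rng.gauss(4.0, 0.5)          # toy "real" data
    x_fake = a * rng.gauss(0.0, 1.0) + c  # generator output from noise
    w, b = discriminator_step(w, b, x_real, x_fake, lr)
    a, c = generator_step(a, c, w, b, rng.gauss(0.0, 1.0), lr)

print(f"generator: a={a:.3f}, c={c:.3f}")
print(f"discriminator: w={w:.3f}, b={b:.3f}")
```

Notice that the generator never sees the real data directly. It only learns through the discriminator's weight `w`, which tells it in which direction a sample looks "more real". GAN training is known to oscillate, so treat the printed numbers as a way to watch the tug of war rather than as proof of convergence.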
Conclusion
That's the basic idea of a GAN. It has many use cases, ranging from computer vision and natural language processing to game development and virtual reality. In the next post, we will see how a GAN is applied to the task of image super-resolution, in a model called SRGAN for short, based on a fairly new research paper from 2017. Until then, I hope you liked the post and learned something from it. See you!