It was a rainy afternoon in September. The view from my window was all gray and blurry, and it reminded me of the first images my GAN made: just fuzzy chaos on the screen.
I'd been learning Deep Learning for months. I wanted to create something new with code. Not just numbers, but something that looks real.
So I tried Generative Adversarial Networks (GANs). I built one from scratch with Keras and TensorFlow, with some help from ChatGPT.
The dataset? MNIST. It has 70,000 handwritten digits. They look like quick notes from people long ago: scribbled 7s and curvy 8s.
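To make the data step concrete, here is a minimal sketch of how the full 70,000-image MNIST set can be loaded and prepared in TensorFlow. The batch size and the [-1, 1] pixel scaling are assumptions on my part; the scaling simply matches the Tanh output the Generator uses later.

```python
# A minimal sketch: load all 70,000 MNIST digits and scale pixels to [-1, 1]
# to match a Tanh-output Generator. Batch size is an assumption.
import numpy as np
import tensorflow as tf

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

# Combine both splits to get the full 70,000 images and add a channel axis.
images = np.concatenate([train_images, test_images]).astype("float32")
images = images.reshape(-1, 28, 28, 1)
images = (images - 127.5) / 127.5  # [0, 255] -> [-1, 1]

# Shuffle and batch for training.
BATCH_SIZE = 256
dataset = (
    tf.data.Dataset.from_tensor_slices(images)
    .shuffle(70_000)
    .batch(BATCH_SIZE)
)
```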
Think of two neural networks working against each other:
- The Generator takes random noise, just a bunch of random numbers, and tries to turn it into a digit image. At first, it makes blurry shapes. You might guess it's a 3... or something else.
- The Discriminator checks the images. It has seen real MNIST images, so it says "fake" to the generated ones and "real" to the genuine ones.
Source: DZone Article on GAN Principles
Think of GANs like a counterfeiter (Generator) trying to make fake money that fools the police (Discriminator). The bank provides real money for training. Over time, the fakes get better!
Architecture of a GAN network:
Source: Jonathan Hui on Medium
The cool part? They learn together. The Generator gets better at tricking the Discriminator. The Discriminator gets better at spotting fakes. After many training steps, the fake digits look real, like they were drawn by hand.
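To show what "learning together" looks like in code, here is a minimal sketch of one adversarial training step using tf.GradientTape. It assumes `generator` and `discriminator` are the Keras models described later, with the Discriminator outputting a probability; the optimizers and learning rates are placeholder assumptions.

```python
# A minimal sketch of one GAN training step (generator and discriminator are
# assumed to be the Keras models built later in this post).
import tensorflow as tf

NOISE_DIM = 100
cross_entropy = tf.keras.losses.BinaryCrossentropy()
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    batch_size = tf.shape(real_images)[0]
    noise = tf.random.normal([batch_size, NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)

        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)

        # The Generator wants its fakes to be labeled "real" (1).
        gen_loss = cross_entropy(tf.ones_like(fake_output), fake_output)
        # The Discriminator wants real -> 1 and fake -> 0.
        disc_loss = (cross_entropy(tf.ones_like(real_output), real_output)
                     + cross_entropy(tf.zeros_like(fake_output), fake_output))

    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss
```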
I did this project as an experiment. I wanted to understand how GANs work. It was fun to see it come together. Next, I plan to use this to make realistic human faces.
Building the Networks
I built it in a Jupyter notebook. The Generator starts with 100 random numbers. It uses dense layers and LeakyReLU to shape them, then turns them into a 28x28 image. It's like building a picture from nothing.
The Discriminator takes 28x28 images. It uses Conv2D layers to look for patterns. It ends with a yes or no: real or fake.
Parts | Generator | Discriminator |
---|---|---|
Input | 100 random numbers | 28x28 image |
Main Layers | Dense then reshape | Conv2D then flatten |
Activations | LeakyReLU, Tanh | LeakyReLU, Sigmoid |
Output | Fake image | Real (1) or Fake (0) |
At the end of this post, I explain in more detail the Discriminator and Generator networks used in my code.
Training took time on my GPU. I saved the models so I could restart if needed. I also saved images every few steps to track progress. Later, I made a GIF from them and zipped all the images together.
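Here is a minimal sketch of that bookkeeping: saving both models so training can resume, and writing a 4x4 grid of sample digits generated from a fixed noise batch. The file names and the fixed seed are assumptions, not necessarily what my notebook uses.

```python
# A minimal sketch: save model checkpoints and a 4x4 grid of sample digits.
import matplotlib.pyplot as plt
import tensorflow as tf

seed = tf.random.normal([16, 100])  # fixed noise so progress is comparable across epochs

def save_progress(epoch, generator, discriminator):
    # Save both models so training can be restarted from this point.
    generator.save(f"generator_epoch_{epoch:03d}.keras")
    discriminator.save(f"discriminator_epoch_{epoch:03d}.keras")

    # Generate 16 digits from the same noise and save them as a 4x4 grid.
    images = generator(seed, training=False)
    fig = plt.figure(figsize=(4, 4))
    for i in range(16):
        plt.subplot(4, 4, i + 1)
        plt.imshow(images[i, :, :, 0] * 127.5 + 127.5, cmap="gray")
        plt.axis("off")
    fig.savefig(f"digits_epoch_{epoch:03d}.png")
    plt.close(fig)
```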
Seeing the Digits Improve
Around epoch 5, I saw a shaky 4 appear. That was exciting. By epoch 25, the digits looked good: 1s had straight lines, 9s had curves. At epoch 100, the Discriminator could no longer tell fakes from real very well. That meant success.
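A rough sketch of the check behind that last point: if the Discriminator scores a mixed batch of real and generated digits at only around 50% accuracy, it is basically guessing. The helper below is hypothetical and assumes the Discriminator outputs probabilities.

```python
# Hypothetical helper: measure how well the Discriminator separates real from fake.
import tensorflow as tf

def discriminator_accuracy(generator, discriminator, real_batch):
    noise = tf.random.normal([real_batch.shape[0], 100])
    fake_batch = generator(noise, training=False)

    real_preds = discriminator(real_batch, training=False).numpy() > 0.5
    fake_preds = discriminator(fake_batch, training=False).numpy() > 0.5

    # Correct = real predicted real, plus fake predicted fake.
    correct = real_preds.sum() + (~fake_preds).sum()
    return correct / (2 * real_batch.shape[0])
```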
Here are some images from training:
[Image grid: Pure Noise vs. Epoch 1]
From total randomness to the first blurry hints of digits, like fog lifting just a bit.
[Image grid: Epoch 2 vs. Epoch 3]
Faint shapes emerging... is that a 6? The Generator is starting to get the idea.
[Image grid: Epoch 9 vs. Epoch 17]
Things are sharpening up! Each digit feels a little more like real handwriting, with its own quirks.
[Image grid: Epoch 25 vs. Epoch 50]
Confidence building: straight lines for 1s, smooth curves for 9s. The duel is heating up.
[Image grid: Epoch 75 vs. Epoch 100]
Nearly there! These fakes could pass for real MNIST scribbles. Magic in the making.
Here is a GIF of 16 digits improving over time.
It shows how random noise turns into clear digits in 10 seconds.
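The GIF can be stitched together from the saved progress images. Here is a small sketch using the imageio package, assuming the frames were written as digits_epoch_*.png (as in the saving sketch above); the per-frame duration is an assumption.

```python
# A minimal sketch: assemble the saved progress frames into a GIF.
import glob
import imageio.v2 as imageio

frames = [imageio.imread(path) for path in sorted(glob.glob("digits_epoch_*.png"))]
imageio.mimsave("training_progress.gif", frames, duration=0.1)  # ~0.1 s per frame
```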
GAN Architecture in My Project
Discriminator Architecture
The Discriminator network processes 28x28 grayscale images and outputs a probability (0 for fake, 1 for real). Here's the layer-by-layer breakdown:
Layer Name | Type | Input Shape | Output Shape |
---|---|---|---|
conv2d_6 | Conv2D | (None, 28, 28, 1) | (None, 14, 14, 64) |
leaky_re_lu_30 | LeakyReLU | (None, 14, 14, 64) | (None, 14, 14, 64) |
dropout_6 | Dropout | (None, 14, 14, 64) | (None, 14, 14, 64) |
conv2d_7 | Conv2D | (None, 14, 14, 64) | (None, 7, 7, 128) |
leaky_re_lu_31 | LeakyReLU | (None, 7, 7, 128) | (None, 7, 7, 128) |
dropout_7 | Dropout | (None, 7, 7, 128) | (None, 7, 7, 128) |
flatten_3 | Flatten | (None, 7, 7, 128) | (None, 6272) |
dense_11 | Dense | (None, 6272) | (None, 1) |
This architecture uses convolutional layers for feature extraction, dropout for regularization to prevent overfitting, and LeakyReLU activations to maintain gradient flow.
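Here is a sketch of a Keras Discriminator that reproduces the shapes in the table above. The kernel sizes, strides, dropout rates, and LeakyReLU slope are assumptions, since the table only fixes each layer's input and output shapes.

```python
# A sketch of the Discriminator matching the layer table above.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    return tf.keras.Sequential([
        layers.Conv2D(64, kernel_size=5, strides=2, padding="same",
                      input_shape=(28, 28, 1)),                        # (28, 28, 1) -> (14, 14, 64)
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Conv2D(128, kernel_size=5, strides=2, padding="same"),  # -> (7, 7, 128)
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Flatten(),                                               # -> (6272,)
        layers.Dense(1, activation="sigmoid"),                          # probability: real vs. fake
    ])
```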
Generator Architecture
The Generator network starts with random noise input and progressively upsamples it to produce 28x28 grayscale digit images. Here's the layer-by-layer breakdown:
Layer Name | Type | Input Shape | Output Shape |
---|---|---|---|
dense_9 | Dense | (None, 100) | (None, 12544) |
batch_normalization_18 | BatchNormalization | (None, 12544) | (None, 12544) |
leaky_re_lu_24 | LeakyReLU | (None, 12544) | (None, 12544) |
reshape_6 | Reshape | (None, 12544) | (None, 7, 7, 256) |
conv2d_transpose_18 | Conv2DTranspose | (None, 7, 7, 256) | (None, 7, 7, 128) |
batch_normalization_19 | BatchNormalization | (None, 7, 7, 128) | (None, 7, 7, 128) |
leaky_re_lu_25 | LeakyReLU | (None, 7, 7, 128) | (None, 7, 7, 128) |
conv2d_transpose_19 | Conv2DTranspose | (None, 7, 7, 128) | (None, 14, 14, 64) |
batch_normalization_20 | BatchNormalization | (None, 14, 14, 64) | (None, 14, 14, 64) |
leaky_re_lu_26 | LeakyReLU | (None, 14, 14, 64) | (None, 14, 14, 64) |
conv2d_transpose_20 | Conv2DTranspose | (None, 14, 14, 64) | (None, 28, 28, 1) |
This architecture uses transposed convolutions for upsampling, batch normalization for stability, and LeakyReLU activations to prevent vanishing gradients.
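And a matching sketch of the Generator. The Dense layer has to output 7 x 7 x 256 = 12544 units so it can be reshaped to (7, 7, 256); kernel sizes and strides are again assumptions, and the final Tanh matches the activations listed in the summary table earlier.

```python
# A sketch of the Generator matching the layer table above.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator():
    return tf.keras.Sequential([
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(100,)),  # -> (12544,)
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Reshape((7, 7, 256)),                                     # -> (7, 7, 256)
        layers.Conv2DTranspose(128, kernel_size=5, strides=1,
                               padding="same", use_bias=False),          # -> (7, 7, 128)
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(64, kernel_size=5, strides=2,
                               padding="same", use_bias=False),          # -> (14, 14, 64)
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(1, kernel_size=5, strides=2,
                               padding="same", activation="tanh"),       # -> (28, 28, 1)
    ])
```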
When the rain stopped, I looked at all the images. This project taught me a lot about GANs. It showed how trial and error leads to something new. Now, I want to try better ways to check quality, like FID scores. Maybe use more layers. Or try it on color images from CIFAR-10.
I also want to make Conditional GANs. That way, I can make a specific digit, like just 7s.
Check the code on GitHub. Try it yourself. What GAN project have you done? Tell me in the comments.