"GANs either learn to create art β or break your patience."
## 💡 Project Idea
Hey there! In this blog post, I'll share my rollercoaster journey of building a Conditional GAN (cGAN) to generate black-and-white hand-drawn objects using the Berlin Sketches dataset.

The idea was simple:

> "Give a label, get a drawing."

But of course… it didn't go as smoothly as expected :)
## 🧱 Models and Training Scripts Used
Throughout the project, I used three different architectures and training strategies, each in its own phase:
### 1. Classic cGAN (Basic Setup)

- Model files: `generator.py`, `discriminator.py`
- Training script: `train.py`
```python
# Generator (classic)
self.label_emb = nn.Embedding(num_classes, num_classes)
x = torch.cat([noise, label_embedding], dim=1)

# Discriminator (classic)
self.label_embedding = nn.Embedding(num_classes, num_classes)
x = torch.cat([images, label_embedding.expand(...)], dim=1)
```
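To make the classic conditioning concrete, here is a minimal sketch of how those pieces fit together. The batch size and the `num_classes = 250` / `z_dim = 100` values are illustrative assumptions, not the exact contents of `generator.py`:

```python
import torch
import torch.nn as nn

num_classes, z_dim = 250, 100                     # hypothetical sizes
label_emb = nn.Embedding(num_classes, num_classes)

noise = torch.randn(8, z_dim)                     # a batch of 8 latent vectors
labels = torch.randint(0, num_classes, (8,))      # one class id per sample
label_embedding = label_emb(labels)               # shape: (8, num_classes)

# Classic conditioning: simply concatenate noise and label embedding
x = torch.cat([noise, label_embedding], dim=1)    # shape: (8, z_dim + num_classes)
print(x.shape)                                    # torch.Size([8, 350])
```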
### 2. Improved Training

- Same model, but a better training loop
- Script: `train_2.py`
- ✅ Learning-rate tweaks
- ✅ Label smoothing
- ✅ Fixed-noise evaluation
```python
real_targets = torch.full_like(real_preds, 0.9)  # label smoothing
loss_d = (loss_d_real + loss_d_fake) / 2
```
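Wrapped into a function, the whole discriminator update looks roughly like this; `discriminator`, `opt_d`, and the image/label tensors are placeholders passed in from the training loop:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # the classic discriminator ends in Sigmoid

def discriminator_step(discriminator, opt_d, real_images, fake_images, labels):
    """One D update with label smoothing, mirroring the recipe above."""
    real_preds = discriminator(real_images, labels)
    fake_preds = discriminator(fake_images.detach(), labels)  # don't backprop into G

    # Smoothed real targets (0.9 instead of 1.0) keep D from getting overconfident
    real_targets = torch.full_like(real_preds, 0.9)
    fake_targets = torch.zeros_like(fake_preds)

    loss_d_real = criterion(real_preds, real_targets)
    loss_d_fake = criterion(fake_preds, fake_targets)
    loss_d = (loss_d_real + loss_d_fake) / 2

    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_d.item()
```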
### 3. Upgraded Architecture (Powerful, but Failed to Train)

- Model files: `improved_generator.py`, `projection_discriminator.py`
- Training scripts: new `train.py` and `train_2.py` for the improved models
- ✅ Z-dim: 512
- ✅ Feature maps: 256
- ✅ `BCEWithLogitsLoss`, LR scheduler, gradient clipping
- ✅ Extensive data augmentation using `data_augment.py`
```python
# Generator (improved)
self.label_emb = nn.Embedding(num_classes, z_dim * 2)

# Discriminator (projection)
proj = torch.sum(features * label_embedding, dim=1)
return out + proj
```
## ⚠️ Phase 1: The First Attempt (`train.py`)
Everything was built from scratch, and many rookie mistakes followed:
| Category | Mistake | Explanation |
| --- | --- | --- |
| Normalization | Didn't normalize to [-1, 1] | Images didn't match the generator's Tanh output range |
| Embedding | Used `num_classes` as the embedding dim | Inefficient and inflexible |
| Concat shape | Shape mismatch | Needed `unsqueeze` calls |
| Loss monitoring | Relied only on G loss | The visuals were bad anyway |
| Mode collapse | Detected too late | All outputs turned white |
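The normalization fix from the first row is cheap. Something like this torchvision pipeline does it, assuming the sketches load as grayscale PIL images:

```python
from torchvision import transforms

# Scale sketches to [-1, 1] so they match the generator's Tanh output range
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),                         # [0, 255] -> [0.0, 1.0]
    transforms.Normalize(mean=[0.5], std=[0.5]),   # [0, 1]   -> [-1, 1]
])
```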
Each epoch took around 25-30 minutes on CPU, and although the losses were decreasing, the results weren't improving.
## 🔁 Phase 2: Resumed Training (`train_2.py`)
I resumed from epoch 15 using the same model and improved the training loop:
- ✅ Generator LR: `1e-4`, Discriminator LR: `2.5e-5`
- ✅ Label smoothing added
- ✅ Better visual logging (fixed noise, square grid; see the sketch below)
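Fixed-noise evaluation just means sampling one noise/label batch up front and re-rendering it after every epoch, so the grids stay comparable over time. Roughly like this (the sizes and the `samples/` directory are illustrative):

```python
import os
import torch
from torchvision.utils import save_image

num_classes, z_dim, per_class = 10, 100, 8         # hypothetical sizes
fixed_noise = torch.randn(num_classes * per_class, z_dim)
fixed_labels = torch.arange(num_classes).repeat_interleave(per_class)

@torch.no_grad()
def log_samples(generator, epoch, out_dir="samples"):
    """Render the same fixed batch each epoch: one row per class, one grid per epoch."""
    os.makedirs(out_dir, exist_ok=True)
    generator.eval()
    fakes = generator(fixed_noise, fixed_labels)
    # De-normalize from the Tanh range [-1, 1] back to [0, 1] before saving
    save_image(fakes * 0.5 + 0.5, f"{out_dir}/epoch_{epoch:03d}.png", nrow=per_class)
    generator.train()
```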
However:

- Mode collapse wasn't fully gone
- Some classes never appeared
- The GUI testing script failed due to a checkpoint mismatch

So… I managed to fix the training loop, but the damage from Phase 1 was still there.
## 🔬 Phase 3: Improved Model (That Couldn't Train)

I built a much more powerful model using everything I'd learned:
- ✅ Z-dim increased from `100` → `512`
- ✅ Feature maps from `64` → `256`
- ✅ `BCEWithLogitsLoss` instead of `BCELoss`
- ✅ LR scheduling + gradient clipping
- ✅ Heavy augmentation pipeline
I even wrote an evaluation script:

```bash
python evaluate_diversity_and_control.py
```

Which tested:

- Same class + different noise ✅
- Same noise + different classes ✅
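I won't paste the whole script here, but the two checks boil down to something like the sketch below. The std-based scores are a simple proxy I'm using for illustration, not necessarily what `evaluate_diversity_and_control.py` computes:

```python
import torch

@torch.no_grad()
def diversity_and_control_checks(generator, z_dim, num_classes, n=16):
    """Two sanity checks: noise should change the output, labels should control it."""
    # 1) Same class + different noise -> outputs should differ (diversity)
    labels = torch.zeros(n, dtype=torch.long)            # one fixed class
    samples = generator(torch.randn(n, z_dim), labels)
    diversity = samples.flatten(1).std(dim=0).mean().item()

    # 2) Same noise + different classes -> outputs should differ (control)
    z = torch.randn(1, z_dim).repeat(num_classes, 1)     # one fixed noise vector
    per_class = generator(z, torch.arange(num_classes))
    control = per_class.flatten(1).std(dim=0).mean().item()

    print(f"diversity (same class): {diversity:.4f} | control (same noise): {control:.4f}")
```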
## 🧪 Phase 3: Improvements Over Earlier Models
After experimenting with the basic Conditional GAN architecture in Phases 1 and 2, I realized that a more robust and expressive model was needed to truly capture the variability and structure in the Berlin Sketches dataset. So, I redesigned both the Generator and Discriminator.
Here are the key improvements:
### 📌 1. Label Embedding

- Before: `nn.Embedding(num_classes, num_classes)`
- Improved: `nn.Embedding(num_classes, z_dim * 2)`
### 📏 2. Latent Vector Size (z_dim)

- Before: `z_dim = 100`
- Improved: `z_dim = 512`
### 🏗️ 3. Generator Architecture

- Before: a basic `ConvTranspose2d` stack
- Improved: wider layers and better label conditioning
### 🧠 4. Discriminator Architecture

- Before: image + label concatenation
- Improved: a projection discriminator:

```python
proj = torch.sum(features * label_embedding, dim=1)
return out + proj
```
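Those two lines only make sense inside the full forward pass, so here is a compact sketch of the projection idea for 64x64 grayscale inputs. The layer sizes are illustrative; `projection_discriminator.py` differs in detail:

```python
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    """Sketch of a projection discriminator: unconditional score + label projection."""
    def __init__(self, num_classes, feat=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, feat // 4, 4, 2, 1), nn.LeakyReLU(0.2),          # 64 -> 32
            nn.Conv2d(feat // 4, feat // 2, 4, 2, 1), nn.LeakyReLU(0.2),  # 32 -> 16
            nn.Conv2d(feat // 2, feat, 4, 2, 1), nn.LeakyReLU(0.2),       # 16 -> 8
            nn.AdaptiveAvgPool2d(1),                                      # -> (B, feat, 1, 1)
        )
        self.fc = nn.Linear(feat, 1)                   # unconditional realness score
        self.embed = nn.Embedding(num_classes, feat)   # label embedding for the projection

    def forward(self, images, labels):
        features = self.conv(images).flatten(1)        # (B, feat)
        out = self.fc(features).squeeze(1)             # (B,)
        # Projection: inner product between image features and label embedding
        proj = torch.sum(features * self.embed(labels), dim=1)
        return out + proj                              # raw logits, no Sigmoid
```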
### 🎯 5. Loss Function

- Before: `BCELoss`
- Improved: `BCEWithLogitsLoss`
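The practical change: drop the final `nn.Sigmoid()` from the discriminator and feed raw logits to the fused loss, which handles the sigmoid internally in a numerically stable way:

```python
import torch
import torch.nn as nn

logits = torch.randn(8)                  # raw discriminator outputs, no Sigmoid layer
targets = torch.full_like(logits, 0.9)   # smoothed "real" labels

# Fuses sigmoid + binary cross-entropy in one numerically stable op
loss = nn.BCEWithLogitsLoss()(logits, targets)
```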
### 🧹 6. Training Strategy

- LR scheduler
- Gradient clipping
- Label smoothing
- Real-image noise injection (all four are sketched below)
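Here is how those four pieces can slot together. The Adam betas, `StepLR` schedule, noise sigma, and clipping norm below are illustrative assumptions, not the exact values from my scripts:

```python
import torch
from torch.optim.lr_scheduler import StepLR

def make_optimizers(generator, discriminator):
    """Optimizers plus LR schedulers (step sizes and gammas are assumptions)."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2.5e-5, betas=(0.5, 0.999))
    sched_g = StepLR(opt_g, step_size=10, gamma=0.5)   # halve the LR every 10 epochs
    sched_d = StepLR(opt_d, step_size=10, gamma=0.5)
    return opt_g, opt_d, sched_g, sched_d

def inject_noise(real_images, sigma=0.05):
    """Add a little Gaussian noise to real images so D can't just memorize them."""
    return real_images + sigma * torch.randn_like(real_images)

# Inside the training loop, after loss_d.backward():
#     torch.nn.utils.clip_grad_norm_(discriminator.parameters(), max_norm=1.0)
#     opt_d.step()
# And once per epoch:
#     sched_g.step(); sched_d.step()
```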
### 🧪 7. Evaluation & Debugging

- `evaluate_diversity_and_control.py`
- Class-conditioning and noise-variation tests
### 🎨 8. Data Augmentation

A balanced mix via `data_augment.py` (see the sketch below):

- Rotation, affine, jitter, erasing, perspective
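For reference, a pipeline with all five ops might look like this; the magnitudes are illustrative guesses, and the balanced values live in `data_augment.py`:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.08)),  # tensor-only op, so it goes last
])
```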
### Summary Table

| Component | Before (Phases 1-2) | Phase 3 Upgrade |
| --- | --- | --- |
| `z_dim` | 100 | 512 |
| Label embed | size = `num_classes` | size = `z_dim * 2` |
| Discriminator | Simple concat + conv | Projection discriminator |
| Loss | `BCELoss` | `BCEWithLogitsLoss` |
| LR strategy | Fixed | Scheduler + clipping |
| Evaluation | Manual visuals | Automated test script |
| Augmentation | Weak or aggressive | Balanced + structured |
## 💻 But My Computer Said No

Training this monster model didn't work. At all.
- The augmented dataset was huge
- The model was too large for my 6GB GPU
- Training crashed with OOM errors
- On CPU, one batch took 7+ minutes…
So Phase 3 ended before it even began.
## 🧠 What I Learned
- Don't trust GAN losses; use visuals and consistency checks
- Mode collapse is silent but deadly
- Label conditioning needs a proper embedding and architecture
- Augmentation should be balanced
- Training stability > model size
## 🏁 Final Words
This project may not have "succeeded", but it taught me more than any finished one. I now understand:

- GAN architecture design
- Training dynamics
- Failure modes (like collapse and instability)
- Checkpoint compatibility issues

And most importantly: I know what not to do next time :)

> "One day, when I finally complete this project, I'll come back to this blog post and smile."
## 🎁 Bonus: Loss Graph & Visual Timeline
Alongside all the model rewrites and retraining attempts, I kept track of two crucial things:

- Loss graphs from Phase 1
- Visual timelines showing generator output evolution
### 📉 Generator Loss Over Time (Phase 1)

In early training, the Generator loss steadily decreased, which looked promising at first. But...

It turned out that low loss didn't mean high-quality results. The visuals were repetitive and often just white blobs. Classic mode collapse in disguise.
### 🖼️ Visual Timeline of Generator Outputs

Here's how the generator's output changed across epochs (Phases 1 and 2). Each row represents a class, and each column an epoch.

Some classes improved for a while; others vanished. It made the case for better label conditioning and model rebalancing.
## 🔗 GitHub Repository and Dataset

All the code used for data loading, training, and the models is available at:

👉 GitHub: CGAN Project Repository

👉 Dataset
These artifacts helped me spot early signs of instability, and they will absolutely shape how I train GANs in the future.

> "Logs, visuals, and graphs: your three best friends in GAN debugging."