
🎭 A CGAN Story: Three Attempts and an Incomplete Ending

"GANs either learn to create art β€” or break your patience."


🚀 Project Idea

Hey there! In this blog post, I’ll share my rollercoaster journey of building a Conditional GAN (cGAN) to generate black-and-white hand-drawn objects using the Berlin Sketches dataset.

The idea was simple:

"Give a label, get a drawing."

But of course… it didn’t go as smoothly as expected :)


🧱 Models and Training Scripts Used

Throughout the project, I used three different architectures and training strategies, each in its own phase:

1. Classic CGAN (Basic Setup)

  • Model files: generator.py, discriminator.py
  • Training script: train.py
```python
# Generator (classic)
self.label_emb = nn.Embedding(num_classes, num_classes)
x = torch.cat([noise, label_embedding], dim=1)
```

```python
# Discriminator (classic)
self.label_embedding = nn.Embedding(num_classes, num_classes)
x = torch.cat([images, label_embedding.expand(...)], dim=1)
```
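To make the wiring concrete, here's a minimal, self-contained sketch of this conditioning at the tensor level. The sizes, the class count, and the shared embedding are illustrative, not the repo's exact code:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the class count is a stand-in, not necessarily the dataset's
num_classes, z_dim, img_size, batch = 250, 100, 64, 8

label_emb = nn.Embedding(num_classes, num_classes)
labels = torch.randint(0, num_classes, (batch,))

# Generator side: concatenate noise and the label embedding along the feature dim
noise = torch.randn(batch, z_dim)
g_in = torch.cat([noise, label_emb(labels)], dim=1)   # (batch, z_dim + num_classes)

# Discriminator side: broadcast the embedding into extra image channels
images = torch.randn(batch, 1, img_size, img_size)
label_map = label_emb(labels).view(batch, num_classes, 1, 1) \
                             .expand(-1, -1, img_size, img_size)
d_in = torch.cat([images, label_map], dim=1)          # (batch, 1 + num_classes, H, W)
```

This also shows why a num_classes-sized embedding gets expensive fast: the discriminator input grows by one full channel per class.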

2. Improved Training

  • Same model, but better training loop
  • Script: train_2.py
  • ✅ Learning rate tweaks
  • ✅ Label smoothing
  • ✅ Fixed noise evaluation (see the sketch below)

```python
real_targets = torch.full_like(real_preds, 0.9)  # label smoothing
loss_d = (loss_d_real + loss_d_fake) / 2
```
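The fixed-noise idea is to reuse the exact same noise/label pair every epoch, so the sample grids stay directly comparable over time. A minimal sketch; the generator interface and sizes are assumptions, not the repo's exact API:

```python
import torch

z_dim, num_classes = 100, 250   # illustrative sizes

# Sampled once at startup and reused every epoch
fixed_noise = torch.randn(num_classes, z_dim)
fixed_labels = torch.arange(num_classes)

def sample_grid(generator):
    """Generate one image per class from the same inputs every call."""
    generator.eval()
    with torch.no_grad():
        return generator(fixed_noise, fixed_labels)
```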

3. Upgraded Architecture (Powerful, but Failed to Train)

  • Model files: improved_generator.py, projection_discriminator.py
  • Training scripts: new train.py and train_2.py for improved models
  • ✅ z_dim: 512
  • ✅ Feature maps: 256
  • ✅ BCEWithLogitsLoss, LR scheduler, gradient clipping
  • ✅ Extensive data augmentation using data_augment.py

```python
# Generator (improved)
self.label_emb = nn.Embedding(num_classes, z_dim * 2)

# Discriminator (projection)
proj = torch.sum(features * label_embedding, dim=1)
return out + proj
```

⚠️ Phase 1 – The First Attempt (train.py)

Everything was built from scratch, and many rookie mistakes followed:

| Category | Mistake | Explanation |
| --- | --- | --- |
| Normalization | Didn't normalize images to [-1, 1] | Mismatch with the generator's Tanh output range |
| Embedding | Used num_classes as the embedding dim | Inefficient and inflexible |
| Concat shape | Shape mismatch when concatenating | Needed extra unsqueeze calls |
| Loss monitoring | Relied only on G loss | Losses fell while the visuals stayed bad |
| Mode collapse | Detected too late | All outputs turned white |

Each epoch took around 25–30 minutes on CPU, and although losses were decreasing, the results weren’t improving.
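The normalization fix from the first table row is essentially a one-liner in torchvision. Here's a sketch of a pipeline that matches a Tanh generator (the resolution is illustrative):

```python
from torchvision import transforms

# Scale sketches to [-1, 1] so real images live in the same range as Tanh outputs
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((64, 64)),
    transforms.ToTensor(),                        # -> [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # -> [-1, 1]
])
```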


πŸ” Phase 2 β€” Resumed Training (train_2.py)

I resumed from epoch 15 using the same model and improved the training loop:

  • ✅ Generator LR: 1e-4, Discriminator LR: 2.5e-5 (sketched below)
  • ✅ Label smoothing added
  • ✅ Better visual logging (fixed noise, square grid)
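Here's roughly what that optimizer split looks like in code. The Adam betas are my assumption; only the learning rates come from the actual runs:

```python
import torch

def build_optimizers(generator, discriminator):
    # D learns slower than G (2.5e-5 vs 1e-4) so it doesn't overpower the game
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2.5e-5, betas=(0.5, 0.999))
    return opt_g, opt_d
```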

However:

  • Mode collapse wasn’t fully gone
  • Some classes never appeared
  • The GUI testing script failed due to a checkpoint mismatch

So… I managed to fix the training loop, but the damage from Phase 1 was still there.


🔬 Phase 3 – Improved Model (That Couldn't Train)

I built a much more powerful model using everything I’d learned:

  • ✅ z_dim increased from 100 → 512
  • ✅ Feature maps from 64 → 256
  • ✅ BCEWithLogitsLoss instead of BCELoss
  • ✅ LR scheduling + gradient clipping
  • ✅ Heavy augmentation pipeline

I even wrote an evaluation script:

```
python evaluate_diversity_and_control.py
```

Which tested:

  • same class + different noise ✅
  • same noise + different classes ✅ (both sketched below)

*Figure: same class + different noise*

*Figure: same noise + different classes*
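For reference, here's the shape of those two checks as I'd sketch them. The generator interface and sizes are assumptions; the real evaluate_diversity_and_control.py may differ:

```python
import torch

def diversity_and_control(generator, z_dim=512, n=8):
    # 1) Same class, different noise: outputs should vary within one class
    z = torch.randn(n, z_dim)
    same_class = torch.full((n,), 3, dtype=torch.long)   # arbitrary class id
    within_class = generator(z, same_class)

    # 2) Same noise, different classes: class identity should switch cleanly
    z0 = torch.randn(1, z_dim).repeat(n, 1)
    classes = torch.arange(n)
    across_classes = generator(z0, classes)
    return within_class, across_classes
```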


🧪 Phase 3 – Improvements Over Earlier Models

After experimenting with the basic Conditional GAN architecture in Phases 1 and 2, I realized that a more robust and expressive model was needed to truly capture the variability and structure in the Berlin Sketches dataset. So, I redesigned both the Generator and Discriminator.

Here are the key improvements:

πŸ” 1. Label Embedding

Before:

```python
nn.Embedding(num_classes, num_classes)
```

Improved:

```python
nn.Embedding(num_classes, z_dim * 2)
```
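One plausible way the bigger embedding feeds into the generator is plain concatenation with the noise vector; the actual improved_generator.py may combine them differently:

```python
import torch
import torch.nn as nn

z_dim, num_classes = 512, 250   # class count is illustrative
label_emb = nn.Embedding(num_classes, z_dim * 2)

noise = torch.randn(8, z_dim)
labels = torch.randint(0, num_classes, (8,))
g_in = torch.cat([noise, label_emb(labels)], dim=1)   # shape: (8, z_dim * 3)
```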

📈 2. Latent Vector Size (z_dim)

Before: z_dim = 100

Improved: z_dim = 512

πŸ—οΈ 3. Generator Architecture

Before: Basic ConvTranspose2d

Improved: Wider layers and better label conditioning
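As a rough picture of what "wider" means, here's a DCGAN-style trunk with 256 base feature maps. The exact depths and widths in improved_generator.py may differ, and the z_dim * 3 input assumes the concatenation sketched above:

```python
import torch.nn as nn

z_dim, ngf = 512, 256   # latent size and base feature maps

def up_block(in_ch, out_ch):
    # Each block doubles spatial resolution and halves the channel count
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

net = nn.Sequential(
    nn.ConvTranspose2d(z_dim * 3, ngf * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(inplace=True),
    up_block(ngf * 8, ngf * 4),                       # 4x4   -> 8x8
    up_block(ngf * 4, ngf * 2),                       # 8x8   -> 16x16
    up_block(ngf * 2, ngf),                           # 16x16 -> 32x32
    nn.ConvTranspose2d(ngf, 1, 4, 2, 1, bias=False),  # 32x32 -> 64x64
    nn.Tanh(),                                        # outputs in [-1, 1]
)
```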

🧠 4. Discriminator Architecture

Before: Image + label concat

Improved: Projection discriminator:

```python
proj = torch.sum(features * label_embedding, dim=1)
return out + proj
```
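Those two lines are the heart of the projection discriminator from Miyato & Koyama's "cGANs with Projection Discriminator". A minimal self-contained head; the layer names and the pooled-feature interface are illustrative:

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.linear = nn.Linear(feat_dim, 1)             # unconditional logit
        self.embed = nn.Embedding(num_classes, feat_dim)

    def forward(self, features, labels):
        # features: (B, feat_dim), pooled from the conv trunk
        out = self.linear(features).squeeze(1)            # (B,)
        proj = torch.sum(features * self.embed(labels), dim=1)
        return out + proj                                 # conditional logit
```

The appeal over plain concatenation is that the label enters as an inner product with the learned features, instead of as hundreds of extra input channels.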

🎯 5. Loss Function

Before: BCELoss

Improved: BCEWithLogitsLoss
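The switch matters because BCEWithLogitsLoss fuses the sigmoid into the loss, which is numerically stabler than a final nn.Sigmoid() followed by BCELoss. Minimal usage; the tensors are stand-ins:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()       # expects raw logits, no nn.Sigmoid()
logits = torch.randn(8)                  # stand-in discriminator outputs
targets = torch.full_like(logits, 0.9)   # smoothed "real" targets
loss = criterion(logits, targets)
```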

🧹 6. Training Strategy

  • LR Scheduler
  • Gradient Clipping
  • Label Smoothing
  • Real Image Noise Injection
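Wired together, a discriminator step with these stabilizers might look like this; the clip norm, noise scale, and smoothing value are illustrative, not the repo's exact numbers:

```python
import torch

def stabilized_d_step(discriminator, opt_d, criterion, real_images, labels):
    # Real-image noise injection: stops D from memorizing the training set
    noisy_real = real_images + 0.05 * torch.randn_like(real_images)
    real_targets = torch.full((real_images.size(0),), 0.9)  # smoothed labels
    loss = criterion(discriminator(noisy_real, labels), real_targets)

    opt_d.zero_grad()
    loss.backward()
    # Gradient clipping: after backward(), before step()
    torch.nn.utils.clip_grad_norm_(discriminator.parameters(), max_norm=1.0)
    opt_d.step()
    return loss

# The LR scheduler steps once per epoch, outside the batch loop, e.g.:
# scheduler_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=10, gamma=0.5)
```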

πŸ§ͺ 7. Evaluation & Debugging

  • evaluate_diversity_and_control.py
  • Class conditioning and noise variation tests

🎨 8. Data Augmentation

Balanced mix via data_augment.py:

  • Rotation, Affine, Jitter, Erasing, Perspective
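Here's what a balanced mix can look like with torchvision; these exact parameters are my guesses, not the values in data_augment.py:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.3),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
    # RandomErasing operates on tensors, so it comes after ToTensor()
    transforms.RandomErasing(p=0.2, scale=(0.01, 0.05)),
])
```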

Summary Table

| Component | Before (Phases 1–2) | Phase 3 Upgrade |
| --- | --- | --- |
| z_dim | 100 | 512 |
| Label embedding | size = num_classes | size = z_dim * 2 |
| Discriminator | Simple concat + conv | Projection discriminator |
| Loss | BCELoss | BCEWithLogitsLoss |
| LR strategy | Fixed | Scheduler + clipping |
| Evaluation | Manual visuals | Automated test script |
| Augmentation | Weak or aggressive | Balanced + structured |

💻 But My Computer Said No

Training this monster model didn’t work. At all.

  • The augmented dataset was huge
  • The model was too large for my 6 GB GPU
  • Training crashed with OOM errors
  • On CPU, a single batch took 7+ minutes…

So Phase 3 ended before it even began.


🧠 What I Learned

  • Don’t trust GAN losses; use visuals and consistency checks
  • Mode collapse is silent but deadly
  • Label conditioning needs proper embedding and architecture
  • Augmentation should be balanced
  • Training stability > model size

πŸ“Œ Final Words

This project may not have "succeeded", but it taught me more than any finished one. I now understand:

  • GAN architecture design
  • Training dynamics
  • Failure modes (like collapse and instability)
  • Checkpoint compatibility issues

And most importantly, I know what not to do next time :)

"One day, when I finally complete this project, I’ll come back to this blog post and smile."


📊 Bonus: Loss Graph & Visual Timeline

Alongside all the model rewrites and retraining attempts, I kept track of two crucial things:

  1. Loss Graphs from Phase 1
  2. Visual Timelines showing generator output evolution

📉 Generator Loss Over Time (Phase 1)

In early training, the Generator loss steadily decreased, which looked promising at first. But...

*Figure: Generator loss curve from Phase 1*

It turned out that low loss didn’t mean high-quality results. Visuals were repetitive and often white blobs. Classic mode collapse in disguise.


πŸ–ΌοΈ Visual Timeline of Generator Outputs

Here’s how the generator’s output changed across epochs (Phase 1 and 2):

*Figure: Phase 1 output timeline*

*Figure: Phase 2 output timeline*

Each row represents a class, and each column an epoch.

Some classes improved for a while, others vanished. It made the case for better label conditioning and model rebalancing.


📂 GitHub Repository and Dataset

All the code for data loading, training, and the models is available at:

🔗 GitHub: CGAN Project Repository

🔗 Dataset

These artifacts helped me spot early signs of instability, and they will absolutely shape how I train GANs in the future.

"Logs, visuals, and graphs β€” your three best friends in GAN debugging."
