"GANs either learn to create art β or break your patience."
## 💡 Project Idea
Hey there! In this blog post, I'll share my rollercoaster journey of building a Conditional GAN (cGAN) to generate black-and-white hand-drawn objects using the Berlin Sketches dataset.

The idea was simple:

> "Give a label, get a drawing."

But of course… it didn't go as smoothly as expected :)
## 🧱 Models and Training Scripts Used
Throughout the project, I used three different architectures and training strategies, each in its own phase:
### 1. Classic cGAN (Basic Setup)

- Model files: `generator.py`, `discriminator.py`
- Training script: `train.py`
```python
# Generator (classic)
self.label_emb = nn.Embedding(num_classes, num_classes)
x = torch.cat([noise, label_embedding], dim=1)

# Discriminator (classic)
self.label_embedding = nn.Embedding(num_classes, num_classes)
x = torch.cat([images, label_embedding.expand(...)], dim=1)
```
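To make the classic conditioning concrete, here is a minimal sketch of how those pieces fit together. The batch size and the `num_classes = 250` / `z_dim = 100` values are illustrative assumptions, not the exact contents of `generator.py`:

```python
import torch
import torch.nn as nn

num_classes, z_dim = 250, 100                     # hypothetical sizes
label_emb = nn.Embedding(num_classes, num_classes)

noise = torch.randn(8, z_dim)                     # a batch of 8 latent vectors
labels = torch.randint(0, num_classes, (8,))      # one class id per sample
label_embedding = label_emb(labels)               # shape: (8, num_classes)

# Classic conditioning: simply concatenate noise and label embedding
x = torch.cat([noise, label_embedding], dim=1)    # shape: (8, z_dim + num_classes)
print(x.shape)                                    # torch.Size([8, 350])
```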
### 2. Improved Training

- Same model, but a better training loop
- Script: `train_2.py`
- ✅ Learning-rate tweaks
- ✅ Label smoothing
- ✅ Fixed-noise evaluation
```python
real_targets = torch.full_like(real_preds, 0.9)  # label smoothing
loss_d = (loss_d_real + loss_d_fake) / 2
```
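Wrapped into a function, the whole discriminator update looks roughly like this; `discriminator`, `opt_d`, and the image/label tensors are placeholders passed in from the training loop:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # the classic discriminator ends in Sigmoid

def discriminator_step(discriminator, opt_d, real_images, fake_images, labels):
    """One D update with label smoothing, mirroring the recipe above."""
    real_preds = discriminator(real_images, labels)
    fake_preds = discriminator(fake_images.detach(), labels)  # don't backprop into G

    # Smoothed real targets (0.9 instead of 1.0) keep D from getting overconfident
    real_targets = torch.full_like(real_preds, 0.9)
    fake_targets = torch.zeros_like(fake_preds)

    loss_d_real = criterion(real_preds, real_targets)
    loss_d_fake = criterion(fake_preds, fake_targets)
    loss_d = (loss_d_real + loss_d_fake) / 2

    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_d.item()
```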
### 3. Upgraded Architecture (Powerful, but Failed to Train)

- Model files: `improved_generator.py`, `projection_discriminator.py`
- Training scripts: new `train.py` and `train_2.py` for the improved models
- ✅ Z-dim: 512
- ✅ Feature maps: 256
- ✅ `BCEWithLogitsLoss`, LR scheduler, gradient clipping
- ✅ Extensive data augmentation using `data_augment.py`
```python
# Generator (improved)
self.label_emb = nn.Embedding(num_classes, z_dim * 2)

# Discriminator (projection)
proj = torch.sum(features * label_embedding, dim=1)
return out + proj
```
## ⚠️ Phase 1: The First Attempt (`train.py`)
Everything was built from scratch, and many rookie mistakes followed:
| Category | Mistake | Explanation |
| --- | --- | --- |
| Normalization | Didn't normalize to [-1, 1] | Images didn't match the generator's Tanh output range |
| Embedding | Used `num_classes` as the embedding dim | Inefficient and inflexible |
| Concat shape | Shape mismatch | Needed `unsqueeze` calls |
| Loss monitoring | Relied only on G loss | The visuals were bad anyway |
| Mode collapse | Detected too late | All outputs turned white |
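The normalization fix from the first row is cheap. Something like this torchvision pipeline does it, assuming the sketches load as grayscale PIL images:

```python
from torchvision import transforms

# Scale sketches to [-1, 1] so they match the generator's Tanh output range
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),                         # [0, 255] -> [0.0, 1.0]
    transforms.Normalize(mean=[0.5], std=[0.5]),   # [0, 1]   -> [-1, 1]
])
```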
Each epoch took around 25-30 minutes on CPU, and although the losses were decreasing, the results weren't improving.
## 🔁 Phase 2: Resumed Training (`train_2.py`)
I resumed from epoch 15 using the same model and improved the training loop:
- ✅ Generator LR: `1e-4`, Discriminator LR: `2.5e-5`
- ✅ Label smoothing added
- ✅ Better visual logging (fixed noise, square grid; see the sketch below)
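Fixed-noise evaluation just means sampling one noise/label batch up front and re-rendering it after every epoch, so the grids stay comparable over time. Roughly like this (the sizes and the `samples/` directory are illustrative):

```python
import os
import torch
from torchvision.utils import save_image

num_classes, z_dim, per_class = 10, 100, 8         # hypothetical sizes
fixed_noise = torch.randn(num_classes * per_class, z_dim)
fixed_labels = torch.arange(num_classes).repeat_interleave(per_class)

@torch.no_grad()
def log_samples(generator, epoch, out_dir="samples"):
    """Render the same fixed batch each epoch: one row per class, one grid per epoch."""
    os.makedirs(out_dir, exist_ok=True)
    generator.eval()
    fakes = generator(fixed_noise, fixed_labels)
    # De-normalize from the Tanh range [-1, 1] back to [0, 1] before saving
    save_image(fakes * 0.5 + 0.5, f"{out_dir}/epoch_{epoch:03d}.png", nrow=per_class)
    generator.train()
```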
However:

- Mode collapse wasn't fully gone
- Some classes never appeared
- The GUI testing script failed due to a checkpoint mismatch

So… I managed to fix the training loop, but the damage from Phase 1 was still there.
## 🔬 Phase 3: Improved Model (That Couldn't Train)

I built a much more powerful model using everything I'd learned:
- ✅ Z-dim increased from `100` → `512`
- ✅ Feature maps from `64` → `256`
- ✅ `BCEWithLogitsLoss` instead of `BCELoss`
- ✅ LR scheduling + gradient clipping
- ✅ Heavy augmentation pipeline
I even wrote an evaluation script:

```bash
python evaluate_diversity_and_control.py
```

Which tested:

- Same class + different noise ✅
- Same noise + different classes ✅
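I won't paste the whole script here, but the two checks boil down to something like the sketch below. The std-based scores are a simple proxy I'm using for illustration, not necessarily what `evaluate_diversity_and_control.py` computes:

```python
import torch

@torch.no_grad()
def diversity_and_control_checks(generator, z_dim, num_classes, n=16):
    """Two sanity checks: noise should change the output, labels should control it."""
    # 1) Same class + different noise -> outputs should differ (diversity)
    labels = torch.zeros(n, dtype=torch.long)            # one fixed class
    samples = generator(torch.randn(n, z_dim), labels)
    diversity = samples.flatten(1).std(dim=0).mean().item()

    # 2) Same noise + different classes -> outputs should differ (control)
    z = torch.randn(1, z_dim).repeat(num_classes, 1)     # one fixed noise vector
    per_class = generator(z, torch.arange(num_classes))
    control = per_class.flatten(1).std(dim=0).mean().item()

    print(f"diversity (same class): {diversity:.4f} | control (same noise): {control:.4f}")
```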
## 🧪 Phase 3: Improvements Over Earlier Models
After experimenting with the basic Conditional GAN architecture in Phases 1 and 2, I realized that a more robust and expressive model was needed to truly capture the variability and structure in the Berlin Sketches dataset. So, I redesigned both the Generator and Discriminator.
Here are the key improvements:
### 📌 1. Label Embedding

- Before: `nn.Embedding(num_classes, num_classes)`
- Improved: `nn.Embedding(num_classes, z_dim * 2)`
### 📏 2. Latent Vector Size (z_dim)

- Before: `z_dim = 100`
- Improved: `z_dim = 512`
### 🏗️ 3. Generator Architecture

- Before: a basic `ConvTranspose2d` stack
- Improved: wider layers and better label conditioning
### 🧠 4. Discriminator Architecture

- Before: image + label concatenation
- Improved: a projection discriminator:

```python
proj = torch.sum(features * label_embedding, dim=1)
return out + proj
```
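Those two lines only make sense inside the full forward pass, so here is a compact sketch of the projection idea for 64x64 grayscale inputs. The layer sizes are illustrative; `projection_discriminator.py` differs in detail:

```python
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    """Sketch of a projection discriminator: unconditional score + label projection."""
    def __init__(self, num_classes, feat=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, feat // 4, 4, 2, 1), nn.LeakyReLU(0.2),          # 64 -> 32
            nn.Conv2d(feat // 4, feat // 2, 4, 2, 1), nn.LeakyReLU(0.2),  # 32 -> 16
            nn.Conv2d(feat // 2, feat, 4, 2, 1), nn.LeakyReLU(0.2),       # 16 -> 8
            nn.AdaptiveAvgPool2d(1),                                      # -> (B, feat, 1, 1)
        )
        self.fc = nn.Linear(feat, 1)                   # unconditional realness score
        self.embed = nn.Embedding(num_classes, feat)   # label embedding for the projection

    def forward(self, images, labels):
        features = self.conv(images).flatten(1)        # (B, feat)
        out = self.fc(features).squeeze(1)             # (B,)
        # Projection: inner product between image features and label embedding
        proj = torch.sum(features * self.embed(labels), dim=1)
        return out + proj                              # raw logits, no Sigmoid
```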
### 🎯 5. Loss Function

- Before: `BCELoss`
- Improved: `BCEWithLogitsLoss`
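The practical change: drop the final `nn.Sigmoid()` from the discriminator and feed raw logits to the fused loss, which handles the sigmoid internally in a numerically stable way:

```python
import torch
import torch.nn as nn

logits = torch.randn(8)                  # raw discriminator outputs, no Sigmoid layer
targets = torch.full_like(logits, 0.9)   # smoothed "real" labels

# Fuses sigmoid + binary cross-entropy in one numerically stable op
loss = nn.BCEWithLogitsLoss()(logits, targets)
```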
### 🧹 6. Training Strategy

- LR scheduler
- Gradient clipping
- Label smoothing
- Real-image noise injection (all four are sketched below)
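Here is how those four pieces can slot together. The Adam betas, `StepLR` schedule, noise sigma, and clipping norm below are illustrative assumptions, not the exact values from my scripts:

```python
import torch
from torch.optim.lr_scheduler import StepLR

def make_optimizers(generator, discriminator):
    """Optimizers plus LR schedulers (step sizes and gammas are assumptions)."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2.5e-5, betas=(0.5, 0.999))
    sched_g = StepLR(opt_g, step_size=10, gamma=0.5)   # halve the LR every 10 epochs
    sched_d = StepLR(opt_d, step_size=10, gamma=0.5)
    return opt_g, opt_d, sched_g, sched_d

def inject_noise(real_images, sigma=0.05):
    """Add a little Gaussian noise to real images so D can't just memorize them."""
    return real_images + sigma * torch.randn_like(real_images)

# Inside the training loop, after loss_d.backward():
#     torch.nn.utils.clip_grad_norm_(discriminator.parameters(), max_norm=1.0)
#     opt_d.step()
# And once per epoch:
#     sched_g.step(); sched_d.step()
```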
### 🧪 7. Evaluation & Debugging

- `evaluate_diversity_and_control.py`
- Class-conditioning and noise-variation tests
### 🎨 8. Data Augmentation

A balanced mix via `data_augment.py` (see the sketch below):

- Rotation, affine, jitter, erasing, perspective
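For reference, a pipeline with all five ops might look like this; the magnitudes are illustrative guesses, and the balanced values live in `data_augment.py`:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.08)),  # tensor-only op, so it goes last
])
```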
### Summary Table

| Component | Before (Phases 1-2) | Phase 3 Upgrade |
| --- | --- | --- |
| `z_dim` | 100 | 512 |
| Label embed | size = `num_classes` | size = `z_dim * 2` |
| Discriminator | Simple concat + conv | Projection discriminator |
| Loss | `BCELoss` | `BCEWithLogitsLoss` |
| LR strategy | Fixed | Scheduler + clipping |
| Evaluation | Manual visuals | Automated test script |
| Augmentation | Weak or aggressive | Balanced + structured |
## 💻 But My Computer Said No

Training this monster model didn't work. At all.
- The augmented dataset was huge
- The model was too large for my 6GB GPU
- Training crashed with OOM errors
- On CPU, one batch took 7+ minutes…
So Phase 3 ended before it even began.
## 🧠 What I Learned
- Don't trust GAN losses; use visuals and consistency checks
- Mode collapse is silent but deadly
- Label conditioning needs a proper embedding and architecture
- Augmentation should be balanced
- Training stability > model size
## 🏁 Final Words
This project may not have "succeeded", but it taught me more than any finished one. I now understand:

- GAN architecture design
- Training dynamics
- Failure modes (like collapse and instability)
- Checkpoint compatibility issues

And most importantly: I know what not to do next time :)

> "One day, when I finally complete this project, I'll come back to this blog post and smile."
## 🎁 Bonus: Loss Graph & Visual Timeline
Alongside all the model rewrites and retraining attempts, I kept track of two crucial things:

- Loss graphs from Phase 1
- Visual timelines showing generator output evolution
### 📉 Generator Loss Over Time (Phase 1)

In early training, the Generator loss steadily decreased, which looked promising at first. But...

It turned out that low loss didn't mean high-quality results. The visuals were repetitive and often just white blobs. Classic mode collapse in disguise.
### 🖼️ Visual Timeline of Generator Outputs

Here's how the generator's output changed across epochs (Phases 1 and 2). Each row represents a class, and each column an epoch.

Some classes improved for a while; others vanished. It made the case for better label conditioning and model rebalancing.
## 🔗 GitHub Repository and Dataset

All the code used for data loading, training, and the models is available at:

👉 GitHub: CGAN Project Repository

👉 Dataset
These artifacts helped me spot early signs of instability, and they will absolutely shape how I train GANs in the future.

> "Logs, visuals, and graphs: your three best friends in GAN debugging."