Gruhesh Sri Sai Karthik Kurra

Posted on Jul 14

Demystifying Diffusion Models: Building DDPM from Scratch with PyTorch

#pytorch #llm #deeplearning #ai

Demystifying Diffusion Models: Building DDPM from Scratch with PyTorch

Diffusion models have taken the AI world by storm! From DALL-E 2 to Stable Diffusion, these models are behind the most impressive image generators we see today. But how do they actually work? Today, I'll walk you through building a complete Denoising Diffusion Probabilistic Model (DDPM) from scratch, demystifying the mathematics and implementation behind this revolutionary technology.

🌊 What Makes Diffusion Models Special?

Unlike GANs that learn through adversarial training, diffusion models use a surprisingly intuitive approach:

Forward Process: Gradually add noise to data until it becomes pure random noise
Reverse Process: Train a neural network to remove noise step by step
Generation: Start with noise and iteratively denoise to create new data

Think of it like learning to clean a dirty window - but in reverse! We first learn how windows get dirty, then master the art of cleaning them.

🧮 The Mathematics Made Simple

Forward Process: Destroying Data

The forward process follows a Markov chain that gradually adds Gaussian noise:

q(x_t | x_{t-1}) = N(x_t; √(1-β_t) x_{t-1}, β_t I)

But here's the magic - we can jump directly to any timestep using the reparameterization trick:

x_t = √ᾱ_t x_0 + √(1-ᾱ_t) ε

Where:

x_0 = original clean data
x_t = data at timestep t
ᾱ_t = cumulative noise schedule
ε = random Gaussian noise

Reverse Process: Creating Data

The neural network learns to predict the noise that was added:

# The network predicts: ε_θ(x_t, t)
# We then compute the denoised sample:
μ_θ(x_t, t) = (x_t - (β_t / √(1-ᾱ_t)) * ε_θ(x_t, t)) / √α_t

Loss Function: Simple and Elegant

We train by minimizing the noise prediction error:

L = E[||ε - ε_θ(x_t, t)||²]

That's it! No discriminator, no adversarial dynamics - just predict the noise!

🏗️ Implementation Architecture

Our implementation features a clean, modular design:

Noise Predictor Network

class NoisePredictor(nn.Module):
    def __init__(self, data_dim=2, hidden_dim=256, time_embed_dim=64):
        super(NoisePredictor, self).__init__()

        # Time embedding: Convert timestep to rich representation
        self.time_mlp = nn.Sequential(
            nn.Linear(1, time_embed_dim),
            nn.SiLU(),  # Smooth activation works great for diffusion
            nn.Linear(time_embed_dim, time_embed_dim),
            nn.SiLU(),
            nn.Linear(time_embed_dim, time_embed_dim)
        )

        # Main network: Predict noise from data + time
        self.main_mlp = nn.Sequential(
            nn.Linear(data_dim + time_embed_dim, hidden_dim),
            nn.SiLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, data_dim)  # Output: predicted noise
        )

    def forward(self, x, t):
        batch_size = x.shape[0]
        t_normalized = t.float() / 1000.0
        t_embed = self.time_mlp(t_normalized.view(-1, 1))
        x_t_concat = torch.cat([x, t_embed], dim=1)
        noise_pred = self.main_mlp(x_t_concat)
        return noise_pred

Complete Diffusion Model

class DiffusionModel:
    def __init__(self, T=1000, beta_start=0.0001, beta_end=0.02):
        self.T = T

        # Noise schedule: β increases linearly
        self.beta = torch.linspace(beta_start, beta_end, T)
        self.alpha = 1. - self.beta
        self.alpha_bar = torch.cumprod(self.alpha, dim=0)

        self.model = NoisePredictor()
        self.optimizer = torch.optim.AdamW(self.model.parameters())

    def forward_process(self, x0, t):
        """Add noise to clean data using reparameterization trick"""
        epsilon = torch.randn_like(x0)
        sqrt_alpha_bar = torch.sqrt(self.alpha_bar[t]).view(-1, 1)
        sqrt_one_minus_alpha_bar = torch.sqrt(1 - self.alpha_bar[t]).view(-1, 1)

        # Direct sampling at timestep t
        xt = sqrt_alpha_bar * x0 + sqrt_one_minus_alpha_bar * epsilon
        return xt, epsilon

    def sample(self, n_samples=100):
        """Generate new samples starting from pure noise"""
        x = torch.randn(n_samples, 2)  # Start with pure noise

        # Iteratively denoise over T steps
        for t in reversed(range(self.T)):
            epsilon_pred = self.model(x, torch.tensor([t]).float())

            # Compute denoised sample
            alpha_t = self.alpha[t]
            beta_t = self.beta[t]
            sqrt_one_minus_alpha_bar = torch.sqrt(1 - self.alpha_bar[t])

            # Denoising step
            mu = (x - (beta_t / sqrt_one_minus_alpha_bar) * epsilon_pred) / torch.sqrt(alpha_t)

            if t > 0:
                x = mu + torch.sqrt(beta_t) * torch.randn_like(x)
            else:
                x = mu

        return x

📊 Training Results & Visualizations

Our implementation produces impressive results on 2D datasets:

Training Curves

Generated Samples

Key Performance Metrics:

Model Size: 1.8MB (130K parameters)
Training Time: ~30 minutes on GPU
Memory Usage: <500MB GPU memory
Convergence: Stable training without mode collapse

🔑 Key Implementation Insights

1. Time Embedding is Crucial

The timestep embedding allows the network to understand "how much noise to expect":

# Normalize timestep and create rich embedding
t_normalized = t.float() / 1000.0
t_embed = self.time_mlp(t_normalized.view(-1, 1))

2. SiLU Activation Works Best

We found SiLU (Swish) activation consistently outperforms ReLU for diffusion models:

nn.SiLU()  # x * sigmoid(x) - smooth and works great!

3. Beta Schedule Matters

Linear beta schedule from 0.0001 to 0.02 provides good balance:

self.beta = torch.linspace(0.0001, 0.02, T)

4. Dropout Prevents Overfitting

Even with simple 2D data, dropout helps generalization:

nn.Dropout(0.1)  # Light dropout is sufficient

🚀 Why This Implementation Rocks

📚 Educational Value

Complete Mathematical Derivations: From theory to code
Step-by-Step Explanations: Understand every component
Visual Learning: Rich plots and animations
Progressive Complexity: Build understanding gradually

🛠️ Production Features

Modular Design: Easy to extend and modify
Comprehensive Logging: Track everything during training
Rich Visualizations: Monitor training in real-time
Clean Code: Well-documented and maintainable

🔬 Research Ready

Extensible Architecture: Add new features easily
Multiple Schedules: Support for different noise schedules
Flexible Sampling: Various generation strategies
Detailed Analytics: Deep insight into model behavior

🎯 Real-World Applications

This foundational implementation opens doors to:

🖼️ Image Generation

Extend to pixel-based image synthesis
Add conditional generation with text guidance
Implement inpainting and outpainting

🎵 Audio Synthesis

Apply to waveforms or spectrograms
Music generation and speech synthesis
Audio restoration and enhancement

🧬 Scientific Applications

Molecular structure generation
Drug discovery and materials science
Climate modeling and simulation

🤖 AI Research

Foundation for more complex architectures
Understanding generative modeling principles
Basis for novel research directions

🔮 Advanced Extensions

Ready to take it further? Here are exciting directions:

# DDIM: Faster sampling with deterministic steps
def ddim_sample(self, n_samples, eta=0.0):
    # Deterministic sampling for faster generation
    pass

# Conditional Generation: Text-guided creation
def conditional_sample(self, text_condition):
    # Guide generation with text embeddings
    pass

# Classifier-Free Guidance: Better controllability
def cfg_sample(self, guidance_scale=7.5):
    # Enhanced conditional generation
    pass

💡 Key Takeaways

Diffusion models are mathematically elegant - based on simple Gaussian processes
Training is remarkably stable - no adversarial dynamics like GANs
Quality is exceptional - can generate highly realistic samples
Implementation is accessible - complex theory, simple code
Applications are vast - from art to science to entertainment

🚀 Try It Yourself!

Ready to dive in? Here's how to get started:

GitHub Repository

git clone https://github.com/GruheshKurra/DiffusionModelFromScratch.git
cd DiffusionModelFromScratch
pip install torch matplotlib numpy seaborn pandas tqdm
jupyter notebook "Diffusion Models.ipynb"

Hugging Face Model Hub

🤗 karthik-2905/DiffusionModelFromScratch

Explore the pre-trained model, detailed documentation, and interactive examples!

🌟 What's Next?

This implementation provides a solid foundation for:

Understanding diffusion model theory
Building more complex architectures
Exploring novel research directions
Creating your own generative AI applications

The future of generative AI is bright, and diffusion models are leading the charge. Whether you're building the next DALL-E or exploring new scientific applications, understanding these fundamentals will serve you well.

🤝 Connect & Contribute

Found this helpful? Let's build the future of AI together!

🐙 GitHub: GruheshKurra
🤗 Hugging Face: karthik-2905

Have questions, suggestions, or want to contribute? Open an issue or submit a PR!

Happy diffusing, and may your samples be high-quality and diverse! 🌊✨

DEV Community

Demystifying Diffusion Models: Building DDPM from Scratch with PyTorch

Demystifying Diffusion Models: Building DDPM from Scratch with PyTorch

🌊 What Makes Diffusion Models Special?

🧮 The Mathematics Made Simple

Forward Process: Destroying Data

Reverse Process: Creating Data

Loss Function: Simple and Elegant

🏗️ Implementation Architecture

Noise Predictor Network

Complete Diffusion Model

📊 Training Results & Visualizations

Training Curves

Generated Samples

Key Performance Metrics:

🔑 Key Implementation Insights

1. Time Embedding is Crucial

2. SiLU Activation Works Best

3. Beta Schedule Matters

4. Dropout Prevents Overfitting

🚀 Why This Implementation Rocks

📚 Educational Value

🛠️ Production Features

🔬 Research Ready

🎯 Real-World Applications

🖼️ Image Generation

🎵 Audio Synthesis

🧬 Scientific Applications

🤖 AI Research

🔮 Advanced Extensions

💡 Key Takeaways

🚀 Try It Yourself!

GitHub Repository

Hugging Face Model Hub

🌟 What's Next?

🤝 Connect & Contribute

Top comments (0)