The Forward Process: Adding Noise Until Nothing Remains
Diffusion models work by destroying data through gradual noise injection, then learning to reverse that destruction. That's it. No adversarial training, no mode collapse nightmares, no discriminator to babysit. Just a deterministic noising schedule and a neural network that learns to denoise.
The math is surprisingly elegant once you stop trying to understand it from VAE analogies.
The forward diffusion process takes an image $x_0$ and progressively adds Gaussian noise over $T$ timesteps until you're left with pure noise $x_T \sim \mathcal{N}(0, I)$. Each step follows:
$$x_t = \sqrt{1 - \beta_t} \cdot x_{t-1} + \sqrt{\beta_t} \cdot \epsilon_t$$
where $\beta_t$ is a small variance schedule (typically 0.0001 to 0.02) and $\epsilon_t \sim \mathcal{N}(0, I)$. The beauty is you can sample $x_t$ directly from $x_0$ without computing all intermediate steps:
$$x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$$
where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. This closed-form sampling is what makes training efficient — you can jump to any timestep in constant time.
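As a quick sanity check (my own sketch, not from the article), you can verify that chaining the one-step update multiplies the coefficient on $x_0$ by $\sqrt{\alpha_t}$ at each step, so after $t$ steps it lands exactly on the closed-form coefficient $\sqrt{\bar{\alpha}_t}$:

```python
import torch

# Linear beta schedule from the text: 1e-4 -> 0.02 over 1000 steps
betas = torch.linspace(1e-4, 0.02, 1000, dtype=torch.float64)
alphas = 1.0 - betas

# Chain the one-step updates: each step scales the x_0 coefficient by sqrt(alpha_t)
coef = torch.ones((), dtype=torch.float64)
for a in alphas:
    coef = coef * torch.sqrt(a)

# Closed-form coefficient: sqrt of the cumulative product alpha_bar_T
alpha_bar_T = torch.prod(alphas)
print(torch.allclose(coef, torch.sqrt(alpha_bar_T)))  # True
```

The product of square roots equals the square root of the product, which is why the two routes agree; the noise terms combine the same way because independent Gaussians add in variance.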
```python
import torch
import torch.nn as nn
import numpy as np
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class DiffusionSchedule:
    def __init__(self, timesteps=1000, beta_start=1e-4, beta_end=0.02):
        self.timesteps = timesteps
        # Linear beta schedule and the derived quantities used in both
        # the stepwise update and the closed-form jump
        self.betas = torch.linspace(beta_start, beta_end, timesteps)
        self.alphas = 1.0 - self.betas
        self.alpha_bars = torch.cumprod(self.alphas, dim=0)
```
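With the schedule precomputed, the closed-form jump from the equations above is a one-liner. The helper below is a sketch of my own (the `q_sample` name and the standalone tensors are illustrative, not part of the article's class):

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)        # linear schedule, as above
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative product alpha_bar_t

def q_sample(x0, t, alpha_bars):
    """Jump straight from x_0 to x_t using the closed-form marginal."""
    eps = torch.randn_like(x0)
    ab = alpha_bars[t]
    return torch.sqrt(ab) * x0 + torch.sqrt(1.0 - ab) * eps

x0 = torch.randn(4, 3, 32, 32)   # stand-in for a batch of normalized images
x_mid = q_sample(x0, t=500, alpha_bars=alpha_bars)
x_end = q_sample(x0, t=999, alpha_bars=alpha_bars)
# alpha_bars[999] is tiny, so x_end is essentially pure Gaussian noise
```

Because `t` just indexes the precomputed tensor, sampling costs the same for `t=1` and `t=999` — the constant-time property that makes training efficient.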
---
*Continue reading the full article on [TildAlice](https://tildalice.io/diffusion-models-from-scratch-ddpm-pytorch/)*