William Peebles of UC Berkeley
Research using Transformers
Introduction
The Diffusion Transformer (DiT) is a diffusion model that replaces the dominant U-Net backbone with a transformer. Ho et al. first introduced the U-Net backbone for diffusion models.
History
U-Net
A CNN with an encoder-decoder structure and skip connections
The U-Net's inductive bias is not crucial to the performance of diffusion models
DiT adheres to the best practices of ViT (Vision Transformer)
U-net backbone → transformer
Diffusion transformers
Diffusion models are trained to learn the reverse process that inverts the forward-process corruptions
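The forward corruption can be sampled in closed form; a minimal numpy sketch (not the paper's implementation; the linear beta schedule below is an assumption from DDPM):

```python
import numpy as np

def forward_diffuse(x0, t, alphas_cumprod, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (DDPM forward process)."""
    eps = rng.standard_normal(x0.shape)              # Gaussian corruption noise
    a_bar = alphas_cumprod[t]                        # cumulative product of (1 - beta)
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, eps                                   # the network is trained to predict eps

# Toy linear noise schedule (DDPM-style, 1000 steps)
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
```

The reverse process is what the model learns: given `xt` and `t`, predict `eps` so the corruption can be undone step by step.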
Classifier-free guidance
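Classifier-free guidance combines a conditional and an unconditional noise prediction; a small sketch of the standard formula (function name is mine):

```python
import numpy as np

def cfg(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

With `w = 1` this reduces to the conditional prediction; `w > 1` pushes samples further toward the condition at some cost in diversity.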
Latent diffusion models
Patchify
The spatial input is converted into a sequence of tokens, as in ViT
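The patchify step can be sketched with pure reshapes; a minimal numpy version under the assumption of a channels-first `(C, H, W)` latent and patch size `p`:

```python
import numpy as np

def patchify(x, p):
    """(C, H, W) latent -> (num_tokens, p*p*C) token sequence, ViT-style."""
    C, H, W = x.shape
    assert H % p == 0 and W % p == 0, "spatial dims must be divisible by p"
    x = x.reshape(C, H // p, p, W // p, p)
    x = x.transpose(1, 3, 2, 4, 0)               # (H/p, W/p, p, p, C)
    return x.reshape((H // p) * (W // p), p * p * C)
```

For a 32x32x4 latent with `p = 2` this yields 256 tokens of dimension 16; halving `p` quadruples the token count (and the compute), which is why patch size matters so much below.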
In-context conditioning: append the conditioning embeddings as extra tokens in the input sequence
Adaptive layer norm (adaLN) block: adaptive normalization layers regress shift and scale parameters from the conditioning vector
adaLN-Zero block: zero-initialization makes each residual block the identity at the start of training
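The adaLN-Zero idea can be sketched for a single residual branch; a simplified numpy version (the real block modulates attention and MLP sub-layers separately, and `W`, `b` stand in for the conditioning MLP's final zero-initialized projection):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LayerNorm without learned affine parameters (adaLN supplies them)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adaln_zero(x, c, W, b):
    """adaLN-Zero residual branch: the conditioning vector c regresses a
    shift, a scale, and a gate alpha. Because W and b are zero-initialized,
    shift = scale = alpha = 0 at first, so the block starts as the identity."""
    d = x.shape[-1]
    params = c @ W + b                               # (3*d,) regressed from c
    shift, scale, alpha = params[:d], params[d:2 * d], params[2 * d:]
    h = layer_norm(x) * (1.0 + scale) + shift        # modulated normalization
    return x + alpha * h                             # gated residual branch
```

Zero-initializing the gate means gradients flow through an identity map early in training, which the paper found to be the best-performing conditioning variant.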
Experimental setup
Augmentation
horizontal flips
Transformer decoder
The output noise prediction has the same shape as the spatial input
The decoded tokens are rearranged back into the original spatial layout
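This rearrangement is the inverse of patchify; a minimal numpy sketch (the `C`, `H`, `W` arguments are assumptions matching a channels-first layout):

```python
import numpy as np

def unpatchify(tokens, p, C, H, W):
    """Invert patchify: (num_tokens, p*p*C) tokens -> (C, H, W) output."""
    x = tokens.reshape(H // p, W // p, p, p, C)
    x = x.transpose(4, 0, 2, 1, 3)               # (C, H/p, p, W/p, p)
    return x.reshape(C, H, W)
```

Since the final linear decoder emits `p*p*C` values per token, this reshape restores a noise prediction with exactly the input's spatial shape.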
Increasing transformer size and decreasing patch size both improve image quality
Diverse DiT block designs
Conclusion
A simple transformer-based backbone for diffusion models.
Future work: scaling DiT to larger models