Yu Xie, from BiliBili.
This method uses a Diffusion Transformer to enable OCR-free training.
It is lightweight and compatible with LoRA.
It looks as if the method just feeds text into a Diffusion Transformer. Is there more to it than that?
Related works
AnyText
Experiment
Model input
I_{scene} and I_{glyph} are concatenated channel-wise and fed into the VAE encoder.
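The sketch below shows what "channel-wise concatenation" means for these two inputs. It is a minimal, dependency-free illustration under stated assumptions, not the paper's code: the note does not give channel counts, resolution, or tensor layout, so it assumes channel-first images, with both I_{scene} and I_{glyph} as 3-channel 4 x 4 images. The VAE encoder itself is not modeled. The point is that the spatial size (H, W) must match and only the channel axis grows.

```python
from typing import List, Tuple

# An image is a list of channels; each channel is an H x W grid of floats (channel-first).
Channel = List[List[float]]
Image = List[Channel]


def shape(img: Image) -> Tuple[int, int, int]:
    """Return (channels, height, width) of a channel-first image."""
    return (len(img), len(img[0]), len(img[0][0]))


def concat_channels(a: Image, b: Image) -> Image:
    """Channel-wise concatenation: append b's channels after a's channels.

    Both images must have the same spatial size (H, W); only the channel count changes.
    """
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError("spatial sizes must match for channel-wise concatenation")
    return list(a) + list(b)


def blank_image(channels: int, height: int, width: int, value: float = 0.0) -> Image:
    """Build a constant-valued image with the given (C, H, W) shape."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Assumed sizes (not from the paper): two 3-channel 4 x 4 images.
i_scene = blank_image(3, 4, 4, value=0.5)
i_glyph = blank_image(3, 4, 4, value=1.0)

encoder_input = concat_channels(i_scene, i_glyph)
print(shape(encoder_input))  # (6, 4, 4)
```

In a real pipeline this step would typically be a single tensor operation along the channel axis, e.g. `torch.cat([scene, glyph], dim=1)` for NCHW tensors. How the paper adapts the VAE encoder to the larger channel count is not covered in this note.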
Comment
Reading this made me feel that I need to study Diffusion Transformers.