TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis

#ai #machinelearning #deeplearning #discuss

Yu Xie, from BiliBili.

Using a Diffusion Transformer (DiT), this method makes training OCR-free.
It is lightweight and can be adapted with LoRA.
It looks as though the text is simply being fed into the Diffusion Transformer, but is that really all there is to it?

Related works
AnyText

Experiment
Model input
I_{scene} and I_{glyph} are concatenated channel-wise and fed into the VAE encoder.
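
To make the input step concrete for myself, here is a minimal sketch of channel-wise concatenation in NumPy. It is only my reading of the note above, not the authors' code: the function name `concat_channelwise`, the (C, H, W) layout, and the 64x64 image size are all assumptions for illustration, and the VAE encoder itself is not shown.

```python
import numpy as np


def concat_channelwise(scene: np.ndarray, glyph: np.ndarray) -> np.ndarray:
    """Stack two (C, H, W) images along the channel axis (hypothetical helper)."""
    # Both images must share the same spatial size to be stacked channel-wise.
    if scene.shape[1:] != glyph.shape[1:]:
        raise ValueError("scene and glyph must share height and width")
    return np.concatenate([scene, glyph], axis=0)


# Assumed toy inputs: a 3-channel scene image and a 3-channel glyph image.
scene = np.zeros((3, 64, 64), dtype=np.float32)
glyph = np.ones((3, 64, 64), dtype=np.float32)

# The result has 3 + 3 = 6 channels and would be the tensor handed to the encoder.
vae_input = concat_channelwise(scene, glyph)
print(vae_input.shape)
```

The spatial size stays the same and only the channel count grows, so an encoder taking this input would need its first layer to accept the combined number of channels.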

Comment
This left me feeling that I really need to study Diffusion Transformers.
