Takara Taniguchi

TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis

Yu Xie, who is affiliated with BiliBili.

The method is built on a Diffusion Transformer (DiT) and needs no OCR model during training.
It is lightweight and can be adapted with LoRA.
At first glance it looks as if the text is simply being fed into a Diffusion Transformer, but is that really all there is to it?

Related works
AnyText

Experiment
Model input
I_{scene} and I_{glyph} are concatenated channel-wise and fed into the VAE encoder.
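
To make the channel-wise concatenation concrete, here is a minimal sketch in plain Python. It is not the paper's code: the helper names (`make_image`, `shape`, `concat_channels`), the channel-first (C, H, W) layout, and the 3-channel 4x4 images are all assumptions chosen for illustration, and the VAE encoder itself is left out.

```python
# Illustrative only: helper names, layout, and sizes are assumptions, not from the paper.
# Images are nested lists in channel-first (C, H, W) layout.

def make_image(channels, height, width, fill):
    """Build a constant-valued image as a nested list of shape (C, H, W)."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


def shape(image):
    """Return the (C, H, W) shape of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))


def concat_channels(scene, glyph):
    """Stack the channels of two images that share the same height and width."""
    if shape(scene)[1:] != shape(glyph)[1:]:
        raise ValueError("scene and glyph must share height and width")
    return scene + glyph  # channel lists of scene first, then those of glyph


i_scene = make_image(3, 4, 4, 0.5)  # stand-in for the scene image
i_glyph = make_image(3, 4, 4, 1.0)  # stand-in for the rendered glyph image
encoder_input = concat_channels(i_scene, i_glyph)
print(shape(encoder_input))  # (6, 4, 4)
```

The spatial size stays the same and only the channel count grows, so under these assumptions an encoder receiving this input would need to accept 6 input channels rather than 3.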

Comment
Reading this made me feel that I really need to study Diffusion Transformers.
