Furkan Gözükara
Single Block / Layer FLUX LoRA Training Research Results and LoRA Network Alpha Change Impact With LoRA Network Rank Dimension

Info

  • As you know, I have finalized and perfected my FLUX Fine Tuning and LoRA training workflows until something new arrives

  • Both are exactly the same; we only load the LoRA config into the LoRA tab of Kohya GUI and the Fine Tuning config into the Dreambooth tab

  • When we use Classification / Regularization images, Fine Tuning actually becomes Dreambooth training, as you know

  • However, with FLUX, Classification / Regularization images do not help, as I have shown previously with grid experimentations

  • FLUX LoRA training configs and details : https://www.patreon.com/posts/110879657

  • FLUX Fine Tuning configs and details : https://www.patreon.com/posts/112099700

  • So what is up with Single Block FLUX LoRA training?

  • The FLUX model is composed of 19 double blocks and 38 single blocks

  • 1 double block takes around 640 MB of VRAM and 1 single block around 320 MB of VRAM in 16-bit precision when doing a Fine Tuning training (see the quick arithmetic sketch after this list)

    • We have configs for 16 GB, 24 GB and 48 GB GPUs, all the same quality, only the speed is different

  • Normally we train a LoRA on all of the blocks

  • However it was claimed that you can train a single block and still get good results

  • So I have researched this thoroughly and I am sharing all the info in this article

  • Moreover, I decided to reduce the LoRA Network Rank (Dimension) of my workflow and test the impact of keeping the same Network Alpha or scaling it relatively
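
To put the per-block numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. The per-block sizes are the approximate values quoted in this article, not exact measurements.

```python
# Rough estimate of total FLUX transformer block weights in 16-bit precision,
# using the approximate per-block sizes quoted in this article.
DOUBLE_BLOCKS = 19      # FLUX.1 double (joint) blocks
SINGLE_BLOCKS = 38      # FLUX.1 single blocks
MB_PER_DOUBLE = 640     # ~MB per double block in 16-bit
MB_PER_SINGLE = 320     # ~MB per single block in 16-bit

total_mb = DOUBLE_BLOCKS * MB_PER_DOUBLE + SINGLE_BLOCKS * MB_PER_SINGLE
print(f"~{total_mb} MB (~{total_mb / 1024:.1f} GB) of block weights")
# -> 24320 MB, roughly 24 GB, consistent with a ~12B-parameter model stored in 16-bit
```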

Experimentation Details and Hardware

  • We are going to use Kohya GUI

  • Full tutorial on how to install it, use it and train with it here: https://youtu.be/nySGu12Y05k

  • Full tutorial for cloud services here: https://youtu.be/-uhL2nW7Ddw

  • I have used my classic 15-image experimentation dataset

  • I have trained for 150 epochs, thus 2250 steps (see the quick step-count calculation after this list)

  • All experiments were done on a single RTX A6000 48 GB GPU (almost the same speed as an RTX 3090)

  • In all experiments I have trained CLIP-L as well, except in Fine Tuning (you can't train it there yet)

  • I know the dataset doesn't have expressions, but that is not the point; you can see my 256-image training results with the exact same workflow here: https://www.reddit.com/r/StableDiffusion/comments/1ffwvpo/tried_expressions_with_flux_lora_training_with_my/

  • So I research a workflow, and when you use a better dataset you get even better results

  • I will give full links to the Figures, so click them to download and see them in full resolution

  • Figure 0 is first uploaded image and so on with numbers
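
For reference, here is how the step count above works out. Batch size 1 and 1 repeat per image are my assumptions, not stated explicitly in the article.

```python
# How the 2250 steps quoted above work out, assuming batch size 1 and
# 1 repeat per image (assumptions; check your own Kohya config).
images = 15
epochs = 150
repeats = 1
batch_size = 1

total_steps = images * repeats * epochs // batch_size
print(total_steps)  # 2250
```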

Research of 1-Block Training

  • I used my exact same settings and at first trained double blocks 0-7 and single blocks 0-15 individually, to determine whether the block number matters a lot, using the same learning rate as my full-layers LoRA training (a conceptual sketch of block-restricted LoRA targeting follows after this list)

  • The double blocks 0-7 results can be seen in Figure_0.jfif and the single blocks 0-15 results can be seen in Figure_1.jfif

  • I didn't notice a very meaningful difference, and the learning rate was also too low, as can be seen from the figures

  • But still, I picked single block 8 as the best one to expand the research

  • Then I trained 8 different learning rates on single block 8 and determined the best learning rate, as shown in Figure_2.jfif

  • It required more than 10 times the learning rate of regular all-blocks FLUX LoRA training

  • Then I decided to test combinations of different single blocks / layers and see their impact

  • As can be seen in Figure_3.jfif, I have tried combinations of 2-11 different layers

  • As the number of trained layers increased, it obviously required a newly fine-tuned learning rate

  • Thus I decided not to go any further at the moment, because single-layer training will obviously yield sub-par results and I don't see much benefit in it

  • In all cases: Full FLUX Fine Tuning > LoRA extraction from a full FLUX Fine Tuned model > full-layers LoRA training > reduced-layers FLUX LoRA training
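
For readers who want to experiment with the same idea, here is a minimal conceptual sketch of block-restricted LoRA targeting: filter candidate modules by their block index before attaching LoRA adapters. This is only an illustration of the idea, not Kohya GUI's actual implementation, and the module names are assumptions based on FLUX's double_blocks.N / single_blocks.N naming.

```python
import re

# Conceptual sketch (not Kohya's actual code): decide which layers get a LoRA
# adapter by filtering module names on their block index. Module names are
# assumed to follow FLUX's "double_blocks.N...." / "single_blocks.N...." pattern.
TRAIN_DOUBLE_BLOCKS = set()   # e.g. train no double blocks
TRAIN_SINGLE_BLOCKS = {8}     # e.g. only single block 8, as in Figure_2.jfif

BLOCK_RE = re.compile(r"^(double_blocks|single_blocks)\.(\d+)\.")

def should_get_lora(module_name: str) -> bool:
    """Return True if a LoRA adapter should be attached to this module."""
    m = BLOCK_RE.match(module_name)
    if m is None:
        return False          # not part of a transformer block
    kind, idx = m.group(1), int(m.group(2))
    blocks = TRAIN_DOUBLE_BLOCKS if kind == "double_blocks" else TRAIN_SINGLE_BLOCKS
    return idx in blocks

# Example module names mimicking FLUX's naming (assumed, for illustration only)
for name in ["double_blocks.0.img_attn.qkv", "single_blocks.8.linear1", "single_blocks.20.linear2"]:
    print(name, should_get_lora(name))
# -> False, True, False
```

Note that, as observed above, the fewer blocks you train, the higher the learning rate you need; single block 8 required more than 10 times the learning rate of the regular all-blocks LoRA.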

Research of Network Alpha Change

  • In my very best FLUX LoRA training workflow I use a LoRA Network Rank (Dimension) of 128

  • The impact of this is that the generated LoRA file sizes are bigger

  • It keeps more information but also causes more overfitting

  • So with some tradeoffs, this LoRA Network Rank (Dimension) can be reduced

  • I originally tuned my workflow with 128 Network Rank (Dimension) / 128 Network Alpha

  • The Network Alpha directly scales the weight updates (the effective scale is Network Alpha / Network Rank), thus changing it effectively changes the Learning Rate (see the worked scale calculation after this list)

  • We also know by now, from the above experiments and from the FLUX Full Fine Tuning experiments, that training more parameters requires a lower Learning Rate

  • So when we reduce the LoRA Network Rank (Dimension), what should we do to avoid having to re-tune the Learning Rate?

  • Here the Network Alpha comes into play

  • Should we scale it down or keep it as it is?

  • Thus I have experimented with LoRA Network Rank (Dimension) 16 / Network Alpha 16 and 16 / 128

  • So in one experiment I kept the Network Alpha as it is (128) and in another experiment I relatively scaled it down (16)

  • The results are shared in Figure_4.jpg
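
To make the Alpha question concrete: in the common LoRA formulation (which Kohya's scripts follow), the LoRA update is multiplied by Network Alpha / Network Rank. A minimal sketch of the three configurations discussed above:

```python
# Effective LoRA output scale = Network Alpha / Network Rank (standard LoRA convention).
configs = [
    ("rank 128 / alpha 128 (original workflow)", 128, 128),
    ("rank  16 / alpha  16 (alpha scaled down)",  16,  16),
    ("rank  16 / alpha 128 (alpha kept as is)",   16, 128),
]

for label, rank, alpha in configs:
    print(f"{label}: scale = {alpha / rank:g}x")
# -> 1x, 1x, 8x
```

So keeping the Alpha at 128 while dropping the Rank to 16 boosts the effective update strength about 8x, which lines up with the observation that fewer trained parameters need a higher effective Learning Rate.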

Conclusions

  • As expected, when you train fewer parameters, e.g. LoRA vs Full Fine Tuning or single-block LoRA vs all-blocks LoRA, your quality gets reduced

  • Of course you gain some extra VRAM savings and also a smaller file size on disk

  • Moreover, fewer parameters reduce the overfitting and the realism of the FLUX model, so if you are into stylized outputs like comics, it may work better

  • Furthermore, when you reduce the LoRA Network Rank, keep the original Network Alpha unless you are going to do new Learning Rate research

  • Finally, the very best and least overfitted results are achieved with full Fine Tuning

  • The second best is extracting a LoRA from the Fine Tuned model, if you need a LoRA

  • Third is doing a regular all-layers LoRA training

  • And the worst quality comes from training fewer blocks / layers with LoRA

  • So how much VRAM and speed does single-block LoRA training save? (a quick calculation follows after this list)

    • All layers in 16-bit is 27700 MB (4.85 seconds / it) and 1 single block is 25800 MB (3.7 seconds / it)

    • All layers in 8-bit is 17250 MB (4.85 seconds / it) and 1 single block is 15700 MB (3.8 seconds / it)
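
A quick back-of-the-envelope calculation of the savings implied by the numbers above:

```python
# Relative savings of single-block LoRA training vs all-layers LoRA training,
# computed from the VRAM / speed numbers listed above.
cases = {
    "16-bit": {"all_mb": 27700, "single_mb": 25800, "all_s_it": 4.85, "single_s_it": 3.7},
    "8-bit":  {"all_mb": 17250, "single_mb": 15700, "all_s_it": 4.85, "single_s_it": 3.8},
}

for name, c in cases.items():
    vram_saved_mb = c["all_mb"] - c["single_mb"]
    speedup_pct = (c["all_s_it"] - c["single_s_it"]) / c["all_s_it"] * 100
    print(f"{name}: saves {vram_saved_mb} MB VRAM, ~{speedup_pct:.0f}% faster per step")
# -> 16-bit: 1900 MB VRAM and ~24% faster; 8-bit: 1550 MB VRAM and ~22% faster
```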

Image Raw Links

Figures
