DEV Community

Kage.


A layman's understanding of Stable Diffusion uptraining

People have been talking a lot about Chilloutmix lately, and models such as basil-mix are being trained with a variety of methods. What are the main differences between DreamBooth, Textual Inversion, LoRA, and Hypernetworks?

DreamBooth

DreamBooth fine-tunes the diffusion model itself so that it learns a new concept.

Usage: Adding new objects or styles to Stable Diffusion.
Pros: High quality, not too slow.
Cons: Every result is a full new model checkpoint, so output files get large quickly, and it is cumbersome to keep separately trained concepts isolated and switch between them.

Textual Inversion

Textual Inversion learns a special word embedding that represents the new concept.
Usage: Add a face or object embedding to any prompt without making any changes to the model.
Pros: Extremely small file sizes, so embeddings are easy to attach to or detach from a model.
Cons: An embedding trained against one model generally cannot be reused with another, training is rather slow, and you need an accurate prompt template.
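
To make the idea concrete, here is a toy sketch in plain Python, not a real training script. It assumes a made-up three-word vocabulary and 4-dimensional vectors; the token name `<my-cat>`, the helper `encode`, and all the numbers are hypothetical. The 768 used in the size calculation is the text-embedding width I am assuming for Stable Diffusion 1.x. The point is that adding a concept only adds one row to the embedding table, while every existing entry stays untouched.

```python
# Toy illustration of Textual Inversion: the model stays frozen and
# only one new embedding vector is learned and stored.

# Hypothetical, tiny embedding table (real ones have tens of thousands of rows).
embedding_table = {
    "a":     [0.1, 0.0, 0.2, 0.0],
    "photo": [0.0, 0.3, 0.0, 0.1],
    "of":    [0.2, 0.2, 0.0, 0.0],
}
frozen_copy = {word: list(vec) for word, vec in embedding_table.items()}

# The "embedding file" is just this vector; in practice it is found by training.
learned_vector = [0.5, -0.1, 0.4, 0.2]  # placeholder values, not trained
embedding_table["<my-cat>"] = learned_vector


def encode(prompt):
    """Look up one vector per whitespace-separated token."""
    return [embedding_table[token] for token in prompt.split()]


vectors = encode("a photo of <my-cat>")

# Existing words are unchanged, so detaching the concept is just deleting one row.
unchanged = all(embedding_table[w] == frozen_copy[w] for w in frozen_copy)

# Size of one float32 vector at 768 dimensions (assumed SD 1.x width).
bytes_per_vector = 768 * 4
```

Because the vector only makes sense to the text encoder it was trained against, this also hints at why an embedding does not transfer cleanly to a different model.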

LoRA

LoRA adds a small number of extra weights to the diffusion model and trains only those so that it learns the concept.
Usage: A lightweight alternative to DreamBooth that leaves the original model weights unchanged.
Pros: Small output file size, good quality, compatibility with other checkpoints, and low memory requirements.
Cons: It is harder to configure, and combining it with multiple models and styles can be a struggle.
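
As a rough sketch of why LoRA files are small: instead of storing a changed copy of a weight matrix W (d x k), LoRA stores two thin matrices B (d x r) and A (r x k) and uses W + B·A at inference. The sizes below (d = k = 768, r = 4) and the tiny 3 x 3 example are illustrative assumptions, and the helpers `matmul` and `add` are written here for the example rather than taken from any library.

```python
# Toy LoRA arithmetic in plain Python (no ML library needed).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x, y):
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


# Frozen base weight W (3 x 3) and a rank-1 update B (3 x 1) times A (1 x 3).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, 0.0]]

delta = matmul(B, A)        # low-rank update, same shape as W
W_adapted = add(W, delta)   # weights used at inference; W itself is untouched

# Parameter counts for one layer at an assumed, illustrative size.
d, k, r = 768, 768, 4
full_params = d * k         # what a full fine-tune would have to store
lora_params = r * (d + k)   # what LoRA stores for the same layer
```

With these assumed sizes, the layer needs 589,824 numbers for a full copy but only 6,144 for the LoRA pair. Since W is never overwritten, the same B and A can in principle be added to another compatible checkpoint.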

Hypernetworks

Hypernetworks use a second network to predict new weights for the original one; those weights are then used at inference.
Usage: Similar to LoRA, but with a slightly different technique.
Pros: Built into AUTOMATIC1111, so quite easy to use.
Cons: Quality is generally worse than LoRA.
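
The "network that predicts weights for another network" idea can be sketched in a few lines of plain Python. This is a toy that follows the description above, not the code of any particular implementation; `hypernetwork`, `target_layer`, `style_code`, and all the numbers are hypothetical.

```python
# Toy hypernetwork: a small network whose output is the weight
# vector of another (target) layer.

def hypernetwork(style_code, hyper_weights):
    """Map a conditioning vector to the target layer's weights (a linear map here)."""
    return [sum(h * s for h, s in zip(row, style_code)) for row in hyper_weights]


def target_layer(x, weights):
    """The original layer: a plain dot product using whatever weights it is given."""
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical trained parameters of the hypernetwork (3 outputs, 2 inputs).
hyper_weights = [[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]]

style_code = [2.0, 3.0]                              # made-up conditioning input
predicted = hypernetwork(style_code, hyper_weights)  # weights for the target layer
output = target_layer([1.0, 1.0, 1.0], predicted)    # target layer runs with them
```

Only `hyper_weights` would be trained and saved; the target layer has no learned weights of its own in this toy and simply uses what the hypernetwork hands it.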

By Kage
