Generative Models by Stability AI
News
June 22, 2023
- We are releasing two new diffusion models for research purposes:
  - SD-XL 0.9-base: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding, whereas the refiner model only uses the OpenCLIP model.
  - SD-XL 0.9-refiner: The refiner has been trained to denoise small noise levels of high-quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for either of the two links, and if you are granted access, you can use both models. Please log in to your HuggingFace account with your organization email to request access.
We plan to do a full release soon (July).
The codebase
General Philosophy
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
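For illustration, the pattern behind instantiate_from_config() looks roughly like the sketch below. This is a minimal sketch, not the repo's exact implementation, and the torch.nn.Linear config is a made-up example of the shape a yaml entry parses to:

import importlib

# Sketch of the config-driven pattern: every config block names a "target"
# (a dotted import path) and optional "params" (constructor kwargs).
def instantiate_from_config(config: dict):
    module_path, cls_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), cls_name)
    return cls(**config.get("params", {}))

# Hypothetical example mirroring what a yaml entry parses to:
config = {
    "target": "torch.nn.Linear",
    "params": {"in_features": 4, "out_features": 2},
}
layer = instantiate_from_config(config)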
  
  
Changelog from the old ldm codebase
For training, we use pytorch-lightning, but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion, now DiffusionEngine) has been cleaned up:
- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: GeneralConditioner, see sgm/modules/encoders/modules.py.
- We separate guiders (such as classifier-free guidance, see sgm/modules/diffusionmodules/guiders.py) from the samplers (sgm/modules/diffusionmodules/sampling.py), and the samplers are independent of the model.
- We adopt the "denoiser framework" for both training and inference (the most notable change is probably the option to train continuous-time models); a minimal sketch of the guider and denoiser ideas follows this list:
  - Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see sgm/modules/diffusionmodules/denoiser.py.
  - The following features are now independent: weighting of the diffusion loss function (sgm/modules/diffusionmodules/denoiser_weighting.py), preconditioning of the network (sgm/modules/diffusionmodules/denoiser_scaling.py), and sampling of noise levels during training (sgm/modules/diffusionmodules/sigma_sampling.py).
- Autoencoding models have also been cleaned up.
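To make these separations concrete, here is a minimal sketch under stated assumptions: the function names, the default guidance scale, and the EDM preconditioning (Karras et al., 2022) are illustrative choices, not the repo's actual API or defaults.

import torch

# A guider kept independent of the sampler: classifier-free guidance only
# combines two denoiser outputs and knows nothing about the sampling loop.
def cfg_guide(denoised_cond, denoised_uncond, scale=7.5):
    return denoised_uncond + scale * (denoised_cond - denoised_uncond)

# A denoiser wrapper in the continuous-time framework: the preconditioning
# coefficients are pure functions of sigma, so loss weighting and sigma
# sampling can be swapped independently of the network.
def denoise(network, x_noisy: torch.Tensor, sigma: torch.Tensor, sigma_data: float = 0.5):
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
    c_in = 1.0 / (sigma**2 + sigma_data**2).sqrt()
    c_noise = sigma.log() / 4
    return c_skip * x_noisy + c_out * network(c_in * x_noisy, c_noise)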
Installation:
1. Clone the repo
git clone git@github.com:Stability-AI/generative-models.git
cd generative-models
2. Setting up the virtualenv
This assumes you have navigated to the generative-models root after cloning it.
NOTE: This is tested under python3.8 and python3.10. For other python versions, you might encounter version conflicts.
PyTorch 1.13
# install required packages from pypi
python3 -m venv .pt1
source .pt1/bin/activate
pip3 install wheel
pip3 install -r requirements_pt13.txt
PyTorch 2.0
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install wheel
pip3 install -r requirements_pt2.txt
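As an optional sanity check (not part of the repo's instructions), you can confirm the intended PyTorch build is active inside the virtualenv:

# verify the active environment picked up the expected torch version
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"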
Inference:
We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py. The following models are currently supported:
- SD-XL 0.9-base
- SD-XL 0.9-refiner
Weights for SDXL:
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for either of the two links, and if you are granted access, you can use both models. Please log in to your HuggingFace account with your organization email to request access.
After obtaining the weights, place them into checkpoints/.
Next, start the demo using
streamlit run scripts/demo/sampling.py --server.port <your_port>
Invisible Watermark Detection
Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
To run the script you need to either have a working installation as above or
try an experimental import using only a minimal amount of packages:
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark
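For reference, decoding with the invisible-watermark library follows the pattern below. This is a sketch: the file name and the 32-bit payload length are assumptions, and the payload this repo actually embeds is not stated in this document.

import cv2
from imwatermark import WatermarkDecoder

# Illustrative: read an image and attempt to decode a DWT-DCT watermark.
# 32 bits (4 bytes) is an assumed payload length, not the repo's value.
bgr = cv2.imread("generated_sample.png")  # hypothetical file name
decoder = WatermarkDecoder('bytes', 32)
payload = decoder.decode(bgr, 'dwtDct')
print(payload)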
With either environment in place, the script is then usable in the following ways (don't forget to activate your virtual environment beforehand, e.g. source .detect/bin/activate):
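(The concrete invocation examples are cut off in this document; the path scripts/demo/detect.py and the argument form below are assumptions based on the demo layout, not confirmed by the text.)

# check a single image for the watermark (hypothetical path and arguments)
python scripts/demo/detect.py <path/to/image.png>
# check several images at once
python scripts/demo/detect.py <image1.png> <image2.png>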
 


 
    