
Generative Models by Stability AI

News

June 22, 2023

  • We are releasing two new diffusion models for research purposes:
    • SD-XL 0.9-base: The base model was trained on images at resolution 1024² across a variety of aspect ratios. It uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding, whereas the refiner model uses only the OpenCLIP model.
    • SD-XL 0.9-refiner: The refiner has been trained to denoise small noise levels of high quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.

If you would like to access these models for your research, please apply using one of the following links:

SDXL-0.9-Base and SDXL-0.9-Refiner.

Applying through either link is sufficient: if you are granted access, you can use both models.

Please log in to your HuggingFace Account with your organization email to request access.

We plan to do a full release soon (July).

The codebase

General Philosophy

Modularity is king. This repo implements a config-driven approach where we build and combine submodules by calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
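
As a hedged illustration of this pattern (the helper below is a simplified stand-in; the real implementation lives in sgm/util.py, and the target/params keys follow the convention used throughout configs/):

# simplified stand-in for the config-driven instantiation helper
import importlib

import yaml

def instantiate_from_config(config):
    # "target" is a dotted import path; "params" become constructor kwargs
    module, cls = config["target"].rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)(**config.get("params", {}))

cfg = yaml.safe_load("""
target: torch.nn.Linear
params:
  in_features: 16
  out_features: 8
""")
layer = instantiate_from_config(cfg)  # equivalent to torch.nn.Linear(16, 8)

Larger models are then just nested configs: each submodule entry in the YAML names its own target, so swapping an encoder or sampler is a config change rather than a code change.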

Changelog from the old ldm codebase

For training, we use pytorch-lightning, but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion, now DiffusionEngine) has been cleaned up:

  • No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: GeneralConditioner, see sgm/modules/encoders/modules.py.
  • We separate guiders (such as classifier-free guidance, see sgm/modules/diffusionmodules/guiders.py) from the samplers (sgm/modules/diffusionmodules/sampling.py), and the samplers are independent of the model; a minimal sketch of this separation follows this list.
  • We adopt the "denoiser framework" for both training and inference (most notable change is probably now the option to train continuous time models):
    • Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see sgm/modules/diffusionmodules/denoiser.py.
    • The following features are now independent: weighting of the diffusion loss function (sgm/modules/diffusionmodules/denoiser_weighting.py), preconditioning of the network (sgm/modules/diffusionmodules/denoiser_scaling.py), and sampling of noise levels during training (sgm/modules/diffusionmodules/sigma_sampling.py).
  • Autoencoding models have also been cleaned up.
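
As a minimal sketch of the guider/sampler separation mentioned above (the class name and interface here are illustrative assumptions, not the exact API in sgm/modules/diffusionmodules/guiders.py):

import torch

# illustrative classifier-free-guidance guider: it only combines two
# denoiser outputs, so any sampler can use it without touching the model
class CFGGuider:
    def __init__(self, scale: float):
        self.scale = scale

    def __call__(self, x_cond: torch.Tensor, x_uncond: torch.Tensor) -> torch.Tensor:
        # push the prediction away from the unconditional output
        # toward the conditional one
        return x_uncond + self.scale * (x_cond - x_uncond)

guider = CFGGuider(scale=7.5)
x_cond, x_uncond = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
guided = guider(x_cond, x_uncond)  # a sampler consumes this combined prediction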

Installation:

1. Clone the repo

git clone git@github.com:Stability-AI/generative-models.git
cd generative-models


2. Setting up the virtualenv

This assumes you have navigated to the generative-models root after cloning it.

NOTE: This is tested under python3.8 and python3.10. For other python versions, you might encounter version conflicts.

PyTorch 1.13

# install required packages from pypi
python3 -m venv .pt1
source .pt1/bin/activate
pip3 install wheel
pip3 install -r requirements_pt13.txt


PyTorch 2.0

# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install wheel
pip3 install -r requirements_pt2.txt

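As an optional sanity check (not part of the original instructions), you can verify that the activated environment picked up the intended PyTorch build before proceeding:

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"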

Inference:

We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py. The following models are currently supported:

  • SD-XL 0.9-base
  • SD-XL 0.9-refiner

Weights for SDXL:

To obtain the weights, apply for access via the links in the News section above; applying through either link grants access to both models.

After obtaining the weights, place them into checkpoints/.
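
Assuming the files keep their published names (the exact filenames may differ; adjust to what you downloaded), the layout would look like:

checkpoints/
├── sd_xl_base_0.9.safetensors
└── sd_xl_refiner_0.9.safetensors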

Next, start the demo using

streamlit run scripts/demo/sampling.py --server.port <your_port>
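
Then open http://localhost:<your_port> in your browser; if you omit the flag, streamlit defaults to port 8501.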


Invisible Watermark Detection

Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.

To run the script you need to either have a working installation as above or try an experimental import using only a minimal amount of packages:

python -m venv .detect
source .detect/bin/activate

pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark


The script is then usable from the command line (don't forget to activate your virtual environment first, e.g. source .detect/bin/activate).
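
A hedged usage sketch (assuming the detection script is the one shipped under scripts/demo/, e.g. scripts/demo/detect.py; check the repository for the exact path and arguments):

# detect the watermark in a single image
python scripts/demo/detect.py <your filename here>

# or pass several files at once
python scripts/demo/detect.py <filename 1> <filename 2>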
