Original article: https://zenn.dev/ktakayama/articles/6c627e0956f32c
The AI image generator Stable Diffusion is now open source. I wanted to run it on my local machine, but all I have is a MacBook Pro, so it was not so easy.
https://github.com/CompVis/stable-diffusion
The following thread is very helpful!
https://github.com/CompVis/stable-diffusion/issues/25
Speed
Here are the specs of my 14-inch MacBook Pro:
- Apple M1 Pro chip
- 8-core CPU with 6 performance cores and 2 efficiency cores
- 14-core GPU
- 16-core Neural Engine
- 32GB memory
It needs about 15–20 GB of memory while generating images. 6 images can be generated in about 5 minutes.
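Here is a back-of-the-envelope calculation that turns the 5-minute figure into per-image terms. It assumes the run used the defaults of the CompVis scripts/txt2img.py: 3 samples per batch (--n_samples), 2 batches (--n_iter), 50 sampling steps (--ddim_steps), and 512×512 output. Those defaults would account for the 6 images. The settings are not stated above, and the 5 minutes may also include model loading. The helper name `throughput` is hypothetical.

```python
def throughput(total_seconds: float, n_samples: int = 3, n_iter: int = 2, steps: int = 50) -> dict:
    """Rough per-image and per-step timings for one txt2img.py run.

    The defaults mirror what we assume txt2img.py uses (--n_samples 3,
    --n_iter 2, --ddim_steps 50); pass your own values if you changed them.
    """
    images = n_samples * n_iter      # total images written by the run
    batched_steps = n_iter * steps   # each sampling step processes a whole batch
    return {
        "images": images,
        "seconds_per_image": total_seconds / images,
        "seconds_per_batched_step": total_seconds / batched_steps,
    }


if __name__ == "__main__":
    # About 5 minutes for 6 images, as measured on the M1 Pro above.
    print(throughput(5 * 60))
    # {'images': 6, 'seconds_per_image': 50.0, 'seconds_per_batched_step': 3.0}
```

With those assumptions, that works out to roughly 50 seconds per image.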
Get model
Register on Hugging Face and clone this repository:
https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
Get source code
Get the source code in the apple-silicon-mps-support branch of this repository.
https://github.com/magnusviri/stable-diffusion/tree/apple-silicon-mps-support
Setup
Install conda and Rust with Homebrew.
brew install miniconda rust
Set up the shell environment for conda (I use zsh):
conda init zsh
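Before you create the environment, open a new shell so that `conda init` takes effect, then check that the tools are on your PATH. The following is a minimal sketch that uses Python's `shutil.which`. The executable names `conda`, `rustc`, and `cargo` are assumptions about what the Homebrew miniconda and rust packages install, and `missing_tools` is a hypothetical helper.

```python
import shutil


def missing_tools(names):
    """Return the executable names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names for the Homebrew miniconda and rust packages.
    missing = missing_tools(["conda", "rustc", "cargo"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("conda and the Rust toolchain were found.")
```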
When I run conda env create, I get an error.
$ conda env create -f environment-mac.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- python=3.8.5
Edit environment-mac.yaml so that the pinned versions match your environment. For example, I bumped Python and pip and loosened the opencv-python pin:
diff --git a/environment-mac.yaml b/environment-mac.yaml
index d923d56..c8a0a8e 100644
--- a/environment-mac.yaml
+++ b/environment-mac.yaml
@@ -3,14 +3,14 @@ channels:
   - pytorch
   - defaults
 dependencies:
-  - python=3.8.5
-  - pip=20.3
+  - python=3.9.12
+  - pip=21.2.4
   - pytorch=1.12.1
   - torchvision=0.13.1
   - numpy=1.19.2
   - pip:
     - albumentations==0.4.3
-    - opencv-python==4.1.2.30
+    - opencv-python>=4.1.2.30
     - pudb==2019.2
     - imageio==2.9.0
     - imageio-ffmpeg==0.4.2
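If you would rather script this edit than make it by hand, the sketch below applies the same three substitutions to the YAML text using regular expressions. The replacement versions are only the ones from my diff, so substitute whatever your conda installation provides. `relax_pins` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Replacement pins taken from the diff above; adjust them for your machine.
REPLACEMENTS = [
    (r"^([ \t]*- )python=3\.8\.5$", r"\g<1>python=3.9.12"),
    (r"^([ \t]*- )pip=20\.3$", r"\g<1>pip=21.2.4"),
    (r"^([ \t]*- )opencv-python==4\.1\.2\.30$", r"\g<1>opencv-python>=4.1.2.30"),
]


def relax_pins(yaml_text: str) -> str:
    """Apply each pin replacement; re.MULTILINE makes ^ and $ match per line."""
    for pattern, repl in REPLACEMENTS:
        yaml_text = re.sub(pattern, repl, yaml_text, flags=re.MULTILINE)
    return yaml_text


if __name__ == "__main__":
    path = Path("environment-mac.yaml")
    if path.exists():
        path.write_text(relax_pins(path.read_text()))
        print(f"Updated {path}")
    else:
        print(f"{path} not found; run this from the repository root.")
```

Anchoring each pattern to a whole line keeps `- pip:` from being touched by the `pip=20.3` rule.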
Activate the environment and link the model checkpoint:
conda activate ldm
mkdir -p models/ldm/stable-diffusion-v1
ln -s /path/to/stable-diffusion-v-1-4-original/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
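You can also create the same link from Python with `pathlib`. This is handy if you want the checkpoint path checked before linking. `models/ldm/stable-diffusion-v1/model.ckpt` is the default `--ckpt` location used by txt2img.py. `link_checkpoint` is a hypothetical helper, and `/path/to/...` remains a placeholder, as in the shell command above.

```python
from pathlib import Path


def link_checkpoint(checkpoint: Path, repo_root: Path = Path(".")) -> Path:
    """Symlink the downloaded checkpoint to models/ldm/stable-diffusion-v1/model.ckpt.

    Equivalent to `mkdir -p models/ldm/stable-diffusion-v1` followed by `ln -s`.
    """
    checkpoint = checkpoint.expanduser().resolve()
    if not checkpoint.is_file():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint}")
    target_dir = repo_root / "models" / "ldm" / "stable-diffusion-v1"
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / "model.ckpt"
    if link.is_symlink():
        link.unlink()  # replace a stale link
    elif link.exists():
        # Never delete a real file that someone copied here.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(checkpoint)
    return link
```

For example, run `link_checkpoint(Path("/path/to/stable-diffusion-v-1-4-original/sd-v1-4.ckpt"))` from the repository root.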
Generate images!
Running txt2img.py gave me a PyTorch-related error:
$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
...
NotImplementedError: The operator 'aten::index.Tensor' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
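The error message itself suggests a temporary workaround, which is not the route taken in this article: set `PYTORCH_ENABLE_MPS_FALLBACK=1` so that operators missing on MPS run on the CPU (more slowly). The sketch below launches the script with that variable from Python. `txt2img_command` and `run_txt2img` are hypothetical helpers.

```python
import os
import subprocess
import sys


def txt2img_command(prompt: str, cpu_fallback: bool = True):
    """Build the argv and environment for scripts/txt2img.py.

    With cpu_fallback=True, PYTORCH_ENABLE_MPS_FALLBACK=1 is set so operators
    that MPS lacks (such as aten::index.Tensor here) fall back to the CPU.
    """
    argv = [sys.executable, "scripts/txt2img.py", "--prompt", prompt, "--plms"]
    env = dict(os.environ)
    if cpu_fallback:
        env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return argv, env


def run_txt2img(prompt: str, cpu_fallback: bool = True) -> int:
    argv, env = txt2img_command(prompt, cpu_fallback)
    # A fresh process guarantees the variable is set before torch is imported.
    return subprocess.run(argv, env=env).returncode
```

From a shell, the equivalent is to prefix the usual command with `PYTORCH_ENABLE_MPS_FALLBACK=1`.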
Install the nightly version of PyTorch:
conda install pytorch torchvision torchaudio -c pytorch-nightly
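After installing the nightly build, it is worth confirming that the PyTorch in the `ldm` environment can see the MPS backend. `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()` exist in PyTorch 1.12 and later. `mps_status` is a hypothetical wrapper. It returns `None` when torch cannot be imported and reports MPS as unavailable on older versions.

```python
def mps_status():
    """Report the installed torch version and whether the MPS backend is usable."""
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    mps = getattr(torch.backends, "mps", None)  # missing before PyTorch 1.12
    return {
        "torch": torch.__version__,
        "mps_built": bool(mps and mps.is_built()),
        "mps_available": bool(mps and mps.is_available()),
    }


if __name__ == "__main__":
    print(mps_status())
```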
Running txt2img.py again still produced an error:
$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
I fixed it by editing torch/nn/functional.py as described in this comment:
https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221667017
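The path to functional.py used below depends on where Homebrew installed Miniconda and on the Python version in the env. Instead of guessing, you can ask Python where the module lives: `importlib.util.find_spec` returns a spec whose `origin` is the source file. Run this inside the activated `ldm` environment. `module_file` is a hypothetical helper.

```python
import importlib.util


def module_file(name: str):
    """Return the source file of an importable module, or None if it is not found."""
    try:
        # For a dotted name, find_spec imports the parent packages (here, torch).
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return None  # a parent package is not installed
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print(module_file("torch.nn.functional"))
```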
vi /opt/homebrew/Caskroom/miniconda/base/envs/ldm/lib/python3.9/site-packages/torch/nn/functional.py
--- functional.py_ 2022-08-23 17:07:29.000000000 +0900
+++ functional.py 2022-08-23 17:07:31.000000000 +0900
@@ -2506,9 +2506,9 @@ def layer_norm(
     """
     if has_torch_function_variadic(input, weight, bias):
         return handle_torch_function(
-            layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
+            layer_norm, (input.contiguous(), weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
         )
-    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
+    return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
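Here is why `.contiguous()` helps. A tensor is a flat buffer plus a shape and strides. Operations such as a transpose change only the strides, so the logical element order no longer matches the memory order, and `view()` cannot reinterpret that buffer. That is what the RuntimeError complains about. `.contiguous()` copies the data into row-major order. The traceback points at `torch.layer_norm`, so the MPS implementation at the time appears to assume a contiguous input. The pure-Python sketch below needs no torch and illustrates the stride check. `row_major_strides`, `is_contiguous`, and `transpose` are illustrative helpers, not PyTorch APIs.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed, row-major array of this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if strides match row-major layout; size-1 dims are ignored."""
    expected = row_major_strides(shape)
    return all(s == e for n, s, e in zip(shape, strides, expected) if n != 1)


def transpose(shape, strides):
    """A 2-D transpose only swaps shape and strides; no data moves."""
    return shape[::-1], strides[::-1]


if __name__ == "__main__":
    shape, strides = (2, 3), row_major_strides((2, 3))
    print(shape, strides, is_contiguous(shape, strides))  # (2, 3) (3, 1) True
    t_shape, t_strides = transpose(shape, strides)
    print(t_shape, t_strides, is_contiguous(t_shape, t_strides))  # (3, 2) (1, 3) False
```

Keep in mind that the patch lives in site-packages, so reinstalling or upgrading torch replaces the edited file.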
Everything OK! Great!!!
$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
...
Your samples are ready and waiting for you here:
outputs/txt2img-samples
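To pick out the latest results, the following helper lists PNG files under the output directory, newest first. It assumes the script writes `.png` files somewhere under `outputs/txt2img-samples`, possibly in subfolders. `newest_images` is a hypothetical helper, so adjust the directory or pattern if your version writes elsewhere.

```python
from pathlib import Path


def newest_images(outdir="outputs/txt2img-samples", limit=6):
    """Return up to `limit` PNG paths under outdir, most recently modified first."""
    root = Path(outdir)
    if not root.is_dir():
        return []
    images = sorted(root.rglob("*.png"), key=lambda p: p.stat().st_mtime, reverse=True)
    return images[:limit]


if __name__ == "__main__":
    for path in newest_images():
        print(path)
```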
Enjoy.
Comments
Can I ask you for a little more benchmarking? You say 6 images in 5 minutes, but what are the settings? 512x512 with 50 steps? This is about the same speed as my 4-year-old MSI notebook with a GTX 1060 12GB.