Z-Image is a recently released image generation model, so I tried running it locally on my GPU to see how practical it actually is.
This is not about an official cloud service or a hosted demo; the goal was simply to check how easy it is to run on my own machine.
Environment
- OS: Ubuntu 22.04
- GPU: NVIDIA RTX (16GB VRAM)
- CUDA: 11.8
- Python: 3.10
If you have experience with SDXL or other Diffusers-based models, nothing here feels unusual.
Setup
Create a virtual environment.

```bash
conda create -n zimage python=3.10
conda activate zimage
```
Install PyTorch with CUDA support.

```bash
pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu118
```
Install dependencies.

```bash
pip install diffusers transformers accelerate safetensors
pip install einops sentencepiece pillow
```
This is a standard setup for Diffusers-based workflows.
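A quick sanity check that the CUDA build was actually installed (optional, but it catches the most common setup mistake):

```python
import torch

print(torch.__version__)          # should end in +cu118 for this index URL
print(torch.cuda.is_available())  # True if the GPU is visible
```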
Trying Z-Image-Turbo
A minimal text-to-image example.
```python
from diffusers import DiffusionPipeline
import torch

# Load in bfloat16 to keep VRAM usage manageable on a 16GB card
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Turbo-style model: few steps, no classifier-free guidance
image = pipe(
    prompt="A cinematic portrait photo, natural light",
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]

image.save("out.png")
```
Even with just 8 steps, the output quality is perfectly usable.
It clearly feels designed with efficiency in mind.
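For a concrete number on your own hardware, here is a rough timing sketch. It reuses the `pipe` object from above; the first call is treated as warm-up, since it includes one-time initialization:

```python
import time

# Warm-up run: the first call is not representative
_ = pipe(prompt="warm-up", num_inference_steps=8, guidance_scale=0.0)

start = time.perf_counter()
image = pipe(
    prompt="A cinematic portrait photo, natural light",
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]
print(f"Generation took {time.perf_counter() - start:.2f}s")
```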
Parameters That Took a Moment to Get Used To
A few points that were slightly different from SD-style usage:

- `guidance_scale` is expected to be 0.0
- Increasing steps does not noticeably improve quality
- VRAM usage becomes tight without `bfloat16` (see the check below)

Raising CFG like you would with SD models tends to make results worse, not better.
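To put a number on the VRAM point, PyTorch's peak-memory counters give a quick read. A small sketch, again reusing `pipe` from above:

```python
import torch

torch.cuda.reset_peak_memory_stats()
_ = pipe(
    prompt="A cinematic portrait photo, natural light",
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")
```

If it is still tight, `pipe.enable_model_cpu_offload()` is the standard Diffusers fallback, trading speed for memory.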
Image-to-Image Works as Expected
```python
from PIL import Image

# Reuses the pipeline loaded in the text-to-image example
init_image = Image.open("input.jpg").convert("RGB")

image = pipe(
    prompt="change background to a modern office",
    image=init_image,
    strength=0.8,
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]

image.save("edited.png")
```
No special configuration is required — this works the same way as other Diffusers pipelines.
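One Diffusers convention that should carry over, although I did not verify it with Z-Image specifically: passing a `generator` (the standard pipeline kwarg) to make edits reproducible.

```python
import torch

# Seed the sampler so repeated runs produce the same edit
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    prompt="change background to a modern office",
    image=init_image,
    strength=0.8,
    num_inference_steps=8,
    guidance_scale=0.0,
    generator=generator,
).images[0]
```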
Impressions
- Runs reliably on a 16GB VRAM GPU
- Very fast inference
- Handles mixed English / Japanese prompts reasonably well (example below)
It feels less like a research showcase and more like a model intended for local or internal use.
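For reference, a mixed-language prompt is passed like any other string. The prompt below is only an illustration, not one of my original test prompts:

```python
image = pipe(
    prompt="a cozy café storefront at dusk, 看板には「営業中」と書かれている",  # JA: "the sign reads 'open'"
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]
```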
I also keep some personal notes and links related to Z-Image here (non-official):
https://z-image.io/
References
- Official GitHub: https://github.com/Tongyi-MAI/Z-Image
- Z-Image notes (unofficial): https://z-image.io/
Conclusion
If you want to run image generation fully on your own infrastructure,
Z-Image-Turbo feels like a very practical option.
Next, I’d like to try turning this into a simple API or testing a Docker-based setup.
