Introduction
Recently, Alibaba has been making frequent moves in the image generation field. Just after renaming z-image Base (the new name is Z-Image-Omni-Base, not Z-Image-Base), it hastily released Z-Image-Turbo-Fun-Controlnet-Union-2.0 on December 14th.
Notably, this comes just 9 days after the release of Z-Image-Turbo ControlNet Union 1.0, inevitably raising questions: what secrets lie behind such rapid iteration?
As outsiders, we can't know the exact details, but we can glean some insight from the update notes. Without further ado, let's examine what's new:
Key Updates and Features
Version 2.0 emphasizes reliability and creativity. Here's what's inside:
Supported Control Modes: Handles standard inputs like Canny (edge detection for contours), HED (soft edges for artistic effects), Depth (3D structure from depth maps), Pose (human or object positioning), and MLSD (straight lines for architecture). These let you "condition" the AI: provide a rough sketch, for example, and the model generates a refined image that matches it (a minimal Canny preprocessing sketch follows this list).
Inpainting Mode: A major new addition! This lets you mask and edit specific regions of an image (e.g., replace the background without touching the foreground). However, users note it sometimes blurs unmasked areas; ComfyUI's masking tools help refine results, and a simple compositing workaround is sketched after this list.
Adjustable Parameters: Tune control_context_scale (recommended 0.65–0.90) to balance how strictly the AI follows the controls. Higher values require more inference steps (e.g., 20–40) for clear output; pushing it too high over-controls the generation and distorts details (a rough rule of thumb follows this list).
Training Foundation: Trained from scratch for 70,000 steps on 1 million high-quality images (a mix of general scenes and human-centric content), at 1328 resolution, BFloat16 precision, batch size 64, and learning rate 2e-5. The "Fun" name hints at its playful, creative focus, and a text dropout ratio of 0.10 during training encourages robustness to diverse prompts.
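To make "conditioning" concrete, here is a minimal sketch of preparing a Canny control map with OpenCV before handing it to a ControlNet-style pipeline. The file names are placeholders, and the thresholds are generic starting values, not settings specific to Z-Image.

```python
import cv2
import numpy as np

# Load a reference photo and extract a Canny edge map.
# (100, 200) are common starting thresholds: lower them to keep more
# fine detail, raise them for cleaner contours.
image = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(image, threshold1=100, threshold2=200)

# ControlNet-style pipelines usually expect a 3-channel image, so
# stack the single edge channel into RGB before saving.
control_map = np.stack([edges] * 3, axis=-1)
cv2.imwrite("canny_control.png", control_map)
```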
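On the blurring issue: a common workaround is to composite the model's output back over the original image, so pixels outside the mask are guaranteed to stay untouched. A minimal sketch with NumPy and Pillow, assuming hypothetical file names:

```python
import numpy as np
from PIL import Image

# Hypothetical inputs: the untouched original, the model's inpainted
# output, and the white-on-black mask used for the edit.
original = np.asarray(Image.open("original.png").convert("RGB"), dtype=np.float32)
generated = np.asarray(Image.open("inpainted_output.png").convert("RGB"), dtype=np.float32)
mask = np.asarray(Image.open("mask.png").convert("L"), dtype=np.float32) / 255.0
mask = mask[..., None]  # broadcast over the RGB channels

# Keep generated pixels inside the mask, original pixels outside it.
# Feathering the mask edge first (e.g., a slight Gaussian blur) avoids
# a visible seam at the boundary.
composite = generated * mask + original * (1.0 - mask)
Image.fromarray(composite.astype(np.uint8)).save("composite.png")
```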
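And as a rough illustration of the scale-versus-steps trade-off, here is a tiny helper that linearly maps the recommended control_context_scale range onto a 20–40 step budget. This interpolation is my own reading of the guidance above, not an official formula:

```python
def suggested_steps(control_context_scale: float) -> int:
    """Rule of thumb: stronger control gets a larger step budget.

    My own linear interpolation of the published guidance
    (scale 0.65-0.90, steps 20-40); not from the model card.
    """
    lo_scale, hi_scale = 0.65, 0.90
    lo_steps, hi_steps = 20, 40
    t = (control_context_scale - lo_scale) / (hi_scale - lo_scale)
    t = min(max(t, 0.0), 1.0)  # clamp to the recommended range
    return round(lo_steps + t * (hi_steps - lo_steps))

print(suggested_steps(0.75))  # -> 28
```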
Comparison with Previous Version (1.0)
Compared with 1.0, version 2.0 adds the Inpainting mode, extends training, and fixes issues reported in the earlier release, such as training errors and slow loading.
Summary
This upgrade brings improvements in quality and functionality, including the new Inpainting mode and a longer training run. It's an incremental update that fixes issues from the previous version, such as training errors and slow loading, making the model more reliable for creative tasks. Performance is better overall, but complex scenes (like hand poses) may still need manual optimization, and hardware requirements remain relatively high.
It feels more like a V1.1 or V1.5 than a V2.0. My subjective speculation is that this flurry of updates is meant to speed the rollout of Z-Image-Omni-Base, using modular upgrades and distributed iteration to drive unified capability improvements.
Regardless, I hope Alibaba can keep up Z-Image's momentum, continuing to lower the barriers to AI so that more people can enjoy its convenience.

