🎄 Merry Christmas and Happy New Year!
As 2025 comes to a close, we want to extend our deepest gratitude to each of you for your creativity and support this year. Your experiments, feedback, and brilliant creations have been the heartbeat of our open ecosystem.
As a final gift of the year, we’re excited to share the newest models and tools born in this last week of 2025.
Let’s take a look at what’s just landed.
👉 Subscribe to The Tongyi Weekly and never miss a release:
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345
📣 Model Release & Updates
Introducing Qwen-Image-Layered: native image decomposition, fully open-sourced
Why it stands out
- Photoshop-grade layering: Physically isolated RGBA layers with true native editability
- Prompt-controlled structure: Explicitly specify 3–10 layers — from coarse layouts to fine-grained details
- Infinite decomposition: Keep drilling down into layers within layers, to any depth of detail (recomposition sketch below)
🔗 Get started:
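To make "physically isolated RGBA layers" concrete, here is a minimal recomposition sketch using Pillow. The layer file names are placeholders, since the release's actual output format isn't described in this announcement.

```python
from PIL import Image

# Hypothetical output of Qwen-Image-Layered: a stack of RGBA layers,
# ordered back-to-front (background first). File names are placeholders.
layer_paths = ["layer_0_background.png", "layer_1_subject.png", "layer_2_text.png"]

# All layers must share the same canvas size for alpha compositing.
layers = [Image.open(p).convert("RGBA") for p in layer_paths]

# Start from the bottom layer and alpha-composite each layer on top.
canvas = layers[0]
for layer in layers[1:]:
    canvas = Image.alpha_composite(canvas, layer)

canvas.convert("RGB").save("recomposed.png")
```

Because each layer is a standalone RGBA image, editing or dropping any single file before recompositing is exactly the kind of native editability the release describes.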
New Open-Source End-to-End Voice Model: Fun-Audio-Chat
We’re open-sourcing Fun-Audio-Chat — an end-to-end voice model that’s more than just a chatbot.
It’s your AI voice partner:
- Empathetic: Understands emotion, tone, and intent
- Action-oriented: Follows voice commands to complete tasks
- End-to-end S2S architecture: Lower latency, higher efficiency
- Dual-resolution design: ~50% lower GPU cost
Leads multiple benchmarks, including OpenAudioBench and MMAU.
Open, efficient, and deeply useful; a hypothetical usage sketch follows below.
🔗 Try it:
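The announcement doesn't show the inference API, so the following is a hypothetical sketch of a single speech-to-speech turn. The FunAudioChat class, its import path, and the model ID are illustrative placeholders, not the released interface.

```python
import soundfile as sf

# Placeholder import; the real Fun-Audio-Chat repo defines its own
# loading and inference interface.
from fun_audio_chat import FunAudioChat  # hypothetical

model = FunAudioChat.from_pretrained("FunAudioLLM/Fun-Audio-Chat")  # placeholder ID

# Read a user utterance (mono WAV assumed).
waveform, sample_rate = sf.read("user_turn.wav")

# One speech-to-speech turn: the model consumes audio directly and
# returns synthesized reply audio.
reply_waveform = model.chat(audio=waveform, sample_rate=sample_rate)  # hypothetical method

sf.write("assistant_turn.wav", reply_waveform, sample_rate)
```

The point of the end-to-end design is visible in the shape of the call: audio goes in and audio comes out, with no cascaded ASR-to-LLM-to-TTS hop in between, which is where the latency and efficiency gains come from.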
New Qwen3-TTS Lineup: VoiceDesign & VoiceClone
Create, control, and clone voices—faster and more expressive than ever.
VoiceDesign-VD-Flash
- Fully controllable speech via free-form text instructions — tone, rhythm, emotion, persona
- No preset voices. Design your own unique vocal identity
- Outperforms GPT-4o-mini-tts & Gemini-2.5-pro on role-play benchmarks
VoiceClone-VC-Flash
- Clone any voice from just 3 seconds of audio
- Generate speech in 10 languages (CN / EN / JP / ES + more)
- 15% lower WER vs. ElevenLabs & GPT-4o-Audio in multilingual tests
- Context-aware cadence for more natural delivery (see the sketch below)
🔗 Try it now
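Since this announcement doesn't include API details, here is a hypothetical sketch of the two flows; the client class, model names, and parameters are placeholders meant only to show how a design-by-instruction call differs from a clone-by-reference call.

```python
# Placeholder client; the real service interface defines its own
# names and parameters.
from qwen3_tts_client import TTSClient  # hypothetical import

client = TTSClient(api_key="YOUR_API_KEY")

# VoiceDesign-VD-Flash: no preset voice. The voice itself is specified
# by a free-form text instruction covering tone, rhythm, emotion, persona.
design_audio = client.synthesize(
    model="voicedesign-vd-flash",  # placeholder model name
    text="Welcome aboard, explorers!",
    voice_instruction="A warm, slightly raspy storyteller; slow rhythm, gentle excitement",
)

# VoiceClone-VC-Flash: clone from a short reference clip (~3 seconds of
# audio), then synthesize in any of the supported languages.
clone_audio = client.synthesize(
    model="voiceclone-vc-flash",  # placeholder model name
    text="¡Hola! Esta es mi voz clonada.",
    reference_audio="my_voice_3s.wav",
)

with open("designed.wav", "wb") as f:
    f.write(design_audio)
```

The key contrast: VoiceDesign takes only text (the script plus a voice description), while VoiceClone additionally takes a short reference recording.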
Qwen-Image-Edit-2511: Stronger Consistency & Real-World Image Editing
What’s new in 2511:
- Stronger multi-person consistency for group photos and complex scenes
- Built-in popular community LoRAs — no extra tuning required
- Enhanced industrial & product design generation
- Reduced image drift with dramatically improved character & identity consistency
- Improved geometric reasoning, including construction lines and structural edits
From identity-preserving portrait edits to high-fidelity multi-person fusion and practical engineering & design workflows, 2511 pushes image editing to the next level (a pipeline sketch follows below).
🔗 Try it now
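As a rough illustration, here is a minimal diffusers sketch, assuming the 2511 checkpoint loads through the same QwenImageEditPipeline interface as earlier Qwen-Image-Edit releases; the repo ID is inferred from the release name and may differ.

```python
import torch
from diffusers import QwenImageEditPipeline
from PIL import Image

# Assumption: the 2511 checkpoint is loadable via the pipeline class
# diffusers provides for earlier Qwen-Image-Edit releases; the repo ID
# below is inferred from the release name.
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = Image.open("group_photo.png").convert("RGB")

# An identity-preserving edit across multiple people in one scene,
# the kind of multi-person consistency 2511 targets.
edited = pipe(
    image=image,
    prompt="Change the background to a sunset beach; keep every person's face and clothing unchanged",
    num_inference_steps=50,
).images[0]

edited.save("group_photo_beach.png")
```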
🧩 Ecosystem Highlights
Z-Image Turbo: #1 Open-Weight Text-to-Image Model in the Artificial Analysis Image Arena
Z-Image Turbo now ranks #1 among all open-weight image models in the Artificial Analysis Image Arena.
Why it leads:
- Only $5/1k images on Alibaba Cloud
- Runs on consumer GPUs with just 16GB of memory
- Apache 2.0 open-source license
A 6B powerhouse that proves high quality doesn't require high cost.
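As a quick check on those numbers: the quoted rate works out to half a cent per image, and a 6B model in bf16 leaves headroom on a 16GB card.

```python
# Quoted rate: $5 per 1,000 images on Alibaba Cloud.
print(f"${5 / 1000:.3f} per image")  # $0.005, i.e. half a cent

# Rough bf16 weight footprint of a 6B-parameter model (~2 bytes/param).
weights_gb = 6e9 * 2 / 1024**3
print(f"~{weights_gb:.1f} GB of weights")  # ~11.2 GB, leaving room for activations on a 16 GB card
```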
✨ Community Spotlights
Portrait Photography: BEYOND REALITY Z IMAGE 1.0 from Nurburgring
This model, fine-tuned from Z-Image-Turbo, optimizes skin textures and environmental details while maintaining analog film aesthetics. It is available in both BF16 and FP8 versions, the latter being compatible with 8GB VRAM hardware.
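The 8GB VRAM claim is consistent with simple parameter-count arithmetic, assuming the fine-tune keeps the 6B base size: FP8 stores roughly one byte per parameter, half of BF16's two.

```python
# Rough weight footprints for a 6B-parameter checkpoint.
params = 6e9  # assumes the fine-tune keeps Z-Image-Turbo's 6B base size
for precision, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    gb = params * bytes_per_param / 1024**3
    print(f"{precision}: ~{gb:.1f} GB of weights")
# BF16: ~11.2 GB -> wants a 16 GB card; FP8: ~5.6 GB -> fits in 8 GB VRAM
```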
👉 Try it here
📬 Want More? Stay Updated.
Every week, we bring you:
● New model releases & upgrades
● AI research breakthroughs
● Open-source tools you can use today
● Community highlights that inspire
👉 Subscribe to The Tongyi Weekly and never miss a release.
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345
Thank you for being part of this journey.
Tongyi Lab is a research institution under Alibaba Group dedicated to artificial intelligence and foundation models, focusing on the research, development, and innovative applications of AI models across diverse domains. Its research spans large language models (LLMs), multimodal understanding and generation, visual AIGC, speech technologies, and more.
