
Tongyi Lab

Posted on Jan 9, 2026 | The Tongyi Weekly: Your weekly dose of cutting-edge AI from Tongyi Lab

🎄 Happy New Year!
We hope you enjoyed a restful holiday season filled with joy, creativity, and maybe even a few AI experiments by the fireplace. As we step into 2026, we’re more inspired than ever by what this community has built — and what we’ll create together in the year ahead.
To kick off the new year, we’re thrilled to give you our first gift of 2026: **Wan App is now live on iOS & Android!** 🎁
Please note: Wan App is rolling out gradually and may not yet be available in all countries or regions. We’re working hard to bring it to you as quickly as possible. Scan the QR code below and give it a try!

This week also brings groundbreaking releases, so let’s dive in.

👉 Subscribe to The Tongyi Weekly and never miss a release:
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345


📣 Model Release & Updates

Qwen-Image-2512: Finer Details, Greater Realism
We are thrilled to announce the open-source release of Qwen-Image-2512! This December update pushes the boundaries of our text-to-image foundation model, moving from an "AI-generated" look to true photorealism.

  • Enhanced Human Realism: We’ve eliminated the artificial "AI look" by capturing intricate facial details, such as wrinkles and pores, and ensuring better adherence to body postures.
  • Finer Natural Detail: Experience notably more detailed rendering of landscapes, misty waterfalls, and animal fur with distinct, individual strands.
  • Advanced Text Rendering: Achieve professional-grade layout for complex infographics and PPT slides with unprecedented textual accuracy.

Try it now:

  • Qwen Chat

  • Hugging Face

  • ModelScope

  • GitHub

  • Blog

  • Hugging Face Demo

  • ModelScope Demo

  • API

Qwen Code v0.6.0: Smarter, More Connected
Your AI coding assistant just got better:

  • Experimental Skills: Introduced experimental Skills feature for extended capabilities
  • VS Code Enhancements: Improved extension description with download links and clickable bash toolcall outputs
  • Commands Support: Added /compress and /summary commands for non-interactive & ACP usage
  • Multi-Provider Support: Added Gemini and Anthropic providers with normalized authentication configuration
  • Enhancements & Stability: Improved testing reliability by fixing flaky integration tests, enhanced Windows compatibility through CLI path resolution, updated the OAuth client for the Figma MCP server, streamlined SDK release workflows, and clarified the README for faster onboarding.

🔗 Check out the full changelog
👉 Get started in Terminal:
npm install -g @qwen-code/qwen-code@latest

MAI-UI: The Foundation GUI Agent Family
We’re releasing MAI-UI—a family of foundation GUI agents. It natively integrates MCP tool use, agent user interaction, device–cloud collaboration, and online RL, establishing state-of-the-art results in general GUI grounding and mobile GUI navigation, surpassing Gemini-2.5-Pro, Seed1.8, and UI-Tars-2 on AndroidWorld.
To meet real-world deployment constraints, MAI-UI comes in a full spectrum of sizes: 2B, 8B, 32B, and 235B-A22B variants. We have open-sourced two of them: MAI-UI-2B and MAI-UI-8B.
Technical Highlights:

  • MCP tool use: MAI-UI natively supports MCP tool use, compressing long, fragile UI operation sequences into a few API calls (see the sketch at the end of this section).
  • Agent–user interaction: MAI-UI proactively asks clarifying questions when user instructions are ambiguous or incomplete.
  • Device–cloud collaboration: MAI-UI dynamically selects on-device or cloud execution based on task execution state and data sensitivity.
  • Online RL: Significant experimental gains from scaling parallel environments from 32 to 512 (+5.2 points) and from increasing the environment step budget from 15 to 50 (+4.3 points).

Get started:

  • GitHub

  • Project page

  • MobileWorld benchmark

  • MobileWorld homepage
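To make the MCP tool-use point above concrete, here is a rough, purely illustrative sketch: instead of driving a task through a long chain of low-level GUI actions, the agent issues a single structured tool call exposed by an MCP server. The action schema and the `calendar.create_event` tool below are hypothetical placeholders, not MAI-UI's actual interface.

```python
# Illustrative only: the action schema and the calendar.create_event tool
# are hypothetical placeholders, not MAI-UI's real interface.

# Without MCP: a long, fragile sequence of low-level GUI actions.
ui_action_sequence = [
    {"action": "open_app", "app": "calendar"},
    {"action": "tap", "element": "new_event_button"},
    {"action": "type", "element": "title_field", "text": "Team sync"},
    {"action": "tap", "element": "date_picker"},
    {"action": "select", "element": "date_cell", "value": "2026-01-12"},
    {"action": "tap", "element": "save_button"},
]

# With MCP tool use: the same intent compresses into one structured API call.
mcp_tool_call = {
    "tool": "calendar.create_event",  # hypothetical MCP tool
    "arguments": {"title": "Team sync", "date": "2026-01-12"},
}

print(f"{len(ui_action_sequence)} GUI actions vs. 1 MCP tool call")
```

Fewer steps means fewer chances for a mis-tap or a changed UI layout to break the trajectory, which is exactly the fragility that MCP integration is meant to remove.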

Qwen3-VL-Embedding & Qwen3-VL-Reranker: Advanced Multimodal Retrieval & Cross-Modal Understanding
Meet Qwen3-VL-Embedding and Qwen3-VL-Reranker:

  • Built upon the robust Qwen3-VL foundation model
  • Processes text, images, screenshots, videos, and mixed modality inputs
  • Supports 30+ languages
  • Achieves state-of-the-art performance on multimodal retrieval benchmarks

Two-stage retrieval architecture (sketched below):

  • Embedding Model – generates semantically rich vector representations in a unified embedding space
  • Reranker Model – computes fine-grained relevance scores for enhanced retrieval accuracy
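
Below is a minimal sketch of how the two stages compose, assuming a plain embed-then-rerank pipeline. The `embed` and `rerank` functions are placeholders standing in for calls to Qwen3-VL-Embedding and Qwen3-VL-Reranker; the actual model-loading APIs, input formats, and vector dimensions may differ.

```python
# Minimal two-stage retrieval sketch. embed() and rerank() are placeholders
# for Qwen3-VL-Embedding and Qwen3-VL-Reranker calls; the real APIs, input
# formats, and vector dimensions may differ.
import numpy as np

def embed(items):
    # Placeholder: map each (possibly multimodal) input into a shared
    # embedding space. Here we just return random vectors of a fixed size.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(items), 1024))

def rerank(query, candidates):
    # Placeholder: score each (query, candidate) pair. Here we use a crude
    # word-overlap count instead of the reranker's fine-grained relevance.
    return [float(len(set(query.split()) & set(c.split()))) for c in candidates]

corpus = [
    "a red bicycle leaning against a brick wall",
    "a screenshot of a mobile login form",
    "a misty waterfall at dawn",
]
query = "photo of a waterfall"

# Stage 1: dense retrieval. Embed the corpus and the query, then keep the
# top-k candidates by cosine similarity.
doc_vecs = embed(corpus)
query_vec = embed([query])[0]
sims = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
top_k = sims.argsort()[::-1][:2]
candidates = [corpus[i] for i in top_k]

# Stage 2: rerank the shortlist with pairwise relevance scores.
scores = rerank(query, candidates)
ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(ranked)
```

The split reflects the usual retrieval trade-off: the embedding model keeps the first pass cheap enough to scan a large corpus, while the reranker spends heavier pairwise computation only on the shortlist.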

Developer-friendly capabilities:


🧠 Research Breakthroughs

MobileWorld: A Next-Gen Benchmark for Real-World Mobile Agents
Meet MobileWorld, a revolutionary benchmark from the MAI Team at Tongyi Lab that transcends the limitations of existing benchmarks by realistically simulating users’ complex real-world demands:

  • Substantially increased task difficulty: Featuring long-horizon, cross-app workflows, tasks require an average of 27.8 steps (nearly double that of AndroidWorld), with 62.2% of tasks necessitating coordination across multiple apps, ensuring strong alignment with real-life usage scenarios.
  • Novel task paradigms: Introducing agent–user interaction tasks and MCP-augmented tasks, which challenge agents’ abilities to interpret ambiguous instructions and make tool-calling decisions.
  • A robust and reproducible evaluation environment: Built on a self-hosted app ecosystem, Docker containers, and AVD snapshots, this infrastructure guarantees consistent, fair, and replicable experimental conditions.

Evaluation results reveal a stark reality: even current state-of-the-art (SOTA) models achieve only a 51.7% success rate, with end-to-end models peaking at just 20.9%. On the agent–user interaction and MCP-augmented tasks, mainstream agents’ success rates drop nearly to zero, highlighting a significant gap between agents’ capabilities and real-world deployment readiness.

The codebase is now open-source:


🧩 Ecosystem Highlights

Hugging Face Wrapped 2025: Two Qwen Papers Among the Top 10 Most Upvoted
We’re honored that both the Qwen3 Technical Report and Group Sequence Policy Optimization (GSPO) were featured among the top 10 most upvoted papers in Hugging Face’s Wrapped 2025.
Thank you to the entire Qwen team — and to you, our community — for your upvotes.


✨ Community Spotlights

See Qwen3-VL “Think” Before It Speaks: comfyui-prompt-generator from d3cker
We are stoked to recommend the "comfyui-prompt-generator" by d3cker. This custom node is a total powerhouse, especially when using Qwen3-VL-8B-Thinking—it actually displays its "thinking process" before spitting out the perfect prompt.
👉 Try it here

AnyPose LoRA: AnyPose from lilylilith
Designed with the new Qwen Image Edit 2511 Lightning LoRA in mind for fast inference, this LoRA lets you use a single reference image as a pose guide and steer any image to follow that pose.
👉 Try it here

Upscale2K LoRA: Qwen-Image-Edit-2511-Upscale2K from valiantcat
Trained on Qwen/Qwen-Image-Edit-2511, this LoRA performs high-definition upscaling: it enlarges images to roughly 2K resolution with minimal quality loss, injecting a serious dose of clarity and texture into every frame.
👉 Try it here

Speed Meets Aesthetics: Qwen-Image-2512 Turbo V2.0 from Wuli-art
Wuli Team has released V2.0 of their Qwen-Image-2512 Turbo LoRA.
Optimized for 4-8 steps, it offers a perfect balance of insane speed and high-aesthetic output. A vital resource for efficient local deployment and high-fidelity generation.
👉 Try it here


📬 Want More? Stay Updated.

Every week, we bring you:

  • New model releases & upgrades
  • AI research breakthroughs
  • Open-source tools you can use today
  • Community highlights that inspire

👉 Subscribe to The Tongyi Weekly and never miss a release.
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345

Thank you for being part of this journey.

Tongyi Lab is a research institution under Alibaba Group dedicated to artificial intelligence and foundation models, focusing on the research, development, and innovative applications of AI models across diverse domains. Its research spans large language models (LLMs), multimodal understanding and generation, visual AIGC, speech technologies, and more.
