Meituan Open-Sources LongCat-Video-Avatar 1.5: Digital Humans Go Commercial
On July 3, 2026, Meituan's technical team officially open-sourced LongCat-Video-Avatar 1.5, marking the transition of digital human video generation from experimental SOTA performance to commercial-grade utility.
From SOTA to Commercial: What's the Difference?
In AI, there's a huge gap between topping benchmarks and real-world deployment. A model can score highest on tests yet be unusable in production, because commercial use demands not just "good results" but also stability, efficiency, and controllability.
Version 1.5's upgrades target that gap in three areas:
- Lip synchronization: improved audio-visual alignment eliminates the "mouth moves but the audio doesn't match" problem
- Physical plausibility: physics constraints reduce visual artifacts such as clipping, where body parts pass through each other or through objects
- Long-video stability: temporal consistency training significantly improves quality for videos longer than 30 seconds
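To make the lip-synchronization point concrete, here is a minimal sketch of one generic way to check audio-visual alignment: cross-correlate a per-frame audio energy signal with a per-frame mouth-opening signal and see which frame offset fits best. This is not Meituan's method or part of the LongCat-Video-Avatar code; the function name `best_lag` and both input signals are hypothetical, and the sketch assumes the two signals have already been extracted at the same frame rate.

```python
def best_lag(audio_energy, mouth_opening, max_lag):
    """Estimate the frame offset between audio and mouth motion.

    Hypothetical helper for illustration only. Both inputs are
    equal-rate lists of floats (one value per video frame).
    A positive result means the mouth motion trails the audio
    by that many frames; 0 means the two are aligned.
    """
    # Center both signals so that constant offsets do not dominate the score.
    a_mean = sum(audio_energy) / len(audio_energy)
    m_mean = sum(mouth_opening) / len(mouth_opening)
    a = [x - a_mean for x in audio_energy]
    m = [x - m_mean for x in mouth_opening]

    best_score, best = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate audio frame i with mouth frame i + lag.
        score = 0.0
        for i, a_val in enumerate(a):
            j = i + lag
            if 0 <= j < len(m):
                score += a_val * m[j]
        if score > best_score:
            best_score, best = score, lag
    return best


# Toy signals: the mouth opens two frames after each audio peak.
audio = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
mouth = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
print(best_lag(audio, mouth, max_lag=3))
```

A non-zero offset on real footage would be a hint that the generated mouth motion lags or leads the speech, which is the kind of mismatch the upgrade is described as fixing.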
Commercial Scenarios
Meituan's use cases align with its business: food delivery customer service (24/7 digital agents), brand IP (customized digital spokespeople), and livestream commerce (digital hosts for product showcases).
The decision to open-source the model rather than sell access to it is notable. The likely logic: digital human generation technology is rapidly commoditizing, so building an ecosystem through open source and monetizing at the application layer makes more sense than keeping the model proprietary.
A Systematic AI Research Showcase
On the same day, Meituan also released VitaBench 2.0 (a long-term user modeling benchmark), WBench (an evaluation suite for interactive video world models), an AIGC poster generation framework, and multiple papers at ICML 2026 and ACL 2026. Taken together, the releases signal Meituan's shift from an application company to a technology company.
Why This Open Source Matters
LongCat-Video-Avatar 1.5 stands out for three reasons: it's validated in real business scenarios (Meituan's food delivery at scale), it's fully engineered (inference optimization, batch generation, quality control), and it supports multi-person interaction, something most similar models can't do.
For developers wanting to build digital human applications without expensive API fees, this is a practical starting point.
This article was first published on Deskless Daily.