LTX-2.3 Open Source! Audio-Video Sync, 20s 4K Video on GPU
TL;DR: Lightricks just open-sourced LTX-2.3, an upgraded version of LTX-2. It generates synchronized audio and video, supports up to 4K resolution and clips up to 20 seconds, and runs on a local GPU. Lightricks' H100 benchmark puts LTX-2 at roughly 18x the steps per minute of WAN 2.2 14B.
The Era of "Audio-Video Sync" for Open Source Video Models
If you've been following AI video generation, you've probably seen the showcase demos from big players like Sora and Veo. Today's protagonist is a company called Lightricks: they just open-sourced LTX-2.3, a video model that runs locally and generates audio and video together.
This isn't a "toy"; it's pitched as a production-grade tool you can use right away.
Core Highlight: Audio-Video Synchronized Generation
The Old Problem
Anyone who's used video generation models knows the common issue:
- Video is generated first, and audio has to be added separately
- Lip sync is hit-or-miss
- Environmental sounds and background music rely entirely on post-production
LTX-2.3's Solution
Audio and video are generated together by a single model:
- Actions, dialogue, environmental sounds, and music are generated in sync
- Up to 20 seconds of continuous video (with synchronized audio)
- First open-source DiT-based audio-video foundation model
In effect, this brings a capability mostly seen in closed models such as Sora to open weights.
Performance Data: 18x Speed
Data Center Performance (H100)
Source: Lightricks Official Blog, Research > Performance
| Metric | LTX-2 | WAN 2.2 14B |
|---|---|---|
| Relative steps/min (H100) | ~18x | 1x (baseline) |
| Resolution | 1080p/1440p/4K | - |
| FPS | 24/25/48/50 | - |
| Duration | Up to 20s | - |
| Compute Cost | 50% reduction | - |
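What does 18x mean in practice? If both models needed the same number of steps, a job that takes WAN 2.2 an hour would finish in roughly 3 minutes 20 seconds (60 / 18 ≈ 3.3 minutes).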
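Here is that arithmetic as a small sketch. It assumes both models run the same number of sampling steps for a job, which the benchmark table does not state; the function name is illustrative only.

```python
def time_at_speedup_seconds(baseline_minutes: float, speedup: float) -> float:
    """Seconds needed for the same step count when steps/min is `speedup` times higher."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes * 60 / speedup


if __name__ == "__main__":
    seconds = time_at_speedup_seconds(60, 18)
    minutes, remainder = divmod(seconds, 60)
    print(f"{int(minutes)} min {int(remainder)} s")
```

If the faster model also needs fewer steps (for example a distilled checkpoint), the wall-clock gap would be larger than this estimate.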
Data Note: The 18x figure comes from LTX-2 testing. LTX-2.3 is an upgraded version and is expected to perform similarly or better, but no separate LTX-2.3 benchmark is cited here.
Two Modes: Fast vs Pro
Fast Flow
For quick iteration and rapid feedback:
- Resolution: 1080p/1440p/4K
- FPS: 24/25/48/50
- Duration: Up to 20 seconds
- Low compute load, fast rendering
Pro Flow
For scenarios requiring high quality:
- Resolution: 1080p/1440p/4K
- FPS: 24/25/48/50
- Duration: Up to 20 seconds
- Enhanced detail and stability
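Both flows list the same resolution, frame-rate, and duration options, so it helps to see what a request at those limits adds up to. The sketch below is an assumption-laden illustration: the pixel dimensions are the usual 16:9 values for each label (the article only gives the labels), and the real pipeline may round to a model-specific frame count.

```python
# Options as listed for both Fast Flow and Pro Flow.
SUPPORTED_FPS = (24, 25, 48, 50)
MAX_DURATION_S = 20

# Assumed 16:9 pixel sizes for each label; not stated in the article.
RESOLUTIONS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}


def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames in a clip, after checking the listed limits."""
    if fps not in SUPPORTED_FPS:
        raise ValueError(f"fps must be one of {SUPPORTED_FPS}")
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    return round(duration_s * fps)


if __name__ == "__main__":
    for fps in SUPPORTED_FPS:
        print(f"{MAX_DURATION_S}s @ {fps} fps -> {frame_count(MAX_DURATION_S, fps)} frames")
    width, height = RESOLUTIONS["4K"]
    print(f"4K frame: {width * height:,} pixels")
```

A full-length clip ranges from 480 frames at 24 fps to 1000 frames at 50 fps, which is one reason the frame rate matters as much as the resolution for render time.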
Control Capabilities: Frame-Level Precision
1. Depth-Aware Generation
Control scene structure and spatial depth.
Prompt: A cinematic wide aerial shot of a rugged desert mountain range at golden hour. A towering sandstone peak catches warm orange light, overlooking a vast arid basin and layered rocky hills under a soft, hazy pastel sky.
2. OpenPose-Driven Motion
Human pose and motion guidance for precise control.
Prompt: A 3D animated medium shot of a youthful, blond astronaut in a red spacesuit standing inside a sleek white shuttle. He gazes confidently toward a large panoramic window revealing a vibrant galaxy.
3. Camera Control
- Static shots
- Dolly in/out
- Dolly left/right
- 3D camera logic
LoRA Training: Custom Styles & Characters
Style LoRA
- Style LoRA: Learn specific styles in minutes
- Character LoRA: Maintain character consistency
- Training time: Usually less than 1 hour
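Short training times are plausible because a LoRA trains two small low-rank matrices per adapted layer instead of the full weight matrix. The sketch below shows the general LoRA parameter count only; the layer size and rank are illustrative values, not LTX-2.3's actual configuration.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when fine-tuning a dense d_out x d_in layer directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA update B @ A, with A: rank x d_in and B: d_out x rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 4096  # hypothetical layer width
    rank = 32            # hypothetical LoRA rank
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, rank)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these example numbers the adapter trains about 1.6% of the weights in that layer, and the base model stays frozen.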
Model Versions
| Version | Description |
|---|---|
| ltx-2.3-22b-dev | Full model, flexible and trainable (bf16) |
| ltx-2.3-22b-distilled | Distilled version, 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-lora-384 | LoRA form of the distilled model, applied on top of the full model |
| ltx-2.3-spatial-upscaler-x2 | 2x spatial upsampling |
| ltx-2.3-spatial-upscaler-x1.5 | 1.5x spatial upsampling |
| ltx-2.3-temporal-upscaler-x2 | 2x temporal upsampling |
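The names in this table imply some useful back-of-the-envelope numbers. The sketch below assumes that "22b" means 22 billion parameters, that the upscalers simply multiply width/height or frame rate by their factor, and that the resolution labels map to standard 16:9 sizes. None of this is confirmed in the article, so treat it as an estimate.

```python
def weights_size_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in GB (1e9 bytes); bf16 stores 2 bytes per parameter."""
    return params_billion * bytes_per_param


def upscale_spatial(width: int, height: int, factor: float) -> tuple[int, int]:
    """Output size if the spatial upscaler scales both dimensions by `factor`."""
    return round(width * factor), round(height * factor)


def upscale_temporal(fps: int, factor: int) -> int:
    """Output frame rate if the temporal upscaler multiplies the frame count."""
    return fps * factor


if __name__ == "__main__":
    print("bf16 weights:", weights_size_gb(22), "GB (weights only)")
    print("1080p x2   ->", upscale_spatial(1920, 1080, 2))
    print("1440p x1.5 ->", upscale_spatial(2560, 1440, 1.5))
    print("24 fps x2  ->", upscale_temporal(24, 2), "fps")
    print("25 fps x2  ->", upscale_temporal(25, 2), "fps")
```

Under these assumptions, both spatial upscalers reach 3840x2160 from a lower base resolution, and the temporal upscaler maps 24/25 fps to 48/50 fps, which lines up with the resolution and FPS options listed earlier. The weight estimate excludes activations and any auxiliary models, so real memory use will be higher.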
Local Deployment: Runs on GPU
System Requirements
- Python >= 3.12
- CUDA > 12.7
- PyTorch ~= 2.7
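A quick way to check a machine against these three requirements is sketched below. The helper names are made up for this example. It reads `~= 2.7` as a PEP 440 compatible-release specifier (at least 2.7, still in the 2.x series). Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, not the driver version.

```python
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.7.1+cu128' into (2, 7, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def python_ok(version=sys.version_info) -> bool:
    """Python >= 3.12."""
    return tuple(version[:2]) >= (3, 12)


def torch_ok(torch_version: str) -> bool:
    """PyTorch ~= 2.7, i.e. >= 2.7 and still a 2.x release."""
    v = parse_version(torch_version)
    return v[:1] == (2,) and v[:2] >= (2, 7)


def cuda_ok(cuda_version: str) -> bool:
    """CUDA > 12.7, compared on major.minor."""
    return parse_version(cuda_version)[:2] > (12, 7)


if __name__ == "__main__":
    print("Python OK:", python_ok())
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch OK:", torch_ok(torch.__version__))
        cuda = torch.version.cuda  # None on CPU-only builds
        print("CUDA OK:", cuda is not None and cuda_ok(cuda))
```

Run it inside the project's virtual environment so it sees the same PyTorch build the model will use.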
Installation Steps
```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
# Sync environment with uv
uv sync
source .venv/bin/activate
```
ComfyUI Integration
Recommended: use the built-in LTXVideo nodes via ComfyUI Manager for a GUI workflow. This is the beginner-friendly route.
Online Experience
Don't want to deploy locally? Try it online:
- API Playground: https://console.ltx.video/playground/
- LTX Desktop: https://ltx.io/ltx-desktop
Summary
Key features of LTX-2.3:
- Audio-Video Sync Generation - First open-source DiT-based audio-video foundation model
- 4K Resolution Support - 1080p/1440p/4K options
- 20-Second Duration - Continuous clips (with synchronized audio)
- 18x Speed - LTX-2 is ~18x faster than WAN 2.2 on H100*
- Local Deployment - Runs on GPU, no cloud dependency
- Open Source & Free - Open-Weights License
*Source: Lightricks Official Blog - Data Center Performance (H100)
If you're working on AI video, short videos, or content creation, this model is worth your attention.
Related Links:
- GitHub: https://github.com/Lightricks/LTX-2
- HuggingFace: https://huggingface.co/Lightricks/LTX-2.3
- Official Docs: https://docs.ltx.video/
- Technical Report: https://huggingface.co/papers/2601.03233
- Official Blog: https://ltx.video/blog/introducing-ltx-2
Source: Lightricks Official GitHub, HuggingFace, Technical Report, Official Blog