Garyvov

Posted on Mar 8

LTX-2.3 Open Source! Audio-Video Sync, 20s 4K Video on GPU

#career

TL;DR: Lightricks just open-sourced LTX-2.3, an upgraded version of LTX-2. It offers synchronized audio-video generation, resolutions up to 4K, clips up to 20 seconds long, and inference on a local GPU. Lightricks reports that LTX-2 runs about 18x faster than WAN 2.2 14B on an H100!

🎬 The Era of "Audio-Video Sync" for Open Source Video Models

If you've been following AI video generation, you've probably seen the "performances" from big players like Sora and Veo. But today's protagonist is Lightricks, which just open-sourced LTX-2.3, a video model that runs locally and generates audio and video in sync.

This isn't a "toy"—it's a production-grade tool you can use right away.

🔥 Core Highlight: Audio-Video Synchronized Generation

The Old Problem

Anyone who's used video generation models knows the common issue:

Video is generated first, and audio has to be added separately
Lip sync is hit-or-miss
Environmental sounds and background music rely entirely on post-production

LTX-2.3's Solution

Audio and video generated simultaneously in one model:

✅ Actions, dialogue, environmental sounds, music generated in sync
✅ Up to 20 seconds of continuous video (with synchronized audio)
✅ First open-source DiT-based audio-video foundation model

In practice, this brings the kind of joint audio-video generation shown by closed models such as Sora and Veo to an open-weights model.

📊 Performance Data: 18x Speed

Data Center Performance (H100)

Source: Lightricks Official Blog - Research → Performance

Metric	LTX-2	WAN 2.2 14B
Relative steps/min (H100)	~18x	1x (baseline)
Resolution	1080p/1440p/4K	-
FPS	24/25/48/50	-
Duration	Up to 20s	-
Compute Cost	50% reduction	-

What does 18x speed mean? If step throughput is the only difference, what takes others 1 hour, you finish in a little over 3 minutes (60 / 18 ≈ 3.3).
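The arithmetic behind that comparison can be checked with a minimal sketch. It assumes wall-clock time scales inversely with the reported steps/min ratio, which ignores model loading, decoding, and any difference in step counts. The function name `scaled_minutes` is made up for this illustration.

```python
def scaled_minutes(baseline_minutes: float, speedup: float) -> float:
    """Time a job would take if it ran `speedup` times faster than the baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


# One hour of baseline work at the reported ~18x step throughput.
minutes = scaled_minutes(60, 18)
whole, seconds = divmod(round(minutes * 60), 60)
print(f"{minutes:.2f} min (~{whole} min {seconds} s)")  # 3.33 min (~3 min 20 s)
```

Real end-to-end gains depend on the full pipeline, so treat this as an upper-level estimate rather than a benchmark.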

⚠️ Data Note: The 18x figure comes from LTX-2 testing. LTX-2.3 is an upgraded version, so similar or better performance is expected, but the figure was not measured on LTX-2.3 itself.

🎮 Two Modes: Fast vs Pro

Fast Flow

For quick iteration and rapid feedback:

Resolution: 1080p/1440p/4K
FPS: 24/25/48/50
Duration: Up to 20 seconds
Low compute load, fast rendering

Pro Flow

For scenarios requiring high quality:

Resolution: 1080p/1440p/4K
FPS: 24/25/48/50
Duration: Up to 20 seconds
Enhanced detail and stability
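To get a feel for what these settings imply, here is a small sketch that turns duration and frame rate into frame counts and total pixels. The pixel dimensions are assumptions (16:9, with "4K" taken as 3840x2160 UHD), and the helper names are hypothetical. The frame counts the model actually accepts may differ.

```python
# Assumed pixel dimensions for the labels used above (16:9, 4K taken as UHD).
RESOLUTIONS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
FRAME_RATES = (24, 25, 48, 50)
MAX_SECONDS = 20


def frame_count(seconds: float, fps: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return round(seconds * fps)


def total_pixels(label: str, seconds: float, fps: int) -> int:
    """Pixels generated across the whole clip at the given resolution."""
    width, height = RESOLUTIONS[label]
    return width * height * frame_count(seconds, fps)


for fps in FRAME_RATES:
    print(f"{MAX_SECONDS} s at {fps} fps = {frame_count(MAX_SECONDS, fps)} frames")

# Compare the lightest and heaviest combinations listed for a 20-second clip.
light = total_pixels("1080p", MAX_SECONDS, 24)
heavy = total_pixels("4K", MAX_SECONDS, 50)
print(f"4K/50fps is about {heavy / light:.1f}x the pixels of 1080p/24fps")
```

A 20-second clip ranges from 480 frames at 24 fps to 1000 frames at 50 fps, which is why resolution and frame rate choices matter for iteration speed.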

🛠️ Control Capabilities: Frame-Level Precision

1. Depth-Aware Generation

Control scene structure and spatial depth.

Prompt: A cinematic wide aerial shot of a rugged desert mountain range at golden hour. A towering sandstone peak catches warm orange light, overlooking a vast arid basin and layered rocky hills under a soft, hazy pastel sky.

2. OpenPose-Driven Motion

Human pose and motion guidance for precise control.

Prompt: A 3D animated medium shot of a youthful, blond astronaut in a red spacesuit standing inside a sleek white shuttle. He gazes confidently toward a large panoramic window revealing a vibrant galaxy.

3. Camera Control

Static shots
Dolly in/out
Dolly left/right
3D camera logic

🎨 LoRA Training: Custom Styles & Characters

Style LoRA

Style LoRA: Learn specific styles in minutes
Character LoRA: Maintain character consistency
Training time: Usually less than 1 hour

📦 Model Versions

Version	Description
ltx-2.3-22b-dev	Full model, flexible and trainable (bf16)
ltx-2.3-22b-distilled	Distilled version, 8 steps, CFG=1
ltx-2.3-22b-distilled-lora-384	LoRA form of the distilled model, applied on top of the full model
ltx-2.3-spatial-upscaler-x2	2x spatial upsampling
ltx-2.3-spatial-upscaler-x1.5	1.5x spatial upsampling
ltx-2.3-temporal-upscaler-x2	2x temporal upsampling
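If you script around these checkpoints, it can help to keep the table as data. The sketch below is only an illustration: the `Checkpoint` class and `upscalers` helper are hypothetical and not part of the LTX-2 codebase. Sampling settings are filled in only where the table states them (8 steps, CFG=1 for the distilled model) and are left as `None` elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    name: str
    role: str
    steps: Optional[int] = None   # None means the table does not state a value
    cfg: Optional[float] = None   # None means the table does not state a value


CHECKPOINTS = {
    c.name: c
    for c in (
        Checkpoint("ltx-2.3-22b-dev", "full model, trainable (bf16)"),
        Checkpoint("ltx-2.3-22b-distilled", "distilled model", steps=8, cfg=1.0),
        Checkpoint("ltx-2.3-22b-distilled-lora-384", "LoRA for the full model"),
        Checkpoint("ltx-2.3-spatial-upscaler-x2", "2x spatial upsampling"),
        Checkpoint("ltx-2.3-spatial-upscaler-x1.5", "1.5x spatial upsampling"),
        Checkpoint("ltx-2.3-temporal-upscaler-x2", "2x temporal upsampling"),
    )
}


def upscalers() -> list[str]:
    """Names of the checkpoints that are upscalers rather than base models."""
    return [name for name in CHECKPOINTS if "upscaler" in name]


distilled = CHECKPOINTS["ltx-2.3-22b-distilled"]
print(distilled.steps, distilled.cfg)
print(upscalers())
```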

💻 Local Deployment: Runs on GPU

System Requirements

Python >= 3.12
CUDA > 12.7
PyTorch ~= 2.7
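Before cloning the repo, you can check these three requirements with a short script. This is a sketch, not an official tool, and the helper names are made up. It reads `~= 2.7` as the usual compatible-release rule (at least 2.7, still in the 2.x series) and uses `torch.version.cuda`, which reports the CUDA version PyTorch was built against rather than the installed driver.

```python
import re
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string like '2.7.1+cu128' into (2, 7, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def python_ok(version: tuple[int, ...]) -> bool:
    return version[:2] >= (3, 12)          # Python >= 3.12


def cuda_ok(version: tuple[int, ...]) -> bool:
    return version[:2] > (12, 7)           # CUDA > 12.7 (strictly greater)


def torch_ok(version: tuple[int, ...]) -> bool:
    return version[:1] == (2,) and version[:2] >= (2, 7)   # PyTorch ~= 2.7


def check_environment() -> dict:
    """Report each requirement as True/False, or None if PyTorch is missing."""
    report = {"python": python_ok(tuple(sys.version_info[:3]))}
    try:
        import torch
    except ImportError:
        report["torch"] = None
        report["cuda"] = None
        return report
    report["torch"] = torch_ok(parse_version(torch.__version__))
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = cuda_ok(parse_version(cuda)) if cuda else False
    return report


print(check_environment())
```

A `None` entry means PyTorch is not installed yet, which is expected before running the installation steps.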

Installation Steps

git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# Sync environment with uv
uv sync
source .venv/bin/activate

ComfyUI Integration

Recommended: Use the built-in LTXVideo nodes available through ComfyUI Manager for GUI operation. This is the most beginner-friendly option.

🌐 Online Experience

Don't want to deploy locally? Try it online:

API Playground: https://console.ltx.video/playground/
LTX Desktop: https://ltx.io/ltx-desktop

📝 Summary

Key features of LTX-2.3:

✅ Audio-Video Sync Generation - First open-source DiT-based audio-video foundation model
✅ 4K Resolution Support - 1080p/1440p/4K options
✅ 20-Second Duration - Continuous clips (with synchronized audio)
✅ 18x Speed - LTX-2 is ~18x faster than WAN 2.2 on H100*
✅ Local Deployment - Runs on GPU, no cloud dependency
✅ Open Source & Free - Open-Weights License

*Source: Lightricks Official Blog - Data Center Performance (H100)

If you're working on AI video, short videos, or content creation, this model is worth your attention.

Related Links:

GitHub: https://github.com/Lightricks/LTX-2
HuggingFace: https://huggingface.co/Lightricks/LTX-2.3
Official Docs: https://docs.ltx.video/
Technical Report: https://huggingface.co/papers/2601.03233
Official Blog: https://ltx.video/blog/introducing-ltx-2

Source: Lightricks Official GitHub, HuggingFace, Technical Report, Official Blog
