Garyvov

LTX-2.3 Open Source! Audio-Video Sync, 20s 4K Video on GPU

TL;DR: Lightricks just open-sourced LTX-2.3, an upgraded version of LTX-2. It offers synchronized audio-video generation, 4K resolution support, clips up to 20 seconds long, and inference on a local GPU. Its predecessor LTX-2 benchmarked about 18x faster than WAN 2.2 on an H100!


🎬 The Era of "Audio-Video Sync" for Open Source Video Models

If you've been following AI video generation, you've probably seen the "performances" from big players like Sora and Veo. But today's protagonist is Lightricks, which just open-sourced LTX-2.3, a video model that runs locally and generates audio and video in sync.

This isn't a "toy"; it's a production-grade tool you can use right away.


🔥 Core Highlight: Audio-Video Synchronized Generation

The Old Problem

Anyone who's used video generation models knows the common issue:

  • Video is generated first, and audio has to be added separately
  • Lip sync is hit-or-miss
  • Environmental sounds and background music rely entirely on post-production

LTX-2.3's Solution

Audio and video are generated simultaneously by one model:

  • ✅ Actions, dialogue, environmental sounds, and music generated in sync
  • ✅ Up to 20 seconds of continuous video (with synchronized audio)
  • ✅ First open-source DiT-based audio-video foundation model

In effect, this puts a capability associated with closed models like Sora into an open-source release.


📊 Performance Data: 18x Speed

Data Center Performance (H100)

Source: Lightricks Official Blog - Research → Performance

| Metric | LTX-2 | WAN 2.2 14B |
|---|---|---|
| Steps/min (H100) | ~18x | 1x |
| Resolution | 1080p/1440p/4K | - |
| FPS | 24/25/48/50 | - |
| Duration | Up to 20s | - |
| Compute Cost | 50% reduction | - |

What does 18x speed mean? At the same number of steps, a job that takes others 1 hour finishes in a little over 3 minutes (60 / 18 ≈ 3.3).
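
To make the arithmetic concrete, here is a minimal sketch. It assumes the ~18x steps-per-minute ratio translates directly into wall-clock time, which only holds when both models run the same number of steps. `time_with_speedup` is a hypothetical helper, not part of any LTX tooling.

```python
def time_with_speedup(baseline_minutes: float, speedup: float) -> float:
    """Return the wall-clock minutes after applying a throughput speedup.

    Assumes both models need the same number of sampling steps, so a
    steps/min ratio maps one-to-one onto total time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


minutes = time_with_speedup(60, 18)
whole, seconds = divmod(round(minutes * 60), 60)
print(f"{whole} min {seconds} s")  # 3 min 20 s
```

If the faster model also needs fewer steps, the real gap would be larger; if it needs more, smaller.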

⚠️ Data Note: The 18x figure comes from LTX-2 testing. LTX-2.3, as an upgraded version, is expected to perform similarly or better, but no separate LTX-2.3 benchmark is cited here.


🎮 Two Modes: Fast vs Pro

Fast Flow

For quick iteration and rapid feedback:

  • Resolution: 1080p/1440p/4K
  • FPS: 24/25/48/50
  • Duration: Up to 20 seconds
  • Low compute load, fast rendering

Pro Flow

For scenarios requiring high quality:

  • Resolution: 1080p/1440p/4K
  • FPS: 24/25/48/50
  • Duration: Up to 20 seconds
  • Enhanced detail and stability
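
For a sense of scale, the sketch below counts frames and raw pixels for a 20-second clip at each listed setting. The pixel dimensions are an assumption (the common 16:9 sizes for 1080p, 1440p, and 4K); the post itself only gives the labels.

```python
# Assumed 16:9 frame sizes; the post lists only the labels.
RESOLUTIONS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
FRAME_RATES = (24, 25, 48, 50)
DURATION_S = 20  # maximum clip length stated in the post


def frame_count(fps: int, seconds: int = DURATION_S) -> int:
    """Number of frames in a clip of the given length."""
    return fps * seconds


def clip_pixels(label: str, fps: int, seconds: int = DURATION_S) -> int:
    """Total pixels across every frame of the clip."""
    width, height = RESOLUTIONS[label]
    return width * height * frame_count(fps, seconds)


for fps in FRAME_RATES:
    print(f"{fps} fps -> {frame_count(fps)} frames")

# A 20 s 4K clip at 50 fps holds about 8.3 billion pixels.
print(f"{clip_pixels('4K', 50) / 1e9:.1f} billion pixels")
```

A full-length clip therefore ranges from 480 to 1000 frames depending on the frame rate, which is why the same 20-second limit can mean very different workloads.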

🛠️ Control Capabilities: Frame-Level Precision

1. Depth-Aware Generation

Control scene structure and spatial depth.

Prompt: A cinematic wide aerial shot of a rugged desert mountain range at golden hour. A towering sandstone peak catches warm orange light, overlooking a vast arid basin and layered rocky hills under a soft, hazy pastel sky.

2. OpenPose-Driven Motion

Human pose and motion guidance for precise control.

Prompt: A 3D animated medium shot of a youthful, blond astronaut in a red spacesuit standing inside a sleek white shuttle. He gazes confidently toward a large panoramic window revealing a vibrant galaxy.

3. Camera Control

  • Static shots
  • Dolly in/out
  • Dolly left/right
  • 3D camera logic

🎨 LoRA Training: Custom Styles & Characters

Style LoRA

  • Style LoRA: Learn specific styles in minutes
  • Character LoRA: Maintain character consistency
  • Training time: Usually less than 1 hour
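
One reason LoRA training can be this quick is that it trains a small low-rank update instead of the full weight matrix. The sketch below compares trainable parameter counts for a single linear layer. The layer size and rank are hypothetical illustration values, not LTX-2.3's actual dimensions.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when fine-tuning the whole d_out x d_in matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA update W + B @ A.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 4096, 4096, 16
full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# full: 16,777,216  lora: 131,072  ratio: 128x
```

Under these assumed numbers the adapter trains 128 times fewer weights for that layer, and the base model stays frozen.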

📦 Model Versions

| Version | Description |
|---|---|
| ltx-2.3-22b-dev | Full model, flexible and trainable (bf16) |
| ltx-2.3-22b-distilled | Distilled version, 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-lora-384 | LoRA version for full model |
| ltx-2.3-spatial-upscaler-x2 | 2x spatial upsampling |
| ltx-2.3-spatial-upscaler-x1.5 | 1.5x spatial upsampling |
| ltx-2.3-temporal-upscaler-x2 | 2x temporal upsampling |
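
As a rough guide to what these checkpoints imply, the sketch below estimates weight storage and shows how the upscaler factors compose. It assumes "22b" means 22 billion parameters and that bf16 stores 2 bytes per parameter; activations and any auxiliary models are not counted. The base resolutions and frame rate are hypothetical inputs, and chaining the upscalers this way is an assumption about how they are meant to be used.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def weight_gigabytes(params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Storage for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return params * bytes_per_param / 1e9


def upscale(width: int, height: int, fps: int,
            spatial: float = 1.0, temporal: int = 1) -> tuple[int, int, int]:
    """Apply a spatial factor to the frame size and a temporal factor to fps."""
    return round(width * spatial), round(height * spatial), fps * temporal


# Assumes "22b" in the checkpoint names means 22 billion parameters.
print(f"{weight_gigabytes(22e9):.0f} GB of bf16 weights")  # 44 GB of bf16 weights

# Hypothetical base clips: both routes land on 3840x2160.
print(upscale(1920, 1080, 25, spatial=2.0))              # (3840, 2160, 25)
print(upscale(2560, 1440, 25, spatial=1.5, temporal=2))  # (3840, 2160, 50)
```

The point of the second half is that generating at a lower base resolution and upscaling is one plausible route to 4K output at 48 or 50 fps.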

💻 Local Deployment: Runs on GPU

System Requirements

  • Python >= 3.12
  • CUDA > 12.7
  • PyTorch ~= 2.7
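
These three constraints can be checked before installing. The sketch below is a hypothetical helper, not part of the LTX-2 repository. It compares version tuples directly and reads `~= 2.7` the way PEP 440 defines a compatible release (at least 2.7, below 3.0).

```python
import sys


def parse(version: str) -> tuple[int, ...]:
    """Turn '2.7.1+cu128' into (2, 7, 1), ignoring any local suffix."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def python_ok(version: tuple[int, ...]) -> bool:
    return version >= (3, 12)          # Python >= 3.12


def cuda_ok(version: tuple[int, ...]) -> bool:
    return version > (12, 7)           # CUDA > 12.7 (strictly greater)


def torch_ok(version: tuple[int, ...]) -> bool:
    return (2, 7) <= version < (3,)    # PyTorch ~= 2.7


print("python:", python_ok(sys.version_info[:2]))
print("cuda 12.8:", cuda_ok(parse("12.8")))
print("torch 2.7.1+cu128:", torch_ok(parse("2.7.1+cu128")))
```

With PyTorch already installed, `torch.__version__` and `torch.version.cuda` supply the strings to pass to `parse`; the latter is `None` on CPU-only builds, so guard for that before parsing.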

Installation Steps

git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# Sync environment with uv
uv sync
source .venv/bin/activate

ComfyUI Integration

Recommended: Use the built-in LTXVideo nodes from ComfyUI Manager for GUI operation. This route is beginner-friendly.


🌐 Online Experience

Don't want to deploy locally? Try it online:


📝 Summary

Key features of LTX-2.3:

  • ✅ Audio-Video Sync Generation - First open-source DiT-based audio-video foundation model
  • ✅ 4K Resolution Support - 1080p/1440p/4K options
  • ✅ 20-Second Duration - Continuous clips (with synchronized audio)
  • ✅ 18x Speed - LTX-2 is ~18x faster than WAN 2.2 on H100*
  • ✅ Local Deployment - Runs on GPU, no cloud dependency
  • ✅ Open Source & Free - Open-Weights License

*Source: Lightricks Official Blog - Data Center Performance (H100)

If you're working on AI video, short videos, or content creation, this model is worth your attention.


Related Links:


Source: Lightricks Official GitHub, HuggingFace, Technical Report, Official Blog
