A 5-second AI video clip used to cost between $0.25 and $0.50 on platforms like Runway. Through deAPI, a comparable clip runs on LTX-Video models for roughly $0.006 to $0.053, depending on model and resolution. This tutorial gets you from zero to a generated video in Python.
We'll cover three generation modes: text-to-video, image-to-video (animating a still image), and audio-to-video (lip-synced clips from an audio file). All three run through the same API with the same authentication.
What you'll need
- Python 3.8+
- The requests library (pip install requests)
- A deAPI account - sign up at app.deapi.ai/dashboard and grab your API key from Settings -> API Keys. $5 in free credits, no card required.
Pick your model
deAPI runs three LTX-Video models. Each trades speed for quality:
| Model | Slug | Max resolution | Audio sync | Price (768x768) |
|---|---|---|---|---|
| LTX-Video 13B | Ltxv_13B_0_9_8_Distilled_FP8 | 768x768 | No | ~$0.009 (~4s max) |
| LTX-2 19B | Ltx2_19B_Dist_FP8 | 1024x1024 | No | ~$0.041 (5s) |
| LTX-2.3 22B | Ltx2_3_22B_Dist_INT8 | 1024x1024 | Yes | ~$0.047 (5s) |
Start with LTX-Video 13B for testing - it's the fastest and cheapest. Move to LTX-2.3 when you need higher quality or audio synchronization.
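If you want a script to make that choice for you, the table can be encoded as a small lookup. This is only a sketch built from the figures above: `MODELS` and `pick_model` are hypothetical names for illustration, not part of deAPI, and the prices are the approximate 768x768 values from the table.

```python
# Model data copied from the table above (approximate 768x768 prices).
MODELS = [
    {"slug": "Ltxv_13B_0_9_8_Distilled_FP8", "max_side": 768, "audio_sync": False, "price_768": 0.009},
    {"slug": "Ltx2_19B_Dist_FP8", "max_side": 1024, "audio_sync": False, "price_768": 0.041},
    {"slug": "Ltx2_3_22B_Dist_INT8", "max_side": 1024, "audio_sync": True, "price_768": 0.047},
]


def pick_model(side, need_audio_sync=False):
    """Return the slug of the cheapest listed model that supports a side x side clip."""
    candidates = [
        m for m in MODELS
        if m["max_side"] >= side and (m["audio_sync"] or not need_audio_sync)
    ]
    if not candidates:
        raise ValueError(f"No listed model supports {side}x{side}")
    return min(candidates, key=lambda m: m["price_768"])["slug"]


print(pick_model(512))                        # cheapest option for small test clips
print(pick_model(768, need_audio_sync=True))  # only LTX-2.3 offers audio sync
```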
Your first video
Every deAPI request follows an async pattern: send a generation request, get back a request_id, poll until the result is ready. Here's the complete flow:
import requests
import time

API_KEY = "dpn-sk-your-key-here"
BASE = "https://api.deapi.ai/api/v2"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json",
    "Content-Type": "application/json"
}

# Generate a clip (120 frames at 30 fps = ~4 seconds, the LTX-Video 13B max)
response = requests.post(f"{BASE}/videos/generations", headers=HEADERS, json={
    "prompt": "A lighthouse on a rocky cliff during a storm at night. Rain hammers the rocks while the beam cuts through thick fog. Waves crash against the base, sending spray upward. Camera slowly pushes in from a wide shot. Cinematic, 35mm film grain.",
    "model": "Ltxv_13B_0_9_8_Distilled_FP8",
    "width": 512,
    "height": 512,
    "frames": 120,
    "fps": 30,
    "steps": 1,
    "guidance": 7.5,
    "seed": 42
})
response.raise_for_status()  # stop early on an HTTP error (bad key, bad parameters)
request_id = response.json()["data"]["request_id"]
print(f"Request ID: {request_id}")

# Poll for the result
while True:
    result = requests.get(
        f"{BASE}/jobs/{request_id}",
        headers=HEADERS
    ).json()
    status = result["data"]["status"]
    print(f"Status: {status}")
    if status == "done":
        # Download the finished MP4 from the result URL
        video_url = result["data"]["result_url"]
        print(f"Video URL: {video_url}")
        video = requests.get(video_url)
        with open("output.mp4", "wb") as f:
            f.write(video.content)
        print("Saved to output.mp4")
        break
    if status == "error":
        print(f"Error: {result['data']}")
        break
    time.sleep(3)
Generation takes 30-90 seconds depending on the model and resolution. The result URL returns an MP4 file.
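A bare `while True` loop polls forever if a job never reaches a final state. Here is a minimal sketch of a reusable polling helper with a timeout. `wait_for_job` is a hypothetical name, and the helper assumes only the two final statuses used in this tutorial ("done" and "error"); any other status value is treated as "still running".

```python
import time


def wait_for_job(fetch_status, timeout=300, interval=3,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job is done, fails, or the timeout passes.

    fetch_status is any zero-argument callable that returns the "data"
    dict of a job response. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data
        if status == "error":
            raise RuntimeError(f"Generation failed: {data}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        sleep(interval)
```

With the variables from the script above, the call would look like `wait_for_job(lambda: requests.get(f"{BASE}/jobs/{request_id}", headers=HEADERS, timeout=30).json()["data"])`. The returned dict then carries `result_url`.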
Writing prompts that work
LTX-Video reads prompts like a language model, not a keyword parser. Three things make the biggest difference:
First: write in full sentences. "A 35-year-old woman with dark hair speaks to the camera in a modern office" beats "woman, dark hair, office, talking, 35yo." Front-load your subject - the model weighs earlier words more heavily.
Micro-motions matter more than you'd expect. Without explicit instructions like "subtle head nods" or "natural blinks," faces tend to freeze. A prompt that says "she speaks expressively, gesturing with her right hand" produces a fundamentally different clip than "a woman talking."
Camera movement is the other lever. Static or slow-dolly shots pair best with dialogue and faces. Save the sweeping drone moves for wide establishing shots where face coherence doesn't matter.
Aim for 60-200 words in your prompt. Below 60, the output gets generic. Toward the upper end of that range, you give the model enough detail to stay coherent across the full clip.
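A quick length check before submitting can catch prompts that fall outside that range. This is a rough sketch: `check_prompt_length` is a hypothetical helper, and it counts whitespace-separated words, which is only an approximation of how the model tokenizes text.

```python
def check_prompt_length(prompt, low=60, high=200):
    """Return (word_count, verdict) where verdict is 'short', 'ok', or 'long'."""
    words = len(prompt.split())
    if words < low:
        return words, "short"
    if words > high:
        return words, "long"
    return words, "ok"


count, verdict = check_prompt_length("a woman talking")
print(count, verdict)  # 3 short
```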
Image-to-video: animate a still
The image-to-video endpoint takes a still image and a motion prompt, then generates a clip where that image comes to life. Product photos and portraits work especially well - anything with a clear subject and defined edges.
import requests
import time

API_KEY = "dpn-sk-your-key-here"
BASE = "https://api.deapi.ai/api/v2"
# No Content-Type here: requests sets the multipart boundary itself
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

# The upload must happen while the image file is still open
with open("portrait.jpg", "rb") as img:
    response = requests.post(f"{BASE}/videos/animations", headers=HEADERS,
        data={
            "prompt": "The woman slowly turns her head to the right, smiles gently, and looks directly at the camera. Soft natural light from a window. Shallow depth of field.",
            "model": "Ltx2_19B_Dist_FP8",
            "width": 768,
            "height": 768,
            "frames": 120,
            "fps": 24,
            "steps": 8,
            "guidance": 1,
            "seed": 42
        },
        files={"first_frame_image": ("portrait.jpg", img, "image/jpeg")}
    )
response.raise_for_status()  # stop early on an HTTP error
request_id = response.json()["data"]["request_id"]

# Poll for the result
while True:
    result = requests.get(
        f"{BASE}/jobs/{request_id}", headers=HEADERS
    ).json()
    if result["data"]["status"] == "done":
        video = requests.get(result["data"]["result_url"])
        with open("animated_portrait.mp4", "wb") as f:
            f.write(video.content)
        print("Saved to animated_portrait.mp4")
        break
    if result["data"]["status"] == "error":
        print(f"Error: {result['data']}")
        break
    time.sleep(3)
LTX-2 and LTX-2.3 also support a last_frame_image parameter. Pin both the start and end frames, and the model interpolates the motion between them - useful for controlled transitions in product demos.
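As a sketch of how both frames might be attached, the helper below builds the `files` mapping for `requests.post`. It assumes `last_frame_image` is uploaded as a multipart file field in the same way as `first_frame_image`; check the API docs before relying on that. `frame_files` is a hypothetical helper name.

```python
import mimetypes
from pathlib import Path


def frame_files(first_frame, last_frame=None):
    """Build the multipart `files` mapping for requests.post()."""
    def part(path):
        path = Path(path)
        # Guess the MIME type from the file extension, with a generic fallback
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        return (path.name, path.read_bytes(), content_type)

    files = {"first_frame_image": part(first_frame)}
    if last_frame is not None:
        # Assumption: sent as a file field, like first_frame_image
        files["last_frame_image"] = part(last_frame)
    return files
```

The result would be passed as `files=frame_files("start.jpg", "end.jpg")` in the `requests.post` call to the animations endpoint, alongside the same `data` fields as before.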
Audio-to-video: lip-sync from speech
This is where LTX-2.3 stands alone. Feed it an audio file alongside a text prompt, and the generated character's mouth movements sync to the speech. Phoneme-level accuracy, not just generic "mouth opening and closing."
import requests
import time

API_KEY = "dpn-sk-your-key-here"
BASE = "https://api.deapi.ai/api/v2"
# No Content-Type here: requests sets the multipart boundary itself
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

# The upload must happen while the audio file is still open
with open("narration.mp3", "rb") as audio:
    response = requests.post(f"{BASE}/videos/audio-syncs", headers=HEADERS,
        data={
            "prompt": "A male news anchor in his 40s sits behind a studio desk, speaking directly to camera. Professional lighting, shallow depth of field on the background. He gestures occasionally with his right hand while maintaining eye contact.",
            "model": "Ltx2_3_22B_Dist_INT8",
            "width": 768,
            "height": 768,
            "frames": 120,
            "fps": 24,
            "seed": 42
        },
        files={"audio": ("narration.mp3", audio, "audio/mpeg")}
    )
response.raise_for_status()  # stop early on an HTTP error
request_id = response.json()["data"]["request_id"]

# Poll for the result
while True:
    result = requests.get(
        f"{BASE}/jobs/{request_id}", headers=HEADERS
    ).json()
    if result["data"]["status"] == "done":
        video = requests.get(result["data"]["result_url"])
        with open("lipsync_video.mp4", "wb") as f:
            f.write(video.content)
        print("Saved to lipsync_video.mp4")
        break
    if result["data"]["status"] == "error":
        print(f"Error: {result['data']}")
        break
    time.sleep(3)
Audio files can be up to 11 seconds long (MP3, WAV, OGG, or FLAC). For the strongest results, combine audio with a first_frame_image - the image locks the character's appearance while the audio drives every lip movement.
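Checking the audio length locally saves a wasted request. The sketch below covers WAV files only, using Python's standard `wave` module; MP3, OGG, and FLAC would need a third-party library. `wav_duration_seconds` and `fits_audio_limit` are hypothetical helper names, and the 11-second limit is the figure stated above.

```python
import wave

MAX_AUDIO_SECONDS = 11  # limit stated in this tutorial


def wav_duration_seconds(source):
    """Return the length of a WAV file in seconds.

    source is a path string or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_audio_limit(source, limit=MAX_AUDIO_SECONDS):
    """True if the WAV file is no longer than the limit."""
    return wav_duration_seconds(source) <= limit
```

For example, `fits_audio_limit("narration.wav")` returns `False` for a 12-second recording, which is the cue to trim it before uploading.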
What it costs
| Model | 512x512 | 768x768 | 1024x1024 |
|---|---|---|---|
| LTX-Video 13B | ~$0.006 | ~$0.009 | - |
| LTX-2 19B | - | ~$0.041 | ~$0.046 |
| LTX-2.3 22B | - | ~$0.047 | ~$0.053 |
Note on clip length: LTX-Video 13B runs at a fixed 30 fps, so its 120-frame maximum is a ~4-second clip (the 512x512 figure above is for that clip). LTX-2 and LTX-2.3 run at 24 fps, so 120 frames is a true 5-second clip.
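The arithmetic is simply frames divided by frames per second. A small sketch, where `clip_seconds` and `frames_for` are hypothetical helpers, and the default cap of 120 frames is the LTX-Video 13B limit mentioned above (assumed here, not confirmed, to apply to the other models too):

```python
def clip_seconds(frames, fps):
    """Clip duration in seconds."""
    return frames / fps


def frames_for(seconds, fps, max_frames=120):
    """Frame count for a target duration, capped at max_frames."""
    return min(round(seconds * fps), max_frames)


print(clip_seconds(120, 30))  # 4.0 (LTX-Video 13B)
print(clip_seconds(120, 24))  # 5.0 (LTX-2 and LTX-2.3)
print(frames_for(3, 24))      # 72
```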
For comparison, Runway Gen-3 charges $0.25-0.50 per 5-second clip. Generating 100 test clips on LTX-Video 13B at 512x512 costs about $0.60 total - the kind of budget where you can iterate freely without watching a billing dashboard.
The $5 free credit covers roughly 800 clips on LTX-Video 13B at 512x512, or about 100 on LTX-2.3 at 768x768.
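To run the same estimate for other combinations, here is a sketch using the approximate per-clip prices from the table above. `PRICE_PER_CLIP` and `clips_for_budget` are hypothetical names, and actual billing may differ slightly because the table values are rounded.

```python
# Approximate per-clip prices in USD, keyed by (model, side length in pixels).
PRICE_PER_CLIP = {
    ("LTX-Video 13B", 512): 0.006,
    ("LTX-Video 13B", 768): 0.009,
    ("LTX-2 19B", 768): 0.041,
    ("LTX-2 19B", 1024): 0.046,
    ("LTX-2.3 22B", 768): 0.047,
    ("LTX-2.3 22B", 1024): 0.053,
}


def clips_for_budget(budget, model, side):
    """Whole number of clips a budget covers at the listed price."""
    return int(budget / PRICE_PER_CLIP[(model, side)])


print(clips_for_budget(5, "LTX-Video 13B", 512))  # 833
print(clips_for_budget(5, "LTX-2.3 22B", 768))    # 106
```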
What's next
- Full API docs - all video endpoints, parameters, and model specs
- Playground - generate videos in the browser before writing code
- LTX-2.3 prompting guide - deep dive into prompt structure with worked examples
Pair video generation with deAPI's text-to-speech endpoint and you have a complete talking-head pipeline: generate the narration with Kokoro or Qwen3 TTS, then feed that audio straight into LTX-2.3. The lip-synced clip comes back ready to use.
Sign up at deapi.ai to get $5 in free credits. Your first video generates in under two minutes.