TL;DR
The Grok text-to-video API generates video from a text prompt. You call POST /v1/videos/generations, get a request_id immediately, then poll GET /v1/videos/{request_id} until status is "done". The model is grok-imagine-video, pricing starts at $0.05 per second at 480p, and the xAI Python SDK can handle polling automatically.
Introduction
xAI generated 1.2 billion videos in January 2026 alone. That was the first month after launching the Grok text-to-video API on January 28, 2026. The model also ranked number one on the Artificial Analysis text-to-video leaderboard that same month. Those numbers matter because they show the infrastructure has already been tested at scale.
This guide shows you how to:
- Make your first text-to-video request
- Poll for the generated video
- Tune duration, resolution, and aspect ratio
- Write better prompts
- Use reference images
- Extend or edit existing videos
- Test the async polling flow without spending credits on every frontend test
The API is async. Your frontend should not block while waiting for video generation. Instead, it needs to render loading, success, and error states while polling for completion.
If you're building a video generation UI, mock the generation and polling endpoints during development. Apidog's Smart Mock can simulate both endpoints so your team can build the player UI before the backend flow is finalized.
What is the Grok text-to-video API?
The Grok text-to-video API is part of xAI's media generation suite at https://api.x.ai.
You send a text prompt to the grok-imagine-video model, and the API generates a short video clip from scratch. No source image is required.
The API sits alongside:
- A synchronous image generation endpoint: POST /v1/images/generations
- The grok-imagine-image model
- Video extension and editing endpoints
The text-to-video endpoint is different from image-to-video generation because you provide only words. The model creates the scene, motion, composition, and visual style from your prompt.
Use text-to-video when you want the model to create the scene from scratch. Use image-to-video when you already have a source image and want to animate it.
How text-to-video generation works
Most API calls are synchronous:
- Send a request
- Wait briefly
- Receive the final response
Video generation takes longer, so the Grok video API uses an async pattern:
- Send a POST request with your prompt
- Receive a request_id immediately
- Poll a GET endpoint with that request_id
- Continue polling while status is "processing"
- Stop when status becomes "done"
- Read the generated video URL from the response
Flow:
POST /v1/videos/generations
↓
{ "request_id": "..." }
↓
GET /v1/videos/{request_id}
↓
status: processing
↓
GET /v1/videos/{request_id}
↓
status: done
↓
video.url
This keeps HTTP connections short and lets your app decide how often to poll.
Prerequisites
Before writing code, create the following:
- An xAI account at console.x.ai
- An API key from the xAI console
- Billing access enabled for generation requests
Store your API key as an environment variable instead of hardcoding it:
export XAI_API_KEY="your_api_key_here"
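In Python, the exported key can then be read once at startup, failing early with a clear message when it is missing. This is a minimal sketch; the helper names `load_api_key` and `auth_headers` are hypothetical and not part of any xAI library.

```python
import os


def load_api_key(var_name: str = "XAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell before running."
        )
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header used by the requests in this guide."""
    return {"Authorization": f"Bearer {api_key}"}
```

Failing at startup is usually easier to debug than a 401 response several calls later.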
If you want to use the xAI Python SDK:
pip install xai-sdk
For raw HTTP requests:
pip install requests
Your first text-to-video request
Endpoint:
POST https://api.x.ai/v1/videos/generations
Required fields:
| Field | Value |
|---|---|
| model | grok-imagine-video |
| prompt | Your video description |
Using curl
curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting"
  }'
Response:
{
"request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}
That request_id is used to retrieve the completed video.
Using Python with requests
import os

import requests

# Read the key from the environment rather than hardcoding it.
API_KEY = os.environ["XAI_API_KEY"]
BASE_URL = "https://api.x.ai"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "grok-imagine-video",
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting",
}

# Submit the job. The response contains only a request_id, not the video.
response = requests.post(
    f"{BASE_URL}/v1/videos/generations",
    headers=headers,
    json=payload,
    timeout=30,  # avoid hanging forever on a stalled connection
)
response.raise_for_status()

data = response.json()
request_id = data["request_id"]
print(f"Generation started. Request ID: {request_id}")
Polling for the video result
After receiving a request_id, poll:
GET /v1/videos/{request_id}
The status field can be:
| Status | Meaning |
|---|---|
| processing | The video is still generating |
| done | The video is complete and the URL is available |
| failed | The generation failed |
Python polling loop
import os
import time

import requests

API_KEY = os.environ["XAI_API_KEY"]
BASE_URL = "https://api.x.ai"

headers = {
    "Authorization": f"Bearer {API_KEY}",
}


def poll_video(request_id: str, interval: int = 5, max_attempts: int = 60) -> dict:
    """Poll until video generation is complete."""
    url = f"{BASE_URL}/v1/videos/{request_id}"

    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        data = response.json()

        status = data.get("status")
        progress = data.get("progress", 0)
        print(f"Attempt {attempt + 1}: status={status}, progress={progress}%")

        if status == "done":
            return data
        if status == "failed":
            raise RuntimeError(f"Video generation failed: {data}")

        # Still processing: wait before the next poll.
        time.sleep(interval)

    raise TimeoutError(f"Video not ready after {max_attempts} attempts")
Full generate-and-poll workflow
import os
import time

import requests

API_KEY = os.environ["XAI_API_KEY"]
BASE_URL = "https://api.x.ai"

headers = {
    "Authorization": f"Bearer {API_KEY}",
}


def poll_video(request_id: str, interval: int = 5, max_attempts: int = 60) -> dict:
    """Poll until the job is done; raise on failure or timeout."""
    url = f"{BASE_URL}/v1/videos/{request_id}"

    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        data = response.json()

        status = data.get("status")
        progress = data.get("progress", 0)
        print(f"Attempt {attempt + 1}: status={status}, progress={progress}%")

        if status == "done":
            return data
        if status == "failed":
            raise RuntimeError(f"Video generation failed: {data}")

        time.sleep(interval)

    raise TimeoutError(f"Video not ready after {max_attempts} attempts")


def generate_video(prompt: str) -> str:
    """Generate a video and return its URL."""
    # Step 1: submit the job and keep the request_id.
    response = requests.post(
        f"{BASE_URL}/v1/videos/generations",
        headers={**headers, "Content-Type": "application/json"},
        json={
            "model": "grok-imagine-video",
            "prompt": prompt,
        },
        timeout=30,
    )
    response.raise_for_status()
    request_id = response.json()["request_id"]
    print(f"Request ID: {request_id}")

    # Step 2: poll until the video is ready, then read its URL.
    result = poll_video(request_id)
    video_url = result["video"]["url"]
    print(f"Video ready: {video_url}")
    return video_url


video_url = generate_video(
    "A timelapse of a city skyline at sunset transitioning to night, aerial view"
)
When complete, the poll response looks like this:
{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/....mp4",
    "duration": 8,
    "respect_moderation": true
  },
  "progress": 100,
  "usage": {
    "cost_in_usd_ticks": 500000000
  }
}
Using the xAI Python SDK
If you do not want to implement polling yourself, use the xAI SDK. The client.video.generate() method blocks until the video is ready.
import os

from xai_sdk import Client

client = Client(api_key=os.environ["XAI_API_KEY"])

result = client.video.generate(
    model="grok-imagine-video",
    prompt="A golden retriever running through autumn leaves in slow motion",
    duration=8,
    resolution="720p",
    aspect_ratio="16:9",
)

print(f"Video URL: {result.video.url}")
print(f"Duration: {result.video.duration}s")
Use the SDK when you want the shortest path to working code.
Use raw HTTP requests when you need:
- Custom retry behavior
- Frontend progress updates
- Custom polling intervals
- More detailed logging
- Test control over processing, done, and failed states
Writing effective prompts for video generation
Your prompt is the most important input. A specific prompt usually produces better results than a vague one.
A useful structure:
[subject and scene].
[motion].
[camera behavior].
[style, lighting, and mood].
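The four-part structure above can be assembled in code so that every prompt your app sends follows the same order. This is a small sketch; `build_prompt` is a hypothetical helper, and the API itself only sees the final string.

```python
def build_prompt(scene: str, motion: str = "", camera: str = "", style: str = "") -> str:
    """Join the four prompt parts into one prompt, skipping empty parts."""
    sentences = []
    for part in (scene, motion, camera, style):
        text = part.strip()
        if not text:
            continue
        # End each part with a period so the parts read as separate sentences.
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text)
    if not sentences:
        raise ValueError("At least one prompt part is required")
    return " ".join(sentences)


prompt = build_prompt(
    scene="A white ceramic coffee mug on a wooden table beside a rain-streaked window",
    motion="Steam curls upward while raindrops slide down the glass",
    camera="The camera slowly orbits the mug",
    style="Soft window light, quiet melancholic mood",
)
print(prompt)
```

Keeping the parts separate also makes it easy to expose them as individual form fields in a UI.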
1. Describe the scene clearly
Weak:
A coffee mug.
Better:
A white ceramic coffee mug on a wooden table beside a rain-streaked window.
2. Add explicit motion
Weak:
A coffee mug on a table.
Better:
A white ceramic coffee mug on a wooden table. Steam curls upward while raindrops slide down the window behind it.
3. Specify the camera style
Use terms like:
- close-up
- tracking shot
- overhead drone view
- handheld
- slow dolly in
- camera orbit
- wide establishing shot
Example:
The camera slowly orbits the mug as steam rises from the coffee.
4. Define lighting and mood
Lighting examples:
- golden hour
- overcast
- neon-lit
- studio three-point lighting
- soft window light
Mood examples:
- melancholic
- calm
- energetic
- cinematic
- dreamlike
Example:
Foggy morning, soft window light, quiet melancholic mood.
5. Add style references in text
You can guide the visual format with terms like:
- cinematic
- documentary
- anime
- stop-motion
- hyperlapse
- IMAX-style
- product commercial
Prompt template
A lone astronaut floats past the International Space Station,
tether drifting behind them. The camera tracks slowly alongside,
showing Earth below. Cinematic, IMAX quality, warm sunrise light
reflecting off the visor.
Controlling resolution, duration, and aspect ratio
The generation endpoint accepts optional parameters for output length and dimensions.
Duration
{
"duration": 10
}
Range:
- Minimum: 1 second
- Maximum: 15 seconds
- Default: 6 seconds
Longer videos cost more. For example, a 10-second clip at 480p costs $0.50.
Resolution
{
"resolution": "720p"
}
Options:
| Resolution | Use case |
|---|---|
| 480p | Default, prototyping, cheaper tests |
| 720p | Production output where quality matters |
Aspect ratio
{
"aspect_ratio": "9:16"
}
Available ratios:
| Ratio | Best for |
|---|---|
| 16:9 | Desktop, YouTube, presentations |
| 9:16 | TikTok, Instagram Reels, mobile |
| 1:1 | Instagram feed, social cards |
| 4:3 | Classic video, presentations |
| 3:4 | Portrait mobile content |
| 3:2 | Standard photo ratio |
| 2:3 | Portrait photography |
Full request with all parameters
curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "A coastal town at dawn, waves breaking gently on a rocky shore",
    "duration": 10,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'
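Checking these parameters on the client before sending the request can save a round trip on obviously invalid input. The sketch below builds the request body using only the ranges and options listed in this section; `build_generation_payload` is a hypothetical helper, and the API remains the final authority on what it accepts.

```python
from typing import Optional

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}
MIN_DURATION, MAX_DURATION = 1, 15


def build_generation_payload(
    prompt: str,
    duration: Optional[int] = None,
    resolution: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
) -> dict:
    """Build a request body, validating optional fields against this guide's limits."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {"model": "grok-imagine-video", "prompt": prompt}

    # Optional fields are only included when set, so API defaults apply otherwise.
    if duration is not None:
        if not MIN_DURATION <= duration <= MAX_DURATION:
            raise ValueError(f"duration must be {MIN_DURATION}-{MAX_DURATION} seconds")
        payload["duration"] = duration
    if resolution is not None:
        if resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        payload["resolution"] = resolution
    if aspect_ratio is not None:
        if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
        payload["aspect_ratio"] = aspect_ratio
    return payload


payload = build_generation_payload(
    "A coastal town at dawn, waves breaking gently on a rocky shore",
    duration=10,
    resolution="720p",
    aspect_ratio="16:9",
)
```

The resulting dictionary can be passed as the JSON body of the POST request shown earlier.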
Using reference images to guide video style
The reference_images parameter accepts an array of up to 7 image URLs.
These images guide the style and content of the generated video, but they do not become the source frame.
Example:
{
  "model": "grok-imagine-video",
  "prompt": "A coastal town at dawn, waves breaking gently on a rocky shore",
  "reference_images": [
    {
      "url": "https://example.com/my-style-reference.jpg"
    },
    {
      "url": "https://example.com/color-palette-reference.jpg"
    }
  ]
}
Reference images work best when they share a consistent aesthetic. Avoid mixing unrelated styles unless you intentionally want the model to blend them.
Use reference images to guide:
- Color grading
- Composition
- Texture
- Lighting style
- Overall visual mood
Do not confuse reference images with image-to-video. In text-to-video with reference images, the prompt still drives the scene. In image-to-video, the source image becomes the first frame.
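If reference images come from user input, it may help to enforce the 7-image limit before calling the API. The following sketch attaches image URLs to an existing request body in the shape shown in the example above; `with_reference_images` is a hypothetical helper name.

```python
from typing import List

MAX_REFERENCE_IMAGES = 7  # limit stated in this guide


def with_reference_images(payload: dict, image_urls: List[str]) -> dict:
    """Return a copy of the payload with reference_images attached."""
    if len(image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"At most {MAX_REFERENCE_IMAGES} reference images are allowed, "
            f"got {len(image_urls)}"
        )
    updated = dict(payload)  # shallow copy so the caller's dict is not mutated
    if image_urls:
        updated["reference_images"] = [{"url": url} for url in image_urls]
    return updated


base = {
    "model": "grok-imagine-video",
    "prompt": "A coastal town at dawn, waves breaking gently on a rocky shore",
}
body = with_reference_images(base, ["https://example.com/my-style-reference.jpg"])
```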
Extending and editing generated videos
xAI provides two additional endpoints for videos you have already generated.
Extend a video
POST /v1/videos/extensions
Use this endpoint to add more footage to an existing generated video.
You pass:
- The request_id of the original video
- A new prompt for the extension
This is useful when you want a longer sequence without generating more than 15 seconds in a single request.
Edit a video
POST /v1/videos/edits
Use this endpoint to modify an existing generated video with a text instruction.
Examples:
- Change the visual style
- Alter scene details
- Apply effects
- Adjust the look of an existing clip
Both endpoints use the same async pattern:
- Send the request
- Receive a request_id
- Poll GET /v1/videos/{request_id}
- Wait for status: "done"
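Because generation, extension, and editing all share this pattern, one helper can cover all three. The sketch below is an assumption-heavy outline: `submit_and_wait` is a hypothetical name, the HTTP calls are injected as plain functions so the logic can be tested offline, and the request body is left to the caller because this guide does not list the exact field names for the extension and edit endpoints.

```python
import time
from typing import Callable


def submit_and_wait(
    path: str,
    body: dict,
    post_json: Callable[[str, dict], dict],
    get_json: Callable[[str], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """POST a job to `path`, then poll /v1/videos/{request_id} until it finishes."""
    request_id = post_json(path, body)["request_id"]
    for _ in range(max_attempts):
        data = get_json(f"/v1/videos/{request_id}")
        status = data.get("status")
        if status == "done":
            return data
        if status == "failed":
            raise RuntimeError(f"Job {request_id} failed: {data}")
        sleep(interval)  # still processing
    raise TimeoutError(f"Job {request_id} not ready after {max_attempts} attempts")
```

In a real integration, `post_json` and `get_json` would wrap your HTTP client and add the base URL and Authorization header.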
Reading the cost from the API response
The completed poll response includes a usage object:
{
"usage": {
"cost_in_usd_ticks": 500000000
}
}
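The unit is USD ticks. Going by the figures in this guide, where 500000000 ticks corresponds to a $0.50 clip, one dollar is 1,000,000,000 ticks, so divide by 1,000,000,000 to convert ticks to dollars. Confirm the tick size against xAI's API reference before relying on it for billing.
cost_in_usd = result["usage"]["cost_in_usd_ticks"] / 1_000_000_000
print(f"Cost: ${cost_in_usd:.4f}")
Output:
Cost: $0.5000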
Pricing reference
| Resolution | Price per second | 10-second clip |
|---|---|---|
| 480p | $0.05 | $0.50 |
| 720p | $0.07 | $0.70 |
A value of 500000000 ticks equals $0.50. That is a 10-second clip at 480p.
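To estimate cost before submitting a job, the per-second prices from the table can be applied directly. This is a rough sketch using only the two prices listed here; `estimate_cost_usd` is a hypothetical helper, and actual billing should be read from the API response.

```python
from decimal import Decimal

# Per-second prices taken from the pricing table in this guide.
PRICE_PER_SECOND_USD = {
    "480p": Decimal("0.05"),
    "720p": Decimal("0.07"),
}


def estimate_cost_usd(duration_seconds: int, resolution: str = "480p") -> Decimal:
    """Estimate the cost of one clip as price per second times duration."""
    if resolution not in PRICE_PER_SECOND_USD:
        raise ValueError(f"Unknown resolution: {resolution}")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return PRICE_PER_SECOND_USD[resolution] * duration_seconds


print(estimate_cost_usd(10, "480p"))  # 0.50
print(estimate_cost_usd(10, "720p"))  # 0.70
```

`Decimal` is used instead of floats so that sums over many clips do not pick up rounding noise.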
For production systems, log cost_in_usd_ticks from every completed response. This gives you a simple usage dashboard without querying billing separately.
Example log payload:
{
"request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf",
"status": "done",
"duration": 10,
"resolution": "480p",
"cost_in_usd_ticks": 500000000
}
How to test your Grok video API with Apidog
The async polling pattern creates a frontend testing problem.
Your UI needs to handle:
- Loading while polling
- Success when the video URL is available
- Failure when generation fails
Testing those states with real API calls costs money and takes time. Apidog's Smart Mock lets you define mock responses for both endpoints and test the full flow instantly.
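The three UI states map directly onto the three status values. As an illustration, the sketch below reduces a poll response to the data a view layer might need; the function name `ui_state` and the shape of the returned dictionary are assumptions made for this example.

```python
def ui_state(poll_response: dict) -> dict:
    """Translate a poll response into a loading, success, or error view state."""
    status = poll_response.get("status")
    if status == "done":
        return {"state": "success", "video_url": poll_response["video"]["url"]}
    if status == "failed":
        return {"state": "error", "message": "Video generation failed"}
    if status == "processing":
        return {"state": "loading", "progress": poll_response.get("progress", 0)}
    # Unknown statuses are surfaced as errors rather than spinning forever.
    return {"state": "error", "message": f"Unexpected status: {status!r}"}
```

Treating unknown statuses as errors is a deliberate choice here: a spinner that never resolves is harder to diagnose than a visible failure.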
Use case 1: Mock the frontend flow with Smart Mock
You need to mock two endpoints:
POST /v1/videos/generations
GET /v1/videos/{request_id}
Mock the generation endpoint
In Apidog:
- Create POST /v1/videos/generations
- Define the response schema with a request_id string field
- Enable Smart Mock
Mock response:
{
"request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}
Mock the polling endpoint
Create:
GET /v1/videos/{request_id}
Define the response schema with:
- status
- video.url
- video.duration
- video.respect_moderation
- progress
- usage.cost_in_usd_ticks
Mock successful response:
{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/mock-video-12345.mp4",
    "duration": 8,
    "respect_moderation": true
  },
  "progress": 100,
  "usage": {
    "cost_in_usd_ticks": 400000000
  }
}
To test loading state, return:
{
"status": "processing",
"progress": 45
}
To test failure state, return:
{
"status": "failed",
"progress": 100
}
Now frontend developers can build the complete video player flow without spending real API credits.
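The same three mock responses can also be replayed inside a unit test with no server at all. The sketch below is plain Python and unrelated to Apidog's own features; `FakePollEndpoint` is a hypothetical class that returns processing a fixed number of times before the final status.

```python
class FakePollEndpoint:
    """In-memory stand-in for GET /v1/videos/{request_id}."""

    def __init__(self, processing_polls: int = 2, final_status: str = "done"):
        self.remaining = processing_polls
        self.final_status = final_status
        self.calls = 0

    def get(self, request_id: str) -> dict:
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            return {"status": "processing", "progress": 45}
        if self.final_status == "failed":
            return {"status": "failed", "progress": 100}
        return {
            "status": "done",
            "video": {
                "url": "https://vidgen.x.ai/mock-video-12345.mp4",
                "duration": 8,
                "respect_moderation": True,
            },
            "progress": 100,
            "usage": {"cost_in_usd_ticks": 400000000},
        }


# Drive the fake the way a polling loop would.
fake = FakePollEndpoint(processing_polls=2)
statuses = []
while True:
    reply = fake.get("mock-request-id")
    statuses.append(reply["status"])
    if reply["status"] != "processing":
        break
print(statuses)
```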
Use case 2: Validate polling with Test Scenarios
After your integration is working, use Apidog Test Scenarios to automate the generate-then-poll flow.
Step 1: Add the generate request
Add this request as the first step:
POST /v1/videos/generations
In the post-processor, extract request_id using JSONPath:
$.request_id
Store it as:
videoRequestId
Step 2: Add the polling request
Add this request as the second step:
GET /v1/videos/{{videoRequestId}}
Wrap it in a loop.
Break condition:
response.body.status == "done"
Add a wait processor between iterations:
5 seconds
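This avoids hammering the endpoint. Also cap the number of loop iterations, because a job that ends in "failed" never satisfies the break condition and would otherwise loop indefinitely.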
Step 3: Assert the final result
Add an assertion to the final GET response:
$.video.url is not empty
This confirms the async flow completed successfully.
You can run this scenario in CI to catch regressions when polling logic changes.
Text-to-video vs image-to-video: which should you use?
Both modes use the grok-imagine-video model, but they solve different problems.
Choose text-to-video when
- You are generating original content from a concept or script
- You want the model to control the composition
- Users provide text prompts
- You do not have a source image
Choose image-to-video when
- You have a product photo, illustration, or brand asset to animate
- You need to preserve details from an existing image
- You are creating consistent animations from related images
- You want to animate your own artwork or photography
The key distinction:
Text-to-video creates a scene from scratch.
Image-to-video makes an existing image move.
For products that support both modes, route requests based on input type:
def choose_generation_mode(prompt: str, image_url: str | None) -> str:
    # A source image means the user wants that image animated.
    if image_url:
        return "image-to-video"
    return "text-to-video"
If the user uploads an image, route to the image-to-video flow. If the user provides only a prompt, route to:
POST /v1/videos/generations
Common errors and fixes
401 Unauthorized
Your API key is missing, expired, or incorrectly formatted.
Check that your header is exactly:
Authorization: Bearer YOUR_XAI_API_KEY
Also confirm that the key is active in the xAI console.
429 Too Many Requests
You hit a rate limit.
The API allows:
- 60 requests per minute
- 1 request per second
Fixes:
- Add delays between requests
- Poll every 5 to 10 seconds
- Avoid tight polling loops
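One way to apply these fixes is to let the polling interval grow gradually instead of staying fixed. The sketch below generates delays that start at 5 seconds and are capped at 10, matching the range suggested above; the growth factor of 1.5 is an arbitrary choice for illustration, not an xAI recommendation.

```python
from typing import Iterator


def polling_delays(
    base: float = 5.0,
    factor: float = 1.5,
    cap: float = 10.0,
    attempts: int = 60,
) -> Iterator[float]:
    """Yield one sleep duration per poll attempt, growing up to a cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


print(list(polling_delays(attempts=4)))  # [5.0, 7.5, 10.0, 10.0]
```

A polling loop can iterate over `polling_delays()` and sleep for each yielded value between requests.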
status: "failed" in the poll response
The generation failed.
This usually means the prompt was rejected by content moderation. If respect_moderation is true, moderation was applied.
Fixes:
- Revise the prompt
- Remove ambiguous wording
- Remove potentially sensitive language
- Try a more specific and neutral scene description
Video URL returns 404
Generated video URLs expire after a period of time.
Fix:
Download the MP4 to your own storage immediately after retrieving video.url.
Do not store the generated URL and assume it will work days later.
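A download step could look like the sketch below. It uses only the Python standard library and assumes the video URL can be fetched with a plain GET request and no extra headers, which this guide does not confirm; `download_video` is a hypothetical helper.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(video_url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the file at video_url to dest and return the final path."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a partial download never looks complete.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(video_url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

Typical usage would be `download_video(result["video"]["url"], Path("videos/clip.mp4"))` right after the poll response reports done.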
Empty or frozen video
Vague prompts or prompts without motion cues can produce minimal movement.
Weak:
A car on a road.
Better:
A red sports car speeds along a winding mountain road. The camera follows from behind as trees blur past on both sides.
Add:
- What moves
- Direction of movement
- Speed
- Camera behavior
Slow generation or polling
720p videos take longer than 480p. Longer durations also take more time.
For development, use:
{
"duration": 3,
"resolution": "480p"
}
Then switch to longer 720p generations for production output.
Conclusion
The Grok text-to-video API follows a simple async workflow:
- Send a prompt to POST /v1/videos/generations
- Receive a request_id
- Poll GET /v1/videos/{request_id}
- Wait for status: "done"
- Read the MP4 URL from video.url
Once your polling loop works, the rest of the integration is mostly parameter tuning.
For production:
- Track cost_in_usd_ticks
- Download generated videos to your own storage
- Poll at reasonable intervals
- Handle processing, done, and failed
- Mock both endpoints during frontend development
- Add automated tests for the async flow
Use Apidog to mock the Grok video endpoints and validate your polling logic before spending credits on real generations.
FAQ
What model name do I use for text-to-video generation?
Use:
grok-imagine-video
This is the required model value for:
POST /v1/videos/generations
How long does video generation take?
It depends on duration and resolution.
Short 480p clips may complete in under 30 seconds. Longer 720p clips can take a few minutes.
Poll every 5 to 10 seconds instead of continuously calling the endpoint.
Can I generate a video longer than 15 seconds?
Not in a single request.
The maximum duration is 15 seconds. To create longer videos, generate a clip and then use:
POST /v1/videos/extensions
How do I download the generated video?
Use the URL from the completed poll response:
video_url = result["video"]["url"]
Download the MP4 to your own storage immediately. The URL is temporary and will expire.
What happens if my prompt violates content moderation?
The job can return:
{
"status": "failed"
}
The respect_moderation field indicates that moderation was applied. Revise the prompt and try again.
Is there a free tier for the video API?
xAI charges per second of output generated. There is no free tier specifically for video generation. Check console.x.ai for current credit offers for new accounts.
How do reference_images differ from starting with a source image?
reference_images guide the visual style of a text-to-video generation. They influence the look but do not become the subject.
A source image for image-to-video becomes the first frame of the generated video.
What's the best way to test the polling loop without spending credits?
Use Apidog Smart Mock to mock both endpoints:
POST /v1/videos/generations
GET /v1/videos/{request_id}
Define mock responses for:
- processing
- done
- failed
Then your frontend and polling code can run without calling the real API.

