TL;DR
The Grok image-to-video API uses the grok-imagine-video model to animate a static image into an MP4 clip. Submit a POST request to https://api.x.ai/v1/videos/generations with an image URL, prompt, and optional output settings. The API returns a request_id, then you poll GET /v1/videos/{request_id} until status becomes "done". Duration ranges from 1 to 15 seconds. Pricing starts at $0.05 per second for 480p output.
Introduction
On January 28, 2026, xAI launched the grok-imagine-video model for public API access. Within that first month, the model generated 1.2 billion videos and ranked number one on the Artificial Analysis text-to-video leaderboard.
One of its core use cases is image-to-video generation: provide a source image and a motion prompt, then receive a short animated video clip.
The important implementation detail is that generation is asynchronous. Your integration is not complete when the initial POST succeeds. It is complete when your app can:
- submit the generation job
- extract the request_id
- poll while the job is "processing"
- handle "done" and "failed"
- read the final video URL
You can test that full flow with Apidog Test Scenarios by chaining the generation request, variable extraction, polling loop, wait step, and final assertions.
What is the Grok image-to-video API?
The Grok image-to-video API is part of xAI's video generation product. It uses the grok-imagine-video model and accepts an image as the first frame of the output video.
Endpoint:
POST https://api.x.ai/v1/videos/generations
Authentication uses a Bearer token:
Authorization: Bearer YOUR_XAI_API_KEY
You get the key from the xAI console.
The same API surface also supports:
- text-to-video generation by omitting the image parameter
- video extensions
- video edits
How image-to-video generation works
The image parameter defines the first frame of the generated video. The model does not replace that image. It starts from it and predicts motion based on your prompt.
Example:
- Source image: a mountain lake at sunrise
- Prompt: gentle ripples spread across the water as morning mist drifts
- Output: the first frame matches your image, then the water and mist animate over time
This differs from text-to-video, where the model generates the initial scene itself.
Use image-to-video when:
- you already have product photos, landscapes, portraits, or brand assets
- you need control over the first frame
- you want motion grounded in a specific real-world scene
Use text-to-video when:
- you are exploring visual concepts without a reference image
- you want the model to determine the full scene composition
- iteration speed matters more than first-frame precision
The same xAI surface also powers the conversational companion mode. For a non-video angle on Grok, see the walkthrough of Companion Mode with Ani.
Prerequisites
Before you call the API, prepare:
- an xAI account at console.x.ai
- an API key from the xAI console
- Python 3.8+ or Node.js 18+
- a public image URL or a base64-encoded image data URI
Set your API key as an environment variable:
export XAI_API_KEY="your_key_here"
If you want to use the xAI Python SDK:
pip install xai-sdk
For raw HTTP calls, you can use:
- Python: requests
- Node.js: built-in fetch
Make your first image-to-video request
Using curl
curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'
The API returns a job ID immediately:
{
"request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}
The video is not ready yet. You need to poll the status endpoint.
Using Python with raw requests
import os
import requests

api_key = os.environ["XAI_API_KEY"]

payload = {
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.x.ai/v1/videos/generations",
    json=payload,
    headers=headers
)
response.raise_for_status()

data = response.json()
request_id = data["request_id"]
print(f"Job started: {request_id}")
Using a base64 image
If your source image is local or private, encode it as a data URI:
import base64

with open("my_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload["image"] = {
    "url": f"data:image/jpeg;base64,{encoded}"
}
Use the correct MIME type:
data:image/jpeg;base64,...
data:image/png;base64,...
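To avoid hard-coding the MIME type, the prefix can be inferred from the file extension. The following is a minimal sketch using only the Python standard library. The helper name image_to_data_uri is hypothetical and not part of any xAI tooling, and the article only confirms JPEG and PNG, so treat other image types as untested.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The return value can be assigned to payload["image"]["url"] in place of a public URL.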
Poll for the result
Video generation is asynchronous. Poll this endpoint with the request_id:
GET https://api.x.ai/v1/videos/{request_id}
Status values:
| Status | Meaning |
|---|---|
| "processing" | Video is still rendering |
| "done" | Video is ready |
| "failed" | Generation failed |
A completed response looks like this:
{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/....mp4",
    "duration": 6
  },
  "progress": 100
}
Full Python polling loop
import time
import requests


def poll_video(request_id: str, api_key: str, interval: int = 5) -> dict:
    url = f"https://api.x.ai/v1/videos/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}

    while True:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()

        status = data.get("status")
        print(f"Status: {status} | Progress: {data.get('progress', 0)}%")

        if status == "done":
            return data["video"]
        if status == "failed":
            raise RuntimeError(f"Video generation failed for {request_id}")

        time.sleep(interval)


video = poll_video(request_id, api_key)
print(f"Video URL: {video['url']}")
print(f"Duration: {video['duration']}s")
Use a polling interval of 5 seconds or more. The API limit is 60 requests per minute, or 1 request per second. If you poll many jobs at once, stagger them.
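The arithmetic behind staggering is worth making explicit: with N jobs polled every T seconds, you send N / T requests per second, so T must grow with N to stay under the limit. The sketch below works that out under the limits quoted above. The function names safe_poll_interval and stagger_offsets are hypothetical, and the calculation ignores generation POSTs, which also count toward the limit.

```python
def safe_poll_interval(job_count: int, min_interval: float = 5.0,
                       max_per_minute: int = 60) -> float:
    """Smallest per-job polling interval (seconds) that keeps the
    combined polling traffic at or below max_per_minute."""
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    # Each job sends 60 / interval requests per minute, so all jobs
    # together need interval >= job_count * 60 / max_per_minute.
    required = job_count * 60.0 / max_per_minute
    return max(min_interval, required)


def stagger_offsets(job_count: int, interval: float) -> list[float]:
    """Start delays that spread the jobs evenly across one interval,
    so their polls do not all fire in the same second."""
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    return [i * interval / job_count for i in range(job_count)]
```

For example, 20 concurrent jobs need an interval of at least 20 seconds each, and spreading them across that interval yields one poll per second overall.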
Use the xAI Python SDK
The xai-sdk library can submit the job and handle polling for you:
import os
from xai_sdk import Client

client = Client(api_key=os.environ["XAI_API_KEY"])

video = client.video.generate(
    model="grok-imagine-video",
    prompt="Gentle waves move across the surface, morning mist rises slowly",
    image={"url": "https://example.com/landscape.jpg"},
    duration=6,
    resolution="720p",
    aspect_ratio="16:9"
)

print(f"Video URL: {video.url}")
print(f"Duration: {video.duration}s")
Use the SDK when you want simple application code.
Use raw HTTP when you need custom control over:
- polling intervals
- retries
- logging
- timeout behavior
- integration tests
Configure duration, resolution, and aspect ratio
The request body controls the video output format.
Duration
duration accepts integers from 1 to 15 seconds. The default is 6.
{
"duration": 10
}
Longer videos cost more because output is billed per second.
Resolution
Available values:
| Value | Description |
|---|---|
| "480p" | Default. Lower cost and faster generation. |
| "720p" | Higher quality. Costs $0.07/sec vs $0.05/sec. |
Example:
{
"resolution": "720p"
}
Aspect ratio
aspect_ratio controls output dimensions.
| Value | Use case |
|---|---|
| "16:9" | Widescreen landscape |
| "9:16" | Vertical mobile/social video |
| "1:1" | Square social content |
| "4:3" | Classic photography or presentation format |
| "3:4" | Portrait photography |
| "3:2" | Standard photography crop |
| "2:3" | Tall portrait format |
Example:
{
"aspect_ratio": "16:9"
}
When you provide an image, the aspect ratio defaults to the source image's dimensions. Set aspect_ratio explicitly if you want a different crop.
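Because invalid values for these three fields surface only as a server-side error, it can help to check them locally before submitting. Here is a minimal sketch that encodes the ranges and allowed values listed in this section. The function name build_output_settings is hypothetical, and the allowed sets are copied from the tables above rather than from an official schema.

```python
from typing import Optional

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}


def build_output_settings(duration: int = 6, resolution: str = "480p",
                          aspect_ratio: Optional[str] = None) -> dict:
    """Validate output options and return them as a request-body fragment."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 1 <= duration <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")

    settings = {"duration": duration, "resolution": resolution}
    # Leave aspect_ratio out entirely so the API can apply its default.
    if aspect_ratio is not None:
        if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
        settings["aspect_ratio"] = aspect_ratio
    return settings
```

The returned dictionary can be merged into the request payload, for example with payload.update(build_output_settings(10, "720p")).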
Use reference images for style guidance
image and reference_images serve different purposes.
image is the first frame of the output video.
reference_images is an array of up to 7 images used for style, content, or visual context. These images are not used as frames in the video.
Example:
{
  "model": "grok-imagine-video",
  "prompt": "A product rotating slowly on a clean white surface",
  "image": {
    "url": "https://example.com/product-shot.jpg"
  },
  "reference_images": [
    {
      "url": "https://example.com/brand-style-reference-1.jpg"
    },
    {
      "url": "https://example.com/lighting-reference.jpg"
    }
  ],
  "duration": 6,
  "resolution": "720p"
}
In this request:
- product-shot.jpg becomes the first frame
- the reference images guide lighting and visual treatment
You can also provide reference_images without image. In that case, the model generates a text-to-video result while using the references for guidance.
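The three combinations described above (image only, image plus references, references only) can be assembled by one small builder. This is a sketch under the field names shown in the JSON example. The function build_generation_payload is hypothetical, and the limit of 7 comes from this section rather than from a published schema.

```python
from typing import Iterable, Optional

MAX_REFERENCE_IMAGES = 7


def build_generation_payload(prompt: str,
                             image_url: Optional[str] = None,
                             reference_urls: Iterable[str] = (),
                             model: str = "grok-imagine-video") -> dict:
    """Assemble a request body with an optional first frame and references."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    references = list(reference_urls)
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")

    payload = {"model": model, "prompt": prompt}
    if image_url is not None:
        # The first frame of the output video.
        payload["image"] = {"url": image_url}
    if references:
        # Style and context guidance only; never used as frames.
        payload["reference_images"] = [{"url": url} for url in references]
    return payload
```

Omitting image_url while passing reference_urls produces the text-to-video-with-references variant.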
Extend and edit videos
The API supports additional operations after initial generation.
Extend a video
Use POST /v1/videos/extensions to generate more footage from where an existing video ends.
curl -X POST https://api.x.ai/v1/videos/extensions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "The mist continues to lift as sunlight breaks through",
    "duration": 5
  }'
The response follows the same async pattern. Poll:
GET /v1/videos/{request_id}
Edit a video
Use POST /v1/videos/edits to apply prompt-guided modifications to an existing video.
curl -X POST https://api.x.ai/v1/videos/edits \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "Change the sky to a dramatic sunset with deep orange tones"
  }'
Edits are also asynchronous and use the same polling flow.
Pricing breakdown
The xAI video API charges for:
- input image processing
- output video duration
| Component | Cost |
|---|---|
| Input image | $0.002 per image |
| Output at 480p | $0.05 per second |
| Output at 720p | $0.07 per second |
Example: 10-second video at 720p
| Item | Cost |
|---|---|
| Input image | $0.002 |
| Output | 10 × $0.07 = $0.70 |
| Total | $0.702 |
Example: 6-second video at 480p
| Item | Cost |
|---|---|
| Input image | $0.002 |
| Output | 6 × $0.05 = $0.30 |
| Total | $0.302 |
The input image charge applies each time you submit a generation request, even if you reuse the same image URL.
Text-to-video requests omit the $0.002 input image charge but still use per-second output pricing.
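The pricing rules above reduce to a one-line formula, which is handy for budgeting batches. The sketch below uses Decimal so the sub-cent image fee is not lost to floating-point rounding. The name estimate_cost is hypothetical, the rates are the ones quoted in this section, and any charge for reference_images is not covered because the article does not state one.

```python
from decimal import Decimal

# Rates quoted in this article; verify against current xAI pricing before relying on them.
PER_SECOND_RATE = {"480p": Decimal("0.05"), "720p": Decimal("0.07")}
INPUT_IMAGE_FEE = Decimal("0.002")


def estimate_cost(duration_seconds: int, resolution: str,
                  has_input_image: bool = True) -> Decimal:
    """Estimated USD cost of one generation request."""
    if resolution not in PER_SECOND_RATE:
        raise ValueError(f"unknown resolution: {resolution!r}")
    cost = PER_SECOND_RATE[resolution] * duration_seconds
    if has_input_image:
        # Charged per request, even when the same image URL is reused.
        cost += INPUT_IMAGE_FEE
    return cost
```

Calling estimate_cost(10, "720p") reproduces the $0.702 total from the first example, and passing has_input_image=False models a text-to-video request.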
Test the async flow with Apidog
A one-shot request test is not enough for this API. You need to validate the entire lifecycle:
- POST /v1/videos/generations returns a request_id
- GET /v1/videos/{request_id} handles "processing"
- polling stops when status == "done"
- the final response contains a non-empty video.url
Apidog Test Scenarios can automate this sequence.
Step 1: Create a test scenario
In Apidog, open the Tests module and create a new scenario.
Name it:
Grok image-to-video async flow
Step 2: Add the generation request
Add a custom POST request step.
URL:
https://api.x.ai/v1/videos/generations
Header:
Authorization: Bearer {{xai_api_key}}
Body:
{
  "model": "grok-imagine-video",
  "prompt": "Gentle mist rises from the water as light filters through the trees",
  "image": {
    "url": "https://example.com/your-test-image.jpg"
  },
  "duration": 6,
  "resolution": "480p"
}
Step 3: Extract request_id
After the POST step, add an Extract Variable processor.
Configure it as:
| Field | Value |
|---|---|
| Variable name | video_request_id |
| Source | Response body |
| Extraction method | JSONPath |
| JSONPath | $.request_id |
Apidog stores the value as:
{{video_request_id}}
Step 4: Build the polling loop
Add a For loop processor.
Inside the loop, add a GET request:
https://api.x.ai/v1/videos/{{video_request_id}}
Header:
Authorization: Bearer {{xai_api_key}}
Add an Extract Variable processor inside the loop:
| Field | Value |
|---|---|
| Variable name | video_status |
| JSONPath | $.status |
Add a Wait processor after status extraction:
5000ms
Set the loop break condition:
{{video_status}} == "done"
Step 5: Assert the final video URL
After the loop, add one final GET request to:
https://api.x.ai/v1/videos/{{video_request_id}}
Add an assertion:
| Field | Value |
|---|---|
| Field | $.video.url |
| Condition | Is not empty |
This confirms the test only passes when the generated video URL is available.
Run the scenario
Click Run in the test scenario view.
Apidog will:
- submit the generation request
- extract request_id
- poll until the status is "done"
- assert that video.url exists
- show step timing and results in the report
You can also run the scenario from CI/CD with the Apidog CLI:
apidog run --scenario grok-video-async-flow --env production
For deeper async API testing patterns, including complex polling and CI/CD workflows, see the dedicated guide.
Common errors and fixes
401 Unauthorized
Your API key is missing or invalid.
Check the header format:
Authorization: Bearer YOUR_XAI_API_KEY
Also confirm the key is active in the xAI console.
422 Unprocessable Entity
The request body failed validation.
Common causes:
- missing model
- empty prompt
- inaccessible image.url
- invalid duration, resolution, or aspect_ratio
Validate the image URL in a browser before using it.
Image URL not accessible
xAI servers must be able to fetch the image at generation time.
These sources can fail:
- private URLs
- localhost
- URLs behind authentication
- expired signed URLs
Use a public CDN URL or a base64 data URI.
Status stays at "processing"
Generation can take from 30 seconds to several minutes depending on duration and resolution.
If a job stays "processing" beyond 10 minutes, it may have stalled. Submit a new request.
The xAI API does not currently expose a separate timeout signal apart from "failed".
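Since no timeout signal is exposed, the client has to enforce its own deadline. One way to do that is sketched below, assuming the 10-minute ceiling suggested above. The names wait_for_video, VideoJobTimeout, and fetch_status are hypothetical. fetch_status is any callable that returns the parsed status response, which keeps the loop testable without network access.

```python
import time
from typing import Callable


class VideoJobTimeout(Exception):
    """Raised when a job is still processing after the client-side deadline."""


def wait_for_video(fetch_status: Callable[[], dict],
                   max_wait: float = 600.0,
                   interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the job is done, fails, or exceeds max_wait seconds."""
    deadline = clock() + max_wait
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data["video"]
        if status == "failed":
            raise RuntimeError("Video generation failed")
        # Give up if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise VideoJobTimeout(f"Job still processing after {max_wait} seconds")
        sleep(interval)
```

In practice, fetch_status could wrap a requests.get call against the status endpoint and return response.json(). On VideoJobTimeout, the caller can submit a new generation request.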
429 Rate limit errors
The API allows:
- 60 requests per minute
- 1 request per second
If you poll multiple jobs concurrently:
- use a polling interval of at least 5 seconds
- stagger jobs
- add backoff for retries
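A common way to implement the backoff step is exponential delay with a small random jitter, so that concurrent jobs do not retry in lockstep. The sketch below computes only the delay and leaves the retry loop to you. The name backoff_delay and its defaults are assumptions, and the article does not say whether the API returns a Retry-After header, so none is consulted here.

```python
import random
from typing import Callable


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 60.0,
                  jitter: float = 1.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Double the delay on each attempt, but never exceed the cap.
    exponential = min(cap, base * (2 ** attempt))
    # Add up to `jitter` seconds of randomness to desynchronize clients.
    return exponential + rng() * jitter
```

With the defaults, successive retries wait roughly 5, 10, 20, and 40 seconds, then level off at about 60.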
Base64 upload rejected
Make sure the data URI includes the correct MIME type prefix:
data:image/jpeg;base64,...
data:image/png;base64,...
Aspect ratio mismatch
If aspect_ratio differs significantly from your source image, the model may crop or letterbox.
For best results, match aspect_ratio to the source image.
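If you want to set aspect_ratio explicitly, one option is to pick the supported value closest to the source image's proportions. The sketch below compares ratios on a log scale so that landscape and portrait deviations are treated symmetrically. The name closest_aspect_ratio is hypothetical, and reading the image's width and height is left to whatever imaging library you already use.

```python
import math

SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3")


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported aspect_ratio value nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = label.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)
```

A 1280×720 source maps to "16:9" and a 1080×1920 source maps to "9:16".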
Conclusion
The Grok image-to-video API turns a static image into a short animated clip with an async job model:
- submit an image and prompt
- receive a request_id
- poll until status == "done"
- read the MP4 URL from the final response
The grok-imagine-video model ranked at the top of the Artificial Analysis leaderboard in January 2026, and over a billion videos were generated in that month.
The main integration risk is the polling workflow. Test it explicitly. With Apidog Test Scenarios, you can extract the request_id, loop on the status endpoint, wait between requests, break on "done", and assert that the video URL exists before deployment.
Start building your integration with Apidog free. No credit card required.
FAQ
What model name do I use for the Grok image-to-video API?
Use:
grok-imagine-video
Pass it in the model field of the request body.
What's the difference between image and reference_images?
image sets the first frame of the output video.
reference_images provides style and content guidance, but those images are not used as frames.
You can use both in the same request.
How long does video generation take?
Generation time varies by duration and resolution.
A 6-second 480p video typically takes 1 to 3 minutes. A 15-second 720p video may take 4 to 8 minutes.
Poll every 5 seconds to avoid unnecessary rate-limit pressure.
Can I use a local file as the source image?
Yes. Encode it as a base64 data URI:
data:image/jpeg;base64,{encoded_bytes}
Then pass it as image.url.
What happens if I don't specify aspect_ratio?
When you provide image, the output aspect ratio defaults to the source image's native proportions.
When generating text-to-video without an image, the default is 16:9.
How much does a 10-second 720p video cost?
Input image:
$0.002
Output:
10 × $0.07 = $0.70
Total:
$0.702
What are the rate limits?
The API allows:
- 60 requests per minute
- 1 request per second
This includes both generation requests and polling requests.
Can I extend a video beyond 15 seconds?
Yes. Use:
POST /v1/videos/extensions
Generate an initial clip up to 15 seconds, then extend it with additional generation passes. Each extension uses the same async polling pattern.
