TL;DR
The Grok image-to-video API, powered by the grok-imagine-video model, animates a static image into a video clip. POST your image URL, prompt, and settings to https://api.x.ai/v1/videos/generations. The API returns a request_id immediately; poll GET /v1/videos/{request_id} until status is "done". Duration: 1–15 seconds. Pricing: from $0.05/sec for 480p output.
Introduction
On January 28, 2026, xAI launched the grok-imagine-video model for public API access. In its first month, it generated 1.2 billion videos and topped the Artificial Analysis text-to-video leaderboard. With image-to-video, you send the API a photo and a descriptive prompt, and it animates your image into an MP4 video.
This async workflow means your integration isn't finished when the POST returns 200—you must handle "processing", "done", and "failed" states robustly.
Apidog's Test Scenarios let you automate this: POST to /v1/videos/generations, extract the request_id, poll until status == "done", then assert the video URL is present.
What is the Grok image-to-video API?
Grok image-to-video is part of xAI's video generation suite. The grok-imagine-video model accepts an image as the first frame and animates it based on your prompt.
Endpoint:
POST https://api.x.ai/v1/videos/generations
Authenticate with a Bearer token:
Authorization: Bearer YOUR_XAI_API_KEY
Get your API key from the xAI console. This API also supports text-to-video (omit the image parameter), video extensions, and edits.
How the image-to-video process works
Set the image parameter in your POST body to define the first frame of your video. The model starts from your image and predicts natural motion according to your prompt.
Example: upload a mountain lake photo and prompt "gentle ripples spread across the water as morning mist drifts." The video starts exactly with your photo, then animates the scene.
Use image-to-video when:
- You have product photos, landscapes, or portraits you want animated.
- Brand assets require a consistent first frame.
- Motion should be grounded in a specific scene.
Use text-to-video when:
- You don’t have a source image or just want to brainstorm.
- Scene composition isn’t predetermined.
- Fast iteration matters more than first-frame precision.
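To make the distinction concrete, here is a minimal sketch of the two request bodies. The image URL is a placeholder; the only structural difference between the modes is the presence of the image field:

```python
# Shared settings for both modes.
base = {
    "model": "grok-imagine-video",
    "prompt": "Gentle ripples spread across the water as morning mist drifts",
    "duration": 6,
    "resolution": "720p",
}

# Image-to-video: the supplied image becomes the first frame.
image_to_video = {**base, "image": {"url": "https://example.com/lake.jpg"}}

# Text-to-video: simply omit the image; the model composes the scene itself.
text_to_video = dict(base)
```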
Prerequisites
Before your first call:
- xAI account: console.x.ai
- API key: from the xAI console (store in an environment variable)
- Python 3.8+ or Node.js 18+ (examples below)
- Public image URL or base64-encoded image (data URI)
Set your API key:
export XAI_API_KEY="your_key_here"
Install the xAI Python SDK if you want higher-level access:
pip install xai-sdk
For raw HTTP, you only need requests (Python) or fetch (Node.js).
Making your first image-to-video request
Using curl
curl -X POST https://api.x.ai/v1/videos/generations \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-imagine-video",
"prompt": "Gentle waves move across the surface, morning mist rises slowly",
"image": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
},
"duration": 6,
"resolution": "720p",
"aspect_ratio": "16:9"
}'
Response:
{
"request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}
The video is generated asynchronously—poll to check status.
Using Python (raw requests)
import os
import requests
api_key = os.environ["XAI_API_KEY"]
payload = {
"model": "grok-imagine-video",
"prompt": "Gentle waves move across the surface, morning mist rises slowly",
"image": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
},
"duration": 6,
"resolution": "720p",
"aspect_ratio": "16:9"
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
response = requests.post(
"https://api.x.ai/v1/videos/generations",
json=payload,
headers=headers
)
data = response.json()
request_id = data["request_id"]
print(f"Job started: {request_id}")
Using a base64 image
Encode a local image as a data URI:
import base64

with open("my_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload["image"] = {
    "url": f"data:image/jpeg;base64,{encoded}"
}
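If you upload local files often, a small helper that guesses the MIME type from the file extension keeps the data URI prefix correct (the API may reject base64 payloads that lack one). This helper is illustrative, not part of any SDK:

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI suitable for image.url."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```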
Polling for the result
After submitting your request, poll the status endpoint:
GET https://api.x.ai/v1/videos/{request_id}
Status values:
| Status | Meaning |
|---|---|
"processing" |
Video is still rendering |
"done" |
Video ready; URL in response |
"failed" |
Something went wrong |
Completed response:
{
"status": "done",
"video": {
"url": "https://vidgen.x.ai/....mp4",
"duration": 6
},
"progress": 100
}
Full Python polling loop
import time
import requests

def poll_video(request_id: str, api_key: str, interval: int = 5) -> dict:
    url = f"https://api.x.ai/v1/videos/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        response = requests.get(url, headers=headers)
        data = response.json()
        status = data.get("status")
        print(f"Status: {status} | Progress: {data.get('progress', 0)}%")
        if status == "done":
            return data["video"]
        elif status == "failed":
            raise RuntimeError(f"Video generation failed for {request_id}")
        time.sleep(interval)
# Usage
video = poll_video(request_id, api_key)
print(f"Video URL: {video['url']}")
print(f"Duration: {video['duration']}s")
Tip: Keep intervals ≥5 seconds to avoid API rate limits (60 requests/minute).
Using the xAI Python SDK
The xai-sdk library abstracts polling:
from xai_sdk import Client
import os
client = Client(api_key=os.environ["XAI_API_KEY"])
video = client.video.generate(
model="grok-imagine-video",
prompt="Gentle waves move across the surface, morning mist rises slowly",
image={"url": "https://example.com/landscape.jpg"},
duration=6,
resolution="720p",
aspect_ratio="16:9"
)
print(f"Video URL: {video.url}")
print(f"Duration: {video.duration}s")
Use the SDK for simple blocking calls; use raw HTTP for custom polling or logging.
Controlling resolution, duration, and aspect ratio
Grok's API gives you flexibility:
Duration
Integers 1–15 seconds; default is 6.
"duration": 10
Resolution
| Value | Description |
|---|---|
"480p" |
Default, lower cost, faster |
"720p" |
Higher quality; $0.07/sec |
"resolution": "720p"
Aspect ratio
| Value | Use case |
|---|---|
"16:9" |
Default, landscape |
"9:16" |
Vertical, stories/reels |
"1:1" |
Square, social |
"4:3" |
Presentations |
"3:4" |
Portrait |
"3:2" |
Photography crop |
"2:3" |
Tall portrait |
Defaults to match your source image unless set explicitly.
Using reference images for style guidance
- image: source photograph (first frame)
- reference_images: up to 7 images to guide style (not used as frames)
Example:
{
"model": "grok-imagine-video",
"prompt": "A product rotating slowly on a clean white surface",
"image": {
"url": "https://example.com/product-shot.jpg"
},
"reference_images": [
{"url": "https://example.com/brand-style-reference-1.jpg"},
{"url": "https://example.com/lighting-reference.jpg"}
],
"duration": 6,
"resolution": "720p"
}
Reference images influence style/appearance, not the actual video frames.
Extending and editing videos
You can go beyond initial generation:
Extending a video
POST /v1/videos/extensions lets you add seconds to an existing clip (max 15 sec per call).
curl -X POST https://api.x.ai/v1/videos/extensions \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-imagine-video",
"video_id": "your_original_request_id",
"prompt": "The mist continues to lift as sunlight breaks through",
"duration": 5
}'
Poll the same status endpoint for the extended clip.
Editing a video
POST /v1/videos/edits applies prompt-based modifications:
curl -X POST https://api.x.ai/v1/videos/edits \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-imagine-video",
"video_id": "your_original_request_id",
"prompt": "Change the sky to a dramatic sunset with deep orange tones"
}'
Extensions and edits are both async.
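Because each extension call adds at most 15 seconds, reaching a longer target duration takes several chained calls. A small, illustrative planner (the 15-second cap comes from the limit stated above; the helper name is made up):

```python
import math

def extension_calls_needed(target_seconds: int, initial_seconds: int = 15,
                           max_per_extension: int = 15) -> int:
    """How many /v1/videos/extensions calls are needed to grow an
    initial clip to at least target_seconds."""
    if target_seconds <= initial_seconds:
        return 0
    remaining = target_seconds - initial_seconds
    return math.ceil(remaining / max_per_extension)
```

For example, a 40-second final video from a 15-second initial clip needs two extension calls.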
Pricing breakdown: what a 10-second video costs
| Component | Cost |
|---|---|
| Input image | $0.002 per image |
| Output 480p | $0.05 per second |
| Output 720p | $0.07 per second |
10s video @ 720p:
- Input image: $0.002
- Output: 10 × $0.07 = $0.70
- Total: $0.702
6s video @ 480p (default):
- Input image: $0.002
- Output: 6 × $0.05 = $0.30
- Total: $0.302
The input charge applies to every generation request, even with the same image.
Text-to-video (no image) skips the $0.002 input fee.
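Based on the rates in the table above, a small estimator makes it easy to budget before submitting. This is an illustrative helper with the rates hard-coded from the table; xAI may change pricing:

```python
# Rates from the pricing table above (subject to change by xAI).
INPUT_IMAGE_FEE = 0.002
PER_SECOND = {"480p": 0.05, "720p": 0.07}

def estimate_cost(duration: int, resolution: str = "480p",
                  has_input_image: bool = True) -> float:
    """Estimate the cost of one generation request in USD."""
    if resolution not in PER_SECOND:
        raise ValueError(f"Unknown resolution: {resolution}")
    cost = duration * PER_SECOND[resolution]
    if has_input_image:  # text-to-video skips the input fee
        cost += INPUT_IMAGE_FEE
    return round(cost, 3)
```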
How to test your Grok video API integration with Apidog
The async workflow requires testing for:
- The generation request returns a request_id
- Polling handles "processing" correctly
- The final response has status == "done" and a video URL
Apidog's Test Scenarios automate this:
Step 1: Create a new Test Scenario
In Apidog, go to the Tests module and click +. Name: "Grok image-to-video async flow".
Step 2: Add the generation request
- URL: https://api.x.ai/v1/videos/generations
- Method: POST
- Header: Authorization: Bearer {{xai_api_key}}
- Body:
{
"model": "grok-imagine-video",
"prompt": "Gentle mist rises from the water as light filters through the trees",
"image": {
"url": "https://example.com/your-test-image.jpg"
},
"duration": 6,
"resolution": "480p"
}
Step 3: Extract the request_id
Add an Extract Variable processor:
- Variable: video_request_id
- Source: Response body
- Extraction: JSONPath
- JSONPath: $.request_id
Step 4: Build the polling loop
Add a For loop:
- Inside the loop, add a GET request:
  - URL: https://api.x.ai/v1/videos/{{video_request_id}}
  - Method: GET
  - Header: Authorization: Bearer {{xai_api_key}}
- Extract video_status via JSONPath $.status
- Add a Wait step (5000 ms) to avoid rate limits
- Loop break condition: {{video_status}} == "done"
Step 5: Assert the video URL
After the loop, add GET to the same endpoint. Add Assertion:
- Field: $.video.url
- Condition: Is not empty
Run the scenario:
Click Run. Apidog executes POST, extracts request_id, polls until done, asserts video URL. The test report shows each step's results.
Integrate into CI/CD with Apidog CLI:
apidog run --scenario grok-video-async-flow --env production
Common errors and fixes
401 Unauthorized
Check API key and Authorization header.
422 Unprocessable Entity
Malformed body: check model, prompt, and accessible image.url.
Image URL not accessible
xAI must fetch the URL. Use a public link or base64 data URI.
Status stuck at "processing"
Generation can take anywhere from 30 seconds to several minutes. If a job stays in "processing" for more than 10 minutes, resubmit the request.
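One way to handle stuck jobs programmatically is a polling loop with a hard deadline. A sketch, assuming the same status endpoint and response shape shown earlier:

```python
import time
import requests

def poll_with_timeout(request_id: str, api_key: str,
                      interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll the status endpoint, raising TimeoutError after `timeout` seconds
    so a job stuck in "processing" does not hang the caller."""
    url = f"https://api.x.ai/v1/videos/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = requests.get(url, headers=headers).json()
        status = data.get("status")
        if status == "done":
            return data["video"]
        if status == "failed":
            raise RuntimeError(f"Video generation failed for {request_id}")
        time.sleep(interval)
    raise TimeoutError(f"{request_id} still processing after {timeout}s; resubmit")
```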
429 Rate limit
Max 60 requests/min, 1/sec. Add delays between polls.
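For submissions, a simple exponential-backoff wrapper retries on 429 instead of failing outright. This is an illustrative sketch, not an official SDK feature:

```python
import time
import requests

def post_with_backoff(url: str, payload: dict, headers: dict,
                      max_retries: int = 5):
    """POST, retrying on HTTP 429 with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts")
```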
Base64 upload rejected
Add correct MIME prefix (e.g., data:image/jpeg;base64,).
Aspect ratio mismatch
Set aspect ratio to match your source image for best results.
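If you don't know your image's dimensions in advance, you can snap to the closest supported value. An illustrative helper using the ratios from the table above:

```python
# Supported values from the aspect-ratio table above.
SUPPORTED_RATIOS = {
    "16:9": 16 / 9, "9:16": 9 / 16, "1:1": 1.0,
    "4:3": 4 / 3, "3:4": 3 / 4, "3:2": 3 / 2, "2:3": 2 / 3,
}

def nearest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported aspect_ratio value closest to the source image."""
    actual = width / height
    return min(SUPPORTED_RATIOS, key=lambda k: abs(SUPPORTED_RATIOS[k] - actual))
```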
Conclusion
The Grok image-to-video API lets you animate static images with a simple async workflow: POST image + prompt, get request_id, poll until "done", download the MP4. The model is proven and scales to billions of videos.
Async patterns are error-prone—use Apidog Test Scenarios to automate extraction, polling, and assertions. This catches integration issues before production.
Start building your integration with Apidog free. No credit card required.
FAQ
What model name do I use for the Grok image-to-video API?
Use grok-imagine-video as the model field.
What's the difference between image and reference_images?
- image: first frame (animated).
- reference_images: guide style/content only.
How long does video generation take?
6s @ 480p: 1–3 min.
15s @ 720p: 4–8 min.
Poll every 5s.
Can I use a local file as the source image?
Yes. Encode as base64 data URI and pass as image.url.
What if I don't specify aspect_ratio?
Defaults to your image's proportions. Text-to-video defaults to 16:9.
How much does a 10s 720p video cost?
$0.002 image + 10 × $0.07 = ~$0.702.
What are the rate limits?
60 requests/min, 1/sec (POST+GET combined).
Can I extend a video beyond 15 seconds?
Yes, with POST /v1/videos/extensions. Each call is up to 15s and async.
