Preecha

Posted on Jun 10

How to use the Grok image to video API (step-by-step guide)

TL;DR

The Grok image-to-video API uses the grok-imagine-video model to animate a static image into an MP4 clip. Submit a POST request to https://api.x.ai/v1/videos/generations with an image URL, prompt, and optional output settings. The API returns a request_id, then you poll GET /v1/videos/{request_id} until status becomes "done". Duration ranges from 1 to 15 seconds. Pricing starts at $0.05 per second for 480p output.

Introduction

On January 28, 2026, xAI launched the grok-imagine-video model for public API access. Within its first month, the model generated 1.2 billion videos and ranked number one on the Artificial Analysis text-to-video leaderboard.

One of its core use cases is image-to-video generation: provide a source image and a motion prompt, then receive a short animated video clip.

The important implementation detail is that generation is asynchronous. Your integration is not complete when the initial POST succeeds. It is complete when your app can:

submit the generation job
extract the request_id
poll while the job is "processing"
handle "done" and "failed"
read the final video URL

You can test that full flow with Apidog Test Scenarios by chaining the generation request, variable extraction, polling loop, wait step, and final assertions.

What is the Grok image-to-video API?

The Grok image-to-video API is part of xAI's video generation product. It uses the grok-imagine-video model and accepts an image as the first frame of the output video.

Endpoint:

POST https://api.x.ai/v1/videos/generations

Authentication uses a Bearer token:

Authorization: Bearer YOUR_XAI_API_KEY

You get the key from the xAI console.

The same API surface also supports:

text-to-video generation by omitting the image parameter
video extensions
video edits

How image-to-video generation works

The image parameter defines the first frame of the generated video. The model does not replace that image. It starts from it and predicts motion based on your prompt.

Example:

Source image: a mountain lake at sunrise
Prompt: gentle ripples spread across the water as morning mist drifts
Output: the first frame matches your image, then the water and mist animate over time

This differs from text-to-video, where the model generates the initial scene itself.

Use image-to-video when:

you already have product photos, landscapes, portraits, or brand assets
you need control over the first frame
you want motion grounded in a specific real-world scene

Use text-to-video when:

you are exploring visual concepts without a reference image
you want the model to determine the full scene composition
iteration speed matters more than first-frame precision
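
As a minimal sketch of that distinction, the helper below builds a request body and includes the image field only when a first frame is supplied. The function name build_generation_payload and the example.com URL are illustrative assumptions; the field names follow the requests shown in this guide.

```python
from typing import Optional

MODEL = "grok-imagine-video"  # model name taken from this guide


def build_generation_payload(prompt: str, image_url: Optional[str] = None) -> dict:
    """Return a request body; include "image" only for image-to-video."""
    payload = {"model": MODEL, "prompt": prompt}
    if image_url is not None:
        # With an image, the model animates starting from this first frame.
        payload["image"] = {"url": image_url}
    return payload


# Image-to-video: the first frame is fixed by the source image.
image_to_video = build_generation_payload(
    "gentle ripples spread across the water as morning mist drifts",
    image_url="https://example.com/lake.jpg",  # placeholder URL
)

# Text-to-video: no image, so the model composes the scene itself.
text_to_video = build_generation_payload("a mountain lake at sunrise")
```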

The same xAI platform also powers the conversational companion mode. For a non-video angle on Grok, see the walkthrough of Companion Mode with Ani.

Prerequisites

Before you call the API, prepare:

an xAI account at console.x.ai
an API key from the xAI console
Python 3.8+ or Node.js 18+
a public image URL or a base64-encoded image data URI

Set your API key as an environment variable:

export XAI_API_KEY="your_key_here"
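
In application code it can help to fail early when the variable is missing rather than sending an empty Bearer token. The sketch below is one way to do that; the helper name auth_headers is an assumption, not part of any xAI library.

```python
import os
from typing import Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build request headers from XAI_API_KEY, failing early if it is unset."""
    api_key = env.get("XAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("XAI_API_KEY is not set; export it before running")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.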

If you want to use the xAI Python SDK:

pip install xai-sdk

For raw HTTP calls, you can use:

Python: requests
Node.js: built-in fetch

Make your first image-to-video request

Using curl

curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'

The API returns a job ID immediately:

{
  "request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}

The video is not ready yet. You need to poll the status endpoint.
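
Before polling, it is worth checking that the response really contains a job ID. A small sketch, assuming the response shape shown above (the helper name extract_request_id is hypothetical):

```python
def extract_request_id(body: dict) -> str:
    """Return the job ID from a generation response, or raise if it is absent."""
    request_id = body.get("request_id")
    if not isinstance(request_id, str) or not request_id:
        raise ValueError(f"No request_id in response: {body!r}")
    return request_id


job_id = extract_request_id({"request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"})
```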

Using Python with raw requests

import os
import requests

api_key = os.environ["XAI_API_KEY"]

payload = {
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.x.ai/v1/videos/generations",
    json=payload,
    headers=headers,
    timeout=30,  # seconds; avoids hanging indefinitely on a stalled connection
)

response.raise_for_status()

data = response.json()
request_id = data["request_id"]

print(f"Job started: {request_id}")

Using a base64 image

If your source image is local or private, encode it as a data URI:

import base64

with open("my_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload["image"] = {
    "url": f"data:image/jpeg;base64,{encoded}"
}

Use the correct MIME type:

data:image/jpeg;base64,...
data:image/png;base64,...
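
To avoid hard-coding the MIME type, one option is to guess it from the file extension with the standard library. This is a sketch; the helper name to_data_uri is an assumption, and it only accepts files whose extension maps to an image type.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local image as a data URI, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can then be assigned to payload["image"]["url"] in the same way as the manual example above.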

Poll for the result

Video generation is asynchronous. Poll this endpoint with the request_id:

GET https://api.x.ai/v1/videos/{request_id}

Status values:

Status	Meaning
`"processing"`	Video is still rendering
`"done"`	Video is ready
`"failed"`	Generation failed

A completed response looks like this:

{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/....mp4",
    "duration": 6
  },
  "progress": 100
}
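
The three status values map naturally onto three outcomes: keep waiting, return the URL, or raise. A minimal sketch of that mapping, assuming the response shapes shown above (interpret_status is a hypothetical helper name):

```python
from typing import Optional, Tuple


def interpret_status(data: dict) -> Tuple[bool, Optional[str]]:
    """Return (finished, video_url) for one status response.

    Raises RuntimeError on "failed" and ValueError on an unknown status.
    """
    status = data.get("status")
    if status == "processing":
        return False, None
    if status == "done":
        return True, data["video"]["url"]
    if status == "failed":
        raise RuntimeError("Video generation failed")
    raise ValueError(f"Unexpected status: {status!r}")
```

Treating an unknown status as an error, rather than silently continuing to poll, makes it easier to notice if the API adds or renames a state.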

Full Python polling loop

import time
import requests

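def poll_video(request_id: str, api_key: str, interval: int = 5, max_wait: int = 600) -> dict:
    """Poll until the job is done, fails, or max_wait seconds have passed."""
    url = f"https://api.x.ai/v1/videos/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    deadline = time.monotonic() + max_wait

    while time.monotonic() < deadline:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        data = response.json()
        status = data.get("status")

        print(f"Status: {status} | Progress: {data.get('progress', 0)}%")

        if status == "done":
            return data["video"]

        if status == "failed":
            raise RuntimeError(f"Video generation failed for {request_id}")

        time.sleep(interval)

    # The job never reached a terminal status; treat it as stalled.
    raise TimeoutError(f"Video {request_id} still processing after {max_wait}s")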

video = poll_video(request_id, api_key)

print(f"Video URL: {video['url']}")
print(f"Duration: {video['duration']}s")

Use a polling interval of 5 seconds or more. The API allows 60 requests per minute and 1 request per second, and polling requests count toward both limits. If you poll many jobs at once, stagger them.
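
To make the staggering concrete, the sketch below works out how many jobs fit in the per-minute budget at a given interval and spreads their first polls evenly. Both helper names are assumptions, and the calculation ignores generation requests, which share the same budget, so leave some headroom in practice.

```python
def max_concurrent_jobs(interval_s: float, per_minute_limit: int = 60) -> int:
    """How many jobs can be polled every interval_s seconds within the per-minute limit."""
    return int(per_minute_limit * interval_s // 60)


def stagger_offsets(n_jobs: int, interval_s: float) -> list:
    """Start delays that spread n_jobs polls evenly across one polling interval."""
    return [i * interval_s / n_jobs for i in range(n_jobs)]


print(max_concurrent_jobs(5))   # 5
print(stagger_offsets(5, 5))    # [0.0, 1.0, 2.0, 3.0, 4.0]
```

With a 5-second interval, five jobs offset by one second each produce one poll per second, which sits exactly at both limits.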

Use the xAI Python SDK

The xai-sdk library can submit the job and handle polling for you:

from xai_sdk import Client
import os

client = Client(api_key=os.environ["XAI_API_KEY"])

video = client.video.generate(
    model="grok-imagine-video",
    prompt="Gentle waves move across the surface, morning mist rises slowly",
    image={"url": "https://example.com/landscape.jpg"},
    duration=6,
    resolution="720p",
    aspect_ratio="16:9"
)

print(f"Video URL: {video.url}")
print(f"Duration: {video.duration}s")

Use the SDK when you want simple application code.

Use raw HTTP when you need custom control over:

polling intervals
retries
logging
timeout behavior
integration tests
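
One way to get that control while keeping the loop testable is to inject the HTTP call, the sleep function, and the clock. The sketch below is an assumption about how you might structure it, not an xAI or SDK API; fetch_status stands in for whatever function performs the GET and returns the parsed JSON.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until it reports "done"; raise on "failed" or timeout."""
    deadline = clock() + max_wait_s
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data["video"]
        if status == "failed":
            raise RuntimeError("Video generation failed")
        if clock() >= deadline:
            raise TimeoutError(f"Still {status!r} after {max_wait_s}s")
        sleep(interval_s)
```

In a unit test you can pass a fake fetch_status that returns canned responses and a sleep that records its arguments, so the whole lifecycle runs instantly and offline.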

Configure duration, resolution, and aspect ratio

The request body controls the video output format.

Duration

duration accepts integers from 1 to 15 seconds. The default is 6.

{
  "duration": 10
}

Longer videos cost more because output is billed per second.

Resolution

Available values:

Value	Description
`"480p"`	Default. Lower cost and faster generation.
`"720p"`	Higher quality. Costs $0.07 per second, compared with $0.05 per second at 480p.

Example:

{
  "resolution": "720p"
}

Aspect ratio

aspect_ratio controls output dimensions.

Value	Use case
`"16:9"`	Widescreen landscape
`"9:16"`	Vertical mobile/social video
`"1:1"`	Square social content
`"4:3"`	Classic photography or presentation format
`"3:4"`	Portrait photography
`"3:2"`	Standard photography crop
`"2:3"`	Tall portrait format

Example:

{
  "aspect_ratio": "16:9"
}

When you provide an image, the aspect ratio defaults to the source image's dimensions. Set aspect_ratio explicitly if you want a different crop.
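
Validating these three settings on the client can catch a bad value before it costs a round trip. The sketch below checks against the values listed in this section; the helper name output_settings is an assumption, and the allowed sets should be updated if the API changes.

```python
from typing import Optional

RESOLUTIONS = {"480p", "720p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}


def output_settings(
    duration: int = 6,
    resolution: str = "480p",
    aspect_ratio: Optional[str] = None,
) -> dict:
    """Validate output options against the values listed in this guide."""
    if not isinstance(duration, int) or not 1 <= duration <= 15:
        raise ValueError("duration must be an integer from 1 to 15")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    settings = {"duration": duration, "resolution": resolution}
    # Leave aspect_ratio out entirely to keep the API's default behavior.
    if aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        settings["aspect_ratio"] = aspect_ratio
    return settings
```

The returned dict can be merged into the request body, for example with payload.update(output_settings(10, "720p", "16:9")).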

Use reference images for style guidance

image and reference_images serve different purposes.

image is the first frame of the output video.

reference_images is an array of up to 7 images used for style, content, or visual context. These images are not used as frames in the video.

Example:

{
  "model": "grok-imagine-video",
  "prompt": "A product rotating slowly on a clean white surface",
  "image": {
    "url": "https://example.com/product-shot.jpg"
  },
  "reference_images": [
    {
      "url": "https://example.com/brand-style-reference-1.jpg"
    },
    {
      "url": "https://example.com/lighting-reference.jpg"
    }
  ],
  "duration": 6,
  "resolution": "720p"
}

In this request:

product-shot.jpg becomes the first frame
the reference images guide lighting and visual treatment

You can also provide reference_images without image. In that case, the model generates a text-to-video result while using the references for guidance.
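
A small helper can enforce the seven-image limit before the request is sent. This is a sketch under the limits stated above; with_reference_images is a hypothetical name.

```python
from typing import Iterable

MAX_REFERENCE_IMAGES = 7  # limit stated in this guide


def with_reference_images(payload: dict, urls: Iterable[str]) -> dict:
    """Return a copy of payload with reference_images added, enforcing the limit."""
    refs = [{"url": url} for url in urls]
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"At most {MAX_REFERENCE_IMAGES} reference images are allowed")
    updated = dict(payload)  # shallow copy so the caller's dict is not mutated
    if refs:
        updated["reference_images"] = refs
    return updated
```

Because the helper does not touch the image field, it works the same way for image-to-video and text-to-video payloads.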

Extend and edit videos

The API supports additional operations after initial generation.

Extend a video

Use POST /v1/videos/extensions to generate more footage from where an existing video ends.

curl -X POST https://api.x.ai/v1/videos/extensions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "The mist continues to lift as sunlight breaks through",
    "duration": 5
  }'

The response follows the same async pattern. Poll:

GET /v1/videos/{request_id}

Edit a video

Use POST /v1/videos/edits to apply prompt-guided modifications to an existing video.

curl -X POST https://api.x.ai/v1/videos/edits \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "Change the sky to a dramatic sunset with deep orange tones"
  }'

Edits are also asynchronous and use the same polling flow.
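
Since all three operations share one polling endpoint, the request construction is the only part that differs. The sketch below mirrors the curl examples above as plain functions; the helper names are assumptions, and the video_id field is taken from those examples as written.

```python
BASE_URL = "https://api.x.ai/v1/videos"
MODEL = "grok-imagine-video"


def extension_request(video_id: str, prompt: str, duration: int) -> tuple:
    """Return (url, body) for extending an existing video."""
    body = {"model": MODEL, "video_id": video_id, "prompt": prompt, "duration": duration}
    return f"{BASE_URL}/extensions", body


def edit_request(video_id: str, prompt: str) -> tuple:
    """Return (url, body) for a prompt-guided edit of an existing video."""
    body = {"model": MODEL, "video_id": video_id, "prompt": prompt}
    return f"{BASE_URL}/edits", body


def status_url(request_id: str) -> str:
    """Generation, extension, and edit jobs are all polled at this endpoint."""
    return f"{BASE_URL}/{request_id}"
```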

Pricing breakdown

The xAI video API charges for:

input image processing
output video duration

Component	Cost
Input image	$0.002 per image
Output at 480p	$0.05 per second
Output at 720p	$0.07 per second

Example: 10-second video at 720p

Item	Cost
Input image	$0.002
Output	10 × $0.07 = $0.70
Total	$0.702

Example: 6-second video at 480p

Item	Cost
Input image	$0.002
Output	6 × $0.05 = $0.30
Total	$0.302

The input image charge applies each time you submit a generation request, even if you reuse the same image URL.

Text-to-video requests omit the $0.002 input image charge but still use per-second output pricing.
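
The pricing rules above reduce to a short calculation. The sketch below uses Decimal to avoid floating-point rounding in the totals; the rates are the ones listed in this guide and the helper name estimate_cost is an assumption.

```python
from decimal import Decimal

INPUT_IMAGE_COST = Decimal("0.002")
PER_SECOND = {"480p": Decimal("0.05"), "720p": Decimal("0.07")}


def estimate_cost(duration_s: int, resolution: str = "480p", has_image: bool = True) -> Decimal:
    """Estimate the charge in USD for one generation request."""
    output = PER_SECOND[resolution] * duration_s
    image = INPUT_IMAGE_COST if has_image else Decimal("0")
    return output + image


print(estimate_cost(10, "720p"))  # 0.702
print(estimate_cost(6, "480p"))   # 0.302
```

Pass has_image=False to estimate a text-to-video request, which drops the input image charge.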

Test the async flow with Apidog

A one-shot request test is not enough for this API. You need to validate the entire lifecycle:

POST /v1/videos/generations returns a request_id
GET /v1/videos/{request_id} handles "processing"
polling stops when status == "done"
the final response contains a non-empty video.url

Apidog Test Scenarios can automate this sequence.

Step 1: Create a test scenario

In Apidog, open the Tests module and create a new scenario.

Name it:

Grok image-to-video async flow

Step 2: Add the generation request

Add a custom POST request step.

URL:

https://api.x.ai/v1/videos/generations

Header:

Authorization: Bearer {{xai_api_key}}

Body:

{
  "model": "grok-imagine-video",
  "prompt": "Gentle mist rises from the water as light filters through the trees",
  "image": {
    "url": "https://example.com/your-test-image.jpg"
  },
  "duration": 6,
  "resolution": "480p"
}

Step 3: Extract `request_id`

After the POST step, add an Extract Variable processor.

Configure it as:

Field	Value
Variable name	`video_request_id`
Source	Response body
Extraction method	JSONPath
JSONPath	`$.request_id`

Apidog stores the value as:

{{video_request_id}}

Step 4: Build the polling loop

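Add a For loop processor and keep its iteration count finite, so a job that fails or stalls cannot keep the scenario running indefinitely.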

Inside the loop, add a GET request:

https://api.x.ai/v1/videos/{{video_request_id}}

Header:

Authorization: Bearer {{xai_api_key}}

Add an Extract Variable processor inside the loop:

Field	Value
Variable name	`video_status`
JSONPath	`$.status`

Add a Wait processor after status extraction:

5000ms

Set the loop break condition:

{{video_status}} == "done"

Step 5: Assert the final video URL

After the loop, add one final GET request to:

https://api.x.ai/v1/videos/{{video_request_id}}

Add an assertion:

Setting	Value
Field	`$.video.url`
Condition	Is not empty

This confirms that the test passes only when the generated video URL is available.
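
For readers who want the same check outside Apidog, the sketch below resolves a simple dotted JSONPath against a parsed response. It handles only paths of the form $.a.b, which covers the two expressions used in this scenario; get_path is a hypothetical helper, not a full JSONPath implementation.

```python
from typing import Any


def get_path(doc: Any, path: str) -> Any:
    """Resolve a simple dotted JSONPath such as "$.video.url"; return None if missing."""
    if not path.startswith("$."):
        raise ValueError("only paths of the form $.a.b are supported")
    current = doc
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


final = {"status": "done", "video": {"url": "https://example.com/out.mp4", "duration": 6}}
assert get_path(final, "$.video.url")  # passes only when the URL is present and non-empty
```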

Run the scenario

Click Run in the test scenario view.

Apidog will:

submit the generation request
extract request_id
poll until the status is "done"
assert that video.url exists
show step timing and results in the report

You can also run the scenario from CI/CD with the Apidog CLI:

apidog run --scenario grok-video-async-flow --env production

For deeper async API testing patterns, including complex polling and CI/CD workflows, see the dedicated guide.

Common errors and fixes

401 Unauthorized

Your API key is missing or invalid.

Check the header format:

Authorization: Bearer YOUR_XAI_API_KEY

Also confirm the key is active in the xAI console.

422 Unprocessable Entity

The request body was received but failed validation.

Common causes:

missing model
empty prompt
inaccessible image.url
invalid duration, resolution, or aspect_ratio

Validate the image URL in a browser before using it.

Image URL not accessible

xAI servers must be able to fetch the image at generation time.

These sources can fail:

private URLs
localhost
URLs behind authentication
expired signed URLs

Use a public CDN URL or a base64 data URI.
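
A rough pre-flight check can catch the most obvious cases, such as localhost and private network addresses, before you submit the job. The sketch below is an assumption about what is worth checking, and likely_fetchable is a hypothetical name. It cannot detect URLs behind authentication or expired signed URLs, which only fail when a remote server actually tries to fetch them.

```python
import ipaddress
from urllib.parse import urlparse


def likely_fetchable(image_url: str) -> bool:
    """Rough pre-flight check: can a remote server plausibly fetch this image?"""
    if image_url.startswith("data:image/"):
        return True  # data URIs carry the image bytes inline
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a regular hostname; assume it resolves publicly
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```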

Status stays at `"processing"`

Generation can take from 30 seconds to several minutes depending on duration and resolution.

If a job stays "processing" beyond 10 minutes, it may have stalled. Submit a new request.

The xAI API does not currently expose a separate timeout signal apart from "failed".

429 Rate limit errors

The API allows:

60 requests per minute
1 request per second

If you poll multiple jobs concurrently:

use a polling interval of at least 5 seconds
stagger jobs
add backoff for retries
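
As an illustration of the backoff step, the sketch below generates capped exponential delays to wait between retries after a 429. The defaults (5-second base, doubling, 60-second cap) are assumptions chosen to match the polling guidance in this guide, not values published by xAI.

```python
def backoff_delays(attempts: int, base_s: float = 5.0, factor: float = 2.0, cap_s: float = 60.0) -> list:
    """Delays in seconds before each retry: base, base*factor, ... capped at cap_s."""
    return [min(cap_s, base_s * factor ** i) for i in range(attempts)]


print(backoff_delays(5))  # [5.0, 10.0, 20.0, 40.0, 60.0]
```

When many workers retry at once, adding a small random jitter to each delay helps keep them from retrying in lockstep.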

Base64 upload rejected

Make sure the data URI includes the correct MIME type prefix:

data:image/jpeg;base64,...
data:image/png;base64,...

Aspect ratio mismatch

If aspect_ratio differs significantly from your source image, the model may crop or letterbox.

For best results, match aspect_ratio to the source image.
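
If you know the source image's pixel dimensions, you can pick the nearest supported ratio programmatically. The sketch below compares ratios on a log scale so that landscape and portrait deviations are treated symmetrically; closest_aspect_ratio is a hypothetical helper and the list comes from the table in this guide.

```python
import math

SUPPORTED = ["16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio nearest to width/height, compared on a log scale."""
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED, key=distance)


print(closest_aspect_ratio(1920, 1080))  # 16:9
print(closest_aspect_ratio(1080, 1920))  # 9:16
```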

Conclusion

The Grok image-to-video API turns a static image into a short animated clip with an async job model:

submit an image and prompt
receive a request_id
poll until status == "done"
read the MP4 URL from the final response

The grok-imagine-video model launched in January 2026, reached the top of the Artificial Analysis text-to-video leaderboard, and generated over a billion videos in its first month.

The main integration risk is the polling workflow. Test it explicitly. With Apidog Test Scenarios, you can extract the request_id, loop on the status endpoint, wait between requests, break on "done", and assert that the video URL exists before deployment.

Start building your integration with Apidog free. No credit card required.

FAQ

What model name do I use for the Grok image-to-video API?

Use:

grok-imagine-video

Pass it in the model field of the request body.

What's the difference between `image` and `reference_images`?

image sets the first frame of the output video.

reference_images provides style and content guidance, but those images are not used as frames.

You can use both in the same request.

How long does video generation take?

Generation time varies by duration and resolution.

A 6-second 480p video typically takes 1 to 3 minutes. A 15-second 720p video may take 4 to 8 minutes.

Poll no more often than every 5 seconds to stay clear of the rate limits.

Can I use a local file as the source image?

Yes. Encode it as a base64 data URI:

data:image/jpeg;base64,{encoded_bytes}

Then pass it as image.url.

What happens if I don't specify `aspect_ratio`?

When you provide image, the output aspect ratio defaults to the source image's native proportions.

When generating text-to-video without an image, the default is 16:9.

How much does a 10-second 720p video cost?

Input image:

$0.002

Output:

10 × $0.07 = $0.70

Total:

$0.702

What are the rate limits?

The API allows:

60 requests per minute
1 request per second

This includes both generation requests and polling requests.

Can I extend a video beyond 15 seconds?

Yes. Use:

POST /v1/videos/extensions

Generate an initial clip up to 15 seconds, then extend it with additional generation passes. Each extension uses the same async polling pattern.
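
To plan a longer video, you can split the target length into an initial clip plus a series of extensions. The sketch below assumes each extension can add up to 15 seconds; this guide does not state the extension limit, so treat extension_max_s as a parameter to adjust. plan_segments is a hypothetical helper.

```python
def plan_segments(total_s: int, first_max_s: int = 15, extension_max_s: int = 15) -> list:
    """Split a target length into an initial clip plus extension durations."""
    if total_s < 1:
        raise ValueError("total_s must be at least 1")
    segments = [min(total_s, first_max_s)]
    remaining = total_s - segments[0]
    while remaining > 0:
        # extension_max_s is an assumed cap, not a documented one.
        step = min(remaining, extension_max_s)
        segments.append(step)
        remaining -= step
    return segments


print(plan_segments(40))  # [15, 15, 10]
```

The first entry is the duration for the generation request and each later entry is the duration for one extension request, billed per second like any other output.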
