DEV Community

Preecha

How to use the Grok image to video API (step-by-step guide)

TL;DR

The Grok image-to-video API uses the grok-imagine-video model to animate a static image into an MP4 clip. Submit a POST request to https://api.x.ai/v1/videos/generations with an image URL, prompt, and optional output settings. The API returns a request_id, then you poll GET /v1/videos/{request_id} until status becomes "done". Duration ranges from 1 to 15 seconds. Pricing starts at $0.05 per second for 480p output.


Introduction

On January 28, 2026, xAI launched the grok-imagine-video model for public API access. Within its first month, the model generated 1.2 billion videos and ranked number one on the Artificial Analysis text-to-video leaderboard.

One of its core use cases is image-to-video generation: provide a source image and a motion prompt, then receive a short animated video clip.

The important implementation detail is that generation is asynchronous. Your integration is not complete when the initial POST succeeds. It is complete when your app can:

  • submit the generation job
  • extract the request_id
  • poll while the job is "processing"
  • handle "done" and "failed"
  • read the final video URL

You can test that full flow with Apidog Test Scenarios by chaining the generation request, variable extraction, polling loop, wait step, and final assertions.

What is the Grok image-to-video API?

The Grok image-to-video API is part of xAI's video generation product. It uses the grok-imagine-video model and accepts an image as the first frame of the output video.

Endpoint:

POST https://api.x.ai/v1/videos/generations

Authentication uses a Bearer token:

Authorization: Bearer YOUR_XAI_API_KEY

You get the key from the xAI console.

The same API surface also supports:

  • text-to-video generation by omitting the image parameter
  • video extensions
  • video edits

How image-to-video generation works

The image parameter defines the first frame of the generated video. The model does not replace that image. It starts from it and predicts motion based on your prompt.

Example:

  • Source image: a mountain lake at sunrise
  • Prompt: gentle ripples spread across the water as morning mist drifts
  • Output: the first frame matches your image, then the water and mist animate over time

This differs from text-to-video, where the model generates the initial scene itself.

Use image-to-video when:

  • you already have product photos, landscapes, portraits, or brand assets
  • you need control over the first frame
  • you want motion grounded in a specific real-world scene

Use text-to-video when:

  • you are exploring visual concepts without a reference image
  • you want the model to determine the full scene composition
  • iteration speed matters more than first-frame precision

For a non-video angle on Grok, see the walkthrough of Companion Mode with Ani, the conversational companion mode that xAI also offers.

Prerequisites

Before you call the API, prepare:

  • an xAI account at console.x.ai
  • an API key from the xAI console
  • Python 3.8+ or Node.js 18+
  • a public image URL or a base64-encoded image data URI


Set your API key as an environment variable:

export XAI_API_KEY="your_key_here"

If you want to use the xAI Python SDK:

pip install xai-sdk

For raw HTTP calls, you can use:

  • Python: requests
  • Node.js: built-in fetch

Make your first image-to-video request

Using curl

curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'

The API returns a job ID immediately:

{
  "request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}

The video is not ready yet. You need to poll the status endpoint.

Using Python with raw requests

import os
import requests

api_key = os.environ["XAI_API_KEY"]

payload = {
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.x.ai/v1/videos/generations",
    json=payload,
    headers=headers,
    timeout=30,  # fail fast instead of hanging on a stalled connection
)

response.raise_for_status()

data = response.json()
request_id = data["request_id"]

print(f"Job started: {request_id}")

Using a base64 image

If your source image is local or private, encode it as a data URI:

import base64

with open("my_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload["image"] = {
    "url": f"data:image/jpeg;base64,{encoded}"
}

Use the correct MIME type:

data:image/jpeg;base64,...
data:image/png;base64,...

Poll for the result

Video generation is asynchronous. Poll this endpoint with the request_id:

GET https://api.x.ai/v1/videos/{request_id}

Status values:

Status | Meaning
"processing" | Video is still rendering
"done" | Video is ready
"failed" | Generation failed

A completed response looks like this:

{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/....mp4",
    "duration": 6
  },
  "progress": 100
}

Full Python polling loop

import time
import requests

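def poll_video(request_id: str, api_key: str, interval: int = 5) -> dict:
    """Poll the status endpoint until the job finishes, then return the video object."""
    url = f"https://api.x.ai/v1/videos/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}

    while True:
        # A request timeout keeps a hung connection from blocking the loop.
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        data = response.json()
        status = data.get("status")

        print(f"Status: {status} | Progress: {data.get('progress', 0)}%")

        if status == "done":
            return data["video"]

        if status == "failed":
            raise RuntimeError(f"Video generation failed for {request_id}")

        # Still "processing": wait before the next poll to stay under the rate limit.
        time.sleep(interval)

# request_id and api_key come from the submission example above.
video = poll_video(request_id, api_key)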

print(f"Video URL: {video['url']}")
print(f"Duration: {video['duration']}s")

Use a polling interval of 5 seconds or more. The API limit is 60 requests per minute, or 1 request per second. If you poll many jobs at once, stagger them.
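
To make the staggering advice concrete, here is a rough sketch that derives a polling interval and per-job start offsets from the 60 requests per minute limit. The function names and the headroom factor (budget reserved for generation requests) are assumptions for illustration, not values published by xAI.

```python
from typing import List


def safe_poll_interval(
    concurrent_jobs: int,
    min_interval: float = 5.0,
    requests_per_minute: int = 60,
    headroom: float = 0.8,  # assumed share of the limit spent on polling
) -> float:
    """Return a per-job polling interval (seconds) that keeps the total
    polling rate under the shared request budget."""
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    polls_per_second = (requests_per_minute / 60) * headroom
    return max(min_interval, concurrent_jobs / polls_per_second)


def stagger_offsets(concurrent_jobs: int, interval: float) -> List[float]:
    """Spread the first poll of each job evenly across one interval."""
    return [i * interval / concurrent_jobs for i in range(concurrent_jobs)]


interval = safe_poll_interval(concurrent_jobs=8)
offsets = stagger_offsets(8, interval)
print(f"poll each job every {interval:.1f}s, starting at offsets {offsets}")
```

With a handful of jobs the 5-second floor dominates. The interval only grows once the combined polling rate would approach the limit.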

Use the xAI Python SDK

The xai-sdk library can submit the job and handle polling for you:

from xai_sdk import Client
import os

client = Client(api_key=os.environ["XAI_API_KEY"])

video = client.video.generate(
    model="grok-imagine-video",
    prompt="Gentle waves move across the surface, morning mist rises slowly",
    image={"url": "https://example.com/landscape.jpg"},
    duration=6,
    resolution="720p",
    aspect_ratio="16:9"
)

print(f"Video URL: {video.url}")
print(f"Duration: {video.duration}s")

Use the SDK when you want simple application code.

Use raw HTTP when you need custom control over:

  • polling intervals
  • retries
  • logging
  • timeout behavior
  • integration tests

Configure duration, resolution, and aspect ratio

The request body controls the video output format.

Duration

duration accepts integers from 1 to 15 seconds. The default is 6.

{
  "duration": 10
}

Longer videos cost more because output is billed per second.

Resolution

Available values:

Value | Description
"480p" | Default. Lower cost and faster generation.
"720p" | Higher quality. Costs $0.07/sec vs $0.05/sec.

Example:

{
  "resolution": "720p"
}

Aspect ratio

aspect_ratio controls output dimensions.

Value | Use case
"16:9" | Widescreen landscape
"9:16" | Vertical mobile/social video
"1:1" | Square social content
"4:3" | Classic photography or presentation format
"3:4" | Portrait photography
"3:2" | Standard photography crop
"2:3" | Tall portrait format

Example:

{
  "aspect_ratio": "16:9"
}

When you provide an image, the aspect ratio defaults to the source image's dimensions. Set aspect_ratio explicitly if you want a different crop.
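
Because invalid output settings are rejected by the API, it can help to check them locally first. The following is a minimal sketch that encodes only the limits listed in this section. The function name validate_output_settings is hypothetical, and the server may apply rules that this check does not cover.

```python
from typing import List

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}


def validate_output_settings(settings: dict) -> List[str]:
    """Return a list of problems found in the optional output settings.

    Missing keys are fine because every setting has a default.
    """
    errors = []

    duration = settings.get("duration")
    if duration is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(duration, bool) or not isinstance(duration, int):
            errors.append("duration must be an integer")
        elif not 1 <= duration <= 15:
            errors.append("duration must be between 1 and 15 seconds")

    resolution = settings.get("resolution")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        errors.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")

    aspect_ratio = settings.get("aspect_ratio")
    if aspect_ratio is not None and aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    return errors


problems = validate_output_settings({"duration": 20, "resolution": "720p"})
print(problems)
```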

Use reference images for style guidance

image and reference_images serve different purposes.

image is the first frame of the output video.

reference_images is an array of up to 7 images used for style, content, or visual context. These images are not used as frames in the video.

Example:

{
  "model": "grok-imagine-video",
  "prompt": "A product rotating slowly on a clean white surface",
  "image": {
    "url": "https://example.com/product-shot.jpg"
  },
  "reference_images": [
    {
      "url": "https://example.com/brand-style-reference-1.jpg"
    },
    {
      "url": "https://example.com/lighting-reference.jpg"
    }
  ],
  "duration": 6,
  "resolution": "720p"
}

In this request:

  • product-shot.jpg becomes the first frame
  • the reference images guide lighting and visual treatment

You can also provide reference_images without image. In that case, the model generates a text-to-video result while using the references for guidance.
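
The combinations described above (first frame only, references only, both, or neither) can be assembled with one helper. This is an illustrative sketch: build_generation_payload is a hypothetical name, and it only mirrors the request fields shown in this article.

```python
from typing import Optional, Sequence

MAX_REFERENCE_IMAGES = 7  # limit stated in this section


def build_generation_payload(
    prompt: str,
    image_url: Optional[str] = None,
    reference_urls: Sequence[str] = (),
    **output_settings,
) -> dict:
    """Build a request body for POST /v1/videos/generations.

    Omitting image_url produces a text-to-video request.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if len(reference_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")

    payload = {"model": "grok-imagine-video", "prompt": prompt}
    if image_url:
        payload["image"] = {"url": image_url}  # becomes the first frame
    if reference_urls:
        # Style and content guidance only; never used as frames.
        payload["reference_images"] = [{"url": url} for url in reference_urls]
    payload.update(output_settings)  # e.g. duration, resolution, aspect_ratio
    return payload


payload = build_generation_payload(
    "A product rotating slowly on a clean white surface",
    image_url="https://example.com/product-shot.jpg",
    reference_urls=["https://example.com/lighting-reference.jpg"],
    duration=6,
    resolution="720p",
)
```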

Extend and edit videos

The API supports additional operations after initial generation.

Extend a video

Use POST /v1/videos/extensions to generate more footage from where an existing video ends.

curl -X POST https://api.x.ai/v1/videos/extensions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "The mist continues to lift as sunlight breaks through",
    "duration": 5
  }'

The response follows the same async pattern. Poll:

GET /v1/videos/{request_id}

Edit a video

Use POST /v1/videos/edits to apply prompt-guided modifications to an existing video.

curl -X POST https://api.x.ai/v1/videos/edits \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "Change the sky to a dramatic sunset with deep orange tones"
  }'

Edits are also asynchronous and use the same polling flow.
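
Since extensions and edits differ only in endpoint and body, a Python client could route both through one helper. This sketch uses the endpoints and fields from the curl examples above. The helper name build_followup_request is hypothetical, and passing duration for edits is left to the caller because this article only shows it on extensions.

```python
from typing import Optional, Tuple

BASE_URL = "https://api.x.ai/v1/videos"
ENDPOINTS = {
    "extend": f"{BASE_URL}/extensions",
    "edit": f"{BASE_URL}/edits",
}


def build_followup_request(
    operation: str,
    video_id: str,
    prompt: str,
    duration: Optional[int] = None,
) -> Tuple[str, dict]:
    """Return (url, json_body) for an extension or edit of an existing video."""
    if operation not in ENDPOINTS:
        raise ValueError(f"operation must be one of {sorted(ENDPOINTS)}")

    body = {
        "model": "grok-imagine-video",
        "video_id": video_id,  # request_id of the original generation
        "prompt": prompt,
    }
    if duration is not None:
        body["duration"] = duration
    return ENDPOINTS[operation], body


url, body = build_followup_request(
    "extend",
    "your_original_request_id",
    "The mist continues to lift as sunlight breaks through",
    duration=5,
)
```

The returned pair can be passed to requests.post(url, json=body, headers=headers), after which the usual polling loop applies.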

Pricing breakdown

The xAI video API charges for:

  • input image processing
  • output video duration

Component | Cost
Input image | $0.002 per image
Output at 480p | $0.05 per second
Output at 720p | $0.07 per second

Example: 10-second video at 720p

Item | Cost
Input image | $0.002
Output | 10 × $0.07 = $0.70
Total | $0.702

Example: 6-second video at 480p

Item | Cost
Input image | $0.002
Output | 6 × $0.05 = $0.30
Total | $0.302

The input image charge applies each time you submit a generation request, even if you reuse the same image URL.

Text-to-video requests omit the $0.002 input image charge but still use per-second output pricing.
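
The arithmetic above is easy to wrap in a small estimator. This sketch hard-codes the prices quoted in this article, which may change, and uses Decimal so the totals are not affected by floating-point rounding. The name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Prices as quoted in this article (USD).
PRICE_PER_SECOND = {"480p": Decimal("0.05"), "720p": Decimal("0.07")}
INPUT_IMAGE_FEE = Decimal("0.002")


def estimate_cost(
    duration_seconds: int,
    resolution: str = "480p",
    has_input_image: bool = True,
) -> Decimal:
    """Estimate the cost of one generation request in USD."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    output_cost = PRICE_PER_SECOND[resolution] * duration_seconds
    image_cost = INPUT_IMAGE_FEE if has_input_image else Decimal("0")
    return output_cost + image_cost


print(estimate_cost(10, "720p"))  # 0.702
print(estimate_cost(6, "480p"))   # 0.302
```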

Test the async flow with Apidog

A one-shot request test is not enough for this API. You need to validate the entire lifecycle:

  1. POST /v1/videos/generations returns a request_id
  2. GET /v1/videos/{request_id} handles "processing"
  3. polling stops when status == "done"
  4. the final response contains a non-empty video.url

Apidog Test Scenarios can automate this sequence.

Step 1: Create a test scenario

In Apidog, open the Tests module and create a new scenario.

Name it:

Grok image-to-video async flow

Step 2: Add the generation request

Add a custom POST request step.

URL:

https://api.x.ai/v1/videos/generations

Header:

Authorization: Bearer {{xai_api_key}}

Body:

{
  "model": "grok-imagine-video",
  "prompt": "Gentle mist rises from the water as light filters through the trees",
  "image": {
    "url": "https://example.com/your-test-image.jpg"
  },
  "duration": 6,
  "resolution": "480p"
}

Step 3: Extract request_id

After the POST step, add an Extract Variable processor.

Configure it as:

Field | Value
Variable name | video_request_id
Source | Response body
Extraction method | JSONPath
JSONPath | $.request_id

Apidog stores the value as:

{{video_request_id}}

Step 4: Build the polling loop

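Add a For loop processor and give it a fixed iteration count so that a failed or stalled job cannot keep polling indefinitely. With the 5-second wait used below, 120 iterations covers about 10 minutes.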

Inside the loop, add a GET request:

https://api.x.ai/v1/videos/{{video_request_id}}

Header:

Authorization: Bearer {{xai_api_key}}

Add an Extract Variable processor inside the loop:

Field | Value
Variable name | video_status
JSONPath | $.status

Add a Wait processor after status extraction:

5000ms

Set the loop break condition:

{{video_status}} == "done"

Step 5: Assert the final video URL

After the loop, add one final GET request to:

https://api.x.ai/v1/videos/{{video_request_id}}

Add an assertion:

Field | Value
Field | $.video.url
Condition | Is not empty

This confirms the test only passes when the generated video URL is available.

Run the scenario

Click Run in the test scenario view.

Apidog will:

  1. submit the generation request
  2. extract request_id
  3. poll until the status is "done"
  4. assert that video.url exists
  5. show step timing and results in the report

You can also run the scenario from CI/CD with the Apidog CLI:

apidog run --scenario grok-video-async-flow --env production

For deeper async API testing patterns, including complex polling and CI/CD workflows, see the dedicated guide.

Common errors and fixes

401 Unauthorized

Your API key is missing or invalid.

Check the header format:

Authorization: Bearer YOUR_XAI_API_KEY

Also confirm the key is active in the xAI console.

422 Unprocessable Entity

The request body failed validation.

Common causes:

  • missing model
  • empty prompt
  • inaccessible image.url
  • invalid duration, resolution, or aspect_ratio

Before using an image URL, confirm that it loads without authentication, for example in a private browser window.

Image URL not accessible

xAI servers must be able to fetch the image at generation time.

These sources can fail:

  • private URLs
  • localhost
  • URLs behind authentication
  • expired signed URLs

Use a public CDN URL or a base64 data URI.
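
Some of these failure cases can be caught before the request is sent. The sketch below is an offline heuristic only: it flags localhost and non-public IP addresses, but it cannot detect authentication walls or expired signed URLs. The function name image_source_problem is hypothetical, and accepting plain http alongside https is an assumption.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse


def image_source_problem(url: str) -> Optional[str]:
    """Return a description of an obvious reachability problem, or None."""
    if url.startswith("data:image/"):
        return None  # inline data URI, nothing to fetch

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "use an http(s) URL or a base64 data URI"

    host = parsed.hostname or ""
    if not host:
        return "URL has no host"
    if host == "localhost" or host.endswith(".local"):
        return "host is only reachable from your own machine or network"

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # a DNS name; reachability cannot be judged offline

    if address.is_private or address.is_loopback or address.is_link_local:
        return "IP address is not publicly routable"
    return None


for candidate in ("https://example.com/a.jpg", "http://localhost:8000/a.jpg"):
    print(candidate, "->", image_source_problem(candidate))
```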

Status stays at "processing"

Generation can take from 30 seconds to several minutes depending on duration and resolution.

If a job stays "processing" beyond 10 minutes, it may have stalled. Submit a new request.

The xAI API does not currently expose a separate timeout signal apart from "failed".
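
Because there is no timeout status, the client has to enforce its own deadline. Here is a minimal sketch of a polling loop that gives up after a maximum wait. The 600-second default follows the 10-minute guidance above. The function name and the injectable fetch_status, sleep, and clock parameters are choices made for this example so the loop can be tested without network calls.

```python
import time
from typing import Callable


def poll_with_deadline(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job is done, raising TimeoutError after max_wait seconds."""
    deadline = clock() + max_wait

    while True:
        job = fetch_status()
        status = job.get("status")

        if status == "done":
            return job
        if status == "failed":
            raise RuntimeError("video generation failed")

        # Stop if another wait would take us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {max_wait} seconds")
        sleep(interval)
```

In real use, fetch_status could be a lambda that wraps requests.get(...).json() for the status endpoint. On TimeoutError, the caller can log the request_id and submit a new job.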

429 Rate limit errors

The API allows:

  • 60 requests per minute
  • 1 request per second

If you poll multiple jobs concurrently:

  • use a polling interval of at least 5 seconds
  • stagger jobs
  • add backoff for retries
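
For the backoff step, one common approach is exponential backoff with jitter. The sketch below generates retry delays only and does not perform requests. The base, cap, and retry count are illustrative defaults, not values recommended by xAI.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one delay (in seconds) per retry attempt.

    Each delay falls between half of and the full exponential ceiling
    ("equal jitter"), so concurrent clients do not retry in lockstep.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield ceiling / 2 + rng() * ceiling / 2


for attempt, delay in enumerate(backoff_delays(), start=1):
    print(f"retry {attempt}: wait {delay:.2f}s")
```

A caller would sleep for each delay after receiving a 429 and stop retrying once the generator is exhausted.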

Base64 upload rejected

Make sure the data URI includes the correct MIME type prefix:

data:image/jpeg;base64,...
data:image/png;base64,...

Aspect ratio mismatch

If aspect_ratio differs significantly from your source image, the model may crop or letterbox.

For best results, match aspect_ratio to the source image.
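
If you know the pixel dimensions of the source image, you can pick the closest supported value programmatically. This is a small sketch using the ratios listed earlier in this guide. The function name closest_aspect_ratio is hypothetical, and comparing ratios on a log scale is one reasonable choice among several.

```python
import math

SUPPORTED_ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported aspect_ratio value nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = (int(part) for part in label.split(":"))
        # Log scale treats 2:1 and 1:2 as equally far from square.
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)


print(closest_aspect_ratio(1920, 1080))  # 16:9
print(closest_aspect_ratio(1080, 1920))  # 9:16
```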

Conclusion

The Grok image-to-video API turns a static image into a short animated clip with an async job model:

  1. submit an image and prompt
  2. receive a request_id
  3. poll until status == "done"
  4. read the MP4 URL from the final response

The grok-imagine-video model reached public API access on January 28, 2026. Within its first month it ranked number one on the Artificial Analysis text-to-video leaderboard and generated 1.2 billion videos.

The main integration risk is the polling workflow. Test it explicitly. With Apidog Test Scenarios, you can extract the request_id, loop on the status endpoint, wait between requests, break on "done", and assert that the video URL exists before deployment.

Start building your integration with Apidog free. No credit card required.

FAQ

What model name do I use for the Grok image-to-video API?

Use:

grok-imagine-video

Pass it in the model field of the request body.

What's the difference between image and reference_images?

image sets the first frame of the output video.

reference_images provides style and content guidance, but those images are not used as frames.

You can use both in the same request.

How long does video generation take?

Generation time varies by duration and resolution.

A 6-second 480p video typically takes 1 to 3 minutes. A 15-second 720p video may take 4 to 8 minutes.

Poll no more often than every 5 seconds to avoid unnecessary rate-limit pressure.

Can I use a local file as the source image?

Yes. Encode it as a base64 data URI:

data:image/jpeg;base64,{encoded_bytes}

Then pass it as image.url.
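
A helper that derives the MIME type from the file extension avoids mismatched prefixes. This is a minimal sketch using only the Python standard library. The name image_to_data_uri is hypothetical, and limiting it to JPEG and PNG is an assumption based on the two MIME types shown in this article.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: only the two types shown in this article are accepted here.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported or unknown image type for {path!r}: {mime_type}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The result can be assigned directly to the request body, for example payload["image"] = {"url": image_to_data_uri("my_image.jpg")}.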

What happens if I don't specify aspect_ratio?

When you provide image, the output aspect ratio defaults to the source image's native proportions.

When generating text-to-video without an image, the default is 16:9.

How much does a 10-second 720p video cost?

Input image:

$0.002

Output:

10 × $0.07 = $0.70

Total:

$0.702

What are the rate limits?

The API allows:

  • 60 requests per minute
  • 1 request per second

This includes both generation requests and polling requests.

Can I extend a video beyond 15 seconds?

Yes. Use:

POST /v1/videos/extensions

Generate an initial clip up to 15 seconds, then extend it with additional generation passes. Each extension uses the same async polling pattern.
