Wanda

Posted on • Originally published at apidog.com
How to use the Grok image to video API (step-by-step guide)

TL;DR

The Grok image-to-video API, powered by the grok-imagine-video model, animates a static image into a video clip. POST your image URL, prompt, and settings to https://api.x.ai/v1/videos/generations. The API returns a request_id immediately; poll GET /v1/videos/{request_id} until status is "done". Duration: 1–15 seconds. Pricing: from $0.05/sec for 480p output.


Introduction

On January 28, 2026, xAI launched the grok-imagine-video model for public API access. In its first month, it generated 1.2 billion videos and topped the Artificial Analysis text-to-video leaderboard. With image-to-video, you send the API a photo and a descriptive prompt, and it animates your image into an MP4 video.

This async workflow means your integration isn't finished when the POST returns 200—you must handle "processing", "done", and "failed" states robustly.

Apidog's Test Scenarios let you automate this: POST to /v1/videos/generations, extract the request_id, poll until status == "done", then assert the video URL is present.

What is the Grok image-to-video API?

Grok image-to-video is part of xAI's video generation suite. The grok-imagine-video model accepts an image as the first frame and animates it based on your prompt.

Endpoint:

POST https://api.x.ai/v1/videos/generations

Authenticate with a Bearer token:

Authorization: Bearer YOUR_XAI_API_KEY

Get your API key from the xAI console. This API also supports text-to-video (omit the image parameter), video extensions, and edits.

How the image-to-video process works

Set the image parameter in your POST body to define the first frame of your video. The model starts from your image and predicts natural motion according to your prompt.

Example: upload a mountain lake photo and prompt "gentle ripples spread across the water as morning mist drifts." The video starts exactly with your photo, then animates the scene.

Use image-to-video when:

  • You have product photos, landscapes, or portraits you want animated.
  • Brand assets require a consistent first frame.
  • Motion should be grounded in a specific scene.

Use text-to-video when:

  • You don’t have a source image or just want to brainstorm.
  • Scene composition isn’t predetermined.
  • Fast iteration matters more than first-frame precision.
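Either way, the request body differs only in whether the image field is present. A minimal payload builder (a sketch; the model name and body shape follow the request examples later in this guide):

```python
from typing import Optional

def build_video_payload(prompt: str, image_url: Optional[str] = None, **settings) -> dict:
    """Build a request body for POST /v1/videos/generations.

    Omitting image_url produces a text-to-video request; passing one
    makes that image the first frame (image-to-video).
    """
    payload = {"model": "grok-imagine-video", "prompt": prompt, **settings}
    if image_url is not None:
        payload["image"] = {"url": image_url}
    return payload
```

Extra settings such as duration, resolution, and aspect_ratio pass straight through as keyword arguments.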

Prerequisites

Before your first call:

  1. xAI account: console.x.ai
  2. API key: from the xAI console (store in an environment variable)
  3. Python 3.8+ or Node.js 18+ (examples below)
  4. Public image URL or base64-encoded image (data URI)

Grok image-to-video UI

Set your API key:

export XAI_API_KEY="your_key_here"

Install the xAI Python SDK if you want higher-level access:

pip install xai-sdk

For raw HTTP, you only need requests (Python) or fetch (Node.js).

Making your first image-to-video request

Using curl

curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'

Response:

{
  "request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}

The video is generated asynchronously—poll to check status.

Using Python (raw requests)

import os
import requests

api_key = os.environ["XAI_API_KEY"]

payload = {
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.x.ai/v1/videos/generations",
    json=payload,
    headers=headers
)

data = response.json()
request_id = data["request_id"]
print(f"Job started: {request_id}")

Using a base64 image

Encode a local image as a data URI:

import base64

with open("my_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload["image"] = {
    "url": f"data:image/jpeg;base64,{encoded}"
}
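If your local images aren't always JPEGs, you can derive the MIME prefix from the filename instead of hard-coding it. A small helper sketch using only the standard library:

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI suitable for image.url."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

This also guards against accidentally uploading a non-image file, which would otherwise fail server-side.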

Polling for the result

After submitting your request, poll the status endpoint:

GET https://api.x.ai/v1/videos/{request_id}

Status values:

Status Meaning
"processing" Video is still rendering
"done" Video ready; URL in response
"failed" Something went wrong

Completed response:

{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/....mp4",
    "duration": 6
  },
  "progress": 100
}

Full Python polling loop

import time

def poll_video(request_id: str, api_key: str, interval: int = 5) -> dict:
    url = f"https://api.x.ai/v1/videos/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}

    while True:
        response = requests.get(url, headers=headers)
        data = response.json()
        status = data.get("status")

        print(f"Status: {status} | Progress: {data.get('progress', 0)}%")

        if status == "done":
            return data["video"]
        elif status == "failed":
            raise RuntimeError(f"Video generation failed for {request_id}")

        time.sleep(interval)

# Usage
video = poll_video(request_id, api_key)
print(f"Video URL: {video['url']}")
print(f"Duration: {video['duration']}s")

Tip: Keep intervals ≥5 seconds to avoid API rate limits (60 requests/minute).
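A plain while-loop waits forever if a job stalls, and the troubleshooting notes in this guide suggest resubmitting jobs stuck past roughly ten minutes. A deadline-aware variant, sketched so it accepts any status-fetching callable rather than being tied to requests:

```python
import time

def poll_with_timeout(fetch_status, timeout_s: float = 600, interval_s: float = 5):
    """Poll until the job is done or failed, giving up after timeout_s.

    fetch_status is any zero-argument callable returning the parsed status
    JSON, e.g. lambda: requests.get(url, headers=headers).json().
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data["video"]
        if status == "failed":
            raise RuntimeError("video generation failed")
        time.sleep(interval_s)
    raise TimeoutError("job still processing at deadline; consider resubmitting")
```

Injecting the fetch function also makes the loop trivial to unit-test with canned responses.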

Using the xAI Python SDK

The xai-sdk library abstracts polling:

from xai_sdk import Client
import os

client = Client(api_key=os.environ["XAI_API_KEY"])

video = client.video.generate(
    model="grok-imagine-video",
    prompt="Gentle waves move across the surface, morning mist rises slowly",
    image={"url": "https://example.com/landscape.jpg"},
    duration=6,
    resolution="720p",
    aspect_ratio="16:9"
)

print(f"Video URL: {video.url}")
print(f"Duration: {video.duration}s")

Use the SDK for simple blocking calls; use raw HTTP for custom polling or logging.

Controlling resolution, duration, and aspect ratio

Grok's API gives you flexibility:

Duration

Integers 1–15 seconds; default is 6.

"duration": 10

Resolution

Value  Description
"480p" Default, lower cost, faster
"720p" Higher quality; $0.07/sec

Example:

"resolution": "720p"

Aspect ratio

Value Use case
"16:9" Default, landscape
"9:16" Vertical, stories/reels
"1:1" Square, social
"4:3" Presentations
"3:4" Portrait
"3:2" Photography crop
"2:3" Tall portrait

Defaults to match your source image unless set explicitly.
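When you do set it explicitly, picking the supported ratio nearest your source image's dimensions avoids cropping surprises. A small sketch over the values in the table above:

```python
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3")

def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported aspect ratio closest to the image's own."""
    target = width / height

    def value(ratio: str) -> float:
        w, h = ratio.split(":")
        return int(w) / int(h)

    return min(ALLOWED_ASPECT_RATIOS, key=lambda r: abs(value(r) - target))
```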

Using reference images for style guidance

  • image: source photograph (first frame)
  • reference_images: up to 7 images to guide style (not used as frames)

Example:

{
  "model": "grok-imagine-video",
  "prompt": "A product rotating slowly on a clean white surface",
  "image": {
    "url": "https://example.com/product-shot.jpg"
  },
  "reference_images": [
    {"url": "https://example.com/brand-style-reference-1.jpg"},
    {"url": "https://example.com/lighting-reference.jpg"}
  ],
  "duration": 6,
  "resolution": "720p"
}

Reference images influence style/appearance, not the actual video frames.

Extending and editing videos

You can go beyond initial generation:

Extending a video

POST /v1/videos/extensions lets you add seconds to an existing clip (max 15 sec per call).

curl -X POST https://api.x.ai/v1/videos/extensions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "The mist continues to lift as sunlight breaks through",
    "duration": 5
  }'

Poll the same status endpoint for the extended clip.

Editing a video

POST /v1/videos/edits applies prompt-based modifications:

curl -X POST https://api.x.ai/v1/videos/edits \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "Change the sky to a dramatic sunset with deep orange tones"
  }'

Extensions and edits are both async.
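Because each extension call adds at most 15 seconds, reaching a longer target means chaining calls. A helper sketch that plans the per-call durations (the chaining itself would reuse the extensions endpoint shown above, polling each result before submitting the next):

```python
def plan_extensions(initial_s: int, target_s: int, max_per_call: int = 15) -> list:
    """Split the extra seconds needed into chunks of at most max_per_call."""
    if not 1 <= initial_s <= 15:
        raise ValueError("initial clip must be 1-15 seconds")
    remaining = target_s - initial_s
    chunks = []
    while remaining > 0:
        step = min(remaining, max_per_call)
        chunks.append(step)
        remaining -= step
    return chunks
```

For example, extending a 6-second clip to 40 seconds needs three calls of 15, 15, and 4 seconds.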

Pricing breakdown: what a 10-second video costs

Component Cost
Input image $0.002 per image
Output 480p $0.05 per second
Output 720p $0.07 per second

10s video @ 720p:

  • Input image: $0.002
  • Output: 10 × $0.07 = $0.70
  • Total: $0.702

6s video @ 480p (default):

  • Input image: $0.002
  • Output: 6 × $0.05 = $0.30
  • Total: $0.302

The input charge applies to every generation request, even with the same image.

Text-to-video (no image) skips the $0.002 input fee.
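These line items are simple enough to encode in a small estimator, using the rates from the table above (assumes the table is current; check xAI's pricing page before relying on it):

```python
RATE_PER_SECOND = {"480p": 0.05, "720p": 0.07}  # USD per output second
INPUT_IMAGE_FEE = 0.002  # USD per generation request with an input image

def video_cost(duration_s: int, resolution: str = "480p",
               has_input_image: bool = True) -> float:
    """Estimate the cost of one generation request in USD."""
    per_second = RATE_PER_SECOND[resolution]
    image_fee = INPUT_IMAGE_FEE if has_input_image else 0.0
    return round(image_fee + duration_s * per_second, 3)
```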

How to test your Grok video API integration with Apidog

The async workflow requires testing for:

  1. Generation request returns a request_id
  2. Polling handles "processing" correctly
  3. Final response has status == "done" and a video URL

Apidog's Test Scenarios automate this:

Step 1: Create a new Test Scenario

In Apidog, go to the Tests module and click +. Name: "Grok image-to-video async flow".

Step 2: Add the generation request

  • URL: https://api.x.ai/v1/videos/generations
  • Method: POST
  • Header: Authorization: Bearer {{xai_api_key}}
  • Body:
{
  "model": "grok-imagine-video",
  "prompt": "Gentle mist rises from the water as light filters through the trees",
  "image": {
    "url": "https://example.com/your-test-image.jpg"
  },
  "duration": 6,
  "resolution": "480p"
}

Step 3: Extract the request_id

Add an Extract Variable processor:

  • Variable: video_request_id
  • Source: Response body
  • Extraction: JSONPath
  • JSONPath: $.request_id

Step 4: Build the polling loop

Add a For loop:

  • Inside, add GET:
    • URL: https://api.x.ai/v1/videos/{{video_request_id}}
    • Method: GET
    • Header: Authorization: Bearer {{xai_api_key}}
  • Extract video_status via JSONPath $.status
  • Add Wait (5000ms) to avoid rate limits
  • Loop break: {{video_status}} == "done"

Step 5: Assert the video URL

After the loop, add GET to the same endpoint. Add Assertion:

  • Field: $.video.url
  • Condition: Is not empty

Run the scenario:

Click Run. Apidog executes POST, extracts request_id, polls until done, asserts video URL. The test report shows each step's results.

Integrate into CI/CD with Apidog CLI:

apidog run --scenario grok-video-async-flow --env production

Common errors and fixes

401 Unauthorized

Check API key and Authorization header.

422 Unprocessable Entity

Malformed body: check model, prompt, and accessible image.url.

Image URL not accessible

xAI must fetch the URL. Use a public link or base64 data URI.

Status stuck at "processing"

If stuck >10min, resubmit. Generation can take 30s–several minutes.

429 Rate limit

Max 60 requests/min, 1/sec. Add delays between polls.
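For bursty workloads, an exponential-backoff wrapper keeps you under the limit. A sketch that works with any callable returning a (status_code, body) pair, so it isn't tied to a particular HTTP client:

```python
import time

def with_backoff(call, max_attempts: int = 5, base_delay_s: float = 1.0):
    """Retry a request when it returns HTTP 429, doubling the delay each try."""
    for attempt in range(max_attempts):
        status_code, body = call()
        if status_code != 429:
            return status_code, body
        time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("still rate limited after all retries")
```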

Base64 upload rejected

Add correct MIME prefix (e.g., data:image/jpeg;base64,).

Aspect ratio mismatch

Set aspect ratio to match your source image for best results.

Conclusion

The Grok image-to-video API lets you animate static images with a simple async workflow: POST an image and prompt, receive a request_id, poll until the status is "done", and download the MP4. The same pattern covers text-to-video, extensions, and edits.

Async patterns are error-prone—use Apidog Test Scenarios to automate extraction, polling, and assertions. This catches integration issues before production.

Start building your integration with Apidog free. No credit card required.

FAQ

What model name do I use for the Grok image-to-video API?

Use grok-imagine-video as the model field.

What's the difference between image and reference_images?

image: first frame (animated).

reference_images: guide style/content only.

How long does video generation take?

6s @ 480p: 1–3 min.

15s @ 720p: 4–8 min.

Poll every 5s.

Can I use a local file as the source image?

Yes. Encode as base64 data URI and pass as image.url.

What if I don't specify aspect_ratio?

Defaults to your image's proportions. Text-to-video defaults to 16:9.

How much does a 10s 720p video cost?

$0.002 image + 10 × $0.07 = $0.702.

What are the rate limits?

60 requests/min, 1/sec (POST+GET combined).

Can I extend a video beyond 15 seconds?

Yes, with POST /v1/videos/extensions. Each call is up to 15s and async.
