Wanda

Posted on Apr 3 • Originally published at apidog.com

How to use the Grok text to video API (complete guide)

TL;DR

The Grok text-to-video API lets you generate video clips from text prompts. You POST /v1/videos/generations with your prompt, receive a request_id, then poll GET /v1/videos/{request_id} until "status" is "done". Model: grok-imagine-video. Pricing starts at $0.05/sec at 480p. The xAI Python SDK handles polling out-of-the-box.


Introduction

xAI generated 1.2 billion videos in January 2026, the first month after launching the Grok text-to-video API. Grok also ranked #1 on the Artificial Analysis text-to-video leaderboard that month, which suggests the platform can handle production-scale workloads.

This hands-on guide shows developers how to:

Make your first video generation request
Poll for results and handle async flow
Tune parameters (duration, resolution, aspect ratio)
Write effective prompts
Use reference images
Extend or edit existing videos
Know when to use text-to-video vs image-to-video

💡 Note: The Grok video API is async. Your frontend/app shouldn't block waiting for video generation. For UI dev/testing, use Apidog's Smart Mock to instantly mock generation and poll endpoints—no real credits needed. This allows full UI and workflow testing while your backend is still in progress.

What is the Grok Text-to-Video API?

The Grok text-to-video API is part of xAI's media generation suite at https://api.x.ai. Send a text prompt—grok-imagine-video generates a video from scratch (no image required).

Other endpoints:

Synchronous image generation: POST /v1/images/generations (grok-imagine-image, $0.02/image)
Video extension/editing endpoints

Difference: text-to-video creates everything from your prompt. For animating a specific image, see the Grok image to video API guide.

How Text-to-Video Generation Works (Async Pattern)

Video generation is async:

Send a POST request with your prompt.
API returns a request_id immediately.
Video is generated server-side.
Poll GET /v1/videos/{request_id} until "status" is "done".
When done, response includes the video URL.

Your frontend/app must handle this async flow (e.g., show loading, then display video when ready).
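One way to keep UI code simple is to collapse each poll response into a coarse UI state. The sketch below assumes the "processing", "done", and "failed" status values described in the polling section. The function name and state labels are hypothetical, and treating an unknown status as still loading is an assumption rather than documented API behavior.

```python
def ui_state(poll_response: dict) -> str:
    """Map a poll response to a coarse UI state: 'loading', 'ready', or 'error'."""
    status = poll_response.get("status")
    if status == "done":
        return "ready"
    if status == "failed":
        return "error"
    # Assumption: "processing" and any unrecognized status mean "keep waiting".
    return "loading"


print(ui_state({"status": "processing"}))
print(ui_state({"status": "done"}))
```

A frontend can switch on the returned label to show a spinner, the video player, or an error message.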

Prerequisites

Before coding, set up:

xAI Account: console.x.ai
API Key: In the xAI console, create and save your key. Pass it as a Bearer token in the Authorization header.

Example (set as environment variable):

export XAI_API_KEY="your_api_key_here"

(Optional) Install the xAI Python SDK:

pip install xai-sdk
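Since every request needs the key as a Bearer token, it can help to fail early when the environment variable is missing. This is a minimal sketch. The helper name build_headers is hypothetical, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def build_headers(env=os.environ) -> dict:
    """Return request headers with the Bearer token, or fail early if the key is unset."""
    api_key = env.get("XAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("XAI_API_KEY is not set; export it before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling build_headers() once at startup surfaces a missing key before any request is sent, instead of as a 401 later.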

Your First Text-to-Video Request

Endpoint: POST https://api.x.ai/v1/videos/generations

Required fields: model, prompt

Using curl

curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting"
  }'

Response:

{
  "request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
}

Using Python (requests)

import requests
import os

API_KEY = os.environ["XAI_API_KEY"]
BASE_URL = "https://api.x.ai"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "grok-imagine-video",
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting"
}

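response = requests.post(
    f"{BASE_URL}/v1/videos/generations",
    headers=headers,
    json=payload,
    timeout=30,  # avoid hanging forever on a stalled connection
)
response.raise_for_status()  # surface 401/429 and other HTTP errors early

data = response.json()
request_id = data["request_id"]
print(f"Generation started. Request ID: {request_id}")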

Polling for the Video Result

Poll GET /v1/videos/{request_id} until "status" is "done".

Status values:

"processing": still generating
"done": complete, video URL available
"failed": error

Python polling loop:

import requests
import time
import os

API_KEY = os.environ["XAI_API_KEY"]
BASE_URL = "https://api.x.ai"

headers = {
    "Authorization": f"Bearer {API_KEY}"
}

def poll_video(request_id: str, interval: int = 5, max_attempts: int = 60) -> dict:
    url = f"{BASE_URL}/v1/videos/{request_id}"

    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # stop on HTTP errors such as 401 or 429
        data = response.json()

        status = data.get("status")
        progress = data.get("progress", 0)
        print(f"Attempt {attempt + 1}: status={status}, progress={progress}%")

        if status == "done":
            return data
        elif status == "failed":
            raise RuntimeError(f"Video generation failed: {data}")

        time.sleep(interval)

    raise TimeoutError(f"Video not ready after {max_attempts} attempts")

def generate_video(prompt: str) -> str:
    response = requests.post(
        f"{BASE_URL}/v1/videos/generations",
        headers={**headers, "Content-Type": "application/json"},
        json={"model": "grok-imagine-video", "prompt": prompt},
        timeout=30,
    )
    response.raise_for_status()  # fail fast if the request was rejected
    request_id = response.json()["request_id"]
    print(f"Request ID: {request_id}")

    result = poll_video(request_id)
    video_url = result["video"]["url"]
    print(f"Video ready: {video_url}")
    return video_url

# Example usage
video_url = generate_video(
    "A timelapse of a city skyline at sunset transitioning to night, aerial view"
)

Poll response:

{
  "status": "done",
  "video": {
    "url": "https://vidgen.x.ai/....mp4",
    "duration": 8,
    "respect_moderation": true
  },
  "progress": 100,
  "usage": {
    "cost_in_usd_ticks": 500000000
  }
}
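Because the video URL is temporary (see Common Errors and Fixes), it is worth saving the file as soon as the status is "done". The sketch below uses only the standard library. The function name download_video and the injectable `opener` parameter are hypothetical conveniences, not part of the xAI API.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(poll_response: dict, dest: str, opener=urllib.request.urlopen) -> Path:
    """Save the MP4 referenced by a completed poll response to `dest`."""
    if poll_response.get("status") != "done":
        raise ValueError("video is not ready yet")
    url = poll_response["video"]["url"]
    path = Path(dest)
    # Stream the body to disk instead of loading the whole file into memory.
    with opener(url) as response, path.open("wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

In the earlier polling example this would be called as download_video(result, "clip.mp4") right after poll_video returns.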

Using the xAI Python SDK

The SDK handles polling for you. client.video.generate() blocks until the video is ready.

from xai_sdk import Client
import os

client = Client(api_key=os.environ["XAI_API_KEY"])

result = client.video.generate(
    model="grok-imagine-video",
    prompt="A golden retriever running through autumn leaves in slow motion",
    duration=8,
    resolution="720p",
    aspect_ratio="16:9"
)

print(f"Video URL: {result.video.url}")
print(f"Duration: {result.video.duration}s")

Use the SDK for quick starts. Use raw requests for advanced polling, retries, or custom status handling.

Writing Effective Prompts for Video Generation

Prompt quality determines output quality. Use a structured approach:

Scene description: Specify subject and setting.
- “A white ceramic coffee mug on a wooden table beside a rain-streaked window”
Motion: Describe what moves and how.
- “The camera slowly orbits the mug as steam curls upward”
Camera style: Use film/video terms.
- “Close-up,” “tracking shot,” “overhead drone view,” etc.
Lighting and mood: Specify light and vibe.
- “Golden hour,” “overcast,” “neon-lit,” etc.
Style references: Add style cues.
- “Cinematic,” “anime,” “stop-motion,” etc.

Prompt structure example:

A lone astronaut floats past the International Space Station,
tether drifting behind them. The camera tracks slowly
alongside, showing Earth below. Cinematic, IMAX quality,
warm sunrise light reflecting off the visor.
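If prompts are assembled in code, the five elements above can be kept as separate fields and joined at the end. This is an illustrative sketch only. The build_prompt helper and its parameter names are hypothetical, and the API itself just receives the final string.

```python
def build_prompt(scene: str, motion: str = "", camera: str = "",
                 lighting: str = "", style: str = "") -> str:
    """Join the structured prompt elements into one prompt string."""
    if not scene or not scene.strip():
        raise ValueError("scene description is required")
    parts = [scene, motion, camera, lighting, style]
    # Drop empty parts and trailing periods so the join is clean.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return ". ".join(cleaned) + "."


prompt = build_prompt(
    scene="A white ceramic coffee mug on a wooden table beside a rain-streaked window",
    motion="The camera slowly orbits the mug as steam curls upward",
    camera="Close-up",
    lighting="Overcast",
    style="Cinematic",
)
print(prompt)
```

Keeping the elements separate makes it easier to vary one of them, such as lighting, while holding the rest constant.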

Controlling Resolution, Duration, and Aspect Ratio

Parameters:

duration (seconds): 1–15 (default: 6)
resolution: "480p" (default) or "720p"
aspect_ratio: "16:9" (default), "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"

Aspect Ratio Table:

Ratio	Best for
`16:9`	Desktop, YouTube, presentations
`9:16`	TikTok, Instagram Reels, mobile
`1:1`	Instagram feed, social cards
`4:3`	Classic video, presentations
`3:4`	Portrait mobile content
`3:2`	Standard photo ratio
`2:3`	Portrait photography

Example (all parameters):

curl -X POST https://api.x.ai/v1/videos/generations \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "A coastal town at dawn, waves breaking gently on a rocky shore",
    "duration": 10,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'
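Validating parameters locally avoids spending a request on values the API would reject. The sketch below encodes the limits listed above. The helper name is hypothetical, and requiring an integer duration is an assumption, since the guide only gives the 1–15 second range.

```python
ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}


def build_generation_payload(prompt: str, duration: int = 6,
                             resolution: str = "480p",
                             aspect_ratio: str = "16:9") -> dict:
    """Validate parameters against the documented limits and build the request body."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: duration is a whole number of seconds.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 1 <= duration <= 15:
        raise ValueError("duration must be an integer from 1 to 15")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    return {
        "model": "grok-imagine-video",
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
    }
```

The returned dictionary can be passed directly as the json argument of requests.post.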

Using Reference Images to Guide Video Style

The reference_images parameter accepts up to 7 image URLs to guide style, color, and texture; the prompt still drives the content:

{
  "model": "grok-imagine-video",
  "prompt": "A coastal town at dawn, waves breaking gently on a rocky shore",
  "reference_images": [
    {"url": "https://example.com/my-style-reference.jpg"},
    {"url": "https://example.com/color-palette-reference.jpg"}
  ]
}

Use visually consistent images for best results.
Reference images influence visual style, not the subject.
For turning a specific image into a video, use the image-to-video endpoint.
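A small helper can attach reference images to an existing request body and enforce the 7-image limit before the request is sent. This is a sketch, assuming the {"url": ...} object shape shown in the JSON above. The function name with_reference_images is hypothetical.

```python
from typing import Sequence

MAX_REFERENCE_IMAGES = 7


def with_reference_images(payload: dict, image_urls: Sequence[str]) -> dict:
    """Return a copy of `payload` with a reference_images list added."""
    if len(image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    new_payload = dict(payload)  # leave the caller's dictionary untouched
    new_payload["reference_images"] = [{"url": url} for url in image_urls]
    return new_payload


body = with_reference_images(
    {"model": "grok-imagine-video", "prompt": "A coastal town at dawn"},
    ["https://example.com/my-style-reference.jpg"],
)
print(body["reference_images"])
```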

Extending and Editing Generated Videos

Two async endpoints for working with existing videos:

Extend a video: POST /v1/videos/extensions
- Pass the original request_id and a new prompt to add footage.
Edit a video: POST /v1/videos/edits
- Modify style or content of an existing video with a text instruction.

Both return a new request_id and follow the same poll-until-done pattern.

Reading the Cost from the API Response

Check the usage object in the final poll response:

"usage": {
  "cost_in_usd_ticks": 500000000
}

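The example values in this guide imply 1,000,000,000 ticks per USD, so divide by 1,000,000,000. Confirm the tick size against xAI's documentation before relying on it for billing.
Example: 500000000 ticks = $0.50, the price of a 10-second 480p clip.

Python conversion:

# Assumes 1 USD = 1,000,000,000 ticks, as implied by the example above.
TICKS_PER_USD = 1_000_000_000
cost_in_usd = result["usage"]["cost_in_usd_ticks"] / TICKS_PER_USD
print(f"Cost: ${cost_in_usd:.4f}")
# Output for 500000000 ticks: Cost: $0.5000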

Pricing Reference:

Resolution	Price per second	10-second clip
480p	$0.05	$0.50
720p	$0.07	$0.70
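The table reduces to price per second times duration, so a cost estimate before sending a request is straightforward. The sketch below hardcodes the two prices from the table, which may change over time. The estimate_cost helper is hypothetical, and Decimal is used to avoid floating-point rounding in money values.

```python
from decimal import Decimal

# Prices per second taken from the table above; treat them as a snapshot.
PRICE_PER_SECOND = {"480p": Decimal("0.05"), "720p": Decimal("0.07")}


def estimate_cost(resolution: str, duration_seconds: int) -> Decimal:
    """Estimate the USD cost of a clip from its resolution and duration."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    return PRICE_PER_SECOND[resolution] * duration_seconds


print(estimate_cost("480p", 10))  # 0.50
print(estimate_cost("720p", 10))  # 0.70
```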

How to Test Your Grok Video API with Apidog

Async polling makes frontend testing tricky. Apidog's Smart Mock solves this:

Use Case 1: Smart Mock for Frontend Development

Mock the generation endpoint: In Apidog, create POST /v1/videos/generations. Define response with a request_id string. Smart Mock generates a UUID automatically.

  {
    "request_id": "d97415a1-5796-b7ec-379f-4e6819e08fdf"
  }

Mock the poll endpoint: Create GET /v1/videos/{request_id}. Define schema with status, video.url, video.duration, progress, usage.cost_in_usd_ticks. Use a Custom Mock to return "status": "done" with a fake MP4 URL.

  {
    "status": "done",
    "video": {
      "url": "https://vidgen.x.ai/mock-video-12345.mp4",
      "duration": 8,
      "respect_moderation": true
    },
    "progress": 100,
    "usage": {
      "cost_in_usd_ticks": 400000000
    }
  }

Frontend teams can now test all UI states (loading, done, error) without spending API credits.

Use Case 2: Test Scenarios for the Polling Loop

Step 1: Add POST /v1/videos/generations as the first step. Extract request_id with JSONPath $.request_id into videoRequestId.
Step 2: Add GET /v1/videos/{{videoRequestId}} with a loop. Break when response.body.status == "done". Add a 5s wait between polls.
Step 3: Assert $.video.url is not empty to confirm successful completion.

Automate this in CI to catch regressions in your async flow.
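The same scenario can also be approximated in plain Python unit tests by replaying a scripted sequence of statuses instead of calling the real API. The sketch below is one possible shape, not an Apidog feature. Both make_fake_poller and wait_for_video are hypothetical names, and the fake response contains only the fields the loop reads.

```python
import time
from itertools import chain, repeat


def make_fake_poller(statuses):
    """Return a fetch function that replays `statuses`, then repeats the last one."""
    sequence = chain(statuses, repeat(statuses[-1]))

    def fetch(request_id):
        status = next(sequence)
        response = {"status": status}
        if status == "done":
            response["video"] = {"url": f"https://example.invalid/{request_id}.mp4"}
        return response

    return fetch


def wait_for_video(fetch, request_id, interval=5, max_attempts=60, sleep=time.sleep):
    """Poll `fetch` until the status is done, failed, or attempts run out."""
    for _ in range(max_attempts):
        data = fetch(request_id)
        if data["status"] == "done":
            return data
        if data["status"] == "failed":
            raise RuntimeError(f"Video generation failed: {data}")
        sleep(interval)
    raise TimeoutError(f"Video not ready after {max_attempts} attempts")


# Two "processing" polls followed by "done"; sleep is stubbed so the test is instant.
fake = make_fake_poller(["processing", "processing", "done"])
result = wait_for_video(fake, "abc", sleep=lambda seconds: None)
print(result["video"]["url"])
```

Passing ["processing", "failed"] or a short max_attempts exercises the error and timeout paths the same way.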

Text-to-Video vs Image-to-Video: When to Use Each

Both use grok-imagine-video, but for different cases:

Text-to-video:

For creating original content from a prompt/concept
When you want full model creativity
For prompt-based content tools
No source image needed

Image-to-video:

To animate a specific image, product, or artwork
To preserve visual details from an existing image
For consistent animation across a set of images

Routing tip: If user uploads an image, use image-to-video. If they only type a prompt, use text-to-video.

See the Grok image to video API guide for more.
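The routing tip can be expressed as a small decision function. This is a minimal sketch. The function name and the returned labels are hypothetical, and the caller is expected to map each label to the matching request.

```python
from typing import Optional


def choose_generation_mode(prompt: str, image_url: Optional[str] = None) -> str:
    """Pick a generation mode based on whether the user supplied an image."""
    if image_url:
        return "image-to-video"
    if prompt and prompt.strip():
        return "text-to-video"
    raise ValueError("provide a prompt, an image, or both")


print(choose_generation_mode("A city skyline at sunset"))
print(choose_generation_mode("Animate this", image_url="https://example.com/product.jpg"))
```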

Common Errors and Fixes

401 Unauthorized: API key missing/invalid. Check Authorization header (Bearer YOUR_XAI_API_KEY).
429 Too Many Requests: Rate limit exceeded (60/min, 1/sec). Add delay between polls (≥5s).
status: "failed": Often means the prompt was rejected by moderation. Revise it to remove ambiguous or sensitive terms.
Video URL 404: URLs are temporary. Download video immediately after generation.
Empty/frozen video: Add explicit motion to your prompt.
Slow polling: 720p and longer durations take more time. Use 480p/short clips for dev/testing.
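For 429 responses, a common approach is to retry with exponential backoff rather than at a fixed interval. The sketch below shows that general technique and is not something the xAI documentation prescribes. The call_with_retry helper, its default delays, and the retry count are all assumptions.

```python
import time


def call_with_retry(send, max_retries=5, base_delay=5.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429, doubling the delay each time.

    `send` is any zero-argument callable that returns an object with a
    `status_code` attribute, for example a lambda wrapping requests.get.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Delays grow as base_delay, 2x, 4x, ... between attempts.
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

With requests, a poll could be wrapped as call_with_retry(lambda: requests.get(url, headers=headers, timeout=30)).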

Conclusion

The Grok text-to-video API provides a simple async pattern: send a prompt, get a request_id, poll until done, fetch your MP4. Once you structure the polling loop, other features—duration, resolution, aspect ratio, reference images—are easy to integrate.

For production, always track costs via cost_in_usd_ticks. Use Apidog to mock endpoints and run automated test scenarios so your frontend and backend teams can develop in parallel.

FAQ

What model name do I use for text-to-video generation?

Use grok-imagine-video as the model field in POST /v1/videos/generations.

How long does video generation take?

Depends on duration and resolution. 480p may finish in < 30s; 720p and/or longer clips may take minutes. Poll every 5–10 seconds.

Can I generate a video longer than 15 seconds?

No. The maximum duration is 15 seconds. For longer videos, generate a clip and then extend it with POST /v1/videos/extensions.

How do I download the generated video?

Download the MP4 from result.video.url immediately—URLs expire.

What happens if my prompt violates content moderation?

Status will be "failed". Check respect_moderation in the poll response. Revise your prompt and try again.

Is there a free tier for the video API?

No official free tier, but check console.x.ai for current offers/credits.

How do reference_images differ from a source image?

Reference images guide style for text-to-video. Source images (image-to-video) become the actual first video frame.

What is the best way to test the polling loop without spending credits?

Use Apidog's Smart Mock to mock generation and poll endpoints, covering all states.
