DEV Community

Hassann

Posted on • Originally published at apidog.com

How to Use the Grok 4.3 API?

xAI rolled out Grok 4.3 in stages: beta on April 17, 2026, API access on April 30, and full general availability on May 6. The release adds a 1,000,000-token context window, native video input, always-on reasoning, and roughly 40% lower pricing versus Grok 4.20. Eight legacy Grok models retire on May 15, so teams still using grok-3 or grok-4 models should migrate now.


This guide shows how to call Grok 4.3 from code: endpoint format, authentication, OpenAI-compatible SDK setup, reasoning_effort, video input, function calling, and a repeatable test workflow in Apidog.

For the voice side of the same release, see How to use Grok Voice for free. For the head-to-head against OpenAI’s flagship voice model, see Grok Voice vs GPT-Realtime.

TL;DR

  • Grok 4.3 went GA on May 6, 2026.
  • Eight legacy models retire on May 15, 2026.
  • Pricing:
    • $1.25 per 1M input tokens
    • $2.50 per 1M output tokens
    • $0.20 per 1M cached input tokens
  • Context window: 1,000,000 tokens.
  • New input type: native video input.
  • Reasoning is always on.
  • reasoning_effort supports low, medium, and high.
  • Default reasoning effort is medium.
  • Endpoint: https://api.x.ai/v1/chat/completions.
  • The API is OpenAI-compatible for Chat Completions.
  • Standard-tier throughput is around 159 tokens/second.
  • Intelligence Index: 53, according to Artificial Analysis.
  • Use Apidog to save request variants, compare reasoning settings, and replay the same test across providers.

What changed in Grok 4.3

For most developer teams, the important changes are practical:

  1. Lower token cost

Input pricing is down 37.5% versus Grok 4.20. Output pricing is down 58.3%. Cached input is now $0.20 per 1M tokens, which matters if you reuse long system prompts or large static context.

  2. 1M-token context window

Grok 4.3 increases the context window from 256k to 1M tokens. That makes it usable for large prompts such as codebases, transcripts, long contracts, and multi-document workflows.

  3. Native video input

Grok 4.3 is the first Grok model with native video input. You can pass a video URL in the message content and ask the model to reason over the clip.

  4. Always-on reasoning

Every request includes reasoning. The reasoning_effort parameter controls depth, but low is the floor; reasoning cannot be switched off.

  5. Better agent workflows

xAI reports a +300 Elo gain on GDPval-AA versus Grok 4.20. In practice, this matters most for tool selection, multi-step workflows, and function-calling agents.

Artificial Analysis gives Grok 4.3 an Intelligence Index of 53, above the average of 35 for its price tier, and ranks it tenth out of 146 tracked models.
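The pricing figures above translate into per-request cost with simple arithmetic. A minimal sketch, with the per-1M-token prices hardcoded from this article (verify current rates in the xAI Console):

```python
# Per-request cost estimate using the Grok 4.3 prices quoted in this
# article (USD per 1M tokens). Check the xAI Console for current values.
INPUT_PRICE = 1.25
OUTPUT_PRICE = 2.50
CACHED_INPUT_PRICE = 0.20

def request_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated USD cost of one request."""
    return (
        input_tokens * INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
        + cached_input_tokens * CACHED_INPUT_PRICE
    ) / 1_000_000

# 10k fresh input tokens + 2k output tokens:
print(request_cost(10_000, 2_000))  # 0.0175
```

Run this against your own typical token counts before migrating; the usage block in each response gives you real numbers to plug in.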

Prerequisites

Before sending your first request, prepare:

  • An xAI Console account at console.x.ai
  • A billable tier with an API key
  • A project-scoped API key for production use
  • The OpenAI SDK or the xAI SDK
  • An API client for saving and replaying requests

xAI Console screenshot

Export your API key:

```shell
export XAI_API_KEY="xai-..."
```

If you are testing locally, use an environment file or shell variable. For production, store the key in your secret manager.

Endpoint and authentication

Grok 4.3 uses the OpenAI-compatible Chat Completions API with xAI’s base URL.

```
POST https://api.x.ai/v1/chat/completions
```

Required headers:

```
Authorization: Bearer $XAI_API_KEY
Content-Type: application/json
```

Because the API is OpenAI-compatible, most existing OpenAI SDK code only needs two changes:

  1. Change the API key.
  2. Change the base_url.

Python example

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Summarize the trade-offs of GraphQL vs REST in three bullets.",
        }
    ],
    reasoning_effort="medium",
)

print(response.choices[0].message.content)
```

If you use the xAI SDK instead, the request shape is similar. The main difference is the client import and initialization.

Request parameters

Use these parameters for most Grok 4.3 Chat Completions requests:

| Parameter | Type | Values | Notes |
|---|---|---|---|
| `model` | string | `grok-4.3` | Required. |
| `messages` | array | OpenAI message shape | Required. Supports `system`, `user`, and `assistant` roles. |
| `reasoning_effort` | string | `low`, `medium`, `high` | Optional. Default: `medium`. Higher values can increase latency and output tokens. |
| `max_tokens` | int | 1–32768 | Caps output length. |
| `temperature` | float | 0.0–2.0 | Default: 1.0. |
| `top_p` | float | 0.0–1.0 | Nucleus sampling. |
| `stream` | bool | `true`, `false` | Enables server-sent events when `true`. |
| `tools` | array | OpenAI tool shape | Used for function calling. |
| `tool_choice` | string / object | `auto`, `none`, or a specific tool | Uses standard OpenAI semantics. |
| `response_format` | object | `{ "type": "json_object" }` | Enables structured JSON output. |
| `seed` | int | any integer | Useful for reproducibility with `temperature: 0`. |
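When stream is true, the client receives a sequence of chunks and concatenates the text deltas. A sketch of the client-side loop in the standard OpenAI SDK shape; the chunk objects here are simulated so the loop itself is clear without a live API call:

```python
# Streaming pattern: iterate chunks and concatenate delta content.
# With the OpenAI SDK this stream comes from
# client.chat.completions.create(..., stream=True); here the chunks
# are simulated stand-ins with the same attribute shape.
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)

# Simulated chunks standing in for the real SSE stream:
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hel", "lo", None]
]
print(collect_stream(fake))  # Hello
```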

Minimal curl request

```shell
curl https://api.x.ai/v1/chat/completions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior backend engineer."
      },
      {
        "role": "user",
        "content": "Review this query plan and flag the bottleneck."
      }
    ],
    "reasoning_effort": "high"
  }'
```

The response uses the standard OpenAI-style shape:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "reasoning_tokens": 78,
    "total_tokens": 657
  }
}
```

Read the final text from:

```python
response.choices[0].message.content
```

Choosing a reasoning effort

Grok 4.3 supports three reasoning levels.

Use low for fast, simple tasks

Good fits:

  • Classification
  • Summarization
  • Rule extraction
  • Simple Q&A
  • Lightweight routing

Example:

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Classify this ticket as billing, bug, feature request, or account access: ...",
        }
    ],
    reasoning_effort="low",
)
```

Use medium for default production traffic

Good fits:

  • Customer support
  • Single-step tool use
  • Data analysis
  • Normal code explanations
  • Function calling

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Analyze this API error log and suggest the most likely root cause.",
        }
    ],
    reasoning_effort="medium",
)
```

Use high for complex workflows

Good fits:

  • Multi-step agents
  • Long code review
  • Complex math
  • Planning-heavy tasks
  • Debugging with many constraints

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Review this migration plan, identify risks, and produce a safer rollout sequence.",
        }
    ],
    reasoning_effort="high",
)
```

Reasoning is always enabled. Setting reasoning_effort to low reduces depth, but it does not disable reasoning.
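One way to operationalize this guidance is a small router that picks an effort level per task category before building the request. The category names here are this guide's own illustration, not an xAI API concept:

```python
# Map task categories to a reasoning_effort value, following the
# low/medium/high guidance above. Category names are illustrative,
# not part of the xAI API.
EFFORT_BY_TASK = {
    "classification": "low",
    "summarization": "low",
    "support": "medium",
    "tool_use": "medium",
    "agent": "high",
    "code_review": "high",
}

def effort_for(task: str) -> str:
    """Return a reasoning_effort for a task category, defaulting to medium."""
    return EFFORT_BY_TASK.get(task, "medium")

print(effort_for("classification"))  # low
print(effort_for("unknown_task"))    # medium
```

Defaulting unknown tasks to medium matches the model's own default, so the router never makes a request worse than the out-of-the-box behavior.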

Function calling

Grok 4.3 supports the standard OpenAI function-calling shape.

The flow is:

  1. Define tools.
  2. Send the user message and tool schema.
  3. Read tool_calls from the assistant message.
  4. Execute the tool in your application.
  5. Send the tool result back with role tool.
  6. Ask the model to produce the final answer.

Define a tool

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_user",
            "description": "Look up a user by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {
                        "type": "string"
                    }
                },
                "required": ["user_id"],
            },
        },
    }
]
```

Ask Grok 4.3 to call the tool

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Find user u_42 and tell me their last login.",
        }
    ],
    tools=tools,
    reasoning_effort="medium",
)

message = response.choices[0].message
tool_calls = message.tool_calls

print(tool_calls)
```

Execute and return the tool result

```python
import json

# `message` and `tool_calls` come from the previous response.
messages = [
    {
        "role": "user",
        "content": "Find user u_42 and tell me their last login.",
    },
    message,
]

for tool_call in tool_calls:
    if tool_call.function.name == "lookup_user":
        # Replace this with your real database/API call.
        result = {
            "user_id": "u_42",
            "last_login": "2026-05-06T14:22:00Z",
        }

        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            }
        )

final_response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    reasoning_effort="medium",
)

print(final_response.choices[0].message.content)
```
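The example hardcodes the lookup result; in a real handler you would first decode the arguments the model supplied. In the OpenAI shape, `function.arguments` is a JSON-encoded string, and models occasionally emit malformed JSON, so decode defensively:

```python
# Parse the JSON arguments string attached to a tool call.
# In the OpenAI shape, tool_call.function.arguments is a JSON string.
import json

def parse_tool_args(arguments: str) -> dict:
    """Decode a tool call's arguments, failing loudly on malformed JSON."""
    try:
        return json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned malformed tool arguments: {arguments!r}") from exc

print(parse_tool_args('{"user_id": "u_42"}'))  # {'user_id': 'u_42'}
```

Raising a distinct error here lets your agent loop retry the turn (or ask the model to re-emit the call) instead of crashing inside the tool itself.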

The GDPval-AA gain is especially relevant here: Grok 4.3 should be better at choosing tools, avoiding redundant calls, and recovering from tool errors.

If you are testing tool workflows, MCP server testing in Apidog covers a replay-based setup.

Video input

Grok 4.3 is the first Grok model with native video input. Pass a video URL inside the message content array.

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe what happens in this clip and flag any anomalies.",
                },
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://example.com/clip.mp4"
                    },
                },
            ],
        }
    ],
)
```

Video tokens count against input usage. If cost or latency matters:

  • Trim the clip before sending.
  • Downsample when full resolution is unnecessary.
  • Avoid sending repeated static footage.
  • Cache surrounding text context when possible.

The model reasons over frames natively, so you do not need to manually extract keyframes first.

Using the 1M-token context window

The 1M-token context window is useful when retrieval or chunking would remove important context.
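Before sending a very large payload, a rough token estimate helps decide whether it fits. The ~4 characters per token ratio below is a common English-text heuristic, not xAI's tokenizer; use a real tokenizer for exact counts:

```python
# Rough check that a payload fits the 1M-token context window.
# The 4-chars-per-token ratio is a heuristic, not xAI's tokenizer.
CONTEXT_LIMIT = 1_000_000

def fits_in_context(text: str, reserved_output_tokens: int = 32_768) -> bool:
    """Estimate whether text plus a reserved output budget fits the window."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output_tokens <= CONTEXT_LIMIT

print(fits_in_context("x" * 2_000_000))  # True  (~500k tokens)
print(fits_in_context("x" * 4_500_000))  # False (~1.125M tokens)
```

Reserving the maximum output budget up front avoids requests that fit on input but fail once the model starts generating.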

Common patterns:

Whole-codebase review

Send:

  • The diff
  • Touched files
  • Related interfaces
  • Test output
  • Lint output
  • Migration notes

Prompt example:

```
Review this change as a senior backend engineer.

Focus on:
1. Data loss risks
2. Transaction boundaries
3. Backward compatibility
4. Test gaps
5. Rollback strategy

Context:
...
```

Long-document QA

Use it for:

  • Legal contracts
  • Earnings calls
  • Compliance policies
  • Technical specifications
  • Incident timelines

Prompt example:

```
Answer only from the provided document.

Question:
Which clauses describe termination rights, and what notice period applies to each party?
```

Agent memory

For agent workflows, you can keep long conversation history in context instead of summarizing aggressively. This is useful when prior details affect personalization or task continuity.

Cached input pricing makes stable long context cheaper. For example, a 400k-token stable system prompt costs $0.08 per cached call at $0.20 per 1M cached tokens, instead of $0.50 at the fresh input rate.
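That arithmetic, as a quick check using the cached and fresh input rates quoted in this article:

```python
# Cost of a 400k-token stable system prompt per call, cached vs fresh,
# at the prices quoted in this article (USD per 1M tokens).
CACHED_RATE = 0.20
FRESH_RATE = 1.25
PROMPT_TOKENS = 400_000

cached_cost = PROMPT_TOKENS / 1_000_000 * CACHED_RATE
fresh_cost = PROMPT_TOKENS / 1_000_000 * FRESH_RATE

print(f"${cached_cost:.2f} cached vs ${fresh_cost:.2f} fresh per call")  # $0.08 cached vs $0.50 fresh per call
```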

Migrating from legacy Grok models

Eight legacy Grok models retire on May 15, 2026, 12:00 PM PT.

For most apps, migration is:

```diff
- model="grok-4.20"
+ model="grok-4.3"
```

or:

```diff
- model="grok-3"
+ model="grok-4.3"
```

Because the request shape is compatible, most Chat Completions calls should continue working.

Watch for two differences.

1. Reasoning behavior

Some legacy models did not accept reasoning_effort. Grok 4.3 always reasons.

If your previous workflow depended on a very fast non-reasoning path, start with:

```json
{
  "reasoning_effort": "low"
}
```

Then measure latency and quality before moving to medium or high.

2. Output formatting

Grok 4.3 tends to produce more structured output than Grok 4.20. If your application uses regex-based parsing, retest before switching production traffic.

For broader model pricing context, see GPT-5.5 pricing. For reasoning-model usage patterns, see How to use the GPT-5.5 API.

Testing Grok 4.3 in Apidog

Use Apidog to create repeatable API tests before migrating production traffic.

Recommended setup:

  1. Create an Apidog environment.
  2. Add these variables:
```
XAI_API_KEY = xai-...
BASE_URL = https://api.x.ai/v1
MODEL = grok-4.3
REASONING_EFFORT = medium
```
  3. Create a POST request:
```
{{BASE_URL}}/chat/completions
```
  4. Add headers:
```
Authorization: Bearer {{XAI_API_KEY}}
Content-Type: application/json
```
  5. Add the request body:
```json
{
  "model": "{{MODEL}}",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior backend engineer."
    },
    {
      "role": "user",
      "content": "Review this API design and identify the top three implementation risks."
    }
  ],
  "reasoning_effort": "{{REASONING_EFFORT}}"
}
```
  6. Duplicate the request three times:

    • Grok 4.3 - low
    • Grok 4.3 - medium
    • Grok 4.3 - high
  7. Change only REASONING_EFFORT.

Compare:

  • Response quality
  • Latency
  • usage.prompt_tokens
  • usage.completion_tokens
  • usage.reasoning_tokens
  • Total cost

To compare with another provider, duplicate the environment and change BASE_URL, MODEL, and the API key. Keep the same prompt and request body.

Download Apidog to run the comparison. For broader API testing strategy, see API testing tool for QA engineers.

Apidog API testing screenshot

Rate limits

xAI Console tier limits range from a few thousand requests per minute on Tier 1 to multi-hundred-thousand request limits on enterprise tiers. Exact numbers can change, so check your console dashboard.

The advertised 159 tokens/second throughput is per-stream output speed, not total account throughput. Concurrent requests scale within your tier limits.
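That per-stream figure makes response time easy to estimate for a given output length. Treat it as a planning heuristic; real throughput varies by load and tier:

```python
# Estimate streaming duration for a response of a given length,
# using the ~159 tokens/second per-stream figure quoted above.
TOKENS_PER_SECOND = 159

def stream_seconds(output_tokens: int) -> float:
    """Approximate seconds for a stream to finish, after the first token."""
    return output_tokens / TOKENS_PER_SECOND

print(round(stream_seconds(1_590), 1))  # 10.0
```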

If you exceed your limit, the API returns HTTP 429, typically with a Retry-After header.

Basic retry pattern:

```python
import time

from openai import RateLimitError

for attempt in range(5):
    try:
        response = client.chat.completions.create(
            model="grok-4.3",
            messages=[
                {
                    "role": "user",
                    "content": "Summarize this incident report.",
                }
            ],
            reasoning_effort="medium",
        )
        break
    except RateLimitError:
        # Exponential backoff, capped at 30 seconds.
        wait_seconds = min(2 ** attempt, 30)
        time.sleep(wait_seconds)
else:
    raise RuntimeError("Request failed after retries")
```

In production, also add jitter and respect the retry-after header when present.

FAQ

Is Grok 4.3 OpenAI-compatible end to end?

For Chat Completions, yes. You can use the OpenAI SDK, change base_url, change model, and keep the same request shape. Function calling, structured output, and streaming use the same semantics.

Does Grok 4.3 support the Responses API?

The xAI surface is Chat Completions today. The Responses API is OpenAI-only.

What is the actual context limit?

The context limit is 1,000,000 tokens. Long inputs still cost money, so use cached input when your prompt is stable.

How does always-on reasoning affect latency?

First-token latency is higher than non-reasoning models, but Grok 4.3 streams output at around 159 tokens/second. Use low for simple paths and reserve high for planning-heavy work.

Can I use Grok 4.3 with Grok Voice?

Yes. The voice agent, grok-voice-think-fast-1.0, calls Grok 4.3 under the hood when it reasons. You can also call Grok 4.3 directly from a custom voice loop built with TTS and STT components.

What happens to old Grok 3 or Grok 4 calls after May 15?

They fail with HTTP 410 because the model is retired. Migrate before the cutoff.

Does Grok 4.3 support image input?

Yes. It supports image input alongside video input. Pass an image URL in a content block using the OpenAI-style message format.
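A message mixing a text block and an image block in that OpenAI-style shape can be built like this (the URL is a placeholder, and the helper name is this guide's own):

```python
# Build a multimodal user message with a text block and an image_url
# block, in the OpenAI-style content-array shape. URL is a placeholder.
def image_message(prompt: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = image_message("What does this diagram show?", "https://example.com/diagram.png")
print(msg["content"][1]["type"])  # image_url
```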

Wrapping up

Grok 4.3 is a practical migration target if you need lower token costs, larger context, always-on reasoning, native video input, and OpenAI-compatible Chat Completions. For existing OpenAI SDK users, the migration is mostly a base URL and model-name change.

The fastest validation path is to create three request variants in Apidog, test low, medium, and high reasoning on your real prompts, then compare latency, quality, and token usage before moving production traffic.
