DEV Community

Hassann

Posted on • Originally published at apidog.com

How to Use the Grok 4.3 API?

xAI rolled out Grok 4.3 in stages: beta on April 17, 2026, API access on April 30, and full general availability on May 6. The release adds a 1,000,000-token context window, native video input, always-on reasoning, and roughly 40% lower pricing versus Grok 4.20. Eight legacy Grok models retire on May 15, so teams still using grok-3 or grok-4 models should migrate now.


This guide shows how to call Grok 4.3 from code: endpoint format, authentication, OpenAI-compatible SDK setup, reasoning_effort, video input, function calling, and a repeatable test workflow in Apidog.

For the voice side of the same release, see How to use Grok Voice for free. For the head-to-head against OpenAI’s flagship voice model, see Grok Voice vs GPT-Realtime.

TL;DR

  • Grok 4.3 went GA on May 6, 2026.
  • Eight legacy models retire on May 15, 2026.
  • Pricing:
    • $1.25 per 1M input tokens
    • $2.50 per 1M output tokens
    • $0.20 per 1M cached input tokens
  • Context window: 1,000,000 tokens.
  • New input type: native video input.
  • Reasoning is always on.
  • reasoning_effort supports low, medium, and high.
  • Default reasoning effort is medium.
  • Endpoint: https://api.x.ai/v1/chat/completions.
  • The API is OpenAI-compatible for Chat Completions.
  • Standard-tier throughput is around 159 tokens/second.
  • Intelligence Index: 53, according to Artificial Analysis.
  • Use Apidog to save request variants, compare reasoning settings, and replay the same test across providers.

What changed in Grok 4.3

For most developer teams, the important changes are practical:

  1. Lower token cost

Input pricing is down 37.5% versus Grok 4.20. Output pricing is down 58.3%. Cached input is now $0.20 per 1M tokens, which matters if you reuse long system prompts or large static context.

  2. 1M-token context window

Grok 4.3 increases the context window from 256k to 1M tokens. That makes it usable for large prompts such as codebases, transcripts, long contracts, and multi-document workflows.

  3. Native video input

Grok 4.3 is the first Grok model with native video input. You can pass a video URL in the message content and ask the model to reason over the clip.

  4. Always-on reasoning

Every request includes reasoning. The reasoning_effort parameter controls depth, but low is the floor; reasoning cannot be switched off.

  5. Better agent workflows

xAI reports a +300 Elo gain on GDPval-AA versus Grok 4.20. In practice, this matters most for tool selection, multi-step workflows, and function-calling agents.

Artificial Analysis gives Grok 4.3 an Intelligence Index of 53, above the average of 35 for its price tier, and ranks it tenth out of 146 tracked models.
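The pricing figures above translate into per-request cost with simple arithmetic. A minimal sketch, with the per-1M-token prices hardcoded from this article (verify current rates in the xAI Console):

```python
# Per-request cost estimate using the Grok 4.3 prices quoted in this
# article (USD per 1M tokens). Check the xAI Console for current values.
INPUT_PRICE = 1.25
OUTPUT_PRICE = 2.50
CACHED_INPUT_PRICE = 0.20

def request_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated USD cost of one request."""
    return (
        input_tokens * INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
        + cached_input_tokens * CACHED_INPUT_PRICE
    ) / 1_000_000

# 10k fresh input tokens + 2k output tokens:
print(request_cost(10_000, 2_000))  # 0.0175
```

Run this against your own typical token counts before migrating; the usage block in each response gives you real numbers to plug in.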

Prerequisites

Before sending your first request, prepare:

  • An xAI Console account at console.x.ai
  • A billable tier with an API key
  • A project-scoped API key for production use
  • The OpenAI SDK or the xAI SDK
  • An API client for saving and replaying requests

xAI Console screenshot

Export your API key:

```shell
export XAI_API_KEY="xai-..."
```

If you are testing locally, use an environment file or shell variable. For production, store the key in your secret manager.

Endpoint and authentication

Grok 4.3 uses the OpenAI-compatible Chat Completions API with xAI’s base URL.

```
POST https://api.x.ai/v1/chat/completions
```

Required headers:

```
Authorization: Bearer $XAI_API_KEY
Content-Type: application/json
```

Because the API is OpenAI-compatible, most existing OpenAI SDK code only needs two changes:

  1. Change the API key.
  2. Change the base_url.

Python example

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Summarize the trade-offs of GraphQL vs REST in three bullets.",
        }
    ],
    reasoning_effort="medium",
)

print(response.choices[0].message.content)
```

If you use the xAI SDK instead, the request shape is similar. The main difference is the client import and initialization.

Request parameters

Use these parameters for most Grok 4.3 Chat Completions requests:

| Parameter | Type | Values | Notes |
|---|---|---|---|
| `model` | string | `grok-4.3` | Required. |
| `messages` | array | OpenAI message shape | Required. Supports `system`, `user`, and `assistant` roles. |
| `reasoning_effort` | string | `low`, `medium`, `high` | Optional. Default: `medium`. Higher values can increase latency and output tokens. |
| `max_tokens` | int | 1–32768 | Caps output length. |
| `temperature` | float | 0.0–2.0 | Default: 1.0. |
| `top_p` | float | 0.0–1.0 | Nucleus sampling. |
| `stream` | bool | `true`, `false` | Enables server-sent events when `true`. |
| `tools` | array | OpenAI tool shape | Used for function calling. |
| `tool_choice` | string / object | `auto`, `none`, or a specific tool | Uses standard OpenAI semantics. |
| `response_format` | object | `{ "type": "json_object" }` | Enables structured JSON output. |
| `seed` | int | any integer | Useful for reproducibility with `temperature: 0`. |
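When stream is true, the client receives a sequence of chunks and concatenates the text deltas. A sketch of the client-side loop in the standard OpenAI SDK shape; the chunk objects here are simulated so the loop itself is clear without a live API call:

```python
# Streaming pattern: iterate chunks and concatenate delta content.
# With the OpenAI SDK this stream comes from
# client.chat.completions.create(..., stream=True); here the chunks
# are simulated stand-ins with the same attribute shape.
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)

# Simulated chunks standing in for the real SSE stream:
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hel", "lo", None]
]
print(collect_stream(fake))  # Hello
```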

Minimal curl request

```shell
curl https://api.x.ai/v1/chat/completions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior backend engineer."
      },
      {
        "role": "user",
        "content": "Review this query plan and flag the bottleneck."
      }
    ],
    "reasoning_effort": "high"
  }'
```

The response uses the standard OpenAI-style shape:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "reasoning_tokens": 78,
    "total_tokens": 657
  }
}
```

Read the final text from:

```python
response.choices[0].message.content
```

Choosing a reasoning effort

Grok 4.3 supports three reasoning levels.

Use low for fast, simple tasks

Good fits:

  • Classification
  • Summarization
  • Rule extraction
  • Simple Q&A
  • Lightweight routing

Example:

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Classify this ticket as billing, bug, feature request, or account access: ...",
        }
    ],
    reasoning_effort="low",
)
```

Use medium for default production traffic

Good fits:

  • Customer support
  • Single-step tool use
  • Data analysis
  • Normal code explanations
  • Function calling

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Analyze this API error log and suggest the most likely root cause.",
        }
    ],
    reasoning_effort="medium",
)
```

Use high for complex workflows

Good fits:

  • Multi-step agents
  • Long code review
  • Complex math
  • Planning-heavy tasks
  • Debugging with many constraints

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Review this migration plan, identify risks, and produce a safer rollout sequence.",
        }
    ],
    reasoning_effort="high",
)
```

Reasoning is always enabled. Setting reasoning_effort to low reduces depth, but it does not disable reasoning.
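One way to operationalize this guidance is a small router that picks an effort level per task category before building the request. The category names here are this guide's own illustration, not an xAI API concept:

```python
# Map task categories to a reasoning_effort value, following the
# low/medium/high guidance above. Category names are illustrative,
# not part of the xAI API.
EFFORT_BY_TASK = {
    "classification": "low",
    "summarization": "low",
    "support": "medium",
    "tool_use": "medium",
    "agent": "high",
    "code_review": "high",
}

def effort_for(task: str) -> str:
    """Return a reasoning_effort for a task category, defaulting to medium."""
    return EFFORT_BY_TASK.get(task, "medium")

print(effort_for("classification"))  # low
print(effort_for("unknown_task"))    # medium
```

Defaulting unknown tasks to medium matches the model's own default, so the router never makes a request worse than the out-of-the-box behavior.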

Function calling

Grok 4.3 supports the standard OpenAI function-calling shape.

The flow is:

  1. Define tools.
  2. Send the user message and tool schema.
  3. Read tool_calls from the assistant message.
  4. Execute the tool in your application.
  5. Send the tool result back with role tool.
  6. Ask the model to produce the final answer.

Define a tool

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_user",
            "description": "Look up a user by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {
                        "type": "string"
                    }
                },
                "required": ["user_id"],
            },
        },
    }
]
```

Ask Grok 4.3 to call the tool

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Find user u_42 and tell me their last login.",
        }
    ],
    tools=tools,
    reasoning_effort="medium",
)

message = response.choices[0].message
tool_calls = message.tool_calls

print(tool_calls)
```

Execute and return the tool result

```python
import json

# `message` and `tool_calls` come from the previous response.
messages = [
    {
        "role": "user",
        "content": "Find user u_42 and tell me their last login.",
    },
    message,
]

for tool_call in tool_calls:
    if tool_call.function.name == "lookup_user":
        # Replace this with your real database/API call.
        result = {
            "user_id": "u_42",
            "last_login": "2026-05-06T14:22:00Z",
        }

        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            }
        )

final_response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    reasoning_effort="medium",
)

print(final_response.choices[0].message.content)
```
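The example hardcodes the lookup result; in a real handler you would first decode the arguments the model supplied. In the OpenAI shape, `function.arguments` is a JSON-encoded string, and models occasionally emit malformed JSON, so decode defensively:

```python
# Parse the JSON arguments string attached to a tool call.
# In the OpenAI shape, tool_call.function.arguments is a JSON string.
import json

def parse_tool_args(arguments: str) -> dict:
    """Decode a tool call's arguments, failing loudly on malformed JSON."""
    try:
        return json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned malformed tool arguments: {arguments!r}") from exc

print(parse_tool_args('{"user_id": "u_42"}'))  # {'user_id': 'u_42'}
```

Raising a distinct error here lets your agent loop retry the turn (or ask the model to re-emit the call) instead of crashing inside the tool itself.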

The GDPval-AA gain is especially relevant here: Grok 4.3 should be better at choosing tools, avoiding redundant calls, and recovering from tool errors.

If you are testing tool workflows, MCP server testing in Apidog covers a replay-based setup.

Video input

Grok 4.3 is the first Grok model with native video input. Pass a video URL inside the message content array.

```python
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe what happens in this clip and flag any anomalies.",
                },
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://example.com/clip.mp4"
                    },
                },
            ],
        }
    ],
)
```

Video tokens count against input usage. If cost or latency matters:

  • Trim the clip before sending.
  • Downsample when full resolution is unnecessary.
  • Avoid sending repeated static footage.
  • Cache surrounding text context when possible.

The model reasons over frames natively, so you do not need to manually extract keyframes first.

Using the 1M-token context window

The 1M-token context window is useful when retrieval or chunking would remove important context.
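Before sending a very large payload, a rough token estimate helps decide whether it fits. The ~4 characters per token ratio below is a common English-text heuristic, not xAI's tokenizer; use a real tokenizer for exact counts:

```python
# Rough check that a payload fits the 1M-token context window.
# The 4-chars-per-token ratio is a heuristic, not xAI's tokenizer.
CONTEXT_LIMIT = 1_000_000

def fits_in_context(text: str, reserved_output_tokens: int = 32_768) -> bool:
    """Estimate whether text plus a reserved output budget fits the window."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output_tokens <= CONTEXT_LIMIT

print(fits_in_context("x" * 2_000_000))  # True  (~500k tokens)
print(fits_in_context("x" * 4_500_000))  # False (~1.125M tokens)
```

Reserving the maximum output budget up front avoids requests that fit on input but fail once the model starts generating.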

Common patterns:

Whole-codebase review

Send:

  • The diff
  • Touched files
  • Related interfaces
  • Test output
  • Lint output
  • Migration notes

Prompt example:

```
Review this change as a senior backend engineer.

Focus on:
1. Data loss risks
2. Transaction boundaries
3. Backward compatibility
4. Test gaps
5. Rollback strategy

Context:
...
```

Long-document QA

Use it for:

  • Legal contracts
  • Earnings calls
  • Compliance policies
  • Technical specifications
  • Incident timelines

Prompt example:

```
Answer only from the provided document.

Question:
Which clauses describe termination rights, and what notice period applies to each party?
```

Agent memory

For agent workflows, you can keep long conversation history in context instead of summarizing aggressively. This is useful when prior details affect personalization or task continuity.

Cached input pricing makes stable long context cheaper. For example, a 400k-token stable system prompt costs $0.08 per cached call at $0.20 per 1M cached tokens, instead of $0.50 at the fresh input rate.
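That arithmetic, as a quick check using the cached and fresh input rates quoted in this article:

```python
# Cost of a 400k-token stable system prompt per call, cached vs fresh,
# at the prices quoted in this article (USD per 1M tokens).
CACHED_RATE = 0.20
FRESH_RATE = 1.25
PROMPT_TOKENS = 400_000

cached_cost = PROMPT_TOKENS / 1_000_000 * CACHED_RATE
fresh_cost = PROMPT_TOKENS / 1_000_000 * FRESH_RATE

print(f"${cached_cost:.2f} cached vs ${fresh_cost:.2f} fresh per call")  # $0.08 cached vs $0.50 fresh per call
```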

Migrating from legacy Grok models

Eight legacy Grok models retire on May 15, 2026, 12:00 PM PT.

For most apps, migration is:

```diff
- model="grok-4.20"
+ model="grok-4.3"
```

or:

```diff
- model="grok-3"
+ model="grok-4.3"
```

Because the request shape is compatible, most Chat Completions calls should continue working.

Watch for two differences.

1. Reasoning behavior

Some legacy models did not accept reasoning_effort. Grok 4.3 always reasons.

If your previous workflow depended on a very fast non-reasoning path, start with:

```json
{
  "reasoning_effort": "low"
}
```

Then measure latency and quality before moving to medium or high.

2. Output formatting

Grok 4.3 tends to produce more structured output than Grok 4.20. If your application uses regex-based parsing, retest before switching production traffic.

For broader model pricing context, see GPT-5.5 pricing. For reasoning-model usage patterns, see How to use the GPT-5.5 API.

Testing Grok 4.3 in Apidog

Use Apidog to create repeatable API tests before migrating production traffic.

Recommended setup:

  1. Create an Apidog environment.
  2. Add these variables:
```
XAI_API_KEY = xai-...
BASE_URL = https://api.x.ai/v1
MODEL = grok-4.3
REASONING_EFFORT = medium
```
  3. Create a POST request:
```
{{BASE_URL}}/chat/completions
```
  4. Add headers:
```
Authorization: Bearer {{XAI_API_KEY}}
Content-Type: application/json
```
  5. Add the request body:
```json
{
  "model": "{{MODEL}}",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior backend engineer."
    },
    {
      "role": "user",
      "content": "Review this API design and identify the top three implementation risks."
    }
  ],
  "reasoning_effort": "{{REASONING_EFFORT}}"
}
```
  6. Duplicate the request three times:

    • Grok 4.3 - low
    • Grok 4.3 - medium
    • Grok 4.3 - high
  7. Change only REASONING_EFFORT.

Compare:

  • Response quality
  • Latency
  • usage.prompt_tokens
  • usage.completion_tokens
  • usage.reasoning_tokens
  • Total cost

To compare with another provider, duplicate the environment and change BASE_URL, MODEL, and the API key. Keep the same prompt and request body.

Download Apidog to run the comparison. For broader API testing strategy, see API testing tool for QA engineers.

Apidog API testing screenshot

Rate limits

xAI Console tier limits range from a few thousand requests per minute on Tier 1 to multi-hundred-thousand request limits on enterprise tiers. Exact numbers can change, so check your console dashboard.

The advertised 159 tokens/second throughput is per-stream output speed, not total account throughput. Concurrent requests scale within your tier limits.
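That per-stream figure makes response time easy to estimate for a given output length. Treat it as a planning heuristic; real throughput varies by load and tier:

```python
# Estimate streaming duration for a response of a given length,
# using the ~159 tokens/second per-stream figure quoted above.
TOKENS_PER_SECOND = 159

def stream_seconds(output_tokens: int) -> float:
    """Approximate seconds for a stream to finish, after the first token."""
    return output_tokens / TOKENS_PER_SECOND

print(round(stream_seconds(1_590), 1))  # 10.0
```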

If you exceed your limit, the API returns HTTP 429, typically with a Retry-After header.

Basic retry pattern:

```python
import time

from openai import RateLimitError

for attempt in range(5):
    try:
        response = client.chat.completions.create(
            model="grok-4.3",
            messages=[
                {
                    "role": "user",
                    "content": "Summarize this incident report.",
                }
            ],
            reasoning_effort="medium",
        )
        break
    except RateLimitError:
        # Exponential backoff, capped at 30 seconds.
        wait_seconds = min(2 ** attempt, 30)
        time.sleep(wait_seconds)
else:
    raise RuntimeError("Request failed after retries")
```

In production, also add jitter and respect the retry-after header when present.

FAQ

Is Grok 4.3 OpenAI-compatible end to end?

For Chat Completions, yes. You can use the OpenAI SDK, change base_url, change model, and keep the same request shape. Function calling, structured output, and streaming use the same semantics.

Does Grok 4.3 support the Responses API?

The xAI surface is Chat Completions today. The Responses API is OpenAI-only.

What is the actual context limit?

The context limit is 1,000,000 tokens. Long inputs still cost money, so use cached input when your prompt is stable.

How does always-on reasoning affect latency?

First-token latency is higher than non-reasoning models, but Grok 4.3 streams output at around 159 tokens/second. Use low for simple paths and reserve high for planning-heavy work.

Can I use Grok 4.3 with Grok Voice?

Yes. The voice agent, grok-voice-think-fast-1.0, calls Grok 4.3 under the hood when it reasons. You can also call Grok 4.3 directly from a custom voice loop built with TTS and STT components.

What happens to old Grok 3 or Grok 4 calls after May 15?

They fail with HTTP 410 because the model is retired. Migrate before the cutoff.

Does Grok 4.3 support image input?

Yes. It supports image input alongside video input. Pass an image URL in a content block using the OpenAI-style message format.
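A message mixing a text block and an image block in that OpenAI-style shape can be built like this (the URL is a placeholder, and the helper name is this guide's own):

```python
# Build a multimodal user message with a text block and an image_url
# block, in the OpenAI-style content-array shape. URL is a placeholder.
def image_message(prompt: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = image_message("What does this diagram show?", "https://example.com/diagram.png")
print(msg["content"][1]["type"])  # image_url
```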

Wrapping up

Grok 4.3 is a practical migration target if you need lower token costs, larger context, always-on reasoning, native video input, and OpenAI-compatible Chat Completions. For existing OpenAI SDK users, the migration is mostly a base URL and model-name change.

The fastest validation path is to create three request variants in Apidog, test low, medium, and high reasoning on your real prompts, then compare latency, quality, and token usage before moving production traffic.
