xAI rolled out Grok 4.3 in stages: beta on April 17, 2026, API access on April 30, and full general availability on May 6. The release adds a 1,000,000-token context window, native video input, always-on reasoning, and roughly 40% lower pricing versus Grok 4.20. Eight legacy Grok models retire on May 15, so teams still using grok-3 or grok-4 models should migrate now.
This guide shows how to call Grok 4.3 from code: endpoint format, authentication, OpenAI-compatible SDK setup, reasoning_effort, video input, function calling, and a repeatable test workflow in Apidog.
For the voice side of the same release, see How to use Grok Voice for free. For the head-to-head against OpenAI’s flagship voice model, see Grok Voice vs GPT-Realtime.
TL;DR
- Grok 4.3 went GA on May 6, 2026.
- Eight legacy models retire on May 15, 2026.
- Pricing:
  - $1.25 per 1M input tokens
  - $2.50 per 1M output tokens
  - $0.20 per 1M cached input tokens
- Context window: 1,000,000 tokens.
- New input type: native video input.
- Reasoning is always on.
- reasoning_effort supports low, medium, and high.
- Default reasoning effort is medium.
- Endpoint: https://api.x.ai/v1/chat/completions.
- The API is OpenAI-compatible for Chat Completions.
- Standard-tier throughput is around 159 tokens/second.
- Intelligence Index: 53, according to Artificial Analysis.
- Use Apidog to save request variants, compare reasoning settings, and replay the same test across providers.
What changed in Grok 4.3
For most developer teams, the important changes are practical:
- Lower token cost
Input pricing is down 37.5% versus Grok 4.20. Output pricing is down 58.3%. Cached input is now $0.20 per 1M tokens, which matters if you reuse long system prompts or large static context.
- 1M-token context window
Grok 4.3 increases the context window from 256k to 1M tokens. That makes it usable for large prompts such as codebases, transcripts, long contracts, and multi-document workflows.
- Native video input
Grok 4.3 is the first Grok model with native video input. You can pass a video URL in the message content and ask the model to reason over the clip.
- Always-on reasoning
Every request includes reasoning. The reasoning_effort parameter controls depth, but the model does not run below low.
- Better agent workflows
xAI reports a +300 Elo gain on GDPval-AA versus Grok 4.20. In practice, this matters most for tool selection, multi-step workflows, and function-calling agents.
Artificial Analysis gives Grok 4.3 an Intelligence Index of 53, above the average of 35 for its price tier, and ranks it tenth out of 146 tracked models.
Prerequisites
Before sending your first request, prepare:
- An xAI Console account at console.x.ai
- A billable tier with an API key
- A project-scoped API key for production use
- The OpenAI SDK or the xAI SDK
- An API client for saving and replaying requests
Export your API key:
export XAI_API_KEY="xai-..."
If you are testing locally, use an environment file or shell variable. For production, store the key in your secret manager.
Endpoint and authentication
Grok 4.3 uses the OpenAI-compatible Chat Completions API with xAI’s base URL.
POST https://api.x.ai/v1/chat/completions
Required headers:
Authorization: Bearer $XAI_API_KEY
Content-Type: application/json
Because the API is OpenAI-compatible, most existing OpenAI SDK code only needs two changes:
- Change the API key.
- Change the base_url.
Python example
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Summarize the trade-offs of GraphQL vs REST in three bullets.",
        }
    ],
    reasoning_effort="medium",
)

print(response.choices[0].message.content)
If you use the xAI SDK instead, the request shape is similar. The main difference is the client import and initialization.
Request parameters
Use these parameters for most Grok 4.3 Chat Completions requests:
| Parameter | Type | Values | Notes |
|---|---|---|---|
| model | string | grok-4.3 | Required. |
| messages | array | OpenAI message shape | Required. Supports system, user, and assistant roles. |
| reasoning_effort | string | low, medium, high | Optional. Default: medium. Higher values can increase latency and output tokens. |
| max_tokens | int | 1–32768 | Caps output length. |
| temperature | float | 0.0–2.0 | Default: 1.0. |
| top_p | float | 0.0–1.0 | Nucleus sampling. |
| stream | bool | true, false | Enables server-sent events when true. |
| tools | array | OpenAI tool shape | Used for function calling. |
| tool_choice | string / object | auto, none, or specific tool | Uses standard OpenAI semantics. |
| response_format | object | { "type": "json_object" } | Enables structured JSON output. |
| seed | int | any integer | Useful for reproducibility with temperature: 0. |
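Several of these parameters combine naturally in one request. The sketch below builds a request body that uses JSON mode via response_format, a fixed seed plus temperature 0 for reproducibility, and a capped output length; the incident text and field names are made up for illustration.

```python
import json

# Hypothetical request body combining parameters from the table above.
payload = {
    "model": "grok-4.3",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object only."},
        {
            "role": "user",
            "content": "Extract service and severity from: payments-api returned 500s for 12 minutes.",
        },
    ],
    "response_format": {"type": "json_object"},  # structured JSON output
    "seed": 42,           # reproducibility, paired with temperature 0
    "temperature": 0,
    "max_tokens": 512,    # cap output length
}

body = json.dumps(payload)  # send with your HTTP client or SDK of choice
```

Because response_format forces a JSON object, you can json.loads the reply instead of parsing free text.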
Minimal curl request
curl https://api.x.ai/v1/chat/completions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior backend engineer."
      },
      {
        "role": "user",
        "content": "Review this query plan and flag the bottleneck."
      }
    ],
    "reasoning_effort": "high"
  }'
The response uses the standard OpenAI-style shape:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "reasoning_tokens": 78,
    "total_tokens": 657
  }
}
Read the final text from:
response.choices[0].message.content
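The shape above is for non-streaming calls. With stream=True (see the parameter table), the same endpoint returns content deltas instead of one message. Here is a minimal sketch; the stream_text helper and the prompt are illustrative, and the network call is isolated in demo() so it only runs when you invoke it with XAI_API_KEY set.

```python
def stream_text(stream):
    """Concatenate streamed content deltas, printing them as they arrive."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)

def demo():
    # Network call; requires XAI_API_KEY and the openai package.
    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")
    stream = client.chat.completions.create(
        model="grok-4.3",
        messages=[{"role": "user", "content": "Name three uses of a 1M-token context."}],
        reasoning_effort="low",
        stream=True,
    )
    return stream_text(stream)
```

Streaming does not reduce total latency, but it gets the first visible tokens to users sooner, which matters more at higher reasoning efforts.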
Choosing a reasoning effort
Grok 4.3 supports three reasoning levels.
Use low for fast, simple tasks
Good fits:
- Classification
- Summarization
- Rule extraction
- Simple Q&A
- Lightweight routing
Example:
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Classify this ticket as billing, bug, feature request, or account access: ...",
        }
    ],
    reasoning_effort="low",
)
Use medium for default production traffic
Good fits:
- Customer support
- Single-step tool use
- Data analysis
- Normal code explanations
- Function calling
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Analyze this API error log and suggest the most likely root cause.",
        }
    ],
    reasoning_effort="medium",
)
Use high for complex workflows
Good fits:
- Multi-step agents
- Long code review
- Complex math
- Planning-heavy tasks
- Debugging with many constraints
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Review this migration plan, identify risks, and produce a safer rollout sequence.",
        }
    ],
    reasoning_effort="high",
)
Reasoning is always enabled. Setting reasoning_effort to low reduces depth, but it does not disable reasoning.
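One practical pattern is to pick the effort level from the task type at the call site, so routing stays in one place. The mapping below mirrors the "good fits" lists above, but it is an assumption for illustration, not an xAI recommendation.

```python
# Hypothetical routing table: task type -> reasoning_effort.
EFFORT_BY_TASK = {
    "classification": "low",
    "summarization": "low",
    "support": "medium",
    "tool_use": "medium",
    "agent": "high",
    "code_review": "high",
}

def effort_for(task_type: str) -> str:
    # Fall back to the documented default (medium) for unknown task types.
    return EFFORT_BY_TASK.get(task_type, "medium")
```

You can then pass reasoning_effort=effort_for(task_type) in the request and tune the table from measured latency and quality.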
Function calling
Grok 4.3 supports the standard OpenAI function-calling shape.
The flow is:
- Define tools.
- Send the user message and tool schema.
- Read tool_calls from the assistant message.
- Execute the tool in your application.
- Send the tool result back with role tool.
- Ask the model to produce the final answer.
Define a tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_user",
            "description": "Look up a user by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {
                        "type": "string"
                    }
                },
                "required": ["user_id"],
            },
        },
    }
]
Ask Grok 4.3 to call the tool
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": "Find user u_42 and tell me their last login.",
        }
    ],
    tools=tools,
    reasoning_effort="medium",
)

message = response.choices[0].message
tool_calls = message.tool_calls
print(tool_calls)
Execute and return the tool result
import json

messages = [
    {
        "role": "user",
        "content": "Find user u_42 and tell me their last login.",
    },
    message,
]

for tool_call in tool_calls:
    if tool_call.function.name == "lookup_user":
        # Replace this with your real database/API call.
        result = {
            "user_id": "u_42",
            "last_login": "2026-05-06T14:22:00Z",
        }
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            }
        )
final_response = client.chat.completions.create(
model="grok-4.3",
messages=messages,
reasoning_effort="medium",
)
print(final_response.choices[0].message.content)
The GDPval-AA gain is especially relevant here: Grok 4.3 should be better at choosing tools, avoiding redundant calls, and recovering from tool errors.
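The single round-trip above generalizes to a loop: keep sending tool results back until the model answers without requesting a tool. Here is a sketch of that loop; run_tool_loop, execute_tool, and max_rounds are illustrative names, while client and tools come from the earlier examples.

```python
import json

def run_tool_loop(client, tools, messages, execute_tool, max_rounds=5):
    """Repeatedly call the model, executing requested tools, until it
    produces a final answer or max_rounds is exhausted."""
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model="grok-4.3",
            messages=messages,
            tools=tools,
            reasoning_effort="medium",
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # Final answer: no more tool requests.
        messages.append(message)
        for tool_call in message.tool_calls:
            # execute_tool is your dispatcher from tool name to real code.
            result = execute_tool(
                tool_call.function.name,
                json.loads(tool_call.function.arguments),
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("Tool loop did not converge")
```

The max_rounds cap matters in production: it converts a model that keeps requesting tools into a visible error instead of an unbounded bill.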
If you are testing tool workflows, MCP server testing in Apidog covers a replay-based setup.
Video input
Grok 4.3 is the first Grok model with native video input. Pass a video URL inside the message content array.
response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe what happens in this clip and flag any anomalies.",
                },
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://example.com/clip.mp4"
                    },
                },
            ],
        }
    ],
)
Video tokens count against input usage. If cost or latency matters:
- Trim the clip before sending.
- Downsample when full resolution is unnecessary.
- Avoid sending repeated static footage.
- Cache surrounding text context when possible.
The model reasons over frames natively, so you do not need to manually extract keyframes first.
Using the 1M-token context window
The 1M-token context window is useful when retrieval or chunking would remove important context.
Common patterns:
Whole-codebase review
Send:
- The diff
- Touched files
- Related interfaces
- Test output
- Lint output
- Migration notes
Prompt example:
Review this change as a senior backend engineer.
Focus on:
1. Data loss risks
2. Transaction boundaries
3. Backward compatibility
4. Test gaps
5. Rollback strategy
Context:
...
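To fill the context slot, you can concatenate the diff, touched files, and test output into one string before prompting. A minimal sketch, assuming local file paths; the helper name and section headers are made up.

```python
from pathlib import Path

def build_review_context(diff_text: str, file_paths: list, test_output: str) -> str:
    """Assemble a diff, full file contents, and test output into one
    large prompt body for the 1M-token window."""
    sections = [f"## Diff\n{diff_text}"]
    for path in file_paths:
        sections.append(f"## File: {path}\n{Path(path).read_text()}")
    sections.append(f"## Test output\n{test_output}")
    return "\n\n".join(sections)
```

For very large repos, sort files by relevance (touched files first) so a truncation, if you ever need one, cuts the least important material.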
Long-document QA
Use it for:
- Legal contracts
- Earnings calls
- Compliance policies
- Technical specifications
- Incident timelines
Prompt example:
Answer only from the provided document.
Question:
Which clauses describe termination rights, and what notice period applies to each party?
Agent memory
For agent workflows, you can keep long conversation history in context instead of summarizing aggressively. This is useful when prior details affect personalization or task continuity.
Cached input pricing makes stable long context cheaper. For example, a 400k-token stable system prompt costs $0.08 per cached call at $0.20 per 1M cached tokens, instead of $0.50 at the fresh input rate.
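That arithmetic generalizes to a one-line helper you can reuse when budgeting long context:

```python
def context_cost_usd(tokens: int, per_million_usd: float) -> float:
    """Cost of `tokens` input tokens at a given per-1M-token rate."""
    return tokens / 1_000_000 * per_million_usd

# The 400k-token example above, at this guide's cached vs fresh input rates:
cached = context_cost_usd(400_000, 0.20)  # $0.08 per cached call
fresh = context_cost_usd(400_000, 1.25)   # $0.50 at the fresh input rate
```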
Migrating from legacy Grok models
Eight legacy Grok models retire on May 15, 2026, 12:00 PM PT.
For most apps, migration is:
- model="grok-4.20"
+ model="grok-4.3"
or:
- model="grok-3"
+ model="grok-4.3"
Because the request shape is compatible, most Chat Completions calls should continue working.
Watch for two differences.
1. Reasoning behavior
Some legacy models did not accept reasoning_effort. Grok 4.3 always reasons.
If your previous workflow depended on a very fast non-reasoning path, start with:
{
"reasoning_effort": "low"
}
Then measure latency and quality before moving to medium or high.
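To measure that concretely, wrap the call with a timer and record the usage block. A hypothetical helper, not part of any SDK:

```python
import time

def timed_completion(client, **kwargs):
    """Run a Chat Completions call and report wall-clock latency plus
    output token count, for comparing reasoning_effort settings."""
    start = time.perf_counter()
    response = client.chat.completions.create(**kwargs)
    elapsed = time.perf_counter() - start
    return response, {
        "latency_s": round(elapsed, 3),
        "completion_tokens": response.usage.completion_tokens,
    }
```

Run the same prompt through low, medium, and high with this wrapper and compare the stats dicts before changing the production default.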
2. Output formatting
Grok 4.3 tends to produce more structured output than Grok 4.20. If your application uses regex-based parsing, retest before switching production traffic.
For broader model pricing context, see GPT-5.5 pricing. For reasoning-model usage patterns, see How to use the GPT-5.5 API.
Testing Grok 4.3 in Apidog
Use Apidog to create repeatable API tests before migrating production traffic.
Recommended setup:
- Create an Apidog environment.
- Add these variables:
XAI_API_KEY = xai-...
BASE_URL = https://api.x.ai/v1
MODEL = grok-4.3
REASONING_EFFORT = medium
- Create a POST request:
{{BASE_URL}}/chat/completions
- Add headers:
Authorization: Bearer {{XAI_API_KEY}}
Content-Type: application/json
- Add the request body:
{
  "model": "{{MODEL}}",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior backend engineer."
    },
    {
      "role": "user",
      "content": "Review this API design and identify the top three implementation risks."
    }
  ],
  "reasoning_effort": "{{REASONING_EFFORT}}"
}
- Duplicate the request three times:
  - Grok 4.3 - low
  - Grok 4.3 - medium
  - Grok 4.3 - high
- Change only REASONING_EFFORT.
- Compare:
  - Response quality
  - Latency
  - usage.prompt_tokens
  - usage.completion_tokens
  - usage.reasoning_tokens
  - Total cost
To compare with another provider, duplicate the environment and change BASE_URL, MODEL, and the API key. Keep the same prompt and request body.
Download Apidog to run the comparison. For broader API testing strategy, see API testing tool for QA engineers.
Rate limits
xAI Console tier limits range from a few thousand requests per minute on Tier 1 to multi-hundred-thousand request limits on enterprise tiers. Exact numbers can change, so check your console dashboard.
The advertised 159 tokens/second throughput is per-stream output speed, not total account throughput. Concurrent requests scale within your tier limits.
If you exceed your limit, the API returns HTTP 429 with a retry-after header.
Basic retry pattern:
import time
from openai import RateLimitError

for attempt in range(5):
    try:
        response = client.chat.completions.create(
            model="grok-4.3",
            messages=[
                {
                    "role": "user",
                    "content": "Summarize this incident report.",
                }
            ],
            reasoning_effort="medium",
        )
        break
    except RateLimitError:
        wait_seconds = min(2 ** attempt, 30)
        time.sleep(wait_seconds)
else:
    raise RuntimeError("Request failed after retries")
In production, also add jitter and respect the retry-after header when present.
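A sketch of that delay calculation: honor a numeric retry-after value when the server sends one, otherwise use exponential backoff with full jitter. (In the OpenAI v1 SDK the header is typically reachable via error.response.headers.get("retry-after"); treat that access path as an assumption and verify it against your SDK version.)

```python
import random
from typing import Optional

def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  cap: float = 30.0) -> float:
    """Seconds to wait before the next retry: honor a numeric Retry-After
    header when present, else exponential backoff with full jitter, capped."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Non-numeric Retry-After (e.g. an HTTP date); fall back.
    return random.uniform(0, min(2 ** attempt, cap))
```

Full jitter (a uniform draw from zero to the backoff ceiling) spreads retries from many clients across time instead of letting them hit the API in synchronized waves.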
FAQ
Is Grok 4.3 OpenAI-compatible end to end?
For Chat Completions, yes. You can use the OpenAI SDK, change base_url, change model, and keep the same request shape. Function calling, structured output, and streaming use the same semantics.
Does Grok 4.3 support the Responses API?
The xAI surface is Chat Completions today. The Responses API is OpenAI-only.
What is the actual context limit?
The context limit is 1,000,000 tokens. Long inputs still cost money, so use cached input when your prompt is stable.
How does always-on reasoning affect latency?
First-token latency is higher than non-reasoning models, but Grok 4.3 streams output at around 159 tokens/second. Use low for simple paths and reserve high for planning-heavy work.
Can I use Grok 4.3 with Grok Voice?
Yes. The voice agent, grok-voice-think-fast-1.0, calls Grok 4.3 under the hood when it reasons. You can also call Grok 4.3 directly from a custom voice loop built with TTS and STT components.
What happens to old Grok 3 or Grok 4 calls after May 15?
They fail with HTTP 410 because the model is retired. Migrate before the cutoff.
Does Grok 4.3 support image input?
Yes. It supports image input alongside video input. Pass an image URL in a content block using the OpenAI-style message format.
Wrapping up
Grok 4.3 is a practical migration target if you need lower token costs, larger context, always-on reasoning, native video input, and OpenAI-compatible Chat Completions. For existing OpenAI SDK users, the migration is mostly a base URL and model-name change.
The fastest validation path is to create three request variants in Apidog, test low, medium, and high reasoning on your real prompts, then compare latency, quality, and token usage before moving production traffic.