GLM-5.2 is Z.ai’s open-weights coding model. You can use it in three coding harnesses many developers already have in their workflow: Claude Code, Cline, and Cursor. The setup differs by harness: Claude Code uses an Anthropic-compatible endpoint, while Cline and Cursor use an OpenAI-compatible endpoint. This guide shows the exact configuration for all three using the GLM Coding Plan.
If you want the model background first, start with the GLM-5.2 overview and the GLM-5.2 API reference. This article focuses on wiring GLM-5.2 into your coding tools.
What you need before you start
GLM-5.2 is a Mixture-of-Experts model with around 753B parameters, served with a 1M-token context window (1,048,576 tokens). It is designed for coding, reasoning, and agentic tool use. According to Z.ai’s published results, it scores 81.0 on Terminal-Bench 2.1, up from GLM-5.1’s 62.0. VentureBeat described it as beating GPT-5.5 on long-horizon coding benchmarks for roughly one-sixth the cost.
Before configuring a harness, make sure you have:
- A Z.ai account and API key.
- For Claude Code and agentic coding tools, a GLM Coding Plan key rather than only a raw pay-as-you-go key, because the coding endpoint is scoped for those keys.
- One installed harness:
  - Claude Code
  - Cline, the VS Code extension
  - Cursor
- The correct model id:
  - glm-5.2 for Cline and Cursor
  - glm-5.2[1m] for Claude Code
Cost note: the standard API is listed at $1.40 per 1M input tokens and $4.40 per 1M output tokens, confirmed by OpenRouter. Cached input is around $0.26 per 1M, attributed to VentureBeat. The GLM Coding Plan is a separate subscription with Lite, Pro, Max, and Team tiers. Public prices have changed over time, so verify current pricing at z.ai before committing.
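To get a feel for per-token billing before choosing between the standard API and a Coding Plan tier, the arithmetic can be sketched as a small shell function. The function name `estimate_cost` is hypothetical, the prices are the list prices quoted above hard-coded as assumptions, and cached-input pricing is ignored.

```shell
# Rough per-request cost estimate at the list prices quoted in this guide.
# Prices are hard-coded assumptions; check z.ai for current numbers.
estimate_cost() {
  # $1 = input tokens, $2 = output tokens
  awk -v in_tok="$1" -v out_tok="$2" 'BEGIN {
    in_price = 1.40    # USD per 1M input tokens
    out_price = 4.40   # USD per 1M output tokens
    printf "%.2f\n", (in_tok * in_price + out_tok * out_price) / 1000000
  }'
}
```

For example, `estimate_cost 200000 10000` prints `0.32`, the approximate dollar cost of one request with 200,000 input tokens and 10,000 output tokens.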
Set up GLM-5.2 in Claude Code
Claude Code talks to an Anthropic-compatible API. Z.ai exposes a coding endpoint for this workflow. Configure Claude Code with environment variables, then launch it normally.
Add this block to your shell profile, such as ~/.zshrc or ~/.bashrc, or export it inline before launching Claude Code:
export ANTHROPIC_BASE_URL="https://api.z.ai/api/coding/paas/v4"
export ANTHROPIC_API_KEY="your-glm-coding-plan-key"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
export API_TIMEOUT_MS=3000000
Then start Claude Code:
claude
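If you would rather not change your shell profile, the same variables can be scoped to a single launch. The following is a minimal sketch, not an official setup: `glm_claude`, `ZAI_CODING_KEY`, and `CLAUDE_BIN` are hypothetical names introduced here, and the values are the ones from the block above.

```shell
# Launch Claude Code against GLM-5.2 without editing the shell profile.
# ZAI_CODING_KEY and CLAUDE_BIN are hypothetical names used only in this sketch.
glm_claude() {
  # Fail early with a readable message if the key is missing.
  if [ -z "${ZAI_CODING_KEY:-}" ]; then
    echo "ZAI_CODING_KEY is not set" >&2
    return 1
  fi
  # The assignments below apply only to this one command.
  ANTHROPIC_BASE_URL="https://api.z.ai/api/coding/paas/v4" \
  ANTHROPIC_API_KEY="$ZAI_CODING_KEY" \
  ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]" \
  ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]" \
  CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000 \
  API_TIMEOUT_MS=3000000 \
  "${CLAUDE_BIN:-claude}" "$@"
}
```

Running `glm_claude` starts Claude Code with the GLM-5.2 settings while leaving other terminals untouched. For a dry run, `CLAUDE_BIN=printenv glm_claude ANTHROPIC_BASE_URL` prints the value that would be passed instead of starting the tool.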
Why these Claude Code settings matter
ANTHROPIC_BASE_URL
Use:
https://api.z.ai/api/coding/paas/v4
This is the Anthropic-compatible coding endpoint. Some older guides mention:
https://open.z.ai/api/paas/v4
If you see 404 or authentication errors, verify the current endpoint in the Z.ai GLM-5.2 docs.
glm-5.2[1m]
Claude Code uses the [1m] suffix to select the 1M-context variant through the coding endpoint.
Set both model variables to the same value:
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
This ensures Claude Code routes either model tier to GLM-5.2.
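A quick way to catch a mismatch between the two tiers is a small check you can run in the same shell. This is only a sketch; `check_model_tiers` is a hypothetical helper name, and it reads nothing beyond the two variables shown above.

```shell
# Warn when the Sonnet and Opus tiers are unset or point at different models.
check_model_tiers() {
  sonnet="${ANTHROPIC_DEFAULT_SONNET_MODEL:-}"
  opus="${ANTHROPIC_DEFAULT_OPUS_MODEL:-}"
  if [ -z "$sonnet" ] || [ -z "$opus" ]; then
    echo "unset"
    return 1
  fi
  if [ "$sonnet" != "$opus" ]; then
    echo "mismatch"
    return 1
  fi
  echo "ok: $sonnet"
}
```

With both variables exported as shown, `check_model_tiers` prints `ok: glm-5.2[1m]`.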
CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
Claude Code auto-compacts conversations when they approach the context limit. The default assumes a smaller model context. Setting it to 1000000 lets Claude Code use GLM-5.2’s long context before summarizing.
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
API_TIMEOUT_MS=3000000
For large-context coding tasks, increase the timeout:
export API_TIMEOUT_MS=3000000
That is 3,000 seconds, or 50 minutes. Long-horizon tasks with Max thinking effort can take a while before the first token returns. Without a higher timeout, Claude Code may abort the request and surface a connection error.
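The conversion is easy to double-check in the shell when you experiment with other values. The helper name `timeout_minutes` is hypothetical, and it uses integer division, so partial minutes are dropped.

```shell
# Convert a millisecond timeout into whole minutes using integer arithmetic.
timeout_minutes() {
  echo $(( $1 / 1000 / 60 ))
}
```

`timeout_minutes 3000000` prints `50`, matching the value used in this guide.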
Thinking effort
GLM-5.2 supports two thinking levels: High and Max. Z.ai recommends Max for coding. The coding endpoint applies a default, but if your harness lets you pass reasoning_effort, use:
{
"reasoning_effort": "max"
}
For faster, cheaper completions, thinking can also be disabled where supported.
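The toggle can be sketched as a helper that prints a request body for either mode. The `max` branch reuses the `thinking` and `reasoning_effort` fields from the raw API call later in this guide. The `"disabled"` value in the `off` branch is an assumption that is not confirmed in this article, and `glm_request_body` is a hypothetical name.

```shell
# Print a chat-completions request body for the chosen thinking mode.
# "max" mirrors the fields used in this guide; the "disabled" value for
# "off" is an assumption, so check the Z.ai API reference before relying on it.
glm_request_body() {
  case "$1" in
    max)
      thinking='"thinking": {"type": "enabled"}, "reasoning_effort": "max"'
      ;;
    off)
      thinking='"thinking": {"type": "disabled"}'
      ;;
    *)
      echo "usage: glm_request_body max|off" >&2
      return 1
      ;;
  esac
  printf '{"model": "glm-5.2", "messages": [{"role": "user", "content": "ping"}], %s, "stream": false}\n' "$thinking"
}
```

The output can be passed to curl with `-d "$(glm_request_body max)"` to compare latency and output between the two modes.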
If you used an earlier GLM model, the migration pattern is the same as in GLM-5.1 in Claude Code and GLM-4.5 with Claude Code: update the model id and base URL, and keep the environment-variable structure.
Set up GLM-5.2 in Cline
Cline is a VS Code extension that runs an autonomous coding agent inside your editor. Unlike Claude Code, Cline uses an OpenAI-compatible endpoint.
Configure it like this:
- Install the Cline extension from the VS Code marketplace.
- Open Cline settings from the gear icon in the Cline panel.
- For API Provider, select OpenAI Compatible.
- Set Base URL to:
https://api.z.ai/api/paas/v4/
Use the general API base, not the Claude Code coding path.
- Paste your Z.ai API key into API Key.
- Set Model ID to:
glm-5.2
Do not use the [1m] suffix in Cline.
- Find the context window setting and set it to:
1000000
That completes the GLM-5.2 Cline setup.
Cline can make many tool calls in a single task. If the context window is too small, it may drop earlier planning steps, diffs, or test output. Setting the window to one million tokens lets Cline keep more of the agent run in scope.
Set up GLM-5.2 in Cursor
Cursor is a standalone AI-first editor. It also supports OpenAI-compatible APIs, so the setup is close to Cline’s.
Configure Cursor like this:
- Open Cursor settings.
- Go to Models.
- Scroll to the OpenAI API key section.
- Enable the custom base URL option. It may also be labeled Override OpenAI Base URL.
- Set the base URL to:
https://api.z.ai/api/paas/v4/
- Enter your Z.ai API key.
- Add a custom model with this id:
glm-5.2
- Make sure glm-5.2 is the active model.
- Use Cursor’s built-in API key test to verify the connection.
- Send a prompt to confirm chat or inline edits work.
Once the connection verifies, GLM-5.2 can power Cursor chat and inline edits.
If you have compared Cursor with earlier GLM versions, the trade-offs from Claude Code vs Cursor with GLM-4.7 still apply: Cursor is smoother for inline editing, while Claude Code and Cline are stronger fits for autonomous, multi-step agent runs.
Side-by-side configuration
Use this table to copy the right values for each harness.
| Setting | Claude Code | Cline | Cursor |
|---|---|---|---|
| API format | Anthropic-compatible | OpenAI-compatible | OpenAI-compatible |
| Base URL | https://api.z.ai/api/coding/paas/v4 (verify live) | https://api.z.ai/api/paas/v4/ | https://api.z.ai/api/paas/v4/ |
| Model id | glm-5.2[1m] | glm-5.2 | glm-5.2 |
| Key type | GLM Coding Plan key | API key | API key |
| Context window | CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000 | Set to 1000000 | Model default |
| Timeout | API_TIMEOUT_MS=3000000 | n/a | n/a |
| Thinking effort | Max recommended for coding | Provider default | Provider default |
The two most common setup mistakes are:
- Using the wrong base URL for the harness type.
- Forgetting glm-5.2[1m] and API_TIMEOUT_MS=3000000 in Claude Code.
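To avoid the first mistake, the pairing of harness, base URL, and model id can be kept in one small lookup. This is a sketch using only the values from this guide; `glm_harness_config` and the harness labels it accepts are hypothetical names.

```shell
# Print "<base URL> <model id>" for a harness, using the values from this guide.
glm_harness_config() {
  case "$1" in
    claude-code)
      echo "https://api.z.ai/api/coding/paas/v4 glm-5.2[1m]"
      ;;
    cline|cursor)
      echo "https://api.z.ai/api/paas/v4/ glm-5.2"
      ;;
    *)
      echo "unknown harness: $1" >&2
      return 1
      ;;
  esac
}
```

`glm_harness_config cline` prints `https://api.z.ai/api/paas/v4/ glm-5.2`, which you can paste into the Base URL and Model ID fields.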
Test your setup with a raw API call
Before debugging a harness, confirm the key and model work with a direct request. This call uses the general OpenAI-compatible API and isolates credentials from editor configuration.
curl https://api.z.ai/api/paas/v4/chat/completions \
-H "Authorization: Bearer $ZAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "glm-5.2",
"messages": [
{"role": "user", "content": "Write a Python function that reverses a linked list."}
],
"thinking": {"type": "enabled"},
"reasoning_effort": "max",
"stream": false
}'
If the request returns a completion, your key and model id are valid. Any remaining issue is likely in the harness configuration.
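When the request fails, the HTTP status code usually narrows down the cause. The sketch below separates the network call from the interpretation. `probe_glm` and `diagnose_status` are hypothetical helper names, and the mapping from status code to cause is a rule of thumb, not documented Z.ai behavior.

```shell
# Map an HTTP status code from the test request to a likely cause.
diagnose_status() {
  case "$1" in
    200) echo "ok: key and model id accepted" ;;
    401|403) echo "auth: check the API key and its plan" ;;
    404) echo "not found: check the base URL and path" ;;
    *) echo "other: HTTP $1, inspect the response body" ;;
  esac
}

# Send a tiny request and print only the status code.
# Requires network access and ZAI_API_KEY in the environment.
probe_glm() {
  curl -s -o /dev/null -w '%{http_code}' \
    https://api.z.ai/api/paas/v4/chat/completions \
    -H "Authorization: Bearer $ZAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "glm-5.2", "messages": [{"role": "user", "content": "ping"}], "stream": false}'
}
```

Run `diagnose_status "$(probe_glm)"` to get a one-line hint before opening any editor settings.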
You can also save this request in an API client. If you are testing GLM-5.2 alongside your own backend APIs, Apidog lets you store the request, keep values such as your API key or the Authorization header in environment variables, and replay the request without retyping the curl command. You can download Apidog and import the curl command directly.
Which harness should you use?
There is no single best option. Pick based on how you code.
- Claude Code: best for terminal-native, long-horizon agent runs. It is the only one of these three harnesses that selects the 1M-context variant through the model id, glm-5.2[1m]. Use it for large refactors and repo-wide changes.
- Cline: best if you want an agent inside VS Code with visibility into each tool call. It is a practical middle ground.
- Cursor: best for fast inline edits, chat, and autocomplete-style workflows with minimal setup.
For a deeper plan comparison, see Claude Code vs Codex vs Cursor vs MiniMax vs GLM Plan.
FAQ
Why do I use glm-5.2[1m] in Claude Code but glm-5.2 in Cline and Cursor?
The [1m] suffix is a Claude Code convention for selecting the 1M-context variant through the coding endpoint.
Cline and Cursor use the plain model id:
glm-5.2
They send it to the OpenAI-compatible endpoint, where context behavior is configured in the harness UI instead of the model id.
What if Claude Code times out on long tasks?
Increase the timeout:
export API_TIMEOUT_MS=3000000
Long-context, Max-effort requests can take longer than the default timeout. Without this setting, Claude Code may abort before the model returns a response.
Do I need the GLM Coding Plan, or can I use pay-as-you-go?
Both can work, but the GLM Coding Plan key is what the Claude Code coding endpoint expects. For heavy daily coding, the plan’s Lite, Pro, Max, and Team tiers may be more practical than per-token billing.
Verify current tier pricing at z.ai, since published prices have shifted.
Which base URL is correct for Claude Code?
Use:
https://api.z.ai/api/coding/paas/v4
Some sources list:
https://open.z.ai/api/paas/v4
If one fails with authentication or 404 errors, try the other and check the live Z.ai docs.
Do not use the general API base for Claude Code. This base is for Cline and Cursor:
https://api.z.ai/api/paas/v4/
Can GLM-5.2 handle images?
No confirmed vision variant exists for GLM-5.2. It is a text-in, text-out coding and reasoning model. Do not expect image input unless Z.ai ships a vision variant.
Closing
GLM-5.2 works across Claude Code, Cline, and Cursor, but you need the right endpoint format for each tool.
Use this rule of thumb:
- Claude Code: Anthropic-compatible coding endpoint, glm-5.2[1m], 1M compact window, long timeout.
- Cline: OpenAI-compatible endpoint, glm-5.2, context window set to 1000000.
- Cursor: OpenAI-compatible endpoint, glm-5.2, custom base URL enabled.
If you want to run GLM-5.2 outside these harnesses, see how to use GLM-5.2 for free and the GLM-5.2 pricing breakdown. For local use, get the weights from Hugging Face or pull the model with Ollama.


