GLM 5.2 as your Cursor and Cline backend in 2026: MIT-licensed open-weight coding model, the config that works, and the honest cost math


This article was originally published on aicoderscope.com

TL;DR: GLM 5.2 is Z.ai's 743B-parameter MoE coding model, released mid-June 2026 under an MIT license with open weights. At roughly $1.40/M input via the Z.ai API it slots into Cursor, Cline, and Continue.dev through a standard OpenAI-compatible endpoint in about ten minutes. It tops the open-weight field on long-horizon agentic coding — but the Cursor base-URL override has a sharp edge you need to know before you flip it on.

	GLM 5.2 (API)	DeepSeek V4-Flash	Claude Fable 5
Best for	Long-horizon agentic Cline/Cursor work	Cheapest agent backend	Top-tier reasoning, polish
Price (input / output per M)	~$1.40 / ~$4.40	$0.14 / $0.435	$10 / $50
License	MIT (self-host free)	MIT (self-host free)	Proprietary (API only)
Context window	1M	1M	200K
Params	743B total / 39B active (MoE)	MoE (cloud)	proprietary
The catch	Cursor BYOK override breaks built-in models	Thinking mode breaks Cline if left on	Roughly 7× GLM's input price and 11× its output price for daily loops
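
To make the price row concrete, here is a rough sketch that prices one workload at each backend's list rates from the table. The monthly token volumes are hypothetical numbers picked for illustration, and the calculation ignores caching discounts and subscription plans.

```python
# Per-million-token list prices in USD (input, output), copied from the table above.
PRICES = {
    "GLM 5.2 (API)": (1.40, 4.40),
    "DeepSeek V4-Flash": (0.14, 0.435),
    "Claude Fable 5": (10.00, 50.00),
}

# Hypothetical monthly volume, chosen only for illustration.
MONTHLY_INPUT_TOKENS = 50_000_000
MONTHLY_OUTPUT_TOKENS = 5_000_000


def workload_cost(input_tokens: int, output_tokens: int,
                  price_in: float, price_out: float) -> float:
    """USD cost of a workload at the given per-million-token prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def monthly_costs() -> dict[str, float]:
    """Cost of the hypothetical monthly volume on each backend."""
    return {
        name: workload_cost(MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS, p_in, p_out)
        for name, (p_in, p_out) in PRICES.items()
    }


if __name__ == "__main__":
    costs = monthly_costs()
    glm = costs["GLM 5.2 (API)"]
    for name, usd in costs.items():
        # Show each backend's cost and its ratio to the GLM 5.2 figure.
        print(f"{name:20s} ${usd:9.2f}  ({usd / glm:.2f}x GLM)")
```

The ratio between backends shifts with the input/output mix, so plug in your own token counts before drawing conclusions.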

Honest take: If you run agentic loops in Cline or Cursor all day and the Claude bill is starting to sting, GLM 5.2 is the open-weight model to switch to right now. It's the strongest agent backend you can legally self-host, and the API costs a fraction of Claude's price. Use Cline (not Cursor) for the cleanest setup, because Cursor's single global base-URL override is still a trap. Self-hosting only pays off at very heavy team usage; the break-even math is at the bottom.

What landed in June 2026

Z.ai (the company formerly known as Zhipu AI) shipped GLM 5.2 in mid-June 2026, with the open weights going live on Hugging Face around June 16. The model is a 743-billion-parameter mixture-of-experts transformer with roughly 39 billion active parameters per token, routed across 256 experts. The headline spec is the context window: a native 1,048,576-token (1M) window with a 131,072-token max output, both substantial jumps over GLM 5.1.

Two things make it worth a fresh look: the license and the benchmark standing. The weights are MIT-licensed, about as permissive as it gets, which means you can self-host inside a commercial product, run it air-gapped, or fine-tune it without a lawyer in the loop. And on the benchmarks reported in coverage so far, it's the top open-weight model for long-horizon coding.

On SWE-bench Pro, the harder variant that tests whether a model can resolve real-world repository issues, GLM 5.2 scored 62.1, ahead of GPT-5.5 at 58.6 and its own predecessor GLM 5.1 at 58.4. On Terminal-Bench 2.1 (autonomous terminal-based coding) it scored 81.0, four points behind Claude Opus 4.8's 85.0. On the classic SWE-bench Verified it lands around 77.8, trailing the proprietary frontier (Claude Opus and GPT-5.x sit in the low 80s) but leading every other open-weight model.

One honest caveat: Z.ai launched without a full official benchmark table, so several of these figures come from independent coverage and the model card rather than a single first-party page. Treat the agentic numbers as "best open-weight, near-frontier," not as gospel to the decimal. The thing you actually feel day to day — that it holds engineering context across a 30-step Cline run without losing the thread — is real, and it's the reason to care.

Three ways to run it

You have three paths, and they map to different goals and budgets:

Z.ai API (api.z.ai) — fastest, zero hardware, OpenAI-compatible. Pay-as-you-go at roughly $1.40/M input and $4.40/M output, with cached input billed around $0.26/M. This is the default for almost everyone.
GLM Coding Plan — a flat subscription that routes GLM 5.2 to coding tools through a dedicated endpoint. Promotional tiers run around $10/month (Lite), $30/month (Pro), and $80/month (Max); the published list prices are higher and discounted by billing cycle, so verify the number on the checkout page the day you buy.
Self-hosted via vLLM or SGLang — the MIT payoff. At FP8 the weights are roughly 744 GB, which is an 8×H200 (or larger) node, not a workstation. This only makes sense at real scale or under a hard data-residency requirement.
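
For the self-hosting path, the hardware claim can be sanity-checked with back-of-envelope arithmetic. This is only a sketch: it assumes 1 byte per parameter at FP8 and 141 GB of memory per H200 (an assumption not stated in this article, so check your own hardware), and it ignores KV cache, activations, and runtime overhead, all of which eat into the headroom.

```python
TOTAL_PARAMS = 743e9        # total parameter count quoted in the article
BYTES_PER_PARAM_FP8 = 1     # assumption: FP8 stores one byte per parameter
BYTES_PER_PARAM_BF16 = 2    # for comparison: BF16 stores two bytes per parameter
GPU_COUNT = 8               # the 8-GPU node mentioned above
GPU_MEMORY_GB = 141         # assumption: per-H200 memory, verify for your hardware


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def headroom_gb(params: float, bytes_per_param: float,
                gpu_count: int, gpu_memory_gb: float) -> float:
    """Memory left on the node after loading the weights (negative means no fit)."""
    return gpu_count * gpu_memory_gb - weights_gb(params, bytes_per_param)


if __name__ == "__main__":
    for label, width in (("FP8", BYTES_PER_PARAM_FP8), ("BF16", BYTES_PER_PARAM_BF16)):
        size = weights_gb(TOTAL_PARAMS, width)
        spare = headroom_gb(TOTAL_PARAMS, width, GPU_COUNT, GPU_MEMORY_GB)
        print(f"{label}: weights ~{size:.0f} GB, headroom on the node ~{spare:.0f} GB")
```

Under these assumptions the FP8 weights come out close to the figure quoted above and fit on the node with room left for the KV cache, while a 2-byte format would not fit on the same eight GPUs.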

A quick smoke test against the API confirms your key and the model name before you touch any editor config:

$ curl -s https://api.z.ai/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [{"role":"user","content":"Write a Python function that returns the nth Fibonacci number iteratively."}]
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['choices'][0]['message']['content'])"

def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

Verified against the Z.ai API on June 20, 2026. Two endpoints matter and they are easy to confuse: the general endpoint is https://api.z.ai/api/paas/v4, and if you bought the GLM Coding Plan you must use the coding endpoint https://api.z.ai/api/coding/paas/v4 instead. Point a Coding Plan key at the general URL (or vice versa) and you'll get 401s that look like a bad key.
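
Because the two endpoints are so easy to mix up, it can help to pick the base URL from the plan type in one place. The sketch below builds the same request as the curl smoke test using only the Python standard library. The plan labels ("payg", "coding_plan") and the function name are hypothetical; the URLs, model name, and request shape are the ones shown above.

```python
import json
import urllib.request

# Base URLs as given in the article; the dictionary keys are hypothetical labels.
ENDPOINTS = {
    "payg": "https://api.z.ai/api/paas/v4",
    "coding_plan": "https://api.z.ai/api/coding/paas/v4",
}


def build_chat_request(plan: str, api_key: str, prompt: str,
                       model: str = "glm-5.2") -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for the given plan."""
    if plan not in ENDPOINTS:
        raise ValueError(f"unknown plan {plan!r}; expected one of {sorted(ENDPOINTS)}")
    url = ENDPOINTS[plan] + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key: this only prints the URL that would be called.
    request = build_chat_request("coding_plan", "YOUR_KEY_HERE", "Say hello.")
    print(request.full_url)
```

Passing the returned object to urllib.request.urlopen sends the request; printing request.full_url first is a cheap way to confirm that the key type and the URL match before you debug a 401.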

Cline: the cleanest setup

Cline treats GLM 5.2 as a plain OpenAI-compatible provider, and because each provider config is self-contained, there's no collision with your other models. This is the path I'd recommend first.

Open the Cline settings (the gear icon in the Cline panel), then:

API Provider → OpenAI Compatible
Base URL → https://api.z.ai/api/paas/v4 (or the /coding/paas/v4 URL if you're on the Coding Plan)
API Key → your Z.ai key
Model ID → glm-5.2

That's it. Set the context window in Cline's model settings to match what you actually want to pay for — the 1M window is available, but every agentic step re-sends the running context, so a sprawling window on a long Plan/Act loop is how you turn a $5 task into a $40 one. For most repo work, capping the model context at 128K–256K is the sane default; reserve the full 1M for genuine whole-repo reasoning.
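
The re-send effect is easy to underestimate, so here is a toy model of it. Everything about the loop is an assumption made for illustration: the step count, the starting context, the tokens added per step, and the idea that a cap simply clamps what gets sent (real tools truncate or summarize instead). Only the $1.40/M input price comes from the pricing above; cached-input discounts and output tokens are ignored.

```python
INPUT_PRICE_PER_M = 1.40  # USD per million uncached input tokens, from the pricing above


def loop_input_tokens(steps: int, start_tokens: int,
                      growth_per_step: int, cap: int) -> int:
    """Total input tokens billed when every step re-sends the running context."""
    total = 0
    context = start_tokens
    for _ in range(steps):
        total += min(context, cap)   # the whole context (up to the cap) is sent again
        context += growth_per_step   # tool output and edits grow the context each step
    return total


def loop_input_cost(tokens: int, price_per_m: float = INPUT_PRICE_PER_M) -> float:
    """USD cost of the given number of input tokens."""
    return tokens * price_per_m / 1_000_000


if __name__ == "__main__":
    # Hypothetical 30-step run: 20K starting context, 30K new tokens per step.
    for label, cap in (("1M window", 1_048_576), ("128K cap", 128_000)):
        tokens = loop_input_tokens(30, 20_000, 30_000, cap)
        print(f"{label}: {tokens:,} input tokens, ~${loop_input_cost(tokens):.2f}")
```

In this toy run the uncapped loop bills several times the input tokens of the capped one, because the context is paid for again on every step rather than once.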

The real-world gotcha here isn't GLM-specific but it bites Z.ai users hard: some agent frontends append a spurious /v1 to whatever base URL you give them. If you paste https://api.z.ai/api/paas/v4 and the tool silently calls https://api.z.ai/api/paas/v4/v1/chat/completions, you get a 404 that the UI may report as a model-switch error rather than a bad URL. If your requests 404 immediately, check the resolved URL in the tool's logs before you blame the key.
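
If you can pull the resolved URL out of the tool's logs, a small check like the one below tells you whether the extra segment is the culprit. It is a minimal sketch with hypothetical function names, and it only covers the specific failure described above (a /v1 inserted before /chat/completions).

```python
from urllib.parse import urlsplit


def expected_chat_url(base_url: str) -> str:
    """The URL an OpenAI-compatible client should call for a given base URL."""
    return base_url.rstrip("/") + "/chat/completions"


def has_spurious_v1(base_url: str, resolved_url: str) -> bool:
    """True if the resolved URL is the base URL with an extra /v1 segment appended."""
    base_path = urlsplit(base_url).path.rstrip("/")
    resolved_path = urlsplit(resolved_url).path
    return resolved_path == base_path + "/v1/chat/completions"


if __name__ == "__main__":
    base = "https://api.z.ai/api/paas/v4"
    from_logs = "https://api.z.ai/api/paas/v4/v1/chat/completions"
    print("expected:", expected_chat_url(base))
    print("spurious /v1:", has_spurious_v1(base, from_logs))
```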

Cursor: it works, but mind the override

Cursor can use GLM 5.2 through its custom-model path, and the steps are straightforward:

Settings → Models → Add Custom Model, choose the OpenAI protocol.
Enter the model name. Cursor has historically wanted the model name in uppercase in this field (people hit this with GLM-4.7), so if glm-5.2 is rejected, try GLM-5.2.
Toggle on OpenAI API Key and paste your Z.ai key.
Toggle on Override OpenAI Base URL and set it to https://api.z.ai/api/coding/paas/v4 (Coding Plan) or https://api.z.ai/api/paas/v4 (pay-as-you-go).

Here's the sharp edge, and it's a documented one: overriding the OpenAI base URL is global in Cursor, not per-model. The moment you point that override at Z.ai, your custom GLM model works — but Cursor's own first-party models (the ones it proxies through its servers) stop working, because Cursor now routes everything through your override URL. You're effectively in BYOK-only mode.

Practically, this means Cursor is an all-or-nothing switch for GLM today: great if you want GLM as your single backend, frustrating if you wanted to keep Cursor's bundled Claude/GPT access alongside it. There's an open feature request for per-model base URLs, but until it ships, the workaround most people use is OpenRouter — point the override at OpenRouter's base URL