Cristian Tala

Posted on Jun 29

GLM 5.2 isn't free: not even my US$4,000 Spark can run it

#ai #machinelearning #opensource #llm

GLM 5.2 is open source and free to download. Running it is another story. At best it needs around 240 GB of memory, and that's only in its most compressed version. I have a US$4,000 DGX Spark dedicated to this, with 128 GB, and it doesn't have enough to even start. This isn't opinion, it's arithmetic. What bothers me isn't the model. It's the smoke from people shouting "free" who never opened a terminal.

Why does everyone say GLM 5.2 is free?

Because they mix up two different things. One is the license. GLM 5.2 shipped under the MIT license on June 13, 2026: anyone can download the weights, no payment, no permission. That's real and it's great.

The other is the cost of using it. Downloading the model costs nothing. Making it run at a usable speed costs money, and a lot of it. Both facts coexist, but the posts you see only tell you the first one. "Open weights under MIT" doesn't pull as many likes as "it's free and it beats the paid model."

What's the best free AI? The question is wrong

It's the search everyone runs, so I'll answer it straight: the best open source AI today (GLM 5.2, DeepSeek, the big Qwens) isn't free to run for almost anyone. The word "free" assumes you already have somewhere to run it. That's the trap.

The open source models that compete with the paid ones are huge. They don't run on your laptop. They run on a tiny fraction of the computers that exist in the world. The small model that does fit on your machine isn't the one you saw winning the rankings.

What it really costs to run GLM 5.2

GLM 5.2 is a mixture-of-experts model with around 750 billion parameters. What it costs to run depends on two things: how much you compress it (and how much quality you give up) and how fast you want it. Here's the real map:

Version	Memory	Typical hardware	Approx. cost	Speed
FP16 (full, no quality loss)	~1,642 GB	2-3 DGX servers (16-24 GPUs)	US$500,000 to 1M+	like a provider
4-bit (decent, near-full quality)	~411 GB	several datacenter GPUs	~US$150,000	good
2-bit (minimum usable, degraded quality)	~240 GB	Mac Studio 256 GB or 4× RTX 4090 rig	~US$10,000	3-6 tokens/s
None (model doesn't fit)	128 GB available	DGX Spark	US$4,000	won't run

Read it from the bottom up. My US$4,000 Spark, bought precisely for this, falls short of the minimum: GLM 5.2 needs around 240 GB and the Spark has 128. It doesn't fit.
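The arithmetic behind that table can be checked in a few lines. This is a minimal sketch, assuming the roughly 750 billion parameters quoted above and counting only the weights (parameters × bits ÷ 8). The table's figures are higher than this floor; the sketch assumes the gap is quantization metadata and runtime buffers, which the article does not break down. The names `weights_gb` and `fits` are illustrative helpers, not part of any library.

```python
PARAMS = 750e9   # approximate parameter count quoted in the article
SPARK_GB = 128   # DGX Spark memory quoted in the article

# Memory figures quoted in the table, in GB. They sit above the raw-weights
# floor; the gap is ASSUMED to be quantization metadata and runtime buffers.
QUOTED_GB = {"FP16": 1642, "4-bit": 411, "2-bit": 240}
BITS_PER_WEIGHT = {"FP16": 16, "4-bit": 4, "2-bit": 2}


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (a lower bound)."""
    return params * bits_per_weight / 8 / 1e9


def fits(required_gb: float, available_gb: float) -> bool:
    """True if the required memory is within what the machine has."""
    return required_gb <= available_gb


for name, bits in BITS_PER_WEIGHT.items():
    floor = weights_gb(PARAMS, bits)
    quoted = QUOTED_GB[name]
    print(f"{name:>5}: weights floor {floor:7.1f} GB, quoted {quoted} GB, "
          f"floor fits in {SPARK_GB} GB: {fits(floor, SPARK_GB)}")
```

Even the weights-only floor at 2 bits comes out at 187.5 GB, already above 128 GB, so no amount of tuning makes the full model fit on that machine.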

The first rung where the model runs is a US$10,000 Mac, compressed to 2-bit (quality drops) and at 3 to 6 tokens per second. At that speed you read faster than the model writes. To get decent quality and good speed you're already in six figures. And to run it the way a provider hands it to you (OpenRouter, Nvidia NIM and friends), at full precision, you're at two or three DGX servers and anywhere from half a million to over a million dollars.
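To put 3 to 6 tokens per second in perspective, here is a rough sketch of how long one answer takes at that speed. The 500-token answer length is a hypothetical value chosen for illustration, and `seconds_to_generate` is a made-up helper name.

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady decode speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


ANSWER_TOKENS = 500  # hypothetical answer length, not a figure from the article

for speed in (3, 6):  # the 2-bit speed range quoted in the table
    secs = seconds_to_generate(ANSWER_TOKENS, speed)
    print(f"{speed} tokens/s -> {secs:.0f} s for {ANSWER_TOKENS} tokens")
```

Under those assumptions a single answer takes somewhere between about a minute and a half and almost three minutes, which is fine for a batch job and painful for a chat.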

And that's just buying the gear. It doesn't include the electricity to keep it on, the hours you spend maintaining it, or the risk that a bigger model ships in a few months and your investment falls short. The provider spreads that datacenter cost across thousands of users and charges you fractions of a dollar per million tokens. You'd pay all of it yourself, with the machine idle most of the day.
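A back-of-the-envelope break-even makes the comparison concrete. This is only a sketch: the US$10,000 hardware price comes from the table, but the monthly token volume and the per-million-token price are hypothetical placeholders, so plug in your own numbers. It also ignores electricity, maintenance and depreciation, all of which push the break-even further out.

```python
def monthly_api_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """API spend per month for a given token volume and price."""
    return tokens_per_month / 1e6 * usd_per_million_tokens


def breakeven_months(hardware_usd: float, monthly_api_usd: float) -> float:
    """Months of API usage that the hardware price alone would pay for."""
    if monthly_api_usd <= 0:
        raise ValueError("monthly_api_usd must be positive")
    return hardware_usd / monthly_api_usd


HARDWARE_USD = 10_000      # cheapest rung in the table
TOKENS_PER_MONTH = 50e6    # HYPOTHETICAL usage, not from the article
USD_PER_MILLION = 0.50     # HYPOTHETICAL price, not a quoted rate

monthly = monthly_api_cost(TOKENS_PER_MONTH, USD_PER_MILLION)
months = breakeven_months(HARDWARE_USD, monthly)
print(f"API: US${monthly:.2f}/month, break-even after {months:.0f} months")
```

With those placeholder values the API costs US$25 a month and the hardware takes 400 months to pay for itself, before counting a single kilowatt-hour.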

DeepSeek, Qwen, GLM: "free" is marketing

The pattern repeats with every launch. An open source model that fights the paid ones drops, and the next day half the internet announces that top-tier AI is now free. DeepSeek went through this. The big Qwens too.

The part that doesn't make it into the video: to run those models at a usable speed, you need a hardware investment that would by itself cover several years of API usage. The open license doesn't remove that cost. It just changes who pays it: instead of the model provider, it's you, in gear.

The pattern shows up in dozens of posts from creators talking about models they never ran. The formula is always the same: an epic image, a vault swinging open and the model flowing out toward a desktop computer, and a headline like "the best coding model is no longer rented, it's open source." Sounds incredible. It has two problems.

First: that desktop computer with one graphics card doesn't run GLM 5.2, not even close. The image draws something that can't be done. Second: "no longer rented" is exactly backwards. You keep renting it through an API, because you have nowhere to run it. The vault opened, yes, but what's inside only fits in a machine that costs US$10,000 or more. Open isn't the same as accessible. For most people, that "freed" model stays as far away as the paid closed one.

So what do I run locally? And for what

I'm not talking about this from the outside. I use open source models every day, in my operations and my own work. On the Spark I run Gemma 4 and Qwen 3.6, smaller models that do fit. They work well. But the Spark's memory bandwidth caps their tokens per second below what a live conversation needs.
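Why bandwidth is the limit: as a common rule of thumb, generating each token means streaming the model's active weights from memory once, so decode speed can't exceed bandwidth divided by the size of those weights. The sketch below applies that rule. The 273 GB/s figure is an assumed bandwidth for the Spark (check it against the spec sheet) and the 30 GB of weights is a hypothetical model size, neither comes from this article. Real speeds land below this ceiling.

```python
def max_decode_tokens_per_second(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Rule-of-thumb ceiling: one full pass over the active weights per token."""
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return bandwidth_gb_s / active_weights_gb


BANDWIDTH_GB_S = 273.0   # ASSUMED memory bandwidth, not from the article
WEIGHTS_GB = 30.0        # HYPOTHETICAL size of the weights read per token

ceiling = max_decode_tokens_per_second(BANDWIDTH_GB_S, WEIGHTS_GB)
print(f"Upper bound: {ceiling:.1f} tokens/s")
```

The useful part is the shape of the formula: doubling the memory doesn't help speed at all, while halving the bytes read per token (a smaller or more compressed model) doubles the ceiling.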

So I give them the work where speed doesn't matter: my agents, processes I leave running overnight, and my own AI model benchmark. For that they're perfect and I pay no API fees.

And when I need an open source model with real speed (almost all my n8n automations), I run it through an API on Ollama Cloud. Open source, yes. Free, no. That's the point that gets lost: open source doesn't mean you don't pay. It means you choose where you pay, between your own hardware or an API. Same when I use Claude Code connected to open source models: the model is open, someone provides the compute and someone pays for it.

When does running a model locally actually make sense?

When the reason is your data, not your wallet. If you handle sensitive information and you don't want it leaving your machine, running the model at home makes all the sense in the world. Privacy and control are the honest argument for self-hosting. Savings aren't. When someone sells you local because it's cheap, be suspicious. When they sell it to you for privacy, listen. That's the filter.

The final proof is in the OpenRouter ranking

If running GLM 5.2 free at home were practical, nobody would pay to use it. Look at the OpenRouter usage ranking: GLM 5.2 is among the most-used models on the platform at the end of June 2026. And OpenRouter is a paid service, where you call the model through an API and they charge you per token.

In other words: even the people who love GLM 5.2 pay to use it. Because that's what makes sense. The model is excellent. "Free" is the made-up part.

Before you share the next "it's free"

Let me be clear: I'm glad open source is this good. That's why I have the Spark, that's why I test every model that ships, that's why my stack is split across several models depending on the task. This isn't against open source. It's against the smoke.

Next time you see "this model is free and it beats the paid one," ask yourself two questions before hitting share. How much does the computer that runs it at a usable speed cost? And do I need it answering live, or does running overnight work for me? With those two answers you decide for real, with your case and your budget. Not with the excitement of someone who never opened a terminal.

I write about AI and test models at cristiantala.com. Every month I publish a benchmark of 89 models.

DEV Community