Four Chinese LLMs Walk Into My Terminal: An Open Source Story
I'll admit it — I'm the kind of person who reads LICENSE files before I read README files. When Anthropic dropped Claude 3.5 Sonnet, I applauded the benchmarks and then immediately asked "is it open weights?" When OpenAI unveiled GPT-4o with its silky multimodal demos, I felt the same twinge of frustration that comes with every proprietary release. Walled gardens are beautiful on the outside, but I refuse to rent my intelligence.
So when Chinese labs started releasing model after model under Apache 2.0 and MIT licenses, my ears perked up. DeepSeek, Qwen, Kimi, and GLM aren't just alternatives to Western giants — they're freedom-flavored alternatives that you can actually inspect, fine-tune, and self-host if you're brave enough. I've spent the last two months hammering these models through Global API's unified endpoint, and I want to share what I found. Spoiler: the underdogs punch way above their price tag.
Why I Care About This Stack
I don't benchmark models for fun. I run a small side project that does code review, document summarization, and the occasional customer support agent. My monthly invoice from OpenAI used to make me wince. Then I discovered I could route the same OpenAI-compatible calls to Chinese providers and get equivalent (sometimes better) results for a fraction of the cost.
The walled garden I escaped had a price tag. The open meadow I wandered into had the same views and a tenth of the subscription. Let me walk you through what I learned.
The Four Horsemen (of Affordability)
Each of these labs publishes under permissive licenses — most of their model weights carry Apache 2.0, with some under MIT. That's the philosophical foundation I want to start with. When I'm shipping a startup, I want the freedom to:
- Modify the model weights if I really need to
- Self-host for compliance reasons
- Never get rug-pulled by a sudden API deprecation
All four of these providers allow that in spirit, even if you happen to be calling their hosted endpoints through a unified gateway like Global API. Below is the pricing landscape I compiled from my own usage logs.
| Feature | DeepSeek | Qwen | Kimi | GLM |
|---|---|---|---|---|
| Developer | DeepSeek (幻方) | Alibaba (阿里) | Moonshot AI (月之暗面) | Zhipu AI (智谱) |
| Price Range | $0.25–$2.50/M | $0.01–$3.20/M | $3.00–$3.50/M | $0.01–$1.92/M |
| Best Budget Model | V4 Flash @ $0.25/M | Qwen3-8B @ $0.01/M | N/A (all premium) | GLM-4-9B @ $0.01/M |
| Best Overall | V4 Flash @ $0.25/M | Qwen3-32B @ $0.28/M | K2.5 @ $3.00/M | GLM-5 @ $1.92/M |
| Code Generation | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Chinese Language | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| English Language | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Reasoning | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Vision/Multimodal | Limited | ✅ (VL, Omni) | ❌ | ✅ (GLM-4.6V) |
| Context Window | Up to 128K | Up to 128K | Up to 128K | Up to 128K |
| API Compatibility | OpenAI ✅ | OpenAI ✅ | OpenAI ✅ | OpenAI ✅ |
DeepSeek — The People's Champion
I have a soft spot for DeepSeek, and I'm not ashamed of it. The lab publishes research papers, ships open weights, and prices its flagship model like it's trying to bankrupt the competition. V4 Flash, its general-purpose daily driver, costs me $0.25 per million output tokens. Let that sink in. The cheapest GPT-4o-class model I could find from a Western vendor costs roughly an order of magnitude more.
Model Menu
| Model | Output $/M | What I Use It For |
|---|---|---|
| V4 Flash | $0.25 | Daily coding, drafts, summaries |
| V3.2 | $0.38 | Latest architecture experiments |
| V4 Pro | $0.78 | Production workloads that need polish |
| R1 (Reasoner) | $2.50 | Math proofs, logic puzzles, hard planning |
| Coder | $0.25 | Pure code completion tasks |
What I Love
V4 Flash hits ~60 tokens per second in my benchmarks, which makes it feel snappier than certain closed-source competitors I won't name. It scores competitively on HumanEval and MBPP — two benchmarks I personally double-check because code generation is the single most important capability for my workflow. English output quality is on par with GPT-4-class models I've used before.
Where It Frustrates Me
DeepSeek doesn't ship native vision yet, and that's a real limitation when I'm building anything that needs to look at images. The Chinese-language performance is good but not best-in-class — GLM and Kimi edge it out on certain linguistic nuances. And the lineup is lean. I sometimes want a model between "fast" and "reasoning" and have to jump pricing tiers.
Here's how I call it through the unified endpoint:
from openai import OpenAI

# Same OpenAI client as always; only the key and base_url point at the gateway.
client = OpenAI(
    api_key="ga_xxxxxxxxxxxx",
    base_url="https://global-apis.com/v1",
)

# The request shape is unchanged; the model name selects the vendor.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain quantum computing in 100 words"}],
)
print(response.choices[0].message.content)
That tiny base_url change is the open source revolution in API form. Same client, same request shape, completely different vendor behind the curtain.
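To make the vendor swap concrete, here is a minimal sketch of the request shape, assuming the gateway accepts the same chat-completions payload for every model. `build_request` is a hypothetical helper, and every model identifier other than `deepseek-v4-flash` is a placeholder for illustration only; check the gateway's model list for the real names.

```python
def build_request(model: str, prompt: str) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# Only "deepseek-v4-flash" appears earlier in this post; the other three
# identifiers are hypothetical placeholders, not confirmed model names.
MODEL_IDS = ["deepseek-v4-flash", "qwen3-32b", "kimi-k2.5", "glm-5"]

prompt = "Explain quantum computing in 100 words"
requests = [build_request(model_id, prompt) for model_id in MODEL_IDS]

for req in requests:
    # Everything except the model name is identical across vendors.
    print(req["model"], len(req["messages"]))
```

Each dict can then be passed along as `client.chat.completions.create(**req)`, using the client configured above.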
Qwen — The Swiss Army Knife I Keep Coming Back To
Alibaba's Qwen team ships models at a pace that makes my head spin. Every few weeks there's a new variant — and most of them land under Apache 2.0 or similar permissive terms. If DeepSeek is a scalpel, Qwen is a workshop with fifty tools in it.
Model Lineup
| Model | Output $/M | Sweet Spot |
|---|---|---|
| Qwen3-8B | $0.01 | Ultra-cheap classification, routing |
| Qwen3-32B | $0.28 | My default general-purpose model |
| Qwen3-Coder-30B | $0.35 | Code-heavy workloads |
| Qwen3-VL-32B | $0.52 | Image understanding tasks |
| Qwen3-Omni-30B | $0.52 | Audio + video + image in one shot |
| Qwen3.5-397B | $2.34 | Heavy reasoning for enterprise |
That price floor at $0.01 per million tokens for Qwen3-8B is almost comical. I use it for log classification, intent detection, and other jobs where I'm routing thousands of requests and the response is either "yes" or "no". Spending about $0.10 to process ten thousand requests, even at a generous 1,000 tokens each, makes me giddy.
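The arithmetic behind that figure can be sketched in a few lines. This is a rough estimate, not a billing formula: it assumes 1,000 output tokens per request for the generous case and 5 for a terse yes/no answer (both numbers are my assumptions), and it ignores input tokens because the table only lists output prices. `batch_cost_usd` is a hypothetical helper name.

```python
def batch_cost_usd(num_requests: int, tokens_per_request: int,
                   price_per_million: float) -> float:
    """Estimate output-token cost in USD for a batch of requests."""
    total_tokens = num_requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million


QWEN3_8B_PRICE = 0.01  # USD per million output tokens, from the table above

# Assumed token counts: 1,000 is deliberately generous, 5 fits a yes/no reply.
generous = batch_cost_usd(10_000, 1_000, QWEN3_8B_PRICE)
terse = batch_cost_usd(10_000, 5, QWEN3_8B_PRICE)

print(f"{generous:.4f}")  # 0.1000
print(f"{terse:.6f}")     # 0.000500
```

For genuinely one-word answers, the output side of the bill is closer to a rounding error than to a dime.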
What I Love
The range. No other Chinese lab covers every budget tier the way Qwen does: I can pick $0.01 for trivial work, $0.28 for general tasks, and $2.34 for the heavy lifting. Their VL and Omni models handle images and audio natively, which means I don't need separate vendors for multimodal. Most of the lineup ships under MIT or Apache terms, which means I can in principle fine-tune or self-host if my bills ever justify it.
Where It Frustrates Me
The naming is a mess. Qwen3-8B, Qwen3-32B, Qwen3.5-397B, Qwen3-Omni-30B, Qwen3-Coder-30B — I have a sticky note on my monitor with model identifiers because I keep getting them mixed up. Qwen3.6-35B at $1/M is one I personally find overpriced for what it delivers. And while English quality is good, it's not quite the polished output I get from DeepSeek V4 Flash in my side-by-side tests.
Kimi — The Reasoner I Send My Hardest Problems To
Kimi from Moonshot AI plays a different game. There's no $0.01 budget tier here. The cheapest model is $3.00/M, and the ceiling reaches $3.50/M. You're paying for raw reasoning horsepower, not token economics.
My Usage Pattern
I treat Kimi like a specialist consultant. I'll send my gnarliest multi-step planning problems — "design a migration plan for a 50-million-row database that minimizes downtime" — and let it chew through the logic. K2.5 at $3.00/M is my go-to. The reasoning score in that table above (⭐⭐⭐⭐⭐) isn't charity; it's earned.
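The specialist-consultant pattern boils down to a small routing table. The sketch below is illustrative only: the task labels and the `Route` and `pick_route` names are hypothetical, the model names are display names from the tables in this post rather than API identifiers, and the prices are the output prices quoted there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str           # display name from the tables, not an API identifier
    output_price: float  # USD per million output tokens


# Hypothetical task labels mapped to the models this post uses for them.
ROUTES = {
    "classification": Route("Qwen3-8B", 0.01),
    "coding": Route("V4 Flash", 0.25),
    "planning": Route("K2.5", 3.00),
}
DEFAULT_TASK = "coding"  # the cheap daily driver handles anything unlabeled


def pick_route(task: str) -> Route:
    """Return the route for a task, falling back to the daily driver."""
    return ROUTES.get(task, ROUTES[DEFAULT_TASK])


specialist = pick_route("planning")
daily = pick_route("summaries")  # unknown label, so it falls back to V4 Flash

print(specialist.model, daily.model)                  # K2.5 V4 Flash
print(specialist.output_price / daily.output_price)   # 12.0
```

The 12x price gap is why only the gnarly problems get escalated.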
Open Source Status
Kimi's situation is more nuanced — they publish weights for some models under modified MIT-style agreements but reserve rights around commercial-scale distillation. That bothers me philosophically because it edges toward walled garden territory, but the API endpoints remain OpenAI-compatible and the results are excellent. If you're pure open source, weigh this carefully. If you're pragmatically open, the reasoning quality might be worth the compromise.
Where It Frustrates Me
Speed is the tradeoff. Kimi thinks harder, which means I get longer time-to-first-token on streaming responses. No vision capabilities either — pure text. And the pricing means I can't just casually fire off requests like I do with DeepSeek.
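Time-to-first-token is easy to measure on any streamed response. The sketch below times a generic iterator of chunks; `measure_stream` and `fake_stream` are hypothetical names, and the fake stream simply sleeps to stand in for a model that thinks before answering. With the OpenAI Python client, the iterator would be whatever a `stream=True` call returns.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Return (time_to_first_chunk, total_time, chunk_count), times in seconds."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            # The first chunk has arrived; record the latency once.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # An empty stream has no first chunk, so report the total time instead.
    return (first if first is not None else total), total, count


def fake_stream(think_seconds: float, n_chunks: int) -> Iterator[str]:
    """Stand-in for a streaming response: pause, then emit chunks."""
    time.sleep(think_seconds)
    for i in range(n_chunks):
        yield f"chunk-{i}"


ttft, total, count = measure_stream(fake_stream(0.05, 3))
print(f"first chunk after {ttft:.3f}s, {count} chunks in {total:.3f}s")
```

Running the same prompt through two models with this wrapper gives a like-for-like latency comparison rather than a gut feeling.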
GLM — The Quiet Polyglot
Zhipu AI's GLM family feels like the underdog lab nobody talks about at dinner parties but everybody secretly respects. They've been around since the early Chinese LLM era and they keep iterating.
Model Selection
| Model | Output $/M | When I Reach for It |
|---|---|---|
| GLM-4-9B | $0.01 | Tiny classification jobs |
| GLM-5 | $1.92 | Chinese-heavy production work |
| GLM-4.6V | varies | Image understanding, multimodal |
GLM-4-9B at $0.01 is genuinely competitive with Qwen3-8B for similar tasks. But where GLM pulls ahead is Chinese-language quality. In my star ratings they're tied with Kimi for the top spot, and in my anecdotal tests with native Chinese speakers reviewing outputs, GLM-5 produces noticeably more natural phrasing for mainland readers.
Open Source Status
GLM weights are available, mostly under terms aligned with Apache 2.0 depending on the variant. The lab has done open releases, even if they fly under the radar compared to the Qwen hype train. The multimodal GLM-4.6V is a capability I appreciate since vision is a non-negotiable for some of my projects.
Where It Frustrates Me
Code generation scores are slightly below the other three in my testing — still good, but if I'm doing heavy software engineering work I gravitate to Deep