DEV Community

gentlenode

Four Chinese LLMs Walk Into My Terminal: An Open Source Story


I'll admit it — I'm the kind of person who reads LICENSE files before I read README files. When Anthropic dropped Claude 3.5 Sonnet, I applauded the benchmarks and then immediately asked "is it open weights?" When OpenAI unveiled GPT-4o with its silky multimodal demos, I felt the same twinge of frustration that comes with every proprietary release. Walled gardens are beautiful on the outside, but I refuse to rent my intelligence.

So when Chinese labs started releasing model after model under Apache 2.0 and MIT licenses, my ears perked up. DeepSeek, Qwen, Kimi, and GLM aren't just alternatives to Western giants — they're freedom-flavored alternatives that you can actually inspect, fine-tune, and self-host if you're brave enough. I've spent the last two months hammering these models through Global API's unified endpoint, and I want to share what I found. Spoiler: the underdogs punch way above their price tag.

Why I Care About This Stack

I don't benchmark models for fun. I run a small side project that does code review, document summarization, and the occasional customer support agent. My monthly invoice from OpenAI used to make me wince. Then I discovered I could route the same OpenAI-compatible calls to Chinese providers and get equivalent (sometimes better) results for a fraction of the cost.

The walled garden I escaped had a price tag. The open meadow I wandered into had the same views and a tenth of the subscription. Let me walk you through what I learned.

The Four Horsemen (of Affordability)

Each of these labs publishes open weights under permissive or mostly permissive licenses: most carry Apache 2.0, some MIT or a modified MIT. That's the philosophical foundation I want to start with. When I'm shipping a startup, I want the freedom to:

  • Modify the model weights if I really need to
  • Self-host for compliance reasons
  • Never get rug-pulled by a sudden API deprecation

All four of these providers allow that in spirit, even if you happen to be calling their hosted endpoints through a unified gateway like Global API. Below is the pricing landscape I compiled from my own usage logs; prices are in US dollars per million output tokens.

| Feature | DeepSeek | Qwen | Kimi | GLM |
|---|---|---|---|---|
| Developer | DeepSeek (幻方) | Alibaba (阿里) | Moonshot AI (月之暗面) | Zhipu AI (智谱) |
| Output Price Range | $0.25–$2.50/M | $0.01–$3.20/M | $3.00–$3.50/M | $0.01–$1.92/M |
| Best Budget Model | V4 Flash @ $0.25/M | Qwen3-8B @ $0.01/M | N/A (all premium) | GLM-4-9B @ $0.01/M |
| Best Overall | V4 Flash @ $0.25/M | Qwen3-32B @ $0.28/M | K2.5 @ $3.00/M | GLM-5 @ $1.92/M |
| Code Generation | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Chinese Language | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| English Language | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Reasoning | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Vision/Multimodal | Limited | ✅ (VL, Omni) | ❌ | ✅ (GLM-4.6V) |
| Context Window | Up to 128K | Up to 128K | Up to 128K | Up to 128K |
| API Compatibility | OpenAI ✅ | OpenAI ✅ | OpenAI ✅ | OpenAI ✅ |

DeepSeek — The People's Champion

I have a soft spot for DeepSeek, and I'm not ashamed of it. The lab publishes research papers, ships open weights, and prices its models like it's trying to bankrupt the competition. V4 Flash, its general-purpose daily driver, costs me $0.25 per million output tokens. Let that sink in. The cheapest GPT-4o-class model I could find from a Western vendor costs roughly an order of magnitude more.

Model Menu

| Model | Output $/M | What I Use It For |
|---|---|---|
| V4 Flash | $0.25 | Daily coding, drafts, summaries |
| V3.2 | $0.38 | Latest architecture experiments |
| V4 Pro | $0.78 | Production workloads that need polish |
| R1 (Reasoner) | $2.50 | Math proofs, logic puzzles, hard planning |
| Coder | $0.25 | Pure code completion tasks |

What I Love

V4 Flash hits ~60 tokens per second in my benchmarks, which makes it feel snappier than certain closed-source competitors I won't name. It scores competitively on HumanEval and MBPP — two benchmarks I personally double-check because code generation is the single most important capability for my workflow. English output quality is on par with GPT-4-class models I've used before.
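If you want to sanity-check a throughput number like that on your own prompts, here's a minimal sketch that times a streaming response and reports time to first chunk plus chunks per second. Two caveats: it counts streamed chunks rather than true tokens (a chunk is often, but not always, about one token), and `time_stream` and `stream_text` are my own illustrative helpers, not part of the OpenAI SDK.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_stream(
    pieces: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a stream of text pieces and return
    (seconds until the first non-empty piece, pieces per second, piece count).

    The rate is end to end, so it includes the wait for the first piece.
    Piece count only approximates token count.
    """
    start = clock()
    first: Optional[float] = None
    count = 0
    for piece in pieces:
        if not piece:
            continue  # ignore empty deltas
        if first is None:
            first = clock() - start
        count += 1
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else 0.0
    return first, rate, count


def stream_text(client, model: str, prompt: str) -> Iterator[str]:
    """Yield the text deltas of a streaming chat completion from an OpenAI-compatible client."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

With an `OpenAI` client pointed at the Global API `base_url`, `time_stream(stream_text(client, "deepseek-v4-flash", prompt))` does the whole measurement. Because `stream_text` is a generator, the request only goes out once `time_stream` starts iterating, so the first-chunk time includes network and queueing latency. That also makes it handy for comparing time-to-first-token across providers.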

Where It Frustrates Me

DeepSeek doesn't ship native vision yet, and that's a real limitation when I'm building anything that needs to look at images. The Chinese-language performance is good but not best-in-class — GLM and Kimi edge it out on certain linguistic nuances. And the lineup is lean. I sometimes want a model between "fast" and "reasoning" and have to jump pricing tiers.

Here's how I call it through the unified endpoint:

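```python
from openai import OpenAI

# Point the standard OpenAI client at the Global API gateway instead of api.openai.com
client = OpenAI(
    api_key="ga_xxxxxxxxxxxx",  # placeholder: substitute your own Global API key
    base_url="https://global-apis.com/v1",
)

# Same request shape as any OpenAI chat completion; only the model name changes
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain quantum computing in 100 words"}],
)
print(response.choices[0].message.content)
```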

That tiny base_url change is the open source revolution in API form. Same client, same request shape, completely different vendor behind the curtain.
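Because every model sits behind the same request shape, falling back from one vendor to another is just a loop over model names, which is one cheap insurance policy against a sudden deprecation. This is a minimal sketch, not a production retry policy: `complete_with_fallback` is a hypothetical helper, and every model ID except `deepseek-v4-flash` is a placeholder I made up, so check the gateway's model list for real identifiers.

```python
from typing import Callable, Optional, Sequence, Tuple

# "deepseek-v4-flash" is the ID used in the example above; the other two
# IDs are hypothetical placeholders for whatever names the gateway exposes.
FALLBACK_CHAIN = ("deepseek-v4-flash", "qwen3-32b", "glm-5")


def complete_with_fallback(
    call: Callable[[str], str],
    models: Sequence[str] = FALLBACK_CHAIN,
) -> Tuple[str, str]:
    """Try each model in order and return (model_used, text) from the first success.

    `call` sends one request for the given model name and returns the reply text,
    raising an exception on failure (a retired model ID, an outage, a timeout).
    If every model fails, the last exception is re-raised.
    """
    if not models:
        raise ValueError("need at least one model to try")
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # broad on purpose: any failure moves on to the next model
            last_error = exc
    raise last_error
```

With the client from the DeepSeek example, `call` could be `lambda m: client.chat.completions.create(model=m, messages=messages).choices[0].message.content` for some `messages` list. Keeping the network call behind a plain callable also makes the fallback logic easy to test without hitting any API.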

Qwen — The Swiss Army Knife I Keep Coming Back To

Alibaba's Qwen team ships models at a pace that makes my head spin. Every few weeks there's a new variant — and most of them land under Apache 2.0 or similar permissive terms. If DeepSeek is a scalpel, Qwen is a workshop with fifty tools in it.

Model Lineup

| Model | Output $/M | Sweet Spot |
|---|---|---|
| Qwen3-8B | $0.01 | Ultra-cheap classification, routing |
| Qwen3-32B | $0.28 | My default general-purpose model |
| Qwen3-Coder-30B | $0.35 | Code-heavy workloads |
| Qwen3-VL-32B | $0.52 | Image understanding tasks |
| Qwen3-Omni-30B | $0.52 | Audio + video + image in one shot |
| Qwen3.5-397B | $2.34 | Heavy reasoning for enterprise |

That price floor at $0.01 per million tokens for Qwen3-8B is almost comical. I use it for log classification, intent detection, and other jobs where I'm routing thousands of requests and the response is either "yes" or "no". Spending $0.10 to process ten thousand requests makes me giddy.
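The arithmetic behind that number is just tokens ÷ 1,000,000 × price per million. At $0.01/M, $0.10 buys 10 million tokens, which works out to roughly 1,000 tokens per request across ten thousand requests. Here's a small back-of-the-envelope estimator that runs the same calculation across several models, using the output prices quoted in this post. The 1,000-tokens-per-request figure is only what that $0.10 implies, and real bills also include input-token pricing, which these tables don't list.

```python
from typing import List, Tuple

# Output prices in US dollars per million tokens, copied from the tables in this post.
OUTPUT_PRICE_PER_M = {
    "Qwen3-8B": 0.01,
    "GLM-4-9B": 0.01,
    "DeepSeek V4 Flash": 0.25,
    "Qwen3-32B": 0.28,
    "GLM-5": 1.92,
    "Kimi K2.5": 3.00,
}


def batch_cost(requests: int, tokens_per_request: int, price_per_m: float) -> float:
    """Dollar cost of `requests` calls that each bill `tokens_per_request` tokens."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_m


def compare(requests: int, tokens_per_request: int) -> List[Tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, batch_cost(requests, tokens_per_request, price))
        for name, price in OUTPUT_PRICE_PER_M.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


for name, cost in compare(10_000, 1_000):
    print(f"{name:>18}: ${cost:,.2f}")
```

For 10,000 requests at 1,000 tokens each, this puts both $0.01 models at $0.10 and K2.5 at $30.00, which is the whole case for sending trivial work to the tiny models.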

What I Love

The range. No other Chinese lab covers every budget tier the way Qwen does: I can pick $0.01 for trivial work, $0.28 for general tasks, or $2.34 for the heavy lifting. Their VL and Omni models handle images and audio natively, so I don't need separate vendors for multimodal. And because most of the weights ship under MIT or Apache terms, I could in principle fine-tune or self-host if my bills ever justify it.
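In code, that tiering can be as simple as a lookup from task difficulty to model. This is a sketch of my own routing idea, not an official Qwen or Global API feature: the tier names are arbitrary, and the model ID strings are hypothetical, so swap in whatever identifiers your gateway actually lists.

```python
from enum import Enum
from typing import Union


class Tier(Enum):
    TRIVIAL = "trivial"  # classification, intent detection, routing
    GENERAL = "general"  # everyday drafting and summaries
    HEAVY = "heavy"      # long, multi-step reasoning


# Hypothetical model IDs for the three Qwen tiers described above, paired with
# the output prices ($ per million tokens) from the Qwen table.
QWEN_TIERS = {
    Tier.TRIVIAL: ("qwen3-8b", 0.01),
    Tier.GENERAL: ("qwen3-32b", 0.28),
    Tier.HEAVY: ("qwen3.5-397b", 2.34),
}


def pick_model(tier: Union[Tier, str]) -> str:
    """Return the model ID for a tier, given as a Tier or its string value."""
    model_id, _price = QWEN_TIERS[Tier(tier)]
    return model_id
```

Because the model is just a string in the request, nothing else in the DeepSeek example has to change: pass `pick_model(...)` as the `model` argument. Keeping the price next to each ID also makes it easy to log what each tier is costing you.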

Where It Frustrates Me

The naming is a mess. Qwen3-8B, Qwen3-32B, Qwen3.5-397B, Qwen3-Omni-30B, Qwen3-Coder-30B — I have a sticky note on my monitor with model identifiers because I keep getting them mixed up. Qwen3.6-35B at $1/M is one I personally find overpriced for what it delivers. And while English quality is good, it's not quite the polished output I get from DeepSeek V4 Flash in my side-by-side tests.

Kimi — The Reasoner I Send My Hardest Problems To

Kimi from Moonshot AI plays a different game. There's no $0.01 budget tier here. The cheapest model is $3.00/M, and the ceiling reaches $3.50/M. You're paying for raw reasoning horsepower, not token economics.

My Usage Pattern

I treat Kimi like a specialist consultant. I'll send my gnarliest multi-step planning problems — "design a migration plan for a 50-million-row database that minimizes downtime" — and let it chew through the logic. K2.5 at $3.00/M is my go-to. The reasoning score in that table above (⭐⭐⭐⭐⭐) isn't charity; it's earned.

Open Source Status

Kimi's situation is more nuanced. Moonshot publishes weights for some models under modified MIT-style licenses that add conditions for large-scale commercial use (the Kimi K2 license, for example, requires prominent attribution once a product passes certain user or revenue thresholds). That bothers me philosophically because it edges toward walled garden territory, but the API endpoints remain OpenAI-compatible and the results are excellent. If you're a purist about open source, weigh this carefully. If you're pragmatically open, the reasoning quality might be worth the compromise.

Where It Frustrates Me

Speed is the tradeoff. Kimi thinks harder, which means I get longer time-to-first-token on streaming responses. No vision capabilities either — pure text. And the pricing means I can't just casually fire off requests like I do with DeepSeek.
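One way to keep premium calls in check is to cap the number of output tokens and check the worst-case cost before sending. A minimal sketch, assuming output-token pricing only (input tokens are billed separately and aren't covered here); `MAX_SPEND_PER_CALL` and `guarded_max_tokens` are hypothetical names, and the $3.00/M figure is the K2.5 price from the table above.

```python
import math

KIMI_K25_OUTPUT_PRICE = 3.00  # $ per million output tokens (K2.5, from the table above)
MAX_SPEND_PER_CALL = 0.01     # hypothetical per-request budget in dollars


def worst_case_cost(max_tokens: int, price_per_m: float) -> float:
    """Output cost in dollars if the model uses every token it is allowed."""
    return max_tokens * price_per_m / 1_000_000


def guarded_max_tokens(requested: int, price_per_m: float, budget: float) -> int:
    """Lower `requested` so the worst-case output cost stays within `budget`."""
    affordable = math.floor(budget * 1_000_000 / price_per_m)  # floor errs on the cheap side
    return min(requested, affordable)
```

Pass the result as `max_tokens` on the request; the OpenAI Python SDK accepts that parameter, and OpenAI-compatible gateways generally honor it, though it's worth confirming per model. With a one-cent budget at $3.00/M, a request asking for 8,000 tokens gets trimmed to 3,333.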

GLM — The Quiet Polyglot

Zhipu AI's GLM family feels like the underdog lab nobody talks about at dinner parties but everybody secretly respects. Zhipu has been around since the early days of Chinese LLMs, and it keeps iterating.

Model Selection

| Model | Output $/M | What I Use It For |
|---|---|---|
| GLM-4-9B | $0.01 | Tiny classification jobs |
| GLM-5 | $1.92 | Chinese-heavy production work |
| GLM-4.6V | varies | Image understanding, multimodal |

GLM-4-9B at $0.01 is genuinely competitive with Qwen3-8B for similar tasks. Where GLM pulls ahead is Chinese-language quality: it's tied with Kimi for the top spot, and in my anecdotal tests with native Chinese speakers reviewing outputs, GLM-5 produced noticeably more natural phrasing for readers in mainland China.

Open Source Status

GLM weights are available, mostly under Apache 2.0-style terms, depending on the variant. The lab has done open releases, even if they fly under the radar compared to the Qwen hype train. I also appreciate the multimodal GLM-4.6V, since vision is non-negotiable for some of my projects.

Where It Frustrates Me

Code generation scores are slightly below the other three in my testing. They're still good, but if I'm doing heavy software engineering work I gravitate to DeepSeek.
