purecast

Posted on Jun 26

I Tested Every Chinese AI Model and Honestly? I'm Mind-Blown

#programming #deepseek #machinelearning #tutorial


Okay so I need to tell you about the rabbit hole I fell into last month. I'm a recent bootcamp grad — like, graduated in March, still figuring out half of what I learned — and I've been building side projects like crazy. About three weeks ago I hit a wall. My GPT bill was getting silly. Like, "do I really need to pay this much to summarize a Reddit thread?" silly. That's when a friend in a Discord server said: "Dude, have you tried the Chinese models?"

I had no idea what he was talking about. DeepSeek? Qwen? Kimi? GLM? I'd barely heard of them. But he pointed me toward Global API, which apparently lets you call all of these through one endpoint. I was skeptical, and I had no idea how different the pricing would be. Let me walk you through what I found, because honestly, some of this blew my mind.

My "Wait, WHAT?" Moment

The first thing I did was open a spreadsheet like the obsessive person I am. I started listing every model and its price. And then I just... stared. I had no idea you could get a model that reportedly performs near GPT-4o levels for $0.25 per million output tokens. That's a quarter for roughly 750,000 words of AI output (a token is about three-quarters of an English word), which is less than any fancy coffee. I was shocked. Genuinely shook.

Here are the rough pricing ranges I gathered across the four big Chinese model families:

DeepSeek: $0.25 to $2.50 per million output tokens
Qwen: $0.01 to $3.20 per million output tokens
Kimi: $3.00 to $3.50 per million output tokens
GLM: $0.01 to $1.92 per million output tokens
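To sanity-check those ranges, here's a tiny Python sketch that turns a per-million price into dollars. It only covers output tokens (input tokens are billed separately and are ignored here), the price table is just the rough ranges above rather than an official price sheet, and the 10M-tokens-a-month volume is a made-up example.

```python
# Rough output-cost estimator using the price ranges listed above.
# Prices are USD per 1M *output* tokens; input-token charges are ignored.
PRICE_RANGES_PER_M = {
    "DeepSeek": (0.25, 2.50),
    "Qwen": (0.01, 3.20),
    "Kimi": (3.00, 3.50),
    "GLM": (0.01, 1.92),
}


def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` output tokens at a $/1M-token price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    tokens = 10_000_000  # hypothetical monthly output volume for a side project
    for family, (low, high) in PRICE_RANGES_PER_M.items():
        print(f"{family}: ${output_cost(tokens, low):.2f} to ${output_cost(tokens, high):.2f}")
```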

Look at Qwen's floor. One cent per million tokens. I had to double-check that. I thought it was a typo. It's not. More on that later.

How I Actually Tested Them

I didn't want to just parrot benchmark numbers. I'm a learner; I need to see things with my own eyes. So I set up a handful of side-by-side tests using Global API's unified endpoint. It's OpenAI-compatible, which meant I barely had to change my code between models. Honestly, this part was the easiest: switching models is just a matter of changing the model name.

Here's a snippet of what I was running for most tests:

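from openai import OpenAI

# One client for every model family: point the OpenAI SDK at Global API's endpoint
client = OpenAI(
    api_key="ga_your_key_here",  # placeholder; use your own Global API key
    base_url="https://global-apis.com/v1"
)

# Swap this model name to try a different family; nothing else changes
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain quantum computing in 100 words"}]
)
print(response.choices[0].message.content)  # the model's reply text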

That's it. That base URL is the same for all four families. I just swap the model name. If you're a bootcamp grad like me reading this — yes, it's that easy. You don't need four different SDKs or four accounts. One client, four families of models. Wild.
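Since only the model string changes, it's easy to wrap the call in a helper and fire the same prompt at several families. This is a sketch: `ask` and `ask_all` are hypothetical helper names, and apart from `deepseek-v4-flash`, the model IDs in the list are guesses, so check Global API's model list for the real ones.

```python
# Sketch: one OpenAI-compatible client, many models; only `model` changes.
# `client` is expected to be an OpenAI(...) client like the one above
# (anything with a chat.completions.create method works).

# Hypothetical IDs: only "deepseek-v4-flash" appears in this post.
MODELS = ["deepseek-v4-flash", "qwen3-32b", "kimi-k2.5", "glm-5"]


def ask(client, model: str, prompt: str) -> str:
    """Send a single-turn prompt to `model` and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def ask_all(client, prompt: str, models=MODELS) -> dict:
    """Ask every model the same question; returns {model_id: reply}."""
    return {model: ask(client, model, prompt) for model in models}
```

With the client from the snippet above, `ask_all(client, "Explain quantum computing in 100 words")` returns one answer per model.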

I gave each model the same prompts: a coding task, a Chinese translation task, a reasoning puzzle, a creative writing prompt, and an image description (where supported). I graded them on vibes, basically, plus correctness. Here's what I found.
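The side-by-side runs can be organized as a grid of (model, task) rows dumped to CSV for grading in a spreadsheet. This is a sketch under assumptions: the prompts are illustrative stand-ins (the post doesn't give its exact wording), `call_model` is any function you supply that sends one prompt to one model, and the image task is left out because it only applies to vision-capable models and needs an image attached.

```python
import csv
import io

# Illustrative stand-in prompts; the exact wording used in the post isn't given.
TASKS = {
    "coding": "Write a Python function to merge two sorted lists",
    "translation": "Translate into Chinese: 'The store closes at 9 pm tonight.'",
    "reasoning": ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                  "more than the ball. How much does the ball cost?"),
    "creative": "Write a four-line poem about rain on a city street.",
}

FIELDS = ["model", "task", "answer", "vibes", "correct"]


def run_suite(call_model, models):
    """Send every task to every model and collect one row per (model, task).

    `call_model(model, prompt) -> str` is supplied by the caller, e.g. a thin
    wrapper around client.chat.completions.create.
    """
    rows = []
    for model in models:
        for task, prompt in TASKS.items():
            rows.append({
                "model": model,
                "task": task,
                "answer": call_model(model, prompt),
                "vibes": "",    # filled in by hand while grading
                "correct": "",  # filled in by hand while grading
            })
    return rows


def to_csv(rows) -> str:
    """Render result rows as CSV text, ready to paste into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```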

Kimi: The Brainy One That Costs a Lot

I'll start with Kimi because it was the one I had the highest expectations for and the one that surprised me most. Kimi comes from Moonshot AI (月之暗面; I had to Google how to even type their name, no shame). Their flagship model is K2.5, and it costs $3.00 per million output tokens. Kimi is the priciest family of the bunch: nothing in its lineup comes in under $3.00/M.

But here's the thing — it earned it. Kimi is a reasoning specialist. When I gave it a multi-step logic puzzle, it walked through the problem more carefully than any other model. It reminded me of how a really patient tutor explains math. Every step accounted for. I was shocked at how methodical it was.

Strengths I noticed:

Best reasoning of the four. If I had a hard math or logic problem, this is the one I'd ask.
Insanely strong in Chinese. Like, the prose felt native. Not translated-feeling.
128K context window — I dropped in a whole PDF and it actually remembered the beginning.
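Before dropping a whole PDF into a 128K window, a quick estimate helps. This sketch assumes roughly 4 characters per token, a common rule of thumb for English text; real counts depend on each model's tokenizer (Chinese text behaves quite differently), and the 4,000 tokens reserved for the reply is an arbitrary example value.

```python
import math

# Rough check of whether a document fits in a 128K-token context window.
# Assumption: ~4 characters per token, a common rule of thumb for English.
# Real counts depend on each model's tokenizer, so this is an estimate only.
CONTEXT_TOKENS = 128_000  # "128K" read conservatively as 128,000


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, reserved_for_output: int = 4_000,
                    context: int = CONTEXT_TOKENS) -> bool:
    """True if the prompt plus room for the reply fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context
```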

Weaknesses:

It's slow. The response time is noticeably longer than the others. For a chatbot experience, that lag can feel like an eternity.
No vision or multimodal support. I tried to send it an image and got a polite "I can't do that." Fair.
Premium pricing across the board. I didn't find a "budget" Kimi option. They just don't make one.

For me, Kimi is the "I need this specific thing done really well" model. I wouldn't use it for casual work, but for a hard reasoning problem? Yeah, I'd pay the $3.00/M.

Qwen: The One With Like 50 Models

Okay, so Qwen is Alibaba's baby, and I had no idea they had this many models. Like, I opened their model list and it was longer than my bootcamp's reading list. People call this the "Swiss Army knife" family, and I get why.

Here's a sample of the lineup:

Qwen3-8B at $0.01/M — for tiny tasks
Qwen3-32B at $0.28/M — solid general purpose
Qwen3-Coder-30B at $0.35/M — for code
Qwen3-VL-32B at $0.52/M — vision-language
Qwen3-Omni-30B at $0.52/M — handles audio, video, AND images
Qwen3.5-397B at $2.34/M — the big boy for enterprise reasoning
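Picking "exactly the size and capability you need" boils down to: filter by capability, then take the cheapest. Here's a sketch using the prices from the list above. The capability tags are inferred from the one-line descriptions, and the names are the display names used in this post, which may not match the exact API model IDs.

```python
from __future__ import annotations

# Qwen lineup from the list above: (USD per 1M output tokens, capabilities).
# Capability tags are inferred from the one-line descriptions; names are the
# display names used in this post and may differ from the exact API model IDs.
QWEN_MODELS = {
    "Qwen3-8B": (0.01, {"text"}),
    "Qwen3-32B": (0.28, {"text"}),
    "Qwen3-Coder-30B": (0.35, {"text", "code"}),
    "Qwen3-VL-32B": (0.52, {"text", "image"}),
    "Qwen3-Omni-30B": (0.52, {"text", "image", "audio", "video"}),
    "Qwen3.5-397B": (2.34, {"text", "reasoning"}),
}


def cheapest_with(required: set[str]) -> str | None:
    """Return the cheapest model whose capabilities cover `required`."""
    matches = [name for name, (_, caps) in QWEN_MODELS.items() if required <= caps]
    # On a price tie, min() keeps the first match in dictionary order.
    return min(matches, key=lambda name: QWEN_MODELS[name][0], default=None)
```

For example, `cheapest_with({"image"})` picks Qwen3-VL-32B over Omni because it's listed first at the same price, while `cheapest_with({"audio"})` picks Qwen3-Omni-30B.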

I was shocked by the spread: from one cent to over two dollars. You can pick exactly the size and capability you need. When I was building a Discord bot that summarizes messages, I used Qwen3-8B, at one cent per million tokens. I ran the bot for a week and my bill came to a fraction of a cent. I had to refresh the dashboard to make sure I wasn't hallucinating.

Qwen3-32B is probably the sweet spot for most people. It's $0.28/M output, which is competitive with DeepSeek's V4 Flash, and it felt just as snappy in my tests. The English was strong, the code generation was solid, and it didn't choke on anything I threw at it.

The thing that genuinely impressed me was the Qwen3-Omni model. I uploaded a video clip, asked it what was happening, and it described the scene and the audio. That used to be science fiction to me. I called my partner over to watch. We both just stared at the screen.

Downsides:

Naming is a mess. Qwen3, Qwen3.5, Qwen3.6, Qwen3-VL, Qwen3-Coder, Qwen3-Omni — I literally keep a sticky note on my monitor.
English is "good" but not DeepSeek-level. When I compared English writing side by side, DeepSeek's prose felt a touch more natural.
Some mid-range models feel overpriced. A few of them I tested didn't outperform cheaper alternatives.

For me, Qwen is the "I have no idea what I need, just give me a model that works" choice. The variety means there's always something that fits.

GLM: The Quietly Excellent One

GLM comes from Zhipu AI (智谱), and I admit I almost skipped it. I figured, "Zhipu who?" — and that would have been a mistake. This is the model family that handles Chinese language better than anything else I tested. Like, I gave all four models a classical Chinese poetry prompt, and GLM's output made me actually feel something. The other models wrote correct Chinese. GLM wrote beautiful Chinese.

The two key models I tested:

GLM-4-9B at $0.01/M — tied for cheapest on this entire list
GLM-5 at $1.92/M — the flagship, and it's a beast

GLM-5 was another big surprise. I was running a multilingual customer support simulation, mixing English, Chinese, and a few other languages, and GLM-5 was the only one that didn't miss a beat switching between them. The cohesion was impressive. I had no idea a $1.92/M model could feel that premium.

Strengths:

Best-in-class Chinese language understanding and generation. For Chinese-heavy products, this is the easy pick.
Strong reasoning. It actually tied Kimi on a few of my logic puzzles, which I did not expect.
Vision support via GLM-4.6V. I tested image captioning and it was accurate.
Cheapest tier is genuinely cheap. $0.01/M is absurd.
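For an image-captioning test like this, OpenAI-compatible endpoints typically expect the image as a content part next to the text. Below is a sketch that builds such a message with the image inlined as a base64 data URL, following the OpenAI chat-completions content-part format. Whether Global API's GLM vision model accepts exactly this shape is an assumption worth checking against their docs, and `image_message` is a hypothetical helper name.

```python
import base64


def image_message(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build one user message holding a text prompt plus an inline image.

    The image is base64-encoded into a data URL, using the OpenAI
    chat-completions "image_url" content-part format.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

You'd pass it as `messages=[image_message(png_bytes, "Caption this image.")]` together with the vision model's ID (the post mentions GLM-4.6V but not its exact API ID).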

Weaknesses:

Slower than DeepSeek. Not painfully slow, but noticeable.
Smaller model range than Qwen. You have fewer size options to fine-tune your cost.
Code generation is the weakest of the four. It worked, but DeepSeek and Qwen both beat it on every coding task I tried.

If I'm building a product that's primarily Chinese-language, I'm going with GLM. Full stop. It's not even close.
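If a product mixes languages, one way to act on "Chinese goes to GLM" is a simple router. This sketch counts characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF) and sends mostly-Chinese text to GLM. The 30% threshold and the `glm-5` ID are assumptions; only `deepseek-v4-flash` appears as a model ID in this post.

```python
# Hypothetical router: mostly-Chinese text goes to GLM, everything else to
# DeepSeek V4 Flash. "glm-5" and the 0.3 threshold are assumptions.
GLM_MODEL = "glm-5"
DEFAULT_MODEL = "deepseek-v4-flash"


def cjk_ratio(text: str) -> float:
    """Share of non-whitespace characters in the CJK Unified Ideographs block.

    Japanese kanji fall in the same block, so this is only a rough signal.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for ch in chars if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(chars)


def pick_model(text: str, threshold: float = 0.3) -> str:
    """Route text to GLM when at least `threshold` of it is Chinese characters."""
    return GLM_MODEL if cjk_ratio(text) >= threshold else DEFAULT_MODEL
```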

DeepSeek: The One That Won Me Over

I saved DeepSeek for last because, honestly, this is the one I keep going back to. And I'm not alone — I've seen a lot of developers online saying the same thing. DeepSeek is the "value king" of the bunch, and I get the hype now.

Here's the lineup:

V4 Flash at $0.25/M — the everyday workhorse
V3.2 at $0.38/M — newest architecture
V4 Pro at $0.78/M — production quality
R1 (Reasoner) at $2.50/M — for hard math and logic
Coder at $0.25/M — code-specific

V4 Flash is the one I keep coming back to. It runs about 60 tokens per second, which feels instant. I had no idea a model at this price point could feel this fast. The English is on par with anything I've used from Western providers, and the code generation? I ran it through HumanEval-style challenges and it nailed them.
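To check a speed number like "60 tokens per second" yourself, you can time a request and divide by the output token count the API reports in `usage.completion_tokens`. A sketch, assuming an OpenAI-style response object: it times the whole round trip, including network latency and time to first token, so it will read lower than pure generation speed. Streaming the response would let you separate those.

```python
import time
from typing import Tuple


def timed_completion(client, model: str, prompt: str) -> Tuple[str, float]:
    """Send one request; return (reply text, output tokens per second).

    Times the full round trip (network + time to first token + generation),
    so the rate reads lower than raw generation speed. Relies on the
    usage.completion_tokens field of an OpenAI-style response.
    """
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    tokens = response.usage.completion_tokens
    return response.choices[0].message.content, tokens / elapsed
```

Usage: `text, tps = timed_completion(client, "deepseek-v4-flash", "Write a Python function to merge two sorted lists")`.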

When I switched my main project from GPT-4o to DeepSeek V4 Flash via Global API, my monthly bill dropped by like 90%. I was shocked. Actually shocked. I had to recalculate three times.
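Recalculating a bill drop like that is simple arithmetic once input and output tokens are priced separately. The helpers below are a sketch; the token counts and prices in the example are made-up placeholders, not the actual numbers behind the 90% figure, so plug in your own usage and each provider's published rates.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost for a month of usage; prices are $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def percent_drop(old_cost: float, new_cost: float) -> float:
    """How much cheaper the new bill is, as a percentage of the old one."""
    return (old_cost - new_cost) / old_cost * 100


if __name__ == "__main__":
    # Placeholder usage and prices, for illustration only.
    usage = dict(input_tokens=20_000_000, output_tokens=5_000_000)
    old = monthly_cost(**usage, input_price=2.00, output_price=8.00)
    new = monthly_cost(**usage, input_price=0.10, output_price=0.25)
    print(f"${old:.2f} -> ${new:.2f} ({percent_drop(old, new):.0f}% cheaper)")
```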

Here's the code I used to make the switch:

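from openai import OpenAI

# Same OpenAI SDK as before; only the endpoint and key now point at Global API
client = OpenAI(
    api_key="ga_your_key_here",  # placeholder Global API key
    base_url="https://global-apis.com/v1"
)

# The model name is the other thing that changed (previously GPT-4o)
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists"}]
)
print(response.choices[0].message.content)  # the generated function, as text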

The base_url line is the main change from a stock OpenAI setup; beyond that, you just use your Global API key and pick a model name. Everything else is stock OpenAI SDK. If you have an existing project, you can swap in DeepSeek in like five minutes. I'm not exaggerating.
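If you'd rather not touch code at all when switching, the endpoint, key, and model can live in environment variables. A sketch: `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL` are hypothetical variable names, with defaults taken from the snippet above. (The OpenAI Python SDK also reads `OPENAI_API_KEY` and `OPENAI_BASE_URL` on its own when you call `OpenAI()` with no arguments.)

```python
import os


def llm_settings(env=None) -> dict:
    """Read provider settings from environment variables.

    LLM_API_KEY is required; LLM_BASE_URL and LLM_MODEL fall back to the
    Global API endpoint and model used in this post. The variable names
    are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env["LLM_API_KEY"],
        "base_url": env.get("LLM_BASE_URL", "https://global-apis.com/v1"),
        "model": env.get("LLM_MODEL", "deepseek-v4-flash"),
    }
```

Usage: `settings = llm_settings(); model = settings.pop("model"); client = OpenAI(**settings)`, then pass `model=model` to `chat.completions.create`.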

Strengths:

Best price-to-performance ratio — V4 Flash at $0.25/M is the standout deal
Top-tier code generation — best of the four families for coding tasks
Fastest of the bunch — V4 Flash is the speed leader
Strong English — doesn't feel like a "translation" model
Open-weight heritage — comes from a culture of transparent research

Weaknesses:

Limited vision support — no native image understanding yet
Chinese is good, but GLM and Kimi are better
Fewer size options than Qwen

For most of what I do (coding help, content drafting, summarization), V4 Flash is the default now. It just works, and it doesn't make me feel guilty about my API bill.

The Big Comparison

Let me put it all in one place, the way I wish someone had shown me three weeks ago:

Feature	DeepSeek	Qwen	Kimi	GLM
Developer	DeepSeek (幻方)	Alibaba (阿里)	Moonshot AI (月之暗面)	Zhipu AI (智谱)
Price Range	$0.25-$2.50/M	$0.01-$3.20/M	$3.00-$3.50/M	$0.01-$1.92/M
Best Budget	V4 Flash @ $0.25/M	Qwen3-8B @ $0.01/M	N/A	GLM