Chinese AI Models Are Insane: My Two-Week Deep Dive
I had no idea what I was getting into.
Two weeks ago, I was just a regular bootcamp grad trying to figure out which AI API to use for a side project. I had been using OpenAI, burning through my free credits like crazy, and I kept hearing whispers in Discord servers about these "Chinese models" that were way cheaper. I figured they were probably junk. Boy, was I wrong.
What started as a "let me just try one" turned into me testing every major Chinese AI family I could get my hands on. I ran coding tests, asked weird questions, timed response speeds, and basically turned my apartment into a tiny AI research lab. Let me walk you through what I found.
How I Even Got Here
So my bootcamp just ended, I'm building portfolio projects, and I needed an LLM for a chatbot idea. OpenAI's pricing felt brutal for someone who isn't making money yet. A friend in a Slack channel casually dropped a link to Global API and said, "Just try DeepSeek. Trust me."
I had no idea DeepSeek was made by a Chinese company called 幻方 (High-Flyer). Or that there's an entire ecosystem of Chinese AI labs I had never heard of. We're talking DeepSeek, Qwen, Kimi, and GLM — four major model families, all with their own strengths.
I was shocked that nobody in my bootcamp cohort had talked about these. We'd spent months learning React and Python, and here were models that could do the same stuff as GPT-4o for literal cents.
The Four Players You Should Know
Quick rundown of who's who, because this confused me for a while:
- DeepSeek comes from a Chinese quant fund called 幻方 (High-Flyer). They make models that punch way above their price.
- Qwen is made by Alibaba. Yes, the shopping Alibaba. Their model lineup is massive.
- Kimi comes from Moonshot AI (月之暗面, which literally translates to "Dark Side of the Moon" — I thought that was cool).
- GLM is from Zhipu AI (智谱), a lab that spun out of Tsinghua University.
All four have OpenAI-compatible APIs, which is huge for a newbie like me. You can just swap the base URL and use the same Python code you'd use for ChatGPT. Speaking of which, here's the setup that worked for me:
from openai import OpenAI
client = OpenAI(
api_key="ga_xxxxxxxxxxxx",
base_url="https://global-apis.com/v1"
)
That's it. That's the whole setup. Once I figured that out, I could test any of the models just by changing the model name. This blew my mind a little — I had been scared of "foreign" APIs for no reason.
My Coding Test: Which One Writes the Best Python?
I have to start with coding because that's my world right now. I gave all four models the same prompt: "Write a Python function that takes a list of dictionaries and groups them by a specified key."
Here's where I got my first real surprise. DeepSeek absolutely crushed this. The code was clean, the function handled edge cases I didn't even mention, and it threw in a docstring. I was using V4 Flash, which costs $0.25 per million output tokens. For context, that's nothing.
I ran the same prompt through Qwen3-Coder-30B at $0.35/M. Also great output, very Pythonic, maybe slightly more verbose. Kimi at $3.00/M gave me a solid answer but didn't blow me away compared to the cheaper options. GLM gave me working code too, but it felt a bit more "textbook" than the others.
Quick personal ranking for code: DeepSeek > Qwen > Kimi > GLM.
The thing that really got me was that DeepSeek's V4 Flash and their Coder model are both $0.25/M. Same price. You're not paying extra for the specialized version. That seems like a steal to me.
Reasoning and Math: The Brain Test
Next I wanted to see which one was actually smart. I pulled out a logic puzzle and a word problem from my old bootcamp homework.
The question was something like: "Three friends — Alice, Bob, and Carol — have different ages. Alice is older than Bob. Carol is not the oldest. Bob is not the youngest. What's the order from oldest to youngest?"
DeepSeek got it right. Qwen got it right. Kimi got it and explained its reasoning step by step. GLM got it too, but added way too much preamble before getting to the answer.
For math, I tried asking each model to solve a multi-step algebra problem. Kimi was the standout here. It walked through the problem like a tutor, showed every step, and didn't lose the thread halfway through. That matches what I kept reading — Kimi is the reasoning specialist.
But here's the kicker: Kimi K2.5 costs $3.00/M output tokens. That's twelve times what I was paying for DeepSeek. So the question becomes: do I need that level of reasoning? For most of my projects, probably not. For math tutoring apps? Maybe yes.
The Chinese Language Showdown
I don't speak Chinese, but I was curious, so I asked a friend who's bilingual to help me test. We ran the same Chinese prompts through all four models and asked her to rate the naturalness.
GLM and Kimi tied for best. Both produced really natural-sounding Chinese that didn't feel like it was translated. DeepSeek was good but slightly less natural. Qwen was solid too, but our tester said it occasionally felt "stiff."
If you're building something for a Chinese audience, GLM and Kimi are your go-tos. Honestly, I had no idea how good these models were at Chinese — I just assumed the big American models would be the best at everything.
Speed: How Fast Do They Actually Respond?
Speed matters when you're building user-facing apps. Nobody wants to wait ten seconds for a response.
I timed the responses across all four using similar prompts and here's what I found:
DeepSeek V4 Flash was the fastest by a clear margin. I was getting around 60 tokens per second, which honestly felt instant. Qwen was close behind, maybe 50 tokens/sec. GLM was solid, around 45 tokens/sec. Kimi was noticeably slower, but I think that's because it was "thinking" more before responding. For reasoning tasks, the slower speed makes sense.
The Price Breakdown That Made Me Question Everything
Let me just list what I found because this is what shocked me the most:
- DeepSeek V4 Flash: $0.25/M output
- DeepSeek V3.2: $0.38/M
- DeepSeek V4 Pro: $0.78/M
- DeepSeek R1 (the reasoner): $2.50/M
- DeepSeek Coder: $0.25/M
- Qwen3-8B: $0.01/M (yes, one cent)
- Qwen3-32B: $0.28/M
- Qwen3-Coder-30B: $0.35/M
- Qwen3-VL-32B: $0.52/M
- Qwen3-Omni-30B: $0.52/M
- Qwen3.5-397B: $2.34/M
- Kimi K2.5: $3.00/M
- GLM-4-9B: $0.01/M
- GLM-5: $1.92/M
That Qwen3-8B at $0.01/M is not a typo. You could send a million tokens of output for a single penny. I had no idea that was even possible. For a bootcamp grad, that changes what kinds of projects I can realistically build.
DeepSeek's range goes from $0.25 to $2.50. Qwen's range is the widest — from $0.01 all the way up to $3.20/M. Kimi sits at the premium end. GLM has both ultra-cheap and mid-tier options.
My Quick Test on Vision and Multimodal
I didn't test this as deeply because most of my projects are text-based, but I tried uploading an image of a graph and asking the models to describe the trend.
DeepSeek doesn't really do vision. That was disappointing but understandable at that price. Qwen3-VL-32B handled it well and gave me a clear description. Qwen3-Omni-30B supposedly handles audio and video too, which is wild. GLM-4.6V also did a solid job with the image. Kimi doesn't really do vision stuff, from what I gathered.
If you need image understanding, you're looking at Qwen or GLM. Not the other two.
The Real-World Project Test
Here's where things got real. I took a small piece of my actual portfolio project — a function that needed to parse user input, extract intent, and generate a helpful response — and ran it through all four models.
I compared cost vs quality:
- DeepSeek V4 Flash: cost me basically nothing, output was production-ready
- Qwen3-32B: similar cost at $0.28/M, output was great
- Kimi K2.5: beautiful output but cost 12x more
- GLM-5: solid output, mid-range cost
I ended up picking DeepSeek V4 Flash for my main logic. The cost difference means I can build the whole thing, host it, and not worry about going broke while I iterate. For a beginner dev, that's huge.
What I'd Actually Recommend
If you're a bootcamp grad like me and you're wondering where to start, here's my honest take after two weeks of testing:
For coding and general purpose: DeepSeek V4 Flash. The $0.25/M price is almost stupid, and the output quality is genuinely good. I keep finding excuses to use it.
For when you need a specific model for a specific job: Qwen. They have something for literally every use case. The naming is a little confusing (Qwen3, Qwen3.5, Qwen3.6 — why so many?) but the variety is unmatched.
For complex reasoning and math: Kimi K2.5. Yes, it's $3.00/M, but if you need deep thinking, it earns its price. I wouldn't default to it for everyday use.
For Chinese-language projects: GLM or Kimi. Both are excellent here.
Here's another quick example for you — this is how I switched to GLM-5 for one of my experiments:
python
response = client.chat.completions.create(
model="glm-5",
Top comments (0)