loyaldash

Posted on Jul 3

I Spent a Week Comparing Chinese AI Models and Here's What I Found

#webdev #api #python #programming

Okay, real talk. When I graduated from my coding bootcamp three months ago, I thought I knew AI. Like, I'd used ChatGPT, I knew what an API was, I'd even made a couple of calls to OpenAI for a class project. I figured I had a handle on things.

Then I stumbled onto this whole world of Chinese AI models and honestly? My brain exploded a little. I had no idea there were so many options, and some of them are genuinely mind-blowing. So I spent about a week just messing around with four of the big ones: DeepSeek, Qwen, Kimi, and GLM. All through Global API, which I had no idea existed before this rabbit hole.

Let me walk you through what I learned, because if you're a fellow beginner, this might save you a lot of Googling.

Why I Even Started Looking at Chinese Models

So here's the thing. My bootcamp capstone project involved building a chatbot for a small business, and my budget was basically zero. I kept running into this wall where OpenAI's GPT-4o was amazing but cost more than I could justify for a portfolio piece, and the free tier felt limiting.

A friend in my cohort mentioned that "everyone in dev circles" was using Chinese open-source models through some unified API. I was skeptical. Like, what could possibly be as good as GPT-4o for cheap? Turns out, plenty. I was shocked to find that the quality gap I'd assumed existed between Western and Chinese models basically isn't there anymore.

The Quick Rundown (For Skimmers)

Before I dive deep, here's the cheat sheet I wish someone had handed me on day one:

DeepSeek — Your wallet's best friend. V4 Flash at $0.25/M output is genuinely the steal of the century
Qwen — The "there's a model for literally everything" family. Alibaba makes like forty of them
Kimi — A thinking machine. If you need reasoning, this is it. But it'll cost you
GLM — Made by Zhipu AI. Killer at Chinese language stuff and surprisingly cheap

All four have 128K context windows, which still blows my mind. That's like a whole book's worth of memory in a single conversation. And they all work with OpenAI's API format, so you don't need to learn a new SDK for each one.

My DeepSeek Deep-Dive

I'll be honest, DeepSeek is the one that turned me from "curious" to "obsessed." The whole family is built by a company also called DeepSeek (深度求索 in Chinese; it's backed by the quant fund High-Flyer, 幻方, which literally means "magic square"), and they clearly care about making AI accessible.

The Pricing Hit Me Like a Truck

I was shocked when I saw these numbers. Like, physically gasped. V4 Flash costs $0.25 per million output tokens. For reference, GPT-4o costs $10.00 per million. I had no idea such a massive gap existed. That's forty times cheaper, and the quality is comparable for most things I'd want to do.

Here's the full DeepSeek menu I compiled (output price per million tokens):

V4 Flash — $0.25/M (this is the one everyone raves about)
V3.2 — $0.38/M
V4 Pro — $0.78/M
R1 (the reasoning one) — $2.50/M
Coder — $0.25/M (specialized for code)

So the price range runs from $0.25 to $2.50 per million tokens. And the V4 Flash hits around 60 tokens per second, which is honestly faster than I've ever needed.
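To make that gap concrete, here's a tiny sketch that turns the per-million output prices quoted above into dollar amounts. The two prices come straight from this post; the 2 million token volume is a made-up number for illustration, and `output_cost` is just a helper name invented for this example.

```python
# Output prices in USD per million tokens, as quoted in this post.
PRICE_PER_M_OUTPUT = {
    "deepseek-v4-flash": 0.25,
    "gpt-4o": 10.00,
}


def output_cost(model: str, output_tokens: int) -> float:
    """Estimate the output cost in USD for a given number of output tokens."""
    return PRICE_PER_M_OUTPUT[model] * output_tokens / 1_000_000


# Hypothetical monthly volume, purely for illustration.
monthly_tokens = 2_000_000
for model in PRICE_PER_M_OUTPUT:
    print(f"{model}: ${output_cost(model, monthly_tokens):.2f}")
```

This only counts output tokens. Input tokens are billed too, so treat the result as a rough lower bound rather than a real invoice.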

What It Nails

Code generation. I threw a bunch of coding problems at it from LeetCode and my bootcamp exercises, and DeepSeek absolutely crushed them. It ties with the best models I've tried on HumanEval-style tasks. I was building a Python script for parsing CSV files, and DeepSeek spat out working code in one shot where I'd been struggling for an hour.

English is also rock solid. Like, it doesn't have that slightly-off feel some non-Western models have. I sent it the prompt "explain quantum computing in 100 words" and got back a clean, clear explanation. No weird phrasing, no awkward metaphors.

Where It Stumbles

Vision. DeepSeek doesn't really do images. I tried uploading a screenshot of a UI mockup, and yeah, no dice. That's where Qwen and GLM pulled ahead.

Chinese language tasks are also slightly weaker than GLM or Kimi, but honestly, for a beginner like me who only reads English, this doesn't matter at all.

My First Real Test

Here's literally the first thing I did with DeepSeek. I made this simple Python call:

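from openai import OpenAI

# Same OpenAI client as usual, just pointed at Global API's base URL
client = OpenAI(
    api_key="ga_xxxxxxxxxxxx",  # placeholder, swap in your own key
    base_url="https://global-apis.com/v1"
)

# Standard chat completion call; only the model name is different
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain quantum computing in 100 words"}]
)
print(response.choices[0].message.content)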

Look at how clean that is. Same exact code you'd write for OpenAI, just with a different base URL and model name. That's it. That blew my mind. No new library, no weird config files, no tears.

Then I Met Qwen And Lost My Mind A Little

If DeepSeek is a laser-focused tool, Qwen is the entire hardware store. Alibaba (the company, yeah, the e-commerce giant) makes the Qwen family, and there are... a lot of them.

The Model Buffet

I literally made a spreadsheet to keep track. Here's the lineup:

Qwen3-8B — $0.01/M (yes, one cent per million tokens)
Qwen3-32B — $0.28/M (the sweet spot for most things)
Qwen3-Coder-30B — $0.35/M
Qwen3-VL-32B — $0.52/M (vision!)
Qwen3-Omni-30B — $0.52/M (multimodal everything)
Qwen3.5-397B — $2.34/M (the big daddy enterprise model)

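Across the whole family, the price range is $0.01 to $3.20 per million. (My list above isn't the full catalog, which is why it tops out at $2.34.) That $0.01 model? I thought it was a typo. It is not. I tested it. It works. For trivial stuff, obviously, but it works.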

Why I Kept Coming Back

Two words: multimodal magic. Qwen3-VL-32B understands images. Like, you can send it a picture of a flowchart and ask it to convert the chart to code. I was doing a UX design exercise for my portfolio and Qwen basically became my design partner. The Qwen3-Omni model goes even further and handles audio and video too, though I haven't fully explored that yet.

Also, Qwen3-Coder-30B is fantastic. For the price ($0.35/M), the code quality is up there with DeepSeek's specialized Coder model.
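For the image side of things, here's a minimal sketch of how a picture can be packed into an OpenAI-style chat message. It uses the `image_url` content part from OpenAI's chat format with a base64 data URL. Whether Global API accepts this exact shape for Qwen3-VL-32B is an assumption on my part, and the model ID in the usage comment is a guess based on the "Qwen/" naming pattern, so check the current model list first.

```python
import base64


def build_image_message(image_bytes: bytes, question: str, mime: str = "image/png") -> list[dict]:
    """Build an OpenAI-style messages list with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # The image travels inline as a data URL, so no hosting is needed.
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            ],
        }
    ]


# Usage sketch (model ID is hypothetical, not confirmed by the post):
# with open("flowchart.png", "rb") as f:
#     messages = build_image_message(f.read(), "Convert this flowchart to Python code")
# response = client.chat.completions.create(model="Qwen/Qwen3-VL-32B", messages=messages)
```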

What Confused Me

The naming. Oh my god, the naming. Why is one model Qwen3-32B and another Qwen3.5-397B? Why the periods? Why is there a Qwen3.5 and a Qwen3.6? I had to ask in a Discord server what the difference was. Apparently the dot versions are newer, but I genuinely don't know which ones are production-ready versus experimental without digging through docs.

Also, a heads up — Qwen3.6-35B sits around $1/M, and for that price I'd probably just use a different option.

My Qwen Test Run

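# Reuses the client from the DeepSeek example; only the model name changes
response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists"}]
)
print(response.choices[0].message.content)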

The model name has a "Qwen/" prefix on Global API, which tripped me up the first time. Heads up if you try this. The function it wrote was clean, well-commented, and included an example. Honestly, I learned a thing or two about Pythonic style from reading its output.
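One way to stop tripping over prefixes like that is to keep the exact model IDs in one place and call models by a short nickname. This is a small sketch, not anything Global API provides: `MODEL_IDS` and `ask` are names made up for this example, and `client` is assumed to be the OpenAI client configured earlier in the post. The two IDs are the ones used in the snippets above.

```python
# Short nicknames mapped to the exact model IDs used earlier in this post.
MODEL_IDS = {
    "deepseek": "deepseek-v4-flash",
    "qwen": "Qwen/Qwen3-32B",
}


def ask(client, alias: str, prompt: str) -> str:
    """Send one user prompt to the model behind `alias` and return the reply text."""
    response = client.chat.completions.create(
        model=MODEL_IDS[alias],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Usage sketch:
# print(ask(client, "qwen", "Write a Python function to merge two sorted lists"))
```

A nice side effect is that sending the same prompt to two models for a side-by-side comparison becomes a one-line change.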

Kimi Made Me Feel Dumb (In A Good Way)

Okay, Kimi is a different beast. Made by Moonshot AI (月之暗面, which I learned means "dark side of the moon," cute), this is the model family for people who want their AI to actually think.

The Premium Pricing Reality

Here's the thing about Kimi. There's no budget option. The cheapest model is K2.5 at $3.00 per million output tokens, and the premium tier goes up to $3.50. For someone who was just trying to save money, this initially turned me off.

But then I tried it. And oh.

When Kimi Shines

Reasoning tasks. I gave it a logic puzzle that had stumped me for like twenty minutes, and Kimi walked through it step by step, explaining its thought process. I was shocked. It felt like pair programming with someone way smarter than me.

For math, coding challenges that require multiple steps, and any task that needs the model to "show its work," Kimi is unmatched among these four. On standard reasoning benchmarks, it consistently leads.

The Trade-Off

It's slower than DeepSeek. Way slower. We're talking about a model that takes its time to think. And the price reflects that premium reasoning capability. For my chatbot project, I couldn't justify Kimi. But for one-off complex problems? Worth every penny.

I also noticed Kimi doesn't do vision. Multimodal isn't really its thing. It's a pure text-reasoning specialist.

GLM Was My Surprise Favorite

Going in, I expected GLM to be the weakest of the four. I had no reason to think that, but I just hadn't heard as much buzz about it. Turns out, Zhipu AI (智谱) is cooking.

The Best Of Both Worlds

GLM has models that go from $0.01 to $1.92 per million. That $1.92 ceiling is the lowest top-tier price among all four families. The GLM-4-9B at $0.01 is dirt cheap, and GLM-5 at $1.92 is genuinely competitive with much more expensive options.

Why I Ended Up Using It Most

Chinese language. I know, I know, I said earlier I only read English. But I started helping a friend who's learning Mandarin, and GLM blew Kimi and DeepSeek out of the water on Chinese translation and explanation tasks. There's a reason Chinese developers love it.

It also has GLM-4.6V for vision tasks, which actually held its own against Qwen's vision models. For a smaller, less hyped company, I was shocked by how competitive their offerings are.

Where GLM Falls Short

Code generation. It scored lower on my informal coding tests compared to DeepSeek and Qwen. The code works, but it's not as elegant or efficient. Speed is also middle-of-the-road — not slow, but not the rocket that V4 Flash is.

What I Actually Ended Up Using

After all this testing, here's my current setup for the chatbot project:

Default conversations — DeepSeek V4 Flash ($0.25/M)
Image-related stuff — Qwen3-VL-32B ($0.52/M)
Tough logic problems — Kimi K2.5 ($3.00/M, sparingly)
Quick cheap tasks — Qwen3-8B or GLM-4-9B ($0.01/M)

This combo keeps my costs low while letting me use the right tool for each job. I'm spending maybe a few dollars a month now, versus what would have been $50+ on GPT-4o.

The Stuff That Confused Me At First

A few things that might trip you up too:

Context window sizing. All four claim 128K, but that doesn't mean they all handle it well. For massive documents, I found DeepSeek and Qwen more reliable.

Rate limits. I didn't hit them, but if you're building something with high traffic, look into this early. I saw people complaining about Kimi rate limits in particular.
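If you do run into rate limits, the usual remedy is to retry with exponential backoff. Here's a generic sketch of that idea. `call_with_backoff` is a name invented for this example, and it catches every exception to stay self-contained; in real code you'd probably narrow that to the OpenAI SDK's `openai.RateLimitError`. I haven't checked how Global API actually signals rate limiting, so treat that part as an assumption.

```python
import random
import time


def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus a little jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Out of retries: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # Waits grow 1s, 2s, 4s, ... with up to 0.1s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


# Usage sketch, assuming `client` is the OpenAI client set up earlier:
# reply = call_with_backoff(lambda: client.chat.completions.create(
#     model="deepseek-v4-flash",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```

The `sleep` parameter is only there so the waiting can be swapped out in tests.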

Model availability. Some models are listed on Global API one day and gone the next. Always check the current model list before committing to an architecture. I had to swap out one Qwen model mid-project because it got deprecated.
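A cheap way to soften that kind of surprise is to keep an ordered list of acceptable models and pick the first one that's still available. This is a sketch under assumptions: `first_available` is a made-up helper, and the usage comment assumes Global API exposes the OpenAI-style model listing that the SDK reads through `client.models.list()`, which I haven't verified.

```python
def first_available(preferred: list[str], available: set[str]) -> str:
    """Return the first model ID from `preferred` that appears in `available`."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise LookupError(f"None of the preferred models are available: {preferred}")


# Usage sketch, assuming `client` is the OpenAI client set up earlier
# and that the gateway supports listing models:
# available = {m.id for m in client.models.list()}
# model = first_available(["Qwen/Qwen3-32B", "deepseek-v4-flash"], available)
```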

"Best" is subjective. All the ratings I gave are based on my use cases. Your mileage may vary. The only way to know is to test with your actual prompts and data.

My Honest Takeaway

Here's what I really want to emphasize, because I wish someone had told me this three months ago: you don't have to pick just one model. The smartest thing I did was set up a system where I can route different requests to different models based on the task. Cheap stuff goes to Qwen3-8B, complex stuff goes to Kimi K2.5, images go to Qwen3-VL-32B, and everything else defaults to DeepSeek V4 Flash.
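The routing idea doesn't need anything fancy. Here's a minimal sketch of it as a lookup table with a default. Only `deepseek-v4-flash` is an ID that actually appears in this post's code; the other IDs are guesses based on the model names and the "Qwen/" prefix pattern, and the task labels and `pick_model` are names invented for this example.

```python
# Task label -> model ID. Everything except the default is an assumed ID;
# check the provider's current model list before relying on any of them.
ROUTES = {
    "cheap": "Qwen/Qwen3-8B",      # assumed ID for quick, trivial tasks
    "reasoning": "kimi-k2.5",      # hypothetical ID for tough logic problems
    "image": "Qwen/Qwen3-VL-32B",  # assumed ID for image inputs
}
DEFAULT_MODEL = "deepseek-v4-flash"  # ID used earlier in this post


def pick_model(task: str) -> str:
    """Return the model ID for a task label, falling back to the default."""
    return ROUTES.get(task, DEFAULT_MODEL)


# Usage sketch, assuming `client` is the OpenAI client set up earlier:
# response = client.chat.completions.create(
#     model=pick_model("reasoning"),
#     messages=[{"role": "user", "content": "Solve this logic puzzle..."}],
# )
```

Keeping the table in one place also means a deprecated model is a one-line fix instead of a hunt through the codebase.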

And the fact that Global API lets me access all of these through one endpoint with one API key? That's the part that really makes this whole setup possible. I don't have to manage four different accounts, four different billing systems, four different SDKs. Just the OpenAI Python client with a different base URL.

If you're a bootcamp grad like me, or really anyone just starting out, I can't stress enough how much this approach can stretch your budget. The savings I got let me actually finish my capstone project without compromising on quality. Check out Global API at global-apis.com/v1 if you want to try it — no pressure, just a tool that genuinely made my life easier when I was figuring all this out.

Now if you'll excuse me, I have a chatbot to ship. And maybe a few more Chinese AI models to stress-test. I've got the bug now.

DEV Community
