loyaldash

Posted on Jul 2

I Compared DeepSeek vs Qwen vs Kimi vs GLM - Real Results

#webdev #deepseek #tutorial #machinelearning

ok so heres the thing. ive been building indie projects for like 6 years now and the past few months have been WILD when it comes to chinese AI models. like honestly, every time i blink theres a new model dropping and i genuinely cant keep up anymore.

last month i finally snapped and decided to actually sit down and test four of the big ones: DeepSeek, Qwen, Kimi, and GLM. i ran them through my actual workflows. coding tasks, content writing, some chinese language stuff for a client project, reasoning puzzles. the whole deal.

heres what i found. and yes, im gonna be brutally honest about the stuff that sucked too.

Why I Even Bothered With This

look, i know theres a million "AI comparison" posts out there. most of them are garbage. they just regurgitate marketing benchmarks and call it a day. i wanted to actually USE these models for real work and see which ones earned their keep.

the other thing is pricing has gotten SO weird. you got models at like $0.01 per million tokens and others at $3.50. thats a 350x difference. you cannot just pick one randomly and hope it works out. i learned that the hard way when i burned through $200 in a weekend testing stuff (dont ask).

i tested everything through Global API btw because their unified endpoint lets me swap models without rewriting code. absolute lifesaver. more on that later.

The Quick and Dirty Comparison

heres a quick table before i dive deep. ill keep the prices EXACT because messing those up would be criminal:

Feature	DeepSeek	Qwen	Kimi	GLM
Developer	DeepSeek (幻方)	Alibaba (阿里)	Moonshot AI (月之暗面)	Zhipu AI (智谱)
Price Range (output, per 1M tokens)	$0.25-$2.50/M	$0.01-$3.20/M	$3.00-$3.50/M	$0.01-$1.92/M
Budget Pick	V4 Flash @ $0.25/M	Qwen3-8B @ $0.01/M	N/A (all premium)	GLM-4-9B @ $0.01/M
Best Overall	V4 Flash @ $0.25/M	Qwen3-32B @ $0.28/M	K2.5 @ $3.00/M	GLM-5 @ $1.92/M
Code	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Chinese	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
English	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐
Reasoning	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Speed	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Vision	Limited	✅ (VL, Omni)	❌	✅ (GLM-4.6V)
Context	128K	128K	128K	128K
OpenAI API	✅	✅	✅	✅

ok so right off the bat you can see these models are NOT interchangeable. they all have their thing.

GLM: The Underdog That Surprised Me

honestly, i almost skipped GLM. i figured "eh, its chinese-only, probably not useful for me." WRONG. i was so wrong.

GLM comes from Zhipu AI (智谱), and they make some genuinely solid models. the pricing is INSANE too. like GLM-4-9B at $0.01/M output tokens? thats basically free. i ran like 5000 queries and my bill was literally like four dollars.
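to put "basically free" into actual numbers, here is the back-of-envelope math. this sketch only counts output tokens at the per-1M prices from my table (input pricing is not in the table, so it is left out), and the ~1000 tokens per reply is an assumption you should swap for your own average.

```python
# Back-of-envelope output-token cost. Prices are the per-1M output figures
# from the table above; input-token pricing is deliberately not included.
PRICE_PER_M_OUTPUT = {
    "GLM-4-9B": 0.01,
    "GLM-5": 1.92,
}


def output_cost(num_queries: int, avg_output_tokens: int, price_per_m: float) -> float:
    """Dollar cost of the output tokens for a batch of queries."""
    total_tokens = num_queries * avg_output_tokens
    return total_tokens / 1_000_000 * price_per_m


# Assumption: ~1000 output tokens per reply; swap in your own average.
for name, price in PRICE_PER_M_OUTPUT.items():
    print(f"{name}: ${output_cost(5000, 1000, price):.2f} for 5000 replies")
# GLM-4-9B: $0.05 for 5000 replies
# GLM-5: $9.60 for 5000 replies
```

so output alone at $0.01/M is pennies. when a real bill lands in whole dollars, input tokens and any GLM-5 calls in the mix are the first things to check.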

heres the model lineup:

GLM-4-9B at $0.01/M - the ultra-cheap workhorse
GLM-5 at $1.92/M - their flagship, competes with the big boys

what blew me away was the chinese language performance. im working on this project that requires generating formal chinese business emails and GLM-5 just CRUSHED it. way better than DeepSeek. way better than Qwen for this specific use case. the nuance was actually there.

the downsides tho: code generation is mediocre. like its FINE for basic stuff but for anything complex i was reaching for DeepSeek or Qwen. and the english isnt bad but its not DeepSeek-level either.

oh and they have GLM-4.6V which handles vision tasks. i tested it on some product photos and it was pretty solid. not GPT-4o level but for the price? absolutely usable.

would i use GLM again? YES. specifically for chinese language work and any task where i need to save money. its like the budget king nobody talks about.

Kimi: The Brainy One That Costs a Lot

ok Kimi is from Moonshot AI (月之暗面) and these guys are clearly going for the "smart model" angle. you can tell because there is no budget option at all: everything sits in one premium band, $3.00-$3.50/M output.

their flagship K2.5 sits at $3.00/M and honestly? it earns it. i threw some genuinely hard reasoning tasks at it - like the kind where other models get confused halfway through - and Kimi just handled it. multi-step logic, math word problems, the works.

the thing is, i cant use it for everything. at $3.00/M output, running kimi on bulk tasks would bankrupt me. i literally used it for like 30 test queries and already felt the cost creep up. this is a "break glass in case of emergency" model for me.
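if you do reach for the expensive model, a tiny spend guard keeps the cost creep visible. this is a hypothetical helper, not something Global API provides: it reads the usage.completion_tokens count that OpenAI-style chat responses report, prices it at the output rate you pass in, and refuses new calls once a cap is hit. input tokens are not counted here.

```python
from dataclasses import dataclass


@dataclass
class SpendGuard:
    """Hypothetical helper: tracks output-token spend and enforces a cap."""
    budget_usd: float
    spent_usd: float = 0.0

    def record(self, completion_tokens: int, price_per_m_output: float) -> float:
        """Add one call's output-token cost; return the running total."""
        self.spent_usd += completion_tokens / 1_000_000 * price_per_m_output
        return self.spent_usd

    def check(self) -> None:
        """Raise before the next call if the budget is already used up."""
        if self.spent_usd >= self.budget_usd:
            raise RuntimeError(
                f"budget ${self.budget_usd:.2f} reached (spent ${self.spent_usd:.4f})"
            )


def ask_with_guard(client, guard, model, prompt, price_per_m_output):
    """Plain chat call that checks the budget first and records the cost after."""
    guard.check()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # OpenAI-style responses carry usage.completion_tokens; some providers may omit usage.
    usage = getattr(response, "usage", None)
    if usage is not None:
        guard.record(usage.completion_tokens, price_per_m_output)
    return response.choices[0].message.content
```

usage would look like guard = SpendGuard(budget_usd=2.00) followed by ask_with_guard(client, guard, "kimi-k2.5", prompt, 3.00), where "kimi-k2.5" is a placeholder model ID and 3.00 is K2.5's output price from the table.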

where it falls short: speed. its noticeably slower than DeepSeek. and theres no vision/multimodal support which these days is kinda wild. if i need to handle images, kimi is not my pick.

english is solid tho, no complaints there. its just expensive.

would i use Kimi again? YES but only for the hard stuff. its like hiring a genius consultant - you dont call them for every little question.

DeepSeek: My Daily Driver

ok heres where i get to be really excited. DeepSeek has become my default for like 80% of my work. its just SO good for the price.

lets talk models:

V4 Flash at $0.25/M - this is the one i use constantly
V3.2 at $0.38/M - the previous-generation V3 line
V4 Pro at $0.78/M - for when i need extra polish
R1 (Reasoner) at $2.50/M - the deep thinking model
Coder at $0.25/M - specialized for code

heres what gets me. V4 Flash at $0.25/M genuinely rivals the quality of GPT-4o for most of my use cases. and its FAST. like 60 tokens/sec fast. when im iterating on code or generating content, that speed matters a lot.
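if you want to check a speed claim like that yourself, streaming makes it easy to time. this is a rough sketch: it counts streamed chunks, not real tokens (a provider may pack several tokens into one chunk), and it assumes the endpoint supports the OpenAI SDK's standard stream=True option. because stream_text is a generator, the request only starts when measure_stream begins iterating, so the first-chunk time includes the network round trip.

```python
import time


def stream_text(client, model, prompt):
    """Yield the text pieces of a streaming chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks (e.g. a final usage-only one) have no choices or no text.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def measure_stream(chunks):
    """Consume an iterable of chunks; return (time_to_first, total_time, count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def chunks_per_second(first, total, count):
    """Streaming rate after the first chunk arrived (chunks, not exact tokens)."""
    if first is None or count < 2 or total <= first:
        return 0.0
    return (count - 1) / (total - first)
```

something like first, total, n = measure_stream(stream_text(client, "deepseek-v4-flash", prompt)) followed by chunks_per_second(first, total, n) gives a ballpark. for exact token counts, the usage numbers are the better source, if your provider returns them for streamed responses.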

i tested V4 Flash against some western models for english copywriting and it actually WON on a few prompts. like the tone was better. more natural. less "AI-ese."

code generation is the real standout tho. i ran it through my standard coding test suite and it consistently nailed HumanEval and MBPP style problems. better than Qwen for code in my experience. better than Kimi (which surprised me).
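for anyone wondering what "run it through a coding test suite" means mechanically, here is a minimal sketch of the core check (not the actual suite behind my results): pull the code block out of the model's reply, run it, then run assert-style tests (the way MBPP problems are written) in the same namespace. exec() runs whatever the model wrote, so only do this inside a sandbox or throwaway container.

```python
import re

# Matches the first fenced block in a reply; written as `{3} so this
# snippet can itself live inside a markdown code fence.
FENCE_RE = re.compile(r"`{3}(?:python|py)?[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the first fenced code block in a model reply, or the reply itself."""
    match = FENCE_RE.search(reply)
    return match.group(1) if match else reply


def passes_tests(candidate_src: str, test_src: str) -> bool:
    """Run generated code, then assert-based tests, in one shared namespace.

    WARNING: exec() runs arbitrary code. Only use this inside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        exec(test_src, namespace)
    except Exception:  # failed assert, runtime error, or syntax error
        return False
    return True
```

the pass rate for a model is then just the fraction of problems where passes_tests(extract_code(reply), tests) comes back True.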

the downsides: no real vision capabilities. chinese is good but not GLM-level. and the model lineup is smaller than Qwen so if you need specific sizes you might be out of luck.

would i use DeepSeek again? ABSOLUTELY. its already my default.

Qwen: The Swiss Army Knife

alibaba's Qwen family is WILD because they have SO MANY models. like its almost overwhelming. heres what theyve got:

Qwen3-8B at $0.01/M - the ultra-budget option
Qwen3-32B at $0.28/M - general purpose sweet spot
Qwen3-Coder-30B at $0.35/M - code specialized
Qwen3-VL-32B at $0.52/M - vision language
Qwen3-Omni-30B at $0.52/M - the everything model
Qwen3.5-397B at $2.34/M - enterprise tier

the range is genuinely impressive. you can go from $0.01 to $3.20/M and find a Qwen model at basically every price point in between. theres also Qwen3.6 stuff in there which can get pricey.

where Qwen shines: multimodal. like if you need ONE model that handles text AND images AND audio AND video, Qwen3-Omni-30B at $0.52/M is genuinely compelling. i tested it on a mixed-media workflow and it just worked. no special handling needed.

the alibaba backing means the infrastructure is solid. enterprise grade. i never had uptime issues with any Qwen model during my testing period.

weaknesses tho, and i gotta be honest here: the naming is CONFUSING. like Qwen3 vs Qwen3.5 vs Qwen3.6 vs the various suffixes (VL, Omni, Coder). it took me like 20 minutes just to figure out which model was which. and some of the mid-tier models feel overpriced - Qwen3.6-35B at $1/M output is steep for what you get.
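one thing that helps with the naming mess: treat the IDs as structured data. this sketch assumes IDs shaped like "Qwen/Qwen3-VL-32B" (the format used in the vision example in the code section); real catalogues may add extra suffixes, which the pattern simply ignores.

```python
import re

# Assumed ID shape: optional "Qwen/" prefix, "Qwen<version>", optional
# variant (VL, Omni, Coder, ...), then a parameter size like "32B".
QWEN_RE = re.compile(
    r"^(?:Qwen/)?Qwen(?P<version>\d+(?:\.\d+)?)"
    r"(?:-(?P<variant>[A-Za-z]+))?"
    r"-(?P<size>\d+(?:\.\d+)?B)"
)


def parse_qwen_id(model_id: str) -> dict:
    """Split a Qwen model ID into version, variant and size."""
    m = QWEN_RE.match(model_id)
    if not m:
        raise ValueError(f"not a Qwen model id: {model_id!r}")
    return {
        "version": m.group("version"),
        "variant": m.group("variant") or "text",  # no suffix = plain text model
        "size": m.group("size"),
    }


def group_qwen_ids(model_ids):
    """Group Qwen IDs by variant, skipping anything that is not a Qwen ID."""
    groups = {}
    for mid in model_ids:
        try:
            info = parse_qwen_id(mid)
        except ValueError:
            continue
        groups.setdefault(info["variant"], []).append(mid)
    return groups
```

if the endpoint exposes the standard model listing, group_qwen_ids(m.id for m in client.models.list()) gives a per-variant view in one line (assuming Global API serves that listing).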

english is solid but in my testing DeepSeek edged it out slightly for fluency. not by a lot tho.

would i use Qwen again? YES but specifically when i need vision or multimodal capabilities. its the best of these four for that.

How I Actually Use These Models Now

heres my actual workflow after all this testing:

Daily coding work: DeepSeek V4 Flash. no contest. $0.25/M and its FAST.

Hard reasoning tasks: Kimi K2.5. when i genuinely need the brain power, i pay the $3.00/M.

Chinese language projects: GLM-5 or Kimi K2.5 (both are amazing at chinese).

Vision/multimodal stuff: Qwen3-VL-32B or Qwen3-Omni-30B.

Ultra budget experiments: Qwen3-8B or GLM-4-9B at $0.01/M.
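that routing can live in a plain dict. a minimal sketch: only "deepseek-v4-flash", "Qwen/Qwen3-32B" and "Qwen/Qwen3-VL-32B" appear in the code in this post, so every other ID below is a placeholder to replace with your provider's real names. unknown task types fall back to V4 Flash since that is the default anyway.

```python
# Hypothetical routing table mirroring the workflow above.
ROUTES = {
    "coding": "deepseek-v4-flash",
    "reasoning": "kimi-k2.5",        # placeholder ID
    "chinese": "glm-5",              # placeholder ID
    "vision": "Qwen/Qwen3-VL-32B",
    "budget": "Qwen/Qwen3-8B",       # placeholder ID
}
DEFAULT_MODEL = "deepseek-v4-flash"


def pick_model(task: str) -> str:
    """Return the model ID for a task type, falling back to the daily driver."""
    return ROUTES.get(task, DEFAULT_MODEL)
```

pick_model pairs naturally with the ask_model helper in the code section, e.g. ask_model(pick_model("chinese"), prompt).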

i know thats a lot of models but thats the reality. you dont need ONE model for everything. you need the RIGHT model for each task.

Some Actual Code I Wrote

heres a basic python setup that lets me swap between models easily. i use this all the time:

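```python
import os

from openai import OpenAI

# One client for every model, since the endpoint is OpenAI-compatible.
# Keep the key (it looks like ga_xxxxxxxxxxxx) out of source code;
# GLOBAL_API_KEY is just an example env var name.
client = OpenAI(
    api_key=os.environ["GLOBAL_API_KEY"],
    base_url="https://global-apis.com/v1",
)


def ask_model(model_name, prompt):
    """Send a single-turn prompt to any model the endpoint serves; return the reply text."""
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


prompt = "Explain the difference between TCP and UDP in simple terms"

# Same call, only the model name changes.
print("=== DeepSeek V4 Flash ===")
print(ask_model("deepseek-v4-flash", prompt))

print("\n=== Qwen3-32B ===")
print(ask_model("Qwen/Qwen3-32B", prompt))
```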

see how clean that is? same code structure, just swap the model name. thats why i love Global API's unified setup. i dont have to learn four different SDKs or auth systems.

heres another example for vision stuff with Qwen:

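```python
# Reuses the client from the snippet above. The message content is a list
# of parts: one text part and one image part pointing at a URL.
response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-32B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)
print(response.choices[0].message.content)
```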
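the vision example points at a public URL. for a local file, a common approach with OpenAI-compatible endpoints is a base64 data: URL in the same image_url field. this is a sketch that assumes the endpoint (and the Qwen VL model behind it) accepts data URLs, so test with one image first.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Encode a local image file as a data: URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"can't tell the image type of {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(text: str, url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }
```

then pass messages=[image_message("What's in this image?", image_data_url("photo.png"))] to the same create call; "photo.png" is just an example filename.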

The Stuff Nobody Talks About

a few things i noticed that arent in the spec sheets:

Latency matters more than you think. DeepSeek V4 Flash being fast means i iterate faster. when youre testing prompts or debugging code, those saved seconds add up.

Context window 128K is fine for MOST stuff. people obsess over context but unless youre doing novel-length analysis, 128K is plenty.
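for a quick "will this fit?" check, a characters-per-token rule of thumb is usually enough. this is an approximation (~4 characters per token for English text), not any of these models' real tokenizers, and Chinese text tokenizes very differently, so treat it as a smoke test. it also reads 128K as 128,000 tokens and keeps some room for the reply.

```python
# Rule of thumb only: ~4 characters per token for English text. Real
# tokenizers differ per model, and Chinese text behaves very differently.
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 128_000  # reading "128K" as 128,000 tokens


def estimate_tokens(text: str) -> int:
    """Rough token estimate via ceiling division of the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """True if the prompt likely leaves reserve_for_output tokens for the reply."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_TOKENS
```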

Model "tier" doesnt always mean quality. Qwen3-8B at $0.01/M is shockingly good for basic tasks. dont pay more than you need to.

Reasoning models are worth it sometimes. i was skeptical about Kimi K2.5 and R1 until i had a genuinely complex multi-step problem. then i got it.
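those two points combine into a simple pattern: start with the cheapest model and only escalate when the answer fails a check you can actually test. a minimal sketch: the ask and good_enough callables are yours to supply (ask_model from the code section would work for ask; good_enough could be "the tests pass" or "the JSON parses"), and the model list is whatever you choose.

```python
from typing import Callable, Sequence, Tuple


def cascade(prompt: str,
            models: Sequence[str],
            ask: Callable[[str, str], str],
            good_enough: Callable[[str], bool]) -> Tuple[str, str]:
    """Try models cheapest-first; return (model, answer) from the first one
    whose answer passes the check, or the last model's answer if none do."""
    if not models:
        raise ValueError("need at least one model")
    answer = ""
    for model in models:
        answer = ask(model, prompt)
        if good_enough(answer):
            return model, answer
    return models[-1], answer
```

e.g. cascade(prompt, ["Qwen/Qwen3-8B", "deepseek-v4-flash", "kimi-k2.5"], ask_model, my_check), where "Qwen/Qwen3-8B", "kimi-k2.5" and my_check are placeholders.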

Which One Should You Pick?

heres my honest advice:

If youre building a startup and need cheap reliable text generation: DeepSeek V4 Flash. seriously. just use it.
If you need multimodal/vision: Qwen3-VL or Qwen3-Omni.
If youre doing heavy chinese language work: GLM-5 or Kimi K2.5.
If you need the absolute best reasoning: Kimi K2.5.
If youre experimenting on a tight budget: Qwen3-8B or GLM-4-9B at $0.01/M.

theres no single winner. if you want a TLDR: DeepSeek V4 Flash wins on price-to-performance. but honestly? the ecosystem is mature enough now that you should be using multiple models.

Final Thoughts

ok so wrapping this up. i spent way too many hours testing these models and im still finding new use cases. the chinese AI ecosystem has gotten legitimately good. like scary good. and the prices are so low that you can afford to experiment.

my biggest piece of advice: dont commit to one model. set up something like what i showed you above where you can swap models easily. test them on YOUR actual workloads. the benchmarks are useful but they dont capture everything.

if you want to try

DEV Community
