It's good for the right task, and it benefits dramatically from tool use, such as MCP servers. And what size model are you saying wasn't good? I hope you meant the largest 235B model at the very least (when comparing to a commercial model). I also find that one model is not the best on its own. Generally, "Architect" mode, where you use one model as a reasoner and another as a coder (even if they're technically the same model), can improve output by ~20% based on aider benchmarks. But the major takeaway is that your workflow with a commercial solution vs. open source will be different; that's expected, because they figured everything out for you... so things just "work".
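If you want to try it, here's a rough sketch of kicking off architect mode with aider from a script. The flags are from aider's docs; the model IDs are placeholders for whatever reasoner/coder pair you actually run:

```python
# Driving aider's "architect" mode: one model plans the change, a second model
# applies the edits. The CLI flags are from aider's docs; the model IDs below
# are placeholders, not recommendations.
import subprocess

subprocess.run([
    "aider",
    "--architect",                                   # enable reasoner + editor split
    "--model", "openrouter/qwen/qwen3-235b-a22b",    # the "reasoner" (placeholder)
    "--editor-model", "openrouter/qwen/qwen3-32b",   # the "coder" (placeholder)
    "--message", "refactor the parser into smaller functions",
    "src/parser.py",                                 # file(s) it may edit
])
```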
Fair (even though MCP isn't even possible on the released models). Yes, I was comparing the largest model. It was on the Qwen site itself, which should've been my best possible experience.
I've been using it free through OpenRouter, or smaller models locally (I'm split between 30B A3B and 14B; they seem very close). Might check their API out directly. I've had some promising luck setting up several models to automate processes, but a lot more work needs to be done. For one, you can use the Task-Master MCP and tell the AI to build a robust debugging framework around the project, with tests and verbose logging for everything, so it can solve its own problems. And you can make it "learn" by appending lessons learned to its knowledge base, using text embeddings for context and faster search, etc. (something like the sketch below). "Functional AGI" feels so close (to me that just means end-to-end AI without human interaction that can run 24/7 and self-improve). And to me, it's less about the model itself and more about the MoE + scaffolding around your AI system (the better the framework, the less capable the AI needs to be, IMO).
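The "lessons learned" part can be dead simple. A toy sketch with numpy, where `embed()` is a hypothetical stand-in for whatever real embedding model your stack uses:

```python
# Toy "lessons learned" memory: append notes, retrieve the closest ones by
# cosine similarity. embed() is a hypothetical stand-in for a real
# text-embedding model; everything else is plain numpy.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: replace with a real text-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

lessons: list[str] = []
vectors: list[np.ndarray] = []

def learn(lesson: str) -> None:
    lessons.append(lesson)
    vectors.append(embed(lesson))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scores = np.array(vectors) @ q   # cosine similarity (all unit vectors)
    return [lessons[i] for i in np.argsort(scores)[::-1][:k]]

learn("pytest fails if the venv isn't activated first")
learn("the API rate-limits after 60 requests/minute")
print(recall("tests keep failing"))
```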
And maybe I don't understand, but you should be able to use MCPs for anything you can get API access to. Unless you mean through their "Chat" counterparts, which I mostly only use now for deep research.
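E.g., with the official MCP Python SDK, wrapping some HTTP API as an MCP tool is only a few lines. The FastMCP bits here are from the SDK's README (double-check current docs); the weather API URL is made up:

```python
# Exposing an arbitrary HTTP API as an MCP tool via the official Python SDK's
# FastMCP helper. The weather URL is a made-up placeholder; the point is just
# that anything reachable over HTTP can become a tool.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-api-bridge")

@mcp.tool()
def get_weather(city: str) -> str:
    """Fetch current weather for a city from a (hypothetical) REST API."""
    resp = httpx.get("https://api.example.com/weather", params={"city": city})
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio by default
```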
chat.qwen.ai/
MCP is grayed out.
It does seem the model itself supports it, as it's plainly stated in the description on Hugging Face: huggingface.co/Qwen/Qwen3-235B-A22B
I don't really know how to use it yet; I haven't played with MCP, though I guess with all the hype around it I'll probably give it a try soon. The thing that stops me is the extra fuss I have to go through (using open-webui) to get that MCP stuff running. Apparently I need a proxy for it; I just haven't had the interest yet to dig into it.
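From what I can tell, the proxy in question is open-webui's mcpo bridge, which fronts an MCP server with a plain OpenAPI endpoint. Roughly this, going by the mcpo README (verify against current docs):

```python
# Launching open-webui's "mcpo" proxy, which exposes an MCP server as an
# OpenAPI endpoint that open-webui can talk to. Command shape is from the
# mcpo README; mcp-server-time is just their example server.
import subprocess

subprocess.run([
    "uvx", "mcpo", "--port", "8000",
    "--",                       # everything after "--" is the MCP server command
    "uvx", "mcp-server-time",   # example MCP server; swap in your own
])
# Then add http://localhost:8000 in open-webui as a tool server.
```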
I'm also using OpenRouter (and hooked it into open-webui). It's nice being able to pick from hundreds of models right there! And local models show up on the list too, if you have any set up.
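And if you ever want to hit OpenRouter outside the UI, its API is OpenAI-compatible. A minimal sketch, where the key and model ID are placeholders:

```python
# Calling a model through OpenRouter's OpenAI-compatible endpoint.
# The API key and model ID are placeholders; check openrouter.ai for
# current model names and free variants.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key (placeholder)
)

resp = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b",  # placeholder model ID
    messages=[{"role": "user", "content": "One-line summary of MCP, please."}],
)
print(resp.choices[0].message.content)
```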
The Qwen chat is just an interface for the model. Even the official interfaces for models aren't always the best.
Also, MCP isn't a thing that only some models support (though some might be more optimized for it); it's just a protocol that models can use.
Some researchers have pruned the Qwen 3 30B model down to 16B, and will soon prune the 235B to 150B, which will make it much lighter and give it a higher performance/size ratio. It will be interesting to test that one.
Qwen 3 30B A3B and 14B are already on par with each other. They trade blows on math and coding, but within a margin of each other, and both are pretty good on memory usage. I use 4-bit quants though, as in my experience you only lose ~5-15% accuracy and save ~75% of the resources (rough math below). And I'd be concerned about pruning producing highly domain-specific models (which is fine if that's what you want). You should also look into the mixed quants Unsloth is doing; very promising... their findings show almost no quality/accuracy loss with an 80% resource reduction, plus faster speeds. They got DeepSeek R1 671B to run in <20GB of VRAM.
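The "save ~75%" bit is just the weights math (ignoring KV cache and runtime overhead):

```python
# Back-of-the-envelope weight memory for a 30B-parameter model at different
# precisions. Weights only: KV cache, activations, and runtime overhead are
# extra, so treat these as floor estimates.
PARAMS = 30e9

for name, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:>5}: ~{gib:.0f} GiB")

# fp16: ~56 GiB, int8: ~28 GiB, 4-bit: ~14 GiB -> 4-bit is ~25% of fp16,
# i.e. the ~75% saving mentioned above.
```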
Got any links?
Link to X post about the Qwen 3 prune:
x.com/kalomaze/status/191837896041...
And I've looked a lot into the work Unsloth is doing! I was using the Unsloth gemma-3 not long ago, and all the work over there is very impressive.