
QwQ-32B vs DeepSeek-R1-671B

Qwen is a series of LLMs released and maintained by Alibaba Cloud. QwQ is the reasoning-focused model in the Qwen series. A while ago, the team released a preview version of this model, and now they have released QwQ-32B in full. It is available on Hugging Face and in the Ollama model library.

Image generated by ChatGPT

Links

https://huggingface.co/Qwen/QwQ-32B
https://ollama.com/library/qwq

They used a reinforcement learning (RL) scaling approach driven by outcome-based rewards. As mentioned in their blog post, instead of a traditional reward model, an accuracy verifier was used in training this model; it was also trained with rewards from a general reward model and some rule-based verifiers. You can use QwQ-32B via Hugging Face Transformers and the Alibaba Cloud DashScope API.
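To make the outcome-based reward idea concrete, here is a minimal, hypothetical sketch of a rule-based accuracy verifier: rather than a learned reward model scoring the text, the reward is 1.0 only if the final answer is actually correct. The `\boxed{...}` answer convention and the function names here are illustrative assumptions, not Qwen's actual training code.

```python
import re

def extract_final_answer(completion: str) -> str:
    """Pull the last \\boxed{...}-style answer from a completion
    (an assumed answer-formatting convention, for illustration)."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else ""

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Outcome-based reward: 1.0 if the verified final answer
    matches the ground truth, else 0.0 -- no learned reward model."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

print(accuracy_reward("... so the answer is \\boxed{3}", "3"))  # 1.0
print(accuracy_reward("... so the answer is \\boxed{5}", "3"))  # 0.0
```

In an RL loop, a signal like this would be computed on each sampled completion and used to update the policy, which is the essence of training against outcomes rather than preferences.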

Example Code with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many r's are in the word \"strawberry\""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Example Code with DashScope API

from openai import OpenAI
import os

# Initialize OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace with your API Key: api_key="sk-xxx"
    # How to get an API Key:https://help.aliyun.com/zh/model-studio/developer-reference/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

reasoning_content = ""
content = ""

is_answering = False

completion = client.chat.completions.create(
    model="qwq-32b",
    messages=[
        {"role": "user", "content": "Which is larger, 9.9 or 9.11?"}
    ],
    stream=True,
    # Uncomment the following line to return token usage in the last chunk
    # stream_options={
    #     "include_usage": True
    # }
)

print("\n" + "=" * 20 + "reasoning content" + "=" * 20 + "\n")

for chunk in completion:
    # If chunk.choices is empty, print usage
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
    else:
        delta = chunk.choices[0].delta
        # Print reasoning content
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
            print(delta.reasoning_content, end='', flush=True)
            reasoning_content += delta.reasoning_content
        else:
            if delta.content != "" and is_answering is False:
                print("\n" + "=" * 20 + "content" + "=" * 20 + "\n")
                is_answering = True
            # Print content
            print(delta.content, end='', flush=True)
            content += delta.content

Performance Evaluation

Below is the evaluation chart showing how this 32B model competes against other reasoning models, especially DeepSeek-R1-671B.

Image from blog post by Qwen

It competes closely with the DeepSeek-R1-671B model across all five benchmarks and outperforms OpenAI-o1-mini (except on IFEval).

I was wondering what OpenAI’s ChatGPT would think about it 🤣

Here's a small insight from the comparison between QwQ-32B and DeepSeek-R1-671B, generated by ChatGPT from the chart above.
(NOTE: For some reason, ChatGPT renders 671B as 67.1B — please ignore that.)

Image by Author (Screenshot of output from ChatGPT)

It is clear that the storage and hardware requirements are much higher for the DeepSeek-R1-671B model than for the QwQ-32B model.
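A rough back-of-the-envelope calculation makes the gap concrete. Assuming full-precision FP16/BF16 weights at 2 bytes per parameter (and ignoring activations, KV cache, and any quantization or MoE sparsity), the weights alone require:

```python
def weight_storage_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate storage for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

print(weight_storage_gb(32e9))   # 64.0  -> ~64 GB for QwQ-32B in FP16
print(weight_storage_gb(671e9))  # 1342.0 -> ~1.3 TB for DeepSeek-R1-671B in FP16
```

Quantized variants shrink both figures considerably, but the roughly 20x ratio between the two models remains.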

Based on the evaluation, QwQ appears to perform very well on real-world tasks despite being a fraction of the size of DeepSeek-R1.

You can also try using the QwQ reasoning model in Qwen Chat at https://chat.qwen.ai/

NOTE — Some information in this post was referenced from this video: https://youtu.be/W85kbOduL8c?si=058s4_cmslrhRAxk

Happy Learning!

