Originally published on Remote OpenClaw.
Qwen3 Max is the best Qwen model for Hermes Agent when using Alibaba's DashScope API, delivering strong reasoning and tool calling at $0.78 per million input tokens and $3.90 per million output tokens. For local deployments, Qwen3 8B via Ollama runs Hermes Agent at zero marginal cost on a machine with 8GB RAM. The entire Qwen3 lineup is open-source under Apache 2.0, giving teams the flexibility to run models locally, self-host on their own infrastructure, or access them through Alibaba's cloud — a breadth of deployment options that proprietary models cannot match.
Key Takeaways
- Qwen3 Max ($0.78/$3.90 per million tokens) is the best cloud Qwen model for Hermes Agent — strong reasoning and tool calling.
- Qwen3 8B runs locally via Ollama on 8GB RAM with zero API cost — ideal for privacy-focused or offline agents.
- Qwen3 235B-A22B (MoE) is the flagship open model: 235B total params, 22B active, available through DashScope or self-hosted.
- All Qwen3 models are Apache 2.0 licensed — full freedom to self-host, fine-tune, and modify.
- Two connection paths: DashScope API (cloud, OpenAI-compatible) or Ollama (local, no API key needed).
In this guide
- Qwen Models Ranked for Hermes Agent
- DashScope API Setup for Hermes
- Local Qwen via Ollama Setup
- Open-Source Advantage for Self-Hosted Agents
- Qwen vs Other Hermes Providers
- Limitations and Tradeoffs
- FAQ
Qwen Models Ranked for Hermes Agent
Alibaba's Qwen3 series offers models from 0.6B to 235B parameters, all released under Apache 2.0. For Hermes Agent, the relevant models span three tiers: flagship cloud API models for maximum quality, mid-size local models for balanced performance, and lightweight models for resource-constrained deployments. Every model below exceeds Hermes Agent's 64K minimum context requirement.
| Model | Parameters | Context | Cost (DashScope) | Ollama | Best For |
|---|---|---|---|---|---|
| Qwen3 Max | Undisclosed | 128K | $0.78/$3.90 | No | Flagship cloud reasoning, complex tasks |
| Qwen3 235B-A22B | 235B (22B active) | 128K | Via DashScope | Needs 48GB+ VRAM | Self-hosted flagship, MoE efficiency |
| Qwen3 32B | 32B (dense) | 128K | Via DashScope | Needs 20GB+ RAM | Strong local reasoning, coding |
| Qwen3 8B | 8B (dense) | 128K | Via DashScope | Needs 8GB RAM | Best local model for most hardware |
| Qwen3 30B-A3B | 30B (3B active) | 128K | Via DashScope | Needs 4GB RAM | Ultra-efficient MoE local model |
| Qwen3 4B | 4B (dense) | 128K | Via DashScope | Needs 4GB RAM | Minimal hardware, basic tasks |
Qwen3 Max is the recommended choice for cloud deployments — it consistently performs well on agentic benchmarks and its pricing undercuts Claude Sonnet by roughly 4x on input tokens. For local deployments, Qwen3 8B is the sweet spot: it fits comfortably in 8GB RAM, supports tool calling in both thinking and non-thinking modes, and produces results that meaningfully exceed what smaller 4B models can achieve. The 30B-A3B MoE variant is a strong alternative that activates only 3B parameters per token, running nearly as fast as the 4B dense model while accessing 30B total parameters.
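To put the Qwen3 Max pricing in concrete terms, here is a small sketch that estimates monthly API spend from token counts. The rates are the DashScope list prices quoted above; the example token volumes are hypothetical.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 0.78, output_rate: float = 3.90) -> float:
    """Estimate API spend from token counts, given per-million-token rates."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Hypothetical workload: 10M input + 2M output tokens in a month
print(round(monthly_cost_usd(10_000_000, 2_000_000), 2))  # 15.6
```

At the same volumes, Claude Sonnet's $3.00/$15.00 rates would come to $60, which is where the "roughly 4x cheaper on input" comparison shows up in practice.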
DashScope API Setup for Hermes
DashScope is Alibaba Cloud's model serving platform and provides an OpenAI-compatible API endpoint, which means Hermes Agent can connect to it using the custom provider configuration. No special SDK or plugin is required.
Step 1: Get Your DashScope API Key
Sign up at Alibaba Cloud Model Studio. Navigate to the API Keys section and generate a key. As of April 2026, new accounts receive free credits for Qwen model usage.
Step 2: Configure config.yaml
Since DashScope is OpenAI-compatible, configure it as a custom provider in Hermes Agent with the DashScope base URL:
```yaml
# ~/.hermes/config.yaml
model:
  default: qwen3-max
  provider: custom
  base_url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  api_key_env: DASHSCOPE_API_KEY
```
Step 3: Set the API Key
```bash
hermes config set DASHSCOPE_API_KEY your-api-key-here
```
The base_url must include /v1 because the OpenAI Python SDK appends /chat/completions directly. For international users, use the dashscope-intl endpoint. For users in mainland China, replace with https://dashscope.aliyuncs.com/compatible-mode/v1.
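To illustrate why the /v1 suffix matters, here is a small sketch of how an OpenAI-compatible client joins the configured base URL with the chat path. This is illustrative only, not the SDK's actual source.

```python
def chat_completions_url(base_url: str) -> str:
    """Join a configured base URL with the path an OpenAI-compatible SDK appends."""
    return base_url.rstrip("/") + "/chat/completions"

intl = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
mainland = "https://dashscope.aliyuncs.com/compatible-mode/v1"

print(chat_completions_url(intl))
# https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
```

If the /v1 were missing from base_url, the resolved URL would lack the version segment and DashScope would reject the request.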
For complete Hermes Agent installation and general setup, see our Hermes Agent setup guide.
Local Qwen via Ollama Setup
Hermes Agent auto-detects models installed through Ollama and includes per-model tool call parsers optimized for local models. Running Qwen locally means zero API cost, complete data privacy, and no rate limits — at the tradeoff of requiring adequate hardware.
Step 1: Install Ollama and Pull a Qwen Model
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen3 8B (recommended for most machines)
ollama pull qwen3:8b

# Or pull the efficient MoE variant (runs fast on limited hardware)
ollama pull qwen3:30b-a3b
```
Step 2: Configure Hermes Agent
Hermes Agent detects Ollama automatically. Run hermes model and select Ollama from the provider list — it will show all locally installed models. Or configure manually:
```yaml
# ~/.hermes/config.yaml
model:
  default: qwen3:8b
  provider: ollama
```
No API key is needed for Ollama. The default Ollama endpoint is http://localhost:11434. If you run Ollama on a different host or port, specify it:
```yaml
model:
  default: qwen3:8b
  provider: custom
  base_url: http://your-server:11434/v1
```
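Before pointing Hermes at a local or remote Ollama host, it can help to verify the server is actually reachable. This sketch queries Ollama's /api/tags endpoint, which lists installed models; the helper function name is our own.

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers GET /api/tags at base_url."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if not ollama_reachable():
    print("Ollama is not answering on localhost:11434 -- start it with `ollama serve`")
```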
Hardware Requirements
| Model | Min RAM | Recommended RAM | Speed (tokens/sec) |
|---|---|---|---|
| Qwen3 4B | 4GB | 8GB | ~30-50 on CPU |
| Qwen3 8B | 8GB | 16GB | ~20-40 on CPU |
| Qwen3 30B-A3B | 4GB | 8GB | ~25-45 on CPU (MoE) |
| Qwen3 32B | 20GB | 32GB | ~10-20 on CPU |
Apple Silicon Macs with unified memory are particularly well-suited for local Qwen models — an M2 or M3 MacBook with 16GB runs Qwen3 8B comfortably with GPU acceleration through Ollama's Metal support.
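The RAM figures above follow a common rule of thumb for quantized models: weights take roughly params × bits / 8 bytes, plus overhead for the KV cache and runtime. The 1.35 overhead factor below is our own assumption, not an official figure.

```python
def est_ram_gb(params_billion: float, bits: int = 4, overhead: float = 1.35) -> float:
    """Rough RAM estimate for a quantized model: weights plus runtime overhead."""
    weights_gb = params_billion * bits / 8  # e.g. 8B params at 4-bit is ~4 GB of weights
    return round(weights_gb * overhead, 1)

print(est_ram_gb(8))   # ~5.4 GB, consistent with the 8GB minimum for Qwen3 8B
print(est_ram_gb(32))  # ~21.6 GB, consistent with the 20GB+ figure for Qwen3 32B
```

Note this is a sketch for sizing intuition only; actual usage varies with quantization format, context length, and batch size.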
Open-Source Advantage for Self-Hosted Agents
Every Qwen3 model is released under Apache 2.0, the most permissive standard open-source license. This is not a "source-available" or "research-only" license — Apache 2.0 grants full commercial use, modification, distribution, and private deployment rights with no restrictions beyond attribution.
For Hermes Agent deployments, the open-source advantage plays out in several ways:
- Complete data sovereignty. Run Qwen locally via Ollama or on your own server via vLLM or SGLang. No data leaves your network. This is a hard requirement for legal, healthcare, financial, and government agent deployments.
- No API dependency. Your Hermes Agent keeps working if Alibaba's API has downtime, changes pricing, or deprecates a model. The weights are yours to keep and serve indefinitely.
- Fine-tuning for domain tasks. Unlike proprietary models, you can fine-tune Qwen3 on your domain data (legal documents, codebase patterns, internal knowledge) and serve the tuned model through Ollama. Hermes Agent connects to it identically.
- Cost at scale. For teams running multiple Hermes agents or processing high volumes, self-hosting Qwen3 on a GPU server eliminates per-token costs entirely. The breakeven point depends on hardware costs, but for sustained workloads it typically arrives within weeks.
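To make the breakeven claim concrete, here is a hedged sketch. The hardware cost and daily token volumes are hypothetical; the default rates are Qwen3 Max's DashScope prices quoted earlier.

```python
def breakeven_days(hardware_cost_usd: float,
                   daily_input_m: float, daily_output_m: float,
                   input_rate: float = 0.78, output_rate: float = 3.90) -> float:
    """Days until self-hosting hardware pays for itself versus per-token API billing."""
    daily_api_spend = daily_input_m * input_rate + daily_output_m * output_rate
    return hardware_cost_usd / daily_api_spend

# Hypothetical: a $2,000 GPU server vs. 50M input + 10M output tokens per day
print(round(breakeven_days(2000, 50, 10)))  # 26 days
```

At that (hypothetical) volume, the hardware pays for itself in under a month, which matches the "within weeks" claim for sustained workloads. This ignores electricity and ops time, so treat it as a lower bound on the true breakeven.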
Qwen is not the only open-source option for Hermes — Llama 4 and Mistral are also available via Ollama. But Qwen3's Apache 2.0 license is less restrictive than Llama's custom license, and the model size range (0.6B to 235B) offers more granularity for matching model size to available hardware. For a broader look at Qwen models beyond Hermes, see our Qwen models overview for 2026. For Qwen configuration in OpenClaw specifically, see the Qwen models for OpenClaw guide.
Qwen vs Other Hermes Providers
Qwen3 Max competes with cloud API models from Anthropic, OpenAI, and DeepSeek, while Qwen3's local models compete with Llama 4 Maverick and Mistral. The table below compares across the dimensions that matter most for Hermes Agent users.
| Model | Input/Output Cost | Context | License | Local Option | Agent Strength |
|---|---|---|---|---|---|
| Qwen3 Max | $0.78/$3.90 | 128K | Apache 2.0 | No | Strong reasoning, bilingual |
| Qwen3 8B (Ollama) | Free (local) | 128K | Apache 2.0 | Yes (8GB) | Privacy, zero cost |
| Claude Sonnet 4.6 | $3.00/$15.00 | 200K | Proprietary | No | Best reasoning and tool calling |
| DeepSeek V4 | $0.30/$0.50 | 1M | MIT | No (too large) | Budget coding, huge context |
| Llama 4 Maverick | Free (local) | 1M | Llama License | Yes (16GB+) | Local privacy, large context |
Qwen3 Max's pricing ($0.78 input) places it between DeepSeek V4 ($0.30) and GLM-5.1 ($0.95), making it a mid-budget cloud option with strong bilingual capability. For local deployments, Qwen3 8B requires less RAM than Llama 4 Maverick (8GB vs 16GB+) while the Apache 2.0 license offers more permissive terms than Llama's custom license. For overall rankings across all providers, see our best models for Hermes Agent guide.
Limitations and Tradeoffs
Qwen models have specific constraints to consider before selecting them for Hermes Agent.
- DashScope requires custom endpoint config. Unlike Anthropic, OpenAI, or MiniMax, Alibaba's DashScope is not a first-class Hermes provider. You must configure it as a custom OpenAI-compatible endpoint with the correct base URL. This adds a setup step that other providers skip.
- Local model quality ceiling. Qwen3 8B is capable but cannot match the reasoning depth of cloud models like Claude Sonnet 4.6 or GPT-4.1. Complex multi-step agent tasks — multi-file code generation, nuanced research synthesis — will produce noticeably weaker results on small local models.
- 128K context maximum. All Qwen3 models cap at 128K tokens. This exceeds Hermes Agent's 64K minimum but falls short of DeepSeek V4 (1M), GPT-4.1 (1M), and MiniMax-Text-01 (4M). For memory-heavy or document-heavy workflows, the 128K ceiling is a real constraint.
- DashScope latency outside Asia. Alibaba Cloud's primary data centers are in Asia. Users in North America or Europe may experience higher API latency compared to US-based providers. OpenRouter availability for Qwen models can mitigate this.
- Ollama tool call parsing. While Hermes Agent includes per-model tool call parsers for Ollama models, local Qwen tool calling can be less reliable than cloud API tool calling. Test your specific workflow before deploying to production with a local Qwen model.
Related Guides
- Best AI Models for Hermes Agent in 2026
- How to Install and Set Up Hermes Agent
- Best Qwen Models for OpenClaw
- Best Qwen Models in 2026
FAQ
How do I connect Qwen3 to Hermes Agent via DashScope?
Configure Hermes Agent with a custom provider pointing to DashScope's OpenAI-compatible endpoint. In ~/.hermes/config.yaml, set provider: custom, base_url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1, and default: qwen3-max. Set your API key with hermes config set DASHSCOPE_API_KEY your-key. The /v1 suffix in the base URL is required because the OpenAI SDK appends /chat/completions directly.
Can I run Qwen3 locally with Hermes Agent using Ollama?
Yes. Install Ollama, pull a Qwen3 model with ollama pull qwen3:8b, then set provider: ollama and default: qwen3:8b in your Hermes config.yaml. No API key is needed. Hermes Agent auto-detects Ollama models and includes optimized tool call parsers for local models. Qwen3 8B requires approximately 8GB RAM and runs on macOS, Linux, and WSL2.
Which Qwen3 model should I choose for local Hermes Agent?
Qwen3 8B is the recommended local model for most hardware — it balances reasoning quality with resource requirements (8GB RAM minimum). For machines with only 4GB RAM, use the Qwen3 30B-A3B MoE variant, which activates only 3B parameters per token and runs nearly as fast as the 4B model while accessing more total knowledge. For machines with 20GB+ RAM, Qwen3 32B offers substantially better reasoning.
Is Qwen3 Max better than Claude Sonnet for Hermes Agent?
Claude Sonnet 4.6 generally outperforms Qwen3 Max on reasoning quality and tool calling reliability in Hermes Agent. Hermes's tool call parsers are most thoroughly tested with Anthropic models. However, Qwen3 Max costs roughly 4x less on input tokens ($0.78 vs $3.00) and offers native Chinese-English bilingual quality that Claude does not match. Choose Qwen3 Max when cost or bilingual capability is the priority; choose Claude when reasoning quality matters most.
Can I fine-tune Qwen3 and use the tuned model with Hermes Agent?
Yes. Since Qwen3 is Apache 2.0 licensed, you can fine-tune any Qwen3 model on your own data, then serve it through Ollama, vLLM, or SGLang. Point Hermes Agent at the local endpoint using the custom provider configuration. This is useful for domain-specific agents — for example, fine-tuning on legal documents, medical records, or internal codebases to improve the agent's performance on specialized tasks.