DEV Community

zac

Posted on • Originally published at remoteopenclaw.com

Best Open-Source Models for OpenClaw — Run Locally, No API Costs

The best open-source model for most OpenClaw operators running locally in April 2026 is Qwen3.5 — the family ships in sizes from 0.8B to 397B, supports 256K context, and the 27B variant fits comfortably on a 24GB GPU with Q4_K_M quantization. For reasoning-heavy workflows, DeepSeek-R1-Distill-32B is the strongest option at that VRAM tier. For multimodal tasks, Llama 4 Scout provides a 10M context window and runs on a single H100.

Key Takeaways

  • Qwen3.5:27b is the best all-round local model for OpenClaw — 256K context, strong agentic performance, and it fits in 24GB of VRAM with Q4 quantization.
  • DeepSeek-R1-Distill-32B delivers the best local reasoning performance, outperforming OpenAI o1-mini on multiple benchmarks.
  • Llama 4 Scout (17B active, 16 experts) offers a 10M context window and beats Gemma 3 and Gemini 2.0 Flash-Lite on broad benchmarks.
  • Gemma 4 from Google is the newest entrant (April 2026), optimized for running on devices from phones to workstations.
  • Hardware matters more than model choice — OpenClaw needs Ollama set to at least 64K context, and fitting that KV cache is what makes Q4_K_M quantization the practical default for most operators.

Part of The Complete Guide to OpenClaw — the full reference covering setup, security, memory, and operations.

In this guide

  1. Open-Source Model Rankings by Task Type
  2. Full Comparison Table
  3. Hardware Requirements and VRAM Guide
  4. Ollama Setup for OpenClaw
  5. Which Model Should You Pick?
  6. Limitations and Tradeoffs
  7. FAQ

Open-Source Model Rankings by Task Type

Open-source models for OpenClaw split into distinct strength categories as of April 2026. No single model leads everywhere, so the right pick depends on what your agent actually does.

General-Purpose Agent Work

Qwen3.5 is the strongest all-round open-source family for OpenClaw. The 27B variant scores competitively with proprietary models on agentic benchmarks, and the full 397B-A17B MoE flagship surpasses Qwen3-235B-A22B despite using fewer active parameters. Alibaba's Qwen platform provides both local weights and API access with international endpoints in Singapore, Frankfurt, and Virginia.

Reasoning and Math

DeepSeek-R1-Distill-32B outperforms OpenAI o1-mini across multiple reasoning benchmarks and is the strongest local reasoning model you can run on a 24GB GPU. The full R1 scores 79.8% on AIME 2024 and 97.3% on MATH-500 — the distilled 32B retains most of that capability. Weights are available on Hugging Face and through Ollama.

Code Generation

Codestral from Mistral (22B parameters) supports 80+ languages with 256K context and scores 86.6% on HumanEval. It is the most efficient dedicated coding model for local deployment. For broader code tasks, Qwen3-Coder:30b is a strong alternative with deeper agentic integration.

Multimodal (Text + Image)

Llama 4 Scout is Meta's first natively multimodal open model, released April 5, 2026. With 17B active parameters and 16 experts, it handles both text and image input with a 10M context window. According to Meta's announcement, it outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across broad benchmarks.

Small / Edge Deployment

Phi-4 from Microsoft (14B parameters) specializes in complex reasoning and scores 84.8 on MMLU and 82.6 on HumanEval — remarkable for its size. Phi-4-mini adds multilingual support in 20+ languages, and the model runs comfortably on 8GB VRAM.


Full Comparison Table

As of April 2026, these are the leading open-source models for OpenClaw, ranked by their primary strength.

| Model | Developer | Parameters | Context | Best For | Min VRAM (Q4) | Key Benchmark |
|---|---|---|---|---|---|---|
| Qwen3.5:27b | Alibaba | 27B | 256K | General agent | ~18GB | BFCL-V4: 72.2 (tool use) |
| DeepSeek-R1:32b | DeepSeek | 32B | 64K | Reasoning | ~22GB | AIME 2024: 79.8% |
| Llama 4 Scout | Meta | 17B active (MoE) | 10M | Multimodal | ~20GB | DocVQA: 94.4% |
| Gemma 4 | Google | Up to 31B | 128K | Balanced local | ~20GB | Exceeds 400B rivals (per Google) |
| Codestral | Mistral | 22B | 256K | Code generation | ~14GB | HumanEval: 86.6% |
| GLM-5 | Zhipu AI | 744B (44B active) | 128K | Agentic + reasoning | ~30GB | LMArena: #1 open model |
| Mistral Small 4 | Mistral | 119B (6B active) | 256K | Multimodal reasoning | ~6GB | Merged Magistral + Pixtral |
| Phi-4 | Microsoft | 14B | 16K | Edge / small | ~8GB | MMLU: 84.8 |

VRAM figures assume Q4_K_M quantization at default context. Extending to 64K context adds significant KV cache overhead — see the hardware section below.


Hardware Requirements and VRAM Guide

VRAM determines which models you can realistically run with OpenClaw. The model weights are only part of the equation — the KV cache for context length is the hidden cost that catches most operators.

At Q4_K_M quantization (the practical default for consumer hardware), an 8B model uses approximately 6-7GB for weights alone. But at 64K context — the minimum Ollama recommends for OpenClaw — the KV cache adds roughly 15-20GB, pushing total VRAM requirements far beyond what the model size suggests.
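The arithmetic behind that warning can be sketched in a few lines. This is a back-of-envelope estimator, not a profiler: the layer and head counts below are illustrative stand-ins for a 27B-class dense model with grouped-query attention (not published specs for any model in this guide), and real Ollama usage shifts with flash attention and KV-cache quantization settings.

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache.
# All architecture numbers here are illustrative assumptions --
# swap in your model's real config to get a useful figure.

def weights_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Q4_K_M averages roughly 4.5 bits per weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(ctx: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2) -> float:
    """fp16 KV cache: two tensors (K and V) per layer, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / 1e9

# Hypothetical 27B-class dense model with grouped-query attention
w = weights_gb(27)  # ~15 GB of weights at Q4_K_M
kv = kv_cache_gb(ctx=65536, layers=48, kv_heads=8, head_dim=128)
print(f"weights ~{w:.1f} GB, 64K KV cache ~{kv:.1f} GB, total ~{w + kv:.1f} GB")
```

The point of the exercise: the KV cache scales linearly with context, so doubling `--num-ctx` can cost more additional VRAM than stepping up a whole model size tier.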

| VRAM Available | Best Model Choice | Max Practical Context | Notes |
|---|---|---|---|
| 8GB (RTX 4060, M1) | Phi-4 (14B Q4), Qwen3.5:9b | 8-16K | Usable for simple tasks; 64K context not realistic |
| 16GB (RTX 4080, M2 Pro) | Qwen3.5:14b, Codestral | 16-32K | Functional for shorter agent sessions |
| 24GB (RTX 4090, M3 Max) | Qwen3.5:27b, DeepSeek-R1:32b | 32-64K | Sweet spot for serious local OpenClaw use |
| 48GB+ (Dual GPU, M4 Ultra) | DeepSeek-R1:70b, Llama 4 Scout | 64K+ | Full capability; can sustain long agent sessions |

For more detailed GPU optimization, see our GPU optimization guide for Ollama and OpenClaw.


Free skills and AI personas for OpenClaw — browse the marketplace.

Ollama Setup for OpenClaw

Ollama is the standard way to run open-source models locally with OpenClaw. The setup is straightforward, but the context length configuration is critical.

Install and Pull a Model

```shell
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the recommended general-purpose model
ollama pull qwen3.5:27b

# Pull the recommended reasoning model
ollama pull deepseek-r1:32b
```

Set Context Length for OpenClaw

Ollama's documentation recommends at least 64K context for agent tools and coding workflows. OpenClaw falls squarely into that category. Without this setting, your agent will lose track of instructions and context mid-session.

```shell
# ollama run has no context-length flag; bake 64K context (the minimum
# for OpenClaw) into a derived model via a Modelfile instead
cat > Modelfile <<'EOF'
FROM qwen3.5:27b
PARAMETER num_ctx 65536
EOF
ollama create qwen3.5-64k -f Modelfile

# If you use this approach, reference "qwen3.5-64k" in your OpenClaw config
ollama run qwen3.5-64k
```

Newer Ollama builds also honor the `OLLAMA_CONTEXT_LENGTH` environment variable on the server, which sets the default context for every model it serves.

Point OpenClaw at Ollama

Once Ollama is running, configure OpenClaw to use it as the backend:

```json
{
  "model": "qwen3.5:27b",
  "provider": "ollama",
  "baseUrl": "http://localhost:11434/v1"
}
```

For a complete walkthrough, see our OpenClaw Ollama setup guide.
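Before turning the agent loose, it's worth smoke-testing the endpoint directly. Here is a minimal sketch using only the Python standard library — the URL path and payload shape follow Ollama's OpenAI-compatible chat API, and the model tag is an assumption that should match whatever you pulled earlier:

```python
import json
import urllib.request

# Build (and optionally send) an OpenAI-style chat request against a
# local Ollama server, to confirm the backend works before wiring up OpenClaw.
def chat_request(model: str, prompt: str,
                 base_url: str = "http://localhost:11434/v1") -> urllib.request.Request:
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("qwen3.5:27b", "Reply with the single word: pong")
# Uncomment once Ollama is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If this round-trips, OpenClaw's `baseUrl` setting will work too, since it talks to the same `/v1` endpoint.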


Which Model Should You Pick?

The right model depends on three variables: what your OpenClaw agent does, how much VRAM you have, and whether you can tolerate quality gaps versus proprietary models.

  • General-purpose agent work: Start with qwen3.5:27b. It has the best balance of capability, context window, and hardware requirements across the family.
  • Reasoning-heavy tasks: Use deepseek-r1:32b. Nothing else in the open-source local tier matches its math and logic performance.
  • Coding agents: Use codestral for focused code generation, or qwen3-coder:30b if you need broader agentic capabilities alongside code.
  • Budget hardware (8-16GB): Start with qwen3.5:9b or phi-4. Expect reduced capability compared to 27B+ models, but both are functional for lighter workflows.
  • Maximum local quality: If you have 48GB+ VRAM, deepseek-r1:70b or the full Llama 4 Scout gives you the closest experience to cloud API quality.
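For quick reference, the decision rules above can be condensed into a toy lookup. The thresholds and Ollama tags below simply mirror this guide's recommendations — nothing more authoritative than that, so tune them to your own hardware:

```python
# Toy encoding of this guide's model recommendations by task and VRAM tier.
def pick_model(task: str, vram_gb: int) -> str:
    if vram_gb >= 48:
        return "deepseek-r1:70b"       # maximum local quality
    if vram_gb >= 24:
        if task == "reasoning":
            return "deepseek-r1:32b"
        if task == "coding":
            return "qwen3-coder:30b"
        return "qwen3.5:27b"           # general-purpose default
    if vram_gb >= 16:
        return "codestral" if task == "coding" else "qwen3.5:14b"
    return "phi-4"                     # 8GB-class fallback

print(pick_model("general", 24))  # qwen3.5:27b
```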

If local hardware becomes the bottleneck, consider the API route instead. The Ollama vs OpenRouter comparison covers when cloud makes more sense than forcing a weak local setup.


Limitations and Tradeoffs

Open-source local models have real limitations that OpenClaw operators should understand before committing.

  • Quality gap: Even the best open-source models trail frontier proprietary models on complex agentic tasks. Claude Opus 4.6 scores ~80% on SWE-bench Verified; the best open-source model (GLM-5) scores ~78%. For simpler tasks, the gap is much smaller.
  • Context vs VRAM tradeoff: Running 64K+ context locally requires serious hardware. An 8B model at 128K context can consume 20GB+ of VRAM just for the KV cache, leaving little room for the model weights themselves.
  • No guaranteed uptime: Local models depend on your hardware staying on and healthy. Cloud APIs offer reliability guarantees that local setups cannot match.
  • Update lag: Open-source models update less frequently than hosted APIs. When DeepSeek or Qwen release a new version, Ollama support may lag by days or weeks.
  • Quantization quality loss: Q4_K_M quantization typically loses less than 3% quality compared to full precision, but on edge cases and complex reasoning chains, the degradation can be more noticeable.

When not to go local: if you need guaranteed 99.9% uptime, if your workflows regularly exceed 64K context, or if your hardware cannot sustain the minimum VRAM requirements for your chosen model at 64K context.



FAQ

What is the best open-source model for OpenClaw in 2026?

Qwen3.5:27b is the best general-purpose open-source model for OpenClaw as of April 2026. It offers 256K context, strong tool-use performance (72.2 on BFCL-V4), and fits on a 24GB GPU with Q4_K_M quantization. For reasoning tasks specifically, DeepSeek-R1-Distill-32B is stronger.

How much VRAM do I need to run local models with OpenClaw?

For serious OpenClaw use, you need at least 24GB of VRAM (RTX 4090 or M3 Max). This lets you run 27-32B models at Q4 quantization with 32-64K context. An 8GB GPU can run smaller models like Phi-4 or Qwen3.5:9b, but context length will be limited to 8-16K.

Can I run open-source models for OpenClaw completely free?

Yes. All models listed in this guide have open weights that you can download and run through Ollama at zero API cost. The only cost is your hardware and electricity. Models like GLM-4.7-Flash and GLM-4.5-Flash are also available as free cloud APIs from Zhipu AI.

Which open-source model is best for coding with OpenClaw?

Codestral from Mistral (22B parameters) scores 86.6% on HumanEval with 256K context and is the most efficient dedicated coding model for local deployment. For broader agentic coding that includes debugging and repo-level work, Qwen3-Coder:30b offers stronger integration.

Should I use a local model or a cloud API with OpenClaw?

Use local models if you have 24GB+ VRAM, need data privacy, or want to avoid recurring API costs. Use cloud APIs if your hardware is limited, your workflows need 64K+ context reliably, or you need guaranteed uptime. Many operators use both — local for development, cloud for production.
