Originally published on Remote OpenClaw.
Marketplace
Free skills and AI personas for OpenClaw — browse the marketplace.
What Is GPT-OSS 20B?
GPT-OSS 20B is OpenAI's first open-weight model, released in August 2025 under the Apache 2.0 license. After years of keeping all model weights proprietary, OpenAI entered the open-source arena with a model that was deliberately designed to compete with Llama, Qwen, and other community favorites.
The "OSS" in the name stands for Open Source Software, and OpenAI chose Apache 2.0, the most commercially permissive mainstream license, to signal serious intent. You can download the weights, run them locally, fine-tune them for your domain, and build commercial products royalty-free; the license's only obligations are its attribution and notice requirements.
What makes GPT-OSS 20B remarkable is not just that it is free, but that it is genuinely good. It matches o3-mini — OpenAI's paid reasoning model — on most coding and reasoning benchmarks. For OpenClaw operators, this means you can run an agent powered by OpenAI-quality inference at zero cost, either locally on your laptop or free on OpenRouter.
The Mixture of Experts architecture is the key to its efficiency. With 21 billion total parameters but only 3.6 billion active per forward pass, GPT-OSS has the knowledge of a 20B model but the compute requirements of a 4B model. This makes it one of the most hardware-efficient models available, running comfortably on 16GB consumer devices.
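The routing trick is easy to see in miniature. Below is a toy sketch of top-k expert routing; the 32-expert, top-4 configuration and all weights here are illustrative assumptions, not GPT-OSS's published internals:

```python
import numpy as np

# Toy Mixture-of-Experts routing: a gate scores every expert,
# but only the top-k expert networks are evaluated per token.
rng = np.random.default_rng(0)
d, n_experts, k = 8, 32, 4                      # assumed sizes, for illustration

gate_w = rng.normal(size=(n_experts, d))        # router weights
expert_ws = rng.normal(size=(n_experts, d, d))  # one weight matrix per expert

def moe_forward(x):
    scores = gate_w @ x                          # one routing score per expert
    active = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = np.exp(scores[active])
    weights /= weights.sum()                     # softmax over the chosen experts
    # Only k of the n_experts weight matrices are ever multiplied.
    y = sum(w * (expert_ws[i] @ x) for w, i in zip(weights, active))
    return y, active

y, active = moe_forward(rng.normal(size=d))
print(f"{len(active)} of {n_experts} experts computed")  # 4 of 32
```

The skipped experts still occupy disk and RAM, which is why the model needs 16GB of memory while computing like a much smaller one.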
Why OpenAI Went Open Source
OpenAI's decision to release an open-weight model was driven by competitive pressure. By mid-2025, the open-source ecosystem — led by Meta's Llama, Alibaba's Qwen, and DeepSeek — had captured a significant share of the developer market. Many startups and individual developers were building on open models, never touching the OpenAI API.
GPT-OSS 20B is OpenAI's answer: a model good enough to compete with community favorites, carrying the OpenAI brand, and serving as an on-ramp to their paid ecosystem. Developers who start with GPT-OSS often upgrade to GPT-5.3 Codex or GPT-5.4 for production — exactly as intended.
For OpenClaw operators, the motivation does not matter — the result does. GPT-OSS 20B is a high-quality, free, commercially-licensed model from the world's most recognized AI lab. That is a useful tool regardless of why it exists.
Architecture and Specifications
| Specification | Value |
| --- | --- |
| Total Parameters | 21 billion |
| Active Parameters | 3.6 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Developer | OpenAI |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Context Window | 128K tokens |
| Modalities | Text only |
| RAM Required (local) | 16GB (q4 quantization) |
| Disk Space | ~12GB (q4 quantization) |
| OpenRouter Price | Free |
The 3.6B active-parameter count is the number that matters for hardware planning. While the model stores 21B total parameters on disk (~12GB in q4), only 3.6B are computed per token. This makes inference extremely fast on consumer hardware: comparable to running a 4B dense model, but with the accuracy of a much larger one.
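A quick back-of-envelope check reproduces these figures (the 4.5 bits-per-weight value is an assumed approximation of q4 storage including quantization overhead):

```python
# Rough sizing arithmetic for an MoE model, using the spec-table figures.
total_params = 21e9    # weights stored on disk
active_params = 3.6e9  # weights actually computed per token

bits_per_weight = 4.5  # assumed: q4 quantization plus metadata overhead
disk_gb = total_params * bits_per_weight / 8 / 1e9   # bits -> bytes -> GB
active_fraction = active_params / total_params

print(f"~{disk_gb:.0f} GB on disk")                       # ~12 GB
print(f"{active_fraction:.0%} of weights active per token")  # 17%
```

Only about a sixth of the weights do work on any given token, which is where the "knowledge of 20B, compute of 4B" framing comes from.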
Benchmarks: Matching o3-mini
| Benchmark | GPT-OSS 20B | o3-mini (paid) | Notes |
| --- | --- | --- | --- |
| HumanEval | 87.2% | 88.5% | Near-identical code generation |
| MMLU | 82.1% | 83.4% | Close on broad knowledge |
| AIME 2024 | 78.3% | 80.1% | Solid mathematical reasoning |
| GSM8K | 91.5% | 92.0% | Nearly identical math problem-solving |
| SWE-bench Lite | 45.2% | 47.8% | Respectable for a free 20B model |
The benchmark story is clear: GPT-OSS 20B consistently comes within 1-3 percentage points of o3-mini across all major benchmarks. For a free model that runs on a laptop, this is exceptional. The SWE-bench Lite score of 45.2% is lower than frontier models (Claude Opus hits 80.8% on the full SWE-bench), but for a 3.6B-active model, it handles routine coding tasks competently.
The practical implication: if o3-mini was "good enough" for your coding tasks before, GPT-OSS 20B will be good enough too — and it costs nothing.
Setup Method 1: Ollama (Local, Free)
Running GPT-OSS 20B locally gives you completely free, private, offline inference with no API dependency.
Step 1: Install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version
```
Step 2: Pull GPT-OSS 20B
```bash
# Pull the model (~12GB download)
ollama pull gpt-oss:20b

# Verify it downloaded
ollama list
```
Step 3: Test the Model
```bash
# Interactive chat
ollama run gpt-oss:20b

# Test with a coding prompt
ollama run gpt-oss:20b "Write a Python function to parse CSV files with error handling"
```
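You can also exercise the model programmatically through Ollama's local REST API, which is the same endpoint OpenClaw's Ollama provider talks to. A minimal sketch, assuming `ollama serve` is listening on the default port:

```python
import json
import urllib.request

# Send one non-streaming generation request to the local Ollama server.
payload = {
    "model": "gpt-oss:20b",
    "prompt": "Write a Python function to parse CSV files with error handling",
    "stream": False,  # return one complete JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.loads(resp.read())["response"])
except OSError as err:
    print(f"Ollama not reachable: {err}")
```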
Step 4: Configure OpenClaw
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: gpt-oss:20b
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192
```
Step 5: Start OpenClaw
```bash
# Make sure Ollama is running
ollama serve &

# Start OpenClaw
openclaw start
```
The entire setup takes under 10 minutes, most of which is the 12GB model download. Once running, you have a free, private, offline coding agent powered by OpenAI technology.
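If you script the startup, it helps to wait for the Ollama server to come up before launching the agent. A small convenience sketch (OpenClaw does not require this; the URL and timeouts are assumptions):

```python
import time
import urllib.request

def wait_for_ollama(url="http://localhost:11434", timeout_s=30.0):
    """Poll the Ollama root endpoint until it answers or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200  # root replies "Ollama is running"
        except OSError:
            time.sleep(0.5)  # server not up yet; retry shortly
    return False

print("ready" if wait_for_ollama(timeout_s=3.0) else "not reachable")
```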
Setup Method 2: OpenRouter (Cloud, Free)
If you do not want to run models locally, OpenRouter hosts GPT-OSS 20B for free — no credits required.
Step 1: Create an OpenRouter Account
Sign up at openrouter.ai with your email. No credit card needed.
Step 2: Generate an API Key
Create an API key from the OpenRouter dashboard.
Step 3: Configure OpenClaw
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-oss-20b:free
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192
```
Step 4: Start OpenClaw
```bash
openclaw start
```
The OpenRouter free tier gives you 20 requests per minute. For development, testing, and light production, this is plenty. For higher volume, add $5 in credits to remove rate limits (GPT-OSS remains free — credits just lift the rate cap).
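Under the hood this is an OpenAI-compatible chat completions endpoint, so you can verify your key and the free model ID with a short script before pointing OpenClaw at it. A minimal sketch, assuming your key is in the `OPENROUTER_API_KEY` environment variable:

```python
import json
import os
import urllib.request

# One chat completion against OpenRouter's free GPT-OSS 20B endpoint.
payload = {
    "model": "openai/gpt-oss-20b:free",
    "messages": [{"role": "user", "content": "Explain MoE routing in two sentences."}],
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
    },
)

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
except OSError as err:
    print(f"Request failed (check your API key / rate limit): {err}")
```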
Local Performance Expectations
| Hardware | Tokens/Second | Time for 500-word Response |
| --- | --- | --- |
| MacBook Air M2 (16GB) | 20-35 tok/s | ~18 seconds |
| MacBook Pro M3 (32GB) | 35-55 tok/s | ~11 seconds |
| Desktop + RTX 3060 (12GB) | 50-75 tok/s | ~8 seconds |
| Desktop + RTX 4090 (24GB) | 80-120 tok/s | ~5 seconds |
GPT-OSS 20B is notably faster than other models of similar total parameter count because only 3.6B parameters activate per token. On a MacBook Air M2, it runs 30-50% faster than Qwen3 8B (which is a dense 8B model with all parameters active). The MoE architecture gives you more knowledge at lower compute cost.
For comparison, the same model on OpenRouter delivers 60-100 tokens per second, so the cloud route is faster but adds network latency (~100-200ms per request). For interactive use cases, local may actually feel faster due to zero network overhead.
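You can measure your own machine's decode speed directly: Ollama's non-streaming /api/generate response reports eval_count (tokens generated) and eval_duration (nanoseconds), which give tokens per second. A minimal sketch, assuming a local Ollama server:

```python
import json
import urllib.request

def tokens_per_second(stats: dict) -> float:
    """Decode speed from Ollama's generation stats (eval_duration is in ns)."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

payload = {"model": "gpt-oss:20b", "prompt": "Count from 1 to 50.", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=300) as resp:
        stats = json.loads(resp.read())
        print(f"{tokens_per_second(stats):.1f} tok/s")
except OSError as err:
    print(f"Ollama not reachable: {err}")
```

Run it a few times and discard the first result, since the initial request also pays the model-load cost.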
GPT-OSS vs Qwen3 8B vs Llama 3.3 8B
| Metric | GPT-OSS 20B | Qwen3 8B | Llama 3.3 8B |
| --- | --- | --- | --- |
| Total Params | 21B (MoE) | 8B (dense) | 8B (dense) |
| Active Params | 3.6B | 8B | 8B |
| RAM Required | 16GB | 16GB | 16GB |
| HumanEval | 87.2% | 82.5% | 84.1% |
| MMLU | 82.1% | 78.3% | 79.8% |
| Languages | ~15 | 119 | ~8 |
| Context Window | 128K | 32K | 128K |
| Inference Speed | Fastest (3.6B active) | Moderate (8B active) | Moderate (8B active) |
| License | Apache 2.0 | Apache 2.0 | Llama License |
| Free on OpenRouter | Yes | Yes (32B version) | Yes (70B version) |
GPT-OSS 20B wins on benchmarks and inference speed despite having fewer active parameters, and its 128K local context window ties Llama 3.3 8B at four times Qwen3 8B's 32K. Qwen3 8B wins on multilingual support (119 vs ~15 languages). Llama 3.3 8B has the most extensive community ecosystem, with more fine-tuned variants available.
For OpenClaw coding agents running in English, GPT-OSS 20B is the strongest free local option. For multilingual agents, Qwen3 8B is better. For agents that need the broadest fine-tuned variant ecosystem, Llama remains the safe choice.
When GPT-OSS Is the Right Choice
- Zero-budget coding agent: GPT-OSS 20B matches o3-mini on coding benchmarks. If you want an AI coding assistant for free, this is the strongest option — locally or on OpenRouter.
- Development and testing: Use GPT-OSS during development to avoid API costs. Its o3-mini-level performance means your agent behaves similarly to how it will with paid models, giving you a realistic testing environment.
- Privacy-sensitive workflows: Run locally via Ollama with no data leaving your machine. Apache 2.0 license means no usage reporting or telemetry.
- Low-latency local agent: With only 3.6B active parameters, GPT-OSS is one of the fastest models you can run locally. For agents that need quick responses — interactive chat, real-time coding assistance — the speed advantage over dense 8B models is noticeable.
- Gateway to the GPT ecosystem: If you are already using OpenAI's paid models, GPT-OSS gives you a free tier for non-critical tasks, keeping your spending focused on the requests that need GPT-5.4 or Codex quality.
Frequently Asked Questions
Is GPT-OSS 20B really from OpenAI?
Yes. GPT-OSS 20B is OpenAI's first open-weight model release, published in August 2025 under the Apache 2.0 license. It represents a strategic shift for OpenAI, which had previously kept all model weights proprietary. The model is available on HuggingFace, Ollama, and free on OpenRouter. OpenAI has confirmed it in official communications and the weights are distributed through their verified accounts.
Can I run GPT-OSS 20B on my laptop?
Yes, if you have 16GB of RAM. GPT-OSS 20B uses a Mixture of Experts architecture with 21 billion total parameters but only 3.6 billion active per forward pass. This means the actual compute footprint is similar to a 4B model, making it lightweight enough for consumer hardware. On a 16GB MacBook, expect 20-35 tokens per second. On 8GB machines, it will run but may be slow due to memory pressure.
How does GPT-OSS 20B compare to o3-mini?
GPT-OSS 20B matches o3-mini on most coding and reasoning benchmarks, which is remarkable for a free open-weight model. The key differences: o3-mini has a larger context window, slightly better performance on complex multi-step reasoning tasks, and is only available through the paid OpenAI API. GPT-OSS 20B is free everywhere — Ollama, OpenRouter, self-hosted. For most OpenClaw agent tasks, the performance difference is negligible.
Why would OpenAI release a free model?
OpenAI released GPT-OSS 20B as a strategic move to compete with the open-source ecosystem (Llama, Qwen, DeepSeek) that was eroding their developer mindshare. By releasing a competitive free model, OpenAI keeps developers in their ecosystem — many who start with GPT-OSS eventually upgrade to paid GPT-5 variants for production. It also generates goodwill and demonstrates that OpenAI can compete on open weights, not just proprietary APIs.
Further Reading
- Best Ollama Models for OpenClaw — complete ranking including GPT-OSS, Qwen3, and Llama
- Free Models on OpenClaw via OpenRouter — all 29 free models ranked for agent use
- GPT-5.3 and GPT-5.4 on OpenClaw — when you are ready to upgrade from GPT-OSS
- OpenClaw Marketplace — free skills and AI personas to power your agent