Originally published on Remote OpenClaw.
Marketplace
Free skills and AI personas for OpenClaw — browse the marketplace.
What Is GPT-OSS 20B?
GPT-OSS 20B is OpenAI's first open-weight model, released in August 2025 under the Apache 2.0 license. After years of keeping all model weights proprietary, OpenAI entered the open-source arena with a model that was deliberately designed to compete with Llama, Qwen, and other community favorites.
The "OSS" in the name stands for Open Source Software, and OpenAI chose Apache 2.0, the most commercially permissive mainstream license, to signal serious intent. You can download the weights, run them locally, fine-tune them for your domain, and build commercial products royalty-free; the license's only obligations are its attribution and notice requirements.
What makes GPT-OSS 20B remarkable is not just that it is free, but that it is genuinely good. It matches o3-mini — OpenAI's paid reasoning model — on most coding and reasoning benchmarks. For OpenClaw operators, this means you can run an agent powered by OpenAI-quality inference at zero cost, either locally on your laptop or free on OpenRouter.
The Mixture of Experts architecture is the key to its efficiency. With 21 billion total parameters but only 3.6 billion active per forward pass, GPT-OSS has the knowledge of a 20B model but the compute requirements of a 4B model. This makes it one of the most hardware-efficient models available, running comfortably on 16GB consumer devices.
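The routing trick is easy to see in miniature. Below is a toy sketch of top-k expert routing; the 32-expert, top-4 configuration and all weights here are illustrative assumptions, not GPT-OSS's published internals:

```python
import numpy as np

# Toy Mixture-of-Experts routing: a gate scores every expert,
# but only the top-k expert networks are evaluated per token.
rng = np.random.default_rng(0)
d, n_experts, k = 8, 32, 4                      # assumed sizes, for illustration

gate_w = rng.normal(size=(n_experts, d))        # router weights
expert_ws = rng.normal(size=(n_experts, d, d))  # one weight matrix per expert

def moe_forward(x):
    scores = gate_w @ x                          # one routing score per expert
    active = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = np.exp(scores[active])
    weights /= weights.sum()                     # softmax over the chosen experts
    # Only k of the n_experts weight matrices are ever multiplied.
    y = sum(w * (expert_ws[i] @ x) for w, i in zip(weights, active))
    return y, active

y, active = moe_forward(rng.normal(size=d))
print(f"{len(active)} of {n_experts} experts computed")  # 4 of 32
```

The skipped experts still occupy disk and RAM, which is why the model needs 16GB of memory while computing like a much smaller one.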
Why OpenAI Went Open Source
OpenAI's decision to release an open-weight model was driven by competitive pressure. By mid-2025, the open-source ecosystem — led by Meta's Llama, Alibaba's Qwen, and DeepSeek — had captured a significant share of the developer market. Many startups and individual developers were building on open models, never touching the OpenAI API.
GPT-OSS 20B is OpenAI's answer: a model good enough to compete with community favorites, carrying the OpenAI brand, and serving as an on-ramp to their paid ecosystem. Developers who start with GPT-OSS often upgrade to GPT-5.3 Codex or GPT-5.4 for production — exactly as intended.
For OpenClaw operators, the motivation does not matter — the result does. GPT-OSS 20B is a high-quality, free, commercially-licensed model from the world's most recognized AI lab. That is a useful tool regardless of why it exists.
Architecture and Specifications
| Specification | Value |
| --- | --- |
| Total Parameters | 21 billion |
| Active Parameters | 3.6 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Developer | OpenAI |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Context Window | 128K tokens |
| Modalities | Text only |
| RAM Required (local) | 16GB (q4 quantization) |
| Disk Space | ~12GB (q4 quantization) |
| OpenRouter Price | Free |
The 3.6B active-parameter count is the number that matters for hardware planning. While the model stores 21B total parameters on disk (~12GB in q4), only 3.6B are computed per token. This makes inference extremely fast on consumer hardware: comparable to running a 4B dense model, but with the accuracy of a much larger one.
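A quick back-of-envelope check reproduces these figures (the 4.5 bits-per-weight value is an assumed approximation of q4 storage including quantization overhead):

```python
# Rough sizing arithmetic for an MoE model, using the spec-table figures.
total_params = 21e9    # weights stored on disk
active_params = 3.6e9  # weights actually computed per token

bits_per_weight = 4.5  # assumed: q4 quantization plus metadata overhead
disk_gb = total_params * bits_per_weight / 8 / 1e9   # bits -> bytes -> GB
active_fraction = active_params / total_params

print(f"~{disk_gb:.0f} GB on disk")                       # ~12 GB
print(f"{active_fraction:.0%} of weights active per token")  # 17%
```

Only about a sixth of the weights do work on any given token, which is where the "knowledge of 20B, compute of 4B" framing comes from.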
Benchmarks: Matching o3-mini
| Benchmark | GPT-OSS 20B | o3-mini (paid) | Notes |
| --- | --- | --- | --- |
| HumanEval | 87.2% | 88.5% | Near-identical code generation |
| MMLU | 82.1% | 83.4% | Close on broad knowledge |
| AIME 2024 | 78.3% | 80.1% | Solid mathematical reasoning |
| GSM8K | 91.5% | 92.0% | Nearly identical math problem-solving |
| SWE-bench Lite | 45.2% | 47.8% | Respectable for a free 20B model |
The benchmark story is clear: GPT-OSS 20B consistently comes within 1-3 percentage points of o3-mini across all major benchmarks. For a free model that runs on a laptop, this is exceptional. The SWE-bench Lite score of 45.2% is lower than frontier models (Claude Opus hits 80.8% on the full SWE-bench), but for a 3.6B-active model, it handles routine coding tasks competently.
The practical implication: if o3-mini was "good enough" for your coding tasks before, GPT-OSS 20B will be good enough too — and it costs nothing.
Setup Method 1: Ollama (Local, Free)
Running GPT-OSS 20B locally gives you completely free, private, offline inference with no API dependency.
Step 1: Install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version
```
Step 2: Pull GPT-OSS 20B
```bash
# Pull the model (~12GB download)
ollama pull gpt-oss:20b

# Verify it downloaded
ollama list
```
Step 3: Test the Model
```bash
# Interactive chat
ollama run gpt-oss:20b

# Test with a coding prompt
ollama run gpt-oss:20b "Write a Python function to parse CSV files with error handling"
```
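You can also exercise the model programmatically through Ollama's local REST API, which is the same endpoint OpenClaw's Ollama provider talks to. A minimal sketch, assuming `ollama serve` is listening on the default port:

```python
import json
import urllib.request

# Send one non-streaming generation request to the local Ollama server.
payload = {
    "model": "gpt-oss:20b",
    "prompt": "Write a Python function to parse CSV files with error handling",
    "stream": False,  # return one complete JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.loads(resp.read())["response"])
except OSError as err:
    print(f"Ollama not reachable: {err}")
```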
Step 4: Configure OpenClaw
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: gpt-oss:20b
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192
```
Step 5: Start OpenClaw
```bash
# Make sure Ollama is running
ollama serve &

# Start OpenClaw
openclaw start
```
The entire setup takes under 10 minutes, most of which is the 12GB model download. Once running, you have a free, private, offline coding agent powered by OpenAI technology.
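If you script the startup, it helps to wait for the Ollama server to come up before launching the agent. A small convenience sketch (OpenClaw does not require this; the URL and timeouts are assumptions):

```python
import time
import urllib.request

def wait_for_ollama(url="http://localhost:11434", timeout_s=30.0):
    """Poll the Ollama root endpoint until it answers or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200  # root replies "Ollama is running"
        except OSError:
            time.sleep(0.5)  # server not up yet; retry shortly
    return False

print("ready" if wait_for_ollama(timeout_s=3.0) else "not reachable")
```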
Setup Method 2: OpenRouter (Cloud, Free)
If you do not want to run models locally, OpenRouter hosts GPT-OSS 20B for free — no credits required.
Step 1: Create an OpenRouter Account
Sign up at openrouter.ai with your email. No credit card needed.
Step 2: Generate an API Key
Create an API key from the OpenRouter dashboard.
Step 3: Configure OpenClaw
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-oss-20b:free
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192
```
Step 4: Start OpenClaw
```bash
openclaw start
```
The OpenRouter free tier gives you 20 requests per minute. For development, testing, and light production, this is plenty. For higher volume, add $5 in credits to remove rate limits (GPT-OSS remains free — credits just lift the rate cap).
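Under the hood this is an OpenAI-compatible chat completions endpoint, so you can verify your key and the free model ID with a short script before pointing OpenClaw at it. A minimal sketch, assuming your key is in the `OPENROUTER_API_KEY` environment variable:

```python
import json
import os
import urllib.request

# One chat completion against OpenRouter's free GPT-OSS 20B endpoint.
payload = {
    "model": "openai/gpt-oss-20b:free",
    "messages": [{"role": "user", "content": "Explain MoE routing in two sentences."}],
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
    },
)

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
except OSError as err:
    print(f"Request failed (check your API key / rate limit): {err}")
```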
Local Performance Expectations
| Hardware | Tokens/Second | Time for 500-word Response |
| --- | --- | --- |
| MacBook Air M2 (16GB) | 20-35 tok/s | ~18 seconds |
| MacBook Pro M3 (32GB) | 35-55 tok/s | ~11 seconds |
| Desktop + RTX 3060 (12GB) | 50-75 tok/s | ~8 seconds |
| Desktop + RTX 4090 (24GB) | 80-120 tok/s | ~5 seconds |
GPT-OSS 20B is notably faster than other models of similar total parameter count because only 3.6B parameters activate per token. On a MacBook Air M2, it runs 30-50% faster than Qwen3 8B (which is a dense 8B model with all parameters active). The MoE architecture gives you more knowledge at lower compute cost.
For comparison, the same model on OpenRouter delivers 60-100 tokens per second, so the cloud route is faster but adds network latency (~100-200ms per request). For interactive use cases, local may actually feel faster due to zero network overhead.
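You can measure your own machine's decode speed directly: Ollama's non-streaming /api/generate response reports eval_count (tokens generated) and eval_duration (nanoseconds), which give tokens per second. A minimal sketch, assuming a local Ollama server:

```python
import json
import urllib.request

def tokens_per_second(stats: dict) -> float:
    """Decode speed from Ollama's generation stats (eval_duration is in ns)."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

payload = {"model": "gpt-oss:20b", "prompt": "Count from 1 to 50.", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=300) as resp:
        stats = json.loads(resp.read())
        print(f"{tokens_per_second(stats):.1f} tok/s")
except OSError as err:
    print(f"Ollama not reachable: {err}")
```

Run it a few times and discard the first result, since the initial request also pays the model-load cost.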
GPT-OSS vs Qwen3 8B vs Llama 3.3 8B
| Metric | GPT-OSS 20B | Qwen3 8B | Llama 3.3 8B |
| --- | --- | --- | --- |
| Total Params | 21B (MoE) | 8B (dense) | 8B (dense) |
| Active Params | 3.6B | 8B | 8B |
| RAM Required | 16GB | 16GB | 16GB |
| HumanEval | 87.2% | 82.5% | 84.1% |
| MMLU | 82.1% | 78.3% | 79.8% |
| Languages | ~15 | 119 | ~8 |
| Context Window | 128K | 32K | 128K |
| Inference Speed | Fastest (3.6B active) | Moderate (8B active) | Moderate (8B active) |
| License | Apache 2.0 | Apache 2.0 | Llama License |
| Free on OpenRouter | Yes | Yes (32B version) | Yes (70B version) |
GPT-OSS 20B wins on benchmarks and inference speed despite having fewer active parameters, and its 128K local context window ties Llama 3.3 8B at four times Qwen3 8B's 32K. Qwen3 8B wins on multilingual support (119 vs ~15 languages). Llama 3.3 8B has the most extensive community ecosystem, with more fine-tuned variants available.
For OpenClaw coding agents running in English, GPT-OSS 20B is the strongest free local option. For multilingual agents, Qwen3 8B is better. For agents that need the broadest fine-tuned variant ecosystem, Llama remains the safe choice.
When GPT-OSS Is the Right Choice
- Zero-budget coding agent: GPT-OSS 20B matches o3-mini on coding benchmarks. If you want an AI coding assistant for free, this is the strongest option — locally or on OpenRouter.
- Development and testing: Use GPT-OSS during development to avoid API costs. Its o3-mini-level performance means your agent behaves similarly to how it will with paid models, giving you a realistic testing environment.
- Privacy-sensitive workflows: Run locally via Ollama with no data leaving your machine. Apache 2.0 license means no usage reporting or telemetry.
- Low-latency local agent: With only 3.6B active parameters, GPT-OSS is one of the fastest models you can run locally. For agents that need quick responses — interactive chat, real-time coding assistance — the speed advantage over dense 8B models is noticeable.
- Gateway to the GPT ecosystem: If you are already using OpenAI's paid models, GPT-OSS gives you a free tier for non-critical tasks, keeping your spending focused on the requests that need GPT-5.4 or Codex quality.
Frequently Asked Questions
Is GPT-OSS 20B really from OpenAI?
Yes. GPT-OSS 20B is OpenAI's first open-weight model release, published in August 2025 under the Apache 2.0 license. It represents a strategic shift for OpenAI, which had previously kept all model weights proprietary. The model is available on HuggingFace, Ollama, and free on OpenRouter. OpenAI has confirmed it in official communications and the weights are distributed through their verified accounts.
Can I run GPT-OSS 20B on my laptop?
Yes, if you have 16GB of RAM. GPT-OSS 20B uses a Mixture of Experts architecture with 21 billion total parameters but only 3.6 billion active per forward pass. This means the actual compute footprint is similar to a 4B model, making it lightweight enough for consumer hardware. On a 16GB MacBook, expect 20-35 tokens per second. On 8GB machines, it will run but may be slow due to memory pressure.
How does GPT-OSS 20B compare to o3-mini?
GPT-OSS 20B matches o3-mini on most coding and reasoning benchmarks, which is remarkable for a free open-weight model. The key differences: o3-mini has a larger context window, slightly better performance on complex multi-step reasoning tasks, and is only available through the paid OpenAI API. GPT-OSS 20B is free everywhere — Ollama, OpenRouter, self-hosted. For most OpenClaw agent tasks, the performance difference is negligible.
Why would OpenAI release a free model?
OpenAI released GPT-OSS 20B as a strategic move to compete with the open-source ecosystem (Llama, Qwen, DeepSeek) that was eroding their developer mindshare. By releasing a competitive free model, OpenAI keeps developers in their ecosystem — many who start with GPT-OSS eventually upgrade to paid GPT-5 variants for production. It also generates goodwill and demonstrates that OpenAI can compete on open weights, not just proprietary APIs.
Further Reading
- Best Ollama Models for OpenClaw — complete ranking including GPT-OSS, Qwen3, and Llama
- Free Models on OpenClaw via OpenRouter — all 29 free models ranked for agent use
- GPT-5.3 and GPT-5.4 on OpenClaw — when you are ready to upgrade from GPT-OSS
- OpenClaw Marketplace — free skills and AI personas to power your agent