Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

I Tried Running Openclaw (ex Clawdbot) with a Free LLM. Here’s What Happened.

Sounds great. I have a headless server. I have Ollama. I have dreams.

Here’s how reality crushed them — and how I eventually won.

Update (Feb 20, 2026): The free LLM landscape has moved fast since January. Three things changed:

1. Anthropic banned Claude Max tokens in OpenClaw. If you were running on your Max subscription, that’s over. I rebuilt mine for $15/month using Kimi K2.5 + MiniMax M2.5 fallback.

2. New models entered the ring. Qwen 3.5 (Alibaba) — native agentic capabilities, $0.40/M input tokens. DeepSeek V3.2 “Speciale” — 88.7% on LiveCodeBench, MIT license, $0.28/M input. Both OpenRouter-compatible, plug straight into OpenClaw.

3. NVIDIA published an official guide for running OpenClaw locally on RTX GPUs with Ollama. If you have an RTX card, LM Studio + 7B model = truly $0.

My current pick: Kimi K2.5 via OpenRouter for near-Claude quality at pennies. Ollama + Qwen 3.5 locally for $0.
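For the paid-but-cheap route, the same provider-block pattern as the Ollama config later in this post works against OpenRouter's OpenAI-compatible endpoint. A hedged sketch — the model slug `moonshotai/kimi-k2.5`, the `contextWindow` value, and the placeholder API key are illustrative assumptions, so check OpenRouter's model catalog for the exact IDs (the MiniMax fallback entry would follow the same shape):

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "openrouter": {
        "baseUrl": "https://openrouter.ai/api/v1",
        "apiKey": "sk-or-...",
        "api": "openai-completions",
        "models": [
          {
            "id": "moonshotai/kimi-k2.5",
            "name": "Kimi K2.5",
            "reasoning": false,
            "input": ["text"],
            "contextWindow": 131072,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
```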

TL;DR: Running OpenClaw with free LLMs is possible but painful — expect hours debugging configs like "api": "openai-completions" vs "openai" and finding models that don't take 7 minutes to say "OK." You'll get working configs for Ollama + Qwen 2.5, minimum specs that actually matter (16GB RAM, 32k context window), and the brutal truth about whether your potato server can handle local AI without melting.

*[Image: OpenClaw LLM configuration setup with Ollama server and model parameters]*

*When your free AI setup requires more config than NASA's mission control.*

Act 1: The Config Wilderness

The official docs said:

```json
{
  "api": "openai"
}
```

My server said:

```
Invalid input
```

Turns out the actual magic words are:

```json
{
  "api": "openai-completions"
}
```

One hyphen. Three hours of my life.


Act 2: The Model That Could(n’t)

First attempt: qwen2.5:7b — a respectable 7 billion parameters.

Time to respond to “Say OK”: 7 minutes.

My mass-produced Chinese rice cooker has better inference speed.


Act 3: The Context Window Betrayal

“Fine,” I said. “I’ll use TinyLlama. It’s tiny. It’s a llama. What could go wrong?”

```
FailoverError: Model context window too small (2048 tokens). Minimum is 16000.
```

Clawdbot requires a PhD-level attention span. TinyLlama has the memory of a goldfish.
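You can dodge this particular betrayal up front by checking a model's advertised context length before wiring it in. A rough sketch — recent Ollama versions print a "context length" line in the model metadata, but the exact field names vary by version:

```shell
# Show model metadata and look for the context length line
ollama show tinyllama | grep -i context
```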


Act 4: The Goldilocks Model

Finally: qwen2.5:1.5b

  • Size: 986 MB (not too big)
  • Context: 32k tokens (not too small)
  • Speed: Actually responds before my coffee gets cold
  • Quality: Hallucinates a bit, but who doesn’t?

The Working Config

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen2.5:1.5b",
            "name": "Qwen 2.5 1.5B",
            "reasoning": false,
            "input": ["text"],
            "contextWindow": 32768,
            "maxTokens": 8192,
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "ollama/qwen2.5:1.5b" }
    }
  }
}
```

Save to: `~/.clawdbot/clawdbot.json` AND `~/.clawdbot/agents/main/agent/models.json`

Yes, both. Don’t ask.
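Given that you're now maintaining the same JSON in two places, it's worth a quick syntax check before launching — a stray comma in either copy is easy to miss. A minimal sketch using Python's built-in JSON parser (the paths are the two mentioned above):

```shell
# Validate both config files parse as JSON; prints OK or BROKEN per file
for f in ~/.clawdbot/clawdbot.json ~/.clawdbot/agents/main/agent/models.json; do
  if python3 -m json.tool "$f" > /dev/null 2>&1; then
    echo "OK: $f"
  else
    echo "BROKEN: $f"
  fi
done
```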


The Commands That Actually Work

```shell
# Install the model
ollama pull qwen2.5:1.5b

# Test directly (bypass gateway complexity)
clawdbot agent --agent main --local --message "Hello"

# Or with gateway
clawdbot gateway &
clawdbot agent --agent main --message "Hello"

# Interactive TUI
clawdbot tui
```
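When the agent misbehaves, it helps to rule out the Ollama side entirely. A sketch that talks straight to the OpenAI-compatible endpoint the config points at — this assumes Ollama is on its default port with the model already pulled:

```shell
# Talk straight to Ollama's OpenAI-compatible endpoint, bypassing clawdbot
curl -s http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5:1.5b",
        "messages": [{"role": "user", "content": "Say OK"}]
      }' || echo "Ollama not reachable -- is the server running?"
```

If this returns a normal chat completion but `clawdbot` still fails, the problem is in the config files, not the model.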


The Honest Truth

What they promised vs. what you get:

  • “Free AI” → Free if your time is worthless
  • “Local privacy” → Actually true ✓
  • “Fast responses” → Depends on your definition of “fast”
  • “Easy setup” → `api: "openai-completions"` (not `"openai"`)

Should You Do This?

Yes, if:

  • You have a GPU (even a modest one)
  • You enjoy debugging configs at 2 AM
  • You value privacy over speed
  • You find corporate AI pricing offensive

No, if:

  • You have a CPU-only potato server
  • You expect ChatGPT-level responses
  • You value your sanity

The Real Minimum Specs

  • RAM: 8 GB minimum, 16 GB recommended
  • Model: qwen2.5:1.5b minimum, qwen2.5:7b + GPU recommended
  • Context window: 16k+ required
  • Patience: Infinite

Need a VPS That Can Actually Handle This?

If you’re tired of running AI on a potato, a proper VPS makes all the difference. I recommend starting with at least 8GB RAM and some decent CPU cores.

👉 Get a VPS with extra bonus here


Written by someone who mass-retry’d configs until something worked. You’re welcome.
