
TechPulse Lab

How to Run a Fully Local AI Agent With OpenClaw and Ollama (Zero API Costs)

Most developers are still burning $50-200/month on API calls for tasks a local model handles fine. This guide shows you how to build a fully local AI agent with OpenClaw and Ollama.

Originally published at TechPulse Lab.


The Local Model Tipping Point

Flash-MoE ran on a MacBook Pro with 192GB unified memory at 4.4 tok/s. Not a datacenter. A laptop.

Ollama now offers:

  • Qwen 3.5 27B - Excellent tool calling. 20GB RAM.
  • Llama 3.3 70B - Stronger reasoning. 48GB+.
  • GLM-4.7-Flash - Fast agentic tasks.

Local Models + Agent Frameworks

Local LLMs alone are chatbots. OpenClaw turns them into agents that do things: shell commands, file ops, web searches, API calls. It treats Ollama as a first-class backend.
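What "do things" means in practice: the model emits a structured tool call, and the framework routes it to real code. A minimal sketch of that dispatch pattern, illustrative of how OpenClaw-style frameworks work rather than OpenClaw's actual internals (the tool registry and call format here are assumptions):

```python
# Minimal sketch of the agent tool-dispatch pattern frameworks like
# OpenClaw implement on top of a chat model. The tool names and the
# hand-written tool call below are illustrative, not OpenClaw internals.
import subprocess

def run_shell(cmd: str) -> str:
    """Tool: run a shell command and return its stdout."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

# Registry mapping tool names the model can emit to real functions.
TOOLS = {"shell": run_shell}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching function."""
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

# A tool-calling model emits structured calls shaped like this:
call = {"name": "shell", "arguments": {"cmd": "echo hello"}}
print(dispatch(call).strip())
```

The agent loop is just this in a cycle: send the conversation to the model, execute any tool call it returns, append the result, repeat.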

Quick Setup

1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh

2. Pull model: ollama pull qwen3.5:27b

3. Install OpenClaw: npm install -g openclaw && openclaw onboard

4. Set model to ollama/qwen3.5:27b and baseURL to http://localhost:11434/v1

5. Test: openclaw chat and ask it to list your files

6. Connect Discord: openclaw connect discord
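Once steps 1-4 are done, the same local endpoint OpenClaw talks to is open to your own scripts. A standard-library-only sketch against Ollama's OpenAI-compatible chat completions API, assuming the baseURL from step 4 and the model pulled in step 2:

```python
# Minimal sketch: call the local Ollama endpoint (step 4's baseURL)
# directly, via its OpenAI-compatible /chat/completions route.
# Assumes Ollama is running locally; standard library only.
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"

def build_chat_request(model: str, prompt: str):
    """Build the URL and JSON payload for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/chat/completions", payload

def chat(model: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    url, payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server from step 1 and the model from step 2):
# print(chat("qwen3.5:27b", "List three uses for a local agent."))
```

Because the endpoint speaks the OpenAI wire format, any OpenAI-compatible client library works here too by pointing its base URL at `http://localhost:11434/v1`.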

Models That Work

  • Qwen 3.5 27B (20GB, 8-15 tok/s) - Default choice
  • Llama 3.3 70B (48GB, 3-6 tok/s) - Better reasoning
  • GLM-4.7-Flash (8GB, 15-25 tok/s) - Speed priority
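A simple way to choose from this table: take the largest model whose footprint fits your free RAM. An illustrative helper using the figures quoted above (the Ollama tag names are assumptions, and the footprints are this post's numbers, not official requirements):

```python
# Illustrative helper: pick the largest model from the table above that
# fits in available RAM. Figures are the ones quoted in this post.
MODELS = [
    # (assumed Ollama tag, approx. RAM needed in GB), largest first
    ("llama3.3:70b", 48),
    ("qwen3.5:27b", 20),
    ("glm-4.7-flash", 8),
]

def pick_model(available_gb: float):
    """Return the most capable model that fits, or None if none do."""
    for name, ram_gb in MODELS:
        if ram_gb <= available_gb:
            return name
    return None

print(pick_model(24))  # a 24GB machine lands on qwen3.5:27b
```

If you have headroom for more than one, keep the 70B for reasoning-heavy sessions and the small fast model for quick agentic tasks; switching is one `ollama pull` plus a model-name change.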

The cloud AI tax is optional. Full guide: TechPulse Lab

What is your local AI setup? Comment below.
