The dream of running your own AI assistant on dedicated hardware is now a reality. OpenClaw has emerged as the go-to solution for self-hosted AI agents, and the Mac Mini M4 has become the hardware of choice.
I spent the last few weeks testing different configurations, models, and setups. Here's what actually works.
Why Mac Mini for Local AI?
Apple Silicon changed the game. Unified memory means no bottleneck between CPU and GPU—everything shares one pool. A Mac Mini M4 Pro with 64GB can run 32B parameter models at 10-15 tokens per second while drawing only 20-40W.
Compare that to an RTX 4090 setup pulling 500W+ and screaming at you through GPU fans.
Hardware Sweet Spots
Budget ($800): Mac Mini M4 24GB — runs 7-8B models, good for experimentation
Recommended ($2,000): Mac Mini M4 Pro 64GB — runs 32B models, the practical choice for most developers
Enthusiast ($10,000): Mac Studio M3 Ultra 512GB — can technically run Kimi K2 at 1-2 tokens/second
The $2,000 M4 Pro is where most developers should stop. It handles Qwen3-Coder-30B and GLM-4.7-Flash without breaking a sweat.
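The back-of-envelope math explains the tiers. At the roughly 4-bit quantization most local setups use, a model needs about half a gigabyte of RAM per billion parameters: ~4-5GB of weights for an 8B model, ~16-20GB for a 32B model. Add the KV cache for a long context window, plus macOS and whatever else you're running, and 24GB is comfortable for the small models while 32B models genuinely want the 64GB machine.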
Best Models for Agentic Coding
After testing dozens of models:
GLM-4.7-Flash — The current recommendation for OpenClaw and Claude Code. 128K context, excellent tool-calling, runs on 24GB+.
Qwen3-Coder-30B-A3B — MoE architecture means only 3B parameters active at inference. Fast despite 30B total size. Needs 64GB.
GPT-OSS-20B — OpenAI's first open-weights release since GPT-2. Just works everywhere.
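Whichever you pick, the workflow for trying one locally is the same. A minimal sketch using Ollama (the qwen3-coder:30b tag is the one used later in this post; check the Ollama library for the exact tags of the others):

```bash
# Download the model weights to the local Ollama store
ollama pull qwen3-coder:30b

# Quick interactive sanity check before wiring the model into an agent
ollama run qwen3-coder:30b "Write a function that reverses a string."
```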
The Kimi K2 Reality Check
Everyone wants to run Kimi K2 locally. The truth: even quantized down to 1.8-bit, its 1 trillion parameters need 250GB+ just for the weights. That means a $10,000 Mac Studio, and it rewards you with 1-2 tokens/second.
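The math is simple: 1 trillion parameters × 1.8 bits ÷ 8 bits per byte ≈ 225GB of raw weights, and KV cache plus runtime overhead pushes the real footprint past 250GB. At ordinary 16-bit precision it would be closer to 2TB.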
Jeff Geerling got 28 tokens/second... using four Mac Studios connected via RDMA. That's a $40,000 cluster.
For most of us, Kimi K2 is better accessed via API.
Claude Code + Local Models
Since Ollama v0.14.0 (January 2026), Claude Code works with local models:
```bash
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
claude --model qwen3-coder:30b
```
Or, in a single command: ollama launch claude
Performance is slower than API calls, but you get complete privacy and zero ongoing costs.
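Before pointing Claude Code at the local endpoint, it's worth a quick smoke test that Ollama is actually up and serving the model. A minimal check, assuming the default port and the model tag from the snippet above:

```bash
# Ollama's HTTP API listens on 11434 by default; this lists locally available models
curl http://localhost:11434/api/tags

# Confirm the model answers a trivial prompt before launching Claude Code against it
ollama run qwen3-coder:30b "Say hello in one word."
```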
This is the teaser. The full guide covers:
- Step-by-step OpenClaw installation with Telegram/WhatsApp
- Ollama configuration for 64K+ context
- Model routing and hybrid cloud/local setups
- Performance benchmarks by hardware tier
- Troubleshooting common issues
👉 Read the complete guide on my blog
What's your local AI setup? Running anything interesting? Drop a comment.