
Marco

Posted on • Originally published at marc0.dev

OpenClaw + Mac Mini: Run Your Own 24/7 AI Agent Locally

The dream of running your own AI assistant on dedicated hardware is now reality. OpenClaw has emerged as the go-to solution for self-hosted AI agents, and the Mac Mini M4 has become the hardware of choice.

I spent the last few weeks testing different configurations, models, and setups. Here's what actually works.

Why Mac Mini for Local AI?

Apple Silicon changed the game. Unified memory means no bottleneck between CPU and GPU—everything shares one pool. A Mac Mini M4 Pro with 64GB can run 32B parameter models at 10-15 tokens per second while drawing only 20-40W.

Compare that to an RTX 4090 setup pulling 500W+ and screaming at you through GPU fans.

Hardware Sweet Spots

Budget ($800): Mac Mini M4 24GB — runs 7-8B models, good for experimentation

Recommended ($2,000): Mac Mini M4 Pro 64GB — runs 32B models, the practical choice for most developers

Enthusiast ($10,000): Mac Studio M3 Ultra 512GB — can technically run Kimi K2 at 1-2 tokens/second

The $2,000 M4 Pro is where most developers should stop. It handles Qwen3-Coder-30B and GLM-4.7-Flash without breaking a sweat.

Best Models for Agentic Coding

After testing dozens of models:

GLM-4.7-Flash — The current recommendation for OpenClaw and Claude Code. 128K context, excellent tool-calling, runs on 24GB+.

Qwen3-Coder-30B-A3B — MoE architecture means only 3B parameters active at inference. Fast despite 30B total size. Needs 64GB.

GPT-OSS-20B — OpenAI's first open-weights release since GPT-2. Just works everywhere.
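
If you want to try any of these through Ollama, pulling them is a one-liner each. The qwen3-coder:30b tag is the same one used with Claude Code later in this post; the GLM and GPT-OSS tags below are my best guess at the library names, so check ollama.com/library for the exact spelling before pulling.

# Pull the recommended models (tags are assumptions; verify against the Ollama library)
ollama pull glm-4.7-flash       # assumed tag for GLM-4.7-Flash
ollama pull qwen3-coder:30b     # Qwen3-Coder-30B-A3B
ollama pull gpt-oss:20b         # GPT-OSS-20B

# Quick smoke test before wiring anything into an agent
ollama run qwen3-coder:30b "Write a shell one-liner that counts lines of code in a git repo."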

The Kimi K2 Reality Check

Everyone wants to run Kimi K2 locally. The truth: even quantized down to 1.8-bit, its 1 trillion parameters take roughly 250GB just for weights, so you need a $10,000 Mac Studio and still only get 1-2 tokens/second.

Jeff Geerling got 28 tokens/second... using four Mac Studios connected via RDMA. That's a $40,000 cluster.

For most of us, Kimi K2 is better accessed via API.
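
In practice, "via API" just means pointing any OpenAI-compatible client at a provider that hosts Kimi K2. The base URL, key, and model slug below are placeholders, not a specific provider's values; substitute whatever your provider documents.

# Placeholder values: swap in your provider's actual base URL, API key, and model slug
export KIMI_BASE_URL="https://api.your-provider.example/v1"
export KIMI_API_KEY="sk-..."

curl "$KIMI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $KIMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [{"role": "user", "content": "Summarize the tradeoffs of running a 1T-parameter MoE model locally."}]
  }'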

Claude Code + Local Models

Since Ollama v0.14.0 (January 2026), Claude Code works with local models:

export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
claude --model qwen3-coder:30b

One command: ollama launch claude

Performance is slower than API calls, but you get complete privacy and zero ongoing costs.
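
Before launching Claude Code against Ollama, it's worth a quick sanity check that the server is actually listening where ANTHROPIC_BASE_URL points and that the model is present locally:

# Confirm the Ollama server is up (returns its version as JSON)
curl http://localhost:11434/api/version

# Confirm the model is downloaded; qwen3-coder:30b should appear in this list
ollama list

# Then start Claude Code against the local model
claude --model qwen3-coder:30b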


This is the teaser. The full guide covers:

  • Step-by-step OpenClaw installation with Telegram/WhatsApp
  • Ollama configuration for 64K+ context
  • Model routing and hybrid cloud/local setups
  • Performance benchmarks by hardware tier
  • Troubleshooting common issues

👉 Read the complete guide on my blog


What's your local AI setup? Running anything interesting? Drop a comment.
