DEV Community

Vilius
Vilius

Posted on

Running a local coding agent on a Mac Mini — the actual setup

Running a local coding agent on a Mac Mini

Running a local coding agent on a Mac Mini — the actual setup

By Vilius Vystartas

I have an agent that does my low-stakes coding. File edits, test fixes, build verification. The kind of work you'd normally do yourself but it's faster to delegate. It also writes Playwright tests, reviews code, updates documentation, and runs deploys.

It runs locally — Mac Mini M4, 24 GB. No cloud API calls for the coding part. The orchestration layer still uses a cheap cloud model for planning and routing. The actual file editing is done by Pi, a coding agent that connects to oMLX, an OpenAI-compatible local LLM server.

The same setup can drive Claude Code, Codex, or any coding agent that speaks OpenAI-compatible API. Pi is what I use, but the oMLX server works with anything.

All the model names, config files, and paths are inside the script at the bottom.

Two models

I keep two and swap depending on the task. The 24 GB can't hold both at once.

One as good as I can have on this machine — 9B class, ~20 tok/s. Primary coding model.

Another fast — 4B class, ~27 tok/s. File edits, quick fixes, daily tasks.

The swap script moves one out, brings the other in, restarts the server. Takes about 5 seconds.

What Pi does

  • File edits and refactoring
  • Writing and fixing tests (Playwright, unit tests)
  • Build verification
  • Code review
  • Documentation updates
  • Running deploys

Anything more complex than a one-liner goes through RPC mode. The orchestration layer writes a prompt, Pi executes, the result comes back. No tmux, no process wrangling.

Pi extensions — what they do, why I use them

  • pix-optimizer — ponytail + caveman (lazy dev mode and token compression). Keeps Pi output tight and skips boilerplate.
  • context-mode — workspace routing and tool call interception. Keeps Pi from wandering into the wrong directories.
  • pi-subagents — spawns sub-agents. Parallel work without blocking the main session.
  • pi-workflow-engine — multi-step task orchestration. Lets Pi handle sequences without losing context.
  • pi-mcp-adapter — MCP server connectivity. Connects to context7 and scrapling for external tools.
  • @fgladisch/pi-caveman — additional compression on top of pix-optimizer.

Known issues

  • Can only keep one model loaded at a time. Two = OOM. Swap script handles it.
  • Thinking mode must be disabled. Defaults to chain-of-thought, kills speed.
  • Full chat history in prompts crashes the local model. Prompts must be just the files and changes.
  • Print mode skips safety controls. Use RPC mode for anything non-trivial.
  • First request after a model swap can time out. Retry once.

The blueprint

curl -fsSL https://workswithagents.dev/static/setup-local-llm-pi.sh | sh
Enter fullscreen mode Exit fullscreen mode

Top comments (0)