Running a local coding agent on a Mac Mini — the actual setup

#ai #agents #macos #tutorial

Running a local coding agent on a Mac Mini

Running a local coding agent on a Mac Mini — the actual setup

By Vilius Vystartas

I have an agent that does my low-stakes coding. File edits, test fixes, build verification. The kind of work you'd normally do yourself but it's faster to delegate. It also writes Playwright tests, reviews code, updates documentation, and runs deploys.

It runs locally — Mac Mini M4, 24 GB. No cloud API calls for the coding part. The orchestration layer still uses a cheap cloud model for planning and routing. The actual file editing is done by Pi, a coding agent that connects to oMLX, an OpenAI-compatible local LLM server.

The same setup can drive Claude Code, Codex, or any coding agent that speaks OpenAI-compatible API. Pi is what I use, but the oMLX server works with anything.

All the model names, config files, and paths are inside the script at the bottom.

Two models

I keep two and swap depending on the task. The 24 GB can't hold both at once.

One as good as I can have on this machine — 9B class, ~20 tok/s. Primary coding model.

Another fast — 4B class, ~27 tok/s. File edits, quick fixes, daily tasks.

The swap script moves one out, brings the other in, restarts the server. Takes about 5 seconds.

What Pi does

File edits and refactoring
Writing and fixing tests (Playwright, unit tests)
Build verification
Code review
Documentation updates
Running deploys

Anything more complex than a one-liner goes through RPC mode. The orchestration layer writes a prompt, Pi executes, the result comes back. No tmux, no process wrangling.

Pi extensions — what they do, why I use them

pix-optimizer — ponytail + caveman (lazy dev mode and token compression). Keeps Pi output tight and skips boilerplate.
context-mode — workspace routing and tool call interception. Keeps Pi from wandering into the wrong directories.
pi-subagents — spawns sub-agents. Parallel work without blocking the main session.
pi-workflow-engine — multi-step task orchestration. Lets Pi handle sequences without losing context.
pi-mcp-adapter — MCP server connectivity. Connects to context7 and scrapling for external tools.
@fgladisch/pi-caveman — additional compression on top of pix-optimizer.

Known issues

Can only keep one model loaded at a time. Two = OOM. Swap script handles it.
Thinking mode must be disabled. Defaults to chain-of-thought, kills speed.
Full chat history in prompts crashes the local model. Prompts must be just the files and changes.
Print mode skips safety controls. Use RPC mode for anything non-trivial.
First request after a model swap can time out. Retry once.

The blueprint

curl -fsSL https://workswithagents.dev/static/setup-local-llm-pi.sh | sh

Top comments (1)

Armorer Labs • Jun 21

This is exactly the category of setup I think will become more common: local machine, local state, coding agent doing useful low-stakes work.

The thing I keep wanting on top is an operations layer. Which agents are installed, which jobs are running, which provider config they used, where the logs are, and how to stop or resume a task when it gets weird.

That is the angle behind Armorer: local agents need a local operations layer, not just another prompt wrapper.