Claude Code is the best AI coding assistant available right now. But it calls the Anthropic API by default, which adds up fast on long sessions.
What if you could run it entirely locally - free, private, and on hardware you already own?
mlx-serve makes this possible on any Apple Silicon Mac.
What is mlx-serve?
mlx-serve is a native Zig server for MLX-format language models on Apple Silicon. It exposes OpenAI-compatible, Anthropic-compatible, and Ollama-compatible HTTP APIs - all on a single port, from a single binary.
brew install mlx-serve
That's it. No Python. No conda. No Docker.
Running Claude Code locally
Claude Code looks for ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY in your environment. mlx-serve implements the full Anthropic Messages API, so you just point Claude Code at it:
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=local
export ANTHROPIC_DEFAULT_MODEL=mlx-serve
mlx-serve --model ~/.mlx-serve/models/mlx-community/gemma-4-e4b-it-4bit --serve
Then launch Claude Code as normal. Streaming, tool calls, thinking blocks - all work.
Full setup guide: https://mlxserve.com/claude-code-local/
Performance
On Apple Silicon, mlx-serve achieves 35%+ faster decode than LM Studio on Gemma 4 E4B 4-bit. The server is written in Zig with no Python runtime overhead.
Other features
-
Ollama drop-in: same
/api/chat,/api/generate,/api/embedendpoints - works with Raycast, Open WebUI, Obsidian - Agent Sandbox: isolated Linux VM via Virtualization.framework, live port forwarding to localhost
- Image/video/audio gen: FLUX, LTX-Video, Qwen3-TTS in the same server process
- macOS menu-bar app: free, includes agent mode, quick launcher (Control+Space), voice cloning
Links
- Website: https://mlxserve.com
- Claude Code setup: https://mlxserve.com/claude-code-local/
- LM Studio comparison: https://mlxserve.com/lm-studio-alternative/
- Ollama alternative: https://mlxserve.com/ollama-alternative/
- GitHub: https://github.com/ddalcu/mlx-serve
Top comments (0)