Local AI development on Apple Silicon is moving incredibly fast, but running a local coding agent usually comes with two massive headaches: context windows that bloat and drag down output quality as a session grows, and destructive code edits when the agent goes off the rails.
Enter mlx-code.
It bundles a native MLX inference server, a terminal harness, git worktree isolation, and composable multi-agent primitives into a single, offline-first Python package. You can run it entirely locally, pipe it into standard shell scripts, or swap in remote APIs.
Here is what a standard tool-calling session looks like under the hood:
user@mac:~$ mlc --model mlx-community/Qwen3.6-27B-OptiQ-4bit
[System] Booting local MLX server on Apple Silicon...
[System] Workspace snapshotted to isolated git worktree.
> Fix the token counter bug and write tests to verify it.
↳ Calling tool [Grep] with "count_tokens" in "./"...
↳ Calling tool [Read] on "mlx_code/utils.py"...
[Agent] Found the issue. Delegating test creation to a sub-agent to keep my context clean.
↳ Calling tool [Agent] with prompt "Write pytest cases for count_tokens..."
[Sub-Agent] ↳ Calling tool [Write] on "tests/test_utils.py"...
[Sub-Agent] ↳ Calling tool [Bash] with "pytest tests/test_utils.py"...
[Agent] Tests failed as expected. Applying the fix now.
↳ Calling tool [Edit] on "mlx_code/utils.py"...
[Agent] Fix applied. Tests pass. Changes committed to your local timeline.
What Makes it Different?
1. Git Worktree Isolation
Instead of letting an AI YOLO-edit your active working directory, mlx-code provisions a fresh git worktree for every session. Every tool interaction and conversation state is automatically captured as a local git commit. If the agent hallucinates or breaks your build, the workspace timeline stays under your control: just roll back.
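For readers who have not used worktrees before, here is a minimal sketch of the git commands that per-session isolation can be built on. It illustrates the standard `git worktree` and `git commit` commands only; it is not mlx-code's actual implementation, and the function names, the `agent/` branch prefix, and the directory layout are all hypothetical.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, session_id: str) -> list[str]:
    """Build the git command that creates an isolated worktree for one session.

    Hypothetical layout: the worktree sits next to the repo, on a new branch
    named agent/<session_id>. mlx-code may organise this differently.
    """
    path = repo.parent / f"{repo.name}-agent-{session_id}"
    branch = f"agent/{session_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def snapshot_cmds(worktree: Path, message: str) -> list[list[str]]:
    """Build the commands that record the current worktree state as one commit."""
    return [
        ["git", "-C", str(worktree), "add", "--all"],
        ["git", "-C", str(worktree), "commit", "--allow-empty", "-m", message],
    ]


def run_all(cmds: list[list[str]]) -> None:
    """Run each command, raising CalledProcessError if git reports a failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    repo = Path("/tmp/demo-repo")
    print(" ".join(worktree_add_cmd(repo, "0001")))
    for cmd in snapshot_cmds(repo.parent / "demo-repo-agent-0001", "tool: Edit utils.py"):
        print(" ".join(cmd))
```

Under a scheme like this, rolling back is ordinary git: `git reset --hard <commit>` inside the worktree rewinds to an earlier snapshot, and `git worktree remove <path>` discards the session entirely.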
2. Built-In Context Decay Mitigation
Long coding sessions degrade LLM performance as the context window fills up. mlx-code tackles this by letting the primary agent spawn sandboxed sub-agents for heavy sub-tasks (like writing targeted unit tests). The sub-agent does the heavy lifting in its own context, terminates, and returns only the final result, so the primary context stays small.
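The effect on the primary context can be shown with a toy model. The sketch below is not mlx-code code: `Context`, `run_subagent`, and the word-count token estimate are hypothetical stand-ins, chosen only to show why handing back a single result keeps the parent history short.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A toy conversation history; tokens are approximated by word count."""
    messages: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.messages.append(text)

    def size(self) -> int:
        return sum(len(m.split()) for m in self.messages)


def run_subagent(task: str, steps: list[str]) -> tuple[str, int]:
    """Run a task in a fresh context and return (final result, tokens it used).

    Every intermediate step lands in the sub-agent's own context, which is
    discarded when the function returns.
    """
    ctx = Context()
    ctx.add(task)
    for step in steps:
        ctx.add(step)
    result = f"Done: {task}"
    return result, ctx.size()


parent = Context()
parent.add("Fix the token counter bug and write tests to verify it.")
before = parent.size()

steps = ["tool output " * 200 for _ in range(5)]  # simulated noisy tool calls
result, sub_tokens = run_subagent("write pytest cases for count_tokens", steps)
parent.add(result)  # only the final result crosses back to the parent

print(f"sub-agent used ~{sub_tokens} tokens; parent grew by {parent.size() - before}")
```

The sub-agent's intermediate tool output never reaches `parent`, which is the property the section describes.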
3. Absolute Composability
The entire framework is modular. You can pipe outputs across completely different local or remote models directly from your terminal, or build parallel agent workflows in pure Python.
Quick Start
Get dropped directly into the built-in local REPL harness in two lines:
pip install mlx-code
mlc
The Power of UNIX-Style Pipes
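Because mlc-run takes its prompt on stdin and writes the response to stdout, you can chain agent tasks with ordinary shell pipes. In the example below, the first model's output is concatenated with PLAN.md by cat and handed to a second model behind a different endpoint: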
# Critique a generated solution across entirely different backends
echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000
Concurrent Agents in Python
Need a swarm? Fire up parallel researchers using asyncio to build custom local workflows:
import asyncio
from mlx_code.repl import Agent


async def main():
    topics = ["history", "algorithms", "industry_usage"]
    agents = [Agent() for _ in topics]

    # Spawn workers concurrently
    await asyncio.gather(*[
        a.run(f"Research {t} of BFT. Save to kb/{t}.md.")
        for a, t in zip(agents, topics)
    ])

    # Synthesize results
    reducer = Agent()
    await reducer.run("Read all files in kb/. Synthesise into final_report.md.")


asyncio.run(main())
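On a single machine, every agent presumably shares the same local inference server, so unbounded fan-out may not buy much. One common asyncio pattern is to cap the number of agents in flight with a semaphore. The sketch below uses a hypothetical `StubAgent` in place of `mlx_code.repl.Agent` so it runs without a model; it assumes only that `run` is a coroutine taking a prompt, as in the example above.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for an agent with an async run(prompt) method."""

    in_flight = 0  # agents currently running
    peak = 0       # highest concurrency observed

    async def run(self, prompt: str) -> str:
        StubAgent.in_flight += 1
        StubAgent.peak = max(StubAgent.peak, StubAgent.in_flight)
        await asyncio.sleep(0.01)  # simulate model latency
        StubAgent.in_flight -= 1
        return f"finished: {prompt}"


async def run_bounded(prompts: list[str], limit: int) -> list[str]:
    """Run one agent per prompt, with at most `limit` running at a time."""
    sem = asyncio.Semaphore(limit)

    async def worker(prompt: str) -> str:
        async with sem:
            return await StubAgent().run(prompt)

    # gather returns results in the same order as the prompts
    return await asyncio.gather(*(worker(p) for p in prompts))


topics = ["history", "algorithms", "industry_usage"]
results = asyncio.run(run_bounded([f"Research {t} of BFT." for t in topics], limit=2))
print(results)
print("peak concurrency:", StubAgent.peak)
```

Swapping `StubAgent()` for a real agent and tuning `limit` to what the machine can serve is the only change this pattern should need, assuming the real `run` behaves as shown in the post.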
Extending It
Adding a custom tool takes very little boilerplate. Subclass Tool, declare a Pydantic model for its parameters, and pass the class to the agent:
from mlx_code.tools import Tool
from mlx_code.repl import Agent
from pydantic import BaseModel, Field


class QueryParams(BaseModel):
    query: str = Field(description="SQL query to run")


class LiveDBTool(Tool):
    name = "QueryDB"
    description = "Execute a query against the dev database"
    parameters = QueryParams

    async def execute(self, params: QueryParams, signal=None) -> dict:
        # run_query is your own helper (not defined here); it should return a string
        result = run_query(params.query)
        return {"content": [{"type": "text", "text": result}], "is_error": False}


# Register the tool class and enable it by name
agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
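A tool that talks to a live database will sometimes fail, and the return value above hints at how to report that. The helper below is a sketch, not part of mlx-code: `as_tool_result` and `fake_query` are hypothetical names, and it assumes that a failure should be surfaced by setting `is_error` to `True` with the error message as the text. The post itself shows only the success case.

```python
from typing import Callable


def as_tool_result(fn: Callable[..., object], *args, **kwargs) -> dict:
    """Call fn and wrap its outcome in the result shape used by execute() above.

    Assumption: failures are reported with is_error=True and the error text
    as the content; only the success shape is confirmed by the example.
    """
    try:
        text, is_error = str(fn(*args, **kwargs)), False
    except Exception as exc:  # report the failure to the agent instead of crashing
        text, is_error = f"{type(exc).__name__}: {exc}", True
    return {"content": [{"type": "text", "text": text}], "is_error": is_error}


def fake_query(sql: str) -> str:
    """Hypothetical stand-in for a real database call."""
    if not sql.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return "1 row"


print(as_tool_result(fake_query, "SELECT 1"))
print(as_tool_result(fake_query, "DROP TABLE users"))
```

Inside `execute`, the body would then reduce to `return as_tool_result(run_query, params.query)`.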
💬 Let's Discuss
The repo is fully open-source and ready to play with:
How are you currently dealing with context window bloat when running local developer agents?