Josef Albers

Meet mlx-code: A Composable, Git-Isolated Coding Agent Built for Mac

Local AI development on Apple Silicon is moving fast, but running a local coding agent usually comes with two headaches: context windows that bloat and slow the model down as a session grows, and destructive code edits when the agent goes off the rails.

Enter mlx-code.

It bundles a native MLX inference server, a terminal harness, git worktree isolation, and composable multi-agent primitives into a single, offline-first Python package. You can run it entirely locally, pipe it into standard shell scripts, or swap in remote APIs.

Here is what a standard tool-calling session looks like under the hood:

user@mac:~$ mlc --model mlx-community/Qwen3.6-27B-OptiQ-4bit
[System] Booting local MLX server on Apple Silicon...
[System] Workspace snapshotted to isolated git worktree.

> Fix the token counter bug and write tests to verify it.
↳ Calling tool [Grep] with "count_tokens" in "./"...
↳ Calling tool [Read] on "mlx_code/utils.py"...
[Agent] Found the issue. Delegating test creation to a sub-agent to keep my context clean.
    ↳ Calling tool [Agent] with prompt "Write pytest cases for count_tokens..."
    [Sub-Agent] ↳ Calling tool [Write] on "tests/test_utils.py"...
    [Sub-Agent] ↳ Calling tool [Bash] with "pytest tests/test_utils.py"...
[Agent] Tests failed as expected. Applying the fix now.
↳ Calling tool [Edit] on "mlx_code/utils.py"...
[Agent] Fix applied. Tests pass. Changes committed to your local timeline.

What Makes it Different?

1. Git Worktree Isolation

Instead of letting an AI YOLO-edit your active working directory, mlx-code provisions a fresh git worktree for every session. Every tool interaction and conversation state is automatically captured as a local git commit. If the agent hallucinates or breaks your build, your workspace timeline stays entirely under your control: just roll back.
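To make the idea concrete, here is a minimal sketch of per-session isolation using plain git commands driven from Python. It is not mlx-code's actual implementation: the function names, the `agent/<session>` branch naming, and the sibling-directory layout are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, session_id: str) -> list[str]:
    """Build the git command that creates a worktree on its own branch.

    The directory and branch naming here are hypothetical.
    """
    worktree = repo.parent / f"{repo.name}-agent-{session_id}"
    branch = f"agent/{session_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)]


def snapshot_cmds(worktree: Path, message: str) -> list[list[str]]:
    """Build the commands that record the worktree's current state as one commit."""
    return [
        ["git", "-C", str(worktree), "add", "-A"],
        ["git", "-C", str(worktree), "commit", "--allow-empty", "-m", message],
    ]


def rollback_cmd(worktree: Path, steps: int = 1) -> list[str]:
    """Build the command that discards the last `steps` commits in the worktree."""
    return ["git", "-C", str(worktree), "reset", "--hard", f"HEAD~{steps}"]


def run(cmd: list[str]) -> None:
    """Execute one git command, raising if git reports an error."""
    subprocess.run(cmd, check=True)
```

Because the worktree lives on its own branch in its own directory, nothing the agent does touches the files you have open; rolling back is just a `reset` inside that worktree.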

2. Built-In Context Decay Mitigation

Long coding sessions degrade LLM performance as the context window fills up. mlx-code tackles this by letting the primary agent spawn sandboxed sub-agents for heavy sub-tasks (like writing targeted unit tests). The sub-agent does the heavy lifting, terminates, and returns only its final result, keeping your primary context pristine.
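A toy model shows why delegation helps. The `ToyAgent` class below is purely hypothetical and only counts messages; it does not reflect mlx-code's internals. It compares doing a 20-step task inline against handing it to a sub-agent that starts with a fresh context.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Toy stand-in for an agent; it only tracks the messages in its context."""

    context: list[str] = field(default_factory=list)

    def work(self, task: str, steps: int) -> str:
        # Every intermediate step (tool call, file read, ...) grows this agent's own context.
        self.context.append(f"task: {task}")
        for i in range(steps):
            self.context.append(f"step {i + 1} output for {task!r}")
        return f"done: {task}"

    def delegate(self, task: str, steps: int) -> str:
        # The sub-agent starts with a fresh context; the parent keeps only the final result.
        sub = ToyAgent()
        result = sub.work(task, steps)
        self.context.append(result)
        return result


inline = ToyAgent()
inline.work("write tests", steps=20)

delegating = ToyAgent()
delegating.delegate("write tests", steps=20)

print(len(inline.context), len(delegating.context))  # prints "21 1"
```

The parent that delegates ends up with a single message instead of twenty-one, which is the whole point: the intermediate noise dies with the sub-agent.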

3. Absolute Composability

The entire framework is modular. You can pipe output between completely different local or remote models directly from your terminal, or build parallel agent workflows in pure Python.


Quick Start

Get dropped directly into the built-in local REPL harness in two lines:

pip install mlx-code
mlc

The Power of UNIX-Style Pipes

Because mlc-run reads its prompt from stdin and writes its answer to stdout, you can chain complex agent tasks together with ordinary shell pipelines:

# Critique a generated solution across entirely different backends
echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000

Concurrent Agents in Python

Need a swarm? Fire up parallel researchers using asyncio to build custom local workflows:

import asyncio
from mlx_code.repl import Agent

async def main():
    topics = ["history", "algorithms", "industry_usage"]
    agents = [Agent() for _ in topics]

    # Spawn workers concurrently
    await asyncio.gather(*[
        a.run(f"Research {t} of BFT. Save to kb/{t}.md.")
        for a, t in zip(agents, topics)
    ])

    # Synthesize results
    reducer = Agent()
    await reducer.run("Read all files in kb/. Synthesise into final_report.md.")

asyncio.run(main())
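On a single Mac you may not want every worker hitting the model at once, and one crashed worker should not take the rest down with it. The sketch below shows one way to handle both with plain asyncio. `StubAgent` is a hypothetical stand-in that only mimics an async `run(prompt)` method; its return value and the `limit` parameter are assumptions, not mlx-code behaviour.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in with an async run(prompt) method; not the real Agent."""

    async def run(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real model call would
        if "fail" in prompt:
            raise RuntimeError(f"worker failed: {prompt}")
        return f"ok: {prompt}"


async def fan_out(prompts: list[str], limit: int = 2) -> list[object]:
    # The semaphore caps how many workers run at the same time.
    sem = asyncio.Semaphore(limit)

    async def worker(prompt: str) -> str:
        async with sem:
            return await StubAgent().run(prompt)

    # return_exceptions=True collects failures in the result list instead of raising.
    return await asyncio.gather(*(worker(p) for p in prompts), return_exceptions=True)


results = asyncio.run(fan_out(["history", "fail here", "algorithms"]))
failed = [r for r in results if isinstance(r, Exception)]
print(len(results), len(failed))  # prints "3 1"
```

Results come back in the same order as the prompts, so you can pair each failure with its topic and retry only those before running the reducer.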

Extending It

Adding a custom tool takes very little boilerplate. Subclass Tool, declare a Pydantic model for its parameters, and pass the class to the agent:

from mlx_code.tools import Tool
from mlx_code.repl import Agent
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    query: str = Field(description="SQL query to run")

class LiveDBTool(Tool):
    name = "QueryDB"
    description = "Execute a query against the dev database"
    parameters = QueryParams

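    async def execute(self, params: QueryParams, signal=None) -> dict:
        # run_query is not defined in this snippet; supply your own helper
        # that runs the SQL and returns the result as text.
        result = run_query(params.query)
        return {"content": [{"type": "text", "text": result}], "is_error": False}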

agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])

💬 Let's Discuss

The repo is fully open-source and ready to play with:

GitHub Repository

Watch the YouTube Demo

How are you currently dealing with context window bloat when running local developer agents?
