<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kiên Bùi</title>
    <description>The latest articles on DEV Community by Kiên Bùi (@kienbui1995).</description>
    <link>https://dev.to/kienbui1995</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F565372%2F6ca9a9ad-480f-4245-9b08-ec41714f0683.jpeg</url>
      <title>DEV Community: Kiên Bùi</title>
      <link>https://dev.to/kienbui1995</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kienbui1995"/>
    <language>en</language>
    <item>
      <title>Building magic-code: An Open-Source AI Coding Agent That Runs on Your Hardware</title>
      <dc:creator>Kiên Bùi</dc:creator>
      <pubDate>Thu, 16 Apr 2026 02:32:30 +0000</pubDate>
      <link>https://dev.to/kienbui1995/building-magic-code-an-open-source-ai-coding-agent-that-runs-on-your-hardware-63o</link>
      <guid>https://dev.to/kienbui1995/building-magic-code-an-open-source-ai-coding-agent-that-runs-on-your-hardware-63o</guid>
      <description>&lt;h1&gt;
  
  
  Building magic-code: An Open-Source AI Coding Agent That Runs on Your Hardware
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;How we built a TUI coding agent in Rust, tested it with 274 scenarios across 5 platforms, and made it work with a $600 GPU.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;AI coding assistants are powerful — but they come with trade-offs. Cloud-based tools send your code to external servers. Proprietary agents lock you into specific providers. And the costs add up fast.&lt;/p&gt;

&lt;p&gt;We wanted something different: a coding agent that's &lt;strong&gt;fast, private, and runs on your own hardware&lt;/strong&gt;. That's why we built &lt;a href="https://github.com/kienbui1995/mc-code" rel="noopener noreferrer"&gt;magic-code&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is magic-code?
&lt;/h2&gt;

&lt;p&gt;magic-code is an open-source agentic coding agent with a terminal UI (TUI), built in Rust. It works with any LLM provider — from Claude and GPT to self-hosted models like Qwen 3.5 on your own GPU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;magic-code &lt;span class="s2"&gt;"add error handling to the API endpoints"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent reads your code, plans changes, edits files, runs tests, and iterates — all from your terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key numbers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;9.1 MB (static musl)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Built-in tools&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test coverage&lt;/td&gt;
&lt;td&gt;274 unit tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Golden test scenarios&lt;/td&gt;
&lt;td&gt;274&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported providers&lt;/td&gt;
&lt;td&gt;15+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines of code&lt;/td&gt;
&lt;td&gt;18,691&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture: 6 Crates, Zero Coupling
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mc-cli      → Binary, TUI runner, provider selection
mc-tui      → Terminal UI (ratatui), no dependencies on other mc-* crates
mc-core     → Runtime, ReAct loop, agents, memory, compaction
mc-provider → LLM providers (Anthropic, OpenAI, Gemini, generic)
mc-tools    → 30 tool implementations, permissions, sandbox
mc-config   → Configuration types and loader
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The strict rule: &lt;code&gt;mc-provider&lt;/code&gt; and &lt;code&gt;mc-tools&lt;/code&gt; never depend on each other. Only &lt;code&gt;mc-core&lt;/code&gt; orchestrates them. This keeps the codebase maintainable as it grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Self-Hosted Challenge
&lt;/h2&gt;

&lt;p&gt;Our primary goal was making magic-code work well with &lt;strong&gt;Qwen 3.5 9B&lt;/strong&gt; — a model that runs on a single RTX 4070 Ti. This is a fundamentally different challenge than building for Claude or GPT-4.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we learned
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Small models need explicit instructions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With Claude, you can say "add a greet function" and it figures out the rest. With Qwen 9B, you need "read src/lib.rs then add a greet function using edit_file." We built a 4-tier prompt system that adapts instructions based on model capability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1&lt;/strong&gt; (Frontier: Claude, GPT-4): Full autonomy, 30 tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2&lt;/strong&gt; (Strong: Gemini, DeepSeek): Slightly more structured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 3&lt;/strong&gt; (Local: Llama, Mistral): Minimal tools, simple English&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 4&lt;/strong&gt; (Qwen): Optimized for agentic tool calling, 10 tools&lt;/li&gt;
&lt;/ul&gt;
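
&lt;p&gt;For illustration, here is a minimal Python sketch of how tier selection could work. The names (&lt;code&gt;TIERS&lt;/code&gt;, &lt;code&gt;pick_tier&lt;/code&gt;) and exact values are hypothetical; the real implementation lives in Rust:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of prompt-tier selection; values are illustrative.
TIERS = {
    1: {"match": ("claude", "gpt-4"),    "tools": 30, "style": "autonomous"},
    2: {"match": ("gemini", "deepseek"), "tools": 30, "style": "structured"},
    3: {"match": ("llama", "mistral"),   "tools": 10, "style": "simple"},
    4: {"match": ("qwen",),              "tools": 10, "style": "explicit"},
}

def pick_tier(model_name):
    """Map a model name to a prompt tier; unknown models get a conservative tier."""
    name = model_name.lower()
    for tier, cfg in TIERS.items():
        if any(key in name for key in cfg["match"]):
            return tier
    return 3  # safest default for unknown local models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;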

&lt;p&gt;&lt;strong&gt;2. Thinking mode and tool calling don't mix (yet)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We discovered that Qwen 3.5 with vLLM's &lt;code&gt;--reasoning-parser qwen3&lt;/code&gt; puts tool calls inside thinking blocks — which the tool call parser can't extract. The fix: disable thinking when tools are present, re-enable for pure Q&amp;amp;A. This is actually &lt;a href="https://qwenlm.github.io/blog/qwen3/" rel="noopener noreferrer"&gt;recommended by the Qwen team&lt;/a&gt;.&lt;/p&gt;
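
&lt;p&gt;As a sketch of that workaround, assuming vLLM's OpenAI-compatible API and a Qwen3-style chat template (which honors &lt;code&gt;chat_template_kwargs.enable_thinking&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

def chat(messages, tools=None, base_url="http://localhost:8300"):
    """Disable thinking whenever tools are present; keep it for pure Q&amp;amp;A."""
    body = {"model": "qwen3.5-9b", "messages": messages}
    if tools:
        body["tools"] = tools
        # Otherwise tool calls end up inside &amp;lt;think&amp;gt; blocks,
        # where the tool-call parser cannot extract them.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return requests.post(f"{base_url}/v1/chat/completions", json=body, timeout=600).json()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;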

&lt;p&gt;&lt;strong&gt;3. Context window matters more than model size&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Qwen 3.5 9B with 256K context on vLLM outperforms larger models with smaller context windows for real coding tasks. We added Qwen to our model registry with proper context window settings and adaptive compaction thresholds.&lt;/p&gt;
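
&lt;p&gt;As a sketch, a registry entry might carry exactly those two settings (field names are hypothetical; 262144 matches the &lt;code&gt;--max-model-len&lt;/code&gt; used in the setup below):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative model-registry entry; the real registry is in Rust.
MODEL_REGISTRY = {
    "qwen3.5-9b": {
        "context_window": 262_144,    # 256K tokens
        "compaction_threshold": 0.8,  # start compacting at 80% of the window
        "tier": 4,                    # see the prompt tiers above
    },
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;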

&lt;h2&gt;
  
  
  Testing: 274 Scenarios, 5 Platforms, Honest Results
&lt;/h2&gt;

&lt;p&gt;We built a comprehensive golden test suite to evaluate magic-code across different languages and app types. Every scenario runs in a Docker sandbox with a fresh project, and results are verified by checking actual file contents — not just "did the model respond."&lt;/p&gt;

&lt;h3&gt;
  
  
  Test structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tests/golden/
├── fixtures/          # 6 project templates (Rust, Python, React, Go, etc.)
├── scenarios/         # 274 scenarios across 22 categories
├── run.sh             # Parallel test runner (Docker sandbox)
├── run-platform.sh    # Platform-specific runner
├── verify.py          # Content verification (L1/L2 checks)
└── compare.py         # Cross-model comparison
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification levels
&lt;/h3&gt;

&lt;p&gt;We don't just check if the model responded. We verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L0&lt;/strong&gt;: Did the model produce output? (tool calls + text)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1&lt;/strong&gt;: Does the expected file exist?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2&lt;/strong&gt;: Does the file contain the expected code patterns?&lt;/li&gt;
&lt;/ul&gt;
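
&lt;p&gt;A minimal sketch of the three checks (the real &lt;code&gt;verify.py&lt;/code&gt; is more involved; names here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
from pathlib import Path

def verify(result_text, expected_file, patterns):
    """Return the highest verification level a scenario run reaches."""
    if not result_text.strip():
        return None                  # not even L0: the model produced nothing
    level = 0                        # L0: the model responded
    path = Path(expected_file)
    if not path.exists():
        return level                 # stuck at L0: the file was never created
    level = 1                        # L1: the expected file exists
    content = path.read_text()
    if all(re.search(p, content) for p in patterns):
        level = 2                    # L2: it contains the expected code patterns
    return level
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;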

&lt;h3&gt;
  
  
  Results: Qwen 3.5 9B (self-hosted, RTX 4070 Ti)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;L0 (responds)&lt;/th&gt;
&lt;th&gt;L2 (verified correct)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python Web API (FastAPI)&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;69%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python Desktop (Tkinter)&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go Web API&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;68%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React Web App&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;28%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React Native Mobile&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Overall: 60% verified correct across 110 platform scenarios.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're sharing these numbers honestly. A 9B model on a single GPU won't match Claude Sonnet — but it handles Python and Go tasks well, and once you own the hardware it costs nothing to run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Qwen 9B excels
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Single file edits (add function, fix bug)&lt;/li&gt;
&lt;li&gt;✅ Python code (FastAPI, Tkinter)&lt;/li&gt;
&lt;li&gt;✅ Go code (stdlib HTTP, tests)&lt;/li&gt;
&lt;li&gt;✅ Bug fixes with clear descriptions&lt;/li&gt;
&lt;li&gt;✅ Reading and understanding code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where it struggles
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;❌ Creating new files from scratch (often runs bash instead of write_file)&lt;/li&gt;
&lt;li&gt;❌ Complex TypeScript/JSX (React components)&lt;/li&gt;
&lt;li&gt;❌ Multi-step refactoring&lt;/li&gt;
&lt;li&gt;❌ Abstract patterns (ABC, generics, advanced types)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Comparison: Gemini 2.5 Pro via LiteLLM
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Qwen 3.5 9B&lt;/th&gt;
&lt;th&gt;Gemini 2.5 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python Web API&lt;/td&gt;
&lt;td&gt;69%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React Web App&lt;/td&gt;
&lt;td&gt;28%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go Web API&lt;/td&gt;
&lt;td&gt;68%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python Desktop&lt;/td&gt;
&lt;td&gt;82%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React Native&lt;/td&gt;
&lt;td&gt;47%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemini 2.5 Pro scores significantly higher — but it's a cloud model. The beauty of magic-code is that you can switch between models with a single flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Self-hosted (free)&lt;/span&gt;
magic-code &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://localhost:4000 &lt;span class="nt"&gt;--model&lt;/span&gt; vllm/qwen3.5-9b &lt;span class="s2"&gt;"fix the bug"&lt;/span&gt;

&lt;span class="c"&gt;# Cloud (when you need it)&lt;/span&gt;
magic-code &lt;span class="nt"&gt;--model&lt;/span&gt; gemini-2.5-pro &lt;span class="s2"&gt;"refactor the entire module"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Makes magic-code Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Provider agnostic
&lt;/h3&gt;

&lt;p&gt;15+ providers out of the box. Anthropic, OpenAI, Gemini, Groq, DeepSeek, Mistral, Ollama, LiteLLM, vLLM — or any OpenAI-compatible endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Full agentic loop
&lt;/h3&gt;

&lt;p&gt;Not just code completion. magic-code runs a ReAct loop: read code → plan → edit → run tests → iterate. It has 30 built-in tools including file operations, search, bash, browser, memory, and MCP support.&lt;/p&gt;
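
&lt;p&gt;In sketch form, the loop looks like this (Python pseudocode of the loop shape, not the Rust source; the message format follows the OpenAI-style tool-calling convention):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def react_loop(task, llm, tools, max_steps=50):
    """Plan, act, observe, repeat until the model stops calling tools."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(messages, tools=tools)   # the model plans its next step
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]          # no more actions: final answer
        for call in reply["tool_calls"]:     # act: read, edit, run tests, ...
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})  # observe
    raise RuntimeError("step budget exhausted")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;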

&lt;h3&gt;
  
  
  3. Context engineering
&lt;/h3&gt;

&lt;p&gt;Smart compaction keeps conversations going without losing important context. Repo maps (via tree-sitter) give the model project awareness without reading every file. Memory persists facts across sessions.&lt;/p&gt;
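
&lt;p&gt;Compaction, sketched (illustrative thresholds; as noted earlier, the real values live in the model registry):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def maybe_compact(messages, count_tokens, summarize, context_window, threshold=0.8):
    """Summarize older turns once the conversation nears the context limit."""
    if count_tokens(messages) &amp;lt; context_window * threshold:
        return messages                             # plenty of room left
    head, tail = messages[:-10], messages[-10:]     # keep recent turns verbatim
    summary = summarize(head)                       # one LLM call condenses the rest
    return [{"role": "system", "content": f"Earlier context: {summary}"}] + tail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;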

&lt;h3&gt;
  
  
  4. Security by default
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Permission system for dangerous operations&lt;/li&gt;
&lt;li&gt;Sandbox for bash execution&lt;/li&gt;
&lt;li&gt;Prompt injection guards&lt;/li&gt;
&lt;li&gt;Audit logging&lt;/li&gt;
&lt;li&gt;8 CI security scanners (CodeQL, SonarCloud, cargo-audit, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Headless mode
&lt;/h3&gt;

&lt;p&gt;Integrate magic-code into CI/CD pipelines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Auto-fix failing tests&lt;/span&gt;
magic-code &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s2"&gt;"fix the failing tests"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; result.json

&lt;span class="c"&gt;# Batch processing&lt;/span&gt;
magic-code &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--batch&lt;/span&gt; tasks.txt

&lt;span class="c"&gt;# NDJSON streaming for web apps&lt;/span&gt;
magic-code &lt;span class="nt"&gt;--ndjson&lt;/span&gt; &lt;span class="s2"&gt;"explain auth.rs"&lt;/span&gt; | process_events.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick install (binary)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/kienbui1995/mc-code/main/install.sh | sh

&lt;span class="c"&gt;# Via cargo&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;magic-code

&lt;span class="c"&gt;# From source&lt;/span&gt;
git clone https://github.com/kienbui1995/mc-code.git
&lt;span class="nb"&gt;cd &lt;/span&gt;mc-code/mc &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cargo &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--path&lt;/span&gt; crates/mc-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Self-Hosted Setup
&lt;/h2&gt;

&lt;p&gt;Run Qwen 3.5 9B with vLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve QuantTrio/Qwen3.5-9B-AWQ &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 8300 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.95 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 262144 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--quantization&lt;/span&gt; awq_marlin &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--enable-prefix-caching&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; qwen3 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; qwen3_coder &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--served-model-name&lt;/span&gt; qwen3.5-9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point magic-code at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;magic-code &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://localhost:8300 &lt;span class="nt"&gt;--model&lt;/span&gt; qwen3.5-9b &lt;span class="s2"&gt;"your task"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use LiteLLM as a proxy to switch between self-hosted and cloud models seamlessly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Improving Qwen 3.5 performance on file creation tasks&lt;/li&gt;
&lt;li&gt;Testing with larger self-hosted models (Qwen 32B, Llama 70B)&lt;/li&gt;
&lt;li&gt;HTTP API server for web app integration&lt;/li&gt;
&lt;li&gt;Watch mode (file watcher, auto-respond)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;magic-code is MIT licensed and available on &lt;a href="https://github.com/kienbui1995/mc-code" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://crates.io/crates/magic-code" rel="noopener noreferrer"&gt;crates.io&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We built this because we believe AI coding tools should be &lt;strong&gt;open, fast, and runnable on your own hardware&lt;/strong&gt;. The results aren't perfect — but they're honest, reproducible, and improving with every release.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;magic-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;magic-code is built by &lt;a href="https://github.com/kienbui1995" rel="noopener noreferrer"&gt;kienbui1995&lt;/a&gt;. Star the repo if you find it useful. Contributions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>rust</category>
      <category>ai</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>MagiC v0.4: Embedded Server Mode + Real Benchmark Numbers</title>
      <dc:creator>Kiên Bùi</dc:creator>
      <pubDate>Fri, 27 Mar 2026 07:19:36 +0000</pubDate>
      <link>https://dev.to/kienbui1995/magic-v04-embedded-server-mode-real-benchmark-numbers-2bo5</link>
      <guid>https://dev.to/kienbui1995/magic-v04-embedded-server-mode-real-benchmark-numbers-2bo5</guid>
      <description>&lt;p&gt;We shipped &lt;a href="https://github.com/kienbui1995/magic" rel="noopener noreferrer"&gt;MagiC v0.4&lt;/a&gt; today. Two things worth talking about: embedded server mode and the benchmark suite we built to validate our performance claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background: What is MagiC?
&lt;/h2&gt;

&lt;p&gt;MagiC is an open-source framework for managing fleets of AI workers. Think &lt;strong&gt;Kubernetes for AI agents&lt;/strong&gt; — it doesn't build agents, it manages any agents built with any framework (CrewAI, LangChain, custom bots) through an open HTTP protocol.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         You (CEO)
          |
     MagiC Server  ←— Go, 15MB binary
    /    |    |    \
ContentBot  SEOBot  LeadBot  CodeBot
(Python)   (Node)  (Python)  (Go)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core is Go. Workers are anything that speaks HTTP.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's new in v0.4
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Embedded server mode
&lt;/h3&gt;

&lt;p&gt;Before v0.4, using MagiC from Python required you to separately install and run the Go server. Now you can do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magic_ai_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MagiC&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;MagiC&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Server is running. client is a MagiCClient.
&lt;/span&gt;    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit_task&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}})&lt;/span&gt;
&lt;span class="c1"&gt;# Server stops automatically.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;MagiC()&lt;/code&gt; downloads the correct binary for your platform on first use (Linux/macOS, amd64/arm64), caches it at &lt;code&gt;~/.magic/bin/&lt;/code&gt;, and starts it in the background. Second run is instant — no download.&lt;/p&gt;

&lt;p&gt;This is how it works under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MagiC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__enter__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MagiCClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;binary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_binary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_version&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# download if not cached
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Popen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAGIC_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_port&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_wait_ready&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# polls /health until 200 OK
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MagiCClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_port&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__exit__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_proc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;terminate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The binary is pulled from GitHub Releases, which the release CI builds automatically for 4 targets on every &lt;code&gt;v*&lt;/code&gt; tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;GOOS=linux  GOARCH=amd64 go build -ldflags="-s -w" -o dist/magic-linux-amd64  ./cmd/magic&lt;/span&gt;
    &lt;span class="s"&gt;GOOS=linux  GOARCH=arm64 go build -ldflags="-s -w" -o dist/magic-linux-arm64  ./cmd/magic&lt;/span&gt;
    &lt;span class="s"&gt;GOOS=darwin GOARCH=amd64 go build -ldflags="-s -w" -o dist/magic-darwin-amd64 ./cmd/magic&lt;/span&gt;
    &lt;span class="s"&gt;GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o dist/magic-darwin-arm64 ./cmd/magic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: &lt;code&gt;pip install magic-ai-sdk&lt;/code&gt; is all a Python developer needs. No Go toolchain required.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Benchmark suite
&lt;/h3&gt;

&lt;p&gt;A frequent question is: "Why Go? Won't Python be fine?" We now have actual numbers.&lt;/p&gt;

&lt;p&gt;The benchmark suite lives in &lt;code&gt;core/benchmarks/&lt;/code&gt; — 13 benchmarks covering routing, registration, heartbeat, and the event bus. Run them yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kienbui1995/magic
&lt;span class="nb"&gt;cd &lt;/span&gt;magic/core
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-benchtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5s &lt;span class="nt"&gt;-benchmem&lt;/span&gt; ./benchmarks/...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results on an i7-12700 (20 logical cores):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Route task — 10 workers&lt;/td&gt;
&lt;td&gt;2.8 µs&lt;/td&gt;
&lt;td&gt;920K tasks/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Route task — 100 workers&lt;/td&gt;
&lt;td&gt;20 µs&lt;/td&gt;
&lt;td&gt;50K tasks/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Route task — 1000 workers&lt;/td&gt;
&lt;td&gt;240 µs&lt;/td&gt;
&lt;td&gt;4K tasks/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker registration&lt;/td&gt;
&lt;td&gt;1.9 µs&lt;/td&gt;
&lt;td&gt;1.3M/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heartbeat — 100 workers&lt;/td&gt;
&lt;td&gt;84 µs&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heartbeat — 1000 workers&lt;/td&gt;
&lt;td&gt;1.0 ms&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event bus (10 subscribers)&lt;/td&gt;
&lt;td&gt;163 ns&lt;/td&gt;
&lt;td&gt;12.5M/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event bus (parallel)&lt;/td&gt;
&lt;td&gt;77 ns&lt;/td&gt;
&lt;td&gt;26M/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things worth noting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Routing scales O(n) with worker count&lt;/strong&gt;, which is expected — we scan all workers to filter by capability. With 1000 workers the overhead is 240µs. An LLM call is 1–30 seconds. The orchestrator is never the bottleneck.&lt;/p&gt;
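
&lt;p&gt;The scan itself is simple. A Python sketch of the logic (the server is Go; field names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def route(task, workers):
    """Linear scan: filter by capability, skip overloaded workers, pick the least loaded."""
    required = set(task["routing"]["required_capabilities"])
    candidates = [
        w for w in workers
        if required &amp;lt;= set(w["capabilities"]) and w["load"] &amp;lt; w["max_load"]
    ]
    if not candidates:
        return None                     # nothing can take the task right now
    return min(candidates, key=lambda w: w["load"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;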

&lt;p&gt;&lt;strong&gt;Heartbeat at 1000 workers costs ~1ms per check cycle.&lt;/strong&gt; In practice, heartbeats happen every 30 seconds per worker, so the server processes ~33 heartbeats/second at 1000 workers. Completely trivial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The event bus at 26M events/second&lt;/strong&gt; means no async communication between modules will ever be a bottleneck.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why these numbers matter for fleet management
&lt;/h3&gt;

&lt;p&gt;Every AI agent framework I've seen (AutoGen, CrewAI, Agno, LangGraph) runs agents as Python objects in a single process. There's no heartbeat because there's no persistent worker registry. There's no routing because there's no fleet.&lt;/p&gt;

&lt;p&gt;Agno's "529x faster" benchmark measures how quickly Python objects instantiate. That's a valid metric for their architecture. MagiC measures something different: how many concurrent agents can the orchestrator manage, and how fast can it route tasks to them.&lt;/p&gt;

&lt;p&gt;Python's GIL means routing decisions for N concurrent agents share one thread. In Go, each heartbeat runs in its own goroutine (~2KB RAM, true parallelism). The numbers reflect that.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's coming in v0.5
&lt;/h2&gt;

&lt;p&gt;We're implementing zero-trust worker authentication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each worker gets its own &lt;code&gt;mct_&amp;lt;256-bit token&amp;gt;&lt;/code&gt; credential (not a shared API key)&lt;/li&gt;
&lt;li&gt;Token bound to a specific worker on first registration — can't be reused or impersonated&lt;/li&gt;
&lt;li&gt;Org isolation: tasks from Org A can only route to workers in Org A&lt;/li&gt;
&lt;li&gt;Immutable audit log: every register/heartbeat/route/complete recorded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No competitor has this at the framework level. AutoGen agents have no identity. CrewAI's &lt;code&gt;shared_memory=True&lt;/code&gt; default has caused PII leaks between sessions. LangGraph gets closest with interrupt primitives but still no agent-level auth.&lt;/p&gt;

&lt;p&gt;We'll publish the implementation and a write-up when it ships.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;magic-ai-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magic_ai_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MagiC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capability&lt;/span&gt;

&lt;span class="c1"&gt;# Define a worker
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SummarizerBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@capability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# your LLM call here
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Start server + register worker
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;MagiC&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;bot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SummarizerBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SummarizerBot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;magic_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit_task&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required_capabilities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/kienbui1995/magic" rel="noopener noreferrer"&gt;kienbui1995/magic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome — especially on the benchmark methodology. We want to publish fair comparisons, not marketing numbers.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>go</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why I Built MagiC — And Why "Managing" AI Agents Is the Real Problem</title>
      <dc:creator>Kiên Bùi</dc:creator>
      <pubDate>Mon, 23 Mar 2026 03:47:53 +0000</pubDate>
      <link>https://dev.to/kienbui1995/why-i-built-magic-and-why-managing-ai-agents-is-the-real-problem-37i9</link>
      <guid>https://dev.to/kienbui1995/why-i-built-magic-and-why-managing-ai-agents-is-the-real-problem-37i9</guid>
      <description>&lt;p&gt;&lt;em&gt;March 23, 2026&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The $2M Logistics Disaster That Changed My Thinking
&lt;/h2&gt;

&lt;p&gt;A global logistics firm deployed two AI agents in early 2025: one for inventory procurement, one for dynamic warehouse pricing.&lt;/p&gt;

&lt;p&gt;Late Q4, a data lag caused the procurement agent to see "low stock" and over-order high-value components. Simultaneously, the pricing agent saw the incoming surplus and slashed prices to move volume.&lt;/p&gt;

&lt;p&gt;Result: $2M spent on premium freight to ship items they were selling at a loss.&lt;/p&gt;

&lt;p&gt;This wasn't a failure of AI logic — it was a failure of &lt;strong&gt;AI orchestration&lt;/strong&gt;. (&lt;a href="https://www.cio.com/article/4132287/taming-agent-sprawl-3-pillars-of-ai-orchestration.html" rel="noopener noreferrer"&gt;CIO Magazine, Feb 2026&lt;/a&gt;)&lt;/p&gt;




&lt;h2&gt;
  
  
  Everyone Is Building Agents. Nobody Is Managing Them.
&lt;/h2&gt;

&lt;p&gt;The numbers tell the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gartner predicts 40% of enterprise apps&lt;/strong&gt; will feature AI agents by end of 2026 — up from less than 5% in 2025 (&lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deloitte estimates the AI agent market at $8.5B&lt;/strong&gt; in 2026, growing to $35B by 2030 (&lt;a href="https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/ai-agent-orchestration.html" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;CIO Magazine warns of &lt;strong&gt;"agent sprawl"&lt;/strong&gt; — the new Shadow IT (&lt;a href="https://www.cio.com/article/4132287/taming-agent-sprawl-3-pillars-of-ai-orchestration.html" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the average enterprise, this translates to &lt;strong&gt;50+ specialized agents&lt;/strong&gt; — marketing bots, support bots, data bots, code review bots — each running independently. No coordination. No cost control. No visibility.&lt;/p&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Gap
&lt;/h2&gt;

&lt;p&gt;I looked at every major AI agent framework: CrewAI (44k GitHub stars), AutoGen (Microsoft), LangGraph (LangChain ecosystem).&lt;/p&gt;

&lt;p&gt;They all solve the same problem: &lt;strong&gt;how to BUILD an agent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But as teams scale to 10, 20, 50+ agents, a different problem emerges: &lt;strong&gt;how to MANAGE them working together&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;CrewAI&lt;/th&gt;
&lt;th&gt;AutoGen&lt;/th&gt;
&lt;th&gt;LangGraph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build agents&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manage any agent from any framework&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost tracking with auto-pause&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capability-based routing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human approval gates in workflows&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is an &lt;strong&gt;infrastructure problem&lt;/strong&gt;, not an agent-building problem.&lt;/p&gt;

&lt;p&gt;We've seen this pattern before: Docker made it easy to run containers. But you still needed &lt;strong&gt;Kubernetes&lt;/strong&gt; to manage fleets of containers in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  So I Built MagiC
&lt;/h2&gt;

&lt;p&gt;MagiC is an open-source framework for managing fleets of AI agents. Any agent, any framework, any language — orchestrated through one protocol.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         You (API)
          |
     MagiC Server (Go)
    /    |    |    \
ContentBot  SEOBot  LeadBot  CodeBot
(CrewAI)   (Custom) (LangChain) (Go)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MagiC &lt;strong&gt;doesn't build agents&lt;/strong&gt;. Your CrewAI agent becomes a MagiC worker. Your LangChain chain becomes a MagiC worker. They join the same organization and work together.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Worker in 10 Lines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;magic_ai_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt;

&lt;span class="n"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ContentBot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@worker.capability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize any text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summary: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What MagiC Handles
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Routing:&lt;/strong&gt; Submit a task with required capabilities. MagiC finds the best available worker — by capability match, cost, or load. Overloaded workers are automatically skipped.&lt;/p&gt;
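
&lt;p&gt;With the Python SDK, a routed submission looks like this (assuming &lt;code&gt;MagiCClient&lt;/code&gt; is importable and the server is on its default port):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from magic_ai_sdk import MagiCClient

client = MagiCClient(base_url="http://localhost:8080")
result = client.submit_task({
    "type": "summarize",
    "input": {"text": "..."},
    "routing": {"required_capabilities": ["summarize"]},  # MagiC picks the worker
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;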

&lt;p&gt;&lt;strong&gt;Cost Control:&lt;/strong&gt; Track spending per worker and team. Get alerts at 80% budget. Workers auto-pause at 100%. No more surprise LLM bills.&lt;/p&gt;
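
&lt;p&gt;The behavior, sketched (the server-side logic is Go; this illustrates the rule, not the SDK API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def record_spend(worker, cost_usd, alert):
    """Track per-worker spend; alert at 80% of budget, auto-pause at 100%."""
    worker["spent"] += cost_usd
    ratio = worker["spent"] / worker["budget"]
    if ratio &amp;gt;= 1.0:
        worker["paused"] = True   # no more tasks are routed here
    elif ratio &amp;gt;= 0.8:
        alert(f"{worker['name']} is at {ratio:.0%} of budget")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;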

&lt;p&gt;&lt;strong&gt;DAG Workflows:&lt;/strong&gt; Multi-step pipelines with parallel execution, dependency resolution, and failure handling. Step outputs automatically flow to downstream steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      research
       /    \
  content    leads       &amp;lt;- parallel execution
     |         |
    seo        |
      \       /
   [approval gate]       &amp;lt;- human reviews before proceeding
          |
     outreach
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Human-in-the-Loop:&lt;/strong&gt; Approval gates in workflows. Critical steps pause for human review before proceeding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Circuit Breaker:&lt;/strong&gt; If a worker fails 3 times consecutively, MagiC stops sending tasks for 30 seconds. Prevents cascading failures.&lt;/p&gt;
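
&lt;p&gt;The pattern, in sketch form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

class CircuitBreaker:
    """Stop routing to a worker after 3 straight failures; retry after 30 seconds."""
    def __init__(self, max_failures=3, cooldown=30.0):
        self.max_failures, self.cooldown = max_failures, cooldown
        self.failures, self.opened_at = 0, None

    def available(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at &amp;gt;= self.cooldown:
            self.failures, self.opened_at = 0, None   # half-open: try again
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures &amp;gt;= self.max_failures:
            self.opened_at = time.monotonic()         # open the circuit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;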

&lt;p&gt;&lt;strong&gt;Persistent Storage:&lt;/strong&gt; SQLite backend. Your data survives server restarts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technical Details
&lt;/h2&gt;

&lt;p&gt;MagiC is built in Go — the same language behind Kubernetes, Docker, and Traefik. Why Go?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast:&lt;/strong&gt; Goroutines handle thousands of concurrent tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small:&lt;/strong&gt; Single binary, no runtime dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proven:&lt;/strong&gt; The infrastructure world runs on Go&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Python SDK provides a clean developer experience for building workers. Any framework works — CrewAI, LangChain, or plain Python.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protocol: MCP² (MagiC Protocol)
&lt;/h3&gt;

&lt;p&gt;Transport-agnostic JSON messages. 14 message types covering worker lifecycle, task lifecycle, collaboration, and direct channels. Workers communicate via standard HTTP POST.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"protocol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task.assign"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org_magic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"worker_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"task_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarize"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Numbers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;9 modules&lt;/strong&gt;: Gateway, Registry, Router, Dispatcher, Monitor, Orchestrator, Evaluator, Cost Controller, Org Manager, plus a Knowledge Hub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;90 tests&lt;/strong&gt; passing with Go race detector&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero external Go dependencies&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7.8/10&lt;/strong&gt; independently verified technical score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0&lt;/strong&gt; license&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Who Is This For?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Today:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development teams running multiple AI agents that need coordination&lt;/li&gt;
&lt;li&gt;Teams spending too much on LLM APIs without visibility&lt;/li&gt;
&lt;li&gt;Anyone who wants to combine agents from different frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tomorrow:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agencies managing AI workflows for clients&lt;/li&gt;
&lt;li&gt;Enterprises orchestrating cross-department AI operations&lt;/li&gt;
&lt;li&gt;The "Kubernetes for AI agents" use case — as agent sprawl grows, orchestration becomes essential&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kienbui1995/magic.git
&lt;span class="nb"&gt;cd &lt;/span&gt;magic/core &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; go build &lt;span class="nt"&gt;-o&lt;/span&gt; ../bin/magic ./cmd/magic
./bin/magic serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; magic &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 magic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5 template workers included — summarizer, translator, classifier, extractor, generator. Clone and run immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/kienbui1995/magic" rel="noopener noreferrer"&gt;github.com/kienbui1995/magic&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Shift Is Coming
&lt;/h2&gt;

&lt;p&gt;Deloitte calls 2026 &lt;strong&gt;"an inflection point for agent orchestration."&lt;/strong&gt; CIO Magazine says the era of the &lt;strong&gt;"lone-wolf bot is over."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The shift from "build agents" to "manage agents" is inevitable. Just like we went from "run containers" to Kubernetes.&lt;/p&gt;

&lt;p&gt;MagiC is my bet on that shift. It's open-source, it's early, and it works. Star the repo if you find it useful.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/kienbui1995" rel="noopener noreferrer"&gt;Kien Bui&lt;/a&gt; — Builder of MagiC&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/ai-agent-orchestration.html" rel="noopener noreferrer"&gt;Deloitte TMT Predictions 2026: AI Agent Orchestration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cio.com/article/4132287/taming-agent-sprawl-3-pillars-of-ai-orchestration.html" rel="noopener noreferrer"&gt;CIO Magazine: Taming Agent Sprawl (Feb 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026" rel="noopener noreferrer"&gt;Gartner: 40% of Enterprise Apps Will Feature AI Agents by 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;LangChain State of Agent Engineering 2026&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>go</category>
      <category>python</category>
    </item>
  </channel>
</rss>
