DEV Community

Ricardo Frasson


Why I built a multi-agent build pipeline on top of the Claude Agent SDK

I love Claude Code. But on anything bigger than a one-file change, I kept hitting the same wall: the agent starts strong, then drifts. Context fills up with planning chatter, exploration and half-built scaffolding. By the time it's writing the third file, it has forgotten the acceptance criteria it agreed to in turn one.

The usual workaround is to clear the context yourself between phases, save progress to files, and manually re-feed what matters. It works, but it's babysitting, and it doesn't scale to the kind of "build me this feature" workflow I actually want.

So I built MagesticAI — an open-source web UI that runs Claude as a pipeline of fresh-session agents instead of one long-running session.

GitHub: github.com/dataseeek/MagesticAI (AGPL-3.0)

Where this came from

MagesticAI didn't start from scratch. It grew out of Aperant by @AndyMik90 — an excellent AGPL-3.0 desktop tool for autonomous multi-session AI coding. Aperant gave me the core idea (fresh-session agents per phase) and a lot of the early scaffolding.

But the first time I cloned it, I was hoping for a cloud version that runs on Linux servers, not a desktop app.

What MagesticAI adds on top:

  • A browser-based web UI instead of a desktop app, so you can run it on a remote server and drive it from any device. Support for adding users and teams is planned but not implemented yet.

  • An intent-aware merge system for parallel agent work (more on this below)

  • Multi-provider routing beyond Anthropic — Codex, Gemini, OpenAI-compatible endpoints, Ollama, local models

  • A BMad Method integration for scale-adaptive planning (complexity detection, story-based plans, architecture-first for big work)

Big credit to Andy for the original. If you want a slick desktop experience instead of a server-based one, go check out Aperant.

What it does

You write a task description in the browser. MagesticAI then:

  1. Planner turns it into a spec + subtask list (its own fresh session)
  2. Coder implements each subtask in an isolated git worktree (fresh session)
  3. QA Reviewer validates against acceptance criteria (fresh session)
  4. QA Fixer loops back on whatever failed (fresh session)

Every phase runs in a brand-new Claude Agent SDK session. Context bloat from planning doesn't poison coding. Tangents during coding don't poison QA. The handoff between phases is a small set of structured files (spec.md, implementation_plan.json, qa_report.md), not a 50k-token transcript.
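Here is a minimal Python sketch of the four phases that shows how the hand-off works. It assumes a hypothetical `run_session(role, inputs)` callable that starts a brand-new agent session, shows it only the files in `inputs`, and returns the files it wrote. The `{"subtasks": [...]}` plan shape and the "report starts with PASS" verdict are illustrative assumptions, not MagesticAI's actual formats. The fake session at the bottom stands in for the Claude Agent SDK.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict

# Hypothetical: each call starts a brand-new agent session that sees ONLY `inputs`
# (file name -> contents) and returns the files it wrote.
SessionFn = Callable[[str, Dict[str, str]], Dict[str, str]]


def run_pipeline(task: str, workdir: Path, run_session: SessionFn, max_fix_rounds: int = 3) -> bool:
    def read(*names: str) -> Dict[str, str]:
        return {n: (workdir / n).read_text() for n in names if (workdir / n).exists()}

    def write(outputs: Dict[str, str]) -> None:
        for name, text in outputs.items():
            (workdir / name).write_text(text)

    # 1. Planner: task in, spec.md + implementation_plan.json out.
    write(run_session("planner", {"task.md": task}))
    plan = json.loads((workdir / "implementation_plan.json").read_text())

    # 2. Coder: one fresh session per subtask, fed the handoff files, never a transcript.
    for subtask in plan["subtasks"]:
        write(run_session("coder", {**read("spec.md", "implementation_plan.json"), "subtask.txt": subtask}))

    # 3-4. QA reviewer judges against the spec (a real one would also inspect the code);
    # the QA fixer gets only the spec and the failing report.
    for attempt in range(max_fix_rounds + 1):
        write(run_session("qa_reviewer", read("spec.md", "implementation_plan.json")))
        if (workdir / "qa_report.md").read_text().startswith("PASS"):
            return True
        if attempt < max_fix_rounds:
            write(run_session("qa_fixer", read("spec.md", "qa_report.md")))
    return False


# Usage with a fake session standing in for a real agent.
calls = []


def fake_session(role: str, inputs: Dict[str, str]) -> Dict[str, str]:
    calls.append((role, sorted(inputs)))  # record which files each session was given
    if role == "planner":
        return {"spec.md": "# Spec\n", "implementation_plan.json": json.dumps({"subtasks": ["api", "ui"]})}
    if role == "qa_reviewer":
        fixed = any(r == "qa_fixer" for r, _ in calls)
        return {"qa_report.md": "PASS\n" if fixed else "FAIL: no tests\n"}
    return {}


with tempfile.TemporaryDirectory() as d:
    result = run_pipeline("Add a settings page", Path(d), fake_session)

print(result, [role for role, _ in calls])
# True ['planner', 'coder', 'coder', 'qa_reviewer', 'qa_fixer', 'qa_reviewer']
```

What matters is what each call receives. In this sketch the coder never sees task.md or the planner's reasoning. It gets only spec.md, implementation_plan.json and its own subtask.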

The whole thing runs in an isolated git worktree on a magestic-ai/<spec> branch, so nothing touches your working tree until you explicitly merge.
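If you haven't used git worktrees before: each spec gets its own checkout directory on its own branch, and that is what keeps your working tree untouched. Below is a minimal sketch of the underlying command. Only the `magestic-ai/<spec>` branch prefix comes from the project; the `spec_slug` rule and both helper functions are hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import List


def spec_slug(spec: str) -> str:
    """Hypothetical slug rule: lowercase, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", spec.lower()).strip("-")


def worktree_command(repo: Path, spec: str, worktrees_root: Path) -> List[str]:
    """Build (but don't run) the git command that gives one spec its own checkout and branch."""
    slug = spec_slug(spec)
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"magestic-ai/{slug}",  # new branch, created from the repo's current HEAD
        str(worktrees_root / slug),   # separate directory; your own working tree is untouched
    ]


def create_worktree(repo: Path, spec: str, worktrees_root: Path) -> Path:
    subprocess.run(worktree_command(repo, spec, worktrees_root), check=True)
    return worktrees_root / spec_slug(spec)
```

In plain git terms, bringing the work back means merging that branch from your main checkout. Afterwards, `git worktree remove <path>` deletes the extra checkout.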

The bit I'm most proud of: intent-aware merging

When you run multiple specs in parallel, you get conflicts. The naive answer is "ask the LLM to merge it." That's expensive and unreliable.

What's in apps/backend/merge/ instead is a layered resolver:

  1. Tree-sitter semantic diff — extracts what each task meant to change (added a function, modified imports, wrapped a component), not just text-level overlap.
  2. Deterministic strategies — append functions, merge imports, combine React props, compose hooks, reorder by dependency. Zero LLM calls for the common cases.
  3. AI resolver as fallback — only genuinely ambiguous conflicts go to Claude, and even then with minimal context: just the conflict region, each task's one-sentence intent, the semantic change, and the baseline. No whole-file dumps.
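Below is a much-simplified Python sketch of the layering, and it is not the code in apps/backend/merge/. It swaps tree-sitter for Python's built-in `ast` module, so it only understands Python files. It implements just two deterministic strategies (merge imports, append new functions) and escalates everything else. The `resolve` function, the `SemanticChange` shape and the escalation payload are illustrative assumptions.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass, field

IMPORT = (ast.Import, ast.ImportFrom)
FUNC = (ast.FunctionDef, ast.AsyncFunctionDef)


@dataclass
class SemanticChange:
    """What one task changed relative to the shared baseline (illustrative shape)."""
    added_imports: list[str] = field(default_factory=list)
    added_functions: dict[str, str] = field(default_factory=dict)  # name -> source
    touched_existing: set[str] = field(default_factory=set)        # baseline items edited/removed


def _summarize(source: str):
    """Layer 1 stand-in: split top-level statements into imports, plain functions, and the rest."""
    imports, funcs, other = [], {}, []
    for node in ast.parse(source).body:
        seg = ast.get_source_segment(source, node)
        if isinstance(node, IMPORT):
            imports.append(seg)
        elif isinstance(node, FUNC) and not node.decorator_list:
            funcs[node.name] = seg
        else:
            other.append(seg)  # classes, decorated functions, assignments, ...
    return imports, funcs, other


def semantic_diff(baseline: str, branch: str) -> SemanticChange:
    b_imp, b_fn, b_other = _summarize(baseline)
    n_imp, n_fn, n_other = _summarize(branch)
    change = SemanticChange(
        added_imports=[i for i in n_imp if i not in b_imp],
        added_functions={k: v for k, v in n_fn.items() if k not in b_fn},
    )
    change.touched_existing |= {k for k, v in b_fn.items() if n_fn.get(k) != v}
    if any(i not in n_imp for i in b_imp):
        change.touched_existing.add("<imports>")
    if b_other != n_other:
        change.touched_existing.add("<module body>")  # anything else: be conservative
    return change


def merge_additions(baseline: str, a: SemanticChange, b: SemanticChange) -> str:
    """Layer 2: union the new imports (deduplicated), append the new functions."""
    lines = baseline.rstrip("\n").split("\n")
    ends = [n.end_lineno for n in ast.parse(baseline).body if isinstance(n, IMPORT)]
    at = max(ends, default=0)  # insert right after the last baseline import
    lines[at:at] = list(dict.fromkeys(a.added_imports + b.added_imports))
    for src in {**a.added_functions, **b.added_functions}.values():
        lines += ["", "", src]
    return "\n".join(lines) + "\n"


def resolve(baseline: str, ours: str, theirs: str, intent_ours: str, intent_theirs: str) -> dict:
    a, b = semantic_diff(baseline, ours), semantic_diff(baseline, theirs)
    clash = {k for k in a.added_functions.keys() & b.added_functions.keys()
             if a.added_functions[k] != b.added_functions[k]}
    if not (a.touched_existing or b.touched_existing or clash):
        return {"merged": merge_additions(baseline, a, b)}  # zero LLM calls
    # Layer 3: hand the model intents, semantic changes and affected baseline regions only.
    _, base_fns, _ = _summarize(baseline)
    names = a.touched_existing | b.touched_existing | clash
    return {"needs_ai": {
        "intents": [intent_ours, intent_theirs],
        "changes": [a, b],
        "baseline_regions": {k: base_fns[k] for k in names if k in base_fns},
    }}


BASE = "import os\n\n\ndef a():\n    return 1\n"
ours = "import os\nimport sys\n\n\ndef a():\n    return 1\n\n\ndef b():\n    return 2\n"
theirs = "import os\nimport json\n\n\ndef a():\n    return 1\n\n\ndef c():\n    return 3\n"
clean = resolve(BASE, ours, theirs, "add b()", "add c()")
conflict = resolve(BASE, BASE.replace("1", "10"), BASE.replace("1", "20"), "return 10", "return 20")
```

Two tasks that each add an import and a new function merge without a model call. Two tasks that both edit `a()` instead produce an escalation payload with the two intents and only the baseline `a()`, not whole files.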

The design goal is maximum automation with as few tokens as possible. Most merges never call an LLM at all.

Local models if you want them

MagesticAI also supports the Codex and Gemini CLIs, plus any OpenAI-compatible endpoint: LM Studio, vLLM, OpenRouter, Together, Groq, Ollama. That's useful if you want a local model (at least 27B parameters) running the QA loop while Claude does the heavy coding. The provider abstraction lives in apps/backend/providers/, and selection is driven by the model string.
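The post says only that the model string drives provider selection, so the following is a guess at the general shape, not the actual code in apps/backend/providers/. This minimal Python sketch assumes a hypothetical `<prefix>/<model>` convention. Prefixed strings route to an OpenAI-compatible base URL, and plain `claude-` model IDs go to Anthropic. CLI-backed providers (Codex, Gemini) are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    base_url: Optional[str] = None  # None: use the provider SDK's default endpoint


# Hypothetical "<prefix>/<model>" convention. The URLs are the usual default
# OpenAI-compatible endpoints for each tool; adjust for your own setup.
OPENAI_COMPATIBLE = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}


def route(model_string: str) -> Route:
    prefix, sep, rest = model_string.partition("/")  # split on the FIRST slash only
    if sep and prefix in OPENAI_COMPATIBLE:
        return Route("openai-compatible", rest, OPENAI_COMPATIBLE[prefix])
    if model_string.startswith("claude-"):
        return Route("anthropic", model_string)
    raise ValueError(f"no provider registered for {model_string!r}")
```

Keying on the string means moving the QA loop to a local model is a configuration change rather than a code change.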

Try it

git clone https://github.com/dataseeek/MagesticAI
cd MagesticAI
npm run install:all

# Backend (port 3101), in one terminal from the repo root
cd apps/web-server && source .venv/bin/activate && python -m server.main

# Frontend (port 3100), in a second terminal from the repo root
cd apps/frontend-web && npm run dev

Then open http://localhost:3100 and add a project folder.

What's next

Honest list of what's not there yet:

  • No formal eval harness (SWE-bench style). Today, "did it work" means the QA agent's verdict plus your manual review.

  • No per-subtask test-run receipt pinned to each step. The QA report is per-spec, not per-subtask, and I want to fix that.

If any of that sounds interesting to work on, the issue tracker is open. Contributions welcome — branch from dev, PRs target dev.

A question for you

How are you handling context exhaustion on long features today: discipline, subagents, fresh sessions, something else?

I'd genuinely like to compare notes.


MagesticAI is AGPL-3.0, based on Aperant. Repo: github.com/dataseeek/MagesticAI
