Alessio Masucci
Posted on • Originally published at dev.to

I Tried to Run Symphony for Real on Rentello. It Broke in Exactly the Right Place.

A few weeks ago, I published a build diary about porting OpenAI's Symphony to Claude Code.

That piece was about getting the machine to exist.

This one is about what happened when I tried to trust it.

Not in a toy repo. Not with a fake ticket. Not with a carefully staged demo that only exercised the happy path. I wired the workflow into a real project, Rentello, pointed it at a real Linear board, and tried to make Codex, Claude, and Symphony-family orchestrators coexist on the same execution model.

That is when the most useful bug of the whole project appeared.

It wasn't a crash inside the orchestration loop.
It wasn't a retry bug.
It wasn't even a Linear label bug.

It was a hidden architectural assumption inside one innocent file: WORKFLOW.md.

Phase two of the same story

The first article was the infrastructure story.

I had taken OpenAI's Symphony architecture, ported the orchestration ideas to work with Claude Code, and documented the engineering journey: configuration, polling, workspaces, retries, MCP tooling, the terminal dashboard, the CLI quirks, the Linear GraphQL edges. It was the "can I build this?" phase.

This sequel starts after that.

Once the port existed, the obvious next question was: can this become a reliable workflow across agent surfaces?

I didn't just want a Claude-only orchestrator anymore.

I wanted a system where:

  • planning could happen in Codex App or Codex web
  • work could be split into Linear issues that were safe for parallel or sequential execution
  • exec:agent work could be handled by generic agent surfaces
  • exec:symphony work could be handled by Symphony-family orchestrators
  • the same repository could remain deterministic whether the active surface was Codex or Claude

That sounds tidy when you say it fast.

In practice, it meant turning "AI can work on tickets" into an actual contract.

The experiment: make the workflow explicit

The first big change was conceptual, not technical.

I stopped treating the workflow as an informal set of conventions and turned it into a real repository contract:

  • shared execution-owner labels: exec:agent and exec:symphony
  • wave labels like wave:1, wave:2
  • a stable issue template
  • a single persistent ## Workpad comment per issue
  • a shared machine-readable contract plus shared docs/templates

The important detail here is that I did not want separate "Codex rules" and "Claude rules" at the planning layer.

The workflow itself needed to be generic.

The orchestrator choice should affect how the work is run, not how the work is defined.

So I introduced a shared contract and kept runtime-specific prompts only where they belonged: under .codex/ and .claude/, as adapters to the same underlying rules.

That gave me a clean execution model:

  • exec:agent means "owned by the non-Symphony agent surface"
  • exec:symphony means "owned by a Symphony-compatible orchestrator"
  • blockedBy remains authoritative
  • wave labels are scheduling metadata, not permission to ignore dependencies

That was the theory.

Then I decided to test it for real.

The smoke test: two issues, one dependency, no excuses

I created a new Linear project called Agent Workflow Smoke Test.

Then I used the repo's planning contract to generate the smallest meaningful DAG I could think of:

  • MAS-26: exec:agent, wave:1, research, docs
  • MAS-27: exec:symphony, wave:2, research, docs, blockedBy MAS-26

The goal was deliberately low-risk: a docs-only smoke test with a real sequencing edge.

If the contract was sound, I should be able to verify:

  1. planning against the real repo
  2. issue creation in Linear with the right labels and dependency edges
  3. Codex App dispatch respecting exec:agent
  4. Symphony respecting exec:symphony
  5. a single persistent workpad model across the whole system

The first pass went well.

The repo-level checks passed.
The Linear issues were created correctly.
The labels were right.
The blockedBy relation was correct.

Then I moved MAS-26 into Todo and let Codex App handle the bootstrap.

That part worked exactly the way I wanted.

Codex App picked up only the exec:agent issue, moved it to In Progress, and created a single ## Workpad comment. More importantly, the next run reused that same comment and updated it in place instead of spamming the issue with milestone chatter.

That detail matters more than it sounds.

A lot of agent workflows die the death of a thousand comments. If every automation pass creates another "status update," the issue becomes unreadable. Reusing one persistent workpad is the difference between an automation system that looks operationally credible and one that looks like a bot farm.
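The "one persistent workpad" behavior boils down to find-or-create-then-update. Here is a minimal sketch of that rule against an in-memory comment list; the real system would go through Linear's comment API, and the dict shape is an assumption for illustration.

```python
# Minimal sketch of the single-workpad rule: reuse one "## Workpad" comment
# per issue instead of posting a new status comment on every pass.
# Comments are plain dicts here; a real implementation would call
# Linear's comment mutations instead.
WORKPAD_HEADER = "## Workpad"

def upsert_workpad(comments: list[dict], body: str) -> list[dict]:
    for comment in comments:
        if comment["body"].startswith(WORKPAD_HEADER):
            comment["body"] = f"{WORKPAD_HEADER}\n{body}"  # update in place
            return comments
    comments.append({"body": f"{WORKPAD_HEADER}\n{body}"})  # create exactly once
    return comments

issue_comments: list[dict] = []
upsert_workpad(issue_comments, "bootstrap: cloned repo, read contract")
upsert_workpad(issue_comments, "pass 2: docs drafted, awaiting review")

assert len(issue_comments) == 1            # still exactly one workpad
assert "pass 2" in issue_comments[0]["body"]
```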

(Screenshots: the Linear issue as created, and the notes inside the issue.)

At that point, the first conclusion felt obvious:

the workflow contract was working.

That conclusion was wrong.

The first false conclusion

Once the Codex App leg worked, I moved to the Symphony leg.

I launched openai/symphony against the repo's WORKFLOW.md.

And immediately noticed something odd: it was looking at the real Rentello project instead of the smoke-test one.

The reason was simple in hindsight.

WORKFLOW.md was doing two very different jobs at once:

  1. it was the shared execution contract
  2. it was the runtime entrypoint for a specific orchestrator, with a specific project slug and a specific launcher configuration

That meant my "shared workflow file" wasn't really shared at all.

It was secretly carrying environment-specific assumptions:

  • which Linear project to poll
  • which states count as active
  • which command to spawn for the agent runtime
  • what approval policy and sandbox mode to use

So before I even got to the deeper integration problem, I had already hit a design smell:

I was treating a runtime wrapper as if it were a universal contract.

(Screenshot: Symphony polling the wrong project slug.)

I worked around it temporarily with a copied workflow file that targeted the smoke-test project.

That got me to the real failure.

The real failure: not labels, not blockers, not Linear

The next visible symptom was noisy and misleading.

Symphony would pick up the smoke-test issue, then back off with timeouts and failed runs.

At first glance, it looked like the kind of thing that sends you down the wrong rabbit hole:

  • maybe the new exec:* label routing was wrong
  • maybe the issue state transitions were inconsistent
  • maybe the blockedBy logic had a bug

None of that was the root cause.

The actual problem was much deeper and much simpler:

openai/symphony expected a Codex-compatible app-server runtime, while the repo's WORKFLOW.md was still configured for Claude-style execution.

In other words, the orchestrator and the launcher contract disagreed about what "the agent" even was.

The workflow file still contained a Claude-oriented runtime block. openai/symphony expected something more like:

  • a Codex app-server command
  • a compatible approval policy
  • the sandbox settings the OpenAI implementation expects

So the visible failure was timeout and retry noise.
The real failure was a runtime mismatch.

That distinction matters.

If I had blamed the new workflow model, I would have "fixed" the wrong layer.

The issue was not exec:agent vs exec:symphony.
The issue was not the shared issue template.
The issue was not the workpad model.

The issue was that one literal WORKFLOW.md cannot honestly be the runtime entrypoint for both openai/symphony and a Claude-oriented Symphony port.

The fix that changed the architecture

This was the moment the architecture got better.

The correct lesson was not "make WORKFLOW.md more clever."

It was: stop asking one file to represent two incompatible runtime contracts.

So I split the model into three layers:

  1. a shared execution contract
  2. a shared Symphony instruction body
  3. runtime-specific workflow wrappers

Concretely, the repo now has:

  • a shared machine-readable contract for labels, states, and workpad expectations
  • shared docs/templates for planning and issue execution
  • a shared Symphony body template
  • WORKFLOW.openai.md
  • WORKFLOW.claude.md
  • a small render script to generate the wrappers from the shared body
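The render step can be sketched in a few lines: one shared Symphony body, one launcher block per runtime, stitched into `WORKFLOW.<runtime>.md`. The file names match the ones listed above, but the `{{RUNTIME_BLOCK}}` placeholder and the block contents are assumptions for illustration.

```python
# Hedged sketch of the render step: shared body + per-runtime launcher block
# -> WORKFLOW.openai.md / WORKFLOW.claude.md.
# The {{RUNTIME_BLOCK}} placeholder and block contents are assumptions.
from pathlib import Path

RUNTIME_BLOCKS = {
    "openai": "runtime: codex app-server\napproval: on-request\n",
    "claude": "runtime: claude code\napproval: interactive\n",
}

def render(shared_body: str, runtime: str) -> str:
    # The shared body carries the execution rules; only the launcher varies.
    return shared_body.replace("{{RUNTIME_BLOCK}}", RUNTIME_BLOCKS[runtime])

def render_all(shared_body: str, out_dir: Path) -> None:
    for runtime in RUNTIME_BLOCKS:
        (out_dir / f"WORKFLOW.{runtime}.md").write_text(render(shared_body, runtime))

body = "# Workflow\n{{RUNTIME_BLOCK}}\nShared execution rules go here.\n"
assert "codex app-server" in render(body, "openai")
assert "{{RUNTIME_BLOCK}}" not in render(body, "claude")
```

Because the wrappers are generated, a change to the shared body propagates to both runtimes, and the only hand-maintained divergence is the launcher block itself.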

That was the missing separation all along.

The shared workflow body defines how the issue should be executed.

The runtime wrapper defines how the orchestrator should launch the agent.

Those are not the same concern.

Trying to force them into one file had worked only because I was not yet testing both runtimes seriously enough.

(Diagram: the three-layer architecture split.)

Once I made that split, the system became easier to reason about immediately.

The repository now says, in effect:

  • the execution rules are shared
  • the runtime launcher is not

That is a much healthier contract.

The second failure: the GraphQL ghost was still waiting

Of course, fixing the runtime mismatch didn't magically make the full run clean.

After the split, Symphony could start correctly.
It could pick up MAS-27.
It could begin doing real work.

And then it hit another boundary failure: stale assumptions in the Linear GraphQL layer.

The logs told the story:

  • Field "identifier" is not defined by type "IssueFilter".
  • Cannot query field "blockedByIssues" on type "Issue".
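One way to catch this class of mismatch before a run dies mid-issue is a standard GraphQL introspection probe: ask the server which fields a type actually exposes, and fail fast if the client's assumptions are stale. The helper below works on an introspection response; wiring it to Linear's endpoint (auth, HTTP) is omitted, and the simulated `IssueFilter` fields are placeholders, not a claim about Linear's real schema.

```python
# Sketch: detect stale schema assumptions up front via GraphQL introspection
# instead of discovering them as mid-run errors. The response shape follows
# the standard __type introspection result; the Linear HTTP call is omitted.
INTROSPECT_QUERY = """
query TypeFields($name: String!) {
  __type(name: $name) { inputFields { name } fields { name } }
}
"""

def field_names(type_info: dict) -> set[str]:
    entries = (type_info.get("inputFields") or []) + (type_info.get("fields") or [])
    return {entry["name"] for entry in entries}

def missing_fields(type_info: dict, expected: set[str]) -> set[str]:
    """Fields the client assumes but the live schema does not define."""
    return expected - field_names(type_info)

# Simulated introspection result; the field list here is a placeholder,
# not Linear's actual IssueFilter definition.
issue_filter = {"inputFields": [{"name": "number"}, {"name": "team"}], "fields": None}
stale = missing_fields(issue_filter, {"identifier", "number"})
assert stale == {"identifier"}  # the field the old client still assumed
```

A probe like this turns "Field X is not defined" from a mid-run failure into a startup check with a readable error.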

This was a different class of bug entirely.

Now the runtime was correct, but the orchestrator's GraphQL expectations were out of sync with the actual Linear schema. The result was a partial execution: the issue moved into In Progress, but the run did not complete cleanly, and no workpad comment was created on MAS-27.

That was important to verify because, from the outside, a half-working orchestration system can look deceptively healthy.

The dashboard shows activity.
The issue state changes.
The agent session starts.

But the operational contract is still broken if the workpad never appears and the run dies on a schema edge.

This is a schema/client mismatch, not a workflow-contract bug.

What actually worked

This is the part I care about most, because I don't want the story to sound more broken than it really was.

A lot did work:

  • the shared execution-owner model with exec:agent and exec:symphony
  • the planning contract based on shared docs/templates
  • the Linear issue generation for MAS-26 and MAS-27
  • blocker gating via blockedBy
  • Codex App automation dispatching only exec:agent
  • in-place reuse of a single ## Workpad comment on Linear
  • Symphony respecting the exec:symphony lane
  • the runtime split into WORKFLOW.openai.md and WORKFLOW.claude.md

That is not a small list.

In fact, the test did exactly what a good smoke test should do:

it validated the architectural core, then exposed the next real boundary failure.

First the runtime wrapper problem.
Then the stale GraphQL problem.

That is progress.

The actual lesson

If you want cross-agent orchestration, the wrong abstraction is "one workflow file for everything."

What you actually need is:

  • one shared execution contract
  • one shared behavioral prompt body, if the execution model is the same
  • separate runtime launch wrappers for each orchestrator/runtime pair

The reason is simple.

Codex and Claude are not just different models. In this kind of system, they are different runtime surfaces with different launcher assumptions, different session protocols, and different orchestration expectations.

The workflow itself should stay stable across those surfaces.

The launcher should not.

That is the architectural correction this test forced me to make.

And in hindsight, it's exactly the kind of correction you only get from trying to run the system for real.

The happy path rarely teaches you where your abstractions are lying.
Live orchestration does.

Same workflow rules, different launchers.

Where this goes next

The runtime split is in.
The generic exec:agent / exec:symphony workflow is in.
The Codex App leg is validated.

The next cleanup is clear:

  • patch the stale Linear GraphQL assumptions in the Symphony runtime
  • rerun the MAS-27 path cleanly
  • verify the full end-to-end chain with both executor lanes

That will be the point where the system stops being "an interesting orchestration prototype" and starts becoming something I would trust against a real backlog.

And honestly, that is the part I find most interesting now.

Porting the orchestrator was fun.

Finding the seam between shared execution contracts and runtime-specific launch contracts was the part that actually made it usable.
