Juan Torchia

Posted on • Originally published at juanchi.dev

Zed Parallel Agents: I Tested Them in My Real Workflow — Here's What Changed (and What Didn't)


A water pipe has a fixed diameter. You can put ten pumps in parallel, each one pushing harder, and the flow coming out the other end will be exactly the same. The problem was never the water's speed — it was the width of the pipe.

Parallel agents in Zed is basically that. And once you see it that way, the promise of "multiple agents working at the same time" starts to sound very different.

Zed hit 229 points on Hacker News with this feature. The discussion was long, enthusiastic, full of people already using it on real projects. I saw it while reviewing logs from CrabTrap, my LLM-as-a-judge proxy that's been running in production for months. I thought: I have my own setup, I have my own numbers, I can make this comparison with something concrete. What follows is exactly that.

What Zed Parallel Agents Are (and Why the Design Matters)

Zed lets you launch multiple agent instances against the same codebase simultaneously, with separate contexts. Each agent sees its own context window, works on its own branch or set of files, and the results get integrated afterward. It's a "fork and merge" model applied to inference.

The design is elegant. The problem it attacks is real: when you have a big task — refactoring three modules, migrating types, running test coverage in parallel — executing it sequentially in a single agent carries enormous latency cost. One agent does module A, then module B, then module C. With parallel agents, you do all three at once.
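The latency win is plain concurrency. A toy TypeScript sketch (module names and timings invented for illustration) shows why three independent tasks finish in roughly one task's wall-clock time when forked in parallel:

```typescript
// Toy model: each "agent run" is an async task with some inference latency.
async function runAgent(module: string, ms: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, ms));
  return `${module}: done`;
}

// Sequential: module A, then B, then C. Latencies add up (~150 ms here).
async function sequential(): Promise<string[]> {
  const results: string[] = [];
  for (const m of ['auth', 'metrics', 'db']) {
    results.push(await runAgent(m, 50));
  }
  return results;
}

// Parallel "fork and merge": launch all three, await the merge (~50 ms here).
async function parallel(): Promise<string[]> {
  return Promise.all(['auth', 'metrics', 'db'].map((m) => runAgent(m, 50)));
}
```

The sketch assumes the tasks are truly independent; that assumption is exactly what breaks down later in this post.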

My thesis before I started: parallelization solves the wrong problem if your bottleneck is context, not speed.

My Current Setup: Claude Code + CrabTrap + Railway

Before showing the comparison, context matters. I'm not coming to agents from zero.

I have running in production:

  • Claude Code as my primary development agent, on my Next.js + TypeScript + PostgreSQL stack
  • CrabTrap as an LLM-as-a-judge proxy that evaluates outputs before they hit production
  • Everything deployed on Railway, with structured logs that let me measure tokens per real task

This setup evolved over months. I documented part of that process when I compared costs against Google TPU v8 and found that marketing numbers don't hold up against real workloads.

What I measure in each agent session:

# Extract metrics from a Claude Code session
# from Railway logs

railway logs --service crabtrap --since 2h | \
  grep '"type":"agent_turn"' | \
  jq '{
    turn: .turn,
    input_tokens: .usage.input_tokens,
    output_tokens: .usage.output_tokens,
    task: .task_label
  }'

Typical output from a medium refactor session:

{ "turn": 1, "input_tokens": 8420,  "output_tokens": 1203, "task": "context_analysis" }
{ "turn": 2, "input_tokens": 12840, "output_tokens": 2891, "task": "change_proposal" }
{ "turn": 3, "input_tokens": 18220, "output_tokens": 4102, "task": "implementation" }
{ "turn": 4, "input_tokens": 22100, "output_tokens": 891,  "task": "validation" }

The number that matters: input tokens at turn 3 are already at 18k. And this is a small task. For something touching three modules, I'm easily at 40–60k input tokens just from accumulated context.
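The growth is easy to see if you fold the session log into a total. Input tokens are re-sent every turn, so what you're billed for is the sum across turns, not the last turn's count — a small sketch over the four turns above:

```typescript
// Per-turn usage from the session log above.
const turns = [
  { turn: 1, input_tokens: 8420,  output_tokens: 1203 },
  { turn: 2, input_tokens: 12840, output_tokens: 2891 },
  { turn: 3, input_tokens: 18220, output_tokens: 4102 },
  { turn: 4, input_tokens: 22100, output_tokens: 891 },
];

// Context accumulates, so each turn re-sends everything before it.
// Total billed input is the sum of per-turn input counts.
const billedInput = turns.reduce((sum, t) => sum + t.input_tokens, 0);
console.log(billedInput); // 61580 input tokens for a "small" task
```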

What Zed Parallel Agents Actually Changes (With Evidence)

I tested Zed against three concrete scenarios from my real codebase.

Scenario 1: TypeScript Type Migration in Independent Modules

I had three modules with no direct dependency between them — authentication, metrics, and the database client — that needed to migrate from any to strict types. With my usual Claude Code flow, I did it sequentially. Estimated total time: ~45 minutes of inference, 3 sessions.

With Zed parallel agents: I launched three simultaneous agents, one per module. Real total time: ~18 minutes. All three modules finished in parallel, integration took 4 minutes of manual review.

This is real speed. No argument there.

Scenario 2: Test Coverage on Code With Cross-Dependencies

Here's where the first problem showed up. I asked two parallel agents to write tests for two services that share a validation helper. The result:

// Agent 1 generated this in validationService.test.ts
// Mocked the helper one way
jest.mock('../utils/validatePayload', () => ({
  validatePayload: jest.fn().mockReturnValue({ valid: true })
}));

// Agent 2 generated this in paymentService.test.ts
// Mocked the same helper a different way
jest.mock('../utils/validatePayload', () => ({
  validatePayload: jest.fn().mockImplementation((data) => {
    if (!data.amount) throw new Error('missing amount');
    return { valid: true };
  })
}));

Two incompatible mocks of the same module. Neither is wrong in isolation — the problem only surfaced during integration. I had to review both files, understand what each agent had assumed, and pick a convention.

The time I saved on inference I spent on review. Net difference: almost zero.
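In hindsight, the conflict was avoidable with a convention the agents couldn't diverge from: a shared mock factory that both test files import. A minimal sketch — the factory name and file layout are my own, not something Zed or Jest prescribes:

```typescript
// testUtils/mockValidatePayload.ts (hypothetical shared location)
// Single source of truth for how tests fake the helper, so two agents
// working in parallel cannot each invent their own incompatible mock.
export function makeValidatePayloadMock() {
  return (data: { amount?: number }) => {
    if (data.amount === undefined) {
      throw new Error('missing amount');
    }
    return { valid: true };
  };
}

// Then each test file wires it in the same way, e.g.:
// jest.mock('../utils/validatePayload', () => ({
//   validatePayload: jest.fn(makeValidatePayloadMock()),
// }));
```

This doesn't make the agents coordinate — it just narrows the space of assumptions they can make. You still have to tell both agents the convention exists.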

Scenario 3: Refactor Where Context Actually Matters

I wanted to refactor error handling in my API — something that touches middlewares, handlers, and the Railway client at the same time. Here, parallelization just doesn't apply. The agents need to see the state of the code after the previous agent made changes. It's sequential by nature.

I tried it anyway. The result was merge conflicts that took me longer to resolve than the original refactor would have. Lesson burned in permanently.

The Mistakes I Made (and the Bottleneck That Wasn't Speed)

After a week of testing, reviewing my logs gave me an uncomfortable feeling I wasn't expecting.

70% of my agent tasks are "scenario 3" style: work that depends on the system's state after each step. Migrations, architecture refactors, changes that propagate effects. For that 70%, parallel agents don't help — they actually generate coordination overhead.

The remaining 30% — independent tasks in modules with no shared state — genuinely benefits. A lot. The speedup is real and measurable there.

But the bottleneck that was actually killing me wasn't speed. It was degraded context. When an agent hits turn 4 with 22k accumulated input tokens, it starts losing coherence about decisions it made in turn 1. Parallel agents don't touch that problem — in fact, with separate contexts per agent, the problem multiplies: each agent has its own partial view of the system.

This connects to something I documented when I built CrabTrap: the problem with agents in production isn't how many things they can do simultaneously, it's how much coherence they maintain across a long session. Adding lanes to the highway doesn't fix the fact that every driver has a different map.

My CrabTrap setup intercepts and evaluates each output before it gets applied. Zed's parallel agents don't have that layer. For independent tasks, it doesn't matter. For interdependent tasks, it matters a lot.

A concrete number: in my scenario 2 tests, 40% of integration conflicts came from implicit assumptions each agent made about shared state. No agent was "wrong" — they were each incomplete, off on their own.

The Claude Code Pro plan pricing situation factors in here too. If you're using parallel agents with frontier models, the cost scales almost linearly. Three agents in parallel ≈ three times the token cost. For independent tasks where you gain real speed, that might be worth it. For tasks where you end up redoing the integration work, you're paying three times for the same result.
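The scaling is close to linear because each agent carries and re-sends its own full context. A back-of-envelope sketch — the per-million-token prices here are illustrative placeholders, not a quote from any provider:

```typescript
// Rough cost model: N parallel agents, each with its own context window.
// Prices are illustrative $/1M tokens, not real provider pricing.
function sessionCost(
  agents: number,
  inputTokensPerAgent: number,
  outputTokensPerAgent: number,
  inputPricePerM = 3,
  outputPricePerM = 15,
): number {
  const input = agents * inputTokensPerAgent * (inputPricePerM / 1e6);
  const output = agents * outputTokensPerAgent * (outputPricePerM / 1e6);
  return input + output;
}

// One agent vs three agents with the same per-agent usage:
sessionCost(1, 60_000, 9_000); // ≈ $0.315
sessionCost(3, 60_000, 9_000); // ≈ $0.945, three times the cost
```

The model ignores integration time, which scenario 2 showed can eat the entire speedup — that cost is in your hours, not your token bill.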


FAQ — Frequently Asked Questions About Zed Parallel Agents

Does Zed parallel agents work with any language model?

Zed lets you configure the model provider, so technically yes. In practice, behavior varies quite a bit. I tested primarily with Claude 3.5 Sonnet. With smaller models, the coordination overhead becomes more obvious because each agent has less capacity to infer the implicit state of the system.

What types of tasks benefit most from parallelization?

Tasks with clear module boundaries and no shared state dependencies. Type migrations in independent modules, test generation for decoupled services, documentation translation, linting and formatting. Anything you could do in separate branches without one agent needing to see what the other did.

Does Zed parallel agents replace a setup like Claude Code + CrabTrap?

They're not direct competitors. Zed gives you parallel speed. CrabTrap gives you coherence validation on the output. If your tasks are independent and you don't need intermediate evaluation, Zed is simpler. If you're working with tasks that propagate effects and you need a judgment layer before applying changes, you need something additional. I'd use them as complements, not substitutes.

How long does integrating results from multiple agents actually take?

It depends almost entirely on the degree of coupling between the tasks. In my scenario 1 (independent modules): 4 minutes of manual review. In my scenario 2 (shared dependency): over 25 minutes of conflict resolution. Integration overhead is the hidden cost that speed benchmarks don't show.

Does parallel agents solve the long-context problem in agent sessions?

No. That's the central point of this whole post. Separate contexts per agent means each one has its own window, but none of them has the complete picture. For tasks where systemic coherence matters, this can actually be worse than a single agent with long context. The degraded context problem — where input tokens accumulate and the agent loses coherence about its own earlier decisions — is still an open problem that parallelization doesn't touch.

Is it worth migrating my current setup to Zed to use parallel agents?

If you already have a flow that works, don't throw it out the window. The honest answer: add Zed for the tasks where it clearly wins (independent modules, clean-boundary work), and keep your existing setup for the rest. It's not a migration, it's an additional tool. Same thing I learned with technical debt decisions when evaluating new tools: the adoption cost isn't just setup time, it's the time to understand where it doesn't apply.


Two Different Problems That Keep Getting Confused

I studied Computer Science at UBA while working full time. Some classes I showed up to straight from the office, still in my work clothes. One of the things that took me the longest to really internalize early on was the difference between throughput and latency. You can have extremely high throughput and still wait forever if the bottleneck is in the wrong place.

Parallel agents in Zed improve the throughput of independent tasks. That's real, measurable, and in the right scenarios it's a genuine speedup. But the problem that actually hurts me in my agent workflows isn't throughput — it's context coherence across long sessions. And for that problem, running more agents in parallel is basically adding more pumps to a pipe with a fixed diameter.

My position after a week of testing: Zed parallel agents earns a place in my toolbox for a specific subset of tasks. It doesn't replace the stack I built. It complements it where it wins, and where it doesn't win, I don't use it. That's the honest comparison that the 229 HN points don't tell you.

If you're figuring out how to measure the real cost of your agent decisions beyond raw speed, the place to start is measuring tokens per task before adding more agents. Speed is a seductive metric. Context is the one that actually matters.


