When an AI Agent joins your Yjs room

#webdev #frontend #ai #javascript

Wiring an LLM as a first-class Yjs peer is architecturally sound — but it invalidates three silent assumptions your collaboration stack already makes about peer symmetry: throughput, undo ownership, and presence cadence.

You've tuned a Yjs provider under real collaborative load. You know the feeling before you can name it — one heavy client starts lagging the room, presence updates stutter, and you end up adding a debounce somewhere and calling it done.

Now imagine that client generates text at 3,000 words per minute, never goes offline, and has its own awareness cursor.

That's not a sidebar feature. That's a new class of peer, and your collaboration architecture wasn't designed for it.

The Demo Is Real — But It Skips the Hard Parts

In April 2026, a working demo wired an LLM as a genuine server-side Yjs document peer — same transport as the human editors, same CRDT, its own awareness state. The implementation uses y-prosemirror and the standard awareness protocol directly. If you've shipped TipTap collaboration, you already have every dependency it needs.

The architecture is correct. Making the agent a server-side peer — rather than a client-side bolt-on posting diffs over a REST endpoint — gives you one convergence model instead of two, real presence semantics for the agent, and a clean separation between the LLM streaming layer and the document state layer.

But the demo establishes the peer model. It doesn't stress-test what happens to your existing assumptions once that peer is running.

The Silent Assumption Every CRDT Implementation Makes

Here it is — the assumption baked into the Yjs awareness protocol, the undo manager, and your backpressure strategy, the one nobody wrote down because it was always true until now:

All peers produce operations at roughly human speed.

Not identical speed. Human typists vary. But they land in the same order of magnitude. The entire design space — how often you broadcast awareness, how you scope undo history, whether you need per-peer rate limiting at the application layer — rests on that implicit contract.

An AI agent at 1,000–4,000 words per minute writes 25–100× faster than a 40-wpm human typist. It doesn't just stress your transport. It invalidates the mental model.
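The arithmetic behind that multiplier can be sketched in a few lines. Both constants are assumptions for illustration: a 40 wpm human baseline (the value that makes the 25–100× range work out) and a rule-of-thumb five characters per word. Neither comes from a benchmark.

```typescript
// Assumed constants for illustration only.
const HUMAN_WPM = 40;      // assumed typical human typing speed
const CHARS_PER_WORD = 5;  // rule-of-thumb average word length

// How many times faster than the human baseline a peer writes.
function speedup(agentWpm: number, humanWpm: number = HUMAN_WPM): number {
  return agentWpm / humanWpm;
}

// Rough character throughput for a given words-per-minute figure.
function charsPerSecond(wpm: number): number {
  return (wpm * CHARS_PER_WORD) / 60;
}

console.log(speedup(1000));        // 25
console.log(speedup(4000));        // 100
console.log(charsPerSecond(3000)); // 250
```

At 3,000 wpm the agent emits roughly 250 characters per second under these assumptions, against a little over 3 per second for the human baseline.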

Here's what actually breaks.

1. Backpressure: The Chokepoint You Don't Have

A central OT server can throttle any client trivially — it's the authority, it controls the queue. A CRDT peer model has no natural chokepoint. That's the tradeoff you accepted when you chose Yjs, and it's usually fine because human peers self-limit.

An agent peer doesn't self-limit. Left unrestricted, its ydoc.transact() calls will flood the sync cycle and starve human-paced operations of their share of the convergence window. This is write starvation, the same failure you see when one heavy writer crowds out every other transaction in a database, and it manifests as cursor lag and dropped presence updates for everyone else in the room.

The fix doesn't belong at the transport layer. It belongs between the LLM's streaming output and the Yjs document write:

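// Token bucket between the LLM stream and the Yjs write.
// TokenBucket is an application-level helper you supply; Yjs doesn't ship one.
const agentBucket = new TokenBucket({
  capacity: 50,     // burst size: max ops the agent can write back-to-back
  refillRate: 10,   // ops per 100ms; keeps the agent from crowding out human edits
});

// Event handlers fire as soon as each token arrives, so chain the writes on one
// promise. Tokens are then written strictly in arrival order, one at a time.
let writeQueue: Promise<void> = Promise.resolve();

llmStream.on('token', (token: string) => {
  writeQueue = writeQueue.then(async () => {
    await agentBucket.consume(1);
    ydoc.transact(() => {
      ytext.insert(insertionPoint, token);
      insertionPoint += token.length; // advance so the next token lands after this one
    }, agentOrigin);
  });
});
// If humans can edit above the insertion point mid-stream, track it with a
// Y.RelativePosition instead of a plain index.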

The numbers are illustrative — tune them against your provider and room size. The point is that the rate limit lives at the application layer, scoped to the agent's origin, so human operations always get a guaranteed share of the convergence window regardless of how fast the model is generating.
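The snippet above assumes a TokenBucket class, which is not part of Yjs. A minimal sketch of one follows, assuming a fixed refill interval (100 ms by default, to match the comment above) and an injectable clock for testing. The class name, option names, and the tryConsume method are all hypothetical, chosen only to line up with the usage shown.

```typescript
// Hypothetical helper: not provided by Yjs or any Yjs provider.
interface TokenBucketOptions {
  capacity: number;          // maximum tokens held, i.e. the largest burst
  refillRate: number;        // tokens added per refill interval
  refillIntervalMs?: number; // interval length in ms (assumed default: 100)
  now?: () => number;        // injectable clock, mainly for tests
}

class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number;
  private readonly refillIntervalMs: number;
  private readonly now: () => number;

  constructor(opts: TokenBucketOptions) {
    this.capacity = opts.capacity;
    this.refillRate = opts.refillRate;
    this.refillIntervalMs = opts.refillIntervalMs ?? 100;
    this.now = opts.now ?? (() => Date.now());
    this.tokens = opts.capacity; // start full so the first burst is not delayed
    this.lastRefill = this.now();
  }

  // Credit tokens for every whole interval elapsed since the last refill.
  private refill(): void {
    const elapsed = this.now() - this.lastRefill;
    const intervals = Math.floor(elapsed / this.refillIntervalMs);
    if (intervals <= 0) return;
    this.tokens = Math.min(this.capacity, this.tokens + intervals * this.refillRate);
    this.lastRefill += intervals * this.refillIntervalMs;
  }

  // Non-blocking: take n tokens if available, otherwise report failure.
  tryConsume(n: number = 1): boolean {
    this.refill();
    if (this.tokens < n) return false;
    this.tokens -= n;
    return true;
  }

  // Blocking variant: wait one refill interval at a time until n tokens are free.
  async consume(n: number = 1): Promise<void> {
    if (n > this.capacity) {
      throw new RangeError("cannot consume more tokens than the bucket capacity");
    }
    while (!this.tryConsume(n)) {
      await new Promise<void>((resolve) => setTimeout(resolve, this.refillIntervalMs));
    }
  }
}
```

With capacity 50 and refillRate 10, the agent can burst 50 writes and then settles at about 100 writes per second. Whether that is slow enough depends on your provider and room size.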

This is also where the CRDT-vs-OT debate gets re-litigated in 2026. The peer model is still right for human collaboration. For AI agents specifically, you're adding a lightweight central constraint back in — not for correctness, but for fairness.

2. Undo History: The Origin Problem You Probably Already Have

Y.UndoManager scopes undo history by origin, through its trackedOrigins option. This is correct behavior and it's documented. But "correct" and "deliberate" aren't the same thing.

If the agent's operations share an origin with the user's, Ctrl+Z becomes a coin flip. If the agent gets its own origin — which it should — you now have a second question: should user-facing undo ever surface agent operations, and if so, in what order relative to the user's own history?

There's no universal answer, but there is a clear principle: give the agent a separate UndoManager with its own trackedOrigins, and expose agent-undo as a distinct UI affordance, not the default Ctrl+Z path.

const userUndoManager = new Y.UndoManager(ytext, {
  trackedOrigins: new Set([userOrigin]),
});

const agentUndoManager = new Y.UndoManager(ytext, {
  trackedOrigins: new Set([agentOrigin]),
});

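// User's Ctrl+Z only touches userUndoManager.
// "Reject AI suggestion" calls agentUndoManager.undo().
// These stacks don't interfere.
// Note: a transaction's origin is local to the Y.Doc where the transaction ran and
// is not sent over the wire, so agentUndoManager has to live in the agent's own
// process, and the reject action has to be relayed to it.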

This is the same design decision you face when adding comment marks or tracked-change marks to ProseMirror — marks that describe content rather than being content need a separate lifecycle from marks the user controls directly. The agent peer is the document-level version of that same pattern.

If you've ever had a user accidentally undo a comment thread someone else left, you've already felt this problem. The fix is the same: make the ownership boundary explicit at the manager level, not implicit in an if (origin === agentOrigin) return buried in a command handler.
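One way to keep that boundary explicit is a small router that maps each UI action to exactly one manager. This is a sketch under assumptions: the action names and the createUndoRouter helper are hypothetical, and the Undoable interface only requires an undo() method, which Y.UndoManager provides.

```typescript
// Minimal surface the router needs; Y.UndoManager's undo() method fits it.
interface Undoable {
  undo(): unknown;
}

// Hypothetical action names for the two UI affordances.
type UndoAction = "keyboard-undo" | "reject-ai-suggestion";

// Each action is bound to exactly one manager, so ownership is decided here
// rather than inside individual command handlers.
function createUndoRouter(managers: { user: Undoable; agent: Undoable }) {
  const routes: Record<UndoAction, Undoable> = {
    "keyboard-undo": managers.user,
    "reject-ai-suggestion": managers.agent,
  };
  return (action: UndoAction): void => {
    routes[action].undo();
  };
}
```

The Ctrl+Z keybinding would call the router with "keyboard-undo" and the reject button with "reject-ai-suggestion", so neither handler needs to know about origins.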

3. Presence and Awareness: Coalesce or Drown

The awareness protocol was designed for human-paced cursor updates. A few broadcasts per second per peer is normal; the rendering layer handles it fine.

An agent generating 3,000 wpm produces position changes at a rate no human can visually process. Broadcasting all of them is noise on the wire and in the React render cycle.

Two things to do. First, coalesce awareness updates on a fixed interval for agent peers — not per-operation:

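let pendingAwarenessUpdate: ReturnType<typeof setTimeout> | null = null;
let latestAgentPos = 0;

function updateAgentAwareness(pos: number) {
  latestAgentPos = pos; // always remember the newest position
  if (pendingAwarenessUpdate) return; // a broadcast is already scheduled
  pendingAwarenessUpdate = setTimeout(() => {
    // Publish the position as of when the timer fires, not when it was scheduled.
    provider.awareness.setLocalStateField('cursor', { anchor: latestAgentPos, head: latestAgentPos });
    pendingAwarenessUpdate = null;
  }, 300);
}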

Second, add a type field to the agent's awareness state so the rendering layer can distinguish it from a human cursor without conditional logic scattered across components:

provider.awareness.setLocalState({
  type: 'agent',
  streaming: true,
  name: 'AI Assistant',
  cursor: { anchor: insertionPoint, head: insertionPoint },
});
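On the rendering side, that type field can act as a discriminant so the agent-versus-human decision is made in one function. The sketch below is illustrative only: the field names mirror the awareness state above, while the interfaces, the cursorStyleFor helper, the label wording, and the 300 ms render interval are assumptions.

```typescript
// Hypothetical shapes mirroring the awareness state set above.
interface CursorRange {
  anchor: number;
  head: number;
}

interface AgentPresence {
  type: "agent";
  streaming: boolean;
  name: string;
  cursor: CursorRange;
}

// Human clients may not set a type at all, so it is optional here.
interface HumanPresence {
  type?: "human";
  name: string;
  cursor: CursorRange;
}

type Presence = AgentPresence | HumanPresence;

interface CursorStyle {
  label: string;               // text shown next to the cursor
  animate: boolean;            // whether to show a "writing" animation
  minRenderIntervalMs: number; // how often the UI should re-render this cursor
}

// Type guard: the single place where agent vs. human is decided.
function isAgent(p: Presence): p is AgentPresence {
  return p.type === "agent";
}

function cursorStyleFor(p: Presence): CursorStyle {
  if (isAgent(p)) {
    return {
      label: p.streaming ? `${p.name} is writing` : p.name,
      animate: p.streaming,
      minRenderIntervalMs: 300, // assumed to match the coalescing interval
    };
  }
  return { label: p.name, animate: false, minRenderIntervalMs: 0 };
}
```

Components then consume a CursorStyle and never inspect the type field themselves.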

"AI is writing" and "another person is typing" are different affordances. They deserve different visual treatments and different update rates. Encoding that distinction in the awareness state lets the rendering layer make the right call in one place.

What This Means for Your RFC

The agent-as-peer pattern is the right architecture. Connecting the LLM to Yjs is not the hard part.

The hard part is going back through every assumption your collaboration system makes about peer symmetry and making those assumptions explicit — so you can break them deliberately for the agent peer without breaking them for everyone else.

Concretely: your backpressure strategy assumed no single peer can dominate the convergence cycle, so it needs an application-layer token bucket scoped to the agent's origin. Your undo history assumed all tracked origins belong to the user, so the agent needs a separate UndoManager surfaced as a distinct UI action. Your awareness rendering assumed cursor updates arrive at human speed, so agent presence needs coalescing and a type discriminant in the awareness state.

None of these are hard to implement once you've named them. The risk is shipping the integration without naming them and finding the failure modes through user reports six weeks later when the collaborative load is real and the undo history is a mess.

Treat rate limiting, undo isolation, and presence coalescing as first-class line items in the RFC. Not edge cases caught in code review.

The April 2026 demo and companion repo are at electric.ax/blog/2026/04/08/ai-agents-as-crdt-peers-with-yjs. The y-prosemirror + awareness setup maps directly onto a TipTap stack — worth reading alongside the Yjs UndoManager docs if you're planning the integration.
