Rizwan Saleem

Posted on May 31

Building a Real-Time Collaborative Diff Engine: How We Cut Code Review Time by 47%

#react #typescript #frontend #webdev

By a Senior Engineer | May 2026 | 12 min read
Three years ago, our team of 22 engineers was spending an average of 4.2 hours per week per person in code review. Not reading code - waiting. Waiting for context to load, waiting for comments to sync, waiting for someone else to finish their review thread before adding your own. The async friction was killing our velocity.

So I built something. This post is the full story: the architecture decisions, the numbers, and everything I wish I had known before I started.

The Problem Worth Solving

Most teams accept review latency as inevitable. We did too, until I started tracking it.

After two weeks of instrumentation, the data was uncomfortable:

62% of PR round-trips took longer than 8 hours - not because reviews were hard, but because reviewers couldn't see live co-editing signals
38% of comments were made on lines that had already changed during the review session
Engineers with overlapping timezones still behaved as if they were async-only

The root cause was structural: our review tooling, like almost every tool on the market, is built on a request/response model. You load a snapshot of a diff, leave comments on it, and submit. Meanwhile the branch keeps moving, so the diff you reviewed is already stale.

We needed a living diff: one that updated in real time, showed who was looking at which hunk, and merged comment threads intelligently even as new commits landed.

Architecture Overview

The system I built has three layers:

┌─────────────────────────────────────────────────┐
│              Browser (React + Yjs)              │
│  ┌───────────────┐    ┌────────────────────┐    │
│  │  Diff Renderer │    │  Awareness Overlay │    │
│  │  (virtual DOM) │    │  (cursor/presence) │    │
│  └───────┬───────┘    └─────────┬──────────┘    │
└──────────┼──────────────────────┼───────────────┘
           │ WebSocket            │ WebRTC (peer)
┌──────────▼──────────────────────▼───────────────┐
│         Sync Gateway (Node + uWS)               │
│  ┌────────────────────────────────────────┐     │
│  │   Yjs CRDT Document per PR branch      │     │
│  └────────────────────────────────────────┘     │
└──────────┬──────────────────────────────────────┘
           │
┌──────────▼──────────────────────────────────────┐
│         Diff State Store (Redis Streams)        │
│   git diff output → tokenised hunk graph        │
└─────────────────────────────────────────────────┘

The critical design decision was representing the diff itself as a CRDT document using Yjs rather than a static data structure. This meant that when a new commit landed mid-review, we could apply the delta surgically rather than re-rendering the entire diff from scratch.
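To make "apply the delta surgically" concrete, here is a minimal sketch that works out which entries changed between two snapshots, so that only those entries need to be written into the shared document. It uses plain `Map`s as a stand-in for a Yjs shared map; `StoredHunk`, `HunkDelta`, and `computeDelta` are hypothetical names for illustration, not the production code.

```typescript
// Hypothetical stand-in for a hunk stored in the shared document.
interface StoredHunk {
  id: string;
  version: number;
}

interface HunkDelta {
  upserts: StoredHunk[]; // new hunks, or hunks whose version changed
  deletes: string[];     // ids present before but missing now
}

// Compare two snapshots and return only what changed between them.
function computeDelta(
  previous: Map<string, StoredHunk>,
  next: Map<string, StoredHunk>
): HunkDelta {
  const upserts: StoredHunk[] = [];
  const deletes: string[] = [];

  for (const [id, hunk] of next) {
    const old = previous.get(id);
    if (!old || old.version !== hunk.version) upserts.push(hunk);
  }
  for (const id of previous.keys()) {
    if (!next.has(id)) deletes.push(id);
  }
  return { upserts, deletes };
}

const before = new Map<string, StoredHunk>([
  ['a', { id: 'a', version: 1 }],
  ['b', { id: 'b', version: 1 }],
  ['d', { id: 'd', version: 1 }],
]);
const after = new Map<string, StoredHunk>([
  ['a', { id: 'a', version: 1 }], // unchanged, so left alone
  ['b', { id: 'b', version: 2 }], // changed
  ['c', { id: 'c', version: 1 }], // new
]);

const delta = computeDelta(before, after);
console.log(delta.upserts.map(h => h.id), delta.deletes);
```

With Yjs, the upserts and deletes would typically be applied inside a single `ydoc.transact(...)` call so that observers receive one update instead of one per hunk.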

The Hunk Graph: Diffing the Diff

The most technically interesting piece was the hunk graph. A standard git diff output looks like this:

@@ -14,7 +14,9 @@ function processOrder(order) {
-  const tax = order.total * 0.2;
+  const tax = calculateTax(order.total, order.region);
+  const shipping = resolveShipping(order);
   return { ...order, tax };

The problem: if a reviewer has pinned a comment to line 14 and a new commit shifts that function to line 18, the comment becomes orphaned. Tools like GitHub handle this with a position integer that becomes meaningless after a rebase.

My solution was to treat every hunk as a node in a directed acyclic graph, keyed not by line number but by a content fingerprint + structural context hash:

import { createHash } from 'node:crypto';

interface HunkNode {
  id: string;                  // first 16 hex chars of SHA-256(content + parentContext)
  content: string[];           // raw diff lines
  parentId: string | null;     // structural parent hunk
  baseLineStart: number;       // original line (informational only)
  headLineStart: number;
  comments: CommentRef[];      // attached comment IDs, not positions
  version: number;             // monotonic counter
}

// Fingerprint a hunk by its content and structural context, never by its line numbers.
function computeHunkId(lines: string[], parentContext: string): string {
  const payload = lines.join('\n') + '||' + parentContext;
  return createHash('sha256').update(payload).digest('hex').slice(0, 16);
}
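One plausible source for `baseLineStart`, `headLineStart`, and `parentContext` is the `@@` hunk header itself. The sketch below parses that header; `HunkHeader` and `parseHunkHeader` are hypothetical names, and using the trailing header text as the structural context is an assumption on my part rather than a description of the real engine.

```typescript
interface HunkHeader {
  baseLineStart: number;
  baseLineCount: number;
  headLineStart: number;
  headLineCount: number;
  context: string; // text after the second @@, often the enclosing function
}

// Unified diff hunk header: @@ -start[,count] +start[,count] @@ [context]
const HUNK_HEADER = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$/;

function parseHunkHeader(line: string): HunkHeader | null {
  const m = HUNK_HEADER.exec(line);
  if (!m) return null;
  return {
    baseLineStart: Number(m[1]),
    baseLineCount: m[2] === undefined ? 1 : Number(m[2]), // count defaults to 1 when omitted
    headLineStart: Number(m[3]),
    headLineCount: m[4] === undefined ? 1 : Number(m[4]),
    context: m[5].trim(),
  };
}

// The same hunk before and after it was pushed four lines down.
const original = parseHunkHeader('@@ -14,7 +14,9 @@ function processOrder(order) {');
const shifted = parseHunkHeader('@@ -18,7 +18,9 @@ function processOrder(order) {');

console.log(original?.baseLineStart, shifted?.baseLineStart);
console.log(original?.context === shifted?.context);
```

The line numbers differ but the context string does not, which is why an ID built from content plus context survives a shift that would invalidate a line-number key.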

When a new commit arrived, the diff engine reconciled old hunk IDs against the new DAG. Hunks that survived the commit (same content, shifted position) kept their comment threads. Hunks that were meaningfully changed got flagged as evolved, and comments were marked possibly stale rather than dropped entirely.

function reconcileHunks(
  previousGraph: Map<string, HunkNode>,
  incomingGraph: Map<string, HunkNode>
): ReconciliationResult {
  const survived: HunkNode[] = [];
  const evolved: [HunkNode, HunkNode][] = [];  // [old, new]
  const added: HunkNode[] = [];
  const removed: HunkNode[] = [];

  for (const [id, incoming] of incomingGraph) {
    if (previousGraph.has(id)) {
      survived.push(incoming);
    } else {
      // Check structural similarity - same parent context, different content
      const similar = findSimilarHunk(previousGraph, incoming);
      if (similar) {
        evolved.push([similar, incoming]);
      } else {
        added.push(incoming);
      }
    }
  }

  for (const [id, old] of previousGraph) {
    if (!incomingGraph.has(id) && !evolved.find(([o]) => o.id === id)) {
      removed.push(old);
    }
  }

  return { survived, evolved, added, removed };
}
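`findSimilarHunk` is not shown above. A minimal sketch of one way it could work, assuming "structurally similar" means "same parent hunk and enough shared lines": the Jaccard measure and the 0.5 threshold are my assumptions, and `HunkLike` is a hypothetical trimmed-down version of `HunkNode`.

```typescript
// Minimal subset of the HunkNode fields needed for matching.
interface HunkLike {
  id: string;
  content: string[];
  parentId: string | null;
}

// Jaccard similarity over the sets of distinct lines: |A ∩ B| / |A ∪ B|.
function lineSimilarity(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const line of setA) if (setB.has(line)) shared++;
  return shared / (setA.size + setB.size - shared);
}

// Return the most similar previous hunk with the same parent whose score
// is at least the threshold, or undefined when no candidate qualifies.
function findSimilarHunk<T extends HunkLike>(
  previousGraph: Map<string, T>,
  incoming: HunkLike,
  threshold = 0.5 // assumed cut-off; tune against real diffs
): T | undefined {
  let best: T | undefined;
  let bestScore = threshold;
  for (const candidate of previousGraph.values()) {
    if (candidate.parentId !== incoming.parentId) continue;
    const score = lineSimilarity(candidate.content, incoming.content);
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

const previous = new Map<string, HunkLike>([
  ['old1', { id: 'old1', parentId: null, content: [' a', '-b', '+c', ' d'] }],
  ['old2', { id: 'old2', parentId: null, content: ['+x', '+y'] }],
]);
const incoming: HunkLike = { id: 'new1', parentId: null, content: [' a', '-b', '+c2', ' d'] };

console.log(findSimilarHunk(previous, incoming)?.id);
```

A fuller version would probably also skip candidates whose ID still appears in the incoming graph, since those hunks have already survived unchanged and should not be claimed twice.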

This reconciliation ran in under 12ms for diffs up to 3,000 lines on a cold Node.js process - fast enough to trigger on every WebSocket message without throttling.

Real-Time Presence: Making Invisible Work Visible

The second core feature was reviewer awareness. We used Yjs's awareness protocol to broadcast which hunks were in each reviewer's viewport, and when they were last seen there.

// Client-side awareness setup
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

const ydoc = new Y.Doc();
const provider = new WebsocketProvider(
  'wss://review-sync.internal',
  `pr-${prId}`,
  ydoc
);

// Broadcast viewport state; the caller invokes this every 500ms
function broadcastViewport(visibleHunkIds: string[]) {
  provider.awareness.setLocalStateField('viewport', {
    hunks: visibleHunkIds,
    user: currentUser.id,
    timestamp: Date.now(),
  });
}

// Render other reviewers' positions
provider.awareness.on('change', () => {
  const states = Array.from(provider.awareness.getStates().values());
  const otherViewports = states
    // The user id lives inside the viewport field set above, not at the top level of the state
    .filter(s => s.viewport && s.viewport.user !== currentUser.id)
    .map(s => s.viewport);

  renderPresenceOverlay(otherViewports);
});

The UI rendered this as a subtle heat map: hunks that multiple reviewers had viewed recently glowed amber; hunks nobody had looked at stayed neutral. This single feature changed reviewer behaviour measurably - people stopped re-reviewing already-covered territory.
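The heat map boils down to counting distinct recent viewers per hunk. Here is a sketch under a few assumptions: the five-minute window, the "two or more viewers" rule, and the function names are mine, and the input is whatever list of viewport samples the client has collected (awareness itself only holds each client's latest state).

```typescript
interface ViewportState {
  hunks: string[];
  user: string;
  timestamp: number; // ms since epoch, as set by Date.now()
}

// Count distinct users who had each hunk in view within the recency window.
function countRecentViewers(
  viewports: ViewportState[],
  now: number,
  windowMs = 5 * 60 * 1000 // assumed recency window
): Map<string, number> {
  const viewers = new Map<string, Set<string>>();
  for (const v of viewports) {
    if (now - v.timestamp > windowMs) continue; // too old to count
    for (const hunkId of v.hunks) {
      let users = viewers.get(hunkId);
      if (!users) {
        users = new Set<string>();
        viewers.set(hunkId, users);
      }
      users.add(v.user);
    }
  }
  const counts = new Map<string, number>();
  for (const [hunkId, users] of viewers) counts.set(hunkId, users.size);
  return counts;
}

// "Multiple reviewers" is taken here to mean two or more.
function isHot(counts: Map<string, number>, hunkId: string): boolean {
  return (counts.get(hunkId) ?? 0) >= 2;
}

const now = 1_700_000_000_000;
const counts = countRecentViewers(
  [
    { user: 'alice', hunks: ['h1', 'h2'], timestamp: now - 1_000 },
    { user: 'bob', hunks: ['h1'], timestamp: now - 2_000 },
    { user: 'carol', hunks: ['h2'], timestamp: now - 3_600_000 }, // an hour ago, ignored
  ],
  now
);

console.log(isHot(counts, 'h1'), isHot(counts, 'h2'));
```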

Redis Streams as the Diff Event Bus

The sync gateway needed to handle bursty commit events without losing state. I chose Redis Streams over a full message queue because of the consumer group semantics and the sub-millisecond latency at our scale.

// Publishing a new diff event when CI detects a push.
// baseSha is the commit the PR branch is diffed against.
async function publishDiffUpdate(prId: string, baseSha: string, commitSha: string) {
  const diff = await git.diff(baseSha, commitSha);
  const hunkGraph = buildHunkGraph(diff);

  await redis.xadd(
    `pr:${prId}:diffs`,
    '*',   // auto-generated stream ID
    'commitSha', commitSha,
    'hunkGraph', JSON.stringify(hunkGraph),
    'timestamp', Date.now().toString()
  );
}

// Consuming in the sync gateway
async function consumeDiffEvents(prId: string) {
  const streamKey = `pr:${prId}:diffs`;
  let lastId = '$';  // start from newest

  while (true) {
    const results = await redis.xread(
      'COUNT', 10,
      'BLOCK', 1000,
      'STREAMS', streamKey, lastId
    );

    if (!results) continue;

    for (const [, messages] of results) {
      for (const [id, fields] of messages) {
        const hunkGraph = JSON.parse(fields[fields.indexOf('hunkGraph') + 1]);
        await applyDiffUpdate(prId, hunkGraph);
        lastId = id;
      }
    }
  }
}
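One fragile spot in the consumer is `fields.indexOf('hunkGraph') + 1`. ioredis hands back each entry's fields as a flat `[key, value, key, value, ...]` array, so a missing key makes the lookup return index 0, and a value that happens to equal `'hunkGraph'` makes it return the wrong slot. A small sketch of a pairwise conversion that avoids both (the `fieldsToRecord` helper is hypothetical, not part of ioredis):

```typescript
// Convert a flat [key, value, key, value, ...] array into an object.
// A trailing key without a value is ignored.
function fieldsToRecord(fields: string[]): Record<string, string> {
  const record: Record<string, string> = {};
  for (let i = 0; i + 1 < fields.length; i += 2) {
    record[fields[i]] = fields[i + 1];
  }
  return record;
}

const entry = fieldsToRecord([
  'commitSha', 'abc123',
  'hunkGraph', '[]',
  'timestamp', '1700000000000',
]);
console.log(entry.hunkGraph);

// A value that collides with a key name no longer confuses the lookup.
const tricky = fieldsToRecord(['note', 'hunkGraph', 'hunkGraph', '{"a":1}']);
console.log(tricky.hunkGraph);
```

Inside the loop above, this would replace the `indexOf` expression with something like `JSON.parse(fieldsToRecord(fields).hunkGraph)`, ideally after checking that the field is present.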

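We ran two consumer instances in a Redis consumer group for redundancy, which means reading with XREADGROUP and acknowledging with XACK rather than the plain single-consumer XREAD loop shown above. Failover happened in under 200ms with no message loss.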

The Numbers: What Actually Changed

We rolled this out to our team over a 6-week period in Q3 2024. Here is what we measured before and after:

Metric	Before	After	Change
Avg PR round-trip time	9.3 hours	4.9 hours	−47%
Stale comment rate	38%	6%	−84%
Reviews completed same session	21%	61%	+190%
Avg comments per PR	8.4	5.1	−39%
Reviewer overlap utilised	~0%	44%	+44pp

The stale comment reduction was the most significant quality improvement. Fewer stale comments meant fewer "this is already fixed" dismissals - which meant reviewers felt more respected, and authors trusted feedback more.

The drop in average comments per PR surprised me. I expected more comments as reviews got easier. What actually happened: reviewers could see someone else had already noted an issue (via the presence overlay), so they stopped piling on. Review became genuinely collaborative rather than competitive.
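The Change column can be re-derived from the Before and After columns. A small sketch (the function names are mine) that reproduces the rounded figures; the last row is reported in percentage points because a relative change from a baseline of roughly 0% is not meaningful:

```typescript
// Relative change from `before` to `after`, in percent, rounded to the nearest integer.
function relativeChangePct(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

// Absolute difference between two percentages, in percentage points.
function percentagePointChange(beforePct: number, afterPct: number): number {
  return afterPct - beforePct;
}

console.log(relativeChangePct(9.3, 4.9));   // PR round-trip time: -47
console.log(relativeChangePct(38, 6));      // stale comment rate: -84
console.log(relativeChangePct(21, 61));     // same-session reviews: 190
console.log(relativeChangePct(8.4, 5.1));   // comments per PR: -39
console.log(percentagePointChange(0, 44));  // reviewer overlap: 44
```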

Lessons Learned

1. CRDTs are worth the learning curve for collaborative tools. I spent three weeks understanding Yjs before writing a single line of product code. That investment paid off in months of avoided conflict-resolution bugs.

2. Fingerprint content, not position. Any system that anchors metadata to line numbers will rot when the underlying content shifts. Content-addressable hunk IDs were the single best architectural decision I made.

3. Presence changes behaviour without requiring enforcement. We never told reviewers to "cover different hunks." The visibility alone changed how they divided work. Good tooling nudges rather than mandates.

4. Redis Streams beat Kafka at our scale. Below ~50,000 events/day, Redis Streams gives you consumer groups, persistence, and replay without the operational overhead of a Kafka cluster. We only needed Kafka-grade infrastructure when we expanded to 200+ teams.

5. Measure the behaviour you want, not the metric you can see. PR merge time is easy to track. The metric that mattered was "did the reviewer see live feedback on a line they were actively writing a comment about?" - that required custom instrumentation from day one.

What I Would Do Differently

The hunk graph reconciliation works well, but it runs only on the gateway. If I rebuilt this today, I would push the reconciliation logic into a shared WASM module deployed identically in the browser and on the server, so the client can speculatively reconcile locally before the server confirms, reducing perceived latency further.

I would also invest earlier in a replay testing harness. Our Redis Streams setup made it easy to record real diff sequences, but we built the replay test runner six months in, after three subtle reconciliation bugs. Those bugs would have been caught in week two with the right harness.
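For a sense of what such a harness might look like, here is a minimal sketch: it replays a recorded sequence of hunk graphs through a reconciler and checks one invariant per step, namely that every incoming hunk ends up in exactly one of survived, evolved, or added. The types, the `replay` function, and both toy reconcilers are hypothetical; a real harness would load the sequence from the recorded stream and call the real `reconcileHunks`.

```typescript
interface Hunk { id: string; }
interface Result<T> { survived: T[]; evolved: [T, T][]; added: T[]; removed: T[]; }
type Reconcile<T> = (prev: Map<string, T>, next: Map<string, T>) => Result<T>;

// Replay consecutive graph pairs and collect invariant violations.
function replay<T extends Hunk>(graphs: Map<string, T>[], reconcile: Reconcile<T>): string[] {
  const failures: string[] = [];
  for (let step = 1; step < graphs.length; step++) {
    const result = reconcile(graphs[step - 1], graphs[step]);
    const classified = [
      ...result.survived,
      ...result.evolved.map(([, next]) => next),
      ...result.added,
    ];
    const seen = new Map<string, number>();
    for (const h of classified) seen.set(h.id, (seen.get(h.id) ?? 0) + 1);
    for (const id of graphs[step].keys()) {
      const n = seen.get(id) ?? 0;
      if (n !== 1) failures.push(`step ${step}: hunk ${id} classified ${n} times`);
    }
  }
  return failures;
}

// Toy reconciler that matches by id only, standing in for the real one.
const idOnlyReconcile: Reconcile<Hunk> = (prev, next) => {
  const result: Result<Hunk> = { survived: [], evolved: [], added: [], removed: [] };
  for (const [id, h] of next) (prev.has(id) ? result.survived : result.added).push(h);
  for (const [id, h] of prev) if (!next.has(id)) result.removed.push(h);
  return result;
};

// Deliberately broken variant that forgets newly added hunks.
const buggyReconcile: Reconcile<Hunk> = (prev, next) => ({
  ...idOnlyReconcile(prev, next),
  added: [],
});

const graph = (...ids: string[]) =>
  new Map<string, Hunk>(ids.map(id => [id, { id }] as [string, Hunk]));
const recorded = [graph('a', 'b'), graph('a', 'c'), graph('c')];

console.log(replay(recorded, idOnlyReconcile));
console.log(replay(recorded, buggyReconcile));
```

The first run reports no failures; the second flags the step where hunk `c` appears and is never classified.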

Try It Yourself

The core hunk graph engine is framework-agnostic and can be dropped into any Node.js or Deno backend. Here is the minimal setup to run a local diff reconciliation:

npm install yjs y-websocket ioredis

// minimal-diff-engine.ts
import { readFile } from 'node:fs/promises';
import { buildHunkGraph } from './hunk-graph';
import { reconcileHunks } from './reconcile';

const v1 = buildHunkGraph(await readFile('diff-v1.patch', 'utf8'));
const v2 = buildHunkGraph(await readFile('diff-v2.patch', 'utf8'));

const result = reconcileHunks(v1, v2);

console.log(`Survived: ${result.survived.length}`);
console.log(`Evolved:  ${result.evolved.length}`);
console.log(`Added:    ${result.added.length}`);
console.log(`Removed:  ${result.removed.length}`);

Run it against any two versions of the same PR diff and you immediately see how much structural continuity exists across commits - usually far more than engineers expect.
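The `./hunk-graph` module is not reproduced in this post. If you want something to experiment with, the following is a rough sketch of what a `buildHunkGraph` could look like, assuming git-style unified diff input. It leaves `parentId` unset, uses the hunk header's trailing text as the parent context, and stores comments as plain strings; all of these are simplifications of mine, not the real engine.

```typescript
import { createHash } from 'node:crypto';

interface HunkNode {
  id: string;
  content: string[];
  parentId: string | null;
  baseLineStart: number;
  headLineStart: number;
  comments: string[]; // simplified stand-in for CommentRef[]
  version: number;
}

const HEADER = /^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@ ?(.*)$/;

function computeHunkId(lines: string[], parentContext: string): string {
  const payload = lines.join('\n') + '||' + parentContext;
  return createHash('sha256').update(payload).digest('hex').slice(0, 16);
}

// Parse a git-style unified diff into hunks keyed by content fingerprint.
function buildHunkGraph(patch: string): Map<string, HunkNode> {
  const raw: { base: number; head: number; context: string; lines: string[] }[] = [];
  let current: (typeof raw)[number] | null = null;

  for (const line of patch.split('\n')) {
    const m = HEADER.exec(line);
    if (m) {
      current = { base: Number(m[1]), head: Number(m[2]), context: m[3].trim(), lines: [] };
      raw.push(current);
    } else if (line.startsWith('diff --git ')) {
      current = null; // next file begins; ignore its header lines until the next @@
    } else if (current && /^[ +\-\\]/.test(line)) {
      current.lines.push(line); // context, added, removed, or "\ No newline" marker
    }
  }

  const graph = new Map<string, HunkNode>();
  for (const h of raw) {
    const id = computeHunkId(h.lines, h.context);
    graph.set(id, {
      id,
      content: h.lines,
      parentId: null, // parent linking is out of scope for this sketch
      baseLineStart: h.base,
      headLineStart: h.head,
      comments: [],
      version: 1,
    });
  }
  return graph;
}

const body = [
  '-  const tax = order.total * 0.2;',
  '+  const tax = calculateTax(order.total, order.region);',
  '+  const shipping = resolveShipping(order);',
  '   return { ...order, tax };',
].join('\n');

const g1 = buildHunkGraph('@@ -14,7 +14,9 @@ function processOrder(order) {\n' + body);
const g2 = buildHunkGraph('@@ -18,7 +18,9 @@ function processOrder(order) {\n' + body);

console.log(Array.from(g1.keys())[0] === Array.from(g2.keys())[0]);
```

Two hunks with identical content and context would collide on the same key in this sketch, so a real implementation needs a tie-breaker such as the file path.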

Call to Action

If you are working on developer tooling, code review infrastructure, or collaborative editing at scale, I want to hear from you - especially if you have hit the edge cases I have not.

Specifically, I am interested in talking to engineers who have:

Tackled CRDT merge conflicts in a code diff context (not just text documents)
Built reviewer assignment algorithms that account for live presence data
Scaled WebSocket sync beyond 10,000 concurrent PR sessions

Drop a comment below, connect with me on LinkedIn, or open a discussion thread on GitHub. The most interesting problems in developer productivity tooling are still unsolved - and they are best solved in public, with a community that has already hit the same walls.

The 47% reduction in review time was not the end. It was proof that the problem is worth taking seriously. If you are working in this space, let's build something better together.

Rizwan Saleem | https://rizwansaleem.co