I maintain an internal release tool at work. Every release gets a preview page that shows a side-by-side diff of what's going out. A few weeks ago I was on call, and I got paged: a user couldn't load one of the release pages. The tab wasn't just slow — Chrome was giving them the sad-face "aw, snap" crash screen. Refresh. Crash again. Different machine, different network, same crash.
I opened the same page in my browser. Same result: the tab didn't freeze, it actually crashed. Smaller PRs worked fine. The problematic one touched about 85,000 lines — mostly a regenerated lockfile plus a go.sum bump.
Because I was on call and people needed to ship, my short-term fix was embarrassing: I flipped the feature flag that enabled the diff viewer on release pages. That unblocked the user — at the cost of shipping a release tool that couldn't show diffs, which was sort of the whole point of the page. "Your release tool works, it just doesn't do the thing it's for" was not a great follow-up message to the user.
That was the start of looking at the diff viewer we were using.
Why this was happening
We were using react-diff-viewer. It was picked a long time ago because the API is clean and the output looks nice. Small and medium PRs render almost instantly. The issue starts showing up past something like 5,000 lines, because the library renders every row of the diff into the DOM up front — no windowing, no virtualization, no progressive anything. Render time grows linearly with diff size. Memory grows linearly. Somewhere past 50k lines the tab stops being slow and just dies.
That's not a bug. react-diff-viewer was built back when a "big" diff meant a couple thousand lines. It's also effectively unmaintained now — no release since 2022. There's a community fork, react-diff-viewer-continued, which I'm glad exists, but it carries the same core rendering design. It has to — changing that is a rewrite, not a patch.
And it's not just me hitting this. There's a long-standing open issue on the original repo where people have been asking for virtualization and performance fixes for years. It was never solved upstream. It's the kind of issue that's hard to solve without reshaping the whole rendering model.
The state of the ecosystem, and what's actually missing
I want to be clear about something before going further: there's been good recent work on React diff viewers. Some newer libraries have genuinely nicer styling, better theming, more thoughtful details around word-level highlighting. A few even run diff computation in a Web Worker so the main thread stays responsive. These are real improvements.
My point isn't that the ecosystem is bad. It's that the specific thing that paged me that night — the DOM itself giving up once you have tens of thousands of rendered rows — is orthogonal to most of that work. Computing the diff off-thread is great, but if the final step is still to commit 80k DOM nodes in one render, the main thread will still block or the tab will still crash regardless of how fast the diff was computed.
Virtualization isn't a new idea. react-window, react-virtualized, and react-virtuoso have been around for years. It just hasn't been combined with diff rendering in this specific niche, probably because the combination is awkward to retrofit into an existing diff library. That's essentially the whole premise of what I ended up building: not a clever new technique, just the obvious technique applied to the place that needed it.
What I tried before writing anything new
react-diff-viewer-continued — the fork. Same DOM strategy, same crash on the large PR.
react-diff-view — a more modern library with a lower-level API. You feed it a unified diff and it hands you hunks. On paper this was promising, because I could plug hunks into a virtual list myself. In practice, at 50k lines, my benchmark clocks 7.6 seconds initial render, 631 MB memory, and scroll FPS around 12. At 100k it hits 1.3 GB. The library isn't doing anything wrong — the work at that scale is just expensive without windowing.
Rolling my own with react-window — I actually got something working in an afternoon. The problem isn't rendering; it's that a production diff viewer needs things that don't come for free when you retrofit virtualization onto a diff library written without it in mind:
- Collapsed unchanged blocks that expand on click (so you can't just map line index to array index)
- Syntax highlighting that doesn't recompute on every scroll
- Correct scroll positions when someone deep-links to a specific line
- Split view with synchronized scroll between two panes

None of these are hard individually. Together they're a library.
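To make the first item concrete: once unchanged blocks can collapse, the row the virtual list renders at position k is no longer the kth diff row, so you need an explicit mapping. Here's a minimal sketch of one way to do it — types and names are mine for illustration, not the library's actual API, and hunk IDs are assumed to start at 1 so the negated placeholder key stays unambiguous:

```typescript
// A row of the computed diff. `unchanged` rows belong to a
// collapsible run of context; hunkId groups rows into hunks.
type DiffRow = { line: string; hunkId: number; unchanged: boolean };

// Map the full row list plus collapse state to what the virtual
// list should render: visible rows keep their original index, and
// each collapsed hunk contributes one negative placeholder (the
// "expand N unchanged lines" row).
function visibleRows(rows: DiffRow[], collapsed: Set<number>): number[] {
  const out: number[] = [];
  let lastPlaceholder: number | null = null;
  rows.forEach((row, i) => {
    if (row.unchanged && collapsed.has(row.hunkId)) {
      if (lastPlaceholder !== row.hunkId) {
        out.push(-row.hunkId); // one placeholder per collapsed hunk
        lastPlaceholder = row.hunkId;
      }
    } else {
      out.push(i);
      lastPlaceholder = null;
    }
  });
  return out;
}
```

The virtual list then renders `visibleRows(...).length` items, and a negative value at a given position tells the row renderer to draw an expand control instead of a line.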
So I stopped trying to compose something and started writing one.
The design question
The interesting call was whether to diff first or virtualize first.
The obvious answer is "diff first, then virtualize": compute the full diff, then feed the resulting rows into a virtual list. This is what you'd do if you were just gluing existing things together. It's also what makes the 100k case still expensive in the current implementation — diffing two 100k-line strings takes about 7 seconds on an M1 before rendering starts. More on this below; it's the thing I most want to fix.
I went with diff-first anyway because line-level virtualization plus hunk-aware collapse needs to know the full structure up front. If you stream the diff in chunks, you either give up on "collapse unchanged blocks around context" (because you don't yet know what's unchanged), or you do a lot of bookkeeping to stitch hunks together as they arrive. I tried a version of the second approach and threw it away — a few hundred lines of fragile code to save maybe 30% of the diff computation.
So the library computes the full diff, then hands it to a Virtuoso-backed list that renders only the visible rows. Collapse state lives in a separate structure keyed by hunk ID. Scroll-to-line works because every row's offset from the top is known.
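As a sketch of how scroll-to-line can work under that model (again, illustrative names, not the real internals): first expand the target's hunk, then translate the diff-row index into a list index by counting everything before it, where a still-collapsed hunk counts as a single placeholder row.

```typescript
type Row = { hunkId: number; unchanged: boolean };

// Deep link to diff row `target`: auto-expand its hunk, then count
// the list positions occupied by everything before it. A collapsed
// run of unchanged rows occupies exactly one "expand" row.
function scrollTargetIndex(
  rows: Row[],
  collapsed: Set<number>,
  target: number,
): number {
  collapsed.delete(rows[target].hunkId); // expand the destination hunk
  let index = 0;
  let lastPlaceholder: number | null = null;
  for (let i = 0; i < target; i++) {
    const r = rows[i];
    if (r.unchanged && collapsed.has(r.hunkId)) {
      if (lastPlaceholder !== r.hunkId) {
        index++; // the collapsed hunk's single placeholder row
        lastPlaceholder = r.hunkId;
      }
    } else {
      index++;
      lastPlaceholder = null;
    }
  }
  return index;
}
```

The returned index is what you'd hand to the virtual list's scroll API — react-virtuoso exposes `scrollToIndex` on its handle for exactly this.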
This gets you smooth 60 FPS scrolling through a 100k-line diff even though the initial computation took a while. Whether that tradeoff is right depends on your use case. For code review, where people spend minutes scrolling and jumping around, I think it is. For a context where people only glance at the top and leave, it's probably the wrong call.
Benchmark
I wrote a harness against the four libraries above, on 1k / 10k / 50k / 100k line diffs. Numbers below are from a recent run on an M1 Pro, Chrome, warm cache. Full results and the harness are in the repo.
| Library | Lines | Initial render | FPS | Memory |
|---|---|---|---|---|
| react-virtualized-diff | 10,000 | 127 ms | 60 | 9.5 MB |
| react-diff-viewer | 10,000 | 1,308 ms | 58 | 64.8 MB |
| react-diff-viewer-continued | 10,000 | 1,304 ms | 58 | 64.8 MB |
| react-diff-view | 10,000 | 1,434 ms | 60 | 132.6 MB |
| react-virtualized-diff | 50,000 | 1,536 ms | 60 | 23.4 MB |
| react-diff-viewer | 50,000 | timeout (> 60s) | — | — |
| react-diff-viewer-continued | 50,000 | timeout (> 60s) | — | — |
| react-diff-view | 50,000 | 7,613 ms | 13 | 631 MB |
| react-virtualized-diff | 100,000 | 7,490 ms | 60 | 104 MB |
| react-diff-view | 100,000 | 16,988 ms | 6 | 1,297 MB |
The 100k row is the one I want to be honest about. 7.5 seconds isn't great. It's better than 17 seconds, and it's better than the other two libraries not finishing within a minute at all — but it's still a big number, and it's where the current design shows its seams.
What's still bad, and what I'm working on
At 100k lines, almost all the time is the diff computation itself, not the rendering. Once the diff is computed, scrolling is fine. Getting to first paint is what takes seconds.
On the list:
- Move diff computation off the main thread into a Web Worker — render a "computing…" state and stream results in. Some other libraries already do this; I should too.
- A coarser diff strategy for extreme cases — line-level only, no word-level highlighting — behind a clear opt-in.
- Streaming hunks as they're computed. I still think this is the hard path, but it's tractable if the API is designed around it from the start instead of retrofitted.

Also worth flagging: in-page search (Cmd/Ctrl+F) won't find text in rows that haven't been rendered yet. This is a known tradeoff of virtualization. There are workarounds, but it's worth knowing up front — if you're building something like a log viewer where search is the primary interaction, this isn't the right choice.
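On the coarser-strategy item: the cheapest version I can imagine is a single O(n) common prefix/suffix trim, with no alignment of the middle at all. A sketch of that idea — not what ships today:

```typescript
type CoarseDiff = {
  prefix: number;    // lines identical at the top
  suffix: number;    // lines identical at the bottom
  removed: string[]; // middle of the old text
  added: string[];   // middle of the new text
};

// Trim matching lines from both ends in linear time and report
// everything in between as one replaced block. No per-line
// alignment, no word-level highlighting.
function coarseLineDiff(oldText: string, newText: string): CoarseDiff {
  const a = oldText.split('\n');
  const b = newText.split('\n');
  let prefix = 0;
  while (prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) {
    prefix++;
  }
  let suffix = 0;
  while (
    suffix < a.length - prefix &&
    suffix < b.length - prefix &&
    a[a.length - 1 - suffix] === b[b.length - 1 - suffix]
  ) {
    suffix++;
  }
  return {
    prefix,
    suffix,
    removed: a.slice(prefix, a.length - suffix),
    added: b.slice(prefix, b.length - suffix),
  };
}
```

For something like a regenerated lockfile, where the middle genuinely is one big replaced block, this is close to all you need; for interleaved edits it degrades to "everything changed," which is why it has to be a clear opt-in rather than a default.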
Using it
Minimum usage is two props:
```jsx
import { DiffViewer } from 'react-virtualized-diff';

<DiffViewer original={oldText} modified={newText} />
```
That's it: two strings in, and you get a working side-by-side diff with collapsed unchanged blocks and 60 FPS scrolling at any size.
To make migration painless for anyone coming from react-diff-viewer, I also accept most of its prop names — oldValue / newValue, splitView, showDiffOnly, useDarkTheme, highlightLines, renderContent, extraLinesSurroundingDiff, and so on. In most cases you can change the import and keep the rest of your code.
Links:
- GitHub: https://github.com/Zhang-JiahangH/react-virtualized-diff
- Live demo: https://www.zhangjiahang.com/react-virtualized-diff/
- npm: react-virtualized-diff
Closing
This is the first open-source library I've published. I'd like it to be useful beyond my own on-call rotation — if you're hitting the same wall on large diffs, give it a try and tell me what breaks. Issues, PRs, and Discussions are all open on the repo, and I'd rather hear about a real use case that's slow than a benchmark that's fast.
And if someone has already solved this in a way I missed, please tell me. The goal is that nobody else has to redo this work.