benjamin

A reformatted PR shows 80 changed lines but changed nothing — I built a zero-dep diff that sees through it

Someone runs the formatter, the line width changes, and suddenly the pull request touches 80 lines. You scroll through all of it looking for the actual change — and there isn't one. Or there's exactly one, buried in 79 lines of re-wrapping. We've all reviewed that PR.

git diff -w is supposed to save you here, and it half does: it ignores spacing changes. But it's still line-anchored, so it cannot fold reflow — re-wrap a function signature across three lines and git diff -w still shows 1 removed + 3 added, even though not a single token changed. (This is GitHub discussion #20610, "Ignore Format Changes in Diff" — open and unanswered for years.)

So I built logicdiff: a diff that folds away whitespace and reflow, and tells you whether a change is real or just formatting.

$ logicdiff old.js new.js
only formatting differs - no logical change (a line diff would show 80 changed lines)

$ logicdiff a.js b.js
--- a.js
+++ b.js
-42:   const total = price * qty;
+51:   const total = price + qty;

1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)

It exits 0 when the change is formatting-only and 1 when it's logical — so CI can flag "this PR is more than a reformat, review carefully." Zero dependencies, and pip install logicdiff gets the same tool in Python with byte-for-byte identical output.
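
A minimal sketch of how a CI step might consume that exit code. It assumes logicdiff is on the PATH and is invoked as shown above; the `classify` and `check` helpers are hypothetical names for illustration, and treating any exit code other than 0 or 1 as an error is an assumption, since only 0 and 1 are described here.

```python
import subprocess
import sys


def classify(returncode: int) -> str:
    """Map the two documented exit codes to a review label."""
    if returncode == 0:
        return "formatting-only"
    if returncode == 1:
        return "logical-change"
    # Assumption: anything else (missing file, crash) is reported as an error.
    return "error"


def check(old_path: str, new_path: str) -> str:
    # Same invocation as in the examples above: logicdiff OLD NEW
    result = subprocess.run(["logicdiff", old_path, new_path])
    return classify(result.returncode)


if __name__ == "__main__" and len(sys.argv) == 3:
    label = check(sys.argv[1], sys.argv[2])
    print(f"logicdiff verdict: {label}")
    # Fail the CI step only when the change is more than a reformat.
    sys.exit(0 if label == "formatting-only" else 1)
```

Whether a logical change should fail the job or just add a "review carefully" label is a policy choice; the wrapper only turns the exit code into something a pipeline can branch on.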

How it folds reflow

The trick is to stop diffing lines and diff tokens. logicdiff tokenizes each file into a stream where a token is a run of [A-Za-z0-9_] or a single punctuation character, and whitespace is dropped entirely. So all of these collapse to the same stream [a, +, b]:

a+b        a + b        a +
                          b

Respacing and line breaks become invisible. Then it runs a plain Myers diff on the token streams. Equal streams ⇒ formatting-only. Different ⇒ the changed tokens get mapped back to their line numbers and shown. Crucially, tokens are compared by text only — their line numbers are metadata — which is exactly why a token that moved to a different line still matches.
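
Here is a rough Python sketch of that tokenize-then-compare idea. It is not the actual logicdiff source: the function names are made up for illustration, and the exact whitespace set (space, tab, newline, carriage return, form feed, vertical tab) is an assumption.

```python
import re

# A token is a run of [A-Za-z0-9_] or any single other non-whitespace character.
# Assumption: "whitespace" means the six ASCII characters listed in the class.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_ \t\n\r\f\v]")


def tokenize(text):
    """Return (token_text, line_number) pairs; whitespace yields no tokens."""
    tokens = []
    line, pos = 1, 0
    for match in TOKEN_RE.finditer(text):
        # Count the newlines skipped since the previous token.
        line += text.count("\n", pos, match.start())
        pos = match.start()
        tokens.append((match.group(), line))
    return tokens


def formatting_only(old, new):
    """Compare token text only; line numbers are carried along as metadata."""
    return [t for t, _ in tokenize(old)] == [t for t, _ in tokenize(new)]


print(tokenize("a +\n  b"))                            # [('a', 1), ('+', 1), ('b', 2)]
print(formatting_only("a+b", "a +\n    b"))            # True
print(formatting_only("price * qty", "price + qty"))   # False
```

Because equality looks at the token text alone, the `b` that moved to line 2 still matches the `b` on line 1 of the other file.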

It's language-agnostic on purpose: no parser, no grammar, works on any text (code, YAML, logs, DSLs). The trade-off, which I document rather than hide: like git diff -w, whitespace inside string literals is also ignored, so "a b" and "a  b" (one space versus two) read as formatting-only. (difftastic gets this right with per-language tree-sitter parsing, but it's a multi-megabyte binary that needs a grammar per language and has no "is this formatting-only?" exit code. logicdiff is the zero-config, language-agnostic middle ground.)

The fun part: two languages, one diff, to the byte

I wanted a Node build and a Python build that emit identical output — and a diff is a brutal place to attempt that, because Myers has many equal-cost edit scripts and the one you emit depends entirely on tie-breaking. Pick < vs <= in one line and the two languages silently diverge (both "correct", different output).

So the whole thing is pinned:

  • One canonical Myers variant, ported literally to both languages: V seeded V[1]=0, k from -d to d step 2, the down-vs-right choice is exactly k == -d || (k != d && V[k-1] < V[k+1]) (strict <), and V is snapshotted before each round so backtracking reads the right one. No difflib.SequenceMatcher (its heuristics wouldn't match).
  • Files are read as latin-1, where byte value N always decodes to code point N, so any input (UTF-8, binary, broken) decodes deterministically and round-trips losslessly. Token equality is byte equality, and because every decoded character is below U+0100 and fits in a single UTF-16 code unit, Node's UTF-16 string indexing vs Python's codepoint indexing stops mattering.
  • The tokenizer uses explicit ASCII classes, never \w/\s (\w is Unicode-aware in Python but ASCII-only in JS, and the two languages' \s match different sets of whitespace characters).
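
To make the first bullet concrete, here is a sketch of a Myers diff with that seed, loop order, and tie-break, written from the description above rather than copied from the logicdiff source. The function names and the `=`/`-`/`+` op encoding are my own choices for illustration.

```python
def _backtrack(a, b, trace, d_final):
    """Walk the saved snapshots from (len(a), len(b)) back to (0, 0)."""
    ops = []
    x, y = len(a), len(b)
    for d in range(d_final, -1, -1):
        v = trace[d]                      # V as it was before round d
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1                # this round moved down
        else:
            prev_k = k - 1                # this round moved right
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:  # undo the diagonal of equal tokens
            x, y = x - 1, y - 1
            ops.append(("=", a[x]))
        if d > 0:
            if x == prev_x:
                ops.append(("+", b[prev_y]))   # down move: token added from b
            else:
                ops.append(("-", a[prev_x]))   # right move: token removed from a
        x, y = prev_x, prev_y
    ops.reverse()
    return ops


def myers_diff(a, b):
    """Return [(op, token)] with op in '=', '-', '+' for token lists a and b."""
    n, m = len(a), len(b)
    v = {1: 0}                            # seed: V[1] = 0
    trace = []
    for d in range(n + m + 1):
        trace.append(dict(v))             # snapshot before the round mutates V
        for k in range(-d, d + 1, 2):
            # Strict '<': on a tie the right move (deletion) wins.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(a, b, trace, d)
    raise AssertionError("unreachable: d = n + m always suffices")


old = ["total", "=", "price", "*", "qty", ";"]
new = ["total", "=", "price", "+", "qty", ";"]
print([op for op in myers_diff(old, new) if op[0] != "="])
# [('-', '*'), ('+', '+')]
```

Swapping the strict `<` for `<=` still yields a minimal edit script, but the order of deletions and insertions can change, which is the kind of silent divergence the pinning is meant to rule out.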

A differential fuzz test runs both builds over 500 random file pairs and gets zero byte differences.
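
The latin-1 and ASCII-class points in the list above can be checked from the Python side in a few lines. This is only an illustrative check, not part of the fuzz test itself; `round_trips` is a hypothetical helper name.

```python
import os
import re


def round_trips(blob: bytes) -> bool:
    """latin-1 decodes every byte sequence and encodes back to the same bytes."""
    text = blob.decode("latin-1")               # never raises, whatever the bytes
    return text.encode("latin-1") == blob and len(text) == len(blob)


# Every possible byte value, then some random blobs.
assert round_trips(bytes(range(256)))
assert all(round_trips(os.urandom(64)) for _ in range(100))

# Python's \w is Unicode-aware for str patterns; an explicit ASCII class is not.
e_acute = b"\xe9".decode("latin-1")             # U+00E9
unicode_word = re.fullmatch(r"\w", e_acute) is not None
ascii_word = re.fullmatch(r"[A-Za-z0-9_]", e_acute) is not None
print(unicode_word, ascii_word)                 # True False
```

With `\w`, byte 0xE9 would be a word character in the Python build and punctuation in the Node build, so the two tokenizers would split the same file differently.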

Install

npx logicdiff old new       # Node, zero deps
pip install logicdiff       # Python, zero deps, identical output

MIT licensed, both builds open source:

What's your worst "the diff is all noise" story — a formatter run, a line-ending flip, a mass re-indent? And would a formatting-only CI signal have helped?
