A reformatted PR shows 80 changed lines but changed nothing — I built a zero-dep diff that sees through it

#showdev #git #cli #opensource

Someone runs the formatter, the line width changes, and suddenly the pull request touches 80 lines. You scroll through all of it looking for the actual change — and there isn't one. Or there's exactly one, buried in 79 lines of re-wrapping. We've all reviewed that PR.

git diff -w is supposed to save you here, and it half does: it ignores spacing changes. But it's still line-anchored, so it cannot fold reflow — re-wrap a function signature across three lines and git diff -w still shows 1 removed + 3 added, even though not a single token changed. (This is GitHub discussion #20610, "Ignore Format Changes in Diff" — open and unanswered for years.)

So I built logicdiff: a diff that folds away whitespace and reflow, and tells you whether a change is real or just formatting.

$ logicdiff old.js new.js
only formatting differs - no logical change (a line diff would show 80 changed lines)

$ logicdiff a.js b.js
--- a.js
+++ b.js
-42:   const total = price * qty;
+51:   const total = price + qty;

1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)

It exits 0 when the change is formatting-only and 1 when it's logical — so CI can flag "this PR is more than a reformat, review carefully." Zero dependencies, and pip install logicdiff gets the same tool in Python with byte-for-byte identical output.
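A minimal sketch of how a CI step might consume that exit status, written in Python. The post only documents statuses 0 and 1, so treating anything else as a failure is an assumption here, and `classify` is a hypothetical helper name, not part of logicdiff.

```python
import subprocess


def classify(cmd):
    """Run a diff command and turn its exit status into a label.

    Assumes the convention described above: 0 = formatting-only,
    1 = logical change. Any other status is treated as an error
    (an assumption; the post does not document other statuses).
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "formatting-only"
    if result.returncode == 1:
        return "logical-change"
    raise RuntimeError(
        f"unexpected exit status {result.returncode}: {result.stderr.strip()}"
    )


# Hypothetical usage in a CI script, with logicdiff on PATH:
#   label = classify(["logicdiff", "old.js", "new.js"])
#   if label == "logical-change":
#       print("this PR is more than a reformat, review carefully")
```

Passing the command as a list keeps the helper independent of how the tool is installed (npx, pip, or a local path).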

How it folds reflow

The trick is to stop diffing lines and diff tokens. logicdiff tokenizes each file into a stream where a token is a run of [A-Za-z0-9_] or a single punctuation character, and whitespace is dropped entirely. So all of these collapse to the same stream [a, +, b]:

a+b        a + b        a +
                          b

Respacing and line breaks become invisible. Then it runs a plain Myers diff on the token streams. Equal streams ⇒ formatting-only. Different ⇒ the changed tokens get mapped back to their line numbers and shown. Crucially, tokens are compared by text only — their line numbers are metadata — which is exactly why a token that moved to a different line still matches.
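The tokenizing step can be sketched in a few lines of Python. This is an illustration under assumptions, not logicdiff's actual source: the names `TOKEN`, `tokenize`, and `texts` are made up, and "a single punctuation character" is approximated as any single character that is neither a word character nor ASCII whitespace.

```python
import re

# A word run, or one character that is neither a word character nor
# ASCII whitespace (space, tab, LF, CR, VT, FF). Explicit ASCII classes only.
TOKEN = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_ \t\n\r\x0b\x0c]")


def tokenize(text):
    """Return (token_text, line_number) pairs; whitespace is dropped."""
    tokens = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in TOKEN.finditer(line):
            tokens.append((match.group(), lineno))
    return tokens


def texts(tokens):
    """Strip the line-number metadata; this is what gets compared."""
    return [text for text, _ in tokens]


one_line = tokenize("a+b")
reflowed = tokenize("a +\n    b")

# Same token text, different line numbers: formatting-only.
print(texts(one_line) == texts(reflowed))  # True
print(reflowed)  # [('a', 1), ('+', 1), ('b', 2)]
```

Because only the text half of each pair is compared, the `b` that moved to line 2 still matches the `b` on line 1; the line number is kept purely so a real change can be reported at the right place.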

It's language-agnostic on purpose: no parser, no grammar, works on any text (code, YAML, logs, DSLs). The trade-off, which I document rather than hide: like git diff -w, whitespace changes inside string literals are folded too, so "a b" and "a  b" read as formatting-only. (difftastic gets this right with per-language tree-sitter parsing, but it's a multi-megabyte binary that needs a grammar per language and has no "is this formatting-only?" exit code. logicdiff is the zero-config, language-agnostic middle ground.)

The fun part: two languages, one diff, to the byte

I wanted a Node build and a Python build that emit identical output — and a diff is a brutal place to attempt that, because Myers has many equal-cost edit scripts and the one you emit depends entirely on tie-breaking. Pick < vs <= in one line and the two languages silently diverge (both "correct", different output).

So the whole thing is pinned:

One canonical Myers variant, ported literally to both languages: V seeded V[1]=0, k from -d to d step 2, the down-vs-right choice is exactly k == -d || (k != d && V[k-1] < V[k+1]) (strict <), and V is snapshotted before each round so backtracking reads the right one. No difflib.SequenceMatcher (its heuristics wouldn't match).
Files are read as latin-1, where byte value n decodes to code point n for all 256 bytes, so any input (UTF-8, binary, broken) decodes deterministically and round-trips exactly. Token equality is byte equality, and because every code point stays below U+0100 (one UTF-16 unit each), Node's UTF-16 string indexing vs Python's codepoint indexing stops mattering.
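The tokenizer uses explicit ASCII classes, never \w/\s: in Python 3 both are Unicode-aware on str patterns, while in JS \w is ASCII-only and \s matches its own set of Unicode whitespace, so the two builds would split some bytes differently.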
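To make the first two pins concrete, here is a Python sketch of a greedy Myers diff using the seed, loop bounds, strict-< choice, and per-round snapshot described above, plus a latin-1 decode helper. It is a reconstruction from that description, not the project's code; `decode_latin1`, `myers_diff`, and `_backtrack` are hypothetical names.

```python
def decode_latin1(raw: bytes) -> str:
    # latin-1 maps byte value n to code point n, so decoding never fails
    # and encoding back with latin-1 restores the exact bytes.
    return raw.decode("latin-1")


def myers_diff(a, b):
    """Return a list of (op, item) pairs, op being '=', '-' or '+'."""
    n, m = len(a), len(b)
    v = {1: 0}   # seed: V[1] = 0
    trace = []   # trace[d] is V as it stood before round d
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: take an item from b
            else:
                x = v[k - 1] + 1    # step right: drop an item from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1  # follow the diagonal of equal items
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(trace, a, b, d)
    raise AssertionError("unreachable: d = n + m always suffices")


def _backtrack(trace, a, b, d_end):
    ops = []
    x, y = len(a), len(b)
    for d in range(d_end, -1, -1):
        v = trace[d]
        k = x - y
        # Same test as the forward pass, read from the snapshot.
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:
            x, y = x - 1, y - 1
            ops.append(("=", a[x]))
        if d > 0:
            if x == prev_x:
                ops.append(("+", b[prev_y]))
            else:
                ops.append(("-", a[prev_x]))
        x, y = prev_x, prev_y
    ops.reverse()
    return ops


print(myers_diff(["a", "b"], ["c", "a"]))
# [('+', 'c'), ('=', 'a'), ('-', 'b')]
```

In that last example V[-1] and V[1] are both 1 when round 2 reaches k = 0, so the strict < sends the path right from k = -1; that tie is exactly the kind of spot where a different comparison would change the emitted script.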

A differential fuzz test runs both builds over 500 random file pairs and gets zero byte differences.

Install

npx logicdiff old new       # Node, zero deps
pip install logicdiff       # Python, zero deps, identical output

MIT licensed, both builds open source:

npm: https://www.npmjs.com/package/logicdiff
PyPI: https://pypi.org/project/logicdiff/
GitHub: https://github.com/jjdoor/logicdiff (Node) · https://github.com/jjdoor/logicdiff-py (Python)

What's your worst "the diff is all noise" story — a formatter run, a line-ending flip, a mass re-indent? And would a formatting-only CI signal have helped?