Someone runs the formatter, the line width changes, and suddenly the pull request touches 80 lines. You scroll through all of it looking for the actual change — and there isn't one. Or there's exactly one, buried in 79 lines of re-wrapping. We've all reviewed that PR.
git diff -w is supposed to save you here, and it half does: it ignores spacing changes. But it's still line-anchored, so it cannot fold reflow — re-wrap a function signature across three lines and git diff -w still shows 1 removed + 3 added, even though not a single token changed. (This is GitHub discussion #20610, "Ignore Format Changes in Diff" — open and unanswered for years.)
So I built logicdiff: a diff that folds away whitespace and reflow, and tells you whether a change is real or just formatting.
$ logicdiff old.js new.js
only formatting differs - no logical change (a line diff would show 80 changed lines)
$ logicdiff a.js b.js
--- a.js
+++ b.js
-42: const total = price * qty;
+51: const total = price + qty;
1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
It exits 0 when the change is formatting-only and 1 when it's logical — so CI can flag "this PR is more than a reformat, review carefully." Zero dependencies, and pip install logicdiff gets the same tool in Python with byte-for-byte identical output.
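As a rough illustration of how a CI step could consume that exit code, here is a minimal Python sketch. The helper name classify_change is hypothetical, and treating any status other than 0 or 1 as a tool failure is an assumption, since only the 0 and 1 codes are described above.

```python
import subprocess


def classify_change(cmd):
    """Run a diff command and map its exit status to a label."""
    status = subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode
    if status == 0:
        return "formatting-only"
    if status == 1:
        return "logical"
    # Assumption: anything else means the tool itself failed.
    return "error"


# Hypothetical use inside a CI script:
#   label = classify_change(["logicdiff", "old.js", "new.js"])
#   if label == "logical": ask for a careful review
```

Keeping the command as a parameter means the same helper works with either the Node or the Python build.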
How it folds reflow
The trick is to stop diffing lines and diff tokens. logicdiff tokenizes each file into a stream where a token is a run of [A-Za-z0-9_] or a single punctuation character, and whitespace is dropped entirely. So all of these collapse to the same stream [a, +, b]:
a+b
a + b
a +
  b
Respacing and line breaks become invisible. Then it runs a plain Myers diff on the token streams. Equal streams ⇒ formatting-only. Different ⇒ the changed tokens get mapped back to their line numbers and shown. Crucially, tokens are compared by text only — their line numbers are metadata — which is exactly why a token that moved to a different line still matches.
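A minimal Python sketch of the tokenize-then-compare idea follows. It is not taken from the packaged source: the names tokenize and formatting_only are made up for illustration, and the pattern assumes that "punctuation" means any single character that is neither a word character nor ASCII whitespace.

```python
import re

# Assumption: a token is a run of [A-Za-z0-9_] or one other non-whitespace
# character; ASCII whitespace is skipped entirely.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_ \t\n\r\f\v]")


def tokenize(text):
    """Return (token_text, line_number) pairs; line numbers are 1-based."""
    tokens = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in TOKEN_RE.finditer(line):
            tokens.append((match.group(), line_no))
    return tokens


def formatting_only(old_text, new_text):
    """True when both texts yield the same token texts, ignoring line numbers."""
    old = [tok for tok, _ in tokenize(old_text)]
    new = [tok for tok, _ in tokenize(new_text)]
    return old == new


print(tokenize("a +\n  b"))                  # [('a', 1), ('+', 1), ('b', 2)]
print(formatting_only("a+b", "a +\n  b"))    # True
print(formatting_only("a+b", "a - b"))       # False
```

The line number travels with each token but never takes part in the comparison, which is what lets a token that moved to another line still match.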
It's language-agnostic on purpose: no parser, no grammar, works on any text (code, YAML, logs, DSLs). The trade-off, which I document rather than hide: like git diff -w, whitespace inside string literals is also ignored, so "a b" and "a  b" (one space versus two) read as formatting-only. (difftastic gets this right with per-language tree-sitter parsing, but it's a multi-megabyte binary that needs a grammar per language, and its exit code does not flag formatting-only changes by default. logicdiff is the zero-config, language-agnostic middle ground.)
The fun part: two languages, one diff, to the byte
I wanted a Node build and a Python build that emit identical output — and a diff is a brutal place to attempt that, because Myers has many equal-cost edit scripts and the one you emit depends entirely on tie-breaking. Pick < vs <= in one line and the two languages silently diverge (both "correct", different output).
So the whole thing is pinned:
- One canonical Myers variant, ported literally to both languages: V seeded with V[1]=0, k from -d to d in steps of 2, the down-vs-right choice is exactly k == -d || (k != d && V[k-1] < V[k+1]) (strict <), and V is snapshotted before each round so backtracking reads the right one. No difflib.SequenceMatcher (its heuristics wouldn't match).
- Files are read as latin-1, where every byte value 0x00-0xFF maps to the code point with the same number (a total byte ↔ codepoint bijection), so any input (UTF-8, binary, broken) decodes deterministically, token equality is byte equality, and Node's UTF-16 string indexing vs Python's codepoint indexing stops mattering.
- The tokenizer uses explicit ASCII classes, never \w/\s (both are Unicode-aware in Python, while in JS \w is ASCII-only and \s matches its own set of Unicode whitespace).
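To make the pinned Myers rules concrete, here is a minimal Python sketch that follows them as listed: V seeded with V[1]=0, k from -d to d in steps of 2, the strict < tie-break, and a snapshot of V before each round. It is a reading of those rules, not the shipped code; the names myers_diff and _backtrack are hypothetical.

```python
def myers_diff(a, b):
    """Return a list of (op, token) pairs, op being "=", "-" or "+"."""
    n, m = len(a), len(b)
    v = {1: 0}                      # seed: V[1] = 0
    trace = []
    for d in range(n + m + 1):
        trace.append(dict(v))       # snapshot V before round d
        for k in range(-d, d + 1, 2):
            # Pinned tie-break: strict "<" decides down vs right.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # move down (insert from b)
            else:
                x = v[k - 1] + 1    # move right (delete from a)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1  # follow the diagonal of equal tokens
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(a, b, trace)
    raise AssertionError("unreachable: d never exceeds len(a) + len(b)")


def _backtrack(a, b, trace):
    """Walk the snapshots from the end point back to (0, 0)."""
    x, y = len(a), len(b)
    ops = []
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        # Same decision as the forward pass, read from the same snapshot.
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:
            x, y = x - 1, y - 1
            ops.append(("=", a[x]))
        if d > 0:
            if x == prev_x:
                ops.append(("+", b[prev_y]))
            else:
                ops.append(("-", a[prev_x]))
        x, y = prev_x, prev_y
    ops.reverse()
    return ops


old = ["total", "=", "price", "*", "qty"]
new = ["total", "=", "price", "+", "qty"]
print(myers_diff(old, new))
# [('=', 'total'), ('=', 'price'), ... ] with ('-', '*') before ('+', '+')
```

Because the forward pass and the backtrack evaluate the identical condition against the identical snapshot, a port to another language only has to reproduce that one expression to emit the same edit script.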
A differential fuzz test runs both builds over 500 random file pairs and gets zero byte differences.
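The cross-language harness itself is not shown here. As a smaller, single-language check, the sketch below exercises two of the pinned properties on the Python side only: latin-1 decoding is lossless for arbitrary bytes, and an explicit ASCII class rejects a letter that Python's Unicode-aware \w accepts. The seed, the pair count of 500, and the 64-byte size limit are arbitrary choices for illustration.

```python
import random
import re

rng = random.Random(0)  # fixed seed so the run is repeatable

for _ in range(500):
    raw = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
    text = raw.decode("latin-1")                 # never raises: every byte is valid
    assert text.encode("latin-1") == raw         # lossless round trip
    assert [ord(c) for c in text] == list(raw)   # code point equals byte value

# Python's \w is Unicode-aware for str patterns; the explicit class is not.
assert re.fullmatch(r"\w", "\u00e9") is not None
assert re.fullmatch(r"[A-Za-z0-9_]", "\u00e9") is None

print("ok")
```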
Install
npx logicdiff old new # Node, zero deps
pip install logicdiff # Python, zero deps, identical output
MIT licensed, both builds open source:
- npm: https://www.npmjs.com/package/logicdiff
- PyPI: https://pypi.org/project/logicdiff/
- GitHub: https://github.com/jjdoor/logicdiff (Node) · https://github.com/jjdoor/logicdiff-py (Python)
What's your worst "the diff is all noise" story — a formatter run, a line-ending flip, a mass re-indent? And would a formatting-only CI signal have helped?