DEV Community

Snappy Tools
Snappy Tools

Posted on

How Text Diff Works: The Myers Algorithm Explained

When you run git diff, look at a pull request, or use a diff tool to compare files, you're relying on an algorithm to find the minimum set of changes that transforms one text into another. The most common algorithm used is the Myers diff algorithm (1986). Here's how it works conceptually and what that means when you use diff tools.

The problem diff solves

Given two texts A and B, find the shortest edit script (SES) — the minimum number of insertions and deletions to transform A into B.

"Minimum" is the key word. There are many ways to express the changes between two texts. git diff and virtually all other tools aim to show the smallest set of changes, because that's what makes diffs readable.

A "line" that changed = one deletion + one insertion. Two lines swapped = two deletions + two insertions. An entire block moved = deletions from the old position + insertions at the new position.

The LCS approach

An equivalent formulation: find the Longest Common Subsequence (LCS) of A and B. The lines in both texts that appear in both (in order, but not necessarily contiguous) don't need to change — only the lines NOT in the LCS need to be deleted or inserted.

For example:

A: cat sat on the mat
B: cat sat under the mat
Enter fullscreen mode Exit fullscreen mode

LCS: cat sat the mat

The diff: on was deleted, under was inserted.

The Myers edit graph

Eugene Myers' 1986 paper describes the algorithm as a shortest-path search on an "edit graph":

  • Rows represent lines of A (0 to n)
  • Columns represent lines of B (0 to m)
  • Moving right = insert a line from B (cost 1)
  • Moving down = delete a line from A (cost 1)
  • Moving diagonally = lines match — no cost

Finding the shortest edit script is equivalent to finding the shortest path from (0,0) to (n,m) in this graph.

The Myers algorithm searches this graph efficiently using a key insight: it explores paths of increasing "edit distance" (D=0 for exact match, D=1 for one change, etc.) rather than exploring all paths. For most real-world diffs, D is small, so the algorithm terminates quickly.

Complexity: O((N+M)D) where N and M are the file lengths and D is the number of edits. For files with few changes (common in version control), this is much faster than the naive O(N×M) dynamic programming approach.

What this means for reading diffs

Understanding the algorithm explains some behaviours that might otherwise seem arbitrary:

Why context matters: diff tools show ±3 lines of context around changes by default. This helps human readers understand what changed without showing irrelevant parts of the file.

Why moved blocks look like delete+insert: a function moved from line 50 to line 200 shows as all those lines deleted at line 50 and all the same lines inserted at line 200. The algorithm doesn't detect moves — it only finds insertions and deletions. Some tools (like git diff --find-renames) add a post-processing step to detect renames and moves.

Why whitespace-only changes appear: by default, a line changed from text to text (two extra spaces) is a deletion + insertion. Use git diff -w or equivalent to ignore whitespace when that's appropriate.

Why two-sided diffs look different from unified diffs: unified diffs show both sides interleaved with + and - prefixes. Side-by-side diffs show the two files in columns. The underlying algorithm is the same; only the presentation differs.

Semantic vs. textual diff

The Myers algorithm operates on tokens (typically lines, or words). It doesn't understand semantics. A function renamed from getUser() to fetchUser() shows as the entire file changed if those names appear throughout. Tools like semantic diff add language-specific parsing to detect structural changes (function renamed, parameter reordered) rather than textual changes.

For code review, this is why "show only changed lines" in a PR sometimes shows more context than you'd expect — the diff engine doesn't know that two changes in the same function are logically related.

Trying it yourself

To see the diff algorithm in action on arbitrary text, the Text Diff Checker compares two blocks of text and highlights exactly which words or lines changed. Useful for comparing versions of a document, API responses, or configuration files where git diff isn't available.


Diff algorithms are one of those cases where a conceptually simple problem (find what changed) has a beautifully efficient solution. Myers' insight — searching paths of increasing edit distance rather than all paths — is why git diff runs instantly on files with a handful of changes even when those files have thousands of lines.

Top comments (0)