Every developer knows git diff. But the need to compare two pieces of text extends far beyond version-controlled source code. Configuration files from two servers. API responses before and after a change. Database schemas across environments. SQL query outputs. Log files from two different runs.
For these cases, you need a diff tool that works with arbitrary text, not just git-tracked files.
How diff algorithms work
Most diff tools are built on the Longest Common Subsequence (LCS) problem: given two sequences, find the longest subsequence that appears in both, in the same order. Everything not in the LCS is either an addition or a deletion.
The classic Myers diff algorithm (git's default) solves this in O(ND) time, where N is the combined length of the two inputs and D is the size of the shortest edit script, that is, the number of inserted and deleted lines. For files that are mostly similar (the common case), D is small and the diff is fast.
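To make the LCS idea concrete, here is a minimal Python sketch. It uses the textbook dynamic-programming table, which takes O(N×M) time and memory, not the Myers algorithm, so treat it as an illustration of the concept rather than what git actually runs. The function name lcs_diff and the sample lines are made up for this example.

```python
def lcs_diff(a, b):
    """Return (tag, line) pairs: ' ' = unchanged, '-' = removed, '+' = added."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the edit script.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            result.append(("-", a[i]))
            i += 1
        else:
            result.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is a pure deletion or addition.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result


old = ["server:", "port: 3000", "host: localhost", "debug: true"]
new = ["server:", "port: 8080", "host: localhost", "debug: false"]
for tag, line in lcs_diff(old, new):
    print(tag, line)
```

The two lines in the LCS ("server:" and "host: localhost") come out tagged as unchanged, and everything else is reported as a removal followed by an addition.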
The output format matters:
Unified diff: Shows changes with context lines. Lines starting with - are removed, + are added. This is what git diff produces.
@@ -1,4 +1,4 @@
 server:
- port: 3000
+ port: 8080
  host: localhost
- debug: true
+ debug: false
Side-by-side diff: Shows both versions in parallel columns with changed lines highlighted. More readable for large files with many changes.
Inline diff: Shows character-level differences within changed lines. Useful when lines have small changes (like a single number or word).
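If you want to produce these formats yourself, Python's standard difflib module covers the unified and inline cases. The sketch below is one possible approach, not how any particular tool does it: the sample config lines and the staging.yaml / production.yaml labels are invented, and the [-...-] / {+...+} markers in inline_diff are just a convention chosen for this example.

```python
import difflib

# Illustrative sample data; the file names are hypothetical labels.
staging = ["server:\n", "  port: 3000\n", "  host: localhost\n", "  debug: true\n"]
production = ["server:\n", "  port: 8080\n", "  host: localhost\n", "  debug: false\n"]

# Unified diff: context lines get a leading space, changes get - or +.
unified = list(
    difflib.unified_diff(
        staging, production, fromfile="staging.yaml", tofile="production.yaml"
    )
)
print("".join(unified), end="")


def inline_diff(old, new):
    """Mark character-level changes as [-removed-] and {+added+}."""
    parts = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            parts.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + new[j1:j2] + "+}")
        if tag == "equal":
            parts.append(old[i1:i2])
    return "".join(parts)


print(inline_diff("debug: true", "debug: false"))  # debug: [-tru-]{+fals+}e
```

For the side-by-side format, difflib.HtmlDiff can render two inputs as an HTML table with the changed lines highlighted.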
Real-world diff use cases
Configuration comparison: You have a config file on staging and production. Are they identical? If not, which values differ? This catches deployment issues before they cause outages.
API response validation: Before and after deploying a change, capture the same API response. Diff them to verify only the intended fields changed.
Database migration verification: Export table schemas before and after a migration. The diff confirms that only the intended structural changes were applied.
Document comparison: Legal documents, contracts, and specifications go through many revisions. A diff shows exactly what changed between versions without relying on the author's summary.
Log analysis: Two log files from the same process under different conditions. Diffing them highlights the divergence point.
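For the log analysis case, you often do not need a full diff at all, just the first line where the two runs part ways. Here is a rough sketch, assuming timestamps and other per-run noise have already been stripped so that identical events produce identical lines. The function name first_divergence and the log lines are hypothetical.

```python
from itertools import zip_longest


def first_divergence(lines_a, lines_b):
    """Return (1-based line number, line_a, line_b) at the first mismatch, or None.

    If one log is shorter, the missing side is reported as None.
    """
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return number, a, b
    return None


# Made-up log lines from two runs of the same job.
good_run = ["start job", "load config", "cache hit", "job done"]
bad_run = ["start job", "load config", "cache miss", "fetch from origin", "job done"]

print(first_divergence(good_run, bad_run))  # (3, 'cache hit', 'cache miss')
```

Everything before the reported line is identical in both runs, which usually narrows down where to start reading.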
Beyond text: structured diff
For JSON, XML, and YAML, a plain text diff is noisy because formatting differences (key ordering, whitespace, indentation) appear as changes when the semantic content is identical.
Semantic diffing first normalizes the structure (sort keys, standardize whitespace) and then compares the normalized versions. This shows only meaningful changes.
// File A
{"name": "Alice", "age": 30, "city": "NYC"}
// File B
{"city": "NYC", "name": "Alice", "age": 31}
// Text diff shows everything changed (key order)
// Semantic diff shows only: age: 30 → 31
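The normalize-then-compare approach can be sketched in a few lines of Python using only the standard library. This is a simplified illustration under a few assumptions: it sorts object keys and standardizes indentation, it leaves array order alone (since array order is usually meaningful in JSON), and the helper names normalize and semantic_diff are made up here.

```python
import difflib
import json


def normalize(text):
    """Parse JSON and re-serialize it with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def semantic_diff(text_a, text_b):
    """Return only the changed lines between two normalized JSON documents."""
    diff = list(
        difflib.unified_diff(normalize(text_a), normalize(text_b), lineterm="", n=0)
    )
    # Skip the two file-header lines and the @@ hunk headers.
    return [line for line in diff[2:] if not line.startswith("@@")]


file_a = '{"name": "Alice", "age": 30, "city": "NYC"}'
file_b = '{"city": "NYC", "name": "Alice", "age": 31}'

for line in semantic_diff(file_a, file_b):
    print(line)
```

For the two files above this prints one removed line containing "age": 30 and one added line containing "age": 31; the key reordering disappears. One limitation of diffing the serialized text: adding or removing the last key in an object also changes the trailing comma on its neighbor, so a stricter tool would compare the parsed values directly.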
Performance considerations
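Diffing small files (under 10,000 lines or so) is effectively instant with any reasonable algorithm. For very large files (millions of lines), the choice of algorithm and implementation matters, especially when the two inputs differ a lot, since Myers slows down as the number of differences grows. The choice also affects readability: the patience diff algorithm (available as git diff --patience) anchors on lines that appear exactly once in each file, which tends to produce more intuitive results for files with many similar lines, like sorted data files.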
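To give a feel for how patience diff differs, here is a sketch of its anchor-selection step only: pick the lines that occur exactly once in both inputs, then keep the longest run of them that appears in the same order on both sides. A full implementation would then recurse between anchors and fall back to another algorithm where no anchors exist; none of that is shown, and this is not git's actual code. The function name patience_anchors and the sample lines are invented.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs for lines usable as anchors."""
    count_a, count_b = Counter(a), Counter(b)
    pos_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate pairs in order of appearance in a: lines unique in both inputs.
    pairs = [
        (i, pos_in_b[line])
        for i, line in enumerate(a)
        if count_a[line] == 1 and line in pos_in_b
    ]

    # Longest increasing subsequence over the b-positions.
    tails = []        # tails[k]: index into pairs ending the best run of length k + 1
    tail_values = []  # b-positions of those tails, kept sorted for bisect
    previous = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_values, j)
        previous[idx] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(idx)
            tail_values.append(j)
        else:
            tails[k] = idx
            tail_values[k] = j

    # Follow the back-pointers from the end of the longest run.
    anchors = []
    idx = tails[-1] if tails else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = previous[idx]
    return anchors[::-1]


old = ["def a():", "    return 1", "", "def c():", "    return 1"]
new = ["def a():", "    return 1", "", "def b():", "    return 1", "", "def c():", "    return 1"]
print(patience_anchors(old, new))  # [(0, 0), (3, 6)]
```

The repeated "    return 1" lines are ignored as anchors, so the diff lines up on the unique function headers instead of on whichever repeated line happens to match first.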
The tool
For quick text comparison without setting up a local tool or uploading files to a third-party service, I built a diff checker that runs entirely in the browser. Paste in two texts, see the differences highlighted side by side or inline.
I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.