For many projects, it is important to be able to figure out what changed between two files or objects. This comes up when you're recording edits, making incremental changes, or working collaboratively. In this post, I will describe how to do this with the diff command in the Bash shell and with the difflib Python library.
Bash diff Command
The Bash shell gives you access to the diff command, which compares two files (or directories) directly. The basic format is simple: diff <file1> <file2>. This prints a "diff" (also called a "patch") describing the changes needed to turn file1 into file2. (Sidenote: if the files are identical, there is no output. If the files differ and are not text files, diff only reports that they differ. In every case, the exit status is 0 if the files are identical and 1 if they are not.)
Options
There are many useful options you can add to the diff command depending on what you need. Some include:
- --ignore-space-change: ignore changes in the amount of whitespace
- --ignore-blank-lines: ignore changes that only add or remove blank lines
- --ignore-case: ignore differences in case
- --minimal: "Try hard to find a smaller set of changes" (from the man page)
- --recursive: when diffing directories, compare their sub-directories recursively
- --side-by-side: output the changes in two columns, which is much more human-readable
- --suppress-common-lines: don't output lines that are identical (in side-by-side output)
Reading the output
Let's use the following example: you have two text files, a.txt and b.txt.
a.txt is:
This is line 1
This is line 2
This is line 3
This is line 4
b.txt is:
This is a new line
This is line 1
This is line 2 modified
This is line 3
If we run the command diff --minimal a.txt b.txt, we get the output:
0a1
> This is a new line
2c3
< This is line 2
---
> This is line 2 modified
4d4
< This is line 4
There are three sections of this output. In each one, lines starting with < come from a.txt and lines starting with > come from b.txt.
0a1
> This is a new line
The a in the first line means that lines are added. Here, 0a1 means that line 1 of b.txt is added after line 0 of a.txt (that is, at the very start of the file).
2c3
< This is line 2
---
> This is line 2 modified
The c means that this is a change. Here, line 2 of a.txt is replaced by line 3 of b.txt.
4d4
< This is line 4
The d means that lines are deleted. Here, line 4 of a.txt is deleted; the 4 after the d is the line of b.txt where it would have appeared had it not been deleted.
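To make the structure of these change commands concrete, here is a minimal Python sketch that splits a command such as 2c3 into its action and its two line ranges. The function name parse_change_command and the dictionary keys are made up for this illustration, and the sketch only covers the a, c, and d commands described above.

```python
import re

# Change commands look like "0a1", "2c3", or "5,7d3":
# <line range in file 1> <a|c|d> <line range in file 2>.
COMMAND_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

ACTIONS = {"a": "add", "c": "change", "d": "delete"}


def parse_change_command(line):
    """Split a change command into its action and the two line ranges."""
    match = COMMAND_RE.match(line.strip())
    if match is None:
        return None  # a "<", ">", or "---" line, not a change command
    start1, end1, action, start2, end2 = match.groups()
    return {
        "action": ACTIONS[action],
        # A single line number is treated as a one-line range.
        "file1_lines": (int(start1), int(end1 or start1)),
        "file2_lines": (int(start2), int(end2 or start2)),
    }


for command in ["0a1", "2c3", "4d4"]:
    print(command, parse_change_command(command))
```

Lines that are not change commands return None, so the same function can be used to pick the commands out of a full diff output.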
difflib Python Library
However, if you want to do this in your code, you can't just use the bash shell. In Python, one way to do this is with the difflib library. difflib is part of the Python Standard Library, so you don't need to install anything; just add import difflib to your file. There are many functions in this library, but the two we will go over are unified_diff and Differ.compare. We will use them to compare two strings:
a:
This is line 1
This is line 2
This is line 3
This is line 4
and b:
This is a new line
This is bine 1
This is 2a
This is line 3
Note that both strings end with a newline; this is important. If you want to compare two strings that don't end in a newline, you will have to add one before using these functions.
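Here is a small sketch of why the trailing newline matters. The helper to_lines is a hypothetical name, not part of difflib; it just appends the missing newline before splitting. The strings "one\ntwo" and "one\n2" are throwaway examples.

```python
import difflib


def to_lines(text):
    """Split text into lines that all end in a newline (hypothetical helper)."""
    if text and not text.endswith("\n"):
        text += "\n"
    return text.splitlines(keepends=True)


# Without the helper, the last line of each string has no newline, so the
# last two lines of the diff run together when the output is joined.
raw = difflib.unified_diff("one\ntwo".splitlines(keepends=True),
                           "one\n2".splitlines(keepends=True))
print("".join(raw))

# With the helper, every diff line ends in its own newline.
fixed = difflib.unified_diff(to_lines("one\ntwo"), to_lines("one\n2"))
print("".join(fixed), end="")
```

In the first printout the removed and added lines end up glued together on one line; in the second they are printed separately.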
unified_diff
unified_diff is the simpler of the two functions, but does not give as much detail as the other one. The format is simple: difflib.unified_diff(list1, list2). The function takes two lists of strings, with each string ending in a newline. You can easily turn a string into such a list by calling <string>.splitlines(keepends=True). The function returns a generator, so you will have to turn it into a list or iterate through it immediately. For this example, we will use the following code:
diffs = difflib.unified_diff(a.splitlines(keepends=True), b.splitlines(keepends=True))
for line in diffs:
    print(line, end='')
(The end='' is needed because each line of the output already ends in a newline.)
Reading the output
For the above example, the output will look like:
---
+++
@@ -1,4 +1,4 @@
-This is line 1
-This is line 2
+This is a new line
+This is bine 1
+This is 2a
 This is line 3
-This is line 4
To read this, start with the @@ line, which describes the part of each string being shown. Here, -1,4 and +1,4 mean that the section starts at line 1 and covers 4 lines in both strings. Lines starting with - are removed from the first string, lines starting with + are added from the second string, and lines starting with a space are in both. As you can see, this is very compact but not very detailed, as it only works by adding and deleting whole lines.
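For reference, here is a self-contained version of the example that can be run as-is. It also passes the optional fromfile and tofile arguments of unified_diff to label the two header lines; "a.txt" and "b.txt" are placeholder labels chosen for this sketch, and no files are actually read.

```python
import difflib

a = "This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n"
b = "This is a new line\nThis is bine 1\nThis is 2a\nThis is line 3\n"

# fromfile/tofile fill in the otherwise empty "---" and "+++" header lines.
patch_lines = list(difflib.unified_diff(
    a.splitlines(keepends=True),
    b.splitlines(keepends=True),
    fromfile="a.txt",
    tofile="b.txt",
))

# Skip the two header lines, then sort the rest by their first character.
removed = [line[1:] for line in patch_lines[2:] if line.startswith("-")]
added = [line[1:] for line in patch_lines[2:] if line.startswith("+")]

print("".join(patch_lines), end="")
print("removed:", len(removed), "added:", len(added))
```

Turning the generator into a list first makes it possible to both print the patch and inspect it afterwards, since a generator can only be iterated once.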
Differ.compare
Differ.compare gives much more detailed results than unified_diff. Like unified_diff, it takes lists of strings, so you have to use splitlines first. You also have to create a Differ object, since compare is a method of that class. For this example, we will use the following code:
d = difflib.Differ()
results = d.compare(a.splitlines(keepends=True), b.splitlines(keepends=True))
for line in results:
    print(line, end='')
Reading the output
For the above example, the output will look like:
+ This is a new line
- This is line 1
?         ^
+ This is bine 1
?         ^
- This is line 2
?         -----
+ This is 2a
?          +
  This is line 3
- This is line 4
Like unified_diff, lines starting with - are only in the first string (removed), and lines starting with + are only in the second string (added). However, we now also have lines starting with ?. These lines describe the changes within the line directly above them. A ^ marks a character that is replaced: the character above the ^ in the - line becomes the character above the ^ in the + line. So in the case above, the l in This is line 1 is replaced by a b. A run of - marks characters that are deleted from the line. Here, line (and the space after it) is removed from This is line 2. A + marks characters that are added to the line. Here, an a is added to the end, giving This is 2a.
As you can see, this is much more complicated than unified_diff, but now you can see the exact changes you need to make to each line and where, instead of just removing and adding whole lines.
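If you only need to know which lines belong to which string, one option is to read the two-character code at the start of each line and skip the ? lines. The sketch below does this for the same a and b; the labels dictionary and its wording are invented for this illustration.

```python
import difflib

a = "This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n"
b = "This is a new line\nThis is bine 1\nThis is 2a\nThis is line 3\n"

differ = difflib.Differ()
results = list(differ.compare(a.splitlines(keepends=True),
                              b.splitlines(keepends=True)))

# Every line starts with a two-character code: "- ", "+ ", "  ", or "? ".
labels = {"- ": "only in a", "+ ": "only in b", "  ": "in both", "? ": "hint"}

for line in results:
    code, text = line[:2], line[2:]
    if code == "? ":
        continue  # hint lines only annotate the line above them
    print(f"{labels[code]:>9}: {text}", end="")
```

Dropping the ? lines loses the character-level detail, so this is only useful when whole-line information is enough.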
Sources:
https://ss64.com/bash/diff.html
https://docs.python.org/3/library/difflib.html#differ-objects