DEV Community

Marc Katz

Finding Differences Two Ways: Bash and Diff

For many projects, it is important to be able to work out what changed between two files or objects, whether you're recording edits, making incremental changes, or working collaboratively. In this post, I will describe how to do this with the diff command in the Bash shell and with the difflib Python library.

Bash diff Function

In the Bash shell, the diff command directly compares two files (or directories). The basic format is simple: diff <file1> <file2>. This prints a "diff" or "patch" describing the changes needed to turn file1 into file2. (Sidenote: if the files are identical, there is no output. If the files differ and are not text files, diff only reports that they differ instead of listing the changes. In every case the exit status is 0 if the files are identical and 1 if they are not.)
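If you need the same check from a script, the exit status is the part to rely on. The following is a minimal sketch, assuming a diff executable is available on the PATH; the helper name files_differ is made up for illustration and is not part of any library.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def files_differ(path_a, path_b):
    """Return True if diff reports differences, False if the files match.

    Hypothetical helper: relies on diff's exit status
    (0 = identical, 1 = different, anything higher = error).
    """
    result = subprocess.run(
        ["diff", str(path_a), str(path_b)],
        capture_output=True,
        text=True,
    )
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.returncode == 1


if shutil.which("diff"):  # only run the demo when diff is installed
    with tempfile.TemporaryDirectory() as tmp:
        a_path = Path(tmp, "a.txt")
        b_path = Path(tmp, "b.txt")
        a_path.write_text("This is line 1\n")
        b_path.write_text("This is line 1 modified\n")
        print(files_differ(a_path, a_path))  # a file compared with itself
        print(files_differ(a_path, b_path))  # two different files
```

Checking the exit status rather than the printed text keeps the check independent of the output format you pick with the options below.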

Options
There are many useful options to add to the diff command depending on what you need. Some include:

  • --ignore-space-change: ignore changes in the amount of whitespace
  • --ignore-blank-lines: ignore changes that only add or remove blank lines
  • --ignore-case: ignore changes in case
  • --minimal: "Try hard to find a smaller set of changes" - from the Man page
  • --recursive: when diffing directories, search through their sub-directories recursively
  • --side-by-side: output the changes in two columns, which makes them much more human-readable
  • --suppress-common-lines: in side-by-side output, don't print lines that are identical in both files

Reading the output
Let's use the following example: You have two text files, a.txt and b.txt.
a.txt is:

This is line 1
This is line 2
This is line 3
This is line 4


b.txt is:

This is a new line
This is line 1
This is line 2 modified
This is line 3


If we run the following command: diff --minimal a.txt b.txt, we get the output:

0a1
> This is a new line
2c3
< This is line 2
---
> This is line 2 modified
4d4
< This is line 4

There are three sections of this output.

0a1
> This is a new line

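The a in the first line means that the following lines have to be added. Lines starting with > come from b.txt, and lines starting with < come from a.txt. In this case, 0a1 means that line 1 of b.txt is added after line 0 of a.txt, that is, at the start of the file.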

2c3
< This is line 2
---
> This is line 2 modified

The c means that this is a change. Here, line 2 in a.txt is replaced by line 3 in b.txt.

4d4
< This is line 4

The d means that there is a delete. Here, line 4 in a.txt is deleted. The number after the d is the line in b.txt after which the deleted line would have appeared.
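To make the three kinds of header lines concrete, here is a small sketch that splits headers such as 0a1, 2c3, and 4d4 into their parts. The function name parse_change_command and the regular expression are my own illustration, not something diff provides, and the sketch only handles the default output format shown above.

```python
import re

# Header lines look like '0a1', '2c3', or '5,7d4':
# a line range in the first file, an operation, a line range in the second.
CHANGE_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")
OPERATIONS = {"a": "add", "c": "change", "d": "delete"}


def parse_change_command(line):
    """Split a header such as '2c3' into (range_a, operation, range_b).

    Each range is a (start, end) tuple of line numbers. Returns None for
    lines that are not headers (for example '<', '>' or '---' lines).
    """
    match = CHANGE_COMMAND.match(line.strip())
    if match is None:
        return None
    start_a, end_a, op, start_b, end_b = match.groups()
    range_a = (int(start_a), int(end_a or start_a))
    range_b = (int(start_b), int(end_b or start_b))
    return range_a, OPERATIONS[op], range_b


diff_output = """0a1
> This is a new line
2c3
< This is line 2
---
> This is line 2 modified
4d4
< This is line 4
"""

for line in diff_output.splitlines():
    parsed = parse_change_command(line)
    if parsed is not None:
        print(line, "->", parsed)
```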

difflib Python Library

However, if you want to do this inside your own code, the Bash shell is not always an option. In Python, one way to do this is to use the difflib library. difflib is part of the Python Standard Library, so you don't need to install anything; just add import difflib to your file. There are many functions in this library, but the two we will go over are unified_diff and Differ.compare. We will use them to compare two strings:
a:

This is line 1
This is line 2
This is line 3
This is line 4


and b:

This is a new line
This is bine 1
This is 2a
This is line 3


Note that these strings end with a newline; this is important. If you want to compare two strings that don't end in a newline, you will have to add one before using these functions.
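One way to handle a missing final newline is to normalize the string before splitting it. This is only a sketch; to_diff_lines is a hypothetical helper name, not a difflib function.

```python
def to_diff_lines(text):
    """Split text into a list of lines that each end with a newline.

    Hypothetical helper: appends the missing final newline so the last
    line is formatted like every other line.
    """
    if text and not text.endswith("\n"):
        text += "\n"
    return text.splitlines(keepends=True)


print(to_diff_lines("This is line 1\nThis is line 2"))
```

Without this step, a last line with no newline is treated as different from the same line with one, and it runs into the next line when the results are printed with end=''.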

unified_diff
unified_diff is the simpler of the two functions, but it does not give as much detail as the other one. The format is simple: difflib.unified_diff(list1, list2). The function takes two lists of strings, each string ending in a newline. You can easily turn your string into such a list by running <string>.splitlines(keepends=True). The function returns a generator, so you will have to turn it into a list or iterate through it immediately. For this example, we will use the following code:

import difflib

diffs = difflib.unified_diff(a.splitlines(keepends=True), b.splitlines(keepends=True))
for line in diffs:
    print(line, end='')

(The end='' is needed because each line of the output already ends in a newline)
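unified_diff also accepts a few optional keyword arguments. The sketch below uses fromfile, tofile, and n from the difflib documentation; the labels a.txt and b.txt are just assumed names for this example, and the generator is joined into a single string instead of being printed line by line.

```python
import difflib

a = "This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n"
b = "This is a new line\nThis is bine 1\nThis is 2a\nThis is line 3\n"

diff_lines = difflib.unified_diff(
    a.splitlines(keepends=True),
    b.splitlines(keepends=True),
    fromfile="a.txt",  # label for the '---' header line (assumed name)
    tofile="b.txt",    # label for the '+++' header line (assumed name)
    n=1,               # lines of context around each change (default is 3)
)

# Every line already ends in a newline, so joining gives the full patch text.
patch_text = "".join(diff_lines)
print(patch_text, end="")
```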

Reading the output
For the above example, the output will look like:

--- 
+++ 
@@ -1,4 +1,4 @@
-This is line 1
-This is line 2
+This is a new line
+This is bine 1
+This is 2a
 This is line 3
-This is line 4

To read this, the first part (the line between the @@ markers) describes which lines of each string we are looking at. Here, -1,4 and +1,4 mean that the section starts at line 1 and covers 4 lines in both strings. Lines starting with - are deleted from the first string, lines starting with + are added from the second string, and lines starting with a space are unchanged. As you can see, this is very compact, but not very detailed, as it only works with adding and deleting whole lines.
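Because every changed line carries a + or - prefix, a unified diff is easy to summarize. The sketch below counts added and removed lines for the two example strings. It skips the first two output lines by position, since the --- and +++ headers would otherwise be counted as changes.

```python
import difflib

a = "This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n"
b = "This is a new line\nThis is bine 1\nThis is 2a\nThis is line 3\n"

lines = list(
    difflib.unified_diff(a.splitlines(keepends=True), b.splitlines(keepends=True))
)

# The first two lines are the '---' and '+++' file headers, not changes.
body = lines[2:]
added = sum(1 for line in body if line.startswith("+"))
removed = sum(1 for line in body if line.startswith("-"))

print(f"{added} added, {removed} removed")
```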

Differ.compare
Differ.compare gives much more detailed results than unified_diff. Like unified_diff, it takes lists of strings, so you have to use splitlines first. You also have to create a Differ object, since compare is a method of that class. For this example, we will use the following code:

d = difflib.Differ()
results = d.compare(a.splitlines(keepends=True), b.splitlines(keepends=True))
for line in results:
    print(line, end='')

Reading the output
For the above example, the output will look like:

+ This is a new line
- This is line 1
?         ^
+ This is bine 1
?         ^
- This is line 2
?         -----
+ This is 2a
?          +
  This is line 3
- This is line 4

Like unified_diff, lines starting with - are removed from the first string, and lines starting with + are added from the second string. However, we now have these ? lines. These lines further describe the changes within each line. If there is a ^ in a ? line, the character above it is the one that changes: the ^ under the - line marks the old character, and the ^ under the + line marks its replacement. So in the case above, the l in This is line 1 is replaced by a b. If there are -s, the characters above them are deleted from the line. Here, "line " is removed from This is line 2. If there are +s, the characters above them are added to the line. Here, an a is added after the 2, giving This is 2a.
As you can see, this is much more complicated than unified_diff, but now you can see the exact changes you need to make to each line and where, instead of just completely removing and adding whole lines.
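The Differ output keeps every line of both inputs, so it can also be processed further. As a rough sketch, the code below drops the ? hint lines and then uses difflib.restore to rebuild both original strings from the comparison result; the variable names are my own.

```python
import difflib

a = "This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n"
b = "This is a new line\nThis is bine 1\nThis is 2a\nThis is line 3\n"

delta = list(
    difflib.Differ().compare(a.splitlines(keepends=True), b.splitlines(keepends=True))
)

# '? ' lines are hints for human readers; they are in neither input string.
changes = [line for line in delta if not line.startswith("? ")]
print("".join(changes), end="")

# restore() rebuilds one of the two inputs from the full delta:
# 1 selects the first sequence, 2 selects the second.
original = "".join(difflib.restore(delta, 1))
updated = "".join(difflib.restore(delta, 2))
print(original == a, updated == b)
```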

Sources:
https://ss64.com/bash/diff.html
https://docs.python.org/3/library/difflib.html#differ-objects
