Getting Granular with Git Diff

Noel Worden ・4 min read

This Week I Learned (15 Part Series)

1) Beginning of a Blog Series 2) Accessing localhost within a Docker Image 3 ... 13 3) Identifying and Removing Hidden Characters 4) Getting Granular with Git Diff 5) Tweaking Logger Outputs on the Fly 6) Keeping Your Hands on the Keyboard: A Few Bash and Git Shortcuts 7) Breaking Down Elixir's `with` Expression 8) How To Write A Custom Elixir Schema Validation 9) The Handful of Commands I Use When Interactive Rebasing with Vim 10) Got 5 Minutes to Spare? Why Not Set up a Custom Commit Message Template? 11) Improving Your Commit Message with the 50/72 Rule 12) Synchronously Looping Over Two Collections in Elixir 13) How to Utilize Enum.any?, with a Refactoring Twist! 14) A Rabbit Hole of Decimal Formatting 15) More Custom Validation Work: Manipulating a Keyword List To and From a Map

This week I learned a little bit more about git. The TL;DR: If the standard git diff command isn't outputting enough detail, try git diff --word-diff or git diff --word-diff-regex=<your regex here>

Git is a tool that, for me, falls into the category of things I am constantly learning. Of course, this is true of just about everything related to software engineering, but with git it feels especially so. There are countless workflows, tools, and GUIs; everyone's interactions with git are unique. This last week I had to manually edit a column of data in a CSV. An example from that column looked something like this:

508585957G

There were a bunch of integers ending in a G, which I needed to drop. The client had been adding the G to make the number unique, but with the app we are building for them, we have other ways to ensure uniqueness. If we drop the G the data becomes cleaner overall, and JOINing tables on that column becomes far less complicated. On the surface, the task of dropping the appended letter seemed easy enough, but what was causing me problems was confirming that I wasn't affecting any other columns in the fairly large CSV (3000+ rows, 20+ columns). After I ran a find-and-replace to drop the G from all the data in that column, I ran a basic git diff on the file, and my output was this:

[Screenshot: Entire Line Git Diff]

Basically, the comparison was unusable. Because of how the raw CSV file was being diffed by git, and because I changed an attribute in one field, that whole line had changed as git saw it.

Git was representing my one character change as the entire line of the raw CSV changing.

Being overly confident in my find-and-replace abilities, I added the modified CSV to a commit and pushed up the PR. It didn't take too long for a teammate to add a comment to the PR, showing where I had inadvertently altered the data in another column. His comment meant I had two tasks ahead of me: make my find-and-replace more granular, and find a method to get a more detailed diff between the two files. I knew I could figure out the find-and-replace, but I was a bit perplexed by how detailed his diff was when he included a screenshot in his PR comment.
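For the first of those tasks, one way to scope a find-and-replace to a single column can be sketched with awk. This is only an illustration, not the exact command used on the real file: it assumes a header row and a simple CSV with no quoted fields containing commas, and the function name, column number, and sample rows are all made up.

```shell
#!/bin/sh
# Hypothetical helper: remove a trailing "G" from a single CSV column.
# Assumes a header row and no quoted fields that contain commas.
strip_trailing_g() {
  # $1 is the 1-based number of the column to clean (illustrative).
  awk -F, -v OFS=, -v col="$1" '
    NR == 1 { print; next }           # pass the header through untouched
    { sub(/G$/, "", $col); print }    # edit only the chosen column
  '
}

# Made-up sample data: only the second column should lose its "G".
printf 'id,account,region\n1,508585957G,OG\n2,12345G,GA\n' | strip_trailing_g 2
```

Because the substitution is applied to one field rather than the whole line, values such as OG or GA in other columns are left alone.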

After a quick Slack discussion, I had a "well, duh, of course" moment. Of course there would be options you could pass to git diff; there are almost always options to pass into commands to get the desired output. But for whatever reason, I just was not in that train of thought, and I was taking git diff at face value.

I'll get to the actual git options in a minute, but what surprised me is that GitHub displays a more detailed diff than the default git diff I'm running on my machine. That meant if I were to add the changes as a new commit and push that commit up to the PR, the diff view between those two SHAs would be more granular than what I was getting in my terminal. This is that same file diffed with GitHub:

[Screenshot: GitHub Diff]

Of course, there are ways to get a similar output locally. I went with the option --word-diff-regex=, which is covered in the git diff documentation. I still can't come up with regexes out of thin air, but luckily the web is full of them. Within the first few search results I found a solution so elegant it honestly made me laugh out loud. This diff command:

git diff --word-diff-regex=.

Produced this output:

[Screenshot: First Regex]
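To try this without touching real data, a throwaway repository is enough. The following is a minimal sketch: the directory, file name, and CSV rows are made up, and --no-pager and --no-color are only there so the output stays plain text when it is captured or piped.

```shell
#!/bin/sh
# Build a throwaway repository with made-up data to try the command safely.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init -q .

# Stage an "original" CSV, then drop the trailing G in the working copy.
printf 'id,account,region\n1,508585957G,GA\n' > data.csv
git add data.csv
printf 'id,account,region\n1,508585957,GA\n' > data.csv

# Default diff: the whole row is shown as removed and re-added.
git --no-pager diff --no-color

# Character-level diff: every character is treated as its own "word".
git --no-pager diff --no-color --word-diff-regex=.
```

With color turned off, git falls back to bracket markers, so the second command should show the changed row along the lines of 1,508585957[-G-],GA instead of a removed line and an added line.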

Honestly, I have never come across such a simple regex that worked so well! According to the regex helper tool regexer.com, the period (or dot) matches any character except line breaks. So using it as the regex basically forces the diff output down to the granularity of each character. For me that was great, but my colleague is always looking for more than great, and he passed along a few other regex options that were as good, if not better. The first was this:

git diff --word-diff-regex="[^,]+"

Which produced this:

[Screenshot: Second Regex]
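The field-level regex is handy for spotting exactly the kind of collateral damage described earlier. Here is a small sketch, again in a throwaway repository with invented column names and values, where a careless "remove every G" edit also damages a second column:

```shell
#!/bin/sh
# Throwaway repository with made-up data; all names here are illustrative.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init -q .

# Original row, staged so git has something to compare against.
printf 'id,account,city,tier\n1,508585957G,Boulder,GOLD\n' > data.csv
git add data.csv

# A careless "remove every G" edit: the tier column is damaged too.
printf 'id,account,city,tier\n1,508585957,Boulder,OLD\n' > data.csv

# Treat each comma-separated field as one word. The single quotes keep
# the shell from interpreting the brackets.
git --no-pager diff --no-color --word-diff=plain --word-diff-regex='[^,]+'
```

Each changed field is reported as a whole, so both the intended edit to the account field and the accidental edit to the tier field should stand out, while the untouched id and city fields are left unmarked.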

And there's also this:

git diff --word-diff=color --word-diff-regex="([A-Z]+|[0-9]+)"

Which produced this output:

[Screenshot: Third Regex]
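The same regex can also be pointed at two standalone files rather than a tracked file. The sketch below assumes a before and after copy of the CSV (both file names and their contents are invented) and uses --no-index, which compares two paths directly. It swaps --word-diff=color for --word-diff=plain so the result is readable without terminal colors.

```shell
#!/bin/sh
# Two made-up files standing in for the CSV before and after the cleanup.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
printf 'id,account,region\n1,508585957G,GA\n' > before.csv
printf 'id,account,region\n1,508585957,GA\n' > after.csv

# --no-index compares two paths directly, no repository required.
# git exits non-zero when the files differ, hence the "|| true".
# --word-diff=plain uses [-removed-] and {+added+} markers instead of colors.
result=$(git --no-pager diff --no-index --no-color --word-diff=plain \
  --word-diff-regex='([A-Z]+|[0-9]+)' before.csv after.csv || true)
printf '%s\n' "$result"
```

Here runs of digits and runs of capital letters count as separate words, so the trailing G is split from the number in front of it and should be the only piece marked as removed.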

I'm not going to dive into each of those regexes; regexer.com will do a much better job of explaining them. The takeaway is that you can get very detailed git diff comparisons, and --word-diff-regex= is one way to accomplish that. Also, scrolling down the git diff documentation is a nice reminder that the available options for standard git commands are vast.

I'm happy with the output I got from the git-provided option. It's also worth mentioning that there are a few packages and git config add-ons that can help with more granular diffing. These are the three options that came up in conversation with the team:

diffr
wdiff
diff-highlight


This post is part of an ongoing This Week I Learned series. I welcome any critique, feedback, or suggestions in the comments.

Posted on Apr 6 by:

Noel Worden

@noelworden

Software Engineer in Boulder, CO - Writing code and getting strategically lost in the mountains
