Git: Tracking Down a Committed File, with a Real-Time Ah-Ha Moment!

Noel Worden ・5 min read

This Week I Learned (20 Part Series)

1) Beginning of a Blog Series 2) Accessing localhost within a Docker Image 3 ... 18 3) Identifying and Removing Hidden Characters 4) Getting Granular with Git Diff 5) Tweaking Logger Outputs on the Fly 6) Keeping Your Hands on the Keyboard: A Few Bash and Git Shortcuts 7) Breaking Down Elixir's `with` Expression 8) How To Write A Custom Elixir Schema Validation 9) The Handful of Commands I Use When Interactive Rebasing with Vim 10) Got 5 Minutes to Spare? Why Not Set up a Custom Commit Message Template? 11) Improving Your Commit Message with the 50/72 Rule 12) Simultaneously Looping Over Two Collections in Elixir 13) How to Utilize Enum.any?, with a Refactoring Twist! 14) A Rabbit Hole of Decimal Formatting 15) More Custom Validation Work: Manipulating a Keyword List To and From a Map 16) Stepped Away for a Long Weekend 17) So Many Ways to Update a Map with Elixir! 18) Git: Keeping Files from Being Tracked Without .gitignore 19) Git: Tracking Down a Committed File, with a Real-Time Ah-Ha Moment! 20) Setting up `.iex.exs` to Speed up Your Elixir Debugging (and How to Include It in a Docker Image)

I picked up another git nugget this week. The TL;DR is that you can see what files were included in a commit with:

git log --name-only

If you'd like the whole story, I'll need to set the scene a bit before I get into the good stuff. When I need to reset my database on this project, it's not repopulated with a seed file; instead, we use a copy of the production database. This gives us the most accurate data to query. Because of this, and how I set up my commands through Docker, I'll temporarily have a copy of the production database in my local repo, usually called prod_dump.

This file shows up as an untracked file in git, and every once in a while my muscle memory gets the best of me: I run a git add . by accident and that prod_dump file gets included in the commit. When it's happened before, I've reset, removed it, re-committed, and moved on with my day. But on this occasion it wasn't that simple. I was working through some PR comments when it happened and had run a fixup on that particular commit, squashing it into another commit. By the time I realized what I had done, I didn't know which commit contained the addition of prod_dump.

It's a big file, ~500 MB. What clued me in to what had happened was the error I got when I tried to push up the reworked commits:

File prod_dump is 553.85 MB;
this exceeds GitHub's file size limit of 100.00 MB

I had run fixup on four or five files in one go, and I couldn't think of an obvious way to trace it back to which commit contained the huge file. I also couldn't use GitHub, because now the changes were too big to push up. My first attempt was to simply delete the now-added prod_dump file and commit that. I hoped git would treat adding and then removing the file as if it had never been added in the first place. I was wrong, but not surprised. Git still records that the file was added and then deleted, meaning the push to GitHub would still include the file and be rejected.
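To make that last point concrete, here is a minimal sketch in a throwaway repo. The repo location, identity, and the tiny stand-in for prod_dump are all made up for illustration; the git commands themselves are standard.

```shell
#!/bin/sh
set -e

# Throwaway repo; the identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Commit a small stand-in for the large dump, then delete it in a second commit.
echo "pretend this is 500 MB" > prod_dump
git add prod_dump
git commit -q -m "add prod_dump by accident"
git rm -q prod_dump
git commit -q -m "remove prod_dump"

# The file is gone from the working tree...
test ! -e prod_dump && echo "prod_dump is gone from the working tree"

# ...but history still records both the add (A) and the delete (D)...
history=$(git log --name-status --format='%s' -- prod_dump)
echo "$history"

# ...and the file's contents are still reachable from HEAD, so a push would send them.
reachable=$(git rev-list --objects HEAD | grep -c 'prod_dump' || true)
echo "objects named prod_dump still reachable: $reachable"
```

The delete commit only changes what the latest snapshot looks like. The earlier commit still points at the file's contents, which is why the push size doesn't shrink.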

I ran git log, but it only showed the commit, author, date, and message. For some reason I went to VSCode instead of a search engine and started looking around. I don't usually use the VSCode interface for git, but I thought it might have something useful. I right-clicked on the prod_dump file and saw an option labeled Open Timeline.

Clicking on that did pretty much exactly what I expected, showing the git history of that file. Knowing which commit contained the file, I could take the commit in which I had deleted the file and squash it into the commit where the file was added. As far as git was concerned, the addition of prod_dump never happened, and I could push the changes to GitHub.
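For anyone who wants to try that squash without risking a real branch, here is a rough sketch in a scratch repo. It assumes the fix is done with git commit --fixup and an autosquash rebase, which is one way to do it and not necessarily the exact keystrokes I used; the file names and identity are placeholders.

```shell
#!/bin/sh
set -e

# Scratch repo; the identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Three commits; the middle one carries prod_dump by accident.
echo a > test_file_a.md && git add . && git commit -q -m "first commit"
echo b > test_file_b.md
echo "pretend this is huge" > prod_dump
git add . && git commit -q -m "second commit"
echo c > test_file_c.md && git add . && git commit -q -m "third commit"

# Suppose the log or the Timeline view pointed at the second commit.
bad=$(git rev-parse HEAD~1)

# Delete the file and record the deletion as a fixup of that commit.
git rm -q prod_dump
git commit -q --fixup="$bad"

# Replay history from the bad commit's parent; --autosquash folds the fixup in.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$bad"~1

# No commit on the branch mentions prod_dump any more.
remaining=$(git log --format= --name-only -- prod_dump)
count=$(git rev-list --count HEAD)
echo "commits: $count, prod_dump entries: ${remaining:-none}"
```

The old commits linger in the local reflog until they are garbage-collected, but they are no longer part of the branch, so they don't go along with a push.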

Having solved the problem, but not liking the idea that I relied on a GUI to do it, I turned to the search engine. I was able to find the command in short order:

git log --name-only

This command presents a log, like before, but now includes the names of the files changed in each commit. It's covered in the git log docs.

The exact commits and PR this happened with have long since been merged in, but I set up a quick example so I could walk through it again with some visuals.

commit e1efa840200dd7b2f56486e0b31ef4d4958c310f (HEAD -> nw/testing-commits)
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:33:48 2020 -0600

    fourth commit, with test_file_c.md

test_file_c.md

commit 64bf0ba2354f9b7ca228399d0357b747d3666d4b
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:33:20 2020 -0600

    third commit, with test_file_b.md

test_file_b.md

commit 603a6dc4334f68b13313e73c6ef024eb1ffdaaf7
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:33:05 2020 -0600

    second commit, prod_dump

prod_dump

commit 10dad7c90038d182f922f32101eb982b916b5060
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:32:16 2020 -0600

    first commit, with test_file_a.md

test_file_a.md

In the above output from git log --name-only, there are four commits, each committing one file, with prod_dump added in the second commit. If I were to replicate the fixup mistake I described earlier, I would get something that looks like this:

commit 3b8e5e2ae46f5ee704452acb47a2ee20504a3e38 (HEAD -> nw/testing-commits)
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:33:48 2020 -0600

    fourth commit, with test_file_c.md

test_file_c.md

commit 10b5cca013f1163871311a4f48428538a5200426
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:33:20 2020 -0600

    third commit, with test_file_b.md

test_file_b.md

commit c004e03abf6420d21a40ee1679a1aad57215fe9e
Author: Noel Worden <noelworden@gmail.com>
Date:   Fri Jul 24 12:32:16 2020 -0600

    first commit, with test_file_a.md

prod_dump
test_file_a.md

Now, git log --name-only shows that prod_dump is added in the first commit.
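On a longer history, scanning the whole log for one file name gets tedious. A small sketch of narrowing it down, assuming the same made-up history as above rebuilt in a scratch repo: passing a path after -- limits the log to commits that touched that path, and --diff-filter=A keeps only the commit that added it.

```shell
#!/bin/sh
set -e

# Scratch repo; the identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Recreate the squashed history: prod_dump rides along with the first commit.
echo a > test_file_a.md
echo "pretend this is huge" > prod_dump
git add . && git commit -q -m "first commit, with test_file_a.md"
echo b > test_file_b.md && git add . && git commit -q -m "third commit, with test_file_b.md"
echo c > test_file_c.md && git add . && git commit -q -m "fourth commit, with test_file_c.md"

# Only commits that touched prod_dump (the file list is limited to that path too).
git --no-pager log --name-only -- prod_dump

# Narrower still: just the subject of the commit that added (A) the file.
added_in=$(git log --diff-filter=A --format='%s' -- prod_dump)
echo "prod_dump was added in: $added_in"
```

Swapping '%s' for '%H' gives the full hash instead of the subject, which is handy as the target for a fixup.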

As I said earlier, this can also be achieved in VSCode. If you go to the Explorer view in the side bar and right-click the file, there are several options:

[Screenshot: VSCode File Tree with Menu]

Click Open Timeline and at the bottom of the side bar a Timeline will open, in this case showing that prod_dump was added with the first commit.

[Screenshot: VSCode Timeline View]

It's funny how it has taken me this long to get to a point where I needed to track which commit contained a particular file. I guess this was the first time I didn't have GitHub to help with the process. There's always one more thing to learn when it comes to git.

The Ah-Ha Moment
About two minutes into starting this post I realized that I could save myself from this kind of hassle in the future if I just have git ignore the prod_dump file. I can do this either through the .gitignore file or, better yet, referring to my post from last week, by adding it to my personal git exclude file. Should have done it months ago.
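A quick sketch of what that looks like, assuming the personal exclude file is the repository-local .git/info/exclude (a global file set through core.excludesFile would behave much the same). The scratch repo and the stand-in file are only for illustration.

```shell
#!/bin/sh
set -e

# Scratch repo so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# A stand-in for the dump; before excluding it, git reports it as untracked.
echo "pretend this is huge" > prod_dump
before=$(git status --porcelain)
echo "before: $before"

# .git/info/exclude uses the same patterns as .gitignore but is never committed or shared.
mkdir -p .git/info
echo "prod_dump" >> .git/info/exclude

# Now even a stray `git add .` has nothing to pick up.
git add .
after=$(git status --porcelain)
echo "after: ${after:-clean}"

# check-ignore reports which file and rule matched.
rule=$(git check-ignore -v prod_dump)
echo "$rule"
```

Because the rule lives inside .git, it only protects this clone. A teammate with the same workflow would need their own entry, or the pattern would have to go in .gitignore.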


This post is part of an ongoing This Week I Learned series. I welcome any critique, feedback, or suggestions in the comments.



Noel Worden

@noelworden

Software Engineer in Boulder, CO - Writing code and getting strategically lost in the mountains
