Find Changes Between Two Git Commits Without Cloning

Eoin Shanaghy

Sometimes you want to find out what has changed in your Git repository without cloning the full repository. This usually happens in an automated build environment. When I used Jenkins, Travis CI or CircleCI, I had access to the cloned Git repository and could use git log, git ls-remote and git diff without any problem.

Other tools take a different approach; I am talking specifically about AWS CodePipeline. Instead of giving your build access to a cloned repository, CodePipeline passes it a snapshot of your code without the .git folder. This makes it impossible to check what has changed since a previous build, or even what changed in the commit that triggered the build. Some CI environments give you a "shallow clone" without the full Git history, leaving you with a similar challenge.
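Before a build script relies on git log or git diff, it can help to check what kind of checkout the job actually received. The following is a minimal sketch; the function name `repo_state` and its three labels are hypothetical, and it assumes Git 2.15 or later for `git rev-parse --is-shallow-repository`.

```shell
#!/bin/sh
# repo_state DIR: report what kind of checkout DIR holds (hypothetical helper).
#   none    -> no Git metadata at all (e.g. a source snapshot without .git)
#   shallow -> a Git repository with truncated history
#   full    -> a Git repository with its complete commit history
repo_state() {
  local dir="${1:-.}"
  if ! git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo none
  elif [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = "true" ]; then
    echo shallow
  else
    echo full
  fi
}

# Example: inspect the directory the build job starts in.
echo "checkout type: $(repo_state .)"
```

Only the "full" case supports comparing the current commit against an arbitrary older one; the other two cases need the commits to be fetched first.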

I wanted to run these kinds of checks to determine which microservices in our monorepo had changed, so I would know which ones to build and redeploy. This technique is described well in a Shippable blog post.
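Going from "changed files" to "changed microservices" is a small text transformation on a list of paths such as git diff --name-only prints. A minimal sketch, assuming a hypothetical monorepo layout where each service lives in its own folder under `services/`; the helper name `changed_services` is also hypothetical.

```shell
#!/bin/sh
# changed_services: read changed file paths (one per line) on stdin and print
# each distinct service folder they touch. Assumes the hypothetical layout
# services/<name>/...; paths outside services/ are ignored.
changed_services() {
  awk -F/ '$1 == "services" && NF > 2 { print $2 }' | sort -u
}

# Example input, in the format `git diff --name-only` produces.
# Prints:
# billing
# orders
printf '%s\n' \
  services/orders/handler.js \
  services/orders/package.json \
  services/billing/index.js \
  README.md | changed_services
```

A build script would then loop over this list and build and redeploy only those services.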

I looked at two options for finding the folders that had changed since the last successful deployment:

  1. Clone the full repository manually in a CodeBuild step
  2. Use the GitHub API to retrieve information about the commits

The first option was one I wanted to avoid. It meant cloning a potentially large and growing repository at the start of every build. A shallow clone would not be sufficient, as it would not capture the history of changes back to the previous release.
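The problem with a shallow clone can be reproduced locally. This sketch builds a throwaway repository with a hypothetical previous-release tag `v1.0.0` followed by two more commits, makes a depth-1 clone like the one many CI systems provide, and checks whether the release commit is available.

```shell
#!/bin/sh
set -eu
# Throwaway repository: a commit tagged as the previous release, then two more.
src=$(mktemp -d)
git init -q "$src"
commit() {
  git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}
commit "release"
git -C "$src" tag v1.0.0                 # hypothetical previous-release tag
release_sha=$(git -C "$src" rev-parse v1.0.0)
commit "feature A"
commit "feature B"

# A depth-1 clone, like the shallow checkout many CI systems provide.
clone="$(mktemp -d)/clone"
git clone -q --depth 1 "file://$src" "$clone"
echo "commits in shallow clone: $(git -C "$clone" rev-list --count HEAD)"

# The release commit is absent, so `git diff v1.0.0 HEAD` cannot run here.
if git -C "$clone" cat-file -e "${release_sha}^{commit}" 2>/dev/null; then
  echo "release commit present"
else
  echo "release commit missing"
fi
```

Deepening the clone until the release commit appears would work, but how deep to go depends on how many commits landed since the release, which is exactly the unbounded cost the first option had.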

The GitHub REST API includes a compare API and a list-commits API. The compare API is limited to 250 commits, so it couldn't be relied on. The list-commits API could work, but it means making multiple paged requests for a large amount of data just to get the changed paths. After a bit of trial and error, I abandoned the GitHub API approach.

After some further digging, I came across a Stack Overflow post that gave me a third option: fetch the two individual commits with git and compare them to determine the changed filenames. In this example, I'm using the public lodash/lodash repository. Suppose we want to compare the tag 4.0.0 with the HEAD of the master branch; the sequence of commands looks like this:

```shell
git init .                                                  # Create an empty repository
git remote add origin https://github.com/lodash/lodash.git  # Specify the remote repository (HTTPS needs no SSH key in CI)

git checkout -b base                                        # Create a branch for our base state

git fetch origin --depth 1 4.0.0                            # Fetch the single commit for the base of our comparison
git reset --hard FETCH_HEAD                                 # Point the local base branch to the commit we just fetched

git checkout -b target                                      # Create a branch for our target state

git fetch origin --depth 1 master                           # Fetch the single commit for the target of our comparison
git reset --hard FETCH_HEAD                                 # Point the local target branch to the commit we just fetched

git diff --name-only base target                            # Print a list of all files changed between the two commits
```
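The same comparison can be wrapped in a reusable function. This sketch is a variation on the commands above, not the exact sequence from the post: it fetches both commits in a single git fetch call, using explicit refspecs into a private `refs/compare/` namespace (an arbitrary, hypothetical choice), so no branch has to be created, checked out or reset. The function name `changed_files` is also hypothetical.

```shell
#!/bin/sh
set -eu

# changed_files REMOTE_URL BASE_REF TARGET_REF (hypothetical helper)
# Prints the paths that differ between two refs, fetching only those two
# commits (depth 1) into a scratch repository that is deleted afterwards.
changed_files() {
  local remote="$1" base="$2" target="$3" work
  work=$(mktemp -d)
  git init -q "$work"
  # One fetch, two refspecs: each remote ref lands under refs/compare/.
  git -C "$work" fetch -q --depth 1 --no-tags "$remote" \
    "$base:refs/compare/base" "$target:refs/compare/target"
  # git diff compares the two trees; it needs no history between them.
  git -C "$work" diff --name-only refs/compare/base refs/compare/target
  rm -rf "$work"
}

# Example against the repository used above:
# changed_files https://github.com/lodash/lodash.git 4.0.0 master
```

Because git diff only compares the two snapshots' trees, the missing history between the commits does not matter; an annotated tag such as 4.0.0 also works as BASE_REF because git diff peels it to the tagged commit.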

With this minimal fetching approach, the directory size is 4.6 MB, compared to 49 MB for a full clone of the lodash repository.
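The saving comes from the object database: a full clone stores every version of every file in the history, while the two-commit fetch stores only the trees and file contents of the two snapshots being compared. This sketch shows the effect on a synthetic repository (30 commits, each overwriting one file with random bytes, plus a hypothetical tag `base` on the first commit); the sizes it prints will not match the lodash figures.

```shell
#!/bin/sh
set -eu
# Synthetic history: one file overwritten with fresh random bytes in each of
# 30 commits, so every commit adds a new, incompressible blob.
src=$(mktemp -d)
git init -q "$src"
i=1
while [ "$i" -le 30 ]; do
  head -c 50000 /dev/urandom > "$src/data.bin"
  git -C "$src" add data.bin
  git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "change $i"
  i=$((i + 1))
done
git -C "$src" tag base HEAD~29           # hypothetical tag on the first commit

# Full clone: every blob from all 30 commits.
full="$(mktemp -d)/full"
git clone -q "file://$src" "$full"

# Two-commit fetch: only the base and target snapshots.
two=$(mktemp -d)
git init -q "$two"
git -C "$two" fetch -q --depth 1 --no-tags "file://$src" \
  base:refs/compare/base HEAD:refs/compare/target

full_kb=$(du -sk "$full/.git" | cut -f1)
two_kb=$(du -sk "$two/.git" | cut -f1)
echo "full clone .git: ${full_kb} KB, two-commit fetch .git: ${two_kb} KB"
```

On a real repository, running du -sh on the two working directories gives the kind of comparison quoted above.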

[Image: Comparing two Git commits]

I'm the CTO at fourTheorem. Follow me on Twitter: @eoins


CTO @fourTheorem; Co-Author of AI as a Service https://www.manning.com/books/ai-as-a-service

