Augusto Pascutti

Files with the most changes on Git repository

Reading the history of a repository is useful for many things. There are many ways to go through it; below we will list the files with the most changes and then filter the history down to just those files:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -nr | head
$ git log --stat -- $(!!)

Other than $(!!), which tells an interactive bash to repeat the previous command (!!, a history expansion) and substitute its output into the new command line ($(), a command substitution), I will detail what we executed below.

Using the history

You can read the commits on the current branch using git log, but what if you want to focus on the changes to a single file?

$ git log -- src/The/Path/To/The/File/With/Most/Changes.js

Explaining the command above:

  • git log shows the commits made to the repository, from the most recent to the oldest. By default it displays the hash, author, date and message of each commit;
  • -- tells Git to stop parsing options (such as -p or --reverse) and to treat everything after it as arguments. For git log those arguments are paths in the repository (one or more);
  • src/The/Path/To/The/File/With/Most/Changes.js is a file that exists in our hypothetical repository; it makes git log show only the commits affecting that path. Instead of a single file you could use other patterns:
    • src/** to show only changes made inside that directory,
    • *.txt to show only changes made to files with the txt extension (quote the pattern so that the shell passes it to Git unexpanded).

Focusing on one or more paths lets you go deeper into the history of a single part of the project, which can give you a rough idea of what the team achieved there and over what period, for example.
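One detail worth checking when passing patterns such as *.txt: if files in the current directory match, an unquoted pattern is expanded by the shell before Git ever sees it. The sketch below runs in a scratch directory and uses echo to show what Git would receive in each case; the file names notes.txt and todo.txt are made up for illustration.

```shell
# Scratch directory with two made-up files, so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
touch notes.txt todo.txt

# Unquoted: the shell replaces the pattern with the matching file names.
unquoted="$(echo git log -- *.txt)"

# Quoted: the pattern reaches Git untouched, so Git does the matching itself.
quoted="$(echo git log -- '*.txt')"

echo "$unquoted"   # git log -- notes.txt todo.txt
echo "$quoted"     # git log -- *.txt
```

With the quoted form, Git matches the pattern against paths in the history, which also covers files that no longer exist in the working tree.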

How to list files with the most changes?

You know how to navigate the history of specific files; now we want to know which files changed the most in our repository. That can be achieved in 3 steps:

  1. List files changed in a commit, for every commit;
  2. Count how many times each file appears on that list;
  3. Display only the top ones

List files changed in a commit

git log has the option --name-only, which displays the paths of all files changed in each commit. Setting the commit format to an empty string leaves only the file paths:

$ git log --name-only --pretty="format:"

If you try the command above, you will notice an empty line for every commit. Those empty lines are what remains of the commit messages we removed; to get rid of them we can use sed '/^\s*$/'d, making the whole command:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d
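The \s shorthand may not be understood by every sed implementation. As a sketch of a more portable spelling, the POSIX character class [[:space:]] can take its place; the input below is simulated with printf (the file names are made up) so the filter can be tried outside a repository.

```shell
# Simulated "git log --name-only" output: two paths, an empty line and a
# whitespace-only line (the file names are made up).
cleaned="$(printf 'src/a.js\n\nsrc/b.js\n   \n' | sed '/^[[:space:]]*$/d')"

# Only the two paths remain.
echo "$cleaned"
```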

Count how many times each file appears in the list

You can use uniq to collapse duplicate lines; with the -c option it also counts how many times each one occurs.

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | uniq -c

Since uniq only joins consecutive lines, we need to sort our list before passing it to uniq:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c
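To see why the sort matters, here is a small sketch on simulated input (the file names are made up). awk is only used to strip the padding uniq -c puts in front of each count, so the two results are easy to compare.

```shell
# a.js appears twice, but not on consecutive lines.
unsorted_counts="$(printf 'a.js\nb.js\na.js\n' | uniq -c | awk '{print $1, $2}')"
sorted_counts="$(printf 'a.js\nb.js\na.js\n' | sort | uniq -c | awk '{print $1, $2}')"

echo "$unsorted_counts"   # a.js is listed twice, each with a count of 1
echo "$sorted_counts"     # a.js is listed once, with a count of 2
```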

The output of the command above will be <count> <path>, so we can use sort with the -n (numeric) and -r (reverse) options to display the files with the most occurrences first:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -nr
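The difference between a plain reverse sort and a numeric one is easiest to see on a tiny sample. uniq -c usually pads its counts with spaces, which can hide the problem, so this sketch uses unpadded, made-up counts; LC_ALL=C is set only to make the text ordering predictable.

```shell
counts='9 a.js
10 b.js
2 c.js'

# Text comparison: "10" sorts below "2" and "9" because "1" comes first.
text_order="$(printf '%s\n' "$counts" | LC_ALL=C sort -r)"

# Numeric comparison: 10 > 9 > 2.
numeric_order="$(printf '%s\n' "$counts" | LC_ALL=C sort -nr)"

echo "$text_order"      # 9 a.js, 2 c.js, 10 b.js
echo "$numeric_order"   # 10 b.js, 9 a.js, 2 c.js
```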




Limiting output

We can use head to keep only the first lines, or tail to keep only the last ones. The -n <items> option tells how many lines we want to keep:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -nr | head -n 10
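If you run this often, the pipeline can be wrapped in a small shell function. The following is only a sketch: most_changed is a hypothetical name, not a Git command, and it assumes the first argument is the number of files to show while everything after it is handed to git log unchanged.

```shell
# most_changed is a hypothetical helper name, not part of Git.
# Usage: most_changed [limit] [extra git log arguments...]
most_changed() {
    limit="${1:-10}"               # how many files to show; defaults to 10
    if [ "$#" -gt 0 ]; then shift; fi
    git log --name-only --pretty="format:" "$@" |
        sed '/^[[:space:]]*$/d' |  # drop the empty lines between commits
        sort | uniq -c |           # count how often each path appears
        sort -nr |                 # highest count first
        head -n "$limit"
}
```

Inside a repository, most_changed 5 --since "1 year ago" would then list the five most changed files of the last year, and most_changed 10 -- 'src/**' would restrict the count to one part of the project.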




What else?

I usually limit the history to changes made in the last year (git log --since "1 year ago"). I do this every time I start working with a new team; it helps me get to know them better.

I also don't blindly dive into the "most changed files" of the project. As I want to know more about the project and its people, I try to focus on controllers or models first, to get a grasp of what kind of changes they go through.

Do you think this will help you? In what way?

Top comments (6)

Alexander Avakov

Very good article. It helped me a lot. Thanks

Oliver Kurmis

Your command is broken: sed is missing the d to delete the line and the second sort is missing the n option to sort numerical

git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -nr | head -n 20
 
Augusto Pascutti

Thanks! I've fixed the article

Waylon Walker

$(!!) Trick is genius

Augusto Pascutti

It is! It is especially useful while you are figuring out one part of a longer command (as above).

I use it a lot with for loops, to lint changed files for example:

$ git diff --name-only
$ for f in $(!!)
do
php -l "$f"
done
 
Waylon Walker

🤯