Reading the history of a repository is useful for many things. There are many ways to go through it; below we list the files with the most changes and then filter the history down to the changes made just to them:
$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -rn | head
$ git log --stat -- $(!!)
Other than $(!!), which tells bash to re-run the previous command (!!) inside a "sub-shell" ($()) and paste its output into the new command line, I will detail what we executed below.
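Keep in mind that !! relies on history expansion, which bash only enables in interactive shells, and that the output of the first command also carries the counts printed by uniq -c. A minimal sketch of a script-friendly variant, assuming paths without whitespace; the helper name strip_counts is hypothetical and not part of Git:

```shell
# Hypothetical helper: drop the leading "<count> " column that `uniq -c` prints,
# leaving only the path on each line.
strip_counts() {
  sed -E 's/^[[:space:]]*[0-9]+[[:space:]]+//'
}

# Demo on sample lines shaped like `uniq -c` output (file names are made up).
printf '     12 src/app.js\n      3 README.md\n' | strip_counts

# Inside a repository, the same idea without history expansion would be:
#   files=$(git log --name-only --pretty="format:" | sed '/^\s*$/d' \
#           | sort | uniq -c | sort -rn | head | strip_counts)
#   git log --stat -- $files   # unquoted on purpose: one argument per path
```

Capturing the list in a variable also makes it easy to inspect before handing it to git log.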
Using the history
You can read the commits on the current branch using git log, but what if you want to focus on the changes to a single file?
$ git log -- src/The/Path/To/The/File/With/Most/Changes.js
Explaining the command above:
- git log shows the changes made to the repository, from the most recent to the oldest. By default it displays each commit's hash, author, date and message, without the diff;
- -- tells Git to stop trying to parse options (stuff like -p or --reverse) and start parsing arguments. For git log, the arguments are paths in the repository (one or more);
- src/The/Path/To/The/File/With/Most/Changes.js is a file that exists in our hypothetical repository; it makes git log show only the changes affecting that path. Instead of a single path pointing to a file, you could also use:
  - src/** to filter changes made inside this path,
  - *.txt to filter changes made to files with the txt extension.
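One caveat with patterns: if they are left unquoted, the shell may expand them against the current directory before Git ever sees them. The sketch below quotes the patterns so Git does the matching. It runs in a throwaway repository, and the file names and identity settings are made up for the demo:

```shell
# Build a throwaway repository with two hypothetical files.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "you@example.com"
git config user.name "Example"
mkdir -p src docs
echo "code" > src/app.js
echo "notes" > docs/notes.txt
git add . && git commit -q -m "Add app and notes"
echo "more" >> docs/notes.txt
git commit -q -am "Update notes"

# Quoted, so the shell passes the pattern to Git untouched.
git log --oneline -- '*.txt'   # lists both commits: each one touches docs/notes.txt
git log --oneline -- 'src/**'  # lists only the first commit: the second leaves src/ alone
```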
Focusing on one or more paths lets you go deeper into the history of a single part of the project, which can give you a rough idea of, for example, what the team can achieve and in how much time.
How to list files with the most changes?
You know how to navigate the history of specific files; now we want to know which files changed the most in our repository. That can be achieved in 3 steps:
- List files changed in a commit, for every commit;
- Count how many times each file appears on that list;
- Display only the top ones.
List files changed in a commit
git log has the option --name-only, which displays the paths of all files changed in each commit. Setting the commit message format to an empty one leaves only the files:
$ git log --name-only --pretty="format:"
If you try the command above, you will notice that an empty line appears for every commit. Those empty lines are what is left of the commit messages we removed; to get rid of them we can use sed '/^\s*$/'d, making the whole command:
$ git log --name-only --pretty="format:" | sed '/^\s*$/'d
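To see what the sed step does without needing a repository, here is a small sketch on simulated output (the file names are hypothetical). It uses [[:space:]] instead of \s, since \s is a GNU extension and may not be supported by every sed:

```shell
# Simulated output of `git log --name-only --pretty="format:"` for two commits;
# a blank line separates one commit from the next.
sample=$(printf 'src/app.js\nREADME.md\n\nsrc/app.js\n')

# Delete every line that is empty or contains only whitespace.
printf '%s\n' "$sample" | sed '/^[[:space:]]*$/d'
```

Only the three path lines remain, which is the shape the counting step expects.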
Count how many times each file appears in the list
You can use uniq to avoid listing duplicate items; with the -c option it also counts their occurrences.
$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | uniq -c
Since uniq only joins consecutive lines, we need to sort our list before passing it to uniq:
$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c
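The effect of sorting first is easy to check on a tiny, made-up list of paths:

```shell
# Hypothetical list of changed paths, with the duplicates not next to each other.
paths=$(printf 'a.js\nb.js\na.js\n')

# Without sorting, the two a.js lines are not adjacent, so uniq keeps both
# and reports three lines with a count of 1 each.
printf '%s\n' "$paths" | uniq -c

# After sorting, the duplicates sit together and collapse into one counted line.
printf '%s\n' "$paths" | sort | uniq -c
```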
The output of the command above will be <count> <path>, so we can use sort with the --reverse (-r) and --numeric-sort (-n) options to display the files with the most occurrences first:
$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -rn
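Why ask sort to compare numbers? A plain reverse sort compares text, so whether the counts come out in the right order depends on how they happen to be padded. A small sketch with made-up counts, written without the padding uniq -c usually adds:

```shell
# Hypothetical "<count> <path>" lines, deliberately unpadded.
counts=$(printf '9 b.js\n10 a.js\n2 c.js\n')

# Text comparison: "9" sorts above "10" because it only looks at characters.
printf '%s\n' "$counts" | sort -r

# Numeric comparison: the leading number is compared as a number, so 10 comes first.
printf '%s\n' "$counts" | sort -rn
```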
Limiting output
We can use head to keep only the first lines, or tail to keep only the last ones. The -n <items> option tells how many lines we want:
$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -rn | head -n 10
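The counting and limiting steps can be bundled into a small helper so they are easy to reuse. This is only a sketch: the name top_counts and its default of 10 are my own choices, and it reads the paths from standard input so it can be tried without a repository:

```shell
# Hypothetical helper: read one path per line on stdin and print the N most
# frequent ones (N defaults to 10), most frequent first.
top_counts() {
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Demo with made-up paths: a.js appears 3 times, b.js twice, c.js once.
printf 'a.js\nb.js\na.js\nc.js\na.js\nb.js\n' | top_counts 2
```

With a real repository you would pipe git log --name-only --pretty="format:" (minus the empty lines) into it instead of the printf.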
What else?
I usually limit the changes to those made in the last year (git log --since "1 year ago"). I do this every time I start working with a new team; it allows me to get to know them better.
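Putting the time limit together with the earlier steps could look like the sketch below. The function name most_changed and its defaults are hypothetical; it has to be run inside a Git repository:

```shell
# Hypothetical helper: list the most-changed files since a given date.
#   $1: anything `git log --since` accepts (default: "1 year ago")
#   $2: how many files to show (default: 10)
most_changed() {
  git log --since="${1:-1 year ago}" --name-only --pretty="format:" |
    sed '/^[[:space:]]*$/d' |   # drop the blank line left by each commit
    sort | uniq -c |            # count how often each path appears
    sort -rn |                  # highest count first
    head -n "${2:-10}"
}
```

For example, most_changed "6 months ago" 5 would show the five files touched most often in the last six months.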
I also don't blindly go into the "most changed files" in the project. As I want to know more about the project and the people, I try to focus on controllers or models first so I get a grasp of what kind of changes they go through.
Do you think this will help you? In what way?
Top comments (6)
Very good article. It helped me a lot. Thanks
Your command is broken: sed is missing the d to delete the line and the second sort is missing the n option to sort numerical
Thanks! I've fixed the article
$(!!) Trick is genius
It is! It is especially useful while you are figuring out part of a longer command (as above).
I use it a lot with for loops, to lint changed files for example:🤯