What if you find that sensitive data has been committed to Git? You should remove that file. What if your repository grows so large that it takes over an hour to clone? You should remove large files to reduce the repository size.
However, removing those files and committing that change is not enough. The sensitive data or large files still exist in the Git history.
Therefore, you need to remove the sensitive data or large files from the entire repository history.
How do you do that? Use git-filter-repo.
git-filter-repo
git-filter-repo is a tool that rewrites the entire history of a repository. It's fast and safe.
Removing a single file
If you want to remove a file called sensitive.md:
$ git filter-repo --path sensitive.md --invert-paths
Parsed 104 commits
New history written in 0.16 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 58387b2 Modify README
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 4 (delta 0)
Completely finished after 0.38 seconds.
The --path option specifies which path to include in the new history. With the --invert-paths option, --path instead specifies which path to exclude from the new history.
After that, the file called sensitive.md is completely removed from the entire history, so it looks as if sensitive.md never existed, even in the initial commit.
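The same idea works for a whole directory. Here is a rough sketch, assuming a directory named secrets/ (the name is made up for illustration); note that --path can be repeated to exclude several paths at once:
$ git filter-repo --path secrets/ --invert-paths
Without --invert-paths, the same command would instead keep only secrets/ and drop everything else from the history.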
Removing all files bigger than a certain size
If you want to remove files whose size is over 100 KB, you can use the --strip-blobs-bigger-than option as follows:
$ git filter-repo --strip-blobs-bigger-than 100K
Processed 318 blob sizes
Parsed 106 commits
New history written in 0.10 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 0fc502c Modify README
Enumerating objects: 312, done.
Counting objects: 100% (312/312), done.
Delta compression using up to 4 threads
Compressing objects: 100% (209/209), done.
Writing objects: 100% (312/312), done.
Total 312 (delta 98), reused 312 (delta 98)
Computing commit graph generation numbers: 100% (104/104), done.
Completely finished after 0.31 seconds.
There are many other examples in the git-filter-repo man page.
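One of them is --replace-text, which rewrites file contents instead of dropping whole files. A rough sketch, assuming a file named expressions.txt that you create yourself (the file name and the token are made up for illustration); each line is a literal string to replace, and matches are rewritten to ***REMOVED*** by default:
$ echo 'MY_SECRET_TOKEN' > expressions.txt
$ git filter-repo --replace-text expressions.txt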
Why not git filter-branch?
The git filter-branch command used to be the official way to rewrite history. However, you'll see a warning like the one below when you execute git filter-branch on Git 2.24.0 or later.
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
Compared to git filter-branch, git-filter-repo has several advantages:
- Simple
- Fast
- Safe
For example, when removing a file that has been modified 100 times, git filter-branch takes 17 times longer than git-filter-repo! The repository I used for that test is public here, so you can try it yourself. Removing sensitive.md from this repo, git filter-repo took 0.84 seconds and git filter-branch took 14.49 seconds.
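For reference, the git filter-branch equivalent of the single-file removal above looks roughly like this (a sketch based on the commonly documented incantation; the exact options depend on your branches and tags):
$ git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch sensitive.md' --prune-empty --tag-name-filter cat -- --all
The difference in length alone shows why "simple" is on the list.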
For more details about the git filter-branch issues, see the git filter-branch man page.
Install
git filter-repo is not included in the official Git commands, so you need to install it yourself. If you use a package manager like Homebrew, you can install it that way.
For more details, see the official installation documentation.
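For example, the upstream project publishes a Homebrew formula and a PyPI package, so either of the following usually works (assuming those package names are still current):
$ brew install git-filter-repo
$ pip install git-filter-repo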