DEV Community

Balogh Botond

Cleaning Your Git History: Safely Removing Sensitive Data

Table of Contents

  1. Introduction
  2. Understanding git filter-branch
  3. Step-by-Step Guide

Introduction

While integrating a third-party API into my project, I made a classic software development mistake: I committed my API keys directly in a configuration file instead of keeping them in a .env file listed in .gitignore. My repository was still private, but since I planned to make it public, I needed to remove this information from the git history.

How could I remove the commits containing the sensitive information from my history? After some research, I decided to use the git filter-branch command. In the end, it worked as expected. I lost nothing from my git history except the commits that only touched that config file and the config file itself, which I had to recreate with the help of a backup branch.

Whether you're a seasoned developer or new to version control, this guide provides a clear, step-by-step approach to safely removing a specific file, and the commits that only touch it, from a branch's history.

Understanding git filter-branch

git filter-branch is a powerful yet dangerous git tool. It lets you rewrite your commit history in various ways, such as changing commit messages or author details or, as in my case, removing a file that contains sensitive information like API keys. It works by walking through the entire history of the branch and recreating each affected commit with the changes you specify; rewritten commits get new hashes.

Please use it with caution. It can lead to data loss or discrepancies in your project's history. Use it in a controlled environment (like a private repository) and make sure you have complete backups of your data before proceeding. Furthermore, if you're working in a team, coordinate with your colleagues to avoid conflicts or data inconsistencies once the history is altered. Check the documentation, especially the safety information, before using it: Git filter-branch documentation

Step-by-Step Guide

The branch I was working on was named secret_keys. On it, I committed and pushed the secret key in the config.js file. That's where the trouble began: I did not notice it for a while, so many other commits had been pushed to the repo in the meantime.

Commits to remove

First, I added .env to my .gitignore and pushed it to the repo. After that, I created the .env file, which is not part of the version history, and moved the secret key into it. In config.js, I then referred to the .env file instead of hardcoding the secret key.

.env secret key
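As a rough sketch of this step, the commands below ignore .env before the file exists and then confirm that git does not see it. Everything runs in a throwaway directory; the variable name API_KEY, its value, and the git identity are placeholders, not values from my project.

```shell
#!/bin/sh
# Sketch only: runs in a scratch repository, not in a real project.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"

# Ignore .env first, so it can never be staged by accident.
echo ".env" >> .gitignore
git add .gitignore
git commit -q -m "Ignore .env"

# The secret lives only in the untracked .env file (placeholder value).
echo "API_KEY=replace-with-your-key" > .env

# git check-ignore exits with 0 when the path is ignored.
git check-ignore -q .env && echo ".env is ignored"

# .env should not be listed here as untracked or staged.
git status --porcelain
```

Running git check-ignore before the first commit that follows is a cheap way to confirm the ignore rule really matches the file.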

At this point, the secret key is still available in the version history, and I must get rid of it. This is where the git filter-branch command comes in. But first, I created a backup branch for safety: secret_key_backup.
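One way to create such a backup is git branch, which adds a new branch at the current commit without switching to it. The sketch below assumes a scratch repository with a single dummy commit; only the branch names come from the article.

```shell
#!/bin/sh
# Sketch only: scratch repository with one dummy commit.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/secret_keys   # work on a branch named secret_keys
git config user.email "dev@example.com"        # placeholder identity
git config user.name "Dev"

echo "const key = 'dummy-secret';" > config.js # placeholder content
git add config.js
git commit -q -m "Add config"

# Create the backup at the current commit; HEAD stays on secret_keys.
git branch secret_key_backup

# Both names should resolve to the same commit hash.
git rev-parse secret_keys secret_key_backup
```

Keep in mind that the backup branch still contains the secret, so it should stay local unless you intend to clean it as well.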

Then I opened a terminal in my project, and made sure that I was on the desired branch:



git checkout secret_keys



Output:



Switched to branch 'secret_keys'
Your branch is up to date with 'origin/secret_keys'.



Here you can see the main commits that must be removed for safety:
Main commits to be removed

Next, I ran the git filter-branch command:



git filter-branch --force --index-filter "git rm --cached --ignore-unmatch file_containing_secret_keys.js" --prune-empty --tag-name-filter cat -- branch_name_containing_commits_to_remove


  • git filter-branch: Initiates the process of rewriting the git history.
  • --force: Forces the rewrite to start. Without it, git filter-branch refuses to run when a temporary directory or backup refs under refs/original/ from a previous run already exist, so it is necessary if you've already run filter-branch before.
  • --index-filter: Specifies a filter to rewrite the index at each revision.
  • "git rm --cached --ignore-unmatch file_containing_secret_keys.js": The command that --index-filter runs for each commit.
    • git rm --cached: Removes the specified file from the index (staging area) but not from the working directory.
    • --ignore-unmatch: Prevents errors if the file isn't found in a particular commit.
    • file_containing_secret_keys.js: The specific file you want to remove from your history.
  • --prune-empty: Tells git filter-branch to remove commits that become empty after the filter has been applied.
  • --tag-name-filter cat: This filter is applied to tag names that point to rewritten commits. cat just keeps the original tag name.
  • --: Separator that clarifies the end of the options and the beginning of the branch specification.
  • branch_name_containing_commits_to_remove: Specifies the branch whose history you want to rewrite.

This command is tailored to remove a specific file from the history of a given branch, eliminating all references to that file in your commits.

In my case:



git filter-branch --force --index-filter "git rm --cached --ignore-unmatch config.js" --prune-empty --tag-name-filter cat -- secret_keys



Output:



**WARNING**: git-filter-branch has a glut of gotchas generating mangled history rewrites. Hit Ctrl-C before proceeding to abort, then use an alternative filtering tool such as 'git filter-repo' (https://github.com/newren/git-filter-repo/) instead. See the filter-branch manual page for more details; to squelch this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.

Proceeding with filter-branch...

Rewrite a3a48b09e282854c80bf4ad02a017e249e161fd8 (2/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'
Rewrite 6e788e83a338e45b348d93d682b32c816ee2fbff (3/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'
Rewrite 7a378a0145bce70bea213ca5f9062138544db5f2 (4/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'
Rewrite 0637c9659623644cfceb35be10f2a1fe5c468e04 (5/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'
Rewrite 6c421eb99adc6b987cff7f3cada31e9313638072 (6/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'
Rewrite 98001e5b97270efa4a8ab5bd0452be56dd76883d (7/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'
Rewrite 2ca4e161a4af2b8f38c46faf848fdbb3e550f23c (8/8) (0 seconds passed, remaining 0 predicted)    rm 'config.js'

Ref 'refs/heads/secret_keys' was rewritten.



Commits removed

In the image, you can see that 2 commits are left; the placeholder commits I made only touched config.js, so they were removed as well. At this point the rewrite is local only: the remote still holds the old commits. If I pulled now, git would merge the old history back in and create a merge commit, and pushing that would restore all the removed commits. We want them gone, so do not do that unless you want to play around a bit.
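Before pushing anything, it can help to check the result from the command line as well. The sketch below rebuilds a small stand-in history in a scratch repository (the file contents and commit messages are made up), runs the same filter, and then checks two things: that no commit on the rewritten branch touches config.js, and where the old history is still reachable locally. FILTER_BRANCH_SQUELCH_WARNING is the variable named in the warning text above.

```shell
#!/bin/sh
# Sketch only: a three-commit stand-in for the real history.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in this demo

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/secret_keys
git config user.email "dev@example.com"  # placeholder identity
git config user.name "Dev"

echo "# demo" > README.md
git add README.md && git commit -q -m "Add README"
echo "const key = 'dummy-secret';" > config.js
git add config.js && git commit -q -m "Add config with key"
echo "console.log('hi');" > app.js
git add app.js && git commit -q -m "Add app"

git branch secret_key_backup

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch config.js" \
  --prune-empty --tag-name-filter cat -- secret_keys

# 1. No commit on the rewritten branch should touch config.js any more.
git log --oneline secret_keys -- config.js

# 2. filter-branch keeps the pre-rewrite tip under refs/original/ ...
git for-each-ref refs/original/

# ... and the backup branch still contains the file and its history.
git log --oneline secret_key_backup -- config.js
```

The second check is a reminder that the old commits are not gone from the local repository after the rewrite; they are just no longer part of the secret_keys branch.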

To apply the modification, you need to force push:



git push origin branch_name_containing_commits_to_remove --force



Proceed with caution!
Once you force push this modification, there is NO RETURN!
No return from force push

In my case:



git push origin secret_keys --force



Output:



Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 293 bytes | 293.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/Balogh08/GitFilterBranch.git
 + 2ca4e16...e61bbd2 secret_keys -> secret_keys (forced update)



And abracadabra, the commits and the file which once contained the sensitive information are now removed.

Commits removed and pushed
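If other people can push to the same branch, a slightly more cautious variant of the force push is git push --force-with-lease, which only overwrites the remote branch if it still points at the commit you last saw. This is not what I ran; the sketch below only illustrates the flag, using a local bare repository as a stand-in remote and a plain git reset in place of the filter-branch rewrite.

```shell
#!/bin/sh
# Sketch only: a local bare repository plays the role of the remote.
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/secret_keys
git config user.email "dev@example.com"  # placeholder identity
git config user.name "Dev"
git remote add origin "$work/remote.git"

echo one > a.txt && git add a.txt && git commit -q -m "First"
echo two > b.txt && git add b.txt && git commit -q -m "Second"
git push -q origin secret_keys

# Stand-in for the history rewrite: drop the last commit and add a new one.
git reset -q --hard HEAD~1
echo three > c.txt && git add c.txt && git commit -q -m "Rewritten"

# A plain push is rejected because local and remote history diverged.
if git push -q origin secret_keys 2>/dev/null; then
  echo "unexpected: plain push succeeded"
fi

# Overwrites the remote branch only if it still matches origin/secret_keys
# as recorded locally; otherwise the push is refused.
git push -q --force-with-lease origin secret_keys
```

For a single-developer branch like mine, the end result is the same as with --force; the difference only shows when someone else has pushed in the meantime.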

Next, we need to put back the file without the sensitive information. For this, we can copy the content of config.js from the backup branch and paste it into a new config.js on the branch without the sensitive commits.

If you did not publish the backup branch, you can simply delete it. However, if you published it, you must go through the same process with git filter-branch on the backup branch, but without creating a second backup for the backup. Simply deleting the branch from the remote repository is not safe enough, as the commits containing the sensitive information could remain somewhere in the source control system.
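Instead of copying and pasting by hand, the file can also be read straight from the backup branch with git show branch:path. The sketch below assumes a scratch repository in which the last version of config.js on the backup branch already reads the key from the environment; the string dummy-secret and the variable name API_KEY are placeholders.

```shell
#!/bin/sh
# Sketch only: scratch repository with a stand-in history.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in this demo

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/secret_keys
git config user.email "dev@example.com"  # placeholder identity
git config user.name "Dev"

echo "# demo" > README.md
git add README.md && git commit -q -m "Add README"
echo "const key = 'dummy-secret';" > config.js
git add config.js && git commit -q -m "Add config with key"
echo "const key = process.env.API_KEY;" > config.js
git commit -q -am "Read key from environment"

git branch secret_key_backup
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch config.js" \
  --prune-empty --tag-name-filter cat -- secret_keys

# Take the latest, already cleaned version of the file from the backup.
git show secret_key_backup:config.js > config.js

# Stop if the placeholder secret is still in the restored file.
if grep -q "dummy-secret" config.js; then
  echo "secret still present in config.js" >&2
  exit 1
fi

git add config.js
git commit -q -m "Re-add config.js without the key"

# The backup was never published in this sketch, so it can be deleted.
git branch -q -D secret_key_backup
```

The grep guard is the important part: the backup branch still holds every old version of the file, so it is worth checking what was restored before committing it.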

What are your thoughts on this solution? What alternatives do you prefer, like git filter-repo? Let me know in the comments.

More about the author at LUNitECH

The article was written with the help of existing documentation, ChatGPT, Grammarly, and other human beings.
