Countless developers use Git for version control, but many don’t know enough about how Git works. Some even use Git and GitHub as a file backup tool. This leads to information disclosure in Git repositories. Thousands of new API or cryptographic keys leak via GitHub projects every day.
I have worked in information security for three years. About two years ago, our company had a severe security issue triggered by an information leak in a Git repository.
An employee accidentally leaked an AWS key to GitHub. The attacker used this key to download more sensitive data from our servers. We put a lot of time into fixing this issue: we tried to find out how much data had leaked, analyzed the affected systems and related users, and replaced all the leaked keys in our systems.
It is a sad story that no company or developer wants to experience.
I won’t go into more detail about it. Instead, I hope more people learn how to avoid it. Here are my suggestions for staying safe from Git leaks.
Most junior developers don’t have enough security awareness. Some companies train new employees, but others have no systematic training.
As developers, we need to know which kinds of data may introduce security issues. Remember that these categories of data must not be checked into a Git repository:
- Any secret configuration data, including passwords, API keys, AWS keys, private keys, etc.
- Personally Identifiable Information (PII). Under GDPR, if a company leaks its users’ PII, it needs to notify the users and the relevant authorities, and more legal trouble may follow.
If you are working for a company, don’t share any source code or data related to the company without permission.
Attackers can easily find code bearing a company copyright on GitHub that was accidentally leaked there by employees.
My advice is to keep company affairs and personal stuff strictly separate.
When we create a new project with Git, we must set up a .gitignore properly. .gitignore is a Git configuration file that lists the files or directories that will not be checked into the Git repository.
The gitignore project is a collection of useful .gitignore templates for all kinds of programming languages, frameworks, tools, and environments.
We need to know the pattern matching rules of gitignore and add our own rules based on the templates.
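A quick way to get a feel for the matching rules is to try them in a throwaway repository. The following is only a sketch: the file names and the four patterns are made up for illustration, and `git check-ignore` is used to ask Git which paths the rules would exclude.

```shell
# Sketch: try out .gitignore patterns in a temporary repository.
# All file names below are hypothetical examples.
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"

cat > .gitignore <<'EOF'
# a file with this exact name, at any depth
.env
# any file ending in .pem, at any depth
*.pem
# only config/password relative to the repository root
/config/password
# the whole secrets directory
secrets/
EOF

mkdir -p config secrets
touch .env server.pem config/password secrets/token.txt app.py

# git check-ignore prints only the paths that match an ignore rule,
# so app.py should be missing from the output.
ignored=$(git check-ignore .env server.pem config/password secrets/token.txt app.py)
echo "$ignored"
```

Running `git check-ignore -v <path>` additionally shows which line of which file caused the match, which helps when a template and our own rules interact.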
No tool can find all the sensitive data in a Git repository, but a couple of tools and practices can help.
git-secrets and talisman are similar tools meant to be installed in local repositories as pre-commit hooks. Every change is checked before it is committed, and the hook rejects the commit if it detects that the prospective commit may contain sensitive information.
gitleaks provides another way to find unencrypted secrets and other unwanted data types in Git repositories. We can integrate it into automated workflows such as CI/CD.
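To make the mechanism concrete, here is a minimal hand-written hook. It is not how git-secrets or talisman are implemented; it is a sketch that assumes the only thing we look for is the `AKIA` prefix pattern commonly seen in AWS access key IDs. The repository, the file name, and the key (the placeholder used in AWS documentation) are all illustrative.

```shell
# Sketch of a pre-commit hook in a temporary repository (hypothetical names).
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"
mkdir -p .git/hooks

cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Reject the commit if any added line in the staged diff
# looks like an AWS access key ID.
if git diff --cached -U0 | grep -qE '^\+.*AKIA[0-9A-Z]{16}'; then
    echo "pre-commit: possible AWS access key in staged changes" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

# Stage a file that contains a placeholder key, then try to commit it.
echo 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE' > settings.ini
git add settings.ini

if git -c user.name=demo -c user.email=demo@example.com \
       commit --quiet -m "Add settings"; then
    result="committed"
else
    result="rejected"
fi
echo "$result"
```

A hook like this lives only in the local clone and can be bypassed with `git commit --no-verify`, which is one reason to combine it with server-side or CI scanning.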
Code review is a best practice for teamwork. Teammates learn from each other’s source code. Junior developers’ code should be reviewed by more experienced developers.
Most unintended changes can be caught during the code review stage.
Enabling branch restrictions ensures that only certain users can push to a protected branch in a repository. GitLab has a similar option.
Making master a restricted branch helps us enforce the code review workflow.
Even with all the above tools and mechanisms, errors can still happen. If we fix them quickly and correctly, the leak may cause no actual security issue.
If we find sensitive data leaked in a Git repository, we cannot just make another commit to clean it up.
Figure 4: This is self-deception
What we need to do is remove all the sensitive data from the entire Git history.
*Remember to back up before any cleanup, and remove the backup clone once we have confirmed everything is OK*.
Use --mirror to clone a bare repository; this is a full copy of the Git database.
$ git clone --mirror git://example.com/need-clean-repo.git
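To see what a mirror clone contains, the following sketch builds a tiny local repository and mirrors it. The paths and the tag name are invented for the demonstration; with a real project we would clone from the remote URL instead.

```shell
# Sketch: mirror a throwaway local repository and inspect the result.
work=$(mktemp -d)
git init --quiet "$work/project"
git -C "$work/project" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "Initial commit"
git -C "$work/project" tag v1.0

# --mirror implies a bare clone and copies every ref, not just branches.
git clone --quiet --mirror "$work/project" "$work/backup.git"

is_bare=$(git -C "$work/backup.git" rev-parse --is-bare-repository)
refs=$(git -C "$work/backup.git" for-each-ref --format='%(refname)')
echo "bare: $is_bare"
echo "$refs"
```

Because the backup still contains the leaked data, it should be treated as sensitive and deleted once the cleanup has been verified.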
We need git filter-branch to remove data from all branches and commit histories. Suppose we want to remove config/password from Git:
$ git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch config/password' \
--prune-empty --tag-name-filter cat -- --all
Remember to add the sensitive file to .gitignore; the leading slash anchors the pattern to the repository root:
$ echo "/config/password" >> .gitignore
$ git add .gitignore
$ git commit -m "Add password to .gitignore"
Then we force-push all branches and tags to the remote:
$ git push --force --all
$ git push --force --tags
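Tell our collaborators to rebase, not merge, any branches they created from the old history, because a single merge commit could bring the removed data back: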
$ git rebase
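Before force-pushing a real repository, it is worth checking that the rewrite did what we expected. The sketch below rehearses the whole procedure on a throwaway repository with a made-up secret and then asks Git whether any commit on a branch or tag still touches the path. It deliberately uses `--branches --tags` rather than `--all`, because filter-branch keeps backups of the old refs under `refs/original/`.

```shell
# Sketch: rehearse the history rewrite on a temporary repository.
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the warning newer Git versions print
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com

# Build a small history that contains a leaked file (hypothetical content).
mkdir config
echo 's3cret' > config/password
echo 'hello' > README
git add .
git commit --quiet -m "Initial commit with a leaked file"
echo 'more' >> README
git commit --quiet -am "Unrelated change"

# Same command as in the article, applied to every branch and tag.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch config/password' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Commits on branches or tags that still touch the path; empty means clean.
remaining=$(git log --oneline --branches --tags -- config/password)
echo "commits still touching config/password: ${remaining:-none}"
```

Even when this check comes back empty, the old objects stay in the local clone through `refs/original/` and the reflog until they are expired and garbage-collected, and any key that was ever pushed should still be considered compromised and replaced.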
BFG is a faster and simpler alternative to *git filter-branch* for removing sensitive data. It’s usually 10 – 720x faster than *git filter-branch*. Besides deleting files, BFG can also replace secrets in files.
BFG leaves the latest commit untouched. This is by design, to protect us from making mistakes. We should explicitly delete the file, commit the deletion, and then clean up the history to remove it.
If the leaked Git repository has been forked by others, we need to follow the DMCA Takedown Policy to ask GitHub to remove the forked repositories.
The whole procedure requires some time to finish, but it’s the only way to remove all the copies.
Don’t make the same mistakes that countless people have made. Put in some effort to avoid security incidents.
Using these tools and strategies will help a lot in avoiding Git leaks.