Alan West

How to Remove Sensitive Data from Your Git History (For Real This Time)

You deleted the file. You committed the deletion. You pushed. You're safe now, right?

Nope. That API key, that .env file, that internal config with your database credentials — it's all still there, sitting comfortably in your git history, waiting for anyone with git log and five minutes of curiosity.

I learned this the hard way about four years ago when a colleague pinged me to let me know our staging database password was visible in a public repo. I'd removed the file three months earlier. Didn't matter. Git remembers everything.

Why Deleting a File Doesn't Actually Delete It

Git is a content-addressable filesystem. Every commit is a snapshot of your entire project at that point in time. When you git rm secrets.env and commit, you're creating a new snapshot without that file — but every previous snapshot still has it.

Anyone can see it:

# Find all commits that touched a specific file, even deleted ones
git log --all --full-history -- path/to/secrets.env

# Show the contents of that file at a specific commit
git show a1b2c3d:path/to/secrets.env

This is by design. Git's whole purpose is to never lose data. That's great for source code. It's terrible for secrets.
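If you want to see this for yourself without touching a real project, here's a minimal sketch that builds a throwaway repo in a temp directory. The file name, the fake password, and the commit messages are all made up for the demo.

```shell
# Throwaway repo in a temp directory; nothing here touches your real projects.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit a fake secret, then "delete" it in a second commit.
echo "DB_PASSWORD=hunter2" > secrets.env
git add secrets.env
git commit -q -m "Add config"
git rm -q secrets.env
git commit -q -m "Remove secrets"

# The file is gone from the working tree...
ls secrets.env 2>/dev/null || echo "secrets.env: not in working tree"

# ...but the root commit still holds its full contents.
first_commit=$(git rev-list --max-parents=0 HEAD)
recovered=$(git show "$first_commit:secrets.env")
echo "Recovered from history: $recovered"
```

The second commit only records that the file is absent from that point on. The blob it pointed to stays reachable through the first commit.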

The Wrong Fix: git revert

I see people try this constantly. They run git revert <commit> thinking it undoes the damage. It doesn't. A revert creates a new commit that reverses the changes — the original commit with your secrets is still right there in the history.

Same goes for git commit --amend on an already-pushed commit. The old commit object still exists in the reflog and potentially on the remote.
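The same kind of throwaway-repo experiment shows why a revert doesn't help. This is a sketch with a made-up key and file name: it leaks a secret in one commit, reverts that commit, and then reads the secret straight back out of the original commit.

```shell
# Throwaway repo; the key below is a placeholder, not a real credential.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "# app" > README.md
git add README.md
git commit -q -m "Initial commit"

echo "API_KEY=sk-abc123def456" > secrets.env
git add secrets.env
git commit -q -m "Add secrets by mistake"
leaked_commit=$(git rev-parse HEAD)

# Revert adds a third commit that removes the file again...
git revert --no-edit "$leaked_commit" >/dev/null

# ...yet the leaked commit is still part of the branch and still readable.
commit_count=$(git rev-list --count HEAD)
still_there=$(git show "$leaked_commit:secrets.env")
echo "Commits on branch: $commit_count"
echo "Still readable: $still_there"
```

History only grew by one commit. Nothing was taken away.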

The Right Fix: git filter-repo

The old advice was to use git filter-branch, but it's painfully slow and easy to mess up. The Git project's own filter-branch documentation now points people to git-filter-repo instead.

Here's how to actually purge a file from your entire history:

# Install git-filter-repo (needs Python 3 and a recent Git; see the project README for exact minimums)
pip install git-filter-repo

# Work in a fresh clone: filter-repo refuses to run elsewhere unless you pass --force
git clone --mirror https://github.com/you/your-repo.git
cd your-repo.git

# Remove a specific file from all history
git filter-repo --path secrets.env --invert-paths

# Remove a directory from all history
git filter-repo --path config/internal/ --invert-paths

The --invert-paths flag means "keep everything EXCEPT this path." Without it, you'd keep only the specified path and delete everything else. Ask me how I know.
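Before pushing anything, it's worth confirming the path is really gone. Here's a small sketch using plain git log, the same command used earlier to find the file. The helper names commits_touching and check_purged are made up for this example.

```shell
# Prints every commit, on any ref, that touched the given path.
# Empty output means the path no longer appears anywhere in history.
commits_touching() {
  git log --all --full-history --format=%H -- "$1"
}

# Returns 0 when the path is absent from history, 1 otherwise.
check_purged() {
  if [ -n "$(commits_touching "$1")" ]; then
    echo "STILL IN HISTORY: $1"
    return 1
  fi
  echo "clean: $1"
}

# Example, run inside the filtered repo:
#   check_purged secrets.env && check_purged config/internal/
```

If check_purged reports a path as still present, don't push yet. Something in the filter step didn't match the way you expected.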

If you need to scrub a specific string (like an API key that was hardcoded inline rather than in a separate file), use the --replace-text option, which reads a file of literal==>replacement rules:

# Replace a specific string across all history
# (the <(...) process substitution needs bash or zsh)
git filter-repo --replace-text <(echo 'sk-abc123def456==>REDACTED')
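When there's more than one string to scrub, the rules can live in a regular file. The sketch below writes one to a temp file and defines a helper, commits_with_string (a made-up name), that uses plain git log -S to list commits where a string was added or removed. The regex: prefix and the default ***REMOVED*** replacement follow the filter-repo documentation as far as I know, so verify them against your installed version. All the secrets shown are placeholders.

```shell
# One rule per line. A bare line is replaced with filter-repo's default
# marker (***REMOVED***); "==>" picks a custom replacement.
expressions=$(mktemp)
cat > "$expressions" <<'EOF'
sk-abc123def456==>REDACTED
regex:password\s*=\s*\S+==>password=REDACTED
hunter2
EOF

# Then, inside the fresh clone (not executed here):
#   git filter-repo --replace-text "$expressions"

# Lists commits on any ref where the number of occurrences of the
# string changed. Empty output after filtering is what you want.
commits_with_string() {
  git log --all --format=%H -S"$1"
}

# Example:
#   commits_with_string 'sk-abc123def456'
```

Run the search before filtering to see how far the secret spread, and again afterwards to confirm nothing is left.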

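After filtering, force-push to your remote. filter-repo normally removes the origin remote as a safety measure, so run git remote -v first and re-add it with git remote add origin <url> if it's missing: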

git push origin --force --all
git push origin --force --tags

Important: Tell Your Team

Force-pushing rewrites history. Every collaborator needs to re-clone or carefully rebase their local branches. If they push their old local copy, all your purged data comes right back. Send a message before you force-push. Coordinate the timing.
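For collaborators who would rather resync an existing clone than re-clone, something like the following is a reasonable starting point. It's a sketch: resync_branch is a made-up helper, it assumes the remote is called origin, and it throws away any local commits on that branch, so stash or export unpushed work first.

```shell
# Resets a local branch to the rewritten remote branch.
# WARNING: discards local commits and uncommitted changes on that branch.
resync_branch() {
  git fetch --quiet origin &&
  git checkout --quiet "$1" &&
  git reset --quiet --hard "origin/$1" &&
  # Drop reflog entries that still point at the old history, then
  # garbage-collect so the old objects become eligible for pruning.
  git reflog expire --expire=now --all &&
  git gc --quiet --prune=now
}

# Example, run inside an existing clone after the force-push:
#   resync_branch main
```

Other local branches that were cut from the old history need the same treatment, or they'll carry the purged commits back in on the next push.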

What About BFG Repo Cleaner?

BFG Repo Cleaner is another solid option, especially if you're more comfortable with Java tooling. It's faster than the old filter-branch approach and has a simpler interface for common operations:

# Remove files by name from all history
java -jar bfg.jar --delete-files secrets.env your-repo.git

# Replace specific text patterns
java -jar bfg.jar --replace-text passwords.txt your-repo.git

# Then clean up and push
cd your-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push origin --force --all

BFG intentionally doesn't modify your latest commit, only history. This is a safety feature — it assumes your current HEAD is already clean.

Preventing This in the First Place

Rewriting history is painful. Here's how to avoid needing to do it.

1. Use a .gitignore That Actually Works

Don't just add .env — be thorough:

# Environment and secrets
.env
.env.*
*.pem
*.key
*.p12

# Cloud provider configs
.aws/credentials
.gcp-credentials.json

# IDE and OS junk that sometimes contains paths/tokens
.idea/
.vscode/settings.json
.DS_Store
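One catch: .gitignore only affects untracked files. If .env was committed before the rule existed, Git keeps tracking it. Here's a throwaway-repo sketch (fake token, made-up commit messages) showing the problem and the git rm --cached fix.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# .env gets committed before anyone writes a .gitignore
echo "TOKEN=abc" > .env
git add .env
git commit -q -m "Initial commit"

# Adding the ignore rule afterwards does not untrack the file
echo ".env" > .gitignore
git add .gitignore
git commit -q -m "Ignore .env"
tracked_before=$(git ls-files .env)

# Stop tracking it, but keep the local copy on disk
git rm -q --cached .env
git commit -q -m "Stop tracking .env"
tracked_after=$(git ls-files .env)

# check-ignore -v reports which rule now matches the file
matching_rule=$(git check-ignore -v .env)

echo "Tracked before: ${tracked_before:-<none>}"
echo "Tracked after:  ${tracked_after:-<none>}"
echo "Ignored by:     $matching_rule"
```

This only stops future commits from including the file. The earlier commits still contain it, which is exactly the situation the history rewrite is for.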

2. Set Up Pre-commit Hooks

pre-commit with secret-detection plugins will catch most accidental commits before they happen:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

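The config above points at a baseline file, so generate it once with detect-secrets scan > .secrets.baseline. Then run pre-commit install, and it'll scan every commit for things that look like secrets: high-entropy strings, known API key patterns, private keys.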
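To make the mechanism concrete, here's a deliberately simplified stand-in for what a secret-scanning hook does. This is not detect-secrets, and the three patterns are illustrative assumptions that will miss plenty of real secrets. The staged_secrets name is made up.

```shell
# Succeeds (exit 0) and prints the offending lines when the staged diff
# adds something that looks like a secret; fails (non-zero) otherwise.
staged_secrets() {
  git diff --cached -U0 \
    | grep '^+' \
    | grep -v '^+++ ' \
    | grep -E -i 'sk-[a-z0-9]{12,}|BEGIN [A-Z ]*PRIVATE KEY|(password|secret|api_key)[[:space:]]*='
}

# Body of a hypothetical .git/hooks/pre-commit script:
#   if staged_secrets; then
#     echo "Possible secret staged; commit blocked." >&2
#     exit 1
#   fi
```

A hook exiting non-zero is what aborts the commit. Real tools layer entropy checks, allowlists, and baselines on top of the same idea.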

3. Use Environment Variables or a Secret Manager

This sounds obvious, but I still review PRs where someone hardcoded a connection string. Use environment variables in development and a proper secret manager (Vault, cloud-native options from your provider, or even pass for personal projects) in production.

The rule is simple: if a value would cause damage in the wrong hands, it doesn't go in version control. Ever.
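One habit that pairs well with this: have startup scripts fail loudly when a required variable is missing, rather than falling back to a hardcoded default. A minimal sketch follows; require_env and the variable names in the usage comment are hypothetical.

```shell
# Returns non-zero and names the first variable that is unset or empty.
# Only pass variable names you control; the lookup uses eval.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required environment variable: $name" >&2
      return 1
    fi
  done
}

# Example, at the top of a startup script:
#   require_env DATABASE_URL API_KEY || exit 1
```

With no fallback value in the script, there's nothing secret-shaped left in the file to commit by accident.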

After the Purge: Rotate Everything

This is the step people skip, and it's arguably the most important one. Rewriting git history removes the secret from your repository. It does NOT remove it from:

  • GitHub's cached copies and event logs
  • Forks of your repo
  • Anyone's local clone
  • Search engine caches
  • Services like the Wayback Machine

If a secret was ever pushed to a public repo, even briefly, assume it's compromised. Rotate the credential immediately. Regenerate the API key. Change the password. Update the certificate.

GitHub's docs are blunt about this: they explicitly say that force-pushing does not remove data from cached views or cloned copies. If you pushed a token to a public repo, treat it as burned.

Quick Reference

  • Already pushed to public repo? Rotate the secret NOW, then clean history
  • Only in local history? Clean history with git filter-repo, then push
  • Want to prevent it? Pre-commit hooks + .gitignore + never hardcode secrets
  • Team repo? Coordinate the force-push, everyone re-clones

Git's perfect memory is a feature, not a bug. But it means the real lesson here isn't about git commands — it's about treating secrets as a fundamentally different category of data that never belongs in a repository in the first place. Build that habit, set up the guardrails, and you won't need to learn git filter-repo the hard way like I did.
