You deleted the leaked API key. You committed the fix. You pushed. You moved on with your life, feeling like a responsible engineer who had handled a production incident with grace.
Then, six months later, you learn about git log --all -p.
That key you "deleted"? Still there. In commit a4f8c2d. With your name on it. Timestamped. Permanent. And if your repo is public, it has been indexed by search engines, scraped by bots, and quietly catalogued by anyone who clones it.
I've been that person. Twice. Once with an AWS key I thought I'd rotated and once with a Stripe test key I absolutely, definitely, one hundred percent removed from the codebase. Reader: I had not removed it from the codebase. I had only removed it from the working tree.
The Fundamental Misunderstanding About git rm
Here's the thing about git that nobody explains clearly in the tutorials: git never forgets. That's literally the point. Immutable history is a feature, not a bug — right up until the moment you realize your production database URL is woven into seventeen commits from 2022.
When you run git rm secrets.env && git commit -m "oops", you've added a new commit that removes the file from HEAD. Every previous commit still contains the file in full. git checkout a4f8c2d -- secrets.env brings it all back. So does cloning the repo. So does GitHub's commit browser. So does every automated secret scanner that crawls public repos.
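To see this for yourself without touching a real project, here is a minimal sketch that builds a throwaway repo, commits a fake secret, "deletes" it with git rm, and then reads it straight back out of the earlier commit. It assumes git is on your PATH; the helper names git and demo_deleted_file_survives are illustrative and have nothing to do with any particular tool.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        [
            "git",
            "-c", "user.name=demo",
            "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false",
            *args,
        ],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def demo_deleted_file_survives() -> str:
    """Commit a fake secret, remove it, then recover it from history."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "--quiet")

        # Commit 1: the leak (an obviously fake value).
        (repo / "secrets.env").write_text("API_KEY=not-a-real-key\n")
        git(repo, "add", "secrets.env")
        git(repo, "commit", "--quiet", "-m", "add config")
        leaked_commit = git(repo, "rev-parse", "HEAD").strip()

        # Commit 2: the "fix".
        git(repo, "rm", "--quiet", "secrets.env")
        git(repo, "commit", "--quiet", "-m", "oops")

        # The file is gone from the working tree...
        assert not (repo / "secrets.env").exists()
        # ...but the earlier commit still holds it in full.
        return git(repo, "show", f"{leaked_commit}:secrets.env")
```

Calling demo_deleted_file_survives() returns the original file contents, even though the file no longer exists at HEAD.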
Studies from academic researchers and security firms have found millions of active credentials sitting in GitHub repositories — the majority of them in commits where the file was "already deleted." One 2023 study found that tokens discovered in git history had an average age of over 300 days. They weren't abandoned. They were forgotten.
The Tool: secret-time-machine
I built secret-time-machine to answer one question: what secrets have ever touched this repo's history, and are any of them still live?
pip install secret-time-machine
# Scan the current repo
secret-time-machine
# Scan a specific repo path
secret-time-machine --repo /path/to/repo
# CI-friendly JSON output
secret-time-machine --json
# Show remediation steps
secret-time-machine --remediation
It requires no external API calls, no signing up for a service, no sending your code anywhere. Pure Python, git subprocess, and 37 regex patterns. It runs entirely offline.
What the Output Looks Like
secret-time-machine v0.1.0 — The Past is Not Private
Scanning repository: /Users/you/myproject
Enumerating commits... 847 found across all branches
Scanning commits [============================] 847/847
====================================================================
FINDINGS: 4 secrets detected across git history
====================================================================
[HIGH] AWS Access Key ID
Commit : a4f8c2d (2022-09-14) — Jane Doe <jane@example.com>
File : config/deploy.env
Value : AKIA••••••••••••WXYZ
Status : DELETED from current HEAD (still in history — still compromised)
[HIGH] OpenAI API Key
Commit : f91bb03 (2023-03-07) — Jane Doe <jane@example.com>
File : scripts/summarize.py
Value : sk-••••••••••••••••••••••••••••••••••••••pQrS
Status : PRESENT in current HEAD — rotate immediately
[HIGH] GitHub Personal Access Token
Commit : 2c77e1a (2023-11-22) — dependabot[bot]
File : .github/workflows/ci.yml
Value : ghp_••••••••••••••••••••••••••••••••••••
Status : DELETED from current HEAD (still in history — still compromised)
[MEDIUM] Hardcoded Password
Commit : 9d45f7b (2024-01-09) — Jane Doe <jane@example.com>
File : tests/fixtures.py
Value : password=••••••••
Status : DELETED from current HEAD (still in history — still compromised)
====================================================================
SUMMARY: 1 present in HEAD, 3 deleted-but-still-in-history
Exit code: 1
====================================================================
The "DELETED but still in history" label is the one that tends to wake people up.
How It Works Under the Hood
The implementation is deliberately simple. No exotic dependencies, no ML models, no cloud calls.
1. Enumerate all commits: git log --all --format="%H" pulls every commit hash across every branch, including merged and orphaned branches.
2. Extract diffs: For each commit, git diff-tree --no-commit-id -U0 -p <hash> gets the full patch. Only lines starting with + (additions) are examined; this tells you when each secret was first introduced, not just that it existed.
3. Pattern matching: 37 compiled regex patterns run against each added line. Patterns cover AWS (AKIA[0-9A-Z]{16}), GitHub tokens (ghp_[A-Za-z0-9]{36}), OpenAI keys (sk-[A-Za-z0-9]{48}), Anthropic keys, Stripe, SendGrid, Twilio, JWT tokens, private keys, database URLs (PostgreSQL, MongoDB, Redis), and generic password/secret patterns.
4. Deduplication: Findings are deduplicated by (pattern_name, file_path, redacted_value) so the same secret introduced in one commit and carried through fifty more only shows up once.
5. Current-HEAD check: For each finding, the tool checks whether the secret still exists in the current working tree. This is the crucial column: PRESENT means rotate now, DELETED means rotate now and rewrite history.
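The middle of that pipeline (additions-only scanning, pattern matching, redaction, deduplication) can be sketched in a few lines. This is not the tool's actual source: the names PATTERNS, Finding, redact, and scan_patch are hypothetical, only three of the quoted patterns are included, and the patch text is assumed to come from the git diff-tree call described above.

```python
import re
from typing import NamedTuple

# Three of the patterns quoted above; the real tool ships 37.
PATTERNS = {
    "AWS Access Key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub Personal Access Token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "OpenAI API Key": re.compile(r"sk-[A-Za-z0-9]{48}"),
}


class Finding(NamedTuple):
    pattern_name: str
    file_path: str
    redacted_value: str


def redact(value: str, keep: int = 4) -> str:
    """Keep the first and last `keep` characters and mask the rest."""
    if len(value) <= 2 * keep:
        return "•" * len(value)
    return value[:keep] + "•" * (len(value) - 2 * keep) + value[-keep:]


def scan_patch(patch: str) -> list[Finding]:
    """Scan the added lines of a unified diff; return deduplicated findings."""
    seen: set[Finding] = set()
    findings: list[Finding] = []
    current_file = "<unknown>"
    for line in patch.splitlines():
        if line.startswith("+++ "):
            # File header, e.g. "+++ b/config/deploy.env".
            # (Simplification: an added line whose own text begins with
            # "++ " would be mistaken for a header.)
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
            continue
        if not line.startswith("+"):
            continue  # skip context lines, deletions, and hunk headers
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line[1:]):
                finding = Finding(name, current_file, redact(match.group()))
                if finding not in seen:  # dedupe on the full tuple
                    seen.add(finding)
                    findings.append(finding)
    return findings
```

Because the dedup key uses the redacted value rather than the raw one, two different keys that share the same first and last four characters would collapse into a single finding. That is a trade-off of this sketch, made so that raw secrets never need to be held in the results.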
The Remediation Path
Finding secrets is the easy part. Getting rid of them is more involved.
For secrets that are only in history (not current HEAD), you have two main options:
git filter-repo (the modern approach):
pip install git-filter-repo
git filter-repo --path secrets.env --invert-paths
BFG Repo Cleaner (faster for large repos):
java -jar bfg.jar --delete-files secrets.env
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force-with-lease
Both require a force push and re-cloning by all collaborators. Rotate the credentials either way, and treat that as non-negotiable if the repo is public: history rewriting does not help you if the key was already scraped before you rewrote it.
secret-time-machine --remediation prints these steps automatically for every finding.
What I Learned Building This
Most leaked secrets are "deleted" ones. In every repo I tested during development, the majority of findings were in commits where the file had already been removed. Developers fix the symptom without understanding the disease.
Git's --all flag is doing heavy lifting. Scanning only main misses feature branches, hotfix branches, and anything merged and deleted. A surprising number of secrets live exclusively on branches nobody thinks about anymore.
Scanning additions-only is the right call. If you scan the full diff content, you get massive noise from context lines and deletions. Only + lines tell you when a secret was born into the repo.
Redaction is a UX decision, not just a security one. Showing full values in terminal output is useful for developers debugging their own repos but would be irresponsible in a shared CI log. The default shows enough characters to identify the credential without making the tool itself a leaking vector.
Exit code 1 on findings makes CI integration trivial. One line in a GitHub Actions workflow catches new secrets before they ever land on main.
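If your pipeline runs Python steps rather than raw shell, the same exit-code contract can be wrapped in a small gate. This is only a sketch: run_gate is a hypothetical helper, and the single thing it relies on is the behaviour described above, a non-zero exit code when findings exist.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> int:
    """Run a scanner command, echo its stdout, and return its exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end="")
    if result.returncode != 0:
        # A non-zero code is what lets CI mark the job as failed.
        print(
            f"secret scan failed with exit code {result.returncode}",
            file=sys.stderr,
        )
    return result.returncode
```

In a CI step you would end the script with sys.exit(run_gate(["secret-time-machine", "--json"])), so the job fails whenever the scanner reports findings.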
Go Scan Your Repos
Seriously. Run this on your personal projects first, then your work repos (with permission). The number of developers who have never looked at their git history through a security lens is genuinely alarming.
pip install secret-time-machine
cd your-repo
secret-time-machine
If you get zero findings: great, you're either very careful or very lucky. If you get findings: now you know, and now you can fix it.
The repo is at https://github.com/LakshmiSravyaVedantham/secret-time-machine. Issues, PRs, and additional regex patterns for edge-case credential formats are all welcome. The past is not private, but at least now you can see exactly what's in it.