Most developers know they shouldn't commit API keys. Most secret scanners will catch an AWS key sitting in your current codebase. What they won't catch is the key you deleted three commits ago -- which is still fully recoverable by anyone who clones your repo and runs git log -p.
That gap is what I built leakscan to address.
The Problem With Current-State-Only Scanners
When you delete a secret from a file and commit, the removal is recorded in git history. But the original commit that introduced the secret is still there. Every clone of your repository carries that history. Anyone -- a future contributor, a malicious actor, a job applicant reviewing your public code -- can recover those secrets.
```bash
# This recovers secrets you "deleted" months ago, on every branch
git log -p --all | grep -E -A2 "AKIA|sk-|ghp_"
```
A scanner that only looks at your working tree will never see these. leakscan traverses every commit.
What leakscan Does
leakscan is a Python CLI that scans for leaked secrets across:
- Local file trees (parallel, 8 threads)
- Full git history across any branch
- Public GitHub repos by URL
- All repos and gists for a GitHub user or org
It ships with 55+ regex patterns covering AWS, GitHub, GitLab, Stripe, OpenAI, Anthropic, Slack, Twilio, Discord, Telegram, npm, PyPI, and more. On top of regex, it runs Shannon entropy scoring on .env, YAML, and INI files to catch high-entropy values that don't match a known pattern.
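A sketch of the parallel file-tree scan, assuming a thread pool with 8 workers to match the thread count listed above. The two patterns, the skipped directories, and the function names are illustrative assumptions, not leakscan's actual pattern set or engine.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# A tiny, illustrative subset of vendor-specific patterns.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Hypothetical directories that are never worth scanning.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def scan_file(path: Path) -> list[tuple[str, int, str]]:
    """Return (path, line number, rule name) for each pattern hit in one file."""
    hits: list[tuple[str, int, str]] = []
    try:
        text = path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return hits  # unreadable or missing file: skip it instead of aborting
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((str(path), lineno, rule))
    return hits


def scan_tree(root: str, workers: int = 8) -> list[tuple[str, int, str]]:
    """Walk root, skip VCS/vendored dirs, and scan files on a thread pool."""
    files = [
        p for p in Path(root).rglob("*")
        if p.is_file() and not SKIP_DIRS.intersection(p.parts)
    ]
    results: list[tuple[str, int, str]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output is deterministic.
        for hits in pool.map(scan_file, files):
            results.extend(hits)
    return results
```

Threads mostly pay off here by overlapping file reads; the regex matching itself is CPU work.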
Shannon Entropy: Catching the Unknowns
Not every leaked secret follows a known format. A randomly generated 32-character database password won't match any vendor-specific regex. Shannon entropy measures the randomness of a string -- secrets tend to have high entropy because they're generated to be unpredictable.
The entropy scorer in leakscan is scoped to value-bearing lines in config files, not general source code, to keep the false positive rate low. You can disable it with --no-entropy if you're scanning code that has intentionally high-entropy strings (e.g., compiled output).
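Shannon entropy is H = -sum over distinct characters c of p(c) * log2 p(c), where p(c) is c's relative frequency in the string. One repeated character scores 0 bits per character; 16 equally frequent distinct characters score 4. The sketch below computes it and applies it only to KEY=value and key: value lines, mirroring the scoping described above. The 16-character minimum, the 3.5-bit threshold, and the assignment regex are hypothetical values for illustration, not leakscan's settings.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical thresholds for illustration; tune them against your own data.
MIN_LENGTH = 16
ENTROPY_THRESHOLD = 3.5

# KEY=value or key: value, the shape of value-bearing lines in .env/INI/YAML.
ASSIGNMENT = re.compile(r"^\s*[\w.-]+\s*[:=]\s*['\"]?([^'\"\s#]+)")


def shannon_entropy(s: str) -> float:
    """H = -sum(p * log2(p)) over character frequencies, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((count / n) * math.log2(count / n) for count in Counter(s).values())


def high_entropy_value(line: str) -> Optional[str]:
    """Return the assigned value if it is long and random-looking, else None."""
    m = ASSIGNMENT.match(line)
    if not m:
        return None  # comments, section headers, and free text are ignored
    value = m.group(1)
    if len(value) >= MIN_LENGTH and shannon_entropy(value) >= ENTROPY_THRESHOLD:
        return value
    return None
```

The length floor matters as much as the threshold: short values like "true" or "8080" can never reach a high score, and skipping them keeps noise down.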
Live Verification
Finding a secret is only half the picture. leakscan can verify whether a found secret is still active by making a live API call:
```bash
secrets scan . --verify
```
Currently supports: GitHub, GitLab, Stripe, OpenAI, Anthropic, HuggingFace, SendGrid, Slack, npm, Replicate.
A revoked or rotated secret shows as INACTIVE in the output. This matters in triage -- you want to know if you have an active exposure or just a historical artifact.
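As an illustration of what a live check involves, here is a sketch for a GitHub token, assuming verification means one authenticated request to the GitHub REST endpoint GET /user: 200 means the token still works, 401 means it has been revoked or is invalid. The function names and the UNKNOWN fallback (rate limits, network errors) are hypothetical; leakscan's verifier.py may use different endpoints or status rules.

```python
import urllib.error
import urllib.request


def status_from_http(code: int) -> str:
    """Map an HTTP status from the provider to a triage label."""
    if code == 200:
        return "ACTIVE"
    if code == 401:
        return "INACTIVE"
    return "UNKNOWN"  # e.g. 403 rate limiting: don't guess either way


def verify_github_token(token: str, timeout: float = 10.0) -> str:
    """Make one authenticated call to GitHub and classify the response."""
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "User-Agent": "leakscan-verify-sketch",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return status_from_http(resp.status)
    except urllib.error.HTTPError as err:  # non-2xx responses land here
        return status_from_http(err.code)
    except urllib.error.URLError:  # DNS failure, timeout, no network
        return "UNKNOWN"
```

HTTPError is caught before URLError because it is a subclass; reversing the order would turn every 401 into UNKNOWN.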
CI/CD Integration
The tool is built to run in pipelines without manual configuration.
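```yaml
# GitHub Actions (job steps)
- uses: actions/checkout@v4
- name: Install leakscan
  run: pip install leakscan
- name: Scan for secrets
  run: secrets scan . --severity HIGH --no-entropy --format sarif --output results.sarif
- name: Upload SARIF
  if: always()  # still upload when the scan step fails the build
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```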
leakscan exits with code 1 on any CRITICAL or HIGH finding, so the build fails automatically. SARIF output integrates with the GitHub Security tab and GitLab SAST.
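To show what the SARIF output and the exit-code rule amount to, here is a sketch that builds a minimal SARIF 2.1.0 log (one run, one result per finding, with a file URI and start line) and computes the CI exit code. The Finding fields and the severity-to-level mapping are assumptions; leakscan's reporter.py likely emits richer rule metadata.

```python
import json
from typing import TypedDict


class Finding(TypedDict):
    rule: str
    severity: str  # CRITICAL, HIGH, MEDIUM or LOW
    path: str
    line: int


# Hypothetical mapping from scanner severity to SARIF result levels.
SARIF_LEVEL = {"CRITICAL": "error", "HIGH": "error", "MEDIUM": "warning", "LOW": "note"}
BLOCKING = {"CRITICAL", "HIGH"}


def to_sarif(findings: list[Finding]) -> dict:
    """Build a minimal SARIF 2.1.0 log: one run, one result per finding."""
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [{
            "tool": {"driver": {"name": "leakscan"}},
            "results": [
                {
                    "ruleId": f["rule"],
                    "level": SARIF_LEVEL.get(f["severity"], "warning"),
                    "message": {"text": f"Possible {f['rule']} secret"},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["path"]},
                            "region": {"startLine": f["line"]},
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }


def write_sarif(findings: list[Finding], out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(to_sarif(findings), fh, indent=2)


def exit_code(findings: list[Finding]) -> int:
    """1 if any finding is CRITICAL or HIGH (fails the CI job), else 0."""
    return 1 if any(f["severity"] in BLOCKING for f in findings) else 0
```

A CLI would call write_sarif first and then sys.exit(exit_code(findings)), so the report exists even when the job fails.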
Baseline mode handles the "known findings" problem in CI:
```bash
# First run: save current state
secrets scan . --save-baseline .secrets.baseline
# Subsequent runs: only alert on NEW secrets
secrets scan . --baseline .secrets.baseline
```
This stops CI from constantly alerting on findings you've already triaged and accepted (test fixtures, example configs with placeholder values, etc.).
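A minimal sketch of baseline fingerprinting, assuming a finding is identified by file path, rule name, and secret value, and that the baseline is a JSON list of SHA-256 hex digests. The file format and function names are hypothetical; they only illustrate the save/load/compare cycle that baseline.py is described as doing.

```python
import hashlib
import json
from pathlib import Path

Finding = tuple[str, str, str]  # (path, rule, secret) -- hypothetical shape


def fingerprint(path: str, rule: str, secret: str) -> str:
    """Stable ID for a finding, hashed so the raw secret never hits disk."""
    # The line number is left out on purpose: unrelated edits that shift
    # lines should not turn an accepted finding into a "new" one.
    return hashlib.sha256(f"{path}\x00{rule}\x00{secret}".encode()).hexdigest()


def save_baseline(baseline_file: str, findings: list[Finding]) -> None:
    digests = sorted({fingerprint(*f) for f in findings})
    Path(baseline_file).write_text(json.dumps(digests, indent=2), encoding="utf-8")


def load_baseline(baseline_file: str) -> set[str]:
    p = Path(baseline_file)
    return set(json.loads(p.read_text(encoding="utf-8"))) if p.exists() else set()


def new_findings(findings: list[Finding], baseline: set[str]) -> list[Finding]:
    """Keep only findings whose fingerprint is not already in the baseline."""
    return [f for f in findings if fingerprint(*f) not in baseline]
```

Sorting the digests keeps the baseline file diff-friendly when it is committed alongside the code.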
Pre-commit Hook
```bash
cd your-git-repo
secrets install-hook
```
The hook runs on every commit and uses the baseline automatically if present. Inline suppression is supported: add # nosec, # gitleaks:allow, or # secretscanner:allow to any line to skip it.
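A sketch of how a hook might apply the inline markers, assuming a plain substring match on each staged line. The staged-diff helper uses git diff --cached -U0 to get only the lines being added, with no context lines. All function names are hypothetical, and this is not the hook that secrets install-hook writes.

```python
import subprocess

# The three markers listed above; a line containing any of them is skipped.
SUPPRESSION_MARKERS = ("# nosec", "# gitleaks:allow", "# secretscanner:allow")


def is_suppressed(line: str) -> bool:
    """True if the line carries one of the inline allow markers (substring match)."""
    return any(marker in line for marker in SUPPRESSION_MARKERS)


def staged_added_lines(repo: str = ".") -> list[str]:
    """Lines being added in the staged changes, without the leading '+'."""
    diff = subprocess.run(
        ["git", "-C", repo, "diff", "--cached", "-U0", "--no-color"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def lines_to_scan(repo: str = ".") -> list[str]:
    """Staged additions minus anything explicitly allowed inline."""
    return [line for line in staged_added_lines(repo) if not is_suppressed(line)]
```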
Output Formats
| Format | Use case |
|---|---|
| Terminal (default) | Interactive review with severity colors |
| JSON | Programmatic consumption, SIEM ingestion |
| CSV | Spreadsheet review, audit exports |
| SARIF 2.1.0 | GitHub Security tab, GitLab SAST |
| Markdown | Disclosure reports to security teams |
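For the JSON and CSV rows, a sketch using only the standard library, assuming a hypothetical Finding record with rule, severity, path, and line fields. The field names are assumptions, not leakscan's actual output schema; the record also leaves out the raw secret value, a choice made here so exports are safe to share.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Finding:
    rule: str
    severity: str
    path: str
    line: int


def to_json(findings: list[Finding]) -> str:
    """A JSON array of objects: easy to parse or ship to a SIEM."""
    return json.dumps([asdict(f) for f in findings], indent=2)


def to_csv(findings: list[Finding]) -> str:
    """Header row plus one row per finding, for spreadsheets and audit exports."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Finding)])
    writer.writeheader()
    writer.writerows(asdict(f) for f in findings)
    return buf.getvalue()
```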
Architecture
The codebase is intentionally modular:
```
scanner/
  cli.py          entry point (click)
  engine.py       file walker, parallel scanner, git history
  patterns.py     55+ regex patterns
  entropy.py      Shannon entropy scorer
  verifier.py     live API verification (10 services)
  baseline.py     save/load/compare baseline fingerprints
  reporter.py     terminal/JSON/CSV/SARIF/disclosure output
  ignorefile.py   .secretignore parser with ** glob support
  github/
    fetcher.py    GitHub API client: repos, gists, commit history
```
Each module is independently testable. The full pytest suite is in the tests/ directory.
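A sketch of ** glob support, assuming .secretignore patterns are matched against the full "/"-separated path relative to the repository root. Here **/ matches zero or more directories, a trailing ** matches everything below a directory, and * and ? never cross a "/". This is a simplified subset: unlike .gitignore, a slash-free pattern such as *.py only matches at the top level. The function names are hypothetical, not ignorefile.py's API.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified gitignore-style glob into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if the relative, '/'-separated path fully matches any pattern."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)
```

Using fullmatch rather than search is what anchors each pattern to the whole path, so tests/** ignores tests/fixtures/keys.py but not src/tests/keys.py.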
Installation
```bash
pip install leakscan
```
Where This Fits vs. Existing Tools
Tools like Gitleaks, detect-secrets, and TruffleHog are excellent. leakscan is a Python-native option focused on git history scanning, live verification, and baseline-aware CI. If your team is already Python-heavy, a pip install can be a lower-friction entry point than distributing a Go binary such as Gitleaks or TruffleHog.
What's Next
- Expanded verifier coverage (Twilio, Mailchimp, Shopify)
- GitHub Actions marketplace action
- PyPI download metrics and badge
The repo is at github.com/Vasishta03/secret-scanner. Contributions, pattern additions, and feedback welcome.