DEV Community

siyadhkc

I Built a Pre-Commit Secret Scanner Because GitHub's Is Too Late

Let me start with a confession.

I have accidentally committed a .env file. Not to a private internal repo with no consequences. To a public GitHub repo. With a real Stripe test key in it.

I caught it within minutes, deleted the file, rotated the key, pushed a new commit. Classic developer scramble. Nothing happened — at least as far as I know.

But that "as far as I know" lived in my head for a while.

That experience is what eventually led me to build env-guard — a Python CLI tool that scans your project for secrets and installs as a git pre-commit hook, blocking the commit entirely if it finds something suspicious. The secret never leaves your machine. No scramble. No rotation. No anxiety.


The problem with GitHub Secret Scanning (it's a timing problem)

GitHub has a feature called Secret Scanning. It's legitimately good — it covers dozens of credential formats across major providers and will notify you when it finds something. I'm not here to trash it.

But it has one fundamental flaw that no amount of good engineering can fix:

It runs after you push.

Think about what that means. By the time GitHub scans your code and sends you an alert, your secret has already:

  1. Left your machine
  2. Traveled over the internet
  3. Landed on GitHub's servers
  4. Been indexed, if the repo is public

Automated bots scrape public GitHub repos constantly. We're talking seconds after a push, not minutes. GitHub will alert you, but the key is already out there. You're now rotating credentials, auditing API usage logs, and hoping whoever grabbed it hasn't done anything with it yet.

Without env-guard:
  code → commit → push → GitHub scans → alert → key already exposed ❌

With env-guard:
  code → commit blocked → fix locally → push clean code ✅

Think of GitHub Secret Scanning as your last line of defense. env-guard is your first.

The secret never enters git history at all. There's nothing to rotate, nothing to audit, and nothing to lose sleep over.


Who this is actually for

If you work with any of the following on a regular basis, env-guard is for you:

  • API keys — OpenAI, Stripe, Twilio, SendGrid, any SaaS you integrate with
  • Cloud credentials — AWS access keys, GCP service accounts, anything that costs money if misused
  • Database connection strings — postgres://, mongodb://, redis:// — all of these contain passwords
  • Private keys — RSA, EC, OpenSSH — the kind that give access to your servers

Basically: if you work on real software that talks to real services, you have secrets somewhere in your local environment. And humans make mistakes. Even careful ones.

I've seen this happen to senior engineers. It's not a skill issue. It's a missing safety net issue.
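To make the connection-string point concrete, here is a minimal Python sketch using only the standard library. The URLs, hostnames, and the `redact` helper are made up for illustration and are not part of env-guard; the point is just that the password sits in plain sight inside the URL.

```python
from urllib.parse import urlsplit, urlunsplit

def redact(url):
    """Return the URL with any embedded password replaced by '***'."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing sensitive in the authority part
    user = parts.username or ""
    host = parts.hostname or ""
    netloc = f"{user}:***@{host}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

# Hypothetical connection strings, not real credentials.
pg_url = "postgres://app:s3cret@db.example.com:5432/shop"
redis_url = "redis://:pw@cache.internal:6379/0"

print(urlsplit(pg_url).password)  # the password is one attribute access away
print(redact(pg_url))
print(redact(redis_url))
```

Anyone who can read the file can do the same thing, which is why a committed connection string should be treated like a committed password.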


What it detects

env-guard ships with 54 detection rules covering every major category of credential you'd encounter in a typical backend or fullstack project:

| Category | What it catches |
|---|---|
| AWS | Access Key ID, Secret Access Key, Session Token |
| Google | API Key, OAuth Client Secret, Service Account JSON |
| GitHub | Personal Access Token, OAuth Token, App Token |
| Stripe | Live and Test Secret Keys, Publishable Keys |
| OpenAI / Anthropic | API Keys |
| Slack | Bot Token, User Token, Webhook URL |
| Twilio | Account SID, Auth Token |
| SendGrid | API Key |
| Databases | PostgreSQL, MySQL, MongoDB, Redis connection strings |
| Private Keys | RSA, EC, PGP, OpenSSH private key blocks |
| Django | SECRET_KEY |
| Generic | Password assignments, token assignments, API key assignments |
| And more | Razorpay, NPM, PyPI, Heroku, Netlify, Cloudinary, and others |

Each rule carries a severity level — HIGH, MEDIUM, or LOW — so you're not staring at an undifferentiated wall of warnings. A live Stripe secret key is HIGH. A generic-looking token assignment might be MEDIUM. The output is human-readable and actionable.

$ env-guard scan .

env-guard scan report
Path    : .
Scanned : 42 files
Found   : 1 potential secret(s)

  ✖ HIGH  Stripe Live Secret Key
  File : config/settings.py:12
  Code : STRIPE_KEY = "sk_live_abc123..."

──────────────────────────────────────────────────
  1 HIGH
──────────────────────────────────────────────────

Scan failed — secrets detected. Do not commit.

File. Line number. What was found. Severity. No ambiguity, no guesswork — you know exactly where to go and what to fix.
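If you're curious how rules with a name, a severity, and a pattern can produce a report like that, here is a rough Python sketch. It is not env-guard's actual rule library: the `Rule` class, the four regexes, and the `scan_text` function are simplified assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    severity: str  # "HIGH", "MEDIUM", or "LOW"
    pattern: re.Pattern

# Illustrative rules only; the real pattern library is larger and stricter.
RULES = [
    Rule("Stripe Live Secret Key", "HIGH",
         re.compile(r"sk_live_[0-9a-zA-Z]{16,}")),
    Rule("AWS Access Key ID", "HIGH",
         re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    Rule("Private Key Block", "HIGH",
         re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")),
    Rule("Generic Password Assignment", "MEDIUM",
         re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]{8,}['\"]")),
]

def scan_text(text, filename="<memory>"):
    """Check every line against every rule; return one dict per match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append({
                    "rule": rule.name,
                    "severity": rule.severity,
                    "file": filename,
                    "line": lineno,
                })
    return findings

# A fake key assembled at runtime so this file contains no key-shaped literal.
sample = 'DEBUG = True\nSTRIPE_KEY = "sk_live_' + "a1B2" * 6 + '"\n'
for f in scan_text(sample, "config/settings.py"):
    print(f'{f["severity"]}  {f["rule"]}  {f["file"]}:{f["line"]}')
```

Keeping severity on the rule rather than on the finding means a single table decides how loud each category is.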


Getting started in 60 seconds

Install it:

pip install env-guard

Requires Python 3.8+. Run a scan on your current project right now:

env-guard scan .

Go ahead. Run it on a real project you're working on. You might be surprised what's sitting there.


The pre-commit hook — the part that actually changes your habits

Ad-hoc scanning is useful for audits, but the real shift happens when env-guard becomes invisible. When you don't have to think about running it. When it just works in the background every single time you commit.

That's what install-hook does:

env-guard install-hook

Run that once inside your repo. From that point forward, every git commit triggers an automatic scan before anything gets written to git history:

env-guard: scanning for secrets...
env-guard: commit blocked. Remove secrets before committing.
env-guard: to skip this check (NOT recommended): git commit --no-verify

If nothing is found, the commit proceeds normally — you won't even notice it ran. If something is found, the commit is hard-blocked with a clear message telling you exactly where the problem is.
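For anyone who hasn't written a git hook before, the mechanism is small: git runs the `pre-commit` script and aborts the commit if it exits non-zero. Here is a hedged Python sketch of that idea. It is not env-guard's implementation; the single regex and every function name are assumptions, and this version reads only the staged content, which the real tool may or may not do.

```python
import re
import subprocess
import sys

# One hypothetical rule, for illustration only.
SECRET_PATTERN = re.compile(r"sk_live_[0-9a-zA-Z]{16,}")

def staged_files():
    """Paths that are added, copied, or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def staged_content(path):
    """The staged version of a file, i.e. what would actually be committed."""
    return subprocess.run(
        ["git", "show", f":{path}"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout

def find_secrets(contents):
    """Take a {path: text} mapping; return (path, line number) for each hit."""
    hits = []
    for path, text in contents.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SECRET_PATTERN.search(line):
                hits.append((path, lineno))
    return hits

def main():
    contents = {path: staged_content(path) for path in staged_files()}
    hits = find_secrets(contents)
    for path, lineno in hits:
        print(f"commit blocked: possible secret at {path}:{lineno}", file=sys.stderr)
    return 1 if hits else 0  # a non-zero exit status makes git abort the commit

# When used as a hook script, the last line would be: sys.exit(main())
```

Reading the staged blob with `git show :path` instead of the working-tree file matters when you have staged one version and kept editing: the hook should judge what is about to be committed.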

The --no-verify escape hatch is git's own flag for skipping pre-commit hooks, and env-guard mentions it in the blocked-commit message because I didn't want to be heavy-handed. Sometimes you're committing a test fixture that contains a fake but realistic-looking credential. The escape hatch is there, but it requires a conscious decision, and that friction is intentional.

To remove the hook if you ever need to:

env-guard uninstall-hook
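Under the hood, installing or removing a hook comes down to managing one executable file at `.git/hooks/pre-commit`. The following Python sketch shows that general shape. It is an assumption about what such commands need to do, not a copy of env-guard's code: the hook body, the marker comment, and both function names are hypothetical.

```python
import stat
from pathlib import Path

# Hypothetical hook body; the marker line lets uninstall recognise its own file.
MARKER = "# installed-by: env-guard (illustrative marker)"
HOOK_BODY = f"#!/bin/sh\n{MARKER}\nexec env-guard scan .\n"

def hook_path(repo_root):
    return Path(repo_root) / ".git" / "hooks" / "pre-commit"

def install_hook(repo_root):
    """Write the hook script and mark it executable."""
    hook = hook_path(repo_root)
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_BODY)
    # git silently skips hooks that are not executable
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook

def uninstall_hook(repo_root):
    """Remove the hook only if it carries our marker; report whether we did."""
    hook = hook_path(repo_root)
    if hook.exists() and MARKER in hook.read_text():
        hook.unlink()
        return True
    return False
```

The marker check is a design choice worth copying in any hook installer: it keeps an uninstall from deleting a pre-commit hook that some other tool or teammate put there. A fuller version would also refuse to overwrite an existing hook on install.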

Handling false positives

Every scanner has them. A SHA-256 hash that structurally resembles an API key. A documentation example with a placeholder that matches a pattern. A test fixture with a fake credential for unit tests.

env-guard handles this with a .envguardignore file in your project root — same concept as .gitignore, one pattern per line:

# Ignore specific files
tests/fixtures/sample.env

# Ignore by extension
*.log

# Ignore entire directories
docs/

Because it's a committed file, your whole team gets the same exclusions automatically. No per-developer configuration needed.
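As a rough mental model of how an ignore file like this can be applied, here is a small Python sketch built on the standard library's `fnmatch`. The exact matching semantics of .envguardignore may differ; `load_patterns` and `is_ignored` are hypothetical names, and this version deliberately skips gitignore features such as negation and anchored patterns.

```python
from fnmatch import fnmatch

IGNORE_TEXT = """\
# Ignore specific files
tests/fixtures/sample.env

# Ignore by extension
*.log

# Ignore entire directories
docs/
"""

def load_patterns(text):
    """Keep non-empty lines that are not comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def is_ignored(path, patterns):
    """True if a repo-relative path matches any ignore pattern."""
    path = path.replace("\\", "/")
    name = path.rsplit("/", 1)[-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match that directory at any depth.
            directory = pattern.rstrip("/")
            if f"/{directory}/" in f"/{path}":
                return True
        elif fnmatch(path, pattern) or fnmatch(name, pattern):
            return True
    return False

patterns = load_patterns(IGNORE_TEXT)
for candidate in ["tests/fixtures/sample.env", "app/debug.log",
                  "docs/guide.md", "src/main.py"]:
    print(candidate, "ignored" if is_ignored(candidate, patterns) else "scanned")
```

Matching against both the full path and the bare file name is what lets a pattern like `*.log` apply in every directory.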


CLI flags for different workflows

A few flags that cover the common situations:

# Only surface the critical findings
env-guard scan . --severity HIGH

# Machine-readable output for scripting
env-guard scan . --format json

# Scan without failing the process (reporting mode)
env-guard scan . --no-fail

The --no-fail flag is worth calling out specifically. If you're adding env-guard to an existing large codebase, you probably don't want to immediately block everything and deal with 50 findings on day one. Run with --no-fail first, triage what's real vs. false positive, set up your .envguardignore, then remove the flag when you're confident. Gradual adoption is a completely valid path.
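For the triage step, the JSON output is the natural thing to script against. The sketch below is Python and rests on a loud assumption: that the report is a JSON array of objects with a `severity` field. I am not quoting the real schema here, so check the actual output before relying on these field names; `summarize` and the sample data are purely illustrative.

```python
import json
from collections import Counter

def summarize(report_json):
    """Count findings per severity from a JSON report string (assumed schema)."""
    findings = json.loads(report_json)
    return Counter(item["severity"] for item in findings)

# Stand-in for captured output. In practice you might capture it with something
# like: subprocess.run(["env-guard", "scan", ".", "--format", "json", "--no-fail"],
#                      capture_output=True, text=True).stdout
# (combining those two flags is also an assumption).
sample = json.dumps([
    {"severity": "HIGH", "file": "config/settings.py", "line": 12},
    {"severity": "MEDIUM", "file": "app/utils.py", "line": 40},
    {"severity": "MEDIUM", "file": "tests/test_api.py", "line": 7},
])

counts = summarize(sample)
print(dict(counts))  # {'HIGH': 1, 'MEDIUM': 2}
```

A per-severity count like this is usually enough to decide whether to fix the HIGH findings first and park the rest in .envguardignore or a follow-up ticket.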


For teams: CI/CD as a second checkpoint

Pre-commit hooks protect your own machine. But what about:

  • External contributors who clone the repo and don't run install-hook
  • New team members who haven't set up their environment yet
  • Automated processes that generate config files with credentials

For that, env-guard works in CI pipelines too. Here's a GitHub Actions workflow you can drop into any repo:

name: Secret Scan

# Run on every push and on every pull request
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install env-guard
      # A non-zero exit from the scan fails the job
      - run: env-guard scan .

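This applies the same scan to every push and every PR, from every contributor, whether or not they installed the hook. Because CI runs after the push, it detects a leaked secret rather than preventing it, which is why it is the second checkpoint and not the first. env-guard locally + this workflow in CI = defense in depth that actually makes sense.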


env-guard vs GitHub Secret Scanning — use both

| | env-guard | GitHub Secret Scanning |
|---|---|---|
| When it runs | Before commit, on your machine | After push, on GitHub's servers |
| Blocks the secret | Yes, commit is blocked | No, secret is already pushed |
| Works offline | Yes | No |
| Custom ignore rules | Yes, .envguardignore | Limited |
| Free | Yes | Yes (public repos) |
| Requires GitHub | No | Yes |

These aren't competing tools. They protect different parts of the pipeline. Run env-guard locally to stop secrets before they ever leave your machine. Keep GitHub Secret Scanning on as a fallback for anything that slips through — external contributors, edge cases, human error. Use both.


The thing I keep thinking about

There's a concept in security called "shift left" — the idea that you should catch problems as early in the development process as possible, because the earlier you catch something, the cheaper it is to fix and the less damage it causes.

A leaked credential caught before the commit? Zero cost. Rotate nothing. Tell no one. Move on.

A leaked credential caught after a push to a public repo? Rotate the key, audit usage logs, send an incident report if it's a team environment, and spend a few days with that low-grade anxiety of "did anything actually get accessed?"

env-guard is a shift-left tool. The entire design philosophy is: don't give the secret a chance to travel. Keep it local. Catch it at the source.


Try it

pip install env-guard
env-guard scan .

If you want the hook:

env-guard install-hook

Source on GitHub: github.com/siyadhkc/env-guard

If it catches something on the first scan — good. That's the point. And if you hit a false positive that's annoying, open an issue. I want the pattern library to be genuinely useful, not just technically comprehensive.


What's coming next

A few things on the roadmap:

  • Pre-push hook option — a second checkpoint right before the push, for teams who want both
  • More detection rules — 54 is solid but the ecosystem keeps growing
  • Project-level config file — custom severity thresholds, rule toggles, team-specific settings beyond just ignores

If this is useful to you, star the repo and watch for updates. And if you've ever had a bad day because of an exposed credential — you're not alone, and this is for you.


Built with Python. MIT licensed. Contributions welcome.

Top comments (1)

Truong Bui

The timing insight here is the real point — GitHub Secret Scanning is a last line, not a first line, and by the time it fires the damage window is already open. Pre-commit is the right place to catch this.

There's an adjacent timing problem that doesn't get talked about enough: third-party packages. A developer can run env-guard perfectly and never commit a secret themselves, but still end up running a package that has one baked in. We scanned 508 public MCP servers (the JSON-based tool packages that AI agents connect to) at MCPSafe (mcpsafe.io) and hardcoded credentials were the single most common finding — 22% of servers had them. API keys, OAuth tokens, database URLs sitting in server source code that gets installed on the user's machine and run with whatever permissions the agent has.

Pre-commit hooks protect what you write. Pre-install scanning protects what you run. Both gaps exist, both are worth closing, and the "shift left" framing applies to the supply chain too — catch the bad package before the agent runs it, not after it has exfiltrated something.