YUICHI KANEKO

Posted on Apr 24 • Edited on May 19

KeyGate: A Fast Pre-Commit Guardrail Against Secret Leaks

#git #security #showdev #tooling

Accidentally committing an API key, password, or private key is still one of the easiest ways to create a serious security incident.

The risk gets worse as development speeds up: larger diffs, faster iteration, and more code drafted by AI coding agents before a human reviews every line.

That is why I built keygate: a fast local pre-commit guardrail that scans only staged added lines and blocks likely secrets before they enter Git history.

pipx install keygate
keygate activate

That's it. keygate now runs automatically before every git commit.

GitHub: https://github.com/kanekoyuichi/keygate
PyPI: https://pypi.org/project/keygate/
License: MIT

Why I built it

Accidentally writing an API key directly into code during development happens to everyone. The real problem is that once you git commit it, the value becomes part of Git history permanently.

Even if you git rm it or force-push, the old SHA can still be used to retrieve it
Once pushed to GitHub, bots can scrape it within seconds
An AWS key can lead to a massive bill; an OpenAI key can drain your usage quota almost instantly

I needed a tool to stop this at the moment of commit. Existing tools like Gitleaks and TruffleHog are excellent, but they focus on full repository scanning and CI workflows. I wanted something optimized specifically for the local pre-commit experience.

More importantly, as we move into a world where AI agents write code, the need for an automatic check right before a commit only increases.

The AI agent angle

AI coding agents like Claude Code or Codex can generate large diffs quickly. The safest assumption is not that the agent is malicious, but that speed increases the chance of unnoticed sensitive values reaching a commit.

Specifically, AI agents tend to create situations like:

Generating code that references .env or config examples and including their values
Expanding sample values from READMEs or test fixtures as-is
Inferring and completing api_key or password-looking values from surrounding context
Producing large diffs in a single pass, before a human has a chance to review every line

A local guardrail becomes more valuable in that workflow, not less. That is why keygate applies the same check right before the commit, whether the code was written by a human or an AI.

keygate is not meant only for developers who carefully read the README. It provides JSON output and an agent-specific execution mode so that agents themselves can read the scan results and suggest fixes.

What keygate detects

keygate combines multiple signals instead of relying on a single regex:

Rule-based detection (known formats)

AWS access keys (AKIA* / ASIA* / AROA*)
OpenAI API keys (sk-*)
GitHub tokens (ghp_*, fine-grained PATs)
Slack tokens (xoxb-* / xoxp-*)
Stripe keys (sk_live_* / rk_live_* / pk_live_*)
SendGrid keys (SG.*.*)
JWTs and PEM private keys (RSA / OpenSSH)
URL credentials (postgres://user:pass@host, etc.)

Values such as pk_live_* (which are meant to be public) and already-masked URL credentials such as postgres://user:***@host are treated as WARN rather than an immediate BLOCK. The goal is to catch dangerous values without blocking every documentation-friendly string.
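To make the WARN-versus-BLOCK distinction concrete, here is a minimal sketch of a rule table. The regular expressions, the rule ids other than aws-access-key and url-credentials, and the first_match helper are assumptions for illustration, not keygate's actual rules.

```python
import re

# Hypothetical rule table: (rule_id, compiled pattern, policy).
# Order matters: the masked-URL rule is checked before the generic one.
RULES = [
    ("aws-access-key", re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "block"),
    ("stripe-publishable-key", re.compile(r"\bpk_live_[0-9A-Za-z]{10,}\b"), "warn"),
    ("url-credentials-masked", re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:\*+@"), "warn"),
    ("url-credentials", re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@*]+@"), "block"),
]


def first_match(line):
    """Return (rule_id, policy) for the first rule that matches, or None."""
    for rule_id, pattern, policy in RULES:
        if pattern.search(line):
            return rule_id, policy
    return None
```

With these assumed patterns, a masked URL such as postgres://user:***@host maps to a warn policy, while the same URL with a real-looking password maps to block.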

Entropy detection

Strings longer than 20 characters whose Shannon entropy exceeds a threshold in the 4.0–4.5 range
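As a rough illustration of the entropy signal, the sketch below computes Shannon entropy in bits per character. The looks_random helper and its default values are assumptions based on the numbers above, not keygate's actual implementation.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of `value` in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(value: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Hypothetical helper: flag long strings whose entropy exceeds the threshold."""
    return len(value) > min_length and shannon_entropy(value) > threshold
```

A string made of one repeated character has zero entropy, while a string of 32 distinct characters reaches 5 bits per character, which is why random-looking tokens stand out against ordinary identifiers.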

Context scoring

Variable names like api_key, password, secret_token are tiered into HIGH and MID
Paths like .env, config.yaml, settings.py are tiered similarly
Assignment syntax (NAME = "..." / export NAME=...)

How scoring works

Instead of a binary match, keygate aggregates independent signals into a final score:

Signal	Points
Regex rule match	+50 to +100
High entropy	+20
Keyword (HIGH): `secret`, `password`, `api_key`, etc.	+25
Keyword (MID): `token`, `credential`, `auth`	+15
Assignment syntax `NAME = "..."`	+15
Very sensitive path (`.env`, etc.)	+20
Sensitive path (`settings/`, `config/`, etc.)	+15
Test file	-10
`example`, `dummy`, etc.	-20

There is also a combo bonus: even when no regex rule matches, if multiple signals fire together, an additional bonus applies:

keyword(HIGH/MID) + entropy → +15
keyword(HIGH) + entropy + assignment syntax → additional +15

This means an unknown secret format can still reach BLOCK if it has a suspicious variable name, random-looking characters, and assignment syntax.

When a known regex rule does match, the combo bonus is not stacked on top — the rule's own weight is used instead. This keeps the score explainable and avoids inflating it unnecessarily.

The final verdict:

block at 70+
warn at 40–69
ignored below 40
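The aggregation described above can be sketched roughly as follows. The point values and thresholds come from the table and verdict list; the Signals dataclass, the field names, and the choice to simply add the remaining signals to a matched rule's weight are assumptions, since the article does not spell out those details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical container for the signals found on one added line."""
    rule_weight: int = 0            # 0 = no regex rule matched, otherwise 50-100
    high_entropy: bool = False
    keyword: Optional[str] = None   # "HIGH", "MID", or None
    assignment: bool = False
    path: Optional[str] = None      # "VERY_SENSITIVE", "SENSITIVE", or None
    test_file: bool = False
    placeholder: bool = False       # value contains "example", "dummy", etc.


def score(s: Signals) -> int:
    total = s.rule_weight
    total += 20 if s.high_entropy else 0
    total += {"HIGH": 25, "MID": 15}.get(s.keyword, 0)
    total += 15 if s.assignment else 0
    total += {"VERY_SENSITIVE": 20, "SENSITIVE": 15}.get(s.path, 0)
    total -= 10 if s.test_file else 0
    total -= 20 if s.placeholder else 0
    # Combo bonus applies only when no regex rule matched.
    if s.rule_weight == 0 and s.keyword and s.high_entropy:
        total += 15
        if s.keyword == "HIGH" and s.assignment:
            total += 15
    return total


def verdict(total: int) -> str:
    if total >= 70:
        return "block"
    if total >= 40:
        return "warn"
    return "ignore"
```

Under these assumptions, an unknown format with a HIGH keyword, high entropy, and assignment syntax scores 20 + 25 + 15 + 15 + 15 = 90 and is blocked, while a MID keyword with high entropy alone scores 50 and only warns.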

Example output

When a likely secret is found, the commit is stopped:

[BLOCK] High confidence secret detected

File: config.py:12
Rule: aws-access-key
Score: 100

Reason:
AWS Access Key detected; sensitive context detected

Remediation:
  - Remove the key from the code
  - Rotate the AWS credentials immediately
  - Use environment variables or AWS IAM roles instead

To ignore:
  Add comment: # keygate: ignore reason="..."

Each finding includes:

File — the file and line number
Rule — which detection rule fired
Score — severity (70+ blocks, 40–69 warns)
Remediation — concrete steps to fix it

At the top of the output, a machine-readable summary line is also emitted:

[KEYGATE] status=block findings=1

This makes it easy for scripts or agents to parse the outcome without needing JSON mode.
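A script could pick up that summary line with a few lines of Python. This is a minimal sketch that assumes the line always has the exact status=... findings=... shape shown above; parse_summary is a hypothetical helper name.

```python
import re

SUMMARY_RE = re.compile(
    r"^\[KEYGATE\]\s+status=(?P<status>\w+)\s+findings=(?P<findings>\d+)\s*$"
)


def parse_summary(output: str):
    """Return {'status': ..., 'findings': ...} from the first summary line, or None."""
    for line in output.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            return {
                "status": match.group("status"),
                "findings": int(match.group("findings")),
            }
    return None
```

Here output would be the captured text of a keygate scan run; anything that does not match the pattern is ignored.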

Detection accuracy (internal evaluation)

Measured against a labeled corpus of 100 samples (50 known secrets + 50 benign strings):

Metric	Result
Recall (real secrets detected)	100.0%
Precision (detected items that were real secrets)	80.6%
F1	89.3%
True Positives	50
False Negatives (missed secrets)	0
False Positives (benign strings flagged)	12
True Negatives	38

The primary goal was to get False Negatives to zero. Missing a real secret is far more dangerous than an occasional extra prompt.

The 12 false positives included: masked URL credentials, placeholders, Stripe publishable keys, and empty API_KEY= assignments. These are not real secrets, but they look enough like secrets that surfacing them before commit is intentional — they can be suppressed individually with inline ignores, allowlists, or a baseline.
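The reported percentages follow directly from the confusion-matrix counts in the table. The short check below uses the standard definitions of precision, recall, and F1; the function name is just for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the table: 50 true positives, 12 false positives, 0 false negatives.
precision, recall, f1 = classification_metrics(tp=50, fp=12, fn=0)
print(f"precision={precision:.1%} recall={recall:.1%} f1={f1:.1%}")
```

Precision is 50 / 62, recall is 50 / 50, and F1 is their harmonic mean, which matches the 80.6%, 100.0%, and 89.3% figures above.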

Built for developers and coding agents

keygate provides JSON output alongside human-readable CLI output:

keygate scan --format json
keygate scan --json
keygate scan --profile agent

--format json outputs only JSON to stdout
--json is an alias for the above
--profile agent is a fixed mode for AI agents that always returns JSON

The JSON schema is stable: schema_version, status, summary, findings[]. Each finding includes rule_id, policy, score, verdict, file, line, message, and a masked snippet when available.

This is not JSON bolted on as an afterthought. It is designed from the start so that an agent can re-run the scan, parse the output mechanically, and propose fixes — closing the loop after a commit is blocked.
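As a sketch of how an agent or script might consume that output, the example below parses a report and keeps only the blocking findings. The field names come from the schema described above; every value in the sample payload, the shape of summary, and the lowercase verdict strings are assumptions made for illustration.

```python
import json

# Hypothetical payload: field names follow the article, values are made up.
SAMPLE_REPORT = """
{
  "schema_version": 1,
  "status": "block",
  "summary": {"findings": 2},
  "findings": [
    {"rule_id": "aws-access-key", "policy": "block", "score": 100,
     "verdict": "block", "file": "config.py", "line": 12,
     "message": "AWS Access Key detected", "snippet": "AKIA****************"},
    {"rule_id": "url-credentials", "policy": "warn", "score": 55,
     "verdict": "warn", "file": "docs/setup.md", "line": 3,
     "message": "Masked URL credentials", "snippet": "postgres://user:***@host"}
  ]
}
"""


def blocking_findings(raw: str) -> list:
    """Return the findings whose verdict is 'block' (case-insensitive)."""
    report = json.loads(raw)
    return [
        finding for finding in report.get("findings", [])
        if str(finding.get("verdict", "")).lower() == "block"
    ]


def format_finding(finding: dict) -> str:
    """Render one finding as a short single-line message."""
    return (
        f"{finding['file']}:{finding['line']} "
        f"[{finding['rule_id']}] score={finding['score']} {finding['message']}"
    )
```

In practice raw would be the stdout of keygate scan --format json rather than a hard-coded string.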

keygate also has a Claude Code plugin, so Claude can scan staged changes for secrets automatically before commits.

Handling false positives without breaking flow

A secret scanner is only useful if developers can live with it every day.

keygate includes three escape hatches for expected findings:

1. Inline ignore (per line)

api_key = "dummy-key-for-testing"  # keygate: ignore reason="test data"

reason is required — so the intent is always documented in the code.
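The "reason is required" rule can be pictured as a comment parser that only accepts the marker when a non-empty reason is present. This is a sketch under that assumption; the regular expression and the ignore_reason helper are hypothetical and may differ from what keygate actually accepts.

```python
import re

# Matches: # keygate: ignore reason="some text"
IGNORE_RE = re.compile(r'#\s*keygate:\s*ignore\s+reason="(?P<reason>[^"]+)"')


def ignore_reason(line: str):
    """Return the documented reason if the line carries a valid ignore marker, else None."""
    match = IGNORE_RE.search(line)
    return match.group("reason") if match else None
```

A marker without a reason, or with an empty one, returns None here, so the finding would still be reported.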

2. Allowlist (project-wide)

In keygate.toml:

[allowlist]
paths = ["vendor/*", "third_party/*"]
patterns = ["dummy", "example"]

Note: adding tests/* to the allowlist wholesale is not recommended — it would suppress real secrets that accidentally end up in test files.
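One way to picture how such an allowlist might be applied is shown below. The sketch assumes paths are shell-style globs and patterns are case-insensitive substrings of the matched value; the article does not specify either, so treat both as guesses.

```python
from fnmatch import fnmatchcase

# Values mirror the keygate.toml example above.
ALLOWLIST_PATHS = ["vendor/*", "third_party/*"]
ALLOWLIST_PATTERNS = ["dummy", "example"]


def is_allowlisted(file_path: str, value: str) -> bool:
    """Hypothetical check: suppress a finding by file path glob or value substring."""
    if any(fnmatchcase(file_path, glob) for glob in ALLOWLIST_PATHS):
        return True
    lowered = value.lower()
    return any(pattern in lowered for pattern in ALLOWLIST_PATTERNS)
```

Under these assumptions a secret in tests/test_app.py is still reported unless its value contains one of the listed patterns, which is the behavior the note above argues for.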

3. Baseline (freeze existing findings)

keygate baseline create

This saves the current findings to .keygate.baseline.json as SHA-256 fingerprints. From that point, the same finding at the same location is suppressed. The raw secret value is never stored, so the baseline file is safe to commit.

{
  "version": 1,
  "entries": [
    {
      "fingerprint": "e5282a7860678bc768d280eb3e77d2ca8a44286357c743dd024d74fe0605fe09",
      "file_path": "src/app/config.py",
      "line_number": 42,
      "rule_id": "url-credentials",
      "created_at": "2026-04-22T09:30:00+00:00"
    }
  ]
}

To add new findings to an existing baseline: keygate baseline update.

If the baseline is committed to the repository, a new team member who runs pipx install keygate && keygate activate will automatically pick up the same baseline.
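The idea of storing a fingerprint instead of the secret can be sketched with a plain SHA-256 hash. Which fields keygate actually feeds into the hash is not stated in the article, so the inputs chosen here (rule id, file path, matched value) and the separator are assumptions.

```python
import hashlib


def fingerprint(rule_id: str, file_path: str, matched_value: str) -> str:
    """Hypothetical fingerprint: SHA-256 over the rule, the path, and the matched value."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
    material = "\x00".join([rule_id, file_path, matched_value]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

The result is a 64-character hex digest like the one in the baseline entry above: the same inputs always give the same digest, and the digest does not contain the raw value.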

How it is different from Gitleaks or TruffleHog

keygate is not a replacement for full repository, history, CI, or cloud secret scanning.

It is intentionally narrower: a lightweight local guardrail for the moment right before a commit is created.

Tool	Best for
keygate	Fast local pre-commit checks on staged changes
Gitleaks	Full repository, history, CI, and configurable rule scanning
TruffleHog	Deep secret discovery and verification workflows

Use keygate when you want a small commit-time check that developers will actually keep enabled.

What keygate intentionally does not do

These were explicit non-goals during design:

Full repository scanning (not the job of a pre-commit hook)
LLM-based judgment (offline, fast, and deterministic behavior takes priority)
External API validation (no checking whether a token is actually valid)
IDE plugins, SaaS integrations, or automatic secret rotation

The primary constraint is that every scan completes locally within 200–500 ms on every single commit, which rules out LLM calls and external API lookups. For server-side protection, keygate is meant to complement, not replace, pre-receive hooks and CI-level scanning.

Disclaimer

keygate is a last-line-of-defense net for human error, not a substitute for proper secret management.

It does not guarantee complete detection (unknown formats and obfuscated values may pass through)
False positives are not zero (managed via allowlist / baseline / inline ignore)
git commit --no-verify bypasses it trivially (for organizational enforcement, combine with server-side controls)
The correct practice is to keep secrets out of the repository entirely, using environment variables, secret managers, or KMS

Quick start

pipx install keygate
cd your-project
keygate activate

From that point on, every normal git commit gets a fast local secret check automatically.

You can also scan manually:

git add .
keygate scan

keygate scans git diff --cached — staged changes only.
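Scanning only staged added lines comes down to reading a unified diff and keeping the lines that start with "+". The parser below is a minimal sketch of that idea, not keygate's own code; added_lines is a hypothetical helper that takes diff text such as the output of git diff --cached.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text: str) -> list:
    """Return (path, new_line_number, content) for each added line in a unified diff."""
    results = []
    path = None
    new_lineno = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; header lines follow until the first hunk.
            in_hunk = False
            path = None
            continue
        hunk = HUNK_RE.match(line)
        if hunk:
            new_lineno = int(hunk.group(1))
            in_hunk = True
        elif not in_hunk:
            if line.startswith("+++ "):
                target = line[4:]
                if target == "/dev/null":
                    path = None
                else:
                    path = target[2:] if target.startswith("b/") else target
        elif line.startswith("+"):
            results.append((path, new_lineno, line[1:]))
            new_lineno += 1
        elif line.startswith(" ") or line == "":
            new_lineno += 1  # context line: present in the new file
        # Removed lines ("-") and "\ No newline" markers do not advance the counter.
    return results
```

Each returned tuple carries the file path and the line number in the new version of the file, which is the information a finding such as config.py:12 needs.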
