Gitleaks: Open-Source Secret Scanning for Git Repos in 2026

#webdev #devops #cloud #astro

Hardcoded secrets in Git are a category of mistake that never gets less embarrassing. An AWS access key pushed to a public repo gets scraped by bots within minutes and burned spinning up crypto miners on your account — this is documented behavior, not theoretical risk. The fix is automated scanning, and Gitleaks is the open-source tool most teams reach for when they don't want to pay a commercial scanner's per-developer rate.

We pulled Gitleaks into several sample repos to see how it actually behaves: where it shines, where it produces noise, and how the CLI flow compares to commercial alternatives like GitGuardian. This is the writeup.

What Gitleaks Actually Catches

Gitleaks ships with a default ruleset of well over 100 regex patterns covering the usual suspects: AWS access keys, GitHub personal access tokens, Slack webhooks, Stripe live keys, Google API keys, private SSH keys, and JWT-shaped strings. The patterns are written in TOML and live in the repo at config/gitleaks.toml. You can read the full ruleset in about 20 minutes if you want to know what's actually being matched.

The detection logic has two parts: a regex match plus an optional per-rule entropy check. A match whose Shannon entropy falls below the rule's threshold is discarded as too uniform to be a real key. Matches that clear both checks are reported as findings, though some will still turn out to be test fixtures or placeholders. Entropy alone won't filter documentation examples like AKIAIOSFODNN7EXAMPLE from the AWS docs, whose character mix scores around 3.7 bits per character; that is what allowlists are for. You can also write your own rules: the TOML format lets you specify a regex, a description, an entropy threshold, keywords, and optional allowlists per rule.
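Here is a minimal Python sketch of the regex, entropy, and allowlist checks applied to a single rule. It is not Gitleaks' Go implementation. The rule ID, the 3.0 bits-per-character threshold, and the EXAMPLE allowlist pattern are our own illustrative choices, loosely modeled on an AWS-style rule.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character, computed from the string's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


# Hypothetical rule mirroring the TOML fields: regex, entropy threshold, allowlist.
AWS_LIKE_RULE = {
    "id": "aws-key-sketch",  # illustrative ID, not the shipped rule
    "regex": re.compile(r"\b((?:AKIA|ASIA)[A-Z0-9]{16})\b"),
    "entropy": 3.0,
    "allowlist": [re.compile(r"EXAMPLE$")],
}


def scan_line(line: str, rule: dict) -> list[str]:
    """Return captured secrets on this line that survive all three checks."""
    findings = []
    for m in rule["regex"].finditer(line):
        secret = m.group(1)
        if shannon_entropy(secret) < rule["entropy"]:
            continue  # too uniform to look like a real key
        if any(a.search(secret) for a in rule["allowlist"]):
            continue  # known placeholder, suppressed by the allowlist
        findings.append(secret)
    return findings


if __name__ == "__main__":
    doc_key = "AKIAIOSFODNN7EXAMPLE"
    print(round(shannon_entropy(doc_key), 2))  # 3.68, above the threshold
    print(scan_line(f'key = "{doc_key}"', AWS_LIKE_RULE))  # [] via the allowlist
    print(scan_line('key = "' + "AKIA" + "A" * 16 + '"', AWS_LIKE_RULE))  # [] via entropy
    print(scan_line('key = "AKIAQ7ZX3M2P9RLT4KWB"', AWS_LIKE_RULE))  # made-up key, reported
```

The AWS documentation key scores about 3.68 bits per character, comfortably above a 3.0 threshold, so only the allowlist suppresses it. A run of repeated characters, by contrast, is dropped by the entropy check alone.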

What it does not catch: secrets that don't match any known pattern. A custom API key your internal service issues, say a 32-character hex string with no distinguishing prefix, can slide past unless you add a rule for it. The default ruleset does include a generic rule that keys on variable names like api_key or token, so an obvious assignment may still get flagged, but you shouldn't count on it. That limitation applies to every regex-based scanner, including the commercial ones, though some paid tools layer ML-based detection on top for unknown patterns.
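Adding that rule is mostly a regex-writing exercise, and it's worth testing the pattern before it goes into your config. This sketch assumes a hypothetical internal key assigned to a variable named ACME_SVC_KEY. The variable name, rule ID, and keywords are placeholders. Python's re and the RE2 syntax Gitleaks' Go regexes use agree on a simple pattern like this one. RE2 has no lookarounds or backreferences, though, so don't carry those over.

```python
import re

# Hypothetical internal key: 32 hex chars, no prefix. A bare [0-9a-f]{32} would also
# match MD5 checksums and similar IDs, so this sketch anchors on an assumed variable name.
ACME_KEY_REGEX = r"""(?i)acme_svc_key\s*[:=]\s*["']?([0-9a-f]{32})["']?"""


def find_acme_keys(text: str) -> list[str]:
    """Return captured key values (Python's re standing in for RE2 on this simple pattern)."""
    return [m.group(1) for m in re.finditer(ACME_KEY_REGEX, text)]


def to_toml_rule(rule_id: str, description: str, regex: str, keywords: list[str]) -> str:
    """Render a [[rules]] entry; a ''' literal string keeps backslashes unescaped."""
    kw = ", ".join(f'"{k}"' for k in keywords)
    return (
        "[[rules]]\n"
        f'id = "{rule_id}"\n'
        f'description = "{description}"\n'
        f"regex = '''{regex}'''\n"
        f"keywords = [{kw}]\n"
    )


if __name__ == "__main__":
    samples = [
        'ACME_SVC_KEY = "0123456789abcdef0123456789abcdef"',  # should match
        'checksum = "0123456789abcdef0123456789abcdef"',  # same hex, no match
    ]
    for s in samples:
        print(find_acme_keys(s))
    print(to_toml_rule("acme-svc-key", "Hypothetical internal service key",
                       ACME_KEY_REGEX, ["acme_svc_key"]))
```

The rendered block would go into a .gitleaks.toml. If you want the default rules to keep running alongside it, an [extend] section with useDefault = true is the usual way to do that.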

Three Ways to Run Gitleaks

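The CLI has two main scanning commands, and they cover different scopes. Releases from v8.19 onward reorganize them under gitleaks git (history) and gitleaks git --pre-commit --staged (staged changes), keeping detect and protect as deprecated commands that still run; the older names are used below.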

gitleaks detect scans your entire git history. Run this on a freshly inherited repo when you want to know whether anyone ever committed an AWS key. It walks every commit on every branch and reports each finding with the commit SHA, author, file path, line number, rule ID, and the matched secret itself. Add --redact if you don't want secrets echoed into terminal or CI logs. On a medium repo with tens of thousands of commits it finishes in a couple of minutes.
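For anything beyond a handful of findings, it's easier to write a JSON report (--report-format json --report-path gitleaks-report.json) and post-process it. Here is a minimal sketch, assuming the field names we've seen in v8 reports (RuleID, Commit, File, StartLine). Check them against your version's output before relying on this.

```python
import json
from collections import defaultdict


def load_report(path: str) -> list[dict]:
    """Read a report written with --report-format json --report-path <path>."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(findings: list[dict]) -> dict[str, list[str]]:
    """Group findings by rule ID as 'shortsha path:line' entries.

    Assumed field names: RuleID, Commit, File, StartLine.
    The Secret and Match fields are never read, so nothing sensitive is echoed.
    """
    by_rule: dict[str, list[str]] = defaultdict(list)
    for f in findings:
        by_rule[f["RuleID"]].append(f"{f['Commit'][:8]} {f['File']}:{f['StartLine']}")
    return dict(by_rule)


def print_summary(findings: list[dict]) -> None:
    """Print a per-rule count followed by each location."""
    for rule_id, hits in sorted(summarize(findings).items()):
        print(f"{rule_id}: {len(hits)} finding(s)")
        for hit in hits:
            print(f"  {hit}")
```

Call print_summary(load_report("gitleaks-report.json")) after the scan. Because the helper skips the Secret and Match fields, the summary is safe to paste into a ticket even when the report itself isn't.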

gitleaks protect scans only uncommitted changes. This is the pre-commit hook variant: run it with --staged and it checks what git diff --cached is about to commit. It runs fast enough — well under a second on a typical diff — that wiring it into .husky/pre-commit or the pre-commit framework is painless.
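If you'd rather not pull in husky or the pre-commit framework, a plain git hook works too. This hypothetical Python version runs the staged scan and hands the exit code back to git. Gitleaks exits with 1 by default when it finds a leak, and that non-zero status is what aborts the commit. DEFAULT_CMD is our assumption about how you'd invoke it.

```python
import shutil
import subprocess
import sys

# Assumed command; newer releases also accept ["gitleaks", "git", "--pre-commit", "--staged"].
DEFAULT_CMD = ["gitleaks", "protect", "--staged", "--redact"]


def run_hook(cmd: list[str]) -> int:
    """Run the scanner and return its exit code (non-zero aborts the commit)."""
    if shutil.which(cmd[0]) is None:
        # Fail closed: a missing scanner blocks the commit instead of silently passing.
        print(f"pre-commit: {cmd[0]} not found on PATH; install it to commit", file=sys.stderr)
        return 1
    # The scanner prints its own report; we only forward its exit status to git.
    return subprocess.run(cmd, check=False).returncode


def main() -> int:
    return run_hook(DEFAULT_CMD)
```

To install it, save it as .git/hooks/pre-commit with a #!/usr/bin/env python3 line on top and sys.exit(main()) at the bottom, then mark it executable. It fails closed when the binary is missing, on the theory that a silently skipped scan is worse than a blocked commit. Developers can still bypass any hook with git commit --no-verify, which is one more reason to run the CI scan as well.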

The third deployment is CI. The official gitleaks/gitleaks-action wraps the binary and reports findings on pull requests. The action is licensed separately from the CLI: repos owned by a GitHub organization need a GITLEAKS_LICENSE key from gitleaks.io, while personal-account repos don't, so check the current terms before adopting it. If you don't want the licensing dependency, you can run the static Go binary directly in any CI runner; that path stays MIT-licensed regardless of who owns the repo.
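When you run the binary yourself in CI, its non-zero exit on findings is already enough to fail the job. If you also write a SARIF report (--report-format sarif --report-path gitleaks.sarif) for code scanning, a small script like this hypothetical one can print a compact summary in the job log. It reads only standard SARIF 2.1.0 fields (runs, results, ruleId, locations), nothing Gitleaks-specific.

```python
import json


def sarif_findings(sarif: dict) -> list[tuple[str, str, int]]:
    """Flatten SARIF 2.1.0 results into (ruleId, file URI, startLine) tuples."""
    out = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Take the first location; fall back to placeholders if none is recorded.
            loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            uri = loc.get("artifactLocation", {}).get("uri", "?")
            line = loc.get("region", {}).get("startLine", 0)
            out.append((result.get("ruleId", "?"), uri, line))
    return out


def gate(path: str) -> int:
    """Print one line per finding and return a CI-friendly exit code (1 if any)."""
    with open(path, encoding="utf-8") as fh:
        findings = sarif_findings(json.load(fh))
    for rule_id, uri, line in findings:
        print(f"{uri}:{line}: {rule_id}")
    print(f"gitleaks: {len(findings)} finding(s)")
    return 1 if findings else 0
```

A CI step would call sys.exit(gate("gitleaks.sarif")) after the scan. Run gitleaks with --exit-code 0 if you want this script, rather than the scanner, to decide whether the job fails.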

Running gitleaks detect on a repo that has already had secrets committed will return findings even after you've rotated the keys, because the history still contains them. Rotation comes first, since it is what makes the leaked key useless. To actually purge a leaked secret from git history you need git filter-repo or BFG Repo-Cleaner, followed by a force-push, with all the coordination that implies. On a public repo you'll usually want to do both, keeping in mind that forks and existing clones may still hold the old commits.
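If you rotate a key but keep the history, you can tell Gitleaks to stop reporting those specific findings. It reads a .gitleaksignore file of finding fingerprints from the repo root, and each finding in the JSON report carries a Fingerprint field. This sketch merges reviewed findings into that file's contents. The (rule ID, file) review set is a hypothetical stand-in for whatever triage record you keep.

```python
def merge_ignore(existing: str, findings: list[dict], reviewed: set[tuple[str, str]]) -> str:
    """Append fingerprints of reviewed (RuleID, File) findings to .gitleaksignore text.

    Existing non-blank lines are kept in order; new fingerprints are appended once, sorted.
    """
    lines = [ln for ln in existing.splitlines() if ln.strip()]
    seen = set(lines)
    new = sorted(
        f["Fingerprint"]
        for f in findings
        if (f["RuleID"], f["File"]) in reviewed and f["Fingerprint"] not in seen
    )
    # dict.fromkeys drops duplicates while keeping the sorted order.
    merged = lines + list(dict.fromkeys(new))
    return "\n".join(merged) + ("\n" if merged else "")
```

Suppressing a finding only quiets the scanner. The secret is still in the history, so this step belongs after rotation, never instead of it.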

Gitleaks vs GitGuardian: When to Pay

The honest comparison: Gitleaks covers roughly 80% of what GitGuardian does for the secret-detection use case, and the remaining 20% is mostly enterprise plumbing.

What Gitleaks gives you for free: solid default rules, full history scanning, pre-commit integration, CI integration, SARIF output for GitHub code scanning, and full control over detection rules. The community keeps the default ruleset reasonably current — new patterns get added when major providers introduce new token formats.

What GitGuardian adds on top: a centralized dashboard across all your repos, automatic key revocation workflows with select cloud providers, ML-based generic secret detection that catches unknown patterns, an incident triage UI, and SOC 2 / compliance reporting bundled with audit-friendly logs. Paid tiers are priced per seat, so for a small team the bill runs from the low tens to low hundreds of dollars per month depending on tier, and it grows roughly linearly with headcount. GitGuardian has also offered a free tier for small teams, so check the current plans before budgeting.

The decision usually breaks on team size and incident frequency. If you have fewer than 20 developers and your secret leaks are rare, Gitleaks plus a documented rotation runbook is enough. If you're past 50 developers across many repos and incidents happen monthly, the dashboard and triage features start paying for themselves in coordination time saved.

One trap worth naming: don't run Gitleaks once, find nothing, and call it done. Run it on every PR via CI, and have a pre-commit hook so developers catch their own mistakes before the secret ever lands on a branch. A scanner that runs only after the fact is doing about a third of the job.