The Day I Found a Security Hole in Our Vault Setup
The "Oh Shit" Moment
I was writing a Python script to inventory service accounts across our 50+ Vault namespaces when something caught my eye. Teams were creating auth mounts with weird names - stuff we never approved.
Turns out, our wildcard policies had a massive flaw.
What We Screwed Up
Our policy looked innocent enough:
path "auth/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
We thought: "Let teams manage auth in their namespace. What could go wrong?"
Everything. Everything could go wrong.
That wildcard meant teams could create any auth mount type, not just the standard AppRole we supported. So they did:
Custom AppRole mounts: auth/my-special-approle/
Random Kubernetes auth (we don't even use K8s auth)
LDAP configs that bypassed our central auth
Experimental mounts nobody remembered creating
Out of 50+ namespaces, 15% had rogue auth mounts we didn't know existed.
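To see why one wildcard opens the door this wide, here's a rough sketch of how a trailing * behaves in a policy path. It's a simplification, not Vault's actual matcher: it only handles a trailing * (treated as a prefix match) and ignores the + segment wildcard. The function name and the request paths are made up for illustration.

```python
def policy_path_matches(policy_path: str, request_path: str) -> bool:
    """Approximate rule: a trailing '*' is a prefix match, anything else is exact."""
    if policy_path.endswith("*"):
        return request_path.startswith(policy_path[:-1])
    return request_path == policy_path


# Hypothetical request paths, modeled on the kinds of mounts listed above.
requests = [
    "auth/approle/role/billing",
    "auth/my-special-approle/role/billing",
    "auth/kubernetes/config",
    "auth/ldap/config",
]

for policy in ("auth/*", "auth/approle/*"):
    allowed = [r for r in requests if policy_path_matches(policy, r)]
    print(f"{policy!r} allows {len(allowed)} of {len(requests)}: {allowed}")
```

The broad rule covers all four paths. The specific one covers only the standard AppRole mount, and it doesn't cover my-special-approle either, because the prefix has to include the trailing slash.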
Why This Actually Mattered
Monitoring blind spot: Our Splunk dashboards only looked for auth/approle/. These custom mounts were invisible.
Support hell: Teams configured Vault Agents wrong, got auth failures, opened tickets. We couldn't help because their setup didn't match our docs.
Future nightmare: Try migrating 50 namespaces when everyone's doing their own thing.
How I Found It
Simple inventory script:
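    import hvac
    for namespace in all_namespaces:
        # One client per namespace; hvac picks up VAULT_ADDR and VAULT_TOKEN from the environment
        vault_client = hvac.Client(namespace=namespace)
        response = vault_client.sys.list_auth_methods()
        auth_mounts = response.get("data", response)  # mount paths are the keys
        for mount in auth_mounts:
            if mount not in ['approle/', 'token/']:
                print(f"WTF is this: {namespace}/{mount}")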
The output was... concerning.
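A flat list of names only tells you so much. Each entry in the mount map also carries a type, so a slightly fuller sketch can separate "right type, wrong path" from "type we don't support at all". The sample data, the function name, and the reason labels here are hypothetical; only the allowed paths come from the check above.

```python
ALLOWED_PATHS = {"approle/", "token/"}  # same allow-list as the inventory check


def classify_mounts(mounts: dict) -> list[tuple[str, str, str]]:
    """Return (path, type, reason) for every mount outside the allowed set."""
    findings = []
    for path, info in sorted(mounts.items()):
        if path in ALLOWED_PATHS:
            continue
        mount_type = info.get("type", "unknown")
        reason = "non-standard path" if mount_type == "approle" else "unsupported type"
        findings.append((path, mount_type, reason))
    return findings


# Hypothetical sample, shaped like the path -> details map Vault returns for auth mounts.
sample = {
    "token/": {"type": "token"},
    "approle/": {"type": "approle"},
    "my-special-approle/": {"type": "approle"},
    "kubernetes/": {"type": "kubernetes"},
}

for path, mount_type, reason in classify_mounts(sample):
    print(f"{path:<22} {mount_type:<12} {reason}")
```

Splitting it this way helps with the follow-up conversations: a duplicate AppRole mount is usually a rename, while an unsupported type is a real migration.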
Checked Splunk to see if anyone was actually using these:
index=vault_audit earliest=-90d request.path="auth/*/login"
| stats count by request.namespace request.path
40% of the rogue mounts had zero logins in 90 days. Dead mounts from old experiments.
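One catch: a stats count only returns rows for mounts that had logins, so the dead ones never show up in the results. You have to diff the search output against the inventory. Here's a minimal sketch of that diff, assuming the Splunk results were exported as (namespace, path, count) rows and that namespace strings look the same in both sources. All names and sample values are hypothetical.

```python
import re

# Pulls the mount name out of a login path. The optional tail covers methods
# whose login path ends with a username, e.g. auth/ldap/login/alice.
LOGIN_PATH = re.compile(r"^auth/(?P<mount>.+?)/login(?:/.*)?$")


def find_dead_mounts(inventory, login_rows):
    """inventory: set of (namespace, mount) pairs; login_rows: (namespace, path, count) rows."""
    active = set()
    for namespace, path, count in login_rows:
        match = LOGIN_PATH.match(path)
        if match and count > 0:
            active.add((namespace, match.group("mount") + "/"))
    # Anything inventoried but never seen logging in is a candidate for removal.
    return sorted(inventory - active)


# Hypothetical data: three mounts in the inventory, two of them with logins.
inventory = {("team-a/", "approle/"), ("team-a/", "kubernetes/"), ("team-b/", "ldap/")}
login_rows = [
    ("team-a/", "auth/approle/login", 1204),
    ("team-b/", "auth/ldap/login/alice", 3),
]

print(find_dead_mounts(inventory, login_rows))
```

Treat the result as a list of candidates. A mount with no logins in the window might still back a quarterly job, so check with the owning team before deleting anything.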
The Fix
Step 1: Stop the bleeding - locked down policies immediately:
# Old (bad)
path "auth/*" { capabilities = ["create", "read", "update", "delete"] }
# New (specific)
path "auth/approle/*" { capabilities = ["create", "read", "update", "delete"] }
Step 2: Reached out to teams, made migration plans
Step 3: Still migrating production stuff months later (it takes time)
What I Learned
Wildcards are dangerous. Be explicit. Always.
Your monitoring only catches what you're looking for. Inventory everything, not just what you expect.
Standards aren't real until you enforce them. Documentation doesn't count if the system allows chaos.
Fixing production takes forever. We're still cleaning this up.
The Bigger Issue
This also exposed that our parent/child namespace model was overly complex. We eventually flattened everything - but that's Part 2.
If You Run Vault
Check your policies right now:
vault policy read your-policy | grep -F "*"
Every wildcard is a potential problem. Can you be more specific?
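If you have more than a handful of policies, grep gets old fast. Here's a rough sketch that pulls the wildcard paths out of policy text, such as the output of vault policy read. It uses a plain regex rather than a real HCL parser, so it assumes rules are written in the usual path "..." { form. The function name and the sample policy are made up. It also flags +, which Vault treats as a single-segment wildcard.

```python
import re

# Matches the quoted path in rules such as: path "auth/*" { ... }
PATH_RULE = re.compile(r'^\s*path\s+"([^"]+)"', re.MULTILINE)


def wildcard_paths(policy_hcl: str) -> list[str]:
    """Return every policy path containing a '*' glob or a '+' segment wildcard."""
    return [p for p in PATH_RULE.findall(policy_hcl) if "*" in p or "+" in p]


# Hypothetical policy text for illustration.
policy = '''
path "auth/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
path "auth/approle/role/billing" {
  capabilities = ["read"]
}
'''

print(wildcard_paths(policy))
```

It's worth pointing this at any rules under sys/auth/ as well, since that's the API path Vault uses to enable and disable auth mounts.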
Then actually inventory what exists in your Vault. I bet you'll find surprises.
Next up: Why we ditched nested namespaces and went flat. Plus the monitoring system I built to catch this stuff automatically.
Drop a comment if you've hit similar issues. I know I'm not the only one.