- 150,000 secrets — 25 separate vaults, inconsistent standards, some in plain text in version control
- 25 → 6 vault systems consolidated by a team of 10 engineers
- 20,000 automated secret rotations per month — up from rare manual rotation
- 90% fewer secrets found exposed in CI/CD pipelines
- 5,000 microservices, 5,000 databases, 400+ third-party integrations all secured
- Active pursuit of secretless authentication — replacing long-lived credentials with ephemeral tokens
150,000 secrets. 25 separate vaults. Hundreds of teams managing their own credentials in their own ways, some in plain text in version control. At Uber's scale — 5,000 microservices, 5,000 databases, 500,000 analytical jobs per day — secrets sprawl is not a compliance problem. It is an incident waiting to happen. A team of ten engineers decided to fix it.
The Story
Secrets sprawl is the entropy of infrastructure security. Left to their own devices, teams build their own vaults, store credentials however is convenient, and share secrets in whatever way is fastest. At a startup with ten engineers, this is manageable. At Uber, it becomes a systemic security risk. By the time the Secrets team began its consolidation project, the company had 150,000 secrets scattered across 25 separate vault systems, operated by different teams with different security standards, different rotation practices, and inconsistent access controls. Some secrets were in plain text in codebases. Others lived in databases that had never been audited for credential exposure.
The Secrets team's strategy had two phases. Phase 1 was consolidation: take ownership of all vault infrastructure, standardise on a small number of canonical vault systems (one per cloud provider plus one on-premises HashiCorp Vault), and migrate all secrets from the 25 fragmented vaults into these six. HashiCorp Vault is a secrets management tool that provides a secure, centralised store for tokens, passwords, certificates, and encryption keys. Phase 2 was the platform: building a Secret Management Platform on top of the consolidated vaults, with a metadata model, lifecycle automation, a unified API, and real-time scanning, that turned six vaults into a governed, auditable, self-service system.
Problem
Secrets Sprawl: 150,000 Credentials, No Visibility
Uber's infrastructure had grown faster than its secrets governance. 25 vault systems operated by different teams meant no single team had visibility into the company's complete credential inventory. Shadow IT vaults with no central oversight created audit gaps. Secrets were shared insecurely, rotated rarely, and sometimes stored in version control. With cyberattacks targeting credential exposure rising industry-wide, the status quo was untenable.
Cause
Scale + Decentralisation = Governance Collapse
At Uber's scale, decentralised secrets management doesn't produce diversity and resilience — it produces inconsistency and risk. There was no way to answer basic security questions: who has access to which credentials? When were they last rotated? Are any credentials in source code? The scale that made the problem urgent also made it hard to fix without a dedicated team and platform.
Solution
Consolidation + Secret Management Platform
Phase 1 consolidated 25 vaults into 6 centrally managed vaults. Phase 2 built the Secret Management Platform: a metadata model, a unified API abstracting across all vault types, a Cadence-orchestrated Secret Lifecycle Manager, real-time scanning across git, Slack, and CI pipelines, and self-service developer tooling. The metadata model is a structured representation of each secret's properties (owner, purpose, rotation schedule, associated services, expiry date, security classification) that enables automated governance and incident response.
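To make the metadata model concrete, here is a minimal sketch of what such a record might look like and how it could answer two governance questions: which secrets are due for rotation, and which services are affected if a secret leaks. The class, field names, and sample data are assumptions for illustration; Uber's actual schema is not described in the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class SecretMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    secret_id: str
    owner: str
    purpose: str
    classification: str
    rotation_interval: timedelta
    last_rotated: datetime
    consumers: list[str] = field(default_factory=list)

    def rotation_due(self, now: datetime) -> bool:
        # A secret is due once its rotation interval has fully elapsed.
        return now >= self.last_rotated + self.rotation_interval


def secrets_due(catalog: list[SecretMetadata], now: datetime) -> list[str]:
    """Inventory query: ids of all secrets whose rotation is due."""
    return [m.secret_id for m in catalog if m.rotation_due(now)]


def affected_services(catalog: list[SecretMetadata], secret_id: str) -> list[str]:
    """Incident-response query: which services consume this secret?"""
    for m in catalog:
        if m.secret_id == secret_id:
            return list(m.consumers)
    return []


# Made-up sample data for the sketch.
utc = timezone.utc
catalog = [
    SecretMetadata("db/orders", "orders-team", "Postgres password", "high",
                   timedelta(days=30), datetime(2023, 12, 1, tzinfo=utc),
                   ["orders-api", "billing"]),
    SecretMetadata("api/maps", "maps-team", "third-party API key", "medium",
                   timedelta(days=90), datetime(2024, 1, 15, tzinfo=utc),
                   ["routing"]),
]
now = datetime(2024, 1, 31, tzinfo=utc)
due = secrets_due(catalog, now)
```

Without the `owner`, `rotation_interval`, and `consumers` fields, neither query could be answered automatically, which is why the metadata has to exist before the automation does.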
Result
20,000 Automated Rotations Per Month, 90% Fewer Exposed Secrets
A team of 10 engineers now drives 20,000 automated monthly secret rotations — up from manual rotation that happened rarely. Secrets exposed in CI pipelines dropped by 90%. The platform generates a complete inventory of all 150,000 secrets on demand, enabling rapid response to security incidents. Uber is actively pursuing secretless authentication — replacing long-lived credentials with ephemeral, automatically-issued tokens wherever possible.
The Fix
The Secret Lifecycle Manager
The Secret Lifecycle Manager (SLM) is the operational core of the Secret Management Platform. Built on Cadence (Uber's open-source distributed workflow engine, designed for long-running, fault-tolerant business processes — the same engine that powers Uber's ride dispatch and payment workflows), SLM orchestrates the complete lifecycle of every secret: creation, distribution to consuming services, periodic rotation, and eventual decommissioning. Cadence's durable workflow model means secret rotation operations are fault-tolerant — if a rotation workflow fails midway through, it resumes from where it left off rather than leaving credentials in a half-rotated, inconsistent state.
- 25 → 6 — vault systems consolidated from 25 team-operated vaults with inconsistent standards to 6 centrally managed vaults
- 20,000 — automated secret rotations per month, up from rare manual rotation requiring coordination between teams
- 90% — reduction in secrets found exposed in CI/CD pipelines, achieved through real-time scanning with automatic revocation on detection
- 10 — engineers on the Secrets team that built and now operates the entire platform serving 5,000 microservices
# Simplified Secret Lifecycle Manager rotation workflow (conceptual)
# Real implementation uses Cadence's durable workflow primitives
# Key property: each step is recorded; workflow resumes from last success on failure
from datetime import datetime, timezone

from cadence.workflow import workflow_method


class SecretRotationWorkflow:
    # The helper methods called below (generate_new_credential, write_to_vault,
    # and so on) are placeholders for workflow activities and are not defined
    # in this conceptual sketch.

    @workflow_method
    async def rotate_secret(self, secret_id: str):
        """
        Cadence ensures this completes even if individual steps fail.
        20,000 rotations/month run without manual oversight because
        the workflow is resumable, not re-runnable from scratch.
        """
        # Step 1: Generate new credential from upstream provider
        new_credential = await self.generate_new_credential(secret_id)

        # Step 2: Write new credential to canonical vault
        # Old version remains readable during transition
        await self.write_to_vault(
            secret_id=secret_id,
            value=new_credential,
            version='new',
        )

        # Step 3: Signal consuming services to reload credential
        consuming_services = await self.get_consumers(secret_id)
        for service in consuming_services:
            await self.signal_reload(service, secret_id)

        # Step 4: Verify all services are using new credential
        await self.verify_rotation_complete(secret_id, consuming_services)

        # Step 5: Expire old credential in upstream provider
        await self.revoke_old_credential(secret_id)

        # Step 6: Update metadata (last_rotated, next_rotation_due)
        await self.update_metadata(secret_id, rotated_at=datetime.now(timezone.utc))
        # Cadence schedules next rotation based on policy
Real-time scanning: catching secrets before they ship
One of the most impactful platform features was real-time scanning across Uber's code repositories, CI pipelines, and internal Slack messages. The scanner looks for patterns matching API keys, database passwords, and private key formats. When it finds a match, the platform automatically revokes the exposed credential and alerts the owning team. Before the platform, a credential committed to git might live there for months. Now, exposure is measured in seconds. The 90% reduction in secrets found in pipelines reflects this detection-and-revocation automation. The root cause of secrets appearing in version control was not developer carelessness; it was the absence of a convenient alternative. When the secure path (self-service API, Kubernetes injection) is easier than the shortcut (put it in the config file), the shortcut loses its appeal.
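As a rough illustration of the detect-then-revoke loop, the sketch below scans a piece of text with a few regular expressions and calls a revocation hook and an alert hook for each hit. The patterns, function names, and hooks are assumptions made for this example, not Uber's scanner; production scanners typically use far larger rule sets plus entropy checks to reduce false positives. The sample key is the placeholder access key ID that AWS uses in its own documentation.

```python
import re
from typing import Callable

# Illustrative patterns only; a real rule set would be much broader.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "password_assignment": re.compile(
        r"\bpassword\s*[:=]\s*['\"][^'\"]{8,}['\"]", re.IGNORECASE
    ),
}


def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) for every suspected secret."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


def scan_and_revoke(
    text: str,
    revoke: Callable[[str], None],
    alert: Callable[[str], None],
) -> int:
    """Revoke each exposed value and alert on its kind; return the hit count."""
    findings = scan_text(text)
    for kind, value in findings:
        revoke(value)  # hypothetical hook: would call the issuing provider
        alert(kind)    # hypothetical hook: would notify the owning team
    return len(findings)


sample_commit = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\n'
revoked: list[str] = []
alerts: list[str] = []
hits = scan_and_revoke(sample_commit, revoked.append, alerts.append)
```

The important design point is that detection and revocation sit in the same code path, so no human has to act before the credential stops working.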
Secretless authentication: the next frontier
The logical endpoint of Uber's secrets management journey is secretless authentication, where services don't hold long-lived credentials at all. Instead, they use their identity (a Kubernetes service account, a SPIFFE identity issued by SPIRE, a cloud provider IAM role) to request short-lived tokens at runtime. When a token expires within an hour, a stolen copy is useful only briefly, and there is no long-lived secret to store or rotate. Uber is actively building toward this model using SPIFFE, an open standard for cryptographic workload identities, and SPIRE, its reference implementation, which issues identity documents that are automatically provisioned, cryptographically verifiable, and short-lived. As more of Uber's infrastructure adopts SPIFFE-based authentication, the number of long-lived secrets that need to be managed shrinks toward zero.
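The core idea, a token that is bound to a workload identity and stops working on its own, can be sketched with the standard library alone. This is a toy, not SPIFFE: real SPIFFE identity documents are X.509 certificates or JWTs signed with asymmetric keys, whereas this example assumes a single shared HMAC key, a made-up token format, and hypothetical function names.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SIGNING_KEY = b"demo-only-signing-key"  # assumption: held only by the issuer


def _sign(payload: bytes) -> bytes:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()


def issue_token(workload_id: str, now: float, ttl_seconds: int = 3600) -> str:
    """Issue a token for a workload identity that expires after ttl_seconds."""
    payload = json.dumps(
        {"sub": workload_id, "exp": now + ttl_seconds}, sort_keys=True
    ).encode()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(_sign(payload)).decode()
    )


def verify_token(token: str, now: float) -> Optional[str]:
    """Return the workload id if the token is authentic and unexpired."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    claims = json.loads(payload)
    if now >= claims["exp"]:
        return None  # expired: a stolen copy is now useless
    return claims["sub"]


issued_at = 1_000.0
token = issue_token("spiffe://example.org/orders-api", now=issued_at)
other = issue_token("spiffe://example.org/other", now=issued_at)
# Forgery attempt: another workload's payload with the first token's signature.
forged = other.split(".")[0] + "." + token.split(".")[1]
```

Passing `now` explicitly keeps the example deterministic; a real issuer would read the clock itself and would derive the workload identity from attestation rather than accept it as a string.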
Migration coordination: the organisational challenge
Migrating a secret from an old vault path to the new centralised platform sounds like a database copy. In practice it is a distributed coordination problem across hundreds of teams. Every service reading the secret needs to be updated simultaneously (or with a dual-read transition period). Uber built tooling to discover all consumers of a vault path, generate migration checklists, and track completion status. Services that hadn't migrated within the target window were flagged for the owning team. The tooling made a migration problem tractable across an organisation of thousands of engineers.
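The dual-read transition mentioned above can be sketched as a client that prefers the new platform, falls back to the old vault path, and records which consumers were still served by the fallback. Those records are the raw material for a migration checklist. The class, store layout, and names here are hypothetical; the source does not describe Uber's tooling at this level of detail.

```python
class DualReadClient:
    """Hypothetical client used during a migration window."""

    def __init__(self, new_store: dict[str, str], old_store: dict[str, str]):
        self.new_store = new_store
        self.old_store = old_store
        # path -> consumers that were still served from the old vault
        self.fallback_consumers: dict[str, set[str]] = {}

    def get(self, path: str, consumer: str) -> str:
        # Prefer the new platform once the secret has been migrated.
        if path in self.new_store:
            return self.new_store[path]
        # Fall back to the old vault; raises KeyError if it is in neither.
        value = self.old_store[path]
        self.fallback_consumers.setdefault(path, set()).add(consumer)
        return value


def migration_report(client: DualReadClient) -> dict[str, list[str]]:
    """Paths still served from the old vault, with the consumers to chase."""
    return {
        path: sorted(names)
        for path, names in client.fallback_consumers.items()
    }


# Made-up stores: teamA's secret is migrated, teamB's is not.
old_vault = {"teamA/db-password": "s3cr3t-old", "teamB/api-key": "key-123"}
new_platform = {"teamA/db-password": "s3cr3t-old"}

client = DualReadClient(new_platform, old_vault)
client.get("teamA/db-password", consumer="orders-api")
client.get("teamB/api-key", consumer="routing")
client.get("teamB/api-key", consumer="billing")
report = migration_report(client)
```

Because reads never fail during the window, teams can migrate on their own schedule while the report shows exactly which paths and consumers remain.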
Architecture
The Secret Management Platform sits as an orchestration layer above Uber's six canonical vault systems. Applications and services no longer talk directly to specific vaults — they interact with the platform's unified API or use Kubernetes-native injection (secrets automatically mounted into pods at deployment time as environment variables or mounted files, with no awareness of which vault they came from). The platform maintains the metadata catalog, handles lifecycle automation via the Secret Lifecycle Manager, runs real-time scanning, and provides developer self-service tools.
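One way to picture the orchestration layer is a thin facade that knows where each secret lives and forwards reads to the right backend. The sketch below uses in-memory stand-ins for the vaults; the `VaultBackend` protocol, `SecretPlatform` class, and backend labels are assumptions for illustration, not the platform's real interface.

```python
from typing import Protocol


class VaultBackend(Protocol):
    def read(self, name: str) -> str: ...
    def write(self, name: str, value: str) -> None: ...
    def delete(self, name: str) -> None: ...


class InMemoryVault:
    """Stand-in for a real vault client; a real one would call a vendor SDK."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def read(self, name: str) -> str:
        return self.data[name]

    def write(self, name: str, value: str) -> None:
        self.data[name] = value

    def delete(self, name: str) -> None:
        del self.data[name]


class SecretPlatform:
    """Facade: callers name a secret, never the vault that stores it."""

    def __init__(self, backends: dict[str, VaultBackend]) -> None:
        self._backends = backends
        self._placement: dict[str, str] = {}  # secret name -> backend label

    def put(self, name: str, value: str, backend: str) -> None:
        self._backends[backend].write(name, value)
        self._placement[name] = backend

    def get(self, name: str) -> str:
        return self._backends[self._placement[name]].read(name)

    def move(self, name: str, target: str) -> None:
        """Relocate a secret; callers of get() need no changes."""
        source = self._placement[name]
        if source == target:
            return
        value = self._backends[source].read(name)
        self._backends[target].write(name, value)
        self._placement[name] = target
        self._backends[source].delete(name)


onprem, cloud = InMemoryVault(), InMemoryVault()
platform = SecretPlatform({"onprem": onprem, "cloud-a": cloud})
platform.put("orders/db-password", "s3cr3t", backend="onprem")
before = platform.get("orders/db-password")
platform.move("orders/db-password", target="cloud-a")
after = platform.get("orders/db-password")
```

The consuming code calls `get` with the same name before and after the move, which is the sense in which vault topology becomes invisible to applications.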
Before: 25 Fragmented Vaults, No Governance
[Interactive diagram not reproduced here; available in the original article on TechLogStack.]
After: Unified Secret Management Platform
[Interactive diagram not reproduced here; available in the original article on TechLogStack.]
Lessons
Consolidate ownership before building automation. Uber's two-phase approach — consolidate 25 vaults into 6, then build the platform — was the right sequence. Building a governance platform on top of 25 independent vaults would have required integrating 25 different systems. Consolidation first is harder organisationally but dramatically simpler technically.
A metadata model (a structured record of each secret's properties — owner, purpose, associated services, rotation policy, security classification) is the prerequisite for all other automation. Without metadata, you cannot automate rotation (you don't know the rotation policy), you cannot generate inventory (you don't know what secrets are for), and you cannot respond to incidents (you don't know which services are affected). Build the metadata model before building any automation on top of it.
Real-time scanning with automatic revocation changes the economics of credential exposure. When exposure is detected in seconds and the credential is automatically revoked, a developer accidentally committing a credential to git causes a 30-second incident rather than a multi-month exposure. The scanning + revocation loop is the highest-leverage security improvement for teams still relying on manual credential hygiene.
Use durable workflow systems for secret rotation, not scripts or cron jobs. Rotation is a multi-step process with real failure modes at each step. A workflow system that records each completed step and resumes automatically after a failure makes rotation reliable enough to run at scale without manual oversight, provided the individual steps are safe to retry. A cron job that fails halfway through a rotation leaves credentials in an unknown state.
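The difference between resuming and re-running can be shown in a few lines of plain Python. This is not Cadence: a list stands in for the durable history that a workflow engine would persist, and the step names and the simulated failure are invented for the example.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_resumable(steps: list[Step], completed: list[str]) -> None:
    """Run steps in order, skipping any already recorded in `completed`."""
    for name, action in steps:
        if name in completed:
            continue            # finished in an earlier attempt
        action()                # may raise; a failed step is not recorded
        completed.append(name)  # stand-in for a durable history write


calls: list[str] = []
attempts = {"signal_reload": 0}


def generate() -> None:
    calls.append("generate")


def write_vault() -> None:
    calls.append("write_vault")


def signal_reload() -> None:
    attempts["signal_reload"] += 1
    if attempts["signal_reload"] == 1:
        raise RuntimeError("consumer unreachable")  # simulated outage
    calls.append("signal_reload")


def revoke_old() -> None:
    calls.append("revoke_old")


STEPS: list[Step] = [
    ("generate", generate),
    ("write_vault", write_vault),
    ("signal_reload", signal_reload),
    ("revoke_old", revoke_old),
]

history: list[str] = []
try:
    run_resumable(STEPS, history)  # first attempt fails at signal_reload
except RuntimeError:
    pass
history_after_failure = list(history)
run_resumable(STEPS, history)      # second attempt resumes at the failed step
```

On the second attempt the credential is not generated again, so the old credential is only revoked after every earlier step has succeeded exactly as recorded. A step can still run twice if the process dies between the action and the history write, which is why each step should be idempotent.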
Self-service developer tooling is what makes centralised governance scale. A centralised Secrets team without self-service tooling becomes a bottleneck — every credential operation requires a ticket. A centralised Secrets team with self-service tooling becomes a platform team — they build and maintain the guardrails, and developers operate within them autonomously. The goal is governance at scale, not control at the cost of velocity.
Engineering Glossary
Cadence — Uber's open-source distributed workflow engine, designed for long-running, fault-tolerant business processes. Provides durable execution semantics: each step in a workflow is recorded, and if the workflow fails partway through, it resumes from the last successful step rather than restarting from scratch. Used by the Secret Lifecycle Manager to make 20,000 monthly rotations reliable without manual oversight.
HashiCorp Vault — an open-source secrets management tool providing a secure, centralised store for tokens, passwords, certificates, and encryption keys. One of the six canonical vault systems in Uber's consolidated infrastructure, used for on-premises workloads.
Metadata model — a structured representation of a secret's properties: owner, purpose, rotation schedule, associated services, expiry date, security classification. The architectural cornerstone of Uber's Secret Management Platform — enables automated governance, inventory queries, and incident response that would otherwise require manual coordination across hundreds of teams.
Secret Lifecycle Manager (SLM) — Uber's Cadence-based automation system that orchestrates the complete lifecycle of secrets: creation, distribution to consuming services, periodic rotation, and decommissioning. Drives 20,000 automated rotations per month.
Secretless authentication — a security model where services don't hold long-lived credentials at all, instead using their cryptographic workload identity (for example, a SPIFFE identity issued by SPIRE) to dynamically request short-lived tokens at runtime. Tokens expire in hours or minutes, so a leaked token is useful only briefly and there is no long-lived secret to store or rotate.
Shadow IT vault — a secret storage system created by an individual team outside the knowledge of the central Secrets team. No security baseline review, no rotation policy, no access audit, no inventory. The most dangerous aspect of Uber's pre-platform state — you cannot rotate credentials you don't know about; you cannot audit access to vaults you don't know exist.
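SPIFFE/SPIRE — SPIFFE is an open standard for cryptographic workload identities; SPIRE is its reference implementation, which attests workloads and issues their identity documents. Every service is assigned a unique SPIFFE ID, and the identity documents (SVIDs) that prove it are automatically issued, cryptographically verifiable, and short-lived. The foundation of Uber's secretless authentication initiative.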
Unified abstraction layer — a single API and SDK that presents a consistent interface for secret CRUD operations regardless of which underlying vault (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, HashiCorp Vault) stores the secret. Decouples application code from vault topology so that vault migrations are transparent to consuming services.
This case is a plain-English retelling of publicly available engineering material.
Read the full case on TechLogStack →