Sarwar

Originally published at expirypulse.dev

The Credential Nobody Owned

Full disclosure: We built ExpiryPulse, so we obviously have a perspective on this problem. But this article isn't a pitch. It's a collection of real incidents, real data, and real takeaways that apply whether you use our tool, someone else's, or a well-maintained spreadsheet. We just think the problem is worth talking about honestly.


It always starts the same way

Someone gets paged at 2 AM. A site is down, an API is throwing 500s, or a payment flow just stopped working. The team scrambles. Thirty minutes of checking logs, restarting services, and ruling out deployment issues. Then someone finally asks: when does the cert expire?

The answer: yesterday.

It's a story that repeats across the industry — not because teams are careless, but because credentials are uniquely easy to lose track of. They're set up once, they work silently for months, and the only reminder they exist is the moment they stop working.

This isn't a small-team problem

What makes certificate and credential expiry so interesting as a failure mode is that it doesn't discriminate by company size, budget, or engineering talent. Some of the most well-resourced technology organizations in the world have been caught by it.

Google Bazel: December 2025. On Boxing Day, an expired SSL certificate took down bcr.bazel.build and releases.bazel.build, breaking builds for Bazel users globally. The outage lasted roughly 13 hours. The root cause? A DNS record for an unused subdomain was removed a month earlier, which silently prevented the certificate's auto-renewal from completing. No alerts were fired. Nobody noticed until users started filing GitHub issues.

IPinfo: September 2025. An expired TLS certificate caused a complete API outage for 2 hours and 36 minutes — at a service handling over 100 billion requests per month. Their cert-manager had been logging renewal errors, but no monitoring or alerting was configured for those specific logs. The team later described it as one of the most severe outages in their 12-year history.

Epic Games: 2021. An expired wildcard certificate brought down Fortnite, Rocket League, the Epic Games Store, and a host of backend services. The cert was deployed across hundreds of services. It took 12 minutes to identify the cause but nearly 5.5 hours to fully recover, because the initial outage triggered a cascade of secondary failures across their infrastructure.

Microsoft Teams: 2020. A Monday morning. Millions of users worldwide couldn't access Teams. The cause: Microsoft forgot to renew an authentication certificate. Three hours of downtime for one of the world's most widely used collaboration platforms.

The list goes on: Spotify, LinkedIn (twice), Ericsson (affecting millions of mobile users across Europe and Japan), and even the U.S. government, which at one point had 80 expired certificates rendering federal websites inaccessible or insecure.

These aren't obscure startups cutting corners. These are organizations with dedicated security teams, infrastructure budgets in the hundreds of millions, and established processes. And they still got bitten.

Why does this keep happening?

The pattern across nearly every incident is remarkably consistent. It's usually not a technology failure — it's a visibility failure.

Ownership is unclear. The person who set up the certificate moved to another team, left the company, or simply forgot. Nobody else knows the cert exists until it expires.

Alerts depend on systems that can fail. Google's Bazel team relied on automated renewal, but the automation failed silently when a precondition changed. IPinfo's cert-manager was logging errors that nobody was watching. In a separate incident, Fortmatic's team never received AWS's expiry notification because their email group was misconfigured.

Credentials don't announce themselves. Unlike a failing server or a crashed process, a valid certificate generates zero signal. It works invisibly until it doesn't. There's no degradation curve, no warning in your APM dashboard, no slow build of errors. One second it's fine, the next it's expired.

Tracking methods drift. A spreadsheet works great when someone maintains it. A calendar reminder works great until someone dismisses it. Automation works great until the underlying assumption changes. Every tracking method requires ongoing attention, and that attention competes with a hundred other priorities.

I've been on both sides of this

I'm not writing about this problem from a distance. I work in federal IT, and I've been caught by the exact failure modes described above — twice.

The first was a Microsoft Graph API key powering an internal application. The app had worked fine for months. Then one day it just stopped — no gradual degradation, no warning, just a hard stop. When we dug in, the key had expired. The dev team assumed the Entra ID team was tracking the renewal. The Entra team assumed the dev team owned it. Neither side was wrong for thinking that — the ownership had simply never been made explicit. The key expired in a gap between two teams who each thought the other was watching. It's the most common version of this story: not negligence, just an honest assumption that went unchecked.

The second was a customer-facing website I managed that used Let's Encrypt for SSL. Auto-renewal was configured. It had worked before. So I didn't think about it — which is exactly what automation is supposed to let you do. Then one day my phone rang. The customer was calling to tell me their site was showing a browser security warning. The cert had expired. The auto-renewal had broken silently at some point, and because I had no monitoring on top of the automation, I didn't catch it. The customer finding out before I did — that's the part that stung.

These weren't catastrophic, company-ending events. They were the quiet, everyday version of the same problem — the kind that happens in IT departments everywhere and never makes a headline, but still erodes trust, wastes hours, and keeps you up at night wondering what else you might be missing.

Both incidents came down to the same root cause: there wasn't a single place where I could see what was expiring and when. The information existed somewhere, scattered across admin consoles and config files, but it wasn't visible in a way that could warn me before things broke.

These experiences, and similar ones from colleagues, are why ExpiryPulse exists. Not because I think people can't manage credentials, but because I kept running into the same gap between "it's set up" and "someone's actually watching it."

What actually helps

We've thought about this problem a lot (it's why we built ExpiryPulse), but the principles apply regardless of the tool.

Centralize visibility. The single biggest improvement any team can make is knowing what they have. If your certificates, API keys, and credentials live across cloud consoles, password managers, Slack threads, and one person's mental model — you have a visibility problem. Whether you consolidate into a spreadsheet, a wiki page, or a purpose-built tool, having one place to look makes everything else easier.
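
As one way to picture "one place to look", here is a minimal sketch that reads a spreadsheet-style CSV export and lists everything by how soon it expires. The column names (name, kind, expires_on) and the sample rows are assumptions made up for illustration, not a format any particular tool requires.

```python
import csv
import io
from datetime import date

# Hypothetical inventory export; column names and rows are illustrative only.
SAMPLE_CSV = """name,kind,expires_on
api.example.com,tls_cert,2030-03-01
graph-api-key,api_key,2030-01-15
vendor-license,license,2030-06-30
"""


def load_inventory(csv_text: str, today: date) -> list[dict]:
    """Parse the CSV and return rows sorted by soonest expiry first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expires_on = date.fromisoformat(row["expires_on"])
        rows.append({
            "name": row["name"],
            "kind": row["kind"],
            "expires_on": expires_on,
            # Negative values mean the credential has already expired.
            "days_left": (expires_on - today).days,
        })
    return sorted(rows, key=lambda r: r["expires_on"])
```

Calling load_inventory(SAMPLE_CSV, date.today()) gives a single list you can print or feed into alerts. The point is less the code than the habit: every credential gets a row, whatever console it actually lives in.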

Assign ownership explicitly. Every credential needs an owner and a backup. "The team knows" is how things get lost. When someone leaves or switches roles, credential ownership should be part of the handoff — right alongside access and documentation.
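
A small sketch of what "explicit" can mean in practice: each record carries an owner and a backup, and a check flags anything where either is missing. The Credential fields and the ownership_gaps helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    name: str
    owner: Optional[str] = None         # primary person responsible for renewal
    backup_owner: Optional[str] = None  # who takes over if the owner is away


def ownership_gaps(creds: list[Credential]) -> dict[str, list[str]]:
    """Map each credential name to the ownership problems found on it."""
    gaps: dict[str, list[str]] = {}
    for cred in creds:
        problems = []
        if not cred.owner:
            problems.append("no owner")
        if not cred.backup_owner:
            problems.append("no backup owner")
        elif cred.backup_owner == cred.owner:
            problems.append("backup is the same person as owner")
        if problems:
            gaps[cred.name] = problems
    return gaps
```

Running a check like this whenever someone changes teams is one way to make ownership part of the handoff rather than an afterthought.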

Set up layered alerts. A single reminder isn't enough. Things get snoozed, dismissed, or lost in inbox noise. Multiple alerts at different intervals (30 days, 14 days, 7 days, 1 day), with escalation to a second person, dramatically reduce the chance of something slipping through.
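
The layering described above can be sketched in a few lines. This assumes a job that runs once a day. The 7-day escalation point and the alert_for name are assumptions for illustration, while the 30/14/7/1 intervals come from the paragraph above.

```python
from datetime import date

# Days before expiry at which a reminder goes out (from the text above).
ALERT_DAYS = (30, 14, 7, 1)
# Assumption: at or below this many days, the backup person is notified too.
ESCALATE_AT_DAYS = 7


def alert_for(expires_on: date, today: date, owner: str, backup: str):
    """Return (days_left, recipients) if an alert is due today, else None."""
    days_left = (expires_on - today).days
    if days_left <= 0:
        # Expiring today or already expired: always notify both people.
        return days_left, [owner, backup]
    if days_left not in ALERT_DAYS:
        return None
    recipients = [owner]
    if days_left <= ESCALATE_AT_DAYS:
        recipients.append(backup)
    return days_left, recipients
```

One design caveat: matching on exact day counts means a skipped daily run skips that reminder. A sturdier version would record which thresholds have already been sent and catch up on any it missed.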

Expect automation to fail. Auto-renewal is wonderful — until it isn't. Every team that got burned by a silent auto-renewal failure had assumed the automation was working. Monitor the monitors. If your cert-manager hasn't successfully renewed in a while, you should know about it.
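
One way to "monitor the monitors" is an independent probe that looks at the certificate an endpoint is actually serving instead of trusting the renewal tool's own logs. The sketch below uses only Python's standard library. The 21-day warning threshold and the check_host name are assumptions for illustration, and network errors such as timeouts are left to propagate to the caller.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def check_host(host: str, port: int = 443, warn_days: int = 21) -> str:
    """Connect to the live endpoint and report on the certificate it serves."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        # An expired (or otherwise invalid) certificate fails the handshake.
        return f"CRITICAL {host}: verification failed ({exc.verify_message})"
    left = days_until(not_after, datetime.now(timezone.utc))
    status = "WARNING" if left <= warn_days else "OK"
    return f"{status} {host}: {left} days left"
```

If auto-renewal normally replaces a certificate well before the warning threshold, a WARNING from a probe like this suggests the automation has stalled, which is exactly the signal that was missing in the silent-failure stories above. Running it from a machine other than the one doing the renewals keeps the two from failing together.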

Prepare for the 47-day world. The maximum lifespan of SSL/TLS certificates is set to drop to just 47 days by 2029, meaning roughly 8 renewals per year per cert instead of 1. Manual tracking that barely works now won't survive that cadence. We wrote a full breakdown of what's coming and how to prepare: 47-Day SSL Certificates Are Coming. Is Your Team Ready?
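
The "roughly 8 renewals" figure is simple arithmetic, sketched below. The 398-day value used for comparison is my understanding of today's maximum lifetime rather than something stated in this article, and the renew_at_fraction parameter is a hypothetical knob for teams that renew before the last day.

```python
import math


def renewals_per_year(max_lifetime_days: int, renew_at_fraction: float = 1.0) -> int:
    """Renewals needed to cover 365 days.

    `renew_at_fraction` is the share of the lifetime that elapses before
    each replacement (1.0 means renewing at the very last moment).
    """
    effective_days = max_lifetime_days * renew_at_fraction
    return math.ceil(365 / effective_days)


print(renewals_per_year(398))        # assumed current maximum lifetime
print(renewals_per_year(47))         # the 47-day limit
print(renewals_per_year(47, 2 / 3))  # renewing two-thirds of the way through
```

Since 365 / 47 is about 7.8, a 47-day certificate needs 8 renewals a year even when renewed at the last moment. Renewing with a safety margin pushes the count higher still.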

Where ExpiryPulse fits

We'd be disingenuous if we didn't mention our own tool, so here's the honest version.

ExpiryPulse is designed for small-to-mid IT teams — the ones managing 10 to a few hundred credentials who don't need (or can't justify) a $50K enterprise certificate lifecycle management platform, but who've outgrown the spreadsheet.

You add information about each credential, never the credential itself. It can be an SSL cert, an API key, a token, a license, or a professional certification: anything with an expiry date. ExpiryPulse handles the rest with a single dashboard for visibility, automated email alerts at multiple intervals, escalation when things get critical, and team features for shared accountability.

It's free to start with up to 5 credentials and it's straightforward to set up. If it fits your needs, great. If you'd rather use something else or keep your spreadsheet tight, that's fine too — the important thing is that you're tracking.

The takeaway

Expired credentials aren't a competence issue, nor are they an infrequent one. They're a systems issue. The smartest engineers at the largest companies in the world still get caught by them, because the failure mode is silence — and silence is the hardest thing to monitor.

The question isn't whether you're good enough to track your credentials manually. The question is whether you want to bet your uptime on never forgetting, never getting distracted, and never having an assumption break silently in the background.

If the last few years have taught us anything, it's that the answer is the same for everyone, from Google to Spotify to the sysadmin managing 50 certs: build a system with backups, and don't rely solely on memory or assumptions.


ExpiryPulse is a credential expiry tracking tool for individuals and IT teams. Free tier available at expirypulse.dev.

Related: 47-Day SSL Certificates Are Coming. Is Your Team Ready? — What the CA/Browser Forum's decision means for your team.
