It was a Tuesday morning in June 2021. LinkedIn — a platform used daily by hundreds of millions of professionals — went partially down. Not because of a DDoS attack, a bad deploy, or a database failure. Their SSL certificate had expired.
The issue was resolved within hours, but the damage was done: broken links, frustrated users, and a very public reminder that one of the most preventable failures in infrastructure still catches well-resourced engineering teams off guard. LinkedIn was not alone. Microsoft Teams suffered a similar SSL expiry incident in 2020. Spotify has had certificate-related hiccups. Even government sites regularly show up in breach reports because of expired certs.
If it can happen to them, it can happen to you.
## What SSL Certificates Actually Are (and Why They Expire)
An SSL/TLS certificate is a cryptographically signed document that proves your server is who it says it is. It binds your domain name to a public key, and a trusted Certificate Authority (CA) vouches for that binding.
There are three main validation levels:
- DV (Domain Validation) — Cheapest and fastest. CA only verifies you control the domain. Used by most personal sites and small services. 90-day Let's Encrypt certs fall here.
- OV (Organization Validation) — CA verifies the organization's legal existence. Common for company sites.
- EV (Extended Validation) — Strictest vetting. Used by banks and payment platforms.
Historically, SSL certificates were issued for one- to two-year terms. Let's Encrypt changed expectations by popularizing free 90-day certificates, arguing that shorter lifespans reduce the damage window if a certificate is compromised. Then, in 2020, Apple, Google, and Mozilla went further, enforcing a hard cap of 398 days for certificates trusted in their browsers.
The result: certificates expire faster than ever, and the margin for error is shrinking.
## Why Manual Tracking Fails
When a team has two or three certificates, the spreadsheet approach works fine. Someone adds a row, sets a calendar reminder, done.
Then the company grows. Suddenly you have:
- A wildcard cert for `*.yourdomain.com`
- A separate cert for `api.yourdomain.com` managed by a different team
- A staging cert someone set up and forgot about
- A cert for a third-party integration endpoint you technically own
- Let's Encrypt auto-renew that "should be working" but nobody has verified in six months
The spreadsheet becomes stale. Calendar reminders get snoozed. The person who set up the cert leaves the company. Auto-renewal fails silently because the DNS challenge no longer resolves correctly after a migration.
This is not a people problem. It is a systems problem. Manual tracking does not scale.
## The Alert Timeline That Actually Works
After dealing with enough SSL-related incidents, the SRE community has largely converged on a tiered alerting strategy:
| Days Until Expiry | Alert Type | Who Gets Notified |
|---|---|---|
| 60 days | Awareness ping | Primary engineer / infra team |
| 30 days | Action required | Team lead + primary engineer |
| 14 days | Escalation | Manager + entire team |
| 7 days | All-hands | Engineering leadership |
| 1 day | Emergency | PagerDuty / on-call rotation |
The 60-day notification is intentionally low-urgency. It gives the responsible party time to renew without pressure. By the time you hit 7 days, something has already gone wrong in your process — the earlier alerts were missed or ignored. The 1-day alert should be treated like a production incident.
The key insight: alert early enough that the first notification is never urgent. If your team is routinely panicking at 7 days or fewer, your alert window is too short.
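As a sketch, the tier table above can be encoded as a simple threshold lookup. The tier names and function below are illustrative, not part of any standard tooling:

```python
# Map days-until-expiry to an alert tier, mirroring the table above.
# Thresholds and tier names are illustrative choices, not a standard.
ALERT_TIERS = [
    (1, "emergency"),         # page the on-call rotation
    (7, "all-hands"),         # engineering leadership
    (14, "escalation"),       # manager + entire team
    (30, "action-required"),  # team lead + primary engineer
    (60, "awareness"),        # low-urgency ping
]

def alert_tier(days_left):
    """Return the most urgent tier whose threshold covers days_left."""
    for threshold, tier in ALERT_TIERS:
        if days_left <= threshold:
            return tier
    return None  # more than 60 days out: no alert needed yet

print(alert_tier(45))  # "awareness" -- calm, non-urgent
print(alert_tier(0))   # "emergency"
```

Because the thresholds are ordered most-urgent first, a cert at 5 days out triggers the "all-hands" tier rather than every tier below it.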
## Checking SSL Expiry: Code Examples
### Using openssl CLI

```bash
# Check expiry date for a domain
echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null \
  | openssl x509 -noout -dates

# Output:
# notBefore=Jan  1 00:00:00 2025 GMT
# notAfter=Mar 31 23:59:59 2025 GMT
```
To get the number of days remaining:
```bash
# Extract the expiry date from the served certificate
EXPIRY=$(echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null \
  | openssl x509 -noout -enddate | cut -d= -f2)

# Try GNU date first; fall back to BSD/macOS date syntax
EXPIRY_EPOCH=$(date -d "$EXPIRY" +%s 2>/dev/null || date -jf "%b %d %T %Y %Z" "$EXPIRY" +%s)
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( (EXPIRY_EPOCH - NOW_EPOCH) / 86400 ))
echo "Days until expiry: $DAYS_LEFT"
```
### Using Node.js

```javascript
const tls = require('tls');

function checkSSLExpiry(hostname, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host: hostname, port, servername: hostname }, () => {
      const cert = socket.getPeerCertificate();
      socket.end();
      const expiryDate = new Date(cert.valid_to);
      const now = new Date();
      const daysRemaining = Math.floor((expiryDate - now) / (1000 * 60 * 60 * 24));
      resolve({ hostname, expiryDate, daysRemaining });
    });
    socket.on('error', reject);
  });
}

checkSSLExpiry('yourdomain.com').then(info => {
  console.log(`${info.hostname}: ${info.daysRemaining} days remaining`);
  if (info.daysRemaining <= 7) {
    console.error('CRITICAL: Certificate expires in less than 7 days!');
  } else if (info.daysRemaining <= 30) {
    console.warn('WARNING: Certificate expires soon.');
  }
});
```
### Using Python

```python
import ssl
import socket
from datetime import datetime, timezone

def check_ssl_expiry(hostname: str, port: int = 443) -> dict:
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            cert = ssock.getpeercert()
    expiry_str = cert['notAfter']
    expiry_date = datetime.strptime(expiry_str, '%b %d %H:%M:%S %Y %Z').replace(tzinfo=timezone.utc)
    days_remaining = (expiry_date - datetime.now(tz=timezone.utc)).days
    return {'hostname': hostname, 'days_remaining': days_remaining}

result = check_ssl_expiry('yourdomain.com')
print(f"{result['hostname']}: {result['days_remaining']} days remaining")
```
## Automating Checks with a Cron Job
A simple cron-based approach for teams managing a small number of domains:
```bash
#!/bin/bash
# /usr/local/bin/check-ssl-certs.sh

DOMAINS=("yourdomain.com" "api.yourdomain.com" "dashboard.yourdomain.com")
ALERT_EMAIL="infra-team@yourcompany.com"
WARN_DAYS=30

for DOMAIN in "${DOMAINS[@]}"; do
  EXPIRY=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "${DOMAIN}" 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)

  if [ -z "$EXPIRY" ]; then
    echo "ERROR: Could not retrieve cert for ${DOMAIN}" \
      | mail -s "SSL Check Failed: ${DOMAIN}" "$ALERT_EMAIL"
    continue
  fi

  DAYS_LEFT=$(( ($(date -d "$EXPIRY" +%s) - $(date +%s)) / 86400 ))

  if [ "$DAYS_LEFT" -le "$WARN_DAYS" ]; then
    echo "SSL cert for ${DOMAIN} expires in ${DAYS_LEFT} days (${EXPIRY})" \
      | mail -s "SSL Warning: ${DOMAIN} expires in ${DAYS_LEFT} days" "$ALERT_EMAIL"
  fi
done
```
Add to crontab to run daily at 8 AM:
```bash
0 8 * * * /usr/local/bin/check-ssl-certs.sh
```
This gets you to a functional baseline. The limitation: it only works when your cron runner is healthy, and it has no concept of alert escalation or historical tracking.
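One way to sketch the missing historical tracking is to log every check to a small database instead of only emailing on failure. The schema and function names below are illustrative, not from any particular tool:

```python
import sqlite3
from datetime import datetime, timezone

# Record every check so you get history and can catch silent failures.
# Schema and names here are illustrative.
conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("""
    CREATE TABLE ssl_checks (
        domain TEXT NOT NULL,
        checked_at TEXT NOT NULL,
        days_left INTEGER  -- NULL means the check itself failed
    )
""")

def record_check(domain, days_left):
    conn.execute(
        "INSERT INTO ssl_checks VALUES (?, ?, ?)",
        (domain, datetime.now(timezone.utc).isoformat(), days_left),
    )

def needs_attention(warn_days=30):
    # SQLite returns the row matching MAX(checked_at) for the bare
    # columns, so this yields each domain's most recent check.
    rows = conn.execute(
        "SELECT domain, days_left, MAX(checked_at) "
        "FROM ssl_checks GROUP BY domain"
    ).fetchall()
    return [d for d, days, _ in rows if days is None or days <= warn_days]

record_check("yourdomain.com", 72)
record_check("api.yourdomain.com", None)  # retrieval failed
print(needs_attention())  # ['api.yourdomain.com']
```

A failed retrieval is flagged just like an imminent expiry, which matters: a cert you cannot check is a cert you cannot trust.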
## External Monitoring as a Safety Net
Self-hosted cron jobs are a good first layer. They are not sufficient on their own. The machine running your cron job could be the same machine whose cert expires. Or the job runs but silently fails because your SMTP relay is down.
External monitoring services check your SSL certificates from outside your infrastructure, on a schedule, and alert you through independent channels (email, Slack, PagerDuty, SMS). This separation is the point — if your infrastructure has a problem, you still get notified.
AlertSleep is one example: it monitors SSL certificates continuously, tracks expiry dates across all your domains, and fires alerts at configurable thresholds — without requiring you to manage any infrastructure for the monitoring itself. For teams that want visibility without operational overhead, this kind of external check is a meaningful complement to internal automation.
## Managing SSL at Scale: 50+ Certificates
When you cross the threshold of managing 50 or more certificates, new problems emerge.
Build a certificate inventory. Know which cert covers which domain, when it was issued, when it expires, who owns renewal, and whether it auto-renews. A simple internal wiki page is better than nothing. A proper certificate management tool is better still.
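Even a minimal machine-readable inventory beats a wiki page, because you can query it. A sketch, with fields and values that are purely illustrative:

```python
from datetime import date

# Each entry answers the questions you will ask during an incident:
# what does this cert cover, who owns it, does it auto-renew?
# All values here are illustrative.
inventory = [
    {"domain": "*.yourdomain.com", "expires": date(2025, 3, 31),
     "owner": "infra-team", "auto_renew": False},
    {"domain": "api.yourdomain.com", "expires": date(2025, 6, 15),
     "owner": "api-team", "auto_renew": True},
]

def manual_renewals_due(inv, today, within_days=60):
    # Auto-renewed certs get monitored rather than ticketed; only
    # manually renewed certs need a human on the hook.
    return [e["domain"] for e in inv
            if not e["auto_renew"] and (e["expires"] - today).days <= within_days]

print(manual_renewals_due(inventory, date(2025, 2, 15)))  # ['*.yourdomain.com']
```

The `auto_renew` flag is the field teams most often omit, and the one that determines whether an expiry alert needs a human at all.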
Wildcard certificates need special attention. A *.yourdomain.com wildcard might cover dozens of subdomains. If it expires, all of them break simultaneously. The blast radius of a wildcard expiry is much larger than a single-domain cert.
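One way to make that blast radius visible is to group hostnames by the serial number of the certificate each one actually serves; hosts sharing a serial share a cert and will break together. The serial values below are made up for illustration, and in practice you would pull the serial from a live check:

```python
from collections import defaultdict

def blast_radius(host_serials):
    """Group hostnames by cert serial; groups with more than one
    host represent shared blast radius (e.g. a wildcard cert)."""
    groups = defaultdict(list)
    for host, serial in host_serials:
        groups[serial].append(host)
    return {s: hosts for s, hosts in groups.items() if len(hosts) > 1}

# Fabricated serials: three subdomains behind one wildcard cert.
observed = [
    ("www.yourdomain.com", "03:9F:A1"),
    ("api.yourdomain.com", "03:9F:A1"),
    ("dashboard.yourdomain.com", "03:9F:A1"),
    ("mail.yourdomain.com", "0B:22:C4"),
]
print(blast_radius(observed))
```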
Treat auto-renewal as a process, not a guarantee. Let's Encrypt auto-renewal via certbot or ACME clients is reliable under normal conditions. It fails when DNS records change, when ports 80/443 are firewalled during the renewal window, or when the renewal configuration drifts after infrastructure changes. Verify that auto-renewal is actually succeeding, not just scheduled.
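A sketch of one way to verify renewal is actually happening: if 90-day certs are renewed with 30 days remaining, the certificate a host serves should never be much older than about 60 days, so checking its issue date catches a broken renewal pipeline long before the expiry alert fires. The threshold and function name here are my own:

```python
from datetime import datetime, timedelta, timezone

def renewal_looks_healthy(not_before, max_age_days=65):
    """For 90-day certs renewed at ~30 days remaining, the served
    cert should be at most ~60 days old; allow a little slack.
    Threshold is an illustrative assumption, not a standard."""
    age = datetime.now(timezone.utc) - not_before
    return age <= timedelta(days=max_age_days)

# Fabricated issue dates for illustration:
fresh = datetime.now(timezone.utc) - timedelta(days=10)
stale = datetime.now(timezone.utc) - timedelta(days=80)
print(renewal_looks_healthy(fresh))  # True
print(renewal_looks_healthy(stale))  # False: renewal likely failing
```

The `notBefore` date comes from the same certificate fields parsed in the earlier examples, so this slots into an existing check with one extra comparison.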
Use centralized alerting. Sending expiry alerts directly to individual engineers does not work at scale. Route all SSL alerts to a shared channel (Slack #infra-alerts) and a ticketing system. Coverage should not depend on any single person being available.
## Closing Thoughts
SSL certificate expiration is a solved problem. The tools exist, the alert timelines are well-established, and the failure modes are well-documented. What makes it persistent as an incident cause is the gap between knowing what to do and actually having it in place.
The LinkedIn outage in 2021 was not a failure of knowledge. It was a failure of process. Somewhere in the chain, a certificate slipped through without the right person getting the right alert at the right time.
The fix is not complicated: external monitoring as your safety net, tiered alerts with enough lead time to act calmly, and an inventory that does not live in one person's head.
The goal is to make certificate expiry the most boring part of your infrastructure. An alert fires at 60 days, someone renews, done. No incident, no postmortem, no Tuesday morning scramble.
Setting up SSL monitoring for the first time? AlertSleep's SSL monitoring handles the external check layer and alert routing out of the box — worth a look before you build your own.