Alan West

Posted on Apr 4

Why SSH Key Management Is Broken and How Certificates Fix It

#linux #security #ssh #devops

Native OpenSSH support since 2010

If you manage more than a handful of servers, you already know the pain. Every new developer who joins the team needs their public key added to authorized_keys on every server they need access to. Someone leaves? Good luck remembering all the places their key was authorized. And let's not talk about the first time you SSH into a new server and have to decide whether to trust that host fingerprint you definitely didn't verify.

SSH certificates solve all of this, and they've been baked into OpenSSH since version 5.4 (2010). Yet almost nobody uses them. Let's fix that.

The Root Problem: Trust Doesn't Scale with authorized_keys

The traditional SSH model works fine for one person with three servers. But it falls apart fast:

Onboarding is manual. Every new team member's public key needs to land on every relevant server. That's N users × M servers worth of configuration.
Offboarding is terrifying. Did you remove that key from every server? Are you sure? What about that one bastion host nobody remembers setting up in 2021?
Host verification is a joke. Be honest — when was the last time you actually verified an SSH host fingerprint? You type yes and move on like the rest of us.
No expiration. SSH keys live forever by default. That contractor who helped out for two weeks three years ago? Their key might still be valid.

This isn't a hypothetical problem. It's a daily operational headache.

How SSH Certificates Work (The 30-Second Version)

Instead of distributing individual public keys everywhere, you create a Certificate Authority (CA) — really just another SSH key pair. Then:

The CA signs user and host public keys, producing certificates.
Servers trust the CA, not individual user keys.
Clients trust the CA, not individual host fingerprints.

That's it. One trust anchor instead of a web of individual keys. If you've used TLS certificates, this mental model maps directly.
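To make the model concrete, here is a minimal sketch that runs the whole flow in a scratch directory: it creates a throwaway CA, a throwaway user key, signs the key, and prints the resulting certificate. The function name demo_cert_flow, the file names, and the demo-ca/demo-user labels are placeholders for illustration, and nothing here touches your real ~/.ssh.

```shell
# Throwaway end-to-end demo: CA -> user key -> signed certificate.
# All names below are illustrative placeholders, not part of any real setup.
demo_cert_flow() {
  demo_dir=$(mktemp -d) || return 1

  # 1. The CA is just another key pair (-N "" = empty passphrase, demo only).
  ssh-keygen -q -t ed25519 -N "" -C "demo-ca" -f "$demo_dir/demo_ca" || return 1

  # 2. A user key pair, generated the usual way.
  ssh-keygen -q -t ed25519 -N "" -C "demo-user" -f "$demo_dir/id_demo" || return 1

  # 3. The CA signs the user's PUBLIC key; the result is id_demo-cert.pub.
  ssh-keygen -q -s "$demo_dir/demo_ca" -I "demo-user" -n "alice" -V +1h \
    "$demo_dir/id_demo.pub" || return 1

  # 4. Print the certificate so you can see what a server would evaluate.
  ssh-keygen -L -f "$demo_dir/id_demo-cert.pub"
  demo_status=$?

  rm -rf "$demo_dir"
  return "$demo_status"
}
```

Calling demo_cert_flow prints the certificate type, key ID, validity window, and principals, which are the fields the rest of this walkthrough sets one by one.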

Step-by-Step: Setting Up SSH Certificates

Step 1: Create the CA Key Pair

You need two CAs in practice — one for signing user certificates and one for signing host certificates. Keeping them separate limits blast radius if one is compromised.

# Generate the User CA key (keep this extremely safe)
ssh-keygen -t ed25519 -f user_ca -C "user-ca@yourorg"

# Generate the Host CA key
ssh-keygen -t ed25519 -f host_ca -C "host-ca@yourorg"

These private keys are the crown jewels. Treat them like you'd treat a root TLS CA key — offline storage, restricted access, the works.

Step 2: Sign Host Keys (No More "Trust This Fingerprint?")

Grab each server's existing host public key and sign it:

# Sign the server's host key with your Host CA
# -s: signing key, -I: certificate identity (for logging)
# -h: this is a HOST certificate (not user)
# -n: principals (hostnames this cert is valid for)
# -V: validity period
ssh-keygen -s host_ca \
  -I "webserver-prod-01" \
  -h \
  -n "webserver-prod-01.example.com,10.0.1.50" \
  -V +52w \
  /etc/ssh/ssh_host_ed25519_key.pub

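ssh-keygen writes the certificate next to the public key you passed in, so here it produces /etc/ssh/ssh_host_ed25519_key-cert.pub. If you signed a copy of the host key on a separate CA machine, copy the resulting -cert.pub file back to /etc/ssh/ on the server.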

Then tell sshd to present the certificate. Add to /etc/ssh/sshd_config:

# Present our signed host certificate to connecting clients
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

Restart sshd and that server now proves its identity cryptographically.

Step 3: Tell Clients to Trust the Host CA

On each client machine (or in a shared config), add the Host CA's public key to known_hosts:

# Add to ~/.ssh/known_hosts or /etc/ssh/ssh_known_hosts
# The @cert-authority directive tells SSH to trust any host
# certificate signed by this CA for matching hosts
@cert-authority *.example.com ssh-ed25519 AAAA...your-host-ca-public-key...

Now when you SSH to any .example.com host, the client verifies the host certificate against the CA. No more blindly typing yes. No more manually managing known_hosts files.
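If you would rather not paste the key by hand, the line can be built from the CA's public key file. This is a small sketch, assuming the usual "keytype base64 comment" layout of a .pub file; the helper name cert_authority_line is made up for this example.

```shell
# Build a known_hosts @cert-authority line from a CA public key file.
# Arguments: <host-pattern> <path-to-host-ca.pub>
# (cert_authority_line is a hypothetical helper, not an OpenSSH command.)
cert_authority_line() {
  pattern=$1
  ca_pub=$2
  keytype=
  keydata=
  # A .pub file is "<keytype> <base64> [comment]"; keep the first two fields.
  read -r keytype keydata rest < "$ca_pub"
  [ -n "$keytype" ] && [ -n "$keydata" ] || return 1
  printf '@cert-authority %s %s %s\n' "$pattern" "$keytype" "$keydata"
}
```

For example, cert_authority_line '*.example.com' host_ca.pub >> ~/.ssh/known_hosts appends the entry for the current user. Dropping the comment field is a choice made here for tidiness; known_hosts accepts a trailing comment as well.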

Step 4: Sign User Keys (The Big Win)

This is where it gets really good. Instead of copying user public keys to every server:

# Sign a developer's public key with the User CA
# -n: principals (usernames they can log in as)
# -V: valid for 8 hours — short-lived certs are the sweet spot
ssh-keygen -s user_ca \
  -I "alice@yourorg" \
  -n "alice,deploy" \
  -V +8h \
  ~/.ssh/id_ed25519.pub

This creates id_ed25519-cert.pub alongside their existing key. The ssh client picks it up automatically as long as it sits next to the private key under that name.
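Once you sign more than a couple of keys, it helps to wrap the command. The sketch below is one way to do that, not a full signing service: sign_user_key and its argument order are invented for this example. It backdates the start of the validity window by five minutes (-V -5m:+8h) so small clock differences between the CA machine and the servers do not reject a fresh certificate, and it records a serial number with -z, which makes individual certificates easier to identify later.

```shell
# Hypothetical helper: sign a user's public key with a short-lived cert.
# Arguments: <ca-private-key> <identity> <principals> <serial> <pubkey>
sign_user_key() {
  ca_key=$1 identity=$2 principals=$3 serial=$4 pubkey=$5
  # -V -5m:+8h  valid from five minutes ago until eight hours from now;
  #             the backdated start tolerates small clock differences.
  # -z          serial number recorded in the certificate.
  ssh-keygen -s "$ca_key" \
    -I "$identity" \
    -n "$principals" \
    -z "$serial" \
    -V -5m:+8h \
    "$pubkey"
}
```

Usage would look like sign_user_key user_ca "alice@yourorg" "alice,deploy" 1001 ~/.ssh/id_ed25519.pub, where 1001 is an arbitrary serial you track yourself.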

Step 5: Tell Servers to Trust the User CA

Copy user_ca.pub (the public key only, never the private key) to /etc/ssh/ on every server, then add to /etc/ssh/sshd_config:

# Trust any user certificate signed by our User CA
TrustedUserCAKeys /etc/ssh/user_ca.pub

Restart sshd. That's it. Alice can now SSH into any server without her public key existing in any authorized_keys file. The server validates her certificate against the CA.
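If you push this change to many servers from a script, you probably want it to be safe to run twice. Here is a minimal sketch of an idempotent edit; ensure_trusted_user_ca is a made-up helper name, and it only looks at the one file you hand it, so it will not notice a conflicting setting in an included drop-in file.

```shell
# Add "TrustedUserCAKeys <path>" to a config file only if that file has no
# TrustedUserCAKeys line yet. Arguments: <config-file> <ca-pub-path>
# (ensure_trusted_user_ca is a hypothetical helper for this example.)
ensure_trusted_user_ca() {
  conf=$1 ca_path=$2
  # sshd keywords are case-insensitive, hence grep -i.
  if grep -Eiq '^[[:space:]]*TrustedUserCAKeys[[:space:]]' "$conf" 2>/dev/null; then
    echo "TrustedUserCAKeys already set in $conf; leaving it alone" >&2
    return 0
  fi
  printf 'TrustedUserCAKeys %s\n' "$ca_path" >> "$conf"
}
```

After calling ensure_trusted_user_ca /etc/ssh/sshd_config /etc/ssh/user_ca.pub, running sshd -t as root checks the configuration for errors before you restart the daemon.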

Offboarding Is Now Trivial

When someone leaves the team, you just stop signing new certificates for them. If you're using short-lived certificates (and you should be — 8-24 hours is a good range), their access expires automatically.

For immediate revocation, OpenSSH supports a revocation list:

# Create the revoked keys file (a KRL) containing this key.
# Add -u to update an existing file instead of overwriting it.
ssh-keygen -k -f /etc/ssh/revoked_keys id_ed25519.pub

Then in sshd_config:

RevokedKeys /etc/ssh/revoked_keys

Is it as seamless as short-lived certs expiring on their own? No. But it's still one operation versus hunting through dozens of authorized_keys files.
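A revocation list is only useful if you can confirm what is on it. The sketch below wraps the two ssh-keygen modes involved: -k (with -u when the file already exists) to add a key, and -Q to test a key against the list. The helper names revoke_key and is_revoked are invented for this example.

```shell
# Hypothetical helper: add a public key (or certificate) to a KRL file.
# Arguments: <krl-file> <pubkey-or-cert>
revoke_key() {
  krl=$1 target=$2
  if [ -s "$krl" ]; then
    # -u updates the existing KRL instead of overwriting it.
    ssh-keygen -k -u -f "$krl" "$target"
  else
    ssh-keygen -k -f "$krl" "$target"
  fi
}

# Hypothetical helper: succeed if the key is listed in the KRL.
# ssh-keygen -Q exits non-zero when a key is revoked or an error occurred.
is_revoked() {
  krl=$1 target=$2
  if ssh-keygen -Q -f "$krl" "$target" >/dev/null 2>&1; then
    return 1   # not revoked
  fi
  return 0     # revoked (or the KRL could not be read)
}
```

A typical check after offboarding would be is_revoked /etc/ssh/revoked_keys id_ed25519.pub && echo "revoked". Treating an unreadable KRL as "revoked" is a deliberate fail-closed choice in this sketch.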

The Gotchas I Ran Into

After rolling this out across a few environments, here's what bit me:

Principals matter. If you sign a user cert with -n alice but the user tries to log in as deploy, it fails. Think carefully about which principals each person needs.
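Clock skew kills you. Certificates have validity windows. If your server's clock is off by a few minutes and the cert's "valid after" timestamp is in the server's future, authentication fails with nothing more than a generic "Permission denied" on the client; the actual reason only shows up in the server's auth log. Use NTP. Seriously.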
Don't lose the CA key. If you lose the CA private key, you need to redistribute trust from scratch. If someone else gets it, they can sign certificates for anyone. Protect it accordingly.
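AuthorizedPrincipalsFile is your friend. Without it, sshd accepts a certificate for an account whenever that account's username appears in the certificate's principals, and it does so on every server that trusts the CA. You almost certainly want to restrict this per-server.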

# In sshd_config, restrict which principals can log into this server
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u

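Then create /etc/ssh/auth_principals/deploy containing the principals allowed to log in as the deploy user, one per line. Once this directive is set, an account with no principals file cannot log in with a certificate at all.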
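Creating those files is easy to script. A minimal sketch follows; write_principals is a hypothetical helper, and release-engineers in the usage line is an invented role-style principal, not something the setup above defines.

```shell
# Write an AuthorizedPrincipalsFile for one local account.
# Arguments: <directory> <local-user> <principal>...
# (write_principals is a hypothetical helper for this example.)
write_principals() {
  dir=$1 user=$2
  shift 2
  # Refuse to write an empty file: it would lock the account out of cert auth.
  [ "$#" -gt 0 ] || return 1
  mkdir -p "$dir" || return 1
  # One principal per line; a certificate must carry at least one of them.
  printf '%s\n' "$@" > "$dir/$user"
  chmod 644 "$dir/$user"
}
```

Run as root, write_principals /etc/ssh/auth_principals deploy deploy release-engineers lets any certificate carrying either the deploy or the release-engineers principal log in as deploy on that server. Keep the files root-owned and not writable by anyone else.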

Inspecting Certificates

When things go wrong (and they will during setup), you can inspect any certificate:

ssh-keygen -L -f id_ed25519-cert.pub

This dumps the certificate details — principals, validity window, CA fingerprint, and any extensions. It's the first thing to check when authentication fails unexpectedly.
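Since a principal mismatch is the most common failure, it can be worth checking for it directly. The sketch below reads ssh-keygen -L output on stdin and pulls out the principals; it assumes the output layout current OpenSSH versions print (an indented "Principals:" line followed by one name per line, ending at "Critical Options:"). Both function names are made up for this example.

```shell
# Print the principals listed in `ssh-keygen -L` output read from stdin.
# (cert_principals and cert_has_principal are hypothetical helpers.)
cert_principals() {
  awk '
    /^[ \t]*Principals:/       { in_list = 1; next }
    /^[ \t]*Critical Options:/ { in_list = 0 }
    in_list { sub(/^[ \t]+/, ""); if ($0 != "") print }
  '
}

# Succeed if the certificate text on stdin lists the given principal exactly.
cert_has_principal() {
  cert_principals | grep -Fxq -- "$1"
}
```

For example: ssh-keygen -L -f id_ed25519-cert.pub | cert_has_principal deploy || echo "deploy is not a principal on this cert".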

Prevention: Making This Sustainable

SSH certificates are only better than authorized_keys if you actually automate the signing workflow. A few practical tips:

Automate cert signing. Don't make developers email you their public keys. Build or use a signing service. Projects like step-ca from Smallstep or HashiCorp Vault's SSH secrets engine handle this well.
Keep cert lifetimes short. 8-24 hours for user certs. This dramatically reduces the risk window if a key is compromised.
Use separate CAs for users and hosts. I mentioned this earlier but it bears repeating.
Automate host cert renewal. Host certs with a 52-week lifetime should get renewed automatically, not when someone notices SSH connections failing.
Log the certificate IDs. The -I identity string shows up in auth logs. Use meaningful identifiers so you can trace access.
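For the host cert renewal tip, the first building block is knowing how long a certificate has left. Here is a rough sketch that reads ssh-keygen -L output on stdin and prints the whole days remaining. It assumes GNU date (for -d) and the "Valid: from ... to ..." line format; cert_days_left is a hypothetical name, and certificates without a fixed end date are reported as an error rather than guessed at.

```shell
# Days until the certificate described on stdin expires.
# Input: the output of `ssh-keygen -L -f <cert>`. Requires GNU date (-d).
# (cert_days_left is a hypothetical helper for this example.)
cert_days_left() {
  # "Valid: from <start> to <end>" -> the fifth field is the end timestamp.
  expiry=$(awk '/^[ \t]*Valid: from / { print $5; exit }')
  [ -n "$expiry" ] || return 1
  expiry_epoch=$(date -d "$expiry" +%s) || return 1
  now_epoch=$(date +%s)
  # Integer division: partial days are dropped; expired certs go negative.
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

A cron job could then run something like days=$(ssh-keygen -L -f /etc/ssh/ssh_host_ed25519_key-cert.pub | cert_days_left) and request a new certificate when the number drops below a threshold you pick, such as 30.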

Is It Worth the Effort?

For a solo developer with two servers? Probably not. Stick with authorized_keys.

For a team of any size managing more than a few machines? Absolutely. The upfront cost is maybe an afternoon of setup. The ongoing payoff is dramatically simpler access management, proper host verification, and offboarding that doesn't keep you up at night.

SSH certificates have been available for over fifteen years. The tooling is mature, the documentation is solid, and the operational benefits are real. The only mystery is why more teams aren't using them yet.

Top comments (11)

KamalMostafa • Apr 6

what are your versions of ssh-keygen and sshd?

Alan West • Apr 6

openssh 9.9 on both. the certificate stuff has been there since 5.4 though (ed25519 keys need 6.5+) so version shouldnt matter much unless youre on something really old.

KamalMostafa • Apr 6

FYI, I had an error and I'm running OpenSSH_9.6p1 Ubuntu-3ubuntu13.15, OpenSSL 3.0.13.

Alan West • Apr 6

9.6 definitely supports certificates so its not a version issue. ubuntu splits sshd config across /etc/ssh/sshd_config.d/ drop in files and sometimes those override what you put in sshd_config. check if theres anything in there overriding TrustedUserCAKeys. also whats the actual error you get? would help narrow it down.

KamalMostafa • Apr 7

My issue was even before running the SSH daemon. It was in Step 2: Sign Host Keys. It's now fixed. I suggest you use static file system paths for all files; this will help replicate them easily. Different versions behave in odd ways and have different configuration paths. Hope this helps.

José David Ureña Torres • Apr 4

Great article!

Mykola Kondratiuk • Apr 11

the authorized_keys sprawl is the real problem. certs solve it cleanly but nobody migrates until something actually breaks.

Alan West • Apr 12

Exactly. The migration incentive problem is real -- certs require upfront investment in a CA workflow, and authorized_keys "works" until it doesn't. I've seen teams finally switch only after an incident where a departed employee's key was still active six months later.

Mykola Kondratiuk • Apr 12

that scenario is basically free CA adoption, just very expensive. the post-incident key audit is what usually sells it - nothing convinces a security team faster than "we have no idea who still has access."

Alan West • Apr 13

Ha, yeah. "Free CA adoption, just very expensive" is a perfect way to put it. The irony is that the audit itself becomes the business case -- once someone has to explain to leadership why a former contractor still had prod access, the CA budget materializes overnight.

Mykola Kondratiuk • Apr 13

exactly - nothing like a "wait, is john still in github?" moment to unlock a pki budget. the audit writes itself.
