Mahafuzur Rahaman

Posted on May 29

SSH Key Management at Scale: Generating, Rotating, and Revoking Keys Across Teams

#automation #devops #infrastructure #security

Most teams treat SSH keys like passwords from 2010 — created once, never rotated, and scattered everywhere. Here's how to fix that.

You onboard a new engineer. They generate an SSH key, paste the public key into five servers, and get to work. Six months later they leave the company. You remember to remove their key from two of the five servers. Maybe three.

This is how breaches happen. Not through sophisticated attacks — through forgotten keys on forgotten servers, quietly waiting.

SSH key management sounds boring until it isn't. This article covers everything you need to do it properly: key generation best practices, how to organize keys across teams, rotation strategies that won't break production, and clean revocation when someone leaves.

Why SSH Key Management Breaks Down

SSH keys feel low-maintenance because they mostly work silently. That silence is the problem.

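Unlike passwords, keys don't expire by default. Unlike OAuth tokens, there's no central dashboard showing you who has access to what. Unlike certificates, there's no revocation workflow out of the box: OpenSSH does support a RevokedKeys list in sshd_config, but nothing populates or distributes it for you.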

The result is what security teams call key sprawl: hundreds of authorized_keys entries across dozens of servers, with no inventory, no ownership records, and no expiry dates. Surveys consistently find that large organizations have more SSH keys than employees — often by an order of magnitude.

Key sprawl creates three risks:

Orphaned access — keys belonging to former employees, contractors, or decommissioned systems still granting entry
Unknown exposure — no one knows which keys can reach which servers
Audit failure — you can't prove compliance if you can't show who had access to what, when

The fix isn't a new tool. It's a discipline — applied consistently.

Part 1: Generating Keys the Right Way

Choose the Right Algorithm

Not all SSH key types are equal in 2024. Here's where things stand:

Algorithm	Key Size	Recommendation
`ed25519`	256-bit (fixed)	✅ Use this. Fast, secure, compact.
`ecdsa`	256/384/521-bit	⚠️ Fine, but ed25519 is better
`rsa`	2048–4096-bit	⚠️ Legacy systems only. Use 4096-bit minimum.
`dsa`	1024-bit	❌ Never. Broken and disabled in modern OpenSSH.

For anything modern, ed25519 is the answer.

ssh-keygen -t ed25519 -a 100 -C "alice@example.com" -f ~/.ssh/id_ed25519

Flags explained:

-t ed25519 — algorithm
-a 100 — number of KDF rounds for the passphrase (higher = slower to brute-force)
-C "alice@example.com" — comment; use email or a descriptive label
-f ~/.ssh/id_ed25519 — output file path

For legacy systems that only accept RSA:

ssh-keygen -t rsa -b 4096 -a 100 -C "alice@example.com" -f ~/.ssh/id_rsa_legacy
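To see where keys you already have fall in the table above, you can read the key type straight from the first field of each public key file. This is a minimal sketch, assuming standard OpenSSH .pub files; the function name classify_pubkey and the verdict labels are made up for illustration and are not part of any tool.

```shell
# classify_pubkey FILE
# Prints "<verdict> <key-type> <file>" based on the key type token that
# starts an OpenSSH public key line. Verdict labels are illustrative.
classify_pubkey() (
    # First whitespace-separated field of the first line, e.g. "ssh-ed25519"
    key_type=$(awk 'NR == 1 { print $1 }' "$1")
    case "$key_type" in
        ssh-ed25519)   verdict="ok" ;;
        ecdsa-sha2-*)  verdict="acceptable" ;;
        ssh-rsa)       verdict="legacy" ;;   # size still needs checking: ssh-keygen -l -f FILE
        ssh-dss)       verdict="replace" ;;
        *)             verdict="unknown" ;;  # e.g. hardware-backed sk-* types
    esac
    printf '%s %s %s\n' "$verdict" "$key_type" "$1"
)

# Example: classify every public key in ~/.ssh
# for f in ~/.ssh/*.pub; do classify_pubkey "$f"; done
```

The check only looks at the algorithm name; it says nothing about RSA key length or whether the private key has a passphrase.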

Always Use a Passphrase

A passphrase encrypts the private key on disk. Without it, anyone who copies your key file has full access to everything that key unlocks. With a passphrase, they also need to know the secret to decrypt it.

The common objection: "but then I have to type it every time." The answer: ssh-agent.

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519

ssh-agent holds your decrypted key in memory for the session. You type the passphrase once; the agent handles the rest. On macOS, Keychain integration (UseKeychain yes and AddKeysToAgent yes in ~/.ssh/config) stores the passphrase, so you aren't prompted again after a reboot.

One Key Per Context, Not One Key for Everything

A single key that unlocks every server is a single point of failure. Instead, scope keys to contexts:

~/.ssh/
├── id_ed25519_personal       # Personal projects
├── id_ed25519_work           # Work infrastructure
├── id_ed25519_client_acme    # Client: ACME Corp
├── id_ed25519_deploy         # CI/CD deploy key (no passphrase, scoped permissions)
└── id_ed25519_prod           # Production servers (extra strong passphrase)

Wire these up in ~/.ssh/config so the right key is used automatically:

Host *.acme.internal
    IdentityFile ~/.ssh/id_ed25519_client_acme
    IdentitiesOnly yes

Host bastion.prod.example.com
    IdentityFile ~/.ssh/id_ed25519_prod
    IdentitiesOnly yes

IdentitiesOnly yes prevents SSH from trying other keys in your agent — important when servers have MaxAuthTries set low.
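It's easy to add an IdentityFile line and forget the matching IdentitiesOnly yes. A small lint can catch that. The sketch below assumes a simple config made of plain Host blocks (no Match or Include directives, keywords separated by spaces); check_identities_only is a hypothetical helper name.

```shell
# check_identities_only CONFIG
# Prints the first pattern of every Host block that sets IdentityFile but
# does not also set "IdentitiesOnly yes". Handles simple Host blocks only.
check_identities_only() (
    awk '
        function report() {
            if (host != "" && has_file && !has_only) print host
        }
        tolower($1) == "host" {
            report()                       # close the previous block
            host = $2; has_file = 0; has_only = 0
            next
        }
        tolower($1) == "identityfile" { has_file = 1 }
        tolower($1) == "identitiesonly" && tolower($2) == "yes" { has_only = 1 }
        END { report() }                   # close the last block
    ' "$1"
)

# Example:
# check_identities_only ~/.ssh/config
```

To confirm which key ssh would actually offer for a host, ssh -G hostname prints the fully resolved configuration, including the identityfile entries.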

Part 2: Organizing Keys Across a Team

The Baseline: A Git-Managed Key Registry

For small to medium teams (under ~50 engineers), a Git repository containing public keys and server manifests is a practical starting point.

ssh-keys/
├── users/
│   ├── alice.pub
│   ├── bob.pub
│   └── carol.pub
├── servers/
│   ├── web-prod.txt       # Lists which users have access
│   ├── db-prod.txt
│   └── bastion.txt
└── deploy-keys/
    ├── github-actions.pub
    └── jenkins.pub

Rules:

Public keys only — never commit private keys
Every key has an owner and a date in a comment: ssh-ed25519 AAAA... alice@example.com 2024-01
PRs required to add or remove keys — creates an audit trail
A simple script syncs authorized_keys on servers from the registry

This isn't enterprise-grade, but it's infinitely better than ad-hoc key distribution with no inventory.
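The sync script can stay small. Here is a sketch of the part that builds one server's authorized_keys from the registry. It assumes each servers/NAME.txt manifest lists one username per line (with optional # comments), which the layout above doesn't specify; build_authorized_keys is a hypothetical name.

```shell
# build_authorized_keys REGISTRY_DIR SERVER
# Reads REGISTRY_DIR/servers/SERVER.txt (assumed format: one username per
# line, '#' comments allowed) and prints the matching users/<name>.pub files.
build_authorized_keys() (
    registry=$1
    server=$2
    manifest="$registry/servers/$server.txt"
    [ -r "$manifest" ] || { echo "no manifest for $server" >&2; exit 1; }

    # Strip comments and whitespace, skip empty lines, then emit each key
    sed -e 's/#.*//' -e 's/[[:space:]]//g' "$manifest" | while IFS= read -r user; do
        [ -n "$user" ] || continue
        keyfile="$registry/users/$user.pub"
        if [ -r "$keyfile" ]; then
            cat "$keyfile"
        else
            echo "warning: $user listed for $server but has no key" >&2
        fi
    done
)

# Example: write to a new file, review it, then copy it to the server
# build_authorized_keys ./ssh-keys web-prod > authorized_keys.web-prod
```

Generating the whole file from the registry, rather than appending to what's on the server, means a key removed in a PR disappears on the next sync.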

Structuring `authorized_keys` With Restrictions

authorized_keys supports per-key restrictions that limit what a key can do, even after it's been granted access. Use them.

# Full access
ssh-ed25519 AAAA... alice@example.com

# Deploy key: forced command, so it can only run one specific script
restrict,command="/usr/local/bin/deploy.sh" ssh-ed25519 AAAA... deploy-key

# Tunnel-only key: can only forward to one host:port; the forced command blocks shell access (connect with ssh -N)
restrict,port-forwarding,permitopen="db.internal:5432",command="/bin/false" ssh-ed25519 AAAA... tunnel-key

# IP-restricted key
from="203.0.113.0/24" ssh-ed25519 AAAA... office-access-key

These restrictions are enforced server-side, regardless of what the client attempts.
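To find entries that carry no restrictions at all, look for lines that begin directly with the key type instead of an options field. This is a rough sketch; unrestricted_keys is a hypothetical helper, and it assumes the owner is the first word of the comment.

```shell
# unrestricted_keys FILE
# Prints the first comment word of every entry whose line starts with the
# key type, i.e. has no options such as command=, from= or restrict in front.
unrestricted_keys() (
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        $1 ~ /^(ssh-|ecdsa-|sk-)/ { print ($3 != "" ? $3 : "(no comment)") }
    ' "$1"
)

# Example:
# unrestricted_keys ~/.ssh/authorized_keys
```

An unrestricted key isn't wrong by itself, but on a production host the list should be short and every name on it should be expected.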

Tools for Larger Teams

Once you're managing keys across dozens of servers and dozens of engineers, manual management doesn't scale. Consider:

HashiCorp Vault SSH Secrets Engine
Vault can act as an SSH Certificate Authority, issuing signed, short-lived certificates instead of static keys. Engineers authenticate to Vault, receive a certificate valid for (say) 8 hours, and use it to access servers. No long-lived keys. No key sprawl. Full audit log. This is the gold standard for larger teams.

Teleport
Open-source access plane for SSH, Kubernetes, and databases. Handles key/certificate lifecycle, session recording, and access policies in one tool.

AWS EC2 Instance Connect / GCP OS Login
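Cloud-native options that remove long-lived authorized_keys entries from instances. EC2 Instance Connect pushes a one-time public key that is valid for 60 seconds; OS Login ties SSH access to IAM identities and manages keys through the user's Google account.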

Smallstep
Open-source certificate authority with SSH support. Easier to self-host than Vault if certificates are the only goal.

Part 3: Rotation — The Step Most Teams Skip

Key rotation means replacing existing keys with new ones on a scheduled basis. It limits the exposure window if a key is compromised without your knowledge.

When to Rotate

Scheduled: Annually at minimum, quarterly for sensitive systems
Triggered: After a security incident, after a team member's access level changes, after a laptop is lost or stolen, after a suspected compromise
On offboarding: Always — see Part 4

How to Rotate Without Breaking Things

Rotation fails when it's done carelessly. The safe approach is additive first, then remove.

Step 1: Generate the new key

ssh-keygen -t ed25519 -a 100 -C "alice@example.com-2024-rotation" -f ~/.ssh/id_ed25519_new

Step 2: Add the new key alongside the old one

cat ~/.ssh/id_ed25519_new.pub | ssh user@server "cat >> ~/.ssh/authorized_keys"

Step 3: Verify the new key works

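# IdentitiesOnly=yes stops the agent from silently offering the old key instead
ssh -i ~/.ssh/id_ed25519_new -o IdentitiesOnly=yes user@server "echo connected"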

Step 4: Remove the old key

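# Remove the old key by its key material, not its comment: a pattern like "alice@example.com"
# would also match the new key's comment and delete both lines. A backup is kept as authorized_keys.bak.
OLD_KEY=$(awk '{print $2}' ~/.ssh/id_ed25519.pub)
ssh -i ~/.ssh/id_ed25519_new -o IdentitiesOnly=yes user@server "cp ~/.ssh/authorized_keys ~/.ssh/authorized_keys.bak && grep -vF '$OLD_KEY' ~/.ssh/authorized_keys.bak > ~/.ssh/authorized_keys"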

Step 5: Update all references — ~/.ssh/config, CI/CD secrets, documentation.
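The removal in Step 4 is the step most likely to lock you out, so it helps to wrap it in a function that matches on the key material and keeps a backup. This is a minimal sketch meant to run on the server; remove_key is a hypothetical name, and it assumes the old public key file is available there.

```shell
# remove_key AUTH_KEYS_FILE OLD_PUBKEY_FILE
# Removes every line of AUTH_KEYS_FILE containing the base64 key material
# from OLD_PUBKEY_FILE, keeping a .bak copy of the original.
remove_key() (
    auth_file=$1
    # Field 2 of a .pub file is the base64-encoded key itself
    blob=$(awk 'NR == 1 { print $2 }' "$2")
    [ -n "$blob" ] || { echo "could not read key from $2" >&2; exit 1; }

    cp -p "$auth_file" "$auth_file.bak" || exit 1
    # -F: fixed string (base64 contains '+' and '/'); -v: keep non-matching lines.
    # grep exits 1 when nothing is left to print, which is not an error here.
    grep -vF -- "$blob" "$auth_file.bak" > "$auth_file" || [ $? -eq 1 ]
)

# Example:
# remove_key ~/.ssh/authorized_keys /tmp/id_ed25519_old.pub
```

Redirecting into the existing file keeps its permissions intact, and the .bak copy gives you a one-command rollback if the wrong line goes missing.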

Automating Rotation at Scale

For many servers, do this with Ansible:

- name: Add new SSH key
  ansible.posix.authorized_key:
    user: "{{ item.user }}"
    # Build the path with ~ (Jinja2 concatenation); nested {{ }} inside a lookup is not expanded
    key: "{{ lookup('file', 'keys/new/' ~ item.user ~ '.pub') }}"
    state: present
  loop: "{{ team_members }}"

- name: Remove old SSH key
  ansible.posix.authorized_key:
    user: "{{ item.user }}"
    key: "{{ lookup('file', 'keys/old/' ~ item.user ~ '.pub') }}"
    state: absent
  loop: "{{ team_members }}"

Run the "add" task first, verify access, then run the "remove" task, for example by keeping each in its own playbook or behind its own tag. Never run both in a single pass without testing in between.

Part 4: Revocation — When Someone Leaves

This is where key management most visibly fails. An engineer leaves; their key stays. Weeks or months later, an audit finds it still granting access to production systems.

The Offboarding Checklist

When anyone loses access (resignation, termination, end of contract):

[ ] Identify all keys belonging to this person
[ ] List all servers and services they had access to
[ ] Remove keys from all authorized_keys files
[ ] Rotate any shared/service account keys they had access to
[ ] Revoke access to key management tools (Vault, etc.)
[ ] Remove from any team-level access groups
[ ] Document the revocation with timestamp

The hardest part is step two: knowing everywhere they had access. This is why the Git-managed key registry matters — it's your inventory.
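With the registry in place, step two becomes a lookup. The sketch below assumes the same manifest format as before (one username per line in servers/NAME.txt, # comments allowed), which is an assumption about the registry rather than something the layout dictates; servers_for_user is a hypothetical name.

```shell
# servers_for_user REGISTRY_DIR USER
# Prints the name of every server manifest that lists USER as a whole line.
servers_for_user() (
    registry=$1
    user=$2
    for manifest in "$registry"/servers/*.txt; do
        [ -e "$manifest" ] || continue
        # -x: whole-line match, -F: literal string, -q: quiet
        if sed -e 's/#.*//' -e 's/[[:space:]]//g' "$manifest" | grep -qxF -- "$user"; then
            basename "$manifest" .txt
        fi
    done
)

# Example:
# servers_for_user ./ssh-keys alice
```

This only covers access granted through the registry. Keys pasted onto servers by hand still need the audit in Part 5.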

Doing It Fast With Ansible

# Revoke a specific user's key from the ubuntu account on every host (repeat for any other login accounts)
ansible all -m authorized_key -a "user=ubuntu key='{{ lookup('file', 'keys/alice.pub') }}' state=absent"

Run against your entire inventory. Done in seconds.

Using `authorized_keys` Comments as Metadata

Make revocation easier by putting searchable metadata in key comments:

ssh-ed25519 AAAA... alice@example.com|team:backend|added:2024-01-15|expires:2025-01-15

A simple script can scan all authorized_keys files and flag keys past their expiry date — giving you automated rotation reminders and an audit trail.
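Such a script can be a few lines of awk. This sketch assumes the exact comment layout shown above (fields separated by |, with an expires:YYYY-MM-DD tag); expired_keys is a hypothetical name.

```shell
# expired_keys FILE [TODAY]
# Prints "<expiry-date> <owner>" for each entry whose comment carries an
# expires:YYYY-MM-DD tag earlier than TODAY (defaults to the current date).
expired_keys() (
    today=${2:-$(date +%Y-%m-%d)}
    awk -F'|' -v today="$today" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^expires:/) {
                    expiry = substr($i, 9)          # text after "expires:"
                    # ISO dates compare correctly as plain strings
                    if (expiry < today) {
                        n = split($1, head, " ")    # owner = last word before the first "|"
                        print expiry, head[n]
                    }
                }
            }
        }
    ' "$1"
)

# Example:
# expired_keys ~/.ssh/authorized_keys
```

Keep in mind that the tag is only a reminder. sshd ignores the comment field, so an "expired" key keeps working until someone removes it.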

Part 5: Auditing What You Have

Before you can manage your keys, you need to know what exists.

Scan Your Servers

# Find all authorized_keys files on a server
find /home /root -name "authorized_keys" 2>/dev/null

# List all keys with their fingerprints (OpenSSH 7.2+ reads multi-key files directly,
# skipping blank lines and comments)
ssh-keygen -l -f ~/.ssh/authorized_keys
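Fingerprints show what is installed, not whether it should be. If you keep a registry of public keys like the one in Part 2, you can compare key material directly. This is a sketch under that assumption (users/ and deploy-keys/ directories of plain .pub files); unknown_keys is a hypothetical name.

```shell
# unknown_keys AUTH_KEYS_FILE REGISTRY_DIR
# Prints every entry of AUTH_KEYS_FILE whose key material does not appear in
# any *.pub file under REGISTRY_DIR/users or REGISTRY_DIR/deploy-keys.
unknown_keys() (
    auth_file=$1
    registry=$2
    {
        # Tag registry keys with "K" and installed entries with "A"
        cat "$registry"/users/*.pub "$registry"/deploy-keys/*.pub 2>/dev/null | sed 's/^/K /'
        sed 's/^/A /' "$auth_file"
    } | awk '
        $1 == "K" { known[$3] = 1; next }      # K <type> <blob> [comment]
        NF == 1 || $2 ~ /^#/ { next }          # blank line or comment
        {
            # Entries may start with options, so check every field for a known blob
            for (i = 2; i <= NF; i++) if ($i in known) next
            print substr($0, 3)                # drop the "A " tag
        }
    '
)

# Example:
# unknown_keys /home/ubuntu/.ssh/authorized_keys ./ssh-keys
```

Anything this prints is either a key that was added outside the registry or one that was removed from the registry but never from the server.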

Inventory Your Local Keys

# List all key fingerprints in your local .ssh directory
for key in ~/.ssh/*.pub; do
    echo -n "$key: "
    ssh-keygen -l -f "$key"
done

Check Key Age

If your keys have date metadata in comments, a quick grep lists them by expiry date, earliest first, so overdue keys surface at the top:

grep -h "expires:" /home/*/.ssh/authorized_keys | awk -F'|' '{print $4, $0}' | sort

The One Habit That Changes Everything

Audit your SSH keys on a schedule. Put it in the calendar. Once a quarter, run through every server, list every authorized key, verify every key has a known owner, and remove anything that doesn't.

It takes an hour. It's the single highest-value SSH security activity most teams never do.

The goal isn't a perfect system from day one — it's incremental improvement: better key generation today, an inventory this week, automated revocation next month.

SSH key management isn't exciting. But discovering a former employee's key on a production database server at 2 AM definitely is.
