Darian Vance

Posted on • Originally published at wp.me

Solved: I’ve always wanted to make something like this. Does anyone know how to

🚀 Executive Summary

TL;DR: Manual SSH key management across server fleets leads to ‘key sprawl,’ inconsistency, and security risks. This article provides three solutions: a quick bash script for immediate control, a robust Ansible playbook for declarative and scalable management, and advanced identity-aware proxies like HashiCorp Boundary or Teleport for large-scale, compliance-driven environments.

🎯 Key Takeaways

  • The core issue with SSH key management is a ‘process problem’ leading to ‘key sprawl’ and a lack of a single source of truth, not a flaw in SSH itself.
  • Ansible offers a declarative, idempotent, and scalable solution for managing SSH authorized keys, allowing teams to define the desired state and automate its enforcement.
  • For high-security and large-scale environments, identity-aware proxies and bastion hosts (e.g., HashiCorp Boundary, Teleport) provide superior access control through short-lived SSH certificates, SSO integration, and comprehensive session auditing.

Tired of manually managing SSH keys across your fleet of servers? Learn three practical solutions, from a quick bash script to a robust Ansible playbook, to automate SSH access control and end the chaos for good.

I Saw That Reddit Post. Let’s Talk About Actually Solving SSH Key Hell.

I remember it vividly. It was 3 AM, and a PagerDuty alert dragged me out of bed. The site was down. Hard down. After a frantic 20 minutes, we found the cause: a junior engineer, trying to be helpful, had run a “cleanup” script on our main application server, prod-web-01. In the process, he wiped the ~/.ssh/authorized_keys file for the service account our deployment system used. He had no idea. We had no central management. It was a mess of ssh-copy-id and manual text file edits. That night, I swore I’d never let a team I lead manage SSH keys by hand again. It’s a ticking time bomb, and that Reddit thread about wanting to build a tool to solve it hit a little too close to home.

The “Why”: You Don’t Have a Key Problem, You Have a Process Problem

Look, the core issue isn’t SSH itself. It’s fantastic. The problem is what we call “key sprawl.” When you have two servers, manually copying keys is fine. When you have twenty, or two hundred, it’s a nightmare. Every new hire, every departure, every role change requires someone to manually SSH into a dozen machines to add or remove a public key. It’s inconsistent, impossible to audit, and dangerously insecure. The root cause is a lack of a single source of truth for who should have access to what.

So, let’s stop treating the symptom and fix the disease. Here are three ways to do it, from a quick patch to a permanent cure.

Solution 1: The Quick Fix (The “Get Me Through The Week” Bash Script)

I’m not proud of this, but I’ve done it. Sometimes you just need to get control, right now. This is the duct tape solution. It’s a simple shell script that iterates over a list of servers and a list of public keys, ensuring they’re all in place. It’s better than doing it by hand, but not by much.

First, create two files:

servers.txt (a list of hosts to manage)

prod-web-01
prod-web-02
staging-db-01
util-server-01

keys/ (a directory where you store public keys named by user, e.g., darian.pub, jane.pub)

Now for the script itself. Let’s call it sync_keys.sh:

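#!/bin/bash

# WARNING: This is a destructive script. It REPLACES the authorized_keys file.

KEY_DIR="./keys"
SERVER_LIST="servers.txt"
REMOTE_USER="admin" # The user on the remote server whose keys we're managing

# Combine all .pub files into one temporary file; the trap removes it when the script exits
TEMP_KEYS_FILE=$(mktemp) || exit 1
trap 'rm -f "${TEMP_KEYS_FILE}"' EXIT
echo "### Assembled keys on $(date) ###" > "${TEMP_KEYS_FILE}"
for keyfile in "${KEY_DIR}"/*.pub; do
  # If the glob matched nothing, bail out rather than push an empty key file
  [ -e "${keyfile}" ] || { echo "No .pub files found in ${KEY_DIR}" >&2; exit 1; }
  cat "${keyfile}" >> "${TEMP_KEYS_FILE}"
  echo "" >> "${TEMP_KEYS_FILE}" # Add a newline for safety
done

# Loop through servers and replace the authorized_keys file
# NOTE: StrictHostKeyChecking=no skips host key verification; only acceptable on a network you trust
while IFS= read -r server; do
  [ -n "${server}" ] || continue # Skip blank lines in the server list
  echo "--- Syncing keys for ${server} ---"
  # ssh -n stops ssh from reading stdin, which would otherwise swallow the rest of the server list
  if scp -o StrictHostKeyChecking=no "${TEMP_KEYS_FILE}" "${REMOTE_USER}@${server}:/tmp/new_authorized_keys" &&
     ssh -n -o StrictHostKeyChecking=no "${REMOTE_USER}@${server}" "mv /tmp/new_authorized_keys ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"; then
    echo "Done."
  else
    echo "FAILED for ${server}" >&2
  fi
done < "${SERVER_LIST}"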

Darian’s Warning: This is a hack. It doesn’t check whether anything actually changed (it rewrites the file on every run), it has minimal error checking, and it overwrites the entire authorized_keys file. If someone adds a key manually for a critical service, this script will wipe it out on the next run. Use with extreme caution.
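
If you do reach for a script like this, it helps to see what it is about to throw away before it runs. Here is a minimal sketch, assuming you have already pulled down a copy of a server's current authorized_keys file (with scp, for instance). The function name keys_that_would_be_removed and the file names in the usage line are my own placeholders, not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of sync_keys.sh): list key lines that are in the
# server's current authorized_keys but absent from the newly assembled file.
# Lines are compared as whole strings; comments and blank lines are ignored.
keys_that_would_be_removed() {
  # $1 = copy of the current authorized_keys, $2 = newly assembled keys file
  awk '
    /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
    FILENAME == ARGV[1] { keep[$0] = 1; next }   # first file read: the new key set
    !($0 in keep)                                # second file: print keys not in that set
  ' "$2" "$1"
}

# Usage (file names are placeholders):
#   keys_that_would_be_removed current_authorized_keys new_authorized_keys
```

Anything this prints is a key the overwrite would silently revoke. Because it compares whole lines, the same key with a different trailing comment counts as a different key, which is crude but errs on the side of showing you more rather than less.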

Solution 2: The Permanent Fix (The “Ansible Is Your Best Friend” Method)

This is the way. For 90% of teams, this is the right balance of power, simplicity, and maintainability. Using a configuration management tool like Ansible gives you a declarative, auditable, and repeatable way to manage access. You define the *state* you want, and Ansible makes it happen.

Your setup will look something like this:

hosts.ini (Your server inventory)

[webservers]
prod-web-01 ansible_user=darian
prod-web-02 ansible_user=darian

[databases]
staging-db-01 ansible_user=darian

manage_ssh_keys.yml (The Ansible Playbook)

---
- name: Manage SSH Authorized Keys
  hosts: all
  become: yes

  vars:
    # Define users and their public keys. This can be loaded from a separate vars file.
    users:
      - username: jane
        key: "ssh-rsa AAAA... jane@laptop"
      - username: darian
        key: "ssh-rsa BBBB... darian@workstation"
      - username: service_deploy
        key: "ssh-rsa CCCC... deploy-key"

  tasks:
    - name: Ensure users exist on the system
      ansible.builtin.user:
        name: "{{ item.username }}"
        state: present
      loop: "{{ users }}"

    - name: Distribute authorized keys for specified users
      ansible.posix.authorized_key:
        user: "{{ item.username }}"
        key: "{{ item.key }}"
        state: present
        exclusive: no # Set to 'yes' to remove every other key from that user's authorized_keys file
      loop: "{{ users }}"

To run it, you just execute: ansible-playbook -i hosts.ini manage_ssh_keys.yml

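The beauty here is that your playbook becomes the source of truth. Need to revoke access for someone who left? Be careful: simply deleting them from the users list is not enough, because with state: present and exclusive: no the playbook only ever adds keys and never removes one. Instead, keep their entry and run the key task against it with state: absent (or remove the account itself with ansible.builtin.user and state: absent), then re-run the playbook. Handled that way, it’s idempotent, safe, and scales beautifully.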
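
Before letting a playbook like this loose on production, I would preview it first. Here is a minimal sketch, assuming the hosts.ini and manage_ssh_keys.yml names used above; the preview_keys and apply_keys function names and the ANSIBLE_PLAYBOOK_BIN variable are hypothetical names of mine. The --check and --diff flags are standard ansible-playbook options, but how faithfully each module reports changes in check mode varies. In particular, I believe the key task can fail in a preview when a user does not exist yet, since check mode never actually creates the account. Treat the preview as a hint, not a guarantee.

```shell
#!/bin/sh
# Hypothetical wrappers around ansible-playbook. ANSIBLE_PLAYBOOK_BIN is an
# assumption of this sketch: it lets you point at a different binary.

# Dry run: --check reports what would change without changing it,
# --diff shows the detail of each change where the module supports it.
preview_keys() {
  # $1 = inventory file, $2 = playbook file
  "${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}" -i "$1" "$2" --check --diff
}

# Real run: the same command without the dry-run flags.
apply_keys() {
  "${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}" -i "$1" "$2"
}

# Usage:
#   preview_keys hosts.ini manage_ssh_keys.yml
#   apply_keys   hosts.ini manage_ssh_keys.yml
```

Keeping the preview and the apply as separate steps means a human reads the diff in between, which is the whole point of the exercise.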

Solution 3: The ‘Nuclear’ Option (Identity-Aware Proxies & Bastion Hosts)

Alright, so you’re running a massive fleet, you have strict compliance needs (SOC 2, HIPAA), and you need session recording and auditing. This is where you stop managing static keys altogether and move to short-lived certificates and identity-aware proxies.

Tools like HashiCorp Boundary or Teleport change the game completely. Here’s the gist:

  • Engineers don’t have permanent keys on the target servers.
  • They authenticate to a central service (the proxy/bastion) using their company SSO (like Okta or Google Workspace).
  • The service grants them a temporary, short-lived SSH certificate that gives them access to specific servers for a limited time (e.g., 8 hours).
  • All sessions are logged and can even be recorded for playback.

This is the pinnacle of secure access management. You’re no longer managing keys; you’re managing identities and policies. Onboarding and offboarding are as simple as adding or removing a user from an SSO group.
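
To make the certificate idea concrete without installing either tool, here is a rough sketch using plain OpenSSH's ssh-keygen. This is not how Boundary or Teleport work internally; it only illustrates the underlying mechanism of a CA signing a public key for a limited window. The function name, the file paths, and the 8-hour validity are assumptions of mine, and the target servers would need to trust the CA's public key through the TrustedUserCAKeys setting in sshd_config.

```shell
#!/bin/sh
# Hypothetical helper: sign a user's public key with a CA key, producing an
# SSH certificate that expires 8 hours after it is issued.
issue_short_lived_cert() {
  ca_key="$1"        # path to the CA private key (assumed to live on the bastion)
  user_pubkey="$2"   # path to the engineer's public key, e.g. jane.pub
  identity="$3"      # label stored in the certificate as its key ID
  principal="$4"     # login name the certificate is valid for
  # -s: sign with this CA key, -I: key identity, -n: allowed principals,
  # -V +8h: valid from now until 8 hours from now
  ssh-keygen -q -s "$ca_key" -I "$identity" -n "$principal" -V +8h "$user_pubkey"
}

# Usage (paths are placeholders); writes jane-cert.pub next to jane.pub:
#   issue_short_lived_cert ./ca ./jane.pub "jane@example" jane
#   ssh-keygen -L -f ./jane-cert.pub   # inspect validity and principals
```

Once the certificate expires it simply stops working, so there is nothing to clean up on the servers. That is the property the proxy tools build on, with SSO and session recording layered on top.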

Pro Tip: This approach is powerful but comes with its own operational overhead. You now have a new piece of critical infrastructure to manage. Don’t jump to this just because it’s cool; adopt it when your scale and security requirements genuinely demand it.

Which Path Should You Choose?

Here’s my honest breakdown to help you decide.

| Solution | Effort to Implement | Scalability | Best For… |
|---|---|---|---|
| Bash Script | Low (1-2 hours) | Poor | Emergency cleanup or managing < 5 non-critical servers. A temporary patch. |
| Ansible Playbook | Medium (Half a day) | Excellent | 90% of teams. From startups to mid-size enterprises. The default choice. |
| Bastion / Proxy | High (Days to weeks) | Infinite | Large companies with strict compliance, auditing, and security needs. |

So, to the person on Reddit who wanted to build “something like this” – you’re on the right track. The instinct to automate this pain away is what separates senior engineers from junior ones. My advice? Skip the bash script unless your hair is on fire. Spend a day learning the basics of Ansible. It will pay you back a thousand times over, not just for SSH keys, but for every other repetitive server task you’re still doing by hand.


Darian Vance

👉 Read the original article on TechResolve.blog


