Mahafuzur Rahaman

Posted on May 28

Understanding known_hosts and Host Key Verification: What It Protects Against and How TOFU Works

#networking #beginners #tutorial #security

That "authenticity of host can't be established" message isn't just noise. Here's what's actually happening — and why blindly typing "yes" is a security mistake.

Every developer has seen this:

The authenticity of host 'example.com (203.0.113.1)' can't be established.
ED25519 key fingerprint is SHA256:abc123xyz...
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Almost everyone types yes without reading it. Then they move on.

This message is SSH trying to protect you from one of the most dangerous attacks in network security: the man-in-the-middle attack. Understanding what's happening here — and what the ~/.ssh/known_hosts file actually does — will change how you think about every SSH connection you make.

The Problem SSH Is Solving

When you connect to ssh user@example.com, how do you know you're actually talking to example.com?

You can't rely on the IP address — IP addresses can be spoofed or rerouted. You can't rely on DNS — DNS can be poisoned. You can't rely on the network path — traffic can be intercepted at any point between you and the server.

Without verification, an attacker positioned between you and the server could intercept the connection, pose as the server, decrypt everything you send, re-encrypt it, and forward it along. You'd type your password or authenticate with your key and never know the attacker saw every keystroke.

This is a man-in-the-middle (MITM) attack. It's not theoretical. It happens on compromised networks, corporate proxies, malicious Wi-Fi hotspots, and misconfigured infrastructure.

SSH's defense is host key verification. Every SSH server has a unique cryptographic identity — its host key. Before you exchange any sensitive data, the server proves it holds the private key corresponding to a public key you've previously verified. If the keys don't match, SSH warns you — loudly.

What a Host Key Actually Is

When OpenSSH is installed on a server, it automatically generates a set of host key pairs. These live in /etc/ssh/:

ls /etc/ssh/ssh_host_*

/etc/ssh/ssh_host_ed25519_key       # Private key (600 permissions, root only)
/etc/ssh/ssh_host_ed25519_key.pub   # Public key (shared with clients)
/etc/ssh/ssh_host_rsa_key
/etc/ssh/ssh_host_rsa_key.pub
/etc/ssh/ssh_host_ecdsa_key
/etc/ssh/ssh_host_ecdsa_key.pub

The private key never leaves the server. The public key is what the server presents during the SSH handshake.

When you connect for the first time, the server presents its public key. SSH calculates a fingerprint of that key and shows it to you — that's the SHA256:abc123xyz... in the prompt. If you confirm, SSH stores the public key in your ~/.ssh/known_hosts file. On every future connection, SSH checks that the server presents the same key. If it doesn't, SSH refuses to connect and shows a stern warning.
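To make the fingerprint less of a black box, here is a minimal sketch of how the SHA256 form can be reproduced by hand. It assumes ssh-keygen, openssl, and a base64 that accepts -d are installed; the throwaway key and the demo_host_key path are stand-ins for illustration, not anything taken from a real server.

```shell
#!/bin/sh
# Sketch: reproduce an OpenSSH SHA256 fingerprint by hand.
set -eu

tmp=$(mktemp -d)

# Throwaway key pair standing in for a server's host key (hypothetical path).
ssh-keygen -q -t ed25519 -N '' -C demo -f "$tmp/demo_host_key"

# What ssh-keygen reports: the second field of "256 SHA256:... demo (ED25519)".
reported=$(ssh-keygen -l -f "$tmp/demo_host_key.pub" | awk '{print $2}')

# The same value by hand: base64-decode the key blob (field 2 of the .pub
# file), take its SHA-256 digest, and base64-encode that without '=' padding.
manual="SHA256:$(awk '{print $2}' "$tmp/demo_host_key.pub" \
    | base64 -d | openssl dgst -sha256 -binary | base64 | tr -d '=')"

echo "reported: $reported"
echo "manual:   $manual"

rm -rf "$tmp"
```

The two printed values should be identical, which is the point: the fingerprint is just a compact digest of the public key, so comparing fingerprints is equivalent to comparing keys.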

Trust On First Use (TOFU)

The model SSH uses is called Trust On First Use, or TOFU.

The logic:

First connection: no existing record. SSH shows you the fingerprint and asks you to verify it.
You confirm (type yes). The key is stored as trusted.
All future connections: SSH silently verifies the key matches. If it does, you connect. If it doesn't, you get a warning.
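The three steps above can be sketched as a small shell function. This illustrates the decision only and is not how OpenSSH implements it; tofu_check and its arguments are hypothetical names, and it leans on ssh-keygen -F to do the lookup.

```shell
#!/bin/sh
# Sketch of the trust-on-first-use decision (not OpenSSH's actual code).
# tofu_check KNOWN_HOSTS_FILE HOST KEY_TYPE KEY_BLOB
# Prints "unknown" (first use), "match", or "changed".
tofu_check() {
    # Look up the stored key of this type for the host. Lines starting
    # with '#' are ssh-keygen's own "found" annotations, so skip them.
    stored=$(ssh-keygen -F "$2" -f "$1" 2>/dev/null \
        | awk -v t="$3" '!/^#/ && $2 == t {print $3}')
    if [ -z "$stored" ]; then
        echo "unknown"
    elif [ "$stored" = "$4" ]; then
        echo "match"
    else
        echo "changed"
    fi
}

# Demo against a scratch file holding one throwaway key.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/key"
blob=$(awk '{print $2}' "$tmp/key.pub")
echo "example.com ssh-ed25519 $blob" > "$tmp/known_hosts"

tofu_check "$tmp/known_hosts" example.com ssh-ed25519 "$blob"        # stored key presented again
tofu_check "$tmp/known_hosts" example.com ssh-ed25519 "AAAAother"    # a different key presented
tofu_check "$tmp/known_hosts" new.example.com ssh-ed25519 "$blob"    # host never seen before

rm -rf "$tmp"
```

Only the "unknown" branch involves a human decision. The other two are mechanical comparisons, which is why everything after the first connection can be checked silently.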

TOFU is a pragmatic compromise. The theoretically correct approach would be to verify the server's fingerprint through a separate, trusted channel every single time — checking it against a known-good record before accepting the connection. In practice, almost no one does this for every server.

TOFU's weakness is that first connection. If an attacker intercepts your very first SSH connection to a server, you might accept their key and never know. From then on it is the attacker's key that your client trusts: sessions they keep intercepting verify silently, and the mismatch only surfaces, as a host key change warning, once you reach the real server directly.

For this reason, the first connection to a sensitive server should ideally involve fingerprint verification through an out-of-band channel — the cloud provider's console, a configuration management tool, or a colleague who can confirm the key directly on the server.

How to Verify a Fingerprint Before Connecting

On the server (accessed through another channel — the cloud console, serial port, etc.):

ssh-keygen -l -f /etc/ssh/ssh_host_ed25519_key.pub

256 SHA256:abc123xyz... root@server (ED25519)

Compare this fingerprint to what SSH showed you during the first connection prompt. If they match, the connection is genuine.

Anatomy of `~/.ssh/known_hosts`

The known_hosts file is a simple text database. Each line represents a trusted server:

example.com,203.0.113.1 ssh-ed25519 AAAA...base64encodedkey...
|1|base64salt=|base64hash= ssh-ed25519 AAAA...base64encodedkey...

Fields:

Hostnames/IPs: A comma-separated list of names and addresses that identify this server. Can be a plain hostname, an IP, or both.
Key type: ssh-ed25519, ecdsa-sha2-nistp256, ssh-rsa, etc.
Public key: Base64-encoded public key.

Hashed vs. Plain Hostnames

Notice the second line above starts with |1|, which marks a hashed hostname. Upstream OpenSSH stores names in plain text by default; some distributions, including Debian and Ubuntu, ship a system config with HashKnownHosts yes, and you can set it yourself in ~/.ssh/config.

Hashing means that if someone gets read access to your known_hosts file, they can't simply read off which servers you connect to. Each name is stored as a salted HMAC-SHA1 digest, a one-way function: SSH can check whether a given hostname matches, but an attacker cannot reverse the digest and is left testing guessed hostnames one at a time.
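For the curious, the matching step can be reproduced outside SSH. The sketch below assumes the hashed field has the form |1|salt|hash, with both parts base64-encoded and the hash being HMAC-SHA1 of the hostname keyed with the salt. hashed_name_matches is a hypothetical helper, and openssl, od, and base64 -d are assumed to be available.

```shell
#!/bin/sh
# Sketch: test whether a hashed known_hosts name field matches a hostname.
# hashed_name_matches HASHED_FIELD HOSTNAME  -> exit status 0 on match
hashed_name_matches() {
    salt_b64=$(printf '%s' "$1" | cut -d'|' -f3)
    hash_b64=$(printf '%s' "$1" | cut -d'|' -f4)
    # openssl takes the HMAC key as hex, so convert the decoded salt.
    salt_hex=$(printf '%s' "$salt_b64" | base64 -d | od -An -v -tx1 | tr -d ' \n')
    computed=$(printf '%s' "$2" \
        | openssl dgst -sha1 -mac HMAC -macopt "hexkey:$salt_hex" -binary | base64)
    [ "$computed" = "$hash_b64" ]
}

# Demo: hash a scratch known_hosts file in place, then test two names.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/key"
echo "example.com $(cut -d' ' -f1,2 "$tmp/key.pub")" > "$tmp/known_hosts"
ssh-keygen -H -f "$tmp/known_hosts" >/dev/null 2>&1
field=$(awk '{print $1; exit}' "$tmp/known_hosts")

hashed_name_matches "$field" example.com && echo "example.com matches"
hashed_name_matches "$field" other.example.com || echo "other.example.com does not match"

rm -rf "$tmp"
```

This is also why hashing is a privacy measure rather than real secrecy: anyone holding the file can run the same check against a list of candidate hostnames.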

Whether to use hashing is a privacy/convenience trade-off:

Hashed: More private, but you can't grep for a hostname to check if it's stored
Plain: Easier to manage manually, readable by any text editor

For most users, either is fine. Hashed is slightly better practice.

Checking a Known Host Manually

# Check if a specific host is in known_hosts
ssh-keygen -F example.com

# With hashed hosts (searches by computing the hash)
ssh-keygen -F 203.0.113.1

The Warning You Should Never Ignore

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key was just changed.

This message means the key presented by the server doesn't match the one stored in known_hosts. SSH refuses to connect.

Two explanations — one benign, one dangerous:

Benign: The server was rebuilt, migrated to a new IP, the OS was reinstalled, or an admin regenerated the host keys. The server is legitimate but its key genuinely changed.

Dangerous: Someone is intercepting your connection and presenting their own key — a man-in-the-middle attack.

What to do:

Don't just clear the entry and reconnect. Investigate first.
Verify through an out-of-band channel — the cloud provider console, a colleague, or direct physical/serial access.
On the server, check the current key fingerprint: ssh-keygen -l -f /etc/ssh/ssh_host_ed25519_key.pub
If the key genuinely changed for a legitimate reason, update your known_hosts.
If you can't confirm it's legitimate, don't connect.
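Part of that investigation is knowing exactly which fingerprint your client has on record, so you can tell whether the value you verified out of band is the old key or the new one. A minimal sketch, assuming ssh-keygen is available; stored_fingerprints and key_is_known are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the fingerprint stored for a host with one you verified
# through another channel.

# stored_fingerprints HOST [KNOWN_HOSTS_FILE]
# Prints the SHA256 fingerprint of every key stored for HOST.
stored_fingerprints() {
    ssh-keygen -l -F "$1" -f "${2:-$HOME/.ssh/known_hosts}" 2>/dev/null \
        | grep -o 'SHA256:[A-Za-z0-9+/]*'
}

# key_is_known HOST VERIFIED_FINGERPRINT [KNOWN_HOSTS_FILE]
# Exit status 0 if the verified fingerprint is already stored for HOST.
key_is_known() {
    stored_fingerprints "$1" "${3:-$HOME/.ssh/known_hosts}" | grep -qxF "$2"
}

# Typical use (hypothetical host; paste the fingerprint read on the server):
#   stored_fingerprints example.com
#   key_is_known example.com "SHA256:..." || echo "stored key differs from the verified one"
```

If the verified fingerprint is not among the stored ones, the stored entry is stale and can be replaced. If it is stored and SSH still warns, whatever answered your connection was not that server.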

Removing a Stale Entry

If you've verified the key change is legitimate:

ssh-keygen -R example.com
ssh-keygen -R 203.0.113.1  # Also remove by IP if both were stored

Then reconnect. SSH will present the new key and ask you to confirm.

Common Scenarios and How to Handle Them

Scenario 1: Newly Provisioned Cloud Server

You spin up a new EC2 instance. You want to SSH in. Best practice:

Get the fingerprint from the AWS console under EC2 → Instance → Actions → Monitor and troubleshoot → Get system log (the host key fingerprints are printed on first boot)
Or use EC2 Instance Connect through the browser to get to the console, then run ssh-keygen -l -f /etc/ssh/ssh_host_ed25519_key.pub
Compare against what SSH shows you on first connection

Takes 60 extra seconds. Closes the TOFU window completely.

Scenario 2: Server Rebuilt With Same IP

Old key is in known_hosts. New server, new key. SSH screams at you.

ssh-keygen -R the-server-ip

Reconnect, verify the new fingerprint if sensitive, proceed.

Scenario 3: Ephemeral Infrastructure (Containers, Auto-Scaling)

You're SSHing into containers or ephemeral VMs that share an IP but have different keys each time. Standard known_hosts checking breaks here.

For truly ephemeral infrastructure:

Host container-dev
    HostName 10.0.0.5
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

StrictHostKeyChecking no skips the prompt. UserKnownHostsFile /dev/null prevents any key from being stored. This is acceptable for ephemeral local dev environments — not for anything production or sensitive.

A better solution for ephemeral production infrastructure is SSH certificates, where the CA's public key is trusted rather than individual host keys. The host presents a signed certificate; the client trusts anything signed by the CA.

Scenario 4: Automating SSH in Scripts

Scripts that run ssh non-interactively will hang on the first-connection prompt.

The right approach: pre-populate known_hosts before the script runs.

# Add a host key to known_hosts programmatically
ssh-keyscan -H example.com >> ~/.ssh/known_hosts

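ssh-keyscan fetches the host's public keys without logging in: it opens a connection, records the keys the server offers, and disconnects before any authentication. The -H flag hashes the hostname in the output.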

Important: ssh-keyscan alone doesn't verify authenticity — it just retrieves whatever key the server presents. It's vulnerable to MITM if used on an untrusted network. For maximum security, compare the retrieved key against a known-good fingerprint before adding it.

For CI/CD pipelines, a common pattern is to pre-populate known_hosts during pipeline setup with known, verified fingerprints from a trusted source (your infrastructure code, a Vault secret, etc.) rather than using ssh-keyscan blindly.
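One way to combine the two ideas is to scan first and append only if the result matches a fingerprint you already trust. This is a sketch under assumptions: add_verified_key is a hypothetical helper, the scanned file is expected to hold a single key (for example from ssh-keyscan -t ed25519), and the expected fingerprint comes from your own trusted source.

```shell
#!/bin/sh
# Sketch: append a scanned host key to known_hosts only if its fingerprint
# matches a known-good value.
# add_verified_key SCANNED_FILE EXPECTED_FINGERPRINT KNOWN_HOSTS_FILE
add_verified_key() {
    # ssh-keygen -l prints "<bits> <fingerprint> <name> (<type>)".
    actual=$(ssh-keygen -l -f "$1" | awk '{print $2}')
    if [ "$actual" = "$2" ]; then
        cat "$1" >> "$3"
    else
        echo "refusing to add key: expected $2, got ${actual:-nothing}" >&2
        return 1
    fi
}

# Typical use (hypothetical host and fingerprint; the scan needs network access):
#   ssh-keyscan -t ed25519 example.com > scanned.txt
#   add_verified_key scanned.txt "SHA256:..." "$HOME/.ssh/known_hosts"
```

A pipeline that fails at this step is doing what you want: it refuses to run against a host whose key does not match the record.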

Managing `known_hosts` at Team Scale

Individual known_hosts management is fine for personal use. For teams, it creates inconsistency — different engineers have different records, first connections happen under different network conditions, and there's no central source of truth.

Option 1: Distribute a Shared `known_hosts` File

Maintain a team known_hosts file in your infrastructure repository. Engineers include it via ~/.ssh/config:

GlobalKnownHostsFile /etc/ssh/ssh_known_hosts ~/.ssh/known_hosts

Or configure system-wide:

# /etc/ssh/ssh_config
GlobalKnownHostsFile /etc/ssh/ssh_known_hosts

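The team file lives at /etc/ssh/ssh_known_hosts and is managed by configuration management (Ansible, Puppet, Chef). That path is also one of OpenSSH's built-in defaults for GlobalKnownHostsFile, so the directive is only strictly needed when the file lives somewhere else.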

Option 2: SSH Certificates (The Proper Solution)

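With an SSH certificate authority, you sign each server's host key with the CA's private key and configure sshd to present the resulting certificate:

# /etc/ssh/sshd_config
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

(TrustedUserCAKeys is the separate sshd setting for accepting CA-signed user certificates; it plays no part in host verification.)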

And every client to trust host certificates signed by the CA:

# ~/.ssh/known_hosts
@cert-authority *.example.com ssh-ed25519 AAAA...ca-public-key...

Now instead of tracking individual host keys, you trust the CA. Servers present CA-signed host certificates. The TOFU problem goes away entirely — you verify the CA once, and all future connections are verified cryptographically.
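As a rough illustration of the signing step, the sketch below creates a throwaway CA and host key in a temporary directory, signs the host key, and builds the matching known_hosts line. The names web1.example.com and web1-host-cert and the 52-week lifetime are assumptions chosen for the example. On a real server the resulting certificate would be referenced with HostCertificate in sshd_config.

```shell
#!/bin/sh
# Sketch: sign a host key with a CA and build the client-side trust line.
set -eu
tmp=$(mktemp -d)

# Throwaway CA and host key pairs. A real CA private key belongs somewhere
# far better protected than a temp directory.
ssh-keygen -q -t ed25519 -N '' -C demo-ca -f "$tmp/ca"
ssh-keygen -q -t ed25519 -N '' -C demo-host -f "$tmp/ssh_host_ed25519_key"

# -h marks this as a host certificate, -I is a free-form identity for logs,
# -n lists the hostnames the certificate is valid for, -V sets the lifetime.
ssh-keygen -q -s "$tmp/ca" -h -I web1-host-cert -n web1.example.com -V +52w \
    "$tmp/ssh_host_ed25519_key.pub"

# The certificate is written next to the public key as <key>-cert.pub.
cert_info=$(ssh-keygen -L -f "$tmp/ssh_host_ed25519_key-cert.pub")
echo "$cert_info"

# The line clients add to known_hosts to trust this CA for the domain.
ca_line="@cert-authority *.example.com $(cut -d' ' -f1,2 "$tmp/ca.pub")"
echo "$ca_line"

rm -rf "$tmp"
```

The client never needs to see this particular host key in advance. It only needs the CA line, which is the same for every server the CA signs.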

This is how large organizations solve the known_hosts problem at scale.

Security Configuration Reference

Key settings in ~/.ssh/config related to host verification:

# Refuse hosts that are not already in known_hosts (stricter than the
# default, ask, which prompts before storing an unknown host's key)
StrictHostKeyChecking yes

# Accept new host keys automatically, but warn if they change
# (A reasonable middle ground for non-sensitive environments)
StrictHostKeyChecking accept-new

# Hash stored hostnames (privacy)
HashKnownHosts yes

# Use a separate known_hosts for untrusted/ephemeral hosts
Host dev-ephemeral
    StrictHostKeyChecking no
    UserKnownHostsFile ~/.ssh/known_hosts_ephemeral

StrictHostKeyChecking values:

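Value	Behavior	Use When
`yes`	Reject unknown hosts; keys must be added to known_hosts manually. Reject changed keys.	Production, sensitive systems
`ask`	Prompt before storing a new key. Reject changed keys. This is the default.	Interactive use
`accept-new`	Accept and store new keys. Reject changed keys.	Internal dev infrastructure
`no`	Accept and store new keys. Connect even when a key has changed, after printing a warning and with some restrictions.	Ephemeral local-only containers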

accept-new is the pragmatic middle ground for most teams — you still get protection against key changes (the dangerous case) while avoiding the friction of manually confirming every new host.
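If you are unsure which value actually applies on a given machine, you can ask the client. The sketch below assumes a reasonably recent OpenSSH client that supports -G and accept-new; example.com is only a placeholder and no connection is made.

```shell
#!/bin/sh
# Sketch: ask ssh which StrictHostKeyChecking value it would use.
# -G prints the resolved client configuration and exits without connecting;
# -F /dev/null ignores config files so only built-in defaults apply.
default_value=$(ssh -G -F /dev/null example.com \
    | awk '$1 == "stricthostkeychecking" {print $2}')
echo "built-in default: $default_value"

# The same query with an explicit override on the command line.
override_value=$(ssh -G -F /dev/null -o StrictHostKeyChecking=accept-new example.com \
    | awk '$1 == "stricthostkeychecking" {print $2}')
echo "with override:    $override_value"
```

Dropping -F /dev/null shows the value after your own ~/.ssh/config and the system config are applied, which is the one that matters in day-to-day use.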

The Five-Minute Audit

Right now, open your known_hosts file:

wc -l < ~/.ssh/known_hosts        # How many entries?
ssh-keygen -F example.com         # Is a specific host stored?

Consider:

Are there entries for servers that no longer exist?
Are there entries for IP addresses you don't recognize?
Are hostnames hashed or plain text?

Clean up stale entries with ssh-keygen -R hostname. It's low-effort hygiene with real security value.
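For a slightly richer overview than a line count, a short awk summary can answer the hashed-or-plain question and show which key types are in use. This is a sketch only; summarize_known_hosts is a hypothetical helper, and it assumes the usual layout of an optional @marker, then the name field, then the key type.

```shell
#!/bin/sh
# Sketch: summarize a known_hosts file by key type and count hashed entries.
# summarize_known_hosts [FILE]
summarize_known_hosts() {
    awk '
        /^[[:space:]]*(#|$)/ { next }      # skip comments and blank lines
        {
            i = 1
            if ($1 ~ /^@/) i = 2           # skip @cert-authority or @revoked marker
            if ($i ~ /^\|1\|/) hashed++    # hashed name field
            types[$(i + 1)]++              # key type follows the name field
            total++
        }
        END {
            printf "total %d\n", total
            printf "hashed %d\n", hashed
            for (t in types) printf "%s %d\n", t, types[t]
        }
    ' "${1:-$HOME/.ssh/known_hosts}"
}

# Typical use:
#   summarize_known_hosts
#   summarize_known_hosts /etc/ssh/ssh_known_hosts
```

A large count of ssh-rsa entries, or of plain-text entries you assumed were hashed, is a useful prompt for the cleanup described above.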

The Bottom Line

The known_hosts file and host key verification are SSH's defense against impersonation. TOFU is an imperfect but practical trust model — its weakness is the first connection, its strength is that every connection after that is cryptographically verified.

Three habits make all the difference:

Verify fingerprints on first connection to sensitive servers through an out-of-band channel
Investigate host key change warnings rather than clearing the entry and proceeding
Use accept-new as your default StrictHostKeyChecking value unless you need stricter controls

That warning message isn't noise. It's SSH doing its job. Learn to read it, and you'll catch the attacks it's designed to surface.

