DEV Community

The Good Shell

Four things that will get your Cosmos validator slashed before you earn a single block reward

The most dangerous moment in a Cosmos validator setup is not the on-chain registration. It is the ten minutes before it, when your priv_validator_key.json is sitting unprotected on the validator host and you are about to run create-validator for the first time.
Most guides walk you through the steps. Fewer of them tell you the specific things that will get you jailed or slashed if you skip them. These are four of them, from running validators on Cosmos Hub mainnet.

1. NVMe is not optional, it is the difference between signing blocks and missing them

Every guide lists "4TB SSD" as a hardware requirement. What most of them do not emphasize is that SATA SSDs and standard HDDs will cause I/O bottlenecks under load that manifest directly as missed blocks.
The chain data on Cosmos Hub has grown significantly. Under normal operation, the node is continuously reading and writing to disk. During governance-triggered upgrades, that load spikes. If your disk cannot keep up, the node falls behind on block processing and starts missing signatures.
NVMe specifically matters because the gap to SATA is not marginal. The SATA III interface tops out at roughly 600 MB/s, while NVMe drives on PCIe reach several GB/s and sustain far more random I/O operations per second. It is the difference between a node that stays in sync under pressure and one that starts accumulating missed blocks at exactly the moment you can least afford it.
RAM is the second one people underestimate. You need 64GB. The 32GB setups work fine in normal operation. They fail during upgrades, when memory spikes well above the normal operating baseline. Running out of memory at upgrade height is a jailing event.
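A minimal pre-flight sketch of the hardware thresholds above, assuming a Linux host. The function name, the 10% tolerance (usable RAM and formatted capacity always come in a little under the nominal figure), and the device-name check are illustrative choices, not part of any Cosmos tooling.

```python
GIB = 1024 ** 3   # RAM is reported in binary units
TB = 1000 ** 4    # drives are sold in decimal terabytes


def hardware_shortfalls(ram_bytes, disk_bytes, device_name,
                        min_ram_gib=64, min_disk_tb=4, tolerance=0.10):
    """Return the list of unmet requirements; an empty list means the spec is met.

    The thresholds mirror the figures in the text (64GB RAM, 4TB NVMe).
    The tolerance is an assumption to absorb the gap between nominal and usable size.
    """
    problems = []
    if ram_bytes < min_ram_gib * GIB * (1 - tolerance):
        problems.append("ram")
    if disk_bytes < min_disk_tb * TB * (1 - tolerance):
        problems.append("disk_size")
    # Linux names NVMe block devices nvme0n1, nvme1n1, and so on.
    if not device_name.startswith("nvme"):
        problems.append("not_nvme")
    return problems


# On the host itself the inputs could be read with, for example:
#   ram_bytes  = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
#   disk_bytes = shutil.disk_usage("/path/to/node/home").total
```

A 32GB machine on a SATA device would come back with both "ram" and "not_nvme", which are the two shortfalls described above.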

2. Never set DAEMON_ALLOW_DOWNLOAD_BINARIES=true in Cosmovisor

This feels counterintuitive. Cosmovisor's auto-download feature sounds useful: you stage the upgrade in governance, and Cosmovisor downloads and swaps the binary automatically at the right block height.
The problem is what happens when the download fails. If the binary cannot be fetched at upgrade height, the node halts immediately. You are now racing to manually place the binary before the jailing threshold kicks in. On Cosmos Hub, that window is approximately 500 blocks, around 16 minutes at normal block times. The exact window follows from the chain's slashing parameters, so confirm the current values before relying on it.
The safer pattern is to always pre-place upgrade binaries manually in the Cosmovisor upgrade directory before the governance proposal passes. You monitor the proposal, you compile and verify the binary, you put it in place. Cosmovisor finds it already there and does the swap cleanly.
DAEMON_ALLOW_DOWNLOAD_BINARIES=false, which is also Cosmovisor's default, forces you into this pattern. It removes the failure mode where an auto-download kills your uptime at exactly the worst moment.
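One way to make the pre-placement habit checkable is a small script run well before upgrade height. This is a sketch under assumptions: it relies on Cosmovisor's documented layout of `<DAEMON_HOME>/cosmovisor/upgrades/<name>/bin/<DAEMON_NAME>`, and the function name and the upgrade name `v99` in the usage note are hypothetical.

```python
import os
from pathlib import Path


def upgrade_preflight(daemon_home, daemon_name, upgrade_name, env=None):
    """Return a list of problems; an empty list means the upgrade is staged."""
    env = os.environ if env is None else env
    problems = []

    # Auto-download should stay off on a validator.
    if env.get("DAEMON_ALLOW_DOWNLOAD_BINARIES", "false").lower() == "true":
        problems.append("DAEMON_ALLOW_DOWNLOAD_BINARIES is true")

    # Where Cosmovisor looks for the binary of a named upgrade.
    binary = (Path(daemon_home) / "cosmovisor" / "upgrades"
              / upgrade_name / "bin" / daemon_name)
    if not binary.is_file():
        problems.append(f"missing binary: {binary}")
    elif not os.access(binary, os.X_OK):
        problems.append(f"binary is not executable: {binary}")
    return problems
```

Calling `upgrade_preflight("/home/validator/.gaia", "gaiad", "v99")` with a placeholder home directory and upgrade name should return an empty list once the binary is in place. The upgrade name has to match the name in the governance proposal exactly.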

3. The migration double-sign window is where most slashing events happen

Double-sign slashing is permanent. It does not unjail. The tombstone is final.
The scenario that causes it most often is not a configuration mistake during initial setup. It is a validator migration: moving from one host to another. The sequence that causes it:
The operator stops the old node and starts the new one. But the old process was never actually stopped, or it was brought back by a systemd restart policy, or the old host was restored from a snapshot and resumed from a state taken before the stop.
Both nodes are now signing with the same key. Double-sign event. Tombstone.
The protection is simple but must be deliberate. When migrating: stop the old node, wait for a minimum of 10 confirmed blocks with no signing activity from that key, then start the new node. Never start the new node and then stop the old one. Never assume a stop command worked without verifying it.
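The "wait for 10 quiet blocks" rule can be expressed as a small check. The sketch below assumes you have already collected, for each recent block, the set of validator addresses that signed it (for example from the `last_commit.signatures` entries of a CometBFT RPC `/block` response). The fetching step is left out, and the function names are hypothetical.

```python
def blocks_since_last_signature(commits, validator_address):
    """Count the most recent consecutive blocks without the validator's signature.

    commits: list ordered oldest to newest; each item is an iterable of
             signer addresses (hex strings) for one block.
    """
    target = validator_address.upper()
    quiet = 0
    for signers in reversed(commits):
        if target in {s.upper() for s in signers}:
            break  # found the newest block this key signed
        quiet += 1
    return quiet


def safe_to_start_new_node(commits, validator_address, required_quiet_blocks=10):
    """True only if the key has been silent for the required number of blocks."""
    return blocks_since_last_signature(commits, validator_address) >= required_quiet_blocks
```

The check is deliberately conservative: if fewer than the required number of blocks were fetched, it returns False rather than guessing.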
Setting double_sign_check_height to a non-zero value in the consensus section of config.toml (10 to 20 blocks is standard) adds a second layer. On startup, the node looks back that many blocks for signatures made with its own consensus key and refuses to join consensus if it finds any, which catches the case where another instance is still signing.

4. The sentry architecture is what keeps your validator IP off the public internet

A validator without sentry nodes has its IP address visible in the P2P network. That is a DDoS target. Taking your validator offline long enough to miss 5% of blocks in a sliding window triggers jailing on Cosmos Hub.
The sentry pattern is straightforward: two or more public-facing full nodes handle all external P2P connections. The validator node only connects to the sentries, never to the broader network. Its IP is never gossiped to peers.
On the validator node, this means pex = false and persistent_peers pointing only to the sentry node IDs. On the sentry nodes, the validator node ID is listed in private_peer_ids so its address is never shared with the network.
Run sentries in at least two different geographic regions and on different providers. A DDoS that takes down one sentry is neutralised if the second is on a separate network.

These four are the ones that cause the most production incidents on Cosmos validators: the hardware under-specification, the auto-download failure mode, the migration double-sign window, and the missing sentry layer. The rest of the setup (Go installation, gaiad build, state sync, TMKMS configuration, on-chain registration) is more mechanical.
If you want the full setup with all the configuration files and commands from start to production, I wrote a detailed guide covering the complete process:
Cosmos Validator Setup: The Ultimate Step-by-Step Guide for 2026
Happy to answer questions in the comments if you are working through any of these.
