Tosh

Running a Midnight Node: Setup, Sync, and Monitoring

I set up a Midnight node on a spare Ubuntu box last month and hit pretty much every failure mode there is — 0 peers for 20 minutes, stuck on block 1 for an hour, then a surprise OOM kill that corrupted the database. This guide documents what actually works, including the "stuck on block 1" issue that seems to catch everyone off guard the first time.

Hardware Requirements

Before starting, confirm your hardware meets these minimums:

Component   Minimum                     Recommended
CPU         4 cores (x86_64 or ARM64)   8+ cores
RAM         8 GB                        16 GB
Storage     100 GB SSD                  500 GB NVMe SSD
Network     10 Mbps stable              100 Mbps
OS          Ubuntu 22.04 LTS            Ubuntu 22.04 LTS

Why SSD matters: Midnight, like Cardano, performs heavy ledger operations during sync. Mechanical drives will make initial sync take 10-20x longer and may cause peer disconnections as your node fails to keep up.

RAM consideration: The proof server (if running locally) requires an additional 4-8 GB during proof generation. On a node dedicated to block validation only, 8 GB is sufficient.
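
Before committing to a multi-hour sync, it's worth a quick sanity check of write throughput on the drive you plan to use. A minimal sketch using dd (the TESTDIR path is an assumption; point it at whatever you'll mount as /data):

```shell
#!/bin/sh
# Rough sequential-write check for the future data directory.
# TESTDIR is an assumed path -- use the volume you'll mount as /data.
# conv=fdatasync forces data to disk so the number isn't just page cache.
TESTDIR="${TESTDIR:-$HOME/midnight-node/data}"
mkdir -p "$TESTDIR"
dd if=/dev/zero of="$TESTDIR/ddtest" bs=1M count=64 conv=fdatasync 2>&1 | tail -1
rm -f "$TESTDIR/ddtest"
```

NVMe drives usually report hundreds of MB/s here; double digits suggest you'll hit the 10-20x sync penalty described above. Note dd only measures sequential writes, while ledger workloads are also random-I/O heavy, so treat this as a smoke test rather than a benchmark.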


Installation via Docker (Recommended)

Docker is the supported deployment method. It handles dependency management and makes upgrades straightforward.

Step 1: Install Docker

# Ubuntu 22.04 (Ubuntu's repos ship the Compose v2 plugin as docker-compose-v2;
# the docker-compose-plugin package name is only in Docker's own apt repo)
sudo apt update
sudo apt install -y docker.io docker-compose-v2
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
# Log out and back in for group changes

Step 2: Pull the Midnight node image

docker pull midnightntwrk/midnight-node:latest
# Or pin a specific version (recommended for production):
docker pull midnightntwrk/midnight-node:0.14.0

Step 3: Create a working directory and configuration

mkdir -p ~/midnight-node/{data,config,logs}
cd ~/midnight-node

Create docker-compose.yml:

version: '3.8'
services:
  midnight-node:
    image: midnightntwrk/midnight-node:0.14.0
    container_name: midnight-node
    restart: unless-stopped
    ports:
      - "9944:9944"   # RPC WebSocket
      - "9933:9933"   # RPC HTTP
      - "30333:30333" # P2P
    volumes:
      - ./data:/data
      - ./config:/config
      - ./logs:/logs
    environment:
      - RUST_LOG=info
    command: |
      --base-path /data
      --chain testnet
      --port 30333
      --rpc-port 9933
      --ws-port 9944
      --rpc-cors all
      --unsafe-rpc-external
      --name "my-midnight-node"
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"

Step 4: Start the node

docker compose up -d
docker compose logs -f midnight-node

Understanding the Initial Sync

When you start a fresh node, it goes through three phases:

Phase 1: Peer Discovery (seconds to minutes)
Your node announces itself to the network and discovers peers. You'll see logs like:

INFO sync  💤 Idle (0 peers), best: #0 (0x0000…0000)
INFO network  Discovered new external address
INFO network  Connected to peer 12D3Koo...

If you see Idle (0 peers) for more than 5 minutes, you have a connectivity issue (see troubleshooting below).

Phase 2: Header Sync (minutes to hours)
Your node downloads block headers first; headers sync much faster than full blocks:

INFO sync  ⚙️  Syncing 847.3 bps, target=#485231 (8 peers)

"bps" here means blocks per second. Rates of 100-1000 bps are normal during header sync.

Phase 3: Block Execution (hours to days depending on chain age)
Full block execution is the slow part — every transaction is re-executed and state is computed:

INFO sync  ⚙️  Syncing 12.1 bps, target=#485231 (8 peers)
INFO state  Applied block #102847

Rates of 5-50 bps during execution are normal. Don't panic.

How long does full sync take?

For Midnight testnet (still relatively young):

  • Hardware minimum: 6-12 hours
  • Hardware recommended: 2-4 hours
  • With fast NVMe SSD and strong CPU: 1-2 hours
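
If you want your own rough ETA from the numbers in the sync logs, the arithmetic is just (target - current) / bps. A small sketch; the sample figures are illustrative, taken from the log excerpts above:

```shell
#!/bin/bash
# eta_minutes CURRENT TARGET BPS -> whole minutes remaining at the observed rate.
eta_minutes() {
  local current=$1 target=$2 bps=$3
  awk -v c="$current" -v t="$target" -v r="$bps" \
    'BEGIN { if (r <= 0) { print "n/a"; exit } printf "%.0f\n", (t - c) / r / 60 }'
}

# At block 102847 of 485231, executing 12.1 blocks/s:
eta_minutes 102847 485231 12.1   # about 527 minutes, i.e. roughly 9 hours
```

The observed bps fluctuates a lot during execution, so re-sample it over a few minutes before trusting the estimate.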

Monitoring Block Height

Method 1: RPC polling

Once your node is running, poll its current best block:

# Check current synced block
curl -s -H "Content-Type: application/json" \
  -d '{"id":1,"jsonrpc":"2.0","method":"chain_getBlock","params":[]}' \
  http://localhost:9933 | python3 -c "
import json, sys
d = json.load(sys.stdin)
block = d['result']['block']['header']
print(f\"Block: #{int(block['number'], 16)}\")
print(f\"Parent hash: {block['parentHash'][:20]}...\")
"

Method 2: WebSocket subscription

For real-time monitoring:

# Install wscat if needed
npm install -g wscat

# -x sends one frame after connecting; -w keeps the socket open (seconds)
# so incoming new-head notifications are printed before wscat exits
wscat -c ws://localhost:9944 -w 30 \
  -x '{"id":1,"jsonrpc":"2.0","method":"chain_subscribeNewHeads","params":[]}'

Method 3: Simple sync check script

Save as check_sync.sh:

#!/bin/bash
LOCAL_BLOCK=$(curl -s -H "Content-Type: application/json" \
  -d '{"id":1,"jsonrpc":"2.0","method":"chain_getHeader","params":[]}' \
  http://localhost:9933 | python3 -c "
import json, sys
d = json.load(sys.stdin)
print(int(d['result']['number'], 16))
" 2>/dev/null)

if [ -z "$LOCAL_BLOCK" ]; then
  echo "RPC unreachable -- is the node running?"
  exit 1
fi

# Compare against a known good peer (replace with actual peer RPC)
# PEER_BLOCK=$(...)

echo "Local best block: #$LOCAL_BLOCK"
echo "Time: $(date)"

# Check if node is making progress by comparing with previous run
PREV_FILE="/tmp/midnight_prev_block"
if [ -f "$PREV_FILE" ]; then
  PREV_BLOCK=$(cat "$PREV_FILE")
  PROGRESS=$((LOCAL_BLOCK - PREV_BLOCK))
  echo "Progress since last check: +$PROGRESS blocks"
fi
echo "$LOCAL_BLOCK" > "$PREV_FILE"

Run on a schedule:

chmod +x check_sync.sh
watch -n 10 ./check_sync.sh

Diagnosing "Stuck on Block 1"

This is the most common issue new node operators hit. Symptoms:

INFO sync  💤 Idle (0 peers), best: #1 (0x1234…abcd)
# or
INFO sync  ⚙️  Syncing 0.0 bps, target=#485231 (3 peers)

Your node has peers but isn't advancing. Here's the diagnostic tree:

Check 1: Are peers actually connected?

curl -s -H "Content-Type: application/json" \
  -d '{"id":1,"jsonrpc":"2.0","method":"system_peers","params":[]}' \
  http://localhost:9933 | python3 -c "
import json, sys
peers = json.load(sys.stdin)['result']
print(f'Connected peers: {len(peers)}')
for p in peers:
    print(f\"  {p['peerId'][:20]}... best: #{p['bestNumber']}\")
"

If peers show bestNumber: 1, your entire peer set is also stuck — you may have hit isolated testnet nodes. Restart and wait for better peers.

Check 2: Is the network port accessible?

# From another machine or use online port checker
nc -zv YOUR_SERVER_IP 30333

If port 30333 isn't reachable from the internet, you'll only get local peers (usually none). Fix firewall rules:

# Ubuntu UFW
sudo ufw allow 30333/tcp
sudo ufw status

If behind NAT (home network), you need port forwarding on your router for port 30333.
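
Before blaming NAT, confirm something is actually bound to 30333 from the host's point of view; if nothing is listening locally, the problem is the container, not the router. A small sketch using ss (ships with iproute2 on Ubuntu):

```shell
#!/bin/bash
# Report whether any TCP listener is bound to the given port on this host.
port_listening() {
  ss -tln 2>/dev/null | grep -q ":$1 " && echo "listening" || echo "not-listening"
}

port_listening 30333
```

If this says listening but the external nc check still fails, the blocker is the firewall or router, and the UFW/port-forwarding fixes above apply.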

Check 3: Storage I/O performance

# Check if storage is the bottleneck
iostat -x 1 5

# Look at the %util column for your storage device
# >80% sustained means storage is saturating

If your storage is consistently >80% utilized, the node is I/O bottlenecked. Migrate to faster SSD.

Check 4: Memory pressure

free -h
# If available memory < 1GB, you have memory pressure

# Check if the node is being OOM-killed
docker inspect -f '{{.State.OOMKilled}}' midnight-node   # "true" means it was OOM-killed
dmesg | grep -i "out of memory" | tail -5

Check 5: Corrupt chain database

If the node was killed ungracefully (power loss, kill -9), the chain database may be corrupt:

docker compose down
# Back up and remove the data directory
mv data data.bak
mkdir data
docker compose up -d
# Node will re-sync from genesis

Resource Requirements During Operation

After initial sync completes, steady-state resource usage:

CPU: 5-20% on a 4-core machine during normal operation. Spikes to 80%+ during epoch transitions.

RAM: 2-4 GB for the node process alone. Add proof server if running locally.

Storage growth rate: Approximately 5-15 GB/month depending on network activity (testnet figures — mainnet will differ).

Network: 50-200 MB/hour download, 20-50 MB/hour upload for a non-archival node.
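
To turn that growth-rate estimate into real data for your node, append a timestamped size sample on a schedule (a daily cron entry works well). A sketch; the directory and log paths are assumptions matching the layout from Step 3:

```shell
#!/bin/bash
# Record one timestamped size sample (KB) of the chain data directory.
# DATA_DIR and LOG default to the layout created earlier in this guide.
DATA_DIR="${DATA_DIR:-$HOME/midnight-node/data}"
LOG="${LOG:-$HOME/midnight-node/logs/disk_growth.log}"
mkdir -p "$(dirname "$LOG")"
SIZE_KB=$(du -sk "$DATA_DIR" 2>/dev/null | cut -f1)
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) ${SIZE_KB:-0}" >> "$LOG"
tail -1 "$LOG"
```

A few weeks of samples makes it easy to project when the disk fills: diff the first and last entries and divide by the elapsed days.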


Verifying Your Node is Synced and Healthy

When sync completes, you'll see:

INFO sync  💤 Idle (12 peers), best: #485231 (0xabcd…1234)

"Idle" with the current chain head block number = fully synced.

Health check script (health_check.sh):

#!/bin/bash
set -euo pipefail

RPC="http://localhost:9933"

# Get sync state
SYNC=$(curl -s -H "Content-Type: application/json" \
  -d '{"id":1,"jsonrpc":"2.0","method":"system_health","params":[]}' \
  $RPC)

IS_SYNCING=$(echo "$SYNC" | python3 -c "import json,sys; print(json.load(sys.stdin)['result']['isSyncing'])")
PEERS=$(echo "$SYNC" | python3 -c "import json,sys; print(json.load(sys.stdin)['result']['peers'])")

echo "Peers: $PEERS"
echo "Syncing: $IS_SYNCING"

if [ "$IS_SYNCING" = "False" ]; then
  echo "✅ Node is synced and healthy"
else
  # Get current block
  BLOCK=$(curl -s -H "Content-Type: application/json" \
    -d '{"id":1,"jsonrpc":"2.0","method":"chain_getHeader","params":[]}' \
    $RPC | python3 -c "import json,sys; print(int(json.load(sys.stdin)['result']['number'], 16))")
  echo "⏳ Still syncing, current block: #$BLOCK"
fi

# Check peer connectivity
if [ "$PEERS" -lt 3 ]; then
  echo "⚠️  Warning: low peer count ($PEERS). Check firewall rules."
fi

Run this check after every deployment or restart.


Monitoring with Docker Logs

The Midnight node logs are structured and informative. Key patterns to watch:

# Follow logs in real time
docker compose logs -f midnight-node

# Filter for errors only
docker compose logs --since 1h midnight-node | grep -E "ERROR|WARN"

# Check sync rate (last 10 sync messages)
docker compose logs --since 1h midnight-node | grep "Syncing" | tail -10

Concerning log patterns:

  • WARN network Dropping slow peer: one or two are fine; many indicate network issues
  • ERROR import Failed to import block: Usually means corrupt database or invalid block — investigate immediately
  • WARN sync Reorg at block: Chain reorganization — normal if infrequent, concerning if frequent
  • WARN peering Peer disconnected: Normal to see occasionally; constant disconnects indicate networking or resource issues
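
A quick way to quantify these patterns is to pipe recent logs through an awk tally. A sketch; the match strings follow the examples above and may need adjusting to your node version's exact log wording:

```shell
#!/bin/bash
# Tally concerning patterns from node logs supplied on stdin, e.g.:
#   docker compose logs --since 1h midnight-node | bash log_triage.sh
summarize() {
  awk '
    /Dropping slow peer/     { slow++ }
    /Failed to import block/ { import_err++ }
    /Reorg/                  { reorg++ }
    /Peer disconnected/      { disc++ }
    END {
      printf "slow_peers=%d import_errors=%d reorgs=%d disconnects=%d\n",
             slow, import_err, reorg, disc
    }'
}

[ -t 0 ] || summarize   # only consume stdin when something is piped in
```

Run it hourly and any nonzero import_errors count is worth an immediate look; the other counters matter mainly as trends.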

Operational Best Practices

Automatic restart on failure:
The restart: unless-stopped in the compose file handles this. Verify:

docker inspect midnight-node | grep RestartPolicy

Log rotation: Already configured in the compose file above via the json-file driver. Note those rotated logs live under /var/lib/docker, not in ./logs. Verify the limits took effect:

docker inspect -f '{{.HostConfig.LogConfig}}' midnight-node
docker system df  # Check Docker's disk usage

Monitoring alerting: Set up a simple cron job to alert if the node falls behind:

# Add to crontab: crontab -e
*/5 * * * * /home/user/midnight-node/health_check.sh >> /home/user/midnight-node/logs/health.log 2>&1

Backup strategy: The node database can be re-synced from genesis, so it doesn't need backup. What does need backup: any private keys used for block authoring (if you're running a validator, which requires separate setup).

Upgrade procedure:

docker compose pull          # Get new image
docker compose down          # Stop current node
docker compose up -d         # Start with new image
docker compose logs -f       # Watch startup

Common Questions

Q: Can I run this on a VPS/cloud server?

Yes. Most cloud providers work. Avoid burstable CPU instances (AWS T-series, GCP E2) for sustained sync workloads — they'll throttle under sustained load. Use compute-optimized instances with dedicated CPU.

Q: Do I need to keep port 30333 open?

For syncing: yes, strongly recommended. Without inbound P2P connections, your node relies entirely on outbound connections and has fewer peers, which slows sync and makes you vulnerable to being isolated from the main network.

Q: How do I know if my node is on the canonical chain?

Compare your node's best block hash with hashes from the Midnight block explorer. If they match at the same height, you're on the canonical chain.
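
To script that comparison, you can ask the local node for its hash at a specific height with chain_getBlockHash, the standard Substrate RPC for this (the port matches the compose file above):

```shell
#!/bin/bash
# Build a JSON-RPC payload, then fetch the local node's block hash at a height.
rpc_payload() {
  printf '{"id":1,"jsonrpc":"2.0","method":"%s","params":[%s]}' "$1" "$2"
}

hash_at_height() {
  curl -s -H "Content-Type: application/json" \
    -d "$(rpc_payload chain_getBlockHash "$1")" \
    http://localhost:9933 |
    python3 -c "import json,sys; print(json.load(sys.stdin)['result'])"
}

# Compare the output with the explorer's hash at the same height, e.g.:
# hash_at_height 485231
```

If the hashes differ at the same height, your node is on a fork; check the reorg warnings in the logs before assuming explorer error.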

Q: What's the difference between an archival and non-archival node?

A non-archival node (default) prunes old state to save space. An archival node (--pruning archive) keeps all historical state, useful for querying historical data. Archival nodes require significantly more storage.


In my experience, 90% of "stuck at block 1" issues come down to two things: port 30333 isn't reachable from outside, or you're running on spinning rust and the I/O just can't keep up. Check those before going deeper into the diagnostic tree.
