Kubilay İşen

Posted on Oct 20

Building a High-Availability Vault Cluster with Docker and Raft Storage

#hashicorpvault #vault #docker #devops

Introduction

HashiCorp Vault is one of the most widely used secrets management tools in the industry. However, setting up a highly available Vault cluster can be intimidating. In this article, I'll walk you through building a 3-node Vault cluster using Docker with scripted automatic unsealing, Raft-based integrated storage, and infrastructure-as-code automation.

By the end of this guide, you'll have a resilient secrets management setup that keeps serving requests when a single node fails and that can be extended with additional nodes.

Why Vault? Why High-Availability?

The Problem

In modern infrastructure:

Secrets are scattered across multiple systems (databases, APIs, certificates)
No single source of truth for credential rotation
Compliance requirements demand audit trails
Manual secret management is error-prone

The Solution

Vault provides:

Centralized secret management - Single source of truth
Encryption as a service - Encrypt/decrypt without exposing keys
Dynamic credentials - Automatically generate short-lived credentials
Audit logging - Complete trail of who accessed what and when
High availability - Never lose access to your secrets

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                 Docker Compose Network                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  Vault Node  │  │  Vault Node  │  │  Vault Node  │   │
│  │     (1)      │  │     (2)      │  │     (3)      │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│         │                 │                  │          │
│  ┌──────┴─────────────────┼──────────────────┴────┐     │
│  │                  Raft Consensus                │     │
│  └────────────────────────────────────────────────┘     │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │   Nginx Load Balancer (Optional)                 │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
└─────────────────────────────────────────────────────────┘

Our setup features:

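3 Vault nodes - The cluster keeps its quorum (2 of 3) when one node fails
Raft storage backend - No external dependencies (unlike Consul)
Auto-unsealing - A monitor script unseals restarted nodes without manual intervention (this is scripted unsealing, not Vault's KMS-based auto-unseal)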
Docker Compose orchestration - Easy to deploy and manage
Health checks - Automatic failure detection

Prerequisites

Before we begin, make sure you have:

- Docker & Docker Compose (the compose file uses file format version 3.8)
- [go-task](https://taskfile.dev/#/installation), the Taskfile CLI, for automation
- jq (the init script uses it to parse JSON)
- curl (for API testing)

Install the requirements:

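# macOS (the Homebrew formula is go-task; "brew install task" installs Taskwarrior instead)
brew install docker docker-compose go-task jq curl

# Linux (Debian/Ubuntu); go-task is not available through apt, so install it via snap
sudo apt-get install docker.io docker-compose jq curl
sudo snap install task --classic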

Project Structure

vault-docker-cluster/
├── docker-compose.yaml           # Services definition
├── Dockerfile.vault              # Custom Vault image
├── Taskfile.yml                  # Task automation
├── init-and-generate-unseal.sh   # Cluster initialization
├── auto-unseal-monitor.sh        # Monitoring script
├── vault-1/
│   ├── config/
│   │   ├── vault.hcl            # Vault configuration
│   │   └── unseal.sh            # Auto-generated unseal script
│   └── data/                     # Raft data storage
├── vault-2/                      # (same structure)
└── vault-3/                      # (same structure)

Step 1: Docker Compose Configuration

Let's start with the docker-compose.yaml. This file orchestrates three Vault nodes:

version: '3.8'

networks:
  vault_net:
    driver: bridge

services:
  vault-1:
    image: vaultdockercluster:1.20
    restart: unless-stopped
    volumes:
      - ./vault-1/config:/vault/config
      - ./vault-1/data:/vault/data
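    cap_add:
      - IPC_LOCK                    # Lets Vault call mlock (only used when disable_mlock = false)
    environment:
      VAULT_ADDR: http://127.0.0.1:8200   # CLI default is https; this listener is plain HTTP
    ports:
      - "8200:8200"                 # Publish the API and UI to the host (vault-1 only)
    entrypoint:
      - vault
      - server
      - -config=/vault/config/vault.hcl
    ulimits:
      memlock: -1                   # Unlimited memory lock
      nofile:
        soft: 65535
        hard: 65535
    networks:
      - vault_net
    healthcheck:
      # vault status exits 0 when unsealed and 2 when sealed, so sealed nodes show as unhealthy
      test: ["CMD", "vault", "status"]
      interval: 10s
      timeout: 5s
      retries: 5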
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G

  # vault-2 and vault-3 use the same configuration with their own ./vault-2 and ./vault-3 volume paths

Key Configuration Points:

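| Setting | Purpose |
|---|---|
| `IPC_LOCK` | Allows Vault to lock memory pages with mlock, which prevents secrets from being swapped to disk. It only takes effect when `disable_mlock` is `false`; the configurations in Step 2 disable mlock, as HashiCorp recommends for Raft storage |
| `memlock: -1` | Removes the limit on how much memory the container may lock |
| `nofile` | Raises the open-file limit so Vault can handle many concurrent connections |
| `healthcheck` | Detects node failures automatically |
| `networks` | Isolated network for inter-node communication |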

Step 2: Vault Configuration (HCL)

Each node has its own vault.hcl configuration:

vault-1:

storage "raft" {
  path    = "/vault/data"
  node_id = "vault-1"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable   = true
}

api_addr     = "http://vault-1:8200"
cluster_addr = "http://vault-1:8201"

ui = true
disable_mlock = true

vault-2:

storage "raft" {
  path    = "/vault/data"
  node_id = "vault-2"

  retry_join {
    leader_api_addr = "http://vault-1:8200"
  }
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable   = true
}

api_addr     = "http://vault-2:8200"
cluster_addr = "http://vault-2:8201"

ui = true
disable_mlock = true

vault-3:

storage "raft" {
  path    = "/vault/data"
  node_id = "vault-3"

  retry_join {
    leader_api_addr = "http://vault-1:8200"
  }
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable   = true
}

api_addr     = "http://vault-3:8200"
cluster_addr = "http://vault-3:8201"

ui = true
disable_mlock = true
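
The three files differ only in node_id, the advertised addresses, and the retry_join block. As a minimal sketch, they could be generated from one shell function instead of being maintained by hand. The function name render_vault_hcl and the idea of generating the files are assumptions for illustration, not part of the original repository:

```shell
#!/bin/sh
# Hypothetical helper: print a vault.hcl for the given node.
# $1 = node name (e.g. vault-2), $2 = leader to join (empty for the first node).
render_vault_hcl() {
  node=$1
  leader=$2

  echo 'storage "raft" {'
  echo '  path    = "/vault/data"'
  echo "  node_id = \"$node\""
  if [ -n "$leader" ]; then
    echo ''
    echo '  retry_join {'
    echo "    leader_api_addr = \"http://$leader:8200\""
    echo '  }'
  fi
  echo '}'
  cat <<EOF

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = true
}

api_addr     = "http://$node:8200"
cluster_addr = "http://$node:8201"

ui = true
disable_mlock = true
EOF
}

# Usage (nothing is written unless you redirect the output):
#   render_vault_hcl vault-1 ""      > vault-1/config/vault.hcl
#   render_vault_hcl vault-2 vault-1 > vault-2/config/vault.hcl
```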

Raft vs Other Backends:

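| Aspect | Raft | Consul | S3 |
|---|---|---|---|
| Setup Complexity | Simple | Complex | Simple |
| Dependencies | None | Consul cluster needed | AWS account |
| Cost | Free | Free (self-hosted) | Billed per request and per GB stored |
| Performance | Excellent | Good | Slower |
| High Availability | Yes | Yes | No |
| Best For | Self-hosted HA | Existing Consul deployments | Single-node Vault on AWS |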

Step 3: Automated Setup with Taskfile

The Taskfile.yml automates common operations:

version: '3'

vars:
  VAULT_ADDR: 'http://127.0.0.1:8200'
  KEY_SHARES: 5
  KEY_THRESHOLD: 3

tasks:
  bootstrap:
    desc: Full bootstrap - start cluster, initialize, and setup
    cmds:
      - task: up
      - echo "⏳ Waiting for containers to be ready (15 seconds)..."
      - sleep 15
      - task: init
      - echo ""
      - echo "📋 Credentials saved! Review them above."
      - echo "Press Enter to continue with cluster setup..."
      - read _
      - task: setup-cluster
      - echo ""
      - echo "✅ Bootstrap complete!"

  up:
    desc: Start the Vault cluster
    cmds:
      - docker-compose up -d

  init:
    desc: Initialize vault-1 and auto-generate unseal.sh scripts
    cmds:
      - ./init-and-generate-unseal.sh
    preconditions:
      - sh: docker-compose ps vault-1 | grep -q "Up"
        msg: "vault-1 is not running. Run 'task up' first."
      - sh: command -v jq >/dev/null 2>&1
        msg: "jq is required but not installed. Install with: brew install jq"

  setup-cluster:
    desc: Complete cluster setup - unseal vault-1, join and unseal vault-2 and vault-3
    cmds:
      - task: unseal-vault-1
      - task: join-vault-2
      - task: unseal-vault-2
      - task: join-vault-3
      - task: unseal-vault-3

Usage is straightforward:

# Start everything
task bootstrap

# Or step by step
task up
task init
task setup-cluster
# Check status
docker-compose ps
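
The bootstrap task waits a fixed 15 seconds before initializing. A polling loop is usually more robust than a fixed sleep, because it continues as soon as the container is ready and fails clearly when it never becomes ready. The sketch below shows a generic retry helper; the function name wait_until and its arguments are assumptions for illustration, not something the Taskfile defines:

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, remaining args = command.
wait_until() {
  max_attempts=$1
  delay=$2
  shift 2

  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Do not sleep after the final failed attempt
    [ "$attempt" -le "$max_attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage: wait up to 30 x 2 seconds for the vault-1 container to accept exec calls
#   wait_until 30 2 docker-compose exec -T vault-1 true
```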

Step 4: Initialization and Unsealing

The initialization script (init-and-generate-unseal.sh) handles the critical setup:

Calls vault operator init against vault-1.
Prints the unseal keys + root token (and writes them to vault-credentials-<timestamp>.md and vault-init-keys.json).
Generates vault-*/config/unseal.sh helper scripts.

#!/bin/bash
set -e

echo "🔐 Initializing Vault Cluster..."
echo ""

# Wait for vault-1 to respond (an uninitialized Vault is still sealed, so we only check that it answers and is not yet initialized)
echo "⏳ Waiting for vault-1 to be ready..."
max_attempts=30
attempt=0

while [ $attempt -lt $max_attempts ]; do
    # Check if we can connect to Vault and get a status response
    # vault status exits with code 2 when sealed, so "|| true" keeps "set -e" from aborting the script
    status=$(docker-compose exec -T vault-1 sh -c "export VAULT_ADDR='http://127.0.0.1:8200' && vault status -format=json 2>&1" || true)

    # Check if we got valid JSON output (meaning Vault is responding)
    if echo "$status" | jq -e . >/dev/null 2>&1; then
        initialized=$(echo "$status" | jq -r '.initialized // false')

        if [ "$initialized" = "false" ]; then
            echo "✅ Vault is ready for initialization"
            break
        elif [ "$initialized" = "true" ]; then
            echo "❌ Error: Vault is already initialized!"
            echo ""
            echo "If you want to re-initialize:"
            echo "  1. Run: task reset"
            echo "  2. Run: task bootstrap"
            exit 1
        fi
    else
        # Vault is not responding yet, keep waiting
        echo "⏳ Waiting for Vault to start (attempt $((attempt + 1))/$max_attempts)..."
    fi

    attempt=$((attempt + 1))
    sleep 2
done

if [ $attempt -eq $max_attempts ]; then
    echo "❌ Timeout waiting for vault-1 to be ready"
    echo ""
    echo "Check logs with: docker-compose logs vault-1"
    exit 1
fi

echo ""

# Initialize vault-1 and capture output
echo "🔑 Initializing Vault..."
INIT_OUTPUT=$(docker-compose exec -T vault-1 sh -c "export VAULT_ADDR='http://127.0.0.1:8200' && vault operator init -key-shares=5 -key-threshold=3 -format=json")

# Parse the JSON output
UNSEAL_KEY_1=$(echo "$INIT_OUTPUT" | jq -r '.unseal_keys_b64[0]')
UNSEAL_KEY_2=$(echo "$INIT_OUTPUT" | jq -r '.unseal_keys_b64[1]')
UNSEAL_KEY_3=$(echo "$INIT_OUTPUT" | jq -r '.unseal_keys_b64[2]')
UNSEAL_KEY_4=$(echo "$INIT_OUTPUT" | jq -r '.unseal_keys_b64[3]')
UNSEAL_KEY_5=$(echo "$INIT_OUTPUT" | jq -r '.unseal_keys_b64[4]')
ROOT_TOKEN=$(echo "$INIT_OUTPUT" | jq -r '.root_token')

echo "════════════════════════════════════════════════════════════════"
echo "⚠️  SAVE THESE CREDENTIALS SECURELY - THEY CANNOT BE RECOVERED!"
echo "════════════════════════════════════════════════════════════════"
echo ""
echo "Unseal Key 1: $UNSEAL_KEY_1"
echo "Unseal Key 2: $UNSEAL_KEY_2"
echo "Unseal Key 3: $UNSEAL_KEY_3"
echo "Unseal Key 4: $UNSEAL_KEY_4"
echo "Unseal Key 5: $UNSEAL_KEY_5"
echo ""
echo "Root Token: $ROOT_TOKEN"
echo ""
echo "════════════════════════════════════════════════════════════════"
echo ""

# Save to a backup file; the umask makes every file created from here on readable by the owner only
umask 077
BACKUP_FILE="vault-credentials-$(date +%Y%m%d-%H%M%S).md"
cat > "$BACKUP_FILE" << EOF
Vault Cluster Initialization - $(date)
════════════════════════════════════════════════════════════════

Unseal Key 1: $UNSEAL_KEY_1
Unseal Key 2: $UNSEAL_KEY_2
Unseal Key 3: $UNSEAL_KEY_3
Unseal Key 4: $UNSEAL_KEY_4
Unseal Key 5: $UNSEAL_KEY_5

Root Token: $ROOT_TOKEN

════════════════════════════════════════════════════════════════
⚠️  Store this file securely and delete it from this location!
EOF

echo "✅ Credentials saved to: $BACKUP_FILE"
echo ""

# Generate unseal.sh for vault-1
echo "📝 Generating unseal.sh scripts..."

cat > vault-1/config/unseal.sh << EOF
#!/bin/sh
set -e
export VAULT_ADDR='http://127.0.0.1:8200'
vault operator unseal $UNSEAL_KEY_1
vault operator unseal $UNSEAL_KEY_2
vault operator unseal $UNSEAL_KEY_3
echo "✅ Vault unsealed successfully"
EOF

chmod +x vault-1/config/unseal.sh

# Copy to vault-2
cp vault-1/config/unseal.sh vault-2/config/unseal.sh

# Copy to vault-3
cp vault-1/config/unseal.sh vault-3/config/unseal.sh

echo "✅ Created unseal.sh in vault-1/config/"
echo "✅ Created unseal.sh in vault-2/config/"
echo "✅ Created unseal.sh in vault-3/config/"
echo ""

# Also save JSON format for automation
echo "$INIT_OUTPUT" | jq '.' > vault-init-keys.json
echo "✅ Saved JSON format to: vault-init-keys.json"
echo ""

echo "════════════════════════════════════════════════════════════════"
echo "🎉 Initialization Complete!"
echo "════════════════════════════════════════════════════════════════"
echo ""
echo "Next steps:"
echo "  1. Secure the credentials file: $BACKUP_FILE"
echo "  2. Run: task setup-cluster"
echo "  3. Or manually:"
echo "     - task unseal-vault-1"
echo "     - task join-vault-2 && task unseal-vault-2"
echo "     - task join-vault-3 && task unseal-vault-3"
echo ""

Important Security Notes:

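⚠️ Critical: Store the root token and unseal keys safely:

Save them in a password manager
Never commit them to version control (add vault-credentials-*.md, vault-init-keys.json, and vault-*/config/unseal.sh to .gitignore)
Remember that each generated unseal.sh holds enough plaintext keys to unseal Vault, so anyone who can read it can unseal the cluster
For production, consider Vault's built-in auto-unseal with a cloud KMS (AWS, GCP, Azure)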
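
Because the init script writes unseal keys and the root token to disk, it helps to create those files with owner-only permissions. A minimal sketch, assuming a hypothetical helper named write_secret_file that is not part of the original script:

```shell
#!/bin/sh
# Hypothetical helper: write stdin to a file that only the owner can read or write.
# $1 = destination path.
write_secret_file() {
  dest=$1
  (
    # The subshell keeps the stricter umask from leaking into the caller
    umask 077
    cat > "$dest"
  )
  # Also tighten the mode in case the file already existed with looser permissions
  chmod 600 "$dest"
}

# Usage inside init-and-generate-unseal.sh (INIT_OUTPUT is the variable the script already sets):
#   echo "$INIT_OUTPUT" | write_secret_file vault-init-keys.json
```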

Step 5: Building the Docker Image

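The docker-compose.yaml from Step 1 references a custom image, vaultdockercluster:1.20, so build it before running task up. A minimal Dockerfile.vault: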

FROM hashicorp/vault:1.20
ENV TZ=Europe/Istanbul

Build it with:

docker build -f Dockerfile.vault -t vaultdockercluster:1.20 .

Step 6: Raft Cluster Formation

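After initializing and unsealing vault-1, join the other nodes. vault-2 and vault-3 also list vault-1 under retry_join, so they keep trying to join on their own; the explicit commands below are useful when you want to control the order. Each joined node must be unsealed before it becomes a healthy peer, and list-peers requires a valid token:

# Join vault-2 to the cluster
docker-compose exec -e VAULT_ADDR=http://127.0.0.1:8200 vault-2 \
  vault operator raft join http://vault-1:8200

# Join vault-3 to the cluster
docker-compose exec -e VAULT_ADDR=http://127.0.0.1:8200 vault-3 \
  vault operator raft join http://vault-1:8200

# Verify cluster status (replace the placeholder with your root token)
docker-compose exec -e VAULT_ADDR=http://127.0.0.1:8200 -e VAULT_TOKEN="your-root-token" \
  vault-1 vault operator raft list-peers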

Expected output:

Node ID    Address            State       Voter
------     -------            -----       -----
vault-1    vault-1:8201       leader      true
vault-2    vault-2:8201       follower    true
vault-3    vault-3:8201       follower    true
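
For scripted checks, the table above can be parsed with awk. A minimal sketch, assuming the column layout shown in the expected output; the function names raft_leader and raft_voter_count are hypothetical:

```shell
#!/bin/sh
# Both helpers read `vault operator raft list-peers` output on stdin and
# assume two header lines followed by one four-column row per node.

# Print the node ID of the current leader.
raft_leader() {
  awk 'NR > 2 && $3 == "leader" { print $1 }'
}

# Print how many peers are voters.
raft_voter_count() {
  awk 'NR > 2 && $4 == "true" { n++ } END { print n + 0 }'
}

# Usage (VAULT_ADDR and VAULT_TOKEN must be set inside the container):
#   docker-compose exec -T vault-1 vault operator raft list-peers | raft_leader
```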

Complete Workflow: From Zero to Hero

Here's the complete startup sequence:

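# 1. Build the image and start the containers
docker build -f Dockerfile.vault -t vaultdockercluster:1.20 .
task up

# 2. Watch the logs (Ctrl+C stops following)
docker-compose logs -f vault-1

# 3. Initialize the cluster
task init

# Save the credentials somewhere safe!

# 4. Unseal vault-1, then join and unseal vault-2 and vault-3
task setup-cluster

# 5. Verify the cluster (replace the placeholder with your root token)
docker-compose exec -e VAULT_ADDR=http://127.0.0.1:8200 -e VAULT_TOKEN="your-root-token" \
  vault-1 vault operator raft list-peers

# 6. Access the UI (port 8200 must be published to the host; use xdg-open on Linux)
open http://localhost:8200/ui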

Testing Your Cluster

Test 1: High Availability

Kill the leader node and watch the cluster recover:

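# Kill vault-1 (the leader)
docker-compose kill vault-1

# Check who's the new leader (the election takes a few seconds)
docker-compose exec -e VAULT_ADDR=http://127.0.0.1:8200 -e VAULT_TOKEN="your-root-token" \
  vault-2 vault operator raft list-peers

# Bring it back
docker-compose start vault-1

# Verify recovery (vault-1 comes back sealed and rejoins as a follower once it is unsealed)
docker-compose ps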

Auto-Unseal Monitor

When a Vault container restarts, the node comes back sealed. The auto-unseal monitor detects this on its next poll and unseals the node using the stored unseal keys.

The auto-unseal-monitor.sh script runs in its own container (vault-unsealer):

Polls each Vault node every 30 seconds via vault status.
If a node transitions to sealed, reads the unseal keys from the mounted unseal.sh file and runs the three vault operator unseal commands.
Retries up to three times per incident. Logs show timestamps and outcomes.

Because the helper executes the same unseal script stored on disk, protect the vault-*/config/unseal.sh files and remove them when no longer needed.
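
The monitor script itself is not reproduced in this article. As a rough sketch of the polling logic described above, the following relies on the documented exit codes of vault status (0 = unsealed, 2 = sealed, 1 = error). The function names and the vault_status_cmd hook are assumptions for illustration, not the contents of auto-unseal-monitor.sh:

```shell
#!/bin/sh
# Translate the exit code of `vault status` into a state name.
classify_status() {
  case "$1" in
    0) echo unsealed ;;
    2) echo sealed ;;
    *) echo error ;;
  esac
}

# Hypothetical hook: how to ask one node for its status.
# This version assumes the Vault CLI is available and the node name resolves
# on the compose network.
vault_status_cmd() {
  VAULT_ADDR="http://$1:8200" vault status
}

# One polling pass: print "<node> <state>" for every node given as an argument.
poll_once() {
  for node in "$@"; do
    rc=0
    vault_status_cmd "$node" >/dev/null 2>&1 || rc=$?
    echo "$node $(classify_status "$rc")"
  done
}

# Usage: a monitor would call this in a loop and unseal any node reported as sealed
#   while true; do poll_once vault-1 vault-2 vault-3; sleep 30; done
```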

Test 2: Secret Storage

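Store and retrieve a secret:

# Open a shell in the vault-1 container
docker-compose exec vault-1 sh

# Login
export VAULT_TOKEN="your-root-token"
export VAULT_ADDR="http://127.0.0.1:8200"

# A freshly initialized server has no KV engine mounted at secret/, so enable KV v2 first
vault secrets enable -path=secret kv-v2

# Create a secret
vault kv put secret/my-app/database \
  username=admin \
  password=super-secret-password

# Retrieve it
vault kv get secret/my-app/database

# Read it via API (run this from the host if curl is not available in the container;
# that requires port 8200 to be published and VAULT_TOKEN to be exported on the host)
curl -H "X-Vault-Token: $VAULT_TOKEN" \
  http://localhost:8200/v1/secret/data/my-app/database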

Test 3: Encryption as a Service

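# Enable transit engine
vault secrets enable transit

# Create an encryption key
vault write -f transit/keys/my-key

# Encrypt data (transit expects the plaintext to be base64-encoded)
vault write transit/encrypt/my-key plaintext="$(base64 < data.txt | tr -d '\n')"

# Decrypt it (the returned plaintext field is base64-encoded, so decode it afterwards)
vault write transit/decrypt/my-key ciphertext=vault:v1:...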
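
The transit engine only accepts base64-encoded plaintext and returns base64-encoded plaintext on decrypt, so scripts usually wrap both directions. A minimal sketch; the helper names b64_encode and b64_decode are hypothetical:

```shell
#!/bin/sh
# Encode a string as single-line base64 (some base64 implementations wrap long output).
b64_encode() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a base64 string back to the original text.
b64_decode() {
  printf '%s' "$1" | base64 --decode
}

# Usage with transit (my-key is the key created above):
#   vault write transit/encrypt/my-key plaintext="$(b64_encode 'my secret data')"
#   b64_decode "<plaintext field returned by transit/decrypt/my-key>"
```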

Monitoring and Maintenance

Check Cluster Health

# Status of a single node (repeat for vault-2 and vault-3)
docker-compose exec vault-1 vault status

# Raft peer status
docker-compose exec vault-1 vault operator raft list-peers

# Server logs filtered for errors (audit logs are separate and require an audit device)
docker-compose logs vault-1 | grep ERROR

# Container resource usage
docker stats
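
Vault also exposes an unauthenticated health endpoint, /v1/sys/health, whose HTTP status code encodes the node state. A minimal sketch that maps the default status codes to readable states; the function names are hypothetical, and the URL assumes port 8200 is reachable from where the script runs:

```shell
#!/bin/sh
# Map the default /v1/sys/health status codes to a readable state.
health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown ($1)" ;;
  esac
}

# Query a node and print its state. $1 = base URL, e.g. http://127.0.0.1:8200
check_health() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1/v1/sys/health")
  health_state "$code"
}

# Usage:
#   check_health http://127.0.0.1:8200
```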

Common Issues and Solutions

| Issue | Cause | Solution |
|---|---|---|
| Node won't join | Already part of a cluster | Run `task reset` and `task bootstrap` |
| Connection refused | Node not running | Check with `docker-compose ps` |
| Memory locked | mlock issues | Check `ulimits` and `IPC_LOCK` capability |

Production Considerations

1. Enable TLS/HTTPS

listener "tcp" {
  address            = "0.0.0.0:8200"
  tls_cert_file      = "/vault/config/cert.pem"
  tls_key_file       = "/vault/config/key.pem"
}

2. Enable Audit Logging

# Audit devices are enabled at runtime through the CLI or API, not in vault.hcl
vault audit enable file file_path=/vault/logs/audit.log

3. Configure Storage Snapshots

# Backup Raft data
vault operator raft snapshot save vault-backup.snap

# Restore from snapshot
vault operator raft snapshot restore -force vault-backup.snap
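
Snapshots are most useful when they are taken regularly and old ones are rotated out. A minimal sketch of a rotation helper, assuming snapshots are saved with timestamped names; the function name prune_snapshots, the backups directory, and the naming scheme are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest N snapshots in a directory.
# Assumes files are named vault-<timestamp>.snap so that name order equals age order.
# $1 = directory, $2 = number of snapshots to keep.
prune_snapshots() {
  dir=$1
  keep=$2

  total=$(find "$dir" -maxdepth 1 -type f -name 'vault-*.snap' | wc -l | tr -d ' ')
  excess=$((total - keep))
  [ "$excess" -gt 0 ] || return 0

  # Oldest names sort first, so delete the first $excess entries
  find "$dir" -maxdepth 1 -type f -name 'vault-*.snap' | sort | head -n "$excess" |
    while IFS= read -r snapshot; do
      rm -f -- "$snapshot"
    done
}

# Usage (run where VAULT_ADDR and VAULT_TOKEN are set):
#   vault operator raft snapshot save "backups/vault-$(date +%Y%m%d-%H%M%S).snap"
#   prune_snapshots backups 7
```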

4. Set Resource Limits

deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G

Scaling Considerations

Adding More Nodes

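# Create the vault-4 directories; copy only the config, never another node's Raft data
mkdir -p vault-4/config vault-4/data
cp vault-2/config/vault.hcl vault-4/config/vault.hcl

# Update node_id, api_addr, and cluster_addr; the retry_join block keeps pointing at vault-1
# (on macOS use: sed -i '' 's/vault-2/vault-4/g' ...)
sed -i 's/vault-2/vault-4/g' vault-4/config/vault.hcl

# Add a vault-4 service to docker-compose.yaml, start it, then unseal it
docker-compose up -d vault-4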
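
Node count affects fault tolerance in a specific way: Raft needs a majority (quorum) of voters to stay available, so an even number of nodes adds no tolerance over the next lower odd number. A small sketch of the arithmetic; the function names are hypothetical:

```shell
#!/bin/sh
# Smallest majority of n voters.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many voters can fail while a majority remains.
raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print the numbers for common cluster sizes.
for n in 3 4 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_fault_tolerance "$n")"
done
# nodes=3 quorum=2 tolerated_failures=1
# nodes=4 quorum=3 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

Going from three to four nodes raises the quorum to three without tolerating more failures, which is why five nodes is the usual next step.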

Load Balancing

upstream vault {
    server vault-1:8200;
    server vault-2:8200;
    server vault-3:8200;
}

server {
    listen 80;
    location / {
        proxy_pass http://vault;
    }
}

Advanced: Disaster Recovery

Scenario: Complete Cluster Failure

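# 1. Reset everything
task reset

# 2. Bring up a fresh, initialized, and unsealed cluster
task bootstrap

# 3. Restore the snapshot into the running cluster (needs VAULT_ADDR and a valid token)
vault operator raft snapshot restore -force vault-backup.snap

# After the restore, the cluster uses the unseal keys and tokens of the cluster that produced the snapshot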

Scenario: Corrupted Raft State

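# 1. Stop all nodes
task down

# 2. Clean data directories (vault.db and the raft/ directory both live here)
rm -rf vault-*/data/*

# 3. Start the nodes and initialize a fresh cluster
task bootstrap

# 4. Restore from a known-good snapshot
vault operator raft snapshot restore -force vault-backup.snap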

Conclusion

You now have a working, highly available Vault cluster that you can harden for production, with:

✅ Three-node Raft cluster for high availability

✅ Automated deployment via Docker Compose

✅ Task automation for common operations

✅ Auto-unsealing capability

✅ Health monitoring and failure detection

✅ Scalable architecture ready for growth

About the Author

This article demonstrates a practical approach to secrets management using open-source tools. For production deployments, consider consulting with security specialists to ensure compliance with your organization's security requirements.

Have questions? Share them in the comments below!

GitHub Repository: vault-docker-cluster
