Many companies moving to the cloud want to keep using familiar tools to:
- avoid vendor lock-in,
- reuse existing skills and processes,
- support a multi-cloud strategy,
- and so on.
Among companies that have used Vault in their on-premises environment, many continue to use it after their migration to the cloud.
Vault is a tool for securely accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, or certificates. Vault provides a unified interface to any secret, while providing tight access control and recording a detailed audit log. [1]
In this post, we will deploy a Vault cluster step by step on Amazon Elastic Kubernetes Service (Amazon EKS).
Using Terraform, we will deploy:
- A highly available architecture that spans three Availability Zones.
- A virtual private cloud (VPC) configured with public and private subnets according to AWS best practices.
- In the public subnets:
- Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.
- In the private subnets:
- A group of Kubernetes nodes.
- An Amazon EKS cluster, which provides the Kubernetes control plane.
To deploy the Vault cluster, we create the following in AWS:
- An Elastic Load Balancer for the Vault UI.
- An AWS Certificate Manager (ACM) certificate for the Vault UI.
- A boot-vault IAM role to bootstrap the Vault servers.
- A vault-server IAM role for Vault to access AWS Key Management Service (AWS KMS) for auto unseal.
- AWS Secrets Manager to store the Vault on Amazon EKS root secret.
- An AWS KMS key for auto unseal.
In Kubernetes:
- A dedicated node group for Vault on Amazon EKS.
- A dedicated namespace for Vault on Amazon EKS.
- An internal Vault TLS certificate and certificate authority for securing communications.
- For the Vault service:
- Vault server pods.
- A Vault UI.
If you prefer to use AWS CloudFormation instead of Terraform, the equivalent workshop can be found in the aws-quickstart repository.
Prerequisites
- Install and configure the AWS CLI
- Install Terraform
- Install kubectl
- Install the Vault CLI
- Create a public hosted zone in Route 53. See tutorial
- Request a public certificate with AWS Certificate Manager. See tutorial
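Before moving on, you can sanity-check that the required tools are available. This is a minimal sketch, assuming each CLI is expected on your PATH (the tool list mirrors the prerequisites above):

```shell
# check_tools: report any required CLI missing from PATH.
check_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return $status
}

check_tools aws terraform kubectl vault || echo "install the missing tools before continuing"
```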
Network
In this section, we create a VPC, three private and three public subnets, three NAT gateways, and an internet gateway.
plan/vpc.tf
resource "aws_vpc" "security" {
cidr_block = var.vpc_cidr_block
instance_tenancy = "default"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Environment = "core"
Name = "security"
}
lifecycle {
ignore_changes = [tags]
}
}
resource "aws_default_security_group" "default" {
vpc_id = aws_vpc.security.id
}
plan/subnet.tf
resource "aws_subnet" "private" {
for_each = {
for subnet in local.private_nested_config : "${subnet.name}" => subnet
}
vpc_id = aws_vpc.security.id
cidr_block = each.value.cidr_block
availability_zone = var.az[index(local.private_nested_config, each.value)]
map_public_ip_on_launch = false
tags = {
Environment = "security"
Name = each.value.name
"kubernetes.io/role/internal-elb" = 1
}
lifecycle {
ignore_changes = [tags]
}
}
resource "aws_subnet" "public" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
vpc_id = aws_vpc.security.id
cidr_block = each.value.cidr_block
availability_zone = var.az[index(local.public_nested_config, each.value)]
map_public_ip_on_launch = true
tags = {
Environment = "security"
Name = each.value.name
"kubernetes.io/role/elb" = 1
}
lifecycle {
ignore_changes = [tags]
}
}
plan/igw.tf
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.security.id
tags = {
Environment = "core"
Name = "igw-security"
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.security.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
tags = {
Environment = "core"
Name = "rt-public-security"
}
}
resource "aws_route_table_association" "public" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
subnet_id = aws_subnet.public[each.value.name].id
route_table_id = aws_route_table.public.id
}
plan/nat.tf
resource "aws_eip" "nat" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
vpc = true
tags = {
Environment = "core"
Name = "eip-${each.value.name}"
}
}
resource "aws_nat_gateway" "nat-gw" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
allocation_id = aws_eip.nat[each.value.name].id
subnet_id = aws_subnet.public[each.value.name].id
tags = {
Environment = "core"
Name = "nat-${each.value.name}"
}
}
resource "aws_route_table" "private" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
vpc_id = aws_vpc.security.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.nat-gw[each.value.name].id
}
tags = {
Environment = "core"
Name = "rt-${each.value.name}"
}
}
resource "aws_route_table_association" "private" {
for_each = {
for subnet in local.private_nested_config : "${subnet.name}" => subnet
}
subnet_id = aws_subnet.private[each.value.name].id
route_table_id = aws_route_table.private[each.value.associated_public_subnet].id
}
Amazon EKS
In this section, we create our Kubernetes cluster with the following settings:
- restrict public endpoint access to a specific CIDR (for example, your office IP range) and to the NAT gateway IPs (needed if you want to reach Vault from a CI/CD tool hosted in this VPC)
- enable all logs
- enable IAM roles for service accounts
- security groups for the cluster
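IAM roles for service accounts (IRSA) work by projecting a service-account JWT into the pod; the IAM role's trust policy then matches the token's `sub` claim. A sketch of what that claim looks like, using a fabricated token payload (all values illustrative, not a real EKS token):

```shell
# Build a fake JWT-shaped token like the one EKS projects into a pod, then pull
# out the "sub" claim that the IAM trust policy's StringEquals condition matches.
payload='{"iss":"https://oidc.eks.example.amazonaws.com/id/EXAMPLE","sub":"system:serviceaccount:vault-server:vault"}'
token="header.$(printf '%s' "$payload" | base64 | tr -d '\n').signature"
sub=$(printf '%s' "$token" | cut -d. -f2 | base64 -d | grep -o '"sub":"[^"]*"')
echo "$sub"
```

The `vault-unseal` role below trusts exactly this kind of claim for the `vault` service account in the `vault-server` namespace.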
plan/eks-cluster.tf
resource "aws_eks_cluster" "security" {
name = var.eks_cluster_name
role_arn = aws_iam_role.eks.arn
version = "1.17"
vpc_config {
security_group_ids = [aws_security_group.eks_cluster.id]
endpoint_private_access = true
endpoint_public_access = true
public_access_cidrs = concat([var.authorized_source_ranges], [for n in aws_eip.nat : "${n.public_ip}/32"])
subnet_ids = concat([for s in aws_subnet.private : s.id], [for s in aws_subnet.public : s.id])
}
enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSClusterPolicy,
aws_iam_role_policy_attachment.eks-AmazonEKSVPCResourceController,
aws_iam_role_policy_attachment.eks-AmazonEKSServicePolicy
]
tags = {
Environment = "core"
}
}
resource "aws_iam_role" "eks" {
name = var.eks_cluster_name
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
}
data "tls_certificate" "cert" {
url = aws_eks_cluster.security.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "openid" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.cert.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.security.identity[0].oidc[0].issuer
}
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSClusterPolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks.name
}
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSServicePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
role = aws_iam_role.eks.name
}
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSVPCResourceController" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
role = aws_iam_role.eks.name
}
resource "aws_security_group" "eks_cluster" {
name = "${var.eks_cluster_name}/ControlPlaneSecurityGroup"
description = "Communication between the control plane and worker nodegroups"
vpc_id = aws_vpc.security.id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.eks_cluster_name}/ControlPlaneSecurityGroup"
}
}
resource "aws_security_group_rule" "cluster_inbound" {
description = "Allow unmanaged nodes to communicate with control plane (all ports)"
from_port = 0
protocol = "-1"
security_group_id = aws_eks_cluster.security.vpc_config[0].cluster_security_group_id
source_security_group_id = aws_security_group.eks_nodes.id
to_port = 0
type = "ingress"
}
Here we create two node groups, one private and one public.
plan/eks-nodegroup.tf
resource "aws_eks_node_group" "private" {
cluster_name = aws_eks_cluster.security.name
node_group_name = "private-node-group-security"
node_role_arn = aws_iam_role.node-group.arn
subnet_ids = [for s in aws_subnet.private : s.id]
labels = {
"type" = "private"
}
instance_types = ["t3.small"]
scaling_config {
desired_size = 3
max_size = 5
min_size = 3
}
depends_on = [
aws_iam_role_policy_attachment.node-group-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.node-group-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.node-group-AmazonEC2ContainerRegistryReadOnly
]
tags = {
Environment = "core"
}
}
resource "aws_eks_node_group" "public" {
cluster_name = aws_eks_cluster.security.name
node_group_name = "public-node-group-security"
node_role_arn = aws_iam_role.node-group.arn
subnet_ids = [for s in aws_subnet.public : s.id]
labels = {
"type" = "public"
}
instance_types = ["t3.small"]
scaling_config {
desired_size = 1
max_size = 3
min_size = 1
}
depends_on = [
aws_iam_role_policy_attachment.node-group-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.node-group-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.node-group-AmazonEC2ContainerRegistryReadOnly,
]
tags = {
Environment = "core"
}
}
resource "aws_iam_role" "node-group" {
name = "eks-node-group-role-security"
assume_role_policy = jsonencode({
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "node-group-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.node-group.name
}
resource "aws_iam_role_policy_attachment" "node-group-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.node-group.name
}
resource "aws_iam_role_policy_attachment" "node-group-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.node-group.name
}
resource "aws_iam_role_policy" "node-group-ClusterAutoscalerPolicy" {
name = "eks-cluster-auto-scaler"
role = aws_iam_role.node-group.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup"
]
Effect = "Allow"
Resource = "*"
},
]
})
}
resource "aws_security_group" "eks_nodes" {
name = "${var.eks_cluster_name}/ClusterSharedNodeSecurityGroup"
description = "Communication between all nodes in the cluster"
vpc_id = aws_vpc.security.id
ingress {
from_port = 0
to_port = 0
protocol = "-1"
self = true
}
ingress {
from_port = 0
to_port = 0
protocol = "-1"
security_groups = [aws_eks_cluster.security.vpc_config[0].cluster_security_group_id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.eks_cluster_name}/ClusterSharedNodeSecurityGroup"
Environment = "core"
}
}
Vault
In this section, we create the AWS resources that allow the Vault cluster to access Secrets Manager, CloudWatch Logs, and the KMS key. We also create a record set in Route 53 to access the Vault UI, and upload the necessary scripts to an S3 bucket.
plan/vault.tf
resource "aws_iam_role" "vault-unseal" {
name = "vault-unseal"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": aws_iam_openid_connect_provider.openid.arn
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub": "system:serviceaccount:vault-server:vault"
}
}
}
]
})
tags = {
Environment = "core"
}
}
resource "aws_iam_role_policy" "vault-unseal" {
name = "vault-unseal"
role = aws_iam_role.vault-unseal.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"iam:GetRole",
]
Effect = "Allow"
Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
},
{
Action = [
"kms:*",
]
Effect = "Allow"
Resource = "*"
}
]
})
}
resource "aws_iam_role" "vault" {
name = "vault"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": aws_iam_openid_connect_provider.openid.arn
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub": "system:serviceaccount:vault-server:boot-vault"
}
}
}
]
})
tags = {
Environment = "core"
}
}
resource "aws_iam_role_policy" "vault" {
name = "vault"
role = aws_iam_role.vault.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"logs:CreateLogStream",
"logs:DescribeLogStreams"
]
Effect = "Allow"
Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:vault-audit-logs"
},
{
Action = [
"logs:PutLogEvents",
]
Effect = "Allow"
Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:vault-audit-logs:log-stream:*"
},
{
Action = [
"ec2:DescribeInstances",
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"s3:*",
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"secretsmanager:UpdateSecretVersionStage",
"secretsmanager:UpdateSecret",
"secretsmanager:PutSecretValue",
"secretsmanager:GetSecretValue"
]
Effect = "Allow"
Resource = aws_secretsmanager_secret.vault-secret.arn
},
{
Action = [
"iam:GetRole"
]
Effect = "Allow"
Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault"
}
]
})
}
resource "aws_kms_key" "vault-kms" {
description = "Vault Seal/Unseal key"
deletion_window_in_days = 7
policy = <<EOT
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Enable IAM User Permissions",
"Action": [
"kms:*"
],
"Principal": {
"AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
},
"Effect": "Allow",
"Resource": "*"
},
{
"Sid": "Allow administration of the key",
"Action": [
"kms:Create*",
"kms:Describe*",
"kms:Enable*",
"kms:List*",
"kms:Put*",
"kms:Update*",
"kms:Revoke*",
"kms:Disable*",
"kms:Get*",
"kms:Delete*",
"kms:ScheduleKeyDeletion",
"kms:CancelKeyDeletion"
],
"Effect": "Allow",
"Resource": "*",
"Principal": {
"AWS": [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault",
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
]
}
},
{
"Sid": "Allow use of the key",
"Action": [
"kms:DescribeKey",
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey",
"kms:GenerateDataKeyWithoutPlaintext"
],
"Principal": {
"AWS": [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault",
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
]
},
"Effect": "Allow",
"Resource": "*"
}
]
}
EOT
}
resource "random_string" "vault-secret-suffix" {
length = 5
special = false
upper = false
}
resource "aws_secretsmanager_secret" "vault-secret" {
name = "vault-secret-${random_string.vault-secret-suffix.result}"
kms_key_id = aws_kms_key.vault-kms.key_id
description = "Vault Root/Recovery key"
}
resource "aws_route53_record" "vault" {
zone_id = data.aws_route53_zone.public.zone_id
name = "vault.${var.public_dns_name}"
type = "CNAME"
ttl = "300"
records = [data.kubernetes_service.vault-ui.status.0.load_balancer.0.ingress.0.hostname]
depends_on = [
kubernetes_job.vault-initialization,
helm_release.vault,
data.kubernetes_service.vault-ui
]
}
resource "aws_s3_bucket" "vault-scripts" {
bucket = "bucket-${data.aws_caller_identity.current.account_id}-${var.region}-vault-scripts"
acl = "private"
tags = {
Name = "Vault Scripts"
Environment = "core"
}
}
resource "aws_s3_bucket_object" "vault-script-bootstrap" {
bucket = aws_s3_bucket.vault-scripts.id
key = "scripts/bootstrap.sh"
source = "scripts/bootstrap.sh"
etag = filemd5("scripts/bootstrap.sh")
}
resource "aws_s3_bucket_object" "vault-script-certificates" {
bucket = aws_s3_bucket.vault-scripts.id
key = "scripts/certificates.sh"
source = "scripts/certificates.sh"
etag = filemd5("scripts/certificates.sh")
}
Here we create our Kubernetes resources to initialize and deploy the Vault cluster.
plan/k8s.tf
resource "kubernetes_namespace" "vault-server" {
metadata {
name = "vault-server"
}
}
data "template_file" "vault-values" {
template = <<EOF
global:
tlsDisable: false
ui:
enabled: true
externalPort: 443
serviceType: "LoadBalancer"
loadBalancerSourceRanges:
- ${var.authorized_source_ranges}
- ${aws_eip.nat["public-security-1"].public_ip}/32
- ${aws_eip.nat["public-security-2"].public_ip}/32
- ${aws_eip.nat["public-security-3"].public_ip}/32
annotations: |
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${var.acm_vault_arn}
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: https
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443,8200"
service.beta.kubernetes.io/do-loadbalancer-healthcheck-path: "/ui/"
service.beta.kubernetes.io/aws-load-balancer-internal: "false"
external-dns.alpha.kubernetes.io/hostname: "vault.${var.public_dns_name}"
external-dns.alpha.kubernetes.io/ttl: "30"
server:
nodeSelector: |
eks.amazonaws.com/nodegroup: private-node-group-security
extraEnvironmentVars:
VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
extraVolumes:
- type: secret
name: vault-server-tls
image:
repository: "vault"
tag: "1.6.0"
logLevel: "debug"
serviceAccount:
annotations: |
eks.amazonaws.com/role-arn: "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
extraEnvironmentVars:
AWS_ROLE_SESSION_NAME: some_name
ha:
enabled: true
nodes: 3
raft:
enabled: true
setNodeId: true
config: |
ui = true
listener "tcp" {
tls_disable = 0
tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
address = "[::]:8200"
cluster_address = "[::]:8201"
}
storage "raft" {
path = "/vault/data"
}
service_registration "kubernetes" {}
seal "awskms" {
region = "${var.region}"
kms_key_id = "${aws_kms_key.vault-kms.key_id}"
}
EOF
}
resource "helm_release" "vault" {
name = "vault"
chart = "hashicorp/vault"
values = [data.template_file.vault-values.rendered]
namespace = "vault-server"
depends_on = [kubernetes_job.vault-certificate]
}
resource "kubernetes_cluster_role" "boot-vault" {
metadata {
name = "boot-vault"
}
rule {
api_groups = [""]
resources = ["pods/exec", "pods", "pods/log", "secrets", "tmp/secrets"]
verbs = ["get", "list", "create"]
}
rule {
api_groups = ["certificates.k8s.io"]
resources = ["certificatesigningrequests", "certificatesigningrequests/approval"]
verbs = ["get", "list", "create", "update"]
}
}
resource "kubernetes_service_account" "boot-vault" {
metadata {
name = "boot-vault"
namespace = "vault-server"
labels = {
"app.kubernetes.io/name" = "boot-vault"
}
annotations = {
"eks.amazonaws.com/role-arn" = aws_iam_role.vault.arn
}
}
}
resource "kubernetes_job" "vault-initialization" {
metadata {
name = "boot-vault"
namespace = "vault-server"
}
spec {
template {
metadata {}
spec {
container {
name = "boot-vault"
image = "amazonlinux"
command = ["/bin/bash","-c"]
args = ["sleep 15; yum install -y awscli 2>&1 > /dev/null; export AWS_REGION=${var.region}; aws sts get-caller-identity; aws s3 cp $(S3_SCRIPT_URL) ./script.sh; chmod +x ./script.sh; ./script.sh"]
env {
name = "S3_SCRIPT_URL"
value = "s3://${aws_s3_bucket.vault-scripts.id}/scripts/bootstrap.sh"
}
env {
name = "VAULT_SECRET"
value = aws_secretsmanager_secret.vault-secret.arn
}
}
service_account_name = "boot-vault"
restart_policy = "Never"
}
}
backoff_limit = 0
}
depends_on = [
kubernetes_job.vault-certificate,
helm_release.vault,
aws_s3_bucket_object.vault-script-bootstrap
]
}
resource "kubernetes_job" "vault-certificate" {
metadata {
name = "certificate-vault"
namespace = "vault-server"
}
spec {
template {
metadata {}
spec {
container {
name = "certificate-vault"
image = "amazonlinux"
command = ["/bin/bash","-c"]
args = ["sleep 15; yum install -y awscli 2>&1 > /dev/null; export AWS_REGION=${var.region}; export NAMESPACE='vault-server'; aws sts get-caller-identity; aws s3 cp $(S3_SCRIPT_URL) ./script.sh; chmod +x ./script.sh; ./script.sh"]
env {
name = "S3_SCRIPT_URL"
value = "s3://${aws_s3_bucket.vault-scripts.id}/scripts/certificates.sh"
}
}
service_account_name = "boot-vault"
restart_policy = "Never"
}
}
backoff_limit = 0
}
depends_on = [
aws_eks_node_group.private,
aws_s3_bucket_object.vault-script-certificates
]
}
resource "kubernetes_cluster_role_binding" "boot-vault" {
metadata {
name = "boot-vault"
labels = {
"app.kubernetes.io/name": "boot-vault"
}
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "boot-vault"
}
subject {
kind = "ServiceAccount"
name = "boot-vault"
namespace = "vault-server"
}
}
data "kubernetes_service" "vault-ui" {
metadata {
name = "vault-ui"
namespace = "vault-server"
}
depends_on = [
kubernetes_job.vault-initialization,
helm_release.vault
]
}
The following script is used to create the vault-server-tls certificate.
plan/scripts/certificates.sh
#!/bin/bash -e
# SERVICE is the name of the Vault service in Kubernetes.
# It does not have to match the actual running service, though it may help for consistency.
SERVICE=vault
SECRET_NAME=vault-server-tls
# TMPDIR is a temporary working directory.
TMPDIR=/tmp
# Sleep timer
SLEEP_TIME=15
# Name of the CSR
echo "Name the CSR: vault-csr"
export CSR_NAME=vault-csr
# Install OpenSSL
echo "Install openssl"
yum install -y openssl 2>&1
# Install Kubernetes cli
echo "Install Kubernetes cli"
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
kubectl version --short --client
# Create a private key
echo "Generate certificate Private key"
openssl genrsa -out ${TMPDIR}/vault.key 2048
# Create CSR
echo "Create CSR file"
cat <<EOF >${TMPDIR}/csr.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = ${SERVICE}
DNS.2 = ${SERVICE}.${NAMESPACE}
DNS.3 = ${SERVICE}.${NAMESPACE}.svc
DNS.4 = ${SERVICE}.${NAMESPACE}.svc.cluster.local
DNS.5 = vault-0.vault-internal
DNS.6 = vault-1.vault-internal
DNS.7 = vault-2.vault-internal
IP.1 = 127.0.0.1
EOF
# Sign the CSR
echo "Sign the CSR"
openssl req -new -key ${TMPDIR}/vault.key -subj "/CN=${SERVICE}.${NAMESPACE}.svc" -out ${TMPDIR}/server.csr -config ${TMPDIR}/csr.conf
echo "Create a CSR Manifest file"
cat <<EOF >${TMPDIR}/csr.yaml
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
name: ${CSR_NAME}
spec:
groups:
- system:authenticated
request: $(cat ${TMPDIR}/server.csr | base64 | tr -d '\n')
usages:
- digital signature
- key encipherment
- server auth
EOF
echo "Create CSR from manifest file"
kubectl create -f ${TMPDIR}/csr.yaml
sleep ${SLEEP_TIME}
echo "Fetch the CSR from kubernetes"
kubectl get csr ${CSR_NAME}
# Approve Cert
echo "Approve the Certificate"
kubectl certificate approve ${CSR_NAME}
serverCert=$(kubectl get csr ${CSR_NAME} -o jsonpath='{.status.certificate}')
echo "${serverCert}" | openssl base64 -d -A -out ${TMPDIR}/vault.crt
echo "Fetch Kubernetes CA Certificate"
kubectl get secret -o jsonpath="{.items[?(@.type==\"kubernetes.io/service-account-token\")].data['ca\.crt']}" | base64 --decode > ${TMPDIR}/vault.ca 2>/dev/null || true
echo "Create secret containing the TLS Certificates and key"
echo kubectl create secret generic ${SECRET_NAME} \
--namespace ${NAMESPACE} \
--from-file=vault.key=${TMPDIR}/vault.key \
--from-file=vault.crt=${TMPDIR}/vault.crt \
--from-file=vault.ca=${TMPDIR}/vault.ca
kubectl create secret generic ${SECRET_NAME} \
--namespace ${NAMESPACE} \
--from-file=vault.key=${TMPDIR}/vault.key \
--from-file=vault.crt=${TMPDIR}/vault.crt \
--from-file=vault.ca=${TMPDIR}/vault.ca
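If you want to double-check the SANs a certificate carries, `openssl x509 -ext subjectAltName` is handy. A quick sketch with a throwaway self-signed certificate standing in for the one the script requests (filenames and SANs here are illustrative):

```shell
# Generate a throwaway self-signed cert with SANs, then print them back.
# Requires OpenSSL 1.1.1+ for the -addext flag.
TMP=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=vault.vault-server.svc" \
  -addext "subjectAltName=DNS:vault,DNS:vault.vault-server.svc,IP:127.0.0.1" \
  -keyout "$TMP/vault.key" -out "$TMP/vault.crt" 2>/dev/null
openssl x509 -in "$TMP/vault.crt" -noout -ext subjectAltName
```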
The following script is used to initialize Vault.
plan/scripts/bootstrap.sh
#!/bin/bash
VAULT_NUMBER_OF_KEYS_FOR_UNSEAL=3
VAULT_NUMBER_OF_KEYS=5
SLEEP_SECONDS=15
PROTOCOL=https
VAULT_PORT=8200
VAULT_0=vault-0.vault-internal
get_secret () {
local value=$(aws secretsmanager --region ${AWS_REGION} get-secret-value --secret-id "$1" | jq --raw-output .SecretString)
echo $value
}
# Install JQ as we use it later on
yum install -y jq 2>&1 >/dev/null
# Give the Helm chart a chance to get started
echo "Sleeping for ${SLEEP_SECONDS} seconds"
sleep ${SLEEP_SECONDS} # Allow helm chart some time
# Install Kubernetes cli
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
kubectl version --short --client
until curl -k -fs -o /dev/null ${PROTOCOL}://${VAULT_0}:8200/v1/sys/init; do
echo "Waiting for Vault to start..."
sleep 1
done
# See if vault is initialized
init=$(curl -fs -k ${PROTOCOL}://${VAULT_0}:8200/v1/sys/init | jq -r .initialized)
echo "Is vault initialized: '${init}'"
if [ "$init" == "false" ]; then
echo "Initializing Vault"
SECRET_VALUE=$(kubectl exec vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault operator init -recovery-shares=${VAULT_NUMBER_OF_KEYS} -recovery-threshold=${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL}")
echo "storing vault init values in secrets manager"
aws secretsmanager put-secret-value --region ${AWS_REGION} --secret-id ${VAULT_SECRET} --secret-string "${SECRET_VALUE}"
else
echo "Vault is already initialized"
fi
sealed=$(curl -fs -k ${PROTOCOL}://${VAULT_0}:8200/v1/sys/seal-status | jq -r .sealed)
# Vault should auto-unseal via KMS; the manual unseal below is kept for demonstration
if [ "$sealed" == "true" ]; then
VAULT_SECRET_VALUE=$(get_secret ${VAULT_SECRET})
root_token=$(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Initial Root Token: (.*)/,m)) print m[1] }' | cut -d " " -f 1)
for UNSEAL_KEY_INDEX in $(seq 1 ${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL})
do
unseal_key+=($(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Recovery Key '${UNSEAL_KEY_INDEX}': (.*)/,m)) print m[1] }'| cut -d " " -f 1))
done
echo "Unsealing Vault"
# Handle variable number of unseal keys
for UNSEAL_KEY_INDEX in $(seq 1 ${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL})
do
kubectl exec vault-0 -- vault operator unseal "${unseal_key[$((UNSEAL_KEY_INDEX-1))]}"
done
else
echo "Vault is already unsealed"
fi
VAULT_SECRET_VALUE=$(get_secret ${VAULT_SECRET})
root_token=$(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Initial Root Token: (.*)/,m)) print m[1] }' | cut -d " " -f 1)
# Log in with the root token
kubectl exec vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault login token=$root_token 2>&1 > /dev/null" # Hide this output from the console
# Join other pods to the raft cluster
kubectl exec -t vault-1 -- "/bin/sh" "-c" "vault operator raft join -tls-skip-verify -leader-ca-cert=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)\" ${PROTOCOL}://${VAULT_0}:${VAULT_PORT}"
kubectl exec -t vault-2 -- "/bin/sh" "-c" "vault operator raft join -tls-skip-verify -leader-ca-cert=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)\" ${PROTOCOL}://${VAULT_0}:${VAULT_PORT}"
# Show who we have joined
kubectl exec -t vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault operator raft list-peers"
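One bash pitfall worth noting in scripts like this: brace expansion happens before variable expansion, so a range like `{1..$N}` never iterates over the variable's value. `seq` sidesteps the problem:

```shell
N=3
echo {1..$N}    # brace expansion ignores the variable: prints the literal {1..3}
for i in $(seq 1 "$N"); do printf '%s ' "$i"; done; echo    # prints: 1 2 3
```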
Deployment
We've finished creating our Terraform files, so let's get ready for deployment!
plan/main.tf
data "aws_caller_identity" "current" {}
data "aws_route53_zone" "public" {
name = "${var.public_dns_name}."
}
plan/output.tf
output "eks-endpoint" {
value = aws_eks_cluster.security.endpoint
}
output "kubeconfig-certificate-authority-data" {
value = aws_eks_cluster.security.certificate_authority[0].data
}
output "eks_issuer_url" {
value = aws_iam_openid_connect_provider.openid.url
}
output "vault_secret_name" {
value = "vault-secret-${random_string.vault-secret-suffix.result}"
}
output "nat1_ip" {
value = aws_eip.nat["public-security-1"].public_ip
}
output "nat2_ip" {
value = aws_eip.nat["public-security-2"].public_ip
}
output "nat3_ip" {
value = aws_eip.nat["public-security-3"].public_ip
}
plan/variables.tf
variable "region" {
type = string
}
variable "az" {
type = list(string)
default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
variable "vpc_cidr_block" {
type = string
}
variable "eks_cluster_name" {
type = string
default = "security"
}
variable "acm_vault_arn" {
type = string
}
variable "private_network_config" {
type = map(object({
cidr_block = string
associated_public_subnet = string
}))
default = {
"private-security-1" = {
cidr_block = "10.0.0.0/23"
associated_public_subnet = "public-security-1"
},
"private-security-2" = {
cidr_block = "10.0.2.0/23"
associated_public_subnet = "public-security-2"
},
"private-security-3" = {
cidr_block = "10.0.4.0/23"
associated_public_subnet = "public-security-3"
}
}
}
locals {
private_nested_config = flatten([
for name, config in var.private_network_config : [
{
name = name
cidr_block = config.cidr_block
associated_public_subnet = config.associated_public_subnet
}
]
])
}
variable "public_network_config" {
type = map(object({
cidr_block = string
}))
default = {
"public-security-1" = {
cidr_block = "10.0.8.0/23"
},
"public-security-2" = {
cidr_block = "10.0.10.0/23"
},
"public-security-3" = {
cidr_block = "10.0.12.0/23"
}
}
}
locals {
public_nested_config = flatten([
for name, config in var.public_network_config : [
{
name = name
cidr_block = config.cidr_block
}
]
])
}
variable "public_dns_name" {
type = string
}
variable "authorized_source_ranges" {
type = string
description = "Addresses or CIDR blocks which are allowed to connect to the Vault IP address. The default behavior is to allow anyone (0.0.0.0/0) access. You should restrict access to external IPs that need to access the Vault cluster."
default = "0.0.0.0/0"
}
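As a sanity check on the addressing above: the VPC is a /16 and each subnet a /23, so every subnet gets 512 addresses (of which AWS reserves 5 per subnet), and the six /23 blocks fit comfortably inside the /16:

```shell
# Address counts for the prefixes used in variables.tf.
vpc_prefix=16
subnet_prefix=23
echo $(( 1 << (32 - vpc_prefix) ))      # addresses in the /16 VPC: 65536
echo $(( 1 << (32 - subnet_prefix) ))   # addresses per /23 subnet: 512
```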
plan/backend.tf
terraform {
backend "s3" {
}
}
plan/versions.tf
terraform {
required_version = ">= 0.12"
}
plan/provider.tf
provider "aws" {
region = var.region
}
provider "kubernetes" {
host = aws_eks_cluster.security.endpoint
cluster_ca_certificate = base64decode(
aws_eks_cluster.security.certificate_authority[0].data
)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
args = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
command = "aws"
}
}
provider "helm" {
kubernetes {
host = aws_eks_cluster.security.endpoint
cluster_ca_certificate = base64decode(
aws_eks_cluster.security.certificate_authority[0].data
)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
args = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
command = "aws"
}
}
}
plan/terraform.tfvars
az = ["<AWS_REGION>a", "<AWS_REGION>b", "<AWS_REGION>c"]
region = "<AWS_REGION>"
acm_vault_arn = "<ACM_VAULT_ARN>"
vpc_cidr_block = "10.0.0.0/16"
public_dns_name = "<PUBLIC_DNS_NAME>"
authorized_source_ranges = "<LOCAL_IP_RANGES>"
Initialize the AWS security infrastructure. The Terraform state will be stored in an S3 bucket.
terraform init \
-backend-config="bucket=$TERRAFORM_BUCKET_NAME" \
-backend-config="key=security/terraform-state" \
-backend-config="region=$AWS_REGION"
Complete plan/terraform.tfvars and run:
sed -i "s/<LOCAL_IP_RANGES>/$(curl -s http://checkip.amazonaws.com/)\/32/g; s/<PUBLIC_DNS_NAME>/${PUBLIC_DNS_NAME}/g; s/<AWS_ACCOUNT_ID>/${AWS_ACCOUNT_ID}/g; s/<AWS_REGION>/${AWS_REGION}/g; s/<EKS_CLUSTER_NAME>/${EKS_CLUSTER_NAME}/g; s,<ACM_VAULT_ARN>,${ACM_VAULT_ARN},g;" terraform.tfvars
terraform apply
Access the EKS cluster using:
aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
kubectl config set-context --current --namespace=vault-server
Set Vault's address and retrieve the initial root token from AWS Secrets Manager:
cd plan
export VAULT_ADDR="https://vault.${PUBLIC_DNS_NAME}"
export VAULT_TOKEN="$(aws secretsmanager get-secret-value --secret-id $(terraform output vault_secret_name) --version-stage AWSCURRENT --query SecretString --output text | grep "Initial Root Token: " | awk -F ': ' '{print $2}')"
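The grep/awk pipeline above simply extracts the value after "Initial Root Token: " from the stored secret. A local sketch with a fake secret string (the values are placeholders, not real keys):

```shell
# Fake SecretString in the same format the command above expects (placeholder values)
SECRET_STRING='Unseal Key: placeholder-not-a-real-key
Initial Root Token: s.EXAMPLE'

# Keep the line containing the root token, then take the value after ": "
ROOT_TOKEN=$(printf '%s\n' "$SECRET_STRING" | grep "Initial Root Token: " | awk -F ': ' '{print $2}')
echo "$ROOT_TOKEN"
# s.EXAMPLE
```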
Check that the jobs have completed and that all pods are running:
$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
boot-vault 1/1 54s 28m
certificate-vault 1/1 55s 39m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
boot-vault-4j76p 0/1 Completed 0 6m17s
certificate-vault-znwfb 0/1 Completed 0 17m
vault-0 1/1 Running 0 6m42s
vault-1 1/1 Running 0 6m42s
vault-2 1/1 Running 0 6m41s
vault-agent-injector-7d65f7875f-k8zgv 1/1 Running 0 6m42s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
vault ClusterIP 172.20.116.147 <none> 8200/TCP,8201/TCP 7m39s
vault-active ClusterIP 172.20.213.40 <none> 8200/TCP,8201/TCP 7m39s
vault-agent-injector-svc ClusterIP 172.20.182.101 <none> 443/TCP 7m39s
vault-internal ClusterIP None <none> 8200/TCP,8201/TCP 7m39s
vault-standby ClusterIP 172.20.167.47 <none> 8200/TCP,8201/TCP 7m39s
vault-ui LoadBalancer 172.20.22.192 a7442caffb7f74b1ea2eb40bd5f432ef-694516578.eu-west-1.elb.amazonaws.com 443:32363/TCP 7m39s
$ kubectl get secrets
NAME TYPE DATA AGE
boot-vault-token-nq8qm kubernetes.io/service-account-token 3 45m
default-token-6qjw8 kubernetes.io/service-account-token 3 45m
sh.helm.release.v1.vault.v1 helm.sh/release.v1 1 27m
vault-agent-injector-token-p6ktz kubernetes.io/service-account-token 3 27m
vault-server-tls Opaque 3 36m
vault-token-p9gqj kubernetes.io/service-account-token 3 27m
$ kubectl get sa
NAME SECRETS AGE
boot-vault 1 47m
default 1 47m
vault 1 29m
vault-agent-injector 1 29m
$ kubectl get role
NAME AGE
vault-discovery-role 30m
$ kubectl get rolebinding
NAME AGE
vault-discovery-rolebinding 30m
$ kubectl get certificatesigningrequests
NAME AGE REQUESTOR CONDITION
csr-5vqrf 43m system:node:ip-10-0-0-59.eu-west-1.compute.internal Approved,Issued
csr-6klsj 43m system:node:ip-10-0-5-29.eu-west-1.compute.internal Approved,Issued
csr-chh42 43m system:node:ip-10-0-10-214.eu-west-1.compute.internal Approved,Issued
csr-pm5jd 43m system:node:ip-10-0-2-39.eu-west-1.compute.internal Approved,Issued
vault-csr 37m system:serviceaccount:vault-server:boot-vault Approved,Issued
Let's create credentials:
ACCESS_KEY=ACCESS_KEY
SECRET_KEY=SECRET_KEY
PROJECT_NAME=web
$ vault secrets enable -path=company/projects/${PROJECT_NAME} -version=2 kv
Success! Enabled the kv secrets engine at: company/projects/web/
$ vault kv put company/projects/${PROJECT_NAME}/credentials/access key="$ACCESS_KEY"
Key Value
--- -----
created_time 2021-04-15T12:43:48.024422363Z
deletion_time n/a
destroyed false
version 1
$ vault kv put company/projects/${PROJECT_NAME}/credentials/secret key="$SECRET_KEY"
Key Value
--- -----
created_time 2021-04-15T12:44:01.270353488Z
deletion_time n/a
destroyed false
version 1
Create the policy named my-policy with the contents from stdin:
$ vault policy write my-policy - <<EOF
# Read-only permissions
path "company/projects/${PROJECT_NAME}/*" {
capabilities = [ "read" ]
}
EOF
Success! Uploaded policy: my-policy
Create a token with the my-policy policy attached:
VAULT_TOKEN=$(vault token create -policy=my-policy -field=token)
Now we can retrieve our credentials:
$ vault kv get -field=key company/projects/${PROJECT_NAME}/credentials/access
ACCESS_KEY
$ vault kv get -field=key company/projects/${PROJECT_NAME}/credentials/secret
SECRET_KEY
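Behind the scenes, vault kv get on a KV version 2 engine reads from an HTTP path with an extra data/ segment between the mount point and the key. A sketch of the path the CLI calls (vault.example.com is a placeholder):

```shell
# Build the KV v2 read path: <addr>/v1/<mount>/data/<key>
VAULT_ADDR=https://vault.example.com
MOUNT=company/projects/web
KEY=credentials/access
echo "${VAULT_ADDR}/v1/${MOUNT}/data/${KEY}"
# https://vault.example.com/v1/company/projects/web/data/credentials/access
```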
That's it!
The source code is available on Gitlab.
Conclusion
In this article, we saw how to create a highly available Vault cluster and deploy it on Amazon EKS.
Hope you enjoyed reading this blog post.
If you have any questions or feedback, please feel free to leave a comment.
Thanks for reading!
Top comments (13)
Hello, nice terraform template.
I've got an issue with the job vault-server/certificate-vault, which is in a failed state. I've provided all the requirements, but I'm not sure about the type of certificate requested.
First I tried vault.subdomain.domain.com without success. Then I tried a wildcard certificate, *.subdomain.domain.com.
Both certificates were issued without error.
Do you have an idea?
You can delete the Vault certificate resources and run terraform apply again:
kubectl delete secret vault-server-tls -n vault-server
kubectl delete CertificateSigningRequest vault-csr -n vault-server
terraform destroy -target=kubernetes_job.vault-certificate
Deploying on AWS EKS 1.21, I got stuck on CSR approval: the CSR gets approved but never signed. I suspect a missing cluster role binding for the boot-vault user. Need some help.
Hello, thanks for your contribution.
Yes, I tested with version 1.17. If I remember correctly, I got the same issue with version 1.18. As I see from your comment, it's still not working with the newer versions.
If you resolve the CSR issue, do not hesitate to share :-)
There are a few steps to get over it.
Hope I did not skip anything. :-)
Hello
I followed your steps as detailed in the README and ran into an issue with the Kubernetes version (1.17): it failed on that. After reading the AWS EKS documentation and seeing the end of support for that version, I moved to the next version (1.18); the deployment ran fine but later failed with an "Unauthorized" error.
Further research showed it was a Terraform provider (2.8.0) issue.
My question is: how do I get around the reported issue, or have you been able to deploy the code as-is lately?
Thanks
BTW: Loved the article, very detailed .... keep up the good work sharing your knowledge!
Thank you for your feedback
I forgot to pin the Terraform provider version in versions.tf. You should pin the provider version to make sure your Terraform resources keep working with the AWS APIs.
So what provider version do you advise I stick to for this?
As the article was written last year, you should pin the AWS provider to the version from that time (at least 3.37.0).
Thanks for the quick response, i will do as advised.
For people who get stuck at the CSR step in kubernetes_job.vault-certificate with EKS/Kubernetes >= v1.22, follow docs.aws.amazon.com/eks/latest/use... to have the CSR signed.