Step-by-step guide to deploying a private Amazon EKS cluster with zero public API exposure, self-hosted OpenVPN access, kube-prometheus-stack monitoring, and Route 53 private DNS — all automated with Terraform.
The Problem with Public Kubernetes Clusters
Every time I see an EKS cluster with a public API endpoint, I cringe.
Sure, it's convenient. But it means your Kubernetes API server — the brain of your entire cluster — is reachable from anywhere on the internet. One misconfigured IAM policy, one leaked credential, and you have a very bad day.
In this guide, I'll walk you through building a fully private EKS cluster where:
- The Kubernetes API has zero internet exposure
- Access is gated through a self-hosted OpenVPN server
- Prometheus + Grafana monitor everything, exposed via an internal load balancer
- Route 53 private DNS gives us `grafana.devops.private`
- Everything is Terraform — one `terraform apply` to rule them all
Let's build this.
Architecture
What's in the box:
| Component | Role |
|---|---|
| VPC (10.0.0.0/16) | Isolated network — 2 public subnets, 2 private subnets across 2 AZs |
| EKS (Private API) | Kubernetes control plane — API endpoint only accessible within VPC |
| OpenVPN EC2 | Self-hosted VPN in public subnet — the single entry point |
| NAT Gateway | Outbound internet for private subnets (pulling container images) |
| kube-prometheus-stack | Prometheus (metrics) + Grafana (dashboards) + Alertmanager |
| Internal Load Balancer | Exposes Grafana inside VPC only |
| Route 53 Private Zone | `grafana.devops.private` → Internal LB CNAME |
The traffic flow
Your Laptop
│
│ OpenVPN (UDP:1194)
▼
┌──────────────┐ ┌─────────────────────────────┐
│ OpenVPN EC2 │──────▶ │ EKS Private API (HTTPS:443)│
│ Public │ NAT │ Worker Node 1 │
│ Subnet │Masq. │ Worker Node 2 │
│ │ │ ┌─────────────────────┐ │
│ │───────▶│ │ Grafana (Internal LB)│ │
│ │ │ │ grafana.devops.private│ │
│ │ │ └─────────────────────┘ │
└──────────────┘ └─────────────────────────────┘
Public Subnet Private Subnets
The OpenVPN server uses iptables NAT masquerade to rewrite VPN client IPs (10.8.0.0/24) to its own VPC address. This means all VPC services see traffic from a legitimate VPC IP — not an unknown external range.
Project Structure
eks-private-vpn/
├── main.tf # VPC, EKS, OpenVPN EC2, Security Groups
├── monitoring.tf # kube-prometheus-stack (Prometheus + Grafana)
├── dns.tf # Route 53 private zone + Grafana CNAME
├── providers.tf # AWS, Helm, Kubernetes providers
├── variables.tf # Input variables
├── outputs.tf # Useful outputs
├── openvpn_userdata.sh # OpenVPN server bootstrap script
└── terraform.tfvars # Your configuration values
Step 1 — Provider Configuration
We need four providers. The Helm and Kubernetes providers use exec-based authentication to talk to the private EKS cluster:
# providers.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.0" }
tls = { source = "hashicorp/tls", version = "~> 4.0" }
helm = { source = "hashicorp/helm", version = "~> 2.0" }
kubernetes = { source = "hashicorp/kubernetes", version = "~> 2.0" }
}
}
provider "aws" {
region = var.aws_region
}
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name",
module.eks.cluster_name, "--region", var.aws_region]
}
}
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name",
module.eks.cluster_name, "--region", var.aws_region]
}
}
Why exec-based auth? The `aws eks get-token` command generates short-lived tokens via IAM. This is more secure than static kubeconfig tokens and works seamlessly with the private endpoint.
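For reference, `aws eks update-kubeconfig` writes this same exec pattern into your local kubeconfig. A trimmed sketch of the user entry it generates (cluster name, account ID, and region are placeholders, not output from this project):

```yaml
# Sketch of the kubeconfig "user" entry produced by `aws eks update-kubeconfig`
users:
- name: arn:aws:eks:us-east-1:123456789012:cluster/eks-private-cluster
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args: ["eks", "get-token", "--cluster-name", "eks-private-cluster", "--region", "us-east-1"]
```

kubectl invokes the `exec` command on every API call, so tokens are minted fresh and expire quickly — nothing long-lived sits on disk.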
Step 2 — VPC & Networking
# main.tf
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = "${var.project_name}-vpc"
cidr = var.vpc_cidr # 10.0.0.0/16
azs = var.availability_zones # ["us-east-1a", "us-east-1b"]
private_subnets = var.private_subnet_cidrs # ["10.0.1.0/24", "10.0.2.0/24"]
public_subnets = var.public_subnet_cidrs # ["10.0.101.0/24", "10.0.102.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
enable_dns_support = true
# Required tags for EKS load balancer discovery
public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
}
}
The kubernetes.io/role/internal-elb tag on private subnets is what tells the AWS Load Balancer Controller where to place internal load balancers — this is how our Grafana LB ends up in the right subnet.
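For intuition, the Grafana Helm values we set later in Step 6 ultimately render a Service along these lines (a hypothetical manifest, names illustrative) — the controller sees the internal annotations and picks the subnets tagged `kubernetes.io/role/internal-elb`:

```yaml
# Hypothetical rendering of the Grafana Service created by the Helm chart
apiVersion: v1
kind: Service
metadata:
  name: kube-prometheus-stack-grafana
  namespace: monitoring
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 3000   # Grafana's container port
```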
Step 3 — Private EKS Cluster
This is where the magic happens. Two settings change everything:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.0"
cluster_name = "${var.project_name}-cluster"
cluster_version = var.kubernetes_version # "1.31"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# THE TWO SETTINGS THAT MAKE THIS PRIVATE
cluster_endpoint_public_access = false
cluster_endpoint_private_access = true
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Required in EKS module v20+ — without this, kubectl fails
enable_cluster_creator_admin_permissions = true
eks_managed_node_groups = {
default = {
instance_types = var.node_instance_types # ["t3.medium"]
min_size = var.node_min_size # 1
max_size = var.node_max_size # 3
desired_size = var.node_desired_size # 2
}
}
}
We also need to let the VPN server talk to the EKS API on port 443:
resource "aws_security_group_rule" "vpn_to_eks_api" {
description = "Allow VPN server to access EKS API"
type = "ingress"
from_port = 443
to_port = 443
protocol = "tcp"
security_group_id = module.eks.cluster_security_group_id
source_security_group_id = aws_security_group.vpn.id
}
EKS Module v20 Breaking Change: The `enable_cluster_creator_admin_permissions` flag is new in module v20. In older versions, the cluster creator automatically got admin access via the `aws-auth` ConfigMap. In v20+, this was replaced with EKS Access Entries — a more secure, IAM-native RBAC mechanism. Without this flag, you'll get the cryptic error: `"You must be logged in to the server"`.
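If teammates beyond the cluster creator need access, the same Access Entries mechanism can be driven from Terraform. A sketch, assuming a hypothetical `platform-team` IAM role (the principal ARN is a placeholder):

```hcl
# Grant an additional IAM principal cluster-admin via EKS Access Entries
# (principal ARN is a placeholder — substitute your own role)
resource "aws_eks_access_entry" "platform_team" {
  cluster_name  = module.eks.cluster_name
  principal_arn = "arn:aws:iam::123456789012:role/platform-team"
}

resource "aws_eks_access_policy_association" "platform_team_admin" {
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_eks_access_entry.platform_team.principal_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}
```

Unlike `aws-auth` ConfigMap edits, these grants show up in CloudTrail and can be scoped to namespaces via `access_scope`.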
Step 4 — OpenVPN Server
The VPN server lives in the public subnet — it's the bridge between the internet and your private infrastructure:
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
resource "aws_security_group" "vpn" {
name_prefix = "${var.project_name}-vpn-"
vpc_id = module.vpc.vpc_id
description = "OpenVPN server security group"
# OpenVPN — open to the world (authentication via certificates)
ingress {
description = "OpenVPN"
from_port = 1194
to_port = 1194
protocol = "udp"
cidr_blocks = ["0.0.0.0/0"]
}
# SSH — restricted to your IP only
ingress {
description = "SSH"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [var.admin_ingress_cidr]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle { create_before_destroy = true }
}
resource "aws_instance" "vpn" {
ami = data.aws_ami.ubuntu.id
instance_type = var.openvpn_instance_type # t3.small
key_name = var.ssh_key_name
subnet_id = module.vpc.public_subnets[0]
vpc_security_group_ids = [aws_security_group.vpn.id]
associate_public_ip_address = true
source_dest_check = false # ← Critical for VPN routing!
root_block_device {
volume_size = 20
volume_type = "gp3"
}
user_data = templatefile("${path.module}/openvpn_userdata.sh", {
vpc_cidr = var.vpc_cidr
vpn_client_cidr = var.vpn_client_cidr
})
tags = { Name = "${var.project_name}-openvpn" }
}
resource "aws_eip" "vpn" {
instance = aws_instance.vpn.id
domain = "vpc"
tags = { Name = "${var.project_name}-vpn-eip" }
}
Why `source_dest_check = false`? By default, AWS drops traffic where the EC2 instance isn't the source or destination. Since the VPN server forwards traffic between VPN clients (10.8.0.0/24) and VPC resources (10.0.0.0/16), we must disable this check. Without it, all forwarded packets get silently dropped — your VPN connects but nothing works.
Step 5 — OpenVPN Bootstrap Script
This user data script does everything automatically on first boot: installs OpenVPN, generates a full PKI, configures the server, sets up NAT rules, and generates a ready-to-use .ovpn client profile:
#!/bin/bash
set -euo pipefail
exec > /var/log/openvpn-setup.log 2>&1
export DEBIAN_FRONTEND=noninteractive
apt-get update -y && apt-get upgrade -y
# Pre-seed iptables-persistent to avoid interactive prompts
echo iptables-persistent iptables-persistent/autosave_v4 boolean true | debconf-set-selections
echo iptables-persistent iptables-persistent/autosave_v6 boolean true | debconf-set-selections
apt-get install -y -o Dpkg::Options::='--force-confdef' \
-o Dpkg::Options::='--force-confold' openvpn easy-rsa iptables-persistent
# Enable IP forwarding
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p
# ── PKI setup ──────────────────────────────────────────────────────
EASY_RSA="/etc/openvpn/easy-rsa"
mkdir -p "$EASY_RSA"
cp -r /usr/share/easy-rsa/* "$EASY_RSA/"
cd "$EASY_RSA"
./easyrsa init-pki
EASYRSA_BATCH=1 ./easyrsa build-ca nopass
EASYRSA_BATCH=1 ./easyrsa build-server-full server nopass
EASYRSA_BATCH=1 ./easyrsa build-client-full client1 nopass
./easyrsa gen-dh
openvpn --genkey secret /etc/openvpn/ta.key
cp pki/ca.crt pki/issued/server.crt pki/private/server.key pki/dh.pem /etc/openvpn/
# ── Server config ─────────────────────────────────────────────────
cat > /etc/openvpn/server.conf <<'EOF'
port 1194
proto udp
dev tun
ca /etc/openvpn/ca.crt
cert /etc/openvpn/server.crt
key /etc/openvpn/server.key
dh /etc/openvpn/dh.pem
tls-auth /etc/openvpn/ta.key 0
server ${cidrhost(vpn_client_cidr, 0)} 255.255.255.0
topology subnet
push "route ${cidrhost(vpc_cidr, 0)} ${cidrnetmask(vpc_cidr)}"
push "dhcp-option DNS ${cidrhost(vpc_cidr, 2)}"
keepalive 10 120
cipher AES-256-GCM
auth SHA256
user nobody
group nogroup
persist-key
persist-tun
status /var/log/openvpn-status.log
log-append /var/log/openvpn.log
verb 3
EOF
# ── NAT / forwarding rules ────────────────────────────────────────
PRIMARY_IF=$(ip route | grep default | awk '{print $5}')
iptables -t nat -A POSTROUTING -s ${vpn_client_cidr} \
  -o "$PRIMARY_IF" -j MASQUERADE
iptables -A FORWARD -i tun0 -o "$PRIMARY_IF" -j ACCEPT
iptables -A FORWARD -i "$PRIMARY_IF" -o tun0 \
-m state --state RELATED,ESTABLISHED -j ACCEPT
netfilter-persistent save
systemctl enable openvpn@server
systemctl start openvpn@server
# ── Generate client .ovpn profile ─────────────────────────────────
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4 \
|| curl -s http://checkip.amazonaws.com)
mkdir -p /home/ubuntu/client-configs
cat > /home/ubuntu/client-configs/client1.ovpn <<CLIENTCONF
client
dev tun
proto udp
remote $PUBLIC_IP 1194
resolv-retry infinite
nobind
persist-key
persist-tun
remote-cert-tls server
cipher AES-256-GCM
auth SHA256
key-direction 1
verb 3
<ca>
$(cat /etc/openvpn/ca.crt)
</ca>
<cert>
$(openssl x509 -in "$EASY_RSA/pki/issued/client1.crt")
</cert>
<key>
$(cat "$EASY_RSA/pki/private/client1.key")
</key>
<tls-auth>
$(cat /etc/openvpn/ta.key)
</tls-auth>
CLIENTCONF
chown -R ubuntu:ubuntu /home/ubuntu/client-configs
chmod 600 /home/ubuntu/client-configs/client1.ovpn
echo "=== OpenVPN setup complete ==="
Key details to highlight:
- The pushed `route` directive tells VPN clients to send all VPC traffic (10.0.0.0/16) through the tunnel
- `push "dhcp-option DNS 10.0.0.2"` — pushes the VPC DNS resolver to clients. `10.0.0.2` is the Amazon-provided DNS (always the VPC CIDR base + 2). This is how `grafana.devops.private` resolves!
- `MASQUERADE` — rewrites VPN client source IPs to the server's VPC IP, so EKS and internal services accept the traffic
- `DEBIAN_FRONTEND=noninteractive` + `debconf-set-selections` — prevents `iptables-persistent` from hanging on interactive prompts in user data
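The "CIDR base + 2" rule is easy to sanity-check yourself. A minimal bash sketch (pure string arithmetic — only valid while the base's last octet stays small, which holds for typical /16 and /24 VPC CIDRs):

```shell
# Derive the Amazon-provided VPC DNS resolver: the CIDR base address plus 2.
vpc_dns() {
  local base="${1%/*}"            # strip the prefix length (e.g. "/16")
  local o1 o2 o3 o4
  IFS=. read -r o1 o2 o3 o4 <<< "$base"
  echo "$o1.$o2.$o3.$((o4 + 2))"
}

vpc_dns "10.0.0.0/16"    # → 10.0.0.2
```

The same rule gives `172.31.0.2` for the default VPC (`172.31.0.0/16`), which is why that address shows up so often in AWS DNS docs.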
Step 6 — Monitoring Stack (Prometheus + Grafana)
We deploy the kube-prometheus-stack Helm chart — the industry-standard monitoring bundle:
# monitoring.tf
resource "kubernetes_namespace" "monitoring" {
metadata {
name = "monitoring"
}
depends_on = [module.eks]
}
resource "helm_release" "kube_prometheus_stack" {
name = "kube-prometheus-stack"
namespace = kubernetes_namespace.monitoring.metadata[0].name
repository = "https://prometheus-community.github.io/helm-charts"
chart = "kube-prometheus-stack"
version = "65.1.0"
# ── Grafana ─────────────────────────────────────────────────────
set {
name = "grafana.enabled"
value = "true"
}
set {
name = "grafana.adminPassword"
value = var.grafana_admin_password
}
set {
name = "grafana.service.type"
value = "LoadBalancer"
}
# Internal LB annotations — type = "string" is critical!
set {
name = "grafana.service.annotations.service\\.beta\\.kubernetes\\.io/aws-load-balancer-internal"
value = "true"
type = "string"
}
set {
name = "grafana.service.annotations.service\\.beta\\.kubernetes\\.io/aws-load-balancer-scheme"
value = "internal"
type = "string"
}
set {
name = "grafana.service.port"
value = "80"
}
# ── Prometheus ──────────────────────────────────────────────────
set {
name = "prometheus.prometheusSpec.retention"
value = "7d"
}
set {
name = "prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.accessModes[0]"
value = "ReadWriteOnce"
}
set {
name = "prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage"
value = "20Gi"
}
depends_on = [module.eks]
}
Gotcha: `type = "string"` is mandatory on annotation `set` blocks. Without it, Terraform passes `"true"` as a boolean. Kubernetes annotations are `map[string]string` — you'll get `json: cannot unmarshal bool into Go struct field ObjectMeta.metadata.annotations of type string`. This one cost me 30 minutes of debugging.
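An alternative that sidesteps both the annotation-key escaping and the type coercion is passing a `values` block instead of `set` blocks — `yamlencode` keeps HCL strings as YAML strings. A sketch of the same Grafana settings:

```hcl
# Equivalent Grafana settings via `values` — annotation values stay strings
resource "helm_release" "kube_prometheus_stack" {
  # ... name, namespace, repository, chart, version as above ...
  values = [yamlencode({
    grafana = {
      enabled       = true
      adminPassword = var.grafana_admin_password
      service = {
        type = "LoadBalancer"
        port = 80
        annotations = {
          "service.beta.kubernetes.io/aws-load-balancer-internal" = "true"
          "service.beta.kubernetes.io/aws-load-balancer-scheme"   = "internal"
        }
      }
    }
  })]
}
```

This trades per-key diffs in `terraform plan` for much less escaping noise; either style works.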
Step 7 — Private DNS (Route 53)
The final piece — a private hosted zone that maps a clean domain to Grafana's internal load balancer:
# dns.tf
resource "aws_route53_zone" "private" {
name = "devops.private"
vpc {
vpc_id = module.vpc.vpc_id
}
tags = { Name = "${var.project_name}-private-zone" }
}
# Read the LB hostname after Helm deploys Grafana
data "kubernetes_service" "grafana" {
metadata {
name = "kube-prometheus-stack-grafana"
namespace = "monitoring"
}
depends_on = [helm_release.kube_prometheus_stack]
}
# CNAME: grafana.devops.private → internal-xxx.elb.amazonaws.com
resource "aws_route53_record" "grafana" {
zone_id = aws_route53_zone.private.zone_id
name = "grafana.devops.private"
type = "CNAME"
ttl = 300
records = [
data.kubernetes_service.grafana.status[0].load_balancer[0].ingress[0].hostname
]
}
The data "kubernetes_service" block reads the hostname that AWS assigns to Grafana's internal load balancer after the Helm chart deploys. This hostname becomes the CNAME target.
Step 8 — Variables & Outputs
# variables.tf
variable "aws_region" { default = "us-east-1" }
variable "project_name" { default = "eks-private" }
variable "vpc_cidr" { default = "10.0.0.0/16" }
variable "availability_zones" { default = ["us-east-1a", "us-east-1b"] }
variable "private_subnet_cidrs" { default = ["10.0.1.0/24", "10.0.2.0/24"] }
variable "public_subnet_cidrs" { default = ["10.0.101.0/24", "10.0.102.0/24"] }
variable "kubernetes_version" { default = "1.31" }
variable "node_instance_types" { default = ["t3.medium"] }
variable "node_desired_size" { default = 2 }
variable "node_min_size" { default = 1 }
variable "node_max_size" { default = 3 }
variable "openvpn_instance_type" { default = "t3.small" }
variable "vpn_client_cidr" { default = "10.8.0.0/24" }
variable "ssh_key_name" {
description = "Name of an existing EC2 key pair"
type = string
}
variable "admin_ingress_cidr" {
description = "Your public IP/32 for SSH access"
type = string
}
variable "grafana_admin_password" {
description = "Admin password for Grafana"
type = string
sensitive = true
default = "admin"
}
# outputs.tf
output "openvpn_public_ip" {
value = aws_eip.vpn.public_ip
}
output "ssh_to_vpn" {
value = "ssh -i <your-key.pem> ubuntu@${aws_eip.vpn.public_ip}"
}
output "configure_kubectl" {
description = "Run after connecting to VPN"
value = "aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}"
}
output "grafana_url" {
description = "Grafana URL (accessible only through VPN)"
value = "http://grafana.devops.private"
}
output "grafana_lb_hostname" {
value = data.kubernetes_service.grafana.status[0].load_balancer[0].ingress[0].hostname
}
Deploying & Connecting
1. Initialize and Apply
# Create terraform.tfvars
cat > terraform.tfvars <<EOF
aws_region = "us-east-1"
project_name = "eks-private"
ssh_key_name = "your-keypair-name"
admin_ingress_cidr = "YOUR_PUBLIC_IP/32"
grafana_admin_password = "YourSecurePassword"
EOF
terraform init
terraform apply
This creates approximately 60 resources — VPC, subnets, NAT gateway, EKS cluster, managed node group, OpenVPN EC2, security groups, Elastic IP, Helm releases, Route 53 zone, and DNS records.
2. Download VPN Profile & Connect
# Get the VPN public IP from outputs
VPN_IP=$(terraform output -raw openvpn_public_ip)
# Download the auto-generated client profile
scp -i your-key.pem ubuntu@${VPN_IP}:/home/ubuntu/client-configs/client1.ovpn .
# Connect to VPN
sudo openvpn --config client1.ovpn
Wait for Initialization Sequence Completed. You now have a tunnel into the VPC.
3. Configure DNS Resolution
Your local DNS resolver doesn't know about Route 53 private zones. Fix that:
# Route DNS through the VPN tunnel to VPC DNS
sudo resolvectl dns tun0 10.0.0.2
sudo resolvectl domain tun0 "~."
4. Access Your Cluster
# Configure kubectl
aws eks update-kubeconfig --region us-east-1 --name eks-private-cluster
# Verify
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-1-xxx.ec2.internal Ready <none> 15m v1.31.x
ip-10-0-2-xxx.ec2.internal Ready <none> 15m v1.31.x
5. Open Grafana
Navigate to http://grafana.devops.private in your browser.
Login: admin / your configured password.
You get pre-built dashboards for:
- Kubernetes cluster health and resource utilization
- Node CPU, memory, disk, and network metrics
- Pod-level resource consumption
- Prometheus self-monitoring
Gotchas I Hit (So You Don't Have To)
1. "You must be logged in to the server"
Cause: EKS module v20+ no longer auto-grants admin access to the cluster creator.
Fix: Add enable_cluster_creator_admin_permissions = true to the EKS module.
2. Helm annotation boolean marshaling error
json: cannot unmarshal bool into Go struct field
ObjectMeta.metadata.annotations of type string
Cause: Terraform passes "true" as a boolean, but K8s annotations must be strings.
Fix: Add type = "string" to annotation set blocks.
3. iptables-persistent hangs during user data
Cause: The package prompts interactively — even in automated scripts.
Fix: Pre-seed with debconf-set-selections and use DEBIAN_FRONTEND=noninteractive.
4. Private DNS doesn't resolve on your machine
Cause: Your local DNS resolver (127.0.0.53) doesn't know about VPC private zones.
Fix: sudo resolvectl dns tun0 10.0.0.2 && sudo resolvectl domain tun0 "~." — routes DNS through VPN to the VPC DNS server.
5. EKS version jumps fail
Cause: AWS only allows upgrading one minor version at a time (e.g., 1.29 → 1.30, not 1.29 → 1.33).
Fix: Increment cluster_version one step at a time, running terraform apply for each.
Security Posture
| Attack Surface | Status |
|---|---|
| Kubernetes API | Not internet-facing — private endpoint only |
| Grafana / Prometheus | Not internet-facing — internal LB only |
| OpenVPN | UDP:1194 open, but certificate-authenticated (PKI + TLS-auth HMAC) |
| SSH to VPN server | Restricted to admin IP (`admin_ingress_cidr`) |
| DNS records | Private hosted zone — not resolvable outside VPC |
| IAM / RBAC | EKS Access Entries — IAM-native, auditable |
The only internet-facing resource is the OpenVPN server on UDP:1194, protected by mutual TLS authentication with a pre-shared HMAC key.
Cost Breakdown
| Resource | ~Monthly Cost |
|---|---|
| EKS Control Plane | $73 |
| 2x t3.medium Workers | $60 |
| NAT Gateway + Data | $32+ |
| t3.small OpenVPN | $15 |
| Internal Load Balancer | $16 |
| Elastic IP | $3.65 |
| EBS (20Gi Prometheus + roots) | ~$5 |
| Total | ~$205/mo |
Cost optimization ideas: Spot instances for workers, Graviton (t4g) for ~20% savings, scheduled scaling for dev/staging environments.
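Those ideas map directly onto the node group block from Step 3. A sketch (instance types, AMI type, and sizes are illustrative, not tested values from this project):

```hcl
# Spot + Graviton worker node group — roughly halves worker cost
eks_managed_node_groups = {
  spot_arm = {
    capacity_type  = "SPOT"                    # spot pricing for workers
    ami_type       = "AL2023_ARM_64_STANDARD"  # Graviton requires an ARM AMI
    instance_types = ["t4g.medium", "t4g.large"]
    min_size       = 1
    max_size       = 3
    desired_size   = 2
  }
}
```

Listing several instance types gives the Spot allocator more pools to draw from, which reduces interruption churn.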
Wrapping Up
We built a production-grade, fully private EKS cluster with:
- Zero public Kubernetes API exposure
- Self-hosted OpenVPN with automated PKI and client profile generation
- Prometheus + Grafana monitoring via internal load balancer
- Route 53 private DNS — `grafana.devops.private`
- 100% Terraform — reproducible, version-controlled, auditable
This architecture eliminates an entire class of attack vectors by making the Kubernetes API server unreachable from the internet. Combined with certificate-based VPN authentication and private DNS, it provides a secure, practical setup for teams that take infrastructure security seriously.
The complete source code is on GitHub
