Paweł Swiridow for u11d

Posted on • Originally published at u11d.com

Secure Access to Private EKS Clusters Without Bastion Hosts Using SSM

Accessing Private EKS Clusters Without Losing Your Mind

Locking down your Kubernetes control plane is a basic requirement for any production environment. Exposing the EKS API server to the public internet is just asking for automated scanners to ruin your weekend. However, securing the endpoint creates an operational headache: how do you actually run kubectl when the API is sealed inside a private subnet?

The traditional answer was a bastion host. But managing SSH keys, rotating credentials, and maintaining yet another publicly exposed EC2 instance is tedious. We all know that a "temporary" bastion host spun up on a Friday afternoon will inevitably become a load-bearing production pillar by Monday.

Instead, we can use AWS Systems Manager (SSM) Session Manager. By leveraging the SSM agent already running on your EKS worker nodes, we can securely tunnel our local traffic directly to the private API endpoint without opening inbound ports or managing SSH keys.

The Mechanics of the SSM Tunnel

The flow is straightforward:

  1. Your local machine initiates an SSM port forwarding session targeting a specific EKS worker node.
  2. The SSM session is instructed to forward traffic to a remote host (the private EKS API endpoint hostname) on port 443.
  3. You update your kubeconfig to point to localhost on your chosen forwarded port.

Because the worker node is already in the VPC and authorized to talk to the EKS control plane, it acts as a highly secure, identity-aware proxy. Access is governed entirely by IAM, meaning you can audit every connection via CloudTrail.
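Since every tunnel shows up as an `ssm:StartSession` API call, auditing access is a CloudTrail query away. A sketch of such a query, with the region and lookback defaults left illustrative:

```shell
# List recent StartSession events to see who opened tunnels and when.
aws cloudtrail lookup-events \
  --region us-east-1 \
  --lookup-attributes AttributeKey=EventName,AttributeValue=StartSession \
  --max-results 10 \
  --query "Events[].{Time:EventTime,User:Username}" \
  --output table
```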

Prerequisite: IAM Configuration

For this to work, your EKS worker nodes must have the SSM agent installed (the official EKS-optimized AMIs include it by default) and the correct IAM permissions.

Here is a Terraform snippet demonstrating how to attach the necessary SSM policy to your existing EKS node IAM role.

# Assumes you already have an aws_iam_role defined for your worker nodes
# named 'eks_node_role'

resource "aws_iam_role_policy_attachment" "ssm_managed_instance_core" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.eks_node_role.name
}

# Optional but recommended: restrict who can start sessions in IAM.
# Note the two statements: the tag condition must apply only to the
# instance resource. If it also covered the SSM document (which carries
# no such tag), the condition would fail and StartSession would be denied.
resource "aws_iam_policy" "ssm_user_access" {
  name        = "EKS-SSM-Tunnel-Access"
  description = "Allows users to port forward to EKS nodes"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = "ssm:StartSession"
        Resource = "arn:aws:ec2:*:*:instance/*"
        Condition = {
          StringEquals = {
            "ssm:resourceTag/eks:cluster-name" = "my-production-cluster"
          }
        }
      },
      {
        Effect   = "Allow"
        Action   = "ssm:StartSession"
        Resource = "arn:aws:ssm:*:*:document/AWS-StartPortForwardingSessionToRemoteHost"
      }
    ]
  })
}

This ensures your nodes can communicate with the SSM service and restricts which IAM users can actually initiate the tunnel.
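Before attempting the tunnel, it is worth confirming that the nodes actually registered with SSM. A quick check, assuming a managed node group (which tags its instances with eks:cluster-name automatically):

```shell
# List EKS worker nodes registered in SSM and their agent health.
aws ssm describe-instance-information \
  --region us-east-1 \
  --filters "Key=tag:eks:cluster-name,Values=my-production-cluster" \
  --query "InstanceInformationList[].{Id:InstanceId,Ping:PingStatus,Agent:AgentVersion}" \
  --output table
```

If the list is empty, check the node role's policy attachment and the node's outbound connectivity to the SSM endpoints before debugging anything else.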

Establishing the Tunnel

Once the nodes are registered in SSM, you need a script to extract a valid instance ID, locate the cluster API endpoint, and start the tunnel.

Here is a Bash script you can execute locally to handle the heavy lifting. It requires the AWS CLI and the Session Manager plugin to be installed on your workstation.

#!/bin/bash
set -euo pipefail

CLUSTER_NAME="my-production-cluster"
REGION="us-east-1"
LOCAL_PORT="8443"

# Fetch the private endpoint of the EKS cluster
echo "Fetching EKS endpoint for ${CLUSTER_NAME}..."
EKS_ENDPOINT=$(aws eks describe-cluster \
  --name "${CLUSTER_NAME}" \
  --region "${REGION}" \
  --query "cluster.endpoint" \
  --output text | sed 's/https:\/\///')

# Find an active worker node instance ID using tags
echo "Finding an active worker node..."
INSTANCE_ID=$(aws ec2 describe-instances \
  --region "${REGION}" \
  --filters "Name=tag:eks:cluster-name,Values=${CLUSTER_NAME}" "Name=instance-state-name,Values=running" \
  --query "Reservations[0].Instances[0].InstanceId" \
  --output text)

if [ "$INSTANCE_ID" == "None" ]; then
  echo "Error: No running worker nodes found."
  exit 1
fi

echo "Establishing SSM tunnel through ${INSTANCE_ID} to ${EKS_ENDPOINT}..."
echo "Leave this terminal open. Access EKS via https://localhost:${LOCAL_PORT}"

# Start the port forwarding session
aws ssm start-session \
  --region "${REGION}" \
  --target "${INSTANCE_ID}" \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters "{\"host\":[\"${EKS_ENDPOINT}\"],\"portNumber\":[\"443\"],\"localPortNumber\":[\"${LOCAL_PORT}\"]}"

Run this script, and it will bind localhost:8443 to the private API endpoint.
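The inline JSON passed to --parameters is easy to get wrong when quoting by hand. One option is to isolate the escaping in a small helper; build_params below is a hypothetical convenience function, not part of the AWS CLI:

```shell
# Build the --parameters JSON for AWS-StartPortForwardingSessionToRemoteHost
# from its three inputs. Pure printf, so quoting stays predictable.
build_params() {
  local host="$1" remote_port="$2" local_port="$3"
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}' \
    "$host" "$remote_port" "$local_port"
}

# Example:
build_params "1234567890ABCDEF.yl4.us-east-1.eks.amazonaws.com" 443 8443
# → {"host":["1234567890ABCDEF.yl4.us-east-1.eks.amazonaws.com"],"portNumber":["443"],"localPortNumber":["8443"]}
```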

Updating Kubeconfig

The final step is modifying your local Kubernetes configuration. You cannot simply run aws eks update-kubeconfig and call it a day, because that writes the private AWS endpoint into the server field, and your machine still cannot route to it directly.

You need to manually alter the server field for your cluster to point to the local port.

When you port-forward the EKS API server to your local machine, connecting to https://localhost:8443 introduces a new problem. The API server presents a TLS certificate minted for its internal AWS endpoint (e.g., 1234567890ABCDEF.yl4.us-east-1.eks.amazonaws.com), not localhost.

The quick, dirty fix is to add insecure-skip-tls-verify: true to your kubeconfig. But nothing screams "I definitely passed my SOC2 audit" quite like explicitly disabling TLS validation in production. It is the infrastructure equivalent of putting black tape over a check engine light.

Instead of turning off validation, we can instruct kubectl to connect via our local port but validate the TLS certificate against the actual EKS endpoint hostname. We do this by utilizing the tls-server-name parameter.

apiVersion: v1
clusters:
- cluster:
    server: https://localhost:8443
    # Validate the certificate against the real AWS endpoint
    tls-server-name: 1234567890ABCDEF.yl4.us-east-1.eks.amazonaws.com
  name: arn:aws:eks:us-east-1:123456789012:cluster/my-production-cluster
# ... contexts and users remain unchanged
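Editing the YAML by hand works, but recent kubectl versions can make the same change from the command line. The cluster name and endpoint below are the same placeholders as in the example kubeconfig:

```shell
# Point the cluster entry at the local tunnel while still validating
# the certificate against the real EKS hostname.
kubectl config set-cluster \
  "arn:aws:eks:us-east-1:123456789012:cluster/my-production-cluster" \
  --server=https://localhost:8443 \
  --tls-server-name=1234567890ABCDEF.yl4.us-east-1.eks.amazonaws.com
```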

Once saved, kubectl get pods will route securely through the SSM tunnel, across the worker node, and hit the control plane.

Wrap-Up

Relying on SSM port forwarding eliminates the need for VPNs, bastion hosts, and complex routing rules just to run operational commands against an isolated EKS cluster. By utilizing the existing IAM-integrated agent on your worker nodes, you shrink your external attack surface while maintaining strict audit trails for developer access.
