Marco Braga

Install OpenShift in the cloud edge with AWS Local Zones

This article describes the steps to install an OpenShift cluster in an existing VPC with Local Zones subnets, extending compute nodes to edge locations with MachineSets.

Reference Architecture

The following network assets will be created in this article:

  • 1 VPC with CIDR 10.0.0.0/16
  • 4 Public subnets on the zones: us-east-1a, us-east-1b, us-east-1c, us-east-1-nyc-1a
  • 3 Private subnets on the zones: us-east-1a, us-east-1b, us-east-1c
  • 3 NAT Gateways, one for each private subnet
  • 1 Internet Gateway
  • 4 route tables: 3 for the private subnets and 1 for the public subnets

The following OpenShift cluster nodes will be created:

  • 3 Control Plane nodes running in the subnets on the "parent region" (us-east-1{a,b,c})
  • 3 Compute nodes (Machine Set) running in the subnets on the "parent region" (us-east-1{a,b,c})
  • 1 Compute node (Machine Set) running in the edge location us-east-1-nyc-1a (NYC Local Zone)

Requirements

  • OpenShift CLI (oc)
  • AWS CLI (aws)

Preparing the environment

  • Export the common environment variables (change me)
export VERSION=4.11.0
export PULL_SECRET_FILE=${HOME}/.openshift/pull-secret-latest.json
export SSH_PUB_KEY_FILE="${HOME}/.ssh/id_rsa.pub"
  • Install the clients
oc adm release extract \
    --tools "quay.io/openshift-release-dev/ocp-release:${VERSION}-x86_64" \
    -a "${PULL_SECRET_FILE}"

tar xvfz openshift-client-linux-${VERSION}.tar.gz
tar xvfz openshift-install-linux-${VERSION}.tar.gz

Opt-in the Local Zone locations

For each Local Zone location, you must opt in through the EC2 configuration; locations are opted out by default.

You can use describe-availability-zones to check the locations available in the region where your cluster will run.

Export the region where your OpenShift cluster will be created:

export CLUSTER_REGION="us-east-1"
# Using NYC Local Zone (choose yours)
export ZONE_GROUP_NAME="${CLUSTER_REGION}-nyc-1a"

Check the AZs available in your region:

aws ec2 describe-availability-zones \
    --filters Name=region-name,Values=${CLUSTER_REGION} \
    --query 'AvailabilityZones[].ZoneName' \
    --all-availability-zones

Depending on the region, that list can be long. Things you need to know:

  • ${REGION}[a-z] : Availability Zones available in the Region (parent)
  • ${REGION}-LID-N[a-z] : Local Zones available, where LID-N is the location identifier, and [a-z] is the zone identifier.
  • ${REGION}-wl1-LID-wlz-[1-9] : Available Wavelength zones
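If that list is too long, you can filter by zone type to show only the Local Zones. A minimal sketch using the zone-type filter of describe-availability-zones:

aws ec2 describe-availability-zones \
    --all-availability-zones \
    --filters Name=zone-type,Values=local-zone \
    --query 'AvailabilityZones[].[ZoneName,GroupName,OptInStatus]' \
    --output table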

Opt in the location in your AWS account; in this example, US East (New York):

aws ec2 modify-availability-zone-group \
    --group-name "${ZONE_GROUP_NAME}" \
    --opt-in-status opted-in

Steps to create the Cluster

Create the network stack

The network stack steps describe how to:

  • create the network (VPC, subnets, NAT Gateways) in the parent/main zones
  • create the subnet on the Local Zone location

Create the network (VPC and dependencies)

The first step is to create the network resources in the zones located in the parent region. These steps reuse the VPC stack described in the documentation[1], adapting it to tag the subnets with the proper values[2] used by the Kubernetes Controller Manager to discover the subnets where the Load Balancer for the default router (ingress) is created.

[1] OpenShift documentation / CloudFormation template for the VPC

[2] AWS Load Balancer Controller / Subnet Auto Discovery
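For reference, the subnet role tags from the auto-discovery convention[2] can also be applied manually with the CLI. The commands below are for illustration only: the subnet IDs are placeholders, and the CloudFormation template used in this article already tags the subnets for you.

# Public subnets are discovered for internet-facing load balancers
aws ec2 create-tags --resources subnet-0aaa1111bbb222333 \
    --tags Key=kubernetes.io/role/elb,Value=1

# Private subnets are discovered for internal load balancers
aws ec2 create-tags --resources subnet-0ccc4444ddd555666 \
    --tags Key=kubernetes.io/role/internal-elb,Value=1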

Steps to create the VPC stack:

  • Set the environment variables
export CLUSTER_NAME="lzdemo"
export VPC_CIDR="10.0.0.0/16"
  • Create the Template vars file
cat <<EOF | envsubst > ./stack-vpc-vars.json
[
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "${CLUSTER_NAME}"
  },
  {
    "ParameterKey": "VpcCidr",
    "ParameterValue": "${VPC_CIDR}"
  },
  {
    "ParameterKey": "AvailabilityZoneCount",
    "ParameterValue": "3"
  },
  {
    "ParameterKey": "SubnetBits",
    "ParameterValue": "12"
  }
]
EOF
STACK_VPC=${CLUSTER_NAME}-vpc
STACK_VPC_TPL="${PWD}/ocp-aws-local-zones-day-0_cfn-net-vpc.yaml"
STACK_VPC_VARS="${PWD}/stack-vpc-vars.json"
aws cloudformation create-stack --stack-name ${STACK_VPC} \
     --template-body file://${STACK_VPC_TPL} \
     --parameters file://${STACK_VPC_VARS}
  • Wait for the stack to be completed (StackStatus=CREATE_COMPLETE)
aws cloudformation describe-stacks --stack-name ${STACK_VPC}
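Alternatively, the CloudFormation wait command blocks until the stack reaches CREATE_COMPLETE (or fails):

aws cloudformation wait stack-create-complete --stack-name ${STACK_VPC}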
  • (optional) Update the stack
aws cloudformation update-stack \
  --stack-name ${STACK_VPC} \
  --template-body file://${STACK_VPC_TPL} \
  --parameters file://${STACK_VPC_VARS}

Create the Local Zones subnet

  • Set the environment variables to create the Local Zone subnet
export CLUSTER_REGION="us-east-1"
export LZ_ZONE_NAME="${CLUSTER_REGION}-nyc-1a"
export LZ_ZONE_SHORTNAME="nyc1"
export LZ_ZONE_CIDR="10.0.128.0/20"

export VPC_ID=$(aws cloudformation describe-stacks \
  --stack-name ${STACK_VPC} \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="VpcId").OutputValue' )
export VPC_RTB_PUB=$(aws cloudformation describe-stacks \
  --stack-name ${STACK_VPC} \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="PublicRouteTableId").OutputValue' )
  • Create the template vars file
cat <<EOF | envsubst > ./stack-lz-vars-${LZ_ZONE_SHORTNAME}.json
[
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "${CLUSTER_NAME}"
  },
  {
    "ParameterKey": "VpcId",
    "ParameterValue": "${VPC_ID}"
  },
  {
    "ParameterKey": "PublicRouteTableId",
    "ParameterValue": "${VPC_RTB_PUB}"
  },
  {
    "ParameterKey": "LocalZoneName",
    "ParameterValue": "${LZ_ZONE_NAME}"
  },
  {
    "ParameterKey": "LocalZoneNameShort",
    "ParameterValue": "${LZ_ZONE_SHORTNAME}"
  },
  {
    "ParameterKey": "PublicSubnetCidr",
    "ParameterValue": "${LZ_ZONE_CIDR}"
  }
]
EOF
STACK_LZ=${CLUSTER_NAME}-lz-${LZ_ZONE_SHORTNAME}
STACK_LZ_TPL="${PWD}/ocp-aws-local-zones-day-0_cfn-net-lz.yaml"
STACK_LZ_VARS="${PWD}/stack-lz-vars-${LZ_ZONE_SHORTNAME}.json"
aws cloudformation create-stack \
  --stack-name ${STACK_LZ} \
  --template-body file://${STACK_LZ_TPL} \
  --parameters file://${STACK_LZ_VARS}
  • Check the status (wait for the stack to finish)
aws cloudformation describe-stacks --stack-name ${STACK_LZ}

Repeat the steps above for each location.

Create the installer configuration

  • Set the variables used in the installer configuration
export BASE_DOMAIN="devcluster.openshift.com"

# Parent region (main) subnets only: Public and Private
mapfile -t SUBNETS < <(aws cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[0].OutputValue' | tr ',' '\n')
mapfile -t -O "${#SUBNETS[@]}" SUBNETS < <(aws cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[1].OutputValue' | tr ',' '\n')
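The snippet above relies on the position of the stack outputs. If you prefer to select them by key, a variant like the one below is less fragile; the output key names PrivateSubnetIds and PublicSubnetIds are assumptions here, so confirm them against the Outputs section of your VPC template:

mapfile -t SUBNETS < <(aws cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="PrivateSubnetIds").OutputValue' | tr ',' '\n')
mapfile -t -O "${#SUBNETS[@]}" SUBNETS < <(aws cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="PublicSubnetIds").OutputValue' | tr ',' '\n')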
  • Create the install-config.yaml file, referencing the subnets just created (parent region only)

Adapt it to your needs; the requirement is to set the field platform.aws.subnets to the subnet IDs just created.

cat <<EOF > ${PWD}/install-config.yaml
apiVersion: v1
publish: External
baseDomain: ${BASE_DOMAIN}
metadata:
  name: "${CLUSTER_NAME}"
platform:
  aws:
    region: ${CLUSTER_REGION}
    subnets:
$(for SB in ${SUBNETS[*]}; do echo "    - $SB"; done)
pullSecret: '$(cat ${PULL_SECRET_FILE} |awk -v ORS= -v OFS= '{$1=$1}1')'
sshKey: |
  $(cat ${SSH_PUB_KEY_FILE})
EOF
  • (Optional) Back up the install-config.yaml
cp -v ${PWD}/install-config.yaml \
    ${PWD}/install-config-bkp.yaml

Create the installer manifests

  • Create the manifests
./openshift-install create manifests
  • Get the InfraId used in the next sections
export CLUSTER_ID="$(awk '/infrastructureName: / {print $2}' manifests/cluster-infrastructure-02-config.yml)"

Create the Machine Set manifest for Local Zones pool

  • Set the variables used to create the Machine Set

Adapt the instance type as needed, according to what is supported in the Local Zone.

export INSTANCE_TYPE="c5d.2xlarge"

export AMI_ID=$(grep ami \
  openshift/99_openshift-cluster-api_worker-machineset-0.yaml \
  | tail -n1 | awk '{print$2}')
export SUBNET_ID=$(aws cloudformation describe-stacks \
  --stack-name "${STACK_LZ}" \
  | jq -r .Stacks[0].Outputs[0].OutputValue)
  • Create the Machine Set for nyc1 nodes

publicIp: true must be set to deploy the node in the public subnet in the Local Zone.

The public IP mapping is used only to provide access to the internet (required). Optionally, you can modify the network topology to use a private subnet, associating the Local Zone private subnet with a route table that has the correct routes to the internet, or explore disconnected installations. Neither option is covered in this article.

cat <<EOF > manifests/99_openshift-cluster-api_worker-machineset-nyc1.yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
  name: ${CLUSTER_ID}-edge-${LZ_ZONE_NAME}
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
      machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-edge-${LZ_ZONE_NAME}
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
        machine.openshift.io/cluster-api-machine-role: edge
        machine.openshift.io/cluster-api-machine-type: edge
        machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-edge-${LZ_ZONE_NAME}
    spec:
      metadata:
        labels:
          location: local-zone
          zone_group: ${LZ_ZONE_NAME::-1}
          node-role.kubernetes.io/edge: ""
      taints:
        - key: node-role.kubernetes.io/edge
          effect: NoSchedule
      providerSpec:
        value:
          ami:
            id: ${AMI_ID}
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
          - ebs:
              volumeSize: 120
              volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: ${CLUSTER_ID}-worker-profile
          instanceType: ${INSTANCE_TYPE}
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: ${LZ_ZONE_NAME}
            region: ${CLUSTER_REGION}
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - ${CLUSTER_ID}-worker-sg
          subnet:
            id: ${SUBNET_ID}
          publicIp: true
          tags:
          - name: kubernetes.io/cluster/${CLUSTER_ID}
            value: owned
          userDataSecret:
            name: worker-user-data
EOF

Create IngressController manifest to use NLB (optional)

The OCP version used in this article uses a Classic Load Balancer for the default router. This option forces the use of an NLB by default instead.

This section is based on the official documentation.

  • Create the IngressController manifest to use NLB by default
cat <<EOF > manifests/cluster-ingress-default-ingresscontroller.yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: null
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    loadBalancer:
      scope: External
      providerParameters:
        type: AWS
        aws:
          type: NLB
    type: LoadBalancerService
EOF

Update the VPC tag with the InfraID

This step is required when the ELB Operator (not covered here) will be installed. It updates the VPC "cluster tag" with the InfraID value.

Common error when installing the ELB Operator without setting the cluster tag: ERROR setup failed to get VPC ID {"error": "no VPC with tag \"kubernetes.io/cluster/<infra_id>\" found"}

  1. Edit the CloudFormation Template var file of the VPC stack
cat <<EOF | envsubst > ./stack-vpc-vars.json
[
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "${CLUSTER_NAME}"
  },
  {
    "ParameterKey": "ClusterInfraId",
    "ParameterValue": "${CLUSTER_ID}"
  },
  {
    "ParameterKey": "VpcCidr",
    "ParameterValue": "${VPC_CIDR}"
  },
  {
    "ParameterKey": "AvailabilityZoneCount",
    "ParameterValue": "3"
  },
  {
    "ParameterKey": "SubnetBits",
    "ParameterValue": "12"
  }
]
EOF
  2. Update the stack
aws cloudformation update-stack \
  --stack-name ${STACK_VPC} \
  --template-body file://${STACK_VPC_TPL} \
  --parameters file://${STACK_VPC_VARS}

Install the cluster

Now it's time to create the cluster and check the results.

  • Create the cluster
./openshift-install create cluster --log-level=debug
  • Install summary
DEBUG Time elapsed per stage:
DEBUG            cluster: 4m28s
DEBUG          bootstrap: 36s
DEBUG Bootstrap Complete: 10m30s
DEBUG                API: 2m18s
DEBUG  Bootstrap Destroy: 57s
DEBUG  Cluster Operators: 8m39s
INFO Time elapsed: 25m50s
  • Cluster Operators summary
$ oc get co -o json \
    | jq -r ".items[].status.conditions[] | select(.type==\"Available\").status" \
    | sort |uniq -c
     32 True

$ oc get co -o json \
    | jq -r ".items[].status.conditions[] | select(.type==\"Degraded\").status" \
    | sort |uniq -c
     32 False
  • Machines in Local Zones
$ oc get machines -n openshift-machine-api \
  -l machine.openshift.io/zone=us-east-1-nyc-1a
NAME                                       PHASE     TYPE          REGION      ZONE               AGE
lzdemo-ds2dn-edge-us-east-1-nyc-1a-6645q   Running   c5d.2xlarge   us-east-1   us-east-1-nyc-1a   12m
  • Nodes in Local Zones, filtered by the custom labels defined in the Machine Set (location, zone_group)
$ oc get nodes -l location=local-zone
NAME                           STATUS   ROLES         AGE   VERSION
ip-10-0-143-104.ec2.internal   Ready    edge,worker   11m   v1.24.0+beaaed6

Steps to Destroy the Cluster

To destroy the resources created, you need to first delete the cluster and then the CloudFormation stacks used to build the network.

  • Destroy the cluster
./openshift-install destroy cluster --log-level=debug
  • Destroy the Local Zone subnet(s) stack(s)
aws cloudformation delete-stack --stack-name ${STACK_LZ}
  • Destroy the Network Stack (VPC)
aws cloudformation delete-stack --stack-name ${STACK_VPC}

Final notes / Conclusion

The OpenShift cluster can be installed successfully in an existing VPC that has subnets in Local Zones when the tags are set correctly, so new MachineSets can be added for any new location.

No technical blocker was found for installing an OpenShift cluster in an existing VPC with subnets in AWS Local Zones, although some configuration must be asserted to avoid issues with the default router and the ELB Operator.

As described in the steps above, the setup created one MachineSet with a NoSchedule taint and the node-role.kubernetes.io/edge='' label. The suggestion of creating a custom MachineSet with the edge role is to keep management simple for resources running in Local Zones, which are in general more expensive than the parent zones (roughly 20% higher). This is a design pattern: the label topology.kubernetes.io/zone can be combined with taint rules when operating in many locations.
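To illustrate the pattern, here is a minimal sketch of a workload pinned to the edge node: it selects the custom location label and tolerates the edge taint created by the MachineSet above (the Deployment name and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-app            # placeholder name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-app
  template:
    metadata:
      labels:
        app: edge-app
    spec:
      nodeSelector:
        location: local-zone              # custom label set by the edge MachineSet
      tolerations:
      - key: node-role.kubernetes.io/edge # tolerate the NoSchedule taint on edge nodes
        operator: Exists
        effect: NoSchedule
      containers:
      - name: app
        image: registry.access.redhat.com/ubi8/ubi-minimal:latest  # placeholder image
        command: ["sleep", "infinity"]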

The installation process runs correctly as a Day-0 operation. The only limitation found when installing was the Ingress Controller trying to discover all the public subnets in the VPC to create the service for the default router. The workaround was to tag the Local Zone subnets with kubernetes.io/cluster/unmanaged=true so that subnet auto discovery does not include the Local Zone subnets in the default router.

Additionally, when installing the ALB Operator on Day 2 (available in 4.11), the operator requires the cluster tag kubernetes.io/cluster/<infraID>=.* to run successfully, although the installer does not require it when installing a cluster in an existing VPC[1]. The steps to use the ALB for services deployed in Local Zones, exploring the low-latency feature, are not covered in this document; an experiment creating the operator from source can be found here.

Resources produced:

  • UPI CloudFormation template for the VPC, reviewed and updated
  • New CloudFormation template to create the Local Zone subnets
  • Steps for installing OpenShift 4.11 with support for compute nodes in Local Zones

Takeaways / Important notes:

  • The Local Zone subnets should have the tag kubernetes.io/cluster/unmanaged=true to prevent the load balancer controller's subnet discovery from automatically adding the Local Zone subnets to the default router (see the tagging example after this list).
  • The VPC should have the tag kubernetes.io/cluster/<infraID>=shared to correctly install the AWS ELB Operator (not covered in this post).
  • Local Zones do not support NAT Gateways, so there are two options for nodes in Local Zones to access the internet:

    1) Create a private subnet, associating the Local Zone subnet with one of the parent region route tables, then create the machine in the private subnet without mapping a public IP.
    2) Use a public subnet in the Local Zone and map a public IP to the instance (Machine spec). There are no extra security concerns, as the Security Group rules block all access from outside the VPC (default installation), although the NLB rules in the security groups are less restrictive; for that reason, Option 1 may be preferable until this is improved. Note that Option 1 also implies extra data transfer fees from the instance in the Local Zone to the parent zone, in addition to the standard internet costs.
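If you create Local Zone subnets outside the CloudFormation template provided in this article (the template already applies the tag), a command like the following adds it; the subnet ID is a placeholder:

aws ec2 create-tags \
  --resources subnet-0777888999aaabbbc \
  --tags Key=kubernetes.io/cluster/unmanaged,Value=true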
