Oluwafemi Lawal for AWS Community Builders


Navigating AWS EKS with Terraform: Understanding VPC Essentials for EKS Cluster Management

AWS EKS (Elastic Kubernetes Service) offers a lighthouse for those navigating the complex waters of cloud computing, providing a managed Kubernetes service that simplifies running containerized applications. While EKS handles much of the heavy lifting of setting up a Kubernetes cluster, a deeper understanding of its components and requirements can significantly enhance your cloud infrastructure's efficiency and security.

Prerequisites

Embarking on this journey requires:

  • A grasp of AWS services and Kubernetes fundamentals
  • The AWS CLI, configured with your credentials
  • Terraform, to define infrastructure as code

Project directory structure:



.
├── modules
│   └── aws
│       └── vpc
│            ├── main.tf
│            ├── outputs.tf
│            ├── variables.tf
│            └── versions.tf
├── variables.tf
├── versions.tf
└── vpc.tf



versions.tf file content:



terraform {
  required_version = "1.5.1"

  required_providers {
    aws = {
      version = "5.20.0"
      source  = "hashicorp/aws"
    }
  }

  backend "s3" {
    bucket         = ""
    key            = ""
    region         = ""
    dynamodb_table = ""
    encrypt        = true
  }
}

provider "aws" {
  region = var.region
  default_tags {
    tags = var.global_tags
  }
}



You can remove the S3 backend configuration if you want to store state locally; otherwise, make sure the S3 bucket and DynamoDB table referenced in the backend block already exist before you run terraform init.
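
For example, a filled-in backend block might look like this (the bucket and table names are placeholders, substitute your own pre-created resources):


  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "eks-demo/vpc/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
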

variables.tf file:



variable "region" {
  type        = string
  default     = "eu-west-1"
  description = "Target AWS region"
}

variable "cluster_name" {
  type        = string
  default     = "demo-cluster"
  description = "Name of the EKS cluster"
}

variable "aws_account_number" {
  type        = number
  description = "AWS account number used for deployment."
}

variable "global_tags" {
  type = map(string)
  default = {
    "ManagedBy"   = "Terraform"
    "Environment" = "dev"
  }
}



Mastering VPC Networking for AWS EKS Clusters

When deploying a Kubernetes cluster in AWS using the Elastic Kubernetes Service (EKS), understanding the underlying Virtual Private Cloud (VPC) networking is crucial. It not only ensures secure communication within your cluster but also affects your cluster's interaction with the outside world, including how services are exposed and how resources are accessed.

The Importance of VPC Networking in EKS

VPC networking forms the backbone of your EKS cluster, dictating everything from pod communication to external access via load balancers. Here’s why getting your VPC setup right is critical:

EKS Cluster architecture

  • Security: Properly configured VPCs contain and protect your cluster, allowing only authorized access.
  • Connectivity: Your VPC setup determines how your cluster communicates with other AWS services, the internet, and on-premises networks.
  • Service Discovery & Load Balancing: Integrating with AWS services like Elastic Load Balancers (ELB) requires specific VPC configurations for seamless operation.

Architecting Your VPC: Public and Private Subnets

A well-architected VPC for EKS typically involves both public and private subnets:

  • Private Subnets are used for your worker nodes. This ensures that your workloads run securely, isolated from direct access to and from the internet. Private subnets connect to the internet through a NAT Gateway, allowing outbound traffic (e.g., for pulling container images) without exposing the nodes to inbound traffic.

  • Public Subnets are utilized for resources that need to be accessible from the internet, like ELBs that route external traffic to your services.

Understanding NAT Gateways and Route Tables in Your VPC

NAT Gateway

A NAT Gateway in your VPC enables instances in a private subnet to initiate outbound traffic to the internet (for updates, downloading software, etc.) without allowing inbound traffic from the internet. This is crucial for the security and integrity of your Kubernetes nodes, ensuring they have access to necessary resources while maintaining a strong security posture.

Route Tables

Route tables in AWS VPC define rules, known as routes, which determine where network traffic from your subnets or gateways is directed. In the context of EKS:

  • Public Route Table: Directs traffic from the public subnet to the internet gateway, allowing resources in the public subnet (like ELBs) to be accessible from the internet.
  • Private Route Table: Uses the NAT Gateway for routing outbound traffic from private subnets, ensuring that worker nodes can access the internet for essential tasks while remaining unreachable directly from the internet.

Importance of Tags

Tags in AWS serve as identifiers for your resources, enabling you to organize, track, and manage your infrastructure components efficiently. For EKS, tagging subnets correctly is crucial:

  • kubernetes.io/cluster/<cluster-name>: This tag, with the value of either shared or owned, is essential for the Kubernetes cluster to identify which VPC resources it can manage and utilize for deploying services like Load Balancers.

  • kubernetes.io/role/elb: This tag, set to 1, identifies subnets that should be considered for hosting internet-facing ELBs. By tagging a subnet with kubernetes.io/role/elb, you're explicitly allowing Kubernetes to provision external load balancers in these subnets, facilitating access from the internet to services running in your cluster.

  • kubernetes.io/role/internal-elb: Similarly, this tag, set to 1, designates subnets for hosting internal ELBs. Internal load balancers are used to route traffic within your VPC, offering a method to expose services within your cluster to other components or services in your VPC without exposing them to the internet.

Why Tagging Matters

Tagging subnets with these roles guides the EKS control plane and the AWS Cloud Provider implementation in Kubernetes when automatically creating ELBs for services of type LoadBalancer. Without these tags:

  • Kubernetes may not be able to correctly provision an ELB for your service, leading to deployment issues or accessibility problems.
  • You would have to manage load balancer provisioning and association manually, which increases operational overhead and complexity.
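
To illustrate how these tags are consumed, here is a minimal sketch of a Service of type LoadBalancer written with the Terraform kubernetes provider (the service name and selector are made up for this example). An internet-facing Service lands in the subnets tagged kubernetes.io/role/elb, while adding the service.beta.kubernetes.io/aws-load-balancer-internal annotation (used by the in-tree cloud provider) steers it to the subnets tagged kubernetes.io/role/internal-elb:


resource "kubernetes_service" "demo" {
  metadata {
    name = "demo-service"
    annotations = {
      # Uncomment to request an internal load balancer instead of an internet-facing one
      # "service.beta.kubernetes.io/aws-load-balancer-internal" = "true"
    }
  }

  spec {
    type = "LoadBalancer"
    selector = {
      app = "demo"
    }
    port {
      port        = 80
      target_port = 8080
    }
  }
}
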

Our completed VPC module will look something like this:

versions.tf:



terraform {
  required_version = ">= 1.5.1"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.20.0"
    }
  }
}



variables.tf:



variable "nat_gateway" {
  type        = bool
  default     = false
  description = "A boolean flag to deploy NAT Gateway."
}

variable "vpc_name" {
  type        = string
  nullable    = false
  description = "Name of the VPC."
}

variable "cidr_block" {
  type        = string
  default     = "10.0.0.0/16"
  description = "The IPv4 CIDR block for the VPC."

  validation {
    condition     = can(cidrnetmask(var.cidr_block))
    error_message = "Must be a valid IPv4 CIDR block address."
  }
}

variable "enable_dns_support" {
  type        = bool
  default     = true
  description = "A boolean flag to enable/disable DNS support in the VPC."
}

variable "enable_dns_hostnames" {
  type        = bool
  default     = false
  description = "A boolean flag to enable/disable DNS hostnames in the VPC."
}

variable "default_tags" {
  type        = map(string)
  default     = {}
  description = "A map of tags to add to all resources."
}

variable "public_subnet_count" {
  type        = number
  default     = 3
  description = "Number of Public subnets."
}

variable "public_subnet_additional_bits" {
  type        = number
  default     = 4
  description = "Number of additional bits with which to extend the prefix."
}

variable "public_subnet_tags" {
  type        = map(string)
  default     = {}
  description = "A map of tags to add to all public subnets."
}

variable "private_subnet_count" {
  type        = number
  default     = 3
  description = "Number of Private subnets."
}

variable "private_subnet_additional_bits" {
  type        = number
  default     = 4
  description = "Number of additional bits with which to extend the prefix."
}

variable "private_subnet_tags" {
  type        = map(string)
  default     = {}
  description = "A map of tags to add to all private subnets."
}




main.tf:



data "aws_availability_zones" "available" {}


############################################################################################################
### VPC 
############################################################################################################
resource "aws_vpc" "main" {

  cidr_block           = var.cidr_block
  enable_dns_support   = var.enable_dns_support
  enable_dns_hostnames = var.enable_dns_hostnames

  tags = merge(
    var.default_tags,
    {
      Name = var.vpc_name
    }
  )
}

resource "aws_default_security_group" "main" {
  vpc_id = aws_vpc.main.id
}

############################################################################################################
### SUBNETS 
############################################################################################################
## Public subnets
resource "aws_subnet" "public" {
  count = var.public_subnet_count

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.cidr_block, var.public_subnet_additional_bits, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = merge(
    var.default_tags, var.public_subnet_tags, {
      Name = "${var.vpc_name}-public-subnet-${count.index + 1}"
  })
}

## Private Subnets
resource "aws_subnet" "private" {
  count = var.private_subnet_count

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.cidr_block, var.private_subnet_additional_bits, count.index + var.public_subnet_count)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = merge(
    var.default_tags, var.private_subnet_tags, {
      Name = "${var.vpc_name}-private-subnet-${count.index + 1}"
  })
}


############################################################################################################
### INTERNET GATEWAY
############################################################################################################
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = merge(
    var.default_tags, {
      Name = "${var.vpc_name}-internetgateway"
  })
}


############################################################################################################
### NAT GATEWAY 
############################################################################################################
resource "aws_eip" "nat_gateway" {
  count = var.nat_gateway ? 1 : 0

  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count = var.nat_gateway ? 1 : 0

  allocation_id = aws_eip.nat_gateway[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = merge(
    var.default_tags, {
      Name = "${var.vpc_name}-natgateway-default"
  })

  depends_on = [
    aws_internet_gateway.main
  ]
}


############################################################################################################
### ROUTE TABLES 
############################################################################################################
# Public Route table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  tags = merge(
    var.default_tags, {
      Name = "${var.vpc_name}-routetable-public"
  })
}

## Public Route Table rules
resource "aws_route" "public" {
  route_table_id         = aws_route_table.public.id
  gateway_id             = aws_internet_gateway.main.id
  destination_cidr_block = "0.0.0.0/0"
}

## Public Route table associations
resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private Route table
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  tags = merge(
    var.default_tags, {
      Name = "${var.vpc_name}-routetable-private"
  })
}

## Private Route Table rules
resource "aws_route" "private" {
  route_table_id         = aws_route_table.private.id
  nat_gateway_id         = var.nat_gateway ? aws_nat_gateway.main[0].id : null
  destination_cidr_block = "0.0.0.0/0"
}

## Private Route table associations
resource "aws_route_table_association" "private" {
  count = length(aws_subnet.private)

  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}


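
The cidrsubnet() calls above carve the VPC CIDR into equally sized subnets. A quick worked example, assuming the default 10.0.0.0/16 CIDR, additional bits of 4 (so each subnet becomes a /20), and three public plus three private subnets:


# Illustrative only: the subnet maths used in main.tf
locals {
  vpc_cidr = "10.0.0.0/16"

  # Public subnets use net numbers 0..2
  public_example = [for i in range(3) : cidrsubnet(local.vpc_cidr, 4, i)]
  # => ["10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20"]

  # Private subnets are offset by public_subnet_count, so net numbers 3..5
  private_example = [for i in range(3) : cidrsubnet(local.vpc_cidr, 4, i + 3)]
  # => ["10.0.48.0/20", "10.0.64.0/20", "10.0.80.0/20"]
}
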

outputs.tf



output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnets" {
  value = [for subnet in aws_subnet.public : subnet.id]
}

output "private_subnets" {
  value = [for subnet in aws_subnet.private : subnet.id]
}

output "aws_internet_gateway" {
  value = aws_internet_gateway.main
}

output "aws_route_table_public" {
  description = "The ID of the public route table"
  value       = aws_route_table.public.id
}

output "aws_route_table_private" {
  description = "The ID of the private route table"
  value       = aws_route_table.private.id
}


output "nat_gateway_ipv4_address" {
  value = var.nat_gateway ? aws_eip.nat_gateway[0].public_ip : null
}



The usage of the module will look like this:

vpc.tf:



module "vpc" {
  source = "./modules/aws/vpc/v1"

  vpc_name             = "${var.cluster_name}-vpc"
  cidr_block           = "10.0.0.0/16"
  nat_gateway          = true
  enable_dns_support   = true
  enable_dns_hostnames = true

  public_subnet_count  = 3
  private_subnet_count = 3
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}



Securing AWS EKS Networking with Security Groups

In the realm of AWS EKS, securing the network traffic to and from your Kubernetes cluster is paramount. Security groups serve as the first line of defence, defining the rules that allow or deny network traffic to your EKS cluster and worker nodes. This section explores the role of security groups in EKS and outlines the rules needed for secure communication within your cluster.

Security Groups: The Gatekeepers

EKS Cluster Security Group

The EKS Cluster Security Group acts as a shield for the control plane, governing the traffic to the Kubernetes API server, which is hosted by AWS. It's critical for enabling secure communication between the worker nodes and the control plane.



resource "aws_security_group" "eks_cluster_sg" {
  name        = "${var.cluster_name}-eks-cluster-sg"
  description = "Security group for EKS cluster control plane communication with worker nodes"
  vpc_id      = module.vpc.vpc_id
  tags = {
    Name = "${var.cluster_name}-eks-cluster-sg"
  }
}



Key Rules:

  • Ingress from Worker Nodes: Allows inbound traffic on port 443 from worker nodes to the control plane, facilitating Kubernetes API calls.


resource "aws_security_group_rule" "eks_cluster_ingress_nodes" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster_sg.id
  source_security_group_id = aws_security_group.eks_nodes_sg.id
  description              = "Allow inbound traffic from the worker nodes on the Kubernetes API endpoint port"
}


  • Egress to kubelet: Permits the control plane to initiate traffic to the kubelet running on each worker node, which is crucial for tasks such as retrieving logs and executing commands.


resource "aws_security_group_rule" "eks_cluster_egress_kublet" {
  type                     = "egress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster_sg.id
  source_security_group_id = aws_security_group.eks_nodes_sg.id
  description              = "Allow control plane to node egress for kubelet"
}



Worker Nodes Security Group

The Worker Nodes Security Group safeguards your worker nodes, which run your applications. It controls both inbound and outbound traffic to ensure only legitimate and secure communication occurs.



resource "aws_security_group" "eks_nodes_sg" {
  name        = "${var.cluster_name}-eks-nodes-sg"
  description = "Security group for all nodes in the cluster"
  vpc_id      = module.vpc.vpc_id
  tags = {
    Name                                        = "${var.cluster_name}-eks-nodes-sg"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }
}



Key Rules:

  • Control Plane to Worker Node: Even though the cluster security group has an egress rule to the worker nodes on the kubelet port 10250, the worker nodes' security group must still explicitly allow that inbound traffic.


resource "aws_security_group_rule" "worker_node_ingress_kublet" {
  type                     = "ingress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes_sg.id
  source_security_group_id = aws_security_group.eks_cluster_sg.id
  description              = "Allow control plane to node ingress for kubelet"
}


  • Node-to-Node Communication: Allows nodes to communicate among themselves, which is essential for distributed systems like Kubernetes.


resource "aws_security_group_rule" "worker_node_to_worker_node_ingress_ephemeral" {
  type              = "ingress"
  from_port         = 1025
  to_port           = 65535
  protocol          = "tcp"
  self              = true
  security_group_id = aws_security_group.eks_nodes_sg.id
  description       = "Allow worker nodes to communicate with each other on ephemeral ports"
}


  • Egress to the Internet: Enables nodes to initiate outbound connections to the internet via the NAT Gateway, which is vital for pulling container images or reaching external services. This rule also covers any other egress the nodes need, such as reaching the control plane on port 443.


resource "aws_security_group_rule" "worker_node_egress_internet" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.eks_nodes_sg.id
  description       = "Allow outbound internet access"
}



CoreDNS Rules:
When deploying a Kubernetes cluster on AWS EKS, managing DNS resolution and traffic flow between pods is crucial for the stability and performance of your applications. CoreDNS plays a vital role in this ecosystem, serving as the cluster DNS service that enables DNS-based service discovery in Kubernetes. Its integration into an EKS cluster facilitates seamless communication between service endpoints within and outside your cluster, and it keeps DNS records up to date as pods come and go.



resource "aws_security_group_rule" "worker_node_to_worker_node_ingress_coredns_tcp" {
  type              = "ingress"
  from_port         = 53
  to_port           = 53
  protocol          = "tcp"
  security_group_id = aws_security_group.eks_nodes_sg.id
  self              = true
  description       = "Allow worker nodes to communicate with each other for coredns TCP"
}

resource "aws_security_group_rule" "worker_node_to_worker_node_ingress_coredns_udp" {
  type              = "ingress"
  from_port         = 53
  to_port           = 53
  protocol          = "udp"
  security_group_id = aws_security_group.eks_nodes_sg.id
  self              = true
  description       = "Allow worker nodes to communicate with each other for coredns UDP"
}


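
These networking pieces come together when the cluster itself is created. As a rough sketch only (the IAM role referenced below is hypothetical, and the full cluster definition is beyond the scope of this part), the VPC module outputs and the cluster security group feed straight into the vpc_config block of the aws_eks_cluster resource:


resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn # hypothetical IAM role for the cluster

  vpc_config {
    subnet_ids         = concat(module.vpc.public_subnets, module.vpc.private_subnets)
    security_group_ids = [aws_security_group.eks_cluster_sg.id]
  }
}
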

You can then deploy your project:



$ terraform init
$ terraform plan
$ terraform apply


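
Note that aws_account_number has no default, so terraform plan will prompt for a value unless you supply one, for example via a terraform.tfvars file (the values below are placeholders):


# terraform.tfvars (placeholder values)
aws_account_number = 123456789012
region             = "eu-west-1"
cluster_name       = "demo-cluster"
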

The output will look something like this:
Sample Terraform apply output

Our deployed VPC:

Deployed VPC

Conclusion

As we conclude our exploration of VPC essentials for EKS cluster management with Terraform, it's clear that a well-architected network foundation is pivotal for the successful deployment and operation of Kubernetes clusters on AWS. Through the thoughtful configuration of NAT Gateways, Route Tables, and appropriate tagging, we ensure our EKS clusters are both secure and efficient, positioned to leverage the best of AWS infrastructure.

Security groups are essential for crafting a secure environment for your AWS EKS cluster. By meticulously defining ingress and egress rules, you ensure that your cluster communicates securely within its components and the outside world. Implementing these security measures lays the foundation for a robust and secure Kubernetes ecosystem on AWS.
