Ahmed Bebars for AWS Community Builders

Architecting for Resilience: Crafting Opinionated EKS Clusters with Karpenter & Cilium Cluster Mesh — Part 1

Welcome to the future of digital ecosystems, where robustness meets unparalleled innovation! We’re about to dive into a world where Amazon’s Elastic Kubernetes Service (EKS) isn’t just a service; it’s an unbreakable, scalable fortress! 🚀

Wondering how? We’re pairing EKS with Karpenter for faster, smarter scaling that opens up a new set of possibilities, and using Cilium Cluster Mesh to connect our clusters and build for resiliency.

Cilium is our networking superhero, ensuring our clusters talk to each other smoothly, while Karpenter keeps an eye on scaling and keeps our costs in check.

In this thrill ride, we’re not just exploring tech but crafting resilient warriors ready to combat digital challenges! 🛡️ So buckle up as we whisk you away to a land where each EKS cluster is an unassailable castle in the cloud, and every service is a valiant knight guarding the gates!

Let’s make our EKS clusters fun, resilient, and ready to roll! 🎢

Here are a few reference links about the previous services and tools:
What is Amazon EKS?
Cluster Mesh
Karpenter

Setting the Stage: Crafting the VPC Setup 🏗️

Before we jump into the exciting realms of Cilium Cluster Mesh and EKS Karpenter, let’s roll up our sleeves and start with the foundation — creating a Virtual Private Cloud (VPC). Imagine this VPC as the land where we will build our unassailable EKS castles. And remember, every robust castle starts with a solid foundation!

Navigating the Sea of IPs: Creating a Secondary CIDR 🌐

Our shiny new VPC is like a blank canvas, ready to host our future EKS clusters. But here’s the rub — IPs are like gold in the cloud world, and running out of them is a real bummer! IP exhaustion can stall our journey and leave our services stranded. But worry not! We’ve got a savvy solution — introducing a secondary CIDR block. This is like having a backup stash of gold, ensuring our pods never run out of valuable IPs!


When we’re sculpting our subnets, we’ll tag them with usage: Pods. Why, you ask? Because this little beacon of labeling will be our guiding light when we venture into installing Cilium! 🌟
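To make this concrete, here is a minimal sketch of what such a VPC could look like using the terraform-aws-modules/vpc module. This is not the article’s exact code; the CIDR ranges, availability zones, names, and discovery tags are placeholder assumptions you would adapt to your own environment.

    # Sketch only: a VPC with a secondary CIDR reserved for pod subnets.
    module "vpc" {
      source  = "terraform-aws-modules/vpc/aws"
      version = "~> 5.0"

      name = "cluster-mesh-vpc"
      cidr = "10.0.0.0/16"

      # The backup stash of IPs: a secondary CIDR block for pods.
      secondary_cidr_blocks = ["100.64.0.0/16"]

      azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
      public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
      private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]

      # Pod subnets carved out of the secondary CIDR.
      intra_subnets = ["100.64.0.0/18", "100.64.64.0/18", "100.64.128.0/18"]

      # The beacon of labeling: tag the pod subnets so Cilium can find them.
      intra_subnet_tags = {
        usage = "Pods"
      }

      # Tag the node subnets so Karpenter can discover them later.
      private_subnet_tags = {
        "karpenter.sh/discovery" = "cluster-mesh-1"
      }

      enable_nat_gateway = true
      single_nat_gateway = true
    }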

The Landing Ground is Ready: VPC Setup Complete! 🌐

By wielding the powers of Terraform, we’ve crafted a formidable VPC, ripe and ready to host our unassailable EKS clusters. It’s not just any VPC; it’s a fortified domain, strategically tagged and enriched with a secondary CIDR to ensure our pods have ample room to thrive. If you check your AWS console, you should see the newly created VPC with its subnets and secondary CIDR association.

Embarking on EKS: Initiating the Cluster Setup 🚀

Having laid the solid groundwork with our fortified VPC, we are now ready to step into the world of EKS and bring our clusters to life! The journey of setting up our EKS Cluster will be a confluence of meticulous configurations, innovative integrations, and strategic optimizations. Let’s kickstart this expedition!


In the cluster definition file, there are two things to keep in mind. First, we are applying a specific taint to our nodes so that they only become ready for scheduling once Cilium is installed; the taint is removed automatically once the Cilium pod is running on the node.
        key    = "node.cilium.io/agent-not-ready"
        value  = "true"
        effect = "NO_SCHEDULE"

Second, we will rely on only one managed node group and leverage Karpenter for everything else; however, Karpenter itself still needs a node to run on. (This may change soon once Karpenter becomes available on the EKS control plane.)
[EKS] Karpenter inside control plane · Issue #1792 · aws/containers-roadmap
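To illustrate both points, here is a rough sketch (not the article’s exact file) of a cluster definition with the terraform-aws-modules/eks module, showing the single managed node group and the Cilium startup taint. The cluster name, Kubernetes version, instance type, and sizes are assumptions.

    # Sketch only: one small managed node group to host Karpenter and other
    # critical add-ons; everything else will be provisioned by Karpenter.
    module "eks" {
      source  = "terraform-aws-modules/eks/aws"
      version = "~> 19.0"

      cluster_name    = "cluster-mesh-1"
      cluster_version = "1.28"

      vpc_id     = module.vpc.vpc_id
      subnet_ids = module.vpc.private_subnets

      cluster_endpoint_public_access = true

      eks_managed_node_groups = {
        core = {
          instance_types = ["m5.large"]
          min_size       = 2
          max_size       = 3
          desired_size   = 2

          # Keep nodes unschedulable until the Cilium agent is up.
          taints = {
            cilium = {
              key    = "node.cilium.io/agent-not-ready"
              value  = "true"
              effect = "NO_SCHEDULE"
            }
          }
        }
      }

      # Let Karpenter discover the node security group later on.
      node_security_group_tags = {
        "karpenter.sh/discovery" = "cluster-mesh-1"
      }
    }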

Navigating back to the console, we should see that the cluster is ready and that the default workloads (coredns, aws-node, kube-proxy) are deployed.

Networking Nirvana: Installing Cilium on the EKS Cluster 🕸️

With the EKS Cluster firmly established and the initial Node Group ready to roll, it’s time to invite Cilium to the party! Cilium is our networking and security superhero, ensuring seamless communication between pods and enhancing our cluster’s security posture.

We will leverage the EKS Blueprints addon modules to install the Cilium Helm chart with the required values. We will not deploy Cilium in chaining mode; instead, we will install it in overlay mode and kube-proxy free. Because EKS clusters ship by default with the AWS VPC CNI and kube-proxy installed, there are two steps to complete first: removing those two daemon sets.
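How you remove them is up to you; as a hedged sketch, a one-shot null_resource running kubectl does the trick. This assumes kubectl is installed locally and your kubeconfig already points at the new cluster.

    # Sketch only: delete the default CNI and kube-proxy daemon sets once,
    # before Cilium is installed.
    resource "null_resource" "remove_default_networking" {
      provisioner "local-exec" {
        command = <<-EOT
          kubectl -n kube-system delete daemonset aws-node --ignore-not-found
          kubectl -n kube-system delete daemonset kube-proxy --ignore-not-found
        EOT
      }

      depends_on = [module.eks]
    }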


We can apply the helm chart with the necessary cilium configuration in the same step.

In this step, we will create a custom CNI config that instructs Cilium to use the Pods subnets we made earlier; then, we pass this ConfigMap to the Cilium Helm chart values.
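The article’s exact cilium-values.yaml is not reproduced here, but a hedged sketch of the Helm release gives a feel for the shape of the values. The keys follow Cilium 1.14-era Helm charts and should be double-checked against the Cilium docs for your version; the cluster name and id are assumptions, and the custom CNI ConfigMap wiring for the Pods subnets is omitted for brevity.

    # Sketch only: illustrative Cilium values, not the author's exact file.
    resource "helm_release" "cilium" {
      name       = "cilium"
      repository = "https://helm.cilium.io"
      chart      = "cilium"
      namespace  = "kube-system"
      version    = "1.14.2"

      values = [yamlencode({
        cluster = {
          name = "cluster-mesh-1"
          id   = 1
        }

        # Kube-proxy free: Cilium must reach the API server directly.
        kubeProxyReplacement = "strict"
        k8sServiceHost       = replace(module.eks.cluster_endpoint, "https://", "")
        k8sServicePort       = 443

        # Overlay (tunnel) routing between pods.
        routingMode = "tunnel"

        # Hubble for observability, including the UI we use below.
        hubble = {
          relay = { enabled = true }
          ui    = { enabled = true }
        }
      })]
    }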

Balancing Terraform and Helm: A Synchronized Approach 🧩

I know there’s an undercurrent of preference for installing Helm through Terraform. Indeed, managing the state with Terraform offers elegance and control. However, installing and customizing charts and maintaining them centrally resonate more harmoniously when performed directly. This approach not only provides a consolidated management platform but also retains the flexibility of customization and scaling.

Elevating with GitOps: A Gateway to Continuous Operations 🔄

And there’s more! For those inclined towards a GitOps approach, this module serves as a conduit, supporting GitOps methodologies and ensuring continuous and consistent operational flows. It’s a paradigm where declarative configurations meet version control, opening avenues for automatic and reliable application deployment and management.


After the chart is installed, all pods should be up and running, and the cluster should be configured with Cilium as expected.

(K9s is one of my favorite tools for navigating Kubernetes clusters through the CLI).

Since we installed Hubble on the cluster, let’s check out its cool UI and see how the traffic flows between the pods. To do so, let’s run:

kubectl port-forward -n kube-system deployment/hubble-ui 8081:8081

Then, we can access the UI and check what’s happening in the kube-system namespace:

http://localhost:8081/?namespace=kube-system

You should be able to see the traffic flowing between pods and out to the internet as well, and since we haven’t defined any policies on the cluster yet, all of the traffic should be FORWARDED.

Karpenter: The EKS Autoscaling Maestro 🚀

As we journey into the heart of our EKS setup, we arrive at Karpenter. This nimble autoscaler intelligently manages node provisioning, ensuring our workloads are optimally balanced and resources are used efficiently. Let’s delve into the installation!

Using a similar module from EKS Addons, we will install and configure Karpenter on the cluster.
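As a hedged sketch of that step, the aws-ia/eks-blueprints-addons module exposes a simple enable_karpenter flag; the module version and chart version shown below are assumptions, so check the module’s documentation for current values.

    # Sketch only: installs the Karpenter controller plus its supporting
    # IAM role, instance profile, and interruption queue.
    module "eks_blueprints_addons" {
      source  = "aws-ia/eks-blueprints-addons/aws"
      version = "~> 1.0"

      cluster_name      = module.eks.cluster_name
      cluster_endpoint  = module.eks.cluster_endpoint
      cluster_version   = module.eks.cluster_version
      oidc_provider_arn = module.eks.oidc_provider_arn

      enable_karpenter = true

      karpenter = {
        chart_version = "v0.31.0"
      }
    }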


Then, we must update the cluster auth roles so that the nodes Karpenter creates can join the cluster. You can do that by adding the created IAM role ARN to aws_auth_roles.
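With the terraform-aws-modules/eks module, that typically means adding something like the following inside the cluster definition. The exact output name for the Karpenter node role ARN depends on the addons module version, so treat that reference as an assumption.

    # Sketch only: map the role used by Karpenter-launched nodes into
    # aws-auth so those nodes can register with the cluster.
    manage_aws_auth_configmap = true

    aws_auth_roles = [
      {
        rolearn  = module.eks_blueprints_addons.karpenter.node_iam_role_arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups = [
          "system:bootstrappers",
          "system:nodes",
        ]
      },
    ]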

Now the Karpenter deployment should be ready.

However, Karpenter doesn’t yet know what type of nodes we need or how to create them; that’s where the Provisioner and AWSNodeTemplate come in. These two CRDs are the core elements of Karpenter and define how nodes are created, along with many other options.

Think of it like a node group without creating an actual Node Group; it’s flexible because it lives in Kubernetes, which lets us define it within the same scope as everything else. This is a constructive approach if, for example, you want to create these on the fly.

It’s time to manifest our desires in the form of CRDs. The first to materialize is our Provisioner, the maestro dictating the variations and specifications of the nodes we summon.
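A hedged example of what such a Provisioner could look like (Karpenter’s v1alpha5 API, which is where Provisioner and AWSNodeTemplate live), written as a kubectl_manifest resource from the kubectl provider so it stays in the same Terraform flow; the requirements, limits, and TTL are placeholders. Note that it carries the same Cilium startup taint we applied to the managed node group.

    # Sketch only: a default Provisioner with the Cilium startup taint so
    # freshly launched nodes stay unschedulable until the agent is ready.
    resource "kubectl_manifest" "karpenter_provisioner" {
      yaml_body = <<-YAML
        apiVersion: karpenter.sh/v1alpha5
        kind: Provisioner
        metadata:
          name: default
        spec:
          requirements:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot", "on-demand"]
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
          limits:
            resources:
              cpu: "1000"
          startupTaints:
            - key: node.cilium.io/agent-not-ready
              value: "true"
              effect: NoSchedule
          providerRef:
            name: default
          ttlSecondsAfterEmpty: 30
      YAML

      depends_on = [module.eks_blueprints_addons]
    }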


And now, with the last resource we need to complete our symphony, the AWSNodeTemplate, Karpenter will start taking over our scaling.
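As a matching sketch, the AWSNodeTemplate tells Karpenter which subnets and security groups to launch into, selected here by the karpenter.sh/discovery tag assumed in the earlier VPC and EKS sketches.

    # Sketch only: subnet and security group discovery for Karpenter nodes.
    resource "kubectl_manifest" "karpenter_node_template" {
      yaml_body = <<-YAML
        apiVersion: karpenter.k8s.aws/v1alpha1
        kind: AWSNodeTemplate
        metadata:
          name: default
        spec:
          subnetSelector:
            karpenter.sh/discovery: "cluster-mesh-1"
          securityGroupSelector:
            karpenter.sh/discovery: "cluster-mesh-1"
          tags:
            karpenter.sh/discovery: "cluster-mesh-1"
      YAML

      depends_on = [module.eks_blueprints_addons]
    }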

Wrapping Up: Setting Up EKS 🌟

We’ve taken some significant steps in this guide! We’ve set up our own Amazon EKS clusters and avoided running out of IP addresses by using secondary CIDRs. We’ve also tagged our subnets, so everything is easy to find.

We added Cilium to our project with the help of the EKS Blueprints addon modules, and we’ve used Terraform and Helm to make everything easy to manage and adjust. For those who like using GitOps, our setup supports it too!

Then, we brought Karpenter to help our clusters use resources wisely and save on costs. We’ve made some custom adjustments and used AWSNodeTemplate to ensure our nodes are just how we want them.

What’s Coming Up Next? 🤔

This is just the start! We’ve got our EKS set up with Cilium and Karpenter, but there’s much more to learn. In the following parts, we’ll explore Cluster Mesh configurations and more remarkable EKS and Cilium Cluster Mesh features. So, keep reading as we uncover more about Kubernetes and Cilium and learn how to make the most of them!

Top comments (3)

Keith Williams

Feels incomplete with the absence of your values file.

masteredd

Thanks!
That's very interesting.
Can you provide the cilium-values.yaml values file?

Brian Schroeder

Hello! Great post. Are we still using ENIs with the secondary CIDR block, or am I mistaken? That would give us limitations on how many pods can run on one node at a time.