
Amarachi Iheanacho

Securing your AWS EKS cluster

When a small analytics firm's misconfigured Kubernetes cluster exposed the sensitive data of a Fortune 500 client worth billions in revenue, it wasn't just a technical oversight; it was a business catastrophe waiting to happen. This isn't an isolated incident. In 2024 alone, researchers discovered over 350 organizations with publicly accessible, largely unprotected Kubernetes clusters, with 60% of them already breached and running active malware campaigns.

The financial stakes couldn't be higher. The average data breach now costs $4.88 million, while software supply chain attacks cost businesses $45.8 billion globally in 2023. For organizations running AWS Elastic Kubernetes Service (EKS), these aren't abstract statistics; they're urgent realities that demand immediate attention.

AWS EKS is indeed a powerful solution for running containerized applications at scale. It simplifies many operational challenges by providing a fully managed Kubernetes control plane, helping organizations reduce overhead and accelerate deployment. However, the AWS shared responsibility model makes one thing crystal clear: while AWS secures the underlying infrastructure, protecting your workloads, configurations, and data remains entirely your responsibility. Even the smallest oversight can cascade into breaches or compliance violations that cost millions.

This is where this guide comes in. We'll show you how to build a robust security posture for your EKS environment, covering:

  • Control plane security
  • Network and pod security layers
  • Secrets management
  • Monitoring and incident response planning

Understanding the EKS security landscape

With EKS clusters often holding entire applications, saying that you need to secure them is by no means a throwaway statement. Security is absolutely critical and, unfortunately, also very challenging.

The challenge with EKS security isn’t just about implementing individual protective measures. It’s about understanding how all these components interact within a complex, distributed system.

Unlike traditional monolithic applications, where security boundaries are clearly defined, containerized environments create dynamic attack surfaces that shift as pods scale, migrate, and communicate across your infrastructure.

This complexity is compounded by the fact that many security decisions must happen simultaneously at multiple levels: at the cluster level through RBAC policies, at the network level through CNI configurations, and at the application level through service mesh implementations. A compromise at any of these layers can put your entire cluster, and therefore your entire application, at risk.

In this article, we’re going to discuss security at different levels, covering how to secure the following components of your cluster:

  • Control plane
  • Network
  • Pods
  • Secrets
  • Images

And more importantly, we’ll look at why even after doing all this, you should never abandon your cluster and must continue monitoring it for anomalies and vulnerabilities.

Control plane security

The control plane is the heart, or more accurately, the brain, of any cluster, and this holds true for an EKS cluster as well. That makes it the most critical component to secure properly.

The control plane decides and enforces everything: where and when pods should run, who has access to what, how API requests are handled, and the overall state of the cluster. If an attacker gains access to the control plane, they can see everything happening in your environment, change configurations, and much more.

To put it simply, so you understand the severity: if you lose control of the control plane, you lose control of your entire cluster.

Thankfully, there are ways to protect your control plane from malicious actors. Here’s how:

API server access control

Your API server is the gateway to your entire Kubernetes cluster. Every kubectl command, every deployment, every request to access secrets: it all flows through this single point. This centralization makes the API server incredibly powerful but also a potential security risk if you don't secure it properly.

To protect your API server, start by enabling private endpoint access. You can refer to the Cluster API server endpoint documentation for detailed steps on how to set this up.

Private endpoints ensure that communication between your worker nodes and the control plane stays entirely within your VPC, eliminating exposure to internet-based attack vectors.

While this approach is excellent for securing the control plane, using only private endpoints can make cluster management more challenging.

That’s why I usually recommend a hybrid approach: enable both private and public endpoints but restrict public access to specific IP ranges using CIDR blocks. This setup allows you to manage your cluster securely from authorized locations without sacrificing flexibility.
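As a sketch, this hybrid setup could be expressed with eksctl like so. The cluster name, region, and CIDR range below are placeholders, not values from a real environment:

```yaml
# Hypothetical eksctl ClusterConfig: both endpoints enabled, with
# public access limited to an allow-listed CIDR range.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster       # placeholder name
  region: us-east-1        # placeholder region
vpc:
  clusterEndpoints:
    privateAccess: true    # node-to-control-plane traffic stays inside the VPC
    publicAccess: true     # kubectl access from outside remains possible
  publicAccessCIDRs:
    - "203.0.113.0/24"     # replace with your office or VPN ranges
```

The same endpoint settings can also be changed on an existing cluster through the EKS console or the `aws eks update-cluster-config` command.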

Authentication and authorization

Another way to secure your control plane is by carefully managing who has access to your clusters and, once they do, what they’re allowed to do inside them.

A powerful way to achieve this is by leveraging AWS IAM. Amazon EKS integrates seamlessly with IAM, allowing you to use your existing AWS identities and permissions to control access to your Kubernetes clusters.

While this integration is convenient, you must be cautious when configuring IAM to avoid privilege escalation, in other words, accidentally granting people permissions they don’t actually need.

When you create an EKS cluster, only the IAM entity (user or role) that created it has access by default. This design helps prevent unauthorized access right from the start. However, you’ll typically need to grant access to additional team members and service accounts in a controlled, systematic way, based on their roles and responsibilities.

To efficiently and accurately grant access to individuals and teams, you use the aws-auth ConfigMap, which maps IAM roles and users to Kubernetes groups. Be aware that this mapping is where many security misconfigurations originate.

Always review and test these mappings carefully. Refer to Grant IAM users access to Kubernetes with a ConfigMap for more information on how to grant access using the ConfigMap.
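To make the mapping concrete, here is a hedged sketch of what an aws-auth ConfigMap might look like. The account ID, role, user, and group names are all hypothetical:

```yaml
# Hypothetical aws-auth ConfigMap: maps the worker-node IAM role and a
# single IAM user to Kubernetes groups. The group "frontend-developers"
# only gains real permissions once RBAC rules are bound to it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eks-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::111122223333:user/alice
      username: alice
      groups:
        - frontend-developers   # not system:masters -- no cluster-admin here
```

Note that mapping a user to `system:masters` is the equivalent of handing out cluster-admin, which is exactly the kind of misconfiguration to watch for in reviews.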

Finally, never grant cluster-admin privileges unless absolutely necessary. Instead, create fine-grained RBAC policies that follow the principle of least privilege. For example, a developer working on frontend applications doesn’t need access to database secrets or infrastructure namespaces.
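Continuing the frontend-developer example, a least-privilege setup might look like the following sketch. The namespace, role, and group names are placeholders:

```yaml
# Hypothetical least-privilege RBAC: deploy access to the "frontend"
# namespace only -- no secrets, no access to other namespaces.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: frontend-deployer
  namespace: frontend
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: frontend-deployer-binding
  namespace: frontend
subjects:
  - kind: Group
    name: frontend-developers   # a group mapped via the aws-auth ConfigMap
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: frontend-deployer
  apiGroup: rbac.authorization.k8s.io
```

Because this is a namespaced Role rather than a ClusterRole, nothing here grants access to database secrets or infrastructure namespaces.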

Network security

Now that we’ve covered control plane security, let’s turn to another critical, and frankly quite complex, attack surface: the network.

Everything in your cluster (pods, nodes, and services) communicates over the network. If you don't secure this layer, anyone who gains access to your network could potentially read or tamper with sensitive data.

In Amazon EKS, network security operates across multiple layers, each offering a different type of protection. Let’s take a closer look at how to secure each of these layers.

VPC configuration and subnet strategy

Your cluster's network foundation starts with proper VPC design. Place your worker nodes in private subnets whenever you can; this ensures that they can't be directly accessed from the internet.

This single decision can eliminate entire categories of threats, such as SSH brute-force attacks, external scanning, and remote exploitation, while still allowing your applications to function as needed.

Use separate subnets for different node groups based on their security requirements. Your production workloads shouldn't share network space with development environments, and your database nodes need different access patterns than your web servers.

The subnet strategy becomes particularly important when implementing network policies. Kubernetes network policies work at the pod level, but they're most effective when combined with VPC-level controls.

Network policies

Next are network policies, which have become a standard tool for securing in-cluster traffic.

Network policies allow you to control traffic flow between pods with remarkable precision, creating microsegmentation that would be impossible with traditional network security tools.

A well-designed network policy starts with a default-deny stance. This means that unless explicitly allowed, no traffic flows between pods. While this might seem restrictive, it's the foundation of a zero-trust network architecture.

Once you've established your default-deny policy, you can selectively allow traffic based on your application's requirements. For example, a frontend application might need to communicate with a backend API but should never have direct access to the database.
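That pattern can be sketched with two standard NetworkPolicy resources. The namespace and pod labels below are placeholders:

```yaml
# Sketch: a default-deny policy for a namespace, then a narrow
# exception letting frontend pods reach the backend API on port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production      # placeholder namespace
spec:
  podSelector: {}            # empty selector = applies to every pod here
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend           # placeholder label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

One practical caveat: a default-deny egress policy also blocks DNS, so real clusters typically add an explicit allow rule for DNS traffic to kube-dns before rolling this out.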

Service mesh

For secure service-to-service encryption, a service mesh is the tool for the job.

Service meshes like Istio or AWS App Mesh provide traffic encryption, authentication, and authorization at the service level, adding another layer of security beyond network policies.

Service meshes excel in environments where you need to implement security policies based on service identity rather than network location. They're particularly valuable when dealing with compliance requirements that mandate encryption in transit for all inter-service communication.
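As an illustration, if you were running Istio, a single mesh-wide resource can mandate encryption in transit. This is a sketch assuming a default Istio installation with `istio-system` as the root namespace:

```yaml
# Sketch: a mesh-wide Istio PeerAuthentication enforcing mutual TLS
# for all service-to-service traffic between sidecars.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh's root namespace in a default install
spec:
  mtls:
    mode: STRICT            # reject any plaintext traffic between workloads
```

With `STRICT` mode, workloads without a sidecar can no longer talk to meshed services in plaintext, which is usually the point, but it's worth a staged rollout using `PERMISSIVE` mode first.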

Pod security

Now that you have defined and restricted access to your pods and the spaces they run in, it’s time to control what pods are allowed to do and how they run. This is exactly what pod security aims to achieve.

Pod security lets you establish rules that prevent pods from performing potentially risky actions inside your cluster, and this section will explore the components of pod security in detail.

Pod Security Standards implementation

Pod Security Standards are rules that define how strict Kubernetes should be about what pods are allowed to do. There are three main policy levels:

  • Privileged: Almost no restrictions; pods can do virtually anything. This is generally a bad idea for production environments.
  • Baseline: Applies some restrictions to block the most dangerous behaviors such as running as root without restrictions, while still supporting common use cases.
  • Restricted: The most stringent level. Pods can’t run as root, can’t perform privileged actions, and must declare clear security settings.

The Restricted policy is particularly effective at reducing your cluster’s exposure to container escape attacks. While you may need to adjust some applications to comply, these controls provide strong protection for your workloads.

Refer to the Pod Security Standards for more information.
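In recent Kubernetes versions, these standards are enforced by the built-in Pod Security Admission controller through namespace labels. A minimal sketch, with a placeholder namespace name:

```yaml
# Sketch: enforcing the Restricted profile on one namespace via the
# built-in Pod Security Admission labels.
apiVersion: v1
kind: Namespace
metadata:
  name: production                                # placeholder namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted   # surface violations in kubectl output
    pod-security.kubernetes.io/audit: restricted  # record violations in audit logs
```

Starting with `warn` and `audit` before flipping `enforce` is a common way to find non-compliant workloads without breaking them.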

Security contexts

Unlike Pod Security Standards, which act as cluster-wide gatekeepers deciding what pods can even be scheduled, security contexts are the granular controls you apply to individual pods and containers.

Security contexts give you precise control over how your containers operate at runtime. You can specify exactly which user ID a container runs as, whether it can escalate privileges, what Linux capabilities it has access to, and how it interacts with the file system. This granular approach means you can tailor security settings to each workload's specific needs rather than applying broad restrictions across your entire cluster.

The real power of security contexts becomes apparent when you consider that they work in tandem with Pod Security Standards. Your Pod Security Standards might enforce that containers can't run as root, but security contexts let you specify that a particular container should run as user ID 1000 with group ID 3000. The standards provide the guardrails; the contexts provide the precise configuration.
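Putting that together, a hardened pod spec might look like the following sketch. The pod name and image are placeholders:

```yaml
# Sketch: a pod running as user 1000 / group 3000 (per the example
# above), with privilege escalation disabled and all capabilities dropped.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                        # placeholder name
spec:
  securityContext:                          # pod-level settings
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000                           # group ownership for mounted volumes
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:                      # container-level settings
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```

Container-level settings override pod-level ones where they overlap, which is what lets you tune individual containers inside an otherwise uniform pod.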

Runtime security

While Pod Security Standards and security contexts define who can run and how they are configured, runtime security focuses on what containers actually do once they are running.

Runtime security is all about continuously monitoring your workloads for suspicious or unauthorized activity. Even if a container starts in a secure state, it could be exploited through vulnerabilities, misconfigurations, or malicious code. Runtime security tools help detect and stop these threats before they can escalate.
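Falco is one popular open-source option in this space (the article doesn't prescribe a specific tool, so treat this as an illustrative sketch). A Falco-style rule to flag a classic post-exploitation step, an interactive shell inside a container, might look like:

```yaml
# Sketch of a Falco rule: alert whenever an interactive shell is
# spawned inside a container. "spawned_process" and "container" are
# macros from Falco's default ruleset.
- rule: Terminal shell in container
  desc: Detect an interactive shell started inside a container
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name command=%proc.cmdline)
  priority: WARNING
```

Alerts like this can then be routed to your incident response tooling so a compromised workload is investigated while it is still running.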

Secrets management

In addition to securing your pods and network, you also need to protect your secrets. Kubernetes offers little secure secret storage out of the box: by default, Secrets are only base64-encoded, and base64 is not encryption; it's merely obfuscation that provides no real security benefit.

To actually secure your secrets, you need to use the following:

AWS Secrets Manager integration

AWS provides a dedicated solution for managing secrets: AWS Secrets Manager.

You can use AWS Secrets Manager to securely store and retrieve sensitive data such as API keys, passwords, and certificates. You can store and retrieve secrets without exposing them in your manifests or ConfigMaps.

To make the process even more seamless, tools like External Secrets Operator can automatically synchronize secrets from AWS Secrets Manager into Kubernetes in a controlled way.
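With the External Secrets Operator installed and a SecretStore configured with IAM access, the synchronization is declared like this. The names and the Secrets Manager key below are placeholders:

```yaml
# Sketch: an ExternalSecret that syncs an AWS Secrets Manager entry
# into a native Kubernetes Secret, refreshed every hour.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production           # placeholder namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager     # a ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: db-credentials          # the Kubernetes Secret to create
  data:
    - secretKey: password
      remoteRef:
        key: prod/db/credentials  # placeholder Secrets Manager secret name
        property: password
```

The secret value never appears in your manifests or Git history; only the reference to it does.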

Encryption at Rest and in Transit

Next, you need to enable encryption at rest for the etcd database in your EKS cluster. This ensures that even if someone gains physical access to the underlying storage, your secrets remain protected.

In addition, configure TLS for all inter-service communication. Many applications default to unencrypted connections within the cluster, assuming that private networks are secure by default. This assumption is risky in cloud environments, where network boundaries are often more fluid and less predictable. Refer to the Encrypting Data-at-Rest and Data-in-Transit documentation for more information.
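Envelope encryption of Kubernetes secrets with a KMS key is easiest to set up at cluster creation time. A sketch with eksctl, where the cluster details and key ARN are placeholders:

```yaml
# Sketch: enabling KMS envelope encryption for Kubernetes secrets
# stored in etcd, at cluster creation time.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster       # placeholder name
  region: us-east-1        # placeholder region
secretsEncryption:
  keyARN: arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab  # placeholder ARN
```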

Image security

Another often-overlooked attack surface is container images. A single vulnerable base image or malicious dependency can compromise your entire application, which is why the importance of image security cannot be overstated.

To secure your images, do the following:

Image scanning and vulnerability management

Implement automated image scanning in your CI/CD pipeline using tools like Amazon ECR image scanning or third-party solutions like Twistlock or Aqua Security. These tools identify known vulnerabilities in your images before they reach production.

With these solutions, you can establish rules and policies that prevent deployment of images with high-severity vulnerabilities.

While zero vulnerabilities is the ideal, it's often an impractical goal.

Instead, focus on eliminating critical and high-severity vulnerabilities while managing medium and low-severity issues through regular patching cycles.

Admission controllers

Admission controllers are pieces of code that intercept requests to the Kubernetes API server after authentication and authorization, but before the object is persisted to etcd. They can validate or mutate resource requests.

So any time you create, update, or delete a Kubernetes resource, such as a Pod, Deployment, or Secret, admission controllers get a chance to inspect or change the request. They can enforce policies, apply defaults, or reject anything that doesn't meet your criteria.
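Policy engines such as Kyverno (used here purely as an illustration; OPA Gatekeeper is a common alternative) run as admission webhooks and make these checks declarative. A sketch of a policy rejecting pods that permit privilege escalation:

```yaml
# Sketch: a Kyverno ClusterPolicy, enforced at admission time, that
# rejects any pod whose containers allow privilege escalation.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-privilege-escalation
spec:
  validationFailureAction: Enforce   # reject, rather than just warn
  rules:
    - name: check-allow-privilege-escalation
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "allowPrivilegeEscalation must be set to false."
        pattern:
          spec:
            containers:
              - securityContext:
                  allowPrivilegeEscalation: false
```

Because the check runs before the object reaches etcd, a non-compliant Deployment simply never produces running pods.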

Monitoring and logging

Even after you’ve taken the necessary steps to secure your EKS cluster, that alone isn’t enough to guarantee complete peace of mind. You also need to continuously monitor your cluster and log events to ensure your systems remain healthy and secure over time.

Effective monitoring and logging creates the foundation for visibility, accountability, and rapid response, so that you can be ready even when things eventually go wrong.

Here are the components of an effective monitoring and response strategy:

Control plane logging

AWS EKS provides comprehensive control plane logging capabilities that capture critical events and activities within your cluster's management layer. By enabling control plane logs, you gain visibility into API server requests, authenticator decisions, audit trails, and scheduler operations. These logs are automatically delivered to Amazon CloudWatch Logs, where you can analyze patterns, set up alerts, and maintain compliance requirements.

The five types of control plane logs available include API server logs for tracking all API requests, audit logs for security compliance, authenticator logs for authentication debugging, controller manager logs for resource management oversight, and scheduler logs for pod placement decisions.
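All five log types can be switched on at cluster creation. A sketch with eksctl, where the cluster details are placeholders:

```yaml
# Sketch: enabling all five control plane log types so they stream
# to CloudWatch Logs.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster       # placeholder name
  region: us-east-1        # placeholder region
cloudWatch:
  clusterLogging:
    enableTypes:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler
```

For an existing cluster, the same logging configuration can be updated via the EKS console or the `aws eks update-cluster-config` command. Note that CloudWatch ingestion and storage for these logs is billed separately.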

Application and node monitoring

Beyond the control plane, your monitoring strategy must extend to the applications running within your cluster and the underlying worker nodes. Container-level metrics such as CPU usage, memory consumption, and network traffic patterns help you understand application performance and resource utilization. Node-level monitoring tracks system health, disk usage, and overall infrastructure stability.

Tools like Prometheus paired with Grafana provide powerful open-source monitoring capabilities, while AWS CloudWatch Container Insights offers native integration with EKS clusters. Implement custom metrics for your specific applications and establish baseline performance indicators so you can quickly identify when systems deviate from normal behavior.

Incident response planning

Having great monitoring is only valuable if you have a well-defined plan for what to do when an issue arises.

A great example of effective incident response came in 2018, when Tesla remediated vulnerabilities within hours of RedLock researchers discovering them, before any customer data had been stolen.

Incident response planning involves developing clear, repeatable procedures for identifying, containing, and resolving security or operational incidents.

Your response strategy should include playbooks that outline step-by-step actions for different scenarios, such as compromised workloads, suspicious API activity, or unexpected resource exhaustion. These playbooks should specify how to isolate affected resources, collect forensic evidence, escalate to the appropriate teams, and communicate with stakeholders.

Wrapping up your EKS security journey

Securing your AWS EKS cluster isn’t a one-time task you can check off a list. It’s an ongoing process that requires careful planning, continuous improvement, and vigilance. From protecting your control plane and locking down your network to hardening pods, securing secrets, scanning images, and building robust monitoring and response practices, every layer contributes to your overall security posture.

While this guide has covered many of the most critical strategies and tools, remember that the most effective security programs are adaptive. New threats, vulnerabilities, and attack techniques will continue to emerge, and your defenses must evolve accordingly.

The key is to approach EKS security as a shared responsibility that extends beyond infrastructure. It's about cultivating a culture where security is considered at every stage: design, development, deployment, and operation. With the right processes, tooling, and mindset in place, you'll be well-equipped to protect your workloads and maintain the trust of your users, no matter how your Kubernetes environment grows.
