Running a highly available multi-cluster EKS architecture brings powerful benefits: near-zero downtime, disaster recovery, and global scalability. But it also multiplies your security challenges.
Securing a single EKS cluster is already complex. Add multiple clusters across regions, databases with sensitive data, and cross-cluster communication, and the attack surface grows significantly. One misconfigured security group or exposed secret can compromise your entire infrastructure.
This guide covers essential security considerations for multi-cluster architectures: network isolation, encryption, IAM management, secrets handling, and incident response. We'll focus on practical measures that protect your infrastructure without sacrificing performance or availability.
Let's build a secure, highly available system.
1. Network Security & Isolation
VPC Architecture
- Separate VPCs per cluster or use shared VPC with isolated subnets
- Private subnets for EKS nodes and databases (no direct internet access)
- Public subnets only for load balancers and NAT gateways
- Implement VPC peering or AWS Transit Gateway for inter-cluster communication
- Use separate VPCs per environment (dev, staging, production)
Network Segmentation
- Production VPC-1 (Region A)
  - Public Subnets: ALB only
  - Private Subnets: EKS Nodes
  - Database Subnets: RDS/Aurora (isolated)
- Production VPC-2 (Region B)
  - Public Subnets: ALB only
  - Private Subnets: EKS Nodes
  - Database Subnets: RDS/Aurora (isolated)
Security Groups
- Principle of least privilege - only allow necessary ports
- Database security groups: Only allow traffic from EKS node security groups
- EKS control plane: Restrict API access to specific CIDR ranges
- No 0.0.0.0/0 ingress rules except on public-facing load balancers; send outbound internet traffic through NAT gateways
- Document and regularly audit security group rules
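To make the database rule above concrete, here is a minimal CloudFormation sketch of a database security group whose only ingress source is the EKS node security group. The resource names, the two parameters, and the PostgreSQL port 5432 are assumptions for illustration, not part of any specific setup.

```yaml
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  NodeSecurityGroupId:              # security group attached to the EKS nodes (assumed to exist)
    Type: AWS::EC2::SecurityGroup::Id
Resources:
  DatabaseSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Database access from EKS nodes only
      VpcId: !Ref VpcId
  DatabaseIngressFromNodes:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !GetAtt DatabaseSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 5432
      ToPort: 5432
      # Reference the node security group instead of a CIDR range
      SourceSecurityGroupId: !Ref NodeSecurityGroupId
```

Referencing a security group rather than a CIDR block keeps the rule valid as nodes are replaced and their IP addresses change.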
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-default
spec:
  podSelector: {}   # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress   # this policy only adds an egress allowance
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database   # the database namespace must carry this label
      ports:
        - protocol: TCP
          port: 5432
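One side effect of a default-deny egress policy is that pods can no longer resolve DNS names, so service lookups fail before any allowed connection is attempted. A minimal sketch of a companion policy follows, assuming cluster DNS runs in `kube-system` (the EKS default) and that a network policy engine such as the VPC CNI's network policy support or Calico is enabled. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress   # hypothetical name
spec:
  podSelector: {}          # every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              # label set automatically on every namespace by Kubernetes
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are namespaced, so a policy like this would need to exist in each namespace that has a default-deny rule.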
2. Identity & Access Management (IAM)
Cluster Access Control
- AWS IAM authentication for cluster access via EKS access entries (the older aws-auth ConfigMap method is deprecated)
- Never use permanent credentials in pods or containers
- Implement IAM Roles for Service Accounts (IRSA) for pod-level permissions
- Use AWS SSO/IAM Identity Center for human access
- Separate IAM roles for different teams/applications
- Enable MFA for all human users
IRSA (IAM Roles for Service Accounts)
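apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/app-role
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: app-sa   # pods assume app-role through this service account
      containers:
        - name: app
          image: myapp:1.0.0       # example tag; pin a version or digest rather than :latest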
Kubernetes RBAC
- Role-Based Access Control (RBAC) for fine-grained permissions
- Namespace isolation - separate namespaces per team/application
- Principle of least privilege - minimal permissions needed
- ClusterRoles for cluster-wide resources (use sparingly)
- Roles for namespace-scoped resources
- Regular RBAC audits
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: app-prod
rules:
  # Read-only access, no delete/update
  - apiGroups: ["", "apps"]
    resources: ["pods", "deployments", "services"]
    verbs: ["get", "list", "watch"]
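A Role grants nothing until it is bound to a subject. As a sketch, the RoleBinding below attaches the `developer` Role to a group in the same namespace. The group name `developers` and the binding name are hypothetical; the group would come from whatever mapping your cluster uses between IAM principals and Kubernetes groups.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-read-only   # hypothetical name
  namespace: app-prod
subjects:
  - kind: Group
    name: developers          # hypothetical Kubernetes group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer             # the Role defined above
  apiGroup: rbac.authorization.k8s.io
```

Binding to a group rather than individual users keeps RBAC audits manageable as team membership changes.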
Database Access
- IAM Database Authentication for RDS/Aurora (where possible)
- Avoid hardcoded credentials - use Secrets Manager or Parameter Store
- Rotate credentials regularly (automated rotation)
- Separate database users per application/service
- Read-only replicas for non-critical workloads
3. Secrets Management
Never Store Secrets in Code or ConfigMaps
❌ Bad: Secrets in environment variables or ConfigMaps
✅ Good: External secrets management
AWS Secrets Manager / Parameter Store
- Use External Secrets Operator or Secrets Store CSI Driver
- Automatic rotation enabled
- Encryption at rest with KMS
- Audit access via CloudTrail
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
  target:
    name: db-secret
  data:
    - secretKey: password
      remoteRef:
        key: prod/db/password
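The ExternalSecret above refers to a store named `aws-secrets-manager`, which has to be defined separately. A minimal sketch of that SecretStore with the External Secrets Operator might look like the following. The region and the service account name are assumptions; the service account would need an IRSA role that is allowed to read the referenced secrets.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1               # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa # hypothetical IRSA-enabled service account
```

Scoping the IAM role behind that service account to a path such as `prod/db/*` keeps one namespace from reading another team's secrets.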
Alternative: HashiCorp Vault
- Centralized secrets management across clusters
- Dynamic secrets generation
- Lease-based credentials
- Fine-grained access policies
4. Encryption
Data at Rest
- Envelope encryption of Kubernetes Secrets stored in etcd, using AWS KMS
- EBS volumes encrypted (gp3 with KMS)
- RDS/Aurora encryption enabled with KMS
- S3 encryption (SSE-S3 or SSE-KMS)
- Use customer-managed KMS keys for compliance requirements
- Separate KMS keys per environment/cluster
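For persistent volumes, encryption can be made the default at the StorageClass level. The sketch below assumes the Amazon EBS CSI driver is installed; the class name is hypothetical and the KMS key ARN is a placeholder to replace with your own customer-managed key.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted   # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  # Placeholder ARN; omit kmsKeyId to fall back to the account's default EBS key
  kmsKeyId: arn:aws:kms:REGION:ACCOUNT:key/KEY_ID
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```

Any PersistentVolumeClaim that names this class gets an encrypted gp3 volume without the application team having to remember the setting.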
Data in Transit
- TLS/SSL everywhere:
  - ALB → Pods (via Ingress with TLS)
  - Pod → Pod (service mesh or mTLS)
  - Application → Database (SSL/TLS enforced)
- Certificate management with AWS Certificate Manager or cert-manager
- mTLS with service mesh (Istio, Linkerd, AWS App Mesh)
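apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
spec:
  tls:
    - hosts:
        - app.example.com
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app      # example Service name
                port:
                  number: 80   # example Service port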
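For the pod-to-pod leg, a service mesh can require mTLS instead of leaving it optional. As one possible sketch, assuming Istio is the mesh in use and `istio-system` is its root namespace (the default), a mesh-wide policy looks like this:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plaintext traffic between sidecars
```

Rolling this out usually starts in `PERMISSIVE` mode so that workloads without sidecars keep working while they are migrated.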
5. Pod Security
Pod Security Standards
- Enforce restricted pod security standards
- No privileged containers unless absolutely necessary
- Read-only root filesystem where possible
- Non-root users for containers
- Drop all capabilities and add only required ones
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:1.0.0   # example tag; pin a version or digest rather than :latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
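Setting a security context per pod only helps if something rejects pods that skip it. Kubernetes' built-in Pod Security Admission can enforce the restricted standard per namespace through labels. A minimal sketch, reusing the `app-prod` namespace name as an example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-prod
  labels:
    # Reject pods that violate the restricted standard
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also record violations in audit logs and show warnings to users
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

A cautious rollout would start with only the `audit` and `warn` labels, fix what they report, and add `enforce` afterwards.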
Image Security
- Scan images for vulnerabilities (Amazon ECR scanning, Trivy, Snyk)
- Use minimal base images (distroless, Alpine)
- Pin image versions - never use :latest
- Sign and verify images (Sigstore/Cosign, Notary)
- Private container registry (Amazon ECR with VPC endpoints)
- Image pull secrets for private registries
Admission Controllers
- OPA/Gatekeeper or Kyverno for policy enforcement
- Enforce security policies:
  - No privileged pods
  - Required resource limits
  - Approved registries only
  - Required security contexts
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed"
        pattern:
          spec:
            # =(...) anchors: the field is checked only when it is present
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
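The "approved registries only" rule from the list above can be expressed in a similar way. This is a sketch assuming Kyverno and a single private ECR registry; the policy and rule names are hypothetical, and `ACCOUNT` and `REGION` are placeholders.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from the approved ECR registry"
        pattern:
          spec:
            containers:
              # Placeholder registry; use "a/* | b/*" to allow several
              - image: "ACCOUNT.dkr.ecr.REGION.amazonaws.com/*"
```

As written, the pattern covers regular containers only. Init and ephemeral containers would need their own entries in the pattern.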
6. Multi-Cluster Security
Cluster Isolation
- Separate clusters for different security zones (DMZ, internal, data)
- Separate clusters per environment (never share prod and non-prod)
- Separate AWS accounts per environment (AWS Organizations)
- Service Control Policies (SCPs) to restrict actions at account level
Cross-Cluster Communication
- Service mesh for secure cross-cluster communication (Istio multi-cluster)
- VPC peering or Transit Gateway with strict security groups
- mTLS for service-to-service authentication
- API Gateway or Internal Load Balancer as entry points
- Zero-trust networking - verify every request
DNS Security
- Private Route53 hosted zones for internal services
- DNSSEC where applicable
- Avoid relying on DNS-based service discovery alone across clusters; DNS does not authenticate the service that answers, so pair it with mTLS
7. Database Security
RDS/Aurora Security
- Multi-AZ deployment for availability
- Private subnets only - no public access
- Encryption at rest (KMS) and in transit (SSL/TLS enforced)
- Automated backups with encryption
- Point-in-time recovery enabled
- Enhanced monitoring enabled
- Performance Insights with encryption
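Most of the settings in this list map directly onto properties of the database resource. The CloudFormation sketch below shows them on a single PostgreSQL instance. The instance class, storage size, username, retention period, and both parameters are assumptions for illustration.

```yaml
Parameters:
  DatabaseKmsKeyArn:        # customer-managed KMS key (assumed to exist)
    Type: String
  DatabaseSubnetGroupName:  # subnet group made of private database subnets
    Type: String
Resources:
  AppDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.medium
      AllocatedStorage: "50"
      MasterUsername: dbadmin
      ManageMasterUserPassword: true   # password generated and stored in Secrets Manager
      DBSubnetGroupName: !Ref DatabaseSubnetGroupName
      MultiAZ: true
      PubliclyAccessible: false
      StorageEncrypted: true
      KmsKeyId: !Ref DatabaseKmsKeyArn
      BackupRetentionPeriod: 14        # days; also enables point-in-time recovery
      EnableIAMDatabaseAuthentication: true
      EnablePerformanceInsights: true
      PerformanceInsightsKMSKeyId: !Ref DatabaseKmsKeyArn
      DeletionProtection: true
```

Automated backups and snapshots of an encrypted instance are encrypted with the same key, so the backup item on the list follows from `StorageEncrypted`.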
Connection Security
- RDS Proxy for connection pooling and IAM authentication
- SSL/TLS enforcement on database side
- Certificate validation on client side
- No hardcoded connection strings
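# Database connection with TLS and server certificate validation
# (sslmode=require encrypts the connection but does not validate the certificate)
DATABASE_URL: "postgresql://user@host:5432/db?sslmode=verify-full"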
Database Access Control
- Separate database users per service
- Minimal privileges (SELECT only for read-only services)
- No superuser access from applications
- Parameter groups to enforce security settings
- Audit logging enabled (PostgreSQL pgaudit, MySQL audit log)
Database Activity Monitoring
- AWS Database Activity Streams for real-time monitoring
- Alert on suspicious queries or access patterns
- Log all DDL and privilege changes
8. Logging & Monitoring
Comprehensive Logging
- EKS Control Plane logs to CloudWatch (API, audit, authenticator)
- Application logs via FluentBit/Fluentd to centralized location
- Database logs (query logs, error logs, slow query logs)
- VPC Flow Logs for network traffic analysis
- CloudTrail for all API calls
- Immutable logs - prevent tampering
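Control plane logging is off by default on EKS and has to be switched on per cluster. If you manage clusters with eksctl, a config fragment along these lines enables the three log types named above. The cluster name and region are hypothetical.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster-a   # hypothetical cluster name
  region: us-east-1      # assumed region
cloudWatch:
  clusterLogging:
    # Log types shipped to CloudWatch Logs
    enableTypes: ["api", "audit", "authenticator"]
```

The same setting exists in Terraform and CloudFormation, so it can live wherever the cluster definition already does.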
Security Monitoring
- Amazon GuardDuty - threat detection
- AWS Security Hub - centralized security findings
- Amazon Detective - security investigation
- Falco - runtime security monitoring in Kubernetes
- Prometheus + Grafana for metrics and alerting
Audit Logging
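# Kubernetes audit policy, shown for reference. EKS manages the audit policy
# itself; audit logs are enabled through control plane logging, not a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: audit-policy
data:
  policy.yaml: |
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      - level: Metadata
        omitStages:
          - RequestReceived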
Critical Alerts
- Failed authentication attempts
- Privileged container creation
- Security group changes
- Database connection failures
- Unusual API calls
- Resource exhaustion
9. Compliance & Governance
Compliance Frameworks
- AWS Config - track configuration changes
- AWS Audit Manager - compliance reporting
- CIS Kubernetes Benchmark - security hardening
- PCI-DSS, HIPAA, SOC 2 compliance where required
- Regular penetration testing
Policy as Code
- AWS Organizations with SCPs
- CloudFormation/Terraform for infrastructure
- GitOps for cluster configuration (ArgoCD/FluxCD)
- OPA/Kyverno for admission control
- Version control and peer review for all changes
Tagging Strategy
- Mandatory tags: Environment, Owner, Project, CostCenter
- Enforce tagging via AWS Config rules
- Use tags for resource-level IAM policies
- Cost allocation and chargeback
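Tag enforcement can be sketched with the AWS managed Config rule `REQUIRED_TAGS`, which flags resources that lack the listed keys. The CloudFormation fragment below uses the four mandatory tags from this section; the resource and rule names are hypothetical, and AWS Config recording is assumed to be enabled already.

```yaml
Resources:
  RequiredTagsRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: required-tags   # hypothetical name
      Source:
        Owner: AWS
        SourceIdentifier: REQUIRED_TAGS
      InputParameters:
        tag1Key: Environment
        tag2Key: Owner
        tag3Key: Project
        tag4Key: CostCenter
```

The rule reports non-compliant resources rather than blocking their creation, so it works best alongside a preventive control such as tag policies or pipeline checks.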
10. Disaster Recovery & Backup
Backup Strategy
- Automated RDS snapshots (daily, 7-30 day retention)
- Cross-region snapshot copies for DR
- EBS snapshots for persistent volumes
- Kubernetes resource and volume backups with Velero (EKS manages etcd, so direct etcd backups are not available)
- GitOps - cluster configuration in Git
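Velero backs up Kubernetes API objects and, optionally, persistent volumes on a schedule. A minimal sketch of a daily backup follows, assuming Velero is installed in the `velero` namespace with a backup storage location already configured. The schedule name, time, namespace, and retention are example values.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-app-backup   # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"    # every day at 02:00, cron syntax
  template:
    includedNamespaces:
      - app-prod           # example namespace
    ttl: 720h0m0s          # keep each backup for 30 days
```

Pointing the storage location at a bucket with cross-region replication gives the backups the same regional protection as the RDS snapshot copies.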
Disaster Recovery
- Multi-region setup for critical applications
- RTO/RPO requirements documented
- Failover procedures tested regularly
- Regular DR drills (quarterly minimum)
- Automated failover where possible (Route53 health checks)
11. Supply Chain Security
Container Supply Chain
- Verify base images from trusted sources
- SBOM (Software Bill of Materials) for dependencies
- Vulnerability scanning in CI/CD pipeline
- Sign images (Cosign/Notary)
- Admission controller to verify signatures
Dependency Management
- Dependabot or Renovate for automated updates
- Regular security patching
- Monitor CVEs for used software
- Minimal dependencies principle
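If the code lives on GitHub, automated updates can be switched on with a small Dependabot configuration in `.github/dependabot.yml`. The sketch below assumes a Dockerfile at the repository root and GitHub Actions workflows; the weekly interval is an example.

```yaml
version: 2
updates:
  # Base image updates for the Dockerfile in the repository root
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "weekly"
  # Version updates for actions used in CI workflows
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```

Further entries can be added per language ecosystem (for example `npm`, `pip`, or `gomod`) depending on what the services use.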
12. Incident Response
Preparation
- Incident response plan documented
- Runbooks for common scenarios
- On-call rotation defined
- Communication channels established
- Post-mortem process defined
Detection & Response
- Automated alerting for security events
- Isolate compromised pods/nodes immediately
- Forensics capability (preserve logs and state)
- Contact AWS Support for suspected breaches
- Notify stakeholders per incident severity
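One way to isolate a suspect pod without deleting it, and losing forensic state, is a pre-created quarantine NetworkPolicy that cuts all traffic for pods carrying a given label. The sketch below uses a hypothetical `quarantine: "true"` label and the `app-prod` namespace as an example.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      quarantine: "true"   # hypothetical label applied during an incident
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules are listed, so nothing is allowed by this policy
```

NetworkPolicies are additive, so any other policy that still selects the pod keeps allowing its traffic. When applying the quarantine label, remove the pod's other labels (such as `app: backend`) at the same time; that also detaches it from its Service and ReplicaSet.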
13. API Gateway & Service Mesh
API Security
- AWS API Gateway or Kong/Envoy for API management
- Rate limiting to prevent abuse
- API keys or OAuth2 for authentication
- WAF (Web Application Firewall) rules
- DDoS protection via AWS Shield
Service Mesh Benefits
- mTLS for all service-to-service communication
- Zero-trust networking model
- Fine-grained authorization policies
- Observability and traffic monitoring
- Circuit breaking and fault injection
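Fine-grained authorization in a mesh builds on the workload identities that mTLS provides. As a sketch, assuming Istio, the policy below lets only a `frontend` service account call the `backend` workload. The namespace, service account, labels, and HTTP methods are hypothetical.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: app-prod
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            # Identity taken from the caller's mTLS certificate
            principals: ["cluster.local/ns/app-prod/sa/frontend"]
      to:
        - operation:
            methods: ["GET", "POST"]
```

Once an ALLOW policy selects a workload, requests that match no rule are denied, which is what makes this a zero-trust default.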
14. Update & Patch Management
EKS Updates
- Regular cluster updates (Kubernetes version support: ~14 months)
- Test in non-prod first
- Blue-green cluster upgrades for zero downtime
- Node group rolling updates
Security Patching
- Automated node updates (EKS managed node groups)
- Bottlerocket OS for minimal attack surface
- Container image updates (rebuild regularly)
- Database patching during maintenance windows
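For clusters managed with eksctl, a managed node group running Bottlerocket can be declared in a few lines. This is a sketch only; the cluster name, region, node group name, instance type, and capacity are assumptions.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster-a   # hypothetical cluster name
  region: us-east-1      # assumed region
managedNodeGroups:
  - name: bottlerocket-nodes
    amiFamily: Bottlerocket    # minimal, container-focused OS
    instanceType: m5.large
    desiredCapacity: 3
    privateNetworking: true    # nodes in private subnets only
    volumeEncrypted: true      # encrypt node EBS volumes
```

With managed node groups, a new AMI release is rolled out as a node-by-node replacement, which fits the rolling update approach described above.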
Security Checklist
- Private subnets for EKS nodes and databases
- Security groups with least privilege
- Network policies enforced
- IRSA configured for all pods needing AWS access
- No hardcoded credentials anywhere
- Secrets Manager/Parameter Store with rotation
- All encryption at rest enabled (etcd, EBS, RDS, S3)
- TLS/SSL enforced everywhere
- Pod security standards enforced (restricted)
- Image scanning in CI/CD
- Admission controllers (OPA/Kyverno) configured
- GuardDuty and Security Hub enabled
- Comprehensive logging to CloudWatch
- CloudTrail enabled in all regions
- VPC Flow Logs enabled
- Regular backups with cross-region copies
- Multi-factor authentication enforced
- RBAC properly configured
- Regular security audits scheduled
- Incident response plan documented
Summary
Security in multi-cluster architectures requires a defense-in-depth approach:
- Network isolation at every layer
- Zero-trust model - verify everything
- Encryption everywhere (at rest and in transit)
- Least privilege access for humans and workloads
- Continuous monitoring and alerting
- Regular audits and compliance checks
- Automation for consistency and reliability
Security is not a one-time setup but an ongoing process requiring continuous improvement and vigilance.