
Nowsath for AWS Community Builders


Security Considerations for Multi-Cluster Cloud Architecture (HA EKS with Databases)

Running a highly available multi-cluster EKS architecture brings powerful benefits: minimal downtime, disaster recovery, and global scalability. But it also multiplies your security challenges.

Securing a single EKS cluster is already complex. Add multiple clusters across regions, databases with sensitive data, and cross-cluster communication, and the attack surface grows significantly. One misconfigured security group or exposed secret can compromise your entire infrastructure.

This guide covers essential security considerations for multi-cluster architectures: network isolation, encryption, IAM management, secrets handling, and incident response. We'll focus on practical measures that protect your infrastructure without sacrificing performance or availability.

Let's build a secure, highly available system.

1. Network Security & Isolation

VPC Architecture

  • Use separate VPCs per cluster, or a shared VPC with isolated subnets
  • Private subnets for EKS nodes and databases (no direct internet access)
  • Public subnets only for load balancers and NAT gateways
  • Implement VPC peering or AWS Transit Gateway for inter-cluster communication
  • Use separate VPCs per environment (dev, staging, production)

Network Segmentation

  • Production VPC-1 (Region A)

    • Public Subnets: ALB only
    • Private Subnets: EKS Nodes
    • Database Subnets: RDS/Aurora (isolated)
  • Production VPC-2 (Region B)

    • Public Subnets: ALB only
    • Private Subnets: EKS Nodes
    • Database Subnets: RDS/Aurora (isolated)
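As a sketch of how this layout can map onto a cluster definition, the eksctl ClusterConfig below places managed nodes in existing private subnets only. Load balancers go in the public subnets, and the API endpoint is private, with public access limited to one CIDR. The cluster name, region, VPC and subnet IDs, instance type, and CIDR are all placeholders.

```yaml
# eksctl ClusterConfig sketch for one region's cluster.
# All names, IDs, and CIDRs are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-region-a
  region: us-east-1
vpc:
  id: vpc-0123456789abcdef0
  subnets:
    public:                     # load balancers and NAT gateways only
      us-east-1a: { id: subnet-0aaaaaaaaaaaaaaaa }
      us-east-1b: { id: subnet-0bbbbbbbbbbbbbbbb }
    private:                    # EKS worker nodes
      us-east-1a: { id: subnet-0cccccccccccccccc }
      us-east-1b: { id: subnet-0dddddddddddddddd }
  clusterEndpoints:
    privateAccess: true         # nodes reach the API server inside the VPC
    publicAccess: true          # set to false for a fully private endpoint
  publicAccessCIDRs: ["203.0.113.0/24"]   # e.g., a corporate VPN range
managedNodeGroups:
  - name: ng-private
    instanceType: m6i.large
    desiredCapacity: 3
    privateNetworking: true     # nodes get no public IPs
```

The database subnets are usually managed with the RDS subnet group rather than in this file.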

Security Groups

  • Principle of least privilege - only allow necessary ports
  • Database security groups: Only allow traffic from EKS node security groups
  • EKS control plane: Restrict API access to specific CIDR ranges
  • No 0.0.0.0/0 ingress rules except HTTPS (443) on internet-facing load balancers; allow 0.0.0.0/0 egress only where traffic must leave through NAT
  • Document and regularly audit security group rules
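The "only from EKS node security groups" rule can be expressed by referencing the node security group instead of any CIDR. A minimal CloudFormation sketch follows, assuming PostgreSQL on port 5432 and that the node security group ID is passed in as a parameter. If you use security groups for pods, reference the pod security group instead.

```yaml
# CloudFormation sketch: the database security group admits PostgreSQL
# only from the EKS node security group. No CIDR-based ingress.
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  NodeSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
Resources:
  DatabaseSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Aurora PostgreSQL - ingress from EKS nodes only
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - Description: PostgreSQL from EKS worker nodes
          IpProtocol: tcp
          FromPort: 5432
          ToPort: 5432
          SourceSecurityGroupId: !Ref NodeSecurityGroupId
```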

Network Policies

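# Default-deny all ingress and egress for every pod in the namespace.
# Requires a CNI that enforces NetworkPolicy (e.g., Amazon VPC CNI with
# network policy enabled, Calico, or Cilium).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-default
  namespace: app-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow backend pods to reach PostgreSQL pods in the "database" namespace.
# policyTypes is explicit so this policy only governs egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          # label set automatically on every namespace (Kubernetes 1.21+)
          kubernetes.io/metadata.name: database
    ports:
    - protocol: TCP
      port: 5432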
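A default-deny egress policy also blocks DNS, so pods cannot resolve hostnames such as an RDS endpoint. A namespaceSelector also cannot match RDS/Aurora, because the database runs outside the cluster. The two companion policies below are a sketch that handles both cases. They assume CoreDNS pods carry the usual `k8s-app: kube-dns` label in `kube-system`, and that `10.0.32.0/20` (a placeholder) is your database subnet range.

```yaml
# Allow every pod in app-prod to query cluster DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: app-prod
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
---
# Allow backend pods to reach RDS/Aurora by IP range (outside the cluster).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-rds
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.32.0/20      # placeholder: database subnet CIDR
    ports:
    - protocol: TCP
      port: 5432
```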

2. Identity & Access Management (IAM)

Cluster Access Control

  • AWS IAM authentication for cluster access via EKS access entries (the legacy aws-auth ConfigMap is deprecated)
  • Never use permanent credentials in pods or containers
  • Implement IAM Roles for Service Accounts (IRSA) for pod-level permissions
  • Use AWS SSO/IAM Identity Center for human access
  • Separate IAM roles for different teams/applications
  • Enable MFA for all human users
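As an illustration of IAM-based human access with EKS access entries, the eksctl sketch below grants one IAM role read-only access to a single namespace. It uses an AWS-managed access policy. The role ARN is hypothetical. Moving the authentication mode toward `API` cannot be reversed, so `API_AND_CONFIG_MAP` is the usual intermediate step while migrating away from aws-auth.

```yaml
# eksctl sketch: namespace-scoped, read-only access for an IAM role.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-region-a
  region: us-east-1
accessConfig:
  authenticationMode: API_AND_CONFIG_MAP   # switch to API once migrated
  accessEntries:
    - principalARN: arn:aws:iam::111122223333:role/dev-team-readonly  # hypothetical
      accessPolicies:
        - policyARN: arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy
          accessScope:
            type: namespace
            namespaces: ["app-prod"]
```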

IRSA (IAM Roles for Service Accounts)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/app-role
---
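apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      serviceAccountName: app-sa   # pods receive the IAM role via IRSA
      containers:
      - name: app
        image: myapp:1.4.2         # example pinned tag; avoid :latest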
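The annotation only works if the cluster has an IAM OIDC provider and the role's trust policy accepts tokens for that exact service account. The CloudFormation sketch below shows such a trust policy, assuming the service account `app-sa` lives in `app-prod`. The account ID and OIDC issuer ID are placeholders. Attach only the permissions the workload actually needs.

```yaml
# CloudFormation sketch: a role only app-prod/app-sa can assume via IRSA.
Resources:
  AppRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: app-role
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Federated: arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789
            Action: sts:AssumeRoleWithWebIdentity
            Condition:
              StringEquals:
                # pin the role to one service account and the STS audience
                "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789:sub": system:serviceaccount:app-prod:app-sa
                "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789:aud": sts.amazonaws.com
      # Add least-privilege permissions here via Policies or ManagedPolicyArns.
```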

Kubernetes RBAC

  • Role-Based Access Control (RBAC) for fine-grained permissions
  • Namespace isolation - separate namespaces per team/application
  • Principle of least privilege - minimal permissions needed
  • ClusterRoles for cluster-wide resources (use sparingly)
  • Roles for namespace-scoped resources
  • Regular RBAC audits
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: app-prod
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments", "services"]
  verbs: ["get", "list", "watch"]
# Read-only access, no delete/update
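A Role grants nothing until it is bound to a subject. A minimal RoleBinding is shown below. It assumes a Kubernetes group named `dev-team` (hypothetical), for example the group assigned to an IAM principal through an access entry's `kubernetesGroups` field.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: app-prod
subjects:
- kind: Group
  name: dev-team            # hypothetical group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer           # the read-only Role defined above
  apiGroup: rbac.authorization.k8s.io
```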

Database Access

  • IAM Database Authentication for RDS/Aurora (where possible)
  • Avoid hardcoded credentials - use Secrets Manager or Parameter Store
  • Rotate credentials regularly (automated rotation)
  • Separate database users per application/service
  • Route read-only and non-critical workloads to read replicas
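With IAM database authentication, the pod's IAM role needs `rds-db:connect` on a specific database user. On PostgreSQL, that user must also be granted the `rds_iam` role. A CloudFormation sketch follows. The cluster resource ID, account, and user name `app_user` are placeholders.

```yaml
# CloudFormation sketch: let the workload role connect as app_user.
Resources:
  AppDbConnectPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: IAM database authentication for app_user
      Roles:
        - app-role            # e.g., the IRSA role used by the pod
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: rds-db:connect
            # arn:aws:rds-db:<region>:<account>:dbuser:<resource-id>/<db-user>
            Resource: arn:aws:rds-db:us-east-1:111122223333:dbuser:cluster-ABCDEFGHIJKLMNOP0123456789/app_user
```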

3. Secrets Management

Never Store Secrets in Code or ConfigMaps
❌ Bad: Plaintext secrets in code, manifests, environment variable definitions, or ConfigMaps
✅ Good: External secrets management

AWS Secrets Manager / Parameter Store

  • Use External Secrets Operator or Secrets Store CSI Driver
  • Automatic rotation enabled
  • Encryption at rest with KMS
  • Audit access via CloudTrail
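apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: app-prod
spec:
  refreshInterval: 1h        # re-sync from Secrets Manager every hour
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore        # or ClusterSecretStore for a shared store
  target:
    name: db-secret          # Kubernetes Secret created and owned by ESO
    creationPolicy: Owner
  data:
  - secretKey: password
    remoteRef:
      key: prod/db/password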
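The ExternalSecret refers to a store named `aws-secrets-manager`, which must exist in the same namespace. The sketch below authenticates to AWS Secrets Manager through the IRSA service account `app-sa`. It assumes that account's role allows `secretsmanager:GetSecretValue` on the relevant secrets. The region is a placeholder, and the apiVersion should match what your installed External Secrets Operator release serves.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: app-prod
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: app-sa     # IRSA-annotated service account
```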

Alternative: HashiCorp Vault

  • Centralized secrets management across clusters
  • Dynamic secrets generation
  • Lease-based credentials
  • Fine-grained access policies
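With the Vault Agent Injector, pods request secrets through annotations, and the agent sidecar renders them to files under `/vault/secrets/`. In this sketch, the Vault Kubernetes-auth role `app` and the database secrets engine role `app-readonly` are hypothetical. Each read of `database/creds/app-readonly` yields short-lived, leased credentials.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "app"    # hypothetical Vault role
    # rendered to /vault/secrets/db-creds; path is hypothetical
    vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/app-readonly"
spec:
  serviceAccountName: app-sa
  containers:
  - name: app
    image: myapp:1.4.2                 # example pinned tag
    # securityContext omitted for brevity
```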

4. Encryption

Data at Rest

  • EKS envelope encryption of Kubernetes Secrets (stored in etcd) with AWS KMS
  • EBS volumes encrypted with KMS (e.g., gp3 volumes provisioned from an encrypted StorageClass)
  • RDS/Aurora encryption enabled with KMS
  • S3 encryption (SSE-S3 or SSE-KMS)
  • Use customer-managed KMS keys for compliance requirements
  • Separate KMS keys per environment/cluster
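For persistent volumes, encryption can be made the default by provisioning through an encrypted StorageClass. The sketch below uses the EBS CSI driver's `type`, `encrypted`, and `kmsKeyId` parameters. The key ARN is a placeholder. The driver's IAM role may also need permissions on that customer-managed key.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  # placeholder customer-managed key for this environment/cluster
  kmsKeyId: arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```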

Data in Transit

  • TLS/SSL everywhere:

    • Client → ALB (TLS terminated with an ACM certificate)
    • ALB → Pods (re-encrypted, e.g., with the alb.ingress.kubernetes.io/backend-protocol: HTTPS annotation)
    • Pod → Pod (service mesh mTLS)
    • Application → Database (SSL/TLS enforced)
  • Certificate management with AWS Certificate Manager or cert-manager
  • mTLS with service mesh (Istio, Linkerd, AWS App Mesh)

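apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06
spec:
  ingressClassName: alb
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 80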

5. Pod Security

Pod Security Standards

  • Enforce restricted pod security standards
  • No privileged containers unless absolutely necessary
  • Read-only root filesystem where possible
  • Non-root users for containers
  • Drop all capabilities and add only required ones
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.4.2   # example pinned tag; avoid :latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
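On current Kubernetes versions, the restricted standard is enforced per namespace by the built-in Pod Security Admission controller, which reads namespace labels. A minimal sketch for the `app-prod` namespace follows. Pods like the one above pass; privileged or root pods are rejected at admission.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-prod
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # also surface violations as client warnings and audit annotations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```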

Image Security

  • Scan images for vulnerabilities (Amazon ECR scanning, Trivy, Snyk)
  • Use minimal base images (distroless, Alpine)
  • Pin image versions - never use :latest
  • Sign and verify images (Sigstore/Cosign, Notary)
  • Private container registry (Amazon ECR with VPC endpoints)
  • Image pull secrets only for non-ECR private registries (EKS nodes pull from ECR with their node IAM role)

Admission Controllers

  • OPA/Gatekeeper or Kyverno for policy enforcement
  • Enforce security policies:

    • No privileged pods
    • Required resource limits
    • Approved registries only
    • Required security contexts

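apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: check-privileged
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Privileged containers are not allowed"
      pattern:
        spec:
          # =( ) anchors: the field may be omitted, but if set it must be false
          =(initContainers):
          - =(securityContext):
              =(privileged): "false"
          containers:
          - =(securityContext):
              =(privileged): "false"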
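"Approved registries only" and "never use :latest" can be enforced the same way. The sketch below follows the pattern style of Kyverno's sample policies. The ECR account ID is a placeholder, and only regular containers are checked, so extend the patterns to `initContainers` if you use them. The require-tag rule exists because an untagged image implicitly means `:latest`.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-sources
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: approved-registry-only
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Images must come from the approved ECR registry."
      pattern:
        spec:
          containers:
          - image: "111122223333.dkr.ecr.*.amazonaws.com/*"
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "An explicit image tag or digest is required."
      pattern:
        spec:
          containers:
          - image: "*:*"
  - name: disallow-latest-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Pin image versions; the :latest tag is not allowed."
      pattern:
        spec:
          containers:
          - image: "!*:latest"
```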

6. Multi-Cluster Security

Cluster Isolation

  • Separate clusters for different security zones (DMZ, internal, data)
  • Separate clusters per environment (never share prod and non-prod)
  • Separate AWS accounts per environment (AWS Organizations)
  • Service Control Policies (SCPs) to restrict actions at account level
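As one example of an account-level guardrail, the SCP below stops member accounts from switching off the audit and detection services that later sections rely on. This is a CloudFormation sketch, deployed from the Organizations management account. The OU ID is a placeholder, and the action list is illustrative rather than complete.

```yaml
Resources:
  ProtectSecurityServicesScp:
    Type: AWS::Organizations::Policy
    Properties:
      Name: protect-security-services
      Type: SERVICE_CONTROL_POLICY
      TargetIds:
        - ou-abcd-12345678      # placeholder: e.g., the production OU
      Content:
        Version: "2012-10-17"
        Statement:
          - Sid: DenySecurityServiceTampering
            Effect: Deny
            Action:
              - cloudtrail:StopLogging
              - cloudtrail:DeleteTrail
              - guardduty:DeleteDetector
              - config:StopConfigurationRecorder
              - config:DeleteConfigurationRecorder
              - organizations:LeaveOrganization
            Resource: "*"
```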

Cross-Cluster Communication

  • Service mesh for secure cross-cluster communication (Istio multi-cluster)
  • VPC peering or Transit Gateway with strict security groups
  • mTLS for service-to-service authentication
  • API Gateway or Internal Load Balancer as entry points
  • Zero-trust networking - verify every request
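With Istio, mTLS for service-to-service traffic can be made mandatory mesh-wide. Place a PeerAuthentication named `default` in the root namespace (`istio-system` unless your install changed it) and set STRICT mode, so sidecars reject plaintext connections. For cross-cluster traffic, the clusters must share a root of trust for this to work. The sketch assumes a default Istio installation.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace: applies to the whole mesh
spec:
  mtls:
    mode: STRICT            # plaintext between sidecars is rejected
```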

DNS Security

  • Private Route53 hosted zones for internal services
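  • DNSSEC signing for public hosted zones (Route 53 does not support DNSSEC signing for private zones)
  • Don't treat DNS resolution as proof of identity across clusters; pair service discovery with mTLS identity checks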
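A private hosted zone is created by associating it with VPCs, and one zone can serve VPCs in both regions. A CloudFormation sketch follows. The domain and VPC IDs are placeholders.

```yaml
Resources:
  InternalZone:
    Type: AWS::Route53::HostedZone
    Properties:
      Name: internal.example.com
      HostedZoneConfig:
        Comment: Private zone for internal service endpoints
      VPCs:                  # associating VPCs makes the zone private
        - VPCId: vpc-0aaaaaaaaaaaaaaaa
          VPCRegion: us-east-1
        - VPCId: vpc-0bbbbbbbbbbbbbbbb
          VPCRegion: us-west-2
```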

7. Database Security

RDS/Aurora Security

  • Multi-AZ deployment for availability
  • Private subnets only - no public access
  • Encryption at rest (KMS) and in transit (SSL/TLS enforced)
  • Automated backups with encryption
  • Point-in-time recovery enabled
  • Enhanced monitoring enabled
  • Performance Insights with encryption
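Several items above map directly to cluster properties. The CloudFormation sketch below covers KMS encryption, automated backups (which enable point-in-time recovery), IAM authentication, a managed master password, and an instance that is not publicly accessible. The subnet group, security group, and key are assumed to exist and are passed in. Add at least one reader instance for Multi-AZ failover.

```yaml
Parameters:
  DbSubnetGroupName:
    Type: String                  # subnet group of the isolated DB subnets
  DbSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
  DbKmsKeyArn:
    Type: String
Resources:
  AuroraCluster:
    Type: AWS::RDS::DBCluster
    DeletionPolicy: Snapshot
    Properties:
      Engine: aurora-postgresql
      MasterUsername: dbadmin
      ManageMasterUserPassword: true   # master secret kept in Secrets Manager
      DBSubnetGroupName: !Ref DbSubnetGroupName
      VpcSecurityGroupIds:
        - !Ref DbSecurityGroupId
      StorageEncrypted: true
      KmsKeyId: !Ref DbKmsKeyArn
      BackupRetentionPeriod: 14        # automated backups, enables PITR
      EnableIAMDatabaseAuthentication: true
      DeletionProtection: true
      CopyTagsToSnapshot: true
  AuroraWriter:
    Type: AWS::RDS::DBInstance
    Properties:
      DBClusterIdentifier: !Ref AuroraCluster
      Engine: aurora-postgresql
      DBInstanceClass: db.r6g.large
      PubliclyAccessible: false
      EnablePerformanceInsights: true
      PerformanceInsightsKMSKeyId: !Ref DbKmsKeyArn
```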

Connection Security

  • RDS Proxy for connection pooling and IAM authentication
  • SSL/TLS enforcement on database side
  • Certificate validation on client side
  • No hardcoded connection strings
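# Database connection with TLS and server certificate verification
# (sslmode=require encrypts but does not verify the certificate)
DATABASE_URL: "postgresql://user@host:5432/db?sslmode=verify-full&sslrootcert=/path/to/global-bundle.pem"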
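RDS Proxy can enforce both TLS and IAM authentication on the client side while pooling connections to the cluster. A CloudFormation sketch follows. The proxy's IAM role, the database secret, subnets, security group, and cluster identifier are assumed to exist and are passed in as parameters.

```yaml
Parameters:
  ProxyRoleArn:
    Type: String                 # role allowed to read the DB secret
  DbSecretArn:
    Type: String
  ProxySubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
  ProxySecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
  AuroraClusterId:
    Type: String
Resources:
  AppDbProxy:
    Type: AWS::RDS::DBProxy
    Properties:
      DBProxyName: app-db-proxy
      EngineFamily: POSTGRESQL
      RoleArn: !Ref ProxyRoleArn
      RequireTLS: true           # reject non-TLS client connections
      Auth:
        - AuthScheme: SECRETS
          SecretArn: !Ref DbSecretArn
          IAMAuth: REQUIRED      # clients must present IAM auth tokens
      VpcSubnetIds: !Ref ProxySubnetIds
      VpcSecurityGroupIds:
        - !Ref ProxySecurityGroupId
  AppDbProxyTargets:
    Type: AWS::RDS::DBProxyTargetGroup
    Properties:
      DBProxyName: !Ref AppDbProxy
      TargetGroupName: default
      DBClusterIdentifiers:
        - !Ref AuroraClusterId
```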

Database Access Control

  • Separate database users per service
  • Minimal privileges (SELECT only for read-only services)
  • No superuser access from applications
  • Parameter groups to enforce security settings
  • Audit logging enabled (PostgreSQL pgaudit, MySQL audit log)
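For Aurora PostgreSQL, TLS enforcement and pgaudit are both set in a cluster parameter group. The sketch below assumes an Aurora PostgreSQL 15 cluster, so the `Family` must match your engine's major version. After the change is applied, the pgaudit extension still has to be created in the database (`CREATE EXTENSION pgaudit;`).

```yaml
Resources:
  AuroraPgClusterParams:
    Type: AWS::RDS::DBClusterParameterGroup
    Properties:
      Description: Enforce TLS and enable pgaudit for Aurora PostgreSQL
      Family: aurora-postgresql15      # match the cluster's major version
      Parameters:
        rds.force_ssl: "1"             # reject non-TLS connections
        shared_preload_libraries: "pg_stat_statements,pgaudit"
        pgaudit.log: "ddl,role"        # log DDL and privilege changes
```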

Database Activity Monitoring

  • AWS Database Activity Streams for real-time monitoring
  • Alert on suspicious queries or access patterns
  • Log all DDL and privilege changes

8. Logging & Monitoring

Comprehensive Logging

  • EKS Control Plane logs to CloudWatch (API, audit, authenticator)
  • Application logs via FluentBit/Fluentd to centralized location
  • Database logs (query logs, error logs, slow query logs)
  • VPC Flow Logs for network traffic analysis
  • CloudTrail for all API calls
  • Immutable logs - prevent tampering
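One common way to make archived logs tamper-resistant is an S3 bucket with Object Lock in compliance mode. While the retention period runs, no user, including the root user, can delete locked object versions or shorten the retention. The CloudFormation sketch below uses an illustrative one-year retention.

```yaml
Resources:
  AuditLogBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      ObjectLockEnabled: true
      ObjectLockConfiguration:
        ObjectLockEnabled: Enabled
        Rule:
          DefaultRetention:
            Mode: COMPLIANCE          # retention cannot be shortened or removed
            Days: 365                 # illustrative retention period
      VersioningConfiguration:
        Status: Enabled
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
```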

Security Monitoring

  • Amazon GuardDuty - threat detection
  • AWS Security Hub - centralized security findings
  • Amazon Detective - security investigation
  • Falco - runtime security monitoring in Kubernetes
  • Prometheus + Grafana for metrics and alerting
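Falco rules are written in YAML and can build on macros from its default ruleset, such as `spawned_process` and `container`. The custom rule below is a sketch that flags interactive shells in production containers. The namespace and shell list are assumptions to tune for your workloads.

```yaml
- rule: Shell Spawned in Production Container
  desc: Detect a shell process started inside a container in app-prod
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
    and k8s.ns.name = "app-prod"
  output: >
    Shell in container (user=%user.name pod=%k8s.pod.name
    ns=%k8s.ns.name cmdline=%proc.cmdline image=%container.image.repository)
  priority: WARNING
  tags: [container, shell]
```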

Audit Logging

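# EKS manages the API server audit policy, so an audit.k8s.io Policy
# ConfigMap has no effect on the managed control plane. Enable control
# plane log export to CloudWatch Logs instead, for example with eksctl:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-region-a
  region: us-east-1
cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator"]
    logRetentionInDays: 365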

Critical Alerts

  • Failed authentication attempts
  • Privileged container creation
  • Security group changes
  • Database connection failures
  • Unusual API calls
  • Resource exhaustion
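As an example of wiring one of these alerts, the CloudFormation sketch below counts security group changes in CloudTrail events and alarms on any occurrence. It assumes CloudTrail already delivers to a CloudWatch Logs group and that an SNS topic for security alerts exists. Both are passed in as parameters.

```yaml
Parameters:
  CloudTrailLogGroupName:
    Type: String        # log group receiving CloudTrail events
  AlertTopicArn:
    Type: String        # SNS topic for security alerts
Resources:
  SecurityGroupChangeFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      LogGroupName: !Ref CloudTrailLogGroupName
      FilterPattern: >-
        { ($.eventName = AuthorizeSecurityGroupIngress) ||
        ($.eventName = AuthorizeSecurityGroupEgress) ||
        ($.eventName = RevokeSecurityGroupIngress) ||
        ($.eventName = RevokeSecurityGroupEgress) ||
        ($.eventName = CreateSecurityGroup) ||
        ($.eventName = DeleteSecurityGroup) }
      MetricTransformations:
        - MetricNamespace: Security/CloudTrail
          MetricName: SecurityGroupChanges
          MetricValue: "1"
  SecurityGroupChangeAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: A security group was created, changed, or deleted
      Namespace: Security/CloudTrail
      MetricName: SecurityGroupChanges
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref AlertTopicArn
```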

9. Compliance & Governance

Compliance Frameworks

  • AWS Config - track configuration changes
  • AWS Audit Manager - compliance reporting
  • CIS Kubernetes Benchmark - security hardening
  • PCI-DSS, HIPAA, SOC 2 compliance where required
  • Regular penetration testing

Policy as Code

  • AWS Organizations with SCPs
  • CloudFormation/Terraform for infrastructure
  • GitOps for cluster configuration (ArgoCD/FluxCD)
  • OPA/Kyverno for admission control
  • Version control and peer review for all changes

Tagging Strategy

  • Mandatory tags: Environment, Owner, Project, CostCenter
  • Enforce tagging via AWS Config rules
  • Use tags for resource-level IAM policies
  • Cost allocation and chargeback
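Mandatory tags can be checked with the AWS-managed `REQUIRED_TAGS` Config rule, which accepts up to six keys. This CloudFormation sketch assumes an AWS Config recorder is already running in the account. The resource-type scope is illustrative.

```yaml
Resources:
  RequiredTagsRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: required-tags
      Description: Flags resources missing the mandatory tags
      Source:
        Owner: AWS
        SourceIdentifier: REQUIRED_TAGS
      InputParameters:
        tag1Key: Environment
        tag2Key: Owner
        tag3Key: Project
        tag4Key: CostCenter
      Scope:
        ComplianceResourceTypes:
          - AWS::EC2::Instance
          - AWS::EC2::Volume
          - AWS::RDS::DBInstance
          - AWS::S3::Bucket
```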

10. Disaster Recovery & Backup

Backup Strategy

  • Automated RDS snapshots (daily, 7-30 day retention)
  • Cross-region snapshot copies for DR
  • EBS snapshots for persistent volumes
  • Kubernetes resource and persistent volume backups with Velero (EKS manages etcd, so back up through the Kubernetes API)
  • GitOps - cluster configuration in Git
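A scheduled Velero backup of a namespace might look like the sketch below. It assumes Velero is installed in the `velero` namespace with a default backup storage location. For cross-region DR, that location's bucket would live in, or replicate to, the recovery region.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-app-prod
  namespace: velero
spec:
  schedule: "0 2 * * *"        # every day at 02:00 (cron syntax)
  template:
    includedNamespaces:
    - app-prod
    snapshotVolumes: true      # include persistent volume snapshots
    ttl: 720h0m0s              # keep each backup for 30 days
```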

Disaster Recovery

  • Multi-region setup for critical applications
  • RTO/RPO requirements documented
  • Failover procedures tested regularly
  • Regular DR drills (quarterly minimum)
  • Automated failover where possible (Route53 health checks)
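Route 53 failover routing pairs a health check on the primary region with PRIMARY and SECONDARY records. When the check fails, DNS answers switch to the secondary region. In this CloudFormation sketch, the hostnames and the `/healthz` path are hypothetical, and the hosted zone is passed in.

```yaml
Parameters:
  HostedZoneId:
    Type: AWS::Route53::HostedZone::Id
Resources:
  PrimaryHealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTPS
        FullyQualifiedDomainName: app-region-a.example.com
        ResourcePath: /healthz
        Port: 443
        RequestInterval: 30
        FailureThreshold: 3
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: app.example.com
      Type: CNAME
      TTL: "60"
      SetIdentifier: region-a
      Failover: PRIMARY
      HealthCheckId: !Ref PrimaryHealthCheck
      ResourceRecords:
        - app-region-a.example.com
  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: app.example.com
      Type: CNAME
      TTL: "60"
      SetIdentifier: region-b
      Failover: SECONDARY
      ResourceRecords:
        - app-region-b.example.com
```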

11. Supply Chain Security

Container Supply Chain

  • Verify base images from trusted sources
  • SBOM (Software Bill of Materials) for dependencies
  • Vulnerability scanning in CI/CD pipeline
  • Sign images (Cosign/Notary)
  • Admission controller to verify signatures
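Kyverno can verify Cosign signatures at admission with a `verifyImages` rule. In the sketch below, the ECR account ID is a placeholder and the public key must be replaced with your own `cosign.pub` contents. Kyverno also needs credentials that let it read signatures from ECR.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
  - name: verify-cosign-signature
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - imageReferences:
      - "111122223333.dkr.ecr.*.amazonaws.com/*"
      attestors:
      - entries:
        - keys:
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              <contents of your cosign.pub>
              -----END PUBLIC KEY-----
```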

Dependency Management

  • Dependabot or Renovate for automated updates
  • Regular security patching
  • Monitor CVEs for used software
  • Minimal dependencies principle

12. Incident Response

Preparation

  • Incident response plan documented
  • Runbooks for common scenarios
  • On-call rotation defined
  • Communication channels established
  • Post-mortem process defined

Detection & Response

  • Automated alerting for security events
  • Isolate compromised pods/nodes immediately
  • Forensics capability (preserve logs and state)
  • Contact AWS Support for suspected breaches
  • Notify stakeholders per incident severity

13. API Gateway & Service Mesh

API Security

  • AWS API Gateway or Kong/Envoy for API management
  • Rate limiting to prevent abuse
  • API keys or OAuth2 for authentication
  • WAF (Web Application Firewall) rules
  • DDoS protection via AWS Shield
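Rate limiting and WAF rules for an ALB can live in one regional web ACL. This CloudFormation sketch combines a per-IP rate-based rule with the AWS managed common rule set. The limit is illustrative. The ACL can be attached to an Ingress-managed ALB with the AWS Load Balancer Controller's `alb.ingress.kubernetes.io/wafv2-acl-arn` annotation.

```yaml
Resources:
  AppWebAcl:
    Type: AWS::WAFv2::WebACL
    Properties:
      Name: app-web-acl
      Scope: REGIONAL              # REGIONAL for ALB and API Gateway
      DefaultAction:
        Allow: {}
      VisibilityConfig:
        SampledRequestsEnabled: true
        CloudWatchMetricsEnabled: true
        MetricName: app-web-acl
      Rules:
        - Name: rate-limit-per-ip
          Priority: 0
          Action:
            Block: {}
          Statement:
            RateBasedStatement:
              Limit: 2000          # requests per IP per 5-minute window
              AggregateKeyType: IP
          VisibilityConfig:
            SampledRequestsEnabled: true
            CloudWatchMetricsEnabled: true
            MetricName: rate-limit-per-ip
        - Name: aws-common-rule-set
          Priority: 1
          OverrideAction:
            None: {}
          Statement:
            ManagedRuleGroupStatement:
              VendorName: AWS
              Name: AWSManagedRulesCommonRuleSet
          VisibilityConfig:
            SampledRequestsEnabled: true
            CloudWatchMetricsEnabled: true
            MetricName: aws-common-rule-set
```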

Service Mesh Benefits

  • mTLS for all service-to-service communication
  • Zero-trust networking model
  • Fine-grained authorization policies
  • Observability and traffic monitoring
  • Circuit breaking and fault injection
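Fine-grained authorization in Istio builds on mTLS identities. Once an ALLOW policy selects a workload, requests that match none of its rules are denied. The sketch below admits only the hypothetical `frontend-sa` service account to the backend's `/api/*` paths, assuming the default `cluster.local` trust domain.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend
  namespace: app-prod
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        # mTLS identity: <trust-domain>/ns/<namespace>/sa/<service-account>
        principals: ["cluster.local/ns/app-prod/sa/frontend-sa"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
```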

14. Update & Patch Management

EKS Updates

  • Regular cluster updates (each EKS Kubernetes version gets about 14 months of standard support, followed by paid extended support)
  • Test in non-prod first
  • Blue-green cluster upgrades for zero downtime
  • Node group rolling updates

Security Patching

  • Automated node updates (EKS managed node groups)
  • Bottlerocket OS for minimal attack surface
  • Container image updates (rebuild regularly)
  • Database patching during maintenance windows
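In eksctl, a Bottlerocket-based managed node group with a bounded rolling-update budget can be declared as in the sketch below. The name, instance type, and percentage are illustrative.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-region-a
  region: us-east-1
managedNodeGroups:
  - name: ng-bottlerocket
    amiFamily: Bottlerocket        # minimal, container-focused OS
    instanceType: m6i.large
    desiredCapacity: 3
    privateNetworking: true
    updateConfig:
      maxUnavailablePercentage: 33 # replace at most a third of nodes at once
```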

Security Checklist

  • Private subnets for EKS nodes and databases
  • Security groups with least privilege
  • Network policies enforced
  • IRSA configured for all pods needing AWS access
  • No hardcoded credentials anywhere
  • Secrets Manager/Parameter Store with rotation
  • All encryption at rest enabled (etcd, EBS, RDS, S3)
  • TLS/SSL enforced everywhere
  • Pod security standards enforced (restricted)
  • Image scanning in CI/CD
  • Admission controllers (OPA/Kyverno) configured
  • GuardDuty and Security Hub enabled
  • Comprehensive logging to CloudWatch
  • CloudTrail enabled in all regions
  • VPC Flow Logs enabled
  • Regular backups with cross-region copies
  • Multi-factor authentication enforced
  • RBAC properly configured
  • Regular security audits scheduled
  • Incident response plan documented

Summary
Security in multi-cluster architectures requires a defense-in-depth approach:

  • Network isolation at every layer
  • Zero-trust model - verify everything
  • Encryption everywhere (at rest and in transit)
  • Least privilege access for humans and workloads
  • Continuous monitoring and alerting
  • Regular audits and compliance checks
  • Automation for consistency and reliability

Security is not a one-time setup but an ongoing process requiring continuous improvement and vigilance.
