Nowsath for AWS Community Builders

Posted on Oct 26, 2025 • Edited on Nov 2, 2025

Security Considerations for Multi-Cluster Cloud Architecture (HA EKS with Databases)

#security #eks #rds #aws

Running a highly available multi-cluster EKS architecture brings powerful benefits: near-zero downtime, disaster recovery, and global scalability. But it also multiplies your security challenges.

Securing a single EKS cluster is already complex. Add multiple clusters across regions, databases with sensitive data, and cross-cluster communication, and the attack surface grows significantly. One misconfigured security group or exposed secret can compromise your entire infrastructure.

This guide covers essential security considerations for multi-cluster architectures: network isolation, encryption, IAM management, secrets handling, and incident response. We'll focus on practical measures that protect your infrastructure without sacrificing performance or availability.

Let's build a secure, highly available system.

1. Network Security & Isolation

VPC Architecture

Separate VPCs per cluster or use shared VPC with isolated subnets
Private subnets for EKS nodes and databases (no direct internet access)
Public subnets only for load balancers and NAT gateways
Implement VPC peering or AWS Transit Gateway for inter-cluster communication
Use separate VPCs per environment (dev, staging, production)

Network Segmentation

Production VPC-1 (Region A)
- Public Subnets: ALB only
- Private Subnets: EKS Nodes
- Database Subnets: RDS/Aurora (isolated)
Production VPC-2 (Region B)
- Public Subnets: ALB only
- Private Subnets: EKS Nodes
- Database Subnets: RDS/Aurora (isolated)

Security Groups

Principle of least privilege - only allow necessary ports
Database security groups: Only allow traffic from EKS node security groups
EKS control plane: Restrict API access to specific CIDR ranges
No 0.0.0.0/0 rules except outbound internet traffic via NAT and inbound HTTPS on public load balancers
Document and regularly audit security group rules
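As a rough illustration of the "databases only accept traffic from EKS nodes" rule, here is a minimal CloudFormation sketch. The logical ID and the two parameters (VpcId, NodeSecurityGroupId) are hypothetical names chosen for this example, and port 5432 assumes PostgreSQL.

```yaml
# Sketch only: parameter names and logical IDs are assumptions, not from an existing stack.
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  NodeSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
Resources:
  DatabaseSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: PostgreSQL access from EKS worker nodes only
      VpcId: !Ref VpcId
      SecurityGroupIngress:
      # Reference the node security group instead of a CIDR range
      - IpProtocol: tcp
        FromPort: 5432
        ToPort: 5432
        SourceSecurityGroupId: !Ref NodeSecurityGroupId
```

Referencing a security group rather than a CIDR means the rule keeps working as nodes are replaced or subnets change.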

Network Policies

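# Default deny: blocks all ingress and egress for every pod in the namespace.
# This also blocks DNS, so allow egress to port 53 separately.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow backend pods to reach PostgreSQL pods in the "database" namespace.
# For RDS/Aurora (outside the cluster), use an ipBlock with the database subnet CIDR instead.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          # Label that Kubernetes sets automatically on every namespace
          kubernetes.io/metadata.name: database
    ports:
    - protocol: TCP
      port: 5432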
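A default-deny policy that covers Egress also blocks DNS lookups, which usually breaks service discovery. The sketch below allows DNS to the cluster resolver. It assumes CoreDNS runs in kube-system with the k8s-app: kube-dns label, which is the usual EKS setup but worth verifying in your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}          # applies to every pod in this namespace
  policyTypes:
  - Egress
  egress:
  - to:
    # namespaceSelector and podSelector in one entry are ANDed together
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns   # assumed CoreDNS label
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```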

2. Identity & Access Management (IAM)

Cluster Access Control

AWS IAM authentication for cluster access via EKS access entries (the aws-auth ConfigMap is deprecated)
Never use permanent credentials in pods or containers
Implement IAM Roles for Service Accounts (IRSA) for pod-level permissions
Use AWS IAM Identity Center (formerly AWS SSO) for human access
Separate IAM roles for different teams/applications
Enable MFA for all human users

IRSA (IAM Roles for Service Accounts)

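apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    # IAM role assumed by pods that use this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/app-role
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: app-sa
      containers:
      - name: app
        image: myapp:1.0.0  # example tag; pin a version or digest, never :latest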

Kubernetes RBAC

Role-Based Access Control (RBAC) for fine-grained permissions
Namespace isolation - separate namespaces per team/application
Principle of least privilege - minimal permissions needed
ClusterRoles for cluster-wide resources (use sparingly)
Roles for namespace-scoped resources
Regular RBAC audits

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: app-prod
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments", "services"]
  verbs: ["get", "list", "watch"]
# Read-only access, no delete/update

Database Access

IAM Database Authentication for RDS/Aurora (where possible)
Avoid hardcoded credentials - use Secrets Manager or Parameter Store
Rotate credentials regularly (automated rotation)
Separate database users per application/service
Read replicas for read-only workloads (reporting, analytics) to keep them off the primary

3. Secrets Management

Never Store Secrets in Code or ConfigMaps
❌ Bad: Secrets in environment variables or ConfigMaps
✅ Good: External secrets management

AWS Secrets Manager / Parameter Store

Use External Secrets Operator or Secrets Store CSI Driver
Automatic rotation enabled
Encryption at rest with KMS
Audit access via CloudTrail

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
  target:
    name: db-secret
  data:
  - secretKey: password
    remoteRef:
      key: prod/db/password
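The ExternalSecret above refers to a store named aws-secrets-manager. A minimal sketch of such a store follows. The region and the external-secrets-sa service account (assumed to be IRSA-annotated with read access to the secret) are placeholders for illustration.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1              # assumption: replace with your region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa  # hypothetical IRSA-enabled service account
```

Scoping the store to one namespace (SecretStore rather than ClusterSecretStore) keeps each team's access to Secrets Manager tied to its own IAM role.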

Alternative: HashiCorp Vault

Centralized secrets management across clusters
Dynamic secrets generation
Lease-based credentials
Fine-grained access policies

4. Encryption

Data at Rest

EKS etcd encryption using AWS KMS
EBS volumes encrypted (gp3 with KMS)
RDS/Aurora encryption enabled with KMS
S3 encryption (SSE-S3 or SSE-KMS)
Use customer-managed KMS keys for compliance requirements
Separate KMS keys per environment/cluster
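For EBS-backed persistent volumes, encryption can be made the default through a StorageClass. This is a sketch assuming the EBS CSI driver is installed. The class name is arbitrary and the KMS key ARN is a placeholder.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  # Placeholder ARN: omit kmsKeyId to fall back to the account's default EBS key
  kmsKeyId: arn:aws:kms:REGION:ACCOUNT:key/KEY_ID
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```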

Data in Transit

TLS/SSL everywhere:
- ALB → Pods (via Ingress with TLS)
- Pod → Pod (service mesh or mTLS)
- Application → Database (SSL/TLS enforced)
Certificate management with AWS Certificate Manager or cert-manager
mTLS with service mesh (Istio, Linkerd, AWS App Mesh)

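apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    # TLS 1.2 minimum with TLS 1.3 support
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06
spec:
  ingressClassName: alb
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app   # placeholder Service name
            port:
              number: 80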

5. Pod Security

Pod Security Standards

Enforce restricted pod security standards
No privileged containers unless absolutely necessary
Read-only root filesystem where possible
Non-root users for containers
Drop all capabilities and add only required ones

apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0.0  # example tag; pin a version or digest, never :latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
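The restricted standard can be enforced per namespace with the built-in Pod Security Admission labels. A small sketch, reusing the app-prod namespace from the RBAC example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-prod
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations as client warnings and audit annotations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Starting with only warn and audit on an existing namespace, then adding enforce once workloads are clean, tends to avoid surprise rejections.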

Image Security

Scan images for vulnerabilities (Amazon ECR scanning, Trivy, Snyk)
Use minimal base images (distroless, Alpine)
Pin image versions - never use :latest
Sign and verify images (Sigstore/Cosign, Notary)
Private container registry (Amazon ECR with VPC endpoints)
Image pull secrets for private registries

Admission Controllers

OPA/Gatekeeper or Kyverno for policy enforcement
Enforce security policies:
- No privileged pods
- Required resource limits
- Approved registries only
- Required security contexts

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-privileged
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Privileged containers are not allowed"
      pattern:
        spec:
          containers:
          # =( ) anchors: validate the field only when it is present
          - =(securityContext):
              =(privileged): "false"
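The "approved registries only" rule can be expressed the same way. The following is a sketch in which the ECR registry pattern is a placeholder for your own account and region.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Images must come from the approved ECR registry"
      pattern:
        spec:
          containers:
          # Placeholder registry: every container image must match this wildcard
          - image: "ACCOUNT.dkr.ecr.REGION.amazonaws.com/*"
```

This pattern only checks spec.containers. Init and ephemeral containers would need their own entries if you use them.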

6. Multi-Cluster Security

Cluster Isolation

Separate clusters for different security zones (DMZ, internal, data)
Separate clusters per environment (never share prod and non-prod)
Separate AWS accounts per environment (AWS Organizations)
Service Control Policies (SCPs) to restrict actions at account level

Cross-Cluster Communication

Service mesh for secure cross-cluster communication (Istio multi-cluster)
VPC peering or Transit Gateway with strict security groups
mTLS for service-to-service authentication
API Gateway or Internal Load Balancer as entry points
Zero-trust networking - verify every request
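If Istio is the mesh in use, mesh-wide mTLS can be required with a single resource. This sketch assumes istio-system is the Istio root namespace, which is the default installation layout.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plaintext traffic between sidecars
```

Rolling this out with mode PERMISSIVE first, then switching to STRICT once all workloads have sidecars, is a common way to avoid dropping traffic.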

DNS Security

Private Route53 hosted zones for internal services
DNSSEC where applicable
Avoid relying on DNS alone for cross-cluster service discovery; a spoofed or stale record can redirect traffic, so pair it with mTLS identity checks

7. Database Security

RDS/Aurora Security

Multi-AZ deployment for availability
Private subnets only - no public access
Encryption at rest (KMS) and in transit (SSL/TLS enforced)
Automated backups with encryption
Point-in-time recovery enabled
Enhanced monitoring enabled
Performance Insights with encryption

Connection Security

RDS Proxy for connection pooling and IAM authentication
SSL/TLS enforcement on database side
Certificate validation on client side
No hardcoded connection strings

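# Database connection with SSL. verify-full checks the server certificate and hostname;
# sslmode=require encrypts the connection but does not validate the certificate.
# Point sslrootcert (or PGSSLROOTCERT) at the RDS CA bundle.
DATABASE_URL: "postgresql://user@host:5432/db?sslmode=verify-full"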

Database Access Control

Separate database users per service
Minimal privileges (SELECT only for read-only services)
No superuser access from applications
Parameter groups to enforce security settings
Audit logging enabled (PostgreSQL pgaudit, MySQL audit log)
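Parameter groups are where TLS enforcement and audit logging are switched on for RDS PostgreSQL. A CloudFormation sketch follows. The logical ID, the postgres16 family, and the chosen pgaudit.log classes are assumptions for illustration, and changing shared_preload_libraries requires a reboot.

```yaml
Resources:
  PostgresParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: Enforce TLS and enable pgaudit
      Family: postgres16               # assumption: match your engine version
      Parameters:
        rds.force_ssl: "1"             # reject non-TLS connections
        shared_preload_libraries: pgaudit
        pgaudit.log: "ddl,role"        # log DDL and privilege changes
```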

Database Activity Monitoring

AWS Database Activity Streams for real-time monitoring
Alert on suspicious queries or access patterns
Log all DDL and privilege changes

8. Logging & Monitoring

Comprehensive Logging

EKS Control Plane logs to CloudWatch (API, audit, authenticator)
Application logs via Fluent Bit/Fluentd to a centralized location
Database logs (query logs, error logs, slow query logs)
VPC Flow Logs for network traffic analysis
CloudTrail for all API calls
Immutable logs - prevent tampering

Security Monitoring

Amazon GuardDuty - threat detection
AWS Security Hub - centralized security findings
Amazon Detective - security investigation
Falco - runtime security monitoring in Kubernetes
Prometheus + Grafana for metrics and alerting

Audit Logging

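# Kubernetes audit policy for self-managed control planes.
# On EKS the audit policy is managed by AWS and cannot be customized;
# enable the "audit" log type in the cluster's control plane logging instead.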
apiVersion: v1
kind: ConfigMap
metadata:
  name: audit-policy
data:
  policy.yaml: |
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    - level: Metadata
      omitStages:
      - RequestReceived
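On EKS, control plane log types are switched on in the cluster configuration. If the cluster is managed with eksctl, a sketch might look like this. The cluster name and region are placeholders.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster-a     # placeholder cluster name
  region: us-east-1        # placeholder region
cloudWatch:
  clusterLogging:
    # Ship API server, audit, and authenticator logs to CloudWatch Logs
    enableTypes: ["api", "audit", "authenticator"]
```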

Critical Alerts

Failed authentication attempts
Privileged container creation
Security group changes
Database connection failures
Unusual API calls
Resource exhaustion

9. Compliance & Governance

Compliance Frameworks

AWS Config - track configuration changes
AWS Audit Manager - compliance reporting
CIS Kubernetes Benchmark - security hardening
PCI-DSS, HIPAA, SOC 2 compliance where required
Regular penetration testing

Policy as Code

AWS Organizations with SCPs
CloudFormation/Terraform for infrastructure
GitOps for cluster configuration (ArgoCD/FluxCD)
OPA/Kyverno for admission control
Version control and peer review for all changes

Tagging Strategy

Mandatory tags: Environment, Owner, Project, CostCenter
Enforce tagging via AWS Config rules
Use tags for resource-level IAM policies
Cost allocation and chargeback

10. Disaster Recovery & Backup

Backup Strategy

Automated RDS snapshots (daily, 7-30 day retention)
Cross-region snapshot copies for DR
EBS snapshots for persistent volumes
Kubernetes resource and volume backups with Velero (etcd itself is managed by AWS on EKS and is not directly accessible)
GitOps - cluster configuration in Git
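For the cluster-level part of this strategy, Velero backups can be scheduled declaratively. This is a sketch assuming Velero is installed in the velero namespace with a backup storage location already configured. The schedule and the 30-day retention are example values.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"      # every day at 02:00 (cron syntax)
  template:
    includedNamespaces:
    - "*"
    snapshotVolumes: true    # also snapshot persistent volumes
    ttl: 720h0m0s            # keep each backup for 30 days
```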

Disaster Recovery

Multi-region setup for critical applications
RTO/RPO requirements documented
Failover procedures tested regularly
Regular DR drills (quarterly minimum)
Automated failover where possible (Route53 health checks)

11. Supply Chain Security

Container Supply Chain

Verify base images from trusted sources
SBOM (Software Bill of Materials) for dependencies
Vulnerability scanning in CI/CD pipeline
Sign images (Cosign/Notary)
Admission controller to verify signatures

Dependency Management

Dependabot or Renovate for automated updates
Regular security patching
Monitor CVEs for used software
Minimal dependencies principle

12. Incident Response

Preparation

Incident response plan documented
Runbooks for common scenarios
On-call rotation defined
Communication channels established
Post-mortem process defined

Detection & Response

Automated alerting for security events
Isolate compromised pods/nodes immediately
Forensics capability (preserve logs and state)
Contact AWS Support for suspected breaches
Notify stakeholders per incident severity

13. API Gateway & Service Mesh

API Security

AWS API Gateway or Kong/Envoy for API management
Rate limiting to prevent abuse
API keys or OAuth2 for authentication
WAF (Web Application Firewall) rules
DDoS protection via AWS Shield

Service Mesh Benefits

mTLS for all service-to-service communication
Zero-trust networking model
Fine-grained authorization policies
Observability and traffic monitoring
Circuit breaking and fault injection
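To make "fine-grained authorization policies" concrete, here is an Istio sketch that lets only one workload identity call the backend. The frontend-sa service account, the app-prod namespace, and the allowed methods are illustrative assumptions.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend
  namespace: app-prod
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        # Identity comes from the caller's mTLS certificate (hypothetical service account)
        principals: ["cluster.local/ns/app-prod/sa/frontend-sa"]
    to:
    - operation:
        methods: ["GET", "POST"]
```

Once an ALLOW policy selects a workload, requests that match no rule are denied, so this doubles as a default deny for the backend.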

14. Update & Patch Management

EKS Updates

Regular cluster updates (EKS standard support lasts about 14 months per Kubernetes version, followed by paid extended support)
Test in non-prod first
Blue-green cluster upgrades for zero downtime
Node group rolling updates

Security Patching

Automated node updates (EKS managed node groups)
Bottlerocket OS for minimal attack surface
Container image updates (rebuild regularly)
Database patching during maintenance windows

Security Checklist

Private subnets for EKS nodes and databases
Security groups with least privilege
Network policies enforced
IRSA configured for all pods needing AWS access
No hardcoded credentials anywhere
Secrets Manager/Parameter Store with rotation
All encryption at rest enabled (etcd, EBS, RDS, S3)
TLS/SSL enforced everywhere
Pod security standards enforced (restricted)
Image scanning in CI/CD
Admission controllers (OPA/Kyverno) configured
GuardDuty and Security Hub enabled
Comprehensive logging to CloudWatch
CloudTrail enabled in all regions
VPC Flow Logs enabled
Regular backups with cross-region copies
Multi-factor authentication enforced
RBAC properly configured
Regular security audits scheduled
Incident response plan documented

Summary
Security in multi-cluster architectures requires a defense-in-depth approach:

Network isolation at every layer
Zero-trust model - verify everything
Encryption everywhere (at rest and in transit)
Least privilege access for humans and workloads
Continuous monitoring and alerting
Regular audits and compliance checks
Automation for consistency and reliability

Security is not a one-time setup but an ongoing process requiring continuous improvement and vigilance.
