Design Kubernetes HPA Auto-Scaling with Claude Code: Zero-Downtime Scaling

#claudecode #kubernetes #devops #aws

Introduction

Auto-scale on traffic spikes with Kubernetes HPA — implement CPU/memory and custom metrics-based autoscaling. Let Claude Code generate the design.

Generated HPA Config

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
    scaleDown:
      stabilizationWindowSeconds: 300

// Graceful shutdown for SIGTERM
process.on('SIGTERM', async () => {
  server.close(async () => {
    await Promise.all([prisma.$disconnect(), redis.quit()]);
    process.exit(0);
  });
  setTimeout(() => process.exit(1), 25_000);
});