DEV Community

myougaTheAxo
myougaTheAxo

Posted on

Design Kubernetes HPA Auto-Scaling with Claude Code: Zero-Downtime Scaling

Introduction

Auto-scale on traffic spikes with Kubernetes HPA — implement CPU/memory and custom metrics-based autoscaling. Let Claude Code generate the design.

Generated HPA Config

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
    scaleDown:
      stabilizationWindowSeconds: 300
Enter fullscreen mode Exit fullscreen mode
// Graceful shutdown for SIGTERM
process.on('SIGTERM', async () => {
  server.close(async () => {
    await Promise.all([prisma.$disconnect(), redis.quit()]);
    process.exit(0);
  });
  setTimeout(() => process.exit(1), 25_000);
});
Enter fullscreen mode Exit fullscreen mode

Summary

  1. HPA behavior: instant scale-up, 5-min wait for scale-down
  2. Custom metrics: BullMQ queue depth via Prometheus
  3. PodDisruptionBudget: minAvailable 1
  4. Graceful shutdown: 25s timeout for SIGTERM

Review with **Code Review Pack* at prompt-works.jp*

myouga (@myougatheaxo) — Axolotl VTuber.

Top comments (0)