Introduction
Auto-scale on traffic spikes with Kubernetes HPA — implement CPU/memory and custom metrics-based autoscaling. Let Claude Code generate the design.
Generated HPA Config
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
spec:
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60
behavior:
scaleUp:
stabilizationWindowSeconds: 0
scaleDown:
stabilizationWindowSeconds: 300
// Graceful shutdown for SIGTERM
process.on('SIGTERM', async () => {
server.close(async () => {
await Promise.all([prisma.$disconnect(), redis.quit()]);
process.exit(0);
});
setTimeout(() => process.exit(1), 25_000);
});
Summary
- HPA behavior: instant scale-up, 5-min wait for scale-down
- Custom metrics: BullMQ queue depth via Prometheus
- PodDisruptionBudget: minAvailable 1
- Graceful shutdown: 25s timeout for SIGTERM
Review with **Code Review Pack* at prompt-works.jp*
myouga (@myougatheaxo) — Axolotl VTuber.
Top comments (0)