In Part 1, we explored KEDA and how it scales your consumers based on queue depth.
But what if:

- You have N queues feeding M consumers
- Each queue has its own threshold and min/max replica counts
- Each consumer runs a different workflow or endpoint

KEDA alone can't handle this cleanly. That's where a custom autoscaler comes in.
## Problem Statement
Example scenario:
| Queue | Consumer Deployment | Threshold | Min Pods | Max Pods |
|---|---|---|---|---|
| x-queue-1 | consumer-x-1 | 100 | 1 | 5 |
| x-queue-2 | consumer-x-2 | 50 | 2 | 6 |
| y-queue | consumer-y | 200 | 1 | 10 |
- Each queue has its own processing logic.
- Scaling decisions must be independent.
- Min/max replicas live in a database, so scaling policies can change without a redeploy (a sketch of that table follows below).
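To make the DB side concrete, here is one way a `scaling_config` table matching the query used later in this post could be created and seeded with the rows above. This is a minimal sketch assuming SQLite; any relational store works the same way.

```python
import sqlite3

# Minimal sketch: create and seed the scaling_config table that the
# autoscaler below queries. SQLite is assumed purely for illustration.
conn = sqlite3.connect("consumer_scaling.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS scaling_config (
        consumer_name TEXT PRIMARY KEY,  -- Deployment name to scale
        queue_name    TEXT NOT NULL,     -- queue whose depth drives scaling
        threshold     INTEGER NOT NULL,  -- messages per replica
        min_pods      INTEGER NOT NULL,
        max_pods      INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT OR REPLACE INTO scaling_config VALUES (?, ?, ?, ?, ?)",
    [
        ("consumer-x-1", "x-queue-1", 100, 1, 5),
        ("consumer-x-2", "x-queue-2", 50, 2, 6),
        ("consumer-y", "y-queue", 200, 1, 10),
    ],
)
conn.commit()
conn.close()
```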
## Architecture Overview
```
        ┌───────────────────────┐
        │       Producers       │
        │  (apps push messages  │
        │      to queues)       │
        └───────────┬───────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │    Message Broker     │
        │   (Kafka / RabbitMQ)  │
        └───────────┬───────────┘
                    │
         ┌──────────┴──────────┐
         ▼                     ▼
┌─────────────────────┐   ┌─────────────────────┐
│   Python Exporter   │◄─►│ Prometheus metrics  │
│ - reads queue depth │   │       storage       │
│ - reads min/max     │   └─────────────────────┘
│   from DB           │
│ - applies scaling   │
│   logic per queue   │
│ - calls Kubernetes  │
│   API to scale      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────┐
│  Consumer Deployments   │
│  - consumer-x-1         │
│  - consumer-x-2         │
│  - consumer-y           │
└─────────────────────────┘
```
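The "Python Exporter" box does two jobs: it publishes queue depth as a Prometheus metric and runs the scaling loop. As a minimal sketch of the metrics half, here is how queue depth could be exposed with the prometheus_client library. `get_queue_depth` is a hypothetical placeholder for your broker's real API (RabbitMQ's management API, Kafka consumer-group lag, and so on).

```python
import time
from prometheus_client import Gauge, start_http_server

def get_queue_depth(queue_name: str) -> int:
    # Hypothetical placeholder: wire this to your broker
    # (e.g. RabbitMQ management API or Kafka consumer-group lag).
    return 0

# Gauge labelled by queue, matching the queue_depth{queue="..."}
# metric queried from Prometheus later in this post.
QUEUE_DEPTH = Gauge("queue_depth", "Messages waiting in a queue", ["queue"])

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<pod>:8000/metrics
    queues = ["x-queue-1", "x-queue-2", "y-queue"]
    while True:
        for q in queues:
            QUEUE_DEPTH.labels(queue=q).set(get_queue_depth(q))
        time.sleep(15)  # align roughly with your Prometheus scrape interval
```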
## Python Example: Custom Scaling with DB Min/Max

```python
from kubernetes import client, config
import requests
import sqlite3  # Example DB; replace with your real DB

# Load in-cluster config (use config.load_kube_config() when testing locally)
config.load_incluster_config()
apps_v1 = client.AppsV1Api()

# Connect to the database containing min/max per consumer
conn = sqlite3.connect("consumer_scaling.db")
cursor = conn.cursor()

# Read the scaling config for all consumers
cursor.execute(
    "SELECT consumer_name, queue_name, threshold, min_pods, max_pods FROM scaling_config"
)
scaling_rules = cursor.fetchall()

# Prometheus endpoint
prometheus_url = "http://prometheus:9090/api/v1/query"

for consumer_name, queue_name, threshold, min_pods, max_pods in scaling_rules:
    # Query the current queue depth
    query = f'sum(queue_depth{{queue="{queue_name}"}})'
    resp = requests.get(prometheus_url, params={"query": query}).json()
    result = resp["data"]["result"]
    if not result:
        print(f"{consumer_name}: no metric for {queue_name}; skipping")
        continue
    queue_depth = float(result[0]["value"][1])

    # One pod per `threshold` messages, clamped to the DB-defined bounds
    desired_replicas = max(min_pods, min(max_pods, int(queue_depth / threshold)))

    # Scale the Deployment through the scale subresource
    apps_v1.patch_namespaced_deployment_scale(
        name=consumer_name,
        namespace="default",
        body={"spec": {"replicas": desired_replicas}},
    )
    print(f"{consumer_name}: queue={queue_depth}, threshold={threshold}, "
          f"scaled to {desired_replicas} pods")
```
## Notes
- `threshold` is per queue.
- `min_pods` and `max_pods` are read from a database, making them dynamic.
- You can extend the logic with weighted scaling, multiple metrics, or cooldown periods (see the cooldown sketch below).
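As one example of that extension, here is a minimal cooldown sketch: scale-ups apply immediately, but scale-downs are skipped until a cooldown has elapsed, which avoids flapping on bursty queues. The `last_scaled` dict and `COOLDOWN_SECONDS` value are assumptions for illustration.

```python
import time

COOLDOWN_SECONDS = 120  # assumption: minimum gap between scale-downs
last_scaled: dict[str, float] = {}  # consumer_name -> time of last change

def apply_cooldown(consumer_name: str, current: int, desired: int) -> int:
    """Allow scale-ups immediately; delay scale-downs until the cooldown
    has elapsed since this consumer's last scaling action."""
    now = time.time()
    if desired < current and now - last_scaled.get(consumer_name, 0) < COOLDOWN_SECONDS:
        return current  # too soon to shrink; keep the current replica count
    if desired != current:
        last_scaled[consumer_name] = now
    return desired
```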
## Advantages of a Custom Autoscaler
- Independent scaling per queue
- Dynamic min/max replicas from DB — no hardcoding
- Multi-metric decisions: queue depth, CPU, DB lag, etc. (see the sketch after this list)
- Advanced logic (cooldowns, weighted scaling, prioritization)
- Can handle N queues → M consumers mapping flexibly
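As a sketch of what a multi-metric decision could look like: each signal proposes a replica count, the hungriest one wins, and the result is clamped to the DB-defined bounds. All helper names here are hypothetical illustrations; the CPU rule borrows the standard HPA utilization formula.

```python
import math

# Hypothetical helpers: each signal proposes a replica count.
def replicas_for_queue_depth(queue_depth: float, threshold: int) -> int:
    # One pod per `threshold` messages, as in the main script.
    return int(queue_depth / threshold)

def replicas_for_cpu(current_replicas: int, avg_cpu: float,
                     target_cpu: float = 0.7) -> int:
    # HPA-style formula: scale replicas by observed/target utilization.
    return math.ceil(current_replicas * (avg_cpu / target_cpu))

def desired_replicas(queue_depth: float, threshold: int,
                     current_replicas: int, avg_cpu: float,
                     min_pods: int, max_pods: int) -> int:
    # The hungriest signal wins, then clamp to the DB-defined bounds.
    wanted = max(
        replicas_for_queue_depth(queue_depth, threshold),
        replicas_for_cpu(current_replicas, avg_cpu),
    )
    return max(min_pods, min(max_pods, wanted))
```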
## Trade-offs
| Feature | KEDA | Custom Autoscaler |
|---|---|---|
| Easy setup | ✅ | ❌ (Python + DB + K8s API) |
| Independent queue scaling | ❌ | ✅ |
| Multi-metric logic | Limited | ✅ |
| DB-driven min/max | ❌ | ✅ |
| Reliability | ✅ Battle-tested | ⚠️ Managed by you |
## ✅ Takeaways
- KEDA is great for simple queue-based scaling.
- For complex microservices with multiple queues, a custom autoscaler gives you full control.
- Reading min/max replicas from a database enables dynamic, production-ready scaling policies.
- Your custom autoscaler can evolve into a full controller: a custom HPA/KEDA tailored to your architecture.
💡 Pro Tip:
Start with KEDA for simple cases. Move to a custom autoscaler with DB-defined min/max for multi-queue microservices with complex workflows.