🚀 Overview
HAL Meta-Scheduler is an adaptive orchestration layer that learns how to balance workloads in real time.
It doesn't replace your scheduler — it teaches it to breathe.
This open-source demo shows how simple feedback metrics can keep a distributed system stable under changing load and still save energy.
No proprietary math or hidden weights — everything you see here is functional and reproducible.
🧩 What It Does
HAL observes your cluster through four lightweight signals:
| Symbol | Meaning | Role |
|---|---|---|
| σ | Coherence — how evenly the load is spread | stability indicator |
| H | Entropy — diversity of jobs per node | utilization diversity |
| δ | Queue drift — rate of pending growth | stress level |
| Φ | Informational potential — combined system tension | energy/stability metric |
These are computed continuously and used to adjust the balance between packing (energy-efficient) and spreading (latency-resilient).
The result: fewer spikes, smoother utilization curves, and lower total energy per job.
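To make the four signals concrete, here is one way they could be computed from per-node state. This is an illustrative sketch, not the demo's actual formulas — the function names, weights, and normalizations are assumptions; see `simulate.py` in the repo for the real definitions.

```python
import math

def coherence(loads):
    """σ: 1 minus the load's coefficient of variation (1.0 = perfectly even spread)."""
    n = len(loads)
    mean = sum(loads) / n
    if mean == 0:
        return 1.0
    std = math.sqrt(sum((x - mean) ** 2 for x in loads) / n)
    return max(0.0, 1.0 - std / mean)

def entropy(job_counts):
    """H: normalized Shannon entropy of the job distribution across nodes."""
    total = sum(job_counts)
    if total == 0 or len(job_counts) < 2:
        return 0.0
    probs = [c / total for c in job_counts if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(job_counts))

def queue_drift(pending_now, pending_prev, dt=1.0):
    """δ: rate of change of the pending-queue length."""
    return (pending_now - pending_prev) / dt

def potential(sigma, h, delta, w=(1.0, 0.5, 0.25)):
    """Φ: weighted 'tension' — low coherence, low diversity, and rising queues all add stress."""
    return w[0] * (1 - sigma) + w[1] * (1 - h) + w[2] * max(0.0, delta)
```

A perfectly balanced, diverse, drift-free cluster scores Φ = 0 under these assumed weights; any imbalance or queue growth raises it.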
⚙️ How It Works
HAL is implemented as a simple control layer:
- Simulator – synthetic cluster with N nodes and a Poisson workload generator
- Controllers – heuristic, PID, and Bayesian variants that adapt a parameter p ∈ [0,1] (pack ↔ spread)
- Metrics server – FastAPI + Prometheus `/metrics` endpoint for dashboards
- Helm chart – deployable metrics demo for Kubernetes
- Grafana dashboard – real-time visualization of σ, H, δ, Φ, and p
Everything runs locally with no external dependencies.
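The controller variants above share one job: map the stress signal Φ to the pack↔spread parameter p. A minimal PID-style sketch of that loop might look like the following — gains, setpoint, and class name are illustrative assumptions, not the demo's tuned values.

```python
class PackSpreadPID:
    """Toy PID controller nudging p (0 = pack, 1 = spread) toward a Φ setpoint.
    Gains and setpoint are illustrative only."""

    def __init__(self, kp=0.5, ki=0.05, kd=0.1, setpoint=0.2):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0
        self.p = 0.5  # start balanced between packing and spreading

    def step(self, phi, dt=1.0):
        # Stress (Φ) above the setpoint pushes p toward spreading; below it,
        # the controller drifts back toward energy-efficient packing.
        error = phi - self.setpoint
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        delta_p = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.p = min(1.0, max(0.0, self.p + delta_p))
        return self.p
```

At the setpoint the controller holds p steady; a Φ spike immediately opens up spreading, then the integral term winds back down as queues drain.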
```bash
git clone https://github.com/Freeky7819/halms-demo
cd halms-demo
python -m venv .venv
.venv/Scripts/pip install -r requirements.txt   # on Linux/macOS: .venv/bin/pip
python simulate.py --steps 1500
python plot_metrics.py
```
You’ll see two traces:
- baseline (static scheduler)
- adaptive HAL (dynamic control)
📊 Example Output
- Queue spikes reduced by 40–70%
- Coherence σ stabilized near 0.9
- Adaptive parameter p converging to steady state
- Smooth Φ (stress metric) vs time
Even this demo, using only PID/Bayesian logic, shows how feedback control beats static heuristics for scheduling.
🧠 Why It Matters
Modern clusters waste cycles and energy because schedulers are blind to system feedback.
They rely on fixed heuristics like “bin pack until 80% CPU” or “spread by labels”.
HAL introduces self-tuning — it reads the system’s own signals and re-balances automatically.
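One way to picture what p actually does at placement time: with probability p a job is spread to the least-loaded node, otherwise it is packed onto the fullest node that still has room. This is a hypothetical sketch — the function, node representation, and tie-breaking are assumptions, not the demo's placement code.

```python
import random

def place(job_load, nodes, p, rng=random.random):
    """Pick a node for a job under the pack↔spread parameter p.

    nodes maps name -> (current_load, capacity). With probability p the job
    is spread (least-loaded node); otherwise it is packed (most-loaded node
    that can still fit it). Illustrative only.
    """
    feasible = {name: (load, cap) for name, (load, cap) in nodes.items()
                if load + job_load <= cap}
    if not feasible:
        return None  # job stays queued
    if rng() < p:
        # spread: relieve hot nodes by choosing the coolest feasible one
        return min(feasible, key=lambda n: feasible[n][0])
    # pack: consolidate onto the fullest feasible node so others can idle
    return max(feasible, key=lambda n: feasible[n][0])
```

With p near 0 the cluster consolidates (energy-efficient); with p near 1 it spreads (latency-resilient) — the controller just moves that dial in response to Φ.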
Benefits
- ✅ Reduced queue oscillations
- ⚡ Energy efficiency via adaptive packing
- 📈 Predictable latency under load
- 🔍 Native observability (Prometheus + Grafana)
Use cases
- Kubernetes (as a policy advisor / extender)
- HPC or SLURM queues
- AI/ML job orchestrators
- Edge or hybrid clusters
🧰 Tech Stack
Python 3.11 · FastAPI · Prometheus client · Helm v3 · Grafana · GitHub Actions CI (lint + SBOM)
License: Apache 2.0
🧭 Open vs Enterprise
| Feature | Public Demo | Enterprise |
|---|---|---|
| Core control | heuristic, PID, Bayesian | proprietary resonant kernel |
| Deployment | metrics demo (Helm) | full operator + extender |
| Multi-cluster control | — | ✅ |
| Historical analytics | basic | advanced |
| SLA & support | community | commercial |
The open demo is fully working — no placeholders — and safe for public use.
The enterprise version builds on this foundation for production-grade orchestration.
🧪 Try It
Live repo → github.com/Freeky7819/halms-demo
Run the metrics server:

```bash
python -m uvicorn server:app --host 127.0.0.1 --port 8015
```

Then open http://127.0.0.1:8015/metrics or http://127.0.0.1:8015/live
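The `/metrics` endpoint serves Prometheus's text exposition format. As a rough idea of what Grafana scrapes, here is a stdlib-only sketch of rendering the five gauges — the metric names are assumptions; the demo itself uses FastAPI with the `prometheus_client` library, so check `server.py` for the real names.

```python
def render_metrics(sigma, h, delta, phi, p):
    """Render HAL's signals as Prometheus text-format gauges.
    Metric names here are illustrative placeholders."""
    metrics = {
        "hal_coherence_sigma": sigma,
        "hal_entropy_h": h,
        "hal_queue_drift_delta": delta,
        "hal_potential_phi": phi,
        "hal_pack_spread_p": p,
    }
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")  # type hint line required per metric
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```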
🤝 Contribute
Feedback, issues, and forks are welcome.
We’re particularly interested in:
- new stability metrics
- dataset-driven tuning
- multi-cluster experimentation
Open discussions or PRs — everything helps us improve the adaptive model.
HAL is open, safe, and ready to explore.
If you’ve ever wondered what a scheduler with a feedback loop would look like — this is your playground.