DEV Community

Damjan Žakelj

Posted on • Originally published at github.com

HAL Meta-Scheduler: An Adaptive Layer That Learns How to Balance Your Cluster

🚀 Overview

HAL Meta-Scheduler is an adaptive orchestration layer that learns how to balance workloads in real time.

It doesn't replace your scheduler — it teaches it to breathe.

This open-source demo shows how simple feedback metrics can keep a distributed system stable under changing load and still save energy.

No proprietary math or hidden weights — everything you see here is functional and reproducible.


🧩 What It Does

HAL observes your cluster through four lightweight signals:

| Symbol | Meaning | Role |
| --- | --- | --- |
| σ | Coherence: how evenly the load is spread | stability indicator |
| H | Entropy: diversity of jobs per node | utilization diversity |
| δ | Queue drift: rate of pending growth | stress level |
| Φ | Informational potential: combined system tension | energy/stability metric |

These are computed continuously and used to adjust the balance between packing (energy-efficient) and spreading (latency-resilient).

The result: fewer spikes, smoother utilization curves, and lower total energy per job.
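As a rough illustration, the four signals can be derived from per-node job counts and the pending queue. The formulas below are plausible choices, not necessarily the repo's exact definitions: σ as an inverse coefficient of variation, H as normalized Shannon entropy, and Φ as an assumed combination of unevenness and positive drift.

```python
import math

def signals(loads, prev_pending, pending):
    """loads: jobs per node; pending/prev_pending: queue lengths now and last step."""
    n = len(loads)
    total = sum(loads)
    mean = total / n
    # sigma: coherence -- 1.0 when load is perfectly even, lower as it skews
    std = math.sqrt(sum((x - mean) ** 2 for x in loads) / n)
    sigma = 1.0 / (1.0 + std / (mean + 1e-9))
    # H: Shannon entropy of the job distribution, normalized to [0, 1]
    probs = [x / total for x in loads if x > 0] if total else []
    H = -sum(p * math.log(p) for p in probs) / math.log(n) if probs else 0.0
    # delta: queue drift -- growth of the pending queue since the last step
    delta = pending - prev_pending
    # Phi: combined tension (assumed form: unevenness plus positive drift)
    Phi = (1.0 - sigma) + max(delta, 0)
    return sigma, H, delta, Phi
```

A perfectly even cluster with a flat queue gives σ = 1, H = 1, δ = 0, Φ = 0; piling every job on one node while the queue grows pushes σ and H down and Φ up.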


⚙️ How It Works

HAL is implemented as a simple control layer:

  1. Simulator – synthetic cluster with N nodes and a Poisson workload generator
  2. Controllers – heuristic, PID, and Bayesian variants that adapt parameter p ∈ [0,1] (pack ↔ spread)
  3. Metrics server – FastAPI + Prometheus /metrics endpoint for dashboards
  4. Helm chart – deployable metrics demo for Kubernetes
  5. Grafana dashboard – real-time visualization of σ, H, δ, Φ, and p

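The PID variant in step 2 can be sketched as a textbook loop driving p; the gains and the choice of queue drift δ as the error signal are assumptions for illustration, not the repo's actual tuning.

```python
# Sketch of a PID controller for the pack/spread parameter p in [0, 1].
# Gains (kp, ki, kd) and the error signal are illustrative assumptions.
class PIDController:
    def __init__(self, kp=0.05, ki=0.01, kd=0.02, p0=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.p = p0          # current pack <-> spread balance
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, delta):
        """delta: queue drift. Positive drift -> raise p (spread more)."""
        err = delta
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        self.p += self.kp * err + self.ki * self.integral + self.kd * deriv
        self.p = min(1.0, max(0.0, self.p))  # clamp to [0, 1]
        return self.p
```

When the queue builds, p rises toward spreading; when it drains, p falls back toward energy-efficient packing.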
Everything runs locally with no external dependencies.

```bash
git clone https://github.com/Freeky7819/halms-demo
cd halms-demo
python -m venv .venv
# Windows path shown; on Linux/macOS use .venv/bin/pip
.venv/Scripts/pip install -r requirements.txt
python simulate.py --steps 1500
python plot_metrics.py
```

You’ll see two traces:

  • baseline (static scheduler)
  • adaptive HAL (dynamic control)

📊 Example Output

  • Queue spikes reduced by 40–70 %
  • Coherence σ stabilized near 0.9
  • Adaptive parameter p converging to steady state
  • Smooth Φ (stress metric) vs time

Even this demo, driven only by heuristic/PID/Bayesian controllers, shows how feedback control beats static heuristics for scheduling.
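To make that claim concrete, here is a deliberately tiny toy (not the repo's simulator): a fixed-capacity policy versus a proportional feedback policy clearing the same backlog. All constants are arbitrary.

```python
def recovery_steps(adaptive: bool) -> int:
    """Steps to clear a 30-job backlog with steady arrivals of 3 jobs/step."""
    pending, capacity = 30.0, 5.0
    for t in range(1, 500):
        pending = max(0.0, pending + 3.0 - capacity)
        if adaptive:
            # feedback: scale service capacity with the backlog, capped at 10
            capacity = min(10.0, max(5.0, 5.0 + 0.3 * pending))
        if pending == 0.0:
            return t
    return 500

print(recovery_steps(False), recovery_steps(True))
```

The static policy drains the spike at a fixed rate; the feedback policy temporarily raises capacity while the backlog is high and clears it in roughly half the time, which is the same mechanism behind the smoother Φ trace above.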


🧠 Why It Matters

Modern clusters waste cycles and energy because schedulers are blind to system feedback.

They rely on fixed heuristics like “bin pack until 80 % CPU” or “spread by labels”.

HAL introduces self-tuning — it reads the system’s own signals and re-balances automatically.

Benefits

  • ✅ Reduced queue oscillations
  • ⚡ Energy efficiency via adaptive packing
  • 📈 Predictable latency under load
  • 🔍 Native observability (Prometheus + Grafana)
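On the observability point: the /metrics payload is the standard Prometheus text exposition format. The repo serves it through FastAPI and the official client library; the stdlib-only sketch below just shows the wire format, with illustrative metric names.

```python
# Hand-rolled Prometheus text exposition for the HAL signals.
# Metric names are illustrative, not necessarily the repo's actual ones.
def exposition(sigma, H, delta, phi, p):
    lines = []
    for name, help_text, value in [
        ("hal_sigma", "Coherence: evenness of load", sigma),
        ("hal_entropy", "Entropy of jobs per node", H),
        ("hal_queue_drift", "Rate of pending-queue growth", delta),
        ("hal_phi", "Informational potential (system tension)", phi),
        ("hal_p", "Pack/spread balance parameter in [0,1]", p),
    ]:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(exposition(0.9, 0.75, 0.0, 0.1, 0.5))
```

Anything that scrapes this endpoint (Prometheus, Grafana Agent) can graph σ, H, δ, Φ, and p with no extra glue.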

Use cases

  • Kubernetes (as a policy advisor / extender)
  • HPC or SLURM queues
  • AI/ML job orchestrators
  • Edge or hybrid clusters

🧰 Tech Stack

Python 3.11 · FastAPI · Prometheus client · Helm v3 · Grafana · GitHub Actions CI (lint + SBOM)

License: Apache 2.0


🧭 Open vs Enterprise

| Feature | Public Demo | Enterprise |
| --- | --- | --- |
| Core control | heuristic, PID, Bayesian | proprietary resonant kernel |
| Deployment | metrics demo (Helm) | full operator + extender |
| Multi-cluster control | ✗ | ✓ |
| Historical analytics | basic | advanced |
| SLA & support | community | commercial |

The open demo is fully working — no placeholders — and safe for public use.

The enterprise version builds on this foundation for production-grade orchestration.


🧪 Try It

Live repo → github.com/Freeky7819/halms-demo

Run the metrics server:

```bash
python -m uvicorn server:app --host 127.0.0.1 --port 8015
```

Then open http://127.0.0.1:8015/metrics or http://127.0.0.1:8015/live.


🤝 Contribute

Feedback, issues, and forks are welcome.

We’re particularly interested in:

  • new stability metrics
  • dataset-driven tuning
  • multi-cluster experimentation

Open discussions or PRs — everything helps us improve the adaptive model.


HAL is open, safe, and ready to explore.

If you’ve ever wondered what a scheduler with a feedback loop would look like — this is your playground.

🔗 GitHub → Freeky7819/halms-demo
