🚀 Overview
HAL Meta-Scheduler is an adaptive orchestration layer that learns how to balance workloads in real time.
It doesn't replace your scheduler — it teaches it to breathe.
This open-source demo shows how simple feedback metrics can keep a distributed system stable under changing load and still save energy.
No proprietary math or hidden weights — everything you see here is functional and reproducible.
🧩 What It Does
HAL observes your cluster through four lightweight signals:
| Symbol | Meaning | Role |
|---|---|---|
| σ | Coherence — how evenly the load is spread | stability indicator |
| H | Entropy — diversity of jobs per node | utilization diversity |
| δ | Queue drift — rate of pending growth | stress level |
| Φ | Informational potential — combined system tension | energy/stability metric |
These are computed continuously and used to adjust the balance between packing (energy-efficient) and spreading (latency-resilient).
The result: fewer spikes, smoother utilization curves, and lower total energy per job.
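To make the four signals concrete, here is one way they could be computed from per-node state. This is an illustrative sketch, not the demo's actual formulas — the function names, weights, and normalizations are assumptions; see `simulate.py` in the repo for the real definitions.

```python
import math

def coherence(loads):
    """σ: 1 minus the load's coefficient of variation (1.0 = perfectly even spread)."""
    n = len(loads)
    mean = sum(loads) / n
    if mean == 0:
        return 1.0
    std = math.sqrt(sum((x - mean) ** 2 for x in loads) / n)
    return max(0.0, 1.0 - std / mean)

def entropy(job_counts):
    """H: normalized Shannon entropy of the job distribution across nodes."""
    total = sum(job_counts)
    if total == 0 or len(job_counts) < 2:
        return 0.0
    probs = [c / total for c in job_counts if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(job_counts))

def queue_drift(pending_now, pending_prev, dt=1.0):
    """δ: rate of change of the pending-queue length."""
    return (pending_now - pending_prev) / dt

def potential(sigma, h, delta, w=(1.0, 0.5, 0.25)):
    """Φ: weighted 'tension' — low coherence, low diversity, and rising queues all add stress."""
    return w[0] * (1 - sigma) + w[1] * (1 - h) + w[2] * max(0.0, delta)
```

A perfectly balanced, diverse, drift-free cluster scores Φ = 0 under these assumed weights; any imbalance or queue growth raises it.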
⚙️ How It Works
HAL is implemented as a simple control layer:
- Simulator – synthetic cluster with N nodes and a Poisson workload generator
- Controllers – heuristic, PID, and Bayesian variants that adapt a parameter p ∈ [0,1] (pack ↔ spread)
- Metrics server – FastAPI + Prometheus `/metrics` endpoint for dashboards
- Helm chart – deployable metrics demo for Kubernetes
- Grafana dashboard – real-time visualization of σ, H, δ, Φ, and p
Everything runs locally with no external dependencies.
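The controller variants above share one job: map the stress signal Φ to the pack↔spread parameter p. A minimal PID-style sketch of that loop might look like the following — gains, setpoint, and class name are illustrative assumptions, not the demo's tuned values.

```python
class PackSpreadPID:
    """Toy PID controller nudging p (0 = pack, 1 = spread) toward a Φ setpoint.
    Gains and setpoint are illustrative only."""

    def __init__(self, kp=0.5, ki=0.05, kd=0.1, setpoint=0.2):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0
        self.p = 0.5  # start balanced between packing and spreading

    def step(self, phi, dt=1.0):
        # Stress (Φ) above the setpoint pushes p toward spreading; below it,
        # the controller drifts back toward energy-efficient packing.
        error = phi - self.setpoint
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        delta_p = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.p = min(1.0, max(0.0, self.p + delta_p))
        return self.p
```

At the setpoint the controller holds p steady; a Φ spike immediately opens up spreading, then the integral term winds back down as queues drain.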
```bash
git clone https://github.com/Freeky7819/halms-demo
cd halms-demo
python -m venv .venv
.venv/Scripts/pip install -r requirements.txt   # on Linux/macOS: .venv/bin/pip
python simulate.py --steps 1500
python plot_metrics.py
```
You’ll see two traces:
- baseline (static scheduler)
- adaptive HAL (dynamic control)
📊 Example Output
- Queue spikes reduced by 40–70%
- Coherence σ stabilized near 0.9
- Adaptive parameter p converging to steady state
- Smooth Φ (stress metric) vs time
Even this demo, using only PID/Bayesian logic, shows how feedback control beats static heuristics for scheduling.
🧠 Why It Matters
Modern clusters waste cycles and energy because schedulers are blind to system feedback.
They rely on fixed heuristics like “bin pack until 80% CPU” or “spread by labels”.
HAL introduces self-tuning — it reads the system’s own signals and re-balances automatically.
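One way to picture what p actually does at placement time: with probability p a job is spread to the least-loaded node, otherwise it is packed onto the fullest node that still has room. This is a hypothetical sketch — the function, node representation, and tie-breaking are assumptions, not the demo's placement code.

```python
import random

def place(job_load, nodes, p, rng=random.random):
    """Pick a node for a job under the pack↔spread parameter p.

    nodes maps name -> (current_load, capacity). With probability p the job
    is spread (least-loaded node); otherwise it is packed (most-loaded node
    that can still fit it). Illustrative only.
    """
    feasible = {name: (load, cap) for name, (load, cap) in nodes.items()
                if load + job_load <= cap}
    if not feasible:
        return None  # job stays queued
    if rng() < p:
        # spread: relieve hot nodes by choosing the coolest feasible one
        return min(feasible, key=lambda n: feasible[n][0])
    # pack: consolidate onto the fullest feasible node so others can idle
    return max(feasible, key=lambda n: feasible[n][0])
```

With p near 0 the cluster consolidates (energy-efficient); with p near 1 it spreads (latency-resilient) — the controller just moves that dial in response to Φ.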
Benefits
- ✅ Reduced queue oscillations
- ⚡ Energy efficiency via adaptive packing
- 📈 Predictable latency under load
- 🔍 Native observability (Prometheus + Grafana)
Use cases
- Kubernetes (as a policy advisor / extender)
- HPC or SLURM queues
- AI/ML job orchestrators
- Edge or hybrid clusters
🧰 Tech Stack
Python 3.11 · FastAPI · Prometheus client · Helm v3 · Grafana · GitHub Actions CI (lint + SBOM)
License: Apache 2.0
🧭 Open vs Enterprise
| Feature | Public Demo | Enterprise |
|---|---|---|
| Core control | heuristic, PID, Bayesian | proprietary resonant kernel |
| Deployment | metrics demo (Helm) | full operator + extender |
| Multi-cluster control | — | ✅ |
| Historical analytics | basic | advanced |
| SLA & support | community | commercial |
The open demo is fully working — no placeholders — and safe for public use.
The enterprise version builds on this foundation for production-grade orchestration.
🧪 Try It
Live repo → github.com/Freeky7819/halms-demo
Run the metrics server:

```bash
python -m uvicorn server:app --host 127.0.0.1 --port 8015
```

Then open http://127.0.0.1:8015/metrics or http://127.0.0.1:8015/live
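The `/metrics` endpoint serves Prometheus's text exposition format. As a rough idea of what Grafana scrapes, here is a stdlib-only sketch of rendering the five gauges — the metric names are assumptions; the demo itself uses FastAPI with the `prometheus_client` library, so check `server.py` for the real names.

```python
def render_metrics(sigma, h, delta, phi, p):
    """Render HAL's signals as Prometheus text-format gauges.
    Metric names here are illustrative placeholders."""
    metrics = {
        "hal_coherence_sigma": sigma,
        "hal_entropy_h": h,
        "hal_queue_drift_delta": delta,
        "hal_potential_phi": phi,
        "hal_pack_spread_p": p,
    }
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")  # type hint line required per metric
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```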
🤝 Contribute
Feedback, issues, and forks are welcome.
We’re particularly interested in:
- new stability metrics
- dataset-driven tuning
- multi-cluster experimentation
Open discussions or PRs — everything helps us improve the adaptive model.
HAL is open, safe, and ready to explore.
If you’ve ever wondered what a scheduler with a feedback loop would look like — this is your playground.