Repo: github.com/hearth-project/hearth · Apache-2.0 · v0.1.0, alpha.
I've been building Hearth, a Kubernetes operator that serves open-source LLMs (Qwen, DeepSeek, GLM, …) declaratively and scales them to zero when idle. It's at a point where the core works end-to-end on real GPUs, and I'm looking for people to build it with me. The thing I most want you to know up front: you can contribute without owning an accelerator. More on that below.
## The one interesting problem
Self-hosting an LLM on K8s is easy until you notice the GPU is burning money while nobody's using the model. The obvious fix — "scale to zero" — runs straight into a chicken-and-egg problem: a stock HPA can't scale up from zero, because zero replicas means zero metrics, which means it never wakes up.
Hearth puts a small gateway (an OpenAI-compatible reverse proxy) in front of each model. When a request arrives at a scaled-to-zero backend, the gateway accepts it, holds the connection open (SSE keepalive heartbeats so nothing times out), and bumps a pending counter exposed at /hearth/queue. KEDA polls that endpoint, sees pending > 0, and scales the backend 0 → 1. The pod loads weights from a warm cache, becomes Ready, and the gateway forwards the buffered request and streams tokens back. Idle again → KEDA scales it back to 0.
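The hold-and-wake flow above can be sketched in a few lines of Python. This is a minimal illustration, not Hearth's gateway code: the `ColdStartGate` class, its method names, and the `{"pending": N}` JSON shape are assumptions made for the example, as is the choice to keep a request counted until its response has finished.

```python
import json
import threading


class ColdStartGate:
    """Illustrative only: tracks requests waiting on a scaled-to-zero backend."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = 0
        self._ready = threading.Event()

    def queue_payload(self):
        # What a /hearth/queue handler might return for the autoscaler to poll.
        # The JSON shape is an assumption, not Hearth's documented format.
        with self._lock:
            return json.dumps({"pending": self._pending})

    def mark_ready(self):
        # Called once the backend pod reports Ready.
        self._ready.set()

    def mark_scaled_to_zero(self):
        # Called when the backend has been scaled back to 0 replicas.
        self._ready.clear()

    def handle(self, forward, keepalive, heartbeat_seconds=1.0):
        """Hold one request until the backend is ready, then forward it."""
        with self._lock:
            self._pending += 1
        try:
            # Emit a heartbeat on every timeout so the client connection
            # stays open while the backend cold-starts.
            while not self._ready.wait(timeout=heartbeat_seconds):
                keepalive()
            return forward()
        finally:
            with self._lock:
                self._pending -= 1
```

In a real gateway, `keepalive` would write an SSE comment line to the client and `forward` would proxy the buffered request; here they are plain callables so the control flow is easy to follow.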
The whole thing is one manifest:

    apiVersion: serving.hearth.dev/v1alpha1
    kind: LLMService
    metadata: { name: qwen3-8b, namespace: ai }
    spec:
      model:
        source: { uri: modelscope://Qwen/Qwen3-8B-Instruct }   # or hf://
      runtime:
        selector: { vendor: [nvidia, ascend] }   # auto-pick a backend, in order
      resources: { accelerators: 1 }
      scaling: { min: 0, max: 3, metric: queueDepth, target: 10 }

    $ kubectl get llmservice -n ai
    NAME       PHASE          RUNTIME       REPLICAS   AGE
    qwen3-8b   ScaledToZero   vllm-nvidia   0          30s
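To make the `scaling` line concrete, here is a rough sketch of the replica arithmetic it implies. It assumes `target: 10` is treated as a per-replica average, the way the Kubernetes HPA handles average-value external metrics; whether Hearth and KEDA are configured exactly this way is an assumption, and `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(queue_depth, target=10, min_replicas=0, max_replicas=3):
    """Replica count for a given queue depth under average-value semantics."""
    # An empty queue asks for nothing; otherwise one replica per `target`
    # pending requests, rounded up.
    wanted = math.ceil(queue_depth / target) if queue_depth > 0 else 0
    # Clamp to the min/max declared in the manifest.
    return max(min_replicas, min(max_replicas, wanted))
```

Under these assumptions a single pending request is enough to ask for one replica, 11 pending requests ask for two, and anything above 20 is capped at `max: 3`.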
It's deliberately vendor-neutral: backends (NVIDIA-vLLM, vLLM-Ascend, …) are described as data in a cluster-scoped InferenceRuntime CRD — image, args, the device-plugin resource name, probes, metrics paths. Adding a chip is a thin adapter that does K8s-layer adaptation only; it never re-implements vLLM or touches kernels. The same LLMService is meant to run unchanged on NVIDIA or Ascend.
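One way to picture "backends as data" together with the ordered `vendor` selector is the sketch below. It is not Hearth's selection logic: the `Runtime` record, the `pick_runtime` helper, the `vllm-ascend` name, and the idea of checking node-allocatable resources are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runtime:
    """Hypothetical, trimmed-down view of an InferenceRuntime object."""
    name: str
    vendor: str
    resource_name: str  # device-plugin resource, e.g. "nvidia.com/gpu"


def pick_runtime(vendor_order, runtimes, allocatable):
    """Return the first runtime, in vendor preference order, that the cluster can schedule."""
    for vendor in vendor_order:
        for runtime in runtimes:
            if runtime.vendor != vendor:
                continue
            # Only accept a runtime whose accelerator resource is present.
            if allocatable.get(runtime.resource_name, 0) > 0:
                return runtime
    return None


# Example data; resource names and the Ascend runtime name are illustrative.
RUNTIMES = [
    Runtime("vllm-ascend", "ascend", "huawei.com/Ascend910"),
    Runtime("vllm-nvidia", "nvidia", "nvidia.com/gpu"),
]
```

With `vendor: [nvidia, ascend]`, a cluster exposing only Ascend devices would fall through to the Ascend runtime, while a cluster with both would pick NVIDIA first.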
Hearth deliberately stays in its lane: it's the K8s orchestration/lifecycle layer. The engine is vLLM; scheduling is device-plugins / HAMi / Volcano; datacenter-scale serving is KServe / llm-d. Hearth is the few-GPU, scale-to-zero, private end of that spectrum.
## Why you can contribute without a GPU
This is the part I'm proud of and the reason I'm posting. A vendor-neutral project is useless to contributors if every change needs a rack of hardware. So there's a full no-GPU test path: a CPU vllm-stub that fakes startup delay, streaming, and /metrics, plus a fake extended resource on the node. On a plain kind cluster, with no accelerator, one command runs the entire 0 → 1 → N → 0 loop, including cold-start keepalive and graceful drain:

    make test-scale-e2e

A laptop is enough to develop and verify the core behavior.
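For a feel of what a CPU-only stub involves, here is a small stand-in written with Python's standard library. It is not the project's vllm-stub: the `make_stub` function, the `stub_requests_total` metric name, and the fixed token list are invented for this sketch; only the idea (fake startup delay, streamed tokens, a /metrics endpoint) comes from the description above.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_stub(startup_delay=2.0, tokens=("Hello", " from", " the", " stub")):
    """Build a CPU-only fake inference server (hypothetical sketch)."""
    ready_at = time.monotonic() + startup_delay
    served = {"count": 0}
    lock = threading.Lock()

    class Handler(BaseHTTPRequestHandler):
        def log_message(self, *args):
            pass  # keep the console quiet

        def _send(self, status, body):
            data = body.encode()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_GET(self):
            if self.path == "/health":
                # Report "loading" until the fake weight-loading delay has passed.
                ok = time.monotonic() >= ready_at
                self._send(200 if ok else 503, "ok" if ok else "loading")
            elif self.path == "/metrics":
                with lock:
                    count = served["count"]
                self._send(200, f"stub_requests_total {count}\n")
            else:
                self._send(404, "not found")

        def do_POST(self):
            # Consume the request body, then stream canned tokens as SSE events.
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            with lock:
                served["count"] += 1
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.end_headers()
            for token in tokens:
                chunk = {"choices": [{"delta": {"content": token}}]}
                self.wfile.write(f"data: {json.dumps(chunk)}\n\n".encode())
                self.wfile.flush()
            self.wfile.write(b"data: [DONE]\n\n")

    # Port 0 lets the OS choose a free port; read it from server.server_address.
    return ThreadingHTTPServer(("127.0.0.1", 0), Handler)
```

Calling `make_stub().serve_forever()` in a thread gives a server whose health check fails during the startup delay and succeeds afterwards, which is the property a readiness probe needs in order to exercise a cold start without an accelerator.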
## Honest status
I won't oversell it. As of v0.1.0:
- Works, verified end-to-end on real NVIDIA GPUs: multi-backend abstraction, model caching/prewarm, gateway + KEDA scale-to-zero, cold-start keepalive, graceful drain, 1→N autoscaling, Helm install, Grafana dashboard.
- Scaffolded + golden-tested, not yet on real hardware: the Ascend backend renders correct manifests but hasn't been validated on real NPUs. This is the big v1 gap, blocked purely on hardware access.
- Not there yet: auth, multi-tenancy. It's v1alpha1 and not production-ready — a strong fit today for internal/dev, latency-tolerant, cost-sensitive serving.
## Where I'd love help
- Got Ascend (or Cambricon) hardware? Validating the Ascend backend on a real NPU is the single most valuable thing right now.
- No special hardware? Grab a [good-first-issue](https://github.com/hearth-project/hearth/issues): the no-GPU path above means you can build, test, and verify locally.
- Just curious? Try the kind quickstart, poke holes, open an issue, or ⭐ and follow along.
If any of this resonates, the [Welcome issue (#1)](https://github.com/hearth-project/hearth/issues/1) is the place to say hi. Thanks for reading.
Your models, your hearth. 🔥