Using Kubernetes is a love-hate relationship.
Love, because before it, deploying to production was something dark and uncertain. Many tools tried ...
building an alternative sounds exciting but I'd push back - Kubernetes complexity is mostly ecosystem lock-in, which your alternative won't fix. teams complaining about k8s usually struggle with helm charts, not the scheduler itself.
Fair point — but I'd argue Helm exists precisely because Kubernetes doesn't package things natively. That's not ecosystem lock-in, that's a gap the ecosystem had to fill.
And it keeps going: CNI, Ingress controller, cert-manager, secrets management. None of it is the scheduler. All of it is complexity you have to own.
the scheduler-vs-everything-else divide is the real tell — lock-in you can architect around, but filling 4 structural gaps with 4 independent tools is a complexity floor that any alternative has to confront too
I agree. I wrote a second article detailing the architecture I have in mind, and I would be happy to hear your opinion.
dev.to/denerfernandes/i-decided-to...
will check it out, specifically curious how you handled the scheduler boundary. that's usually where multi-tool architectures get messy.
Love this kind of project. Kubernetes solved a huge problem, but it also normalized a level of complexity that’s overkill for a lot of real-world teams. I think there’s a massive gap between “docker-compose on a VPS” and “learn 15 abstractions + YAML archaeology.”
What I especially like here is that you’re not just saying “K8s bad,” you’re clearly thinking about the tradeoffs: Swarm stagnation, Nomad ecosystem lock-in, operational simplicity vs flexibility, etc. That’s the interesting part.
Even if the final result ends up being opinionated rather than universal, that’s still valuable. Some of the best infra tools started exactly like this: someone got tired of unnecessary complexity and built the thing they wished existed.
Looking forward to the architecture posts.
Thank you for your kind words, I will definitely share them in future posts.
This is the kind of “crazy” project that is worth doing even if it does not replace Kubernetes.
Kubernetes became the default because it solves a huge class of problems, but a lot of teams are absolutely paying for flexibility they do not need. If your workload model is simpler, the operational surface can probably be simpler too.
The hard part is deciding what not to support.
Most Kubernetes alternatives start clean, then slowly re-invent scheduling, networking, service discovery, secrets, rollbacks, health checks, autoscaling, logs, multi-tenancy, and policy until they accidentally become a smaller Kubernetes with fewer contributors.
If the project keeps a narrow opinionated scope, it could be useful. If it tries to be a general platform, the complexity comes back.
I like the idea. Would love to see what you end up with. But still, k8s has its spot, and sometimes when you think you need it you're actually better off without. More often, though, I find cases where no one even thought k8s was an option, because there is nothing to scale and lots of state to deal with, yet weirdly enough it still was the best solution.
Sure, Kubernetes is amazing. I think the best thing about Kubernetes is the ecosystem that surrounds it, which has matured over the years.
Anything that wants to replace k8s for AI workloads will hit the same wall we did. GPU scheduling, OTel, and multi-tenant cost isolation are the three load-bearing problems. Curious how the project handles graceful node drain when model weights are paged into GPU memory. We saw cold-start penalties of 90 seconds on Llama-3.3-70B and ended up writing custom pre-drain hooks. Worth sharing how cost attribution works at the namespace level too.
Great points but Houdini isn't targeting AI workloads, at least not in v1. The goal is the opposite end of the spectrum — the developer who just wants to deploy a web app without a 40-hour Kubernetes course.
GPU scheduling, OTel, and multi-tenant cost isolation are genuinely hard problems, but they belong to a space where K8s (with the right operators) probably stays king for good reasons. I'd rather do one thing well first.
That said, the architecture is being designed to stay extensible — so nothing closes that door for the future. Appreciate the insight.
Wonderful
thanks!
I've been seeing a lot of Kubernetes server and cluster stuff in my reels and shorts. This gives me a lot of insight into some of the questions I had. Thanks for taking the time to break it down <3
Thanks!
Perfect! Best way to learn the fundamentals of technologies
Sure! thanks
I think for smaller projects you can use things like k3s, but I am curious what you are building. Can't wait for the rest of the posts and your journey.
Totally agree on k3s — it's a solid choice and way easier than full K8s for small setups.
Thanks for following along — the next posts will get more concrete. 🙏
Looking forward to it!
You might be crazy, but the ecosystem needs this kind of audacity! Kubernetes is amazing at scale, but the operational complexity for smaller deployments is often overkill. Building a leaner orchestration tool from scratch is a massive undertaking, but the learnings alone will be worth it. What language are you building the control plane in?
Fair point on v1 scope, simple-web-app target is its own legitimate problem. Curious about v2 thinking. Once you have a base abstraction for deploy plus scale plus networking, the AI-workload extension is mostly GPU scheduling and weight-loading orchestration on top. Have you sketched what the GPU-aware variant looks like? The bottleneck we hit on Llama-3.3-70B was pre-loading weights into GPU memory before traffic shifts. Curious if Houdini's design would let you plug a GPU node-class in cleanly or if you would need to refactor the core scheduler.