In my last post I explained why I decided to build an alternative to Kubernetes. I got a lot of comments: some supportive, others questioning whether I really knew what I was getting into. Fair enough. But the decision was already made, so the next step was simple: stop thinking and start doing.
The name
First thing I did was pick a name. Might seem silly, but I always do this with my projects — naming something turns a vague idea into something real. It stops being "that crazy thought I had in the shower" and becomes an actual thing.
This time the name came instantly. One of the main goals of the project is to radically simplify the experience of getting things running. No stress, no 40-hour courses, no infinite YAML. Something that feels almost... magical.
So: Houdini.
Harry Houdini was one of the most famous magicians and escape artists in history. Maybe it's pretentious to compare an open-source project with the guy who escaped coffins underwater, but the name fits perfectly with what I want to deliver: making something complex look simple.
The study phase
Let me clarify something that was confusing in the previous post: some people understood that I don't know Kubernetes at all. I don't blame them; I probably wasn't clear. When I mentioned "using AI to create the cluster," I was talking about saving time, not about total ignorance. I've been using and deploying K8s in production for years.
But there's a difference between using Kubernetes and creating something that competes in the same space. For that, I needed to go much deeper.
I started studying the Kubernetes source code. Not the docs, not the tutorials — the code. How things were implemented, why certain decisions were made, what trade-offs were accepted. I did the same with HashiCorp's Nomad. This study even led me to consider getting the CKA certification — not for the certificate itself, but for the structured learning process.
AI helps a lot at this stage, I'll be honest. Reading 2 million lines of Kubernetes Go code without any guide would be insane. But the understanding has to be mine — AI accelerates, it doesn't replace.
It's not a new Kubernetes
A crucial point: I'm not trying to build a new Kubernetes. A solo developer is unlikely to pull that off, and it's not the goal anyway. I want to build an alternative. Not better, not worse, simply a different way of thinking about how an orchestrator should work.
The project is opinionated. Unapologetically. Because it reflects how I would like an orchestrator to work, based on 22 years of writing code and deploying to production.
The design: learning from others' mistakes
To arrive at Houdini's design, I did three things:
- Studied the architecture of Kubernetes and Nomad in depth
- Catalogued the main community criticisms of K8s, Nomad, and Swarm
- Combined all of that with my practical experience of what works and what doesn't
The most common criticisms I found:
Kubernetes:
- Absurd complexity for simple use cases
- Dozens of components to maintain (etcd, kube-apiserver, scheduler, controller-manager, kubelet, kube-proxy...)
- YAML hell — verbose configs that are painful to debug
- Service mesh requires external components (Istio, Linkerd)
- Learning curve measured in months
Nomad:
- Simple but incomplete — needs Consul for service discovery, Vault for secrets
- No built-in service mesh
- Smaller community, less tooling
Docker Swarm:
- Abandoned by Docker Inc.
- No real autoscaling
- Limited deployment strategies
Houdini's architecture
With all of that in mind, these are the design decisions I made. Note: I'm not saying these are immutable — the project is in active development and things may change as I encounter real problems.
Principle #1: One binary, zero external dependencies
$ houdini server # control plane
$ houdini agent # worker node
$ houdini deploy # CLI
Same binary does everything. No separate etcd, no Consul, no Vault, no Prometheus server. Everything embedded. Storage uses BoltDB (an embedded key-value store in Go), distributed consensus uses Raft (same library Nomad uses), service mesh uses native WireGuard.
Why? Because the biggest source of operational complexity in Kubernetes isn't the concept, it's the 15 components you need to keep alive. I want someone to be able to spin up a working cluster with just two commands: "houdini server" on one machine and "houdini agent --server <ip>" on each worker.
Principle #2: Multi-runtime
Houdini doesn't just run containers. It supports four types of workload with the same API:
- Container — Docker, for when you need full isolation
- Process — native OS processes, for local dev or binaries that don't need a container
- WASM — WebAssembly modules, <1ms startup, lightweight sandboxing
- Function — serverless, event-driven, automatic scale-to-zero
Why? Because not every workload needs a container. A Python script that runs every 5 minutes doesn't need a 200MB Docker image. A webhook handler doesn't need to be alive 24/7 consuming RAM.
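To make "four workload types, same API" a bit more concrete, here is a simplified Python sketch of a runtime registry. It is an illustration of the idea only, not Houdini's real code: the names Workload, Runtime, RuntimeRegistry, and start are made up for this example, and the two runtimes just return a string instead of launching anything.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Workload:
    name: str
    kind: str    # "container", "process", "wasm" or "function"
    target: str  # image, binary path, module or handler (illustrative field)


class Runtime(Protocol):
    """Common interface every runtime implements (hypothetical)."""

    def start(self, workload: Workload) -> str: ...


class DockerRuntime:
    def start(self, workload: Workload) -> str:
        # A real runtime would talk to the Docker daemon here.
        return f"container:{workload.target}"


class ProcessRuntime:
    def start(self, workload: Workload) -> str:
        # A real runtime would spawn a native OS process here.
        return f"process:{workload.target}"


class RuntimeRegistry:
    """Maps a workload kind to the runtime that knows how to run it."""

    def __init__(self) -> None:
        self._runtimes: dict[str, Runtime] = {}

    def register(self, kind: str, runtime: Runtime) -> None:
        self._runtimes[kind] = runtime

    def start(self, workload: Workload) -> str:
        try:
            runtime = self._runtimes[workload.kind]
        except KeyError:
            raise ValueError(f"no runtime registered for kind {workload.kind!r}") from None
        return runtime.start(workload)


registry = RuntimeRegistry()
registry.register("container", DockerRuntime())
registry.register("process", ProcessRuntime())
```

The caller only ever talks to the registry, so adding WASM or Function support means registering one more runtime, not changing the API.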
Principle #3: Networking built-in
Each node gets a WireGuard subnet (10.42.X.0/24). Containers communicate directly between nodes through encrypted tunnels. Internal DNS resolves service.namespace.houdini to the correct IPs. Ingress controller with automatic HTTPS (Let's Encrypt) included.
Why? Because in Kubernetes you need to choose between Calico, Flannel, Cilium for CNI, then install an Ingress controller (nginx, traefik, envoy), then configure cert-manager for TLS. That's 3-4 decisions and installations before external traffic reaches your service.
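The addressing scheme can be sketched in a few lines of Python. This is a simplified illustration under assumptions: I treat the X in 10.42.X.0/24 as a node index from 0 to 255, and the helper names (node_subnet, workload_ip, dns_name, Resolver) are invented for the example. The real tunnels and DNS server are obviously more involved.

```python
import ipaddress


def node_subnet(node_index: int) -> ipaddress.IPv4Network:
    """Return the /24 assigned to a node (assumes X is the node index)."""
    if not 0 <= node_index <= 255:
        raise ValueError("node index must fit in one octet")
    return ipaddress.ip_network(f"10.42.{node_index}.0/24")


def workload_ip(node_index: int, slot: int) -> ipaddress.IPv4Address:
    """Return the address of the slot-th workload on a node (.1 to .254)."""
    if not 1 <= slot <= 254:
        raise ValueError("slot must be between 1 and 254")
    return node_subnet(node_index).network_address + slot


def dns_name(service: str, namespace: str) -> str:
    """Build the internal name in the service.namespace.houdini format."""
    return f"{service}.{namespace}.houdini"


class Resolver:
    """Toy in-memory DNS table: name -> list of workload IPs."""

    def __init__(self) -> None:
        self._records: dict[str, list[str]] = {}

    def register(self, service: str, namespace: str, ip: ipaddress.IPv4Address) -> None:
        self._records.setdefault(dns_name(service, namespace), []).append(str(ip))

    def resolve(self, name: str) -> list[str]:
        return list(self._records.get(name, []))


resolver = Resolver()
resolver.register("api", "prod", workload_ip(1, 5))
resolver.register("api", "prod", workload_ip(2, 9))
```

Resolving api.prod.houdini here returns one IP on node 1's subnet and one on node 2's, which is the "correct IPs" part of the paragraph above.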
Principle #4: Smart deploy pipeline
Deploying isn't "send the YAML and pray." It's a pipeline with phases:
- Validation — does the spec make sense?
- Scheduling — where to run? (bin-pack or spread, with anti-affinity, constraints, failure scoring)
- Dispatch — direct push to agent via gRPC stream
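The difference between bin-pack and spread in the scheduling phase is easier to see in code. This is a minimal Python sketch, not the actual scheduler: it only looks at CPU, uses a single optional label as the constraint, and leaves out anti-affinity and failure scoring. Node, pick_node, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_total: float
    cpu_used: float
    labels: frozenset = frozenset()


def pick_node(nodes: list, cpu_request: float, strategy: str = "bin-pack",
              required_label: Optional[str] = None) -> Optional[Node]:
    """Pick a node for a workload, or return None if nothing fits."""
    # Filter: enough free CPU and, if requested, the required label.
    candidates = [
        n for n in nodes
        if n.cpu_total - n.cpu_used >= cpu_request
        and (required_label is None or required_label in n.labels)
    ]
    if not candidates:
        return None

    def utilization_after(node: Node) -> float:
        return (node.cpu_used + cpu_request) / node.cpu_total

    if strategy == "bin-pack":
        # Fill the busiest node that still fits, keeping other nodes free.
        return max(candidates, key=utilization_after)
    if strategy == "spread":
        # Prefer the emptiest node, spreading load across the cluster.
        return min(candidates, key=utilization_after)
    raise ValueError(f"unknown strategy {strategy!r}")


nodes = [
    Node("a", cpu_total=4, cpu_used=3),
    Node("b", cpu_total=4, cpu_used=1),
    Node("c", cpu_total=4, cpu_used=0, labels=frozenset({"ssd"})),
]
```

With these three nodes and a 1-CPU request, bin-pack picks "a" (it ends up full) and spread picks "c" (it ends up at 25%).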
Four strategies: Rolling (zero-downtime), Canary (test with % of traffic), Blue-Green (atomic switch), All-at-once (fast, accepts downtime).
If it fails? Retry with exponential backoff. Exhausted all attempts? Dead Letter Queue — nothing is silently lost.
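The retry and Dead Letter Queue behavior looks roughly like this. It is a small Python sketch under assumptions: the attempt count, the base delay, and the names dispatch_with_retry, send, and dead_letters are all chosen for the example, and the queue is a plain list.

```python
import time
from typing import Any, Callable, List, Tuple


def dispatch_with_retry(
    task: Any,
    send: Callable[[Any], None],
    dead_letters: List[Tuple[Any, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send a task; on repeated failure, park it in the dead letter queue."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            send(task)
            return True
        except Exception as exc:  # any dispatch failure counts as retryable here
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: base, 2x base, 4x base, ...
                sleep(base_delay * 2 ** attempt)
    # All attempts exhausted: keep the task and the reason instead of dropping it.
    dead_letters.append((task, repr(last_error)))
    return False
```

The sleep function is injectable so the backoff schedule can be checked without actually waiting. A production version would usually add jitter and a maximum delay.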
Principle #5: Continuous reconciliation
Every 30 seconds, a reconciler compares the desired state (what you asked for) with the actual state (what's really running). If they diverge, it corrects. Workload crashed? Restart with backoff. Node died? Reschedule to another node. Orphan container? Automatic GC.
Why? This is the pattern Kubernetes absolutely nailed — declarative + reconciliation. I copied it. I'm not ashamed of copying what works.
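The core of a reconciler is just a diff between two states. Here is a stripped-down Python sketch of one pass, assuming state can be reduced to "workload name -> replica count". The function name, the action tuples, and the 30-second constant as a plain variable are illustrative, not the real implementation.

```python
RECONCILE_INTERVAL_SECONDS = 30  # how often a pass would run


def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[tuple[str, str, int]]:
    """Compare desired and actual replica counts and return corrective actions."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            # Crashed workloads or a dead node: start the missing replicas.
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    for name, have in actual.items():
        # Running but not requested by anyone: orphan, garbage-collect it.
        if name not in desired and have > 0:
            actions.append(("gc", name, have))
    return actions


actions = reconcile(
    desired={"api": 3, "worker": 1},
    actual={"api": 1, "worker": 1, "old-job": 2},
)
```

In this example the pass decides to start two "api" replicas and garbage-collect "old-job", while "worker" is left alone because it already matches.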
Principle #6: Security is not optional
- Secrets encrypted with AES-256-GCM (Argon2id for key derivation)
- RBAC with 4 roles (admin, operator, developer, viewer)
- 2FA with TOTP
- Agent↔server communication authenticated with a token over gRPC streams
- Policy engine for admission control (blocks :latest in prod, requires health checks, etc.)
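To give an idea of what admission control means in practice, here is a simplified Python sketch of the two example rules from the last bullet. Everything in it is illustrative: the Spec fields, the function names, and the error messages are assumptions for this example, and the tag parsing ignores edge cases that a real image reference parser would handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spec:
    name: str
    image: str
    environment: str
    has_health_check: bool


def image_tag(image: str) -> str:
    """Return the tag of an image reference; untagged images count as 'latest'."""
    last = image.rsplit("/", 1)[-1]  # skip any registry host:port prefix
    return last.split(":", 1)[1] if ":" in last else "latest"


def no_latest_in_prod(spec: Spec) -> Optional[str]:
    if spec.environment == "prod" and image_tag(spec.image) == "latest":
        return "image must not use the :latest tag in prod"
    return None


def requires_health_check(spec: Spec) -> Optional[str]:
    if not spec.has_health_check:
        return "a health check is required"
    return None


POLICIES = [no_latest_in_prod, requires_health_check]


def admit(spec: Spec) -> list[str]:
    """Run every policy and return the violations; an empty list means admitted."""
    return [msg for policy in POLICIES if (msg := policy(spec)) is not None]
```

Each policy is a plain function that returns a message or None, so adding a rule is one more entry in the list.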
Simplified overview
┌─────────── CONTROL PLANE (houdini server) ──────────────┐
│                                                         │
│  REST API (:4646) │ gRPC (:4648) │ Ingress (:443)       │
│                                                         │
│  Scheduler │ Reconciler │ Deploy Pipeline │ Autoscaler  │
│                                                         │
│  State Store (BoltDB / Raft cluster)                    │
└────────────────────────────┬────────────────────────────┘
                             │ gRPC Streams + WireGuard Mesh
                             ▼
     ┌──────── WORKER NODE (houdini agent) ─────────┐
     │                                              │
     │  Runtime Registry                            │
     │  ├── Docker (containers)                     │
     │  ├── Process (native)                        │
     │  ├── WASM (modules)                          │
     │  └── Function (serverless)                   │
     │                                              │
     │  Workload Manager │ Health Probes │ Logs     │
     └──────────────────────────────────────────────┘
But what about the user experience?
I know what you're thinking: "Ok, lots of technical stuff under the hood, but what about in practice?" Good question.
The point is that all this complexity — scheduling, mesh, autoscaling, reconciliation — exists so that the user doesn't have to think about it. All of this will be managed transparently through a Web UI where you configure, deploy, monitor, and scale your services without ever opening a terminal if you don't want to.
The idea is simple: if you want to go deep with TOML and the CLI, go ahead. If you want to click "Deploy" and go grab a coffee, you can do that too. No judgment.
The Web UI will be the topic of a future article — and I promise that one will be way less technical than this one. Actually, this is probably the most technical article I'll write in this series. If you survived this far, the next ones will be a breeze.
Next steps
The project isn't public yet, but it will be soon. I'm preparing everything so that when I open the repository, people can actually clone, run, and test it — I don't want to open something half-baked and give the wrong first impression.
In the next post I'll talk about the Web UI and the experience I want to deliver for those who don't want (or need) to live in the terminal.
Until then, if you have questions, criticism, or suggestions — send them. Especially if you disagree with any decision. That's how projects improve.