Hey, I'm Jacob. Solo founder of Rhelm.
10+ years deep in infra, Kubernetes, distributed systems, Go, Python, and AI orchestration.
I got tired of watching API bills stack up fast. Every task, big or small, was getting routed to the most expensive model by default. Didn't matter if it was complex reasoning or fixing a typo. Same model, same price.
So I built recursion into the workflow.
How it works
Rhelm decomposes complex objectives into atomic subtasks. Each one is simple enough for a small model to nail perfectly, then gets routed to the cheapest capable model. Local models at $0/token handle the bulk of the work. The expensive frontier models only get called when the task actually needs them.
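The routing idea above can be sketched in a few lines. This is a minimal illustration, not Rhelm's actual implementation: the model names, cost figures, and the integer "capability tier" are all hypothetical stand-ins for whatever scoring a real router would use.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_token: float  # USD per token; 0.0 for a local model
    capability: int        # rough capability tier (higher = more capable)

# Hypothetical catalog; names and prices are illustrative only.
MODELS = [
    Model("local-7b", 0.0, 1),
    Model("mid-tier-api", 0.000002, 2),
    Model("frontier-api", 0.00003, 3),
]

def route(subtask_difficulty: int) -> Model:
    """Return the cheapest model whose capability meets the subtask's difficulty."""
    capable = [m for m in MODELS if m.capability >= subtask_difficulty]
    return min(capable, key=lambda m: m.cost_per_token)
```

With a catalog like this, easy subtasks (difficulty 1) land on the free local model, and only a difficulty-3 subtask ever reaches the frontier tier, which is where the bulk of the cost savings comes from.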
The result: real AI power in your hands, not rented behind paywalls.
What it looks like in practice
Most AI tools dump everything on you at once. Logs, token counts, model responses, errors, all fighting for your attention. You end up spending more time managing the AI than doing the actual work.
We solved that by putting everything on a kanban board. PMs write objectives in plain language, agents pick them up like team members, and each card only surfaces what matters for that task. Cost, quality, status. No noise. You see what you need when you need it.
Early numbers
- ~90% token cost reduction
- Output quality goes up, not down
- Runs on your hardware or in the cloud, your choice
Waitlist is open
If this sounds like a problem you're dealing with, check it out: rhelm.io
I'd love to hear from the community. What's your biggest pain point with current AI agent setups: cost, drift, security, or tool sprawl?
Drop your thoughts below. I'm building this in public and your feedback shapes the roadmap.
Top comments (3)
Hi, I came across this blog and I'm interested in knowing how these small models are being orchestrated. Also, when it comes to running them locally, how much RAM would they require? Either way, it's a pretty interesting blog.
Thanks so much, Micheal! Really appreciate you checking it out and joining the waitlist.
Great questions. We'll be dropping YouTube content soon that walks through how the orchestration works, RAM requirements for running models locally, and a lot more. Stay tuned for that.
Glad you found it interesting, and welcome aboard!
Also, joined the waitlist 👍