Hey, I'm Jacob. Solo founder of Rhelm.
10+ years deep in infra, Kubernetes, distributed systems, Go, Python, and AI orchestration.
I got tired of watching API bills stack up fast. Every task, big or small, was getting routed to the most expensive model by default. Didn't matter if it was complex reasoning or fixing a typo. Same model, same price.
So I built recursion into the workflow.
How it works
Rhelm decomposes complex objectives into atomic subtasks. Each one is simple enough for a small model to nail perfectly, then gets routed to the cheapest capable model. Local models at $0/token handle the bulk of the work. The expensive frontier models only get called when the task actually needs them.
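The routing idea above can be sketched in a few lines. This is a minimal illustration, not Rhelm's actual implementation: the model names, cost figures, and the integer "capability tier" are all hypothetical stand-ins for whatever scoring a real router would use.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_token: float  # USD per token; 0.0 for a local model
    capability: int        # rough capability tier (higher = more capable)

# Hypothetical catalog; names and prices are illustrative only.
MODELS = [
    Model("local-7b", 0.0, 1),
    Model("mid-tier-api", 0.000002, 2),
    Model("frontier-api", 0.00003, 3),
]

def route(subtask_difficulty: int) -> Model:
    """Return the cheapest model whose capability meets the subtask's difficulty."""
    capable = [m for m in MODELS if m.capability >= subtask_difficulty]
    return min(capable, key=lambda m: m.cost_per_token)
```

With a catalog like this, easy subtasks (difficulty 1) land on the free local model, and only a difficulty-3 subtask ever reaches the frontier tier, which is where the bulk of the cost savings comes from.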
The result: real AI power in your hands, not rented behind paywalls.
What it looks like in practice
Most AI tools dump everything on you at once. Logs, token counts, model responses, errors, all fighting for your attention. You end up spending more time managing the AI than doing the actual work.
We solved that by putting everything on a kanban board. PMs write objectives in plain language, agents pick them up like team members, and each card only surfaces what matters for that task. Cost, quality, status. No noise. You see what you need when you need it.
Early numbers
- ~90% token cost reduction
- Output quality goes up, not down
- Runs on your hardware or in the cloud, your choice
Waitlist is open
If this sounds like a problem you're dealing with, check it out: rhelm.io
I'd love to hear from the community. What's your biggest pain point with current AI agent setups: cost, drift, security, or tool sprawl?
Drop your thoughts below. I'm building this in public and your feedback shapes the roadmap.
Top comments (3)
Hi, I came across this blog and I'm interested in knowing how these small models are being orchestrated. Also, when it comes to running them locally, how much RAM would they require? Either way, it's a pretty interesting blog.
Thanks so much, Micheal! Really appreciate you checking it out and joining the waitlist.
Great questions. We'll be dropping YouTube content soon that walks through how the orchestration works, RAM requirements for running models locally, and a lot more. Stay tuned for that.
Glad you found it interesting, and welcome aboard!
Also, joined the waitlist 👍