SangheeSon

Posted on May 18

I still don't want to give Claude SSH access, so I built a doctor for my homelab

#ai #selfhosted #opensource #go

AI as interpreter rather than operator

A few weeks ago I wrote about why I don't want to give Claude SSH access to my home server.

It's not that AI agents are useless. It's the opposite. They got good enough that handing one a shell started to feel reckless.

A shell isn't really an interface. It's a weapon with tab completion. It can read anything, delete anything, restart anything, and confidently run the wrong command on the wrong machine at 2am.

So I've been building HomeButler in the other direction:

Not "AI runs my server."
"AI asks my server safe, structured questions."

The thing dashboards don't do

The first version of HomeButler was mostly about visibility:

homebutler status
homebutler docker list
homebutler ports
homebutler report

It worked. But after running it on my own homelab for a while, I noticed something.

Raw status isn't enough.

Most of the time I don't actually want to see every metric. I want to know four things:

Is something wrong?
Is it urgent?
What changed?
What should I check next?

Dashboards don't really answer those. They show you CPU, memory, disk, containers, ports, uptime, graphs, colors, tables — and then quietly hand the interpretation back to you.

That's fine when I'm sitting at my desk. It's much less useful when I'm checking my phone half-awake, wondering if something is quietly on fire.

For a small homelab I don't need a mini NOC. I need a calm answer.

So I added a new command:

homebutler doctor

Output looks like this:

🩺 HomeButler Doctor — mac-mini

✅ CPU looks normal
✅ Memory looks normal
⚠️ Disk usage is high: 91%
⚠️ 1 container is stopped
⚠️ Public listener found on 0.0.0.0:8080
⚠️ Latest backup is older than 7 days

Suggested next steps:
→ homebutler docker list
→ homebutler ports
→ homebutler backup list

Not "here is everything." More like "here is what deserves your attention."

Why this matters more once an AI is in the loop

This shape becomes a lot more interesting when an agent is involved.

Imagine giving an agent SSH and asking:

"Is my server okay?"

Now the agent has to decide what to run. Probably something like:

df -h
free -m
docker ps
docker logs ...
ss -tulpn
systemctl status ...

That can work. But the agent is now exploring my box through a general-purpose shell. It can run too much, see too much, or run the right command on the wrong host. The blast radius is "whatever the shell can do," which is everything.

With HomeButler, the agent gets a much smaller surface:

homebutler doctor --json

Structured output. Read-only. Bounded scope.

The agent doesn't need to be the operator. It can be the interpreter. That distinction is the whole point of the project for me.

What doctor actually checks

homebutler doctor is intentionally boring. That's the feature.

It checks the kinds of things I usually only notice once they've already become a problem:

high CPU, memory, or disk usage
stopped containers
ports bound to public interfaces
missing or stale backups
notification readiness
whether report has a baseline for change detection
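As an illustration of how small one of these checks can be, here is a minimal Go sketch of a public listener check. It is not HomeButler's actual code. The function name `IsPublicListener` is hypothetical, and it assumes "public" means "bound to every interface" (for example `0.0.0.0` or `[::]`), which is only one possible definition.

```go
package main

import (
	"fmt"
	"net"
)

// IsPublicListener reports whether a listen address such as "0.0.0.0:8080"
// is bound to all interfaces. Hypothetical helper, not part of HomeButler.
// Assumption: only wildcard binds count as public; a bind to a specific
// non-loopback IP is not flagged by this sketch.
func IsPublicListener(addr string) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, err
	}
	// ":8080" and "*:8080" both mean "every interface".
	if host == "" || host == "*" {
		return true, nil
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false, fmt.Errorf("not an IP address: %q", host)
	}
	// True for 0.0.0.0 and ::, false for loopback and specific addresses.
	return ip.IsUnspecified(), nil
}

func main() {
	for _, addr := range []string{"0.0.0.0:8080", "127.0.0.1:3000", "[::]:443"} {
		public, err := IsPublicListener(addr)
		if err != nil {
			fmt.Println(addr, "error:", err)
			continue
		}
		fmt.Println(addr, "public:", public)
	}
}
```

The check is a pure function over a string, so it can be tested without touching a real socket table.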

There's a strict mode for cron and CI:

homebutler doctor --strict

And because everything is machine-readable:

homebutler doctor --json

An agent can consume the result directly. No scraping terminal output, no parsing colored text, no guessing how df -h formats its output on a particular distro.
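To make that concrete, here is a rough Go sketch of what the consuming side could look like. The JSON shape is an assumption made up for this example: the field names `host`, `findings`, `check`, `status`, and `message` are hypothetical and not taken from HomeButler's real schema, so check the actual `--json` output before relying on any of them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding is a hypothetical shape for one doctor result.
// Field names are assumptions for illustration only.
type Finding struct {
	Check   string `json:"check"`
	Status  string `json:"status"` // assumed values: "ok" or "warn"
	Message string `json:"message"`
}

// DoctorReport is a hypothetical top-level document.
type DoctorReport struct {
	Host     string    `json:"host"`
	Findings []Finding `json:"findings"`
}

// Warnings decodes a report and keeps only findings whose status is not "ok".
func Warnings(raw []byte) ([]Finding, error) {
	var r DoctorReport
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, fmt.Errorf("decode doctor report: %w", err)
	}
	var out []Finding
	for _, f := range r.Findings {
		if f.Status != "ok" {
			out = append(out, f)
		}
	}
	return out, nil
}

func main() {
	// Invented sample that mirrors the human-readable output shown earlier.
	raw := []byte(`{"host":"mac-mini","findings":[
		{"check":"cpu","status":"ok","message":"CPU looks normal"},
		{"check":"disk","status":"warn","message":"Disk usage is high: 91%"}]}`)

	warns, err := Warnings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, w := range warns {
		fmt.Printf("%s: %s\n", w.Check, w.Message)
	}
}
```

The agent only ever sees typed fields, so its job shrinks to explaining the findings rather than interpreting terminal text.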

Report vs Doctor

After using both for a while, I've started thinking about HomeButler as having two distinct kinds of answers.

report answers: "What changed since last time?"

homebutler report

It saves snapshots, compares the current state with the previous one, and summarizes notable changes.
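The core of that comparison can be sketched in a few lines of Go. This is an illustration of the idea, not the way report is actually implemented: the `Snapshot` type, the metric names, and the `Diff` function are all hypothetical, and a real snapshot would carry more than flat numbers.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a hypothetical flat record of numeric metrics at one point in time.
type Snapshot map[string]float64

// Diff returns one line per metric in curr that is new or has changed since prev.
// Keys are sorted so the output order is stable between runs.
// Metrics that disappeared from curr are not reported in this sketch.
func Diff(prev, curr Snapshot) []string {
	keys := make([]string, 0, len(curr))
	for k := range curr {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var changes []string
	for _, k := range keys {
		old, seen := prev[k]
		switch {
		case !seen:
			changes = append(changes, fmt.Sprintf("%s: new (%g)", k, curr[k]))
		case old != curr[k]:
			changes = append(changes, fmt.Sprintf("%s: %g → %g", k, old, curr[k]))
		}
	}
	return changes
}

func main() {
	// Invented values for illustration.
	prev := Snapshot{"running_containers": 4, "disk_used_gb": 120}
	curr := Snapshot{"running_containers": 3, "disk_used_gb": 120}
	for _, line := range Diff(prev, curr) {
		fmt.Println(line) // prints "running_containers: 4 → 3"
	}
}
```

Unchanged metrics produce no output at all, which is what keeps the summary short.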

doctor answers: "What looks risky right now?"

homebutler doctor

One looks backward. One looks at the current risk. Together they're much more useful than raw metrics, and honestly more useful than most of the dashboards I've run.

The pattern I keep coming back to

The longer I work on this, the more I think the framing matters more than the features:

AI agents don't need more power by default. They need better tools.

A shell gives an agent maximum power and maximum ambiguity. A narrow tool gives it less power but more meaning. For homelab ops, that tradeoff feels right to me. I don't want an agent freely roaming my server. I want it to ask specific, bounded questions:

homebutler doctor --json
homebutler report --json
homebutler inventory scan --json
homebutler backup drill uptime-kuma --json

And then explain the result in plain language.

That's a very different security model from "here is SSH, good luck."
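One way to picture that model is an exact-match allowlist sitting between the agent and the binary. The Go sketch below is an assumption about how such a gate could work, not a description of HomeButler's internals. The names `allowed` and `Permit` are hypothetical, and the entries are limited to three of the commands listed above.

```go
package main

import "fmt"

// allowed is the complete set of argument lists the agent may request.
// Hypothetical gate for illustration; anything not listed is rejected.
var allowed = [][]string{
	{"doctor", "--json"},
	{"report", "--json"},
	{"inventory", "scan", "--json"},
}

// equal reports whether two argument lists match element by element.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// Permit reports whether args exactly matches an allowlisted invocation.
// Arguments are compared as a list and never joined into a shell string,
// so extra tokens such as ";" or "&&" simply fail to match.
func Permit(args []string) bool {
	for _, a := range allowed {
		if equal(a, args) {
			return true
		}
	}
	return false
}

func main() {
	requests := [][]string{
		{"doctor", "--json"},
		{"doctor", "--json", ";", "rm", "-rf", "/"},
		{"sh", "-c", "df -h"},
	}
	for _, r := range requests {
		fmt.Println(r, "permitted:", Permit(r))
	}
}
```

A permitted list would then be passed to the binary as separate arguments rather than through a shell, so there is no command string for the agent to inject into.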

Try it

HomeButler is a single Go binary. No daemon, no database, no always-on service.

brew tap Higangssh/homebutler
brew install homebutler

Or:

curl -fsSL https://raw.githubusercontent.com/Higangssh/homebutler/main/install.sh | sh

Then:

homebutler init
homebutler doctor
homebutler report

Repo: https://github.com/Higangssh/homebutler

I'm still building this around one idea: the future of AI-assisted ops shouldn't be "give the agent a shell." It should be "give the agent a tool that says what it means."

Genuine question for anyone running a homelab: what's the last thing that broke in your setup without warning? I'm collecting ideas for what doctor should check next.

Top comments (16)

Mykola Kondratiuk • May 25

ran into the same reluctance - giving an AI agent root over my homelab felt like handing it to an intern on day one. the doctor pattern is cleaner: define the safe surface explicitly, the agent stays in bounds

SangheeSon • May 28

Thanks, "intern on day one" is exactly the feeling. I just wanted a way to manage my server safely — letting an AI touch it through SSH felt way too risky. So instead of giving it a shell, I gave it a fixed set of commands that only read things or do safe ops. The agent stays in bounds because the bounds are the only thing it can see.

Mykola Kondratiuk • May 29

that constraint-by-design approach is cleaner than permission scoping - the agent can't exceed scope because the scope is its entire API surface. the edge I keep hitting is safe vs unsafe ops isn't always clear at build time. you decide 'service restart is safe' until it cascades. do you revisit the allowed command list after incidents or keep it static?

SangheeSon • Jun 30

Static for now — I haven't built anything that adjusts the list after an incident, and you're right that the cascade case is the one you can't predict up front.

Honestly, the safety today comes down to one thing: there's no shell. The agent can only run a short, fixed set of commands and literally can't do anything outside that. I do mark the riskier actions, but that's just a label right now — nothing in the code actually blocks or asks before running them yet. Making that real, and reworking the list when something bites, is what I want to do next.

Mykola Kondratiuk • Jun 30

no shell is the cleaner guarantee - the list can be wrong but the agent can't wander outside it. the label without enforcement is the part i'd revisit - it implies a tier that doesn't actually exist yet.

Andy Stewart • May 18

Spot on! Handing a blind weapon like SSH to an AI agent is a disaster waiting to happen.
Local ops demand strict boundary control and determinism. Using Go to build structured, read-only APIs lets the AI act as the 'interpreter' rather than the 'operator'—this is true Local-First architecture. Great work!

Klaudia Grzondziel • May 19

Interesting idea! Only today I had a heated discussion with my colleagues about security considerations around using AI and how much you should allow it. I think many people are still not aware of the risks it can bring. Will make sure to read your previous article.

Good luck with your project, it looks impressive! 👏🏻

SangheeSon • May 20

Thanks Klaudia! Glad the timing lined up with your discussion 😄 Hope the previous article is useful too!

Harjot Singh • May 30

"I don't want to give Claude SSH access" is a healthy instinct, and building a read-only doctor instead is the right pattern. The principle generalizes: give the agent rich observability and zero (or tightly scoped) mutation authority. Let it diagnose all day; make a human or a constrained, audited action path do the actual changing.

This is the same trust-boundary idea that makes agents safe anywhere - the danger isn't the model reasoning about your homelab, it's the model with a shell and sudo acting on a wrong conclusion. A doctor that reports findings keeps the failure mode at "bad advice you can ignore" instead of "rm -rf on prod." Smart build. Curious if you gave it any guarded remediation actions, or kept it strictly diagnostic?

SangheeSon • Jun 30

It can do more than diagnose — over MCP it can also restart, stop, restore, or remove a container/app. Those are tagged read/write/destructive in the code.

Honest part: that tagging is basically labels right now, not something the code enforces, and there's no built-in confirmation step. What actually keeps it safe is that there's no shell — the AI can only call a short, fixed list of commands, nothing arbitrary. A real read-only mode that enforces those tiers is what I want to add next.

Shahzaib • May 20

The "interpreter vs operator" framing is gold. That's exactly the trade-off I've been struggling with for my own homelab. A shell is a weapon with tab completion, as you put it, and I'm starting to think the right path isn't "better SSH security" but "bounded, read-only tools" like what you built here.

homebutler doctor --json is a beautiful interface for an agent. It's a perfect example of giving the AI less power but more meaning.

Are you planning to add any sort of "change detection" beyond the report command? Like alerting when a specific metric crosses a threshold without needing a full dashboard?

SangheeSon • May 28 • Edited

Glad that framing landed. And yeah, change detection is basically the direction I’m pushing toward — not just a small feature on top.

Threshold alerts already exist today: homebutler alerts --watch polls CPU, memory, and disk, then sends Telegram/webhook notifications when limits are crossed. Defaults are 90/85/90, so it works without a dashboard.

report also keeps snapshots and diffs against the previous run, so you get changes like “Running containers: 4 → 3” or “Disk /: +2.3 GB since last report.” And watch handles Docker events, pre-crash logs, and restart flapping.

The harder part is drift detection. A container restarting five times is easy to flag. But memory slowly growing 1% every day for two weeks? That’s the thing I want to surface without pulling in a full TSDB. I’m thinking about something lightweight on top of existing snapshots — open to ideas.

Demi AI • May 22

Quite interesting!

SangheeSon • May 28

Thank you! 🙏
