SangheeSon

Posted on Apr 29

I don't want to give Claude SSH access to my home server

#opensource #ai #go #selfhosted

AI agents are getting scary good at ops work.

Claude Code, Codex, OpenHands — they can SSH into a server, read logs, restart containers, fix configs. It works.

But I still won't give an agent an SSH key to my home server.

A shell is too sharp. It can inspect anything. It can delete anything. It can rm -rf the wrong path very fast and very confidently.

I want the agent to help operate my server, not be the operator.

So I've been building HomeButler in the opposite direction: not "AI runs my server," but "AI talks to my server through a narrow, structured interface."

The shape of the tool

HomeButler is a single Go binary. No daemon. No database. No always-on web service.

You can copy it onto a server, run it from cron, pipe it into a script, open the web dashboard, or hook it into Claude Desktop via MCP. Same binary, same core, different surfaces.

┌──────────────────────────────────────────────────┐
│  Layer 3 — Chat Interface                        │
│  Telegram · Slack · Discord · Terminal · Browser │
└──────────────────────┬───────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────┐
│  Layer 2 — AI Agent                              │
│  Claude · LangChain · n8n · OpenClaw             │
└──────────────────────┬───────────────────────────┘
                       │  CLI exec or MCP (stdio)
┌──────────────────────▼───────────────────────────┐
│  Layer 1 — Tool (homebutler)   ← YOU ARE HERE    │
│                                                  │
│   CLI · MCP · Web    → same internal/ core       │
│   system · docker · ports · backup · watch       │
└──────────────────────────────────────────────────┘

The agent gets explicit, JSON-returning operations. It can't reach outside this surface. The blast radius of a wrong call is bounded.

I sleep better.

Three things HomeButler does that I actually care about

There's a long feature list in the README. Three of them are why I keep working on this.

1. `report` — what changed, not just what is

Most homelab dashboards show state. Beautiful graphs, lots of dots.

But my home server doesn't fail because I lack data. It fails because I don't notice the right change at the right time.

🏠 Homebutler Report — mac-mini

── Current Status ──
   CPU: 5.0% · Memory: 8.3/16.0 GB · Disk: 4%
   Containers: 1 running, 1 stopped · Public ports: 5

── Needs Attention ──
   ⚠️  1 container(s) stopped

── Notable Changes ──
   No significant changes since last report.

── Suggested Actions ──
   → Address items in 'Needs attention' above.

The first run records a baseline. Every run after that diffs against the last snapshot and tells you what moved. Add --json to pipe the output into an agent.

What I want it to answer is just three things:

What looks wrong?
What changed?
What should I check next?

Not "here's a graph, you figure it out."

2. `watch` — the 3 AM container death problem

Your container crashed at 3 AM. Why? By morning, the logs are gone, the container is back up, and you have no idea what happened.

watch catches the moment it dies, saves the pre-death and post-restart logs, and analyzes the cause:

[03:14:22] INCIDENT: nginx (incident nginx-20260410-031422-7a2124)
  Crash: OOM — process killed by SIGKILL (oom, confidence: high)
  ⚠ FLAPPING: acute (3 restarts in short window)

It combines exit codes (137 = SIGKILL, which usually means the kernel OOM killer; 139 = segfault; 143 = SIGTERM) with log pattern matching (panic:, Out of memory, FATAL) to categorize the crash. Flapping detection flags processes stuck in restart loops: acute (3+ restarts in 10 min) or chronic (5+ in 24h).

Supports Docker (real-time event stream), systemd, and PM2.
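For the Docker case, `docker events --format '{{json .}}'` prints one JSON object per line. Container `die` events carry the exit code in `Actor.Attributes.exitCode`. This sketch parses that stream with the standard library only. The post doesn't say whether HomeButler uses the CLI or the Docker SDK, so treat this as one possible approach.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"strconv"
)

// dockerEvent holds the fields needed from one line of
// `docker events --format '{{json .}}'`.
type dockerEvent struct {
	Type   string `json:"Type"`
	Action string `json:"Action"`
	Actor  struct {
		Attributes map[string]string `json:"Attributes"`
	} `json:"Actor"`
}

// ContainerDeath is a container that exited, with its exit code.
type ContainerDeath struct {
	Name     string
	ExitCode int
}

// ParseDeath decodes one event line and reports whether it is a container "die" event.
func ParseDeath(line []byte) (ContainerDeath, bool) {
	var ev dockerEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return ContainerDeath{}, false
	}
	if ev.Type != "container" || ev.Action != "die" {
		return ContainerDeath{}, false
	}
	code, err := strconv.Atoi(ev.Actor.Attributes["exitCode"])
	if err != nil {
		code = -1 // attribute missing or malformed
	}
	return ContainerDeath{Name: ev.Actor.Attributes["name"], ExitCode: code}, true
}

// ReadDeaths scans newline-delimited events and calls fn for every "die" event.
func ReadDeaths(r io.Reader, fn func(ContainerDeath)) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if d, ok := ParseDeath(sc.Bytes()); ok {
			fn(d)
		}
	}
	return sc.Err()
}
```

To run it live, pass `ReadDeaths` the stdout of `exec.Command("docker", "events", "--filter", "type=container", "--filter", "event=die", "--format", "{{json .}}")`.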

homebutler watch add nginx           # interactive: pick docker / systemd / pm2
homebutler watch start               # foreground monitoring
homebutler watch history             # list past incidents
homebutler watch show <incident-id>  # full crash report

This is the feature I built for myself first. Every homelabber has had a 3 AM container death they couldn't explain.

3. `backup drill` — "having a backup" vs "being able to restore"

These are different things. I learned this the hard way.

backup drill boots your backup in an isolated Docker network, runs an HTTP health check against the booted app, and tears it all down. Like a fire drill for your data.

🔍 Backup Drill — uptime-kuma

  📦 Backup: backup_2026-04-04_1711.tar.gz (18.6 MB)
  🔐 Integrity: ✅ tar valid (8 files)
  🚀 Boot: ✅ container started in 0s
  🌐 Health: ✅ HTTP 200 on port 58574
  ⏱️  Total: 2s

  ✅ DRILL PASSED

Zero risk to your running services — completely isolated environment, random port, separate network. If the drill fails, your backup is theater. Better to find out today than on the day you actually need it.
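One common way to get a random free port on the host is to listen on port 0 and let the kernel choose. This sketch shows that technique and assumes the drill then publishes the container on the chosen port. It is not necessarily how HomeButler picks its port.

```go
package main

import "net"

// FreePort asks the kernel for an unused TCP port on localhost by listening on
// port 0, then releases it so a drill container can be published there.
func FreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}
```

The returned port could then be passed to something like `docker run -p 127.0.0.1:<port>:<container-port> --network <drill-network> ...`. The listener is closed before Docker binds, so another process could grab the port in between. A careful caller retries on a bind failure.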

Why this shape, not another dashboard

Portainer, Netdata, CasaOS, Uptime Kuma — these are all great. I run several of them. They solve real problems.

But on a small home server, my bottleneck isn't observability. It's me, the operator, with five tabs open trying to remember what "normal" looked like yesterday.

A small structured CLI fits that gap better than another tab:

Cron-friendly — homebutler report --json | jq at 8 AM with my coffee
Script-friendly — exit codes that mean something, JSON everywhere
Agent-friendly — MCP server built in, works with Claude Desktop / ChatGPT / Cursor out of the box
Air-gap friendly — single binary, no daemon, no phone-home
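Script-friendliness comes down to two things a caller can rely on: parseable stdout and a meaningful exit code. Here is a minimal Go sketch of a wrapper that captures both. The JSON keys and the meaning of each exit code belong to whatever tool is called (for example `RunJSON("homebutler", "report", "--json")`) and are not assumed here.

```go
package main

import (
	"encoding/json"
	"errors"
	"os/exec"
)

// RunJSON runs a command, decodes its stdout as a JSON object, and returns the
// process exit code alongside it. A non-zero exit is not treated as an error,
// because script-friendly tools use exit codes to signal findings.
func RunJSON(name string, args ...string) (map[string]any, int, error) {
	out, err := exec.Command(name, args...).Output()
	code := 0
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		code = exitErr.ExitCode()
	} else if err != nil {
		return nil, -1, err // command not found, not executable, etc.
	}
	var doc map[string]any
	if err := json.Unmarshal(out, &doc); err != nil {
		return nil, code, err
	}
	return doc, code, nil
}
```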

The dashboard is also there (homebutler serve starts a web UI bundled into the binary with go:embed), but it's opt-in. The default mode is "command you run, answer you get, exit."

The bigger bet

I think the next interesting interface for ops isn't a prettier dashboard. It's the conversation:

"Is anything weird with the server?"

(Claude calls homebutler report via MCP, reads the JSON, summarizes)

"Yeah — Vaultwarden has been restarting every ~2 hours since yesterday. Want me to pull the crash logs?"

For that to be safe, the agent needs a narrow tool, not a shell. That's what HomeButler is becoming.
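An MCP client such as Claude Desktop talks to a stdio server by writing JSON-RPC 2.0 messages, one per line. This sketch shows only the framing of a single `tools/call` request. A real client first completes MCP's `initialize` handshake. The tool name `report` is a hypothetical stand-in for whatever HomeButler actually registers.

```go
package main

import (
	"encoding/json"
	"io"
)

// rpcRequest is a JSON-RPC 2.0 request, the message format MCP uses.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// toolCallParams are the params of an MCP "tools/call" request.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// EncodeToolCall builds one newline-terminated "tools/call" message.
// json.Marshal escapes newlines inside strings, so the message stays on a
// single line, as the stdio transport requires.
func EncodeToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	if args == nil {
		args = map[string]any{}
	}
	msg, err := json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
	if err != nil {
		return nil, err
	}
	return append(msg, '\n'), nil
}

// WriteToolCall sends the request to a server's stdin (or any io.Writer).
func WriteToolCall(w io.Writer, id int, tool string, args map[string]any) error {
	msg, err := EncodeToolCall(id, tool, args)
	if err != nil {
		return err
	}
	_, err = w.Write(msg)
	return err
}
```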

Try it

brew install Higangssh/homebutler/homebutler

Or:

curl -fsSL https://raw.githubusercontent.com/Higangssh/homebutler/main/install.sh | sh

Then:

homebutler init        # interactive setup
homebutler report      # baseline + summary
homebutler watch tui   # terminal dashboard

Repo: https://github.com/Higangssh/homebutler
Site: https://homebutler.dev

If you run a homelab, I'd love to hear: what's the one thing you wish someone would just tell you about your server every morning? That's the next thing I want report to answer.

Top comments (2)

PEACEBINFLOW • May 4

The "report — what changed, not just what is" idea gets at something I've felt but never articulated about the difference between monitoring tools and actual situational awareness. Most dashboards are answer engines for questions you already know to ask. They'll tell you CPU is at 85% if you go look. But the moment you most need help — something subtle shifted while you weren't watching — they're silent, because they're designed to show state, not detect drift.

The baseline-and-diff approach is interesting because it inverts that. Instead of you polling the system, the system tells you what moved relative to a known good state. That's closer to how a human operator actually thinks: "something feels different, I just can't place what." The hard part, I imagine, is tuning what counts as a notable change versus noise. A container restart is obvious. But a 2% memory increase that keeps climbing? A new process that appeared in the process list? Those are the things that might matter or might be nothing, and the difference between a useful report and an annoying one lives in that threshold.

I'm curious how you're handling that calibration. Is it manual — you define what's notable — or does the tool start to learn what "normal" looks like over multiple snapshots and flag deviations statistically? Because the latter is where this gets really powerful and also where it risks becoming the very complexity you're trying to avoid.

SangheeSon • May 6

Thanks for taking the time to read it closely — you articulated the tension really well.

Right now, HomeButler is intentionally on the simple side. It does not “learn normal” statistically yet. The current report flow takes a snapshot, compares it with the previous snapshot, and flags concrete changes like stopped containers, public port count changes, disk usage deltas, and high memory/disk usage.

I’m being careful about this because I don’t want the tool to become a black box. If HomeButler says something is notable, I think the user should be able to understand why without trusting some hidden model.

So for now, the sweet spot I’m aiming for is:

• obvious operational changes should be detected automatically
• noisy metrics should stay quiet unless they cross simple, explainable thresholds
• the output should tell you what changed and what to check next

Longer term, I do want HomeButler to become more agentic — something that can notice patterns over time and help reason about them. But I’m still thinking about how to get there without making the system feel complicated or opaque.
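The "simple, explainable thresholds" idea from the reply can be sketched as a small table of rules. Every rule that fires reports its value and its threshold. The metric names and threshold values below are illustrative assumptions, not HomeButler's defaults.

```go
package main

import "fmt"

// Usage is a hypothetical subset of one snapshot's metrics.
type Usage struct {
	MemPercent        float64
	DiskPercent       float64
	StoppedContainers int
}

// Rule is one explainable check: a name, a threshold, and the metric it reads.
type Rule struct {
	Name      string
	Threshold float64
	Value     func(Usage) float64
}

// DefaultRules uses illustrative thresholds, not HomeButler's real ones.
var DefaultRules = []Rule{
	{"memory usage (%)", 90, func(u Usage) float64 { return u.MemPercent }},
	{"disk usage (%)", 85, func(u Usage) float64 { return u.DiskPercent }},
	{"stopped containers", 1, func(u Usage) float64 { return float64(u.StoppedContainers) }},
}

// NeedsAttention returns one sentence per rule that fired, stating the value
// and the threshold, so the user can see exactly why something was flagged.
func NeedsAttention(u Usage, rules []Rule) []string {
	var out []string
	for _, r := range rules {
		if v := r.Value(u); v >= r.Threshold {
			out = append(out, fmt.Sprintf("%s is %.0f (threshold %.0f)", r.Name, v, r.Threshold))
		}
	}
	return out
}
```

Because each rule is plain data, a user can read or adjust the whole policy in one place. That keeps the report from becoming a black box.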