linou518

Posted on Feb 26

Rebuilding My Home Lab Dashboard: From 4 Nodes to 7, SSH Key Distribution, and Status Badges

#homelab #dashboard #python #ssh

I recently did a major refactor of my home lab Dashboard. The trigger was simple: the number of nodes I manage had grown, but I had left the old 4-node setup untouched, and the guilt finally caught up with me.

Problems with the Old Setup

The old Dashboard (v1) managed these 4 nodes:

T440 (joe)
GMK1 through GMK3

In reality, the Dashboard should have been tracking 7 machines: joe plus 6 other nodes in active operation. That count excludes infra and web, which handle infrastructure services. The Dashboard was completely out of sync with what was actually running.

Migrating to the New Setup

Updating the Node Definition

NODES = [
    {"id": "joe",      "ip": "192.168.x.x", "hostname": "joe",      "user": "openclaw"},
    {"id": "jack",     "ip": "192.168.x.x", "hostname": "jack",     "user": "openclaw"},
    {"id": "work-a",   "ip": "192.168.x.x", "hostname": "work-a",   "user": "openclaw"},
    {"id": "work-b",   "ip": "192.168.x.x", "hostname": "work-b",   "user": "openclaw"},
    {"id": "hobby",    "ip": "192.168.x.x", "hostname": "hobby",    "user": "openclaw"},
    {"id": "family",   "ip": "192.168.x.x", "hostname": "family",   "user": "openclaw"},
    {"id": "personal", "ip": "192.168.x.x", "hostname": "personal", "user": "openclaw"},
]

The infra and web nodes are intentionally excluded from Dashboard management. Architecturally, keeping infrastructure services under separate management makes for cleaner separation of concerns.

The SSH Key Distribution Problem

This is where I got stuck the longest. SSH from T440 (joe) to the other nodes wasn't working, so subscription info couldn't be fetched.

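# Bulk key distribution via sshpass (expects the login password in $PASS)
for host in jack work-a work-b hobby family personal; do
    # StrictHostKeyChecking=no skips host key verification; acceptable only on a trusted LAN
    sshpass -p "$PASS" ssh-copy-id -o StrictHostKeyChecking=no "openclaw@${host}"
done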

Using ssh-copy-id via sshpass is a bit rough around the edges, but it's a home lab, so pragmatism wins. In production, I'd use a proper provisioning tool.
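Once the keys are distributed, it is worth confirming that key-based login really works before building anything on top of it. The following is a minimal sketch, not the Dashboard's actual code: the helper names build_ssh_check_cmd and check_ssh_access are hypothetical, and it assumes the OpenSSH client is on the PATH. BatchMode=yes makes ssh fail instead of prompting for a password, so a zero exit status means the key was accepted.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Hostnames and user taken from the article's loop and NODES list.
HOSTS = ["jack", "work-a", "work-b", "hobby", "family", "personal"]
SSH_USER = "openclaw"


def build_ssh_check_cmd(user: str, host: str, timeout_s: int = 5) -> List[str]:
    """Build an ssh command that succeeds only if key-based login works."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never fall back to a password prompt
        "-o", f"ConnectTimeout={timeout_s}",   # give up quickly on dead hosts
        f"{user}@{host}",
        "true",                                # trivial remote command
    ]


def check_ssh_access(
    hosts: Sequence[str],
    user: str = SSH_USER,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, bool]:
    """Return {host: True/False} for key-based SSH reachability."""
    results: Dict[str, bool] = {}
    for host in hosts:
        try:
            proc = runner(build_ssh_check_cmd(user, host), capture_output=True, timeout=15)
            results[host] = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # Timed out, or the ssh binary could not be started.
            results[host] = False
    return results
```

Calling check_ssh_access(HOSTS) from joe returns a dict of hostnames to booleans, so any host that still needs a key shows up as False. The runner parameter exists only so the function can be exercised without a network.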

Frontend Improvements

From Dots to Status Badges

Previously, a small dot (●) indicated online/offline status, but the visual feedback was poor. I switched to a badge format:

.status-badge {
    display: inline-flex;
    align-items: center;
    gap: 4px;
    padding: 2px 8px;
    border-radius: 12px;
    font-size: 11px;
    font-weight: 600;
}

.status-badge.online  { background: rgba(74,222,128,.15); color: #4ade80; }
.status-badge.offline { background: rgba(248,113,113,.15); color: #f87171; }
.status-badge.warning { background: rgba(251,191,36,.15);  color: #fbbf24; }

I consolidated the states into three: Running, Unreachable, and Stopped. The "warning" style is applied when the ping response time exceeds 500 ms.
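The threshold logic can be sketched on the backend as follows. This is an illustration under assumptions, not the actual implementation: the function name badge_class is hypothetical, and the article does not say how a missing ping reply is represented, so this sketch uses None for "no reply" and maps it to the offline class.

```python
from typing import Optional

WARNING_THRESHOLD_MS = 500.0  # from the article: warn when ping exceeds 500 ms


def badge_class(latency_ms: Optional[float]) -> str:
    """Map a ping result to one of the .status-badge modifier classes.

    latency_ms is None when the ping got no reply (an assumed convention).
    """
    if latency_ms is None:
        return "offline"
    if latency_ms > WARNING_THRESHOLD_MS:
        return "warning"
    return "online"
```

The returned string matches the CSS modifier classes above, so a template can emit it directly next to status-badge.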

Agent Tag Display

Each node card now shows which OpenClaw agents are running on that node.

// node_subscriptions.json (excerpt)
{
    "work-b": {
        "agents": ["techsfree-web", "techsfree-hr", "techsfree-fr"]
    }
}

I added a get_node_subscription() function on the backend that merges subscription data into the node API response:

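import json

SUBSCRIPTIONS_FILE = "node_subscriptions.json"  # location is an assumption; adjust to your layout

def get_node_subscription(node_id: str) -> dict:
    """Return the subscription entry for node_id, or {} if none is available."""
    try:
        with open(SUBSCRIPTIONS_FILE, encoding="utf-8") as f:
            data = json.load(f)
        return data.get(node_id, {})
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: report no subscriptions instead of failing.
        return {}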
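The merge step itself might look like the sketch below. The helper name build_node_payload and the shape of the API response are assumptions for illustration; only the node fields and the "agents" key come from the excerpts above.

```python
from typing import Any, Dict


def build_node_payload(node: Dict[str, Any], subscription: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the node definition with its agent list attached."""
    payload = dict(node)  # shallow copy so the NODES entry itself is not mutated
    payload["agents"] = list(subscription.get("agents", []))
    return payload


# Example with the work-b data shown earlier.
node = {"id": "work-b", "ip": "192.168.x.x", "hostname": "work-b", "user": "openclaw"}
subscription = {"agents": ["techsfree-web", "techsfree-hr", "techsfree-fr"]}
payload = build_node_payload(node, subscription)
```

In the real backend, the subscription argument would be the result of get_node_subscription(node["id"]). A node with no entry simply gets an empty agents list, which keeps the frontend from having to handle a missing key.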

What I Removed

I removed the bot count stat, a number showing how many bots were running on each node. It was completely redundant with the agent tag display, and a raw number without context wasn't meaningful.

My rule for removal: "Would anyone suffer if this information disappeared?" If the answer is no, cut it.

Switching to a 4-Column Grid

To display 7 nodes cleanly, I switched to a 4-column grid layout:

.node-grid {
    display: grid;
    grid-template-columns: repeat(4, 1fr) !important;
    gap: 16px;
}

@media (max-width: 1200px) {
    .node-grid { grid-template-columns: repeat(2, 1fr) !important; }
}
@media (max-width: 900px) {
    .node-grid { grid-template-columns: 1fr !important; }
}

The !important declarations are there to resolve conflicts with existing CSS variables. Ideally, I'd clean up the design properly, but that's out of scope for today.

Lessons Learned

The longer you let a dashboard drift from reality, the harder it is to fix. Build the habit of updating it whenever nodes change.
Verify SSH connectivity before building SSH-dependent features. Discovering SSH doesn't work after you've built out the feature is a painful waste of time.
Dot indicators have terrible visibility. Badges with text labels are clearer and more accessible, especially for users with color vision deficiencies.

The next step is implementing node alerting. I want notifications to fire when a node goes unreachable.

Tags: homelab, dashboard, python, flask, css, ssh, nodejs

DEV Community