Koala

Posted on May 29

Onyx: I Built a Hermes Agent That Runs My Entire Server While I Sleep

#hermesagentchallenge #devchallenge #agents

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge

What I Built

Onyx is an autonomous infrastructure operator running 24/7 on my droplet.

He manages my entire stack: 6 Next.js deployments, 5 Docker containers, a Minecraft server, fail2ban, Nginx, and UFW. He also helps me write my undergraduate thesis.

The difference from every other "AI agent" project I've seen: Onyx doesn't wait for commands. He surfaces problems, patches vulnerabilities, and pushes work forward on his own. When I wake up, there's a session log waiting for me, not a to-do list.

The core idea: graduate an AI agent from assistant to operator. A chatbot with tools bolted on doesn't cut it. I wanted something that runs infrastructure while I'm eating dinner, asleep, or in class.

Demo

Onyx operates through Discord. A normal week:

🔴 3 AM — gateway process failure, no wake-up required

A gateway process had a stale PID. Onyx detected it, diagnosed the root cause, restarted it cleanly, and wrote a session log. I found out in the morning. Zero human intervention, zero downtime.
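Stale-PID detection is easy to sketch. The post does not describe how Onyx does it. The following is a minimal, POSIX-only illustration that assumes the gateway writes its PID to a pidfile. The pidfile and the `restart` callback are hypothetical stand-ins for whatever actually supervises the gateway.

```python
import os
from pathlib import Path
from typing import Callable


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def is_stale_pidfile(pidfile: Path) -> bool:
    """True if the pidfile exists but does not name a live process."""
    if not pidfile.exists():
        return False  # nothing recorded, nothing stale
    try:
        pid = int(pidfile.read_text().strip())
    except ValueError:
        return True  # unparseable contents count as stale
    return not pid_is_alive(pid)


def recover(pidfile: Path, restart: Callable[[], None]) -> bool:
    """Remove a stale pidfile and call restart(). Returns True if recovery ran."""
    if not is_stale_pidfile(pidfile):
        return False
    pidfile.unlink()
    restart()
    return True
```

A PID can be recycled by an unrelated process, so a production check would also compare the process name or start time before trusting a "live" answer.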

🟡 Dinner — 9 CVEs found across Docker containers

While I was eating, Onyx ran a routine audit, found 9 CVEs, rebuilt 3 container images from fresh base images, patched Python dependencies, hardened fail2ban (raising the ban time from 600 seconds to 24 hours), and verified that every container came back healthy.
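fail2ban's stock `bantime` is 10 minutes (600 seconds), so moving to 24 hours is a 144-fold increase. Here is a minimal sketch of writing that override, assuming it lives in `jail.local` under `[DEFAULT]`. The post does not say which jails were changed, so the section name is a parameter.

```python
import configparser
import io

OLD_BANTIME = 600           # fail2ban's default bantime: 10 minutes
NEW_BANTIME = 24 * 60 * 60  # 24 hours = 86400 seconds


def render_jail_local(bantime: int, section: str = "DEFAULT") -> str:
    """Render a jail.local override that sets bantime (in seconds) in one section."""
    cfg = configparser.ConfigParser()
    cfg[section] = {"bantime": str(bantime)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(f"increase: {NEW_BANTIME // OLD_BANTIME}x")
    print(render_jail_local(NEW_BANTIME))
```

fail2ban reads `jail.local` on top of `jail.conf`, so the stock file stays untouched, and `fail2ban-client reload` picks up the change.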

🟢 "Fix it" — two words, full tunneling deployment

My friends in Indonesia couldn't connect to the Minecraft server because their ISPs use carrier-grade NAT. I sent Onyx "fix it." He researched solutions, selected playit.gg, installed the tunneling agent, configured a systemd service, and optimized TCP keepalive parameters. All autonomous.
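Carrier-grade NAT devices drop mappings for idle connections, which is why keepalive tuning matters for a long-lived tunnel. The post does not list the values Onyx chose. The following sketch assumes the knobs were the kernel-wide Linux keepalive sysctls. It renders a drop-in and computes how long an idle connection takes to notice a dead peer. The tuned numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keepalive:
    time: int    # idle seconds before the first probe
    intvl: int   # seconds between unanswered probes
    probes: int  # unanswered probes before the connection is dropped

    def detection_seconds(self) -> int:
        """Worst-case time for an idle connection to notice a silently dead peer."""
        return self.time + self.intvl * self.probes

    def as_sysctl(self) -> str:
        """Render an /etc/sysctl.d/ drop-in for the Linux keepalive knobs."""
        return (
            f"net.ipv4.tcp_keepalive_time = {self.time}\n"
            f"net.ipv4.tcp_keepalive_intvl = {self.intvl}\n"
            f"net.ipv4.tcp_keepalive_probes = {self.probes}\n"
        )


LINUX_DEFAULT = Keepalive(time=7200, intvl=75, probes=9)
TUNED = Keepalive(time=60, intvl=10, probes=6)  # hypothetical values, not from the post
```

Cutting the idle timer from two hours to one minute also means a keepalive packet crosses the NAT every minute, which keeps the mapping fresh. A drop-in like this is applied with `sysctl --system`.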

🧠 Accountability loop

Onyx noticed I kept asking for things but not acting on the output. He surfaced it: "You keep opening new loops and not closing them."

He was right. Now when I open a loop, Onyx tracks it until it's closed or explicitly shelved.
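The rule that a loop stays open until it is closed or explicitly shelved is a small state machine. Here is a minimal sketch of how such a tracker could work. The class names and the staleness threshold are hypothetical, not Onyx's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum


class LoopState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    SHELVED = "shelved"


@dataclass
class Loop:
    title: str
    opened_at: datetime
    state: LoopState = LoopState.OPEN


@dataclass
class LoopTracker:
    loops: list[Loop] = field(default_factory=list)

    def open(self, title: str, now: datetime) -> Loop:
        loop = Loop(title, now)
        self.loops.append(loop)
        return loop

    def resolve(self, title: str, state: LoopState) -> None:
        """Close or shelve an open loop; these are the only two allowed exits."""
        if state is LoopState.OPEN:
            raise ValueError("resolve() needs CLOSED or SHELVED")
        for loop in self.loops:
            if loop.title == title and loop.state is LoopState.OPEN:
                loop.state = state
                return
        raise KeyError(title)

    def overdue(self, now: datetime, max_age: timedelta) -> list[str]:
        """Titles of loops still open after max_age: candidates for a nudge."""
        return [loop.title for loop in self.loops
                if loop.state is LoopState.OPEN and now - loop.opened_at > max_age]
```

Shelving is a deliberate exit rather than silence, so a shelved loop never shows up in `overdue` again.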

📚 Thesis research partner

I'm finishing my undergraduate thesis on emotional design in e-commerce UX — Norman's 3-level model applied to TikTok Shop, PLS-SEM with G*Power sample sizing. Onyx stays in the thesis workspace across sessions: tracking academic papers, organizing findings against my 6-hypothesis research model, and drafting sections.

When I disappear for days, he nudges me. When I return, he picks up where we left off. No re-explaining, no context lost.

Code

The system is built on Hermes Agent's native extension points.

Skills — the compounding knowledge base

30+ reusable skill files covering Minecraft management, Next.js deployment, VPS security, Discord formatting, and thesis workflows. Each one encodes actual mistakes and actual fixes.

Example: the minecraft-crafty-management skill knows:

Fabric mod compatibility must be checked against version 26.1.2, not 1.21.5
How to parse server logs for TPS degradation
That DCIntegration is broken and mc2discord should be used instead

Every correction I make becomes permanent. The skill library covers practically every routine task I'd otherwise be doing by hand.
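Skill lookup can be illustrated with plain text files. The following sketch assumes a hypothetical layout: a `---`-delimited header of `key: value` lines (`name`, `description`) followed by a Markdown body with a pitfalls list. It ranks skills by crude keyword overlap with the task. Hermes Agent's real skill format and retrieval may differ, so check its documentation.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a '---' header of 'key: value' lines from the Markdown body."""
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta.get("description", ""), body.strip())


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant(skills: list[Skill], task: str) -> list[str]:
    """Return names of skills sharing words with the task, best overlap first."""
    task_words = _tokens(task)
    scored = []
    for skill in skills:
        overlap = len(task_words & _tokens(f"{skill.name} {skill.description}"))
        if overlap:
            scored.append((-overlap, skill.name))
    return [name for _, name in sorted(scored)]


SAMPLE = """---
name: minecraft-crafty-management
description: Manage the Minecraft server through Crafty Controller
---
## Pitfalls
- Check Fabric mod compatibility against 26.1.2, not 1.21.5.
- DCIntegration is broken; use mc2discord instead.
"""
```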

MCP integration — Crafty Controller as a first-class tool

I wrote a custom MCP server for Crafty Controller 4.0 that exposes server status, actions, logs, backups, and console commands as native Hermes Agent tools. Onyx manages Minecraft without ever touching the Crafty dashboard.

# ~/.hermes/scripts/crafty-mcp-server.py
# Exposes Crafty Controller 4.0 API as MCP tools:
# - get_server_status(server_id)
# - send_console_command(server_id, command)
# - get_server_logs(server_id, lines)
# - trigger_backup(server_id)
# - start_server / stop_server / restart_server
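In MCP, a client invokes a tool with a `tools/call` request whose params carry the tool `name` and an `arguments` object. The server replies with a `content` list. The following stdlib-only sketch covers the routing half: it translates tool calls into Crafty HTTP requests without sending them. The endpoint paths and action names are assumptions modeled on Crafty 4's v2 REST API and have not been verified against this server. Logs are left out because the post does not say how `lines` maps onto the API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Assumption: Crafty 4's v2 REST prefix; verify against your instance's API docs.
API = "/api/v2/servers"


@dataclass
class CraftyRequest:
    method: str
    path: str
    body: Optional[str] = None  # raw payload, e.g. a console command


def _action(action: str) -> Callable[[str], CraftyRequest]:
    """Build a handler for a power or backup action (action names are assumptions)."""
    def handler(server_id: str) -> CraftyRequest:
        return CraftyRequest("POST", f"{API}/{server_id}/action/{action}")
    return handler


# Tool name -> handler that takes the tool's arguments as keyword arguments.
TOOLS: dict[str, Callable[..., CraftyRequest]] = {
    "get_server_status": lambda server_id: CraftyRequest("GET", f"{API}/{server_id}/stats"),
    "send_console_command": lambda server_id, command: CraftyRequest(
        "POST", f"{API}/{server_id}/stdin", body=command
    ),
    "trigger_backup": _action("backup_server"),
    "start_server": _action("start_server"),
    "stop_server": _action("stop_server"),
    "restart_server": _action("restart_server"),
}


def handle_tools_call(params: dict[str, Any]) -> CraftyRequest:
    """Route MCP tools/call params ({"name": ..., "arguments": {...}}) to a request."""
    handler = TOOLS.get(params.get("name", ""))
    if handler is None:
        raise ValueError(f"unknown tool: {params.get('name')!r}")
    return handler(**params.get("arguments", {}))


def to_tool_result(payload: Any) -> dict[str, Any]:
    """Wrap a Crafty JSON response as an MCP tool result with one text item."""
    return {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": False}
```

Keeping the tool table as data makes the exposed surface auditable: anything not in `TOOLS` is rejected before a request is built.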

Cron jobs — scheduled autonomous operations

Daily: Minecraft world backups
Every 6 hours: Full health check across all services, delivered to Discord
Every 5 minutes: Incident monitoring with auto-escalation if anomalies detected

All self-contained. All logged. All delivered to Discord.
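The three cadences map onto standard crontab expressions. The following minimal sketch checks which jobs fire at a given minute of the day. It assumes a 04:00 backup time (the post does not give one), which keeps the daily backup off the minutes the 6-hourly health check uses. Hermes Agent's own scheduler syntax is not shown here.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class Job:
    name: str
    every: int       # period in minutes
    offset: int = 0  # minute within the period at which the job fires

    def due(self, minute_of_day: int) -> bool:
        return (minute_of_day - self.offset) % self.every == 0


JOBS = [
    Job("minecraft-backup", every=MINUTES_PER_DAY, offset=4 * 60),  # cron: 0 4 * * *  (hypothetical time)
    Job("health-check", every=6 * 60),                              # cron: 0 */6 * * *
    Job("incident-monitor", every=5),                               # cron: */5 * * * *
]


def due_jobs(minute_of_day: int) -> list[str]:
    """Names of jobs that fire at this minute (0 = midnight)."""
    return [job.name for job in JOBS if job.due(minute_of_day)]
```

Over a day that is 1 backup, 4 health checks, and 288 monitor runs.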

Repository

Config files and skill library: github.com/ko4lax/onyx-backup

How I Used Hermes Agent

Three Hermes Agent capabilities made the difference between a script and an operator.

1. The skill learning loop

Every correction becomes permanent. When Onyx got the Minecraft version mapping wrong, I corrected it once. The skill file updated. It never repeated the mistake.
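Making a correction stick comes down to writing it where the next session will read it. Here is a minimal sketch that assumes a hypothetical `## Pitfalls` section convention in the skill file. It records each lesson once, so repeating a correction does not duplicate it.

```python
from pathlib import Path

PITFALLS = "## Pitfalls"  # hypothetical section heading


def record_correction(skill_file: Path, lesson: str) -> bool:
    """Add a lesson as a bullet under PITFALLS. Returns False if it was already there."""
    lines = skill_file.read_text().splitlines() if skill_file.exists() else []
    bullet = f"- {lesson.strip()}"
    if bullet in lines:
        return False  # already learned; keep the file free of duplicates
    if PITFALLS not in lines:
        if lines:
            lines.append("")  # blank line before the new heading
        lines.append(PITFALLS)
    # Newest lesson goes directly under the heading.
    lines.insert(lines.index(PITFALLS) + 1, bullet)
    skill_file.write_text("\n".join(lines) + "\n")
    return True
```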

The library compounds. Each fix makes every future session better. I'm building a knowledge base that sticks, not fine-tuning a model.

2. Memory architecture (Honcho + LCM)

Onyx builds a persistent model of who I am across sessions:

I hate permission-seeking for routine actions
Discord tables need monospace code blocks to render
My Minecraft server is version 26.1.2, not the latest

I never have to repeat myself. LCM (lossless context management) ensures no session context evaporates when conversations run long, and Honcho provides semantic recall across sessions, so Onyx can answer questions about past work without my re-explaining it.
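The idea behind lossless compaction is that summarizing old turns should shrink the working context without destroying the originals. The following is a generic illustration of that idea, not the LCM or Honcho implementation. Every message goes into an append-only archive. Older turns in the active window are folded into summary nodes that remember their source indices, and `expand` recovers the exact text. `naive_summary` is a hypothetical stand-in for an LLM summarizer.

```python
from dataclasses import dataclass, field
from typing import Union


def naive_summary(messages: list[str]) -> str:
    """Hypothetical stand-in for an LLM summarizer: first 30 chars of each message."""
    return f"{len(messages)} earlier messages: " + "; ".join(m[:30] for m in messages)


@dataclass
class Summary:
    text: str
    source_ids: list[int]  # archive indices this node stands in for


Entry = Union[int, Summary]  # an int is a raw message, referenced by archive index


@dataclass
class Context:
    window: int = 4                                   # max raw messages in the active context
    archive: list[str] = field(default_factory=list)  # append-only, never trimmed
    active: list[Entry] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.archive.append(message)
        self.active.append(len(self.archive) - 1)
        raw = [e for e in self.active if isinstance(e, int)]
        if len(raw) > self.window:
            # Fold the oldest raw turns, keeping the newest half of the window verbatim.
            fold = raw[: len(raw) - self.window // 2]
            node = Summary(naive_summary([self.archive[i] for i in fold]), fold)
            at = self.active.index(fold[0])
            self.active = [e for e in self.active if not (isinstance(e, int) and e in fold)]
            self.active.insert(at, node)

    def expand(self, node: Summary) -> list[str]:
        """Recover the exact original messages behind a summary node."""
        return [self.archive[i] for i in node.source_ids]

    def render(self) -> list[str]:
        """The context as the model would see it: summaries first, then recent turns."""
        return [f"[summary] {e.text}" if isinstance(e, Summary) else self.archive[e]
                for e in self.active]
```

The active context stays bounded while the archive grows, so nothing said in a long session is ever unrecoverable.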

3. Structured autonomy with real guardrails

Onyx follows an explicit autonomy decision tree with four action risk tiers:

Tier	Type	Behavior
T1	Read-only	Always autonomous — status checks, log reads, health pings
T2	Reversible local	Act without asking — restarts, config edits, routine deploys
T3	External effect	Confirm once — installs, firewall changes, service calls
T4	Destructive	Always escalate — data deletion, credential changes

Before acting, Onyx runs four checks: Is the action reversible? Is it local-only? Does it avoid credentials? Is the intent unambiguous? If every answer is yes, he acts. If any answer is no, he escalates.
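The tier table and the four checks can be read as a single function. The following minimal sketch follows one interpretation. Credential involvement or irreversibility sends an action to T4. External effects or ambiguous intent land in T3, where escalating means confirming once. Read-only actions are T1, and everything else is T2. The field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    T1 = "read-only"
    T2 = "reversible local"
    T3 = "external effect"
    T4 = "destructive"


@dataclass(frozen=True)
class Action:
    name: str
    read_only: bool = False
    reversible: bool = True
    local_only: bool = True
    touches_credentials: bool = False
    intent_unambiguous: bool = True


def classify(action: Action) -> Tier:
    if action.touches_credentials or not action.reversible:
        return Tier.T4  # destructive or credential-bearing: never autonomous
    if action.read_only:
        return Tier.T1
    if not action.local_only or not action.intent_unambiguous:
        return Tier.T3
    return Tier.T2  # all four checks pass


POLICY = {Tier.T1: "act", Tier.T2: "act", Tier.T3: "confirm once", Tier.T4: "escalate"}


def decide(action: Action) -> str:
    return POLICY[classify(action)]
```

Checking credentials and reversibility first means a read that touches secrets still escalates, which matches the "no credentials involved" check taking priority.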

This is delegation with clear kill switches. The blast radius stays bounded.

Tech Stack

Layer	Technology
Agent	Hermes Agent (Nous Research)
Model	DeepSeek V4 Pro via OpenRouter
Infrastructure	VPS (4 vCPU, 8 GB RAM, Ubuntu 24.04)
Gateway	Discord (primary), CLI, webhook
Process management	PM2 for 6 Next.js apps, Docker Compose for Honcho stack
Web server	Nginx + Let's Encrypt SSL, reverse proxy to standalone Next.js
Security	UFW, fail2ban (hardened), Docker CVE scanning
Minecraft	Crafty Controller 4.0, Fabric mod loader, playit.gg tunneling
Memory	Honcho (semantic layer) + LCM (lossless session compaction)

What I Learned

Agents fail at the edges of their knowledge, not the center. The hardest bugs were wrong version numbers, broken integrations, and subtle API quirks that no amount of training data could have predicted. The skill file system solved this: every edge case gets encoded once and never hit again.

Permission-seeking kills autonomy. Early versions of Onyx asked me to confirm everything. It was useless. The tier system, built around reversibility rather than task type, was the unlock. T1 and T2 cover 90% of real operations. T3 and T4 are rare.

Memory is what separates a tool from a collaborator. Without persistent memory, every session starts from zero. With Honcho and LCM, Onyx knows my infrastructure topology, my preferences, my thesis structure, and my open loops. That context makes autonomous action trustworthy.

What's Next

[ ] Expand the MCP surface to cover Nginx config management
[ ] Add anomaly detection to the 5-minute incident monitor
[ ] Close the thesis loop (defense incoming)
[ ] Open-source the Crafty Controller MCP server as a standalone package

Onyx is running on my VPS right now, probably checking server health while you read this.

DEV Community