cucoleadan

Posted on May 28 • Originally published at vibestacklab.substack.com on May 26

My Hermes AI Agent Maintenance Routine For Maximum Reliability

#agents #maintenance #cron #reliability


Last week, I spent a few days blaming the model before I realized Hermes was waiting on a memory recall timeout.

When the response time got worse, I assumed provider latency because I'd changed models before and knew that layer could get noisy.

The real problem sat one layer earlier, inside the retrieval path I hadn't checked yet.

My external memory provider, Hindsight, threw a retrieval error, Hermes retried, and the request stalled because the memory system was broken before the model ever had a chance to answer.

A few days later, my Friday Hermes health-summary job missed its Telegram report over a long weekend. The stack still answered messages, but the missing report told me the scheduled workflow had stopped producing the artifact I expected to see.

Hermes maintenance means checking the layers around the model before you blame the model. The routine I use now is a set of cron-backed prompts that check memory, gateways, scheduled jobs, model IDs, and backups, then stop before they make changes that need approval.

Most install guides skip this part because they get you to the first successful command, then leave you with a working AI control plane and no maintenance loop around it.

Hermes feels like one system when it works, but it routes through models, memory, gateways, skills, cron jobs, provider keys, and local files, so the fault can sit in any one of those layers when the stack starts behaving strangely.

To be clear, I haven't had a single issue with the Hermes code itself, which was not true of my time with OpenClaw. The problems I have had came from models, providers, and third-party integrations.

The model gets blamed first because it's the visible part of the stack, while the failure usually starts somewhere less obvious.

This article is the maintenance routine I use now, rewritten as prompts you can hand to your agent.

In this article:

- A maintenance routine you can run after Hermes is installed, so silent drift doesn't turn into a broken workflow.
- Copy-paste cron-job prompts for daily, weekly, and monthly checks across memory, gateways, scheduled jobs, providers, and backups.
- A simple approval rule that lets agents report problems without giving them permission to delete, update, rotate, restore, or rewrite anything.
- A rollout path for turning maintenance into useful visibility instead of another noisy automation.

After Hermes Install

The first successful Hermes run can trick you into treating setup as finished before operations have even started. You connect a provider, configure the gateway, test memory, send a message through Telegram or the TUI, and watch Hermes answer with context from the project you care about.

That moment is where the stack leaves the install guide and becomes something you have to run. Old configs can keep stale model names, scheduled jobs can miss their expected output, memory calls can slow down, and backups can look comforting until the first restore test fails.

I treat those failures as normal infrastructure behavior because a control plane becomes trustworthy only after you can see whether its dependencies are still healthy.

That lesson showed up during my OpenClaw to Hermes migration, even though the migration itself went smoothly. The first week felt better because Hermes followed instructions more closely, kept memory behavior cleaner, and made the gateway setup feel less stitched together.

The first problems were small enough to ignore in the moment but specific enough to matter later. An imported publishing skill failed because its YAML header was malformed, one environment variable was missing from the runtime, and token usage climbed while memory ingestion ran behind the workflow I was paying attention to.

None of those problems killed the setup, but each one pointed at the same operational truth: the model is only one layer inside a wider system. I stopped treating maintenance as an occasional chore once I realized a scheduled prompt could check those layers before the next failure stole an afternoon.

Cron Prompts Beat Commands

The earlier version of this routine had shell commands sprinkled through the article because that was how I checked my own server. Commands are useful when your environment matches mine, but they don't travel cleanly across Windows, Linux, Docker, hosted runners, local agents, and the custom glue every serious stack accumulates over time.

The official Hermes Agent docs are where I would start for setup details. This piece starts after setup, when the question changes from "Can Hermes run?" to "Can I trust this workflow tomorrow?"

Prompts travel better because they describe the job instead of assuming the tool. A cron-backed agent can inspect logs, check timestamps, call a gateway, read a config file, compare recent output, or ask for approval using the tools available inside its own environment.

If the maintenance prompt needs to reach outside Hermes, the same decision from "When to Use MCPs, CLIs, or Your Own Tool" applies here: use the smallest interface that can inspect the system cleanly without turning one check into a brittle integration project.

A scheduled prompt still needs firm boundaries. A useful maintenance job names the layer being checked and asks for evidence before it reports confidence. The report should be readable at a glance, and the agent should refuse to delete, update, rotate, restore, or rewrite anything without approval.

That boundary turns maintenance automation into a reporting system instead of a new source of damage. I want the agent to notice problems before I do while every irreversible action still comes back to me as a decision.

Three Maintenance Layers

My Hermes maintenance routine uses three layers that map cleanly to the way the stack fails: updates, cleanup, and health checks. Those labels keep the job concrete enough for a scheduled agent to report on the system without turning the prompt into a vague request to "check Hermes."

This is the operational side of the $30 Hermes stack, because a cheaper and more flexible agent setup only stays useful if the layers around it keep working.

The update layer asks whether something changed underneath the workflow while I was focused on using it. Providers rename models, preview routes become stale, plugins move, skills change formats, and memory backends update their APIs.

The cleanup layer asks whether the stack has accumulated enough junk to start changing behavior. Logs grow, sessions pile up, cached files stick around, and memory keeps old context long after the project has moved on.

The health-check layer answers the operational question before I start relying on the stack again. Before the workday starts, I want evidence that the gateway answers, the provider route works, scheduled jobs are producing output, and memory can retrieve a recent decision without timing out.

The layers keep the routine small enough to survive a busy week without reducing the review to a shallow status ping. Maintenance disappears when it depends on a vague intention, while a scheduled job with named layers can keep running after the calendar gets crowded.

Daily Hermes Health Check Prompt

The daily job should be boring enough that you can read it every morning without turning the start of the day into a debugging session. Its job is to tell you whether the stack is ready for work, then stop before it tries to repair anything.

Use this as a read-only cron job near the start of the workday, then adapt the gateway name, job names, and project references to match your own setup.

Create a Hermes cron job called "Daily Stack Pulse" that runs every morning at 8:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal and web. Use this exact prompt body for the job:

---
Run a daily read-only Hermes stack pulse check. Make no changes: do not delete files, rotate keys, update packages, prune memory, restore backups, or rewrite configuration.

1. Gateway. Send or simulate one normal request through the Telegram gateway and confirm it responds.
2. Scheduled workflows. Run `hermes cron list` and inspect ~/.hermes/cron/output/ for the latest runs of jobs tagged or named for morning briefing, health summary, memory maintenance, publishing, client, or paid workflows. Confirm each ran inside its expected window.
3. Logs. Scan recent warnings and errors from the Hermes runner (~/.hermes/logs/), the model provider, the memory layer (hindsight), the gateway, and the scheduler.
4. Memory recall. Run one hindsight_recall query against an active project decision (use "All Agents Considered newsletter" or "Vibe Stack Lab library repo"). Report whether the result was relevant, stale, missing, or slow.

Return a short report with exactly these sections, one sentence per item:

PASS:
Healthy checks with evidence.

WARN:
Items needing attention later, with the layer named in parentheses.

FAIL:
Broken or missing items that block reliance on the stack today.

APPROVAL NEEDED:
Any action that would delete, update, rotate, restore, rewrite, prune, or change provider behavior. Name the action and layer. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

The report matters more than the scheduler that happens to run it, as long as the result gives you enough evidence to trust or question the stack. You can run the prompt from cron, a recurring Hermes task, a hosted automation, a CI runner, or any agent runner that has permission to inspect the stack.

I care most about evidence that the gateway answered, the important jobs ran, memory recall still works, and recent errors haven't turned into a pattern. Once the report names the failed layer, the next step becomes smaller because the investigation has a place to start.
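Because the report uses fixed section names, a few lines of code can tell you whether the morning report needs a closer read. This is a minimal sketch, assuming the agent returns plain text with the four headings exactly as written in the prompt. `parse_report`, `is_placeholder`, and `stack_ready` are hypothetical helper names of mine, not part of Hermes.

```python
SECTIONS = ("PASS", "WARN", "FAIL", "APPROVAL NEEDED")


def parse_report(text: str) -> dict[str, list[str]]:
    """Group non-empty report lines under the section heading that precedes them."""
    report = {name: [] for name in SECTIONS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = line.rstrip(":")
        if line.endswith(":") and heading in report:
            current = heading
        elif current is not None:
            report[current].append(line)
    return report


def is_placeholder(item: str) -> bool:
    # Assumption: the agent writes "None" or "N/A" when a section is empty.
    return item.lower().rstrip(".") in {"none", "n/a", "nothing"}


def stack_ready(report: dict[str, list[str]]) -> bool:
    """The stack counts as ready when FAIL holds no real items."""
    return not [item for item in report["FAIL"] if not is_placeholder(item)]
```

I would only use something like this to decide whether to page myself; the report itself is still the thing to read.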

Weekly AI Agent Drift Review Prompt

My quiet cron failure is the reason I care more about weekly drift than a one-time setup checklist. A job definition sitting in a scheduler proved nothing once the Friday health-summary report stopped reaching Telegram.

That is the same reason my morning Hermes workflow checks visible output instead of trusting that a scheduled task exists somewhere in a config file.

The weekly review looks for slow changes that don't announce themselves while normal work still appears to be moving. Disk pressure, stale output, growing logs, slow memory, and old model IDs rarely feel urgent while they are accumulating, but they become expensive once they pile up inside a broken workflow.

Use this prompt near the end of the week, when the report can shape a short maintenance pass instead of interrupting deep work in the middle of a day.

Create a Hermes cron job called "Weekly Drift Review" that runs every Sunday at 9:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal and web. Use this exact prompt body for the job:

---
Run a weekly read-only Hermes drift review. Make no changes other than saving the drift snapshot described in step 1. If a fix is obvious, list it under RECOMMENDED ACTIONS or APPROVAL NEEDED but do not execute.

1. Storage growth. Measure size of ~/.hermes/logs/, ~/.hermes/sessions/, ~/.hermes/cache/, ~/.hermes/memory/, ~/.hermes/cron/output/, /tmp/hermes*, and any backup folder under ~/.hermes/. Compare to last week if a snapshot exists at ~/.hermes/cron/output/drift-snapshot.json. Save a fresh snapshot at that path after measuring. Flag any folder that grew more than 25 percent or crossed 1GB.

2. Scheduled jobs. Run `hermes cron list`. For each job, confirm it exists, has run inside its expected window, and produced a visible artifact in ~/.hermes/cron/output/ or the delivery channel. A job definition with no recent run counts as broken.

3. Memory recall. Run three hindsight_recall queries: one active project ("All Agents Considered newsletter"), one older project ("Build It #2 AI Code Review Agent"), one recent decision ("Vibe Stack Lab library repo"). Report each as accurate, stale, empty, or slow.

4. Provider and model config. Read ~/.hermes/config.yaml. Flag preview or dated model names (anything with -preview, -beta, dated suffixes, or matching known-deprecated IDs), fallback routes pointing at old IDs, and project-level overrides under ~/.hermes/profiles/*/config.yaml that diverge from the main config without obvious reason.

5. Logs. Scan the last 7 days of ~/.hermes/logs/ for repeated errors, retry loops, auth failures, timeouts, and missing-env-var messages. Group by layer (runner, provider, memory, gateway, scheduler).

Return a report with exactly these sections:

DRIFT:
Storage growth and configuration drift observed this week.

BROKEN:
Jobs, routes, providers, memory calls, or gateways that failed and need repair. Name the layer.

STALE:
Model IDs, project configs, skills, outputs, or memory entries that look outdated.

RECOMMENDED ACTIONS:
Small proposed fixes. For each: action, risk (low/med/high), expected benefit, approval needed (yes/no).

APPROVAL NEEDED:
Anything that changes files, deletes data, updates Hermes, rotates keys, changes providers, prunes memory, restores backups, or edits scheduled jobs. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

That weekly prompt would have caught my quiet cron failure earlier because a cron entry sitting in a file doesn't prove the workflow is alive. The agent has to find the last run, the last output, or the last expected message before it claims the job is healthy.
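The "no recent run counts as broken" rule is easy to express outside the prompt too. Here is a rough sketch, assuming each job writes its artifacts into its own directory and that file modification times are a fair proxy for the last run. The function names and the per-job directory layout are my assumptions, not documented Hermes behavior.

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(output_dir: Path) -> Optional[float]:
    """Return the newest file modification time under output_dir, or None."""
    if not output_dir.is_dir():
        return None
    times = [p.stat().st_mtime for p in output_dir.rglob("*") if p.is_file()]
    return max(times) if times else None


def job_status(last_run: Optional[float], max_age_hours: float,
               now: Optional[float] = None) -> str:
    """Classify a job by the age of its last artifact, not by its definition."""
    now = time.time() if now is None else now
    if last_run is None:
        return "BROKEN: no output found"
    age_hours = (now - last_run) / 3600
    if age_hours > max_age_hours:
        return f"BROKEN: last output {age_hours:.0f}h ago"
    return "OK"
```

For a daily job I would pass a window slightly longer than 24 hours, for example `job_status(newest_mtime(some_dir), 26)`, so a run that starts a little late does not trip the check.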

The same weekly review helps with memory issues because recall drift often feels like model weakness from the outside. When retrieval returns stale or empty context, the report should call that a memory-layer problem before anyone starts blaming generation quality.
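The storage rule in step 1 of the weekly prompt (flag anything that grew more than 25 percent or crossed 1GB) can be sketched the same way. This assumes the snapshot is a simple mapping from folder path to size in bytes; the prompt leaves the snapshot format up to the agent, so treat that shape and the helper names as hypothetical.

```python
import os

GROWTH_LIMIT = 0.25       # 25 percent week over week
SIZE_LIMIT = 1024 ** 3    # treating "1GB" as 1 GiB here


def dir_size(path: str) -> int:
    """Total bytes of regular files under path (0 if the path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def flag_growth(previous: dict[str, int],
                current: dict[str, int]) -> dict[str, list[str]]:
    """Return the reasons each folder should be flagged, keyed by folder."""
    flags: dict[str, list[str]] = {}
    for path, size in current.items():
        reasons = []
        old = previous.get(path)
        if old and size > old * (1 + GROWTH_LIMIT):
            reasons.append("grew more than 25%")
        # "Crossed" means it was under the limit (or unmeasured) last time.
        if size >= SIZE_LIMIT and (old is None or old < SIZE_LIMIT):
            reasons.append("crossed 1GB")
        if reasons:
            flags[path] = reasons
    return flags
```

A folder with no previous measurement is only flagged if it is already over the size limit, which keeps the first run from reporting everything as drift.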

Monthly Hermes Assumptions Review Prompt

The monthly job checks whether the assumptions under the stack still hold after weeks of normal use. Provider behavior, model IDs, permissions, backups, and release notes deserve a slower review because mistakes in those layers can create bigger messes than a missed daily report.

Run this one when you have enough time to read the report and decide what should change, because the monthly review is the one most likely to recommend actions that touch live state.

Create a Hermes cron job called "Monthly Assumptions Review" that runs on the 1st of every month at 10:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal, web, and file. Use this exact prompt body for the job:

---
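Run a monthly read-only Hermes assumptions review. Make no changes: do not update Hermes, change providers, rotate keys, restore backups, prune memory, delete files, rewrite configs, or edit scheduled jobs. The only exception is the temporary copy under /tmp/hermes-restore-test/ that step 4 creates and removes.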

1. External change summary. Check for changes that could affect this stack in the last ~30 days:
   - Hermes Agent: `cd ~/.hermes/hermes-agent && git log --since="30 days ago" --oneline` and check release notes
   - Plugins and skills: list anything in ~/.hermes/plugins/ and ~/.hermes/skills/ modified in the last 30 days
   - Provider changes: scan OpenRouter and opencode-go model lists for renamed, deprecated, or newly preview-flagged IDs that match anything in ~/.hermes/config.yaml
   - Gateway, memory backend (hindsight), scheduler, and backup tool changelogs if accessible
   Summarize only changes relevant to this stack.

2. Provider and model ID audit. Grep every config layer for model IDs:
   - Main: ~/.hermes/config.yaml
   - Profiles: ~/.hermes/profiles/*/config.yaml
   - Cron jobs: ~/.hermes/cron/jobs.json
   - Skills referencing models: search_files for "model:" or model IDs under ~/.hermes/skills/
   - Scripts under ~/.hermes/scripts/
   - Env files: ~/.hermes/.env and any *.env
   Flag preview IDs (-preview, -beta, dated suffixes), known-deprecated IDs, missing fallbacks, and defaults that conflict between layers.

3. Health sweep. Quick check across:
   - Gateway response (one Telegram round-trip)
   - Provider reachability (one ping each to configured providers)
   - Memory recall (hindsight_recall on an active project)
   - Scheduler activity (hermes cron list plus recent output)
   - Storage headroom (df -h on ~/.hermes/ partition)
   - Backup completion (most recent backup artifact timestamp and size)
   - Key availability (env vars and 1Password references exist, not the values)
   - Permissions (~/.hermes/ ownership and mode)

4. Restore test. Pick one non-sensitive backup artifact under ~/.hermes/backups/ or wherever backups land. Copy to /tmp/hermes-restore-test/, inspect contents, confirm it opens and matches expectations. Do not overwrite live files. Delete the temp copy after inspection.

5. Approval-gate review. List every workflow (cron job, skill, plugin, script) that can delete files, prune memory, rotate keys, change providers, restore backups, update Hermes, edit configs, or send messages outside this workspace. For each, confirm whether it requires explicit approval or runs automatically.

Return a report with exactly these sections:

ASSUMPTIONS STILL VALID:
Operational assumptions that still look safe.

ASSUMPTIONS TO RECHECK:
Provider, memory, gateway, scheduler, backup, or permission assumptions that may have drifted. Name the layer.

RESTORE TEST:
Artifact inspected, safe location used, and result.

PROPOSED CHANGES:
Each with reason, risk (low/med/high), rollback notes, approval status.

APPROVAL NEEDED:
Every action that would modify the stack or touch live data. Name the action and layer. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

I review provider model IDs here instead of waiting for a stale preview route to break under load. A fallback route in an old project config can keep calling yesterday's model even after the main Hermes provider has moved to the stable ID.
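The flagging rule from the model ID audit (preview, beta, dated suffixes, known-deprecated IDs) is simple enough to sketch as a pattern check. The patterns below are my own guess at what "dated suffix" should cover, and the model IDs in the usage note are made up for illustration.

```python
import re

# Matches "-preview" or "-beta" as a whole segment of the ID.
PREVIEW = re.compile(r"-(preview|beta)\b", re.IGNORECASE)
# Matches trailing date-like suffixes: -2024-08-06, -20240806, or -0613.
DATED = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8}|\d{4})$")


def audit_model_id(model_id: str, deprecated=frozenset()) -> list[str]:
    """Return the reasons a model ID deserves a second look (empty if none)."""
    reasons = []
    if PREVIEW.search(model_id):
        reasons.append("preview or beta")
    if DATED.search(model_id):
        reasons.append("dated suffix")
    if model_id in deprecated:
        reasons.append("known deprecated")
    return reasons
```

With hypothetical IDs, `audit_model_id("example-large-preview")` returns `["preview or beta"]` and `audit_model_id("example-large")` returns an empty list. The deprecated set has to come from the provider's own model list, which is why the prompt asks the agent to scan it.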

The Hindsight timeout became confusing because the symptom pointed at the wrong layer. Hermes felt slow, I blamed the model, and the retrieval path had already burned the time before generation started.
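The restore test in step 4 of the monthly prompt follows a copy, inspect, clean up pattern. Here is a minimal sketch of that pattern, assuming the backup artifact is a tar archive; your backups may be a different format, and `restore_test` is a hypothetical name. It only lists the archive members and never extracts onto live paths.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def restore_test(artifact: Path) -> dict:
    """Copy a backup archive to a temp dir, list its contents, then clean up."""
    workdir = Path(tempfile.mkdtemp(prefix="hermes-restore-test-"))
    try:
        copy = workdir / artifact.name
        shutil.copy2(artifact, copy)
        with tarfile.open(copy) as tar:
            names = tar.getnames()  # read the member list without extracting
        return {
            "artifact": str(artifact),
            "opened": True,
            "entries": len(names),
            "size_bytes": copy.stat().st_size,
        }
    except (tarfile.TarError, OSError) as exc:
        return {"artifact": str(artifact), "opened": False, "error": str(exc)}
    finally:
        # The temp copy is removed whether or not the archive opened.
        shutil.rmtree(workdir, ignore_errors=True)
```

An archive that opens and lists is weaker evidence than a full restore, but it catches the empty or truncated backup that looks fine by timestamp alone.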

Approval Gate for Maintenance Jobs

Every scheduled maintenance job should carry the same approval rule because the boundary gets easy to forget after the first few reports look useful. Read-only inspection can run freely, while destructive or identity-changing work still needs a human decision.

If you haven't built that habit yet, start with the approval gate setup before you let a maintenance prompt touch files, providers, keys, or backups.

Add this block to the end of every maintenance prompt that runs on a schedule, especially if the agent has access to files, keys, backups, provider settings, or outbound channels.

Approval rule for this maintenance job:

You may observe, inspect, summarize, classify, and recommend without asking first.

You must ask for approval before any action that deletes files, prunes memory, rotates keys, changes providers, restores backups, updates Hermes, edits configuration, changes scheduled jobs, rewrites prompts, sends external messages, or changes permissions.

When approval is needed, return a proposal with the issue, suggested action, expected benefit, risk level, affected files or systems, rollback notes, and the exact command or tool call you want to run.

If the risk is unclear, classify the action as approval needed and wait.

That rule keeps the maintenance agent useful without letting it become a cleanup bot with too much confidence. The agent can prepare the decision, but I still want to make the decision when live state changes.
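If you wrap the agent in your own tooling, the same rule can be enforced in code as an allow-list. This is a sketch under my own assumptions: the verb lists mirror the approval rule above, and `gate` and `Proposal` are hypothetical names rather than anything Hermes provides.

```python
from dataclasses import dataclass

# Verbs the rule allows without asking first.
READ_ONLY = {"observe", "inspect", "summarize", "classify", "recommend"}


@dataclass
class Proposal:
    """The fields the approval rule asks the agent to return."""
    issue: str
    suggested_action: str
    expected_benefit: str
    risk: str              # low, med, or high
    affected: str          # files or systems touched
    rollback_notes: str
    command: str           # exact command or tool call to run once approved


def gate(verb: str) -> str:
    """Allow only known read-only verbs; everything else needs approval."""
    if verb.strip().lower() in READ_ONLY:
        return "allowed"
    # Unknown or unclear actions fall through to approval, as the rule says.
    return "approval needed"
```

The design choice that matters is the default: an allow-list of safe verbs means a new or oddly named action waits for approval instead of slipping past a block-list.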

Quiet Agent Failures

The failures that cost time are small enough to miss and specific enough to blame on the wrong thing. My cron failure never crashed the stack; the job simply stopped doing work in a corner I wasn't watching.

The model ID drift behaved differently because the main provider setup looked current while an older route still pointed somewhere stale. The visible symptom showed up as slower Hermes responses and memory behavior that looked worse than it was.

The Hindsight timeout changed how I diagnose agent slowness in every workflow that depends on memory. When an AI tool slows down, I check the retrieval chain before I blame the model because the model may be downstream from the delay.
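Checking the retrieval chain first is easier when each stage is timed separately. A small sketch of that idea follows; the stage names are placeholders, and you would wrap your own memory recall and model calls where the comments indicate.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Add the wall-clock time spent inside the block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def slowest_stage(timings: dict) -> str:
    """Name the stage that consumed the most time."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    timings = {}
    with timed("retrieval", timings):
        time.sleep(0.05)   # stand-in for a memory recall call
    with timed("generation", timings):
        time.sleep(0.01)   # stand-in for the model call
    print(slowest_stage(timings), timings)
```

When retrieval dominates the timings, the model was never the slow part, which is exactly what I missed the first time.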

Maintenance doesn't prevent every failure, but it reduces the time spent accusing the wrong layer. Once you can name whether the issue sits in routing, memory, scheduling, storage, backup, skills, or config, the repair becomes less mysterious.

How to Roll Out the Routine

I would start with one weekly maintenance job before adding daily and monthly jobs. Weekly reporting is frequent enough to catch drift, and a month of reports gives you enough signal to decide whether the daily pulse is worth the extra noise.

Once the weekly report proves useful, add the daily pulse for the pieces you depend on most. My daily set covers gateway response, scheduled job output, recent errors in the logs, and memory recall because those failures change whether I can trust the stack that morning.

The monthly review should stay slower and more deliberate because updates, provider IDs, backup restores, and permission gates need more attention than a quick morning report can give them.

Your stack may use different names, but the shape should stay the same. The scheduled agent observes the stack, reports the failed layer, proposes small actions, and stops before touching anything that could create real damage.

Failure Limits

Maintenance won't make the stack perfect, and the prompts shouldn't pretend they can. Provider outages, weak retrieval, bad project context, poor model fit, and bad release notes can still turn into manual work.

The routine also leaves approval gates in place for every action that changes live state. If Hermes wants to prune memory, change providers, delete logs, rotate keys, restore a backup, or update itself, I still want to approve that action before it touches anything real.

That boundary keeps the routine useful because the agent can notice problems before I do, while every action that changes the system comes back as a proposal I can read.

Bottom Line

Hermes feels like one system when it's working, but underneath it's a control plane sitting on top of models, memory, gateways, cron jobs, files, skills, providers, and backups. When one layer drifts, the whole experience gets worse even if the visible symptom looks like a slow model or a lazy agent.

The maintenance loop keeps those layers visible through a daily pulse, a weekly drift review, and a monthly assumptions review. For most personal agent stacks, that rhythm is enough to know where to look when something breaks.

Start with the weekly prompt and run it long enough to see whether the reports change your behavior. If the reports help you catch missed jobs, stale model IDs, slow memory, or backup gaps, add the daily pulse and monthly review around the same approval rule.

The install guide gets Hermes running, and the maintenance loop is what keeps it worth trusting.