hermes-safety-rig: a drop-in reliability layer for Hermes Agent

#hermesagentchallenge #devchallenge #agents

This is a submission for the Hermes Agent Challenge

What I Built

hermes-safety-rig is a Python wrapper that gives Hermes Agent four reliability guarantees on every tool call: validated arguments, blocked off-allowlist egress, capped daily spend, and shape-validated output.

Hermes Agent is designed to live on your server, learn over time, and run continuously. That long-running property is the whole point, and it's also the failure surface. A model whose tool-call shape drifts on call 10,000 is a model that quietly burns money or leaks data months in. The rig is the boring, narrow safety net you put between Hermes and the rest of your infrastructure.

Four failure modes, four small primitives:

| Failure | Primitive | What it does |
| --- | --- | --- |
| Tool arg hallucination | `agentvet` | Validate every tool call against a schema; raise `ToolArgError` with an LLM-friendly retry hint |
| Surprise egress (secrets to attacker.com) | `agentguard` | Declarative domain allowlist; raise if a tool tries to fetch off-list |
| Runaway cost | `AgentBudget` | Per-run token and USD caps; trip before the next call would exceed them |
| Unparseable output to downstream | `agentcast` | Structured-output validate-and-retry loop |

Demo

`examples/persistent_inbox_triage.py` reproduces three of the four failure modes on demand: a calendar invite called with the wrong argument type (`start_iso=20260520`, an integer, instead of an ISO string); a prompt-injected URL fetch that points at an attacker domain; and a thread-summarization loop that would run past a $0.25 daily cap.

Run without the rig and watch each one go through. Run with the rig and watch each one get caught.

python examples/persistent_inbox_triage.py --without-rig
python examples/persistent_inbox_triage.py --with-rig

Sample run, with the rig wired in:

=== Failure mode 1: bad argument type ===
  caught by rig: arg 'start_iso': expected string, got int.

=== Failure mode 2: prompt-injected egress ===
  caught by rig: egress to 'attacker.example.com' blocked; allowlist=['calendar.google.com', 'gmail.googleapis.com']

=== Failure mode 3: cost runaway ===
  caught by rig: BudgetExceededError: USD cap 0.25 would be exceeded (current 0.00 + this 0.50)
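
The egress check behind failure mode 2 can be sketched with the standard library alone. This is an illustration of the idea, not agentguard's actual API, which this post does not show: `EgressBlockedError`, `check_egress`, and the exact-match policy are all assumptions. Only the two allowlisted hosts and the message format come from the sample run above.

```python
from urllib.parse import urlsplit


class EgressBlockedError(PermissionError):
    """Hypothetical error: a tool tried to reach a host outside the allowlist."""


# Hosts taken from the sample run above.
ALLOWLIST = frozenset({"calendar.google.com", "gmail.googleapis.com"})


def check_egress(url: str, allowlist: frozenset = ALLOWLIST) -> str:
    """Return url if its host is allowlisted, else raise EgressBlockedError."""
    # urlsplit().hostname drops any userinfo and port, so
    # 'https://gmail.googleapis.com@attacker.example.com/' resolves to the attacker host.
    host = (urlsplit(url).hostname or "").lower()
    # Exact host match only (an assumption): no suffix or wildcard matching.
    if host not in allowlist:
        raise EgressBlockedError(
            f"egress to '{host}' blocked; allowlist={sorted(allowlist)}"
        )
    return url
```

Calling `check_egress` inside the tool's fetch path, before any socket is opened, means a prompt-injected URL fails closed instead of leaking data.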

Code

https://github.com/MukundaKatta/hermes-safety-rig

The whole rig is around 150 lines. Four classes of failure, four small primitives, one composition class.

My Tech Stack

Python 3.10+
agentvet for tool-arg schema validation
agentguard for declarative domain allowlist
AgentBudget for per-run cost and token caps
agentcast for structured-output validate-and-retry
Hermes Agent (the framework under test)

How I Used Hermes Agent

Hermes is the runtime. The rig wraps a Hermes tool function before it gets registered as a skill, so every tool call the model makes goes through the four checks first.

The interesting part is which Hermes capabilities make each primitive matter.

Skill creation from experience. Hermes derives new skills from observed sessions. That means the input shape of a skill is an emergent property of the model, not a fixed contract. agentvet is the cheapest way to keep that contract honest at the boundary.
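
A minimal sketch of that boundary check, using only the standard library. It does not use agentvet's real API, which this post does not show. `ToolArgError` is the name from the table above; `validate_args`, the schema format, and the hint wording are assumptions for illustration.

```python
class ToolArgError(ValueError):
    """Raised when a tool call's arguments do not match the expected schema."""


# Hypothetical schema format: argument name -> required Python type.
CALENDAR_INVITE_SCHEMA = {"title": str, "start_iso": str}

# Friendlier type names for the hint that is fed back to the model.
_TYPE_NAMES = {"str": "string", "float": "number", "bool": "boolean"}


def _name(t: type) -> str:
    return _TYPE_NAMES.get(t.__name__, t.__name__)


def validate_args(schema: dict, args: dict) -> dict:
    """Return args unchanged if they match schema, else raise ToolArgError."""
    for key, expected in schema.items():
        if key not in args:
            raise ToolArgError(f"arg '{key}': missing; expected {_name(expected)}.")
        value = args[key]
        # bool is a subclass of int in Python, so compare exact types.
        if type(value) is not expected:
            raise ToolArgError(
                f"arg '{key}': expected {_name(expected)}, got {_name(type(value))}."
            )
    return args
```

The error text is short and names the offending argument on purpose: it is meant to be appended to the next prompt so the model can correct the call rather than repeat it.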

Persistent memory across sessions. Anything ingested into memory stays there, bad output included. agentcast's output-shape validation keeps malformed tool output from poisoning Hermes's model of the user for days or weeks.
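
A validate-and-retry loop of this kind can be sketched as follows. This is not agentcast's actual interface; `cast_with_retry`, `OutputShapeError`, and the flat key-to-type shape description are hypothetical names chosen for the example.

```python
import json
from typing import Callable, Optional


class OutputShapeError(ValueError):
    """Hypothetical error: no attempt produced output of the required shape."""


def cast_with_retry(
    produce: Callable[[Optional[str]], str],
    required: dict,
    max_attempts: int = 3,
) -> dict:
    """Call produce until it returns a JSON object with the required typed keys.

    produce receives None on the first attempt and the previous error message
    on each retry, so the caller can pass that message back to the model.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        raw = produce(feedback)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("top-level value must be a JSON object")
            for key, expected in required.items():
                if not isinstance(data.get(key), expected):
                    raise ValueError(f"field '{key}' must be {expected.__name__}")
            return data
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            feedback = str(exc)
    raise OutputShapeError(f"gave up after {max_attempts} attempts: {feedback}")
```

Only a value that passes the shape check is returned, so nothing malformed reaches the memory write; after the last failed attempt the loop raises instead of storing a best guess.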

Multi-step planning with tool use. Hermes can chain dozens of LLM calls per task. Without AgentBudget, a poorly bounded planning loop can cost far more in model calls than the $5 VPS it runs on. The cap trips before the call that would exceed it is made, so the runaway never starts.
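
The "trip before the next call" behaviour can be sketched as a reserve-then-spend check. This is not AgentBudget's real API: `UsdBudget` and `reserve` are hypothetical, the sketch tracks dollars only (no token cap), and it assumes the caller can estimate a call's cost up front. The message format mirrors the sample run.

```python
class BudgetExceededError(RuntimeError):
    """Raised before a call whose estimated cost would push spend past the cap."""


class UsdBudget:
    """Hypothetical pre-call spend cap; tracks dollars only, not tokens."""

    def __init__(self, usd_cap: float) -> None:
        self.usd_cap = usd_cap
        self.spent = 0.0

    def reserve(self, estimated_usd: float) -> None:
        """Check the cap first; record the spend only if the call may proceed."""
        if self.spent + estimated_usd > self.usd_cap:
            raise BudgetExceededError(
                f"USD cap {self.usd_cap:.2f} would be exceeded "
                f"(current {self.spent:.2f} + this {estimated_usd:.2f})"
            )
        self.spent += estimated_usd
```

Checking before the call, rather than after, is the design choice that matters: a rejected call costs nothing and leaves `spent` untouched. A production version would probably use `decimal.Decimal` instead of floats for money.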

Model-agnostic backend. Hermes lets you swap models on a flag. That same flexibility means a model swap can silently change how faithfully tool calls follow their expected shape. The rig is the layer that doesn't care which model you swapped to.

The composition is intentionally thin. Each primitive owns its own logic in its own published package. The rig is a glue file that gives Hermes one decorator to wrap tools with. The rig itself is brand new code, written during the contest window; the four primitives are pre-existing third-party dependencies used through their public API.
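
To make the "one decorator" idea concrete, here is a rough sketch of how such glue could look. It is not the code in the repository: `rig`, `PreCheck`, `PostCheck`, and the example check are hypothetical, and the sketch leaves out how the wrapped function is registered with Hermes.

```python
import functools
from typing import Any, Callable

PreCheck = Callable[[dict], None]  # raises to veto the call
PostCheck = Callable[[Any], Any]   # returns validated output or raises


def rig(pre: list, post: PostCheck = lambda out: out):
    """Hypothetical decorator: run every pre-check, call the tool, then post-check."""

    def decorate(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapped(**kwargs: Any) -> Any:
            for check in pre:
                check(kwargs)  # any exception here means the tool never runs
            return post(tool(**kwargs))

        return wrapped

    return decorate


def require_str_start(args: dict) -> None:
    """Example pre-check standing in for schema validation."""
    if not isinstance(args.get("start_iso"), str):
        raise TypeError("arg 'start_iso': expected string")


@rig(pre=[require_str_start])
def create_invite(title: str, start_iso: str) -> dict:
    return {"title": title, "start_iso": start_iso}
```

Argument, egress, and budget checks would slot into `pre`, and output-shape validation into `post`, which keeps the wrapper itself free of any policy.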

If you're running Hermes Agent in production, please consider the rig before you give the agent the keys to your Gmail. The downside of a long-running agent without guardrails is exactly the kind of bill, leak, or silent corruption that nobody notices until the third incident report.
