Mukunda Rao Katta
hermes-safety-rig: a drop-in reliability layer for Hermes Agent

Hermes Agent Challenge Submission

This is a submission for the Hermes Agent Challenge

What I Built

hermes-safety-rig is a Python wrapper that gives Hermes Agent four reliability guarantees on every tool call: validated arguments, blocked off-allowlist egress, capped daily spend, and shape-validated output.

Hermes Agent is designed to live on your server, learn over time, and run continuously. That long-running property is the whole point, and it's also the failure surface. A model whose tool-call shape drifts on call 10,000 is a model that quietly burns money or leaks data months in. The rig is the boring, narrow safety net you put between Hermes and the rest of your infrastructure.

Four failure modes, four small primitives:

| Failure | Primitive | What it does |
| --- | --- | --- |
| Tool arg hallucination | agentvet | Validates every tool call against a schema; raises ToolArgError with an LLM-friendly retry hint |
| Surprise egress (secrets to attacker.com) | agentguard | Declarative domain allowlist; raises if a tool tries to fetch off-list |
| Runaway cost | AgentBudget | Per-run token and USD caps; trips before the next call would exceed them |
| Unparseable output to downstream | agentcast | Structured-output validate-and-retry loop |
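
To make the egress row concrete, here is a minimal standard-library sketch of a domain allowlist check. It is not agentguard's actual API: `EgressBlockedError`, `check_egress`, and `ALLOWLIST` are hypothetical names chosen for illustration, and the exact-host matching rule is an assumption.

```python
from urllib.parse import urlparse


class EgressBlockedError(Exception):
    """Hypothetical stand-in; the real agentguard exception may differ."""


# Assumed allowlist, taken from the sample run in this post.
ALLOWLIST = frozenset({"calendar.google.com", "gmail.googleapis.com"})


def check_egress(url: str, allowlist: frozenset = ALLOWLIST) -> str:
    """Return the URL's host if it is on the allowlist, otherwise raise."""
    # urlparse lowercases the hostname and strips port and credentials.
    host = urlparse(url).hostname or ""
    # Exact match only: subdomains and look-alike suffixes are rejected.
    if host not in allowlist:
        raise EgressBlockedError(
            f"egress to {host!r} blocked; allowlist={sorted(allowlist)}"
        )
    return host
```

Matching on the parsed hostname rather than on a substring of the URL matters here: a host such as `gmail.googleapis.com.attacker.example.com` contains an allowed name but is still rejected.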

Demo

examples/persistent_inbox_triage.py reproduces three failure modes on demand. A calendar invite called with the wrong type (start_iso=20260520 instead of an ISO string). A prompt-injected URL fetch that points at an attacker domain. A thread-summarization loop that would run past a $0.25 daily cap.

Run without the rig and watch each one go through. Run with the rig and watch each one get caught.

python examples/persistent_inbox_triage.py --without-rig
python examples/persistent_inbox_triage.py --with-rig

Sample run, with the rig wired in:

=== Failure mode 1: bad argument type ===
  caught by rig: arg 'start_iso': expected string, got int.

=== Failure mode 2: prompt-injected egress ===
  caught by rig: egress to 'attacker.example.com' blocked; allowlist=['calendar.google.com', 'gmail.googleapis.com']

=== Failure mode 3: cost runaway ===
  caught by rig: BudgetExceededError: USD cap 0.25 would be exceeded (current 0.00 + this 0.50)

Code

https://github.com/MukundaKatta/hermes-safety-rig

The whole rig is around 150 lines. Four classes of failure, four small primitives, one composition class.

My Tech Stack

  • Python 3.10+
  • agentvet for tool-arg schema validation
  • agentguard for declarative domain allowlist
  • AgentBudget for per-run cost and token caps
  • agentcast for structured-output validate-and-retry
  • Hermes Agent (the framework under test)

How I Used Hermes Agent

Hermes is the runtime. The rig wraps a Hermes tool function before it gets registered as a skill, so every tool call the model makes goes through the four checks first.
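
The wrapping step can be sketched as a plain Python decorator. This is an illustration of the idea, not the decorator shipped in the repository: `rig`, `pre_checks`, and `post_check` are hypothetical names, and the sketch assumes tools are called with keyword arguments only. How the wrapped function is then registered with Hermes is left out.

```python
import functools
from typing import Any, Callable

# A pre-check inspects the keyword arguments and raises if something is wrong.
Check = Callable[[dict], None]


def rig(pre_checks: list, post_check: Callable[[Any], Any] = lambda r: r):
    """Hypothetical decorator: run every pre-check, call the tool, check output."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**kwargs: Any) -> Any:
            for check in pre_checks:
                check(kwargs)  # any check may raise and abort the call
            return post_check(tool(**kwargs))
        return wrapper
    return decorator


def require_str_start(kwargs: dict) -> None:
    """Toy pre-check standing in for schema validation."""
    if not isinstance(kwargs.get("start_iso"), str):
        raise TypeError("start_iso must be a string")


@rig(pre_checks=[require_str_start])
def create_invite(title: str, start_iso: str) -> dict:
    """Toy tool; a real one would call the calendar API."""
    return {"title": title, "start_iso": start_iso}
```

Because the checks run before the tool body, a rejected call never reaches the network or spends budget.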

The interesting part is which Hermes capabilities make each primitive matter.

Skill creation from experience. Hermes derives new skills from observed sessions. That means the input shapes for a "skill" are an emergent property of the model, not a fixed contract. agentvet is the lowest-cost way to enforce an explicit contract at the boundary.
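
As a rough illustration of that boundary check, the sketch below validates tool arguments against a simple name-to-type mapping. It does not use agentvet's real API: the `ToolArgError` class here is a local stand-in for the one named earlier, and `validate_args` and the schema format are assumptions made for the example.

```python
class ToolArgError(ValueError):
    """Local stand-in for the error named in this post; not agentvet's class."""


# Hypothetical schema format: argument name -> expected Python type.
CALENDAR_INVITE_SCHEMA = {"title": str, "start_iso": str}

# Friendlier type names for the retry hint shown to the model.
_TYPE_NAMES = {str: "string", int: "int", float: "float", bool: "bool"}


def validate_args(args: dict, schema: dict) -> dict:
    """Raise ToolArgError with a short hint if args do not match the schema."""
    for name, expected in schema.items():
        if name not in args:
            raise ToolArgError(f"arg {name!r}: missing required argument.")
        value = args[name]
        if not isinstance(value, expected):
            want = _TYPE_NAMES.get(expected, expected.__name__)
            got = _TYPE_NAMES.get(type(value), type(value).__name__)
            raise ToolArgError(f"arg {name!r}: expected {want}, got {got}.")
    unknown = sorted(set(args) - set(schema))
    if unknown:
        raise ToolArgError(f"unexpected arguments: {unknown}.")
    return args
```

The error message is written for the model rather than for a log file, so it can be fed back as the retry hint on the next attempt.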

Persistent memory across sessions. A bad output ingested into memory persists. agentcast's output-shape validation keeps malformed tool output from poisoning Hermes's stored picture of the user for days or weeks.
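
A validate-and-retry loop of this kind can be sketched with the standard library alone. The code below is not agentcast's API: `validated_output`, `OutputShapeError`, and the `produce(hint)` callback convention are hypothetical, and the shape check is deliberately limited to top-level field names and types.

```python
import json
from typing import Callable, Optional


class OutputShapeError(Exception):
    """Hypothetical error raised when no attempt produces a valid shape."""


def validated_output(
    produce: Callable[[Optional[str]], str],
    required: dict,
    max_attempts: int = 3,
) -> dict:
    """Call produce(hint) until it returns a JSON object with the required fields."""
    hint: Optional[str] = None  # the first attempt gets no hint
    for _ in range(max_attempts):
        raw = produce(hint)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            hint = f"output was not valid JSON: {exc.msg}"
            continue
        if not isinstance(data, dict):
            hint = "output must be a JSON object"
            continue
        bad = [k for k, t in required.items() if not isinstance(data.get(k), t)]
        if not bad:
            return data  # only validated output reaches memory
        hint = f"missing or mistyped fields: {bad}"
    raise OutputShapeError(
        f"gave up after {max_attempts} attempts; last problem: {hint}"
    )
```

Here `produce` would wrap the model or tool call, and the hint from the previous failure is passed back so the next attempt can correct itself.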

Multi-step planning with tool use. Hermes can chain dozens of LLM calls per task. Without AgentBudget, a poorly bounded planning loop can easily cost more than the $5 VPS it runs on. The cap stops the runaway before it happens.
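
The "trip before the next call" behaviour comes down to checking the projected total before spending. The sketch below shows that idea only. It is not AgentBudget's real interface: `UsdBudget` and `charge` are hypothetical names, `BudgetExceededError` is a local stand-in, and the per-call cost estimate is assumed to be supplied by the caller.

```python
class BudgetExceededError(Exception):
    """Local stand-in for the error in the sample run; not AgentBudget's class."""


class UsdBudget:
    """Hypothetical USD cap that refuses a charge before it is incurred."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, estimated_usd: float) -> None:
        """Record a spend, or raise if it would push the total past the cap."""
        if self.spent_usd + estimated_usd > self.cap_usd:
            # Nothing is recorded: the call is rejected before it happens.
            raise BudgetExceededError(
                f"USD cap {self.cap_usd:.2f} would be exceeded "
                f"(current {self.spent_usd:.2f} + this {estimated_usd:.2f})"
            )
        self.spent_usd += estimated_usd
```

A planning loop would call `charge` with the estimated cost immediately before each LLM call, so the loop stops on the first call that cannot fit under the cap.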

Model-agnostic backend. Hermes lets you swap models on a flag. That same flexibility means a model swap silently changes tool-call faithfulness. The rig is the layer that doesn't care which model you swapped to.

The composition is intentionally thin. Each primitive owns its own logic in its own published package. The rig is a glue file that gives Hermes one decorator to wrap tools with. The rig itself is brand-new code written during the contest window; the four primitives are pre-existing third-party dependencies used through their public APIs.


If you're running Hermes Agent in production, please consider the rig before you give it the keys to your Gmail. The downside of a long-running agent without guardrails is exactly the kind of bill, leak, or silent corruption that nobody notices until the third incident report.