cucoleadan

Posted on May 21 • Originally published at vibestacklab.substack.com on Apr 28

How to Add Approval Gates to Your Hermes Agent

#agents #automation #prompts #security

This post was originally published on my Substack publication as How to Add Approval Gates to Your Hermes Agent.

Most people who try AI agents go through the same cycle. They set it up, give it access to everything, and watch it do impressive things for a week. Then something goes wrong, like a wrong message or a broken file, and they shut it down and go back to doing things manually.

The problem was skipping the safety net.

I went through that cycle twice. The first time, my agent sent an email to a client with the wrong name, and I mean a completely different person, not a typo. I found out when the client forwarded it back asking if I was working with someone else. I spent the next three weeks manually reviewing everything the agent touched. That burned more time than if I had done the work myself.

The second time, I set up gates before giving the agent access. Drafts and system changes came to me for review. Spending above a threshold required approval. I let it run, and nothing went wrong. The gates caught mistakes before they went live.

I’ll show you how to build approval gates into any Hermes workflow. Gate #1 takes 15 minutes and stops your agent from sending anything external without your OK. Gate #2 adds protection against unwanted system changes. Gate #3 puts dollar limits on spending. Start with the first one. Add the others when you’re ready.

If you’re new to Hermes itself, start with Hermes Is the AI Agent OpenClaw Promised to Be.

In this article:

What an approval gate is, and why it isn’t a roadblock
The three types of gates every AI workflow needs
Step-by-step setup for each gate, from beginner to advanced
A simple framework to decide what to gate and what to leave free
The three mistakes people make with approval gates, and how to avoid them

A Checkpoint Is Not a Roadblock

The word gate makes people think of barriers and delays. That’s the wrong mental model. An approval gate is more like a checkpoint at the end of an assembly line. The work happens at full speed. The checkpoint keeps defects from shipping.

Three patterns exist for keeping humans involved in AI workflows. Human-in-the-loop means the agent stops and asks you before taking an action. You review, you approve, the agent continues. Human-on-the-loop means the agent runs autonomously, but you can watch what it does and intervene if something looks wrong. Full autonomy means you set it up and never look at it again.

Shopify defaults to human-in-the-loop by design for anything that touches production systems. LangChain found that most organizations use approval checkpoints as their primary guardrail. The EU AI Act requires human oversight for high-risk AI systems. Approval gates are standard practice, not an unusual precaution.

The same principle works for solo operators. You just need a simpler version.

The Three Gates You Need

Every AI workflow that touches the outside world or modifies your data needs at least one gate.

The Send Gate. Nothing goes external without your OK. Emails, social posts, client communications, any message that carries your name. Your agent drafts, delivers to you, and waits. You review and approve the send. That’s the gate most people need first.

The Change Gate. Nothing modifies your systems without your OK. File edits, database updates, configuration changes. Your agent identifies what needs to change, shows you the proposed change with context, and waits for confirmation.

The Spend Gate. Nothing costs money without your OK. Paid API calls above a threshold, tool purchases, subscription changes. Your agent estimates the cost before any paid action. Below your threshold, it proceeds automatically. Above it, it pauses and asks you.

Each gate protects something different: your reputation, your data, your wallet. You don’t need all three on day one. Start with the Send Gate.

Gate #1: The Send Gate (Start Here)

This is the gate that fixes the wrong-name-in-an-email problem. The setup takes about 15 minutes. You build a workflow where the agent drafts everything, but you control the final step.

The workflow has four steps:

Step 1. The agent drafts the content. An email, a social post, a client response, anything.

Step 2. The agent delivers the draft to you through chat, email, or a file. It doesn’t send it anywhere, just hands it to you for review.

Step 3. You review the draft and fix anything that needs fixing. Reply with your approval or your corrections.

Step 4. The agent sends or publishes the approved version. If you asked for changes, it revises and shows you the updated version.
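The four steps above can be sketched as a small state machine. This is a minimal illustration in plain Python, not Hermes code: the names `Draft`, `Status`, and `send`, and the `deliver` callback, are all hypothetical. The point it shows is that the send step refuses any draft that has not passed through approval.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    SENT = "sent"


@dataclass
class Draft:
    """Steps 1 and 2: the agent creates a draft and hands it over for review."""
    recipient: str
    body: str
    status: Status = Status.PENDING_REVIEW
    history: List[str] = field(default_factory=list)

    def revise(self, new_body: str) -> None:
        """Step 3, changes requested: swap the body and return to review."""
        if self.status is Status.SENT:
            raise RuntimeError("cannot revise a draft that was already sent")
        self.history.append(self.body)
        self.body = new_body
        self.status = Status.PENDING_REVIEW

    def approve(self) -> None:
        """Step 3, approved: only the human reviewer should trigger this."""
        if self.status is not Status.PENDING_REVIEW:
            raise RuntimeError(f"cannot approve a draft in state {self.status.value}")
        self.status = Status.APPROVED


def send(draft: Draft, deliver: Callable[[str, str], None]) -> None:
    """Step 4: deliver only an approved draft and refuse anything else."""
    if draft.status is not Status.APPROVED:
        raise PermissionError("draft has not been approved")
    deliver(draft.recipient, draft.body)
    draft.status = Status.SENT
```

Note that `revise` always resets the draft to pending review, so a corrected draft needs a fresh approval before it can go out.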

In Hermes, the Send Gate works best as two separate pieces. The first is a standing rule in project memory. The second is the cronjob or task that runs under that rule. If project memory still feels abstract, my article on infinite memory explains why these standing rules matter.

Example 1: Save this in Hermes memory

Treat this as a starter template, not a fixed script. You may need to change the approval words, the delivery channel, or the types of content it covers based on how Hermes is set up in your project.

You are my Content Assistant. Your job is to draft content for review.

When I give you a content request (email, social post, client response):

1. Draft the content based on my instructions and the project brief.
2. Present the draft clearly labeled "DRAFT [FOR REVIEW]".
3. Don't send, publish, or share the content anywhere.
4. Wait for my approval or my requested changes.
5. If I request changes, apply them and present the revised draft.
6. Only when I explicitly say "approved" or "send it", take the final action.

Always include a brief note at the end explaining what you did and why.

Example 2: Use this as a Hermes cronjob

This works best when Hermes already has access to the inputs it needs, such as meeting notes, a calendar, or a project brief, and already knows where to send drafts back to you. You may need to change the schedule, the source it reads from, or the format of the output to fit your workflow.

Every weekday at 9:00 AM, review yesterday's meeting notes and draft any follow-up emails that need to be sent.

Present each email as "DRAFT [FOR REVIEW]".
Do not send anything automatically.
Wait for my approval before any email goes out.

If there are no follow-ups to draft, tell me that no action is needed today.

If you want to see cronjobs in action before you build this one, the Hermes morning briefing workflow shows a complete example.

Watch Out: If your agent has tool access that lets it send emails or post to social media directly, make sure the prompt explicitly forbids using those tools until you have approved the draft. The approval step must be the only path to external action.
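A prompt rule is an instruction, so where your setup lets you wrap tools in code, it can help to enforce the same rule there as well. The sketch below is plain Python under that assumption and does not use any Hermes API: `gated`, `approved_ids`, `ApprovalRequired`, and `send_email` are hypothetical names. It shows a tool that can only run with a request ID you have approved, and each approval is good for one call.

```python
import functools
from typing import Callable, Set


class ApprovalRequired(Exception):
    """Raised when a gated tool is called without a matching approval."""


# Hypothetical approval store: only the human-facing approval step adds IDs here.
approved_ids: Set[str] = set()


def gated(tool: Callable) -> Callable:
    """Wrap a tool so it runs only for an approved request ID."""
    @functools.wraps(tool)
    def wrapper(request_id: str, *args, **kwargs):
        if request_id not in approved_ids:
            raise ApprovalRequired(f"request {request_id!r} has not been approved")
        approved_ids.discard(request_id)  # one approval covers one action
        return tool(*args, **kwargs)
    return wrapper


@gated
def send_email(to: str, body: str) -> str:
    # Placeholder for the real send; returns a confirmation string instead.
    return f"sent to {to}"
```

With this shape, the agent can call `send_email` as often as it likes, but nothing leaves until the matching ID has been added by your approval.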

If you’re still wiring up Hermes tools, memory, and integrations, the Hermes setup guide covers the stack behind workflows like this.

Gate #2: The Change Gate (Level Up)

Once your Send Gate works, add protection against unwanted system changes. This gate matters when your agent interacts with files, databases, or any system where a bad edit breaks something real.

The agent identifies what needs to change: which record, which field, and what the new value should be. Vague requests like “update the database” fail this gate.

The agent shows you the proposed change with full context: current state, new state, why the change is needed, and what happens if the change goes wrong.

You approve or reject. If you reject, the change never happens. If you approve, the agent executes it and confirms the result.

The rollback plan is simple. If a change causes problems, you tell the agent to reverse it. Because the proposal recorded the current state before anything changed, the agent has what it needs to restore that state on request.

Use this prompt for a research agent that updates your knowledge base:

When you find information that should update the project knowledge base:

1. Show me the proposed change with this format:
   - Current value: [what exists now]
   - Proposed value: [what you want to change it to]
   - Reason: [why this change is needed]
   - Source: [where you found this]

2. Wait for my approval before making any changes.
3. If I approve, make the change and confirm what was updated.
4. If I reject, don't make the change. Log the rejection in the project notes.

Never modify files, databases, or project memory without going through this process first.

This takes about 20 minutes on top of your Send Gate. The time investment is worth it the first time your agent wants to overwrite a file with outdated information.
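To make the propose, approve, and roll back cycle concrete, here is a minimal sketch in plain Python. It assumes the knowledge base is a simple dictionary, which is an illustration only; `ChangeProposal`, `propose`, `apply_change`, and `rollback` are hypothetical names, not part of Hermes. The proposal stores the current value up front, and that stored value is what makes the rollback possible.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

_MISSING = object()  # marks "key did not exist" so rollback can delete it again


@dataclass(frozen=True)
class ChangeProposal:
    key: str
    current: Any   # value before the change, captured at proposal time
    proposed: Any  # value the agent wants to write
    reason: str
    source: str


def propose(store: Dict[str, Any], key: str, proposed: Any,
            reason: str, source: str) -> ChangeProposal:
    """Build the proposal shown to the reviewer; nothing is modified here."""
    return ChangeProposal(key, store.get(key, _MISSING), proposed, reason, source)


def apply_change(store: Dict[str, Any], p: ChangeProposal,
                 approved: bool, log: List[str]) -> bool:
    """Write the change only if approved; log rejections instead."""
    if not approved:
        log.append(f"REJECTED {p.key}: {p.reason}")
        return False
    if store.get(p.key, _MISSING) != p.current:
        raise RuntimeError("store changed since the proposal; propose again")
    store[p.key] = p.proposed
    log.append(f"APPLIED {p.key}")
    return True


def rollback(store: Dict[str, Any], p: ChangeProposal, log: List[str]) -> None:
    """Restore the value recorded in the proposal."""
    if p.current is _MISSING:
        store.pop(p.key, None)
    else:
        store[p.key] = p.current
    log.append(f"ROLLED BACK {p.key}")
```

The staleness check in `apply_change` is a design choice of this sketch: if the data changed between proposal and approval, the approval no longer describes what would actually happen.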

Gate #3: The Spend Gate (Advanced)

This gate protects your wallet. AI agents can run API calls, subscribe to tools, and make purchases if you give them access. Without a spend gate, a runaway loop of API calls can cost hundreds before you notice.

The setup relies on spending thresholds in your project memory. Set a dollar limit that matches your comfort level, whether that’s $5 per transaction or $50. Pick the number that lets you sleep well.

Your agent estimates the cost before any paid action. Below your threshold, it proceeds automatically. Above it, it pauses, shows you the estimate, and waits for approval.

Add this to your prompt:

Before taking any action that costs money (API calls, tool purchases, subscriptions):

1. Estimate the cost.
2. If the cost is below $10, proceed automatically and log the expense.
3. If the cost is $10 or above, pause and show me:
   - What you want to do
   - Why it is needed
   - Estimated cost
   - Free alternatives, if any
4. Wait for my approval before proceeding with actions that cost $10 or more.

Keep a running total of all expenses in the project notes.

Adjust the threshold to your needs. The point is to keep you informed when the agent is about to spend money that matters. This takes about 15 minutes on top of the others. If your agent doesn’t have spending access, skip this one.
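The threshold rule from the prompt can be written as a few lines of Python. This is a sketch under stated assumptions: the `SpendGate` class and its method names are hypothetical, the $10 default mirrors the example prompt, and cost estimates are assumed to arrive as `Decimal` amounts in dollars.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Tuple


@dataclass
class SpendGate:
    threshold: Decimal = Decimal("10")  # dollars; matches the example prompt
    ledger: List[Tuple[str, Decimal]] = field(default_factory=list)

    def needs_approval(self, estimate: Decimal) -> bool:
        """At or above the threshold, the agent must pause and ask."""
        return estimate >= self.threshold

    def record(self, description: str, estimate: Decimal,
               approved: bool = False) -> bool:
        """Log the expense and return True if the action may proceed."""
        if self.needs_approval(estimate) and not approved:
            return False  # paused: show the estimate and wait for approval
        self.ledger.append((description, estimate))
        return True

    @property
    def total(self) -> Decimal:
        """Running total of every expense that went ahead."""
        return sum((amount for _, amount in self.ledger), Decimal("0"))
```

A per-action threshold does not stop many small charges on its own, which is why the running total is worth checking from time to time.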

To Gate or Not to Gate

You can’t gate every action. If you do, your agent becomes a slow typist that asks permission before every keystroke. At that point, you might as well do the work yourself.

Low risk, high volume. No gate needed. File organization, summarization, categorization, formatting. The worst thing that happens is a slightly messy summary, and you fix that in seconds.

Medium risk, moderate volume. Review gate. Draft emails, content suggestions, data analysis. The agent produces the work, you review it before it goes anywhere. The Send Gate handles this category.

High risk, low volume. Full gate. External communications, system changes, spending. The agent pauses, explains what it wants to do, and waits for explicit approval. All three gates cover this category.

To apply this to your own workflows, list every task your agent handles. Write down the worst thing that could go wrong. If the mistake costs you money, damages your reputation, or breaks a system, gate it.

If the mistake is annoying but easy to fix, let it run free and correct course when needed.
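The worst-case test above maps neatly onto the three gates, since each one protects your reputation, your data, or your wallet. The sketch below is one way to run that audit over a task list; the `Task` fields and the `gates_for` function are hypothetical, and the example tasks are only illustrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    damages_reputation: bool = False  # worst case reaches someone outside
    breaks_system: bool = False       # worst case corrupts files or data
    costs_money: bool = False         # worst case shows up on a bill


def gates_for(task: Task) -> List[str]:
    """Return the gates a task needs; an empty list means let it run free."""
    gates = []
    if task.damages_reputation:
        gates.append("Send Gate")
    if task.breaks_system:
        gates.append("Change Gate")
    if task.costs_money:
        gates.append("Spend Gate")
    return gates


tasks = [
    Task("summarize meeting notes"),
    Task("email follow-ups to clients", damages_reputation=True),
    Task("update knowledge base", breaks_system=True),
    Task("buy extra API credits", costs_money=True),
]

for task in tasks:
    print(f"{task.name}: {', '.join(gates_for(task)) or 'no gate'}")
```

Filling in the three flags for each task is the real work; the lookup itself is trivial.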

The Three Mistakes People Make

Most failures come from one of these three patterns.

Gating everything. This turns your AI into a slow typist. You spend an hour approving every sentence and paragraph, then realize you could have done the work yourself. The fix: apply the decision framework above. Gate only what needs gating.

Gating nothing. You give the agent full autonomy on day one. Then it sends a wrong email to a client, overwrites a production file, or racks up unexpected charges, and you stop using the agent entirely. The fix: start with the Send Gate. Add the others as your trust grows.

Gating without context. A vague approval request forces you to dig through the agent’s reasoning to figure out if the change is safe. The fix: require the agent to show the current state, the proposed change, and the reason for the change. A good gate gives you everything you need to decide in under 10 seconds.

Trust the Process but Keep the Net

You stop babysitting and stop opening every draft with a knot in your stomach. You give your agent a task, trust the process, and review the output at the checkpoint. Most of the time, you approve it without changes. Occasionally, you catch something and fix it. Either way, the work moves forward.

The trick is to start tight and loosen up over time. In week one, you review every draft. By month two, if you haven’t caught a mistake in weeks, you move the Send Gate to sample mode: review every third draft, trust the rest. The gate stays in place, but you use it less.
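Sample mode can be expressed as a single setting. In this small sketch, `needs_review` and `review_every` are hypothetical names: a value of 1 reproduces week one, where every draft is reviewed, and a value of 3 reviews every third draft.

```python
def needs_review(draft_number: int, review_every: int = 1) -> bool:
    """Return True when this draft (counted from 1) should go to the human."""
    if draft_number < 1 or review_every < 1:
        raise ValueError("draft_number and review_every must be 1 or greater")
    return draft_number % review_every == 0


# Week one: every draft is reviewed.
week_one = [n for n in range(1, 7) if needs_review(n)]

# Month two: only every third draft is reviewed.
month_two = [n for n in range(1, 7) if needs_review(n, review_every=3)]
```

Tightening the gate again after a mistake is just a matter of setting the value back to 1.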

That’s the goal. Reduce gates as the system matures. Real delegation is only possible when you know the safety net works. Once you see the net catching problems, you can let the agent fly higher.

Building the right gates is all delegation requires. Once they’re in place, you can let the agent run.

DEV Community