I've been running automation agents for a while now. Most work fine hands-off. But one of them - my project status agent - kept making decisions that felt right in isolation but wrong in context.
So I added a human approval step. Not a "review and confirm" UI widget. An actual veto gate in the workflow itself: the agent drafts the action, pauses, and waits for my explicit go-ahead before doing anything irreversible.
Here's what I didn't expect: the first thing to break wasn't the agent logic. It was my own habits.
The Problem I Was Trying to Solve
My status reporting agent does three things: pulls data from Jira, formats a weekly PM summary, and posts it to the team Slack channel. Straightforward automation.
Except twice in three months it posted something embarrassing. Once it included a stale blocker that had been resolved 48 hours prior. Once it flagged a team member's ticket as overdue when they'd actually shipped early and the tracker hadn't caught up.
Neither was catastrophic. Both were awkward.
The traditional fix would be "add better logic to catch these cases." I tried that. Added freshness checks, added resolved-status validation. Edge cases still leaked through.
So I took a different approach: make human review a structural part of the workflow, not a safety net I bolt on when things go wrong.
What I Actually Built
The architecture is boring. Agent generates the Slack message draft, posts to a private review channel, waits for a thumbs-up emoji reaction, only then posts to the team channel.
If there's no reaction within 2 hours, it pings me directly and kills the task. No silent failures.
The human approval isn't optional and it isn't a fallback. It's a required step in the sequence. The workflow can't progress without it.
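A minimal sketch of that sequence in Python. It assumes the Slack side is hidden behind four injected callables (`post_for_review`, `get_reaction`, `publish`, `notify_owner`). These are hypothetical names, not Slack SDK calls. It also assumes the thumbs-up arrives as the reaction name `"+1"`. The clock and sleep functions are injectable so the timeout path can be exercised without actually waiting two hours.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    published: bool
    reason: str


def run_with_approval_gate(
    draft: str,
    post_for_review: Callable[[str], str],         # hypothetical: posts to private channel, returns message id
    get_reaction: Callable[[str], Optional[str]],  # hypothetical: reaction on that message, or None
    publish: Callable[[str], None],                # hypothetical: posts to the team channel
    notify_owner: Callable[[str], None],           # hypothetical: DMs the human approver
    timeout_s: float = 2 * 60 * 60,                # the original 2-hour window
    poll_interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> GateResult:
    """Publish only after explicit approval; on timeout, alert the owner and stop."""
    review_id = post_for_review(draft)
    deadline = clock() + timeout_s
    while clock() < deadline:
        if get_reaction(review_id) == "+1":  # assumed name of the thumbs-up reaction
            publish(draft)
            return GateResult(published=True, reason="approved")
        sleep(poll_interval_s)
    # No silent failures: the task is killed, but someone is told.
    notify_owner(f"Review {review_id} got no approval in time; task killed.")
    return GateResult(published=False, reason="timeout")
```

The property that matters is that `publish` is only reachable through the approved branch. No code path posts to the team channel without the reaction.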
I got the idea from reading about Microsoft Conductor, which open-sources a similar pattern for multi-agent orchestration. Human approval as a default workflow step, not a retrofit. Their framing stuck with me: designed-in, not bolted-on.
What Actually Broke
I expected the agent to break. It didn't.
I expected to approve everything within 5 minutes. I did, mostly.
What I didn't expect: I stopped trusting my own review. The first week, I read every draft carefully. By week three, I was rubber-stamping. My brain had offloaded judgment to "well the agent probably got it right." The approval gate existed, but the actual human review stopped happening.
This is the invisible failure mode nobody talks about. You add a human step. The human shows up. But they're not really there.
The fix was embarrassingly simple: I added friction. Required a comment, not just an emoji. Had to type at least one word before the workflow could advance. Stupid? Maybe. Effective? Completely.
Turns out the veto gate only works if the human has to engage to use it.
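The friction rule fits in a few lines. This sketch assumes approvals arrive as a hypothetical `ReviewResponse` holding the reaction name and any comment typed in the review thread. The gate advances only when both are present. `min_words` is an illustrative knob for raising the bar later, not something from the original setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewResponse:
    reaction: Optional[str]  # e.g. "+1" (assumed reaction name)
    comment: Optional[str]   # free-text reply in the review thread, if any


def is_engaged_approval(resp: ReviewResponse, min_words: int = 1) -> bool:
    """An emoji alone doesn't count; the reviewer must also type something."""
    if resp.reaction != "+1":
        return False
    # str.split() with no argument discards whitespace, so "   " yields zero words.
    return len((resp.comment or "").split()) >= min_words
```

Word count is a crude proxy for attention. The point is only that approving now costs a deliberate action rather than a reflexive click.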
Three Things I Didn't Know I Needed
1. Escalation hooks, not just approval gates.
Not all decisions are equal. Minor formatting choices don't need a veto. Anything that posts externally or modifies data does. I ended up building a simple severity classifier: low = auto-approve, medium = soft review prompt, high = hard gate. Saved probably 70% of the friction without sacrificing coverage where it mattered.
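A sketch of a three-tier classifier along those lines. The post says external posts and data changes are high and minor formatting is low, but it doesn't spell out what lands in medium. Here "visible to other people, but neither external nor data-modifying" is assumed as the medium criterion. The field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "auto-approve"
    MEDIUM = "soft review prompt"
    HIGH = "hard gate"


@dataclass(frozen=True)
class ProposedAction:
    description: str
    posts_externally: bool = False
    modifies_data: bool = False
    seen_by_others: bool = False  # assumed criterion for the middle tier


def classify(action: ProposedAction) -> Severity:
    """Map a proposed agent action to how much human review it gets."""
    if action.posts_externally or action.modifies_data:
        return Severity.HIGH
    if action.seen_by_others:
        return Severity.MEDIUM
    return Severity.LOW
```

Checking the high-risk conditions first means an action that is both cosmetic and external still gets the hard gate.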
2. A timeout that fails loudly.
My 2-hour window was too long. If I was in back-to-back meetings, the agent just sat there waiting. I switched to 30 minutes with an escalating Slack ping. Now, if I haven't approved a draft, I can't miss the reminder.
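One way to make the shorter window escalate rather than just expire is a fixed ladder of increasingly direct pings, checked on each poll. The 10/20/30-minute steps and their wording are assumptions for illustration. The post only specifies the 30-minute window and that the ping escalates.

```python
from typing import List, Tuple

# Hypothetical ladder inside the 30-minute window: (minutes since draft, action).
ESCALATION_LADDER: List[Tuple[int, str]] = [
    (10, "reminder in review channel"),
    (20, "direct message to approver"),
    (30, "final direct message, then kill task"),
]
WINDOW_MIN = 30


def due_escalations(elapsed_min: float, already_sent: int) -> List[str]:
    """Steps that have come due but haven't been sent yet, oldest first."""
    due = [action for minute, action in ESCALATION_LADDER if elapsed_min >= minute]
    return due[already_sent:]


def should_kill(elapsed_min: float) -> bool:
    """True once the approval window has fully elapsed."""
    return elapsed_min >= WINDOW_MIN
```

A poller would call `due_escalations(elapsed, sent)` on each tick, send whatever comes back, add its length to `sent`, and stop the task once `should_kill` returns true. Slicing by `already_sent` means a late poll that skips past two thresholds still sends both steps, in order.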
3. A clear distinction between "irreversible" and "annoying-to-undo."
I started gating everything. Caught myself adding a veto to the agent that sends me my own daily brief. Nobody else sees that. No irreversible action involved. Human gate added zero value there, only friction.
The useful mental model: if I had to undo this action at 11pm on a Friday, would I care? Yes = human gate. No = let it run.
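Written down as a rule, the 11pm test reduces to two questions per action. The field names are hypothetical. Treating "other people see it" as effectively irreversible is an assumption: a Slack message can be deleted, but it can't be unread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionProfile:
    name: str
    can_be_undone: bool    # can it actually be reverted later?
    others_affected: bool  # does anyone besides me see it or act on it?


def needs_human_gate(a: ActionProfile) -> bool:
    """The '11pm on a Friday' test: gate it only if undoing it would matter.

    Irreversible actions and anything other people see are gated; things that
    are merely annoying to undo and only affect me run unattended.
    """
    return (not a.can_be_undone) or a.others_affected
```

Under this rule, the team status post gets a gate and the personal daily brief does not, which matches the two cases above.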
Why This Matters More Than It Looks
Most of the "AI safety" conversation in enterprise is about governance frameworks and audit trails. That's real. But the practical engineering question is simpler:
Which steps in your agent workflow require a human in the loop by design, not by accident?
Design it from the start and the workflow is reliable. Bolt it on after an incident and you're playing catch-up forever.
What stood out to me about the Microsoft Conductor open-source release wasn't the code but the default: human approval ON unless you opt out. Most agent frameworks do the opposite. They default to autonomous and assume you'll add guardrails when you need them.
I think that's backwards. Especially for agents touching anything external: posting, sending, modifying.
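The "on unless you opt out" default is cheap to reproduce in your own step definitions. This is a hypothetical config shape, not Conductor's actual schema. Approval defaults to required, and turning it off demands a written reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepConfig:
    name: str
    requires_approval: bool = True  # on by default: autonomy is the opt-in
    opt_out_reason: str = ""

    def __post_init__(self) -> None:
        # Opting out should be a recorded decision, not an omission.
        if not self.requires_approval and not self.opt_out_reason.strip():
            raise ValueError(f"step {self.name!r} skips approval without a reason")
```

For example, `StepConfig("send_daily_brief", requires_approval=False, opt_out_reason="only I see it")` is accepted. The same step without a reason fails at construction time, before the agent ever runs.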
Where I'm Landing
The veto gate has been running about 6 weeks now. Two incidents caught before they shipped. One case where my review actually improved the draft - not just filtered a bad one. About 3 minutes of daily overhead.
Worth it. But only because I designed the friction deliberately. The version without the comment requirement was almost worse than no gate at all - it gave me false confidence in an approval that wasn't really happening.
If you're building agent workflows that touch anything irreversible, the question I'd ask first: what happens if this runs at 2am and you're asleep? Whatever you wouldn't want to explain the next morning - that's where your human gate goes.
If any of this maps to workflows you're building, I'm curious what the "irreversible action" problem looks like on your end.