Mykola Kondratiuk

Posted on May 29

I Added a Human Veto to My PM Agent — Here's What Broke First

#ai #vibecoding #agents #discuss

The rubber-stamping trap in manual reviews

I've been running automation agents for a while now. Most work fine hands-off. But one of them - my project status agent - kept making decisions that felt right in isolation but wrong in context.

So I added a human approval step. Not a "review and confirm" UI widget. An actual veto gate in the workflow itself: the agent drafts the action, pauses, and waits for my explicit go-ahead before doing anything irreversible.

Here's what I didn't expect: the first thing to break wasn't the agent logic. It was my own habits.

The Problem I Was Trying to Solve

My status reporting agent does three things: pulls data from Jira, formats a weekly PM summary, and posts it to the team Slack channel. Straightforward automation.

Except twice in three months it posted something embarrassing. Once it included a stale blocker that had been resolved 48 hours prior. Once it flagged a team member's ticket as overdue when they'd actually shipped early and the tracker hadn't caught up.

Neither was catastrophic. Both were awkward.

The traditional fix would be "add better logic to catch these cases." I tried that. Added freshness checks, added resolved-status validation. Still leaked edge cases.

So I took a different approach: make human review a structural part of the workflow, not a safety net I bolt on when things go wrong.

What I Actually Built

The architecture is boring. The agent generates the Slack message draft, posts it to a private review channel, waits for a thumbs-up emoji reaction, and only then posts to the team channel.

If no reaction within 2 hours, it pings me directly and kills the task. No silent failures.

The human approval isn't optional and it isn't a fallback. It's a required step in the sequence. The workflow can't progress without it.
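A minimal sketch of that sequence, assuming the Slack calls are injected as plain callables. Every name here (`run_veto_gate`, `post_for_review`, `is_approved`, `publish`, `notify_owner`) is hypothetical, not the code behind this post. The 2-hour default comes from the description above. The clock and sleep functions are parameters so the timeout path can be exercised without actually waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    posted: bool
    reason: str


def run_veto_gate(
    draft: str,
    post_for_review: Callable[[str], str],  # hypothetical: posts the draft, returns a review id
    is_approved: Callable[[str], bool],     # hypothetical: e.g. checks for a thumbs-up reaction
    publish: Callable[[str], None],         # hypothetical: posts to the team channel
    notify_owner: Callable[[str], None],    # hypothetical: direct ping to the reviewer
    timeout_s: float = 2 * 60 * 60,
    poll_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> GateResult:
    """Publish only after explicit approval; on timeout, ping the owner and stop."""
    review_id = post_for_review(draft)
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_approved(review_id):
            publish(draft)
            return GateResult(posted=True, reason="approved")
        sleep(poll_s)
    # No approval in time: fail loudly and never publish by default.
    notify_owner(f"Review {review_id} timed out; task cancelled.")
    return GateResult(posted=False, reason="timeout")
```

The design point is that `publish` is only reachable through the approval branch. There is no code path where a timeout or an error falls through to posting.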

I got the idea from reading about Microsoft Conductor, whose open-source release uses a similar pattern for multi-agent orchestration: human approval as a default workflow step, not a retrofit. Their framing stuck with me: designed-in, not bolted-on.

What Actually Broke

I expected the agent to break. It didn't.

I expected to approve everything in under 5 minutes. I did, mostly.

What I didn't expect: I stopped trusting my own review. The first week, I read every draft carefully. By week three, I was rubber-stamping. My brain had offloaded judgment to "well the agent probably got it right." The approval gate existed, but the actual human review stopped happening.

This is the invisible failure mode nobody talks about. You add a human step. The human shows up. But they're not really there.

The fix was embarrassingly simple: I added friction. Required a comment, not just an emoji. Had to type at least one word before the workflow could advance. Stupid? Maybe. Effective? Completely.

Turns out the veto gate only works if the human has to engage to use it.
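The comment requirement can be sketched as a small check, assuming the gate receives the reviewer's reply as a string. The function name `approval_counts`, the `FILLER` word list, and the optional `reject_filler` switch are all illustrative assumptions. The rule described above is only "at least one word"; the stricter mode is an extra.

```python
import re
from typing import Optional

# Hypothetical list of low-effort replies; tune or drop as needed.
FILLER = frozenset({"approved", "approve", "noted", "ok", "lgtm"})


def approval_counts(comment: Optional[str], reject_filler: bool = False) -> bool:
    """Return True only if the reviewer typed at least one actual word.

    A bare emoji or an empty reply does not count. With reject_filler=True,
    replies made up entirely of filler words are rejected as well.
    """
    words = re.findall(r"\w+", comment or "")
    if not words:
        return False
    if reject_filler and all(w.lower() in FILLER for w in words):
        return False
    return True
```

With the default settings a one-word reply passes, which matches the friction described here. The stricter mode is one possible way to raise the bar if one-word replies turn into a habit.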

Three Things I Didn't Know I Needed

1. Escalation hooks, not just approval gates.

Not all decisions are equal. Minor formatting choices don't need a veto. Anything that posts externally or modifies data does. I ended up building a simple severity classifier: low = auto-approve, medium = soft review prompt, high = hard gate. That cut probably 70% of the friction without sacrificing coverage where it mattered.

2. A timeout that fails loudly.

My 2-hour window was too long. If I was in back-to-back meetings, the agent just hung. Switched to 30 minutes with an escalating Slack ping. Now if I haven't approved it, I can't miss it.

3. A clear distinction between "irreversible" and "annoying-to-undo."

I started gating everything. Caught myself adding a veto to the agent that sends me my own daily brief. Nobody else sees that. No irreversible action involved. A human gate added zero value there, only friction.

The useful mental model: if I had to undo this action at 11pm on a Friday, would I care? Yes = human gate. No = let it run.
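The three points above could be sketched roughly as follows. This is not the classifier from this post. The `Action` fields, the rule that "annoying to undo" maps to a soft prompt, and the halving reminder schedule are all assumptions chosen for illustration. Only the three tiers and the 30-minute window come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Gate(Enum):
    AUTO_APPROVE = "auto-approve"       # low severity
    SOFT_REVIEW = "soft review prompt"  # medium severity
    HARD_GATE = "hard gate"             # high severity


@dataclass(frozen=True)
class Action:
    name: str
    posts_externally: bool = False
    modifies_data: bool = False
    annoying_to_undo: bool = False  # assumption: reversible, but costs someone time


def choose_gate(action: Action) -> Gate:
    """Map an action to a gate. The thresholds are illustrative assumptions."""
    if action.posts_externally or action.modifies_data:
        return Gate.HARD_GATE
    if action.annoying_to_undo:
        return Gate.SOFT_REVIEW
    return Gate.AUTO_APPROVE


def reminder_offsets(window_s: int = 1800, first_gap_s: int = 900,
                     min_gap_s: int = 300) -> List[int]:
    """Seconds after the draft is posted at which to ping the reviewer.

    Gaps halve after each ping (down to min_gap_s), so reminders get more
    frequent as the window closes. This schedule is an assumption, not the
    one used in this post.
    """
    offsets: List[int] = []
    elapsed, gap = 0, first_gap_s
    while elapsed + gap < window_s:
        elapsed += gap
        offsets.append(elapsed)
        gap = max(min_gap_s, gap // 2)
    return offsets
```

Under these assumptions the weekly team summary lands on the hard gate, the private daily brief auto-approves, and a 30-minute window produces pings at 15, 22.5, and 27.5 minutes.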

Why This Matters More Than It Looks

Most of the "AI safety" conversation in enterprise is about governance frameworks and audit trails. That's real. But the practical engineering question is simpler:

Which steps in your agent workflow require a human in the loop by design, not by accident?

Design it from the start and the workflow is reliable. Bolt it on after an incident and you're playing catch-up forever.

The Microsoft Conductor open-source release was notable to me not for the code but for the default: human approval ON unless you opt out. Most agent frameworks do the opposite. They default to autonomous and assume you'll add guardrails when you need them.

I think that's backwards. Especially for agents touching anything external: posting, sending, modifying.
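As a rough illustration of "approval on unless you opt out", here is a hypothetical step definition where the safe value is the default. This is not Conductor's API or any real framework's; `Step`, `requires_approval`, and `steps_needing_review` are made-up names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Step:
    name: str
    # Safe default: every step is gated unless someone explicitly opts out.
    requires_approval: bool = True


def steps_needing_review(steps: Iterable[Step]) -> List[str]:
    """Names of the steps that must pause for a human before running."""
    return [s.name for s in steps if s.requires_approval]


workflow = [
    Step("post_summary_to_team_channel"),                # gated by default
    Step("send_own_daily_brief", requires_approval=False),  # explicit opt-out
]
```

The opt-out shows up in the workflow definition itself, so skipping review is a visible decision rather than an omission.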

Where I'm Landing

The veto gate has been running about 6 weeks now. Two incidents caught before they shipped. One case where my review actually improved the draft - not just filtered a bad one. About 3 minutes of daily overhead.

Worth it. But only because I designed the friction deliberately. The version without the comment requirement was almost worse than no gate at all - it gave me false confidence in an approval that wasn't really happening.

If you're building agent workflows that touch anything irreversible, the question I'd ask first: what happens if this runs at 2am and you're asleep? Whatever you wouldn't want to explain the next morning - that's where your human gate goes.

If any of this maps to workflows you're building, curious what the "irreversible action" problem looks like on your end.

Top comments (9)

Mykola Kondratiuk • May 29

the case where rubber-stamping kills the gate is the one i didn't think to test. week one you're reading every draft. week three you're clicking approve before you've even opened the preview. curious if others have hit the same problem - what made the review feel worth doing again?

Om Shree • May 31

Great Work!

Mykola Kondratiuk • May 31

appreciate it - the rubber-stamp failure hit harder than expected, week 3 approval patterns are really hard to anticipate until they happen

Om Shree • May 31

Exactly,
But being hard to anticipate is what makes it more fun!

Mykola Kondratiuk • May 31

haha yeah, the unpredictability is kind of the whole point when you're stress-testing it. every weird edge case it finds is basically a free test you didn't write.

Om Shree • May 31

True

Mykola Kondratiuk • May 31

yep - every weird failure is just the test suite doing its job.

shogun 444 • May 30

The rubber-stamping point is the most interesting part of this.

Adding a human approval step sounds like a safety improvement, but if the human stops actively reviewing, you've just created the illusion of oversight. The requirement to leave a comment is a clever way to force engagement instead of passive approval.

"Human in the loop" only works when the human is actually in the loop.

Mykola Kondratiuk • May 30

yeah the mandatory comment helps but it's gameable too. we had people typing 'approved' or 'noted' just to clear the queue. what actually slowed the rubber-stamping was making the gate show the delta from the previous version - people felt dumb approving without reading when the diff was right there.
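One way to show that delta, sketched with Python's standard `difflib`. It assumes the previous and current drafts are available as plain strings; `review_delta` is a hypothetical helper, not the gate described in this thread.

```python
import difflib


def review_delta(previous: str, current: str) -> str:
    """Unified diff between the last approved draft and the new one.

    Returns an empty string when nothing changed.
    """
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)
```

Posting this output next to the draft puts the changed lines, prefixed with `-` and `+`, in front of the reviewer before they approve.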