DEV Community

Cover image for I stopped trying to make my agent fully autonomous and made it ask my phone first
Lars Winstand
Lars Winstand

Posted on • Originally published at standardcompute.com

I stopped trying to make my agent fully autonomous and made it ask my phone first

I stopped trying to make my agent fully autonomous and made it ask my phone first

The safest pattern I’ve found for agent workflows is not full autonomy.

It’s a pause-before-action approval step on your phone.

If the agent wants to charge a card, delete records, send a customer message, or change an account setting, it has to stop and ask first.

For custom Python agents, LangGraph already gives you this with interrupt().

For automation teams, n8n can do almost the same thing with a Wait node and a unique $execution.resumeUrl per run.

That one design choice has done more for production safety than any prompt tweak I’ve tried.

The problem with “fully autonomous” agents

A lot of agent demos look great right up until money or external side effects show up.

Draft a reply? Fine.

Summarize tickets? Fine.

Pull context from Notion, HubSpot, Linear, Gmail, and Slack? Great.

But the second the agent wants to do one of these, the mood changes:

  • send money
  • delete data
  • email a customer
  • change billing settings
  • publish externally
  • update production systems

That’s not a toy workflow anymore.

That’s an irreversible action.

And when teams get burned, it usually isn’t because GPT-5, Claude Opus, Grok, or Llama suddenly became useless.

It’s because the workflow had no clean stop condition.

The pattern: ask a human right before the side effect

I like this pattern because it’s boring.

Boring is good when the alternative is “the agent deleted the wrong records at 2:14 AM.”

The idea is simple:

  1. Let the agent do the expensive reasoning.
  2. Let it gather context, draft the action, and prepare the payload.
  3. Pause right before the risky tool call.
  4. Send an approval request to a human.
  5. Resume only if approved.

That gives you a useful middle ground:

  • not full autonomy
  • not useless read-only agents
  • not “hope the evals catch it”

Just a hard boundary before damage can happen.

LangGraph makes this surprisingly clean

If you’re building custom agents, LangGraph already has the right primitive: interrupt().

Minimal example:

from langgraph.types import interrupt

def approval_node(state):
    approved = interrupt({
        "action": "refund_charge",
        "customer_id": state["customer_id"],
        "amount": state["amount"],
        "reason": state["reason"]
    })
    return {"approved": approved}
Enter fullscreen mode Exit fullscreen mode

That pauses execution and waits for a human to resume it.

The important part is that LangGraph can persist state, so this is not a hacky sleep loop.

What you actually need for this to work

You need durable state and a thread ID.

Example:

config = {
    "configurable": {
        "thread_id": "refund-ord-123"
    }
}
Enter fullscreen mode Exit fullscreen mode

And your interrupt payload should be JSON-serializable if you want to send it to:

  • a mobile app
  • Slack
  • Telegram
  • a custom approval page
  • SMS + webhook flows

A more realistic sketch:

from langgraph.types import interrupt

def risky_action_gate(state):
    approval = interrupt({
        "type": "approval_required",
        "action": "change_billing_email",
        "account_id": state["account_id"],
        "before": state["current_email"],
        "after": state["proposed_email"],
        "requested_by": state["requested_by"]
    })

    if not approval.get("approved"):
        return {"status": "rejected"}

    return {"status": "approved"}
Enter fullscreen mode Exit fullscreen mode

Then your external approval handler can resume the graph after the human taps approve.

Why interrupt() changes the behavior of the whole agent

Before interrupt(), the agent is basically trusted to make the final call.

After interrupt(), the agent becomes a preparer.

That’s a much better role for LLMs in risky workflows.

Let the model:

  • gather context
  • decide what action it thinks should happen
  • build the draft payload
  • explain why
  • show a diff

But let a human own the irreversible yes/no.

That split is practical.

n8n can do this without custom agent infrastructure

This is the part more teams should pay attention to.

You do not need to build a full agent runtime to get this pattern.

In n8n, use the Wait node.

For approvals, the useful mode is:

  • On Webhook Call

The workflow can prepare the action, hit the Wait node, and send the unique {{$execution.resumeUrl}} somewhere a human can tap from their phone.

Examples:

  • Slack message with Approve / Reject buttons
  • Telegram bot message
  • email with approval link
  • internal mobile-friendly approval page

n8n approval flow example

A practical flow looks like this:

  1. Trigger from support ticket / CRM event / webhook
  2. Use OpenAI-compatible chat model to analyze the request
  3. Build a proposed action
  4. If action is risky, route to Wait node
  5. Send {{$execution.resumeUrl}} to approver
  6. Resume only on approval
  7. Execute Stripe / HubSpot / database / email action

The nice part is that each execution gets its own unique resume URL.

That means multiple runs can pause safely at the same time.

No weird global state.

No polling mess.

No “which request was this approval for?” confusion.

Example: n8n sends a phone approval before a refund

Imagine this workflow:

  • Customer asks for a refund
  • AI agent reviews order history and policy
  • Workflow calculates recommended refund amount
  • Human gets a phone approval link
  • Stripe refund only happens after approval

The approval message should include real details, not vague summaries.

Good:

Approve refund?
Customer: cus_123
Order: ord_456
Amount: $84.00
Reason: duplicate charge
Destination: Stripe refund to original payment method
Enter fullscreen mode Exit fullscreen mode

Bad:

Approve customer update?
Enter fullscreen mode Exit fullscreen mode

That second version is how people accidentally approve nonsense.

A quick implementation sketch for n8n

You can wire this up with something like:

  • AI node or HTTP Request node for model output
  • IF node to detect risky action
  • Wait node in webhook mode
  • HTTP Request / Slack / Telegram node to send approval link
  • downstream action node after resume

If you’re generating the message in an expression, you might do something like:

Action: refund_charge
Customer: {{$json.customerId}}
Amount: {{$json.amount}}
Reason: {{$json.reason}}
Approve: {{$execution.resumeUrl}}
Enter fullscreen mode Exit fullscreen mode

That’s enough to get a durable human-in-the-loop gate into a real workflow.

Which option should you use?

Option Best use case
LangGraph interrupts Custom Python agents that need precise pause/resume behavior before dangerous tool calls
n8n Wait node approvals Low-code or ops-heavy workflows that need a simple phone-friendly approval step
Slack escalation / human fallback Cases where the model is uncertain and needs review, not necessarily hard approval

My take:

  • If you already build agents in Python, use LangGraph.
  • If your team lives in n8n, use Wait nodes.
  • If the issue is uncertainty rather than danger, use escalation.

But for payments, deletes, sends, and account changes, I’d default to explicit approval.

Yes, this adds friction. That’s the point.

Developers often treat friction like failure.

For risky actions, friction is a feature.

You want a human to see:

  • the exact recipient
  • the amount
  • the records being deleted
  • the before/after diff
  • the destination account
  • the outbound message text

A 5-second approval pause is much cheaper than a 3-hour cleanup.

The trick is selective approval, not approval everywhere.

Do not make a human approve every low-risk classification or summary.

Do make a human approve things that can cost money, affect users, or break systems.

What still goes wrong

This pattern helps a lot, but it doesn’t solve everything.

The biggest failure mode is bad approval UX.

If the phone prompt is vague, the human is still approving blind.

Your approval request should show:

  1. What action will happen
  2. Who or what it affects
  3. Any amount, destination, or recipient
  4. A preview or diff
  5. Enough context to spot something weird
  6. A clear audit trail

If the human is approving a fluffy model-generated summary instead of concrete facts, that is not oversight.

That is just outsourcing the mistake to a smaller screen.

This also changes the economics of agent workflows

There’s another effect people miss.

Once you add a safe approval boundary, teams usually let agents do more real work before the final click.

That means more:

  • planning
  • retries
  • context gathering
  • tool orchestration
  • draft generation
  • verification passes

In other words: more inference.

And that’s exactly where per-token pricing starts to feel annoying.

If your agents are running all day across support, ops, billing, routing, and account workflows, you end up optimizing around cost instead of usefulness.

That’s why flat-rate compute is a much better fit for this style of automation.

With Standard Compute, you can run OpenAI-compatible agent workflows without babysitting token spend every time the agent needs another reasoning pass, another tool call, or another approval-prep step.

For teams building in n8n, Make, Zapier, OpenClaw, or custom stacks, that matters a lot more than people admit.

Concrete rule I’d use in production

Here’s the rule:

  • If the task is reversible and low-risk, automate it.
  • If the task is high-risk and irreversible, pause for approval.
  • If the task is ambiguous, escalate.

That’s it.

Simple rules beat fancy agent philosophy.

Final take

The best pattern I’ve found for risky agent work is not “make the model smarter until trust appears.”

It’s “make the boundary sharper.”

Put the checkpoint right before the side effect.

Let the agent think.

Let the human approve.

If I were designing a production workflow today in LangGraph, n8n, OpenClaw, Make, or Zapier, phone approval for risky actions would be a default primitive.

Not an enterprise add-on.

Not a future enhancement.

A default.

Because that’s the first agent pattern I’ve used that feels like something I’d actually trust on a Tuesday.

Top comments (0)