Lars Winstand

Posted on Jun 11 • Originally published at standardcompute.com

I stopped trying to make my agent fully autonomous and made it ask my phone first

#ai #agents #n8n #python

The safest pattern I’ve found for agent workflows is not full autonomy.

It’s a pause-before-action approval step on your phone.

If the agent wants to charge a card, delete records, send a customer message, or change an account setting, it has to stop and ask first.

For custom Python agents, LangGraph already gives you this with interrupt().

For automation teams, n8n can do almost the same thing with a Wait node and a unique $execution.resumeUrl per run.

That one design choice has done more for production safety than any prompt tweak I’ve tried.

The problem with “fully autonomous” agents

A lot of agent demos look great right up until money or external side effects show up.

Draft a reply? Fine.

Summarize tickets? Fine.

Pull context from Notion, HubSpot, Linear, Gmail, and Slack? Great.

But the second the agent wants to do one of these, the mood changes:

send money
delete data
email a customer
change billing settings
publish externally
update production systems

That’s not a toy workflow anymore.

That’s an irreversible action.

And when teams get burned, it usually isn’t because GPT-5, Claude Opus, Grok, or Llama suddenly became useless.

It’s because the workflow had no clean stop condition.

The pattern: ask a human right before the side effect

I like this pattern because it’s boring.

Boring is good when the alternative is “the agent deleted the wrong records at 2:14 AM.”

The idea is simple:

Let the agent do the expensive reasoning.
Let it gather context, draft the action, and prepare the payload.
Pause right before the risky tool call.
Send an approval request to a human.
Resume only if approved.

That gives you a useful middle ground:

not full autonomy
not useless read-only agents
not “hope the evals catch it”

Just a hard boundary before damage can happen.
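The five steps above can be sketched without any agent framework at all. This is a minimal in-memory illustration, not a production design: ApprovalGate, PendingAction, and issue_refund are hypothetical names, and a real system would persist pending actions somewhere durable.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingAction:
    description: str            # what the human will see on their phone
    execute: Callable[[], str]  # the risky call, deferred until approval


@dataclass
class ApprovalGate:
    """Holds prepared actions until a human approves or rejects them."""
    pending: dict[str, PendingAction] = field(default_factory=dict)

    def request(self, description: str, execute: Callable[[], str]) -> str:
        # Steps 3 and 4: pause before the side effect and hand back a token
        # that the approval message can carry.
        token = uuid.uuid4().hex
        self.pending[token] = PendingAction(description, execute)
        return token

    def resolve(self, token: str, approved: bool) -> str:
        # Step 5: pop() makes each token single-use; an unknown or reused
        # token raises KeyError instead of running anything.
        action = self.pending.pop(token)
        if not approved:
            return "rejected"
        return action.execute()


refunds_issued: list[str] = []


def issue_refund() -> str:
    # Hypothetical stand-in for the real side effect.
    refunds_issued.append("ord_456")
    return "executed"


gate = ApprovalGate()
token = gate.request("Refund $84.00 for ord_456", issue_refund)
# Nothing has run yet; the refund only happens once a human resolves the token.
outcome = gate.resolve(token, approved=True)
```

The agent’s work ends at request(); the side effect lives entirely behind resolve().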

LangGraph makes this surprisingly clean

If you’re building custom agents, LangGraph already has the right primitive: interrupt().

Minimal example:

from langgraph.types import interrupt

def approval_node(state):
    approved = interrupt({
        "action": "refund_charge",
        "customer_id": state["customer_id"],
        "amount": state["amount"],
        "reason": state["reason"]
    })
    return {"approved": approved}

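That pauses execution at the interrupt() call and hands the payload to whoever is running the graph. When a human resumes the run with Command(resume=...), that value becomes the return value of interrupt().

The important part is that LangGraph persists graph state through a checkpointer, so this is not a hacky sleep loop.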

What you actually need for this to work

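You need durable state (a checkpointer attached when you compile the graph) and a thread ID, passed in the config on every call, so LangGraph knows which paused run to resume.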

Example:

config = {
    "configurable": {
        "thread_id": "refund-ord-123"
    }
}

And your interrupt payload should be JSON-serializable if you want to send it to:

a mobile app
Slack
Telegram
a custom approval page
SMS + webhook flows
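Agent state often holds values that json.dumps rejects, such as Decimal amounts or datetime objects. One way to catch that before the payload reaches a phone is a small conversion step. This is a sketch under assumptions: to_json_safe and build_interrupt_payload are hypothetical helpers, and the field names mirror the refund example above.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def to_json_safe(value):
    """Convert common non-JSON types into JSON-friendly values."""
    if isinstance(value, Decimal):
        return str(value)  # keep exact amounts; avoids float rounding
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    return value


def build_interrupt_payload(state: dict) -> dict:
    payload = to_json_safe({
        "action": "refund_charge",
        "customer_id": state["customer_id"],
        "amount": state["amount"],
        "reason": state["reason"],
        "requested_at": state["requested_at"],
    })
    # Fail early, inside the agent, if anything is still not serializable.
    json.dumps(payload)
    return payload


payload = build_interrupt_payload({
    "customer_id": "cus_123",
    "amount": Decimal("84.00"),
    "reason": "duplicate charge",
    "requested_at": datetime(2025, 1, 1, tzinfo=timezone.utc),
})
```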

A more realistic sketch:

from langgraph.types import interrupt

def risky_action_gate(state):
    # Pause here and surface the proposed change to a human.
    approval = interrupt({
        "type": "approval_required",
        "action": "change_billing_email",
        "account_id": state["account_id"],
        "before": state["current_email"],
        "after": state["proposed_email"],
        "requested_by": state["requested_by"]
    })

    # Assumes the approver resumes with a dict such as {"approved": True}.
    if not approval.get("approved"):
        return {"status": "rejected"}

    return {"status": "approved"}

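Then your external approval handler can resume the graph after the human taps approve, by invoking it again with the same thread_id and Command(resume=...). One thing to know: on resume, LangGraph re-runs the node from the top, so anything placed before interrupt() in that node should be safe to repeat.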

Why interrupt() changes the behavior of the whole agent

Before interrupt(), the agent is basically trusted to make the final call.

After interrupt(), the agent becomes a preparer.

That’s a much better role for LLMs in risky workflows.

Let the model:

gather context
decide what action it thinks should happen
build the draft payload
explain why
show a diff

But let a human own the irreversible yes/no.

That split is practical.

n8n can do this without custom agent infrastructure

This is the part more teams should pay attention to.

You do not need to build a full agent runtime to get this pattern.

In n8n, use the Wait node.

For approvals, the useful mode is:

On Webhook Call

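The workflow can prepare the action, send the unique {{$execution.resumeUrl}} somewhere a human can tap from their phone, and then pause at the Wait node until that URL is called. The send step has to run before the Wait node, because nothing after the Wait node executes until the workflow resumes.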

Examples:

Slack message with Approve / Reject buttons
Telegram bot message
email with approval link
internal mobile-friendly approval page

n8n approval flow example

A practical flow looks like this:

Trigger from support ticket / CRM event / webhook
Use OpenAI-compatible chat model to analyze the request
Build a proposed action
If action is risky, send {{$execution.resumeUrl}} to the approver
Pause at the Wait node until that URL is called
Continue only if the response is an approval
Execute Stripe / HubSpot / database / email action

The nice part is that each execution gets its own unique resume URL.

That means multiple runs can pause safely at the same time.

No weird global state.

No polling mess.

No “which request was this approval for?” confusion.
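A single resume URL only says “continue”, so Approve and Reject buttons need some way to carry the decision. One option, sketched here as an assumption rather than an n8n convention, is to append a query parameter and have the workflow check it after the Wait node. The decision parameter name and the example URL are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_decision(resume_url: str, decision: str) -> str:
    """Return the resume URL with a decision query parameter appended."""
    if decision not in {"approve", "reject"}:
        raise ValueError(f"unknown decision: {decision!r}")
    parts = urlsplit(resume_url)
    # Preserve any query parameters the URL already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("decision", decision))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder value; in a real run this comes from {{$execution.resumeUrl}}.
resume_url = "https://n8n.example.com/webhook-waiting/1234"
approve_link = with_decision(resume_url, "approve")
reject_link = with_decision(resume_url, "reject")
```

Both links resume the same execution; only the parameter differs, so the workflow still needs a check after the Wait node before it runs the risky action.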

Example: n8n sends a phone approval before a refund

Imagine this workflow:

Customer asks for a refund
AI agent reviews order history and policy
Workflow calculates recommended refund amount
Human gets a phone approval link
Stripe refund only happens after approval

The approval message should include real details, not vague summaries.

Good:

Approve refund?
Customer: cus_123
Order: ord_456
Amount: $84.00
Reason: duplicate charge
Destination: Stripe refund to original payment method

Bad:

Approve customer update?

That second version is how people accidentally approve nonsense.
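One way to keep vague prompts from ever being sent is to render the message from structured fields and refuse when any of them is empty. This is a sketch: RefundApproval and render_approval are hypothetical names, and the amount formatting assumes USD stored as integer cents.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RefundApproval:
    customer_id: str
    order_id: str
    amount_cents: int  # assumption: USD, stored as integer cents
    reason: str
    destination: str


def render_approval(req: RefundApproval) -> str:
    # Refuse to build a message the approver could not actually evaluate.
    missing = [name for name, value in asdict(req).items() if value in ("", None)]
    if missing:
        raise ValueError(f"approval request is missing: {', '.join(missing)}")
    return "\n".join([
        "Approve refund?",
        f"Customer: {req.customer_id}",
        f"Order: {req.order_id}",
        f"Amount: ${req.amount_cents / 100:.2f}",
        f"Reason: {req.reason}",
        f"Destination: {req.destination}",
    ])


message = render_approval(RefundApproval(
    customer_id="cus_123",
    order_id="ord_456",
    amount_cents=8400,
    reason="duplicate charge",
    destination="Stripe refund to original payment method",
))
```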

A quick implementation sketch for n8n

You can wire this up with something like:

AI node or HTTP Request node for model output
IF node to detect risky action
HTTP Request / Slack / Telegram node to send approval link
Wait node in webhook mode, placed after the send step
downstream action node after resume

If you’re generating the message in an expression, you might do something like:

Action: refund_charge
Customer: {{$json.customerId}}
Amount: {{$json.amount}}
Reason: {{$json.reason}}
Approve: {{$execution.resumeUrl}}

That’s enough to get a durable human-in-the-loop gate into a real workflow.

Which option should you use?

Option	Best use case
LangGraph interrupts	Custom Python agents that need precise pause/resume behavior before dangerous tool calls
n8n Wait node approvals	Low-code or ops-heavy workflows that need a simple phone-friendly approval step
Slack escalation / human fallback	Cases where the model is uncertain and needs review, not necessarily hard approval

My take:

If you already build agents in Python, use LangGraph.
If your team lives in n8n, use Wait nodes.
If the issue is uncertainty rather than danger, use escalation.

But for payments, deletes, sends, and account changes, I’d default to explicit approval.

Yes, this adds friction. That’s the point.

Developers often treat friction like failure.

For risky actions, friction is a feature.

You want a human to see:

the exact recipient
the amount
the records being deleted
the before/after diff
the destination account
the outbound message text

A 5-second approval pause is much cheaper than a 3-hour cleanup.

The trick is selective approval, not approval everywhere.

Do not make a human approve every low-risk classification or summary.

Do make a human approve things that can cost money, affect users, or break systems.

What still goes wrong

This pattern helps a lot, but it doesn’t solve everything.

The biggest failure mode is bad approval UX.

If the phone prompt is vague, the human is still approving blind.

Your approval request should show:

What action will happen
Who or what it affects
Any amount, destination, or recipient
A preview or diff
Enough context to spot something weird
A clear audit trail

If the human is approving a fluffy model-generated summary instead of concrete facts, that is not oversight.

That is just outsourcing the mistake to a smaller screen.
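For the preview or diff, the standard library is enough to turn a before/after record into something a human can scan. A minimal sketch, assuming the records are plain dicts; preview_diff is a hypothetical helper.

```python
import difflib
import json


def preview_diff(before: dict, after: dict) -> str:
    """Render a unified diff of two records for an approval message."""
    # sort_keys keeps field order stable so only real changes show up.
    before_lines = json.dumps(before, indent=2, sort_keys=True).splitlines()
    after_lines = json.dumps(after, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(
        before_lines, after_lines,
        fromfile="before", tofile="after", lineterm="",
    ))


diff = preview_diff(
    {"billing_email": "old@example.com", "plan": "pro"},
    {"billing_email": "new@example.com", "plan": "pro"},
)
print(diff)
```

The approver sees the changed field marked with - and + and the unchanged fields as context, which is concrete fact rather than a model-written summary.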

This also changes the economics of agent workflows

There’s another effect people miss.

Once you add a safe approval boundary, teams usually let agents do more real work before the final click.

That means more:

planning
retries
context gathering
tool orchestration
draft generation
verification passes

In other words: more inference.

And that’s exactly where per-token pricing starts to feel annoying.

If your agents are running all day across support, ops, billing, routing, and account workflows, you end up optimizing around cost instead of usefulness.

That’s why flat-rate compute is a much better fit for this style of automation.

With Standard Compute, you can run OpenAI-compatible agent workflows without babysitting token spend every time the agent needs another reasoning pass, another tool call, or another approval-prep step.

For teams building in n8n, Make, Zapier, OpenClaw, or custom stacks, that matters a lot more than people admit.

Concrete rule I’d use in production

Here’s the rule:

If the task is reversible and low-risk, automate it.
If the task is high-risk and irreversible, pause for approval.
If the task is ambiguous, escalate.

That’s it.

Simple rules beat fancy agent philosophy.
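Written as code, the rule is a few lines. This sketch makes two assumptions the rule itself does not spell out: an unknown classification counts as ambiguous, and so does a mixed case such as high-risk but reversible. Route and route_action are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    AUTOMATE = "automate"
    APPROVE = "approve"
    ESCALATE = "escalate"


def route_action(reversible: Optional[bool], high_risk: Optional[bool]) -> Route:
    """Pick a path for a proposed action; None means it could not be classified."""
    if reversible is None or high_risk is None:
        return Route.ESCALATE  # assumption: unknown counts as ambiguous
    if high_risk and not reversible:
        return Route.APPROVE
    if reversible and not high_risk:
        return Route.AUTOMATE
    return Route.ESCALATE  # assumption: mixed signals go to a human for review


decisions = {
    "tag_ticket": route_action(reversible=True, high_risk=False),
    "refund_charge": route_action(reversible=False, high_risk=True),
    "unclear_request": route_action(reversible=None, high_risk=None),
}
```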

Final take

The best pattern I’ve found for risky agent work is not “make the model smarter until trust appears.”

It’s “make the boundary sharper.”

Put the checkpoint right before the side effect.

Let the agent think.

Let the human approve.

If I were designing a production workflow today in LangGraph, n8n, OpenClaw, Make, or Zapier, phone approval for risky actions would be a default primitive.

Not an enterprise add-on.

Not a future enhancement.

A default.

Because that’s the first agent pattern I’ve used that feels like something I’d actually trust on a Tuesday.
