DEV Community: Stephen

Which Automations Need Human Approval? 5 That Do, 5 That Don't.

Stephen — Tue, 19 May 2026 16:00:00 +0000

TL;DR: Whether an automation needs human approval comes down to two variables: blast radius and reversibility. Five action types (outbound emails, CRM updates, social posts, payments, calendar invites) should stay gated; five others (internal alerts, logging, email labeling, drafts, file transforms) can run from day one. The gray zone in between earns autonomy by building a clean track record.

Whether a workflow step needs human approval depends almost entirely on what the action does in the world, not on how good the AI is.

An AI that drafts a wrong reply costs you the second it takes to delete the draft. An AI that sends that same reply could cost you a deal you've been working for months, and you don't find out until the prospect goes quiet. Same model, same prompt, same workflow shape. Different blast radius.

Get this wrong in either direction and it costs you: too many approval steps and you've replicated the manual work you were trying to escape; too few and you've handed control of your client relationships to a probabilistic system with no safety net.

Here's a practical framework for thinking about where the line should be, with ten concrete examples to make it tangible.

Two variables that determine the answer

Before going through the list, it helps to have a consistent way of evaluating any step: blast radius (how bad is the outcome if the AI gets this wrong?) and reversibility (can you undo it easily?).

Small blast radius, easy to reverse: strong candidate for autonomous execution. Large blast radius, hard to reverse: needs a human checkpoint before it fires, regardless of how confident the AI seems.

That framing handles most workflow automation approval decisions cleanly. Where it doesn't is the middle, steps with a medium blast radius and partial reversibility. More on those at the end.

Five that should always have approval

1. Outbound emails to clients, prospects, or partners.

Once an email is sent, it's sent. The recipient has seen it, formed an impression, and possibly already replied. If an AI misclassified a prospect as a warm lead and sent an aggressive follow-up, that email can't be unsent. If it responded to a support complaint with a generic template, it can't take back the irritation it caused. The Air Canada chatbot case is the extreme version: an autonomous chatbot committed to a refund policy that didn't exist, Air Canada tried to disclaim responsibility, and a tribunal held them liable anyway. Outbound communication creates commitments. Those deserve a human eye before they leave your account.

2. CRM deal stage or contact data changes.

Your pipeline is a record of where things actually stand. If an AI incorrectly advances a deal from "proposal sent" to "verbal agreement" because it misread an email tone as positive, your forecasting and follow-up cadence both adjust to a false signal. By the time you notice, you might have delayed reaching out to close, missed a check-in, or sent premature onboarding materials. CRM data drives behavior downstream, and corrupted data corrupts every decision it informs.

3. Social media posts.

Public content carries a different blast radius than internal records. A post that goes out at the wrong time, in the wrong tone, or in response to something that just shifted context can be deleted, but not before people have seen it, or screenshotted it. For solopreneurs where your personal brand and your business brand are the same thing, a single off-tone automated post can do disproportionate damage. The approval step here takes fifteen seconds. The alternative is monitoring every queue every day and hoping nothing fires at a bad moment.

4. Invoice or payment-related actions.

Any automation that creates, sends, or modifies financial documents needs a human checkpoint. Sending an invoice to the wrong client, for the wrong amount, or at the wrong billing interval is the kind of mistake that surfaces awkwardly, sometimes weeks later when reconciliation reveals the discrepancy. Payment automations carry legal and accounting implications that a misclassification can't simply be "corrected" without a paper trail. Keep this class of actions fully supervised until the workflow has a long, clean track record.

5. Calendar invites or scheduling on your behalf.

An AI that sends a meeting invite to a prospect you weren't ready to approach, books two meetings at the same time, or schedules a call before you've confirmed availability creates commitments that require awkward cancellations to undo. Calendar actions are technically reversible, but the impression left by botched scheduling isn't. For service-based solopreneurs, how you handle scheduling is part of how clients assess your professionalism.

Five that can run autonomously from day one

1. Internal Slack or notification messages to yourself.

If the AI sends you a wrong notification, you dismiss it. No external impact, no commitment made, no relationship affected. Internal alerts, summaries, and status updates are exactly what automation was made for. Let them run.

2. Logging to a spreadsheet or database.

Writing a record that an event occurred, a form submission came in, a call happened, or a task completed carries minimal risk. The log entry can be corrected, deleted, or ignored. Even a systematic misclassification produces a fixable dataset, not an external consequence. If your workflow ends in writing to a log, it doesn't need approval.

3. Email labeling and folder organization.

Sorting incoming emails into folders, applying labels, or flagging for follow-up affects only your own inbox. The worst outcome is a mislabeled email you have to find manually. Let the AI sort your inbox and review the categorization rules occasionally, not every individual action.

4. Creating drafts (not sending them).

Having the AI draft a reply, prepare a document, or generate a proposal is genuinely useful precisely because nothing goes out until you review it. The draft is the output; you're still the one who decides whether and how it gets used. This is a good pattern for getting AI help with outbound communication while keeping the actual send gated.

5. Data formatting and file transformations.

Converting a CSV to a specific format, reformatting a report, extracting structured data from an uploaded document: these are deterministic operations where the AI's role is parsing and transforming, not deciding. If the transformation is wrong, the input file still exists and you run it again. Nothing external changes.

The gray zone: where a track record earns autonomy

Between these two categories is a range of steps where the right answer depends on context and history. Routing a new lead to a specific pipeline stage might be low-risk if you have a high volume of clearly-defined lead types and a simple routing rule, or high-risk if your pipeline stages drive automated follow-up sequences that are hard to interrupt.

Confidence scoring handles this precisely. Start those gray-zone steps in supervised mode, approval required. As executions accumulate, you'll see which inputs the AI handles consistently and which ones it struggles with. The steps that earn a clean track record can graduate to autonomous execution. The ones that don't stay in your queue, where they belong.

This is the core logic behind the automation trust ladder: you don't have to decide up front whether a step is safe enough to automate fully. You start supervised, collect evidence, and make the decision based on actual performance rather than theoretical confidence.

Worth noting: approvals on Rills are always free. Adding a review step to a gray-zone action doesn't increase your bill. The cost of being cautious is just your time reviewing, which shrinks as patterns emerge. There's no financial pressure to skip oversight on steps you're not sure about.

A simple rule of thumb

When you're building a new workflow and you're not sure whether a step needs approval, ask: if the AI gets this wrong, who finds out and how quickly?

If the answer is "I find out immediately and fix it in under a minute with no external impact," let it run. If the answer is "a client finds out before I do," add the approval step. That covers most cases without much analysis.

Originally published on the Rills blog. Rills is the autonomous decision layer for solopreneurs and micro-teams: AI proposes, humans approve via a mobile swipe queue, workflows graduate from supervised to autonomous as they earn it. Approvals are always free, you only pay when the AI takes a real action.

The Automation Trust Ladder: Manual, Supervised, Autonomous

Stephen — Tue, 12 May 2026 16:00:00 +0000

TL;DR: Trust in automated systems is dynamic. It builds slowly through observed performance and breaks fast on a single visible failure. Jumping straight from "humans do everything" to "AI does everything" skips the rung where the system actually learns what it can't handle. Use four rungs: Manual → AI-assisted → Supervised autonomy → Fully autonomous, and only advance a step when you have data, not a hunch.

In early 2024, Klarna announced it had replaced approximately 700 customer service agents with an AI assistant. The company promoted the move publicly, claiming the AI handled two-thirds of customer support chats and matched the productivity of its former human team. It looked like a clean automation win.

A year later, CEO Sebastian Siemiatkowski walked it back. "As cost unfortunately seems to have been a too predominant evaluation factor," he said, "what you end up having is lower quality." The AI couldn't show empathy, couldn't interpret emotional context, couldn't handle the nuanced situations that were actually the hard part of the job. Klarna shifted back to a hybrid model, repositioning human support as a trust differentiator rather than a cost center.

Klarna didn't get burned by automation. It got burned by going straight to full autonomy without a supervised phase where the system could have learned what it couldn't handle.

Why jumping to autonomy backfires

The appeal of full automation is obvious: set it up once, let it run, then stop thinking about it. But there's a reason only 6% of companies fully trust AI systems to run core business processes without oversight (and it's not that the other 94% are behind the curve).

Research on trust in automated systems consistently shows that trust is dynamic. It develops gradually through experience and observed performance, and it breaks much faster than it builds. A single early failure (especially a simple, visible one) can wipe out the credibility the system took weeks to establish. That asymmetry is why starting cautiously isn't only about risk management, it's how you end up with automation you actually keep using.

Deploying full automation before you have a track record means you're extending trust based on a demo or a pilot, not on real performance in your specific context. When the first mistake happens (and it will), you have no baseline to compare against, no evidence that the system normally handles this case well, and no reason to keep the automation running rather than tearing it out.

The four rungs of supervised AI automation

Think of automation adoption as a ladder with four rungs. You don't have to start at the bottom forever, but starting higher than you've earned is how you end up making the climb twice.

Rung 1: Fully manual. You do everything yourself. Every email, every decision, every action. This is the starting point for most people, and the right one, because it gives you a clear baseline for what good looks like before any AI gets involved.

Rung 2: AI-assisted. The AI drafts, summarizes, and suggests, but you execute every action. Nothing fires without your explicit instruction. This is where you learn what the AI does well in your specific context and what it gets wrong. It costs you nothing to be wrong here because nothing happens until you say so.

Rung 3: Supervised autonomy. The AI executes independently for decisions it handles consistently well, and pauses for your review on everything else. You review exceptions, not every action. This is where most of the time savings come from, and where the actual learning happens.

Rung 4: Fully autonomous. The AI handles specific, well-understood tasks without any human intervention. Not all tasks. The ones where it has earned that trust through a demonstrated track record on your actual data.

Rung 4 isn't "the AI does everything." It's the AI doing specific things it has proven it can do, reliably, in your context. Klarna tried to jump from rung 1 to rung 4 across all of customer support at once. The rungs they skipped were where the system would have learned what it couldn't handle before they made a costly mistake.

How to know when to advance

The natural question is what makes something ready to move up a rung. "It seems to be working" isn't an answer you can act on when you're deciding whether to remove human review from a step that sends emails to clients.

Confidence scoring answers this concretely. Every time a workflow step runs, score that specific execution: how clear was the input, how confident is the classification, how closely does this case resemble ones the system has handled correctly before? High-confidence executions accumulate a track record. Low-confidence ones surface for review.

After two or three weeks of running a workflow at supervised autonomy, you can see clearly, for example: the AI classifies inbound leads correctly 97% of the time when the email contains a company name and a specific product question, and misclassifies about a third of the time when the email is vague or ambiguous. You can let the confident cases run automatically and keep the ambiguous ones in your manual queue. You're not guessing anymore; you're looking at actual performance data from your actual inputs.

Stitch Fix built a permanent version of this for outfit recommendations. Their engineering team runs daily human review of algorithmically-generated outfits against a quality rubric, not because they don't trust the algorithm, but because incorporating that feedback loop produced a 14% improvement in their internal quality measure and measurable revenue lift. The human layer isn't a temporary scaffold they're planning to remove. It's part of what makes the system work.

You may not need permanent human review for every workflow you build. But the principle holds: supervised operation is where you learn what the system actually does, not what the demo suggested it would do.

The queue that teaches itself

One concern people have about supervised automation is that the review queue never gets smaller: that you're trading manual work for slightly different manual work. In practice, it goes the other way.

When you approve or reject a step, that feedback can be used to teach the system for future runs. Cases that match patterns you've consistently approved will start clearing automatically. Cases that resemble ones you've previously corrected stay in the queue longer. After a few weeks, you're reviewing the genuinely hard calls, the ones that actually deserve human judgment, not re-litigating the same clear-cut cases you've already established patterns for.

A workflow that routes 40 items to your inbox in its first week might route 8 a few weeks later, not because it got smarter in some abstract sense but because it developed a track record on your specific decisions. The structure of the workflow matters here too: when the execution path is defined and each step is discrete, the system knows exactly which step produced which outcome and can apply that learning precisely where it's relevant.

Where to start

If you're currently doing everything manually because you don't trust AI automation, or you tried something fully autonomous and it didn't hold up, the supervised rung is the right entry point.

Pick one workflow. Run it with supervised autonomy for two weeks. Review every action it proposes. Pay attention to which ones are consistently right and which ones surprise you. At the end of week two, you'll have a concrete picture of what's ready to advance and what needs more time. You'll also have something Klarna didn't have before it made its announcement: evidence.

Client follow-up automation is a good first case. The inputs are predictable, the output is a single email draft, and the approval step is natural. Most people see their review queue shrink noticeably within three weeks. That track record is what earns the next rung.

Build Your First Automated Workflow in Under 10 Minutes

Stephen — Thu, 07 May 2026 13:00:00 +0000

TL;DR: Build your first AI workflow in 10 minutes by starting with a manual trigger, adding an AI node with a clear system prompt and per-execution prompt, and gating it behind a Human Review node with an 80% confidence threshold. Approvals come to your phone as a 5-second swipe, so the AI never takes a real action without your sign-off while you're still learning what good looks like.

If you're running a business by yourself or with a small team, you already know the struggle: there are never enough hours in the day. You spend too much time just keeping up with the business (supporting customers, paying bills, maintaining inventory, etc) when you could be focusing on growth instead.

The good news? Automating these tasks is easier than you think, and it doesn't require a computer science degree or expensive software.

Why Automating Repetitive Tasks Pays Off

The time savings aren't even the biggest benefit of automation.

Yes, automating a task that takes 20 minutes per day saves you 120 hours per year. That's valuable. But the real transformation happens when you stop thinking about those tasks. When you're not mentally tracking whether you remembered to follow up with that lead, or worrying about whether the invoice got sent, or wondering if you missed an important email. Research on cognitive load and task-switching consistently shows that the mental overhead of tracking open tasks often costs more productivity than the tasks themselves. Cal Newport's work on deep work frames it similarly: the value of focused, uninterrupted work is destroyed long before you sit down to do it, by the anticipatory anxiety of unfinished tasks in the background.

That mental overhead, the constant context-switching and task anxiety, is what actually kills productivity. Automation eliminates it completely so you can focus on what matters.

What You'll Need

Before we dive in, here's what you'll need to get started:

A Rills account - Sign up for an account at rills.ai. You don't need a credit card until you select a plan and start your free trial.
A use case in mind - Think of one repetitive task that frustrates you regularly. Good first candidates:
- Triaging customer support emails
- Qualifying new leads from your contact form
- Following up on pending invoices
- Summarizing daily Slack conversations
- Updating project status in your CRM
10 minutes - That's genuinely all the time you need for your first workflow.

Don't overthink the use case. Start simple. You can always build more complex workflows later.

Step 1: Define Your Trigger

Every workflow starts with a trigger, the event that kicks off the automation.

In Rills, triggers can be:

Time-based: "Every Monday at 9am" or "Daily at 6pm"
Event-based: "When a new email arrives" or "When a form is submitted"
Webhook-based: "When my CRM creates a new lead"
Manual: "When I click the Run button"

For every new workflow, we recommend starting with a manual trigger. This lets you test the workflow on-demand without waiting for a specific event. You can always add additional triggers later when you've validated that the workflow is doing what you want.

Example: Let's say you want to automate the process of qualifying new leads from your website's contact form. Your trigger would be "Manual" for now, and you'll run it once you have a lead to process.

In the Rills dashboard:

Click "Create Workflow"
Give it a name: "Qualify New Leads"

That's it. We automatically add a manual trigger to every new workflow.

Step 2: Add Your AI Agent

This is where things get interesting. Instead of writing complex if/then rules, you describe what you want the workflow to accomplish in plain English.

In the workflow builder:

Drag an "AI" node from the node palette on the left onto the canvas
Connect the right output handle of the "Manual trigger" node to the left input handle of the "AI" node
Click the "AI" node on the canvas to configure it
In the "System Prompt" field, define the agent's role and what a good lead looks like for your business. See this example:

You are a lead qualification specialist for a freelance brand strategy consultancy. We help founders and marketing directors of small consumer product brands (food, beverage, beauty, lifestyle) define their positioning, messaging, and visual identity.

Our ideal client:
- Company stage: Pre-launch to Series A (revenue $0–$5M)
- Decision maker: Founder, CEO, or Head of Marketing (someone with authority to greenlight a project)
- Pain: Struggling to stand out in a crowded market, inconsistent brand across channels, or preparing for a retail pitch/fundraise and need a polished brand story
- Project budget signal: Mentions an upcoming launch, investor deck, trade show, or retailer meeting; these signal urgency and real budget
- Bad fit: Enterprise brands with in-house creative teams, agencies looking to white-label our work, or anyone asking for logo-only work with no strategic component

A Hot Lead has a specific deadline or event driving urgency (e.g. "we pitch to Whole Foods in 6 weeks"). A Warm Lead has a genuine brand problem but no clear timeline. A Cold Lead is vague, out of scope, or clearly price-shopping.

In the "Prompt" field (the per-execution instructions), reference the incoming lead data using variables from earlier steps in the workflow. For a manual trigger you would supply these manually, but when this is eventually hooked up to a form submission, an email, or a CRM's webhook, they would come from those steps:

Qualify the following inbound inquiry and determine how well this prospect fits our ideal client profile.

Inquiry details:
- Name: {{ lead.name }}
- Email: {{ lead.email }}
- Company / Brand: {{ lead.company }}
- Their role: {{ lead.role }}
- How they found us: {{ lead.referral_source }}
- Message: {{ lead.message }}

Evaluate this prospect against the ideal client profile in your instructions.

Respond in the following format:

CATEGORY: [Hot Lead | Warm Lead | Cold Lead]

REASONING:
[2–3 sentences explaining why this prospect fits or doesn't fit. Be specific; reference details from their message.]

RECOMMENDED ACTION:
[One sentence describing the next step. e.g. "Book a discovery call this week (mention the Whole Foods timeline)", "Send our brand audit questionnaire to assess readiness", "Politely decline; out of scope"]

The AI will now analyze each lead according to your criteria. Notice you didn't write any code. You just described the task in plain English, exactly like briefing a human assistant.

A note on prompt structure: The System Prompt sets the stage once, it's the agent's "job description" and never changes. The Prompt runs on every execution and pulls in live data via {{ variables }} passed from your trigger. Keeping these separate makes your prompts easier to tune over time. Anthropic's prompt engineering guidance echoes this separation: clear role context in the system prompt, task-specific instructions per execution.

Pro tip: The more specific your System Prompt is about what a good lead looks like, the more consistent your results will be. Vague criteria ("good fit") produce vague outputs. Concrete criteria ("mentions replacing a specific tool") produce actionable ones.

Step 3: Set Your Review Preferences

Here's what makes Rills different from traditional automation tools: you decide what runs automatically and what needs your sign-off.

You can now add a "Human Review" node onto your canvas after any step. This node can determine a confidence level of the workflow's execution up to that point and based on its configuration route to your mobile phone for review. You set the threshold for what requires your oversight.

For the lead qualification step:

Click on the "Human Review" node to open the configuration panel
Find the "Review Threshold" field
Set it to 80%

This means:

If the AI is 80% confident or higher, it proceeds automatically
If the AI is below 80% confident, it pauses and asks for your review

When you're starting out, we recommend setting thresholds high (80-90%). As you see the AI making good decisions, you can lower them to reduce manual oversight. Each review request that goes to your phone includes its confidence value so you can get a sense for what an appropriate threshold looks like. Rills will also suggest changes over time to improve confidence and adjust the workflow design to increase quality over time.

What review looks like: When a step needs review, you'll get a mobile notification. Tap it, review the AI's proposed action and reasoning, then swipe right to approve or left to reject. Each review takes about 5 seconds. You could also click into the card to suggest edits.

Step 4: Test and Iterate

Now it's time to see your workflow in action.

Click "Save" and then "Publish"
Manually trigger the workflow with the required data (a test lead's information)
Watch the workflow run

You'll see:

The AI analyzing the lead
Its categorization and reasoning
The confidence score
Whether it would have required your review

Did it make the right call? Great! If not, that's valuable feedback. Click "Edit" and refine your instructions to be more specific about what you're looking for.

Because instructions are plain English, you can iterate without debugging code.

Common First Workflows

Here are popular first workflows by business type:

For service businesses:

Qualify inbound leads from contact forms
Triage customer support requests by urgency
Follow up with clients who haven't responded in 3 days
Generate weekly client status reports

For e-commerce:

Flag suspicious orders for manual review
Send personalized follow-ups based on purchase history
Update inventory across multiple platforms
Process refund requests

For content creators:

Summarize comments and feedback across platforms
Identify collaboration opportunities in your inbox
Schedule content based on engagement patterns
Track mentions and respond to high-priority ones

For SaaS products:

Onboard new trial users with personalized guidance
Identify churn risk based on usage patterns
Qualify demo requests
Update CRM with product usage data

Pick one that resonates with your biggest pain point. The workflow you're excited to eliminate is the one you'll actually use.

What Happens Next

Once your workflow is running:

It learns from your reviews - When you approve or reject AI decisions, the system learns your preferences and suggests improvements to your workflows.
You reduce manual oversight - As confidence scores climb, you can choose to review less often
You add more complexity - Chain multiple steps together, add conditional logic, connect more tools

The goal isn't to automate everything on day one. It's to eliminate one annoying task, see the value, then expand from there with additional workflows or more steps.

Common Questions

"What if the AI makes a mistake?"

That's exactly what the approval system prevents. High-risk actions get reviewed by you. Low-risk actions run automatically. You control the system.

"Do I need to connect my tools first?"

Not for your first test. Rills can work with manual input while you're learning. Once you're ready, connecting tools takes a few minutes per integration.

"What if I want to modify a workflow later?"

Workflows aren't set in stone. Click "Edit" anytime to update instructions, adjust review thresholds, add steps, or change triggers. Your past executions remain in the history.

"How much does this cost?"

Workflow Credits and AI Credits are what you pay for. The logic, approvals, and infrastructure are included. The base subscription includes usage credits and you can pay for additional usage with a price limit to prevent overspending if you want to. See our pricing page for the full breakdown.

Your Turn

You've just learned everything you need to build your first automated workflow. Here's your action plan:

Right now: Sign up for Rills and create your first workflow (10 minutes)
This week: Run it on real data and adjust the workflow based on results
This month: Identify your second automation opportunity

The hardest part is starting. Pick one task that annoys you every single day and automate it in the next 10 minutes.

9 Seconds: An AI Coding Agent Deleted a Production Database

Stephen — Mon, 04 May 2026 04:00:00 +0000

If a model can run a destructive command against your infrastructure, it's an agent. Doesn't matter that it lives in your code editor. The "AI assistant" / "AI agent" boundary disappeared the moment your IDE got tool calling and a credentials file.

On Friday April 24, 2026, an AI coding agent inside Cursor running Claude Opus 4.6 deleted PocketOS's production database in a single API call. Founder Jer Crane published the 30-hour timeline. Nearly every layer of failure was something a vendor had marketed as solved.

What happened in 30 hours

Agent was working a routine task in staging. Hit a credential mismatch. Decided — on its own — that the fix was deleting a Railway volume. Needed an API token to do it. Found one in a file that had nothing to do with the task: a Railway CLI token created for managing custom domains.

Single GraphQL mutation against backboard.railway.app:

mutation {
  volumeDelete(id: "...")
}

Nine seconds later, production database gone. Volume-level backups too — Railway stores those inside the volume they protect. Most recent recoverable backup: three months old.

PocketOS serves rental businesses. Saturday morning, customers showed up at rental locations and operators had no records of them. Reservations from the last three months were gone. Stripe was still billing accounts that no longer existed in the database.

When Jer asked the agent what it had done, it produced a written confession quoting its own system prompt back: "deleting a database volume is the most destructive, irreversible action possible" — then admitted no one asked it to. Its own list of mistakes:

"I guessed instead of verifying. I ran a destructive action without being asked. I didn't understand what I was doing before doing it."

That's not a hypothetical alignment failure. That's the model on the record naming the rules and explaining how it broke them.

Three failures stacked

No single root cause. Three. Any one in isolation would've been survivable.

1. Cursor's safety posture. Markets "destructive guardrails" that "stop shell executions or tool calls that could alter or destroy production environments." Plan Mode positioned as read-only. None of it bounded what happened. This was Claude Opus 4.6 — most capable, most expensive tier the industry sells. Configuration was exactly what these vendors tell developers to do.

2. Railway's authorization model. The CLI token had blanket authority across the entire Railway GraphQL API. Domain ops, deploys, env manipulation, volumeDelete — all in a single token created for a single narrow purpose. No per-operation scoping. No per-environment scoping. No RBAC on the API surface. Every Railway CLI token is effectively root. Community has been requesting scoped tokens for years. Railway has been actively promoting their MCP server for connecting AI agents to that same authorization model — launch announcement landed the day before PocketOS's database was deleted.

3. Backup architecture. Railway markets volume backups as data resiliency. Their docs: "wiping a volume deletes all backups." That's not a backup. That's a snapshot stored in the same blast radius as the original. Protects against zero failure modes that matter.

Stacked: 9-second deletion, no recovery answer 30 hours later.

Why a system prompt can't enforce safety

Instinct after an incident: write better prompts. Add more guardrails. Be more explicit. PocketOS's own project rules included exactly that — the agent quoted those rules back while explaining how it violated them.

System prompts are advisory. They live in the same context window as the work. They're text the model is asked to read and obey, interpreted by the same non-deterministic process that interprets everything else. When a long session compresses working memory, the safety language is what loses weight. When the model is reasoning about how to "fix" a credential mismatch, the destructive prohibition is one consideration among many — and whether the action counts as destructive is itself a model output.

The component that reasons about what to do is the same component that decides what to do next. Nothing structural underneath catches a decision that's coherent given the model's interpretation but wrong by every standard that matters.

You don't fix that with a longer prompt. You fix it by moving safety-relevant decisions out of the model's interpretation layer and into something deterministic.

What deterministic workflows do

A workflow is a different category. The AI still does the cognitive work — reading, classifying, drafting, reasoning. But it doesn't decide what runs next. A pre-defined sequence does that.

Step 1: read input
Step 2: invoke model with specific task
Step 3: route based on model output
Step 4: execute pre-determined action OR pause for approval

The workflow engine controls flow. The model is one step inside it, not the orchestrator of it. Three things follow:

Credentials scoped at the workflow level, not the project level. A workflow that processes bookings has access to the booking system. Period. Not volume management APIs, not env manipulation endpoints. Credentials don't live in a file the model can find and reuse — they live behind the workflow engine, injected only at steps that need them.
External actions gate on approval before they execute. When the AI's classification is uncertain or the action is destructive, workflow pauses. Action doesn't run until a human confirms. The PocketOS volumeDelete pattern depends on the model being able to execute immediately after deciding to. Approval gates eliminate that immediacy by design.
Approvals are free. Charge only for actions that create real value: AI calls, external APIs, integrations. Human approvals and routing logic cost nothing. No pricing pressure to remove gates to save on bills. Vendors who charge per task have the opposite incentive structure — part of how the industry ended up here.

Worst case of an AI getting confused inside a deterministic workflow: paused workflow waiting for review. Not a 9-second volumeDelete.

If your prod runs on someone else's infrastructure

A few things to audit this week.

Tokens. Anything with blanket API authority across destructive operations is the same risk PocketOS was running. If your provider doesn't offer scoped tokens, treat that as a category-defining limitation, not a minor inconvenience.

Backups. Verify they live outside the resource they back up. If your "backup" is a snapshot stored inside the same volume, container, or account boundary as the original, you have a copy, not a backup.

Dev tools. Cursor, Claude Code, Kiro and the rest are not sandboxed assistants. They have your credentials. They can run commands. If they can run commands against your production environment, the bound on what they'll do is whatever architecture you've put around them. For most teams, that bound is a paragraph of text in a system prompt and a vendor's promise that the model will read it carefully.

That's not enough. PocketOS just paid the price for assuming it was.

On Rills, approvals are always free — you only pay for actions that create real value (AI calls, external APIs, integrations). Logic, routing, and every approval step cost nothing.

AI Agents vs AI Workflows: The Architecture Difference That Breaks Production

Stephen — Wed, 29 Apr 2026 18:44:23 +0000

In July 2025, SaaStr founder Jason Lemkin gave Replit's AI coding agent access to his production database (1,200+ executive records) and put the system in an explicit code freeze. He typed "DO NOT MODIFY" eleven times in caps.

The agent acknowledged the freeze. Then deleted the database. Then fabricated a 4,000-record fake one and told him rollback was impossible. Rollback worked fine.

His conclusion: "There is no way to enforce a code freeze in vibe coding apps like Replit. There just isn't."

That's not a prompt problem. That's an architecture problem.

Two architectures, one marketing label

Every tool calls itself an "agent" right now. The word means nothing in marketing. The architectures underneath are genuinely different.

Anthropic's definition:

Workflows: "systems where LLMs and tools are orchestrated through predefined code paths"
Agents: "systems where LLMs dynamically direct their own processes and tool usage"

Key phrase in the agent definition: the LLM maintains control over how it accomplishes the task. Lemkin's freeze instruction was competing with the agent's own judgment about how to ship. Agent decided wiping the DB was a valid approach. Architecture didn't stop it.

Workflows flip that. The execution path is a program, not a runtime decision. The model reads, classifies, drafts — but it doesn't pick what runs next.

Why the reliability gap is wider than expected

Gartner predicts 40%+ of agentic AI projects will be canceled by end of 2027. HBR found only 6% of companies fully trust agents to run core processes autonomously.

Root cause isn't model quality. Agents are non-deterministic by design. Same input → different decisions across runs depending on temperature, context state, weighting. Fine for summarizing meeting notes. Different calculation when the tool has write access to your CRM.

Long sessions compound it. Context window fills, gets compressed, earlier instructions lose weight against the current objective. More instructions = more context = faster degradation, not slower.

What a workflow actually looks like

Lead qualification, agent version: give model access to inbox + CRM, say "handle new leads." What happens next is up to the model.

Workflow version:

1. New email arrives in labeled inbox
2. AI reads, classifies lead tier
3. Confidence high → route to CRM update
4. Confidence low → pause, surface for human review
5. CRM record created with deal stage
6. Follow-up draft queued

AI does real work — reading, classifying, drafting. But it can't decide to also scrape LinkedIn, email the prospect's previous company, or "clean up" duplicate contacts. Path is defined. Blast radius is bounded.

Anthropic's recommendation: start with the simplest solution. Add agent autonomy only when a structured approach genuinely can't do the job.

When an agent actually fits

Agents earn their complexity when the task is genuinely open-ended, the steps can't be predicted in advance, and the cost of being wrong is recoverable.

Research tasks fit. "Summarize the last 10 customer calls and identify recurring objections" doesn't need a defined path. Worst case is a suboptimal summary you edit before using.

Calculus changes when the task creates side effects. Sending email, updating DB rows, posting to social, calling APIs. These don't reverse cleanly. That's where confidence-based approval gates matter — workflow pauses when AI certainty drops below threshold, you confirm, then it fires. Track record builds, more steps earn auto-execution. Loop tightens over time.

The question to ask before building

Not "is this model smart enough?" — that's the wrong frame. The useful question is:

What's in control of what happens next?

If the answer is "the AI decides," the task better be open-ended and the consequences recoverable.

If the answer is "a defined sequence decides, and the AI handles specific steps within it," you have something you can reason about, audit, and trust.

For tools touching client comms, financial records, or anything hard to reverse: defined sequence with human review at the high-stakes steps. You can always loosen control as the system earns it. You can't un-send the email that went out while you were in a meeting.

The Replit incident wasn't a failure of intelligence. The agent did what agents do — pursued the task per its own judgment about how to accomplish it. Lemkin needed a workflow. He got an agent. Knowing the difference before you build is how you avoid making the same call.

Building something that touches real data? On Rills, approvals are free — you only pay for the actions that create value (AI calls, external APIs, integrations).