Why your AI agent needs a Will-actions queue: separating agent-doable from human-required

#ai #agents #autonomy #buildinpublic

The agent that knew it was stuck

Five weeks into running Atlas — an AI agent I built to operate my startup, Whoff Agents, end-to-end — I noticed something weird in its logs.

Every 30 minutes, the heartbeat loop would fire. Read state. Pick one action. Execute. Log. Sleep.

And every 30 minutes, for about a week straight, the same three items showed up at the top of the loop's "verify" step:

Will action verifications (all still NEGATIVE):
  - YT token scopes = [youtube.upload] only. force-ssl absent.
  - webhook/config.json price_to_repo contains 0 atlas-starter-kit matches.
  - webhook/check_purchases.py populate_price_maps not applied.

The agent wasn't stuck on a hard technical problem. It was stuck on three things that physically required me — a human — to do. And it knew. It logged the blockers every loop, picked the highest-value action it could take, and kept moving.

That little three-line block at the top of every loop is, after five weeks of iteration, the most important pattern I've added to the agent. I call it the Will-actions queue.

This is the post I wish someone had written before I built my first long-running agent.

The naive design: "the agent can do anything"

Most agent demos pretend the agent is omnipotent. Browser? Sure. Shell? Of course. API keys? It has them all. The demo runs once, the agent finishes the task, applause.

This is fine for a demo. It is catastrophic for a 30-minutes-forever heartbeat loop.

Real agents — agents that have to keep running across days and weeks — hit four classes of work they cannot complete:

Auth / consent surfaces. OAuth re-consent screens. 2FA challenges. CAPTCHAs. Account creation. The agent has the API token, but the token doesn't include the scope it needs, and the only way to add the scope is for a human to click "Allow" in a browser tab while logged into the right account.
Policy / risk decisions. Should we refund this customer? Should we publish this controversial post? Should we ship the price change live? An agent can technically do these. It shouldn't, because the cost of "wrong" is asymmetric and the human signed up to be the one wearing the consequences.
Real-world physical actions. Signing a contract. Wiring money. Showing up to a meeting. Photographing a product. The agent will never do these.
Environment / infra changes the agent can't safely make. Rotating a production secret. Editing .env. Deleting a database. Touching shared infrastructure where the blast radius isn't recoverable.

Each of these blocks the agent. And in a naive design, what does a blocked agent do?

It retries. Forever. Burning tokens, spamming logs, sometimes spamming your customers, and definitely spamming you.
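
One way to avoid the retry-forever trap is to make "a human is required" a distinct kind of failure. The sketch below is a minimal illustration, not Atlas's actual code: `BlockerKind`, `HumanRequired`, and `run_once` are hypothetical names, and the enum simply mirrors the four classes listed above.

```python
from enum import Enum


class BlockerKind(Enum):
    """The four classes of work from the list above (names are illustrative)."""
    AUTH_CONSENT = "auth/consent"
    POLICY_RISK = "policy/risk"
    PHYSICAL = "physical"
    INFRA = "infra"


class HumanRequired(Exception):
    """Raised by a task when it hits one of the four blocker classes."""

    def __init__(self, kind: BlockerKind, detail: str):
        super().__init__(f"{kind.value}: {detail}")
        self.kind = kind
        self.detail = detail


def run_once(task, blocked: list) -> bool:
    """Run a task one time. On a human-required blocker, record it and stop
    instead of retrying, since another attempt cannot succeed."""
    try:
        task()
        return True
    except HumanRequired as exc:
        blocked.append((exc.kind, exc.detail))
        return False
```

Ordinary transient errors can still be retried elsewhere; the point is only that this one exception type never is.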

What the Will-actions queue actually is

It's a markdown table. That's it.

## Pending Will Actions (BLOCKING REVENUE)

| # | Action                                                    | Impact   | Filed       |
|---|-----------------------------------------------------------|----------|-------------|
| 1 | Map Atlas Starter Kit price_id in webhook/config.json     | CRITICAL | 2026-05-10  |
| 2 | Apply populate_price_maps refactor in check_purchases.py  | HIGH     | 2026-05-09  |
| 3 | Re-run reauth_youtube_fullscope.py for youtube.force-ssl  | MEDIUM   | 2026-05-10  |
| 4 | Decide deliverable for Atlas Starter Kit purchase         | PREREQ   | Open        |
| 5 | Audit secondary Stripe payment link 5kQ4gB7Nd1Jj3nx1AN... | MEDIUM   | 2026-05-10  |

Four columns. Lives in .paul/STATE.md. The agent appends to it; the human empties it.

That sounds boring. It's load-bearing.
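
Because it's a plain markdown table, reading it back takes a few lines. Here is a rough sketch of a parser, assuming the four-column layout shown above; `WillAction` and `parse_queue` are names I'm making up for illustration, and a `|` character inside an Action cell would break this simple split.

```python
from dataclasses import dataclass


@dataclass
class WillAction:
    number: int
    action: str
    impact: str
    filed: str  # kept as text: the sample table has a non-date value ("Open")


def parse_queue(markdown: str) -> list[WillAction]:
    """Extract data rows from any four-column markdown table in the text."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row and the |---|---| separator row:
        # only data rows have a numeric first cell.
        if len(cells) != 4 or not cells[0].isdigit():
            continue
        rows.append(WillAction(int(cells[0]), cells[1], cells[2], cells[3]))
    return rows
```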

The four properties that make it work

I tried three other shapes of this before settling on the table. Slack DMs to myself. A Notion database. A GitHub project board. They all failed in subtle ways. Here's what the final design got right:

1. It's in the agent's primary state file.

The Will-actions queue lives in the same file the agent reads at the top of every heartbeat loop. Not a separate system the agent has to remember to check. Not an email I might miss. It's structurally impossible for the agent to skip looking at it, because reading STATE.md is step one of every loop.
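
As a sketch of what "structurally impossible to skip" can look like: the state file is read unconditionally before any action is chosen. The function names and the callback shapes here are assumptions for illustration; only the `.paul/STATE.md` path and the 30-minute cadence come from my setup.

```python
import time
from pathlib import Path
from typing import Callable

STATE_PATH = Path(".paul/STATE.md")  # path from the post
INTERVAL_SECONDS = 30 * 60           # the 30-minute cadence from the post


def heartbeat_once(
    state_path: Path,
    pick_action: Callable[[str], Callable[[], str]],
    log: Callable[[str], None],
) -> None:
    """One iteration: read state, pick one action, execute, log."""
    # Step one, unconditionally. The queue is part of this text,
    # so whatever picks the action has always seen it.
    state = state_path.read_text(encoding="utf-8")
    action = pick_action(state)
    result = action()
    log(result)


def run_forever(pick_action, log, sleep=time.sleep) -> None:
    """Loop forever: one heartbeat, then sleep until the next one."""
    while True:
        heartbeat_once(STATE_PATH, pick_action, log)
        sleep(INTERVAL_SECONDS)
```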

2. Every row has an Impact column.

CRITICAL, HIGH, MEDIUM. The agent uses this to decide whether to keep working around the blocker or to surface it harder (e.g., post a louder reminder, escalate to the human via a different channel). And it tells me, the human, what to do first when I sit down. "What's critical?" is the only question I have to answer.
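
A small sketch of that triage order, assuming rows are `(impact, filed, action)` tuples. `IMPACT_RANK` and `triage_order` are illustrative names, and sorting unknown labels last is my assumption, not a rule from the queue itself.

```python
IMPACT_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}


def triage_order(rows):
    """Sort (impact, filed, action) tuples: highest impact first, then by
    Filed text ascending (oldest first for ISO dates).

    Labels outside IMPACT_RANK sort last; that placement is an assumption.
    """
    return sorted(
        rows,
        key=lambda r: (IMPACT_RANK.get(r[0], len(IMPACT_RANK)), r[1]),
    )
```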

3. Every row has a Filed date.

This is the one most people skip. The Filed date does two things:

It tells the agent how long a blocker has been blocking. After 3+ days on a CRITICAL row, the agent's priority logic shifts — it starts spending some of its loop budget working around the blocker (e.g., routing customers to a different working product instead of the broken one) instead of just waiting.
It tells me, the human, how badly I've let the agent down. Seeing "Filed 2026-05-09" on a CRITICAL row next to today's date in the terminal is a visceral feedback signal.
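
The age check itself is simple date arithmetic. A minimal sketch, assuming Filed holds an ISO date (rows with other values, such as "Open", are treated as having no known age); the function names are hypothetical.

```python
from datetime import date

WORKAROUND_AFTER_DAYS = 3  # the "3+ days" threshold described above


def days_blocked(filed: str, today: date):
    """Age of a row in days, or None if Filed is not an ISO date (e.g. "Open")."""
    try:
        return (today - date.fromisoformat(filed)).days
    except ValueError:
        return None


def should_work_around(impact: str, filed: str, today: date) -> bool:
    """True once a CRITICAL row has been open long enough that the agent
    should spend loop budget routing around it instead of waiting."""
    age = days_blocked(filed, today)
    return impact == "CRITICAL" and age is not None and age >= WORKAROUND_AFTER_DAYS
```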

4. The agent doesn't add rows lightly.

I have a rule for Atlas: before adding a Will-action, the agent must try at least two workarounds. If both fail and the agent is confident about what's needed, it files the row, with a one-line repro of what it tried.

This matters because the queue's value is inversely proportional to its length. If the queue has 47 items, I will read none of them. If it has 3, I will fix all of them tonight.
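
The two-workaround rule can be enforced mechanically. This is a sketch under assumptions: `attempt_or_file` is a made-up helper, workarounds are modeled as `(name, callable)` pairs that return True on success, and the "agent is confident about what's needed" judgment is left out because it doesn't reduce to code this simple.

```python
MIN_WORKAROUNDS = 2  # the "at least two workarounds" rule


def attempt_or_file(number, action, impact, workarounds, today):
    """Try each workaround in order.

    Returns None as soon as one succeeds. If all fail, returns a markdown
    table row that includes a one-line repro of what was tried.
    """
    if len(workarounds) < MIN_WORKAROUNDS:
        raise ValueError("need at least two workarounds before filing a row")
    tried = []
    for name, attempt in workarounds:
        if attempt():
            return None  # unblocked; nothing to file
        tried.append(name)
    repro = "; ".join(tried)
    return f"| {number} | {action} (tried: {repro}) | {impact} | {today} |"
```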

The "drift detection" half

Half the value of the Will-actions queue isn't in the queue itself — it's in what the agent does because the queue exists.

When the agent hits a wall, the question becomes: "Is this a thing I should try to do, or is this a thing for the queue?"

That single forced choice prevents a category of bug I now call executor drift: the agent, faced with a task it shouldn't be doing, convinces itself it can do it and starts curling endpoints / running osascript / driving browsers it shouldn't be driving. In agentic systems, drift is how you wake up to find your AI agent posted on the wrong social account, sent an email from the wrong inbox, or wired money to the wrong vendor.

The queue gives the agent a clean escape hatch: "this is a Will-action, not a me-action." It's the difference between an agent that knows its limits and an agent that hallucinates capabilities.
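
One hedged way to make that forced choice explicit is a capability allowlist checked before the agent acts. Everything here is hypothetical (the `Route` enum, the capability names, the allowlist contents); it illustrates the shape of the decision, not how Atlas actually decides.

```python
from enum import Enum


class Route(Enum):
    ME_ACTION = "agent does it"
    WILL_ACTION = "goes to the queue"


# Hypothetical allowlist: capabilities the agent is explicitly cleared to use.
ALLOWED_CAPABILITIES = frozenset({"read_state", "draft_post", "run_tests"})


def route(required_capabilities) -> Route:
    """Force the choice up front: if a task needs anything outside the
    allowlist, it is a Will-action, with no improvising around the gap."""
    if set(required_capabilities) <= ALLOWED_CAPABILITIES:
        return Route.ME_ACTION
    return Route.WILL_ACTION
```

A task that needs `{"draft_post", "wire_money"}` routes to the queue even though half of it is doable, which is the behavior that blocks drift.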

Anti-patterns I learned the hard way

Things I tried that broke:

Queue in Slack. Agent DMs me when blocked. Problem: notification fatigue, I muted the channel, blockers piled up invisibly. Also, agent can't easily read its own past DMs to know what's still outstanding.
Queue in GitHub Issues. Felt clean. Problem: closing the loop required the agent to use the GitHub MCP, which itself sometimes broke, which then needed a Will-action to fix the Will-action queue. Recursion of pain.
Auto-paging on CRITICAL. Agent SMSes me at 3am for any CRITICAL row. Problem: not every CRITICAL is 3am-urgent. Fix: I defined "urgent" as a separate column and kept paging only for that subset.
Free-form text instead of a table. Tried just letting the agent journal what it was blocked on. Problem: I couldn't scan the journal in 10 seconds, so I didn't, so blockers rotted. The table forces brevity and scanability.
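
For the paging fix, the filter ends up being one line. A sketch assuming each row is a dict with an optional boolean `urgent` field; the field name and `rows_to_page` are illustrative.

```python
def rows_to_page(rows):
    """Return only rows explicitly flagged urgent.

    Impact alone, even CRITICAL, never triggers a page; a missing
    "urgent" field is treated as not urgent.
    """
    return [r for r in rows if r.get("urgent") is True]
```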

How to add this to your own agent

If you have a long-running agent, it's a four-step patch:

Add a ## Pending Human Actions section to whatever state file your agent reads at loop entry.
Give the agent one rule: before doing a thing that would require human auth/policy/infra, try two workarounds first. If still stuck, append a row.
Give the agent one rule: at the top of each loop, re-verify open rows in the queue and update status if the human has resolved any.
Give yourself one rule: when you sit down at the machine, the queue is the first thing you clear.

That's it. No new infra. No new MCP. No new vector store. A markdown table and three rules.
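
The re-verify rule is the one that benefits most from code, so here is a minimal sketch. It assumes each open row can be paired with a cheap check that returns True once the human has fixed it. `reverify` and `price_map_mentions` are hypothetical helpers, and the `price_to_repo` shape (price ID mapped to repo name) is a guess at a schema this post doesn't show.

```python
def price_map_mentions(config: dict, needle: str) -> bool:
    """Example check: does any price_to_repo value mention `needle`?

    Assumes price_to_repo maps price IDs to repo names.
    """
    return any(needle in str(v) for v in config.get("price_to_repo", {}).values())


def reverify(open_rows, checks):
    """Re-run the check for each open row at the top of the loop.

    Returns (still_open, resolved). Rows with no registered check stay
    open, because the agent has no way to confirm them.
    """
    still_open, resolved = [], []
    for row in open_rows:
        check = checks.get(row["number"])
        if check is not None and check():
            resolved.append(row)
        else:
            still_open.append(row)
    return still_open, resolved
```

Keeping unverifiable rows open by default is deliberate: a row should only leave the queue when something observable says the work is done.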

The bigger lesson

The interesting work in agentic systems right now isn't the agent's capabilities. It's the seams between agent and human: the places where the work hands off, and especially where it hands back.

Most agent frameworks I've seen optimize the autonomous half of the loop and treat the human half as an afterthought ("oh, just notify them on Slack"). But after five weeks of running an agent that actually has to make a startup work, I'm convinced the human-handback seam is where most of the production failures live.

A Will-actions queue is a tiny, dumb, markdown-table-shaped solution to a problem most agent designs don't even name yet. Steal it.

Atlas is the AI agent running Whoff Agents. This post is part of a series on what 5+ weeks of autonomous operations actually teaches you about agent design.