Qasim Muhammad

Posted on Jun 12

Inbox Zero, but the Inbox Belongs to a Robot

#ai #email #productivity #agents

8:30 AM: you kick off the inbox-zero loop while the coffee brews. 8:35 AM: yesterday's 50 unread messages are sorted into four buckets, replies are drafted for the ones that matter, the noise is archived, and you've approved the lot with a few keystrokes. For the rest of the day, the inbox contains only new mail.

That's the daily rhythm an email agent can sustain — and the daily part is the whole trick. Inbox zero has never been hard to reach once; it's hard to keep, because the maintenance is boring. Boring, repetitive classification is exactly what agents are for. And it gets more interesting when the inbox in question belongs to the robot itself.

The loop: four buckets, five minutes

Pull a manageable batch of unread mail. 50 is the sweet spot: below 20 the setup overhead isn't worth it, and above 50 you start to strain the LLM's context budget while the approval review drags:

nylas email list --unread --limit 50 --json

Classify each message into one of four buckets:

Bucket	When to handle	Typical mail
Urgent	Within hours	Client issue, manager request
Action required	Today	Meeting follow-up, review
FYI	No response needed	Newsletter, status update
Archive	Now	Marketing, automated alerts

The agent shows a summary table; you audit it before anything else happens. A misfiled "Action" caught at this step becomes a correct draft instead of a missed commitment. Then the agent drafts replies for Urgent and Action items, you approve, edit, or skip each one, and approved drafts ship while Archive items get archived.
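
For a sense of what that audit step might look like in code, here is a minimal sketch of the summary table. The `Bucket` enum, the `(subject, bucket)` pair shape, and the sample subjects are all made up for illustration; a real agent would fill `classified` from the model's output.

```python
from collections import Counter
from enum import Enum


class Bucket(Enum):
    URGENT = "Urgent"
    ACTION = "Action required"
    FYI = "FYI"
    ARCHIVE = "Archive"


def summary_table(classified):
    """Render one row per bucket with a count and up to two sample subjects.

    `classified` is a list of (subject, Bucket) pairs (an assumed shape).
    """
    counts = Counter(bucket for _, bucket in classified)
    rows = []
    for bucket in Bucket:  # Enum iteration follows definition order
        samples = [s for s, b in classified if b is bucket][:2]
        rows.append(f"{bucket.value:<16} {counts[bucket]:>3}  {'; '.join(samples)}")
    return "\n".join(rows)


# Illustrative data only.
classified = [
    ("Prod API returning 500s", Bucket.URGENT),
    ("Follow-up from Tuesday sync", Bucket.ACTION),
    ("Weekly digest", Bucket.FYI),
    ("20% off this weekend", Bucket.ARCHIVE),
    ("Your build passed", Bucket.ARCHIVE),
]
print(summary_table(classified))
```

Showing a couple of sample subjects per row is a design choice: a count alone won't reveal a misfiled message, while a glance at the subjects often will.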

One rule is non-negotiable: the agent never sends without explicit approval. The agent drafts; the human ships. Even when a draft is obviously fine, the click is the difference between "AI wrote this" and "I wrote this with help."

First runs will misclassify. Encode the corrections as standing rules in the agent's prompt, and each pass gets faster:

# Inbox-zero rules
always_fyi:
  - "from: sales@*"
  - "from: noreply@*"
  - "subject: ^\\[GitHub\\]"

always_urgent:
  - "from: *@board.example.com"
  - "subject: \\b(p0|incident|outage)\\b"
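
If the rules live in the prompt, the model applies them. If you would rather apply them in code before the model sees anything, a rough sketch could look like the following. It assumes `from:` rules are shell-style globs and `subject:` rules are case-insensitive regular expressions, which is one reading of the rules above rather than a documented format. Checking urgent rules first is also an assumption.

```python
import re
from fnmatch import fnmatchcase

# The rules above, inlined as a dict so the sketch needs no YAML library.
RULES = {
    "always_fyi": ["from: sales@*", "from: noreply@*", r"subject: ^\[GitHub\]"],
    "always_urgent": ["from: *@board.example.com", r"subject: \b(p0|incident|outage)\b"],
}

# Assumption: urgent rules win over FYI rules when both match.
PRECEDENCE = [("URGENT", "always_urgent"), ("FYI", "always_fyi")]


def rule_matches(rule, message):
    field, pattern = rule.split(": ", 1)
    value = message.get(field, "")
    if field == "from":
        # Glob match on the lowercased sender address.
        return fnmatchcase(value.lower(), pattern.lower())
    # Everything else is treated as a case-insensitive regex search.
    return re.search(pattern, value, re.IGNORECASE) is not None


def apply_rules(message):
    """Return a forced bucket name, or None to fall through to the model."""
    for bucket, key in PRECEDENCE:
        if any(rule_matches(rule, message) for rule in RULES[key]):
            return bucket
    return None


print(apply_rules({"from": "noreply@status.example", "subject": "Incident opened"}))
```

With this precedence, an incident notice from a `noreply@` sender is treated as urgent rather than buried as FYI.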

If you'd rather drive it as a script than a chat session, the whole loop fits in a dozen lines of orchestration:

unread = fetch_unread(limit=50)
buckets = classify_all(unread)              # 4-bucket categorization
print_summary_table(buckets)
buckets = input_corrections(buckets)        # interactive correction; returns the corrected buckets
drafts = [draft_reply(m) for m in buckets["URGENT"] + buckets["ACTION"]]
for draft in interactive_approval(drafts):  # one-by-one Y/N/edit
    if draft.approved:
        send(draft)
for msg in buckets["ARCHIVE"]:
    archive(msg)

The interactive correction and approval steps are what separate this from a cron-driven triage bot — same bucket model, different trust contract. Some teams eventually add a fifth "delegate" bucket that auto-CCs a teammate; do that after the four-bucket version is bedded in, not before.
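
The approval step is the part worth getting right, so here is one possible shape for `interactive_approval`. The `Draft` dataclass and the prompt wording are hypothetical. The `ask` parameter defaults to `input` and exists only so the gate can be tested with scripted answers.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    to: str
    body: str
    approved: bool = False  # nothing ships unless a human flips this


def interactive_approval(drafts, ask=input):
    """Yield each draft after a yes / no / edit decision.

    Anything other than an explicit "y" leaves the draft unapproved.
    """
    for draft in drafts:
        choice = ask(f"Send to {draft.to}? [y/N/e] ").strip().lower()
        if choice == "e":
            draft.body = ask("Replacement body: ")
            choice = ask("Send the edited draft? [y/N] ").strip().lower()
        draft.approved = choice == "y"
        yield draft
```

Defaulting to "no" on empty or unexpected input keeps the trust contract intact: a stray Enter key skips a draft instead of sending it.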

Now give the robot its own mailbox

Run this loop against a human's inbox and you've built an assistant. Run it against an inbox the agent owns and you've built something more autonomous: a support@ or triage@ address that is the agent's workspace, not borrowed territory.

That's what Agent Accounts provide — hosted mailboxes (currently in beta) created by API call, each with a real address and a grant_id that works with the standard Messages, Threads, Folders, and Drafts endpoints. Every mailbox arrives with six system folders — inbox, sent, drafts, trash, junk, archive — and you can create custom folders alongside them for whatever taxonomy your buckets need. (System folder names are reserved.)

Folder hygiene stops being a human courtesy and becomes agent state: inbox is the work queue, archive is processed-no-action, a custom escalations folder is the handoff point to humans. The mailbox structure is the state machine.
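
To make the "folders as state machine" idea concrete, here is a small local model. Only the six system folder names and the roles of inbox, archive, and escalations come from the description above. The exact set of allowed moves, and treating reserved names as case-insensitive, are assumptions for illustration.

```python
SYSTEM_FOLDERS = frozenset({"inbox", "sent", "drafts", "trash", "junk", "archive"})

# Allowed moves between folders (assumed edges, adjust to your workflow).
TRANSITIONS = {
    "inbox": {"archive", "escalations"},   # processed, or handed to a human
    "escalations": {"inbox", "archive"},   # human returns it or closes it
}


def validate_custom_folder(name):
    """Reject names that collide with a reserved system folder."""
    if name.strip().lower() in SYSTEM_FOLDERS:
        raise ValueError(f"{name!r} is a reserved system folder name")
    return name


def move(current, target):
    """Return the new folder, or raise if the state machine forbids the move."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal move: {current} -> {target}")
    return target


state = move("inbox", validate_custom_folder("escalations"))
print(state)
```

Keeping the transition table in one place means a message's folder is enough to tell what has already happened to it, with no extra database.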

Push the easy classification below the agent

Here's an efficiency the borrowed-inbox version can't touch: on an agent-owned mailbox, rules run at the SMTP layer, before the message.created webhook ever fires. Inbound mail flows through policy checks on arrival — block rejects at the SMTP stage, mark_as_spam routes to junk, assign_to_folder files it — and rule evaluations are logged for audit.

So the deterministic 80% of your Archive bucket (newsletters, automated alerts, known senders) never costs an LLM token. The model only reasons over mail that survived the filter. Cheaper, faster, and the agent reacts to less noise — which also means fewer chances to misfire on a mailer-daemon reply.
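
The split can be reasoned about locally before configuring anything. The sketch below is a toy simulation, not the hosted rules schema: the three action names come from the description above, while the tuple layout, the first-match-wins order, and the sample senders are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical local rule shape: (sender glob, action, destination folder).
RULES = [
    ("*@spam.example", "block", None),
    ("promo@*", "mark_as_spam", None),
    ("noreply@*", "assign_to_folder", "archive"),
]


def evaluate(sender, audit_log):
    """Return the folder a message lands in, or None if it is rejected."""
    for pattern, action, folder in RULES:
        if fnmatchcase(sender.lower(), pattern):
            audit_log.append((sender, pattern, action))  # keep an audit trail
            if action == "block":
                return None
            if action == "mark_as_spam":
                return "junk"
            return folder
    return "inbox"  # no rule matched: this is the mail the model sees


def needs_model(senders):
    """Split senders into those that reach the model, plus the audit log."""
    log = []
    survivors = [s for s in senders if evaluate(s, log) == "inbox"]
    return survivors, log


survivors, log = needs_model(
    ["x@spam.example", "promo@shop.example", "noreply@ci.example", "amy@client.example"]
)
print(survivors)
```

In this toy run only one of four messages reaches the model. The real ratio depends entirely on your mail.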

The numbers that shape the design

A few mailbox facts worth designing around:

Send quota: 200 messages per account per day on the free plan (paid plans have no daily cap by default), and you can set stricter per-account quotas through a policy. Sends over the limit return an error on the API call, so your send wrapper should expect that error and surface it rather than retry blindly. For a triage agent that mostly archives and drafts, the quota is generous; for anything chattier, it is a forcing function for restraint.
Outbound size: 40 MB per message, though recipient servers commonly enforce ~25 MB.
Webhook truncation: bodies over ~1 MB arrive as message.created.truncated with the body omitted — always fetch the full message before classifying.
Backlogs: starting from 800 unread? Run the loop several times in 50-message passes rather than one heroic session.
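
Three of those facts translate directly into small guard functions. The sketch below assumes a hypothetical `QuotaExceeded` exception that your client maps the over-quota error onto, and a flattened event dict with `type`, `message_id`, and `body` keys. The real webhook payload is shaped differently, so treat these names as placeholders.

```python
import math


class QuotaExceeded(Exception):
    """Hypothetical error type for an over-quota send response."""


def send_once(send, draft):
    """Call `send` exactly once and surface quota errors instead of retrying."""
    try:
        return {"status": "sent", "result": send(draft)}
    except QuotaExceeded as exc:
        return {"status": "quota_exceeded", "detail": str(exc)}


def body_for_classification(event, fetch_full):
    """Return a message body, fetching it when the webhook omitted it."""
    if event["type"] == "message.created.truncated":
        return fetch_full(event["message_id"])
    return event["body"]


def passes_needed(backlog, batch_size=50):
    """Number of loop passes needed to clear a backlog in fixed-size batches."""
    return math.ceil(backlog / batch_size)


print(passes_needed(800))  # 800 unread at 50 per pass is 16 passes
```

Passing `send` and `fetch_full` in as arguments keeps the guards independent of any particular client library.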

And drafts deserve a special mention: full CRUD lives at /v3/grants/{grant_id}/drafts, and sending an existing draft behaves exactly like a normal send. That's the primitive that makes the approval gate clean — the agent's output is a reviewable draft object, not an irreversible send.
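
As a rough map of what "full CRUD" on that path could look like, the sketch below builds method and URL pairs without making any network calls. Only the `/v3/grants/{grant_id}/drafts` path comes from the text above. The HTTP verbs follow ordinary REST convention and are an assumption, the base URL is a placeholder, and the send-a-draft route is left out on purpose, so check all of it against the API reference.

```python
BASE_URL = "https://api.example.invalid"  # placeholder, not a real host


def drafts_url(grant_id, draft_id=None, base_url=BASE_URL):
    """Build the collection URL, or the item URL when draft_id is given."""
    url = f"{base_url}/v3/grants/{grant_id}/drafts"
    return f"{url}/{draft_id}" if draft_id else url


def draft_requests(grant_id, draft_id):
    """Map each CRUD operation to an assumed (method, url) pair."""
    return {
        "create": ("POST", drafts_url(grant_id)),
        "list": ("GET", drafts_url(grant_id)),       # also how skipped drafts are revisited
        "read": ("GET", drafts_url(grant_id, draft_id)),
        "update": ("PUT", drafts_url(grant_id, draft_id)),
        "delete": ("DELETE", drafts_url(grant_id, draft_id)),
    }


for name, (method, url) in draft_requests("GRANT_ID", "DRAFT_ID").items():
    print(f"{name:<7} {method:<6} {url}")
```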

Two questions before you build

Can server-side rules replace the LLM entirely? For the Archive bucket, mostly yes — known senders and automated alerts are deterministic. For Urgent vs. Action, no: "is this a client issue or a status update?" needs judgment, and that's the part worth spending tokens on. The split is the design: rules below, model above.

What happens to skipped drafts? They stay in the queue — and because drafts are real objects in the drafts folder with full CRUD, "revisit at the end of the session" is a list call, not a memory exercise. Nothing is lost by deferring a decision.

Start with one mailbox

The interactive flow is written up in the inbox-zero recipe, and the mailbox mechanics — folders, lifecycle, deliverability signals — in how mailboxes work.

Try this tomorrow morning: run the four-bucket loop once against 50 unread messages and time it. If it beats your usual triage, the follow-up question gets interesting — which of your team's shared inboxes deserves to become a robot's inbox first?

Top comments (1)

Mehmet Can Farsak • Jun 13

Great practical breakdown. The 'agent drafts, human ships' principle is exactly the kind of guardrail AI workflows need. I've seen this pattern break down when agents are asked to brainstorm or plan — they'll auto-classify into action mode and skip the thinking part entirely. Built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) to solve that with PreToolUse hooks: it enforces a 'drafting mode' where the agent explores options without jumping to execution. Works well as an infra layer on top of the kind of human-in-the-loop patterns you describe here.