Qasim Muhammad

Posted on Jun 15

Email Tools for Claude: Tool Use With an Agent Mailbox

#ai #llm #email #tutorial

Claude can operate a real mailbox with three tool definitions and about forty lines of glue code.

tools = [
    {
        "name": "read_emails",
        "description": "List recent emails from the agent's inbox. Returns JSON.",
        "input_schema": {
            "type": "object",
            "properties": {
                "limit": {"type": "integer", "default": 10},
                "unread_only": {"type": "boolean", "default": False},
            },
        },
    },
    {
        "name": "search_emails",
        "description": "Search the agent's mailbox for messages matching a query.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "send_email",
        "description": "Send an email from the agent's own address. Confirm recipient, subject, and body with the user before calling.",
        "input_schema": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
]

The interesting part isn't the schemas — it's what backs them. Instead of pointing these tools at a human's Gmail over OAuth, you can point them at a Nylas Agent Account: a hosted mailbox the agent owns outright, created with one command on a registered domain:

nylas agent account create agent@yourdomain.com

Agent Accounts are in beta, but they behave like any other grant, which means the same CLI commands and API endpoints work unchanged.

Why subprocess tools instead of raw OAuth

If you hand-roll Gmail OAuth, you're writing roughly 300 lines of token plumbing before the agent does anything useful. Add Microsoft Graph and you're at 600. Add IMAP fallback and you're past 1,000. The LLM agent with tools recipe takes a different route: shell out to the nylas CLI and let it handle auth, refresh, and provider differences. The implementations are short:

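import subprocess

def _run(cmd: list[str]) -> str:
    """Run a CLI command; return stdout, or an "Error: ..." string the model can read."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        # subprocess.run raises on timeout; report it instead of crashing the loop.
        return "Error: command timed out after 30 seconds"
    return out.stdout if out.returncode == 0 else f"Error: {out.stderr}"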

def read_emails(limit: int = 10, unread_only: bool = False) -> str:
    cmd = ["nylas", "email", "list", "--limit", str(limit), "--json"]
    if unread_only:
        cmd.append("--unread")
    return _run(cmd)

def search_emails(query: str) -> str:
    return _run(["nylas", "email", "search", query, "--limit", "5", "--json"])

def send_email(to: str, subject: str, body: str) -> str:
    return _run(["nylas", "email", "send", "--to", to, "--subject", subject,
                 "--body", body, "--yes", "--json"])

Two flags matter more than they look. --yes skips the interactive "send this?" confirmation; without it, the send command sits waiting for a keypress no agent will ever make, until the 30-second timeout ends the call. --json returns structured output the model can parse instead of human-formatted text.

The Claude loop

Anthropic's tool-use flow is a loop: call the model, execute any tool_use blocks, feed results back, repeat until the model answers in plain text.

import anthropic

client = anthropic.Anthropic()
DISPATCH = {"read_emails": read_emails, "search_emails": search_emails,
            "send_email": send_email}

messages = [{"role": "user", "content": "Did anyone reply about the contract?"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5", max_tokens=1024,
        tools=tools, messages=messages,
    )
    messages.append({"role": "assistant", "content": resp.content})
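    if resp.stop_reason != "tool_use":
        # Final answer: print every text block, not just the first one.
        print("".join(b.text for b in resp.content if b.type == "text"))
        break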
    results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": DISPATCH[block.name](**block.input)}
        for block in resp.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})

Claude may issue several tool calls before producing a final answer — search first, read a specific message, then draft a reply. The loop shape doesn't care.
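One gap in the loop above: `DISPATCH[block.name](**block.input)` raises if Claude names a tool that isn't registered or passes arguments the function doesn't accept, and that exception ends the run. A minimal sketch of a more forgiving executor follows. The helper name `execute_tool_calls` is hypothetical, and the `SimpleNamespace` objects only stand in for the SDK's content blocks; `is_error` is the field a tool_result block can carry to mark a failed call.

```python
from types import SimpleNamespace


def execute_tool_calls(content, dispatch):
    """Turn the tool_use blocks of one response into tool_result blocks.

    `content` is resp.content from the loop; `dispatch` maps tool names to
    functions. Failures become is_error results so the model can react
    instead of the loop crashing.
    """
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        tool = dispatch.get(block.name)
        if tool is None:
            output, failed = f"Error: unknown tool {block.name!r}", True
        else:
            try:
                output, failed = tool(**block.input), False
            except Exception as exc:  # bad arguments or a bug in the wrapper
                output, failed = f"Error: {type(exc).__name__}: {exc}", True
        result = {"type": "tool_result", "tool_use_id": block.id, "content": output}
        if failed:
            result["is_error"] = True
        results.append(result)
    return results


# Stand-in blocks, shaped like the ones the SDK returns.
demo = [SimpleNamespace(type="tool_use", id="t1", name="missing_tool", input={})]
print(execute_tool_calls(demo, dispatch={}))
```

In the loop, `results = execute_tool_calls(resp.content, DISPATCH)` would replace the list comprehension.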

What a multi-turn run looks like

Give the loop a real task and the trace is more interesting than the code. For "Did anyone reply about the contract?", a typical run goes:

1. Claude emits a tool_use block: search_emails with {"query": "contract"}.
2. The CLI returns five matches as JSON. Claude notices one is a reply from yesterday but the snippet is truncated.
3. Claude calls read_emails with {"limit": 10} to pull recent messages with full context.
4. With both results in the conversation, stop_reason comes back as end_turn and Claude answers in plain text: who replied, when, and what they said.

No step in that sequence was scripted. The model decided to search before reading, and decided two tool calls were enough. That's the whole appeal of tool use over a hardcoded pipeline — and also why the guardrails below matter.
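Because the model chooses the sequence, it also chooses the length, so it can help to bound it. The sketch below wraps the same loop in a function with a turn cap. `run_agent`, `call_model`, and `max_turns` are hypothetical names: `call_model` stands for any function that takes the message list and returns the response of `client.messages.create(...)`.

```python
def run_agent(call_model, dispatch, messages, max_turns=6):
    """Run the tool-use loop, but give up after max_turns model calls.

    call_model(messages) is assumed to return an object with .stop_reason
    and .content, like the response from client.messages.create.
    """
    for _ in range(max_turns):
        resp = call_model(messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # Final answer: join the text blocks into one string.
            return "".join(b.text for b in resp.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": dispatch[b.name](**b.input)}
            for b in resp.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} model calls")
```

With the client from earlier, a call might look like `run_agent(lambda m: client.messages.create(model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=m), DISPATCH, messages)`. The cap of 6 is an arbitrary starting point, not a recommendation from the cookbook.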

Guarding the send tool

read_emails and search_emails are read-only. send_email is not, so it deserves three layers of restraint:

Put the guardrail in the description. The cookbook's schema for the send tool reads: "Confirm recipient, subject, and body with the user before calling." Claude treats tool descriptions as instructions, so this one line meaningfully reduces surprise sends in interactive use.
Keep the timeout. The timeout=30 on every subprocess.run call isn't decoration. A CLI command waiting on a prompt or a slow network would otherwise hang the loop forever — exactly the failure mode --yes exists to prevent, caught a second time.
Scope the credential. The CLI acts on whichever grant is active. For a multi-tenant agent, run a per-tenant CLI process or pass --api-key explicitly so one tenant's loop can never touch another tenant's mailbox.
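The description-level guardrail depends on the model following instructions; a check in code does not. Below is a rough sketch of one, assuming a hypothetical `ALLOWED_DOMAINS` allowlist and a hypothetical `confirm` callback (for example, a function that prompts the operator). Neither comes from the cookbook. `send` would be the `send_email` wrapper defined earlier.

```python
ALLOWED_DOMAINS = {"yourdomain.com"}  # hypothetical allowlist; adjust per deployment


def guarded_send(to, subject, body, send, confirm, allowed=ALLOWED_DOMAINS):
    """Call send(to, subject, body) only if the recipient is allowed and confirmed.

    Otherwise return an "Error: ..." string, so the model sees why nothing was sent.
    """
    domain = to.rsplit("@", 1)[-1].lower()
    if "@" not in to or domain not in allowed:
        return f"Error: recipient {to!r} is outside the allowed domains"
    if not confirm(to, subject, body):
        return "Error: send declined by the user"
    return send(to, subject, body)
```

One way to wire it in is to register a small wrapper in `DISPATCH` under `"send_email"` that calls `guarded_send` with `send=send_email`, so the model still sees the same tool name and schema.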

Keeping context under control

nylas email list --limit 100 produces a wall of JSON that'll eat your context window. The cookbook's advice: cap limit aggressively in the schema itself — the default of 10 is deliberate, and 5 is a reasonable floor for list calls. Let error strings through too. Subprocess failures come back as stderr text, and the model is surprisingly good at deciding what to do with "grant expired" versus "rate limited."
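A schema default only guides the model; it does not stop a call with `limit: 500`. Here is a minimal sketch of enforcing the cap on both sides, assuming a hypothetical ceiling of 25 and a hypothetical character budget per tool result. `minimum` and `maximum` are standard JSON Schema keywords; `clamp_limit` and `truncate_output` are illustrative names, not part of the cookbook.

```python
MAX_LIMIT = 25      # hypothetical ceiling for list calls
MAX_CHARS = 8_000   # hypothetical budget for a single tool result

# Schema fragment for the "limit" property: the bounds tell the model what is acceptable.
limit_schema = {"type": "integer", "default": 10, "minimum": 1, "maximum": MAX_LIMIT}


def clamp_limit(limit, lo=1, hi=MAX_LIMIT):
    """Force limit into [lo, hi] even if the model ignores the schema."""
    return max(lo, min(int(limit), hi))


def truncate_output(text, max_chars=MAX_CHARS):
    """Cut oversized CLI output and say so, so the model knows it is partial."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"\n[truncated {len(text) - max_chars} characters]"
```

Inside `read_emails`, this would mean passing `str(clamp_limit(limit))` to the CLI and returning `truncate_output(_run(cmd))`. Truncation leaves the JSON incomplete, which is why the marker line is appended: the model is told the result is partial rather than left to guess.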

One more operational note: nylas auth list shows which grant is currently active, and an Agent Account appears there with Provider: Nylas. After creating one, switch to it before starting the loop; otherwise your agent cheerfully sends from your personal address.

Why the agent should own the mailbox

Backing these tools with the agent's own address changes the safety story. Replies land in an inbox your application controls. There's no human whose sent folder fills with machine-written mail, and no OAuth consent that breaks when that human leaves the company. The mailbox sends, receives, and threads like any normal account.

Subprocess, MCP, or SDK?

There are three ways to wire Claude to this mailbox, and they suit different runtimes:

| Route | Best for | What it takes |
| --- | --- | --- |
| Subprocess + CLI (this post) | Custom Python loops you fully control | Three wrapper functions, ~40 lines |
| MCP | Hosts that already speak MCP, like Claude Code | `nylas mcp install --assistant claude-code`, which registers 16 email, calendar, and contacts tools, no wrappers |
| SDK / raw API | Production services | `pip install nylas`, then call `{base_url}/v3/grants/{grant_id}/{resource}` with a Bearer API key |

The SDK route trades the CLI's convenience for explicitness: every call carries the grant_id, errors come back as structured JSON with an error.type field (unauthorized, rate_limit_error, invalid_request_error), and nothing depends on local CLI state. The autonomous agents quickstart covers the CLI and MCP routes, and the coding agents guide covers the SDK path if you'd rather call the API directly.
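As a rough illustration of the raw-API path, the sketch below builds (but does not send) a request in the URL shape described above and maps the listed error types to a coarse next step. The base URL `https://api.us.nylas.com`, the `messages` resource, the `limit` query parameter, the helper names, and the action labels are all assumptions for illustration; check them against the Nylas API reference before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, grant_id, resource, api_key, params=None):
    """Build a GET request for {base_url}/v3/grants/{grant_id}/{resource}."""
    url = f"{base_url.rstrip('/')}/v3/grants/{grant_id}/{resource}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    })


def next_step(error_body):
    """Map a structured error body to a coarse action for the calling code."""
    error_type = json.loads(error_body).get("error", {}).get("type", "")
    return {
        "unauthorized": "reauthenticate",
        "rate_limit_error": "retry_later",
        "invalid_request_error": "fix_request",
    }.get(error_type, "give_up")


# Placeholder grant ID and key; nothing is sent over the network here.
req = build_request("https://api.us.nylas.com", "GRANT_ID", "messages",
                    "API_KEY", {"limit": 5})
print(req.full_url)
```

Because the grant ID is part of every URL, a multi-tenant service can keep one tenant's calls from reaching another's mailbox without depending on which grant the local CLI has active.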

Try giving the loop a task that requires multiple turns — "find the latest invoice email and forward a summary to accounting" — and watch which tools Claude chains together. What's the first tool you'd add beyond these three?

Top comments (1)

Alex Shev • Jun 15

The dedicated mailbox pattern is underrated. It gives the agent a real inbox without giving it your whole personal communication surface.

I would treat the mailbox less like "Claude can read email" and more like an integration boundary: limited scopes, predictable JSON, explicit actions, and a human-readable audit trail for what the agent saw.