Claude can operate a real mailbox with three tool definitions and about forty lines of glue code.
tools = [
    {
        "name": "read_emails",
        "description": "List recent emails from the agent's inbox. Returns JSON.",
        "input_schema": {
            "type": "object",
            "properties": {
                "limit": {"type": "integer", "default": 10},
                "unread_only": {"type": "boolean", "default": False},
            },
        },
    },
    {
        "name": "search_emails",
        "description": "Search the agent's mailbox for messages matching a query.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "send_email",
        "description": "Send an email from the agent's own address.",
        "input_schema": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
]
The interesting part isn't the schemas — it's what backs them. Instead of pointing these tools at a human's Gmail over OAuth, you can point them at a Nylas Agent Account: a hosted mailbox the agent owns outright, created with one command on a registered domain:
nylas agent account create agent@yourdomain.com
Agent Accounts are in beta, but they behave like any other grant, which means the same CLI commands and API endpoints work unchanged.
Why subprocess tools instead of raw OAuth
If you hand-roll Gmail OAuth, you're writing roughly 300 lines of token plumbing before the agent does anything useful. Add Microsoft Graph and you're at 600. Add IMAP fallback and you're past 1,000. The LLM agent with tools recipe takes a different route: shell out to the nylas CLI and let it handle auth, refresh, and provider differences. The implementations are short:
import json, subprocess

def _run(cmd: list[str]) -> str:
    out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return out.stdout if out.returncode == 0 else f"Error: {out.stderr}"

def read_emails(limit: int = 10, unread_only: bool = False) -> str:
    cmd = ["nylas", "email", "list", "--limit", str(limit), "--json"]
    if unread_only:
        cmd.append("--unread")
    return _run(cmd)

def search_emails(query: str) -> str:
    return _run(["nylas", "email", "search", query, "--limit", "5", "--json"])

def send_email(to: str, subject: str, body: str) -> str:
    return _run(["nylas", "email", "send", "--to", to, "--subject", subject,
                 "--body", body, "--yes", "--json"])
Two flags matter more than they look. --yes skips the interactive "send this?" confirmation — without it, the send command blocks forever waiting for a keypress no agent will ever make. --json returns structured output the model can actually parse instead of human-formatted text.
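One gap worth noting in the runner above: when the timeout elapses, subprocess.run raises subprocess.TimeoutExpired rather than returning, so an unhandled timeout would end the loop instead of reaching the model as text. A minimal sketch of a more defensive runner follows. The name run_cli and the exact error wording are illustrative assumptions, not part of the recipe, and the demo line uses the Python interpreter as a stand-in so it runs without the nylas CLI installed.

```python
import subprocess
import sys


def run_cli(cmd: list[str], timeout: float = 30.0) -> str:
    """Run a command and always return a string the model can read."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and raises once the timeout elapses.
        return f"Error: command timed out after {timeout} seconds"
    except FileNotFoundError:
        # Raised when the executable (for example, nylas) is not on PATH.
        return f"Error: executable not found: {cmd[0]}"
    if out.returncode != 0:
        return f"Error: {out.stderr.strip()}"
    return out.stdout


# Stand-in command for illustration; a real tool would pass ["nylas", ...].
print(run_cli([sys.executable, "-c", "print('ok')"]))
```

Returning every failure as an "Error: ..." string keeps the contract the wrappers already rely on: the model sees what went wrong and can decide whether to retry or report back.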
The Claude loop
Anthropic's tool-use flow is a loop: call the model, execute any tool_use blocks, feed results back, repeat until the model answers in plain text.
import anthropic

client = anthropic.Anthropic()
DISPATCH = {"read_emails": read_emails, "search_emails": search_emails,
            "send_email": send_email}
messages = [{"role": "user", "content": "Did anyone reply about the contract?"}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5", max_tokens=1024,
        tools=tools, messages=messages,
    )
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        print(resp.content[0].text)
        break
    results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": DISPATCH[block.name](**block.input)}
        for block in resp.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
Claude may issue several tool calls before producing a final answer — search first, read a specific message, then draft a reply. The loop shape doesn't care.
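The loop does care about one thing, though: it indexes DISPATCH directly, so an unexpected tool name or a bad argument raises and ends the run. One possible way to soften that is to report the failure back as a tool_result with is_error set, which the Messages API accepts on tool results. The sketch below assumes a helper named run_tool_calls (hypothetical) and uses SimpleNamespace objects as stand-ins for response blocks so it runs without an API key.

```python
from types import SimpleNamespace
from typing import Callable


def run_tool_calls(content: list, dispatch: dict[str, Callable[..., str]]) -> list[dict]:
    """Turn every tool_use block in a response into a tool_result block."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        result = {"type": "tool_result", "tool_use_id": block.id}
        handler = dispatch.get(block.name)
        try:
            if handler is None:
                raise ValueError(f"unknown tool: {block.name}")
            result["content"] = handler(**block.input)
        except Exception as exc:
            # Report the failure to the model instead of crashing the loop.
            result["content"] = f"Error: {exc}"
            result["is_error"] = True
        results.append(result)
    return results


# Hypothetical stand-ins for response blocks and a tool, for illustration only.
fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="echo", input={"query": "contract"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="missing", input={}),
]
results = run_tool_calls(fake_content, {"echo": lambda query: f"searched {query}"})
```

In the loop above, the list comprehension that builds results could be swapped for run_tool_calls(resp.content, DISPATCH) without changing anything else.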
What a multi-turn run looks like
Give the loop a real task and the trace is more interesting than the code. For "Did anyone reply about the contract?", a typical run goes:
- Claude emits a tool_use block: search_emails with {"query": "contract"}.
- The CLI returns five matches as JSON. Claude notices one is a reply from yesterday but the snippet is truncated.
- Claude calls read_emails with {"limit": 10} to pull recent messages with full context.
- With both results in the conversation, stop_reason comes back as end_turn and Claude answers in plain text: who replied, when, and what they said.
No step in that sequence was scripted. The model decided to search before reading, and decided two tool calls were enough. That's the whole appeal of tool use over a hardcoded pipeline — and also why the guardrails below matter.
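Because the model chooses how many tool calls to make, a bare while True has no upper bound on turns. One simple guardrail is a turn budget. The sketch below is an assumption-laden illustration, not the recipe's code: run_agent, call_model, and run_tools are hypothetical names, and responses are plain dicts rather than SDK objects so the example runs without an API key.

```python
from typing import Callable, Optional


def run_agent(
    messages: list[dict],
    call_model: Callable[[list[dict]], dict],
    run_tools: Callable[[dict], list[dict]],
    max_turns: int = 8,
) -> Optional[dict]:
    """Loop until the model stops asking for tools or the turn budget runs out."""
    for _ in range(max_turns):
        resp = call_model(messages)
        messages.append({"role": "assistant", "content": resp["content"]})
        if resp["stop_reason"] != "tool_use":
            return resp  # final answer
        messages.append({"role": "user", "content": run_tools(resp)})
    return None  # budget exhausted; the caller decides what to do next


def scripted_model(replies: list[dict]) -> Callable[[list[dict]], dict]:
    """Fake model that returns canned replies in order (illustration only)."""
    it = iter(replies)
    return lambda messages: next(it)


tool_turn = {"stop_reason": "tool_use", "content": "tool request"}
final_turn = {"stop_reason": "end_turn", "content": "done"}

history = [{"role": "user", "content": "Did anyone reply about the contract?"}]
answer = run_agent(history, scripted_model([tool_turn, tool_turn, final_turn]), lambda resp: [])
```

Returning None on exhaustion, rather than raising, leaves it to the caller to log the partial transcript or ask the user whether to continue.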
Guarding the send tool
read_emails and search_emails are harmless. send_email is not, so it deserves three layers of restraint:
- Put the guardrail in the description. The cookbook's schema for the send tool reads: "Confirm recipient, subject, and body with the user before calling." Claude treats tool descriptions as instructions, so this one line meaningfully reduces surprise sends in interactive use.
- Keep the timeout. The timeout=30 on every subprocess.run call isn't decoration. A CLI command waiting on a prompt or a slow network would otherwise hang the loop forever, which is exactly the failure mode --yes exists to prevent, caught a second time.
- Scope the credential. The CLI acts on whichever grant is active. For a multi-tenant agent, run a per-tenant CLI process or pass --api-key explicitly so one tenant's loop can never touch another tenant's mailbox.
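A description-level guardrail is an instruction to the model, not an enforcement mechanism. For interactive use, one option is to add a confirmation gate in code as well. This is a sketch under stated assumptions: guarded_send is a hypothetical wrapper, send would be the send_email function from earlier in the post, and confirm defaults to the built-in input so a human sees the draft before anything leaves the mailbox.

```python
from typing import Callable


def guarded_send(
    to: str,
    subject: str,
    body: str,
    send: Callable[[str, str, str], str],
    confirm: Callable[[str], str] = input,
) -> str:
    """Show the draft to a human and send only on an explicit 'y'."""
    preview = f"To: {to}\nSubject: {subject}\n\n{body}\n\nSend this? [y/N] "
    if confirm(preview).strip().lower() != "y":
        # Returned as a tool result so the model knows the send did not happen.
        return "Error: send cancelled by user"
    return send(to, subject, body)


# Illustration with a fake sender and a scripted answer instead of a real prompt.
outbox = []


def fake_send(to: str, subject: str, body: str) -> str:
    outbox.append((to, subject))
    return "sent"


declined = guarded_send("a@example.com", "Hi", "Body", send=fake_send, confirm=lambda p: "n")
approved = guarded_send("a@example.com", "Hi", "Body", send=fake_send, confirm=lambda p: "y")
```

Registering a wrapper like this under the send_email name in the dispatch table would leave the tool schema untouched. It only suits interactive sessions; an unattended agent would need a different control, such as a recipient allowlist.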
Keeping context under control
nylas email list --limit 100 produces a wall of JSON that'll eat your context window. The cookbook's advice: cap limit aggressively in the schema itself — the default of 10 is deliberate, and 5 is a reasonable floor for list calls. Let error strings through too. Subprocess failures come back as stderr text, and the model is surprisingly good at deciding what to do with "grant expired" versus "rate limited."
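The cap can be expressed in two places: in the schema, using the standard JSON Schema minimum and maximum keywords, and again in code, since a schema bound is a hint to the model rather than a guarantee. The sketch below is illustrative only. clamp_limit, truncate_result, and the 4000-character budget are assumptions, not values from the cookbook.

```python
MAX_LIMIT = 10            # mirrors the schema default of 10
MAX_RESULT_CHARS = 4000   # assumed budget; tune for your model and prompts


def clamp_limit(limit: int, lo: int = 1, hi: int = MAX_LIMIT) -> int:
    """Keep a model-supplied limit inside [lo, hi]."""
    return max(lo, min(hi, int(limit)))


def truncate_result(text: str, max_chars: int = MAX_RESULT_CHARS) -> str:
    """Cut oversized tool output and say so, so the model knows data is missing."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}\n[truncated: {omitted} characters omitted]"


# Schema fragment for the "limit" property with explicit bounds.
limit_schema = {"type": "integer", "default": 10, "minimum": 1, "maximum": MAX_LIMIT}

print(clamp_limit(100))
print(truncate_result("x" * 50, max_chars=20))
```

Note that truncation can leave the JSON syntactically incomplete, which is why the marker line states plainly that content was cut.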
One more operational note: the CLI acts on whichever grant is currently active in nylas auth list. An Agent Account shows up there with Provider: Nylas, so after creating one, switch to it before starting the loop — otherwise your agent cheerfully sends from your personal address.
Why the agent should own the mailbox
Backing these tools with the agent's own address changes the safety story. Replies land in an inbox your application controls. There's no human whose sent folder fills with machine-written mail, and no OAuth consent that breaks when that human leaves the company. The mailbox sends, receives, and threads like any normal account.
Subprocess, MCP, or SDK?
There are three ways to wire Claude to this mailbox, and they suit different runtimes:
| Route | Best for | What it takes |
|---|---|---|
| Subprocess + CLI (this post) | Custom Python loops you fully control | Three wrapper functions, ~40 lines |
| MCP | Hosts that already speak MCP, like Claude Code | nylas mcp install --assistant claude-code (registers 16 email, calendar, and contacts tools, no wrappers) |
| SDK / raw API | Production services | pip install nylas, then call {base_url}/v3/grants/{grant_id}/{resource} with a Bearer API key |
The SDK route trades the CLI's convenience for explicitness: every call carries the grant_id, errors come back as structured JSON with an error.type field (unauthorized, rate_limit_error, invalid_request_error), and nothing depends on local CLI state. The autonomous agents quickstart covers the CLI and MCP routes, and the coding agents guide covers the SDK path if you'd rather call the API directly.
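To make the raw API shape concrete, here is a minimal standard-library sketch that builds, but does not send, a request following the {base_url}/v3/grants/{grant_id}/{resource} pattern, and reads error.type out of an error body. build_request and error_type are hypothetical helper names, the base URL and credentials are placeholders, and the error body layout is assumed from the error.type description above rather than taken from the API reference.

```python
import json
import urllib.request


def build_request(base_url: str, grant_id: str, resource: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET for {base_url}/v3/grants/{grant_id}/{resource}."""
    url = f"{base_url.rstrip('/')}/v3/grants/{grant_id}/{resource}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def error_type(body: str) -> str:
    """Pull error.type out of an error response body, or 'unknown' if absent."""
    try:
        return json.loads(body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        return "unknown"


# Placeholder values for illustration; substitute your own base URL, grant, and key.
req = build_request("https://api.example.com", "GRANT_ID", "messages", "API_KEY")
print(req.full_url)
print(error_type('{"error": {"type": "rate_limit_error"}}'))
```

Because the grant_id travels in every URL, nothing here depends on which grant the local CLI considers active.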
Try giving the loop a task that requires multiple turns — "find the latest invoice email and forward a summary to accounting" — and watch which tools Claude chains together. What's the first tool you'd add beyond these three?