Guardrails for Autonomous Email Agents: Policies Deep Dive

#security #ai #email #api

What's actually stopping your autonomous email agent from doing something stupid at 3 a.m.?

If the answer is "the system prompt," you don't have a guardrail — you have a suggestion. Prompts are advisory. The model can be jailbroken, the retry loop can go runaway, the classifier can misfire. Real guardrails live below the application layer, where the agent's code can't talk its way past them.

That's the job of Policies, Rules, and Lists on Nylas Agent Accounts (currently in beta). They're admin-scoped resources (no grant ID in the path, enforced by the platform) that control what an agent's mailbox can send, receive, and store, regardless of what the LLM decides.

The shape of the system

Three resources form a chain, and a fourth applies it:

Lists hold values — typed collections of domains, TLDs, or email addresses.
Rules match mail (inbound or outbound) against conditions, including list membership via the in_list operator, and run actions: block, mark_as_spam, assign_to_folder, archive, trash, and more.
Policies bundle limits and spam settings.
Workspaces carry one policy_id plus an array of rule_ids, and every Agent Account in the workspace inherits both.

That last point is the part people miss: you never attach a policy to an individual grant. Policies and rules attach to workspaces, and accounts inherit them. Every application has a default workspace that holds any account you haven't placed elsewhere, so attaching a policy there covers all of your unassigned agents at once. Want different rules for your sales agents versus your support agents? Separate workspaces, separate policies.

Everything is optional. With no workspace policy, an account runs at your billing plan's maximums and delivers every inbound message to the inbox. You reach for these resources when you want stricter behavior.
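
To make the inheritance concrete, here is a minimal local sketch in Python. It is not Nylas client code: the Workspace dataclass, the effective_config helper, and the in-memory placements mapping are hypothetical names for illustration. Only policy_id and rule_ids come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workspace:
    """Hypothetical local model; only policy_id and rule_ids mirror the article."""
    name: str
    policy_id: Optional[str] = None
    rule_ids: list = field(default_factory=list)


def effective_config(grant_id, placements, workspaces, default_ws="default"):
    """Return the (policy_id, rule_ids) an account inherits from its workspace.

    Accounts without an explicit placement fall back to the default workspace.
    A policy_id of None stands for "no policy": plan maximums apply.
    """
    ws = workspaces[placements.get(grant_id, default_ws)]
    return ws.policy_id, list(ws.rule_ids)


workspaces = {
    "default": Workspace("default", policy_id="policy-standard"),
    "sales": Workspace("sales", policy_id="policy-strict",
                       rule_ids=["rule-block-competitors"]),
}
placements = {"grant-sales-1": "sales"}  # grant-support-1 is unassigned

print(effective_config("grant-sales-1", placements, workspaces))
print(effective_config("grant-support-1", placements, workspaces))
```

The unassigned account picks up whatever the default workspace carries, which is why one policy there covers every agent you haven't placed elsewhere.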

What a policy actually controls

Two categories: limits (attachment size and count, allowed MIME types, total message size, per-account storage, daily send quotas, and inbox/spam retention) and spam detection (DNSBL checking, header anomaly detection, and a spam_sensitivity dial running from 0.1 to 5.0 — higher is more aggressive).

curl --request POST \
  --url "https://api.us.nylas.com/v3/policies" \
  --header "Authorization: Bearer <NYLAS_API_KEY>" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "Standard Agent Account Policy",
    "limits": {
      "limit_attachment_size_limit": 26214400,
      "limit_attachment_count_limit": 20,
      "limit_inbox_retention_period": 365,
      "limit_spam_retention_period": 30
    },
    "spam_detection": {
      "use_list_dnsbl": true,
      "use_header_anomaly_detection": true,
      "spam_sensitivity": 1.5
    }
  }'

Omit any limit and it defaults to your plan's maximum. Ask for a value above the plan maximum and the API returns an error — you can only constrain downward, which is exactly the right shape for a guardrail.
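
The "constrain downward only" behavior can be sketched as a small local check. This illustrates the semantics described above and is not the platform's implementation. The resolve_limits helper is hypothetical, and the plan maximums in PLAN_MAX are invented numbers.

```python
# Invented plan maximums, for illustration only.
PLAN_MAX = {
    "limit_attachment_size_limit": 52428800,
    "limit_attachment_count_limit": 50,
}


def resolve_limits(requested, plan_max):
    """Fill omitted limits with the plan maximum and reject values above it."""
    unknown = set(requested) - set(plan_max)
    if unknown:
        raise ValueError(f"unknown limits: {sorted(unknown)}")
    resolved = {}
    for key, maximum in plan_max.items():
        value = requested.get(key, maximum)  # omitted -> plan maximum
        if value > maximum:
            raise ValueError(f"{key}={value} exceeds plan maximum {maximum}")
        resolved[key] = value
    return resolved


print(resolve_limits({"limit_attachment_count_limit": 20}, PLAN_MAX))
```

Running the same check before calling the API gives you a readable error in CI instead of a failed request at deploy time.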

Rules run in both directions, isolated

Each rule has a trigger. Inbound rules run on received mail; outbound rules run on sends, before the message reaches the email provider. The two never cross: inbound rules don't evaluate during sends, and the stored sent copy is never re-checked against inbound rules. Inbound conditions match on from.address, from.domain, and from.tld. Outbound rules add two things: recipient.* fields, which match any recipient, including BCC and SMTP envelope recipients (useful for data-loss prevention), and outbound.type, which distinguishes a reply from a fresh compose.

Rules run in priority order, lowest first, on a 0–1000 scale (default 10). The block action is terminal: inbound, it rejects the message at the SMTP level so your mailbox never stores it; outbound, it rejects the send with HTTP 403 before delivery and no sent copy is created.
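
Here is a rough local model of that ordering, assuming the matching rules have already been determined. The Rule dataclass and evaluate function are hypothetical. Two details are my assumptions, because the text above doesn't specify them: rules with equal priority keep their input order, and nothing runs after a block.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int = 10  # default priority per the article
    actions: tuple = ()


def evaluate(matching_rules):
    """Apply matching rules lowest priority first; stop at the first block."""
    applied = []
    for rule in sorted(matching_rules, key=lambda r: r.priority):
        for action in rule.actions:
            applied.append((rule.name, action))
            if action == "block":
                return "blocked", applied  # block is terminal
    return "accepted", applied


rules = [
    Rule("broad-contains", priority=500, actions=("mark_as_spam",)),
    Rule("domain-blocklist", priority=1, actions=("block",)),
]
print(evaluate(rules))
```

Because the blocklist rule has the lower priority number, it runs first and the broad rule never gets a chance to act.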

Capacity caps are generous but fixed: 50 conditions per rule, 20 actions per rule, 10 lists referenced per in_list condition, 500 characters per condition value.
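
If you generate rules programmatically, a pre-flight check against these caps is cheap. The sketch below assumes each condition is a dict with operator and value keys, and that an in_list value is a list of list IDs. That shape is a guess for illustration, not the documented schema.

```python
MAX_CONDITIONS = 50
MAX_ACTIONS = 20
MAX_LISTS_PER_IN_LIST = 10
MAX_VALUE_CHARS = 500


def check_rule_caps(conditions, actions):
    """Return a list of cap violations for a rule draft (empty if it fits)."""
    problems = []
    if len(conditions) > MAX_CONDITIONS:
        problems.append(f"{len(conditions)} conditions (max {MAX_CONDITIONS})")
    if len(actions) > MAX_ACTIONS:
        problems.append(f"{len(actions)} actions (max {MAX_ACTIONS})")
    for index, cond in enumerate(conditions):
        value = cond.get("value")
        if cond.get("operator") == "in_list":
            count = len(value or [])
            if count > MAX_LISTS_PER_IN_LIST:
                problems.append(
                    f"condition {index}: {count} lists (max {MAX_LISTS_PER_IN_LIST})"
                )
        elif isinstance(value, str) and len(value) > MAX_VALUE_CHARS:
            problems.append(
                f"condition {index}: value is {len(value)} chars (max {MAX_VALUE_CHARS})"
            )
    return problems


draft = [{"operator": "in_list", "value": [f"list-{n}" for n in range(11)]}]
print(check_rule_caps(draft, ["block"]))
```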

One asymmetry worth internalizing: outbound non-blocking actions (archive, mark_as_read, assign_to_folder, and friends) modify only the stored sent copy after the send succeeds. They never change what the recipient receives. If you want to affect delivery on the outbound side, block is the only action that does.

Wiring the chain end-to-end

Here's the full sequence for a working blocklist, because the resource chain is easier to see in calls than in prose. First create a typed list, then load it. The call below is the load step against an existing list ID. Each request accepts up to 1,000 items; values are lowercased, trimmed, and validated against the list's type, and duplicates are silently ignored:

curl --request POST \
  --url "https://api.us.nylas.com/v3/lists/<LIST_ID>/items" \
  --header "Authorization: Bearer <NYLAS_API_KEY>" \
  --header "Content-Type: application/json" \
  --data '{ "items": ["spam-domain.com", "another-bad-domain.net"] }'
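
For a large blocklist you will need more than one request. A minimal sketch of the client-side preparation follows, assuming you want to mirror the platform's normalization locally so your item counts are predictable. The prepare_batches helper is hypothetical, and it does not attempt the type validation the API performs.

```python
def prepare_batches(raw_items, batch_size=1000):
    """Trim, lowercase, and de-duplicate values, then split into request-sized batches."""
    seen = set()
    cleaned = []
    for item in raw_items:
        value = item.strip().lower()
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return [cleaned[i:i + batch_size] for i in range(0, len(cleaned), batch_size)]


batches = prepare_batches([" Spam-Domain.com ", "spam-domain.com", "another-bad-domain.net"])
print(batches)
```

Each inner list becomes the items array of one POST to the list's items endpoint.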

Then create an inbound rule with from.domain / in_list pointing at the list ID and a block action, and add that rule's ID to the workspace's rule_ids. From that moment, every Agent Account in the workspace rejects mail from those domains at SMTP. When the blocklist grows next week, you update the list — the rule and workspace never change, which means a non-engineer with list access can run your block program without a deploy.
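
The reason the rule never changes is that it stores only the list ID, not the values. A toy model of that indirection is below. The matches_in_list function and the in-memory lists dictionary are hypothetical stand-ins for the platform's lookup.

```python
def sender_domain(address):
    """Extract the lowercased domain from an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def matches_in_list(address, list_ids, lists):
    """True if the sender's domain appears in any of the referenced lists."""
    domain = sender_domain(address)
    return any(domain in lists.get(list_id, set()) for list_id in list_ids)


lists = {"list-blocked-domains": {"spam-domain.com", "another-bad-domain.net"}}
rule_list_ids = ["list-blocked-domains"]  # the rule only references the list by ID

before = matches_in_list("promo@new-bad-domain.org", rule_list_ids, lists)
lists["list-blocked-domains"].add("new-bad-domain.org")  # list update, rule untouched
after = matches_in_list("promo@new-bad-domain.org", rule_list_ids, lists)
print(before, after)
```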

Placement is equally mechanical. Set workspace_id when you create a grant on POST /v3/connect/custom, or move an existing account with PATCH /v3/grants/{grant_id}. A workspace with auto_group: true claims new Agent Accounts whose email domain matches its domain automatically — handy for the one-workspace-per-customer pattern, where each customer's domain maps to a workspace carrying that customer's quota and spam tolerance. One catch on the default workspace: policy_id and rule_ids are the only fields you can update there; Nylas manages the rest.
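
The auto-grouping decision can be pictured as a domain lookup. This sketch assumes an exact, case-insensitive domain match and a workspace dict with id, domain, and auto_group keys. How the platform treats subdomains or competing workspaces isn't covered here, so treat pick_workspace as a hypothetical model rather than the real algorithm.

```python
def pick_workspace(email, workspaces, default_id="default"):
    """Return the ID of the auto-grouping workspace for this address, else the default."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    for ws in workspaces:
        if ws.get("auto_group") and ws.get("domain", "").lower() == domain:
            return ws["id"]
    return default_id


workspaces = [
    {"id": "ws-acme", "domain": "acme.example", "auto_group": True},
    {"id": "ws-globex", "domain": "globex.example", "auto_group": False},
]
print(pick_workspace("agent@acme.example", workspaces))
print(pick_workspace("agent@globex.example", workspaces))
```

The second account lands in the default workspace because its matching workspace has auto_group turned off; you would place it explicitly with workspace_id instead.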

The fail-closed detail that earns trust

What happens when the rule engine itself hiccups — say, a list lookup fails mid-evaluation on a block rule? The engine fails closed: the message is blocked rather than waved through. These infrastructure blocks surface as retryable errors (an API send gets 503 instead of 403; inbound SMTP answers with a 451 tempfail so the sending server retries instead of bouncing), and the audit record carries blocked_by_evaluation_error: true so you can tell an outage from a genuine match.
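
In code, failing closed comes down to which branch the exception handler takes. The sketch below models that for an outbound block rule. ListLookupError, evaluate_block_rule, and should_retry are hypothetical names; the 403 and 503 status codes and the blocked_by_evaluation_error flag are taken from the paragraph above.

```python
class ListLookupError(Exception):
    """Hypothetical stand-in for any infrastructure failure during evaluation."""


def evaluate_block_rule(domain, lookup):
    """Evaluate a block rule; an evaluation failure blocks instead of passing."""
    try:
        matched = lookup(domain)
    except ListLookupError:
        # Fail closed: block, and flag it so audits can tell outage from match.
        return {"blocked": True, "blocked_by_evaluation_error": True}
    return {"blocked": bool(matched), "blocked_by_evaluation_error": False}


def outbound_error_status(result):
    """Map a result to the HTTP status described above (None if not blocked)."""
    if not result["blocked"]:
        return None
    return 503 if result["blocked_by_evaluation_error"] else 403


def should_retry(status):
    """Only infrastructure blocks are worth retrying; a 403 is a real match."""
    return status == 503


def broken_lookup(domain):
    raise ListLookupError("list store unavailable")


outage = evaluate_block_rule("partner.example", broken_lookup)
print(outage, outbound_error_status(outage))
```

On the agent side, this is the distinction to encode in your send loop: back off and retry on 503, but surface a 403 to a human or a log, because retrying a genuine block will never succeed.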

A security control that fails open is decorative. This one doesn't.

Every evaluation leaves a trail

Each time the engine evaluates a message — inbound, SMTP envelope, or outbound send — an audit entry is recorded. GET /v3/grants/{grant_id}/rule-evaluations lists them newest-first, with the evaluation stage, the normalized sender/recipient data considered, which rules matched, and which actions applied. When your agent asks "why didn't I get that email?" or your customer asks "why was my send rejected?", this endpoint is the answer.
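
A small helper for building that request with Python's standard library is sketched below. The endpoint path and bearer-token header follow the examples in this post. The rule_evaluations_request name is my own, and the response body's shape isn't shown here, so the sketch stops at constructing the request.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.us.nylas.com"


def rule_evaluations_request(grant_id, api_key):
    """Build (but do not send) a GET request for a grant's rule evaluations."""
    path = f"/v3/grants/{urllib.parse.quote(grant_id, safe='')}/rule-evaluations"
    return urllib.request.Request(
        API_BASE + path,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


req = rule_evaluations_request("<GRANT_ID>", "<NYLAS_API_KEY>")
print(req.get_method(), req.full_url)
```

To send it, pass the request to urllib.request.urlopen and decode the JSON body; check the full guide for the fields each entry carries.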

Tuning advice from the docs worth repeating

Start spam_sensitivity at 1.0 and adjust from observed behavior. Order rules so specific matches (is, in_list against a small list) run before broad contains matches, since the first block wins. Prefer lists over inline rule values whenever the set might grow — one list can feed many rules and be updated without touching rule definitions. And set both retention values together: spam retention must be shorter than inbox retention.
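
Two of those constraints are easy to check before you send a policy anywhere. The following is a minimal linting sketch, assuming the payload shape from the policy example earlier in this post. The check_policy helper is hypothetical and covers only the sensitivity range and the retention ordering.

```python
def check_policy(policy):
    """Return a list of problems with sensitivity range and retention ordering."""
    problems = []
    sensitivity = policy.get("spam_detection", {}).get("spam_sensitivity")
    if sensitivity is not None and not (0.1 <= sensitivity <= 5.0):
        problems.append(f"spam_sensitivity {sensitivity} is outside 0.1 to 5.0")
    limits = policy.get("limits", {})
    inbox = limits.get("limit_inbox_retention_period")
    spam = limits.get("limit_spam_retention_period")
    if inbox is not None and spam is not None and spam >= inbox:
        problems.append("spam retention must be shorter than inbox retention")
    return problems


policy = {
    "limits": {"limit_inbox_retention_period": 365, "limit_spam_retention_period": 30},
    "spam_detection": {"spam_sensitivity": 1.5},
}
print(check_policy(policy))
```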

One policy, attached to your default workspace, takes about two minutes to set up and applies to every agent you provision afterward without placing it in another workspace. The full guide covers the complete schemas. Create the policy before you write the agent. Which limit would have saved you on your last automation incident?