Spam Detection for Inbound Agent Mail

#security #email #ai #api

Spam aimed at a human wastes attention; spam aimed at an autonomous agent becomes input — so filter it before the model ever sees it:

curl --request POST \
  --url "https://api.us.nylas.com/v3/policies" \
  --header "Authorization: Bearer $NYLAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "Agent inbound hygiene",
    "spam_detection": {
      "use_list_dnsbl": true,
      "use_header_anomaly_detection": true,
      "spam_sensitivity": 1.0
    }
  }'

That's a policy for Agent Accounts — the Nylas-hosted mailboxes (currently in beta) built for AI agents and system identities. Attach it to a workspace and every account in that workspace inherits the spam settings. Here's what each knob does and how to tune it for a reader that never gets suspicious on its own.

What the policy actually checks

An LLM agent will earnestly process whatever lands in its inbox: phishing, junk, auto-replies, and messages whose entire purpose is to manipulate the model. The threat isn't annoyance, it's contaminated context. Policy-level spam detection gives you two independent signals, evaluated when mail arrives over SMTP:

use_list_dnsbl checks the sending server against DNS blocklists — the classic reputation lookup that catches known-bad infrastructure regardless of what the message says.
use_header_anomaly_detection looks for structural weirdness in the message headers, the kind of malformation that legitimate mail servers don't produce.

Both run before your application sees anything, which is the right place for this work. Filtering at the mailbox layer is cheaper than teaching every downstream prompt to be skeptical, and per the mailboxes guide, inbound filtering also keeps the agent from reacting to loops and mailer-daemon noise.

Where flagged mail goes

A message that trips spam detection routes to the junk folder — one of the six system folders every account ships with — instead of inbox. The agent's normal read path (listing inbox messages, reacting to inbound webhooks for new mail) simply doesn't encounter it, but nothing is destroyed: you can inspect junk when you're tuning, and false positives are recoverable.

Retention is part of the same policy. You can set limit_spam_retention_period and limit_inbox_retention_period independently, with one constraint worth knowing up front: the spam window must be shorter than the inbox window, so junk clears out ahead of real mail. For an agent that handles transient workflows, aggressive spam retention is free hygiene — there's no reason to store a month of junk for a mailbox whose job resolves in hours.
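
If you generate policies from a script, a pre-flight check can catch the retention constraint before the API does. This is a minimal sketch, not the policy schema: the two field names come from the paragraph above, while the helper name check_retention and the assumption that both periods are whole numbers in the same unit are mine. Where the fields nest inside the policy body is left to the schema.

```python
def check_retention(spam_period: int, inbox_period: int) -> dict:
    """Return the two retention fields, or raise if the spam window is not shorter.

    Assumption: both periods are integers expressed in the same unit.
    """
    if spam_period <= 0 or inbox_period <= 0:
        raise ValueError("retention periods must be positive")
    if spam_period >= inbox_period:
        raise ValueError(
            f"spam retention ({spam_period}) must be shorter than "
            f"inbox retention ({inbox_period})"
        )
    return {
        "limit_spam_retention_period": spam_period,
        "limit_inbox_retention_period": inbox_period,
    }


if __name__ == "__main__":
    # Short-lived workflow: clear junk quickly, keep real mail longer.
    print(check_retention(3, 30))
```

Merge the returned fields into the policy body at whatever level the schema expects; the check itself only enforces the ordering rule described above.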

When to block instead of flag

Spam detection is probabilistic; sometimes you know the answer in advance. For senders you've already judged, a rule with a block action rejects the message at the SMTP layer — it's never stored, never delivered, never an event your application has to ignore:

curl --request POST \
  --url "https://api.us.nylas.com/v3/rules" \
  --header "Authorization: Bearer $NYLAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "Block spam-domain.com",
    "priority": 1,
    "trigger": "inbound",
    "match": {
      "conditions": [
        { "field": "from.domain", "operator": "is", "value": "spam-domain.com" }
      ]
    },
    "actions": [{ "type": "block" }]
  }'

There's a softer middle ground too: a rule with a mark_as_spam action routes matching mail to junk deterministically, without the finality of block. Use it for gray-area senders, such as newsletters and notification floods, that you want out of the agent's way but still available for review. Rules run in priority order from 0 to 1000, lower numbers first, so put your specific known-bad rules ahead of broad pattern matches.

One scoping note: inbound rules match only sender fields — from.address, from.domain, and from.tld — with the operators is, is_not, contains, and in_list. String matching is case-insensitive, so you don't need variants for SPAM-Domain.com.
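
Before pushing a rule set, it can help to dry-run it locally against a few sender addresses. The sketch below is a toy model of the behavior described above, not Nylas's rule engine. It assumes that all conditions in a rule must match, that evaluation stops at the first matching rule, and that from.tld is the last dot-separated label of the domain. The function names and EXAMPLE_RULES are hypothetical.

```python
def sender_fields(address: str) -> dict:
    """Split a sender address into the three fields inbound rules can match."""
    address = address.strip().lower()
    domain = address.rsplit("@", 1)[-1]
    tld = domain.rsplit(".", 1)[-1]  # assumption: TLD is the last label
    return {"from.address": address, "from.domain": domain, "from.tld": tld}


def condition_matches(cond: dict, fields: dict, lists: dict) -> bool:
    """Evaluate one condition case-insensitively."""
    actual = fields[cond["field"]]
    op, value = cond["operator"], cond["value"]
    if op == "is":
        return actual == value.lower()
    if op == "is_not":
        return actual != value.lower()
    if op == "contains":
        return value.lower() in actual
    if op == "in_list":
        # value holds list IDs; `lists` maps each ID to a set of lowercase items
        return any(actual in lists.get(list_id, set()) for list_id in value)
    raise ValueError(f"unsupported operator: {op}")


def first_matching_rule(rules, address, lists=None):
    """Walk rules in ascending priority and return the first full match, if any."""
    fields = sender_fields(address)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        conditions = rule["match"]["conditions"]
        if all(condition_matches(c, fields, lists or {}) for c in conditions):
            return rule
    return None


# Hypothetical rule set: a broad junk rule and a specific block rule.
EXAMPLE_RULES = [
    {"name": "Junk newsletters", "priority": 10,
     "match": {"conditions": [
         {"field": "from.address", "operator": "contains", "value": "newsletter"}]},
     "actions": [{"type": "mark_as_spam"}]},
    {"name": "Block spam-domain.com", "priority": 1,
     "match": {"conditions": [
         {"field": "from.domain", "operator": "is", "value": "spam-domain.com"}]},
     "actions": [{"type": "block"}]},
]

if __name__ == "__main__":
    hit = first_matching_rule(EXAMPLE_RULES, "Newsletter@SPAM-Domain.com")
    print(hit["name"])
```

The mixed-case sender matches both rules, and the priority 1 block rule wins because it sorts first. That is the reason to give specific known-bad rules the lower numbers.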

Let the blocklist grow without redeploys

Hardcoding domains into rule conditions works until the third spam wave, when someone has to edit rule definitions again. Lists fix the maintenance problem: a list is a typed collection (domain, tld, or address) that rules reference through the in_list operator, so updating who's blocked means updating the list — every rule that points at it picks up the change immediately, and a non-engineer can do it.

# Add to the blocklist — up to 1000 items per request
curl --request POST \
  --url "https://api.us.nylas.com/v3/lists/$LIST_ID/items" \
  --header "Authorization: Bearer $NYLAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "items": ["spam-domain.com", "another-bad-domain.net"]
  }'

Values are lowercased and trimmed on write, validated against the list's type (a domain list rejects full email addresses), and duplicates are silently ignored — so an automated "report spam" pipeline can append blindly. The rule side just swaps the operator: { "field": "from.domain", "operator": "in_list", "value": ["<LIST_ID>"] } with the same block action.
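
The service normalizes on write, so client-side cleanup is optional. It is still handy in a "report spam" pipeline that wants to log rejected entries and respect the 1000-item request limit before calling the API. A rough sketch for a domain-typed list follows. The regular expression is a deliberately loose hostname check of my own and will not match the service's validation exactly; the function names are hypothetical.

```python
import re

# Loose hostname check for illustration only; real validation may differ
# (for example, it ignores internationalized TLDs).
_DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def normalize_domains(raw_items):
    """Lowercase, trim, validate, and de-duplicate candidates, preserving order.

    Returns (accepted, rejected); rejected keeps the raw values for logging.
    """
    accepted, rejected, seen = [], [], set()
    for raw in raw_items:
        item = raw.strip().lower()
        if not _DOMAIN_RE.match(item):
            rejected.append(raw)
            continue
        if item not in seen:
            seen.add(item)
            accepted.append(item)
    return accepted, rejected


def batches(items, size=1000):
    """Split items into request-sized chunks (at most 1000 items each)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    reported = ["  Spam-Domain.COM ", "spam-domain.com",
                "user@spam-domain.com", "another-bad-domain.net"]
    ok, bad = normalize_domains(reported)
    print(ok)   # candidate "items" arrays, one per batch
    print(bad)  # entries a domain list would refuse
```

Each batch becomes the "items" array of one POST to the list's items endpoint shown above.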

Audit why a message was junked

Tuning blind is guessing. Every time the rule engine evaluates an inbound message for an Agent Account, Nylas records an audit entry you can list with GET /v3/grants/{grant_id}/rule-evaluations. Each record shows the evaluation stage — smtp_rcpt if the message was rejected before acceptance, inbox_processing if it was evaluated after — plus the normalized sender data considered, which rules matched, and which actions applied.

Two details earn their place in your runbook:

Fail-closed blocks are labeled. If a block rule can't be evaluated because of a transient error (a list lookup failing mid-in_list, say), Nylas blocks the message rather than letting it through, and the audit record carries blocked_by_evaluation_error: true. On the SMTP side this surfaces as a 451 tempfail, so the sending server retries instead of bouncing — legitimate mail delayed, not lost.
Spam-flagged mail is queryable. Cross-reference matched_rule_ids with the Rules API to see exactly which condition caught a message, which turns "the filter ate a customer email" from a mystery into a one-line fix.
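
Once you have pulled a page of rule-evaluation records, a few lines of post-processing surface both runbook items at once. This sketch assumes the records are already parsed into dictionaries. Only blocked_by_evaluation_error and matched_rule_ids are field names taken from the text above; the "stage" key, the rule IDs, and the sample records are invented for illustration.

```python
from collections import Counter


def summarize_evaluations(records):
    """Count hits per rule ID and collect fail-closed blocks."""
    rule_hits = Counter()
    fail_closed = []
    for rec in records:
        if rec.get("blocked_by_evaluation_error"):
            fail_closed.append(rec)
        for rule_id in rec.get("matched_rule_ids") or []:
            rule_hits[rule_id] += 1
    return rule_hits, fail_closed


# Hypothetical records shaped after the description above.
SAMPLE = [
    {"stage": "smtp_rcpt", "matched_rule_ids": ["rule_a"],
     "blocked_by_evaluation_error": False},
    {"stage": "smtp_rcpt", "matched_rule_ids": [],
     "blocked_by_evaluation_error": True},
    {"stage": "inbox_processing", "matched_rule_ids": ["rule_a", "rule_b"]},
]

if __name__ == "__main__":
    hits, failed = summarize_evaluations(SAMPLE)
    print(hits.most_common())          # busiest rules first
    print(len(failed), "fail-closed block(s) to investigate")
```

A rule that dominates the hit count is the first one to review when legitimate mail goes missing, and any non-empty fail-closed list points at a lookup problem rather than a sender problem.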

Tuning the sensitivity dial

spam_sensitivity ranges from 0.1 to 5.0, higher meaning more aggressive. The docs' advice is to start at 1.0 and adjust from evidence: go up if junk is reaching the inbox, down if legitimate mail is landing in junk. Resist the instinct to start high — an agent mailbox that silently junks a real customer email fails worse than one that occasionally reads a newsletter, because nobody is watching the junk folder day to day.

Tuning needs feedback, so check what the filter is doing during the first weeks: list the junk folder periodically and skim what's accumulating. Every false positive you find is a data point for lowering sensitivity or adding an explicit allow-pattern rule ahead of the spam check.
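
The review loop can be written down as a small decision helper so that adjustments stay incremental and inside the documented 0.1 to 5.0 range. This is only one way to encode the advice. The step size of 0.25 and the choice to let false positives win over missed spam are my assumptions, the latter following the argument that a junked customer email is the worse failure.

```python
MIN_SENSITIVITY, MAX_SENSITIVITY = 0.1, 5.0


def next_sensitivity(current, false_positives, missed_spam, step=0.25):
    """Suggest the next spam_sensitivity value from one review period.

    false_positives: legitimate messages found in junk.
    missed_spam: junk messages that reached the inbox.
    """
    if false_positives > 0:
        proposed = current - step   # real mail was junked: back off first
    elif missed_spam > 0:
        proposed = current + step   # only junk got through: tighten
    else:
        proposed = current          # no evidence, no change
    clamped = max(MIN_SENSITIVITY, min(MAX_SENSITIVITY, proposed))
    return round(clamped, 2)


if __name__ == "__main__":
    print(next_sensitivity(1.0, false_positives=0, missed_spam=3))
    print(next_sensitivity(1.0, false_positives=2, missed_spam=5))
```

Feed it the counts from each junk-folder review and apply the result with a policy update; a single false positive may be better handled by a sender rule than by moving the dial.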

Set it before the agent reads anything

The ordering mistake teams make is shipping the agent first and adding inbound hygiene after the first weird incident. Do it the other way: the policy is one API call, the workspace attachment is one more, and from that point every account you add inherits the protection. The full schema (limits, retention, sensitivity, and the rule actions that pair with it) is documented under Policies, Rules, and Lists.

Next time you provision an agent mailbox, create the policy in the same script that creates the account — then send yourself one obvious junk message and one legitimate one, and verify each lands where you expected.
