<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Qasim Muhammad</title>
    <description>The latest articles on DEV Community by Qasim Muhammad (@qasim157).</description>
    <link>https://dev.to/qasim157</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837851%2F1a2b79c0-c959-45ef-b215-a68515f17bef.jpg</url>
      <title>DEV Community: Qasim Muhammad</title>
      <link>https://dev.to/qasim157</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/qasim157"/>
    <language>en</language>
    <item>
      <title>Email Triage Taxonomies for LLM Classification</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:40:16 +0000</pubDate>
      <link>https://dev.to/qasim157/email-triage-taxonomies-for-llm-classification-3o1j</link>
      <guid>https://dev.to/qasim157/email-triage-taxonomies-for-llm-classification-3o1j</guid>
      <description>&lt;p&gt;The most important design decision in an email classifier isn't the model — it's the label set, and here's the one I keep coming back to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You triage email into one of four categories:

URGENT  — production incidents, executive requests; reply within 1 hour
ACTION  — code reviews, meeting follow-ups; reply same day
FYI     — informational, no response needed
NOISE   — newsletters, marketing, automated notifications

From:    {sender}
Subject: {subject}
Snippet: {snippet}

Return ONLY the category name. Nothing else.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the working prompt from the &lt;a href="https://developer.nylas.com/docs/cookbook/agents/email-triage-agent/" rel="noopener noreferrer"&gt;Nylas email triage recipe&lt;/a&gt;, and almost every line encodes a taxonomy-design lesson worth unpacking. Most people building email agents obsess over model choice and prompt phrasing. The recipe's quiet thesis is that the label set itself does the heavy lifting — get the taxonomy right and a cheap model classifies well; get it wrong and no model saves you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why four is the magic number
&lt;/h2&gt;

&lt;p&gt;The recipe states it flatly: four is the right number. Three loses fidelity: everything important collapses into one overloaded bucket and you've built a binary classifier with extra steps. With five, the model starts confusing categories, because the boundaries between adjacent labels get too thin to express in a definition.&lt;/p&gt;

&lt;p&gt;Notice what makes these four work. They aren't topics — they're &lt;em&gt;response obligations&lt;/em&gt;. URGENT means "reply within the hour," ACTION means "reply today," FYI means "no response needed," NOISE means "archive." Each label maps to exactly one behavior. That's the test I'd apply to any email taxonomy: if two labels lead to the same action, merge them; if one label leads to two different actions depending on content, split it.&lt;/p&gt;

&lt;p&gt;The same principle shows up in the sales context. The &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/" rel="noopener noreferrer"&gt;Agent Accounts overview&lt;/a&gt; describes an outreach agent classifying replies as interested / not now / unsubscribe — three labels, because the workflow has exactly three branches: book the meeting, schedule a follow-up, stop emailing. The taxonomy is the decision tree, flattened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The taxonomy is the dispatch table
&lt;/h2&gt;

&lt;p&gt;The recipe makes this literal. Its entire action loop is a &lt;code&gt;for&lt;/code&gt; over unread messages with one branch per label:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;fetch_unread&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;URGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACTION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;draft_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NOISE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# FYI: do nothing
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details here do more work than they appear to. First, the agent never sends — URGENT and ACTION produce &lt;em&gt;drafts&lt;/em&gt; a human reviews, because the cost of a wrong send (wrong person, wrong tone) is far higher than the friction of one extra click. Second, the loop is idempotent by construction: it only pulls &lt;code&gt;--unread&lt;/code&gt; messages, so anything already triaged falls out of the next run without a dedup table. The taxonomy didn't just classify the mail; it shaped the control flow into something you can run from cron unattended.&lt;/p&gt;
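
&lt;p&gt;One way to make the "taxonomy is the dispatch table" idea explicit is to store the label-to-action mapping as data. The sketch below is an illustration, not the recipe's code: &lt;code&gt;draft_reply&lt;/code&gt; and &lt;code&gt;archive&lt;/code&gt; are stand-in stubs that only record what they would do, and &lt;code&gt;leave_alone&lt;/code&gt; is a hypothetical no-op for FYI.&lt;/p&gt;

```python
# Stand-in handlers: each records the action it would take instead of
# calling a real mail API. All names here are illustrative assumptions.
def draft_reply(msg, log):
    log.append(("draft", msg["id"]))


def archive(msg, log):
    log.append(("archive", msg["id"]))


def leave_alone(msg, log):
    log.append(("none", msg["id"]))


# The taxonomy as data: exactly one handler per label.
DISPATCH = {
    "URGENT": draft_reply,
    "ACTION": draft_reply,
    "FYI": leave_alone,
    "NOISE": archive,
}


def triage(messages, classify):
    """Run every message through classify() and dispatch on the label."""
    log = []
    for msg in messages:
        label = classify(msg)
        # Unknown labels fall through to the do-nothing handler.
        handler = DISPATCH.get(label, leave_alone)
        handler(msg, log)
    return log


if __name__ == "__main__":
    inbox = [
        {"id": "m1", "subject": "Prod is down"},
        {"id": "m2", "subject": "Weekly newsletter"},
    ]
    fake_labels = {"m1": "URGENT", "m2": "NOISE"}
    print(triage(inbox, lambda m: fake_labels[m["id"]]))
```

&lt;p&gt;With this shape, adding a label means adding one entry and one handler, so the number of branches you have to test stays visible in a single place.&lt;/p&gt;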

&lt;p&gt;Drafting also runs at different settings than classification — &lt;code&gt;temperature=0.7&lt;/code&gt; with a "three sentences max" instruction, versus the classifier's &lt;code&gt;temperature=0&lt;/code&gt;. Deterministic decisions, natural prose. Same pipeline, two different jobs, and the recipe is blunt that the sentence cap is load-bearing: without it you get drafts that read like a politely overcompensating intern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Definitions are examples, not adjectives
&lt;/h2&gt;

&lt;p&gt;Look again at the prompt. URGENT isn't defined as "very important and time-sensitive" — it's "production incidents, executive requests." Concrete instances, not abstract qualities. LLMs pattern-match far better against examples than against adjectives, and ambiguous adjectives are where classifiers drift: one model's "important" is another's "routine."&lt;/p&gt;

&lt;p&gt;The deadline annotations ("reply within 1 hour," "reply same day") double as tie-breakers. When a message sits between two buckets, the model can ask the implicit question — does this need an answer in an hour or a day? — which is a much sharper discriminator than topical similarity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Constrain the output like you mean it
&lt;/h2&gt;

&lt;p&gt;Taxonomy design extends to the response format. The recipe runs classification at &lt;code&gt;temperature=0&lt;/code&gt; with &lt;code&gt;max_tokens=10&lt;/code&gt;: deterministic output, one category name, no room for an explanatory paragraph. And it still validates — the code checks the response against the four valid strings and falls back to &lt;code&gt;FYI&lt;/code&gt; on anything unrecognized, because LLMs occasionally invent a category. An unrecognized label defaulting to "leave it alone" is the safe failure; defaulting to NOISE would silently archive real mail.&lt;/p&gt;
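
&lt;p&gt;The validate-then-fall-back step might look roughly like the sketch below. It is a minimal illustration under stated assumptions: &lt;code&gt;call_llm&lt;/code&gt; is a hypothetical callable (prompt text in, response text out) standing in for whichever client you use, the prompt is abbreviated, and the whitespace, punctuation, and case normalization is an extra leniency of this sketch rather than something the recipe specifies.&lt;/p&gt;

```python
VALID_LABELS = {"URGENT", "ACTION", "FYI", "NOISE"}
FALLBACK = "FYI"  # unrecognized output means "leave it alone"


def parse_label(raw):
    """Normalize a model response and validate it against the closed label set."""
    cleaned = raw.strip().strip(".\"'").upper()
    return cleaned if cleaned in VALID_LABELS else FALLBACK


def classify(msg, call_llm):
    """Classify one message. call_llm is a hypothetical prompt-to-text callable."""
    prompt = (
        "Return ONLY the category name.\n"
        f"From: {msg['sender']}\n"
        f"Subject: {msg['subject']}\n"
        f"Snippet: {msg['snippet'][:200]}\n"  # cap the snippet at 200 characters
    )
    return parse_label(call_llm(prompt))


if __name__ == "__main__":
    for raw in ["URGENT", " noise\n", "Action.", "IMPORTANT", ""]:
        print(repr(raw), parse_label(raw))
```

&lt;p&gt;Anything outside the four valid strings, including an empty response, comes back as &lt;code&gt;FYI&lt;/code&gt;, so a misbehaving model can never trigger an archive.&lt;/p&gt;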

&lt;p&gt;Input is constrained just as aggressively: sender, subject, and a 200-character snippet, never the full body. That's enough for over 90% accuracy on this task, and it keeps costs negligible. The recipe's math: GPT-4o-mini runs about $0.15 per million input tokens, a snippet plus prompt is roughly 150 tokens, so 100 emails cost around $0.002. Drafting uses the pricier GPT-4o, but only on the URGENT and ACTION subset (typically under 20% of the inbox), so a heavy 200-message day still costs about a nickel. Cheap classification is what makes the whole pattern viable as a cron job running every fifteen minutes rather than a precious resource you ration. And for mail that can't leave your infrastructure, the recipe's privacy mode swaps in a local Ollama endpoint: Llama 3.1 classifies nearly as well as GPT-4o-mini on this task, though drafting quality drops unless you're running a 70B+ parameter model.&lt;/p&gt;
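
&lt;p&gt;The classification cost estimate is easy to check by hand. The sketch below only reproduces the arithmetic from the figures quoted above ($0.15 per million input tokens, about 150 tokens per email); it covers input tokens for classification only and leaves drafting out, since no GPT-4o price is given here. Prices change, so treat the constants as inputs rather than facts.&lt;/p&gt;

```python
# Figures quoted in the article; adjust them to current pricing.
PRICE_PER_MILLION_INPUT_TOKENS = 0.15  # USD, GPT-4o-mini input
TOKENS_PER_EMAIL = 150                 # prompt plus 200-character snippet


def classification_cost(num_emails):
    """Estimated input-token cost in USD for classifying num_emails messages."""
    tokens = num_emails * TOKENS_PER_EMAIL
    return tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


if __name__ == "__main__":
    print(f"100 emails: ${classification_cost(100):.5f}")
    print(f"200 emails: ${classification_cost(200):.5f}")
```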

&lt;h2&gt;
  
  
  The counterargument: rigid buckets lose information
&lt;/h2&gt;

&lt;p&gt;The pushback I hear: a fixed taxonomy throws away nuance — why not let the model return free-form tags, or scores along multiple axes? Honestly, sometimes that's right. If you're building analytics over a support inbox, richer structure (category plus urgency plus confidence, as the multi-day support patterns do) earns its complexity, since downstream consumers can aggregate it.&lt;/p&gt;

&lt;p&gt;But for an agent that has to &lt;em&gt;act&lt;/em&gt;, free-form output is a liability. Every distinct output the model can produce is a code path you have to handle, and "handle" means test. Four labels means four branches you can reason about, load-test, and audit. Forty emergent tags means a routing layer that's effectively another model call. The recipe's discipline — closed vocabulary, validated output, one action per label — is what makes the agent's behavior predictable enough to run unattended against a real mailbox. (Agent Accounts, where you'd give that agent its own inbox to triage, are in beta — taxonomy patterns apply to any mailbox.)&lt;/p&gt;

&lt;p&gt;One refinement worth planning for: taxonomies are per-inbox, not universal. The recipe notes that engineering inboxes hit URGENT differently than sales inboxes, so the category definitions — not the category count — are where you customize.&lt;/p&gt;

&lt;p&gt;Here's an exercise that takes twenty minutes: pull the last 50 messages from whatever inbox your agent will manage and hand-label them with the four buckets above. Wherever you hesitate, write down why — those hesitations are your category boundaries telling you they need sharper example lists. What labels did your inbox force you to add, and what did they map to in terms of action?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>email</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Designing Agent Email Addresses That Humans Trust</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:40:13 +0000</pubDate>
      <link>https://dev.to/qasim157/designing-agent-email-addresses-that-humans-trust-5e4a</link>
      <guid>https://dev.to/qasim157/designing-agent-email-addresses-that-humans-trust-5e4a</guid>
      <description>&lt;p&gt;You've built the agent, the reply loop works, the demo lands — and then someone asks the question you didn't budget time for: "so what address does it send from?" Suddenly you're staring at &lt;code&gt;noreply-svc-prod2@yourcompany.com&lt;/code&gt; realizing that the first thing every recipient sees isn't your prompt engineering. It's the From line.&lt;/p&gt;

&lt;p&gt;An agent's email address is an interface. Humans parse it before the subject, mail servers judge it before the body, and spam filters score it before any human sees it at all. Once your agent has a real mailbox — which is what Nylas Agent Accounts (currently in beta) provide — address design becomes a product decision with three layers: the local part, the domain, and the disclosure question.&lt;/p&gt;

&lt;h2&gt;
  
  
  The local part: role beats persona beats hash
&lt;/h2&gt;

&lt;p&gt;Take three candidate addresses for the same scheduling agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;scheduling@&lt;/code&gt; — a role. Tells recipients what the mailbox does and implies that emailing it is how you use the service.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jane.ai@&lt;/code&gt; — a persona. Friendlier in a sidebar avatar, but it invites recipients to treat the sender as a colleague, with all the expectations that carries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bot-7f3a@&lt;/code&gt; — an artifact. Screams "auto-generated," gets mentally filed next to spam, and gives a human nothing to anchor on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The docs consistently model the first pattern — &lt;code&gt;sales-agent@&lt;/code&gt;, &lt;code&gt;support@&lt;/code&gt;, &lt;code&gt;scheduling@&lt;/code&gt; appear throughout the &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/" rel="noopener noreferrer"&gt;Agent Accounts overview&lt;/a&gt; — and I think that's right for a reason deeper than convention. A role address makes an honest promise about capability: &lt;code&gt;scheduling@&lt;/code&gt; claims it can schedule, nothing more. A persona address makes an implicit promise of general competence that current agents can't keep. When &lt;code&gt;jane.ai@&lt;/code&gt; fails to understand a simple request, it reads as a person being obtuse; when &lt;code&gt;scheduling@&lt;/code&gt; fails, it reads as a tool hitting its limits. Same failure, different trust damage.&lt;/p&gt;

&lt;p&gt;The overview's framing is worth internalizing: an agent identity should be like any other user in your organization — reachable, persistent, accountable. Address design is how that accountability becomes visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claiming the address is one API call
&lt;/h2&gt;

&lt;p&gt;Once a domain is registered and verified, the address itself costs nothing but the decision. Creating the account with your chosen alias is a single request — &lt;code&gt;"provider": "nylas"&lt;/code&gt;, no OAuth, no refresh token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/connect/custom"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "provider": "nylas",
    "settings": {
      "email": "scheduling@agents.yourcompany.com"
    }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response carries a &lt;code&gt;grant_id&lt;/code&gt; that drives the mailbox from then on. The same creation works from the CLI — &lt;code&gt;nylas agent account create scheduling@agents.yourcompany.com&lt;/code&gt; — or from the Dashboard, where you pick a registered domain and an alias and the account is live immediately.&lt;/p&gt;

&lt;p&gt;That cheapness cuts both ways. It means you can prototype three naming patterns in an afternoon and reply-test them on real recipients. It also means nothing stops you from minting &lt;code&gt;bot-7f3a@&lt;/code&gt; because a UUID was lying around. Remember that the address is the one part of the system whose history you can't transplant: months of threads, recipients' contact lists, allowlists in customer systems, accumulated sender reputation. Provisioning is disposable; identity isn't. Choose the local part like you'll keep it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The domain: a subdomain is cheap insurance
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/provisioning/" rel="noopener noreferrer"&gt;provisioning docs&lt;/a&gt; recommend a dedicated subdomain for production — &lt;code&gt;agents.yourcompany.com&lt;/code&gt; — and the reasoning is reputation isolation: sender reputation accrues per domain, so your agents' sending behavior never contaminates the domain your humans and your marketing depend on. If an experiment misbehaves, the blast radius is the subdomain.&lt;/p&gt;

&lt;p&gt;Registering a domain is a one-time event per organization: add it from the Dashboard or API, pick the data center region (US or EU), publish the MX and TXT records, and verification happens automatically once DNS propagates. After that, you mint addresses under it freely. A few patterns from the docs scale this up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-customer domains&lt;/strong&gt; in multi-tenant apps: &lt;code&gt;scheduling@customer-a.com&lt;/code&gt;, &lt;code&gt;scheduling@customer-b.com&lt;/code&gt;, each on the customer's own verified domain with its own send quota — 200 messages per account per day on the free plan — and its own sender reputation. One application can manage an unlimited number of registered domains, so this doesn't fragment your infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment separation&lt;/strong&gt;: &lt;code&gt;agents.staging.yourcompany.com&lt;/code&gt; vs &lt;code&gt;agents.yourcompany.com&lt;/code&gt;, so test traffic never touches production reputation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reputation sharding&lt;/strong&gt;: high-volume senders split across &lt;code&gt;sales-a.yourcompany.com&lt;/code&gt;, &lt;code&gt;sales-b.yourcompany.com&lt;/code&gt; so one domain's deliverability issue doesn't spread.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One caveat on subdomains: don't let isolation drift into obfuscation. &lt;code&gt;agents.yourcompany.com&lt;/code&gt; signals "automated, but officially ours." A lookalike domain that hides the relationship signals something much worse.&lt;/p&gt;

&lt;p&gt;(For prototyping, trial addresses use the format &lt;code&gt;alias@&amp;lt;your-application&amp;gt;.nylas.email&lt;/code&gt; — fine for testing, but the format itself tells recipients nothing about you, which is one more reason to move to your own domain before anything customer-facing.)&lt;/p&gt;

&lt;h2&gt;
  
  
  The disclosure question
&lt;/h2&gt;

&lt;p&gt;Should the address admit the sender is an agent? My position: the &lt;em&gt;system&lt;/em&gt; should disclose, and the address is the cheapest place to do it. Role addresses on an agents subdomain disclose structurally — no recipient who glances at &lt;code&gt;scheduling@agents.acme.com&lt;/code&gt; will mistake it for a person, and nobody feels deceived later.&lt;/p&gt;

&lt;p&gt;The counterargument deserves a fair hearing: persona addresses measurably improve open and reply rates in outreach, and plenty of teams ship &lt;code&gt;alex@&lt;/code&gt; agents for exactly that reason. Short-term, they're probably right. But that uplift is borrowed against the moment a recipient realizes they've been having a heartfelt back-and-forth with a language model that signed off as Alex. The reply-rate gain is yours; the trust debt lands on your whole domain — the same domain all your other mail comes from. That trade looks worse every quarter as recipients get better at spotting synthetic correspondence.&lt;/p&gt;

&lt;p&gt;There's a middle path that keeps most of the warmth: role address, human-adjacent display name. &lt;code&gt;"Sam (Acme Scheduling Assistant)" &amp;lt;scheduling@agents.acme.com&amp;gt;&lt;/code&gt; reads friendly in the inbox list and honest on inspection.&lt;/p&gt;
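
&lt;p&gt;If you assemble the From value yourself, the parentheses in a display name like this one need quoting. A minimal sketch using Python's standard &lt;code&gt;email.utils&lt;/code&gt; module follows; the helper name &lt;code&gt;from_line&lt;/code&gt; is an assumption for illustration, and your sending API may accept the name and address as separate fields instead.&lt;/p&gt;

```python
from email.utils import formataddr, parseaddr


def from_line(display_name, local_part, domain):
    """Build a From header value; formataddr quotes the name when it has to."""
    return formataddr((display_name, f"{local_part}@{domain}"))


if __name__ == "__main__":
    value = from_line("Sam (Acme Scheduling Assistant)", "scheduling", "agents.acme.com")
    print(value)
    # Round-trip to confirm a mail client would recover both parts.
    print(parseaddr(value))
```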

&lt;p&gt;Before your next agent ships, write down its From line — display name, local part, domain — and show it to someone outside the team with one question: "who do you think sent this, and what happens if you reply?" If their answer doesn't match your architecture, fix the address before you fix the prompt. Which pattern have you shipped, and did recipients ever call it out?&lt;/p&gt;

</description>
      <category>ux</category>
      <category>email</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>From Chatbot to Mailbox: Persistent Agent Memory in Threads</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:39:59 +0000</pubDate>
      <link>https://dev.to/qasim157/from-chatbot-to-mailbox-persistent-agent-memory-in-threads-4ce0</link>
      <guid>https://dev.to/qasim157/from-chatbot-to-mailbox-persistent-agent-memory-in-threads-4ce0</guid>
      <description>&lt;p&gt;Day 1, 4:02 p.m.: a customer asks your agent a billing question and gets an answer. Day 6, 9:30 a.m.: they reply "actually, that didn't work." If your agent lives in a chat widget, that second message starts from zero — the session died with the tab, the context is gone, and the customer gets to repeat themselves. If your agent lives in a mailbox, the reply arrives &lt;em&gt;inside the conversation&lt;/em&gt;, with the full history attached by the protocol itself.&lt;/p&gt;

&lt;p&gt;That's the argument in one before/after: chat sessions evaporate; email threads persist. And for agents that work across days rather than minutes, the thread is the most underrated memory substrate available.&lt;/p&gt;

&lt;h2&gt;
  
  
  The protocol already built your memory layer
&lt;/h2&gt;

&lt;p&gt;Email threading runs on three headers, as the &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/email-threading/" rel="noopener noreferrer"&gt;threading docs&lt;/a&gt; lay out. Every message carries a globally unique &lt;code&gt;Message-ID&lt;/code&gt;. A reply adds &lt;code&gt;In-Reply-To&lt;/code&gt; (the ID it's answering) and &lt;code&gt;References&lt;/code&gt; (the chain of earlier IDs, oldest to newest). By the fifth message in a thread, &lt;code&gt;References&lt;/code&gt; lists the four earlier Message-IDs in order: a complete record of the conversation's shape, maintained by virtually every mail client.&lt;/p&gt;
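
&lt;p&gt;To see how the header chain grows, here is a small sketch using Python's standard &lt;code&gt;email&lt;/code&gt; package. The helpers &lt;code&gt;new_message&lt;/code&gt; and &lt;code&gt;reply_to&lt;/code&gt; and the &lt;code&gt;example.com&lt;/code&gt; domain are illustrative assumptions; a real client or API sets these headers for you.&lt;/p&gt;

```python
from email.message import EmailMessage
from email.utils import make_msgid


def new_message(subject, domain="example.com"):
    """Create a message with a fresh, globally unique Message-ID."""
    msg = EmailMessage()
    msg["Message-ID"] = make_msgid(domain=domain)
    msg["Subject"] = subject
    return msg


def reply_to(parent, domain="example.com"):
    """Create a reply whose headers extend the parent's reference chain."""
    subject = str(parent["Subject"])
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply = new_message(subject, domain)
    # In-Reply-To names the one message being answered.
    reply["In-Reply-To"] = str(parent["Message-ID"])
    # References is the parent's chain plus the parent itself, oldest first.
    chain = (str(parent.get("References", "")) + " " + str(parent["Message-ID"])).strip()
    reply["References"] = chain
    return reply


if __name__ == "__main__":
    first = new_message("Billing question")
    second = reply_to(first)
    third = reply_to(second)
    print(str(third["References"]).split())
```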

&lt;p&gt;Compare that to what we hand-roll for chatbots: session stores, conversation tables, context windows we serialize and rehydrate. Email gives you the equivalent for free, federated across organizations, and — this is the part I find most compelling — &lt;em&gt;human-auditable&lt;/em&gt;. Anyone with mailbox access can read exactly what the agent's memory contains, because the memory is the correspondence itself. No vector store inspection tools required.&lt;/p&gt;

&lt;p&gt;With Nylas Agent Accounts (in beta), the agent owns the mailbox where this accrues, and you never parse headers by hand. The Threads API groups messages by their header chain; each thread object gives you ordered &lt;code&gt;message_ids&lt;/code&gt;, &lt;code&gt;participants&lt;/code&gt;, and activity timestamps. When a reply fires &lt;code&gt;message.created&lt;/code&gt;, the payload includes a &lt;code&gt;thread_id&lt;/code&gt; — fetch the thread, walk its messages, and the agent has its full conversational past before deciding anything. Tip from the docs: &lt;code&gt;fields=include_basic_headers&lt;/code&gt; fetches just the three threading headers when you need them raw, skipping a header payload that's often larger than the message body.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't reconstruct memory from subject lines
&lt;/h2&gt;

&lt;p&gt;One tempting shortcut deserves a warning. Plenty of implementations match replies by subject: if it starts with &lt;code&gt;Re:&lt;/code&gt; and contains the original subject, it must be a reply. The &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/email-threading/" rel="noopener noreferrer"&gt;threading docs&lt;/a&gt; list exactly how that breaks. Recipients edit subjects — "Q3 budget review" comes back as "Re: Q3 budget review — updated numbers attached." Two prospects receive the same "Following up on your demo request," and a reply to either matches both. A forwarded thread keeps its subject while losing its conversational context entirely. Headers reference specific Message-IDs, not human-editable text; match on them first, and treat subject matching as a last-resort fallback for ancient mail clients.&lt;/p&gt;
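
&lt;p&gt;If you ever do need your own matcher, the "headers first, subject last" order can be sketched as below. Everything here is hypothetical scaffolding: the inbound message is a plain dict, the two lookup tables are plain dicts you would maintain yourself, and the subject fallback is only accepted when it is unambiguous.&lt;/p&gt;

```python
def find_thread(inbound, threads_by_message_id, threads_by_subject):
    """Return (thread_id, how) for an inbound message, or (None, "new").

    inbound may carry "in_reply_to", "references" (list of IDs, oldest
    first) and "subject". Both lookup tables are illustrative dicts.
    """
    # 1. Header match: In-Reply-To first, then References newest to oldest.
    candidates = [inbound.get("in_reply_to")]
    candidates += list(reversed(inbound.get("references", [])))
    for message_id in candidates:
        if message_id in threads_by_message_id:
            return threads_by_message_id[message_id], "headers"

    # 2. Last resort: normalized subject, accepted only when unambiguous.
    subject = inbound.get("subject", "").strip()
    while subject.lower().startswith("re:"):
        subject = subject[3:].strip()
    matches = threads_by_subject.get(subject.lower(), [])
    if len(matches) == 1:
        return matches[0], "subject"
    return None, "new"


if __name__ == "__main__":
    by_id = {"m1@example.com": "thread-A", "m2@example.com": "thread-B"}
    by_subject = {"following up on your demo request": ["thread-A", "thread-B"]}

    edited = {"in_reply_to": "m1@example.com", "subject": "Re: something else entirely"}
    print(find_thread(edited, by_id, by_subject))

    headerless = {"subject": "Re: Following up on your demo request"}
    print(find_thread(headerless, by_id, by_subject))
```

&lt;p&gt;The edited-subject reply still lands in the right thread because the header wins, while the headerless reply to a subject shared by two prospects is treated as new instead of being attached to a guess.&lt;/p&gt;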

&lt;p&gt;The write side is symmetric and just as automatic: pass &lt;code&gt;reply_to_message_id&lt;/code&gt; on the send and Nylas populates &lt;code&gt;In-Reply-To&lt;/code&gt; and &lt;code&gt;References&lt;/code&gt; for you, so the reply threads correctly in every recipient's client. Better still, the memory works across access paths. If the agent sends through the API and a human supervisor later replies from Apple Mail over IMAP, everything stays in one thread, because grouping follows the header chain rather than the send mechanism. One transcript, multiple writers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Threads remember what was said — not what you were doing
&lt;/h2&gt;

&lt;p&gt;Now the honest limitation, which the docs are upfront about: the thread is episodic memory, not working memory. It knows the words exchanged. It doesn't know which task the agent was on, which workflow step, which ticket. That mapping lives in your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// On outbound: bind the thread to internal state.&lt;/span&gt;
&lt;span class="nx"&gt;threadState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sentMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;threadId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currentTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;awaiting_reply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// On inbound webhook: restore context, or treat as new.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;threadState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inboundMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;threadId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;resumeTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;inboundMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;triageNewMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inboundMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production that map belongs in Postgres or Redis, not memory — conversations span days, and an in-memory map doesn't survive a deploy. So the architecture is two layers: the thread holds the durable transcript, your store holds a thin pointer from &lt;code&gt;thread_id&lt;/code&gt; to agent state. The heavy content lives in the mailbox; you persist only the index.&lt;/p&gt;
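
&lt;p&gt;As a rough idea of how thin that pointer layer can be, here is a sketch of a durable &lt;code&gt;thread_id&lt;/code&gt; to state store. It uses Python and SQLite from the standard library purely as a stand-in for Postgres or Redis; the class name, table name, and state fields are assumptions for illustration.&lt;/p&gt;

```python
import json
import sqlite3


class ThreadStateStore:
    """Durable thread_id to agent-state pointer, with SQLite as a stand-in."""

    def __init__(self, path=":memory:"):
        # ":memory:" is for the demo only; pass a file path to survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS thread_state ("
            "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def set(self, thread_id, state):
        # Replace any earlier pointer so the latest workflow step wins.
        self.db.execute(
            "INSERT OR REPLACE INTO thread_state (thread_id, state) VALUES (?, ?)",
            (thread_id, json.dumps(state)),
        )
        self.db.commit()

    def get(self, thread_id):
        row = self.db.execute(
            "SELECT state FROM thread_state WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


if __name__ == "__main__":
    store = ThreadStateStore()
    store.set("thread-123", {"taskId": "task-9", "step": "awaiting_reply"})
    print(store.get("thread-123"))
    print(store.get("unknown-thread"))
```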

&lt;h2&gt;
  
  
  Dormancy is a feature you have to design for
&lt;/h2&gt;

&lt;p&gt;Persistence cuts both ways: threads come back from the dead. The &lt;a href="https://developer.nylas.com/docs/cookbook/use-cases/act/support-agent-multi-day-threads/" rel="noopener noreferrer"&gt;multi-day support agent recipe&lt;/a&gt; treats revival as a first-class case with concrete policies worth stealing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reclassify on every reply.&lt;/strong&gt; A thread that opened as a "general" question can become a billing dispute by message two. The recipe re-runs classification on the full transcript each turn and only auto-replies above a 0.85 confidence threshold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cap the loop.&lt;/strong&gt; After 6 turns, escalate to a human — an agent still going back and forth at turn seven isn't converging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat long silence as a state change.&lt;/strong&gt; If a thread has been quiet for more than 168 hours and the customer suddenly returns, the recipe escalates rather than letting the agent resume as if nothing happened. Context that old deserves human eyes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch the escalation rate.&lt;/strong&gt; The recipe's operational rule: if more than 40–50% of tickets end up with a human, the agent isn't pulling its weight — tune the knowledge base or narrow the categories it handles rather than lowering the confidence bar.&lt;/li&gt;
&lt;/ul&gt;
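
&lt;p&gt;Taken together, the first three policies reduce to a small decision function, and the fourth to a ratio you monitor. The sketch below uses the thresholds quoted above (0.85 confidence, 6 turns, 168 hours); the function names, the return shape, and the exact boundary handling are assumptions of this illustration, not the recipe's code.&lt;/p&gt;

```python
CONFIDENCE_THRESHOLD = 0.85
MAX_TURNS = 6
DORMANT_HOURS = 168


def next_step(confidence, turns_so_far, hours_since_last_message):
    """Return (action, reason) for an inbound reply on an existing thread."""
    if hours_since_last_message > DORMANT_HOURS:
        return "escalate", "thread revived after long silence"
    if turns_so_far >= MAX_TURNS:
        return "escalate", "turn cap reached"
    if confidence > CONFIDENCE_THRESHOLD:
        return "auto_reply", "confident classification"
    return "escalate", "low confidence"


def escalation_rate(actions):
    """Share of decisions that went to a human (0.0 for an empty list)."""
    return actions.count("escalate") / len(actions) if actions else 0.0


if __name__ == "__main__":
    decisions = [
        next_step(0.93, 2, 5),
        next_step(0.93, 6, 5),
        next_step(0.93, 2, 200),
        next_step(0.60, 1, 1),
    ]
    for action, reason in decisions:
        print(action, "-", reason)
    print(escalation_rate([action for action, _ in decisions]))
```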

&lt;p&gt;The long-silence rule captures the design mindset: a chatbot architecture asks "is the session alive?" A mailbox architecture asks "what does this silence mean?", which is a genuinely richer question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the chat people have a point
&lt;/h2&gt;

&lt;p&gt;The fair counterargument: email is slow and threads are noisy. Latency is measured in minutes to days, quoted text and signatures pollute the transcript you feed the model, and a CC'd third party can wander into the "memory" mid-conversation. For interactive flows — debugging a config live, navigating a UI — chat's immediacy wins, and nothing here argues otherwise.&lt;/p&gt;

&lt;p&gt;But most agent work that matters commercially isn't interactive. Support, scheduling, procurement, follow-ups — these are inherently asynchronous, multi-day processes, and forcing them into session-shaped memory is why so many "AI assistants" forget you between Tuesday and Friday. Match the memory model to the conversation's natural tempo.&lt;/p&gt;

&lt;p&gt;A concrete way to test the idea: take one workflow where your agent currently loses context between sessions, give it a mailbox, and store nothing yourself except the &lt;code&gt;thread_id&lt;/code&gt; → state mapping. Run it for two weeks. My bet is the surprising part won't be the persistence — it'll be how much easier debugging becomes when you can read your agent's memory in an email client.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>email</category>
      <category>agents</category>
    </item>
    <item>
      <title>Scheduling Without a Human Calendar</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:39:56 +0000</pubDate>
      <link>https://dev.to/qasim157/scheduling-without-a-human-calendar-g6k</link>
      <guid>https://dev.to/qasim157/scheduling-without-a-human-calendar-g6k</guid>
      <description>&lt;p&gt;A human assistant borrows the boss's calendar; a scheduling bot owns its own. That one difference dissolves most of what makes calendar automation miserable.&lt;/p&gt;

&lt;p&gt;The borrowed-calendar model is how nearly every scheduling tool works today: connect to a person's Google or Microsoft account via OAuth, request calendar scopes, and act on their behalf. It works, but the seams show everywhere. The human's calendar fills with bookings the bot manages. Delegation permissions vary by provider and admin policy. Tokens expire when the person changes their password or leaves the company. And the bot has no address of its own — every invite, every confirmation email, appears to come from a person who didn't write it.&lt;/p&gt;

&lt;p&gt;Nylas Agent Accounts (currently in beta) invert this. Each account is a real mailbox &lt;em&gt;and&lt;/em&gt; &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/calendars/" rel="noopener noreferrer"&gt;a real calendar&lt;/a&gt;, provisioned automatically, owned by your application. From a participant's perspective there's nothing special about it — it's just another attendee on the invite.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "owning availability" actually changes
&lt;/h2&gt;

&lt;p&gt;When the bot's calendar is its own, availability stops being a permissions question and becomes a query. The agent calls the free/busy endpoint against its own primary calendar, gets back busy blocks for a time window, and proposes open slots. No delegation, no scopes negotiation, no "the admin needs to approve calendar sharing for service accounts."&lt;/p&gt;
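
&lt;p&gt;The "busy blocks in, open slots out" step is plain interval arithmetic. A rough sketch, assuming the busy blocks have already been fetched and reduced to objects with epoch-second &lt;code&gt;start_time&lt;/code&gt; and &lt;code&gt;end_time&lt;/code&gt; fields; the function name and that input shape are my assumptions, not the documented free/busy response.&lt;/p&gt;

```javascript
// Return up to `limit` open slots of `durationSec` seconds inside the window
// [windowStart, windowEnd). All times are epoch seconds. The input shape is an
// assumption for illustration, not the documented free/busy response.
function openSlots(busy, windowStart, windowEnd, durationSec, limit = 3) {
  const sorted = [...busy].sort((a, b) => a.start_time - b.start_time);
  // A zero-length sentinel at the window end lets one loop handle the final gap.
  sorted.push({ start_time: windowEnd, end_time: windowEnd });

  const slots = [];
  let cursor = windowStart;
  for (const block of sorted) {
    const gapEnd = Math.min(block.start_time, windowEnd);
    // Fill the gap before this busy block with back-to-back slots.
    while (gapEnd - cursor >= durationSec) {
      if (slots.length === limit) return slots;
      slots.push({ start_time: cursor, end_time: cursor + durationSec });
      cursor += durationSec;
    }
    // Skip past the busy block (overlapping blocks are handled by max).
    cursor = Math.max(cursor, block.end_time);
  }
  return slots;
}

// A four-hour window with the second hour busy.
console.log(openSlots([{ start_time: 3600, end_time: 7200 }], 0, 14400, 3600));
```

&lt;p&gt;Capping the result at three matches the tutorial's "3 candidate slots"; a production version would probably also respect working hours and add buffers between meetings.&lt;/p&gt;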

&lt;p&gt;The &lt;a href="https://developer.nylas.com/docs/cookbook/use-cases/act/scheduling-agent-with-dedicated-identity/" rel="noopener noreferrer"&gt;scheduling-agent tutorial&lt;/a&gt; wires the whole loop: a meeting request lands at &lt;code&gt;scheduling@agents.yourcompany.com&lt;/code&gt;, a &lt;code&gt;message.created&lt;/code&gt; webhook fires, an LLM parses duration and timezone, the agent checks its own free/busy and replies with 3 candidate slots. When the human picks one, the agent creates the event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/grants/&amp;lt;GRANT_ID&amp;gt;/events?calendar_id=primary&amp;amp;notify_participants=true"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "title": "Product demo",
    "when": { "start_time": 1744387200, "end_time": 1744390800 },
    "participants": [{ "email": "alice@example.com" }]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;notify_participants=true&lt;/code&gt;, an invitation goes out from the agent's own address as a standard ICS request. Google Calendar, Microsoft 365, and Apple Calendar all render it as a normal invite, because under the hood it &lt;em&gt;is&lt;/em&gt; one: plain iCalendar, the same standard calendar clients have been exchanging for decades.&lt;/p&gt;

&lt;h2&gt;
  
  
  RSVPs flow back without polling
&lt;/h2&gt;

&lt;p&gt;The return path is the part borrowed-calendar setups handle worst, and here it's automatic. When Alice clicks &lt;strong&gt;Yes&lt;/strong&gt; in Gmail, Google sends the response to the agent's mailbox, the platform parses it, updates the event's &lt;code&gt;participants[].status&lt;/code&gt;, and fires an &lt;code&gt;event.updated&lt;/code&gt; webhook. The agent learns who accepted without reading a single email.&lt;/p&gt;
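
&lt;p&gt;On the receiving end, the handler mostly has to group participants by status. A minimal sketch, assuming the notification carries the event at &lt;code&gt;data.object&lt;/code&gt; with a &lt;code&gt;participants&lt;/code&gt; array of &lt;code&gt;email&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt;; that shape and the function name are assumptions, so check a real payload before relying on it.&lt;/p&gt;

```javascript
// Group participant emails by RSVP status from an event.updated notification.
// Assumed shape: { type, data: { object: { participants: [{ email, status }] } } }
function rsvpSummary(notification) {
  if (notification.type !== "event.updated") return null;

  const participants = notification.data?.object?.participants ?? [];
  const byStatus = {};
  for (const p of participants) {
    // Treat a missing status as "noreply" (an assumption for this sketch).
    const status = p.status ?? "noreply";
    if (!byStatus[status]) byStatus[status] = [];
    byStatus[status].push(p.email);
  }
  return byStatus;
}

console.log(
  rsvpSummary({
    type: "event.updated",
    data: {
      object: {
        participants: [
          { email: "alice@example.com", status: "yes" },
          { email: "bob@example.com", status: "noreply" },
        ],
      },
    },
  }),
);
```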

&lt;p&gt;The same machinery runs in reverse when the agent is the invitee. Someone adds the agent's address to their meeting; the invitation hits the mailbox, a matching event appears on the agent's calendar with its status as &lt;code&gt;noreply&lt;/code&gt;, and &lt;code&gt;event.created&lt;/code&gt; fires. The agent responds through the &lt;code&gt;send-rsvp&lt;/code&gt; endpoint with one of three statuses (&lt;code&gt;yes&lt;/code&gt;, &lt;code&gt;no&lt;/code&gt;, or &lt;code&gt;maybe&lt;/code&gt;), and the organizer sees the response just as they would from any other attendee. A plain reply email won't update anyone's calendar; the endpoint exists because an RSVP is a calendar-protocol message, not prose.&lt;/p&gt;

&lt;p&gt;Changes propagate the same way: &lt;code&gt;PUT /events/{id}&lt;/code&gt; updates the time or title on every participant's calendar, wherever they're looking at it, and &lt;code&gt;DELETE /events/{id}&lt;/code&gt; removes it everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  The small print that saves you a debugging session
&lt;/h2&gt;

&lt;p&gt;A few mechanics from the &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/calendars/" rel="noopener noreferrer"&gt;calendar docs&lt;/a&gt; that aren't obvious until they bite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Two webhooks fire for one invite.&lt;/strong&gt; Because the agent is always a mailbox too, an inbound invitation triggers &lt;code&gt;event.created&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; &lt;code&gt;message.created&lt;/code&gt; for the invite email itself. Pick one to drive your logic and deliberately ignore the other, or you'll process every meeting twice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;notify_participants&lt;/code&gt; is a sharp tool.&lt;/strong&gt; Every create, update, and delete with &lt;code&gt;notify_participants=true&lt;/code&gt; sends real email. Pass &lt;code&gt;false&lt;/code&gt; for bulk backfill or pre-staged events the agent will announce later — but not when cancelling, because deleting an event without notification leaves the meeting sitting on everyone's calendar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The agent has no time zone.&lt;/strong&gt; A human calendar carries a default; an Agent Account doesn't. Pass &lt;code&gt;timezone&lt;/code&gt; explicitly on create, or stick to epoch &lt;code&gt;start_time&lt;/code&gt;/&lt;code&gt;end_time&lt;/code&gt; values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One agent can run several calendars.&lt;/strong&gt; The primary is provisioned automatically and can't be deleted while other calendars exist, and you can create more up to your plan's cap. Putting a &lt;code&gt;sales-calls&lt;/code&gt; calendar and an &lt;code&gt;internal&lt;/code&gt; calendar on the same agent keeps concerns separate.&lt;/li&gt;
&lt;/ul&gt;
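
&lt;p&gt;For the time-zone point, one way to stay on epoch values is to convert at the edge and refuse any input that lacks an explicit UTC offset. A small sketch; &lt;code&gt;whenFromIso&lt;/code&gt; is a hypothetical helper name, and only the &lt;code&gt;start_time&lt;/code&gt;/&lt;code&gt;end_time&lt;/code&gt; field names come from the request shown earlier.&lt;/p&gt;

```javascript
// Require an explicit offset ("Z" or "+01:00") so parsing never falls back
// to the server's local time zone.
const HAS_OFFSET = /(Z|[+-]\d{2}:\d{2})$/;

// Build the `when` object (epoch seconds) from an ISO 8601 start time.
function whenFromIso(isoStart, durationMinutes) {
  if (!HAS_OFFSET.test(isoStart)) {
    throw new Error(`Start time needs an explicit UTC offset: ${isoStart}`);
  }
  const startMs = Date.parse(isoStart);
  if (Number.isNaN(startMs)) {
    throw new Error(`Unparseable start time: ${isoStart}`);
  }
  const start = Math.floor(startMs / 1000);
  return { start_time: start, end_time: start + durationMinutes * 60 };
}

console.log(whenFromIso("2025-04-11T16:00:00Z", 60));
// { start_time: 1744387200, end_time: 1744390800 }
```

&lt;p&gt;Those are the same two values used in the create-event request above; &lt;code&gt;2025-04-11T17:00:00+01:00&lt;/code&gt; produces them too, which is the behavior you want when the requester is in Lisbon and the server is not.&lt;/p&gt;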

&lt;h2&gt;
  
  
  The honest limits
&lt;/h2&gt;

&lt;p&gt;Two things the dedicated-identity model doesn't fix, and one operational note.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Negotiation isn't an endpoint.&lt;/strong&gt; Counter-proposing a time isn't first-class today — the documented pattern is to RSVP &lt;code&gt;no&lt;/code&gt; or &lt;code&gt;maybe&lt;/code&gt; and reply with an alternative by email, letting the organizer recreate the event. If your flow is negotiation-heavy (propose slots to many people, collect picks, book the winner), the docs point you at Scheduler, which is built for round-trips and works with Agent Accounts. The Events API shines when the agent already knows the time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parsing is still the hard part.&lt;/strong&gt; Owning the calendar removes the permissions problem, not the language problem. "Sometime late next week, ideally afternoon, I'm in Lisbon" still needs an LLM, and the tutorial is candid that wrong intent extraction creates real calendar chaos — it recommends human confirmation for first-time senders and high-value meetings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quotas apply.&lt;/strong&gt; Every proposal and confirmation email counts against the account's send cap — 200 messages per account per day on the free plan — and for a busy agent it's worth setting an explicit policy before launch rather than discovering the ceiling in production.&lt;/p&gt;

&lt;p&gt;The counterargument I'd actually respect: if your bot schedules &lt;em&gt;on behalf of a specific person&lt;/em&gt; — managing a real executive's actual calendar — then borrowing that calendar is correct, because the availability that matters is theirs. Dedicated identity wins when the bot is the service itself: demo bookings, interview coordination, support callbacks. Pick the model that matches whose time is being scheduled.&lt;/p&gt;

&lt;p&gt;If you've got an Agent Account already, here's a 15-minute experiment: create an event with yourself as a participant and &lt;code&gt;notify_participants=true&lt;/code&gt;, accept it from your phone, and watch the &lt;code&gt;event.updated&lt;/code&gt; webhook arrive with your status change. Once you've seen the loop close end to end, the architecture mostly designs itself.&lt;/p&gt;

</description>
      <category>calendar</category>
      <category>ai</category>
      <category>agents</category>
      <category>api</category>
    </item>
    <item>
      <title>Mailboxes as Cattle: Ephemeral Email Infrastructure</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:39:53 +0000</pubDate>
      <link>https://dev.to/qasim157/mailboxes-as-cattle-ephemeral-email-infrastructure-4f3k</link>
      <guid>https://dev.to/qasim157/mailboxes-as-cattle-ephemeral-email-infrastructure-4f3k</guid>
      <description>&lt;p&gt;When was the last time you deleted an email account on purpose?&lt;/p&gt;

&lt;p&gt;For most teams the answer is never, and that tells you something. We treat mailboxes the way we treated servers in 2008: hand-built, carefully named, kept alive indefinitely because recreating one is painful. They're pets. Meanwhile every other piece of our infrastructure — compute, queues, databases — became cattle: numbered, provisioned by code, destroyed without sentiment when the job ends.&lt;/p&gt;

&lt;p&gt;Email is finally catching up. With Nylas Agent Accounts (in beta), a mailbox is created with one call and destroyed with another, and that symmetry is the whole point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The full lifecycle in two commands
&lt;/h2&gt;

&lt;p&gt;Provisioning, from the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nylas agent account create signup-agent@agents.yourdomain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via &lt;code&gt;POST /v3/connect/custom&lt;/code&gt; with &lt;code&gt;"provider": "nylas"&lt;/code&gt; — no OAuth, no refresh token, just an address on a domain you've registered. Teardown is equally unceremonious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nylas agent account delete signup-agent@agents.yourdomain.com &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(The API equivalent is a &lt;code&gt;DELETE&lt;/code&gt; on the grant.) The &lt;a href="https://developer.nylas.com/docs/cookbook/agent-accounts/sign-up-for-a-service/" rel="noopener noreferrer"&gt;signup automation recipe&lt;/a&gt; treats this as a loop: provision a fresh inbox, point a third-party signup form at it, catch the verification email through a &lt;code&gt;message.created&lt;/code&gt; webhook, follow the confirmation link, delete the grant. No human inbox involved at any step, and nothing left behind.&lt;/p&gt;

&lt;p&gt;The middle of that loop is about twenty lines of webhook handler, and the recipe's version filters hard before acting — right grant, right sender, right URL shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;grant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;grant_id&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;AGENT_GRANT_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@saas-you-care-about.example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

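&lt;span class="c1"&gt;// Assumes `message` is the full message, already fetched using messageId (not shown).&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;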
  &lt;span class="sr"&gt;/https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;saas-you-care-about&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;example&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;com&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;confirm&lt;/span&gt;&lt;span class="se"&gt;\?&lt;/span&gt;&lt;span class="sr"&gt;token=&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nylas fires &lt;code&gt;message.created&lt;/code&gt; within a second or two of mail arriving, so the whole signup round-trip typically finishes before a human would have found the email.&lt;/p&gt;

&lt;p&gt;The one-time pet in this story is the &lt;em&gt;domain&lt;/em&gt;, not the mailbox. You register a domain once per organization — picking a US or EU data center region — publish the MX and TXT records, and then mint as many addresses under it as your plan allows. Trial &lt;code&gt;*.nylas.email&lt;/code&gt; subdomains skip even that for prototyping. See &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/provisioning/" rel="noopener noreferrer"&gt;provisioning and domains&lt;/a&gt; for the flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What cattle-class mailboxes are good for
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Per-run test inboxes.&lt;/strong&gt; E2E tests that assert on real email delivery have always been awkward — shared test accounts accumulate state, and yesterday's run pollutes today's assertions. A fresh mailbox per CI run resets the world. The signup recipe's advice applies directly: if you're running a large test matrix, provision multiple grants rather than reusing one, because quotas are per account — 200 messages per account per day on the free plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-workflow identities.&lt;/strong&gt; A research agent that needs a developer account on a data source, a QA bot that registers for a SaaS every run, a purchasing agent that needs a marketplace profile. Each gets an address, does its job, and gets reaped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-environment separation.&lt;/strong&gt; The provisioning docs describe running &lt;code&gt;agents.staging.yourcompany.com&lt;/code&gt; and &lt;code&gt;agents.yourcompany.com&lt;/code&gt; in the same application, so staging traffic never touches the production domain's reputation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-customer pools.&lt;/strong&gt; Isolate sender reputation by splitting volume across &lt;code&gt;sales-a.yourcompany.com&lt;/code&gt;, &lt;code&gt;sales-b.yourcompany.com&lt;/code&gt;, and so on, so a deliverability problem on one domain doesn't contaminate the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cattle still need fences
&lt;/h2&gt;

&lt;p&gt;Disposable doesn't mean unmanaged; arguably it means the opposite. Three practices from the docs are worth treating as non-negotiable.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;reap your herd&lt;/strong&gt;. The recipe is blunt: don't ship per-run agents without teardown, because inactive grants accumulate. Delete on completion &lt;em&gt;or failure&lt;/em&gt; — the failure path is where orphans come from. If your CI provisions a mailbox, the cleanup step belongs in a &lt;code&gt;finally&lt;/code&gt; block, not at the happy-path end.&lt;/p&gt;
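
&lt;p&gt;In code, "cleanup in a &lt;code&gt;finally&lt;/code&gt; block" can be a wrapper that owns the mailbox's whole lifetime. A sketch under stated assumptions: &lt;code&gt;provision&lt;/code&gt; and &lt;code&gt;teardown&lt;/code&gt; are hypothetical callbacks you would wire to the create and delete calls, injected here so the example doesn't guess at request details.&lt;/p&gt;

```javascript
// Run `work` against a freshly provisioned mailbox and always tear it down.
// `provision` resolves to a grant ID; `teardown` deletes that grant. Both are
// hypothetical callbacks supplied by the caller.
async function withEphemeralMailbox(provision, teardown, work) {
  const grantId = await provision();
  try {
    return await work(grantId);
  } finally {
    // Runs on success and on failure; the failure path is where orphans
    // would otherwise come from.
    await teardown(grantId);
  }
}

// Usage with stand-in callbacks that only log.
withEphemeralMailbox(
  async () => "grant-demo",
  async (id) => console.log(`deleted ${id}`),
  async (id) => console.log(`running tests against ${id}`),
);
```

&lt;p&gt;If &lt;code&gt;provision&lt;/code&gt; itself throws, there is nothing to delete, so it deliberately sits outside the &lt;code&gt;try&lt;/code&gt;.&lt;/p&gt;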

&lt;p&gt;Second, &lt;strong&gt;fence the inbox&lt;/strong&gt;. A disposable address that leaks into a spam list keeps receiving junk until you delete it. Pair an allowlist of expected sender domains with a &lt;code&gt;block&lt;/code&gt; rule for everything else, so the inbox only accepts mail from the service you're actually testing against. And don't trust the first message that arrives — some services send a "Welcome" email before the verification email, so match the sender &lt;em&gt;and&lt;/em&gt; the expected URL pattern before your code clicks anything.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;fence the herd, not the animals&lt;/strong&gt;. Per-account configuration defeats the point of cattle. Workspaces fix that: pass a &lt;code&gt;workspace_id&lt;/code&gt; when you create the grant — or let &lt;code&gt;auto_group&lt;/code&gt; place accounts into a workspace by matching their domain — and attach a policy there once. Every disposable mailbox inherits the same send quota, retention window, and block rules with zero per-account setup, and moving an account later is a single &lt;code&gt;PATCH&lt;/code&gt; on the grant.&lt;/p&gt;

&lt;p&gt;One more option if humans occasionally need to peek inside a longer-lived account: set an &lt;code&gt;app_password&lt;/code&gt; at creation (18–40 printable ASCII characters, with an uppercase letter, a lowercase letter, and a digit) and connect a normal IMAP client. Nylas stores it as a bcrypt hash, so you can't retrieve it later — only reset it. Skip it for true cattle; protocol access stays disabled by default.&lt;/p&gt;
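
&lt;p&gt;Since the password can't be read back, it's worth checking those constraints locally before sending the create request. A minimal sketch; the function name is hypothetical, and "printable ASCII" is assumed here to mean code points 0x21 through 0x7E (no spaces), which is the stricter reading, so anything this accepts should pass either way.&lt;/p&gt;

```javascript
// Check a candidate app password against the documented constraints:
// 18 to 40 printable ASCII characters with an uppercase letter, a lowercase
// letter, and a digit. Excluding the space character is an assumption.
function isValidAppPassword(pw) {
  if (typeof pw !== "string") return false;
  const checks = [
    pw.length >= 18,
    40 >= pw.length,
    /^[\x21-\x7E]+$/.test(pw), // printable ASCII, spaces excluded (assumed)
    /[A-Z]/.test(pw),
    /[a-z]/.test(pw),
    /[0-9]/.test(pw),
  ];
  return checks.every(Boolean);
}

console.log(isValidAppPassword("Abcdefghij1234567x")); // true
console.log(isValidAppPassword("short1A")); // false
```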

&lt;h2&gt;
  
  
  The counterargument: identity is the one thing you can't dispose
&lt;/h2&gt;

&lt;p&gt;Here's where the metaphor honestly breaks down. A container's identity doesn't matter; an email address's identity &lt;em&gt;is&lt;/em&gt; the product in some workflows. An address that customers know, that's whitelisted in their systems, that has months of threading history — that's a pet, and it should be. Reputation also accrues over time: a brand-new address sending cold outreach behaves differently from an established one.&lt;/p&gt;

&lt;p&gt;So the pets-vs-cattle question for email isn't "which model is right" but "which mailboxes are which." My rule of thumb: if a human would notice the address changing, it's a pet. If only your code ever reads it, it's cattle — and keeping cattle alive out of habit is just unmanaged inventory.&lt;/p&gt;

&lt;p&gt;Audit your own herd this week: list every automation-owned mailbox your team has, and ask of each one, "if this were deleted tonight, what breaks?" If the answer is "nothing," you've found your first candidate for a teardown script. How many pets are you feeding that should've been cattle?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>email</category>
      <category>architecture</category>
      <category>api</category>
    </item>
    <item>
      <title>The 5-Minute Mailbox</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:38:34 +0000</pubDate>
      <link>https://dev.to/qasim157/the-5-minute-mailbox-3ik</link>
      <guid>https://dev.to/qasim157/the-5-minute-mailbox-3ik</guid>
      <description>&lt;p&gt;The email mailbox just became an API resource, and that matters far more than the setup time it saves.&lt;/p&gt;

&lt;p&gt;For most of software history, a real email address — one that sends, receives, and threads — was an artifact of IT process. Someone created it in an admin console, someone else configured the client, and your application got access through OAuth consent screens and refresh tokens borrowed from a human. Compare that to how you get a database, a queue, or a TLS cert today: one API call, one ID back, done.&lt;/p&gt;

&lt;p&gt;Nylas Agent Accounts (currently in beta) close that gap. The &lt;a href="https://developer.nylas.com/docs/v3/getting-started/agent-accounts/" rel="noopener noreferrer"&gt;quickstart&lt;/a&gt; goes from API key to a sending-and-receiving mailbox in under 5 minutes, and the provisioning step is a single request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/connect/custom"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "provider": "nylas",
    "settings": {
      "email": "test@your-application.nylas.email"
    }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No refresh token, no OAuth dance — unlike OAuth providers, the &lt;code&gt;"nylas"&lt;/code&gt; provider needs only an email address on a registered domain. The response contains a &lt;code&gt;grant_id&lt;/code&gt;, and that one ID drives everything else: messages, drafts, threads, folders, attachments, calendar, webhooks. There are actually three ways to create an account — this API call, the Dashboard, or a single CLI command (&lt;code&gt;nylas agent account create&lt;/code&gt;) — but they all end at the same place: a live mailbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 5 minutes is a threshold, not a convenience
&lt;/h2&gt;

&lt;p&gt;Provisioning time isn't a linear cost. There's a threshold below which a resource changes category — from "thing you request" to "thing your code creates." Virtual machines crossed it with cloud APIs and we got autoscaling. TLS certs crossed it with ACME and we got HTTPS-by-default. Mailboxes crossing it means email addresses stop being scarce, pre-planned identities and start being something a program allocates when it needs one.&lt;/p&gt;

&lt;p&gt;What does that look like in practice?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System mailboxes without ceremony.&lt;/strong&gt; A &lt;code&gt;support@&lt;/code&gt; or &lt;code&gt;scheduling@&lt;/code&gt; address your app owns end-to-end — no consent screen, no integration that breaks when an employee offboards. The docs' framing is exact: a mailbox your application owns, not borrows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ephemeral inboxes.&lt;/strong&gt; Spin up an address for a test run, a workflow, or a single customer interaction, then delete the grant. A concrete version: your CI pipeline creates &lt;code&gt;e2e-run-4821@yourapp.nylas.email&lt;/code&gt;, your signup flow sends its verification email there, the test asserts on the real delivery — not a mock — and teardown is one delete on the grant. When creation costs one HTTP call, teardown becomes a reasonable habit instead of a cleanup chore you defer forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity for agents.&lt;/strong&gt; An AI agent with its own address can send, receive replies in-thread, and even RSVP to calendar invites — every account ships with a primary calendar that speaks standard iCalendar, so Google Calendar, Microsoft 365, and Apple Calendar treat it as a normal participant. The &lt;code&gt;send-rsvp&lt;/code&gt; endpoint means the agent's "yes" shows up like anyone else's.&lt;/p&gt;

&lt;h2&gt;
  
  
  The receive side is the half that's actually new
&lt;/h2&gt;

&lt;p&gt;Plenty of services let you send email programmatically in minutes. The unusual part here is that the mailbox &lt;em&gt;receives&lt;/em&gt;, and inbound mail fires the standard &lt;code&gt;message.created&lt;/code&gt; webhook — identical in shape to the same event for a Gmail or Outlook grant. Register a webhook once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nylas webhook create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://yourapp.example.com/webhooks/nylas &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--triggers&lt;/span&gt; message.created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and replies land in your handler seconds after they arrive. If you'd rather not run webhook infrastructure yet, polling &lt;code&gt;GET /v3/grants/{grant_id}/messages&lt;/code&gt; works too. Either way, the loop closes: your code can ask a question over email and react to the answer.&lt;/p&gt;

&lt;p&gt;The send half deserves a sentence as well. Outbound goes through the same &lt;code&gt;POST /v3/grants/{grant_id}/messages/send&lt;/code&gt; endpoint used for any connected grant, and the recipient sees a normal message from the agent's own address — no "sent via" branding, no relay footer. Outbound messages are capped at 40 MB total. The whole point of the design is that an Agent Account is just another grant: the &lt;code&gt;message.created&lt;/code&gt; payload is identical in shape to the one a Gmail or Outlook grant produces, and you branch on the grant's &lt;code&gt;provider&lt;/code&gt; field (&lt;code&gt;"nylas"&lt;/code&gt;) only if you need to tell them apart. Everything you've already built — pagination, attachment downloads, thread fetches — works unchanged.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest caveats
&lt;/h2&gt;

&lt;p&gt;Instant shouldn't mean careless, and there are two qualifiers worth stating plainly.&lt;/p&gt;

&lt;p&gt;First, the 5-minute path uses a &lt;code&gt;*.nylas.email&lt;/code&gt; trial subdomain — instant because there's no DNS involved. For production you'll want your own domain, which means publishing two kinds of DNS records (MX for inbound routing, TXT for ownership and SPF/DKIM) and waiting for propagation before verification completes. That's a one-time cost per domain, not per mailbox, but it's real. The docs recommend a dedicated subdomain like &lt;code&gt;agents.yourcompany.com&lt;/code&gt; so your agents' sender reputation stays isolated from your primary domain — see &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/provisioning/" rel="noopener noreferrer"&gt;provisioning and domains&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Second, cheap creation makes governance &lt;em&gt;more&lt;/em&gt; important, not less. The defaults are sane — the free plan caps sending at 200 messages per account per day, with 3 GB of storage per organization and 30-day inbox retention — but defaults aren't a strategy. Every account you create without an explicit &lt;code&gt;workspace_id&lt;/code&gt; lands in your application's default workspace, so attaching a policy and rules there governs all of your unassigned mailboxes in one move. The &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/policies-rules-lists/" rel="noopener noreferrer"&gt;policies and rules system&lt;/a&gt; exists for exactly that, and it's worth configuring before your mailbox count gets interesting.&lt;/p&gt;

&lt;p&gt;You could also argue most apps don't need a receiving mailbox at all — transactional send-only covers password resets fine. True. But the moment your product wants a conversation rather than a notification — support, scheduling, anything an agent does — the send-only model runs out, and historically the next step was a painful jump to "go provision accounts with an email admin." Now the next step is a POST request.&lt;/p&gt;

&lt;p&gt;Try the threshold test yourself: start a timer, follow the &lt;a href="https://developer.nylas.com/docs/v3/getting-started/agent-accounts/" rel="noopener noreferrer"&gt;quickstart&lt;/a&gt;, and stop when you've sent yourself an email from the new address and seen the reply come back through a webhook. If you beat 5 minutes, ask the more interesting question: what would you build if mailboxes were as disposable as containers?&lt;/p&gt;

</description>
      <category>email</category>
      <category>api</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
    <item>
      <title>Rate-Limit Your Own Agent Before Someone Else Does</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:38:17 +0000</pubDate>
      <link>https://dev.to/qasim157/rate-limit-your-own-agent-before-someone-else-does-33cb</link>
      <guid>https://dev.to/qasim157/rate-limit-your-own-agent-before-someone-else-does-33cb</guid>
      <description>&lt;p&gt;0.1%. That's the complaint rate that puts an email-sending account under review on Nylas Agent Accounts — one spam report per thousand sends. At 0.5%, sending is paused outright. For bounces, the review threshold is 5% and the pause kicks in at 10%. These aren't suggestions; they're enforced by the platform, and a pause &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/send-limits/" rel="noopener noreferrer"&gt;doesn't clear itself on a timer&lt;/a&gt; — you have to contact support with evidence of a fix.&lt;/p&gt;

&lt;p&gt;Here's my position: those numbers shouldn't be your rate limit. They should be your last line of defense, behind a stricter limit you set yourself. Rate-limit your own agent before someone else does it for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  An LLM loop has no natural stopping point
&lt;/h2&gt;

&lt;p&gt;Traditional email code sends when a human or a cron job tells it to. An autonomous agent sends when a model &lt;em&gt;decides&lt;/em&gt; to, and models inside feedback loops make weird decisions. A reply triggers a webhook, the webhook triggers a reply, and a benign bug becomes a thousand sends before lunch. Nothing in the model's reasoning says "this is my 400th message this hour, that seems off." That awareness has to live in infrastructure.&lt;/p&gt;

&lt;p&gt;Agent Accounts (in beta) build that awareness in through &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/policies-rules-lists/" rel="noopener noreferrer"&gt;policies&lt;/a&gt;. A policy bundles daily send quotas, storage caps, retention windows, and spam settings, and applies to every account in a workspace. Without one, an account runs at your billing plan's maximums (200 messages per account per day on the free plan), which is exactly what you don't want for an experiment that might loop. Every limit on a policy is optional: omit one and it defaults to the plan maximum; ask for more than the plan allows and the API rejects it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quotas are a statement about expected behavior
&lt;/h2&gt;

&lt;p&gt;The useful mental shift: a self-imposed quota isn't throttling, it's an assertion. "This support agent should never need more than 150 sends a day. If it asks for number 151, something upstream is wrong." That's the same logic as a circuit breaker in a service mesh — you're not limiting capacity, you're encoding an expectation so violations become visible instead of expensive.&lt;/p&gt;

&lt;p&gt;Policies let you encode different expectations per agent archetype. A prototype gets a tight quota; a production sales agent gets a higher one. The docs explicitly suggest separate workspaces per archetype, because a triage agent and an outreach agent have completely different send profiles.&lt;/p&gt;

&lt;p&gt;Outbound rules go a step further than volume — they constrain &lt;em&gt;direction&lt;/em&gt;. A rule with &lt;code&gt;trigger: "outbound"&lt;/code&gt; evaluates before the message reaches the provider, and a &lt;code&gt;block&lt;/code&gt; action rejects the send with a &lt;code&gt;403&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/rules"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "Block outbound to example.net",
    "trigger": "outbound",
    "match": {
      "conditions": [
        { "field": "recipient.domain", "operator": "is", "value": "example.net" }
      ]
    },
    "actions": [{ "type": "block" }]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;recipient.*&lt;/code&gt; fields match any recipient, including BCC and SMTP envelope recipients — so an agent can't smuggle a send past the rule by hiding the address. You can also match &lt;code&gt;outbound.type&lt;/code&gt; (&lt;code&gt;compose&lt;/code&gt; vs &lt;code&gt;reply&lt;/code&gt;) to, say, let an agent reply freely but block it from starting brand-new threads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch the same telemetry the platform watches
&lt;/h2&gt;

&lt;p&gt;The bounce and complaint rates that trigger pauses are computed from events you can subscribe to: &lt;code&gt;message.transactional.delivered&lt;/code&gt;, &lt;code&gt;message.transactional.bounced&lt;/code&gt;, &lt;code&gt;message.transactional.complaint&lt;/code&gt;, and &lt;code&gt;message.transactional.rejected&lt;/code&gt; — four webhook triggers that are your only real-time window into those rates. The docs' advice is blunt: wire them up and pause your own outbound logic when bounces or complaints climb. You'll see the problem in your own telemetry before the platform tells you about it, and "we paused ourselves" is a much better incident report than "we got paused."&lt;/p&gt;
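
&lt;p&gt;Here's a minimal sketch of what "pause your own outbound logic" could look like. Nothing in it comes from the Nylas docs beyond the trigger names: &lt;code&gt;createReputationMonitor&lt;/code&gt;, its limits, and &lt;code&gt;minSample&lt;/code&gt; are my own illustrative assumptions, and it keeps cumulative counts rather than the recent window the platform uses.&lt;/p&gt;

```javascript
// Sketch only: the limits and minSample are illustrative assumptions,
// not Nylas defaults, and the counts are cumulative rather than windowed.
function createReputationMonitor({ bounceLimit = 0.02, complaintLimit = 0.0005, minSample = 200 } = {}) {
  const counts = { delivered: 0, bounced: 0, complaint: 0, rejected: 0 };

  return {
    // Pass the webhook trigger, e.g. "message.transactional.bounced".
    record(trigger) {
      if (!trigger.startsWith("message.transactional.")) return;
      const kind = trigger.split(".").pop();
      if (Object.hasOwn(counts, kind)) counts[kind] += 1;
    },
    // True once either rate reaches its self-imposed limit.
    shouldPause() {
      const attempted = counts.delivered + counts.bounced;
      if (minSample > attempted) return false; // too few sends to judge
      const bounceRate = counts.bounced / attempted;
      const complaintRate = counts.delivered === 0 ? 0 : counts.complaint / counts.delivered;
      return bounceRate >= bounceLimit || complaintRate >= complaintLimit;
    },
  };
}
```

&lt;p&gt;Call &lt;code&gt;record&lt;/code&gt; from your webhook handler and check &lt;code&gt;shouldPause()&lt;/code&gt; before every send. Setting the limits well below the platform's review thresholds is the whole point: your breaker should trip first.&lt;/p&gt;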

&lt;p&gt;It also helps to know what's actually being counted. Bounce rate only counts hard bounces (addresses that don't exist) divided by a recent representative send volume; soft bounces from full mailboxes or greylisting don't touch it, and healthy is under 2%. Complaint rate counts recipients clicking &lt;strong&gt;Mark this email as spam&lt;/strong&gt; or dragging your mail to junk, measured only across domains that send complaint feedback. That's why 0.1% is so easy to hit at low volume: in a 2,000-send week, just two spam reports are enough to reach the review threshold.&lt;/p&gt;

&lt;p&gt;The error responses are worth knowing too. A reputation pause surfaces as a &lt;code&gt;400&lt;/code&gt; on send; a per-account or per-domain rate limit returns &lt;code&gt;429&lt;/code&gt; (back off and retry); an abuse restriction returns &lt;code&gt;403&lt;/code&gt; with &lt;code&gt;send blocked by abuse restriction&lt;/code&gt;. That last one can be scoped to a single sender address, a domain and its subdomains, a grant, or the entire application — and an application-level restriction stops every Agent Account under the app, not just the one that misbehaved. If your agent treats all send failures as retryable, it will hammer a paused account and learn nothing.&lt;/p&gt;
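
&lt;p&gt;One way to encode that distinction is a small lookup the send path consults before it decides to retry. This is a sketch under my own assumptions: &lt;code&gt;classifySendFailure&lt;/code&gt; and the &lt;code&gt;retry&lt;/code&gt;/&lt;code&gt;alert&lt;/code&gt; flags are hypothetical names, and treating every &lt;code&gt;400&lt;/code&gt; as alert-worthy is a deliberately cautious choice, since a &lt;code&gt;400&lt;/code&gt; can also just mean a malformed request.&lt;/p&gt;

```javascript
// Sketch: map a failed send's HTTP status to what the agent should do next.
// The status meanings follow the paragraph above; the return shape is made up.
function classifySendFailure(status) {
  switch (status) {
    case 429:
      return { retry: true, alert: false }; // rate limit: back off, then retry
    case 400:
      return { retry: false, alert: true }; // possibly a reputation pause: retrying won't lift it
    case 403:
      return { retry: false, alert: true }; // restriction or block: terminal
    default:
      // Treat server-side errors as transient, anything else as a plain failure.
      return { retry: status >= 500, alert: false };
  }
}
```

&lt;p&gt;The useful property is that "retry" becomes an explicit decision per status instead of the default for anything that isn't a &lt;code&gt;200&lt;/code&gt;.&lt;/p&gt;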

&lt;h2&gt;
  
  
  Circuit breakers need receipts
&lt;/h2&gt;

&lt;p&gt;Two details make the rule layer trustworthy enough to bet on. First, evaluation &lt;strong&gt;fails closed&lt;/strong&gt;: if a &lt;code&gt;block&lt;/code&gt; rule can't be evaluated because of a transient infrastructure error — say, a list lookup failure during &lt;code&gt;in_list&lt;/code&gt; matching — Nylas blocks the message rather than letting it through. The failure is surfaced as retryable: an API send returns &lt;code&gt;503&lt;/code&gt; instead of &lt;code&gt;403&lt;/code&gt;, and inbound SMTP answers with a &lt;code&gt;451&lt;/code&gt; tempfail so the sending server retries instead of bouncing. A safety mechanism that silently disables itself under load isn't a safety mechanism.&lt;/p&gt;

&lt;p&gt;Second, every evaluation writes an audit record. &lt;code&gt;GET /v3/grants/{grant_id}/rule-evaluations&lt;/code&gt; lists, most recent first, which rules matched, what actions were applied, and the normalized sender and recipient data that was considered. When a block happened because evaluation errored rather than matched, the record carries &lt;code&gt;blocked_by_evaluation_error: true&lt;/code&gt;. So when your agent's send comes back &lt;code&gt;403&lt;/code&gt; at 2 a.m., "why was this blocked?" is one API call, not an archaeology project. A circuit breaker without observability is just a mystery outage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument: limits break legitimate bursts
&lt;/h2&gt;

&lt;p&gt;The honest objection is that real workloads spike. A support agent during an outage might legitimately need 5x its normal volume, and a hard quota turns your safety net into an availability incident. That's true — if the quota is a dead end.&lt;/p&gt;

&lt;p&gt;So don't make it a dead end. Make hitting the quota an escalation path: alert a human, queue the overflow, require an approval to raise the cap. The failure mode of a too-tight quota is a Slack ping and an hour of delayed email. The failure mode of no quota is a 10% bounce rate, a platform-level pause that requires a support ticket to lift, and a sender reputation you rebuild over weeks. Those aren't symmetric risks.&lt;/p&gt;
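
&lt;p&gt;A rough sketch of a quota that escalates instead of dead-ending, assuming an in-process counter layered on top of whatever the policy enforces. &lt;code&gt;createDailyQuota&lt;/code&gt;, &lt;code&gt;onExceeded&lt;/code&gt;, and &lt;code&gt;approveRaise&lt;/code&gt; are hypothetical names, and the daily reset is left out for brevity.&lt;/p&gt;

```javascript
// Sketch: a self-imposed quota that queues overflow and alerts a human
// instead of silently dropping mail. All names here are illustrative.
function createDailyQuota({ limit, onExceeded }) {
  let cap = limit;
  let sent = 0;
  const overflow = [];

  // Release queued messages until the cap is reached again.
  function drain(send) {
    while (overflow.length > 0) {
      if (sent >= cap) break;
      sent += 1;
      send(overflow.shift());
    }
  }

  return {
    // Sends immediately when under the cap; otherwise queues the message.
    trySend(message, send) {
      if (sent >= cap) {
        overflow.push(message);
        if (overflow.length === 1) onExceeded({ cap, sent }); // alert on the first overflow only
        return false;
      }
      sent += 1;
      send(message);
      return true;
    },
    // Called after a human approves a higher cap.
    approveRaise(newCap, send) {
      cap = newCap;
      drain(send);
    },
    pending() {
      return overflow.length;
    },
  };
}
```

&lt;p&gt;The overflow queue is what turns "quota hit" from lost mail into delayed mail, and the single alert keeps the Slack ping from becoming its own storm.&lt;/p&gt;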

&lt;p&gt;There's also a softer dial worth knowing: policies expose &lt;code&gt;spam_sensitivity&lt;/code&gt; from 0.1 to 5.0 for inbound filtering. Inbound hygiene matters for outbound health, because agents that reply to junk generate complaints.&lt;/p&gt;

&lt;p&gt;Concrete next step: before your agent's next deploy, create one policy with a daily quota at roughly 2x the agent's observed peak, attach it to the workspace, and subscribe to the four &lt;code&gt;message.transactional.*&lt;/code&gt; triggers. Then deliberately make your agent hit the quota in staging and check that your alerting fires. If it doesn't, you've found the gap while it's still cheap.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>email</category>
      <category>security</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Common Pitfalls Building Email Agents (and Fixes)</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:18:04 +0000</pubDate>
      <link>https://dev.to/qasim157/common-pitfalls-building-email-agents-and-fixes-29kg</link>
      <guid>https://dev.to/qasim157/common-pitfalls-building-email-agents-and-fixes-29kg</guid>
      <description>&lt;p&gt;A team ships their first email agent on a Thursday. Demo went great, handler's deployed, webhook's registered. Friday morning the on-call wakes up to an inbox where the agent has been enthusiastically replying to its own replies all night, a customer who received the same answer three times, and a thread in Gmail that's shattered into five separate conversations. None of it was an exotic failure — every one of these is a known pitfall with a known fix, documented in the Nylas Agent Accounts cookbook (the product's in beta; the mistakes are timeless).&lt;/p&gt;

&lt;p&gt;Here are the nine I'd check before any launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The agent replies to itself
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;message.created&lt;/code&gt; webhook fires for &lt;em&gt;outbound&lt;/em&gt; messages too: when your agent sends a reply via the API, that sent message triggers the same event as inbound mail. Leave those events unfiltered and you've built a perpetual motion machine: reply, webhook, reply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; filter the agent's own address at the very top of the handler, before any other logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;AGENT_EMAIL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. No webhook deduplication
&lt;/h2&gt;

&lt;p&gt;Delivery is at-least-once. If your endpoint doesn't return &lt;code&gt;200&lt;/code&gt; fast enough, or the network hiccups, the same &lt;code&gt;message.created&lt;/code&gt; notification arrives again — and a naive handler replies twice. The &lt;a href="https://developer.nylas.com/docs/cookbook/agent-accounts/prevent-duplicate-replies/" rel="noopener noreferrer"&gt;dedup recipe&lt;/a&gt; calls this the most common source of duplicates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; an atomic check-and-set on the message ID before processing — &lt;code&gt;INSERT ... ON CONFLICT DO NOTHING&lt;/code&gt; in Postgres, &lt;code&gt;SET id 1 NX EX 86400&lt;/code&gt; in Redis. Give records a TTL of 24–48 hours so late redeliveries still get caught without the table growing forever.&lt;/p&gt;
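
&lt;p&gt;To make the shape of that check concrete, here's an in-memory stand-in. It's a sketch, not the recipe's code: &lt;code&gt;createDedupStore&lt;/code&gt; and &lt;code&gt;claim&lt;/code&gt; are names I made up, and a &lt;code&gt;Map&lt;/code&gt; only dedups within one process, so a real deployment would back &lt;code&gt;claim&lt;/code&gt; with the Redis or Postgres statement above.&lt;/p&gt;

```javascript
// Sketch: in-memory stand-in for an atomic check-and-set with a TTL.
// Expired entries are overwritten on reuse but never swept, which a
// long-running process would need to add.
function createDedupStore(ttlMs = 24 * 60 * 60 * 1000) {
  const seen = new Map(); // messageId -> expiry timestamp in ms

  return {
    // Returns true only for the first claim of an ID within the TTL window.
    claim(messageId, now = Date.now()) {
      const expiresAt = seen.get(messageId);
      if (expiresAt !== undefined) {
        if (expiresAt > now) return false; // still inside the TTL: duplicate
      }
      seen.set(messageId, now + ttlMs);
      return true;
    },
  };
}
```

&lt;p&gt;In the handler this becomes one line: if &lt;code&gt;claim(msg.id)&lt;/code&gt; returns false, log the skip and return.&lt;/p&gt;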

&lt;h2&gt;
  
  
  3. Dedup without locking
&lt;/h2&gt;

&lt;p&gt;Two concurrent workers (Lambda instances, ECS tasks) can each clear the dedup check in the same millisecond, for instance when two different messages land on one thread at once or the check isn't truly atomic, and both generate a reply. Dedup catches the same event delivered twice; it can't stop two workers from answering the same thread &lt;em&gt;simultaneously&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; a per-thread lock with a 30-second TTL so a crashed worker self-releases — and a double-check inside the lock that inspects the thread's latest message and bails if the agent already replied. You need dedup &lt;em&gt;and&lt;/em&gt; locking; they cover different failure modes.&lt;/p&gt;
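
&lt;p&gt;A sketch of both halves, under the assumption of a single process: &lt;code&gt;createThreadLocks&lt;/code&gt; and &lt;code&gt;agentAlreadyReplied&lt;/code&gt; are hypothetical helpers, and across multiple workers the lock would have to live in a shared store rather than a &lt;code&gt;Map&lt;/code&gt;.&lt;/p&gt;

```javascript
// Sketch: per-thread lock with a TTL, plus the double-check run inside it.
function createThreadLocks(ttlMs = 30000) {
  const held = new Map(); // threadId -> { owner, expiresAt }

  return {
    // Returns true if this owner now holds the lock for the thread.
    acquire(threadId, owner, now = Date.now()) {
      const lock = held.get(threadId);
      if (lock !== undefined) {
        if (lock.expiresAt > now) return false; // someone else is working on it
      }
      held.set(threadId, { owner, expiresAt: now + ttlMs });
      return true;
    },
    // Only the current owner may release; an expired lock is simply replaced.
    release(threadId, owner) {
      const lock = held.get(threadId);
      if (lock === undefined) return;
      if (lock.owner === owner) held.delete(threadId);
    },
  };
}

// Double-check: true when the thread's latest message came from the agent.
function agentAlreadyReplied(latestMessage, agentEmail) {
  const sender = latestMessage?.from?.[0]?.email;
  return sender === agentEmail;
}
```

&lt;p&gt;The order matters: acquire the lock, fetch the thread's latest message, run &lt;code&gt;agentAlreadyReplied&lt;/code&gt;, and only then generate a reply.&lt;/p&gt;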

&lt;h2&gt;
  
  
  4. Trusting the webhook payload for the message body
&lt;/h2&gt;

&lt;p&gt;The webhook carries summary fields — &lt;code&gt;subject&lt;/code&gt;, &lt;code&gt;from&lt;/code&gt;, &lt;code&gt;snippet&lt;/code&gt; — not the full body. Worse, if a body exceeds roughly 1 MB, the event type becomes &lt;code&gt;message.created.truncated&lt;/code&gt; and the body is omitted entirely. Agents that parse the payload directly work in testing and fail on real-world mail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; always fetch the full message from the API using the ID in the payload, as the &lt;a href="https://developer.nylas.com/docs/cookbook/agent-accounts/handle-replies/" rel="noopener noreferrer"&gt;reply-handling recipe&lt;/a&gt; does, and handle the truncated event type explicitly.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Replies that don't thread
&lt;/h2&gt;

&lt;p&gt;Send a "reply" as a fresh message and it lands as a disconnected email in the recipient's client — no quoted context, no conversation grouping. Multiply by a few turns and the customer is hunting through five fragments of one discussion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; pass &lt;code&gt;reply_to_message_id&lt;/code&gt; on every reply. That makes the platform set the &lt;code&gt;In-Reply-To&lt;/code&gt; and &lt;code&gt;References&lt;/code&gt; headers so the message threads correctly in Gmail, Outlook, and the agent's own mailbox. Match incoming replies by &lt;code&gt;thread_id&lt;/code&gt;, never by subject line — subjects get edited, and two different threads can share one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;nylas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_GRANT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;replyToMessageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sender&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;replyBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Replying instantly to every message
&lt;/h2&gt;

&lt;p&gt;Humans send corrections. A recipient fires off a reply, spots a mistake, and sends a follow-up fifteen seconds later — and your agent has already answered the first message, so now it answers the second too, and the conversation forks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; a 30–60 second cooldown before responding in active threads, batching consecutive inbound messages into one considered reply.&lt;/p&gt;
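
&lt;p&gt;One possible shape for that cooldown, sketched as a pure in-memory batcher. &lt;code&gt;createReplyBatcher&lt;/code&gt; and the 45-second default are my own assumptions; something (a timer, a cron tick) still has to call &lt;code&gt;takeReady&lt;/code&gt; periodically.&lt;/p&gt;

```javascript
// Sketch: collect inbound messages per thread and release them as one batch
// once the thread has been quiet for the cooldown period.
function createReplyBatcher(cooldownMs = 45000) {
  const pending = new Map(); // threadId -> { messages, lastSeen }

  return {
    add(threadId, message, now = Date.now()) {
      const entry = pending.get(threadId) ?? { messages: [], lastSeen: now };
      entry.messages.push(message);
      entry.lastSeen = now; // each new message restarts the cooldown
      pending.set(threadId, entry);
    },
    // Returns [threadId, messages] pairs for threads that are ready to answer.
    takeReady(now = Date.now()) {
      const ready = [];
      for (const [threadId, entry] of pending) {
        if (now - entry.lastSeen >= cooldownMs) {
          ready.push([threadId, entry.messages]);
          pending.delete(threadId);
        }
      }
      return ready;
    },
  };
}
```

&lt;p&gt;The agent then writes one reply per batch, so a correction sent fifteen seconds after the original is read alongside it instead of answered separately.&lt;/p&gt;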

&lt;h2&gt;
  
  
  7. No outbound circuit breaker
&lt;/h2&gt;

&lt;p&gt;Even with dedup, locking, and self-filtering, a logic bug can still produce a reply storm — and an autonomous sender fails at machine speed. This is the safety net the dedup recipe says not to ship without.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; a per-thread send budget. If the agent has sent 3 or more messages on one thread within 5 minutes, stop sending and escalate to a human. A rate limit triggering is a page; a runaway agent is an apology tour.&lt;/p&gt;
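
&lt;p&gt;The budget itself is a sliding-window counter. A minimal sketch, assuming in-memory state and the numbers above as defaults; &lt;code&gt;createThreadSendBudget&lt;/code&gt; and &lt;code&gt;allow&lt;/code&gt; are hypothetical names, not part of any SDK.&lt;/p&gt;

```javascript
// Sketch: refuse a send once the agent has already sent maxSends messages
// on the same thread inside the window.
function createThreadSendBudget({ maxSends = 3, windowMs = 5 * 60 * 1000 } = {}) {
  const sends = new Map(); // threadId -> timestamps of recent sends

  return {
    // Returns true and records the send when the thread is under budget.
    allow(threadId, now = Date.now()) {
      const cutoff = now - windowMs;
      const recent = (sends.get(threadId) ?? []).filter((t) => t > cutoff);
      if (recent.length >= maxSends) {
        sends.set(threadId, recent);
        return false; // caller should stop and escalate to a human
      }
      recent.push(now);
      sends.set(threadId, recent);
      return true;
    },
  };
}
```

&lt;p&gt;Check &lt;code&gt;allow(msg.thread_id)&lt;/code&gt; immediately before the send call, so every code path that can send passes through the same gate.&lt;/p&gt;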

&lt;h2&gt;
  
  
  8. Letting junk wake the agent
&lt;/h2&gt;

&lt;p&gt;Spam, bounce-backs, and out-of-office auto-replies all fire &lt;code&gt;message.created&lt;/code&gt;. If every one of them reaches your LLM, you're paying inference costs to reason about garbage — and risking the agent &lt;em&gt;answering&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; push filtering below your application using &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/policies-rules-lists/" rel="noopener noreferrer"&gt;rules&lt;/a&gt;. A &lt;code&gt;block&lt;/code&gt; rule rejects known-bad senders at the SMTP level so your code never sees the message; &lt;code&gt;assign_to_folder&lt;/code&gt; routes automated notifications away from the inbox so your handler can skip folders the agent shouldn't answer. Rules run in priority order (0–1000, lower first), so put specific matches before broad &lt;code&gt;contains&lt;/code&gt; rules — the first matching &lt;code&gt;block&lt;/code&gt; is terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Treating a blocked send as a retryable error
&lt;/h2&gt;

&lt;p&gt;If your workspace has outbound rules, a send matching a &lt;code&gt;block&lt;/code&gt; rule returns &lt;code&gt;403&lt;/code&gt; — and no retry will ever deliver it, because the rule rejected it before the provider was involved. An agent with generic retry logic will hammer that send forever and report a flaky network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; treat &lt;code&gt;403&lt;/code&gt; on send as terminal. Log it, then query &lt;code&gt;GET /v3/grants/{grant_id}/rule-evaluations&lt;/code&gt; to see exactly which rule matched and what data was evaluated — that endpoint is the fastest answer to "why didn't this send?"&lt;/p&gt;

&lt;p&gt;There's one nuance worth encoding in your error handler. Rule evaluation fails closed: if a &lt;code&gt;block&lt;/code&gt; rule can't be evaluated because of a transient infrastructure problem (say, a list lookup failure during &lt;code&gt;in_list&lt;/code&gt; matching), the send is blocked anyway — but it comes back as a &lt;code&gt;503&lt;/code&gt;, not a &lt;code&gt;403&lt;/code&gt;, and the audit record carries &lt;code&gt;blocked_by_evaluation_error: true&lt;/code&gt;. So the rule is simple: retry &lt;code&gt;503&lt;/code&gt;, never retry &lt;code&gt;403&lt;/code&gt;. Conflating the two is how agents either give up on deliverable mail or hammer undeliverable mail.&lt;/p&gt;
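
&lt;p&gt;That rule fits in a few lines. This is a sketch under stated assumptions: &lt;code&gt;sendWithRetry&lt;/code&gt; is a hypothetical wrapper, and &lt;code&gt;send&lt;/code&gt; is assumed to resolve to an object with a numeric &lt;code&gt;status&lt;/code&gt; rather than throw, which may not match how your SDK reports errors.&lt;/p&gt;

```javascript
// Sketch: retry only 503 (evaluation failed closed), never 403 (rule matched).
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function sendWithRetry(send, { maxAttempts = 4, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let attempt = 1;
  for (;;) {
    const response = await send();
    const retryable = response.status === 503;
    // 403 and every other status fall through here and are returned as-is.
    if (!retryable || attempt >= maxAttempts) return response;
    await sleep(baseDelayMs * 2 ** (attempt - 1)); // exponential backoff
    attempt += 1;
  }
}
```

&lt;p&gt;Capping the attempts matters as much as the status check: a &lt;code&gt;503&lt;/code&gt; that never clears should end up in front of a human, not in an infinite loop.&lt;/p&gt;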




&lt;p&gt;The pattern across all nine: email agents fail at the seams between at-least-once infrastructure and autonomous action, not in the LLM prompt. The fixes are boring — a filter, a lock, a cap, a rule — and that's the point. Boring is what you want standing between a language model and a real human's inbox.&lt;/p&gt;

&lt;p&gt;Turn this into a pre-launch checklist: nine items, and your load test should specifically exercise #2 and #3 by firing duplicate webhooks from concurrent connections. Which of these has bitten you in production — and was the fix on this list?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>email</category>
      <category>agents</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>Idempotency Lessons From an Email Agent</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 17:17:54 +0000</pubDate>
      <link>https://dev.to/qasim157/idempotency-lessons-from-an-email-agent-2ocb</link>
      <guid>https://dev.to/qasim157/idempotency-lessons-from-an-email-agent-2ocb</guid>
      <description>&lt;p&gt;A customer emails your support agent at 9:14 a.m. At 9:15 they get a helpful reply. At 9:16 they get the same reply again, word for word. Nothing crashed. No exception was thrown. Your agent just did exactly what it was told — twice.&lt;/p&gt;

&lt;p&gt;I think email agents are the best teacher of idempotency I've seen in years, because the failure mode is so visceral. A duplicate database row is invisible. A duplicate email lands in a human's inbox and makes your product look broken. Building a reply loop on &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/" rel="noopener noreferrer"&gt;Nylas Agent Accounts&lt;/a&gt; (currently in beta) forced me to internalize lessons that apply to any event-driven system, not just email.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: at-least-once is the honest contract
&lt;/h2&gt;

&lt;p&gt;The instinct is to blame the platform: "why did I get the same webhook twice?" But at-least-once delivery is the only honest guarantee a webhook system can make. Per the &lt;a href="https://developer.nylas.com/docs/cookbook/agent-accounts/prevent-duplicate-replies/" rel="noopener noreferrer"&gt;duplicate-reply docs&lt;/a&gt;, if your endpoint doesn't return &lt;code&gt;200&lt;/code&gt; fast enough, or a transient network blip eats the response, the &lt;code&gt;message.created&lt;/code&gt; notification gets delivered again. The practical alternative, at-most-once delivery, would mean the platform silently drops events whenever it's unsure, and a dropped event is worse than a repeated one. True exactly-once delivery isn't something a sender can promise over an unreliable network.&lt;/p&gt;

&lt;p&gt;So duplicates aren't a bug to report. They're a contract to design for. The fix is an atomic check-and-set keyed on the message ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// db.processedMessages.setIfAbsent stands in for your store's atomic insert; it is assumed to resolve to true when the ID was already recorded.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;processedMessages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setIfAbsent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;receivedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The atomicity matters more than the storage. In Postgres that's &lt;code&gt;INSERT ... ON CONFLICT DO NOTHING&lt;/code&gt;; in Redis it's &lt;code&gt;SET messageId 1 NX EX 86400&lt;/code&gt;. A read-then-write sequence reintroduces the race you're trying to close. And give the records a TTL — 24 to 48 hours covers redeliveries without growing the table forever. After that window, a webhook for the same message ID is almost certainly a bug in your own system, not a redelivery, and you &lt;em&gt;want&lt;/em&gt; it to surface.&lt;/p&gt;

&lt;p&gt;There's a quieter corollary: acknowledge before you act. The docs' example handler calls &lt;code&gt;res.status(200).end()&lt;/code&gt; as its first line and only then starts processing. Every second your endpoint spends on an LLM call before responding is a second in which the platform may decide the delivery failed and queue a retry. You can't eliminate redeliveries, but you can stop manufacturing them.&lt;/p&gt;
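
&lt;p&gt;Here's a sketch of that ordering, assuming an Express-style &lt;code&gt;res&lt;/code&gt; object. &lt;code&gt;makeWebhookHandler&lt;/code&gt; and &lt;code&gt;processEvent&lt;/code&gt; are hypothetical names standing in for your own handler and for the dedup, locking, and LLM work behind it.&lt;/p&gt;

```javascript
// Sketch: acknowledge the webhook first, then process out of band.
// `schedule` is injectable so the deferral can be swapped for a real queue.
function makeWebhookHandler(processEvent, schedule = setImmediate) {
  return function handler(req, res) {
    res.status(200).end(); // respond before any slow work starts
    schedule(() => {
      Promise.resolve()
        .then(() => processEvent(req.body))
        .catch((err) => console.error("webhook processing failed", err));
    });
  };
}
```

&lt;p&gt;On platforms that freeze the process once the response is sent, in-process deferral like this may never run, so there the scheduled step would need to enqueue the event onto a durable queue instead.&lt;/p&gt;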

&lt;h2&gt;
  
  
  Lesson 2: dedup and locking solve different problems
&lt;/h2&gt;

&lt;p&gt;Here's the part most people miss. Deduplication catches the &lt;em&gt;same event delivered twice&lt;/em&gt;. It does nothing about the &lt;em&gt;same event processed twice concurrently&lt;/em&gt;. If your handler runs on Lambda or multiple worker processes, two instances can blow past the check-and-set within the same millisecond window.&lt;/p&gt;

&lt;p&gt;The docs recommend a per-thread lock with a 30-second TTL, so a crashed worker releases automatically. And inside the lock, a double-check against ground truth: fetch the thread, look at &lt;code&gt;latestDraftOrMessage&lt;/code&gt;, and bail if the &lt;code&gt;from&lt;/code&gt; address is the agent's own. Between the webhook arriving and your lock being acquired, another worker may have finished the whole job — the thread itself is the only record that can't lie about it.&lt;/p&gt;

&lt;p&gt;That layered structure — dedup, then lock, then verify state — generalizes. Idempotency isn't one mechanism. It's a stack of cheap checks, each catching what the previous one can't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: the best coordination is no coordination
&lt;/h2&gt;

&lt;p&gt;The thorniest duplicates don't come from infrastructure at all. They come from two actors watching the same inbox — two agents, or an agent and a human, both deciding the same message needs a reply. You can't dedup your way out of that; it's not a duplicate event, it's a coordination problem.&lt;/p&gt;

&lt;p&gt;The cleanest fix is architectural: one agent, one inbox. Agent Accounts make that nearly free, since each agent gets its own address and its own webhook stream — &lt;code&gt;sales-agent@&lt;/code&gt;, &lt;code&gt;support-agent@&lt;/code&gt;, &lt;code&gt;scheduling@&lt;/code&gt;, each filtering on its own &lt;code&gt;grant_id&lt;/code&gt;. No overlap means no conflict to resolve. When humans need visibility, they get read-only &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/mail-clients/" rel="noopener noreferrer"&gt;IMAP access&lt;/a&gt; instead of becoming a second writer.&lt;/p&gt;

&lt;p&gt;This is the distributed-systems lesson in miniature: partitioning beats locking whenever you can afford it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 4: assume your logic is the next bug
&lt;/h2&gt;

&lt;p&gt;Even with all three layers, you can still build a reply storm. Outbound sends fire &lt;code&gt;message.created&lt;/code&gt; too. If your handler forgets to skip the agent's own messages, the agent replies to itself, which triggers another webhook, forever. The first guard is two lines at the top of every handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// First check in every handler — skip messages from the agent itself.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;AGENT_EMAIL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second guard is a per-thread send budget: more than 3 sends within 5 minutes means something's wrong, so stop and escalate to a human instead of sending.&lt;/p&gt;

&lt;p&gt;That's idempotency's underrated cousin — a circuit breaker for when your &lt;em&gt;correct&lt;/em&gt; code does something incorrect at volume. Dedup protects you from the platform. The rate limit protects you from yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: the cheapest event to handle is the one that never fires
&lt;/h2&gt;

&lt;p&gt;One more layer sits below all of this. Agent Accounts support server-side &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/policies-rules-lists/" rel="noopener noreferrer"&gt;rules&lt;/a&gt; that sort inbound mail before your webhook handler ever sees it — route automated notifications to a folder the agent doesn't reply in, block spam at the SMTP layer, archive what needs no response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/rules"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "match": [{ "field": "from.domain", "operator": "equals", "value": "noreply.example.com" }],
    "actions": [{ "action": "assign_to_folder", "value": "notifications" }],
    "description": "Route automated notifications to a separate folder"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your handler then checks which folder a message landed in and skips folders the agent shouldn't touch. Every message you filter out declaratively is a message your idempotency stack never has to be correct about. Shrinking the input space is the idempotency strategy nobody writes blog posts about, because it looks like configuration instead of engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  The counterargument worth taking seriously
&lt;/h2&gt;

&lt;p&gt;"This is a lot of machinery for sending email." Fair. If your agent handles ten messages a day from a single-threaded process, a dedup table alone will carry you a long way, and the lock may be premature. The docs themselves note that synthetic concurrent load testing is the only way to surface the race — which implies a single-threaded deployment won't hit it.&lt;/p&gt;

&lt;p&gt;But the cost asymmetry should drive the decision. The whole stack is maybe forty lines of code. A double reply to a customer is a trust incident you can't un-send. I'd rather carry the forty lines.&lt;/p&gt;

&lt;p&gt;One more habit worth stealing: log every skip. When a message is dropped because it's a duplicate or another worker holds the lock, write that down. Silent idempotency is correct but undebuggable.&lt;/p&gt;

&lt;p&gt;If you're building a reply loop, read the &lt;a href="https://developer.nylas.com/docs/cookbook/agent-accounts/prevent-duplicate-replies/" rel="noopener noreferrer"&gt;prevention recipe&lt;/a&gt; end to end, then write a load test that fires the same webhook payload at your handler from five concurrent connections. If exactly one reply goes out, you've earned the right to ship. What's the worst duplicate-action bug you've shipped — and which layer would have caught it?&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>webhooks</category>
      <category>email</category>
      <category>ai</category>
    </item>
    <item>
      <title>Observability for Email Agents</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 17:17:50 +0000</pubDate>
      <link>https://dev.to/qasim157/observability-for-email-agents-4egn</link>
      <guid>https://dev.to/qasim157/observability-for-email-agents-4egn</guid>
      <description>&lt;p&gt;You can't watch an email agent work, but everything it did yesterday is one API call away:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; GET &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/grants/&amp;lt;GRANT_ID&amp;gt;/messages?limit=50"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the strange, underrated property of building agents on email. Most autonomous systems need observability bolted on — tracing, structured logs, replay tooling. An agent that lives in a mailbox gets three observability primitives for free, because the medium &lt;em&gt;is&lt;/em&gt; the record. Here's how to use each one, drawn from how &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/mailboxes/" rel="noopener noreferrer"&gt;Agent Account mailboxes&lt;/a&gt; (currently in beta) actually behave.&lt;/p&gt;

&lt;h2&gt;
  
  
  The event stream: webhooks
&lt;/h2&gt;

&lt;p&gt;Every inbound message fires &lt;code&gt;message.created&lt;/code&gt; — typically within seconds of the SMTP handoff — with a payload that includes the &lt;code&gt;thread_id&lt;/code&gt; your agent needs to reconstruct conversation state. That's your input-side event stream, no instrumentation required.&lt;/p&gt;

&lt;p&gt;One wrinkle to handle: when a message body exceeds roughly 1 MB, the trigger becomes &lt;code&gt;message.created.truncated&lt;/code&gt; and the body is omitted from the payload — fetch the full message by ID in that case, or your agent will silently reason over nothing.&lt;/p&gt;
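
&lt;p&gt;The check can be sketched in a few lines. The payload shape here (a top-level &lt;code&gt;type&lt;/code&gt; and the message under &lt;code&gt;data.object&lt;/code&gt;) is an assumption to verify against the webhook reference, and both function names are hypothetical.&lt;/p&gt;

```python
def needs_full_fetch(event: dict) -> bool:
    """Return True when the webhook did not carry a usable body."""
    trigger = event.get("type", "")
    # Assumed payload shape: the message object sits under data.object.
    message = event.get("data", {}).get("object", {})
    # Truncated notifications omit the body, so the handler must fetch by ID.
    return trigger.endswith(".truncated") or not message.get("body")


def message_url(grant_id: str, message_id: str) -> str:
    """Build the fetch-by-ID URL, following the /v3/grants/... pattern."""
    return f"https://api.us.nylas.com/v3/grants/{grant_id}/messages/{message_id}"
```

&lt;p&gt;Treating an empty body the same as a truncated trigger is a deliberate choice in this sketch: either way the agent should fetch before it reasons.&lt;/p&gt;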

&lt;p&gt;The output side is richer than most people expect, because the platform owns the SMTP path end-to-end and reports back on every send:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;What it tells you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;message.send_success&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The recipient's server accepted the message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;message.send_failed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The send died first — outbound rule block, policy limit, or deliverability gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;message.bounce_detected&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The remote server bounced it, hard or soft&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pipe those three into whatever metrics system you already run and you have per-message delivery telemetry for an autonomous sender. A climbing &lt;code&gt;send_failed&lt;/code&gt; count is your earliest signal that something upstream — a rule, a quota, a reputation problem — is throttling the agent.&lt;/p&gt;

&lt;p&gt;Two refinements. If your workflow is batch rather than real-time, you don't need webhooks at all for the input side — &lt;code&gt;GET /messages&lt;/code&gt; with &lt;code&gt;received_after&lt;/code&gt; polls fine; webhooks earn their keep in near-real-time agent loops. And aggregate the outbound triggers per &lt;em&gt;domain&lt;/em&gt;, not just per grant: sender reputation is shared across every Agent Account on a given domain, so one misbehaving agent's bounce rate quietly degrades its siblings' deliverability. Fleet observability is a domain-level concern wearing per-account clothes.&lt;/p&gt;
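
&lt;p&gt;One way to get the per-domain view is sketched below. It assumes each webhook has already been reduced to a dict with a &lt;code&gt;type&lt;/code&gt; and a &lt;code&gt;sender&lt;/code&gt; address; those field names and both helpers are hypothetical, so adapt them to the payload you actually receive.&lt;/p&gt;

```python
from collections import Counter

OUTBOUND = {"message.send_success", "message.send_failed", "message.bounce_detected"}


def tally_by_domain(events):
    """Count outbound triggers per (sender domain, trigger) pair."""
    counts = Counter()
    for event in events:
        if event["type"] not in OUTBOUND:
            continue  # ignore inbound and unrelated triggers
        domain = event["sender"].rsplit("@", 1)[-1].lower()
        counts[(domain, event["type"])] += 1
    return counts


def failure_rate(counts, domain):
    """Share of a domain's outbound events that were failures or bounces."""
    total = sum(n for (d, _), n in counts.items() if d == domain)
    bad = total - counts[(domain, "message.send_success")]
    return bad / total if total else 0.0
```

&lt;p&gt;Keying on the domain rather than the grant means one noisy agent shows up in the number its siblings share.&lt;/p&gt;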

&lt;h2&gt;
  
  
  State you can read: folders
&lt;/h2&gt;

&lt;p&gt;An agent mailbox comes with six system folders — &lt;code&gt;inbox&lt;/code&gt;, &lt;code&gt;sent&lt;/code&gt;, &lt;code&gt;drafts&lt;/code&gt;, &lt;code&gt;trash&lt;/code&gt;, &lt;code&gt;junk&lt;/code&gt;, &lt;code&gt;archive&lt;/code&gt; — and they double as a state machine you can inspect. &lt;code&gt;junk&lt;/code&gt; shows you what spam filtering and &lt;code&gt;mark_as_spam&lt;/code&gt; rules decided to divert; if real customer mail is landing there, you'll see it by listing one folder. Custom folders extend the pattern: rules that route invoices or VIP senders into named folders turn "what kind of mail is the agent getting?" into a folder-counts query.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;drafts&lt;/code&gt; folder earns special mention in human-in-the-loop designs. If your agent proposes replies as drafts and a reviewer approves them, the drafts folder &lt;em&gt;is&lt;/em&gt; your approval queue — its count is your queue depth, and a draft that's been sitting there for hours is a stalled approval you can detect with a folder listing.&lt;/p&gt;
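
&lt;p&gt;Detecting a stalled approval is a small filter over a folder listing. This sketch assumes each draft is a dict with an &lt;code&gt;id&lt;/code&gt; and a &lt;code&gt;created_at&lt;/code&gt; epoch timestamp; the field names, the four-hour default, and the function name are all illustrative.&lt;/p&gt;

```python
def stalled_drafts(drafts, now, max_age_hours=4.0):
    """Return IDs of drafts that have waited at least max_age_hours for review."""
    limit = max_age_hours * 3600  # threshold in seconds
    return [d["id"] for d in drafts if now - d["created_at"] >= limit]
```

&lt;p&gt;Queue depth is then &lt;code&gt;len(drafts)&lt;/code&gt;, and a non-empty result from &lt;code&gt;stalled_drafts&lt;/code&gt; is the alert condition.&lt;/p&gt;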

&lt;p&gt;The governance layer is observable too. Every rule that fires on inbound mail is logged as a rule evaluation you can fetch afterward — so "why did the agent never see that message?" has a queryable answer (a &lt;code&gt;block&lt;/code&gt; rule rejected it at the SMTP layer) instead of a shrug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The audit log: sent mail
&lt;/h2&gt;

&lt;p&gt;Here's the primitive that ordinary agent architectures genuinely lack. Every action this agent takes in the world is an email, and every email it sends is preserved in &lt;code&gt;sent&lt;/code&gt; — addressed, timestamped, threaded to its context. The audit log can't drift from reality because it &lt;em&gt;is&lt;/em&gt; reality.&lt;/p&gt;

&lt;p&gt;Threading makes the log legible. Replies group by standard RFC 5322 headers, so reviewing an incident means fetching one thread and reading the whole exchange in order — what came in, what the agent said, what came back.&lt;/p&gt;

&lt;p&gt;And because IMAP access exposes the identical mailbox the API sees, a non-engineer can audit the agent by opening it in Outlook or Apple Mail. There's no separation between protocol traffic and API traffic: one mailbox, one record, two ways to read it. Try giving your compliance team that kind of access to your LLM's tool-call logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Tuesday-morning incident, walked through
&lt;/h2&gt;

&lt;p&gt;Here's how the three primitives compose under pressure. Tuesday, 9:40 a.m.: your dashboard shows the support agent's reply rate dropped to zero overnight, but inbound volume looks normal. Where do you look?&lt;/p&gt;

&lt;p&gt;First, the event stream. &lt;code&gt;message.created&lt;/code&gt; events are still arriving, so mail is landing and input is healthy. But &lt;code&gt;message.send_failed&lt;/code&gt; started climbing at 11 p.m. The agent has been drafting replies and failing to deliver them for more than ten hours.&lt;/p&gt;

&lt;p&gt;Second, the governance record. A send that fails before it reaches the recipient is typically an outbound rule block or a policy limit, and rule evaluations are logged per grant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; GET &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/grants/&amp;lt;GRANT_ID&amp;gt;/rule-evaluations"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The evaluations show which rule matched and what action it took. In this story, a recently enabled outbound rule is matching more broadly than intended.&lt;/p&gt;

&lt;p&gt;Third, the mailbox itself. Fetch the affected threads and read them in order — what came in, what the agent tried to say, where it stopped. Total diagnostic surface: one webhook chart, one API call, one folder read. No log aggregator, no trace sampling, and the postmortem writes itself from artifacts that can't disagree with each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  The blind spots
&lt;/h2&gt;

&lt;p&gt;Honest limits, so you don't design around capabilities that aren't there. Native open and click tracking — &lt;code&gt;message.opened&lt;/code&gt;, &lt;code&gt;message.link_clicked&lt;/code&gt; — isn't emitted for messages sent through the API on these accounts, so "did a human read it?" is not an observable event; delivery signals are where your visibility ends. And a &lt;code&gt;send_success&lt;/code&gt; only means the recipient's server accepted the message — recipient-side filtering afterward is invisible to you, as it is to every sender on earth.&lt;/p&gt;

&lt;p&gt;There's also a subtler gap: the webhook stream tells you what happened, not why the agent chose it. Mailbox observability covers actions; you still own logging the reasoning (prompts, classifications, decisions) that produced each send.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wire the outbound three first
&lt;/h2&gt;

&lt;p&gt;If you instrument nothing else this sprint, subscribe to &lt;code&gt;message.send_success&lt;/code&gt;, &lt;code&gt;message.send_failed&lt;/code&gt;, and &lt;code&gt;message.bounce_detected&lt;/code&gt; and chart them. Input observability fails loud — the agent stops responding and someone notices. Output observability fails quiet: the agent keeps cheerfully sending into a rising failure rate, and the webhook stream is how you find out in minutes instead of weeks.&lt;/p&gt;

&lt;p&gt;What's on your email-agent dashboard today — and if the answer is "nothing yet," which of the three primitives would you wire up first?&lt;/p&gt;

</description>
      <category>observability</category>
      <category>ai</category>
      <category>email</category>
      <category>devops</category>
    </item>
    <item>
      <title>Human-in-the-Loop Design for Email Agents</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 17:17:47 +0000</pubDate>
      <link>https://dev.to/qasim157/human-in-the-loop-design-for-email-agents-3fhc</link>
      <guid>https://dev.to/qasim157/human-in-the-loop-design-for-email-agents-3fhc</guid>
      <description>&lt;p&gt;A refund request lands in your support agent's queue. The knowledge-base match comes back at 0.91 confidence — comfortably above your drafting threshold, with a clean article on refund policy attached. The agent should still not send that reply. If that sentence sounds wrong to you, this post is the argument for why it's right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Autonomy is a dial, not a switch
&lt;/h2&gt;

&lt;p&gt;Most teams frame human-in-the-loop as a binary: either the agent sends email on its own or a human approves everything. The framing fails in both directions. Full-auto on everything means the one bad reply ends up screenshotted in a board deck; full-review on everything means you've built an expensive draft generator that saves nobody time.&lt;/p&gt;

&lt;p&gt;The better model is a dial set &lt;em&gt;per message type&lt;/em&gt;, not per agent. The same support agent can run fully autonomous on order-status lookups, draft-and-approve on billing questions, and hands-off-escalate on anything with legal weight. The &lt;a href="https://developer.nylas.com/docs/cookbook/agents/email-support-agent/" rel="noopener noreferrer"&gt;email support agent recipe&lt;/a&gt; implements exactly this, with two independent gates deciding where the dial sits for each message.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gate one: confidence
&lt;/h2&gt;

&lt;p&gt;The first gate is mechanical — how sure is the knowledge-base match? The recipe's thresholds:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Confidence&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;≥ 0.85&lt;/td&gt;
&lt;td&gt;Draft directly from the matched article&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.60 to &amp;lt; 0.85&lt;/td&gt;
&lt;td&gt;Draft conservatively, cite the source article inline so the reviewer can verify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 0.60&lt;/td&gt;
&lt;td&gt;Don't draft — flag for manual review with the best-guess article attached&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The middle tier is the clever part. Citation-required drafts let reviewers calibrate their scrutiny: trust the high-confidence pile, check the cited ones, write the low-confidence ones themselves. That's what keeps review from becoming a second full-time job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gate two: risk — and it overrides
&lt;/h2&gt;

&lt;p&gt;The second gate is about consequences, and it doesn't care what gate one said:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low&lt;/strong&gt; — password resets, FAQ-shaped questions → draft, human approves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium&lt;/strong&gt; — refunds, account changes, anything touching billing → draft, human approves with extra scrutiny.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High&lt;/strong&gt; — legal threats, regulatory matters, fraud reports → no draft at all; escalate immediately to a person with full context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why the 0.91-confidence refund reply doesn't go out: refunds are medium-risk regardless of match quality. Confidence measures "do we know the answer?"; risk measures "what happens if we're wrong?" — orthogonal questions, and conflating them is how an agent ends up committing your company to something. There's a widely cited airline-chatbot ruling about precisely that failure: the bot promised a refund policy that didn't exist, and the company was held to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The choke point
&lt;/h2&gt;

&lt;p&gt;In code, the whole policy is small:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;classify_risk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;escalate_to_human&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high-risk topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.60&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;flag_for_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_draft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cite_inline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;queue_for_approval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's absent: a send call. &lt;code&gt;queue_for_approval&lt;/code&gt; is the choke point — in production it drops drafts into Slack or a review tool, never directly into the outbox. The recipe states the load-bearing constraint outright: always show the draft before sending, never auto-send. Every other rule can be tuned; remove that one and you no longer have a gate, you have a delay.&lt;/p&gt;

&lt;p&gt;If you're giving the agent its own mailbox to run this from, Agent Accounts — currently in beta — are the natural home for a &lt;code&gt;support@&lt;/code&gt; identity the agent owns end-to-end. They also give the choke point a native implementation: the Drafts API supports full CRUD, so the agent can create a draft &lt;em&gt;in its own mailbox&lt;/em&gt;, a reviewer can amend it, and approval sends the existing draft — the send action on a draft behaves exactly like a regular send. The pending reply lives where the conversation lives, instead of in a screenshot pasted into Slack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Declaring the policy instead of coding it
&lt;/h2&gt;

&lt;p&gt;The same gates work as configuration. If the agent runs on a skill-file platform, the recipe shows the whole policy as a &lt;code&gt;SKILL.md&lt;/code&gt; rather than code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Support agent&lt;/span&gt;

&lt;span class="gu"&gt;## Reply style&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Replies are under 120 words.
&lt;span class="p"&gt;-&lt;/span&gt; Cite KB articles inline: &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;KB-1234&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://kb.example.com/1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Match the tone of the inbound message.

&lt;span class="gu"&gt;## Drafting rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always show the draft before sending. Never auto-send.
&lt;span class="p"&gt;-&lt;/span&gt; If confidence &amp;lt; 0.6, do not draft — flag for human.
&lt;span class="p"&gt;-&lt;/span&gt; Refunds, account changes, legal threats: never draft. Escalate.

&lt;span class="gu"&gt;## Polling&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Check the support inbox every 10 minutes.
&lt;span class="p"&gt;-&lt;/span&gt; Process at most 5 tickets per cycle while the agent is in shakedown.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the policy reads like an onboarding doc for a junior hire, which is the right intuition. The drafting rules section is the dial from earlier, written in plain English, though set one notch stricter: this file escalates refunds and account changes outright, where the tier list above drafts them for approval with extra scrutiny. "Always show the draft before sending" appears here too because it survives every refactor: it's a property of the system, not of any particular implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loosening the dial, slowly
&lt;/h2&gt;

&lt;p&gt;Human-in-the-loop isn't a permanent tax; it's how you earn the data to automate more. The recipe's shakedown numbers: process 5 tickets per cycle while tuning the matcher and risk classifier, bump to 20 once the false-positive rate is acceptable. Polling every 5–15 minutes is plenty for support latency — and simpler than webhook fan-out on a shared inbox.&lt;/p&gt;

&lt;p&gt;Two practices make the loosening defensible. Log everything — every classification, KB lookup, and approval decision — so when you propose moving a message type from draft-and-approve to full-auto, you argue from a thousand logged approvals, not vibes. And track what the agent &lt;em&gt;can't&lt;/em&gt; match: those tickets are your map of missing KB articles.&lt;/p&gt;
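
&lt;p&gt;Arguing from logged approvals can be as simple as the sketch below. It assumes each logged decision is a dict with a message &lt;code&gt;type&lt;/code&gt; and an &lt;code&gt;outcome&lt;/code&gt;, with &lt;code&gt;approved_unchanged&lt;/code&gt; meaning the reviewer sent the draft as written. The field names, outcome labels, and thresholds are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


def approval_rates(decisions):
    """Map each message type to (reviewed count, share approved without edits)."""
    totals = defaultdict(int)
    clean = defaultdict(int)
    for d in decisions:
        totals[d["type"]] += 1
        if d["outcome"] == "approved_unchanged":
            clean[d["type"]] += 1
    return {t: (n, clean[t] / n) for t, n in totals.items()}


def promotable(rates, min_reviews=1000, min_rate=0.99):
    """Message types with enough history and a high enough clean-approval rate."""
    return sorted(t for t, (n, r) in rates.items() if n >= min_reviews and r >= min_rate)
```

&lt;p&gt;Requiring both a minimum review count and a minimum rate keeps a message type with three lucky approvals from qualifying for full-auto.&lt;/p&gt;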

&lt;p&gt;The counterargument deserves its hearing: review fatigue is real, and a human rubber-stamping 200 drafts a day is barely a gate. Two answers. First, tier aggressively — automate the genuinely safe tiers so human attention concentrates where it changes outcomes, and batch the repetitive stuff: when three "where's my receipt?" tickets arrive in a row, that's one KB article, one draft template, one reviewer pass, not three. Second, remember what the gate is actually defending against. The recipe puts it bluntly: even at 99% accuracy, the 1% that makes legal commitments destroys trust faster than the 99% builds it. Fatigue is a workload-design problem with workload-design fixes. The 1% is not.&lt;/p&gt;

&lt;p&gt;Map your own agent's message types into the three risk tiers this week, and set the dial per tier. Which message type are you currently over-reviewing — and which one, honestly, shouldn't be on full-auto?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>ux</category>
      <category>email</category>
    </item>
    <item>
      <title>The Lifecycle of an Agent Identity: Provision to Teardown</title>
      <dc:creator>Qasim Muhammad</dc:creator>
      <pubDate>Tue, 16 Jun 2026 17:17:43 +0000</pubDate>
      <link>https://dev.to/qasim157/the-lifecycle-of-an-agent-identity-provision-to-teardown-5c40</link>
      <guid>https://dev.to/qasim157/the-lifecycle-of-an-agent-identity-provision-to-teardown-5c40</guid>
      <description>&lt;p&gt;Every infrastructure team has a graveyard. Service accounts nobody remembers creating. API keys that outlived their project by three years. Credentials that still work and absolutely shouldn't. Email identities for AI agents are about to join that graveyard in large numbers — unless you treat them like infrastructure with a lifecycle: provisioned deliberately, auditable while alive, destroyed on purpose.&lt;/p&gt;

&lt;p&gt;Here's that lifecycle, stage by stage, using &lt;a href="https://developer.nylas.com/docs/v3/agent-accounts/provisioning/" rel="noopener noreferrer"&gt;Agent Accounts&lt;/a&gt; (currently in beta) as the concrete machinery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Birth: a domain, then an address
&lt;/h2&gt;

&lt;p&gt;An agent identity starts before the agent does, with a domain decision. Two options: a &lt;code&gt;*.nylas.email&lt;/code&gt; trial domain that works instantly with zero DNS setup, or your own domain — registered once per organization, verified by publishing the MX record (inbound routing) and TXT records (ownership proof plus SPF/DKIM for outbound) at your DNS provider. Verification happens automatically once records propagate; the domain status flips to &lt;code&gt;verified&lt;/code&gt; and it's ready to host accounts.&lt;/p&gt;

&lt;p&gt;The recommended production pattern is a dedicated subdomain like &lt;code&gt;agents.yourcompany.com&lt;/code&gt;, so the fleet's sender reputation stays isolated from your primary domain. Reputation is a domain-level asset; don't let an experimental agent spend it.&lt;/p&gt;

&lt;p&gt;With a domain ready, birth is one call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nylas agent account create sales-agent@agents.yourcompany.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prefer raw API? &lt;code&gt;POST /v3/connect/custom&lt;/code&gt; with &lt;code&gt;"provider": "nylas"&lt;/code&gt; does the same job. Unlike OAuth providers, no refresh token is involved, just an address on a registered domain. A successful call returns a grant like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5967ca40-a2d8-4ee0-a0e0-6f18ace39a90"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"b1c2d3e4-5678-4abc-9def-0123456789ab"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nylas"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"grant_status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales-agent@agents.yourcompany.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1742932766&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response includes the &lt;code&gt;grant_id&lt;/code&gt; (&lt;code&gt;data.id&lt;/code&gt;) — the identity's handle for everything that follows. Two birth-time decisions worth making consciously rather than by default: &lt;strong&gt;placement&lt;/strong&gt; (pass a &lt;code&gt;workspace_id&lt;/code&gt; so the account inherits that workspace's policy limits and rules; omit it and the account is auto-grouped by domain or dropped into the application default) and &lt;strong&gt;protocol access&lt;/strong&gt; (an optional &lt;code&gt;app_password&lt;/code&gt; — 18–40 printable ASCII characters with at least one uppercase, one lowercase, and one digit — enables IMAP/SMTP so humans can open the mailbox in Outlook or Apple Mail; it's bcrypt-hashed on write and can never be retrieved, only reset).&lt;/p&gt;
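
&lt;p&gt;Because a rejected &lt;code&gt;app_password&lt;/code&gt; costs a round trip, it can be worth checking the shape client-side first. This sketch encodes only the constraints stated above. Treating "printable ASCII" as code points 0x20 through 0x7E is an assumption, and &lt;code&gt;valid_app_password&lt;/code&gt; is a hypothetical helper, not part of any SDK.&lt;/p&gt;

```python
def valid_app_password(pw: str) -> bool:
    """Check the documented shape: 18-40 printable ASCII chars, mixed case, a digit."""
    if len(pw) not in range(18, 41):
        return False
    # Assumption: "printable ASCII" means code points 0x20 through 0x7E.
    if any(ord(ch) not in range(0x20, 0x7F) for ch in pw):
        return False
    return (
        any(ch.isupper() for ch in pw)
        and any(ch.islower() for ch in pw)
        and any(ch.isdigit() for ch in pw)
    )
```

&lt;p&gt;The server remains the authority; a local check like this only catches the obvious misses before the request goes out.&lt;/p&gt;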

&lt;h2&gt;
  
  
  Work: one ID, the whole surface
&lt;/h2&gt;

&lt;p&gt;While the agent lives, the &lt;code&gt;grant_id&lt;/code&gt; is its entire interface. Messages, threads, drafts, folders, attachments — all the existing endpoints work against &lt;code&gt;/v3/grants/{grant_id}/...&lt;/code&gt; exactly as they do for a human-connected account. The mailbox arrives with six system folders (&lt;code&gt;inbox&lt;/code&gt;, &lt;code&gt;sent&lt;/code&gt;, &lt;code&gt;drafts&lt;/code&gt;, &lt;code&gt;trash&lt;/code&gt;, &lt;code&gt;junk&lt;/code&gt;, &lt;code&gt;archive&lt;/code&gt;); custom folders can be added beside them.&lt;/p&gt;

&lt;p&gt;The working identity also emits a steady event stream. Inbound mail runs the workspace's rules at the SMTP stage — &lt;code&gt;block&lt;/code&gt; rejects a message before it's ever stored, &lt;code&gt;mark_as_spam&lt;/code&gt; routes it to &lt;code&gt;junk&lt;/code&gt;, &lt;code&gt;assign_to_folder&lt;/code&gt; files it — and whatever survives fires &lt;code&gt;message.created&lt;/code&gt;, identical in payload shape to the same webhook on any connected grant. Outbound, deliverability signals come back as &lt;code&gt;message.send_success&lt;/code&gt;, &lt;code&gt;message.send_failed&lt;/code&gt;, and &lt;code&gt;message.bounce_detected&lt;/code&gt;, so every send the identity makes has a verdict you can record. An identity that's alive is an identity that's &lt;em&gt;telling you things&lt;/em&gt;; silence on those triggers is itself a signal worth alerting on.&lt;/p&gt;

&lt;p&gt;The fleet-management view matters as much as the single-account view. The CLI covers it without a dashboard visit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nylas agent account list &lt;span class="nt"&gt;--json&lt;/span&gt;   &lt;span class="c"&gt;# inventory&lt;/span&gt;
nylas agent status                &lt;span class="c"&gt;# connector readiness&lt;/span&gt;
nylas agent policy list           &lt;span class="c"&gt;# what governs whom&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That inventory command is your defense against the graveyard. If &lt;code&gt;agent account list&lt;/code&gt; returns something nobody can explain, you've found a zombie.&lt;/p&gt;

&lt;h2&gt;
  
  
  Audit: the mailbox is its own record
&lt;/h2&gt;

&lt;p&gt;Here's where email identities beat most service credentials: the audit trail is built into the artifact. Every message the agent ever sent sits in its sent folder; every conversation is a thread you can fetch and read. The provisioning docs' verification step — send a test message in, then list the mailbox — doubles as the ongoing health check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; GET &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.us.nylas.com/v3/grants/&amp;lt;GRANT_ID&amp;gt;/messages?limit=5"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;NYLAS_API_KEY&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And because IMAP access shows the identical mailbox the API sees, a human reviewer can audit an agent by literally opening its mail. No log pipeline required for the first pass. The governance layer keeps its own books too: every rule that fires on inbound mail is logged as a rule evaluation you can query later, so "why did this message end up in junk?" has a recorded answer instead of a shrug.&lt;/p&gt;

&lt;p&gt;One honest caveat before you lean your compliance story on this: the mailbox-as-audit-log has a retention horizon. On the free plan, inbox mail is retained for 30 days and spam for 7. If your audit requirements are measured in years, configure retention through the workspace policy or run an export job — don't discover the horizon during an investigation.&lt;/p&gt;

&lt;p&gt;Mid-life changes don't require rebirth, either: move an account to a different workspace — different policy, different rules — with &lt;code&gt;PATCH /v3/grants/{grant_id}&lt;/code&gt; and a new &lt;code&gt;workspace_id&lt;/code&gt;. Governance evolves; the identity persists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Death: delete on purpose
&lt;/h2&gt;

&lt;p&gt;The stage everyone skips. An agent identity should die when its work ends — project shipped, test run finished, customer churned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nylas agent account delete sales-agent@agents.yourcompany.com &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The discipline that makes this painless is pairing: whatever process creates an account owns its deletion. The docs' environment-separation pattern helps here too — staging agents live on &lt;code&gt;agents.staging.yourcompany.com&lt;/code&gt;, so a sweep of stale staging identities can't touch production, and per-customer domains mean offboarding a tenant cleanly removes their agents with them.&lt;/p&gt;

&lt;p&gt;Ephemeral identities aren't a workaround; they're the design. A mailbox that exists for one CI run and is gone an hour later never joins the graveyard.&lt;/p&gt;
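
&lt;p&gt;The pairing rule can be made structural rather than procedural. The sketch below assumes you supply your own &lt;code&gt;create&lt;/code&gt; and &lt;code&gt;delete&lt;/code&gt; callables (wrapping the CLI or the API); those names, and &lt;code&gt;ephemeral_account&lt;/code&gt; itself, are hypothetical.&lt;/p&gt;

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_account(email, create, delete):
    """Provision an account for the duration of a block, then always delete it."""
    grant_id = create(email)
    try:
        yield grant_id
    finally:
        # Runs on success and on failure, so the identity never outlives its job.
        delete(grant_id)
```

&lt;p&gt;A CI job would wrap its test run in &lt;code&gt;with ephemeral_account(...) as grant_id:&lt;/code&gt;, which makes it impossible to write the create without the delete.&lt;/p&gt;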

&lt;h2&gt;
  
  
  Script both ends before you need either
&lt;/h2&gt;

&lt;p&gt;Concrete next step: before your next agent ships, write the teardown script in the same PR as the provisioning script — create and delete, side by side, tested together. Then run &lt;code&gt;nylas agent account list&lt;/code&gt; on whatever you have today.&lt;/p&gt;

&lt;p&gt;Be honest: how many identities in that list could you explain right now?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>email</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
