How Email Threading Works for AI Agents

#email #ai #architecture #api

Three headers decide whether your agent's reply lands in the right conversation or starts a confusing new one: Message-ID, In-Reply-To, and References. By the time a thread is five messages deep, the fifth message's References header carries the four Message-ID values that came before it, in order. That is an audit trail of the conversation, and mainstream mail clients use it to group messages.

Most developers never think about these headers because their mail client handles them. The moment you build an email agent, they become your problem. An agent that sends a follow-up and gets a reply three hours later needs to know which conversation that reply belongs to, what it last said, and what to do next. All of that context hangs off the threading chain.

The mechanics in one example

Every outbound email gets a globally unique Message-ID, usually generated by the sending client or server. When someone replies, their client adds two headers pointing back:

```
# The agent's outbound message
Message-ID: <abc123@agents.yourcompany.com>
Subject: Following up on your demo request

# The recipient's reply
Message-ID: <def456@gmail.com>
In-Reply-To: <abc123@agents.yourcompany.com>
References: <abc123@agents.yourcompany.com>

# The agent's follow-up
Message-ID: <ghi789@agents.yourcompany.com>
In-Reply-To: <def456@gmail.com>
References: <abc123@agents.yourcompany.com> <def456@gmail.com>
```

In-Reply-To points at the message being answered directly; References accumulates the whole chain, oldest to newest. Gmail, Outlook, Apple Mail, and Thunderbird all rely on these headers, though some combine them with other signals (Gmail, for example, also expects the subject to match). Subject-line matching on its own is a fallback, not the mechanism.
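The header rule in the paragraph above is mechanical enough to write down. Here is a minimal TypeScript sketch for the case where you compose raw headers yourself (over SMTP, say) instead of letting a platform fill them in. The names `newMessageId`, `buildReplyHeaders`, and `ParentHeaders` are hypothetical; the References logic follows RFC 5322 section 3.6.4, which also allows falling back to the parent's In-Reply-To when the parent has no References.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape: threading headers of the message being answered.
interface ParentHeaders {
  messageId: string; // e.g. "<def456@gmail.com>"
  references?: string[]; // parent's References, oldest first
  inReplyTo?: string; // used only when the parent has no References
}

// Generate a Message-ID like "<uuid@domain>". Use a domain you control so
// the right-hand side helps keep the ID globally unique.
function newMessageId(domain: string): string {
  return `<${randomUUID()}@${domain}>`;
}

// RFC 5322 section 3.6.4: References = parent's References (or, failing
// that, its In-Reply-To) followed by the parent's Message-ID.
function buildReplyHeaders(parent: ParentHeaders): {
  inReplyTo: string;
  references: string;
} {
  const chain: string[] = parent.references?.length
    ? [...parent.references]
    : parent.inReplyTo
      ? [parent.inReplyTo]
      : [];
  chain.push(parent.messageId);
  return { inReplyTo: parent.messageId, references: chain.join(" ") };
}
```

Given the recipient's reply from the example (`messageId: "<def456@gmail.com>"`, `references: ["<abc123@agents.yourcompany.com>"]`), `buildReplyHeaders` returns exactly the In-Reply-To and References shown on the agent's follow-up.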

Why subject matching betrays you

Plenty of agent implementations match replies by checking for Re: plus the original subject. It works in the demo and fails in production, for three common reasons:

Recipients edit subjects. "Q3 budget review" comes back as "Re: Q3 budget review — updated numbers attached."
Subjects collide. Two prospects both received "Following up on your demo request." A reply to either matches both.
Forwards lie. A recipient forwards the thread to a colleague who replies — same subject, completely different conversation context.

Headers reference specific Message-ID values, not human-editable text, so none of these break them. Match on headers first; fall back to subject only when headers are missing, which in practice means very old or non-compliant mail clients.
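A sketch of that policy in TypeScript, assuming you keep an index from every Message-ID the agent sent to an internal conversation key. `InboundMessage`, `matchConversation`, and `normalizeSubject` are hypothetical names, and the prefix stripping only handles English `Re:`/`Fw:`/`Fwd:`. The subject fallback refuses to guess when more than one conversation shares a subject, which is the collision case above.

```typescript
// Hypothetical inbound shape; field names are illustrative, not a specific API.
interface InboundMessage {
  inReplyTo?: string;
  references?: string[]; // oldest first
  subject: string;
}

// Strip stacked reply/forward prefixes so "RE: Fwd: Foo" compares as "foo".
function normalizeSubject(subject: string): string {
  return subject
    .replace(/^(\s*(re|fwd?)\s*:\s*)+/i, "")
    .trim()
    .toLowerCase();
}

function matchConversation(
  msg: InboundMessage,
  sent: Map<string, string>, // agent-sent Message-ID -> conversation key
  subjectIndex: Map<string, string[]>, // normalized subject -> conversation keys
): { key: string; via: "headers" | "subject" } | undefined {
  // 1. Headers first: In-Reply-To, then References from newest to oldest.
  const candidates = [
    ...(msg.inReplyTo ? [msg.inReplyTo] : []),
    ...[...(msg.references ?? [])].reverse(),
  ];
  for (const id of candidates) {
    const key = sent.get(id);
    if (key) return { key, via: "headers" };
  }
  // 2. Subject fallback only when the message has no threading headers at
  //    all, and only when exactly one conversation matches.
  if (candidates.length === 0) {
    const keys = subjectIndex.get(normalizeSubject(msg.subject)) ?? [];
    if (keys.length === 1) return { key: keys[0], via: "subject" };
  }
  return undefined; // unknown: treat as a new conversation
}
```

A message whose headers point somewhere unknown (a reply to a forward, for instance) returns `undefined` rather than falling through to subject matching, so it is treated as a new conversation.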

What the platform handles for you

With a Nylas Agent Account (the hosted-mailbox product, currently in beta), you don't manage any of this by hand. The threading guide describes three paths, two outbound and one inbound, and all of them preserve the chain:

API sends: pass reply_to_message_id on POST /v3/grants/{grant_id}/messages/send, and Nylas looks up the original message's Message-ID and populates In-Reply-To and References automatically.
SMTP submission (port 465 or 587): headers a mail client sets are preserved exactly as sent.
Inbound: full headers are stored on arrival. Pull them with fields=include_headers, or use fields=include_basic_headers to get just the three threading headers — a much smaller payload, since the full header set is often larger than the message body itself.
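If you do pull headers yourself, the three threading values need a little parsing: header names are case-insensitive, and References can be folded across lines. This sketch assumes the headers arrive as an array of `{ name, value }` pairs; check the message schema in the API reference for the exact shape. `extractThreadingHeaders` and the interfaces are hypothetical.

```typescript
// Assumed shape of one header entry; verify against the API reference.
interface HeaderEntry {
  name: string;
  value: string;
}

interface ThreadingHeaders {
  messageId?: string;
  inReplyTo?: string;
  references: string[]; // oldest first
}

// Pull every <...> identifier out of a header value. References may be
// folded across lines, so match the IDs rather than splitting on spaces.
function parseIds(value: string): string[] {
  return value.match(/<[^<>\s]+>/g) ?? [];
}

function extractThreadingHeaders(headers: HeaderEntry[]): ThreadingHeaders {
  const find = (name: string) =>
    headers.find((h) => h.name.toLowerCase() === name)?.value;
  const messageId = find("message-id");
  const inReplyTo = find("in-reply-to");
  const references = find("references");
  return {
    messageId: messageId ? parseIds(messageId)[0] : undefined,
    // In-Reply-To may list several IDs; this sketch keeps only the first.
    inReplyTo: inReplyTo ? parseIds(inReplyTo)[0] : undefined,
    references: references ? parseIds(references) : [],
  };
}
```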

Even mixed traffic stays coherent: if the agent sends via the API and a human later replies through a mail client connected over IMAP and SMTP, the Threads API groups everything by the header chain, not by how each message was sent.
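Grouping by the header chain is, at its core, a connected-components problem: any two messages linked through a Message-ID, In-Reply-To, or References value belong to the same conversation. The union-find sketch below illustrates that idea only and makes no claim about how the Threads API works internally; fuller algorithms (such as Jamie Zawinski's threading algorithm) also rebuild parent/child order. `Msg` and `groupIntoThreads` are hypothetical.

```typescript
interface Msg {
  messageId: string;
  inReplyTo?: string;
  references: string[];
}

// Union every ID a message mentions (its own, In-Reply-To, References)
// into one set, then group messages by the root of their own Message-ID.
function groupIntoThreads(messages: Msg[]): Msg[][] {
  const parent = new Map<string, string>();

  const find = (id: string): string => {
    if (!parent.has(id)) parent.set(id, id);
    let root = id;
    while (parent.get(root) !== root) root = parent.get(root)!;
    // Path compression: point every node on the path straight at the root.
    let node = id;
    while (node !== root) {
      const next = parent.get(node)!;
      parent.set(node, root);
      node = next;
    }
    return root;
  };

  const union = (a: string, b: string) => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  };

  for (const m of messages) {
    const linked = [...m.references, ...(m.inReplyTo ? [m.inReplyTo] : [])];
    for (const id of linked) union(m.messageId, id);
  }

  // Collect groups in order of first appearance, preserving input order.
  const groups = new Map<string, Msg[]>();
  for (const m of messages) {
    const root = find(m.messageId);
    const group = groups.get(root);
    if (group) group.push(m);
    else groups.set(root, [m]);
  }
  return [...groups.values()];
}
```

Because an ID that only appears inside References still gets its own node, two replies to a parent the agent never received still end up in the same group.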

thread_id is your primary key

Rather than parsing headers, lean on the Threads API. Every message.created webhook includes a thread_id; one GET returns the conversation:

```bash
curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/threads/<THREAD_ID>" \
  --header "Authorization: Bearer $NYLAS_API_KEY"
```

The thread object carries message_ids in order, participants, latest_message_received_date, a snippet, and routing metadata like unread and folders. The docs recommend treating thread_id as the primary key for conversation context — it's more stable than raw headers because it's platform-assigned and covers the whole conversation, not one message.

When the agent needs the actual words, not just the structure, reconstruct the conversation from the ID list:

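```typescript
import Nylas from "nylas";

const nylas = new Nylas({ apiKey: process.env.NYLAS_API_KEY! });
const AGENT_GRANT_ID = process.env.AGENT_GRANT_ID!; // assumed env var: the agent mailbox's grant

// Call after receiving a message.created webhook, e.g.
// loadConversation(message.thread_id).
async function loadConversation(threadId: string) {
  const thread = await nylas.threads.find({
    identifier: AGENT_GRANT_ID,
    threadId,
  });

  // thread.data.messageIds has the full conversation chain. Promise.all
  // keeps that order; note it costs one request per message.
  const responses = await Promise.all(
    thread.data.messageIds.map((id) =>
      nylas.messages.find({ identifier: AGENT_GRANT_ID, messageId: id }),
    ),
  );
  return responses.map((r) => r.data);
}
```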

That ordered list of full messages is exactly the shape you want to feed an LLM as conversation history.
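Turning that list into turns for a chat model takes one mapping step. A minimal sketch, assuming each message exposes `from`, `body`, and `date` (Unix seconds), roughly as the SDK's message objects do; `EmailMessage`, `ChatTurn`, and `toChatHistory` are hypothetical, and the `user`/`assistant` roles follow a convention many chat APIs share.

```typescript
// Minimal message shape; an assumption, not the SDK's full type.
interface EmailMessage {
  from: { email: string; name?: string }[];
  body: string; // often HTML; strip or convert before prompting (not shown)
  date: number; // Unix seconds
}

interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Messages sent by the agent become "assistant" turns; everything else is a
// "user" turn, prefixed with the sender so multi-party threads stay legible.
// Sorting by date makes the result independent of fetch order.
function toChatHistory(messages: EmailMessage[], agentEmail: string): ChatTurn[] {
  const agent = agentEmail.toLowerCase();
  return [...messages]
    .sort((a, b) => a.date - b.date)
    .map((m): ChatTurn => {
      const sender = m.from[0]?.email.toLowerCase() ?? "";
      return sender === agent
        ? { role: "assistant", content: m.body }
        : { role: "user", content: `From: ${sender}\n\n${m.body}` };
    });
}
```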

Connecting threads to what the agent was doing

Threading tells you which messages belong together. It can't tell you which task the conversation belongs to — that mapping lives in your application:

On outbound: store the returned message_id and thread_id against your internal state — session ID, CRM deal, support ticket, workflow step.
On inbound: when the webhook fires, look up thread_id. A hit means a reply to something the agent sent; restore context and continue. A miss means a brand-new conversation; classify and route it.

In code, the mapping is small:

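```typescript
// Illustrative in-memory store keyed by thread_id; persist it in a database
// in production. sentMessage is the data returned by the send call,
// inboundMessage is the message from the webhook, currentSession and
// currentTask are your agent's state, and resumeTask / triageNewMessage
// are your application's own functions.
interface ThreadContext {
  sessionId: string;
  taskId: string;
  step: string;
  sentAt: number;
}
const threadState = new Map<string, ThreadContext>();

// After sending:
threadState.set(sentMessage.threadId, {
  sessionId: currentSession.id,
  taskId: currentTask.id,
  step: "awaiting_reply",
  sentAt: Date.now(),
});

// On webhook:
const context = threadState.get(inboundMessage.threadId);
if (context) {
  await resumeTask(context.taskId, inboundMessage); // reply: restore and continue
} else {
  await triageNewMessage(inboundMessage); // new conversation: classify and route
}
```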

Keep that mapping in a database, not in memory — email conversations span hours and days, and an in-memory map doesn't survive a restart. Two more edge cases from the docs worth designing for: a single outbound message can draw multiple replies (don't send duplicate responses), and dormant threads come back — someone may answer a three-week-old message after your state TTL expired. Decide upfront whether the agent re-reads the thread history, escalates to a human, or starts fresh.
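Those edge cases reduce to a small routing decision on each inbound message. This hypothetical sketch extends the mapping above with a `"replied"` step; `routeInbound`, `Route`, and the `processed` set are illustrative names, `threadMessageCount` would come from the length of the thread's `message_ids`, and in production both the context and the processed set would live in the database.

```typescript
// Hypothetical per-thread state, as stored by the thread_id mapping.
interface ThreadContext {
  taskId: string;
  step: "awaiting_reply" | "replied";
  sentAt: number;
}

type Route =
  | "ignore_redelivery" // this inbound message was already handled
  | "resume" // first reply to something the agent is waiting on
  | "append_only" // another reply to an outbound the agent already answered
  | "dormant_thread" // thread has history but the stored state expired
  | "triage_new"; // genuinely new conversation

function routeInbound(
  inboundMessageId: string,
  context: ThreadContext | undefined,
  threadMessageCount: number,
  processed: Set<string>,
): Route {
  // Never act twice on the same inbound message.
  if (processed.has(inboundMessageId)) return "ignore_redelivery";
  processed.add(inboundMessageId);

  if (context) {
    return context.step === "awaiting_reply" ? "resume" : "append_only";
  }
  // No stored state, but earlier messages exist: a dormant conversation whose
  // state expired. Re-read history or escalate instead of starting cold.
  return threadMessageCount > 1 ? "dormant_thread" : "triage_new";
}
```

The caller is responsible for moving the stored step to `"replied"` once the agent answers; that is what turns a second reply to the same outbound message into `append_only` instead of a duplicate response.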

Closing the loop: the in-thread reply

The send side mirrors the receive side. One field keeps the agent's response in the conversation:

```bash
curl --request POST \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages/send" \
  --header "Authorization: Bearer $NYLAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "reply_to_message_id": "<MESSAGE_ID>",
    "to": [{ "email": "alice@example.com" }],
    "subject": "Re: Following up on your demo request",
    "body": "Thanks for getting back to me, Alice. Here are the next steps..."
  }'
```

Nylas sets In-Reply-To and References on the way out, so the reply threads correctly in the recipient's client. It also lands in the same thread in the agent's own mailbox, so the next webhook-triggered read sees a complete, ordered conversation.
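If you build that request body in code, a small helper keeps replies consistent. The sketch below assumes the inbound message object carries an `id`, `subject`, and `from` list; it replies to the original sender only (ignoring Reply-To and CC for brevity) and avoids stacking `Re: Re:` prefixes, which keeps the subject stable for clients that compare subjects. `buildReplyBody`, `replySubject`, and the interfaces are hypothetical; the body fields are the ones from the curl example.

```typescript
// Minimal inbound shape; field names are assumptions for illustration.
interface InboundEmail {
  id: string; // platform message ID, passed back as reply_to_message_id
  subject: string;
  from: { email: string; name?: string }[];
}

// Body for POST /v3/grants/{grant_id}/messages/send, as in the curl example.
interface SendReplyBody {
  reply_to_message_id: string;
  to: { email: string; name?: string }[];
  subject: string;
  body: string;
}

// Add a single "Re: " unless the subject already starts with one.
function replySubject(subject: string): string {
  const trimmed = subject.trim();
  return /^re\s*:/i.test(trimmed) ? trimmed : `Re: ${trimmed}`;
}

function buildReplyBody(inbound: InboundEmail, text: string): SendReplyBody {
  return {
    reply_to_message_id: inbound.id,
    to: inbound.from.slice(0, 1), // original sender only
    subject: replySubject(inbound.subject),
    body: text,
  };
}
```

Serialize the result with `JSON.stringify` and send it with the same headers as the curl example.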

Next step: wire up a message.created webhook, send yourself a message from an agent mailbox, reply from your phone, and log the thread_id round-trip. Once you've watched one conversation thread correctly end to end, the handle-replies recipe turns it into a production loop.