DEV Community

Qasim

Keep your AI agent's email replies in the right thread

An AI agent sends an email, a reply lands three hours later, and the agent has to answer two questions before it can do anything useful: which conversation is this, and what did I last say? Get the first one wrong and the agent's reply shows up in the recipient's inbox as a brand-new message instead of slotting into the existing thread. To the person on the other end, that looks broken — like the agent forgot the conversation it started.

Threading is the part of agent email that's easy to get almost right and quietly wrong. The fix lives in a few email headers most developers never touch, and in the Threads API that groups messages into conversations for you. This post walks through both, from two angles: the HTTP API for your backend, and the Nylas CLI for the terminal. I work on the CLI, so the terminal commands below are the ones I reach for when I'm testing a reply loop.

The three headers that make threading work

Threading runs on three email headers, not on subject lines. Every message carries a Message-ID — a globally unique identifier the sending server stamps on it. When someone replies, their mail client adds In-Reply-To (the Message-ID of the message being answered) and References (the full chain of Message-ID values, oldest to newest). Those two headers are how every mail client decides which messages belong together.

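Here's what the chain looks like across one exchange. The agent's first message gets a Message-ID and, as the start of a thread, carries no In-Reply-To or References. The reply sets In-Reply-To to that Message-ID and lists it in References. The agent's follow-up sets In-Reply-To to the reply's Message-ID and lists both earlier Message-ID values in References.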


The References header grows with every message. By the time a thread is five messages deep, the next reply carries all five Message-ID values in order: a complete audit trail of the conversation that Gmail, Outlook, Apple Mail, and Thunderbird all read the same way.
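As a rough model of how the chain grows, the sketch below derives a reply's threading headers from its parent. The `buildReplyHeaders` name, the plain-object shape, and the example.com Message-ID values are all assumptions made for illustration. Nylas sets these headers for you when you pass reply_to_message_id, so this only models the rule and is not something the agent needs to run.

```javascript
// Illustrative only: models how In-Reply-To and References are derived.
// `parent` is a plain object with a messageId and an optional references array.
function buildReplyHeaders(parent) {
  // The reply's References is the parent's chain plus the parent's own ID.
  const references = [...(parent.references ?? []), parent.messageId];
  return { inReplyTo: parent.messageId, references };
}

// Hypothetical Message-ID values for a three-message exchange.
const first = { messageId: "<a1@agent.example.com>", references: [] };
const reply = {
  messageId: "<b2@mail.example.com>",
  ...buildReplyHeaders(first),
};
const followUp = {
  messageId: "<c3@agent.example.com>",
  ...buildReplyHeaders(reply),
};

console.log(followUp.inReplyTo);
console.log(followUp.references.join(" "));
```

Each message carries the IDs of everything before it, which is why a reply to an N-message thread lists N values in References.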

Why subject-line matching breaks

Matching replies by subject line is the trap most agent implementations fall into: if a subject starts with Re: and contains the original text, treat it as a reply. It works in testing and fails in production, for three concrete reasons. Subject matching has no way to tell two conversations apart when they share a subject, and no way to follow one when the subject changes.

  • Recipients edit subjects. A reply to "Q3 budget review" comes back as "Re: Q3 budget review — updated numbers attached", and a naive contains-match still works, until the edit drops the original words entirely.
  • Multiple threads share a subject. Two prospects both get "Following up on your demo request". A reply to either one matches both, and the agent can't tell which prospect answered.
  • Forwards reuse the subject. Someone forwards the thread to a colleague who replies. The subject is unchanged but the conversation context is completely different.

The header approach has none of these failure modes because In-Reply-To and References point at specific Message-ID values, not human-readable text. Match on headers first and fall back to subject only when headers are missing, which is rare enough to treat as a broken-client edge case.
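The header-first rule can be sketched roughly as follows. This is one possible shape, not a prescribed implementation: `findParent`, `normalizeSubject`, and the `sentIndex` map (Message-ID to your own record of a sent message) are hypothetical names, and the inbound object is assumed to have its headers already parsed into `inReplyTo` and `references`.

```javascript
// Sketch: resolve which known message an inbound reply answers.
// `sentIndex` is a Map of Message-ID -> your own record of a sent message.
function findParent(inbound, sentIndex) {
  // 1. In-Reply-To names the exact message being answered.
  if (inbound.inReplyTo && sentIndex.has(inbound.inReplyTo)) {
    return { match: sentIndex.get(inbound.inReplyTo), via: "in-reply-to" };
  }
  // 2. References runs oldest to newest, so walk it newest first.
  const references = inbound.references ?? [];
  for (const id of [...references].reverse()) {
    if (sentIndex.has(id)) {
      return { match: sentIndex.get(id), via: "references" };
    }
  }
  // 3. Subject fallback, only when the message has no threading headers at all.
  if (!inbound.inReplyTo && references.length === 0) {
    const bare = normalizeSubject(inbound.subject);
    const candidates = [...sentIndex.values()].filter(
      (m) => normalizeSubject(m.subject) === bare,
    );
    // An ambiguous subject match is treated as no match.
    if (candidates.length === 1) {
      return { match: candidates[0], via: "subject" };
    }
  }
  return null;
}

// Strip leading Re:/Fw:/Fwd: prefixes, then normalize case and whitespace.
function normalizeSubject(subject = "") {
  return subject.replace(/^(\s*(re|fwd?)\s*:\s*)+/i, "").trim().toLowerCase();
}
```

Refusing an ambiguous subject match is a deliberate choice here: two prospects with the same subject line would otherwise be indistinguishable, which is the failure the list above describes.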

How Nylas preserves threading

Nylas keeps the threading chain intact however a message moves through the mailbox, which means your agent never has to generate a Message-ID or hand-assemble a References header. Threading holds on both outbound paths and on inbound mail alike.

  • API sends (POST /v3/grants/{grant_id}/messages/send): pass reply_to_message_id and Nylas fetches the original's Message-ID, then sets In-Reply-To and References on the outbound message automatically. Sending an existing draft behaves the same way, since a draft send runs through the same path.
  • SMTP submission (port 465 or 587): if a human replies from a mail client connected over SMTP, Nylas preserves the Message-ID, In-Reply-To, and References the client set.
  • Inbound messages: when a reply arrives, Nylas stores the full headers. Read them with fields=include_headers for the complete set, or fields=include_basic_headers to skip the full header payload — which is often larger than the message body itself — and get just Message-ID, In-Reply-To, and References.

That consistency is the reason an agent can send via the API and have a human follow up from a mail client over SMTP without the thread splitting apart. Both paths write to the same mailbox and share the same header chain.
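If you do read the headers yourself, a small helper can pull out just the three that matter. This sketch assumes the message object exposes a `headers` array of `{ name, value }` pairs when headers are requested; that shape and the `threadingHeaders` name are assumptions here, so check them against a real response before relying on it.

```javascript
// Sketch: extract the three threading headers from a message object.
// Assumes `message.headers` is an array of { name, value } pairs.
function threadingHeaders(message) {
  // Header names are case-insensitive, so index them in lowercase.
  const byName = new Map(
    (message.headers ?? []).map((h) => [h.name.toLowerCase(), h.value]),
  );
  // References is a whitespace-separated list of <id> tokens.
  const references =
    (byName.get("references") ?? "").match(/<[^<>\s]+>/g) ?? [];
  return {
    messageId: byName.get("message-id") ?? null,
    inReplyTo: byName.get("in-reply-to") ?? null,
    references,
  };
}

// Hypothetical message, shaped like the assumption above.
const sample = {
  headers: [
    { name: "Message-ID", value: "<c3@agent.example.com>" },
    { name: "In-Reply-To", value: "<b2@mail.example.com>" },
    { name: "References", value: "<a1@agent.example.com> <b2@mail.example.com>" },
  ],
};
console.log(threadingHeaders(sample));
```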

List and fetch threads with the Threads API

Rather than parse In-Reply-To and References yourself, ask the Threads API for the grouped conversation — its message IDs, participants, and metadata — in one call. A GET against the grant's threads collection returns each thread already grouped, so the agent reads the conversation as a unit instead of stitching messages together by hand.

curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/threads?limit=10" \
  --header "Authorization: Bearer <NYLAS_API_KEY>"

Each thread object carries the fields an agent needs to decide what to do: message_ids (every message in order), participants (everyone in the conversation), latest_message_received_date and latest_message_sent_date, a snippet of the most recent message, plus subject, unread, starred, and folders. When a reply fires the message.created webhook, the payload includes a thread_id you look up here to pull the full history before responding.
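One way an agent might use those fields is a quick check of who spoke last. The sketch below is a heuristic, not an official pattern: `nextAction` is a hypothetical name, and it assumes the two date fields are numeric timestamps that can be compared directly.

```javascript
// Sketch: decide whether the agent owes a reply on a thread.
// Assumes the two date fields are numeric timestamps (or absent).
function nextAction(thread) {
  const received = thread.latest_message_received_date ?? 0;
  const sent = thread.latest_message_sent_date ?? 0;
  if (received === 0) return "wait"; // nothing inbound on this thread yet
  if (received > sent) return "reply"; // the other side spoke last
  return "wait"; // the agent spoke last
}
```

This only says whether a reply is due. What the reply should contain still depends on the full history, which is why the thread lookup comes first.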

There's one more reason to fetch the thread rather than rely on the webhook payload alone. When a message body exceeds about 1 MB, the trigger name becomes message.created.truncated and the body is omitted to keep the payload small. In that case the agent has the thread_id and message_id but not the text, so it needs a follow-up GET /v3/grants/{grant_id}/messages/{message_id} to retrieve the full body before it replies. Fetching the thread gives you the conversation's message summaries and IDs; the full body of any single message comes from the messages endpoint.
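A handler can hide the truncated case behind one function. In this sketch the event shape (`type`, `data.object`, `grant_id`, `id`) is an assumption about the webhook payload, and `fetchMessage` is a hypothetical helper you supply, for example a thin wrapper around the messages endpoint.

```javascript
// Sketch: return the message body for a webhook event, fetching it when
// the payload was truncated. `fetchMessage(grantId, messageId)` is a
// hypothetical helper that resolves to the full message object.
async function resolveBody(event, fetchMessage) {
  const message = event.data.object;
  const truncated =
    event.type === "message.created.truncated" || message.body == null;
  if (truncated) {
    const full = await fetchMessage(message.grant_id, message.id);
    return full.body;
  }
  return message.body;
}
```

Passing the fetcher in as an argument keeps the function easy to test without network access and leaves the choice of SDK or raw HTTP to the caller.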

From an SDK, fetching the thread and its messages is a couple of calls. This is the pattern I use after a webhook fires — get the thread, then reconstruct what was said:

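import Nylas from "nylas";

const nylas = new Nylas({ apiKey: process.env.NYLAS_API_KEY });

// After receiving a message.created webhook, `message` is the message
// object from the payload; it carries the thread_id to look up.
const thread = await nylas.threads.find({
  identifier: AGENT_GRANT_ID,
  threadId: message.thread_id,
});

// thread.data.messageIds has the full conversation chain.
// Fetch every message in parallel to reconstruct what was said.
const messages = await Promise.all(
  thread.data.messageIds.map((id) =>
    nylas.messages.find({ identifier: AGENT_GRANT_ID, messageId: id }),
  ),
);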

Reply in-thread from the API and CLI

A reply that threads correctly is a normal send with one extra field: reply_to_message_id, set to the message you're answering. Nylas reads that ID, pulls the original's Message-ID, and stamps In-Reply-To and References on the outbound message so it lands in the right thread in every recipient's client — and in the agent's own mailbox. From the API:

curl --request POST \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages/send" \
  --header "Authorization: Bearer <NYLAS_API_KEY>" \
  --header "Content-Type: application/json" \
  --data '{
    "reply_to_message_id": "<MESSAGE_ID>",
    "to": [{ "email": "alice@example.com" }],
    "subject": "Re: Following up on your demo request",
    "body": "Thanks for getting back to me, Alice. Here are the next steps..."
  }'

The CLI does the same with one flag. nylas email send takes a --reply-to option that sets reply_to_message_id for you, so Nylas stamps the threading headers and the message lands in the right thread. You still supply the recipient and subject explicitly — it's a normal send that happens to thread, which makes it a quick way to confirm your agent's replies thread before you wire the same call into code:

# Reply in-thread to a specific message
nylas email send \
  --to alice@example.com \
  --reply-to <message-id> \
  --subject "Re: Following up on your demo request" \
  --body "Sounds good, thanks!"

There's no separate reply-all on either surface: to copy everyone on the conversation, list them in --to and --cc on the CLI, or the to and cc arrays in the API body. To inspect what you're replying to first, nylas email read <message-id> prints the message and nylas email list shows the mailbox. The reply you send with either surface appears in the same thread you'd see from GET /threads.
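Since reply-all is something you assemble yourself, here is a minimal sketch of building the request body from the message being answered. The `buildReplyAll` name is hypothetical, and it assumes the original message exposes `id`, `subject`, and `from`, `to`, and `cc` arrays of `{ email }` objects.

```javascript
// Sketch: build a reply-all request body from the message being answered.
// The agent's own address is excluded and duplicates are dropped.
function buildReplyAll(original, agentAddress, text) {
  const seen = new Set([agentAddress.toLowerCase()]);
  const unique = (people) =>
    (people ?? []).filter((p) => {
      const key = p.email.toLowerCase();
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });

  // The original sender goes in `to`; everyone else moves to `cc`.
  const to = unique(original.from);
  const cc = unique([...(original.to ?? []), ...(original.cc ?? [])]);
  const subject = /^re:/i.test(original.subject ?? "")
    ? original.subject
    : `Re: ${original.subject ?? ""}`;

  return { reply_to_message_id: original.id, to, cc, subject, body: text };
}
```

If the original was sent by the agent itself, `to` comes back empty, so a caller would need to handle that case before sending.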

Map threads to your agent's state

The Threads API tells the agent which messages belong together, but not what the agent was doing when the conversation started — which task, which workflow step, which session. That mapping lives in your application, keyed on the thread_id. The reliable pattern has two halves: store the mapping when the agent sends, and look it up when a reply arrives.

// thread_id -> { sessionId, taskId, step, ... }
const threadState = new Map();

// After sending:
threadState.set(sentMessage.threadId, {
  sessionId: currentSession.id,
  taskId: currentTask.id,
  step: "awaiting_reply",
});

// On webhook:
const context = threadState.get(inboundMessage.threadId);
if (context) {
  // A reply to something the agent sent — restore context and continue.
  await resumeTask(context.taskId, inboundMessage);
} else {
  // New conversation — classify and route.
  await triageNewMessage(inboundMessage);
}

In production this map belongs in a database or durable store, not in memory. Email conversations span hours and days, and an in-memory map doesn't survive a process restart — which is exactly when a three-hour-old reply tends to arrive. The thread_id is the right key because Nylas assigns it and it covers the whole conversation, where a single Message-ID covers only one message.
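To make the restart problem concrete, here is a small file-backed version of the same map. It is a stand-in for a real database, and the `ThreadStateStore` class, the JSON file layout, and the `updatedAt` field are all assumptions chosen for illustration.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Sketch: thread_id -> state, persisted to a JSON file so it survives a restart.
class ThreadStateStore {
  constructor(filePath) {
    this.filePath = filePath;
    this.state = fs.existsSync(filePath)
      ? JSON.parse(fs.readFileSync(filePath, "utf8"))
      : {};
  }

  get(threadId) {
    return this.state[threadId] ?? null;
  }

  set(threadId, value) {
    this.state[threadId] = { ...value, updatedAt: Date.now() };
    // Write to a temp file and rename it, so a crash mid-write is less
    // likely to leave a corrupted state file behind.
    const tmp = `${this.filePath}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(this.state));
    fs.renameSync(tmp, this.filePath);
  }
}

// Simulate a restart: a second instance reads what the first one wrote.
const file = path.join(os.tmpdir(), `thread-state-${process.pid}.json`);
new ThreadStateStore(file).set("thread-123", {
  taskId: "task-9",
  step: "awaiting_reply",
});
const afterRestart = new ThreadStateStore(file);
console.log(afterRestart.get("thread-123").step);
```

A single JSON file does not handle concurrent writers, so anything running more than one process would want a proper database behind the same get and set interface.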

Don't react to your agent's own messages

The message.created webhook fires for every new message in the mailbox, and that includes the ones the agent sends. An agent that replies on every message.created without checking direction will answer its own outbound mail and loop. Before the agent acts, confirm the message came from someone else: check that the sender isn't the Agent Account's own address, and skip anything the agent just sent.
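The direction check can be as small as the sketch below. It assumes the message object has a `from` array of `{ email }` objects, and the `isFromAgent` name and example addresses are hypothetical.

```javascript
// Sketch: true when the message was sent by the agent's own address.
// Email addresses are compared case-insensitively.
function isFromAgent(message, agentAddress) {
  const self = agentAddress.trim().toLowerCase();
  return (message.from ?? []).some(
    (sender) => (sender.email ?? "").trim().toLowerCase() === self,
  );
}

// Hypothetical usage at the top of a webhook handler.
const outbound = { from: [{ email: "Agent@Example.com" }] };
const inbound = { from: [{ email: "alice@example.com" }] };
console.log(isFromAgent(outbound, "agent@example.com"));
console.log(isFromAgent(inbound, "agent@example.com"));
```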

Deduplication on the inbound side matters too. A recipient might reply twice within seconds, firing two message.created events on the same thread, and Nylas redelivers a webhook on a retry if your endpoint is slow to acknowledge. Track the message_id values the agent has already answered and treat a reply as handled once you've sent a response for it. Keying that record on the inbound message_id keeps the agent from sending two replies to the same message.
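A minimal sketch of that dedup check follows. The `replyOnce` name and the in-memory Set are assumptions for brevity; like the thread state, the record of answered messages would need a durable store in production.

```javascript
// Sketch: remember which inbound message_ids already got a response.
const answered = new Set();

// Runs `sendReply` at most once per inbound message_id.
// Resolves to true if a reply was sent, false if it was a duplicate.
async function replyOnce(messageId, sendReply) {
  if (answered.has(messageId)) return false; // duplicate event or webhook retry
  answered.add(messageId); // claim the message before sending
  try {
    await sendReply();
    return true;
  } catch (err) {
    answered.delete(messageId); // release the claim so a later retry can succeed
    throw err;
  }
}
```

The ID is claimed before the send rather than after it so that two events arriving close together cannot both pass the check while the first send is still in flight. A failed send releases the claim, which keeps the message eligible for a retry.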

Things to watch in an agent reply loop

A few threading behaviors trip up agents specifically, because agents react automatically where a human would use judgment. None are hard to handle once you know they exist, but each one causes a visible bug if you don't.

  • thread_id is your primary key, not Message-ID. It's more stable for application logic because Nylas assigns it and it spans the full conversation. Reach for raw headers only when you need them.
  • Don't assume one reply per outbound. A prospect might reply twice, or two people on a thread might both respond. Handle multiple inbound messages on one thread without firing duplicate replies — a dedup check on the inbound message_id is enough.
  • Threads go dormant and come back. Someone replies to a three-week-old thread. If your state mapping has a TTL, decide what the agent does when context has expired: re-read the thread history, escalate to a human, or start fresh.
  • Raw headers are one parameter away. Pass fields=include_basic_headers to any message GET for just Message-ID, In-Reply-To, and References when you need to debug threading directly, instead of the full header set.
  • Fetch the thread before responding. An agent that drafts replies off the single-message webhook payload alone will repeat itself or contradict an earlier message. Pull the thread first.
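The dormant-thread case from the list above can be made explicit with a small classifier. Everything here is an assumption for illustration: the `classifyReply` name, the `updatedAt` millisecond timestamp on the stored record, and the 14-day default TTL, which is an arbitrary choice.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Sketch: classify an inbound reply by what the stored state says about
// its thread. `entry` is the stored record, or null if none exists.
function classifyReply(entry, now, ttlMs = 14 * DAY_MS) {
  if (!entry) return "new_conversation";
  if (now - entry.updatedAt > ttlMs) return "expired_context";
  return "resume";
}
```

Returning a distinct "expired_context" value forces the caller to choose between re-reading the thread history, escalating to a human, or starting fresh, instead of silently treating an old thread as new.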

Wrapping up

Threading for an agent comes down to two habits: always pass reply_to_message_id so replies thread, and key your agent's state on thread_id so it knows what each reply belongs to. Nylas handles the header mechanics on every send path, so the work that's left is in your application — mapping conversations to tasks and deciding what to do when a thread goes quiet, then comes back to life.
