Build the Reply Loop: Receive, Think, Respond

#ai #llm #email #tutorial

About 1 MB. That's the body-size threshold where the message.created webhook quietly changes shape — the trigger becomes message.created.truncated and the body is omitted entirely. If your email agent reads bodies straight off webhook payloads, it works fine for months and then silently drops the one reply that contained a forwarded contract. That detail is a good preview of this whole topic: the receive-think-respond loop is conceptually simple, and every interesting bug lives in the edges.

Let's wire the loop properly, using a Nylas Agent Account (in beta) as the agent's mailbox.

Step 1 — Receive: the webhook is a doorbell, not a package

A message.created webhook fires when mail arrives. Treat it as notification only:

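app.post("/webhooks/nylas", async (req, res) => {
  res.status(200).end(); // ack fast, work async

  try {
    const event = req.body;
    // Oversized bodies arrive as message.created.truncated; handle both.
    const isInbound =
      event.type === "message.created" ||
      event.type === "message.created.truncated";
    if (!isInbound) return;

    const msg = event.data.object;
    if (msg.grant_id !== AGENT_GRANT_ID) return;

    // Outbound fires message.created too, so don't reply to yourself.
    if (msg.from?.[0]?.email === AGENT_EMAIL) return;

    const conversation = await db.conversations.findByThreadId(msg.thread_id);
    if (conversation) {
      await continueConversation(msg, conversation);
    } else {
      await triageNewInbound(msg);
    }
  } catch (err) {
    // The 200 is already sent, so failures must be logged here or they vanish.
    console.error("webhook handler failed", err);
  }
});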

Three load-bearing lines in there. The grant check keeps other accounts' traffic out. The from check matters because the webhook fires for outbound mail too — skip it and your agent replies to its own replies, forever. And the thread_id lookup is how a reply gets recognized as a reply: messages are grouped into threads using the In-Reply-To and References headers, so if your agent sent the original message, the inbound reply lands on a thread you already have state for. No header parsing on your side.

Step 2 — Think: context before generation

The payload carries summary fields — subject, from, snippet. Before the model decides anything, fetch the real data:

const fullMessage = await nylas.messages.find({
  identifier: AGENT_GRANT_ID,
  messageId: msg.id,
});

const thread = await nylas.threads.find({
  identifier: AGENT_GRANT_ID,
  threadId: msg.thread_id,
});
// thread.data.messageIds -> fetch each, sort by date, build transcript

An LLM answering "sounds good, let's do Thursday" needs to know what was proposed — the full thread is the conversation memory. For long threads, you don't need every message verbatim: summarize the early turns and pass the last 3–4 in full. Same context, fraction of the tokens.
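One way to sketch that split, assuming each fetched message exposes date (a Unix timestamp), from, and body, and that the summary of the early turns comes from a separate call of your own. The helper names splitThread and buildTranscript are illustrative, not part of any SDK:

```javascript
// Sort oldest-first, then split into "early" turns (to summarize)
// and "recent" turns (to pass verbatim).
function splitThread(messages, keepLast = 4) {
  const ordered = [...messages].sort((a, b) => a.date - b.date);
  const cut = Math.max(0, ordered.length - keepLast);
  return { early: ordered.slice(0, cut), recent: ordered.slice(cut) };
}

// Render one message as a transcript turn.
function formatTurn(message) {
  const sender = message.from?.[0]?.email ?? "unknown";
  return `From: ${sender}\n${message.body ?? message.snippet ?? ""}`;
}

// earlySummary is whatever your summarization step produced (often an LLM call).
// Pass an empty string when the thread is short enough to need none.
function buildTranscript(recent, earlySummary) {
  const parts = [];
  if (earlySummary) parts.push(`Summary of earlier messages:\n${earlySummary}`);
  parts.push(...recent.map(formatTurn));
  return parts.join("\n\n---\n\n");
}
```

Keeping the split separate from the rendering means the summarizer only ever sees the early turns, and short threads skip it entirely.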

Your own state machine supplies the other half of the context. A conversation record keyed by thread_id tracks a step field, and the handler routes on it before any model call happens:

async function routeReply(message, history, context) {
  switch (context.step) {
    case "awaiting_confirmation":
      // The agent proposed something and is waiting for a yes/no.
      await handleConfirmation(message, history, context);
      break;
    case "awaiting_info":
      // The agent asked a question and needs the answer.
      await handleInfoResponse(message, history, context);
      break;
    case "closed":
      // The conversation was resolved but the person wrote back.
      await handleReopenedThread(message, history, context);
      break;
    default:
      // Unknown state -- log and escalate.
      await escalateToHuman(message, context);
  }
}

A "yes" means something different depending on what the agent asked, and the default branch matters: an unknown state should escalate, not improvise. The other useful trick from the multi-turn recipe: have the LLM return a nextStep value along with the reply text, so the model itself advances the state machine instead of your code guessing where the conversation went.
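If the model is steering the state machine, its output deserves the same suspicion as any other input. A minimal sketch, assuming you prompt the model to return JSON shaped like {"reply": "...", "nextStep": "..."}; that shape, the step list, and the parseModelDecision name are assumptions for illustration:

```javascript
// Steps the model is allowed to move the conversation into (assumed list).
const ALLOWED_STEPS = new Set([
  "awaiting_confirmation",
  "awaiting_info",
  "closed",
  "escalated",
]);

// Parse and validate the model's raw text output.
// Anything malformed comes back as { ok: false } so the caller can escalate.
function parseModelDecision(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  if (parsed === null || typeof parsed !== "object") {
    return { ok: false, reason: "invalid_shape" };
  }
  const { reply, nextStep } = parsed;
  if (typeof reply !== "string" || reply.trim() === "") {
    return { ok: false, reason: "missing_reply" };
  }
  if (!ALLOWED_STEPS.has(nextStep)) {
    return { ok: false, reason: "unknown_step" };
  }
  return { ok: true, reply, nextStep };
}
```

A failed parse is a natural trigger for the same escalation path as the default branch: the model improvised, so a human takes over.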

Step 3 — Respond: one parameter does the threading

const sent = await nylas.messages.send({
  identifier: AGENT_GRANT_ID,
  requestBody: {
    replyToMessageId: msg.id,
    to: fullMessage.data.from,
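    // Avoid "Re: Re: ..." when the inbound subject is already a reply.
    subject: fullMessage.data.subject?.match(/^re:/i)
      ? fullMessage.data.subject
      : `Re: ${fullMessage.data.subject}`,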
    body: replyBody,
  },
});

Passing replyToMessageId (reply_to_message_id in the raw API) makes the platform set In-Reply-To and References on the outbound message, so the recipient's mail client renders a threaded reply instead of a disconnected new email. Skip it and every reply starts a new thread, which is the fastest way to make an agent feel broken to the human on the other end. The mechanics are covered in depth in the handle-replies recipe.

After sending, update the conversation record: bump the turn count, set the next step, stamp lastActivityAt.
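That update can be a small pure function, which keeps it easy to test apart from the database call. A sketch, assuming the record stores the turn count under turnCount (a hypothetical field name; step and lastActivityAt follow the text above):

```javascript
// Return the conversation record as it should look after the agent sends a reply.
// The caller persists the result; nothing is written here.
function recordAgentTurn(conversation, nextStep, now = new Date()) {
  return {
    ...conversation,
    turnCount: (conversation.turnCount ?? 0) + 1,
    step: nextStep,
    lastActivityAt: now.toISOString(),
  };
}
```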

The failure modes, ranked by how much they'll hurt

Self-reply loops. Covered above, but it's the #1 footgun. One missing from check equals an infinite conversation with yourself.

Duplicate replies. Webhook redelivery and concurrent workers will both re-trigger your handler — at any volume, not just at scale. Without dedup and locking, the same inbound message generates two LLM calls and two replies. Treat idempotency as a launch requirement, not a hardening task.
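A minimal in-process sketch of both halves, dedup by message ID and a per-thread lock. The InMemoryClaims class and withThreadLock helper are illustrative names, and the in-memory storage is a stand-in: with more than one worker process you would back the same idea with something shared, such as a unique constraint in your database.

```javascript
// Dedup: the first caller to claim a message ID wins; redeliveries get false.
class InMemoryClaims {
  constructor() {
    this.seen = new Set();
  }
  claim(messageId) {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    return true;
  }
}

// Locking: chain tasks per thread so two inbounds on one thread never
// run their "think and respond" step at the same time.
const threadQueues = new Map();

function withThreadLock(threadId, task) {
  const previous = threadQueues.get(threadId) ?? Promise.resolve();
  // A failed earlier task must not block the ones queued behind it.
  const run = previous.catch(() => {}).then(task);
  threadQueues.set(threadId, run);
  run
    .finally(() => {
      if (threadQueues.get(threadId) === run) threadQueues.delete(threadId);
    })
    .catch(() => {}); // the caller handles errors via the returned promise
  return run;
}
```

In the webhook handler this would look like: skip the event if claim(msg.id) returns false, otherwise wrap the rest of the work in withThreadLock(msg.thread_id, ...).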

Rapid-fire corrections. Humans send "let's do Thursday" and then "actually, Friday" eleven seconds apart. A 30–60 second cooldown before responding lets you batch consecutive inbound messages into one coherent reply instead of answering each individually.
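The cooldown is a per-thread debounce: every new inbound resets the timer, and the reply is generated once the thread has been quiet for the full window. A sketch under the assumption that a single process handles the thread; createInboundBatcher is a hypothetical helper, and because the timers live in memory, a production version would need a durable delayed job instead.

```javascript
// Collect inbound messages per thread and flush them together
// once no new message has arrived for cooldownMs.
function createInboundBatcher(cooldownMs, onFlush) {
  const pending = new Map(); // threadId -> { messages, timer }

  return function enqueue(threadId, message) {
    const entry = pending.get(threadId) ?? { messages: [], timer: null };
    entry.messages.push(message);
    clearTimeout(entry.timer); // restart the quiet window
    entry.timer = setTimeout(() => {
      pending.delete(threadId);
      // Wrap in a promise chain so sync throws and async rejections both get logged.
      Promise.resolve()
        .then(() => onFlush(threadId, entry.messages))
        .catch((err) => console.error("batch flush failed", err));
    }, cooldownMs);
    pending.set(threadId, entry);
  };
}
```

With a 45-second window, "let's do Thursday" followed by "actually, Friday" reaches the model as one batch of two messages, so it answers the correction rather than the original.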

Runaway conversations. An unbounded loop is a token sink and a risk. The multi-turn recipe bakes a maxTurns cap into the conversation record — 10 is a reasonable default — and escalates to a human when it's hit.

Zombie threads. Someone replies to a conversation that went quiet weeks ago. Decide the behavior up front; a sane rule is escalating anything dormant past 168 hours (one week) rather than letting the agent auto-resume with stale context.
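Both limits, the turn cap and the dormancy window, fit in one guard that runs before any model call. A sketch assuming the record carries turnCount, maxTurns, and an ISO-formatted lastActivityAt; the checkLimits name and the reason strings are made up for illustration:

```javascript
const DORMANT_AFTER_MS = 168 * 60 * 60 * 1000; // 168 hours, one week

// Returns a reason string when the conversation should go to a human,
// or null when the agent may continue.
function checkLimits(conversation, now = Date.now()) {
  const maxTurns = conversation.maxTurns ?? 10;
  if ((conversation.turnCount ?? 0) >= maxTurns) return "max_turns_reached";

  const last = Date.parse(conversation.lastActivityAt);
  if (Number.isFinite(last) && now - last > DORMANT_AFTER_MS) {
    return "dormant_thread";
  }
  return null;
}
```

A non-null result maps directly onto an escalation reason, so the guard and the escalation path share one vocabulary.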

Multiple repliers on one thread. CC someone and you've invited a second voice into the conversation — two people might both reply to the same agent message. Process each inbound independently, and check whether the agent has already responded since the last inbound before generating another reply.
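The "has the agent already responded" check can run against the thread messages you fetched for context. A sketch assuming each message has a numeric date and a from array; agentRepliedSince is an illustrative name. Timestamps with one-second resolution can tie, so treat a false result as "probably not" rather than a guarantee:

```javascript
// True when the thread contains a message from the agent
// that is newer than the inbound message being processed.
function agentRepliedSince(threadMessages, inbound, agentEmail) {
  const agent = agentEmail.toLowerCase();
  return threadMessages.some(
    (m) => m.date > inbound.date && m.from?.[0]?.email?.toLowerCase() === agent
  );
}
```

When it returns true, the agent has spoken since this person wrote, so the handler can fold their message into the next turn instead of sending a second, overlapping reply.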

Lost state. The gap between turns can be days, so conversation records live in Postgres, Redis with AOF, DynamoDB — anything that survives restarts. In-memory state means every deploy lobotomizes your agent mid-conversation.

Closing the loop: escalation and completion

Not every conversation ends with the agent's final word, and the exits deserve code too. Escalation is a state change plus a notification:

async function escalate(conversation, reason) {
  await db.conversations.update(conversation.threadId, {
    step: "escalated",
    metadata: { ...conversation.metadata, escalationReason: reason },
  });
  await notifyHumanOperator({
    threadId: conversation.threadId,
    contact: conversation.contactEmail,
    reason,
  });
}

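Completion is the same move with a terminal step value, and it's not just bookkeeping. Store the value your router actually matches: the router above has a "closed" case, so either write step: "closed" or add a "completed" case, otherwise the next inbound falls through to default and escalates. When the prospect books the meeting or the support question gets answered, marking the record done changes how the next inbound on that thread routes: it hits the closed branch of your router instead of generating an out-of-context continuation. The state machine's exits are what make its middle states trustworthy.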

One last note on the front door: verify the X-Nylas-Signature header before your handler does anything. An unverified webhook endpoint is an API that lets anyone on the internet make your agent send email.
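A sketch of that check using Node's built-in crypto module. It assumes the header carries a hex-encoded HMAC-SHA256 of the raw request body keyed with your webhook secret, so confirm the exact scheme against the current Nylas webhook documentation before relying on it. isValidSignature is a hypothetical helper name:

```javascript
const crypto = require("node:crypto");

// Compare the signature header against an HMAC of the raw body.
// rawBody must be the exact bytes received, not re-serialized JSON.
function isValidSignature(rawBody, signatureHeader, webhookSecret) {
  if (typeof signatureHeader !== "string") return false;
  const expected = crypto
    .createHmac("sha256", webhookSecret)
    .update(rawBody)
    .digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );
}
```

In Express, that means capturing the body before JSON parsing touches it, for example with express.raw({ type: "application/json" }) on the webhook route, then parsing only after the signature passes.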

Where to start

Build the loop in this order: webhook handler with the three guard checks → thread fetching → a hardcoded reply (no LLM yet) → verify threading works in a real mail client → then swap in the model. Wiring the LLM first is the classic mistake; you end up debugging prompt quality and webhook delivery simultaneously.

Which failure mode bit you first? Mine is common enough that I'll guess it was yours too: the agent replied to itself.