Test Email Flows in CI With Disposable Mailboxes

#testing #cicd #email #devops

A PR touches your password-reset template. Every unit test passes, the snapshot tests pass, CI goes green, and the deploy ships a reset email whose link 404s — because nothing in the pipeline ever received an actual email. If your product sends mail that users must act on, the only test that counts is one where a real message lands in a real inbox and an assertion reads it.

The blocker has always been the inbox. Shared Gmail catch-alls need OAuth on the test runner, forwarding rules, and label gymnastics to scope per-PR runs — and two CI workers reading the same inbox at the same moment is the single biggest source of flakiness in email-dependent E2E tests. Here's the setup that removes the shared inbox entirely, based on the E2E email testing recipe.

One wildcard inbox, infinite addresses

One-time setup with the CLI:

nylas inbound create e2e

You get back an inbox ID and a wildcard pattern shaped like e2e-*@yourapp.nylas.email. Mail to any address matching that pattern lands in the inbox — and addresses cost nothing to mint, because the wildcard is just a convention. Each test generates its own UUID-suffixed address, and suddenly every test has a private mailbox. MX is handled upstream; there's nothing to configure in your DNS.

The per-test fixture

In Playwright, two fixtures do the work: one mints an address, one polls for the matching message.

import { test as base, expect } from "@playwright/test";
import { execSync } from "node:child_process";
import { randomUUID } from "node:crypto";

export const test = base.extend<{
  testEmail: string;
  pollInbox: (timeoutMs?: number) => Promise<any>;
}>({
  testEmail: async ({}, use) => {
    await use(`e2e-${randomUUID()}@yourapp.nylas.email`);
  },
  pollInbox: async ({ testEmail }, use) => {
    const poll = async (timeoutMs = 30_000) => {
      const deadline = Date.now() + timeoutMs;
      while (Date.now() < deadline) {
        const out = execSync(
          `nylas inbound messages ${process.env.INBOX_ID} --json --limit 50`,
        ).toString();
        const match = JSON.parse(out).find((m: any) =>
          m.to.some((t: any) => t.email === testEmail),
        );
        if (match) return match;
        await new Promise((r) => setTimeout(r, 1500));
      }
      throw new Error(`Email never arrived for ${testEmail}`);
    };
    await use(poll);
  },
});

The numbers are tuned from practice: delivery latency is typically under 5 seconds, so a 1.5-second poll interval picks up most messages within the first three or four iterations, and the 30-second default timeout is generous for almost any flow. A test then reads naturally. Fill the signup form with testEmail, poll, extract the verification link, navigate, assert:

test("signup completes after verification", async ({ page, testEmail, pollInbox }) => {
  await page.goto("/signup");
  await page.getByLabel("Email").fill(testEmail);
  await page.getByRole("button", { name: "Create account" }).click();

  const msg = await pollInbox();
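  const link = msg.body.match(/https:\/\/[^\s"<]+\/verify\?[^\s"<]+/);
  // Fail with a readable message instead of a TypeError when no link is found.
  expect(link, "verification link missing from email body").not.toBeNull();
  await page.goto(link![0]);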
  await expect(page.getByText("Email verified")).toBeVisible();
});

OTP flows are the same shape with a different regex — match \b\d{6}\b in the body and type it into the form:

const msg = await pollInbox();
const code = msg.body.match(/\b\d{6}\b/)?.[0];
expect(code).toBeDefined();
await page.getByLabel("Verification code").fill(code!);

One gotcha from the recipe: email bodies often contain several 6-digit numbers — phone numbers, transaction IDs, zip codes in footers. If your template is busy, anchor the regex to nearby text or use the Extract OTP codes helper for sturdier extraction.
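Here is a minimal sketch of the anchoring approach, assuming your template introduces the code with a phrase like "verification code". The `extractOtp` name, the default anchor phrase, and the 40-character search window are all illustrative assumptions, not part of the recipe; adjust them to your template.

```typescript
// Hypothetical helper: return the first 6-digit number that appears shortly
// after an anchor phrase, ignoring 6-digit numbers elsewhere in the body.
export function extractOtp(
  body: string,
  anchor: RegExp = /verification code/i, // assumed template wording
): string | undefined {
  const anchorMatch = anchor.exec(body);
  if (!anchorMatch) return undefined;

  // Search only a short stretch of text right after the anchor phrase.
  const start = anchorMatch.index + anchorMatch[0].length;
  const tail = body.slice(start, start + 40);
  return tail.match(/\b\d{6}\b/)?.[0];
}
```

In a test this would replace the bare regex: `const code = extractOtp(msg.body);`. An order number earlier in the email or a phone number in the footer no longer wins just because it happens to come first.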

When the link hides in HTML

Plain-text regex works until the verification URL only exists inside an <a href> in the HTML part. Then parse it properly instead of fighting the markup with regex:

import * as cheerio from "cheerio";

const $ = cheerio.load(msg.html);
const link = $('a:contains("Verify your email")').attr("href");

Selecting by the link's visible text doubles as a content assertion — if a copy change renames the button, the test fails loudly instead of following the wrong link. That's a feature in a template-regression suite, not a fragility.
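One thing to watch: `.attr("href")` returns `undefined` when the selector matches nothing, and a link pulled from an email can point anywhere. A small guard before `page.goto` makes both failures explicit. This is a sketch only; `requireLink` and the expected host are hypothetical, and the host check assumes your mail provider does not rewrite links through a click-tracking domain.

```typescript
// Hypothetical guard: turn a possibly-missing href into a validated URL.
export function requireLink(href: string | undefined, expectedHost: string): URL {
  if (!href) {
    throw new Error("No matching link found in the email HTML");
  }
  // The URL constructor throws on relative or malformed hrefs.
  const url = new URL(href);
  if (url.protocol !== "https:" || url.host !== expectedHost) {
    throw new Error(`Unexpected link target: ${url.href}`);
  }
  return url;
}
```

Usage would look like `await page.goto(requireLink(link, "app.example.com").href);`, where `app.example.com` stands in for your own host.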

Dropping it into the pipeline

In GitHub Actions, the moving parts are: install the CLI, authenticate with a stored API key, export the inbox ID, run the suite.

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: curl -fsSL https://cli.nylas.com/install.sh | bash
      - run: nylas init --api-key ${{ secrets.NYLAS_API_KEY }}
      - run: npx playwright test
        env:
          INBOX_ID: ${{ secrets.INBOX_ID }}

The inbox itself is long-lived — created once, reused by every run. What's disposable is the address: each test burns a fresh UUID, so runs never collide. That's also why this is parallel-safe by construction. Playwright's fullyParallel: true spreads tests across workers, but each worker only ever matches on its own to: address. No filtering by subject, no "wait for the right message to bubble up."
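The isolation comes entirely from the recipient match, so it can help to pull that logic out as a pure function you can unit test without touching the CLI. The sketch below assumes the same message shape the fixture reads (`to[].email`, `body`). The `InboundMessage` type and `findMessageFor` name are hypothetical, and the case-insensitive comparison is a defensive assumption in case an upstream system changes the address's case.

```typescript
// Assumed message shape, mirroring the fields the fixture reads.
interface InboundMessage {
  to: { email: string }[];
  body: string;
}

// Hypothetical helper: pick the message addressed to one test's mailbox
// out of a listing that every worker shares.
export function findMessageFor(
  messages: InboundMessage[],
  address: string,
): InboundMessage | undefined {
  const wanted = address.toLowerCase();
  return messages.find((m) =>
    m.to.some((t) => t.email.toLowerCase() === wanted),
  );
}
```

Two workers can call this on the same 50-message listing and never see each other's mail, because each one passes a different address.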

Where the edges are

Worth knowing before you commit:

execSync blocks. Fine for most suites; for chatty ones, switch to async exec and await polls in parallel.
Addresses live under *.nylas.email. That's the tradeoff for zero DNS setup. If your app validates recipient domains strictly, allow that pattern in test environments.
Messages persist for the standard retention window. Tests don't need cleanup for correctness, though you can mark messages read after each test for a quieter inbox when debugging:

  // testEmail is a fixture, so it has to be requested in the hook's arguments.
  test.afterEach(async ({ testEmail }) => {
    execSync(`nylas inbound messages-read ${process.env.INBOX_ID} --to ${testEmail}`);
  });

This receives; it doesn't send. When your test needs the other direction — an address that sends mail, replies in-thread, or RSVPs — that's what Agent Accounts are for: full hosted mailboxes, currently in beta, driven through the same CLI and API.
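On the execSync point above, here is one way the non-blocking variant could look. It is a sketch under assumptions: `pollUntil` and `listMessages` are hypothetical names, and the CLI arguments are copied from the fixture rather than checked against the CLI reference. It uses Node's `execFile` wrapped with `promisify`, which also avoids passing the inbox ID through a shell string.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Generic non-blocking poll: call `fetchOnce` until it yields a value
// or the timeout elapses. Defaults match the fixture's numbers.
export async function pollUntil<T>(
  fetchOnce: () => Promise<T | undefined>,
  timeoutMs = 30_000,
  intervalMs = 1_500,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await fetchOnce();
    if (result !== undefined) return result;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Timed out after ${timeoutMs} ms`);
}

// Same CLI call as the fixture, but without blocking the event loop.
export async function listMessages(inboxId: string): Promise<any[]> {
  const { stdout } = await execFileAsync("nylas", [
    "inbound", "messages", inboxId, "--json", "--limit", "50",
  ]);
  return JSON.parse(stdout);
}
```

Inside the fixture, the loop body becomes a `pollUntil` call whose callback awaits `listMessages(process.env.INBOX_ID!)` and returns the matching message or `undefined`. A test that waits on two emails can then run both polls under `Promise.all` instead of back to back.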

Quick FAQ

Does each PR or branch need its own inbox? No. The inbox is created once and shared; the per-test UUID addresses are what partition runs. Your INBOX_ID secret never changes, which keeps the CI config boring — exactly what you want from CI config.

What about Playwright's parallelism settings? Nothing special. With fullyParallel: true, every worker shares the INBOX_ID but mints unique addresses, and the wildcard endpoint accepts mail to all of them simultaneously.

What if a flow is slower than 30 seconds? The fixture takes timeoutMs as an argument — pollInbox(60_000) for that one slow digest test, default everywhere else. Keep the default tight so genuinely missing mail fails fast.

The first test to write is whichever email flow last broke in production — for most teams that's password reset. Port one test this week and see if the polling fixture holds up against your real delivery times. What's the flakiest email assertion in your suite right now?