Christian Potvin

Posted on Mar 17

Why Your CI Email Tests Are Flaky (And How to Fix Them)

#testing #devops #automation #ci

TL;DR: If your Playwright/Cypress/Pytest tests involving email keep failing intermittently in CI, the cause is almost always shared inbox state. The fix is per-test inbox isolation, not longer timeouts.

The Symptom

Your test suite has a test that looks reasonable, yet its results look like this:

✓ user can register (3.2s)
✗ user receives welcome email (timeout after 30s)
✓ user receives welcome email (2.1s)  ← same test, re-run, now passes

The test passes locally. It fails in CI. It passes again on retry. Your team has learned to just re-run the pipeline and move on. The underlying problem is never fixed.

This is not an infrastructure problem. It's not "CI is slower than local." It's a state isolation problem.

The Root Cause

When email-dependent tests share an inbox, several things go wrong.

Race conditions in parallel runs

CI pipelines often run tests concurrently. If you're using -n 4 in Pytest (via pytest-xdist) or --workers=4 in Playwright, four tests may be signing up simultaneously, all using the same test@yourcompany.com address.

Worker 1 signs up. Worker 2 signs up. Both trigger welcome emails. Worker 1 reads the inbox and finds two emails. Which one is its own? It can't tell, so it may read the wrong one. The test then either fails on an assertion or passes for the wrong reason.
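The interleaving described above can be reproduced without any real email. The following is a minimal in-memory sketch: the `shared_inbox` list, `sign_up`, and `read_welcome_email` are illustrative stand-ins, not part of any real service or test suite.

```python
# Simulates two parallel workers that share one inbox (in-memory stand-in).
shared_inbox = []  # every worker's welcome email lands here

def sign_up(worker):
    # The app under test sends a welcome email to the shared address.
    shared_inbox.append(
        {"to": "test@yourcompany.com", "body": f"Welcome, user-{worker}"}
    )

def read_welcome_email():
    # Typical test helper: take the newest message in the inbox.
    return shared_inbox[-1]

# One possible interleaving of two workers:
sign_up(1)
sign_up(2)
seen_by_worker_1 = read_welcome_email()  # worker 1 reads after worker 2 signed up

print(seen_by_worker_1["body"])  # Welcome, user-2
```

Worker 1 ends up asserting on worker 2's email. Picking the oldest message instead only moves the problem to worker 2.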

Stale emails from previous runs

A test passes. The inbox now contains a "Welcome" email. The next test run starts. It signs up again, polls the inbox, and immediately finds the email from the previous run — before the new email has even been sent. The test passes — but it's asserting on stale data.

This is one reason "it works locally, fails in CI" is so common: locally you likely run tests serially, so workers never race for the inbox and stale emails are less frequent. CI runs them fast and in parallel.
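A stale match is easy to miss because the test goes green. Here is a small sketch of the failure mode, assuming a message shape with `subject` and `received_at` fields (both hypothetical, chosen for illustration):

```python
# In-memory stand-in for a shared inbox that is never cleaned between runs.
RUN_STARTED_AT = 1_000_000.0  # arbitrary timestamp for the current run

inbox = [
    # Left over from the previous run, an hour earlier.
    {"subject": "Welcome", "received_at": RUN_STARTED_AT - 3600},
]

def poll_for(subject):
    """Naive helper: return the first message whose subject matches."""
    for message in inbox:
        if message["subject"] == subject:
            return message
    return None

# The new signup was just triggered, but its email has not arrived yet.
found = poll_for("Welcome")
is_stale = found["received_at"] < RUN_STARTED_AT

print(is_stale)  # True: the "passing" assertion is looking at old data
```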

The "longer timeout" trap

The instinct is to increase the wait:

time.sleep(10)  # give the email time to arrive

This doesn't fix the race condition. It makes your test suite slower and still non-deterministic.
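If some waiting is unavoidable, a bounded poll at least removes the fixed delay: it returns as soon as the condition holds and fails loudly when it never does. A minimal sketch follows; `poll_until` and its parameters are illustrative names, not from any library. It shortens the wait but does nothing about shared inbox state.

```python
import time

def poll_until(check, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)

# Usage: the third check succeeds, so the call returns without waiting 30s.
attempts = iter([None, None, "email"])
print(poll_until(lambda: next(attempts), interval=0.01))  # email
```

The `clock` and `sleep` parameters exist only so the helper can be unit-tested without real delays.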

Why the Obvious Solutions Don't Fully Work

Mocking the email service

If you control the email service, you can intercept emails before sending. But:

Many teams use third-party auth providers (Auth0, Cognito, Firebase) that send emails you can't intercept
Mocking removes confidence in the real integration
It doesn't test whether the email was actually sent, properly formatted, or delivered

A dedicated test Gmail account

Better than sharing, but still has problems:

Gmail IMAP is rate-limited; fast parallel tests will hit limits
You still need to clean up emails between runs (fragile teardown)
Shared between all test workers unless you manage multiple accounts

Mailhog / MailDev (self-hosted SMTP)

Great for testing your own email service, but:

Doesn't work when the email is sent by a third party (Cognito via SES, SendGrid, etc.)
Requires infrastructure in CI (Docker, port mapping)
Shared inbox per service instance (same race condition problem unless you implement per-test routing)

The Fix: Per-Test Inbox Isolation

The correct architecture is:

Before each test, create a fresh, unique email inbox
Use the inbox's address in the test
Poll that specific inbox for the expected email
The inbox expires automatically — no teardown

This eliminates shared state entirely. Each test is completely independent. Parallel execution is safe by design.
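The idea can be illustrated with an in-memory stand-in before wiring up a real service. `FakeInboxService` and the `example.test` domain below are hypothetical and exist only to show why unique addresses remove the race.

```python
import uuid

class FakeInboxService:
    """In-memory stand-in for a hosted inbox service (illustration only)."""

    def __init__(self):
        self._mailboxes = {}  # address -> list of message bodies

    def create_inbox(self):
        # A random local part makes collisions between tests practically impossible.
        address = f"test-{uuid.uuid4().hex}@example.test"
        self._mailboxes[address] = []
        return address

    def deliver(self, to, body):
        self._mailboxes[to].append(body)

    def messages(self, address):
        return list(self._mailboxes[address])

service = FakeInboxService()

# Each test creates its own inbox before signing up.
inbox_a = service.create_inbox()
inbox_b = service.create_inbox()

# Both signups happen "at the same time".
service.deliver(inbox_a, "Welcome, user-a")
service.deliver(inbox_b, "Welcome, user-b")

print(service.messages(inbox_a))  # ['Welcome, user-a']
```

Each test sees only its own message, whatever order the deliveries happen in.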

How to Implement It

Python (Pytest)

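# conftest.py
import os
import time

import pytest
import requests

API_KEY = os.environ["MINUTEMAIL_API_KEY"]
BASE = "https://api.minutemail.co/v1"
HDRS = {"Authorization": f"Bearer {API_KEY}"}

@pytest.fixture
def inbox():
    """Create a fresh mailbox for each test."""
    resp = requests.post(
        f"{BASE}/mailboxes",
        headers=HDRS,
        json={"domain": "minutemail.cc", "expiresIn": 10},
        timeout=10,
    )
    resp.raise_for_status()  # fail fast if the mailbox could not be created
    yield resp.json()
    # No teardown: the mailbox expires automatically

def wait_for_email(mailbox_id, timeout=30):
    """Poll one mailbox until a message arrives or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE}/mailboxes/{mailbox_id}/mails", headers=HDRS, timeout=10)
        resp.raise_for_status()
        msgs = resp.json().get("items", [])
        if msgs:
            return msgs[0]
        time.sleep(2)
    raise TimeoutError(f"No email in mailbox {mailbox_id} after {timeout}s")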
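A test built on this fixture might look like the sketch below. `register_user` is a hypothetical helper that drives your app's signup flow, and the `id` and `address` keys are assumed to match the mailbox response used elsewhere in this post. The logic is written as a plain function taking its collaborators as arguments so it can be exercised without network access.

```python
# test_signup.py: a usage sketch under the assumptions stated above.

def check_welcome_email(inbox, register_user, wait_for_email):
    """Sign up with the per-test address, then wait on that inbox only."""
    register_user(email=inbox["address"])  # unique address from the fixture
    message = wait_for_email(inbox["id"])  # raises TimeoutError if nothing arrives
    return message

# In a real suite this would be a pytest test that requests the fixture:
#
#   def test_user_receives_welcome_email(inbox):
#       message = check_welcome_email(inbox, register_user, wait_for_email)
#       assert message  # then assert on fields your email service returns
```

Because the address is unique to the test, any message that arrives in that mailbox was triggered by this test's signup.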

TypeScript (Playwright)

// fixtures/email.ts
import { test as base } from '@playwright/test';

const BASE = 'https://api.minutemail.co/v1';
const headers = {
  'Authorization': `Bearer ${process.env.MINUTEMAIL_API_KEY}`,
  'Content-Type': 'application/json', // fetch defaults a string body to text/plain
};

// The message shape depends on the API, so it is left as `any` here
type Inbox = { address: string; waitForEmail: (timeout?: number) => Promise<any> };

export const test = base.extend<{ inbox: Inbox }>({
  inbox: async ({}, use) => {
    // Create a fresh mailbox for this test
    const mb = await fetch(`${BASE}/mailboxes`, {
      method: 'POST',
      headers,
      body: JSON.stringify({ domain: 'minutemail.cc', expiresIn: 10 }),
    }).then(r => r.json());
    await use({
      address: mb.address,
      waitForEmail: async (timeout = 30000) => {
        const deadline = Date.now() + timeout;
        while (Date.now() < deadline) {
          const { items = [] } = await fetch(`${BASE}/mailboxes/${mb.id}/mails`, { headers }).then(r => r.json());
          if (items.length) return items[0];
          await new Promise(r => setTimeout(r, 2000));
        }
        throw new Error('Email timeout');
      }
    });
    // No teardown: the mailbox expires automatically
  }
});

With this pattern, parallel CI becomes stable by design, not by accident.

What to Use for the Inbox Service

You need a hosted service with:

Per-inbox API: create a new inbox on demand, get a unique address back
TTL control: inbox expires automatically (no cleanup logic needed)
Message polling API: read received emails programmatically
Reliable delivery: must actually receive emails from third-party senders

A few options:

Service	API	Per-inbox TTL	Free tier	Notes
Mailinator	Limited	No	Yes	Known domains often blocked
Mailtrap	Yes	No	Yes (limited)	Better for SMTP testing
MinuteMail	Yes	Yes (1–60 min)	Yes (100 calls/day)	Built for this use case

MinuteMail was built specifically for developer/QA use cases: each POST /mailboxes returns a unique address with configurable TTL. Full docs at https://docs.minutemail.co.

The Outcome

After switching to per-test inbox isolation:

Tests that were "retry to fix" become deterministic
Parallel CI works correctly without coordination between workers
You get confidence that the full email flow works end-to-end, not just that your code tried to send

The pattern requires ~20 lines of fixture code. The payoff is a test suite you can trust.

Created: 2026-02-26

DEV Community