DEV Community

Christian Potvin

Why Your CI Email Tests Are Flaky (And How to Fix Them)

TL;DR: If your Playwright/Cypress/Pytest tests involving email keep failing intermittently in CI, the cause is almost always shared inbox state. The fix is per-test inbox isolation, not longer timeouts.


The Symptom

Your test suite has a test that looks reasonable:

✓ user can register (3.2s)
✗ user receives welcome email (timeout after 30s)
✓ user receives welcome email (2.1s)  ← same test, re-run, now passes

The test passes locally. It fails in CI. It passes again on retry. Your team has learned to just re-run the pipeline and move on. The underlying problem is never fixed.

This is not an infrastructure problem. It's not "CI is slower than local." It's a state isolation problem.


The Root Cause

When email-dependent tests share an inbox, several things go wrong.

Race conditions in parallel runs

CI pipelines often run tests concurrently. If you're using -n 4 with pytest-xdist or --workers=4 in Playwright, four tests may be signing up simultaneously, all using the same test@yourcompany.com address.

Worker 1 signs up. Worker 2 signs up. Both trigger welcome emails. Worker 1 reads the inbox and finds two emails. Which one is its? It doesn't know. It reads the wrong one. The test either fails on assertion or false-passes.
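To make the mechanism concrete, here is a minimal in-memory model of a shared inbox. It is a hypothetical simulation, not a real mail server: every worker signs up with the same address, and each then reads the first message it finds, as a naive polling helper would. The function name and token format are illustrative assumptions.

```python
# Hypothetical in-memory model of a shared test inbox (no real mail involved).
SHARED_ADDRESS = "test@yourcompany.com"


def simulate_shared_inbox(workers: list[int]) -> dict[int, bool]:
    """Return, per worker, whether it read its own welcome email."""
    inbox: list[dict] = []

    # Every worker signs up with the SAME address, so all welcome emails
    # land in one inbox. Each email carries a worker-specific token.
    for worker_id in workers:
        inbox.append({"to": SHARED_ADDRESS, "token": f"token-{worker_id}"})

    results = {}
    for worker_id in workers:
        # A naive helper takes the first message; nothing about the inbox
        # tells a worker which email its own signup triggered.
        first = inbox[0]
        results[worker_id] = first["token"] == f"token-{worker_id}"
    return results


print(simulate_shared_inbox([1, 2, 3, 4]))
# {1: True, 2: False, 3: False, 4: False}
```

Here the arrival order is fixed, so worker 1 always "wins". In real CI the arrival order changes from run to run, which is exactly why a different test fails each time.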

Stale emails from previous runs

A test passes. The inbox now contains a "Welcome" email. The next test run starts, signs up again, polls the inbox, and immediately finds the email from the previous run, before the new email has even been sent. The test passes, but it's asserting on stale data.
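A tiny sketch of the stale-email case, assuming a naive poll that treats "any message is present" as success and an inbox that is never cleaned between runs. The helper name is hypothetical.

```python
from typing import Optional


def poll_first_message(inbox: list) -> Optional[dict]:
    # Naive poll: "any message is present" counts as success.
    return inbox[0] if inbox else None


inbox: list = []  # one inbox shared across runs, never cleaned up

# Run 1: the welcome email arrives, and the test passes legitimately.
inbox.append({"run": 1, "subject": "Welcome"})
assert poll_first_message(inbox)["run"] == 1

# Run 2: sign up again, then poll BEFORE the new welcome email is sent.
first_poll = poll_first_message(inbox)
print(first_poll)  # {'run': 1, 'subject': 'Welcome'}  (stale, from run 1)
```

The second poll succeeds on its first attempt, so no amount of extra waiting would have exposed the problem.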

This is why "it works locally, fails in CI" is so common: locally you likely run tests one at a time, so emails from different tests rarely overlap. CI runs them quickly and in parallel.

The "longer timeout" trap

The instinct is to increase the wait:

import time
time.sleep(10)  # give the email time to arrive

This doesn't fix the race condition. It makes your test suite slower and still non-deterministic.


Why the Obvious Solutions Don't Fully Work

Mocking the email service

If you control the email service, you can intercept emails before sending. But:

  • Many teams use third-party auth providers (Auth0, Cognito, Firebase) that send emails you can't intercept
  • Mocking removes confidence in the real integration
  • It doesn't test whether the email was actually sent, properly formatted, or delivered
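For reference, this is roughly what the mocking approach looks like. `FakeMailer` and `register_user` are hypothetical stand-ins for your own application code; the point is what the final assertion can and cannot prove.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMailer:
    """Test double that records emails instead of sending them."""
    sent: list[tuple[str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


def register_user(email: str, mailer: FakeMailer) -> None:
    # Hypothetical application code: create the user, then send a welcome email.
    mailer.send(to=email, subject="Welcome!")


mailer = FakeMailer()
register_user("alice@example.com", mailer)

# This only proves the code *asked* to send the email. It says nothing about
# delivery, rendering, or emails sent by a third-party provider such as Auth0.
assert mailer.sent == [("alice@example.com", "Welcome!")]
```

If the welcome email is sent by Cognito or Auth0 rather than by `register_user`, there is no call for the fake to capture at all.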

A dedicated test Gmail account

Better than a shared address, but it still has problems:

  • Gmail's IMAP access is rate-limited, so fast parallel tests can hit the limits
  • You still need to clean up emails between runs (fragile teardown)
  • Shared between all test workers unless you manage multiple accounts

MailHog / MailDev (self-hosted SMTP)

Great for testing your own email service, but:

  • Doesn't work when the email originates from a third party (Cognito via SES, SendGrid, etc.)
  • Requires infrastructure in CI (Docker, port mapping)
  • Shared inbox per service instance (same race condition problem unless you implement per-test routing)
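"Per-test routing" on a shared catcher usually means giving each test a unique recipient address and filtering on it. The sketch below does not use MailHog's or MailDev's actual API: the plain list stands in for whatever messages the catcher returns, and the `test.local` domain and helper names are assumptions.

```python
import uuid


def unique_address(domain: str = "test.local") -> str:
    # One address per test; uuid4 makes collisions across workers negligible.
    return f"test-{uuid.uuid4().hex}@{domain}"


def messages_for(inbox: list[dict], address: str) -> list[dict]:
    # Per-test routing: only look at messages addressed to this test's address.
    return [m for m in inbox if address in m["to"]]


shared_catcher: list[dict] = []  # stands in for one MailHog/MailDev instance
a, b = unique_address(), unique_address()
shared_catcher.append({"to": [b], "subject": "Welcome B"})
shared_catcher.append({"to": [a], "subject": "Welcome A"})

print([m["subject"] for m in messages_for(shared_catcher, a)])  # ['Welcome A']
```

This works only when your app lets the test choose the recipient address, and the filtering logic becomes something every test must get right.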

The Fix: Per-Test Inbox Isolation

The correct architecture is:

  1. Before each test, create a fresh, unique email inbox
  2. Use the inbox's address in the test
  3. Poll that specific inbox for the expected email
  4. The inbox expires automatically, so no teardown is needed

This eliminates shared state entirely. Each test is completely independent. Parallel execution is safe by design.
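The four steps can be modeled offline with a small in-memory service. `FakeInboxService` is a hypothetical stand-in for a hosted inbox API (it is not any real service's interface); it shows each test getting its own address, polling only that inbox, and expired inboxes disappearing without teardown, even when tests run on parallel threads.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class FakeInboxService:
    """Hypothetical in-memory stand-in for a hosted disposable-inbox API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._boxes: dict[str, dict] = {}  # address -> {"expires": float, "mails": list}

    def create_mailbox(self, ttl_s: float) -> str:
        # Step 1: a fresh, unique inbox per test.
        address = f"{uuid.uuid4().hex}@example.test"
        with self._lock:
            self._boxes[address] = {"expires": time.monotonic() + ttl_s, "mails": []}
        return address

    def deliver(self, address: str, mail: dict) -> None:
        # Mail to unknown or expired inboxes is silently dropped.
        with self._lock:
            box = self._boxes.get(address)
            if box and time.monotonic() < box["expires"]:
                box["mails"].append(mail)

    def mails(self, address: str) -> list[dict]:
        # Step 4: expired inboxes are discarded on access; no teardown needed.
        with self._lock:
            box = self._boxes.get(address)
            if box is None or time.monotonic() >= box["expires"]:
                self._boxes.pop(address, None)
                return []
            return list(box["mails"])


def signup_test(service: FakeInboxService, worker_id: int) -> bool:
    address = service.create_mailbox(ttl_s=60)             # step 1
    service.deliver(address, {"token": f"t-{worker_id}"})  # step 2: app sends here
    mails = service.mails(address)                         # step 3: poll this inbox only
    return len(mails) == 1 and mails[0]["token"] == f"t-{worker_id}"


service = FakeInboxService()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda w: signup_test(service, w), range(8)))
print(all(results))  # True
```

Because no two tests ever share an address, the result does not depend on scheduling order, which is the property the shared-inbox model above lacks.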


How to Implement It

Python (Pytest)

# conftest.py
import os
import time

import pytest
import requests

API_KEY = os.environ["MINUTEMAIL_API_KEY"]
BASE = "https://api.minutemail.co/v1"
HDRS = {"Authorization": f"Bearer {API_KEY}"}


@pytest.fixture
def inbox():
    # Create a fresh mailbox for this test only.
    resp = requests.post(
        f"{BASE}/mailboxes",
        headers=HDRS,
        json={"domain": "minutemail.cc", "expiresIn": 10},
        timeout=10,
    )
    resp.raise_for_status()
    yield resp.json()
    # No teardown: the mailbox expires automatically.


def wait_for_email(mailbox_id, timeout=30):
    """Poll one mailbox until a message arrives; return the first one."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE}/mailboxes/{mailbox_id}/mails", headers=HDRS, timeout=10)
        resp.raise_for_status()
        msgs = resp.json().get("items", [])
        if msgs:
            return msgs[0]
        time.sleep(2)
    raise TimeoutError(f"No email in mailbox {mailbox_id} within {timeout}s")
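One optional refinement: pass the HTTP call into the polling loop so the loop itself can be unit-tested without a network or an API key. `poll_until` is a hypothetical helper following the same logic as `wait_for_email` in the fixture above, with one extra fetch at the deadline.

```python
import time
from typing import Callable


def poll_until(fetch: Callable[[], list], timeout: float = 30.0,
               interval: float = 2.0) -> dict:
    """Call fetch() until it returns a non-empty list; return the first item."""
    deadline = time.monotonic() + timeout
    while True:
        items = fetch()
        if items:
            return items[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No email within {timeout}s")
        time.sleep(interval)


# Offline usage: a fake fetch that is empty twice, then returns a message.
responses = iter([[], [], [{"subject": "Welcome"}]])
print(poll_until(lambda: next(responses), timeout=1, interval=0))  # {'subject': 'Welcome'}
```

In a real test, `fetch` would wrap the GET request from `wait_for_email`, for example a lambda returning `requests.get(...).json().get("items", [])`.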

TypeScript (Playwright)

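// fixtures/email.ts
import { test as base } from '@playwright/test';

type Inbox = {
  address: string;
  waitForEmail: (timeout?: number) => Promise<any>;
};

const BASE = 'https://api.minutemail.co/v1';
const headers = { Authorization: `Bearer ${process.env.MINUTEMAIL_API_KEY}` };

export const test = base.extend<{ inbox: Inbox }>({
  inbox: async ({}, use) => {
    // Create a fresh mailbox for this test only (uses Node 18+ global fetch).
    const res = await fetch(`${BASE}/mailboxes`, {
      method: 'POST',
      headers: { ...headers, 'Content-Type': 'application/json' },
      body: JSON.stringify({ domain: 'minutemail.cc', expiresIn: 10 }),
    });
    if (!res.ok) throw new Error(`Mailbox creation failed: ${res.status}`);
    const mb = await res.json();

    await use({
      address: mb.address,
      // Poll only this mailbox until a message arrives.
      waitForEmail: async (timeout = 30000) => {
        const deadline = Date.now() + timeout;
        while (Date.now() < deadline) {
          const data = await fetch(`${BASE}/mailboxes/${mb.id}/mails`, { headers }).then(r => r.json());
          const items = data.items ?? [];
          if (items.length) return items[0];
          await new Promise(r => setTimeout(r, 2000));
        }
        throw new Error('Email timeout');
      },
    });
    // No teardown: the mailbox expires automatically.
  },
});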

With this pattern, parallel CI becomes stable by design, not by accident.


What to Use for the Inbox Service

You need a hosted service with:

  • Per-inbox API: create a new inbox on demand, get a unique address back
  • TTL control: inbox expires automatically (no cleanup logic needed)
  • Message polling API: read received emails programmatically
  • Reliable delivery: must actually receive emails from third-party senders
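The first three requirements can be written down as an interface your fixtures depend on, which also makes it easy to swap providers or use an in-memory fake. `InboxService`, `Mailbox`, and the method names are hypothetical; reliable delivery is a property of the service itself and cannot be expressed in a type.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Mailbox:
    id: str
    address: str  # unique per create_mailbox() call


@runtime_checkable
class InboxService(Protocol):
    """Hypothetical interface capturing the requirements listed above."""

    def create_mailbox(self, ttl_minutes: int) -> Mailbox:
        """Per-inbox API + TTL control: a new unique inbox that expires on its own."""
        ...

    def list_messages(self, mailbox_id: str) -> list[dict]:
        """Message polling API: messages received so far (possibly empty)."""
        ...
```

Note that `runtime_checkable` only checks that the methods exist, not their signatures.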

A few options:

| Service | Per-inbox API | TTL | Free tier | Notes |
|---|---|---|---|---|
| Mailinator | Limited | No | Yes | Known domains often blocked |
| Mailtrap | Yes | No | Yes (limited) | Better for SMTP testing |
| MinuteMail | Yes | Yes (1–60 min) | Yes (100 calls/day) | Built for this use case |

MinuteMail was built specifically for developer/QA use cases: each POST /mailboxes returns a unique address with configurable TTL. Full docs at https://docs.minutemail.co.
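If you prefer the standard library over `requests`, the POST /mailboxes call from the fixtures above can be built with `urllib.request`. This sketch assumes, as those fixtures do, that the body takes `domain` and `expiresIn`, and further assumes `expiresIn` is in minutes (matching the 1–60 min TTL in the table); check https://docs.minutemail.co for the actual contract. The function name is hypothetical.

```python
import json
import urllib.request

BASE = "https://api.minutemail.co/v1"


def build_create_mailbox_request(api_key: str, ttl_minutes: int = 10,
                                 domain: str = "minutemail.cc") -> urllib.request.Request:
    """Build (but do not send) a POST /mailboxes request.

    Assumption: expiresIn is in minutes, with the 1-60 range from the table.
    """
    if not 1 <= ttl_minutes <= 60:
        raise ValueError("ttl_minutes must be between 1 and 60")
    body = json.dumps({"domain": domain, "expiresIn": ttl_minutes}).encode()
    return urllib.request.Request(
        f"{BASE}/mailboxes",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then `json.load(urllib.request.urlopen(req, timeout=10))`, which returns the mailbox object the fixtures read `address` and `id` from.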


The Outcome

After switching to per-test inbox isolation:

  • Tests that were "retry to fix" become deterministic
  • Parallel CI works correctly without coordination between workers
  • You get confidence that the full email flow works end-to-end, not just that your code tried to send

The pattern requires ~20 lines of fixture code. The payoff is a test suite you can trust.


Tags: #testing #devops #automation #ci

Created: 2026-02-26
