I've been building with Claude Code for the past few months. It's genuinely changed how I work — I can ship features in hours that used to take days.
But I hit a wall I didn't expect.
Claude Code built me a complete authentication system. NextAuth, email verification, password reset, the works. Every line of code was correct. The tests passed. The PR looked great.
Then I tried to run the E2E tests in CI.
They failed. Every single one.
The problem
Claude Code can write the code that sends an email. It can write the test that checks if an email was sent. But it cannot actually receive that email, read it, and verify the link works.
Here's what the test looked like:
test('user can verify email after signup', async ({ page }) => {
  await page.goto('/signup');
  await page.fill('[name="email"]', 'test@example.com');
  await page.click('[type="submit"]');
  // ??? how do we get the verification email here?
  const verificationLink = ???;
  await page.goto(verificationLink);
  await expect(page).toHaveURL('/dashboard');
});
Claude Code filled in everything except the part that actually mattered.
It suggested three approaches:
- Mock the email service — "intercept the sendEmail call and return the link directly"
- Use MailHog — "run a local SMTP server to catch emails"
- Check the database — "query the verification_tokens table directly"
All three are wrong for the same reason: they don't test whether the email actually arrives.
Why mocking is lying
When you mock the email service, you're testing this:
Your app → [mock] → test passes
When you should be testing this:
Your app → email provider → real inbox → verification link → test passes
The difference matters. A mocked test keeps passing even when:
- Your Resend API key has expired
- Your email template has a broken link
- Your DNS is misconfigured and emails are going to spam
- Your email provider is down
- Your verification token is being generated incorrectly
I've been burned by all of these in production. A mocked test would have caught none of them.
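To make that concrete, here is a minimal sketch of the same signup step run against a mock and against a sender that fails the way an expired API key would. Every name in it (`EmailSender`, `signUp`, `createMockSender`, `expiredKeySender`) is hypothetical and made up for illustration; none of it is NextAuth's or Resend's API.

```typescript
// Hypothetical types and helpers for illustration only.
type EmailSender = (to: string, link: string) => void;

// Stand-in for a real provider call that rejects an expired API key.
const expiredKeySender: EmailSender = () => {
  throw new Error("401: API key expired");
};

// What a typical mock does: record the call and deliver nothing.
function createMockSender(): { send: EmailSender; sent: string[] } {
  const sent: string[] = [];
  return {
    send: (_to, link) => {
      sent.push(link);
    },
    sent,
  };
}

// Simplified signup step: build a verification link and hand it to the sender.
function signUp(email: string, send: EmailSender): string {
  const link = "https://app.example.com/verify?token=abc123";
  send(email, link);
  return link;
}

// Tiny harness: reports whether a test body ran without throwing.
function passes(run: () => void): boolean {
  try {
    run();
    return true;
  } catch {
    return false;
  }
}

const mock = createMockSender();
const mockedTestPasses = passes(() => signUp("test@example.com", mock.send));
const realPathPasses = passes(() => signUp("test@example.com", expiredKeySender));
```

The mocked run goes green because the mock never talks to a provider, while the run that exercises the failing sender goes red. Only the second result tells you anything about production.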
Why MailHog doesn't work in CI
Claude Code's second suggestion — MailHog — is the traditional answer. And it works locally.
In CI it's a different story:
services:
  mailhog:
    image: mailhog/mailhog
    ports:
      - 1025:1025
      - 8025:8025
This adds 15-30 seconds of cold start time to every CI run. Parallel tests share one inbox, which means race conditions. And MailHog hasn't been maintained since 2020 — the Docker image has known vulnerabilities.
More importantly: MailHog doesn't test real email delivery. It's a fake SMTP server that catches outbound emails before they leave your network. You're not testing whether your email actually reaches an inbox — you're testing whether your app can connect to a local port.
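The shared-inbox race is easy to reproduce in a few lines. This is a simplified in-memory simulation, not MailHog's API: `sharedInbox`, `deliver`, `latestMessage`, and `latestFor` are hypothetical names, and the addresses are placeholders.

```typescript
type Message = { to: string; body: string };

// One mailbox shared by every test, the way a single catch-all SMTP server behaves.
const sharedInbox: Message[] = [];

function deliver(to: string, body: string): void {
  sharedInbox.push({ to, body });
}

// Naive lookup: "whatever arrived last", regardless of recipient.
function latestMessage(): Message | undefined {
  return sharedInbox[sharedInbox.length - 1];
}

// Safer lookup: the latest message addressed to this test's own recipient.
function latestFor(recipient: string): Message | undefined {
  for (let i = sharedInbox.length - 1; i >= 0; i--) {
    const msg = sharedInbox[i];
    if (msg && msg.to === recipient) return msg;
  }
  return undefined;
}

// Two parallel tests sign up at nearly the same time.
deliver("worker-a@test.local", "verify link for A");
deliver("worker-b@test.local", "verify link for B");

// Worker A asks for "the latest email" and gets worker B's message.
const naiveForA = latestMessage();

// Filtering by a unique recipient returns the right one.
const filteredForA = latestFor("worker-a@test.local");
```

Giving each test a unique recipient and filtering on it avoids the collision, but you still have to write and maintain that filtering yourself, and it still says nothing about real delivery.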
The actual solution
The problem I needed to solve was simple: I need a real email address that my CI pipeline can read programmatically.
I built ZeroDrop for this.
import { test, expect } from '@playwright/test';
import { ZeroDrop } from 'zerodrop-client';

const mail = new ZeroDrop();

test('user can verify email after signup', async ({ page }) => {
  // Use the inbox provided by CI, or generate a real, isolated one for this test
  const inbox = process.env.TEST_INBOX ?? mail.generateInbox();
  // → "swift-x7k2m@zerodrop-sandbox.online"

  await page.goto('/signup');
  await page.fill('[name="email"]', inbox);
  await page.click('[type="submit"]');

  // Wait for the real email to arrive (SSE delivery, sub-second)
  const email = await mail.waitForLatest(inbox, { timeout: 30000 });

  // Magic link is auto-extracted, no regex needed
  expect(email.magicLink).not.toBeNull();
  await page.goto(email.magicLink!);
  await expect(page).toHaveURL('/dashboard');
});
This test:
- Sends a real email through your actual email provider (Resend, SendGrid, Postmark)
- Catches it in a real inbox
- Extracts the verification link automatically
- Navigates to it and verifies the flow works end-to-end
If your API key expires — the test fails. If your email template has a broken link — the test fails. If your DNS is misconfigured — the test fails. That's the point.
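For a sense of what "auto-extracted" saves you, here is roughly what pulling the link out of an HTML body by hand could look like. This is my own sketch, not ZeroDrop's implementation: `extractVerificationLink` is a hypothetical helper, and it assumes the link is a double-quoted `href` whose URL contains `/verify`.

```typescript
// Hypothetical helper: returns the first href containing "/verify", or null.
function extractVerificationLink(html: string): string | null {
  const hrefPattern = /href="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    const url = match[1];
    if (url !== undefined && url.indexOf("/verify") !== -1) {
      // HTML bodies escape "&" in attributes, so undo that before navigating.
      return url.replace(/&amp;/g, "&");
    }
  }
  return null;
}

// Placeholder email body with one unrelated link and one verification link.
const sampleBody =
  '<p>Welcome!</p><a href="https://app.example.com/help">Help</a>' +
  '<a href="https://app.example.com/verify?token=abc123&amp;next=%2Fdashboard">Verify email</a>';

const extractedLink = extractVerificationLink(sampleBody);
const missingLink = extractVerificationLink("<p>No links here</p>");
```

A hand-rolled version like this breaks as soon as the template switches to single quotes or a different path, which is exactly the kind of upkeep I wanted out of my test suite.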
The GitHub Actions setup
- name: Generate test inbox
  id: inbox
  uses: zerodrop-dev/create-inbox@8706a59 # v1.0.0
- name: Run E2E tests
  run: npx playwright test
  env:
    TEST_INBOX: ${{ steps.inbox.outputs.inbox }}
    RESEND_API_KEY: ${{ secrets.RESEND_API_KEY }}
Each CI run gets a fresh, isolated inbox. Parallel jobs don't share state. No Docker. No cold start. No MailHog.
What this means for AI-built apps
Claude Code is remarkably good at building auth systems. The code it generates is correct, well-structured, and follows best practices. But it has a blind spot: it can't verify that the system actually works end-to-end because it can't interact with the physical world.
Email delivery is part of the physical world. Your email provider, your DNS configuration, your email templates — none of these exist in Claude Code's context window. It can write code that calls the API. It can't verify the email arrives.
This is the infrastructure gap that AI coding tools expose: as agents ship code faster, the verification layer becomes the bottleneck.
ZeroDrop is one piece of that layer — the email piece. The test that Claude Code couldn't write, ZeroDrop makes possible.
The broader pattern
Every auth-enabled app built with an AI coding tool will eventually hit this wall:
- Cursor builds the auth flow → tests mock the email → CI is green → production breaks
- Claude Code builds the password reset → MailHog catches it locally → the MailHog container fails in CI → nobody notices
- Devin ships the signup flow → email verification is skipped in tests → token expiry bugs reach users
The pattern is the same: AI builds fast, verification infrastructure hasn't caught up.
The fix isn't to build slower. It's to give your AI agent the infrastructure layer it's missing.
Try it
npm install zerodrop-client
Free tier. No signup. No API key. No Docker.
The test Claude Code couldn't finish — you can finish it in 5 minutes.