kate astrid

Posted on May 19

How We Halved Our Playwright E2E Suite

#playwright #e2e

Five patterns that took a slow, flaky Playwright suite and made it fast and stable.

We had a Playwright E2E suite that had grown into a painful, flaky part of CI.

Dozens of login/logout cycles per file. Fixed waitForTimeout calls everywhere. Hundreds of UI clicks just to set up state before each assertion.

Every other run, something would fail. We'd retry, hope for green, and move on.

Eventually, we decided to pay down that debt.

The Before / After

Metric	Before	After
Tests	95	88
Runtime (clean)	~37 min	~17 min
CI reliability	~50% flaky runs	Stable with normal retries

Across the broader suite, a similar pass:

dropped about a dozen low-value tests,
replaced one E2E test with a focused unit suite that runs in 1.2 seconds instead of 30,
and stripped fixed waits from the most-trafficked files.

Five patterns did most of the work.

Pattern 1: Programmatic Auth

With 40+ login/logout cycles in a file, that's two minutes of click ceremony before you've asserted anything useful.

Most modern auth setups support a programmatic path:

Sign in via the auth provider's REST endpoint.
Write the resulting token to localStorage (or wherever your client SDK reads from).
Skip the form entirely.

Mirror it for logout: clear localStorage, sessionStorage, and cookies, then navigate to /login.

That last navigation is the step you'll be tempted to skip on a first attempt, and you'll regret it.

Without it, the page is still on /dashboard/something after storage is cleared, so the SPA detects "no session" and starts its own redirect to /login.

The next loginAs then races that redirect, and you get sporadic "navbar not visible" errors that look like random flakes but are actually a real race condition in the test.
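The login steps and the logout mirror described above can be sketched as follows. This is a minimal sketch, not our exact helper: the sign-in URL, the access_token response field, and the auth-token storage key are hypothetical, and PageLike is a hand-written interface covering only the Page members the sketch calls, so it runs without a browser. In a real suite you would type the parameter as Playwright's Page.

```typescript
// PageLike mirrors only the Playwright Page members used below (hypothetical stand-in).
interface ApiResponseLike {
  ok(): boolean;
  status(): number;
  json(): Promise<any>;
}

interface PageLike {
  request: { post(url: string, options: { data: unknown }): Promise<ApiResponseLike> };
  context(): { clearCookies(): Promise<void> };
  evaluate(expression: string): Promise<unknown>;
  goto(url: string): Promise<unknown>;
}

// Hypothetical values: use your auth provider's sign-in endpoint and the
// storage key your client SDK actually reads.
const SIGN_IN_URL = '/api/auth/sign-in';
const TOKEN_KEY = 'auth-token';

async function loginAs(page: PageLike, creds: { email: string; password: string }): Promise<void> {
  // 1. Sign in over REST instead of filling in the login form.
  const resp = await page.request.post(SIGN_IN_URL, { data: creds });
  if (!resp.ok()) throw new Error(`sign-in failed: HTTP ${resp.status()}`);
  const { access_token } = await resp.json(); // field name is an assumption
  if (typeof access_token !== 'string') throw new Error('sign-in response had no access_token');

  // 2. localStorage is per origin, so load an app page before writing to it.
  await page.goto('/login');
  await page.evaluate(
    `localStorage.setItem(${JSON.stringify(TOKEN_KEY)}, ${JSON.stringify(access_token)})`,
  );

  // 3. Skip the form: go straight to an authenticated route.
  await page.goto('/dashboard');
}

async function logout(page: PageLike): Promise<void> {
  await page.evaluate('localStorage.clear(), sessionStorage.clear()');
  await page.context().clearCookies();
  // Navigate explicitly so the SPA's own "no session" redirect cannot race the next loginAs.
  await page.goto('/login');
}
```

Relative URLs like these assume baseURL is set in playwright.config. The explicit goto('/login') at the end of logout is the step that removes the redirect race described above.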

Pattern 2: API-Driven Setup

This was the single biggest win for us.

Most of what tests do to set up state is already callable via REST — those endpoints exist because the app uses them.

There's no reason to drive the UI to:

add a user to a group,
configure a setting,
assign a role,
or seed fixtures

when you could just POST directly.

Before: six UI steps for setup:

await page.goto('/admin/groups');
await page.getByRole('row', { name: /Test Group/ }).click();
await page.getByRole('button', { name: 'Add Member' }).click();
await page.getByPlaceholder('Search users').fill(TEST_USER_EMAIL);
await page.getByRole('option', { name: TEST_USER_EMAIL }).click();
await page.getByRole('button', { name: 'Add' }).click();
// ... wait for it to persist, navigate back, etc.

After: one API call:

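import { expect, type Page } from '@playwright/test';

// GROUP_ID, TEST_USER_ID and getAuthHeaders() are suite-level helpers defined elsewhere.
// The relative URL resolves against baseURL in playwright.config.
async function ensureUserInGroup(page: Page) {
  const headers = await getAuthHeaders(page);

  const resp = await page.request.post(
    `/api/groups/${GROUP_ID}/members/add`,
    {
      headers: {
        ...headers,
        'Content-Type': 'application/json',
      },
      data: {
        user_ids: [TEST_USER_ID],
      },
    }
  );

  // Fail fast if setup didn't take, instead of failing later in a confusing way.
  await expect(resp).toBeOK();
}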

Test setup went from ~25–30 seconds to ~3 seconds.
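The setup code above calls getAuthHeaders without showing it. One possible shape is sketched below, assuming the client SDK keeps its access token in localStorage under a key (hypothetical here: auth-token) and that the API accepts a Bearer token. TokenSource is a stand-in for the one Page member used, so the sketch runs without a browser.

```typescript
// Stand-in for the single Page member this helper needs (page.evaluate).
interface TokenSource {
  evaluate(expression: string): Promise<unknown>;
}

// Hypothetical: the key your client SDK stores its access token under.
const TOKEN_KEY = 'auth-token';

async function getAuthHeaders(page: TokenSource): Promise<Record<string, string>> {
  // Reuse the token the app itself is using, so API setup runs as the same user.
  const token = await page.evaluate(`localStorage.getItem(${JSON.stringify(TOKEN_KEY)})`);
  if (typeof token !== 'string' || token === '') {
    throw new Error(`no access token under "${TOKEN_KEY}"; did loginAs run first?`);
  }
  // Assumption: the API authenticates with a standard Bearer token.
  return { Authorization: `Bearer ${token}` };
}
```

Throwing on a missing token keeps a skipped login from surfacing later as a confusing 401 in the middle of setup.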

The Tradeoff

API-based setup means a test can pass even if the UI setup flow itself is broken.

We accepted that tradeoff.

We kept dedicated tests for the setup flows themselves, and the rest of the suite became free to focus on the behavior it actually existed to verify.

The principle:

Exercise the UI for what you're actually testing.

A test asserting:

“The frontend hides button X for a viewer”

does not need to configure viewer permissions through the UI too.

Pattern 3: State Assertions, Not Fixed Waits

Hardcoded page.waitForTimeout(1500) calls were everywhere — usually under comments like:

// wait for it to commit

On a fast CI day, harmless.

On a slow CI day, flaky.

Instead of:

await page.waitForTimeout(1500);

assert on the actual signal:

await expect(page.getByRole('alert', { name: 'Saved' })).toBeVisible();

The result is:

faster on the happy path,
more reliable under load,
and easier to reason about.

When the signal comes from another page or depends on eventual backend consistency, wrap the check in an async block and let expect(async () => { ... }).toPass() retry it.
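To make the retry semantics concrete, here is a standalone sketch of the retry-until-pass idea behind toPass(). It is our own illustration, not Playwright's implementation; the name retryUntilPasses and its handling of the intervals list (reusing the last value once the list runs out) are assumptions of this sketch.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-run `check` until it stops throwing, waiting between probes.
// Resolves with the number of probes it took; rethrows the last failure on timeout.
async function retryUntilPasses(
  check: () => Promise<void> | void,
  opts: { timeout: number; intervals: number[] },
): Promise<number> {
  const deadline = Date.now() + opts.timeout;
  for (let attempt = 1; ; attempt++) {
    try {
      await check();
      return attempt;
    } catch (err) {
      // Pick the wait for this attempt; reuse the last interval once the list runs out.
      const wait = opts.intervals[Math.min(attempt - 1, opts.intervals.length - 1)] ?? 0;
      // Give up if the next probe would land past the deadline.
      if (Date.now() + wait > deadline) throw err;
      await sleep(wait);
    }
  }
}
```

The key property is that a passing check returns as soon as the state is right, while a fixed waitForTimeout always pays the full delay. In a real test you would use Playwright's built-in form rather than this helper.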

Pattern 4: For “Was This Persisted?” — Ask the API, Not the DOM

This was the subtlest bug in the suite, and we learned about it the hard way, after a CI regression we couldn't reproduce locally.

If your frontend uses an optimistic-update library like:

RTK Query
React Query
SWR

then the DOM can temporarily lie about persistence.

A row hidden by an optimistic delete looks identical to a row that was actually deleted — until the cache refetches in the background and the row pops back into existence.

A toHaveCount(0) assertion succeeds against the optimistic state, then the next assertion in the test finds the row again.
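The failure mode above can be reduced to a toy model. This is a hypothetical illustration, not how RTK Query, React Query, or SWR work internally: a cache that hides a row before the backend confirms the delete, and a refetch that restores whatever the backend really holds.

```typescript
// Toy model (hypothetical) of an optimistic delete racing a background refetch.
interface Item {
  id: number;
  name: string;
}

class FakeBackend {
  items: Item[];
  constructor(items: Item[]) {
    this.items = items.slice();
  }
  list(): Item[] {
    return this.items.slice();
  }
  // committed=false stands in for a delete that has not landed yet (or was rolled back).
  remove(id: number, committed: boolean): void {
    if (committed) this.items = this.items.filter((i) => i.id !== id);
  }
}

class OptimisticCache {
  backend: FakeBackend;
  rendered: Item[];
  constructor(backend: FakeBackend) {
    this.backend = backend;
    this.rendered = backend.list();
  }
  // The UI hides the row immediately, before the backend confirms anything.
  optimisticDelete(id: number, committed: boolean): void {
    this.rendered = this.rendered.filter((i) => i.id !== id);
    this.backend.remove(id, committed);
  }
  // A background refetch replaces the cache with what the backend actually has.
  refetch(): void {
    this.rendered = this.backend.list();
  }
}

const backend = new FakeBackend([{ id: 1, name: 'report-a' }]);
const cache = new OptimisticCache(backend);
cache.optimisticDelete(1, false);
console.log(cache.rendered.length); // 0: a DOM-level toHaveCount(0) would pass here
console.log(backend.list().length); // 1: the API still returns the row
cache.refetch();
console.log(cache.rendered.length); // 1: the row "pops back"
```

Between optimisticDelete and refetch, only the backend answers the persistence question correctly.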

When you're asserting that persisted state changed, poll the backend until the change is actually committed:

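await expect(async () => {
  const resp = await page.request.get(
    '/api/.../items',
    { headers }
  );
  await expect(resp).toBeOK();

  const items: Array<{ name: string }> = await resp.json();

  // The item must be gone from the backend, not just from the rendered list.
  expect(
    items.some((i) => i.name === itemName)
  ).toBe(false);
}).toPass({
  // Keep the surrounding test's timeout (30s by default) above this, e.g. via test.setTimeout().
  timeout: 30000,
  intervals: [1000, 2000, 3000],
});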

Same idea for permission revocation:

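await expect(async () => {
  const resp = await page.request.get(
    '/api/.../resource',
    { headers }
  );

  // Expect a client error (e.g. 401/403), not a 5xx that merely looks like "denied".
  expect(resp.status()).toBeGreaterThanOrEqual(400);
  expect(resp.status()).toBeLessThan(500);
}).toPass({
  // 90s exceeds Playwright's default 30s test timeout; raise it with test.setTimeout().
  timeout: 90000,
});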

The distinction matters:

The DOM answers:

“What is rendered right now?”

The API answers:

“Did this actually commit?”

Those are not always the same question.

Pattern 5: Audit for Redundancy

After the speedups, the suite was still larger than it needed to be.

On honest review, three categories of redundancy stood out.

Symmetric Mirror Tests

Examples:

“User has elevated role + base permission”
“User has base role + elevated permission”

What we see here:

Different setup states.
Identical observable assertions.
Same backend path.

One test covered the public contract sufficiently.
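Spotting mirror pairs by hand is the real audit, but the shape described above (different setup, same backend path, same assertions) can be expressed as a grouping key. The sketch below is a hypothetical helper: TestRecord is an assumed, hand-maintained description of each test, not something Playwright produces.

```typescript
// Hypothetical record describing one test; maintained by hand or from your own annotations.
interface TestRecord {
  title: string;
  backendPath: string; // endpoint the behavior under test goes through
  assertions: string[]; // observable assertions, normalized to strings
}

// Group tests whose backend path and assertions match. Setup is deliberately left
// out of the key: different setup, same path, same assertions is the mirror shape.
function findMirrorCandidates(tests: TestRecord[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const t of tests) {
    const key = `${t.backendPath}|${t.assertions.slice().sort().join(';')}`;
    const titles = groups.get(key) ?? [];
    titles.push(t.title);
    groups.set(key, titles);
  }
  return Array.from(groups.values()).filter((titles) => titles.length > 1);
}
```

A match is a candidate for review, not an automatic deletion: someone still has to confirm that the setup difference isn't itself part of the contract.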

Cleanup Tests Duplicating Hooks

We had tests whose sole purpose was deleting leftover fixtures.

But the suite already had afterAll cleanup hooks with retry safety.

The cleanup “tests” added runtime, not coverage.
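What makes a cleanup hook safe to rely on is that it is idempotent and retries. A minimal sketch, assuming the delete call resolves to an HTTP status and that 404 means the fixture is already gone; deleteFixtureSafely is a hypothetical name.

```typescript
// Retry-safe, idempotent cleanup step (sketch). `del` performs one DELETE and
// resolves to its HTTP status; 404 is treated as "already gone" (assumption).
async function deleteFixtureSafely(del: () => Promise<number>, attempts = 3): Promise<void> {
  let last = 'no attempts made';
  for (let i = 1; i <= attempts; i++) {
    try {
      const status = await del();
      // Deleted now, or already deleted earlier: both count as success.
      if ((status >= 200 && status < 300) || status === 404) return;
      last = `HTTP ${status}`;
    } catch (err) {
      last = String(err); // network error: try again
    }
  }
  throw new Error(`cleanup failed after ${attempts} attempts (last: ${last})`);
}
```

In a Playwright suite this might be called from test.afterAll(async ({ request }) => ...) with a callback like async () => (await request.delete(fixtureUrl)).status(), where fixtureUrl is whatever your API uses. With cleanup handled this way, a separate "delete leftovers" test has nothing left to verify.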

Generic Component Smoke Tests

Several tests asserted things like:

the list loads,
pagination works,
columns render.

But those shared components were already exercised by nearly every feature test in the suite.

The smoke tests weren't paying for themselves.

The audit removed roughly a third of tests with no meaningful coverage loss.

A useful question turned out to be:

“Is this test distinct?”

not merely:

“Does this test pass?”

What We Didn't Do

Some dead ends are worth documenting too.

Parallelizing Tests That Shared Accounts

Concurrent logins for the same backend user triggered session revocation cascades across contexts.

We reverted it.

Reusing Auth Sessions Across Browser Contexts

Several client SDKs rejected rehydrated sessions across multiple contexts.

Each loginAs() now performs a fresh REST sign-in.

The overhead (~200ms) was small enough that determinism mattered more.

Using Short Retry Budgets for Permission Assertions

Eventually consistent systems can take longer than expected, especially during deploys.

A 15-second retry window was not enough.

We replaced short retries with explicit API polling patterns.

Takeaways

If you're using the UI as a setup harness, you're paying twice.
Use APIs to create state and reserve the UI for the behavior you actually care about.
Replace waitForTimeout() with state assertions wherever possible.
Fixed waits are bets against your worst CI day.
For “was this persisted?” — ask the API, not the DOM.
The DOM can be optimistic, cached, or mid-render.
Programmatic auth compounds across every test.
The savings stack up quickly, and you eliminate subtle redirect races.
Audit your suite for true redundancy.
“All passing” is not the same as “all paying for themselves.”

There’s no silver bullet here — just a stack of small, principled changes that compound.

Fast E2E suites usually aren't the result of one big optimization.

They're the result of removing dozens of tiny sources of unnecessary work.

DEV Community