We Cut Multi-Browser Regression Testing from 2 Hours to 15 Minutes by Doing Just One Thing

#python #programming

At 1 AM, the testing group chat exploded: “The homepage is blank on Safari, and user complaints just hit triple digits.” I opened my laptop, scanned the production error logs, and found a CSS Grid compatibility issue from last week’s changes. Our regression scripts had never even covered Safari. After three months of manually running cross-browser tests, we were still relying on human eyes to catch the misses. That was the moment I realized: a team that relies on manual effort to patch browser coverage has no business calling itself agile.

Breaking down the problem

Our scenario is pretty typical: a frontend-heavy B2B SaaS that needs to validate core flows (login, ordering, reports) across Chrome, Firefox, and Safari before every release. The old way looked like this:

Testers recorded scripts with Selenium IDE, exported them to Java/Python, and manually assembled suites for different browser profiles.
Regressions ran sequentially because our in-house Selenium Grid had limited resources—three browsers took almost two hours.
Maintenance was even worse: every UI tweak broke half the locators, so people preferred re-recording from scratch over reading the auto-generated XPath soup.

The root cause wasn’t that “testers can’t write code”; it was that the toolchain pitted auto-generation against maintainability. Selenium IDE produced barely readable scripts with flaky built-in waits—reruns would inevitably throw NoSuchElement. On top of that, parallel cross-browser execution demanded manual job splitting and artifact assembly, so most teams just gave up and lived with sequential runs.

Designing the solution

We needed a new workflow: Record → Refactor → CI-native parallel, in which no step depends on brittle details.

For the tech stack, I compared three options and dropped Selenium and Cypress altogether:

Selenium: Browser driver management was too primitive—running Safari meant manually setting up safaridriver, and writing explicit waits drove us nuts. Auto-generation depended on the IDE, and the resulting scripts cost almost as much to maintain as a full rewrite.
Cypress: Great developer experience, but its WebKit (Safari) support is still labeled experimental. Cross-browser parallelism is also heavy: without the Dashboard (now Cypress Cloud), you can’t easily run parallel jobs.
Playwright: First, it natively supports Chromium, Firefox, and WebKit (the engine Safari is built on). Second, the codegen command records user actions and generates readable async/await code. Most importantly, it ships with auto-waiting, a trace viewer, and a test runner, covering the entire loop from recording to regression.

Architecturally, we did this: use playwright codegen to capture core user flows as base scripts; spend 20 minutes on a “structural wrapper” (extract Page Objects and replace locators with data-testid); then, in GitLab CI, use parallel:matrix to spawn one job per browser—each job pulls its own container, runs independently, and auto-uploads traces on failure.
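To make the per-browser split concrete, here is a minimal sketch of a Playwright config that defines one project per engine, so that each CI matrix job can select exactly one. The file layout, the testDir value, and the choice of 'retain-on-failure' for traces are assumptions for illustration, not the exact setup described above.

```javascript
// playwright.config.js (illustrative sketch; values are assumptions)
const config = {
  testDir: './tests',
  use: {
    // Keep a trace only for failed tests so CI can upload it as an artifact.
    trace: 'retain-on-failure',
  },
  // One project per browser engine; a CI job picks one with --project.
  projects: [
    { name: 'chromium', use: { browserName: 'chromium' } },
    { name: 'firefox', use: { browserName: 'firefox' } },
    { name: 'webkit', use: { browserName: 'webkit' } },
  ],
};

module.exports = config;
```

With a config like this, each matrix job would run something like npx playwright test --project=webkit, with the project name supplied by whatever matrix variable the pipeline defines. Traces for failed tests land in Playwright's output directory (test-results by default), which the job can then publish as artifacts.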

Why not run the recorded results directly in CI? Because the locators generated at the time of recording are bound to the temporary page structure (text, CSS hierarchy) and lack resilience. We treat recordings as drafts only. The structural wrap that follows determines whether the suite lives for three months or three days.
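As a rough illustration of treating a recording as a draft, the sketch below swaps recorded selectors for data-testid selectors in generated source text. The helper name hardenSelectors and the mapping are hypothetical; the test ids mirror the ones used in the fixture code later in this post, and a real refactor would usually be done by hand or with a proper codemod.

```javascript
// Hypothetical mapping from recorded (draft) selectors to the data-testid
// contract agreed with developers.
const SELECTOR_MAP = {
  'input[name="email"]': '[data-testid="email-input"]',
  'input[name="password"]': '[data-testid="password-input"]',
  'button:has-text("Sign in")': '[data-testid="login-button"]',
};

// Replace every occurrence of a draft selector in the recorded source text.
// Selectors that are not in the map are left untouched.
function hardenSelectors(source, map = SELECTOR_MAP) {
  let result = source;
  for (const [draft, stable] of Object.entries(map)) {
    result = result.split(draft).join(stable);
  }
  return result;
}

const recorded = `await page.fill('input[name="email"]', 'test@example.com');`;
console.log(hardenSelectors(recorded));
```

The point of the sketch is the direction of the dependency: the test suite depends on an explicit contract (the test ids), not on whatever text or CSS hierarchy the page happened to have on recording day.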

Core implementation

1. Recording: Generate the script skeleton in one shot

The command below launches Chromium locally. After you click through the critical path on the page, it generates a full test file.

npx playwright codegen --target=playwright-test --output=./tests/login.spec.js https://staging.example.com

Playwright generates code similar to this:

const { test, expect } = require('@playwright/test');

test('login-flow', async ({ page }) => {
  await page.goto('https://staging.example.com');
  await page.fill('input[name="email"]', 'test@example.com');
  await page.fill('input[name="password"]', 'password');
  await page.click('button:has-text("Sign in")');
  await expect(page.locator('h1')).toContainText('Dashboard');
});

What does this code solve? It turns manual clicks into a repeatable script. You don’t write a single locator by hand, and in half an hour you can cover over a dozen core flows.

2. Engineering the wrapper: Make scripts survive two releases

The recorded input[name="email"] is too fragile—if the backend changes the name attribute, it’s instantly broken. We quickly refactored the scripts into a fixture + Page Object pattern and enforced that developers add data-testid on critical elements (this contract saved us).

// fixtures.js: extract the browser/page setup logic so the CI matrix can reuse it
const { test: base } = require('@playwright/test');

exports.test = base.extend({
  // baseURL is read from an environment variable by default, so CI can switch targets
  baseURL: async ({}, use) => {
    await use(process.env.BASE_URL || 'https://staging.example.com');
  },
  // Provides an already-authenticated page to any test that requests loggedPage
  loggedPage: async ({ page }, use) => {
    await page.goto('/');
    await page.fill('[data-testid="email-input"]', process.env.TEST_USER);
    await page.fill('[data-testid="password-input"]', process.env.TEST_PASS);
    await page.click('[data-testid="login-button"]');
    await page.waitForSelector('[data-testid="dashboard-header"]');
    await use(page);
  },
});

// login.spec.js: the refactored test case, clear and decoupled
const { test, expect } = require('./fixtures');

test('successful login', async
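For the Page Object side of this pattern, a minimal sketch might look like the following. The file name, class name, and method name are illustrative assumptions; the test ids are the ones used in the fixture, and the code assumes a Playwright version that provides page.getByTestId.

```javascript
// pages/login.page.js (hypothetical file name)
class LoginPage {
  constructor(page) {
    this.page = page;
    // Locators are resolved lazily, so creating them here is cheap.
    this.email = page.getByTestId('email-input');
    this.password = page.getByTestId('password-input');
    this.submit = page.getByTestId('login-button');
    this.dashboardHeader = page.getByTestId('dashboard-header');
  }

  // Log in and wait until the dashboard header is visible.
  async login(user, pass) {
    await this.page.goto('/'); // relative to the configured baseURL
    await this.email.fill(user);
    await this.password.fill(pass);
    await this.submit.click();
    await this.dashboardHeader.waitFor();
  }
}

module.exports = { LoginPage };
```

A test or fixture would then call something like new LoginPage(page).login(process.env.TEST_USER, process.env.TEST_PASS), which keeps every login-related locator in one file when the UI changes.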

DEV Community