
Tudor Brad

Posted on • Originally published at betterqa.co

The automation mistakes we keep fixing on inherited test suites

I have inherited a lot of test suites. Some were built by contractors. Some were built by developers who drew the short straw. A few were started by QA engineers who left the company before anyone else learned how the framework worked.

They all break in the same ways.

At BetterQA, automation suite maintenance is a significant chunk of our work. We build suites from scratch, yes, but we also take over existing ones. And after years of doing this across dozens of clients and tech stacks, I can tell you the failure modes are remarkably consistent.

Here are the mistakes I keep seeing, what they actually cost, and how we fix them.

Hardcoded waits everywhere

This is the single most common problem. Open any inherited suite and you will find sleep(5000) or cy.wait(5000) or time.sleep(5) scattered through the code like confetti.

I understand why it happens. A test is flaky. The page takes a moment to load. Someone adds a wait, the test passes, the PR gets merged. Problem solved, right?

No. Problem deferred.

Here is what hardcoded waits actually cost you:

They make your suite slow. A 5-second wait runs for 5 seconds whether the element appeared in 200 milliseconds or 4.9 seconds. Multiply that across 300 tests and you have added 25 minutes of pure wasted time to every CI run. That is 25 minutes your developers sit waiting for a green check before they can merge.

They mask real problems. If your app genuinely takes 5 seconds to render a button, that is a performance bug. A hardcoded wait hides that bug. An explicit wait with a reasonable timeout surfaces it.

They are still flaky. The app loads in 5 seconds on your machine. On the CI runner with limited resources, it takes 7 seconds. Now the test fails again and someone bumps the wait to 10.

The fix is straightforward but requires discipline. Replace every hardcoded wait with an explicit condition: wait for the element to be visible, wait for the network request to complete, wait for the loading spinner to disappear. Playwright and Cypress both have built-in mechanisms for this. Use them.

```javascript
// This is the problem
await page.waitForTimeout(5000);
await page.click('#submit');

// This is the fix
await page.waitForSelector('#submit', { state: 'visible' });
await page.click('#submit');
```

When we take over a suite, the first thing we do is search for sleep, wait, and timeout calls. Replacing those alone typically cuts suite runtime by 30-40%.

No page object pattern (or an abandoned one)

The second most common problem is raw selectors duplicated across dozens of test files. The login page selector #email-input appears in 40 different tests. The dashboard navigation selector .nav-item.active shows up in 60.

Then the frontend team renames a CSS class and 60 tests break simultaneously.

The page object pattern exists specifically to solve this. You define your selectors in one place, your tests reference the page object, and when the UI changes you update one file instead of 60.
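A minimal sketch of the pattern, modeled on the Playwright page API (the class name and selector values are illustrative):

```javascript
// LoginPage.js — selectors live in one place; tests call methods, never selectors.
class LoginPage {
  constructor(page) {
    this.page = page;
    this.emailInput = '#email-input';
    this.passwordInput = '#password-input';
    this.submitButton = '#login-submit';
  }

  async login(email, password) {
    await this.page.fill(this.emailInput, email);
    await this.page.fill(this.passwordInput, password);
    await this.page.click(this.submitButton);
  }
}

module.exports = { LoginPage };
```

When the frontend renames #email-input, you change one line here and every test that logs in keeps working.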

What I see more often than no page objects at all is an abandoned page object pattern. Someone started it, created page objects for the login page and maybe the dashboard, and then the team got busy and started writing selectors inline again. Now you have a codebase with two patterns, and you have to check both places when something breaks.

If you are going to use page objects, commit to them. Every new test file should use them. If you are reviewing a PR that introduces a raw selector for a page that already has a page object, send it back.

We have also started using Flows, our Chrome extension that records browser interactions and generates self-healing test selectors. The self-healing part matters because it addresses the brittle selector problem directly: if your selector breaks because someone changed a class name, Flows detects the shift and adapts. That removes the most painful part of page object maintenance, which is keeping selectors current when the frontend moves fast.

Testing implementation details instead of behavior

This one is subtle and I still catch experienced engineers doing it.

A test that checks expect(component.state.isLoading).toBe(false) is testing implementation. A test that checks expect(screen.getByText('Dashboard')).toBeVisible() is testing behavior.

Why does the distinction matter? Because implementation changes constantly. Someone refactors the loading state from a boolean to an enum. Someone moves from local state to a global store. Someone replaces the custom spinner with a library component. Every one of those changes breaks the implementation test while the actual user-facing behavior stays identical.

Tests should answer one question: does the user see what they expect to see?

When I audit a suite, I look for tests that reference internal state, internal method names, or specific DOM structure beyond what the user actually sees. Those tests are maintenance liabilities. They will break during refactors that change zero user-facing behavior, and every false failure erodes the team's trust in the suite.

Write your test assertions the way a user would describe the expected result. "I click submit and I see a confirmation message." Not "I click submit and the Redux store's formSubmission.status field equals SUCCESS."
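To make the distinction concrete, here is a toy example outside any framework (the component and its states are invented for illustration):

```javascript
// The same user-visible output can come from different internals.
function renderV1(state) {
  return state.isLoading ? 'Loading…' : 'Dashboard'; // boolean loading flag
}

function renderV2(state) {
  return state.status === 'loading' ? 'Loading…' : 'Dashboard'; // refactored to an enum
}

// An implementation test like expect(state.isLoading).toBe(false) passes
// against V1 and breaks against V2, even though users see no difference.
// A behavior test only checks the rendered output, so it survives the refactor:
function behaviorTestPasses(render, state) {
  return render(state) === 'Dashboard';
}

module.exports = { renderV1, renderV2, behaviorTestPasses };
```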

No cleanup between tests

Tests should be independent. Each test should set up its own preconditions and clean up after itself. This is testing 101 and it is violated constantly.

The symptom is test order dependence. Test A creates a user, Test B assumes that user exists, Test C deletes the user. Run them in order and everything passes. Run Test B alone and it fails. Run them in parallel and you get race conditions.

I once inherited a suite where the entire test run depended on the first test creating a specific database seed. If that first test failed for any reason, every subsequent test failed too. The team had been living with this for a year, re-running the suite whenever the first test had a hiccup, and treating it as normal.

That is not normal. That is a test suite that can only give you useful signal when conditions are perfect, which in CI environments is roughly never.

The fix involves two things:

Before-each hooks for setup. Every test (or test group) should create the data it needs. If test B needs a user, test B creates that user in a beforeEach block.

After-each hooks for teardown. Delete what you created. Reset the state. Log out the session. If you are using an API to create test data (which you should be for speed), use that same API to clean up.

```javascript
// Each test owns its own data
beforeEach(async () => {
  testUser = await api.createUser({
    email: `test-${Date.now()}@example.com`
  });
});

afterEach(async () => {
  if (testUser) {
    await api.deleteUser(testUser.id);
  }
});
```

This adds a few seconds of setup per test but it eliminates an entire category of flakiness. The tradeoff is worth it every time.

Running everything sequentially when tests could run in parallel

Most test suites I inherit run every test in sequence. 400 tests, one after another, 45 minutes total. The team complains about slow CI. Nobody has tried parallelization.

If your tests are independent (and after fixing the cleanup problem above, they should be), there is no reason they cannot run in parallel. Playwright supports parallel execution out of the box. Cypress has parallelization through their dashboard or through CI matrix strategies. Even pytest can parallelize with pytest-xdist.
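For Playwright, turning this on is a small config change. A minimal sketch (the worker count is an assumption to tune for your CI machine):

```javascript
// playwright.config.js — minimal parallel setup sketch
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  // Run tests within each file in parallel too, not just across files.
  fullyParallel: true,
  // Fixed worker count on CI; locally Playwright defaults to half the CPU cores.
  workers: process.env.CI ? 6 : undefined,
});
```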

The objections I hear are usually:

"Our tests share a database." Then give each parallel worker its own database, or use unique prefixes per worker so the data does not collide.

"Some tests are slow and some are fast, so parallelization does not help much." Use test sharding based on historical run times, not naive splitting by file.

"We tried it and got flaky results." That means you have test isolation problems (see the cleanup section above). Fixing isolation fixes parallelization.

On a recent client project we took a suite from 52 minutes sequential to 11 minutes across 6 parallel workers. Same tests, same CI machine. The only changes were fixing test isolation and enabling Playwright's built-in parallelism.

The real cost of bad automation

A bad test suite is worse than no test suite.

That sounds extreme, but I mean it. A suite full of hardcoded waits, brittle selectors, and order-dependent tests produces two outcomes, both harmful:

First, it creates false failures. Tests break for reasons unrelated to actual bugs. Developers learn to ignore the failures, re-run the suite, and merge anyway when it passes on the second try. At that point the suite is not catching bugs. It is a random gate that sometimes blocks merges for no reason.

Second, it creates false confidence. Tests pass, so the team assumes the feature works. But the tests were checking implementation details that happen to still match, not actual user behavior that might have regressed. Bugs reach production despite a green test suite, and leadership starts questioning whether automation was worth the investment.

The fix is not to abandon automation. The fix is to treat your test suite as production code. It needs code review. It needs refactoring. It needs maintenance. It needs someone who knows what they are doing.

What a healthy suite looks like

After we clean up an inherited suite, the result usually has these properties:

  • Zero hardcoded waits. Every wait is explicit and condition-based.
  • Page objects for every page. Selectors live in one place.
  • Behavior-focused assertions. Tests describe what the user sees, not how the code works internally.
  • Full test isolation. Any test can run alone or in any order.
  • Parallel execution. Suite runtime is measured in minutes, not close to an hour.
  • Self-healing selectors where possible. Tools like Flows reduce maintenance when the UI changes frequently.

None of this is revolutionary. It is basic engineering discipline applied to test code. The problem is that test code rarely gets the same attention as application code, and the debt accumulates until someone inherits the suite and has to deal with it.

If you are building a suite from scratch, build it right from the start. If you have inherited one that has these problems, fix them incrementally: start with the waits, then add page objects for the most-referenced pages, then fix isolation one test group at a time.

And if you would rather hand that work to someone who has done it dozens of times before, that is literally what we do.

More on automation, testing strategy, and QA engineering at betterqa.co/blog.
