I lead QA onboarding at BetterQA. When a new client signs on, one of the first things I do is audit their existing test suite. I open it up, scroll through a few hundred test cases, and within about twenty minutes I can tell you exactly how much of it is useful.
Usually? About half.
That might sound harsh, but after doing this across dozens of client projects with a team of 50+ engineers, the patterns are so consistent it's almost boring. The same mistakes, the same dead weight, the same "we wrote these two years ago and nobody's touched them since."
The worst version of this is inheriting a 2,000-test suite where the team proudly tells you their pass rate is 97%. Then you look closer and realize 600 of those tests have no real assertions. Another 300 are duplicates with slightly different names. A hundred are flaky and get re-run until they pass. The 97% number is meaningless. It just makes everyone feel good while bugs keep shipping to production.
Here's what I keep finding.
## Tests that test the framework, not the app
This is the single most common problem, and it's the sneakiest one because the tests look legitimate. They run. They pass. They show up green in the CI pipeline. Everyone's happy.
But the test isn't actually verifying that your application does something correctly. It's verifying that React renders a component. Or that a form element exists on the page. Or that clicking a button fires an event handler.
I saw a suite last year where someone had written 40 tests for a checkout flow. Every single one was checking that UI elements rendered. Not one test verified that an order was actually created, that inventory was decremented, or that the payment was processed. The checkout could have been completely broken and all 40 tests would still pass.
The fix is simple but requires discipline: every test needs to assert something about your business logic, not about whether your framework is doing its job. If you're testing that a button exists, that's a framework test. If you're testing that clicking the button creates an order with the correct line items, that's an application test.
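As a sketch of the difference, here is a minimal contrast. The `createOrder` function is hypothetical, standing in for whatever your checkout logic actually is:

```javascript
// Hypothetical checkout module, for illustration only.
function createOrder(cart) {
  // Build an order with line items and a total from the cart contents.
  const lineItems = cart.map(({ sku, qty, price }) => ({ sku, qty, subtotal: qty * price }));
  const total = lineItems.reduce((sum, li) => sum + li.subtotal, 0);
  return { lineItems, total };
}

// Framework-style test (shown as a comment): only proves the button rendered.
// It passes even if createOrder is completely broken.
//   expect(screen.getByRole('button', { name: 'Place order' })).toBeInTheDocument();

// Application-style test: asserts the business outcome.
const order = createOrder([{ sku: 'A1', qty: 2, price: 10 }]);
console.assert(order.lineItems[0].subtotal === 20, 'line item subtotal');
console.assert(order.total === 20, 'order total');
```

If `createOrder` stops multiplying quantity by price, the second test fails immediately; the first would never notice.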
## Tests with no meaningful assertions
Related but distinct from the framework problem. These tests go through a whole flow, click things, fill out forms, navigate between pages, and then... nothing. No assertion at the end. Or a single assertion that checks something trivial, like the page title.
I opened a Cypress suite for a client last quarter and found 15 tests that navigated to various pages and asserted `cy.url().should('include', '/dashboard')`. That was it. The tests confirmed you could reach the dashboard. They said nothing about whether the dashboard was showing the right data, whether the charts loaded, whether the filters worked.
The tester who wrote them probably had good intentions. They were probably under pressure to increase test coverage numbers. So they wrote tests that technically covered pages without actually verifying anything useful.
If your test doesn't have an assertion that would fail when the feature breaks, it's not a test. It's a page visit.
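A quick sketch of the litmus test. The Cypress selectors in the comments and the `dashboardSummary` function are hypothetical:

```javascript
// Weak assertion (a page visit):
//   cy.url().should('include', '/dashboard')
// Meaningful assertion (fails when the feature breaks):
//   cy.get('[data-test=revenue-total]').should('contain', '$350')

// The same idea in plain code: assert on the data the page computes.
function dashboardSummary(orders) {
  return { count: orders.length, revenue: orders.reduce((s, o) => s + o.amount, 0) };
}

const summary = dashboardSummary([{ amount: 100 }, { amount: 250 }]);
// Breaks if the aggregation logic breaks, not just if routing breaks.
console.assert(summary.count === 2 && summary.revenue === 350, 'dashboard totals');
```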
## Copy-pasted tests with wrong expected values
This one physically hurts when I find it. Someone writes a solid test for Scenario A. Then they need a similar test for Scenario B, so they copy-paste and change a few things. But they forget to update the expected values. Now you have a test for Scenario B that's asserting Scenario A's expected output, and it's been passing for months because the assertion is loose enough to match both.
We onboarded a fintech client where this was happening in their pricing calculation tests. Three variants of a discount test all expected the same final price, even though the discount percentages were different. Nobody noticed because the tests passed. The actual discount logic had a bug that made all three discounts produce the same result, which was wrong, but the tests said everything was fine.
Copy-paste is fine. But you have to treat every pasted test as a new test. Read the expected values. Ask yourself if they make sense for this specific scenario. Better yet, calculate them independently rather than copying them from the original.
## Flaky tests that nobody fixes
Every team has them. Tests that fail randomly, pass on retry, and gradually erode everyone's trust in the suite. The typical lifecycle goes like this:
- Test starts failing intermittently
- Someone adds a retry mechanism
- Retries mask the flakiness
- Team stops investigating failures because "it's probably just flaky"
- Real bugs start slipping through because failures get dismissed
I've seen teams with 30-40 known flaky tests that they just re-run whenever CI fails. At that point, your CI pipeline isn't catching bugs. It's a slot machine that eventually gives you a green build if you pull the lever enough times.
The painful truth is that flaky tests are usually flaky for a reason: timing dependencies, shared state between tests, hardcoded test data that conflicts with other tests, or assumptions about the order things load. These are fixable problems. They just require someone to sit down and actually diagnose them instead of adding another retry.
At BetterQA, when we inherit a flaky suite, the first thing we do is quarantine the flaky tests. Move them out of the main pipeline. Run them separately. Then fix them one by one. It's tedious work but it's the only way to make the suite trustworthy again.
## No separation between smoke, regression, and edge cases
When every test has the same priority and runs in the same pipeline, you end up with 45-minute CI runs where critical path tests are mixed in with obscure edge case validations. A developer pushes a one-line CSS fix and waits 45 minutes to find out if it broke anything.
The result is predictable: people start skipping CI, merging without waiting for tests, or just ignoring red builds because "it's probably that one slow test again."
A healthy suite has layers. Smoke tests that run in under 5 minutes and cover the critical paths. Regression tests that run on merge to main. Edge case and exploratory tests that run nightly or on-demand. When everything is lumped together, nothing gets the attention it deserves.
## Test data that's hardcoded and brittle
Hardcoded IDs, specific usernames, dates that assume a certain timezone, URLs that point to a staging server that got decommissioned six months ago. I see all of these constantly.
The worst case I encountered was a test suite that had a user's actual production email address hardcoded in 200+ tests. The tests were hitting a staging API, but if anyone accidentally pointed them at production, they'd spam a real customer with test emails. Beyond the safety issue, those tests broke every time the staging database got refreshed because the hardcoded user no longer existed.
Test data should be created by the test, used by the test, and cleaned up by the test. If your test depends on something that already exists in the database, it's one environment reset away from failing.
## Tests that verify implementation, not behavior
This is a subtler problem but it kills test suite longevity. When tests are tightly coupled to implementation details (specific CSS selectors, internal component state, exact API response shapes), any refactoring breaks them even if the behavior is identical.
I've watched teams avoid refactoring because "it would break too many tests." That's backwards. Tests should give you confidence to refactor. If they're blocking refactors, they're testing the wrong things.
Test the behavior the user sees. The login form accepts credentials and redirects to the dashboard. The search returns relevant results. The export generates a file with the correct data. If you refactor the internals and those behaviors still work, your tests should still pass.
## No traceability between tests and requirements
This is the organizational problem underneath all the technical ones. When tests aren't linked to requirements, user stories, or bug reports, nobody knows which tests matter and which are leftovers from features that were redesigned or removed.
We built BugBoard partly because of this problem. When you can see which tests are actually catching bugs versus which ones have been passing quietly for two years without ever failing, you start to understand the real health of your suite. A test that has never failed might be rock-solid validation of a stable feature. Or it might be testing nothing useful. Without traceability, you can't tell the difference.
## How we fix this when onboarding clients
When we take over a test suite, the process looks like this:
- Audit pass: read every test, tag it with what it actually validates, flag the ones with weak or missing assertions
- Quarantine flaky tests: pull them out of the main pipeline, track them separately
- Prioritize by risk: map tests to features ranked by business impact, find the gaps where critical features have no coverage
- Kill the dead weight: delete tests that test framework behavior, have no assertions, or duplicate other tests
- Fix what remains: stabilize the flaky tests, update hardcoded data, decouple from implementation details
It's not glamorous work. It takes time. But the difference between a 2,000-test suite with 50% useful coverage and a 900-test suite with 95% useful coverage is enormous. The smaller suite runs faster, fails for real reasons, and actually catches bugs before they ship.
## The uncomfortable math
If you have 1,000 tests and 400 of them are noise, every developer on your team is waiting for those 400 useless tests to run on every CI build. Multiply that wait time by the number of builds per day, the number of developers, and the number of working days in a year. You're burning weeks of engineering time on tests that provide zero value.
That's before you count the cognitive cost. When developers see tests fail and their first reaction is "it's probably flaky," you've already lost. The test suite has become background noise instead of a safety net.
## Start with honesty
The hardest part of fixing a test suite is admitting it needs fixing. Nobody wants to hear that the 2,000 tests they spent months writing are half useless. But the alternative is continuing to invest in something that gives you false confidence while bugs keep reaching production.
If you want to see the patterns I've described in your own suite, start with one question: for each test, what specific bug would this catch? If you can't answer that clearly, the test needs work or removal.
We write about testing patterns, QA team structure, and what we learn from client projects on our blog: betterqa.co/blog