
Tudor Brad

Posted on • Originally published at betterqa.co

The QA team that tried to go 80% automated and what actually happened

A fintech client came to us in 2023 with a clear goal: get to 80% test automation in six months. They had 400 manual test cases, a team of four QA engineers who'd never written code, and a CTO who'd read a McKinsey report about automation ROI. The number 80% came from that report, not from any analysis of their actual product.

We got to 45% automation in eight months. The client was disappointed for about two weeks, until they realized their regression cycle dropped from five days to one and a half. The 45% we automated were the right 45%. The remaining 55% were things that genuinely needed human judgment, and trying to automate them would have produced a flaky, unmaintainable mess.

That project taught us more about automation transformation than any best practice guide.

The part everyone skips: figuring out what to automate

The first thing we did was sort those 400 test cases into three buckets. Bucket one: tests that run every sprint, follow the same steps, and check deterministic outcomes. Things like login, CRUD operations on accounts, and balance calculations. These are automation candidates. About 160 of the 400.

Bucket two: tests that involve visual judgment, subjective UX evaluation, or complex multi-system workflows with timing dependencies. A payment reconciliation flow that depends on a third-party bank API response, for example. These stay manual. About 140 tests.

Bucket three: the remaining 100 tests that nobody could clearly categorize. We left these alone for the first three months and revisited them later. About 40 eventually got automated. The rest stayed manual.

Most automation efforts fail because teams skip this sorting step. They try to automate everything, hit the hard cases early, get frustrated, and declare automation "doesn't work for our product." It works fine. You just automated the wrong things first.
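The triage itself is simple enough to express as a rule. A minimal sketch, with hypothetical field names rather than any real test-case schema:

```typescript
// Hypothetical sketch of the three-bucket triage described above.
// Field names are illustrative, not the client's actual test-case schema.
interface TestCase {
  name: string;
  runsEverySprint: boolean;      // same steps, every regression cycle
  deterministic: boolean;        // outcome doesn't depend on timing or taste
  needsHumanJudgment: boolean;   // visual/UX evaluation, third-party timing
}

type Bucket = 'automate' | 'manual' | 'revisit';

function triage(tc: TestCase): Bucket {
  if (tc.needsHumanJudgment) return 'manual';
  if (tc.runsEverySprint && tc.deterministic) return 'automate';
  return 'revisit';              // bucket three: defer, re-sort in three months
}
```

The point isn't the code, it's the discipline: every test case gets an explicit bucket before anyone writes a line of automation, and "revisit" is a legitimate answer.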

The tool fight

The CTO wanted Selenium because he'd used it at a previous company in 2019. Our team recommended Playwright. This turned into a three-week debate that accomplished nothing except burning goodwill.

Here's what we've learned across dozens of these transformations: the tool matters less than people think. Playwright is faster and has better auto-waiting. Cypress has better developer experience for teams already using JavaScript. Selenium has the widest browser support. Pick one, commit, move on. We've seen successful automation suites in all three.

For this client, we went with Playwright because their app was React-based and their dev team already used TypeScript. That alignment matters more than any feature comparison chart.

We use Flows, our own Chrome extension, for teams that want to record tests without writing code. It records browser interactions and replays them with self-healing selectors, which means the test doesn't break every time a developer renames a CSS class. We built it because selector maintenance was eating 30% of our automation team's time on some projects. But this client wanted code-based tests, so Playwright it was.
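The self-healing idea itself isn't magic, and you can apply it in code-based suites too. A minimal sketch of the resolution step (this is illustrative, not the actual Flows implementation): capture several locator candidates per element at record time, then at replay time use the first one that still resolves.

```typescript
// Minimal sketch of self-healing selector resolution (hypothetical;
// not the actual Flows implementation).
type Exists = (selector: string) => boolean;

function resolveSelector(candidates: string[], exists: Exists): string | null {
  // Candidates are ordered by stability: test IDs first, CSS classes last.
  for (const sel of candidates) {
    if (exists(sel)) return sel;   // e.g. data-testid survives a CSS rename
  }
  return null;                     // all candidates broken: report, don't guess
}
```

The design choice that matters is the ordering: stable attributes like `data-testid` go first, so a renamed CSS class only ever costs you a fallback, not a broken test.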

The first month was painful

We wrote 20 tests in the first month. Most guides would tell you that's too slow. But those 20 tests were solid. They ran in CI, they didn't flake, and they covered the login flow, account creation, the main dashboard load, and basic transaction queries.

What slowed us down was test data. The application didn't have a clean way to seed test data, so every test had to create its own state from scratch. A test that should have been 15 lines was 60 lines because of setup. We spent two weeks building a test data factory before we could move faster.

Nobody talks about test data in automation articles. It's boring. It's also the thing that determines whether your suite takes 4 minutes or 40 minutes to run.
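To make the shape of the fix concrete, here's a minimal sketch of the kind of factory we built. The entity and field names are hypothetical, not the client's schema; the pattern is what matters: every factory returns a valid default object, and tests override only the fields they actually care about.

```typescript
// Hypothetical test data factory sketch. Each call returns a valid account
// with unique identity; tests override only what matters to them, which is
// how a 60-line setup collapses back toward 15.
interface Account {
  id: string;
  currency: string;
  balance: number;               // minor units (cents)
  status: 'active' | 'frozen';
}

let seq = 0;

function makeAccount(overrides: Partial<Account> = {}): Account {
  seq += 1;
  return {
    id: `acc-${seq}`,            // unique per call, no shared state between tests
    currency: 'EUR',
    balance: 10_000,
    status: 'active',
    ...overrides,                // the test states only what it cares about
  };
}

// A test about frozen-account behavior declares exactly that and nothing else:
const frozen = makeAccount({ status: 'frozen' });
```

In the real engagement the factories also wrote through the application's API so the data existed server-side, but the interface the tests see is the same: sensible defaults, explicit overrides.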

Month three: the flake crisis

By month three we had 80 automated tests and a 72% pass rate on CI. That sounds terrible, and it was. Eight tests were genuinely flaky. They'd pass locally, fail in CI, pass again on retry. The team was spending mornings investigating failures that turned out to be timing issues, not real bugs.

We stopped writing new tests for two weeks and fixed the flaky ones. Most of them had the same root cause: the app used optimistic UI updates, so Playwright would see the expected text before the API call actually completed. When the API was slow in CI (shared resources, less CPU), the test would sometimes catch a loading state instead.

The fix was boring: explicit waits for network idle on specific API calls, not global timeouts. We also added a retry-once policy in CI, which sounds like a hack but reduced our false failure rate from 28% to under 3%.
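A sketch of what that looked like, assuming Playwright. `page.waitForResponse` accepts a predicate, so the assertion only runs after the specific API call has settled; the endpoint path and balance text below are hypothetical:

```typescript
// Pure predicate handed to page.waitForResponse: wait for the specific
// transactions call to settle, not a global timeout. Endpoint path is
// hypothetical.
function isSettled(url: string, status: number): boolean {
  return url.includes('/api/transactions') && status >= 200 && status < 300;
}

// In the spec (sketch):
//
//   await page.waitForResponse(res => isSettled(res.url(), res.status()));
//   await expect(page.getByTestId('balance')).toHaveText('1,250.00');
//
// And the retry-once policy lives in playwright.config.ts:
//
//   export default defineConfig({ retries: process.env.CI ? 1 : 0 });
```

Retrying only in CI keeps local runs honest: a test that needs a retry on your laptop is a test you should be fixing.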

The honest numbers

After eight months:

  • 180 tests automated out of 400 (45%)
  • CI run time: 12 minutes for the full suite
  • Regression cycle: 1.5 days (down from 5)
  • False failure rate: 2.8%
  • Tests maintained by: 2 of the 4 QA engineers (the other 2 focused on exploratory testing)
  • Cost of the automation effort: roughly equivalent to 6 months of one senior engineer's time

Was it worth it? Yes, but not because of some dramatic ROI calculation. It was worth it because those two engineers running manual regressions for five days every sprint were bored, making mistakes, and starting to job-hunt. Automation didn't replace them. It gave them different work. One became the automation lead. The other moved into performance testing, which the team had never done before.

What we'd do differently

We should have built the test data factory in week one, not week six. Every automation engagement we've done since then starts with data setup.

We should have set the target at "automate the right things" instead of a percentage. The 80% number created pressure to automate tests that weren't good candidates, and we pushed back successfully, but it took energy that could have gone elsewhere.

We should have involved the developers earlier. For the first two months, the dev team treated our automation suite as "the QA thing." Once we started contributing test utilities back to their codebase and catching bugs in their PR pipeline, they started adding test IDs to their components voluntarily. That collaboration made everything faster.

The pattern we see now

After doing this across multiple clients, the pattern is consistent. Teams that succeed at automation transformation share three things: they pick the right tests to automate first (not all tests), they invest in infrastructure before writing tests (data factories, CI configuration, environment management), and they accept that the final automation percentage will be lower than whatever number someone put in a slide deck.

At BetterQA, we've run these transformations for healthcare clients, fintech platforms, and SaaS products. The tools change, the domain changes, but the mistakes are always the same. Everyone wants to skip straight to writing tests. Nobody wants to set up the data layer. And the target percentage is always too high.

The honest version is less exciting than the pitch deck version. But it's the version that actually ships working software.
