Tudor Brad

Posted on • Originally published at betterqa.co

When to test what: honest notes from eight years of picking the wrong strategy

A founder called me on a Thursday. He had paid his agency for three weeks of "full test coverage" on a prototype he was about to demo to investors. The demo was on Monday. Over the weekend, his team pivoted the entire data model based on feedback from an advisor, and every single one of those tests became garbage. Three weeks of billed work, gone.

He was upset with the agency. I told him it wasn't the agency's fault. Someone should have asked what stage the product was in before writing a line of test code, and nobody did, including us when he had asked us for a quote two months earlier.

That call stuck with me. Testing advice on the internet tends to be one-size-fits-all. "Write unit tests." "Automate everything." "Shift left." All of it is technically correct, and all of it is wrong if you apply it at the wrong stage. The honest truth is that the test strategy for a prototype is almost the opposite of the strategy for a mature product, and picking the wrong one costs you either quality or time, sometimes both.

I run QA operations across 50+ engineers working with teams in 24 countries. We've made a lot of these mistakes. Here's what we've actually learned about what to test, and what to leave alone, at each stage of a product's life.

Stage one: the prototype that probably won't survive

A prototype is code written to answer a question. Usually the question is something like "would users click this?" or "can we actually build this integration?" or "does this algorithm produce the output we need?"

The truth about prototypes is that most of them get thrown away. Not rewritten. Thrown away. The code exists to prove or disprove a hypothesis, and once you have your answer, you move on.

Writing unit tests for a prototype is usually a waste. Writing E2E tests is almost always a waste. I say this as someone who has literally billed clients for doing both, and I'm not proud of it. The story above with the founder wasn't an isolated incident. We've had at least two other projects where we wrote comprehensive automated test suites for prototypes that got scrapped within a month.

What actually helps at the prototype stage is manual exploratory testing. Have someone who didn't write the code sit down and try to use it. Take notes. Not bug reports in Jira, notes. What confused them? What broke in an obvious way? Does the core thing it's supposed to prove actually work?

If you must write automated tests at this stage, write a handful of smoke tests that verify the happy path still runs after changes. Nothing more. The goal is to move fast and learn, not to build a bulletproof codebase that nobody will ever use.
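A "handful of smoke tests" really can be this small. Here's a minimal sketch in Python; `summarize` is a hypothetical stand-in for whatever core function your prototype exists to prove, not a real API:

```python
def summarize(records):
    """Hypothetical prototype function: totals amounts per user."""
    totals = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0) + r["amount"]
    return totals


def test_happy_path_still_runs():
    # Deliberately no edge cases: the only question a prototype smoke
    # test answers is "does the core flow still produce sane output?"
    result = summarize([
        {"user": "a", "amount": 10},
        {"user": "a", "amount": 5},
    ])
    assert result == {"a": 15}


test_happy_path_still_runs()
```

If the prototype survives, these smoke tests become the seed of a real suite. If it doesn't, you've lost ten minutes instead of three weeks.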

Stage two: the MVP that has paying users

An MVP is different. Real people are giving you real money or real attention, and if it breaks, they leave. But an MVP is also still changing fast. You're probably rewriting chunks of it every two weeks based on what users actually do versus what you assumed they'd do.

This is the stage where most teams get testing wrong in both directions. Either they under-test because "we're still pivoting," and they ship bugs that erode the trust of their first users. Or they over-test, building an elaborate test pyramid for features that get deleted the following sprint.

The right move at MVP stage is selective: heavy testing on the parts that touch money, auth, or data integrity, and light testing on everything else. If your app takes payments, your payment flow gets unit tests, integration tests, and E2E tests. If your app has login, your auth gets the same. Everything else gets smoke tests and exploratory coverage.
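"Heavy on money, light on everything else" looks something like this in practice. A sketch with a hypothetical `charge` wrapper around a payment provider; the point is that the money path gets the happy path *and* every failure mode you know about, not just one assert:

```python
class CardDeclined(Exception):
    pass


def charge(amount_cents, card_token):
    """Hypothetical payment wrapper: validates input before hitting the provider."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    if card_token == "tok_declined":  # provider test token, hypothetical
        raise CardDeclined("card declined")
    return {"status": "paid", "amount_cents": amount_cents}


# Happy path.
assert charge(999, "tok_ok")["status"] == "paid"

# Failure modes get the same weight as success on a money path.
try:
    charge(0, "tok_ok")
    raise AssertionError("zero amount should be rejected")
except ValueError:
    pass

try:
    charge(999, "tok_declined")
    raise AssertionError("declined card should raise")
except CardDeclined:
    pass
```

A feature that might be deleted next sprint gets one smoke test. The charge path gets all of the above, plus integration and E2E coverage on top.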

I'll give you the under-tested story I promised. A client MVP had a subscription upgrade flow that worked fine in our test environment. We covered the happy path, the cancellation path, and the "invalid card" path. We didn't cover the "user upgrades from a grandfathered legacy plan that had a different billing cycle" path, because nobody mentioned that legacy plans existed. That bug shipped to production. Three customers were double-charged. We refunded them, apologized, and added the missing test. But the trust hit with those customers was real. One of them churned a month later.

That bug wouldn't have been caught by writing more tests. It would have been caught by asking the question "what user states exist that we don't have in our test data?" before we started writing tests. This is the part that tools don't solve.
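One cheap way to force that question is to enumerate the state matrix before writing a single test. The plan and action names below are hypothetical; the value is in generating every combination and noticing which ones you have no fixtures for:

```python
import itertools

# Ask the team what states actually exist -- this list is the whole exercise.
plans = ["monthly", "annual", "legacy_grandfathered"]
actions = ["upgrade", "cancel", "invalid_card"]

# What our hypothetical suite actually covered.
covered = {
    ("monthly", "upgrade"),
    ("monthly", "cancel"),
    ("monthly", "invalid_card"),
}

missing = [c for c in itertools.product(plans, actions) if c not in covered]
for combo in missing:
    print("no fixture for:", combo)
```

The legacy-plan upgrade that double-charged those customers would have shown up in that printout before a line of test code was written.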

Stage three: the growth phase where everything breaks

Your product works. Users are signing up. Revenue is climbing. And everything is on fire.

Growth-stage products break in ways that earlier-stage products don't, because scale exposes assumptions that were invisible when you had 50 users. A query that ran in 40 milliseconds at 50 users takes 8 seconds at 50,000. A cache that had a 99% hit rate fills up and starts evicting hot data. An API rate limit that was generous at low volume suddenly isn't.

This is where you actually need load testing, performance testing, and observability. Not because the gurus say so. Because real things are breaking at 2am and your team is burning out fixing them.
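Real load testing belongs in a tool like k6 or Locust, but the principle fits in a few lines: hit the system concurrently and look at latency percentiles, not averages. A sketch where `handle_request` is a stand-in for a real HTTP call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(i):
    """Stand-in for an HTTP request; swap in a real client call."""
    start = time.perf_counter()
    sum(range(10_000))  # simulated server-side work
    return time.perf_counter() - start


# 200 requests across 20 concurrent workers.
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = sorted(pool.map(handle_request, range(200)))

p50 = latencies[len(latencies) // 2]
p99 = latencies[int(len(latencies) * 0.99)]
print(f"p50={p50 * 1000:.2f}ms  p99={p99 * 1000:.2f}ms")
```

The 40ms-at-50-users query above would have looked fine in a p50. It's the p99 under concurrency that tells you what your unluckiest users see.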

This is also where automated regression testing starts to pay off, because the codebase is big enough that humans can't hold all of it in their heads anymore. You need a computer to tell you that the password reset flow still works, because no human is going to manually test the password reset flow every sprint.
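A regression test for a flow like that doesn't need to be elaborate; it needs to run on every commit and check the properties a human would forget. A sketch with a hypothetical in-memory `ResetService` standing in for the real one, checking that a reset token works once and only once:

```python
import secrets


class ResetService:
    """Hypothetical stand-in for the real password-reset service."""

    def __init__(self):
        self._tokens = {}

    def request_reset(self, email):
        token = secrets.token_hex(8)
        self._tokens[token] = email
        return token

    def complete_reset(self, token, new_password):
        email = self._tokens.pop(token, None)  # pop makes the token single-use
        if email is None:
            raise ValueError("invalid or already-used token")
        return email


svc = ResetService()
token = svc.request_reset("user@example.com")

# The flow works once...
assert svc.complete_reset(token, "new-password-1") == "user@example.com"

# ...and replaying the same token must fail.
try:
    svc.complete_reset(token, "new-password-2")
    raise AssertionError("token replay should have been rejected")
except ValueError:
    pass
```

The single-use check is exactly the kind of assertion that never makes it into a manual test plan but quietly regresses during a refactor.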

The honest lesson from this stage: build the automated regression suite a little earlier than feels comfortable, but not as early as the textbooks say. If you do it in the MVP stage, you'll throw half of it away. If you do it after you've already scaled, you're playing catch-up while the fires get worse. The sweet spot is when you have enough users that manual testing has become slow, but not so many that you're firefighting.

Stage four: the mature product nobody wants to break

A mature product has thousands or millions of users and an existing reputation. At this stage, the calculus of testing flips entirely. The cost of shipping a bug is massive. The cost of shipping a feature a week late is comparatively small.

This is the stage where you can justify everything the textbooks tell you to do. Unit tests for core logic. Integration tests for service boundaries. E2E tests for critical user journeys. Performance tests before every major release. Security testing. Accessibility audits. Regression suites that run on every commit.

But here's the thing nobody mentions: mature products also accumulate test debt. Tests that were written for features that got deprecated. Flaky tests that nobody has time to fix. Fixtures that reference test users who were deleted from the database. Test environments that have drifted from production.

We audited a mature product for a client last year. They had 3,400 automated tests. About 800 of them were testing features that no longer existed in the product. Another 400 were so flaky that the team had started ignoring them. The effective test coverage was maybe half of what the numbers said. Pruning tests is as important as writing them, and nobody budgets time for it.
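The mechanical part of that audit is simple: cross-reference what each test exercises against what still exists in the product. The data below is hypothetical; in a real audit the feature list comes from somewhere authoritative, like the route table or the feature-flag config:

```python
# Map each test to the feature it exercises (in practice, derived from
# test markers, file paths, or naming conventions).
tests = {
    "test_checkout_v1": "checkout_v1",  # feature removed last year
    "test_checkout_v2": "checkout_v2",
    "test_login": "login",
}

# Features that still exist in the product.
live_features = {"checkout_v2", "login"}

dead = sorted(
    name for name, feature in tests.items()
    if feature not in live_features
)
print("candidates for deletion:", dead)
```

The hard part isn't the script; it's maintaining the test-to-feature mapping and booking the time to actually delete what the script finds.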

The question we should have been asking all along

Looking back at eight years of running QA across dozens of products, the pattern is clear. The biggest testing mistakes we made weren't about choosing the wrong tool or writing the wrong assertion. They were about not asking "what stage is this product in?" before we wrote our first test.

A prototype needs speed and learning. An MVP needs protection of critical paths and permission to break the rest. A growth product needs scale testing and automation catching up to the size of the codebase. A mature product needs discipline and ongoing maintenance of the test suite itself.

If you mix these up, you either waste weeks on tests that get thrown away, or you ship bugs that cost you users. Both are expensive. Both are avoidable if you start with the stage question instead of the tool question.

This is one of the reasons we lean on the idea of independent testing at BetterQA. A team too close to the code usually defaults to the habits of the last stage they worked in. The chef doesn't certify his own dish, and the developer who built the prototype is usually the wrong person to decide whether it needs E2E coverage. Someone from outside can look at the product honestly and ask where it actually is, not where the team wishes it was.

What I'd tell that founder now

If I could go back to the call with the founder who lost three weeks of test work over the weekend, here's what I'd say.

Before you hire anyone to test your product, answer one question out loud: is this code going to exist in three months in roughly the same shape it's in today? If the answer is no, you don't need automated tests. You need an experienced human to bang on it for a few hours and tell you what's broken. If the answer is yes, then you can start thinking about what categories of test to invest in, and in what order.

Everything else is just choosing between tools, and tools are the easy part.
