I founded BetterQA in 2018 because of a healthcare project that was drowning in bugs. The client hired me as a single QA lead. Within weeks I needed two more people. Then four. Then eight. The defect backlog was that deep, and the development team had been marking things as "works on my machine" for months.
That project became BetterQA. Today we're 50+ engineers working across 24 countries from our base in Cluj-Napoca, Romania. And the lesson from that first engagement hasn't changed: development teams should not validate their own code.
The chef should not certify his own dish.
Here are four stories from our work where independent QA changed the outcome of a release. Three are victories; one is a near-miss. They're not sanitized. They're messy, like real projects are.
The bug that made the development team "look bad"
This one still bothers me.
We had an engineer named Christie on a client project. She found a bug during regression testing, filed it properly, attached screenshots, reproduction steps, the whole thing. Clear defect. The feature didn't work as specified.
The project manager pulled her aside and told her to close the ticket. His reason: "It makes the development team look bad."
Christie pushed back. The PM insisted. She closed it.
Three weeks later the product owner found the exact same bug in production. The client escalated. The PM blamed "insufficient testing." Nobody mentioned the closed ticket.
This is not an edge case. This happens in organizations where QA reports to the same manager as development. When the person evaluating your performance is also the person whose deadline you threaten by finding bugs, the incentive structure is broken. Bugs get downplayed. Severity gets negotiated. Tickets get closed for political reasons.
Independent QA exists because the person finding bugs should not answer to the person who wrote them. Christie was right. The PM knew she was right. The structure failed her.
If you run QA inside your dev org, ask yourself: would a junior tester on your team feel safe filing a P1 bug the night before a release? If the answer is "it depends on who's managing them," you have a structural problem.
A healthcare platform where "small" bugs could hurt patients
Early in BetterQA's history, before we had 50 people, we took on a healthcare client whose internal team had signed off on a release. They told us they just needed a "final check" before going live. Regulatory box-ticking, basically.
We found 47 defects in the first week.
Most were edge cases around data entry. A date field that accepted February 30th. A dosage calculator that silently rounded down when you entered decimal values. A session timeout that didn't save form progress, so nurses would lose 10 minutes of patient intake data if they stepped away.
None of these would crash the application. All of them would degrade care. A rounded-down dosage is not a cosmetic bug.
The internal dev team had tested the happy paths. Login works, forms submit, data appears in the dashboard. But they built those forms. They knew how to fill them in correctly. They never entered February 30th because they knew February doesn't have 30 days.
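The two data-entry defects above are easy to express as unit-level checks. Here's a minimal sketch in Python, not the client's code; the function names and the 0.01-unit precision are hypothetical, and the point is only that the standard library already rejects February 30th if you let it.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

def is_valid_date(year: int, month: int, day: int) -> bool:
    """Reject impossible dates such as February 30th.

    The buggy form accepted any day number; datetime.date raises
    ValueError for dates that don't exist, including leap-year logic.
    """
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

def dose_in_units(prescribed_mg: str, unit_mg: str) -> Decimal:
    """Compute dose units without silently truncating decimals.

    The defect we found behaved like int(prescribed / unit), which
    rounds down and hides the remainder. Decimal with an explicit
    rounding mode keeps the result exact and auditable.
    """
    return (Decimal(prescribed_mg) / Decimal(unit_mg)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
```

A QA regression suite would feed exactly the values a builder never tries: `is_valid_date(2026, 2, 30)` must be false, and `dose_in_units("2.5", "1")` must not come back as 2.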
End users are more independent than independent QA will ever be. They don't read your specs. They don't attend your standups. They figure out your product on their own, and they use it in ways you never anticipated. An independent QA team is the closest simulation you'll get of that behavior before launch.
The client delayed their release by two weeks. They were frustrated with us at first. Then the compliance team reviewed our findings and the conversation changed fast.
The e-commerce checkout that was bleeding money
A mid-size e-commerce platform came to us because their cart abandonment rate had spiked. Their internal team had been debugging it for three weeks. They suspected a payment gateway issue. They were wrong.
We set up test scenarios that replicated actual user behavior, not just the checkout flow as designed. We tested with slow connections. We tested with users who added items, left for an hour, came back. We tested mobile browsers where the viewport cut off the "proceed to payment" button. We tested what happens when you hit the back button after entering card details.
The root cause was a combination of three things. The session expired after 20 minutes with no warning. On mobile Safari, the checkout button was hidden below the fold and the page didn't scroll to it. And when a user's session expired mid-checkout, the error message said "Something went wrong" with no recovery path. The user had to start over.
The internal team missed all three because they tested on desktop Chrome with fresh sessions. Every time. They never simulated a distracted shopper on a phone.
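Of the three causes, the silent session expiry is the simplest to illustrate. This is a sketch of the kind of state check a fix might introduce, using the 20-minute timeout from the engagement and a hypothetical two-minute warning window; the state names and thresholds are mine, not the client's.

```python
SESSION_TTL = 20 * 60   # the 20-minute timeout found in the audit
WARN_BEFORE = 2 * 60    # hypothetical: surface a warning two minutes early

def session_state(last_active: float, now: float) -> str:
    """Classify a checkout session so the UI can warn before expiry
    and offer a recovery path instead of a dead-end error message."""
    idle = now - last_active
    if idle >= SESSION_TTL:
        return "expired"        # restore the cart and re-authenticate, don't restart
    if idle >= SESSION_TTL - WARN_BEFORE:
        return "expiring-soon"  # the warning the original UI never showed
    return "active"
```

The distracted-shopper scenario maps directly onto this: a user who leaves for an hour comes back to "expired" and should land in a recovery flow, not at "Something went wrong."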
After fixes shipped, cart abandonment dropped noticeably within the first month. I won't give exact numbers because they're the client's data, but it was enough that the engagement paid for itself many times over.
The one where we almost didn't catch it
I said there were three victories and one near-miss. Here it is.
We were doing regression testing on a fintech client's API. One of our engineers noticed that a specific sequence of API calls, executed in rapid succession, could cause a race condition in the transaction processing. The balance would temporarily show an incorrect amount.
He filed it. The dev team said it was a "theoretical" issue because no user would make those calls that fast. Our engineer disagreed but moved on to other test cases.
Two days before release, I reviewed the open tickets and saw it marked as "won't fix." I called the client's CTO directly. I explained that "no user would do this" is exactly what an attacker would do. Automated scripts don't have human reaction times. This wasn't a UX bug. It was a security vulnerability.
They fixed it. But if our engineer hadn't filed it, or if I hadn't reviewed the backlog, it would have shipped. The system worked because we had separation between the people building and the people questioning.
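The bug our engineer flagged was a classic lost-update race: two requests read the same balance, each computes a new value, and one write clobbers the other. Here is a self-contained demonstration of that class of bug, not the client's code; the sleep artificially widens the race window so the failure is reproducible, which is exactly what an automated attack script does by firing calls faster than any human.

```python
import threading
import time

class Ledger:
    """Toy in-memory ledger illustrating an unsynchronized
    read-modify-write on a balance."""

    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()

    def deposit_unsafe(self, amount, barrier):
        barrier.wait()                    # both threads arrive before reading
        current = self.balance            # both read the same stale value
        time.sleep(0.1)                   # widen the race window
        self.balance = current + amount   # second write silently discards the first

    def deposit_safe(self, amount, barrier):
        barrier.wait()
        with self.lock:                   # serialize the read-modify-write
            current = self.balance
            time.sleep(0.1)
            self.balance = current + amount

def run(method_name):
    """Run two concurrent 100-unit deposits and return the final balance."""
    ledger = Ledger()
    barrier = threading.Barrier(2)
    threads = [threading.Thread(target=getattr(ledger, method_name),
                                args=(100, barrier)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ledger.balance
```

With the unsafe method, two 100-unit deposits leave a balance of 100; with the lock, 200. In production the fix is usually a database transaction or an atomic update rather than a process-local lock, but the failure mode is the same one we flagged.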
Why this keeps happening
The pattern across all of these stories is the same. Internal teams test their own assumptions. They test the paths they built. They use the data they expect. They run browsers they prefer on networks they control.
This isn't incompetence. It's human nature. When you build something, you develop a mental model of how it works. Your testing unconsciously confirms that model. You need someone without that model.
The cost argument matters too. Hiring in-house QA means PTO, office space, equipment, training, and paying salary even during slow periods. With an independent team, you get B2B flexibility. Scale up for a release, scale down between sprints. At BetterQA, clients also get our internal tools (BugBoard for test management, Flows for browser testing, BetterFlow for time tracking) included, no separate licenses.
But the real cost is shipping bugs. The healthcare client would have shipped dosage rounding errors. The e-commerce client was losing revenue every day. The fintech client would have shipped a security hole. Christie's client shipped a known bug because a manager prioritized optics over quality.
Every one of those outcomes is more expensive than independent QA. And that's before you factor in the cost nobody budgets for: the trust you lose when users find the bugs your team already knew about.
Where this is headed
We're in 2026 now, and AI has collapsed development timelines. Features that took three months ship in days or hours. That same speed produces proportionally more defects. You need QA that can match the pace.
I keep telling people: AI will replace development before it replaces QA. You still need humans at the output end, verifying that what the machine produced actually works for real users in real conditions. If anything, independent QA is more critical now than it was when I started BetterQA in 2018.
The chef should not certify his own dish. That was true before AI, and it's even more true when the chef is a language model that hallucinates.
More on our blog at betterqa.co/blog.