Kirill Sokolov

Posted on Jul 4

How We Cut Our QA Team — and What Happened to the Bugs

#discuss #management #softwareengineering #testing

...someone might rightly point out: of course incidents went down, there's just no one left to find them. Let me clarify: we define an incident as something found in production by a real user — not caught during testing. So this is specifically about what still reached the customer despite having no QA at all.

Disclaimer: before you read the first paragraph and write an angry comment — hold on. This article is not an attack on the testing profession, it doesn't question the value of QA as a discipline, and it definitely doesn't claim to be some new industry gospel. Testing is a serious engineering discipline built on years of hard-won practice, and we fully understand that.

What we want to share is a specific case, under specific conditions, at a specific company, with a specific product. The result surprised us, which is exactly why it seemed worth talking about. If your context is different — and it probably is — your conclusions might be completely different. Read this as a story, not a manual.

What is QA
About the product
What R&D has to do with QA
Here's where it gets interesting
Data Quality — not QA in a new wrapper
The numbers
Instead of a conclusion

What is QA

Let's take a small detour for anyone not deep in software development, because my mom will definitely read this article, and she definitely needs the explanation below.

Imagine you're building a house. An architect designs it, builders raise the walls and run the utilities — and the house is ready. But before anyone moves in, someone has to walk through every room, check that the outlets work, the faucets don't leak, the doors open, and the roof doesn't let rain in. In construction, that's called acceptance testing. In software development, that role belongs to QA — Quality Assurance.

QA specialists are people who professionally look for what's broken or not working as intended, before the product reaches real users. They test new features, catch bugs, make sure old functionality doesn't break when new stuff is added, and generally act as the last barrier between the dev team and the end user.

It's widely believed — not without reason — that the earlier a bug is found, the cheaper it is to fix. A bug caught before release costs, say, one dollar. The same bug found by a user after release costs a hundred dollars, plus reputational damage and the whole team's nerves. That's exactly why QA has long been an industry standard and is seen as a necessary part of any serious product.

About the product

To understand the context in which this decision was made, you need to understand what we do and what our company is about. We build, maintain, and grow several analytics products that provide tools for SEO traffic analysis, contextual advertising, and, more recently, AI search result analytics. In other words, what we sell is data analytics and business insight built on top of our data lake.

Our clients are mostly large corporations that can afford serious budgets for things like SEO analytics: major fintech companies, marketplaces, mobile carriers, and travel and ticket aggregators, the kind of companies that keep an entire SEO department on staff. Yes, I mentioned contextual advertising above, but we entered that market only recently, and our main source of revenue is still SEO analytics, which tends to be one of the last things companies spend money on. So our client base is a couple hundred large players across different areas of the IT industry.

Since the product is mostly B2B, users themselves don't generate much load — the real load sits on the data collection and processing infrastructure. By nature, our data splits into two types.

Search result data: we work with several vendors who scrape search result pages and hand the HTML to us. Each source scales horizontally: the number of instances is adjusted to the load, which is distributed evenly across them. The HTML is downloaded and immediately compressed with zstd, then stored on S3 in that form for debugging and research purposes. The raw compressed HTML alone amounts to 4 TB a day. At the transformation stage, the HTML is parsed, broken down into TSV, enriched with internal data, and written directly to ClickHouse.
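To make the transformation stage more concrete, here is a minimal sketch of turning one search result page into TSV rows. The markup it looks for (links carrying a `result` class), the column set, and the names `ResultLinkParser` and `serp_to_tsv` are assumptions made for illustration; the real vendor formats and our schema are not shown here. Decompression, enrichment, and the ClickHouse write are left out.

```python
import csv
import io
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collects (position, url, title) for every <a class="result"> link.

    The "result" class is a hypothetical marker, not a real vendor format.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_url = None  # set while we are inside a result link
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "result" in (attrs.get("class") or "").split():
            self._current_url = attrs.get("href", "")
            self._title_parts = []

    def handle_data(self, data):
        if self._current_url is not None:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_url is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._title_parts).split())
            self.rows.append((len(self.rows) + 1, self._current_url, title))
            self._current_url = None


def serp_to_tsv(html: str, query: str) -> str:
    """Parse one page and return its results as TSV text with a header row."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "position", "url", "title"])
    for position, url, title in parser.rows:
        writer.writerow([query, position, url, title])
    return out.getvalue()
```

Writing through the `csv` module rather than joining strings by hand means a title that happens to contain a tab or a quote is escaped instead of silently shifting columns.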

Post-click analytics data — collected on a schedule from external platforms on the client's behalf: Yandex Webmaster, Yandex Metrika, Yandex Logs API, Google Analytics, Google Search Console. All of this also goes into ClickHouse.

The ClickHouse cluster consists of 5 shards with a replication factor of 2, with all nodes running in master-master mode: new data is written to both copies simultaneously, and read load is distributed evenly. Calculations run in real time, with no pre-aggregation, and still run fast thanks to a well-designed storage schema and optimized queries. The total data lake size is around 100 TB of compressed text data accumulated over 3 years.

An important point for context: we're a fast-growing company with a high pace of development. Releases here aren't a weekly event, or even a daily one. Features ship to production 5-6 times a day. It was precisely this pace that made the classic testing model less and less viable — QA simply couldn't physically keep up with the team's speed.

The current dev team consists of three groups: my department, R&D (6 backend developers), which handles all new development and the services being introduced into the backend infrastructure; the Core team (5 backend developers), which maintains the legacy application and does minimal development on it; and the frontend team, 4 people at the time and 7 now, thanks in part to the QA team cut.

Speaking of the legacy application, it's worth noting that over the past year, alongside shipping new products, we've also been "sawing apart" the old monolith into services — there are now around 20 of them, including 13 ETL services, an SSO service, a billing service, a ClickHouse proxy service, and a couple of new configuration services responsible for storing collection settings and displaying analytics to users.

This might make it sound as if a QA team of just 2 people could never cover such a large set of services and developers, so it would be strange to expect anything from them at all, and you might wonder what this article is even about. But here is an important clarification: the testing team covered exactly one service, the Core legacy application. All of its features and new screens (calculations, buttons, tables, and so on) went through that team, and that application's release pace is noticeably slower than everywhere else.

What R&D has to do with QA

It's worth noting that my team, R&D, never really crossed paths with the QA department. There are two fairly objective reasons for that.

First — the specificity of the expertise. The R&D team handled a wide range of complex tasks: ETL pipelines, processing and storing terabytes of data, building a brand-new full-fledged application with its own role model and access management, working with semantics, and an SSO integration that unified the entire product ecosystem into a single authentication layer. To properly verify that any of these services works correctly, you need to deeply understand what exactly it does, why it does it that way, and what the output should look like. That requires the same level of immersion as the person who wrote the service or at least maintained and reviewed it. That's why in R&D, responsibility for code quality lies with the developer from the start — this isn't an unspoken rule, it's a conscious requirement when hiring or transferring into the department, and it remains part of the team's culture to this day.

Second — the economics of testing at real scale. It's impossible to properly test a service that's meant to process terabytes of data on anything smaller than terabytes of data. Spinning up staging infrastructure with real volumes just for testing is expensive twice over: in time spent preparing fixtures, and in money spent on infrastructure. That's why production is objectively the only valid test environment here. It sounds scary — but with the right processes in place, it works.

Testing comes in different forms. For us, it existed in two formats: manual, where testers walked through functionality on staging, clicking buttons and checking scenarios, and automated, where scripts called the API before every release and compared "before" and "after" results to make sure a new feature didn't break what already worked.
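The automated "before" and "after" comparison can be sketched roughly as follows. This is an illustration of the idea, not our actual test suite: the function name `diff_snapshots` and the endpoint paths are hypothetical, and fetching the responses from the API is assumed to have happened already.

```python
def diff_snapshots(before: dict, after: dict) -> list[str]:
    """Compare two {endpoint: parsed JSON response} snapshots.

    Returns a description for every endpoint whose response disappeared,
    appeared, or changed between the two runs.
    """
    problems = []
    for endpoint in sorted(before.keys() | after.keys()):
        if endpoint not in after:
            problems.append(f"{endpoint}: missing after release")
        elif endpoint not in before:
            problems.append(f"{endpoint}: new endpoint, no baseline")
        elif before[endpoint] != after[endpoint]:
            # Dict comparison ignores key order, so only real changes count.
            problems.append(f"{endpoint}: response changed")
    return problems


# Hypothetical snapshots taken before and after a release.
before = {"/api/visibility": {"score": 41.5}, "/api/keywords": {"total": 1200}}
after = {"/api/visibility": {"score": 41.5}, "/api/keywords": {"total": 1187}}
print(diff_snapshots(before, after))  # ['/api/keywords: response changed']
```

A check like this says that a number changed, not whether the new number is right, which is the limitation the following sections come back to.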

All of this is standard, reasonable, and familiar. But watching how the R&D team operated — without QA, with quality responsibility built into the dev team itself — we started asking ourselves an uncomfortable question: how applicable is classic testing to our product, really?

Here's where it gets interesting

A tester can click a sort button and confirm it works. Or doesn't. That's a valid and useful check — when you're selling an interface. But we don't sell an interface. We sell data. Analytics. Numbers. And if the sorting works perfectly but the numbers behind it are garbage — that sorting is worthless. The user paid for insight and got a beautifully sorted lie.

That's the core takeaway of this article: if what you're selling is the content, not the packaging, you need to test the content. And in our case, the content is the data, the formulas, the aggregation logic, the accuracy of the numbers on screen. And figuring out whether the number in a specific chart is correct isn't something just anyone with a checklist can do. That's something either the analyst who designed the metric can do, or the backend developer who built it.

This leads to another effect that isn't usually said out loud. When a developer knows that after them comes frontend, and after frontend comes a tester, a temptation to relax a little inevitably creeps in. Not on purpose, not out of laziness — that's just how people work. The safety net exists, and the brain feels it. Remove the net, and attentiveness to your own code naturally goes up.

Watching the dynamics of the R&D team — how it shipped high-quality products with a low incident rate, without QA — dev leadership started looking at what was happening in the Core team through a different lens.

And the picture there was less rosy. Incidents and bugs kept showing up with discouraging regularity. At the same time, the number of new applications that needed building and maintaining kept growing, and the frontend team was critically understaffed. And at the intersection of these two observations, an idea was born that, at first glance, sounded at best questionable.

What if we cut the testing department?

Not because the people were bad or useless in a vacuum. But because, in the context of our specific product, their effectiveness was becoming increasingly questionable — and that was visible in the numbers. The freed-up resources were proposed to go toward strengthening the frontend team, which was suffocating under its workload.

But that was only half the idea. The second half was more interesting.

Backend developers, as part of the mandatory code review process, would now have to check the results of frontend work themselves. By hand. Opening screens, looking at the numbers, comparing them against what their own code was outputting. And there's an elegant logic to it: a backend developer reviewing the frontend, whether they want to or not, sees the numbers they themselves generate. They know what should be on screen. They notice when something's off — not because they have a checklist, but because they have context. That became the foundation of the new process: in the Jira workflow, the "QA-test" status became "Backend-test."

On top of that, the safety net we talked about earlier disappears. When you know no one's got your back afterward, you look at your own work differently. The level of personal responsibility rises — not by mandate, but organically.

The decision wasn't impulsive. The question of whether a testing department was even justified in the context of our product had been circulating among tech leads for a while, and was discussed with the CTO and dev leadership. But there's a considerable distance between thinking about something and actually doing it.

The first step wasn't elimination — it was an experiment. Until that point, the QA team reported to product managers, which was already a fairly odd setup given that testing is fundamentally an engineering discipline — but that's just how it had been set up historically, since the company's founding. The first thing we did was get QA moved under engineering leadership. Then we tried to strengthen the team and narrow its scope of responsibility as much as possible, to rule out the possibility that we were simply asking too much of people who weren't set up to succeed. We appointed a lead, narrowed the scope down to a single product, and removed unnecessary tasks.

The optimization didn't help. And that's when the decision was made to cut the department.

There was pushback — and quite a bit of it. Product managers took the idea badly. The Core team and frontend developers weren't thrilled either — people were used to having someone catch mistakes after them. "How can you possibly expect zero bugs in production without a testing department?" was roughly the tone of the conversations.

My team was the exception here. In R&D, the attitude toward QA-based testing had always been skeptical, simply because we'd never really relied on it and did just fine without it.

Data Quality — not QA in a new wrapper

There was another important decision made in parallel, without which this whole story could have ended badly. Instead of classic QA, we created a Data Quality team — made up of backend developers who took on responsibility for data quality, anomaly monitoring, and tracking issues on the vendor side. This is a key point: we didn't just remove quality control — we replaced it with a different, more precise tool tailored specifically to our needs.

The Data Quality team didn't appear overnight — it formed in parallel with the process of moving away from classic testing, literally inheriting some of the old tests and rethinking them in a new format. This isn't testers in a new wrapper, and it isn't just a rebrand — it's a fundamentally different function with a different underlying logic.

Where classic QA comes in after development and checks what ended up on screen, Data Quality works continuously and looks deeper — at the data itself, which is the foundation of everything. They track how ETL services are performing, flag where integrations have broken, and identify which data types have stopped coming in and what analytics depend on them. Based on that, they build dashboards that show, in real time, the completeness and correctness of new data flowing into the data lake.

This isn't output control — it's input monitoring. And for a product where everything is built on data, that's a far more valuable function than checking whether a sort button works.
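As a rough illustration of what a completeness check behind such a dashboard might look like, here is a minimal sketch. The function name `completeness_report`, the 50% threshold, the row counts, and the source name `serp_vendor_a` are all assumptions for the example; the real monitoring runs on top of the data lake rather than on in-memory dictionaries.

```python
from statistics import mean


def completeness_report(history, today, min_ratio=0.5):
    """Flag sources whose row count for today looks incomplete.

    history:   {source: [row counts for previous days]}
    today:     {source: row count received so far today}
    min_ratio: assumed threshold; a count below this share of the recent
               average is reported as a drop.
    Only sources present in history are checked.
    """
    report = {}
    for source, counts in history.items():
        received = today.get(source, 0)
        baseline = mean(counts) if counts else 0
        if received == 0:
            report[source] = "missing"
        elif baseline and received < min_ratio * baseline:
            report[source] = "drop"
        else:
            report[source] = "ok"
    return report


# Hypothetical daily row counts per source.
history = {
    "google_search_console": [1000, 1100, 900],
    "yandex_metrika": [500, 520, 480],
    "serp_vendor_a": [20000, 21000, 19000],
}
today = {"google_search_console": 980, "yandex_metrika": 120}
print(completeness_report(history, today))
# {'google_search_console': 'ok', 'yandex_metrika': 'drop', 'serp_vendor_a': 'missing'}
```

The point of the sketch is the direction of the check: it looks at what arrived from each source, before any screen or chart is involved.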

The numbers

Any decision can be justified in words. But words are less convincing than numbers — so let's get to them.

For comparison, we picked two comparable cycles: Q4 2024 → Q1 2025 (a release quarter and the one following it — still with QA), and Q4 2025 → Q1 2026 (the same cycle, but without a testing department: we cut it at the end of Q3 2025). Both cycles share a major feature release timed to November 11th. In both periods, the team worked under intense schedules, the volume of changes was comparable, and the team composition didn't change.

An important caveat about the metric: the number of closed incidents is not a reliable measure, because a bug can be found in one quarter and closed in another. So instead we look at the number of created incidents — that is, problems found in production by users or account managers and logged by technical support. These aren't bugs found by testers before release — this is what made it through every filter and still reached the customer.

The results speak for themselves:

Period	With QA (2024/25 cycle)	Without QA (2025/26 cycle)
Release quarter (Q4)	28	19
Quarter after release (Q1)	51	26

A quarter after a major release always sees a spike in incidents — that's normal, the backlog catches up. But the difference between the tail-quarter with QA (Q1 2025) and the tail-quarter without QA (Q1 2026) — 51 versus 26 — suggests this isn't about seasonality or luck.
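For readers who prefer relative numbers, the same figures can be turned into percentage drops with a few lines of arithmetic. The snippet uses only the four incident counts from the table above; the helper name `relative_drop` is just for this example.

```python
# Incidents created per quarter: (with QA, without QA), taken from the table.
incidents = {
    "Release quarter": (28, 19),
    "Quarter after release": (51, 26),
}


def relative_drop(with_qa, without_qa):
    """Percentage decrease going from the cycle with QA to the one without."""
    return (with_qa - without_qa) / with_qa * 100


for period, (with_qa, without_qa) in incidents.items():
    drop = relative_drop(with_qa, without_qa)
    print(f"{period}: {with_qa} -> {without_qa} ({drop:.1f}% fewer)")

total_with = sum(with_qa for with_qa, _ in incidents.values())
total_without = sum(without_qa for _, without_qa in incidents.values())
total_drop = relative_drop(total_with, total_without)
print(f"Whole cycle: {total_with} -> {total_without} ({total_drop:.1f}% fewer)")
# Release quarter: 28 -> 19 (32.1% fewer)
# Quarter after release: 51 -> 26 (49.0% fewer)
# Whole cycle: 79 -> 45 (43.0% fewer)
```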

Honesty matters here: the drop in incidents isn't solely down to developers' increased sense of ownership after QA was cut. In that same window, the Data Quality team also came online, catching part of the data-related problems before they ever reached the user. We can't fully separate these two effects in the numbers — both decisions were made and rolled out in parallel.

We're deliberately making an allowance for the fact that statistics always favor whoever is interpreting them. The sample is small, conditions weren't identical — even though we picked periods as close to each other as possible — and we don't claim scientific rigor here. But the trend is pronounced enough to at least be worth thinking about.

There was also a second effect of this decision, just as important though less visible from the outside. The freed-up headcount went toward strengthening the frontend team, which at the time couldn't keep up with its workload with just 4 people; it's 7 now. By the team's own account, this shortened feature development time and reduced the backlog that used to pile up simply from a lack of hands. We didn't formally track velocity metrics, but growing the team from 4 to 7 without a single new hire is telling on its own.

Instead of a conclusion

What conclusion can be drawn from all of this? Definitely not that testers aren't needed. And definitely not that ours were lazy or slacking off. People did their jobs — it's just that the context in which they did it made that job objectively less valuable than it could have been elsewhere.

The one conclusion I want to draw is a truth I keep coming back to, again and again, in my career and in life, and one that over the years has become as obvious to me as it is unshakeable.

Any decision, any plan, any algorithm should first and foremost be built on the context in which it's made. Even recognized industry standards — things proven over time by millions of teams — can, under certain conditions, stifle and hold back progress. Not because they're bad. But because they were designed for the average case, and your case might be different.

Check your context before copying someone else's solution.

And finally — a question for you: if your company shut down QA tomorrow, what would break first — the product, or the processes built around it?

DEV Community