Antoine Dubois

Posted on Jun 4

AI-Assisted QA Changes the Testing Job, Not the Testing Need

#testing #devops #webdev

Internal note to the team: we need to improve test coverage and keep shipping, which means treating AI as a helper in the workflow, not as a replacement for testing discipline.

AI-assisted development changes the shape of our risk. It can produce more code faster, but it also increases the chance that small logic mistakes, brittle selectors, and shallow test cases slip through review. The answer is not to add more manual checking everywhere. The answer is to be more deliberate about what we review, what we automate, and where we let AI help.

What changes when AI writes part of the code

The first thing that changes is review. When a developer uses AI to draft a feature, a test, or a refactor, the reviewer is no longer only checking intent and style. The reviewer also needs to check whether the generated code matches the product rule, whether it introduced a hidden dependency, and whether it quietly weakened coverage.

That does not mean every AI-assisted change deserves extra ceremony. It means our review checklist should shift from "does this look correct" to "what did the model assume, and did we verify those assumptions?" That is especially important for test code, because generated tests often look plausible even when they do not prove much.
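To make "plausible but does not prove much" concrete, here is a minimal sketch of a sanity check a reviewer could run. The discount rule, the function names, and the deliberately broken variant are all hypothetical, invented for illustration. The idea is simply that a useful check passes on the real implementation and fails on a broken one.

```python
from typing import Callable

PriceFn = Callable[[float, bool], float]


def discount(total: float, is_member: bool) -> float:
    """Hypothetical product rule: members get 10% off."""
    return round(total * 0.9, 2) if is_member else total


def broken_discount(total: float, is_member: bool) -> float:
    """Deliberately wrong variant: the discount is never applied."""
    return total


def weak_check(fn: PriceFn) -> None:
    # Looks reasonable, but only exercises the path where nothing happens.
    assert fn(100.0, False) == 100.0


def strong_check(fn: PriceFn) -> None:
    # Exercises the rule we actually care about.
    assert fn(100.0, True) == 90.0


def catches_bug(check: Callable[[PriceFn], None], good: PriceFn, bad: PriceFn) -> bool:
    """True if the check passes on the good version and fails on the bad one."""
    check(good)  # must pass on the real implementation, otherwise it raises here
    try:
        check(bad)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print("weak check catches the bug:", catches_bug(weak_check, discount, broken_discount))
    print("strong check catches the bug:", catches_bug(strong_check, discount, broken_discount))
```

Both checks are green against the real code, which is exactly why the weak one is easy to approve in review. Only the broken variant shows the difference.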

Coverage should move from volume to signal

AI tends to produce more test cases, but more cases are not the same as better coverage. If a generated test suite repeats the same happy path under slightly different names, the team gets a false sense of safety. Coverage should answer a more practical question: where are we most likely to break the user experience, and where will a test actually catch it?

For chat and other AI features, prompt-by-prompt manual checks are a trap. They do not scale, and they encourage a habit of eyeballing output instead of verifying behavior. A better pattern is to build assertions around expected properties, create eval sets for representative prompts, and add regression coverage for failure modes. The article How to Test AI Chat Features Without Relying on Prompt-by-Prompt Manual Checks is a useful practical reference here, because it focuses on assertions, guardrails, and repeatable checks instead of one-off spot checks.
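As a rough sketch of what property assertions plus a small eval set can look like, assuming nothing about our actual stack: the EvalCase fields, the canned fake_assistant, and the example prompts below are all made up for illustration. In a real suite the ask callable would wrap whatever client calls the chat feature.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """One representative prompt plus the properties a good reply must have."""
    prompt: str
    must_include: list[str] = field(default_factory=list)
    must_not_include: list[str] = field(default_factory=list)
    max_chars: int = 600


def check_reply(case: EvalCase, reply: str) -> list[str]:
    """Return human-readable property violations; an empty list means pass."""
    problems = []
    lowered = reply.lower()
    for phrase in case.must_include:
        if phrase.lower() not in lowered:
            problems.append(f"missing expected phrase: {phrase!r}")
    for phrase in case.must_not_include:
        if phrase.lower() in lowered:
            problems.append(f"contains forbidden phrase: {phrase!r}")
    if len(reply) > case.max_chars:
        problems.append(f"reply too long: {len(reply)} > {case.max_chars} chars")
    return problems


def run_evals(cases: list[EvalCase], ask: Callable[[str], str]) -> dict[str, list[str]]:
    """Run every case through `ask` and collect only the failing prompts."""
    failures = {}
    for case in cases:
        problems = check_reply(case, ask(case.prompt))
        if problems:
            failures[case.prompt] = problems
    return failures


def fake_assistant(prompt: str) -> str:
    """Stand-in for the real chat feature so the sketch runs offline."""
    canned = {
        "How do I reset my password?": "Open Settings, choose Security, then select Reset password.",
        "What is your refund policy?": "We guarantee a full refund, no questions asked.",
    }
    return canned[prompt]


EVAL_SET = [
    EvalCase("How do I reset my password?",
             must_include=["reset password"], must_not_include=["as an ai"]),
    EvalCase("What is your refund policy?",
             must_include=["30 days"], must_not_include=["guarantee"]),
]

if __name__ == "__main__":
    for prompt, problems in run_evals(EVAL_SET, fake_assistant).items():
        print(prompt, "->", problems)
```

Substring checks are the bluntest possible property. The point of the shape is that each failure mode we discover becomes one more EvalCase, so the regression set grows with what we learn instead of living in someone's memory.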

Automation decisions need a maintenance lens

AI also affects what we choose to automate. It is tempting to let an assistant generate Playwright tests for every flow, then call it done. The hidden cost shows up later, when those tests need debugging, fixture updates, and locator repairs. AI can speed up creation, but it does not remove maintenance.

That is why I like comparing the first version of a suite with its first maintenance cycle. The real question is not "how fast can we create tests" but "how expensive is the second week of ownership?" The piece Endtest vs Hand-Written Playwright Suites: What Changes After the First Maintenance Cycle makes that tradeoff concrete, especially around upkeep, collaboration, and debugging.

If the app's UI changes often, and locators in particular, editable regression suites can reduce friction. They let the team maintain tests without rewriting everything every time a selector moves. The guide on How to Use Endtest for Editable Regression Suites When Your Team Keeps Changing Locators is relevant here because it frames locator stability as a maintenance problem, not just a tooling preference.
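For a code-first suite, one way to get a similar "edit in one place" effect is to keep selectors in a single registry and flag the brittle ones early. This is only a sketch: the locator names, the selectors, and the brittleness heuristics are assumptions I made up, not rules from any tool.

```python
import re

# Hypothetical registry: tests refer to logical names, never to raw selectors.
LOCATORS = {
    "login.email": "[data-testid='login-email']",
    "login.submit": "[data-testid='login-submit']",
    "cart.total": "div.cart > div:nth-child(3) > span.css-1x2y3z",
}

# Rough heuristics for selectors that tend to break when the UI shifts.
BRITTLE_PATTERNS = {
    "positional selector": re.compile(r":nth-(child|of-type)\("),
    "generated class name": re.compile(r"\.css-[a-z0-9]+"),
    "deep descendant chain": re.compile(r"(>[^>]+){2,}"),
}


def selector_for(name: str) -> str:
    """Look up a selector by logical name, failing loudly on unknown names."""
    try:
        return LOCATORS[name]
    except KeyError:
        raise KeyError(f"unknown locator {name!r}; add it to LOCATORS") from None


def brittle_locators(locators: dict[str, str]) -> dict[str, list[str]]:
    """Map each suspicious locator name to the reasons it was flagged."""
    report = {}
    for name, selector in locators.items():
        reasons = [label for label, pattern in BRITTLE_PATTERNS.items()
                   if pattern.search(selector)]
        if reasons:
            report[name] = reasons
    return report


if __name__ == "__main__":
    for name, reasons in brittle_locators(LOCATORS).items():
        print(f"{name}: {', '.join(reasons)}")
```

In a Playwright test this would be used as something like page.locator(selector_for("login.submit")), so a moved selector becomes a one-line change in the registry rather than a hunt through every test file.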

What to trust in AI testing tools

We should also be careful about the tools themselves. A tool saying it has AI features does not tell us much. Does it explain why it healed a test? Can a human review the change before it lands? Does the generated test code stay understandable six weeks later? Those details matter more than the label.

Before we trust any automation that claims to be smart, we should verify how much control we keep. The checklist in AI Features in Testing Tools: What Buyers Should Verify Before Trusting the Automation is a good reminder to look for explainability, human review, and failure visibility. That is the difference between a helpful assistant and a black box that quietly erodes confidence.

The hidden cost of generated tests

Generated test code is not free just because the first draft appeared quickly. Someone still has to review it, debug it, align it with the app architecture, and keep it from turning into a pile of near-duplicates. If the team does not budget for that work, the automation suite becomes harder to trust over time.
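One cheap guard against the pile of near-duplicates is to compare test bodies after stripping out literals. The sketch below uses only the standard library; the 0.9 threshold and the sample test bodies are assumptions for illustration, and a real version would need to extract test bodies from the suite first.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(body: str) -> str:
    """Replace string and number literals so only the test's structure remains."""
    body = re.sub(r"\"[^\"]*\"|'[^']*'", "STR", body)
    body = re.sub(r"\b\d+(\.\d+)?\b", "NUM", body)
    return " ".join(body.split())


def near_duplicates(tests: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return pairs of test names whose normalized bodies are almost identical."""
    pairs = []
    for (name_a, body_a), (name_b, body_b) in combinations(tests.items(), 2):
        ratio = SequenceMatcher(None, normalize(body_a), normalize(body_b)).ratio()
        if ratio >= threshold:
            pairs.append((name_a, name_b, round(ratio, 2)))
    return pairs


# Hypothetical generated tests: the first two differ only in the email literal.
SAMPLE_TESTS = {
    "test_login_ok": '''
        page.fill("#email", "a@example.com")
        page.click("#submit")
        assert page.url.endswith("/home")
    ''',
    "test_login_ok_2": '''
        page.fill("#email", "b@example.com")
        page.click("#submit")
        assert page.url.endswith("/home")
    ''',
    "test_login_rejects_bad_password": '''
        page.fill("#email", "a@example.com")
        page.fill("#password", "wrong")
        page.click("#submit")
        assert "Invalid" in page.text("#error")
        assert page.url.endswith("/login")
    ''',
}

if __name__ == "__main__":
    for name_a, name_b, ratio in near_duplicates(SAMPLE_TESTS):
        print(f"{name_a} ~ {name_b} (similarity {ratio})")
```

A flagged pair is a prompt for a human decision, not an automatic delete: sometimes two similar tests should be merged into one parameterized test, and sometimes the second one simply adds nothing.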

This is where AI-assisted development can mislead teams. A fast start can hide a slow tail. The article The Hidden Cost of AI-Generated Test Code is a useful counterweight, because it frames review, infrastructure, and long-term maintenance as part of the real cost of ownership.

A practical operating model for the team

Here is the working approach I would use.

1. Let AI draft, but not decide

Use AI to produce a first pass for test ideas, boilerplate, and edge case lists. Do not let it decide what matters. A human should pick the assertions, the test boundaries, and the priority of the suite.

2. Review for behavior, not just syntax

When reviewing AI-assisted code, ask three questions: does this test protect a user outcome, does it fail for the right reason, and is the setup readable enough for a teammate to fix later?

3. Keep regression suites editable

If selectors, flows, or copy change often, prioritize maintainable regression patterns over raw code volume. The suite should be easy to update without a full rewrite.

4. Test AI features with properties and evals

For chat, summarization, classification, and similar features, define what good output means. Use assertions and curated eval sets rather than manually reading every response.

5. Measure ownership, not just generation speed

When comparing tools or approaches, include the cost of the first maintenance cycle. That is where the real shape of the workflow appears.
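To make the ownership comparison in point 5 less hand-wavy, a tiny cost model is enough to start with. Everything here is an assumption: the three cost buckets, the option names, and especially the hour figures, which are invented placeholders to be replaced with numbers the team actually tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteCost:
    """Rough cost of owning a test suite, in engineer hours."""
    name: str
    creation_hours: float
    review_hours: float
    upkeep_hours_per_cycle: float

    def total_hours(self, cycles: int) -> float:
        """Up-front cost plus upkeep for the given number of maintenance cycles."""
        return self.creation_hours + self.review_hours + self.upkeep_hours_per_cycle * cycles


def cheapest(options: list[SuiteCost], cycles: int) -> str:
    """Name of the option with the lowest total cost after `cycles` cycles."""
    return min(options, key=lambda option: option.total_hours(cycles)).name


# Placeholder figures only; they are not measurements from any real project.
fast_start = SuiteCost("fast start", creation_hours=2, review_hours=3, upkeep_hours_per_cycle=6)
slow_start = SuiteCost("slow start", creation_hours=8, review_hours=2, upkeep_hours_per_cycle=2)

if __name__ == "__main__":
    for cycles in range(3):
        winner = cheapest([fast_start, slow_start], cycles)
        print(f"after {cycles} maintenance cycle(s): {winner} is cheaper")
```

With these placeholder inputs the option that is cheaper to create stops being cheaper by the second maintenance cycle. Whether that holds for us depends entirely on the real numbers, which is the reason to measure them.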

Where this leaves us

AI-assisted development is changing testing, but not in the dramatic way tool vendors like to suggest. It does not eliminate QA work. It changes where the work happens. We spend less time typing repetitive code and more time checking assumptions, keeping suites maintainable, and deciding which failures actually matter.

If we get this right, AI can help the team move faster without turning testing into guesswork. If we get it wrong, we end up with more code, more tests, and less confidence.

For teams still evaluating tools and workflows, Best AI Testing Tools for QA Teams is a practical overview of no-code, low-code, and code-first options. Use it as a starting point, then judge every option by the same rule: does it reduce maintenance without hiding risk?

That is the bar I would set for the next quarter: better coverage, clearer review, and automation that stays usable after the first rush of AI-generated output.

DEV Community