Antoine Dubois

Posted on Jun 1

How AI-Assisted Development Changes the Way We Test, Review, and Automate

#testing #qa #automation #ai

AI-assisted development is changing more than how fast teams ship code. It is changing the shape of risk.

When a developer can generate a component, a test, or even a small feature in minutes, the old assumptions around review and coverage start to break down. The question is no longer just, "Did we write enough tests?" It becomes, "Do we understand what the code is doing, what the test is actually proving, and where the system may be guessing?"

That shift matters for QA engineers, developers, and platform teams alike. AI can be a useful multiplier, but it can also create a false sense of confidence. The teams that do well with AI-assisted development are usually not the ones that automate the most. They are the ones that make test intent, traceability, and reliability more explicit.

AI changes the meaning of coverage

Traditional coverage conversations often focused on lines, branches, and a rough sense of feature completion. Those numbers still have value, but they do not tell the whole story when code is AI-generated or AI-assisted.

A code assistant can produce a lot of syntactically valid output very quickly. That means you can end up with a larger surface area of code, but not necessarily a larger surface area of confidence. In practice, I find it more useful to ask three questions:

1. What user behavior is covered?

A test suite should protect business-critical paths, not just code paths. If AI helps generate a new form, a new endpoint, or a new workflow, the important question is whether the tests still describe the user journey correctly.

2. What assumptions are hidden?

AI-generated code often contains implicit assumptions about defaults, timing, retries, API responses, or DOM structure. Those assumptions are easy to miss in review. Good testing makes them visible.

3. What changed in the failure mode?

When the system fails, is it failing because the logic is wrong, the selector is fragile, the environment is unstable, or the model output is inconsistent? Different failure modes need different strategies.

That last point is one reason I like the observability mindset from AI Test Observability Checklist: Metrics That Reveal When Your Agent Is Guessing. Even if your team is not testing an AI agent directly, the article is useful because it pushes you to look for signs that automated behavior is drifting, guessing, or becoming flaky. That is a strong mental model for AI-assisted QA too, especially when test steps or assertions are generated with help from a model.

Review needs to become more explicit, not just faster

One of the easiest traps with AI-assisted development is assuming that speed automatically improves review throughput. In reality, faster code creation can make review harder, because reviewers see more changes and have less time to understand the intent behind them.

That means code review and test review should become more explicit about purpose. I like to see reviews answer questions such as:

What risk does this change introduce?
What evidence proves it works?
What would make this test give us a false pass?
If this fails in CI, how would we know whether it is a product defect or a test defect?

This is especially important for generated tests. A test that passes is not always a good test. It might be asserting too little, depending on timing too much, or validating a DOM detail that has no user value.
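To make the "asserting too little" case concrete, here is a minimal Python sketch. The discount functions and both checks are hypothetical, invented for illustration and not taken from any real codebase. It shows a weak check passing against a buggy implementation while a stronger check catches the bug.

```python
def apply_discount_buggy(price, percent):
    # Hypothetical bug: subtracts the percent as an absolute amount.
    return price - percent


def apply_discount(price, percent):
    # Intended behavior: reduce the price by the given percentage.
    return price * (100 - percent) / 100


def weak_check(fn):
    # Only proves that the call returned something.
    return fn(200, 10) is not None


def strong_check(fn):
    # Pins down the expected values, including both boundaries.
    return (
        fn(200, 10) == 180
        and fn(200, 0) == 200
        and fn(200, 100) == 0
    )


print(weak_check(apply_discount_buggy))    # the weak check accepts the bug
print(strong_check(apply_discount_buggy))  # the strong check rejects it
print(strong_check(apply_discount))
```

The input values matter as much as the assertion. With a price of 100, the buggy and correct versions return the same number for a 10 percent discount, so even an exact-value assertion would give a false pass.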

A useful supporting perspective comes from How to Measure Browser Test Stability Without Confusing Real Failures With Flakes. The key lesson is simple: stability metrics are only useful when they separate genuine regressions from noisy behavior. In an AI-assisted workflow, that separation matters even more because you may be creating more automation faster, which also increases the chance of introducing brittle checks.
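One rough way to draw that line, sketched in Python under a simplifying assumption: if the same test both passes and fails on the same commit, the code did not change between runs, so the test is probably flaky rather than reporting a regression. The record format and the function name are hypothetical, not taken from the referenced article or any CI tool.

```python
from collections import defaultdict


def classify_runs(runs):
    """Classify each (test, commit) pair from repeated CI runs.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    This record shape is an assumption made for the sketch.
    """
    outcomes = defaultdict(list)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].append(passed)

    labels = {}
    for key, results in outcomes.items():
        if all(results):
            labels[key] = "stable"
        elif not any(results):
            # Fails every time on this commit: more likely a real defect.
            labels[key] = "consistent_failure"
        else:
            # Mixed results on identical code: likely noise.
            labels[key] = "flaky"
    return labels


history = [
    ("login", "abc123", True),
    ("login", "abc123", False),
    ("checkout", "abc123", False),
    ("checkout", "abc123", False),
    ("search", "abc123", True),
]
print(classify_runs(history))
```

A single failing run lands in "consistent_failure" only because there is nothing to compare it with, so the labels are only as good as the number of reruns behind them.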

Coverage is now about trust boundaries

When AI helps produce code, I think about coverage in terms of trust boundaries.

A trust boundary is the point where your team stops assuming something will behave correctly and starts verifying it. In AI-assisted development, those boundaries often include:

generated UI flows that rely on specific selectors
API calls that depend on loosely specified response shapes
transformations where the code compiles, but the business logic is easy to misread
tests that were authored quickly and never reviewed for negative paths
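As an illustration of turning one of these boundaries into a check, the Python sketch below validates a loosely specified response shape instead of trusting it. The `parse_order` function and its field names are hypothetical, chosen only for the example.

```python
def parse_order(payload):
    """Verify the response shape explicitly instead of assuming it.

    The required fields below are assumptions made for this sketch.
    """
    required = {"id": int, "status": str, "items": list}
    for field, expected_type in required.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        if not isinstance(payload[field], expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    # Return only the verified fields so callers cannot lean on extras.
    return {field: payload[field] for field in required}


print(parse_order({"id": 1, "status": "paid", "items": [], "debug": True}))
```

A failure here names the broken assumption directly, which is easier to triage than a test that fails several steps later for an unrelated-looking reason.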

The practical move is to increase coverage where the cost of failure is high, and reduce coverage where the value is low. That sounds obvious, but AI makes it easier to overproduce low-value tests. You can end up with a large suite that looks impressive and still misses the risky paths.
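A small Python sketch of that trade-off follows. The scoring formula, the 1 to 5 scales, and the candidate names are all assumptions made for illustration. Treat it as a conversation starter for a planning session, not a validated model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    failure_cost: int      # 1 (minor) to 5 (severe), estimated by the team
    change_frequency: int  # 1 (rarely changes) to 5 (changes constantly)
    maintenance_cost: int  # 1 (cheap to keep green) to 5 (expensive)


def priority(candidate):
    # Hypothetical score: risk exposure divided by upkeep cost.
    risk = candidate.failure_cost * candidate.change_frequency
    return risk / candidate.maintenance_cost


def rank(candidates):
    # Highest-priority automation targets first.
    return sorted(candidates, key=priority, reverse=True)


backlog = [
    Candidate("tooltip copy", failure_cost=1, change_frequency=2, maintenance_cost=4),
    Candidate("checkout payment", failure_cost=5, change_frequency=4, maintenance_cost=2),
    Candidate("profile avatar upload", failure_cost=2, change_frequency=2, maintenance_cost=2),
]
for candidate in rank(backlog):
    print(candidate.name, priority(candidate))
```

The exact numbers matter less than the habit of writing them down, because a generated test for the lowest-scoring item is as cheap to create as one for the highest.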

This is where roadmap thinking helps. A good automation plan should not just ask what can be automated next. It should ask what should be automated next, based on risk and maintenance cost. The guide How to Build a QA Automation Roadmap for the Next 12 Months is a solid reminder that sustainable automation depends on prioritization, team capacity, and ROI, not just enthusiasm.

Automation strategy changes when code generation is cheaper

AI-assisted development changes the economics of automation. If code is easier to generate, then creating a test suite is easier too. But maintaining it is still the hard part.

That changes the tool conversation. Low-code platforms, code-based frameworks, and hybrid approaches all look different when the team can move faster on implementation. What matters most is not whether a tool is flashy, but whether it supports change without turning every edit into a mini-migration.

That is why I think comparisons like Endtest vs Low-Code Test Automation Platforms: What Changes in Maintenance, Collaboration, and Scale are relevant to AI-assisted teams. The article is useful because it frames automation around maintenance, collaboration, and scaling, which are exactly the pain points that grow when more tests are created with AI help.

For teams adopting AI-assisted coding, the main automation question is not, "Can the model generate this test?" It is, "Can the team understand, update, and trust this test six months from now?"

Debugging still needs human discipline

AI can help draft a test or suggest a fix, but it does not replace good debugging habits. In fact, AI can make sloppy debugging easier, because it offers plausible explanations very quickly.

That is dangerous. If a test fails only in WebKit, or only under CI load, or only after a certain navigation path, the worst thing you can do is accept the first explanation that sounds reasonable.

I keep coming back to workflows like How to Debug WebKit-Only Failures in Playwright Without Guessing. The important part is not the browser engine itself. It is the discipline: use traces, compare logs, inspect timing, and validate assumptions against a real browser. That approach is even more important when the test or the component was generated with AI help, because you need a reliable way to distinguish a real compatibility issue from a bad test design.

What this means for outsourced teams and handoffs

AI-assisted development also affects the way work gets handed off between developers, QA, and external partners. If a team expects QA to verify generated features without clear test intent, handoffs become vague very quickly.

The best handoffs are specific. They describe what changed, why it matters, what should be observed, and which parts of the suite are supposed to protect the change. This is true whether QA is internal or outsourced, but it matters even more when AI has accelerated the pace of delivery.

I found the discussion in Endtest vs Selenium for Outsourced QA Teams: What Changes in Maintenance, Handoffs, and Time to Value helpful here, because it focuses on maintenance burden, handoff quality, and time to value. Those are exactly the places where AI-assisted delivery can either reduce friction or amplify confusion.

If code changes arrive faster, QA needs better context, not just more tickets.

A practical mindset for AI-assisted QA

If I had to reduce all of this to a few habits, they would be these:

Make test intent visible

Every automated test should answer a clear question. If you cannot describe the behavior it protects, the test is probably too vague.

Review generated code as if it were unfamiliar code

Even if the assistant produced it, a human still owns the result. Read it for assumptions, edge cases, and maintainability.

Treat flakiness as a product signal

Flaky tests are not just a CI annoyance. They often reveal unstable selectors, weak assertions, or unclear ownership.

Prefer stable evidence over clever automation

A simple, well-structured test with good diagnostics is better than a complex one that is hard to trust.

Revisit the suite as the team’s AI usage grows

The more AI contributes to implementation, the more important it becomes to audit test quality, failure signals, and maintenance cost.

Closing thoughts

AI-assisted development is not making testing less important. It is making the quality of testing more visible.

Teams that rely on AI to write code, tests, or even review suggestions still need the same fundamentals: clear intent, stable assertions, good observability, and a realistic maintenance plan. What changes is the speed and scale at which weak practices show up.

If anything, AI raises the bar. It rewards teams that can explain why a test exists, what it protects, and how they know it is telling the truth. That is a good thing. It pushes QA back toward its real job: not just confirming that code runs, but proving that the system behaves in ways the team can trust.
