DEV Community

marinsky roma


Does AI agentic testing or automation actually work for anybody?

I have extensive experience in testing, automation, and setting up these systems, and I have led numerous projects that fixed and scaled up automation, even for non-technical engineers.

I have participated in and consulted for around ten companies on fixing their agentic testing or automation approach.
In most cases, the automation and testing were so unreliable that they produced false positives, prompting C-level executives, managers, and engineers to complain about their decision to replace Test Engineers and Automation Engineers.

Some agentic testing providers claim they can develop around 1,000 test cases monthly, but after a few months they deliver around 500 largely flaky tests.
Others promise zero flakiness: just connect Confluence or your documents, and their tool will build excellent test coverage for every feature without any human interaction.

I have had the chance to sit through their demos, watch their ads, and browse their websites with their polished descriptions.
But most of those demos, sites, and ads cover nothing more than a login flow or a simple user registration.

Moreover, I see lots of articles in engineering media claiming "Playwright MCP will blow your mind," and it's always the same: a login scenario, at most a simple registration, and shitty generated code underneath.

So, does anybody have a success story of shifting human Test Engineering to agentic testing tools on a big product?

Top comments (4)

Ashley Childress

I have a related theory I've been wanting to try and haven't had time to get to it yet. But mostly the automation benefits I've seen are all documentation related. That's exactly what I use Playwright for most of the time, too. The testing comes secondary.

I'm terrified to completely turn over even unit tests in chat most of the time. It just fills up the green bar for the sake of covering lines, and if the implementation was wrong to begin with, then you get a wronger test reinforcing that fact.
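A toy sketch of that failure mode (the `apply_discount` function and its bug are made up for illustration): a coverage-chasing test generated from the buggy implementation's observed output goes green while locking the bug in place.

```python
def apply_discount(price: float, pct: float) -> float:
    # Bug: subtracts pct as an absolute amount instead of a percentage.
    # The intended behavior is: price * (1 - pct / 100)
    return price - pct

def test_apply_discount():
    # A test "generated" by asserting whatever the current code returns.
    # It covers the line and passes, but the spec says 10% off 200 is 180.
    assert apply_discount(200, 10) == 190

test_apply_discount()  # green bar, wrong behavior cemented
```

The test suite now actively resists the correct fix: changing `apply_discount` to the intended formula makes this test fail.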

All that being said, I do think there's potential here for automation, even if you consider where the LLMs are currently. It would need to be focused and highly specific, but doable. I've just never seen it actually work anywhere yet. 😆

brayden t

Great insights! From what I’ve seen, AI/agentic testing tools work well for simple flows like login or registration but can become unreliable for complex, enterprise-scale workflows. Many teams adopt a hybrid approach—using AI for smoke or regression tests while keeping critical paths human-tested (Perfecto.io, AskUI.com).

roshan sharma

I’ve seen lots of “agentic testing” tools make big promises, but flaky tests and false positives seem to be a common problem.

In your experience, what makes the difference between a successful automation setup vs one that degrades over time?

Rodgh2005

Demos cover only simple flows; complex scenarios break everything.