I have extensive experience in testing, automation, and setting up these systems, and I have led numerous projects that fixed and scaled up automation, even for non-technical engineers.
I have also participated in and consulted with around ten companies on how to fix their agentic testing or automation approach.
In most cases, automation and testing were so unreliable that they even produced false positives, prompting complaints from C-level executives, managers, and engineers about the decision to shift away from human Test Engineers and Automation Engineers.
Some agentic testing providers claim to develop around 1,000 test cases monthly, but after a few months they deliver around 500 tests, most of them flaky.
Some providers have claimed zero flakiness: just connect Confluence or your documents, and their tool will build excellent test coverage for every feature without requiring human interaction.
I had a chance to sit through their demos, watch their ads, and read their websites with their excellent descriptions. But most of those demos, sites, and ads cover just a login flow or, at best, a simple user registration setup.
Moreover, I see lots of articles in the engineering media claiming that "Playwright MCP will blow your mind," and it's the same story: a login scenario, at most a simple registration, and shitty code generated from that single flow.
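For reference, here is roughly the level of test those demos and write-ups show; a minimal Playwright sketch, with a hypothetical URL, labels, and credentials:

```ts
// Illustrative only: the demo-grade "login" test these tools typically showcase.
// The URL, labels, and credentials are hypothetical.
import { test, expect } from '@playwright/test';

test('user can log in', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  // Passes in a demo; says nothing about multi-step, data-dependent workflows.
  await expect(page).toHaveURL(/dashboard/);
});
```

A test like this is fine as a smoke check, but it tells you nothing about whether the tool can handle real enterprise workflows.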
So, does anybody have a success story of shifting human Test Engineering to agentic testing tools on a big product?
Top comments (4)
I have a related theory I've been wanting to try but haven't had time to get to yet. Mostly, though, the automation benefits I've seen are all documentation related. That's exactly what I use Playwright for most of the time, too; the testing comes second.
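For what it's worth, here is a minimal sketch of that documentation use case, assuming a hypothetical settings page and output path: drive the browser through a flow and capture screenshots for the docs.

```ts
// Minimal sketch: Playwright as a documentation tool rather than a test runner.
// The URL and output path are hypothetical.
import { chromium } from 'playwright';

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/settings');
  // Capture the current state of the page for the docs.
  await page.screenshot({ path: 'docs/img/settings.png', fullPage: true });
  await browser.close();
})();
```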
I'm terrified to completely turn over even unit tests to chat most of the time. It fills up the green bar for the sake of covering lines, and if the implementation was wrong to begin with, then you just get a wronger test reinforcing that fact.
All that being said, I do think there's potential here for automation, even if you consider where the LLMs are currently. It would need to be focused and highly specific, but doable. I've just never seen it actually work anywhere yet. 😆
Great insights! From what I’ve seen, AI/agentic testing tools work well for simple flows like login or registration but can become unreliable for complex, enterprise-scale workflows. Many teams adopt a hybrid approach: using AI for smoke or regression tests while keeping critical paths human-tested (Perfecto.io, AskUI.com).
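One way to express that split in Playwright itself; a sketch only, with tag conventions and project names I made up, not a vendor feature:

```ts
// playwright.config.ts -- hypothetical hybrid setup: AI-generated tests
// tagged @smoke run in their own project, while the human-owned @critical
// suites that actually gate releases run with zero retry tolerance.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'ai-smoke', grep: /@smoke/, retries: 2 },          // tolerate some flakiness
    { name: 'human-critical', grep: /@critical/, retries: 0 }, // must be deterministic
  ],
});
```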
I’ve seen lots of “agentic testing” tools make big promises, but flaky tests and false positives seem to be a common problem.
In your experience, what makes the difference between an automation setup that stays successful and one that degrades over time?
Demos cover only simple flows; complex scenarios break everything.