Your functional tests pass. Your unit tests pass. Your E2E suite is green.
And then a user reports that the checkout button is invisible on the Ga...
Excellent article, Jay. The distinction between "Does it work?" (Functional) and "Does it look right?" (Visual) is where most mobile QA strategies fall apart.
I’m particularly interested in the "No Baselines" approach of Vision AI. Moving away from rigid screenshot comparisons to a model that understands layout semantics solves the dynamic content problem that has plagued tools like Percy or Applitools for years. Definitely looking into Drizz for our next sprint.
This is one of the few posts on AI-based testing that actually explains a real problem instead of just saying “AI will replace QA.”
What I liked most is the point about functional tests passing while the UI is still broken for the user. That happens a lot on mobile apps, especially across different screen sizes and Android variants. A button can technically exist and still be unusable because of overlap, clipping, or bad scaling.
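To make that concrete, here's a minimal sketch of the failure mode, assuming the Appium Python client (the accessibility id "checkout_button" is purely illustrative):

```python
from appium.webdriver.common.appiumby import AppiumBy

def checkout_button_fully_visible(driver) -> bool:
    btn = driver.find_element(AppiumBy.ACCESSIBILITY_ID, "checkout_button")

    # This is all a typical functional test asserts -- and it still passes
    # when the button is half clipped by the screen edge or the keyboard.
    if not btn.is_displayed():
        return False

    # The geometry check most scripts never make: is the WHOLE button on screen?
    win = driver.get_window_size()
    loc, size = btn.location, btn.size
    return (
        loc["x"] >= 0 and loc["y"] >= 0
        and loc["x"] + size["width"] <= win["width"]
        and loc["y"] + size["height"] <= win["height"]
    )
```

Even this extra check only catches clipping against the screen edges, not overlap with other elements or bad scaling.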
The comparison between locator-driven testing and vision-based testing also makes sense. Traditional automation becomes painful to maintain when the UI changes frequently, and modern apps change constantly. Using visual understanding instead of depending entirely on selectors feels like a natural next step.
I don’t think script-based tools are going away anytime soon, but combining them with Vision AI for visual validation honestly seems much more practical than treating them as separate worlds.
Good read overall, especially the focus on real-world reliability instead of just test execution numbers.
The most valuable insight here is that mobile QA has been treating “functional” and “visual” correctness as two separate problems, even though users experience them as one.
A test passing doesn’t mean the UI is usable. A checkout button can technically exist while being hidden behind the keyboard, clipped on certain devices, or unreadable in dark mode. Script-based tools simply weren’t built to catch those failures.
I also liked the point about screenshot diff tools eventually becoming noise generators on mobile because of fragmentation, animations, and dynamic content. Maintaining hundreds of baselines across devices feels less like testing and more like babysitting screenshots.
The Vision AI approach is interesting because it changes the question from “Did pixels change?” to “Does this screen still make sense to a human?” which honestly feels much closer to real user experience.
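For contrast, the question the baseline tools answer looks roughly like this naive sketch (Pillow assumed; the file paths are illustrative and both captures are assumed to be the same size):

```python
from PIL import Image, ImageChops

def pixel_diff_ratio(baseline_path: str, current_path: str) -> float:
    baseline = Image.open(baseline_path).convert("RGB")
    current = Image.open(current_path).convert("RGB")
    diff = ImageChops.difference(baseline, current)
    # Any channel delta counts as a "changed" pixel: a timestamp, a carousel
    # frame, or anti-aliasing drift all inflate this number identically.
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    return changed / (diff.width * diff.height)
```

A gate like `pixel_diff_ratio(...) > 0.01` can't tell a clock tick from a broken layout, which is exactly where the babysitting starts.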
Curious though: how does Vision AI handle subtle design regressions where the UI is still usable, but spacing, typography rhythm, or visual hierarchy slightly drift from the intended design system?
This article explains a very important shift in mobile testing. Traditional script-based tools can miss small UI issues, while Vision AI makes testing more human-like by detecting visual changes users actually notice. A useful read for anyone interested in app quality, automation, and the future of mobile testing.
The simple truth is that most automation tests are brittle.
Traditional script-based tools depend heavily on locators, IDs, XPath, and other static selectors. As soon as a button or its position changes, or the layout of UI elements is altered, tests break without the application's functionality being affected at all.
One thing I learnt from this article is that Vision AI completely transforms the approach to mobile testing!
Instead of asking:
“Is this element present in the DOM?”
Vision AI asks:
“Does a real user actually look at this screen and see it as intended?”
That difference matters.
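Expressed as code, the contrast looks roughly like this (a sketch only: `vision_client` and its `verify` call are hypothetical, not any real Drizz API):

```python
from appium.webdriver.common.appiumby import AppiumBy

def check_checkout_screen(driver, vision_client) -> None:
    # Locator question: structure only. This passes whenever the node exists
    # in the hierarchy, even if it renders off-screen or behind the keyboard.
    driver.find_element(AppiumBy.ID, "btn_checkout")

    # Vision question: the rendered screen, as a user would see it.
    # `vision_client.verify` is a hypothetical API, used only for illustration.
    result = vision_client.verify(
        screenshot=driver.get_screenshot_as_png(),
        expectation="The checkout button is fully visible and tappable",
    )
    assert result.passed
```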
Do you think Vision AI will fully replace pixel-based testing or just complement it?
One thing I genuinely liked in this article is how it explained the difference between “tests passing” and “good user experience.” In many projects, if Selenium/Appium scripts pass, teams assume everything is fine. But in reality, users don’t care whether a locator worked — they care whether the UI actually looks usable on their device.
The checkout button example was very relatable because issues like invisible buttons, overlapping elements, spacing problems, dark mode rendering bugs, or broken responsiveness are things script-based automation often misses unless someone manually notices them.
I also found the point about maintenance overhead important. In large mobile apps, constantly updating selectors, handling device-specific UI differences, and maintaining inspection workflows can consume a lot of QA effort. Vision AI feels interesting because it shifts testing closer to how humans naturally validate interfaces — visually instead of structurally.
What I personally think is that visual AI testing won’t completely replace functional automation, but combining both could make QA much stronger. Functional tests can verify logic while Vision AI verifies the actual user-facing experience.
As someone currently exploring software testing and AI-driven tools, this blog gave me a much more practical understanding of where mobile QA is heading in the next few years. Great insights throughout 👏
Really liked this post. The “does it work?” vs “does it look right?” distinction is super important, especially on mobile. Functional tests can pass even when the UI is broken for real users. Vision AI seems like a smarter way to catch those visual issues without dealing with tons of screenshot baselines. How does Vision AI improve mobile visual regression testing compared with script-based tools?
Really solid breakdown. The point about two separate testing systems (functional + visual) being where bugs actually hide hit home — that gap is exactly where things slip through to production. The Vision AI approach of seeing the screen to interact with it, rather than diffing against a stored baseline, is a cleaner architecture. No more 500 diffs per build that everyone starts approving blindly just to keep the pipeline moving.
Exactly, that gap between functional and visual testing is where most real UI bugs slip through. Vision AI reducing noisy baseline diffs does feel like a more practical approach for mobile apps.
This is one of the clearest articulations I've seen of why the "green pipeline = working product" assumption breaks down at the UI layer. The framing of functional tests answering "does it work?" versus visual testing answering "does it look right?" sounds obvious once stated, but most teams don't operationalize the distinction — they just assume passing tests imply a usable interface.
The part that resonated most with me is the two-system problem. In practice, these systems don't just have separate tooling — they have separate ownership. Functional tests are owned by developers; visual baselines are often nobody's job until they're everybody's problem. The review bottleneck you describe (500 diffs per build, 15% false positives, reviewers approving blindly) isn't a tooling failure — it's what happens when a process generates more noise than signal and humans adapt by ignoring it.
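The arithmetic behind that adaptation is worth spelling out (the per-diff review time below is my assumption; the other numbers are from the paragraph above):

```python
diffs_per_build = 500
false_positive_rate = 0.15           # from the figures above
seconds_per_diff_review = 10         # assumption: one quick glance per diff

false_positives = diffs_per_build * false_positive_rate          # 75 per build
review_hours = diffs_per_build * seconds_per_diff_review / 3600  # ~1.4 hours

# 75 bogus diffs and over an hour of review on every build: blind approval
# becomes the rational response, not a discipline failure.
print(false_positives, round(review_hours, 1))
```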
The Vision AI approach is interesting precisely because it reframes the architecture: the AI has to see the screen to interact with it, so visual verification isn't an additional step — it's a prerequisite to every action. That's a fundamentally different contract than "run functional tests, then run a separate screenshot diff suite."
One thing I'd push on: how does Vision AI handle intentional but subtle visual changes — say, a border-radius update from 4px to 6px, or a line-height tweak that shifts the rhythm of a long-form page? These are cases where a human designer would immediately notice a regression, but the semantic content of the screen (buttons, fields, text) is identical. Is that the gap where pixel-level tools still earn their place, or does the AI surface these too?
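My hunch is that this is exactly where pixel-level metrics still earn their place: a structural-similarity score reacts to small geometry changes that a semantic pass might wave through. A minimal sketch, assuming scikit-image and Pillow (paths illustrative, images assumed the same size):

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim

def drift_score(baseline_path: str, current_path: str) -> float:
    a = np.asarray(Image.open(baseline_path).convert("L"))
    b = np.asarray(Image.open(current_path).convert("L"))
    score, _ = ssim(a, b, full=True)
    return score  # 1.0 = identical; a 4px -> 6px radius tweak nudges it down
```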
Genuinely useful piece — the comparison table between traditional and AI approaches is something I'm going to share with our QA lead.
What I found interesting about this approach is that Vision AI can identify problems the same way an actual user would.
Conventional approaches to visual regression rely on screenshot analysis, yet they can still overlook problems such as overlapping text, misaligned buttons, or other visual defects that significantly affect user experience.
The Vision AI approach is more practical because it lets us assess the interface not from the developer's standpoint but by its actual visual characteristics. That is especially relevant for mobile app design, where layout optimization matters a lot.
Finally, what also made me think about Vision AI was its ability to reduce the manual work QA engineers spend analyzing visual changes. As applications grow more sophisticated, the time spent on screenshot analysis can become significant, so AI-based visual inspection is a reasonable solution here.
Overall, this article gave me valuable insight into the problem.
What stood out to me here is that visual regressions are one of the few problems where “test passed” can still mean “user experience failed.” A button can technically exist, be clickable, and satisfy every assertion while still being partially hidden, misaligned, or unusable from the user’s perspective.
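That "exists but unusable" case is easy to make concrete: two elements' bounding boxes can intersect even though every per-element assertion passes. A toy sketch (coordinates invented for illustration):

```python
def rects_overlap(a: dict, b: dict) -> bool:
    # Axis-aligned rectangle intersection test.
    return not (
        a["x"] + a["w"] <= b["x"] or b["x"] + b["w"] <= a["x"]
        or a["y"] + a["h"] <= b["y"] or b["y"] + b["h"] <= a["y"]
    )

checkout = {"x": 40, "y": 1180, "w": 300, "h": 56}   # passes every assertion
banner   = {"x": 0,  "y": 1200, "w": 480, "h": 96}   # cookie banner on top
assert rects_overlap(checkout, banner)  # nothing in the suite asks this question
```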
That’s why the shift from structure-based validation to perception-based validation feels important. Traditional automation frameworks are great at verifying logic and workflows, but they were never really designed to understand layout quality, visual hierarchy, or rendering consistency the way humans do.
At the same time, I don’t think visual AI replaces script-based testing — it complements it. Functional assertions answer “does the app work?”, while visual regression testing answers “does the app still look usable and trustworthy?” Both matter, especially in modern mobile apps where UI changes happen constantly.
Curious to see how these systems evolve around dynamic content, animations, and intentional UI variations though, since reducing false positives is probably the real challenge at scale.
One thing I found really interesting is how Vision AI focuses on validating the actual visual user experience instead of relying only on predefined scripts and assertions.
Traditional automation can verify functionality correctly, but UI issues like spacing inconsistencies, rendering glitches, or layout shifts may still go unnoticed even when tests pass successfully.
The point about reducing manual inspection workload in large-scale mobile testing environments also stood out because visual validation across multiple devices and screen sizes is genuinely challenging for QA teams.
It’ll be interesting to see how Vision AI evolves alongside existing automation workflows in modern CI/CD pipelines.
Do you see Vision AI fully replacing pixel-based testing, or just complementing it?
This was a really good read. The way you explained the gap between functional testing and actual user experience was spot on. Most teams focus so much on whether a feature works that they forget users only care about what they actually see on screen. The examples around layout shifts, overlapping elements, and device-specific issues felt very real because these are exactly the kinds of bugs that slip into production even when all tests are green. I also liked how clearly you showed the maintenance headache with traditional screenshot-based tools on mobile.
The Vision AI approach makes a lot of sense, especially for teams dealing with multiple devices and fast release cycles. Combining functional and visual validation into one flow feels much more practical than maintaining two completely separate testing systems. Really insightful article overall!!!
What I found interesting here is how this actually exposes a gap most teams don’t talk about openly: not the tools themselves, but the "assumption layer" in QA.
We assume "green tests = safe release", but in mobile that assumption quietly breaks in production all the time. Especially when UI issues are treated as second-class bugs compared to functional failures.
To me, the idea of Vision AI isn’t just "smarter automation"; it feels more like collapsing two separate mental models teams have been maintaining for years: one for logic, one for appearance.
That being said, I still think the hardest part in real-world teams won’t be detection, but deciding what counts as a "visual regression worth failing a build" versus "acceptable drift". That boundary is usually where QA discussions get messy.
Still, this feels closer to how actual users experience apps, not how test suites model them.
It was great to read your article, Jay. When you mentioned that functional tests can pass despite the actual UI being broken on real devices, I have to say that this is an issue I've encountered multiple times in mobile applications. However, what really drew me in was the concept of getting rid of the entire baseline-management hassle. Screenshot diffing seems like a good solution on paper until you factor in fragmentation, animations, and dynamic elements. The Vision AI-based approach, on the other hand, seems much more realistic because you'll be validating the UI's content itself rather than comparing pixel by pixel.
I also agree with you when you say that it makes much more sense to merge functional and visual testing instead of using two different systems entirely. I will definitely dive deeper into Drizz and see where this goes from now on. Great article!
I liked how this article explained the difference between functional testing and visual testing with real examples. Normally we think if the test passes then the app is fine, but issues like hidden buttons, overlapping text, or broken dark mode can still affect users badly.
The part about screenshot comparison creating too many false positives on different mobile devices was also interesting. Vision AI testing seems like a smarter approach for handling device fragmentation and dynamic content in mobile apps.
As a student, this gave me a good understanding of how AI is slowly changing software testing and QA workflows too.
I love the distinction between 'Does it work?' and 'Does it look right?' A green checkmark on a functional test means nothing if the 'Buy' button is hidden behind a popup. Vision AI finally aligns QA with the actual human experience.
Excellent read, Jay. The friction of maintaining "Two Separate Systems" for functional and visual checks is exactly where mobile QA workflows typically bog down. I’m particularly drawn to how Vision AI eliminates the need for rigid baseline dependency. Moving away from brittle, pixel-to-pixel matching to a model that leverages semantic understanding completely bypasses the review bottleneck and device fragmentation issues that inevitably cause alert fatigue. Being able to run both functional and visual validations in a single pass without constantly configuring masks is a massive leap forward, and I'll definitely be keeping an eye on how Drizz streamlines this process for future testing environments.