Eevis

Posted on • Originally published at eevis.codes

Does AI Generate Accessible Android Apps?

Over the past six months or so, I've been writing a series of blog posts in which I've generated an app using the same prompt with different AI tools, and then tested the outcome with various assistive technologies and accessibility settings.

The tools I've tested are Gemini, Junie, Cursor, and Claude; each has its own blog post in this series.

Now it's time to wrap up and summarize my learnings. Something worth noting is that I started these tests in spring, which feels like decades ago at the current pace of technical advancement. Unfortunately, when it comes to accessibility, not everything moves forward that quickly. So, even though the first findings are from the end of spring, they're still relevant for learning purposes.

Redundant Content Descriptions

Every tool I tested added redundant content descriptions. By redundant content descriptions, I mean, for example, a button that already had the text "Add yarn" also getting the content description "Add new yarn", which adds zero new value. In some cases, the implementation was such that the screen reader read both the text ("Add yarn") and the content description ("Add new yarn"), which meant redundant listening for the user.

Claude took this even further: it sometimes added actions to the content descriptions, meaning that after the redundant content description, a "Tap for details" text was appended. The card where this was added already had the button role set, so a screen reader user would hear something like: "Bla bla bla. Tap for details. Tap to activate". And if you're using your phone by listening, you probably want to skip redundant information.

From a technical point of view, it's understandable why this happens - accessibility documentation and blog posts often focus on screen reader accessibility, and content descriptions are probably the easiest thing to add while assuming they make the UI accessible. And AI has been trained on that documentation (among other things), so it repeats these patterns.
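For context, the pattern looks roughly like this in Jetpack Compose. This is a sketch of my own, not code from the generated apps, and the composable names are illustrative:

```kotlin
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.semantics.contentDescription
import androidx.compose.ui.semantics.semantics

// Redundant pattern: the content description duplicates (or overrides)
// the visible text, so a screen reader may announce both
// "Add yarn" and "Add new yarn".
@Composable
fun RedundantAddYarnButton(onAddYarn: () -> Unit) {
    Button(onClick = onAddYarn) {
        Text(
            "Add yarn",
            modifier = Modifier.semantics { contentDescription = "Add new yarn" }
        )
    }
}

// Better: a button with visible text needs no content description at all.
@Composable
fun AddYarnButton(onAddYarn: () -> Unit) {
    Button(onClick = onAddYarn) {
        Text("Add yarn")
    }
}
```

Content descriptions are meant for elements without visible text, such as icon-only buttons - not for repeating text that's already on screen.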

Non-Scrollable Screen(s)

Another major problem these tests revealed was that, except for Claude, none of the tested tools made the screens scrollable. The screens didn't contain that much information, so they didn't need to scroll at default text sizes. But the problems started when the font size was larger.

If the screen doesn't scroll, and the content takes more vertical space than is available on the phone screen, the content that overflows the visible screen is unreachable.

As developers often test only with default font sizes, many published apps don't support scrolling either. So, again, from a technical point of view, it's understandable why this happens - the material the tools are trained on contains these same issues.
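The fix is usually small. A minimal sketch, assuming a simple screen built on a Column (the composable name is mine, for illustration):

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.rememberScrollState
import androidx.compose.foundation.verticalScroll
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier

// Making the root Column scrollable keeps all content reachable
// when large font sizes push it past the visible screen.
@Composable
fun YarnListScreen(content: @Composable () -> Unit) {
    Column(
        modifier = Modifier.verticalScroll(rememberScrollState())
    ) {
        content()
    }
}
```

For longer, list-like content, a lazy list such as LazyColumn scrolls out of the box.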

Redundant Focusable Modifier

The first app I created with Gemini had one problem the others didn't: it added a focusable modifier to a component that already had a clickable modifier. This means that when a user navigating with, for example, a keyboard or D-pad encounters this component, they experience the following:

  1. Focus lands on the button
  2. Focus seems to disappear
  3. Focus lands on the next focusable item

I've seen this out in the wild - developers adding a focusable modifier because they think, with good intentions, that it improves the accessibility of the app. So, no wonder it was added.
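A sketch of the problematic pattern (the component and names are illustrative, not from the generated app):

```kotlin
import androidx.compose.foundation.clickable
import androidx.compose.foundation.focusable
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier

// Problematic: clickable already makes the component focusable.
// Stacking an extra focusable on top creates an additional focus stop,
// which is why keyboard/D-pad focus appears to vanish for one tab press.
@Composable
fun YarnCard(onClick: () -> Unit) {
    Text(
        "Yarn details",
        modifier = Modifier
            .clickable(onClick = onClick)
            .focusable() // redundant - remove this
    )
}
```

Removing the extra focusable restores a single, predictable focus stop.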

Extra Tab Stops

Junie, on the other hand, created an interesting problem on the second run. When testing with a keyboard, an invisible component was added to the tab order. After some investigation, it turned out to be an invisible floating action button.

Incorrect Semantics

Junie and Claude both added incorrect semantics to some components. On the second test run with Junie, when I asked it to improve accessibility, it added incorrect roles to some components and redundant state descriptions.

Claude went a bit further - it started hallucinating semantics. For a custom modifier called accessibleTextField, it added the role Role.TextField, which doesn't actually exist.
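For reference, Compose's Role is a closed set - at the time of writing it includes values such as Button, Checkbox, Switch, RadioButton, Tab, Image, and DropdownList, but no TextField. A sketch of valid role usage (the composable name is illustrative):

```kotlin
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.semantics.Role
import androidx.compose.ui.semantics.role
import androidx.compose.ui.semantics.semantics

// A custom clickable card can legitimately announce itself as a button.
// Text fields, by contrast, get their semantics from the TextField
// composable itself - there is no Role.TextField to assign manually.
@Composable
fun CardActingAsButton() {
    Text(
        "Open details",
        modifier = Modifier.semantics { role = Role.Button }
    )
}
```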

Button Navigation Not Supported

Finally, the last issue found in the tests was that Claude did not support button navigation. In practice, this means there isn't enough padding at the bottom of the screen, so the navigation bar hides the last content on the screen.

[Image: Bottom of the app screen, showing a Quick actions card with Add yarn and Add needles buttons. A semi-transparent navigation bar covers half of the Add needles button.]

As a button navigation user myself, I see this happening way too often. I suspect most developers use gesture navigation, so they don't test with button navigation, which is why this pattern is widespread.
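One way to handle this in Compose is with inset-aware padding. A minimal sketch, assuming edge-to-edge layout (the composable name is mine):

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.navigationBarsPadding
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier

// navigationBarsPadding() adds bottom padding matching the system
// navigation bar insets, so the three-button navigation bar no
// longer covers the last content on the screen. With gesture
// navigation, the inset is smaller, so nothing extra is added.
@Composable
fun QuickActionsSection(content: @Composable () -> Unit) {
    Column(modifier = Modifier.navigationBarsPadding()) {
        content()
    }
}
```

Scaffold's content padding, when actually applied to the content, solves the same problem.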

Summary

Here you can also find the findings in a table format. In the table below, "YES" means that the tool has that problem.

| Issue | Gemini | Junie | Cursor | Claude |
| --- | --- | --- | --- | --- |
| Redundant content descriptions, which override the text | ❌ YES | ❌ YES | ❌ YES | ❌ YES |
| Redundant actions in content descriptions | No | No | No | ❌ YES |
| Screen(s) not scrollable | ❌ YES | ❌ YES | ❌ YES | No |
| Redundant focusable modifier | ❌ YES | No | No | No |
| Large font sizes not supported | No | ❌ YES | ❌ YES | No |
| Extra tab stops | No | ❌ YES | No | No |
| Incorrect semantics | No | ❌ YES | No | ❌ YES |
| Doesn't support button navigation | No | No | No | ❌ YES |

Wrapping Up

This blog post concludes my journey into testing AI-generated Android apps. The testing results were pretty much what I expected - the apps contained issues similar to the ones I see out in the wild, in apps that humans have developed. So I don't believe AI code generation will solve the inaccessibility of Android apps anytime soon.

You might argue that the issues I've discussed are not that big. But it's worth noting that the app built from the prompt itself isn't that complicated, so a more complex app would probably have resulted in more accessibility issues.

So, all in all, I'm not convinced just yet. AI tools can be helpful, but they're not yet ready to replace developers, as many non-developers would like to believe.
