The audit that made us take this seriously
A few years back we took on a fintech client who had passed an automated accessibility scan. Green across the board in their internal tool. They were confident enough to quote WCAG 2.1 AA compliance on their marketing page.
Then one of our QA engineers ran through the signup flow with NVDA. The screen reader got stuck on a custom dropdown that had no aria-expanded, no keyboard handler, and no role. A blind user would hit that control and land in a dead end. The automated scanner had flagged nothing because the element was technically a <div> with an onClick handler and the scanner had no way to know what it was supposed to be.
That is the gap nobody talks about when they write accessibility guides. Automated tools catch roughly 30 to 40 percent of WCAG violations. The rest requires a human who understands how people with disabilities actually use software.
What automated scanners are good at
We run axe-core, Pa11y, and our own tool Auditi on every client project. They are fast, consistent, and catch the boring stuff at scale:
- Color contrast failures (anything below 4.5:1 on normal text)
- Missing `alt` attributes on images
- Form inputs without associated labels
- Duplicate IDs in the DOM
- Missing language attributes on the `<html>` element
- `<button>` elements with no accessible name
- Heading hierarchy jumps (h2 to h4 with no h3 in between)
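The contrast check, at least, is pure arithmetic. Here is a minimal sketch of the WCAG relative-luminance and contrast-ratio math, in our own illustrative JavaScript rather than axe-core's actual implementation (which also handles things like alpha blending and large-text detection):

```javascript
// WCAG 2.x contrast ratio between two sRGB colors given as [r, g, b] 0-255.
// Illustrative sketch only; axe-core's real check is more thorough.
function relativeLuminance([r, g, b]) {
  const toLinear = (c) => {
    const s = c / 255;
    // sRGB transfer function, per the WCAG definition of relative luminance
    return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * toLinear(r) + 0.7152 * toLinear(g) + 0.0722 * toLinear(b);
}

function contrastRatio(fg, bg) {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

function passesAA(fg, bg, { largeText = false } = {}) {
  // 4.5:1 for normal text, 3:1 for large text (WCAG SC 1.4.3)
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1)); // 21.0
console.log(passesAA([119, 119, 119], [255, 255, 255])); // false — #777 on white is ~4.48:1
```

That last line is why scanners are reliable here: #777 on white *looks* fine and fails AA by a hair, which no human eyeballs consistently.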
axe-core in particular is the workhorse. It is what powers most browser extensions, it runs inside Playwright for CI, and it rarely gives false positives. Pa11y is useful when you want a command line output for Jenkins or GitHub Actions. Auditi is what we built when we needed something that scans an entire sitemap and produces a shareable report for a client who does not want to read JSON.
Here is the catch. Every one of these tools is measuring things a computer can check. Alt text length. Contrast ratio calculations. Whether an input has a for attribute. None of them can tell you whether the alt text actually describes what matters in the image, or whether the reading order makes sense, or whether a screen reader user can figure out where they are after the page updates via JavaScript.
The stuff scanners miss (and real users hit)
Last year we audited a booking platform. The automated scan came back with 14 issues, mostly contrast. Our manual pass found 47 more. A sample of what only showed up with a human in the loop:
- A modal that stole focus on open but did not return focus to the triggering button on close. A keyboard-only user would lose their place and have to tab from the top of the document again.
- An "add to cart" button that visually changed to "added" but had no
aria-liveregion. Screen reader users got no confirmation anything happened. - A date picker that was fine with a mouse, fine with a keyboard, and completely broken with VoiceOver on iOS. The announcement said "button button button" for every day cell.
- A cookie banner that trapped focus inside itself but had no visible close button. Desktop users could tab out by accident. Mobile users with switch control could not.
- Skip links that worked but jumped to a container with `tabindex="-1"` that then fired a focus style that looked like a blinking cursor in the middle of the page.
- Error messages that appeared inline in red text. The color was fine. The problem was the error had no programmatic association with the input, so screen readers announced the field as valid.
Not one of those made it into an axe report.
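The modal focus bug at the top of that list has a small, framework-free fix: remember what had focus when the dialog opened and hand it back on close. This is our own sketch with hypothetical wiring; a production modal also needs a focus trap and Escape handling.

```javascript
// Sketch: return focus to the triggering element when a modal closes.
// The modal element needs tabindex="-1" so it can receive focus itself.
function createModalController(modal) {
  let previouslyFocused = null;

  return {
    open(trigger) {
      previouslyFocused = trigger; // in a real page, often document.activeElement
      modal.hidden = false;
      modal.focus();
    },
    close() {
      modal.hidden = true;
      // The missing step in the audit: without this, a keyboard user
      // lands back at the top of the document.
      if (previouslyFocused) previouslyFocused.focus();
    },
  };
}
```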
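The inline-error bullet is also a one-attribute fix once you see it: the error text exists, it just is not programmatically tied to the input. A hedged sketch, with hypothetical IDs, of the `aria-describedby` + `aria-invalid` pairing:

```javascript
// Sketch: associate an inline error with its input so screen readers
// announce it when the field gets focus. IDs here are illustrative.
function showFieldError(input, errorEl, message) {
  errorEl.textContent = message;
  errorEl.id = errorEl.id || `${input.id}-error`;
  input.setAttribute("aria-invalid", "true");
  input.setAttribute("aria-describedby", errorEl.id); // the missing association
}

function clearFieldError(input, errorEl) {
  errorEl.textContent = "";
  input.removeAttribute("aria-invalid");
  input.removeAttribute("aria-describedby");
}
```

Red text plus this association passes both the scanner and the human; red text alone passes only the scanner.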
A workflow that actually catches things
This is what we run on client projects now. It is not the only way, but it has survived about 200 audits.
- Automated baseline. Run axe-core, Pa11y, and Auditi on every key page. Record the violation count. Fix the low hanging fruit first so the manual pass does not get drowned in contrast warnings.
- Keyboard-only walkthrough. Unplug the mouse. Try to complete the main flows using Tab, Shift+Tab, Enter, Space, and arrow keys. Note every place focus disappears, jumps unexpectedly, or gets trapped.
- Screen reader pass. NVDA on Windows, VoiceOver on Mac and iOS, TalkBack on Android. You do not need to be fluent. You need to notice when the reader says something confusing or says nothing at all.
- Zoom to 200 and 400 percent. WCAG 1.4.10 covers reflow. A lot of sites break at 400 percent. Text gets cut off, sticky headers cover content, buttons move off screen.
- Color blind simulation. Chrome DevTools has simulators built in. Check whether anything relies on color alone to convey information (red error text, green success state with no icon).
- Real user testing. When the budget allows it. A 30 minute session with someone who uses a screen reader daily will teach you more than a week of our own testing.
The order matters. If you start with manual testing you waste time on issues the scanner would have caught in 10 seconds. If you stop at the scanner you ship a product that is technically compliant and practically unusable.
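For step 2, it helps to make "focus disappeared here" visible. A small console helper we sketch below (our own convention, not a standard tool) logs a short description of every element that receives focus during a keyboard walkthrough:

```javascript
// Describe an element tersely for focus logging: tag, #id, first class.
function describeElement(el) {
  return [
    el.tagName ? el.tagName.toLowerCase() : "?",
    el.id ? `#${el.id}` : "",
    el.className ? `.${String(el.className).split(" ")[0]}` : "",
  ].join("");
}

// Paste into the browser console, then Tab through the page.
// Gaps or repeats in the log are exactly the focus bugs you are hunting.
function logFocusChanges(root = document) {
  root.addEventListener("focusin", (event) => {
    console.log("focus:", describeElement(event.target));
  });
}
```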
The limitation we ran into with our own tool
Auditi is good at what it does. Sitemap crawl, WCAG checks, a shareable report with severity grouping. But we hit a wall when clients asked us to test flows behind authentication. A scanner can check the login page. It cannot log in, navigate to the checkout, add an item, and test the checkout modal. Not without either shipping credentials to the cloud (which clients hate) or running the scan locally with cookies.
We ended up pairing Auditi with Playwright for the authenticated paths. Playwright logs in, navigates, and then injects axe-core into the page at each step. It is clunky. We have not found a clean solution that works for every client. This is the honest limitation of automated accessibility testing: if the critical user journey is behind a login, you need a hybrid setup, and that hybrid setup is still work.
Why this is our hill to die on
BetterQA has existed since 2018, we run 50+ engineers across 24 countries, and accessibility is one of the services clients most often ask us to retrofit after something goes wrong. Usually a legal letter. Sometimes a complaint from a user. Once, an ADA lawsuit in the US that cost the client more than our retainer for three years combined.
We believe accessibility testing is exactly the kind of work that should not be done by the team that built the feature. Not because they do not care, but because they cannot see what they built from outside. The chef should not certify his own dish. Devs test the happy path with a mouse on a 27 inch monitor. Real users show up on a cracked iPhone with VoiceOver, a tremor, and 30 seconds of patience.
If you want to see how we approach audits or how we built Auditi, you can start at betterqa.co. If you just want the tools: axe-core, Pa11y, NVDA, VoiceOver. Run them in that order. Fix what they find. Then find someone who actually uses assistive tech and watch them try to use your product.
That last step is the one that will change how you build.