Tudor Brad

Posted on • Originally published at betterqa.co

We built an accessibility tool because spreadsheet audits were killing us

There's a specific kind of despair that comes from opening the ninth spreadsheet in a WCAG audit, the one you're pretty sure somebody duplicated from the wrong version two weeks ago, and finding that step 14 of the login journey is marked "Fail" in your copy and "Not Tested" in the reviewer's copy.

Which one is right? Nobody knows. The tester who originally logged it is on PTO. The screenshot is somewhere in Slack, probably in a thread that got buried under a deployment argument.

That was us, about two years ago, during a healthcare accessibility audit in the US. The client was strict, the regulations were strict, everything was strict except our tooling, which was held together with Google Sheets, manual dates, and hope.

We had nine spreadsheets open. One tracked testers. One tracked severity. One tracked notes. One was apparently a backup of another one, but with different data. Screenshots lived in Slack channels, sometimes in DMs, sometimes attached to Jira tickets that referenced a different version of the WCAG criteria.

And then we found the conflicting reports. Same user flow, same step, two different testers, two different results. One said Fail with a note about missing alt text. The other said Not Tested. Both had been submitted to the client in the same week.

That was the moment we stopped patching the process and started building something.

The tool is called Auditi

It lives at auditi.ro. We built it at BetterQA because nothing else matched how we actually test accessibility: by user journeys, broken into steps, with everything traceable back to a specific tester, date, platform, and WCAG criterion.

The core idea is simple. You model journeys the way a user experiences them. Login flow. Checkout flow. Onboarding. Each journey has steps. Each step gets an audit result: pass, fail, or not applicable. Every result has a tester name, severity, notes, evidence files, and a timestamp.
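In code terms, the model is roughly this. A sketch only; the field names are illustrative, not Auditi's actual schema:

// Illustrative TypeScript shapes for the journey/step/result model.
// A sketch of the idea, not Auditi's real data model.
type StepResult = "pass" | "fail" | "not_applicable";

interface AuditResult {
  tester: string;
  result: StepResult;
  severity?: "critical" | "major" | "minor";
  wcagCriterion: string;   // e.g. "1.4.3 Contrast (Minimum)"
  platform: string;        // e.g. "iOS Safari"
  notes?: string;
  evidence: string[];      // references to screenshots and other files
  recordedAt: Date;
}

interface JourneyStep {
  description: string;     // "Enter email and password"
  results: AuditResult[];
}

interface Journey {
  name: string;            // "Login flow"
  steps: JourneyStep[];
}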

That sounds obvious. It isn't. In spreadsheet world, you're tracking all of that across columns, tabs, and files. Somebody renames a column. Somebody adds rows in the middle. Somebody filters by severity and forgets to unfilter before sending the report. I've watched an experienced QA engineer spend forty minutes rebuilding a pivot table that broke because Excel decided to reinterpret dates.

Auditi gives you filters by journey, tester, status, severity, platform, WCAG level, device, and date. If you've ever tried to find "that one iOS Safari fail from last Tuesday" in a spreadsheet, you understand why this matters.
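Those filters are just predicates over that structure. A hypothetical sketch, reusing the shapes above rather than Auditi's API:

// Hypothetical filter over the sketched shapes above -- not Auditi's API.
function findResults(
  journeys: Journey[],
  match: { tester?: string; result?: StepResult; platform?: string }
): AuditResult[] {
  return journeys
    .flatMap(journey => journey.steps)
    .flatMap(step => step.results)
    .filter(r =>
      (!match.tester || r.tester === match.tester) &&
      (!match.result || r.result === match.result) &&
      (!match.platform || r.platform === match.platform)
    );
}

// "That one iOS Safari fail from last Tuesday", minus the date part:
// findResults(journeys, { result: "fail", platform: "iOS Safari" });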

What we actually built

Assignment dialogs and review queues, so work gets distributed without a Slack message chain. Pass/fail/N/A toggles per step, because that's the atomic unit of an accessibility audit. Notifications for deadlines and invites, because relying on people to check a spreadsheet daily doesn't work.

Then analytics. Pass rate over time. A WCAG compliance matrix. Breakdown by tester, by platform, by severity. This is the part that managers actually care about, and the part that's almost impossible to maintain in a spreadsheet without a dedicated person updating charts.

Reports export to Excel, PDF, and CSV. We kept that because the people who receive accessibility reports often live in those formats. Auditi generates Overview, Detailed, and Matrix reports.

We also added an AI-powered Smart Report that produces an executive summary, scores by WCAG level, flags top issues by priority, and suggests fixes. I'll be honest about this: AI summarization is useful here because it's compressing structured data, not making judgment calls. The tester still decides what passes and what fails. The AI just writes the summary you'd otherwise spend an hour drafting.

If you've tried to make a React app WCAG compliant

Here's where I want to talk to the developers reading this, because the auditing side is only half the problem. The other half is actually fixing things.

We run thirteen products in the BetterQA ecosystem. Different stacks: Vite/React SPAs, Next.js apps, a Laravel app, a WordPress site. Earlier this year we decided to do an accessibility sweep across all of them using our own scanner tool, which runs axe-core via Playwright.
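The core of that kind of sweep is small. A minimal sketch using @axe-core/playwright (our scanner wraps more tooling around this, and the URL here is just a placeholder):

// Minimal axe-core + Playwright check, assuming @playwright/test and
// @axe-core/playwright are installed. The URL is a placeholder.
import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

test("home page has no detectable WCAG A/AA violations", async ({ page }) => {
  await page.goto("https://example.com");

  const results = await new AxeBuilder({ page })
    .withTags(["wcag2a", "wcag2aa"]) // limit the scan to WCAG A and AA rules
    .analyze();

  expect(results.violations).toEqual([]);
});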

The results were humbling.

Eight of our thirteen sites had accessibility scores below 60. The single biggest offender? Color contrast. Specifically, Tailwind's purple-400 on a white background.

Every Vite/React SPA in our ecosystem used text-purple-400 for links, badges, labels, secondary text. It's a nice color. It also has a contrast ratio of about 3.3:1 against white. WCAG AA requires 4.5:1 for normal text. We were failing dozens of contrast checks per page, across eight different sites, and nobody had noticed because the pages looked fine to us.

The fix: switch to purple-600 (#9333ea), which gives you 4.6:1. Just barely over the threshold, but it passes. We made the change across all eight sites. Some of them needed 24 individual class updates. One site, BetterFlow (a Laravel/Blade app), had the same pattern in Blade templates.

/* Before: 3.3:1 contrast - fails WCAG AA */
.text-purple-400 { color: #a855f7; }

/* After: 4.6:1 contrast - passes WCAG AA */
.text-purple-600 { color: #9333ea; }

That got us from the 50s to the 70s and 80s in accessibility scores. But it only caught the low-hanging fruit.

The deeper fixes taught us more

After the color contrast sweep, we went deeper. Sites like jrny.ro had icon-only buttons with no accessible name. Three buttons that a screen reader would announce as just "button." Fix: add aria-label attributes.
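The pattern, in React terms (a sketch, not jrny.ro's actual code):

// An icon-only button needs an accessible name; the decorative SVG
// should be hidden from assistive technology.
function MenuButton({ onOpen }: { onOpen: () => void }) {
  return (
    <button onClick={onOpen} aria-label="Open navigation menu">
      <svg aria-hidden="true" width="24" height="24" viewBox="0 0 24 24">
        <path d="M3 6h18M3 12h18M3 18h18" stroke="currentColor" strokeWidth="2" />
      </svg>
    </button>
  );
}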

On menute.ro, we found eight form inputs and selects with no labels. A sighted user sees the placeholder text and understands the field. A screen reader user hears nothing useful. Fix: aria-label on each input.
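Same idea for inputs. aria-label works; a visible label associated with the field is stronger when the design allows it. The field below is made up for illustration:

// Placeholder text is not a label. Prefer a visible <label>; fall back
// to aria-label when the design has no visible label. Hypothetical field.
function CityField() {
  return (
    <>
      <label htmlFor="delivery-city">City</label>
      <input id="delivery-city" name="city" placeholder="e.g. Cluj-Napoca" />

      {/* No visible label in the design? Name the field directly:
          <input name="city" aria-label="City" placeholder="e.g. Cluj-Napoca" /> */}
    </>
  );
}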

The one that taught us the most was nis2manager.ro. The site uses CSS custom properties for its primary color. The original --primary value was set to an oklch lightness of 0.65. Changing it to 0.48 fixed over seventy contrast violations in one line of CSS. Seventy. From a single variable change.

/* One variable, seventy fixes */
--primary: oklch(0.48 0.2 270);  /* was 0.65 */

That's the lesson we keep coming back to: if your design system uses CSS custom properties or Tailwind theme colors, check the contrast of your base tokens first. You can hunt individual elements for hours, or you can fix the source and watch dozens of violations disappear.
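In Tailwind terms, that means putting the accessible value in the theme once instead of chasing utility classes. Token names and values below are illustrative, not our real palette:

// tailwind.config.ts -- illustrative token names and values.
import type { Config } from "tailwindcss";

export default {
  content: ["./src/**/*.{ts,tsx}"],
  theme: {
    extend: {
      colors: {
        brand: {
          DEFAULT: "#9333ea", // passes AA for normal text on white
          soft: "#a855f7",    // reserve for large text or decoration only
        },
      },
    },
  },
} satisfies Config;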

What automated tools actually catch

I want to be direct about this because I've seen too many articles claim that automated accessibility testing solves the problem. It doesn't. Not even close.

Automated tools like axe-core catch maybe 30-40% of WCAG issues. They're good at color contrast, missing alt text, missing form labels, duplicate IDs, and broken ARIA attributes. They're bad at everything that requires context: whether alt text is actually meaningful, whether focus order makes sense, whether a custom widget is operable with a keyboard, whether content is understandable when read linearly by a screen reader.

WCAG has roughly 80 success criteria across levels A, AA, and AAA. Automated tools can reliably check maybe 25-30 of them. The rest need a human who understands the user flow, the intent of the content, and what the experience is like without a mouse.

That's why Auditi is structured around human-driven audits with journeys and steps, not around automated scan results. The automation is useful for catching regressions. It's not a substitute for a tester who actually navigates the site with a screen reader.

What we got wrong

A few things, since we're being honest.

The first version of Auditi was too complex. We modeled every WCAG criterion as a separate audit point, which meant testers had to click through dozens of criteria per step. Most of those were not applicable. We simplified it to let testers mark what matters and skip the rest.

We also underestimated how important the export format is. Early exports were clean but didn't match what compliance officers expected. We had to add specific report layouts that mapped to the documentation formats our healthcare and government clients were already using.

And our own ecosystem sweep revealed that we'd been shipping inaccessible products while building an accessibility tool. That stung. We fixed it, but it's a good reminder that building a tool and actually using it consistently are two different things.

The honest numbers

Our ecosystem scores before and after the sweep:

| Site | Before | After | Primary fix |
| --- | --- | --- | --- |
| betterqa.co | 54 | 84 | Plugin color updates |
| betterflow.eu | 55 | 84 | 24 Blade template fixes |
| auditi.ro | 54 | 82 | purple-400 to purple-600 |
| electricworks.ro | 53 | 80 | Tailwind primary classes |
| psysign.ro | 53 | 80 | Same pattern |
| nis2manager.ro | 51 | 76 | CSS custom property (one-liner) |
| factos.ro | 47 | 72 | Same pattern |

Not perfect scores. Not even close. But a 25-30 point jump across eight sites, and a process we can repeat.

Where this matters for your stack

If you're running a React or Vite SPA and you haven't run axe-core against it, do that first. You'll probably find contrast issues, missing labels, and button-name violations. Those are fixable in an afternoon.

After that, the harder work begins. Keyboard navigation. Focus management in modals and dynamic content. Screen reader announcements for state changes. That's where spreadsheets fall apart and you need actual audit tracking.
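To give a flavor of that second bucket, here's a minimal sketch of modal focus management in React: focus moves into the dialog on open and returns to the trigger on close. A real modal also needs to trap Tab inside the dialog, or use the native dialog element or an established library.

import { useEffect, useRef } from "react";

// Sketch only: move focus into the dialog on open, restore it on close.
function ConfirmDialog({ open, onClose }: { open: boolean; onClose: () => void }) {
  const dialogRef = useRef<HTMLDivElement>(null);
  const previousFocus = useRef<HTMLElement | null>(null);

  useEffect(() => {
    if (open) {
      previousFocus.current = document.activeElement as HTMLElement;
      dialogRef.current?.focus();
    } else {
      previousFocus.current?.focus();
    }
  }, [open]);

  if (!open) return null;

  return (
    <div
      ref={dialogRef}
      role="dialog"
      aria-modal="true"
      aria-labelledby="confirm-title"
      tabIndex={-1} // lets the container itself receive focus
      onKeyDown={e => e.key === "Escape" && onClose()}
    >
      <h2 id="confirm-title">Discard changes?</h2>
      <button onClick={onClose}>Cancel</button>
    </div>
  );
}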

We built Auditi because we couldn't do that work well with the tools we had. It's at auditi.ro if you want to look at it.

For more about how we approach QA across different domains, there's the BetterQA blog.
