DEV Community

Stanly Thomas

Posted on • Originally published at echolive.co

Is Your Content Accessible? The Audio Gap in WCAG 2.2

Most publishers spend serious time and money on visual accessibility. They fix color contrast ratios, write alt text for images, and label their form fields correctly. That work matters. But there's a significant gap that almost no one is closing: the absence of audio alternatives for written content.

WCAG 2.2 covers a wide range of recommendations for making web content more accessible, including accommodations for blindness and low vision, deafness and hearing loss, limited movement, speech disabilities, photosensitivity, and some accommodation for learning disabilities and cognitive limitations. The standard requires content to be perceivable — meaning information must be presented in ways users can access, including through assistive technologies. While WCAG 2.2 does not explicitly require audio versions of text content, the audio dimension of perceivability is still routinely overlooked.

In this article, we explore how audio alternatives complement WCAG 2.2's perceivability principles, who benefits from audio alternatives, why publishers are leaving accessibility value on the table, and how automated text-to-speech production changes the equation entirely.

The State of Web Accessibility Is Still Dire

Before we get into audio specifically, it helps to understand how far we are from general accessibility compliance. The numbers are not encouraging.

According to WebAIM's annual Million report, in 2025 an estimated 96.3% of websites still had at least one detectable accessibility failure, and on average each homepage had around 51 accessibility errors, creating multiple barriers for users with disabilities. These aren't fringe or obscure websites — this data comes from analyzing the world's top one million most-visited pages.

The human cost is significant. The World Health Organization estimates that about 1.3 billion people — roughly 16% of the global population — live with some form of disability. According to the Click-Away Pound survey, 73% of disabled customers experience accessibility barriers on more than a quarter of websites they visit, and companies with inaccessible websites lose an estimated $6.9 billion to competitors annually.

The irony is that most of these failures are preventable. The most common homepage errors are low-contrast text, missing alternative text for images, empty links, missing form input labels, empty buttons, and missing document language. Developers fix these issues and feel they've "done accessibility." The audio dimension of perceivability barely enters the conversation.

What WCAG 2.2 Actually Requires Around Audio

WCAG 2.2 is a web accessibility standard published by the World Wide Web Consortium (W3C). It became an official W3C Recommendation on 5 October 2023 and represents current best practice. Organizations should be building against it now.

To align with laws such as the Americans with Disabilities Act (ADA), Section 508, and the Accessibility for Ontarians with Disabilities Act (AODA), organizations should conform to WCAG 2.2 at Level AA, the conformance level these regulations are generally measured against.

The audio-related requirements are organized under the Perceivable principle — the idea that no information should be "invisible" to any of a user's senses. Here are the key criteria publishers need to know:

Criterion 1.2.1: Audio-only and Video-only (Prerecorded) — Level A

WCAG 1.2.1 is a Level A success criterion that requires an accessible alternative for prerecorded media containing only audio or only video. For publishers, this means that if you publish a podcast episode, audio recording, or any other audio-only content, a text transcript is not optional; it is a baseline requirement.
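As a rough illustration, a publishing pipeline can flag pages that embed audio without an obvious transcript link. This is only a heuristic sketch using Python's standard library (the class and function names are invented for this example), not a substitute for a real WCAG audit:

```python
"""Heuristic check for WCAG 1.2.1: flag pages that embed audio
without a visible transcript link. A sketch, not a full audit."""
from html.parser import HTMLParser

class AudioTranscriptCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.audio_count = 0
        self.transcript_links = 0
        self._in_a = False

    def handle_starttag(self, tag, attrs):
        if tag == "audio":
            self.audio_count += 1
        if tag == "a":
            self._in_a = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False

    def handle_data(self, data):
        # Naive heuristic: any link whose text mentions "transcript" counts.
        if self._in_a and "transcript" in data.lower():
            self.transcript_links += 1

def flags_missing_transcript(html: str) -> bool:
    """True when the page has audio but no transcript link was found."""
    checker = AudioTranscriptCheck()
    checker.feed(html)
    return checker.audio_count > 0 and checker.transcript_links == 0

page = ('<audio controls src="ep1.mp3"></audio>'
        '<a href="/t/ep1">Read the transcript</a>')
print(flags_missing_transcript(page))  # False: transcript link present
```

A real audit would also confirm the transcript is complete and equivalent, which no automated check can guarantee.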

Criterion 1.2.3: Audio Description or Media Alternative — Level A

WCAG 1.2.3 is a Level A success criterion that applies to prerecorded video with synchronized audio. It requires either audio description of the video content or a full text alternative for the time-based media, so that users who cannot perceive the visual information still receive equivalent access. This criterion does not, by itself, create a conformance requirement to provide audio versions of text-based pages — but the underlying principle of offering content in multiple formats is relevant.

The Broader Perceivability Principle

WCAG 2.2 contains 13 guidelines organized under the four POUR principles (Perceivable, Operable, Understandable, Robust), with 86 testable success criteria that define conformance requirements. The guidelines include Time-based Media (providing alternatives for audio and video) and Adaptable (creating content that can be presented in different ways without losing information).

That last point — content presented in different ways without losing information — is the philosophical foundation behind the audio gap. A long-form article is text-based. A user with low vision, a reading disability, or who simply cannot look at a screen right now would benefit from audio. While WCAG 2.2 does not include a specific success criterion requiring audio versions of all text content, its Perceivable principle strongly encourages offering content in multiple formats so more people can access it.

Who Actually Needs Audio Alternatives?

The case for audio accessibility goes well beyond blindness and low vision. The audience is far larger than most publishers realize.

Text-to-speech technology makes information accessible to people with visual impairments, learning disabilities such as dyslexia, and cognitive challenges, as well as to those who simply prefer audio learning.

Research backs this up strongly. Oral presentation of material including text-to-speech helps eliminate the decoding requirement by reading words aloud to users, enabling comprehension. A systematic review published in the Canadian Journal of Learning and Technology found that text-to-speech software has assisted students in improving reading speed, fluency, and content retention, and resulted in increased self-efficacy in reading abilities and independent learning.

Dyslexia alone affects approximately one in five people. Over 430 million people worldwide have disabling hearing loss, and the World Health Organization estimates that by 2050 one in every ten people will have disabling hearing loss — a reminder that audio alternatives must always accompany, never replace, the text they complement. Meanwhile, about 1% of the world's population has a cognitive disability; these users benefit from content delivered in multiple formats (text, audio, and video) and from content that highlights important information.

There's also a situational angle. Text-to-speech provides convenience for any user who prefers to listen to content while multitasking, such as driving or exercising. When you make your content accessible, you're not just serving users with disabilities — you're serving commuters, gym-goers, and anyone whose eyes are occupied. That's a much bigger share of your audience than you might think.

Why Publishers Are Stuck in a Manual Workflow Problem

Publishers understand that audio matters. But for most teams, producing audio versions of every article, newsletter, or document feels like an editorial burden on top of an already-stretched workflow. Recording narration requires a studio. Hiring voice actors is expensive. Managing audio files alongside text content adds operational complexity that most content teams aren't set up for.

The result? Audio accessibility gets deprioritized. It gets flagged as "nice to have" rather than a compliance requirement. Accessibility audits surface the contrast ratios, the missing alt text, and the unlabeled buttons — but the absence of audio alternatives rarely triggers the same urgency. And so the audio gap persists.

WCAG guidelines address many of the most frustrating accessibility barriers: missing alternative text on images and other non-text content, and missing captions, audio descriptions, or transcripts for video — barriers that affect people with auditory disabilities, learning disabilities, and neurocognitive conditions. The standards are already written. The expectations are already set. The obstacle for most publishers isn't knowledge; it's workflow.

How Automated Text-to-Speech Closes the Gap Without the Manual Overhead

This is where the equation changes. Automated text-to-speech production — built directly into a content workflow — removes the manual bottleneck entirely.

With EchoLive, converting articles to audio takes seconds. Paste the text, pick a voice, and you have a production-quality audio file that can live alongside your written content. No recording equipment. No scheduling a voice actor. No post-production. For teams publishing at scale, this isn't a one-article trick — it's a workflow transformation.

The same logic applies across content types. Newsletters can generate audio editions automatically. PDF and document-based content can be imported directly and converted with AI-assisted segmentation. RSS feeds can be processed continuously, keeping audio versions of new content current without any human touch. For organizations managing large archives of written content, EchoLive's Smart Import handles txt, markdown, docx, PDF, HTML, and URL-based content — and the segment-based Studio editor lets you apply per-section voices and pacing when a project calls for more nuance.
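As an illustration of that kind of continuous processing, the sketch below polls an RSS feed and queues only unseen items for conversion. The `queue_for_tts` function is a made-up placeholder for whatever TTS integration you use (EchoLive's actual API is not shown here), and persistence of seen GUIDs across runs is omitted:

```python
"""Sketch of a continuous RSS-to-audio pipeline, assuming a
hypothetical queue_for_tts() integration point."""
import xml.etree.ElementTree as ET

def new_items(rss_xml: str, seen_guids: set) -> list:
    """Return feed items not yet seen, tracking them in seen_guids."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link") or ""
        if guid and guid not in seen_guids:
            seen_guids.add(guid)
            items.append({
                "title": item.findtext("title", ""),
                "text": item.findtext("description", ""),
            })
    return items

def queue_for_tts(article: dict) -> None:
    # Placeholder: hand the article text to your TTS provider here.
    print(f"queued: {article['title']}")

feed = """<rss><channel>
  <item><guid>a1</guid><title>Post one</title>
        <description>Body text</description></item>
</channel></rss>"""

seen = set()
for article in new_items(feed, seen):
    queue_for_tts(article)
# A second poll of the same feed yields nothing new.
print(len(new_items(feed, seen)))  # 0
```

Deduplicating on GUIDs is what lets the pipeline run on a schedule without producing duplicate audio files.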

The accessibility implication is significant. When every piece of published content has a corresponding audio version — hosted alongside the original, clearly labeled as an alternative — you go meaningfully beyond minimum conformance and serve users who benefit from audio. Content is no longer locked to a single sensory modality.

Beyond Compliance: The Quality Difference Neural Voices Make

There's one more dimension worth raising. Not all audio alternatives are created equal. A robotic monotone may technically check a compliance box, but it doesn't serve users well. Text-to-speech technology has evolved dramatically: what was once a robotic-sounding voice is now human-like speech synthesis, driven by advances in machine learning, neural networks, and deep learning.

EchoLive draws on 630+ Azure neural voices across multiple languages and tiers — including Standard, HD, and Professional voices — with full SSML support for controlling emphasis, pacing, breaks, and pronunciation. The result isn't just technically accessible audio. It's genuinely listenable audio that respects your readers' time and attention. For users who depend on it, that distinction matters enormously.
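For readers unfamiliar with SSML, controlling pacing and emphasis looks roughly like the markup below. The helper function is invented for illustration, the voice name is one example Azure neural voice, and support for individual SSML elements varies by voice:

```python
"""Minimal SSML sketch: emphasis for the heading, a pause after it,
and slightly slowed pacing for the body text."""

def article_ssml(heading: str, body: str,
                 voice: str = "en-US-JennyNeural") -> str:
    """Wrap a heading and body in SSML for a single neural voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<emphasis level="moderate">{heading}</emphasis>'
        '<break time="600ms"/>'          # pause between heading and body
        f'<prosody rate="-5%">{body}</prosody>'  # slow the body slightly
        '</voice></speak>'
    )

print(article_ssml("The Audio Gap", "Most publishers overlook audio."))
```

In a real pipeline you would also escape XML special characters in the text before interpolating it into the markup.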

Accessibility Is a Business Advantage, Not Just a Legal Requirement

It's worth stepping back from the compliance framing for a moment. Following WCAG helps ensure everyone can use your digital content, expands your audience reach, and reduces legal risk. All three of those outcomes have a direct line to business performance.

Accessible digital products tend to see better user engagement, broader audience reach, and stronger brand perception than less accessible alternatives. The disability community and their families represent substantial global purchasing power, so building inclusive experiences opens your content to a much wider market.

When planning and budgeting for accessible media, it helps to communicate the broader organizational benefits: search engine optimization, a better user experience for everyone, and improved customer satisfaction. Audio content is also indexed and shareable in ways that purely text-based content isn't. Every audio version of an article is a new content asset: distributable, embeddable, and discoverable.

Closing the Gap Starts With One Workflow Change

Most accessibility initiatives fail not because of a lack of intention, but because of friction. The more effort it takes to produce accessible content, the less likely it is to happen consistently. Automated audio production removes that friction at the source.

Solving the accessibility problem means more than checking boxes — it requires embedding accessibility into the DNA of every digital project from the ground up. For publishers, that means audio shouldn't be an afterthought bolted onto a finished article. It should be generated automatically as part of the same publishing step that pushes text to production.
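One way to make that concrete is to treat audio generation as part of the publish call itself, so it cannot be skipped. A minimal sketch, with `generate_audio` as a made-up stand-in for a real TTS integration:

```python
"""Sketch: fold audio generation into the publish step so the audio
version ships in the same action that ships the text."""

def generate_audio(text: str) -> bytes:
    # Stand-in: call your TTS provider and return the rendered audio.
    return b"AUDIO:" + text[:16].encode()

def publish(article: dict, store: dict) -> None:
    """Store text and its audio rendition together, atomically."""
    store[article["slug"]] = {
        "html": article["body"],
        "audio": generate_audio(article["body"]),  # same step, not a follow-up
    }

site = {}
publish({"slug": "audio-gap", "body": "Most publishers overlook audio."}, site)
print("audio" in site["audio-gap"])  # True
```

The point of the pattern is structural: when audio lives in the publish path rather than in a backlog, consistency stops depending on anyone remembering to do it.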

WCAG 2.2 isn't asking for perfection. It's asking for parity — that every user, regardless of ability, can access the same information. Providing audio alternatives for your written content is one of the most direct, high-impact steps you can take toward more inclusive publishing. If you're already producing quality content, the only thing standing between you and better accessibility is the workflow to turn it into audio. EchoLive is built to be exactly that workflow.

