How We Built a Free WCAG 2.1 Accessibility Scanner
Web accessibility remains one of the most overlooked aspects of web development. Most developers understand why it matters: accessibility improves UX for everyone, expands market reach, and helps you comply with regulations that build on standards like WCAG 2.1. Yet accessibility is still an afterthought for many teams.
That's why we built Web Accessibility Checker, a free scanner that audits your site against WCAG 2.1. In this article, I'll walk you through the architecture we chose, the challenges we faced, and how we engineered a solution that returns first results in under a second and typically completes a full audit in under 20 seconds, completely free.
The Problem We Solved
Existing accessibility tools fall into two camps:
- Enterprise solutions ($5k+/month) that are overkill for indie developers and SMEs
- Simplistic tools that miss critical issues because they don't go deep enough
We wanted something in the middle: fast, thorough, and genuinely useful for developers making real decisions about accessibility improvements.
The challenge? Building a scanner that:
- Handles JavaScript-rendered content (not just static HTML)
- Doesn't take forever to run
- Stays within API rate limits
- Gives actionable results in real time
Our Two-Phase Scanning Architecture
We settled on a hybrid approach: instant DOM scanning plus asynchronous PageSpeed Insights (PSI) analysis.
Phase 1: Instant DOM Parsing (< 1 second)
When you submit a URL, we immediately fetch the page and parse its HTML using PHP's DOMDocument class. This phase checks for:
- Missing alt text on images
- Heading hierarchy violations (h1 → h3 jump)
- Form label associations
- ARIA attribute correctness
- Color contrast (preliminary pass)
- Semantic HTML usage
Here's a simplified version of our DOM scanning core:
<?php
// Phase 1: Instant DOM Analysis
function scanDOM($html, $baseUrl) {
    $dom = new DOMDocument('1.0', 'UTF-8');
    // Collect libxml parse warnings quietly instead of suppressing errors with @
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the input as UTF-8
    // (the 'HTML-ENTITIES' mbstring conversion is deprecated as of PHP 8.2)
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);
    $issues = [];

    // Check 1: Images without an alt attribute.
    // alt="" is valid for decorative images, so only a missing attribute is flagged.
    $images = $xpath->query('//img[not(@alt)]');
    foreach ($images as $img) {
        $issues[] = [
            'type' => 'missing_alt',
            'severity' => 'critical',
            'element' => $dom->saveHTML($img),
            'wcag_level' => '1.1.1',
            'message' => 'Image missing alt text (WCAG A)'
        ];
    }

    // Check 2: Heading hierarchy
    $headings = $xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6');
    $prevLevel = 1;
    foreach ($headings as $heading) {
        $currentLevel = (int)$heading->nodeName[1];
        if ($currentLevel > $prevLevel + 1) {
            $issues[] = [
                'type' => 'heading_hierarchy',
                'severity' => 'warning',
                'element' => $heading->textContent,
                'wcag_level' => '1.3.1',
                'message' => "Heading hierarchy skips from h{$prevLevel} to h{$currentLevel}"
            ];
        }
        $prevLevel = $currentLevel;
    }

    // Check 3: Form labels
    // Collect every id referenced by <label for="..."> once, which also avoids
    // interpolating untrusted id values into an XPath expression.
    $labelTargets = [];
    foreach ($xpath->query('//label[@for]') as $label) {
        $labelTargets[$label->getAttribute('for')] = true;
    }
    // not(@type="...") also matches inputs with no type attribute (they default to text).
    // Inputs wrapped in a <label> or named via ARIA are treated as labelled.
    $inputs = $xpath->query(
        '//input[not(@type="hidden" or @type="submit" or @type="button")]'
        . '[not(ancestor::label) and not(@aria-label) and not(@aria-labelledby)]'
    );
    foreach ($inputs as $input) {
        $id = $input->getAttribute('id');
        if ($id !== '' && isset($labelTargets[$id])) {
            continue;
        }
        $issues[] = [
            'type' => 'missing_form_label',
            'severity' => 'high',
            'element' => $dom->saveHTML($input),
            'wcag_level' => '1.3.1',
            'message' => $id === ''
                ? 'Form input missing associated label or id'
                : "Input #{$id} has no associated <label>"
        ];
    }

    // Check 4: ARIA button pattern
    $ariaButtons = $xpath->query('//div[@role="button"] | //span[@role="button"]');
    foreach ($ariaButtons as $btn) {
        // Test for the attribute explicitly: tabindex="0" is the string "0", which PHP treats as falsy
        if (!$btn->hasAttribute('tabindex') || (int)$btn->getAttribute('tabindex') < 0) {
            $issues[] = [
                'type' => 'aria_button_not_focusable',
                'severity' => 'high',
                'element' => $dom->saveHTML($btn),
                'wcag_level' => '2.1.1',
                'message' => 'ARIA button role requires tabindex="0" for keyboard access'
            ];
        }
    }

    return $issues;
}
?>
Once the HTML has been fetched, this phase is fast because we only parse markup in memory; no browser rendering is involved. Users get preliminary results almost immediately.
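The preliminary color contrast pass is not part of the simplified code above. As a rough illustration of the math behind it, here is a minimal JavaScript sketch of the WCAG contrast-ratio formula. It assumes both colors are already known as opaque six-digit hex values (for example, read from inline styles); the function names are made up for this example and are not taken from our codebase.

```javascript
// Sketch of the WCAG 2.1 contrast-ratio formula for two opaque sRGB colors.
// Function names are illustrative assumptions, not the scanner's real API.

// Parse "#rrggbb" into [r, g, b] channel values in the range 0..255.
function parseHex(hex) {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Unsupported color: ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 255, (value >> 8) & 255, value & 255];
}

// Relative luminance as defined by WCAG: linearize each channel, then weight.
function relativeLuminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// Ratio of the lighter to the darker luminance, each offset by 0.05.
function contrastRatio(foreground, background) {
  const l1 = relativeLuminance(parseHex(foreground));
  const l2 = relativeLuminance(parseHex(background));
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// SC 1.4.3 (level AA): at least 4.5:1 for normal text, 3:1 for large text.
function passesAA(foreground, background, isLargeText = false) {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}
```

For example, `passesAA('#767676', '#ffffff')` is true while `passesAA('#777777', '#ffffff')` is false, which is why a static pass can only be preliminary: it needs the final computed colors to give a reliable answer.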
Phase 2: Async PSI Analysis (Background)
Phase 1 gives you immediate feedback, but it misses issues that only a real browser engine can detect: JavaScript-injected content, computed styles, actual color contrast ratios, and more.
For this, we use Google's PageSpeed Insights API (PSI), which runs Lighthouse audits on real Chrome. The catch? PSI takes 10–30 seconds. We didn't want users staring at a loading spinner.
Solution: Asynchronous queuing.
When Phase 1 finishes, we immediately return the DOM results to the user's browser. In the background, we queue a PSI scan and poll for results. Once PSI completes, we push an update via WebSocket (or polling as fallback).
Here's the backend handler:
<?php
// /api/scan endpoint
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    header('Content-Type: application/json');
    $url = filter_var($_POST['url'] ?? '', FILTER_VALIDATE_URL);
    // FILTER_VALIDATE_URL also accepts non-HTTP schemes, so restrict to http(s)
    if (!$url || !preg_match('#^https?://#i', $url)) {
        http_response_code(400);
        echo json_encode(['error' => 'Invalid URL']);
        exit;
    }

    // Create scan record
    $scanId = bin2hex(random_bytes(16));
    $db->insert('scans', [
        'scan_id' => $scanId,
        'url' => $url,
        'status' => 'pending',
        'created_at' => date('Y-m-d H:i:s')
    ]);

    // Phase 1: Instant DOM scan
    $html = @file_get_contents($url);
    if ($html === false) {
        // The page could not be fetched, so there is nothing to parse or queue
        http_response_code(502);
        echo json_encode(['scan_id' => $scanId, 'error' => 'Could not fetch URL']);
        exit;
    }
    $domIssues = scanDOM($html, $url);

    // Phase 2: Queue PSI scan (async)
    queuePSIScan($scanId, $url);

    // Return Phase 1 results immediately
    http_response_code(200);
    echo json_encode([
        'scan_id' => $scanId,
        'phase1_issues' => $domIssues,
        'phase2_status' => 'queued'
    ]);
    exit;
}

// Enqueue a PSI scan for the background processor
function queuePSIScan($scanId, $url) {
    global $db; // $db lives in the global scope and is not visible inside functions by default
    $db->insert('psi_queue', [
        'scan_id' => $scanId,
        'url' => $url,
        'status' => 'pending'
    ]);
}

// PSI queue processor: cron job (runs every minute)
function processPSIQueue() {
    global $db;
    $pending = $db->query(
        "SELECT * FROM psi_queue WHERE status = 'pending' LIMIT 10"
    );
    foreach ($pending as $job) {
        // Mark the job first so an overlapping cron run does not pick it up again
        $db->update('psi_queue', ['status' => 'processing'], ['scan_id' => $job['scan_id']]);
        $result = callPSIAPI($job['url']);
        // null means the request failed; an 'error' key means the API rejected it
        $status = ($result === null || isset($result['error'])) ? 'failed' : 'completed';
        $db->update('psi_queue',
            ['status' => $status, 'result' => json_encode($result)],
            ['scan_id' => $job['scan_id']]
        );
    }
}

function callPSIAPI($url) {
    $apiKey = getenv('PAGESPEED_API_KEY');
    $endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed';
    $params = [
        'url' => $url,
        'key' => $apiKey,
        'category' => 'accessibility',
        'strategy' => 'mobile'
    ];
    $fullUrl = $endpoint . '?' . http_build_query($params);
    $ch = curl_init($fullUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 45
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    // curl_exec returns false on timeouts and network errors
    if ($response === false) {
        return null;
    }
    return json_decode($response, true);
}
?>
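The handler above stores the raw PSI response, but the browser needs a list of issues. One way to bridge that gap is to keep only the failed audits in the accessibility category. The JavaScript sketch below relies on the `lighthouseResult.categories.accessibility.auditRefs` and `lighthouseResult.audits` fields of the PSI v5 response; the function name and the output shape (`type`, `title`, `description`, `source`) are assumptions for illustration, not our production format.

```javascript
// Sketch: turn a PSI v5 response into a flat list of failed accessibility audits.
// The function name and output shape are assumptions made for this example.
function extractFailedAudits(psiResponse) {
  const lighthouse = psiResponse?.lighthouseResult;
  const refs = lighthouse?.categories?.accessibility?.auditRefs ?? [];
  const audits = lighthouse?.audits ?? {};

  return refs
    .map((ref) => audits[ref.id])
    // Manual and not-applicable audits carry a null score; passing audits score 1.
    .filter((audit) => audit && typeof audit.score === 'number' && audit.score < 1)
    .map((audit) => ({
      type: audit.id,
      title: audit.title,
      description: audit.description,
      source: 'psi'
    }));
}
```

Filtering on a numeric score below 1 keeps the result focused on real failures and leaves out audits that Lighthouse could not evaluate automatically.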
The Frontend: Real-Time Updates
On the client side, we use a simple polling mechanism while we wait for PSI results:
async function submitScan(url) {
  const response = await fetch('/api/scan', {
    method: 'POST',
    // URLSearchParams is sent as application/x-www-form-urlencoded, which PHP reads into $_POST
    body: new URLSearchParams({ url })
  });
  if (!response.ok) {
    throw new Error(`Scan request failed with status ${response.status}`);
  }
  const data = await response.json();
  const scanId = data.scan_id;

  // Display Phase 1 results immediately
  renderResults(data.phase1_issues);

  // Poll for Phase 2 results
  const pollInterval = setInterval(async () => {
    const statusResp = await fetch(`/api/scan/${scanId}/status`);
    const status = await statusResp.json();
    if (status.phase2_status === 'completed') {
      clearInterval(pollInterval);
      renderResults([...data.phase1_issues, ...status.phase2_issues]);
    }
  }, 2000); // Poll every 2 seconds
}
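The polling loop above runs until Phase 2 completes, so it never stops if a PSI job gets stuck. A bounded variant is sketched below. It takes any async function that returns the status object, stops on a terminal state, and gives up after a fixed number of attempts. The helper name, the defaults, and the `failed` and `timeout` states are assumptions for illustration.

```javascript
// Sketch: poll an async status function until it reports a terminal state,
// giving up after a fixed number of attempts. Names and defaults are assumptions.
async function pollUntilDone(fetchStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await fetchStatus();
      if (status.phase2_status === 'completed' || status.phase2_status === 'failed') {
        return status;
      }
    } catch (error) {
      // Treat network or JSON parse errors as transient and retry on the next round.
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  // Hypothetical terminal state so the UI can tell the user to retry later.
  return { phase2_status: 'timeout' };
}
```

In the browser this could be called as `pollUntilDone(() => fetch(`/api/scan/${scanId}/status`).then((r) => r.json()))`. With the defaults it stops after roughly a minute, and the caller can then keep showing the Phase 1 results on their own.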
Handling Scale: Rate Limits & Caching
The PSI API has quotas (300 requests/day on free tier). To stay within limits while serving thousands of daily scans:
- Cache PSI results for 24 hours (same URL = same results)
- Batch queue processing to avoid API spikes
- Graceful degradation: if PSI quota is exhausted, return Phase 1 results + a note that advanced checks are temporarily unavailable
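function callPSIAPI($url) {
    global $db; // $db lives in the global scope and is not visible inside functions by default

    // Check cache first
    $cached = $db->queryOne(
        "SELECT result FROM psi_cache WHERE url = ? AND created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)",
        [$url]
    );
    if ($cached) {
        return json_decode($cached['result'], true);
    }

    // Build the same PSI request as the uncached version shown earlier
    $psiEndpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?' . http_build_query([
        'url' => $url,
        'key' => getenv('PAGESPEED_API_KEY'),
        'category' => 'accessibility',
        'strategy' => 'mobile'
    ]);

    // Call API and cache result
    // httpGetJSON() is a helper (not shown) that wraps the cURL request and json_decode
    $result = httpGetJSON($psiEndpoint);

    // Cache only successful responses so a transient failure is not served for 24 hours
    if ($result !== null && !isset($result['error'])) {
        $db->insert('psi_cache', [
            'url' => $url,
            'result' => json_encode($result),
            'created_at' => date('Y-m-d H:i:s')
        ]);
    }
    return $result;
}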
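The cache covers the first strategy in the list; the graceful-degradation rule can be sketched in the same spirit. The JavaScript example below keeps an in-memory daily budget and decides, per URL, whether to reuse a cached result, spend one PSI call, or fall back to Phase 1 only. Everything here is an assumption for illustration: the function names, the UTC-midnight reset, and the in-memory counter (a deployment with several workers would need a shared store such as the database).

```javascript
// Sketch: an in-memory daily budget for PSI calls.
// The limit, the UTC-day reset, and the injectable clock are assumptions.
function createDailyQuota(limit, now = () => new Date()) {
  let day = null;
  let used = 0;

  return {
    // Returns true and counts the call if budget remains for the current UTC day.
    tryConsume() {
      const today = now().toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
      if (today !== day) {
        day = today;
        used = 0;
      }
      if (used >= limit) return false;
      used += 1;
      return true;
    }
  };
}

// Decide what Phase 2 should do for one URL.
function planPhase2(cachedResult, quota) {
  if (cachedResult) return { action: 'use_cache' };
  if (quota.tryConsume()) return { action: 'call_psi' };
  return { action: 'degrade', note: 'Advanced checks are temporarily unavailable' };
}
```

Checking the cache before touching the budget matters: repeat scans of the same URL then cost nothing, and only genuinely new URLs count against the daily limit.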
Why This Approach Works
- Instant feedback: Users don't wait 20+ seconds for any results
- Comprehensive: Phase 1 + Phase 2 cover static + dynamic issues
- Scalable: Async processing means we can handle many concurrent users
- Cost-efficient: We cache PSI results and batch API calls
- User-friendly: Progressive enhancement—Phase 1 results are immediately useful
Open Source Lessons
Building a free, multi-language accessibility scanner taught us that:
- Perfect is the enemy of good: Launch with Phase 1, add Phase 2 later
- Accessibility tooling drives accessibility adoption: Free tools lower the barrier to entry
- Developer experience matters: Fast feedback loops encourage behavior change
If you want to audit your site's accessibility, check out Web Accessibility Checker. It's free, completely anonymous, and works in 10+ European languages. No signup, no tracking—just instant, honest feedback.
Have questions about building accessibility scanners? Drop them in the comments below. I'd love to hear about your favorite a11y tools, too.
Tags: #a11y #webdev #php #wcag #accessibility