SIKOUTRIS

Posted on Feb 28 • Edited on Mar 9

How We Built a Free WCAG 2.1 Accessibility Scanner

#a11y #webdev #php #wcag

Web accessibility remains one of the most overlooked aspects of web development. Most developers understand why it matters: accessible sites improve UX for everyone, expand market reach, and help you conform to standards such as WCAG 2.1, which many accessibility regulations reference. Yet accessibility is still an afterthought for many teams.

That's why we built Web Accessibility Checker, a free scanner that audits your site against WCAG 2.1. In this article, I'll walk you through the architecture we chose, the challenges we faced, and how we engineered a solution that completes an accessibility audit in under 20 seconds.

The Problem We Solved

Existing accessibility tools fall into two camps:

Enterprise solutions ($5k+/month) that are overkill for indie developers and SMEs
Simplistic tools that miss critical issues because they don't go deep enough

We wanted something in the middle: fast, thorough, and genuinely useful for developers making real decisions about accessibility improvements.

The challenge? Building a scanner that:

Handles JavaScript-rendered content (not just static HTML)
Doesn't take forever to run
Stays within API rate limits
Gives actionable results in real time

Our Two-Phase Scanning Architecture

We settled on a hybrid approach: instant DOM scanning plus asynchronous analysis through Google's PageSpeed Insights (PSI) API.

Phase 1: Instant DOM Parsing (< 1 second)

When you submit a URL to Web Accessibility Checker (WAC), we immediately fetch the page and parse its HTML with PHP's DOMDocument class. This phase checks for:

Missing alt text on images
Heading hierarchy violations (h1 → h3 jump)
Form label associations
ARIA attribute correctness
Color contrast (preliminary pass)
Semantic HTML usage

Here's a simplified version of our DOM scanning core:

<?php
// Phase 1: Instant DOM Analysis
function scanDOM($html, $baseUrl) {
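    // Collect libxml parse warnings instead of silencing them with @
    libxml_use_internal_errors(true);
    $dom = new DOMDocument('1.0', 'UTF-8');
    // The encoding hint makes libxml read the markup as UTF-8;
    // mb_convert_encoding(..., 'HTML-ENTITIES') is deprecated since PHP 8.2
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();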
    $xpath = new DOMXPath($dom);

    $issues = [];

    // Check 1: Images without an alt attribute
    // alt="" is valid for decorative images, so only a missing attribute is flagged
    $images = $xpath->query('//img[not(@alt)]');
    foreach ($images as $img) {
        $issues[] = [
            'type' => 'missing_alt',
            'severity' => 'critical',
            'element' => $dom->saveHTML($img),
            'wcag_level' => '1.1.1',
            'message' => 'Image missing alt text (WCAG A)'
        ];
    }

    // Check 2: Heading hierarchy
    $headings = $xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6');
    $prevLevel = 1;
    foreach ($headings as $heading) {
        $currentLevel = (int)$heading->nodeName[1];
        if ($currentLevel > $prevLevel + 1) {
            $issues[] = [
                'type' => 'heading_hierarchy',
                'severity' => 'warning',
                'element' => $heading->textContent,
                'wcag_level' => '1.3.1',
                'message' => "Heading hierarchy skips from h{$prevLevel} to h{$currentLevel}"
            ];
        }
        $prevLevel = $currentLevel;
    }

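    // Check 3: Form labels
    // Collect every id referenced by a <label for="...">
    $labelTargets = [];
    foreach ($xpath->query('//label[@for]') as $label) {
        $labelTargets[$label->getAttribute('for')] = true;
    }

    // not(@type="...") also matches inputs with no type attribute (they default to text)
    $inputs = $xpath->query('//input[not(@type="hidden" or @type="submit" or @type="button")]');
    foreach ($inputs as $input) {
        // A wrapping <label>, aria-label, or aria-labelledby also names the input
        $hasOtherName = $xpath->query('ancestor::label', $input)->length > 0
            || $input->getAttribute('aria-label') !== ''
            || $input->getAttribute('aria-labelledby') !== '';
        if ($hasOtherName) {
            continue;
        }

        $id = $input->getAttribute('id');
        if ($id === '') {
            $issues[] = [
                'type' => 'missing_form_label',
                'severity' => 'high',
                'element' => $dom->saveHTML($input),
                'wcag_level' => '1.3.1',
                'message' => 'Form input has no id, wrapping <label>, or ARIA name'
            ];
            continue;
        }

        if (!isset($labelTargets[$id])) {
            $issues[] = [
                'type' => 'missing_form_label',
                'severity' => 'high',
                'element' => $dom->saveHTML($input),
                'wcag_level' => '1.3.1',
                'message' => "Input #{$id} has no associated <label>"
            ];
        }
    }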

    // Check 4: ARIA button pattern
    $ariaButtons = $xpath->query('//div[@role="button"] | //span[@role="button"]');
    foreach ($ariaButtons as $btn) {
        // hasAttribute() is needed here: the string "0" is falsy in PHP,
        // so a truthiness test would wrongly flag tabindex="0"
        if (!$btn->hasAttribute('tabindex') || (int)$btn->getAttribute('tabindex') < 0) {
            $issues[] = [
                'type' => 'aria_button_not_focusable',
                'severity' => 'high',
                'element' => $dom->saveHTML($btn),
                'wcag_level' => '2.1.1',
                'message' => 'ARIA button role requires tabindex="0" for keyboard access'
            ];
        }
    }

    return $issues;
}
?>

This phase is fast because, once the page has been fetched, every check runs against an in-memory DOM. Parsing takes milliseconds, so users see preliminary results almost immediately.
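The preliminary color-contrast pass from the Phase 1 checklist is not part of the excerpt above. As a rough sketch of what such a check can compute, assuming the foreground and background colors are already known as hex strings (for example, read from inline styles), the WCAG 2.1 contrast formula can be written in PHP like this. The function names are illustrative and not taken from WAC's codebase.

```php
<?php
declare(strict_types=1);

// Illustrative helpers; the names are assumptions, not WAC's actual API.

/** Convert "#rrggbb" or "#rgb" to [r, g, b] integers in 0..255. */
function hexToRgb(string $hex): array
{
    $hex = ltrim(trim($hex), '#');
    if (strlen($hex) === 3) {
        // Expand shorthand: "abc" becomes "aabbcc"
        $hex = $hex[0] . $hex[0] . $hex[1] . $hex[1] . $hex[2] . $hex[2];
    }
    if (!preg_match('/^[0-9a-fA-F]{6}$/', $hex)) {
        throw new InvalidArgumentException("Unsupported color: {$hex}");
    }
    return [
        (int) hexdec(substr($hex, 0, 2)),
        (int) hexdec(substr($hex, 2, 2)),
        (int) hexdec(substr($hex, 4, 2)),
    ];
}

/** Relative luminance as defined by WCAG 2.1 (0 = black, 1 = white). */
function relativeLuminance(array $rgb): float
{
    $linear = array_map(static function (int $channel): float {
        $c = $channel / 255;
        return $c <= 0.03928 ? $c / 12.92 : (($c + 0.055) / 1.055) ** 2.4;
    }, $rgb);

    return 0.2126 * $linear[0] + 0.7152 * $linear[1] + 0.0722 * $linear[2];
}

/** Contrast ratio between two colors, ranging from 1.0 to 21.0. */
function contrastRatio(string $foreground, string $background): float
{
    $l1 = relativeLuminance(hexToRgb($foreground));
    $l2 = relativeLuminance(hexToRgb($background));
    return (max($l1, $l2) + 0.05) / (min($l1, $l2) + 0.05);
}

/** WCAG AA (SC 1.4.3): 4.5:1 for normal text, 3:1 for large text. */
function passesContrastAA(string $foreground, string $background, bool $largeText = false): bool
{
    return contrastRatio($foreground, $background) >= ($largeText ? 3.0 : 4.5);
}
```

A static pass like this only sees colors declared in the markup; colors that come from stylesheets or scripts need a real browser engine, which is what Phase 2 is for.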

Phase 2: Async PSI Analysis (Background)

Phase 1 gives you immediate feedback, but it misses issues that only a real browser engine can detect: JavaScript-injected content, computed styles, actual color contrast ratios, and more.

For this, we use Google's PageSpeed Insights API (PSI), which runs Lighthouse audits on real Chrome. The catch? PSI takes 10–30 seconds. We didn't want users staring at a loading spinner.

Solution: Asynchronous queuing.

When Phase 1 finishes, we immediately return the DOM results to the user's browser. In the background, we queue a PSI scan and poll for results. Once PSI completes, we push an update via WebSocket (or polling as fallback).

Here's the backend handler:

<?php
// /api/scan endpoint
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
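    $url = filter_var($_POST['url'] ?? '', FILTER_VALIDATE_URL);
    // FILTER_VALIDATE_URL accepts any scheme (including file://), so allow only http(s)
    $scheme = $url ? strtolower((string) parse_url($url, PHP_URL_SCHEME)) : '';

    if (!$url || !in_array($scheme, ['http', 'https'], true)) {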
        http_response_code(400);
        echo json_encode(['error' => 'Invalid URL']);
        exit;
    }

    // Create scan record
    $scanId = bin2hex(random_bytes(16));
    $db->insert('scans', [
        'scan_id' => $scanId,
        'url' => $url,
        'status' => 'pending',
        'created_at' => date('Y-m-d H:i:s')
    ]);

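    // Phase 1: Instant DOM scan
    $html = @file_get_contents($url);
    if ($html === false) {
        // The page could not be fetched, so there is nothing to scan
        $db->update('scans', ['status' => 'failed'], ['scan_id' => $scanId]);
        http_response_code(502);
        echo json_encode(['error' => 'Could not fetch the URL']);
        exit;
    }
    $domIssues = scanDOM($html, $url);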

    // Phase 2: Queue PSI scan (async)
    queuePSIScan($scanId, $url);

    // Return Phase 1 results immediately
    http_response_code(200);
    echo json_encode([
        'scan_id' => $scanId,
        'phase1_issues' => $domIssues,
        'phase2_status' => 'queued'
    ]);
    exit;
}

// Enqueue a PSI job; a cron or background worker picks it up later
function queuePSIScan($scanId, $url) {
    global $db; // shared database wrapper created at bootstrap (not shown)

    $db->insert('psi_queue', [
        'scan_id' => $scanId,
        'url' => $url,
        'status' => 'pending'
    ]);
}

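// Cron job (runs every minute)
function processPSIQueue() {
    global $db; // shared database wrapper created at bootstrap (not shown)

    $pending = $db->query(
        "SELECT * FROM psi_queue WHERE status = 'pending' LIMIT 10"
    );

    foreach ($pending as $job) {
        $result = callPSIAPI($job['url']);

        // A null result or an "error" payload means the PSI call did not succeed
        $failed = !is_array($result) || isset($result['error']);

        $db->update('psi_queue',
            ['status' => $failed ? 'failed' : 'completed', 'result' => json_encode($result)],
            ['scan_id' => $job['scan_id']]
        );
    }
}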

function callPSIAPI($url) {
    $apiKey = getenv('PAGESPEED_API_KEY');
    $endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed';

    $params = [
        'url' => $url,
        'key' => $apiKey,
        'category' => 'accessibility',
        'strategy' => 'mobile'
    ];

    $fullUrl = $endpoint . '?' . http_build_query($params);

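    $ch = curl_init($fullUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 45,
        // This is a GET request with no body, so send Accept rather than Content-Type
        CURLOPT_HTTPHEADER => ['Accept: application/json']
    ]);

    $response = curl_exec($ch);
    curl_close($ch);

    // curl_exec() returns false on network errors or timeouts
    if ($response === false) {
        return null;
    }

    return json_decode($response, true);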
}
?>
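The handler stores the raw PSI response, but the frontend expects a list of issues. One possible way to bridge the two is sketched below. It relies on the PSI v5 response layout, where `lighthouseResult.categories.accessibility.auditRefs` lists the audit ids and `lighthouseResult.audits` holds each audit's `score`, `title`, and `details`. The function name `extractPsiIssues()` and the output keys are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Turn a decoded PSI response into a flat list of failed accessibility audits.
 * extractPsiIssues() and the output keys are illustrative assumptions.
 */
function extractPsiIssues(?array $psi): array
{
    $lighthouse = $psi['lighthouseResult'] ?? null;
    if (!is_array($lighthouse)) {
        return []; // Error payload or failed request: nothing to report
    }

    $audits = $lighthouse['audits'] ?? [];
    $refs = $lighthouse['categories']['accessibility']['auditRefs'] ?? [];
    $issues = [];

    foreach ($refs as $ref) {
        $id = $ref['id'] ?? '';
        $audit = $audits[$id] ?? null;

        // A null score means the audit was not applicable or needs manual review;
        // a score of 1 means the audit passed
        if ($audit === null || !isset($audit['score']) || $audit['score'] >= 1) {
            continue;
        }

        $issues[] = [
            'type' => $id,
            'source' => 'psi',
            'message' => $audit['title'] ?? $id,
            // Each item is one offending node reported by Lighthouse
            'occurrences' => count($audit['details']['items'] ?? []),
        ];
    }

    return $issues;
}
```

Calling `extractPsiIssues(callPSIAPI($url))` would then give a list that can be merged with the Phase 1 issues; an error payload or a failed request simply yields an empty list.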

The Frontend: Real-Time Updates

On the client side, we use a simple polling mechanism while we wait for PSI results:

async function submitScan(url) {
    // URLSearchParams sends a form-encoded body, which PHP exposes in $_POST
    const response = await fetch('/api/scan', {
        method: 'POST',
        body: new URLSearchParams({ url })
    });

    if (!response.ok) {
        throw new Error(`Scan request failed with status ${response.status}`);
    }

    const data = await response.json();
    const scanId = data.scan_id;

    // Display Phase 1 results immediately
    renderResults(data.phase1_issues);

    // Poll for Phase 2 results, giving up after about two minutes.
    // Awaiting inside a loop keeps requests from overlapping.
    const maxAttempts = 60;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        await new Promise((resolve) => setTimeout(resolve, 2000)); // Poll every 2 seconds

        const statusResp = await fetch(`/api/scan/${scanId}/status`);
        if (!statusResp.ok) continue; // Transient error: try again on the next round

        const status = await statusResp.json();
        if (status.phase2_status === 'completed') {
            renderResults([...data.phase1_issues, ...status.phase2_issues]);
            return;
        }
    }
}
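The client polls `/api/scan/{scanId}/status`, an endpoint the article does not show. A minimal sketch of the payload it could return follows, assuming the `psi_queue` row for the scan has already been loaded. The function name, the `unknown` state, and the `$toIssues` mapper (whatever converts a raw PSI response into issue records) are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build the JSON payload for GET /api/scan/{scanId}/status.
 *
 * $job is the psi_queue row for the scan, or null when no row exists.
 * $toIssues converts a decoded PSI response into a list of issues.
 * All names here are illustrative assumptions.
 */
function buildStatusPayload(?array $job, callable $toIssues): array
{
    if ($job === null) {
        return ['phase2_status' => 'unknown', 'phase2_issues' => []];
    }

    $status = $job['status'] ?? 'unknown';
    if ($status !== 'completed') {
        // Still queued (or not finished): there are no Phase 2 issues to merge yet
        return ['phase2_status' => $status, 'phase2_issues' => []];
    }

    $psi = json_decode($job['result'] ?? 'null', true);

    return [
        'phase2_status' => 'completed',
        'phase2_issues' => is_array($psi) ? $toIssues($psi) : [],
    ];
}
```

The endpoint itself would look up the row by `scan_id` and echo `json_encode(buildStatusPayload($job, $mapper))`, which keeps the response shape (`phase2_status`, `phase2_issues`) in line with what `submitScan()` reads.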

Handling Scale: Rate Limits & Caching

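The PSI API enforces request quotas (300 requests/day on the free tier when we built this; confirm the current limits for your project in the Google Cloud console). To stay within limits while serving thousands of daily scans: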

Cache PSI results for 24 hours (a repeat scan of the same URL within that window reuses the stored result)
Batch queue processing to avoid API spikes
Graceful degradation: if PSI quota is exhausted, return Phase 1 results + a note that advanced checks are temporarily unavailable

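<?php
// Cached variant of callPSIAPI(): look in the cache before spending quota
function callPSIAPI($url) {
    global $db; // shared database wrapper created at bootstrap (not shown)

    // Check cache first
    $cached = $db->queryOne(
        "SELECT result FROM psi_cache WHERE url = ? AND created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)",
        [$url]
    );

    if ($cached) {
        return json_decode($cached['result'], true);
    }

    // Build the same request URL as in the uncached version
    $psiEndpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?' . http_build_query([
        'url' => $url,
        'key' => getenv('PAGESPEED_API_KEY'),
        'category' => 'accessibility',
        'strategy' => 'mobile'
    ]);

    // httpGetJSON() stands in for the cURL request shown earlier (decoded array, or null on failure)
    $result = httpGetJSON($psiEndpoint);

    // Cache only successful responses so an error is not served for 24 hours
    if (is_array($result) && !isset($result['error'])) {
        $db->insert('psi_cache', [
            'url' => $url,
            'result' => json_encode($result),
            'created_at' => date('Y-m-d H:i:s')
        ]);
    }

    return $result;
}
?>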
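Caching covers the first bullet; batching and graceful degradation both need to know how much quota is left. One way to track that is a small per-day counter, sketched below. The class name, the in-memory storage, and the default limit of 300 are assumptions for illustration; a real deployment would persist the counter (for example, in the database) so that it survives between cron runs.

```php
<?php
declare(strict_types=1);

/**
 * Tracks PSI calls per calendar day so the queue can stop before the quota is hit.
 * Hypothetical helper: the name, storage, and default limit are assumptions.
 */
final class DailyQuota
{
    /** @var array<string, int> Calls made, keyed by date (Y-m-d) */
    private array $used = [];

    public function __construct(private int $limit = 300)
    {
    }

    /** Calls still available on the day of $now. */
    public function remaining(DateTimeImmutable $now): int
    {
        return max(0, $this->limit - ($this->used[$now->format('Y-m-d')] ?? 0));
    }

    /** Reserve one call; returns false when the day's quota is exhausted. */
    public function tryConsume(DateTimeImmutable $now): bool
    {
        if ($this->remaining($now) === 0) {
            return false;
        }
        $day = $now->format('Y-m-d');
        $this->used[$day] = ($this->used[$day] ?? 0) + 1;
        return true;
    }
}

/** Decide how many queued jobs a single cron run may process. */
function batchSizeFor(DailyQuota $quota, DateTimeImmutable $now, int $maxPerRun = 10): int
{
    return min($maxPerRun, $quota->remaining($now));
}
```

A queue worker could call `batchSizeFor()` to size its `LIMIT`, and when `tryConsume()` returns false it can leave the job pending and let the status response say that advanced checks are temporarily unavailable, which matches the degradation behavior described above.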

Why This Approach Works

Instant feedback: Users don't wait 20+ seconds for any results
Comprehensive: Phase 1 + Phase 2 cover static + dynamic issues
Scalable: Async processing means we can handle many concurrent users
Cost-efficient: We cache PSI results and batch API calls
User-friendly: Progressive enhancement—Phase 1 results are immediately useful

Open Source Lessons

Building a free, multi-language accessibility scanner taught us that:

Perfect is the enemy of good: Launch with Phase 1, add Phase 2 later
Accessibility tooling drives accessibility adoption: Free tools lower the barrier to entry
Developer experience matters: Fast feedback loops encourage behavior change

If you want to audit your site's accessibility, check out Web Accessibility Checker. It's free, completely anonymous, and works in 10+ European languages. No signup, no tracking—just instant, honest feedback.

Have questions about building accessibility scanners? Drop them in the comments below. I'd love to hear about your favorite a11y tools, too.

Tags: #a11y #webdev #php #wcag #accessibility
