ADA website lawsuits are at an all-time high. Pro se filings are surging because AI tools make it easier to file without a lawyer. Settlements range from a few thousand dollars to well over fifty thousand.
The tools that exist to help? Enterprise scanners cost a fortune per year. Overlay widgets like accessiBe just got hit with an FTC settlement. Free developer tools like WAVE speak in technical jargon that a small business owner will never understand.
I built accessscan.pro to fill the gap: an affordable, honest accessibility scanner that speaks plain language. Here's how.
The Architecture
A user enters a URL, which hits the scan API. From there, the pipeline is:
1. Check the rate limit through Upstash Redis
2. Validate the URL with SSRF prevention
3. Launch headless Chromium using puppeteer-core and @sparticuz/chromium-min
4. Navigate to the page
5. Inject axe-core from CDN and run the accessibility analysis
6. Calculate a compliance score and translate violations into plain language
7. Store the results in Supabase and return everything to the frontend
The whole thing runs on about twenty dollars a month: Next.js App Router with TypeScript on Vercel Pro, Supabase PostgreSQL on the free tier, Upstash Redis on the free tier, and Resend for email on the free tier.
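The rate-limit check at the front of that pipeline is worth a quick sketch before the hard parts. In production it runs against Upstash Redis so the count is shared across serverless instances; the version below is a self-contained, in-memory fixed-window limiter that shows the same idea. The limit and window values are illustrative, not the production ones:

```typescript
// In-memory sketch of the rate-limit check that fronts the scan API.
// Production uses Upstash Redis so counts survive across instances;
// the numbers here are hypothetical.
type RateWindow = { count: number; resetAt: number };

const windows = new Map<string, RateWindow>();

const LIMIT = 5;          // scans allowed per window (illustrative)
const WINDOW_MS = 60_000; // one-minute fixed window

function checkRateLimit(identifier: string, now: number = Date.now()): boolean {
  const w = windows.get(identifier);
  if (!w || now >= w.resetAt) {
    // First request in a fresh window: start a new count
    windows.set(identifier, { count: 1, resetAt: now + WINDOW_MS });
    return true;
  }
  if (w.count >= LIMIT) return false; // over the limit: reject
  w.count += 1;
  return true;
}
```

The important design point is ordering: the check runs before any Chromium work, because launching the browser is the expensive part of the request.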
Running Puppeteer on Vercel Serverless
This was by far the hardest part. The full puppeteer package bundles Chromium at around 400MB. Vercel's function bundle limit is 50MB. That's a non-starter.
The solution was using puppeteer-core (just the API client, tiny footprint) paired with @sparticuz/chromium-min (a stripped down Chromium binary designed for serverless). The Chromium binary is hosted externally and downloaded at runtime.
import puppeteerCore from 'puppeteer-core';
import chromium from '@sparticuz/chromium-min';

// Externally hosted Chromium pack, downloaded at runtime
const CHROMIUM_PACK_URL = process.env.CHROMIUM_PACK_URL;

async function getBrowser() {
  const isProduction =
    process.env.VERCEL_ENV === 'production'
    || process.env.NODE_ENV === 'production';

  if (isProduction) {
    if (!CHROMIUM_PACK_URL) {
      throw new Error('CHROMIUM_PACK_URL is not set');
    }
    // Downloads and extracts the pack on the first call of a cold start
    const executablePath = await chromium.executablePath(CHROMIUM_PACK_URL);
    return puppeteerCore.launch({
      args: [
        ...chromium.args,
        '--disable-dev-shm-usage',
        '--disable-gpu',
      ],
      defaultViewport: { width: 1280, height: 720 },
      executablePath,
      // 'shell' forces the old headless mode (see the gotchas below)
      headless: 'shell',
    });
  }

  // Local development: full puppeteer with its own bundled Chromium
  const puppeteer = await import('puppeteer');
  return puppeteer.default.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
}
Two gotchas that cost me hours:
First, headless: 'shell' is required, not headless: true. The new headless mode in Chromium defaults to WebDriver BiDi, which causes a protocol mismatch error with @sparticuz/chromium-min. Using 'shell' forces the old headless mode that speaks CDP natively.
Second, you need to mark packages as external. Without serverExternalPackages in next.config.ts, Next.js tries to bundle puppeteer-core and @sparticuz/chromium-min, breaking both:
// next.config.ts
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  serverExternalPackages: ['puppeteer-core', '@sparticuz/chromium-min'],
};

export default nextConfig;
I also allocate 1GB of memory and a 60 second timeout for the scan function since headless Chromium is hungry:
// vercel.json
{
  "functions": {
    "src/app/api/scan/route.ts": {
      "memory": 1024,
      "maxDuration": 60
    }
  }
}
axe-core Won't Bundle on Vercel
My first approach was using @axe-core/puppeteer, the official Puppeteer wrapper. Worked perfectly locally. In production? Cannot find module 'axe-core'.
The problem is that the wrapper does a dynamic require('axe-core') at runtime to read the source file from disk. Vercel's serverless bundler can't detect dynamic requires, so axe-core gets silently excluded from the bundle. I also tried fs.readFileSync, which fails on Vercel with a bad file descriptor error.
The fix was to load axe-core from CDN and inject it directly:
const AXE_CORE_CDN_URL =
  'https://cdnjs.cloudflare.com/ajax/libs/axe-core/4.10.2/axe.min.js';

// Cached at module scope, so warm invocations skip the fetch entirely
let _axeCoreSource: string | null = null;

async function getAxeCoreSource(): Promise<string> {
  if (!_axeCoreSource) {
    const response = await fetch(AXE_CORE_CDN_URL);
    if (!response.ok) {
      throw new Error(`Failed to fetch axe-core: ${response.status}`);
    }
    _axeCoreSource = await response.text();
  }
  return _axeCoreSource;
}
Then in the scan function, inject it into the page and run the analysis:
const axeSource = await getAxeCoreSource();
await page.evaluate(axeSource);

const results = await page.evaluate(() => {
  return (window as any).axe.run(document, {
    runOnly: {
      type: 'tag',
      values: [
        'wcag2a', 'wcag2aa',
        'wcag21a', 'wcag21aa', 'wcag22aa',
        'best-practice',
      ],
    },
  });
});
No npm wrapper, no file system reads, no bundler issues. The CDN fetch adds a tiny bit of latency on cold starts and zero on warm invocations.
Making Scans Fast Enough
Scans on Vercel's modest serverless CPUs can take a while, and every millisecond counts when your function has a hard 60-second timeout. Here's what I do to speed things up.
Block unnecessary resources. Images, fonts, and media don't affect accessibility analysis:
await page.setRequestInterception(true);
page.on('request', (req) => {
  const resourceType = req.resourceType();
  if (['image', 'font', 'media'].includes(resourceType)) {
    req.abort();
  } else {
    req.continue();
  }
});
Use domcontentloaded instead of networkidle0. We need the DOM, not every network request:
await page.goto(url, {
  waitUntil: 'domcontentloaded',
  timeout: 30000,
});

// Give client-side rendering a moment to settle before running axe
await new Promise((resolve) => setTimeout(resolve, 2000));
The Plain Language Translation Layer
This is the actual product differentiator. axe-core returns technical violations like "Elements must have sufficient color contrast. Expected contrast ratio of 4.5:1."
A restaurant owner doesn't know what that means. So I maintain a mapping file that translates each rule into three things: a plain title, a human description, and a specific fix suggestion.
export const plainLanguageMap: Record<
  string,
  { title: string; description: string; fix: string }
> = {
  'image-alt': {
    title: 'Images are missing descriptions',
    description:
      "Screen readers can't describe these images to visually " +
      "impaired visitors. Search engines also can't index them.",
    fix:
      'Add a short description (alt text) to each image. ' +
      'Describe what the image shows, e.g., ' +
      '"A family enjoying dinner at our restaurant."',
  },
  'color-contrast': {
    title: 'Text is hard to read (low contrast)',
    description:
      "Some text doesn't have enough contrast against its " +
      "background. Hard to read for people with low vision — " +
      "and annoying for everyone else.",
    fix:
      'Make your text darker or your background lighter. ' +
      'Minimum contrast: 4.5:1 for normal text, 3:1 for large text.',
  },
  'link-name': {
    title: 'Links have no descriptive text',
    description:
      'Some links say "click here" or have no text at all. ' +
      'Screen reader users hear "link, link, link" with no ' +
      'idea where each one goes.',
    fix:
      'Make every link describe where it goes. ' +
      '"View our menu" instead of "Click here."',
  },
  // covers about 30 rules, handling the vast majority of common violations
};
For unmapped rules, I fall back to axe-core's own help text. The mapping currently covers the most common rules, which handles the vast majority of what small business sites will see.
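The fallback itself is a small helper. Here's a hypothetical sketch (translateViolation and ViolationLike are illustrative names, not necessarily the real ones, and the one-entry map stands in for the full mapping above):

```typescript
// Minimal shape of an axe-core violation, just the fields needed here
interface ViolationLike {
  id: string;
  help: string;        // axe's short help text
  description: string; // axe's longer explanation
}

// Stand-in for the full ~30-rule map shown above
const plainLanguageMap: Record<
  string,
  { title: string; description: string; fix: string }
> = {
  'image-alt': {
    title: 'Images are missing descriptions',
    description: "Screen readers can't describe these images.",
    fix: 'Add a short description (alt text) to each image.',
  },
};

// Use the plain-language entry when one exists,
// otherwise fall back to axe-core's own help text.
function translateViolation(v: ViolationLike) {
  return (
    plainLanguageMap[v.id] ?? {
      title: v.help,
      description: v.description,
      fix: v.help,
    }
  );
}
```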
SSRF Prevention
Since the scanner takes arbitrary URLs from users, SSRF prevention is critical. A malicious user could try to make the server hit internal endpoints, cloud metadata services, or the local filesystem.
export async function validateScanUrl(input: string) {
  // 1. Parse and normalize (assume https:// if missing)
  // 2. Only allow http:// and https://
  // 3. Block raw IP addresses
  // 4. Block known internal hostnames (localhost, metadata.google.internal)
  // 5. DNS resolution — check all IPs against private ranges
  // 6. Block URLs over 2000 characters
  // (parsed comes from step 1; helper functions omitted for brevity)
  const addresses = await resolveHostname(parsed.hostname);
  for (const ip of addresses) {
    if (isPrivateIP(ip)) {
      return {
        valid: false,
        error: 'This URL points to a private network.',
      };
    }
  }
  return { valid: true };
}
Blocked IP ranges include loopback, private networks, link-local (where cloud metadata lives), and IPv6 equivalents.
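Step 5 is the one that actually matters against tricks like DNS rebinding: resolve the hostname yourself and check every returned address. Here's a simplified sketch of what the isPrivateIP helper the validator calls can look like; the real range list may be longer, and anything unparseable fails closed:

```typescript
// Simplified private/internal address check. Fails closed: anything
// that doesn't parse cleanly as IPv4 is treated as private.
function isPrivateIP(ip: string): boolean {
  // IPv6 (including IPv4-mapped addresses like ::ffff:10.0.0.1)
  if (ip.includes(':')) {
    const lower = ip.toLowerCase();
    if (lower === '::1' || lower === '::') return true;   // loopback / unspecified
    if (lower.startsWith('fe80:')) return true;           // link-local
    if (lower.startsWith('fc') || lower.startsWith('fd')) return true; // unique local fc00::/7
    const mapped = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
    if (mapped) return isPrivateIP(mapped[1]);            // recheck the embedded IPv4
    return false;
  }

  // IPv4
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p) || p < 0 || p > 255)) {
    return true; // not a clean IPv4 address: fail closed
  }
  const [a, b] = parts;
  if (a === 127) return true;                        // loopback 127.0.0.0/8
  if (a === 10) return true;                         // private 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true;  // private 172.16.0.0/12
  if (a === 192 && b === 168) return true;           // private 192.168.0.0/16
  if (a === 169 && b === 254) return true;           // link-local (cloud metadata)
  if (a === 0) return true;                          // "this network" 0.0.0.0/8
  return false;
}
```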
Scoring
The compliance score uses a weighted deduction model from 0 to 100:
const SEVERITY_WEIGHTS: Record<string, number> = {
  critical: 10,
  serious: 5,
  moderate: 3,
  minor: 1,
};

export function calculateScore(axeResults: {
  passes: unknown[];
  violations: { impact?: string; nodes: unknown[] }[];
  incomplete: unknown[];
}) {
  const totalChecks =
    axeResults.passes.length +
    axeResults.violations.length +
    axeResults.incomplete.length;
  if (totalChecks === 0) return 100;

  let deductions = 0;
  for (const violation of axeResults.violations) {
    const weight = SEVERITY_WEIGHTS[violation.impact || 'minor'] || 1;
    deductions += weight * violation.nodes.length;
  }

  const maxDeductions = totalChecks * 10;
  const normalized = Math.min((deductions / maxDeductions) * 100, 100);
  return Math.max(0, Math.round(100 - normalized));
}
Each instance of a violation counts separately. Five images missing alt text is five deductions, not one. The score normalizes against total checks so pages with more elements aren't unfairly penalized.
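To make the weighting concrete, here's the scoring function run against a synthetic result set. The function body is repeated so the snippet runs on its own, and the input numbers are made up:

```typescript
const SEVERITY_WEIGHTS: Record<string, number> = {
  critical: 10,
  serious: 5,
  moderate: 3,
  minor: 1,
};

interface ResultsLike {
  passes: unknown[];
  violations: { impact?: string; nodes: unknown[] }[];
  incomplete: unknown[];
}

function calculateScore(axeResults: ResultsLike): number {
  const totalChecks =
    axeResults.passes.length +
    axeResults.violations.length +
    axeResults.incomplete.length;
  if (totalChecks === 0) return 100;

  let deductions = 0;
  for (const violation of axeResults.violations) {
    const weight = SEVERITY_WEIGHTS[violation.impact || 'minor'] || 1;
    deductions += weight * violation.nodes.length;
  }

  const maxDeductions = totalChecks * 10;
  const normalized = Math.min((deductions / maxDeductions) * 100, 100);
  return Math.max(0, Math.round(100 - normalized));
}

// Synthetic page: 45 passing checks, 3 incomplete, plus one serious
// violation on 3 elements and one minor violation on 2 elements.
const sample: ResultsLike = {
  passes: new Array(45).fill(0),
  incomplete: new Array(3).fill(0),
  violations: [
    { impact: 'serious', nodes: [0, 1, 2] }, // 5 * 3 = 15 points
    { impact: 'minor', nodes: [0, 1] },      // 1 * 2 = 2 points
  ],
};

// deductions = 17, maxDeductions = 50 * 10 = 500
// score = round(100 - (17 / 500) * 100) = 97
console.log(calculateScore(sample)); // prints 97
```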
What I Learned
Serverless plus headless browsers is painful but possible. The bundling issues alone took days. If you're doing this, start with puppeteer-core and @sparticuz/chromium-min from day one. Don't try to migrate later.
CDN injection beats npm wrappers for serverless. Any library that uses dynamic require or fs.readFileSync is going to break on Vercel or Lambda. Load from CDN, cache in memory, move on.
Plain language is a product feature, not a nice to have. The scanner uses the same axe-core engine as expensive enterprise tools. The entire value add is making the output understandable to non-technical people.
Be honest about limitations. Automated scanning catches a fraction of all accessibility issues. I put that disclaimer on every results page. Counterintuitively, transparency builds more trust than overclaiming.
Try It
Scan your website for free at accessscan.pro. Takes about 30 seconds, no signup required.
If you're a dev, I'd especially love feedback on the plain language translations (helpful or dumbed down?), the scoring algorithm (does the weighting feel right?), and anything I might be missing in the SSRF prevention.
The scanner is free for single pages. I'm building paid full site scanning next, and if that's interesting, there's a waitlist on the site.
Follow along or roast my code. Both are welcome.