I wanted one tool instead of opening four browser tabs every time I launched a website.
PageSpeed Insights for performance. SecurityHeaders.com for security headers. Manual checks for meta tags and Open Graph. A separate accessibility checker. It was tedious, and I kept forgetting steps.
So I built WebPulse — a scanner that runs performance, SEO, security, and accessibility checks in a single request. Here's what I learned building it on Cloudflare Workers.
The Stack
- Cloudflare Workers — edge runtime, 330+ locations worldwide
- Hono — lightweight router that runs perfectly on Workers
- D1 — Cloudflare's SQLite-at-the-edge database
- TypeScript — because I like catching bugs before runtime
Constraint #1: No DOM API
The biggest surprise: Cloudflare Workers don't have a DOM. No document.querySelector, no DOMParser, nothing.
Most website scanners run in Node.js or a headless browser where you can do:
const dom = new DOMParser().parseFromString(html, 'text/html');
const title = dom.querySelector('title').textContent;
Inside Workers, that throws immediately.
My solution: a regex-based HTML parser from scratch. It sounds scary, but for the structured data I needed (title tags, meta descriptions, heading hierarchy, alt attributes), regex works surprisingly well:
function extractTitle(html: string): string | null {
  const match = html.match(/<title[^>]*>([^<]+)<\/title>/i);
  return match ? match[1].trim() : null;
}
function extractMetaContent(html: string, name: string): string | null {
  const pattern = new RegExp(
    `<meta[^>]+name=["']${name}["'][^>]+content=["']([^"']+)["']`,
    'i'
  );
  const match = html.match(pattern);
  return match ? match[1] : null;
}
The tricky part was handling attribute order variations — <meta content="..." name="description"> vs <meta name="description" content="...">. I ended up writing a generic attribute extractor that handles both orderings.
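A minimal sketch of that approach (the function names here are illustrative, not the actual WebPulse internals): match each meta tag first, then pull individual attributes out of the matched tag, so attribute order no longer matters.

```typescript
// Read one attribute out of an already-matched tag string, wherever it appears.
function getAttr(tag: string, attr: string): string | null {
  const m = tag.match(new RegExp(`\\s${attr}\\s*=\\s*["']([^"']*)["']`, 'i'));
  return m ? m[1] : null;
}

// Find a <meta> tag by its name attribute and return its content attribute,
// regardless of which attribute comes first inside the tag.
function extractMetaByName(html: string, name: string): string | null {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    if (getAttr(tag, 'name')?.toLowerCase() === name.toLowerCase()) {
      return getAttr(tag, 'content');
    }
  }
  return null;
}
```

Two regex passes per lookup instead of one, but both handle either ordering without a combinatorial pattern.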
Constraint #2: CPU Time Limits
Workers on the free plan have a 10ms CPU time budget per request (not wall-clock time — actual CPU time). Paid plans are more generous, but I wanted to stay efficient.
A full scan involves:
- Fetching the target URL
- Parsing response headers
- Parsing HTML (regex is CPU-intensive on large pages)
- Running 20+ individual checks
- Computing weighted scores
- Writing results to D1
The fetch is I/O-bound and doesn't count against CPU time. The parsing does.
I profiled my checks and found a few culprits — mostly checks that were scanning the entire HTML multiple times. The fix was a single-pass extraction that pulls all the data I need in one scan, then runs checks against the extracted data.
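The shape of that single-pass extraction looks roughly like this (the `PageData` fields are a simplified subset I'm assuming for illustration): one combined regex walks the document once, and every check then reads from the resulting plain object.

```typescript
// Simplified sketch: everything the checks need, gathered in one scan.
interface PageData {
  title: string | null;
  metaTags: string[];        // raw <meta ...> tag text, parsed later
  headings: string[];        // tag names in document order, e.g. ["h1", "h2"]
  totalImages: number;
  imagesMissingAlt: number;
}

function extractPageData(html: string): PageData {
  const data: PageData = {
    title: null, metaTags: [], headings: [], totalImages: 0, imagesMissingAlt: 0,
  };
  // One combined pattern instead of 20+ separate full-document scans.
  const tagRe = /<(title|meta|h[1-6]|img)\b[^>]*>/gi;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    const tag = m[1].toLowerCase();
    if (tag === 'title') {
      const rest = html.slice(tagRe.lastIndex).match(/^([^<]*)</);
      data.title = rest ? rest[1].trim() : null;
    } else if (tag === 'meta') {
      data.metaTags.push(m[0]);
    } else if (tag === 'img') {
      data.totalImages++;
      if (!/\salt\s*=/i.test(m[0])) data.imagesMissingAlt++;
    } else {
      data.headings.push(tag);
    }
  }
  return data;
}
```

The checks themselves become cheap lookups against `PageData`, so CPU cost scales with document size once rather than once per check.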
Constraint #3: Scoring Across Different Severity Levels
This was the most interesting design problem. How do you combine a score for "missing meta description" with a score for "no HTTPS"?
They're not the same kind of problem. A missing meta description hurts SEO mildly. No HTTPS is a serious security issue that browsers actively warn users about.
I ended up with a weighted scoring system where security issues can cap the overall grade more aggressively:
function computeOverallScore(scores: DimensionScores): number {
  const weighted =
    scores.performance * 0.25 +
    scores.seo * 0.30 +
    scores.security * 0.25 +
    scores.accessibility * 0.20;
  // Security issues cap the overall grade
  if (scores.security < 40) return Math.round(Math.min(weighted, 50));
  if (scores.security < 60) return Math.round(Math.min(weighted, 65));
  return Math.round(weighted);
}
This means a site with great SEO but terrible security can't get an A. Feels right.
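To make that concrete, here is one possible score-to-grade mapping (the cutoffs are my illustrative assumptions, not WebPulse's actual boundaries): with the cap above, a security score below 40 pins the overall score at 50 or less, which lands at best a D.

```typescript
// Hypothetical letter-grade cutoffs for illustration only.
function toGrade(score: number): string {
  if (score >= 90) return 'A';
  if (score >= 75) return 'B';
  if (score >= 60) return 'C';
  if (score >= 40) return 'D';
  return 'F';
}
```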
What the Scanner Actually Checks
Performance (25%): TTFB, HTML size, render-blocking resources, compression (gzip/brotli), image optimization signals
SEO (30%): Title tag, meta description, H1 presence, heading hierarchy, Open Graph tags, canonical URL, robots meta, lang attribute
Security (25%): HTTPS enforcement, HSTS header, Content Security Policy, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Permissions-Policy
Accessibility (20%): Image alt text coverage, form label associations, document language, viewport meta, semantic HTML elements, descriptive link text
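The security-header portion reduces to a presence check over response headers. A sketch (the header names are the real standard headers; the point values are illustrative assumptions, not WebPulse's actual weights):

```typescript
// Assumed point weights for illustration; header names are the standard ones.
const SECURITY_HEADERS: Array<{ header: string; points: number }> = [
  { header: 'strict-transport-security', points: 25 },
  { header: 'content-security-policy', points: 25 },
  { header: 'x-content-type-options', points: 15 },
  { header: 'x-frame-options', points: 15 },
  { header: 'referrer-policy', points: 10 },
  { header: 'permissions-policy', points: 10 },
];

// Headers are assumed lowercase-keyed, as the Workers Headers API normalizes them.
function scoreSecurityHeaders(headers: Map<string, string>): number {
  let score = 0;
  for (const { header, points } of SECURITY_HEADERS) {
    if (headers.has(header)) score += points;
  }
  return score; // 0–100
}
```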
Interesting Edge Cases
Redirect chains: Some sites redirect HTTP → HTTPS → www → non-www. I follow up to 3 redirects but flag the chain length in the performance score.
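Bounded redirect following can be sketched like this (I'm injecting a fetch-like function to keep the logic testable; `maxHops = 3` matches the limit described above, and the shape is an assumption rather than WebPulse's actual code):

```typescript
// Minimal fetch-like shape: just status and the Location header.
type Fetchlike = (url: string) => Promise<{ status: number; location: string | null }>;

// Follow up to maxHops redirects, recording the chain length for scoring.
async function followRedirects(url: string, doFetch: Fetchlike, maxHops = 3) {
  const chain = [url];
  for (let i = 0; i < maxHops; i++) {
    const res = await doFetch(chain[chain.length - 1]);
    if (res.status < 300 || res.status >= 400 || !res.location) {
      return { finalUrl: chain[chain.length - 1], hops: chain.length - 1, truncated: false };
    }
    // Resolve relative Location headers against the current URL.
    chain.push(new URL(res.location, chain[chain.length - 1]).toString());
  }
  return { finalUrl: chain[chain.length - 1], hops: chain.length - 1, truncated: true };
}
```

In a real Worker this would wrap `fetch(url, { redirect: 'manual' })` so each hop is observed instead of followed transparently.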
Servers that return 200 for everything: Some servers respond 200 OK to any URL, including /robots.txt when it doesn't exist. I check response body length and content type as secondary signals.
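Those secondary signals might look like this for the robots.txt case (a sketch under my assumptions, not the actual heuristics): a genuine robots.txt should be plain text and should not look like an HTML catch-all page.

```typescript
// Heuristic sketch: treat a 200 as a real robots.txt only if the content type
// and body shape agree. Thresholds and checks here are illustrative.
function looksLikeRealRobotsTxt(
  status: number,
  contentType: string | null,
  body: string
): boolean {
  if (status !== 200) return false;
  if (contentType && !contentType.includes('text/plain')) return false;
  // HTML-shaped or empty bodies are usually a catch-all page, not robots.txt.
  return body.length > 0 && !/<html/i.test(body);
}
```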
CSP complexity: Content-Security-Policy headers can be extremely long. I check for presence and whether unsafe-inline or unsafe-eval are used, rather than trying to fully validate the policy.
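That presence-plus-red-flags check is small enough to show in full (a sketch; the warning wording is mine):

```typescript
// Flag the risky CSP source expressions without fully parsing the policy.
function cspWarnings(csp: string): string[] {
  const warnings: string[] = [];
  if (/unsafe-inline/i.test(csp)) warnings.push("'unsafe-inline' allows inline scripts/styles");
  if (/unsafe-eval/i.test(csp)) warnings.push("'unsafe-eval' allows eval()");
  return warnings;
}
```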
The Result
Scans complete in 2–4 seconds. The edge deployment means the Worker runs close to the target server geographically, which keeps fetch latency low.
Try it at webpulse-api.linkpeek.workers.dev — no signup required.
If you've built scraping or analysis tools on Workers, I'd be curious how you handled the CPU time constraint — there's probably a smarter approach than what I did.