I wanted one tool instead of opening four browser tabs every time I launched a website.
PageSpeed Insights for performance. SecurityHeaders.com for security headers. Manual checks for meta tags and Open Graph. A separate accessibility checker. It was tedious, and I kept forgetting steps.
So I built WebPulse — a scanner that runs performance, SEO, security, and accessibility checks in a single request. Here's what I learned building it on Cloudflare Workers.
The Stack
- Cloudflare Workers — edge runtime, 330+ locations worldwide
- Hono — lightweight router that runs perfectly on Workers
- D1 — Cloudflare's SQLite-at-the-edge database
- TypeScript — because I like catching bugs before runtime
Constraint #1: No DOM API
The biggest surprise: Cloudflare Workers don't have a DOM. No document.querySelector, no DOMParser, nothing.
Most website scanners run in Node.js or a headless browser where you can do:
const dom = new DOMParser().parseFromString(html, 'text/html');
const title = dom.querySelector('title').textContent;
Inside Workers, that throws immediately.
My solution: a regex-based HTML parser from scratch. It sounds scary, but for the structured data I needed (title tags, meta descriptions, heading hierarchy, alt attributes), regex works surprisingly well:
function extractTitle(html: string): string | null {
  const match = html.match(/<title[^>]*>([^<]+)<\/title>/i);
  return match ? match[1].trim() : null;
}
function extractMetaContent(html: string, name: string): string | null {
  const pattern = new RegExp(
    `<meta[^>]+name=["']${name}["'][^>]+content=["']([^"']+)["']`,
    'i'
  );
  const match = html.match(pattern);
  return match ? match[1] : null;
}
The tricky part was handling attribute order variations — <meta content="..." name="description"> vs <meta name="description" content="...">. I ended up writing a generic attribute extractor that handles both orderings.
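A minimal sketch of that approach (the function names here are illustrative, not the actual WebPulse internals): match each meta tag first, then pull individual attributes out of the matched tag, so attribute order no longer matters.

```typescript
// Read one attribute out of an already-matched tag string, wherever it appears.
function getAttr(tag: string, attr: string): string | null {
  const m = tag.match(new RegExp(`\\s${attr}\\s*=\\s*["']([^"']*)["']`, 'i'));
  return m ? m[1] : null;
}

// Find a <meta> tag by its name attribute and return its content attribute,
// regardless of which attribute comes first inside the tag.
function extractMetaByName(html: string, name: string): string | null {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    if (getAttr(tag, 'name')?.toLowerCase() === name.toLowerCase()) {
      return getAttr(tag, 'content');
    }
  }
  return null;
}
```

Two regex passes per lookup instead of one, but both handle either ordering without a combinatorial pattern.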
Constraint #2: CPU Time Limits
Workers on the free plan have a 10ms CPU time budget per request (not wall-clock time — actual CPU time). Paid plans are more generous, but I wanted to stay efficient.
A full scan involves:
- Fetching the target URL
- Parsing response headers
- Parsing HTML (regex is CPU-intensive on large pages)
- Running 20+ individual checks
- Computing weighted scores
- Writing results to D1
The fetch is I/O-bound and doesn't count against CPU time. The parsing does.
I profiled my checks and found a few culprits — mostly checks that were scanning the entire HTML multiple times. The fix was a single-pass extraction that pulls all the data I need in one scan, then runs checks against the extracted data.
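The shape of that single-pass extraction looks roughly like this (the `PageData` fields are a simplified subset I'm assuming for illustration): one combined regex walks the document once, and every check then reads from the resulting plain object.

```typescript
// Simplified sketch: everything the checks need, gathered in one scan.
interface PageData {
  title: string | null;
  metaTags: string[];        // raw <meta ...> tag text, parsed later
  headings: string[];        // tag names in document order, e.g. ["h1", "h2"]
  totalImages: number;
  imagesMissingAlt: number;
}

function extractPageData(html: string): PageData {
  const data: PageData = {
    title: null, metaTags: [], headings: [], totalImages: 0, imagesMissingAlt: 0,
  };
  // One combined pattern instead of 20+ separate full-document scans.
  const tagRe = /<(title|meta|h[1-6]|img)\b[^>]*>/gi;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    const tag = m[1].toLowerCase();
    if (tag === 'title') {
      const rest = html.slice(tagRe.lastIndex).match(/^([^<]*)</);
      data.title = rest ? rest[1].trim() : null;
    } else if (tag === 'meta') {
      data.metaTags.push(m[0]);
    } else if (tag === 'img') {
      data.totalImages++;
      if (!/\salt\s*=/i.test(m[0])) data.imagesMissingAlt++;
    } else {
      data.headings.push(tag);
    }
  }
  return data;
}
```

The checks themselves become cheap lookups against `PageData`, so CPU cost scales with document size once rather than once per check.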
Constraint #3: Scoring Across Different Severity Levels
This was the most interesting design problem. How do you combine a score for "missing meta description" with a score for "no HTTPS"?
They're not the same kind of problem. A missing meta description hurts SEO mildly. No HTTPS is a serious security issue that browsers actively warn users about.
I ended up with a weighted scoring system where security issues can cap the overall grade more aggressively:
function computeOverallScore(scores: DimensionScores): number {
  const weighted =
    scores.performance * 0.25 +
    scores.seo * 0.30 +
    scores.security * 0.25 +
    scores.accessibility * 0.20;
  // Security issues cap the overall grade
  if (scores.security < 40) return Math.round(Math.min(weighted, 50));
  if (scores.security < 60) return Math.round(Math.min(weighted, 65));
  return Math.round(weighted);
}
This means a site with great SEO but terrible security can't get an A. Feels right.
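To make that concrete, here is one possible score-to-grade mapping (the cutoffs are my illustrative assumptions, not WebPulse's actual boundaries): with the cap above, a security score below 40 pins the overall score at 50 or less, which lands at best a D.

```typescript
// Hypothetical letter-grade cutoffs for illustration only.
function toGrade(score: number): string {
  if (score >= 90) return 'A';
  if (score >= 75) return 'B';
  if (score >= 60) return 'C';
  if (score >= 40) return 'D';
  return 'F';
}
```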
What the Scanner Actually Checks
Performance (25%): TTFB, HTML size, render-blocking resources, compression (gzip/brotli), image optimization signals
SEO (30%): Title tag, meta description, H1 presence, heading hierarchy, Open Graph tags, canonical URL, robots meta, lang attribute
Security (25%): HTTPS enforcement, HSTS header, Content Security Policy, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Permissions-Policy
Accessibility (20%): Image alt text coverage, form label associations, document language, viewport meta, semantic HTML elements, descriptive link text
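The security-header portion reduces to a presence check over response headers. A sketch (the header names are the real standard headers; the point values are illustrative assumptions, not WebPulse's actual weights):

```typescript
// Assumed point weights for illustration; header names are the standard ones.
const SECURITY_HEADERS: Array<{ header: string; points: number }> = [
  { header: 'strict-transport-security', points: 25 },
  { header: 'content-security-policy', points: 25 },
  { header: 'x-content-type-options', points: 15 },
  { header: 'x-frame-options', points: 15 },
  { header: 'referrer-policy', points: 10 },
  { header: 'permissions-policy', points: 10 },
];

// Headers are assumed lowercase-keyed, as the Workers Headers API normalizes them.
function scoreSecurityHeaders(headers: Map<string, string>): number {
  let score = 0;
  for (const { header, points } of SECURITY_HEADERS) {
    if (headers.has(header)) score += points;
  }
  return score; // 0–100
}
```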
Interesting Edge Cases
Redirect chains: Some sites redirect HTTP → HTTPS → www → non-www. I follow up to 3 redirects but flag the chain length in the performance score.
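Bounded redirect following can be sketched like this (I'm injecting a fetch-like function to keep the logic testable; `maxHops = 3` matches the limit described above, and the shape is an assumption rather than WebPulse's actual code):

```typescript
// Minimal fetch-like shape: just status and the Location header.
type Fetchlike = (url: string) => Promise<{ status: number; location: string | null }>;

// Follow up to maxHops redirects, recording the chain length for scoring.
async function followRedirects(url: string, doFetch: Fetchlike, maxHops = 3) {
  const chain = [url];
  for (let i = 0; i < maxHops; i++) {
    const res = await doFetch(chain[chain.length - 1]);
    if (res.status < 300 || res.status >= 400 || !res.location) {
      return { finalUrl: chain[chain.length - 1], hops: chain.length - 1, truncated: false };
    }
    // Resolve relative Location headers against the current URL.
    chain.push(new URL(res.location, chain[chain.length - 1]).toString());
  }
  return { finalUrl: chain[chain.length - 1], hops: chain.length - 1, truncated: true };
}
```

In a real Worker this would wrap `fetch(url, { redirect: 'manual' })` so each hop is observed instead of followed transparently.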
Servers that return 200 for everything: Some servers respond 200 OK to any URL, including /robots.txt when it doesn't exist. I check response body length and content type as secondary signals.
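Those secondary signals might look like this for the robots.txt case (a sketch under my assumptions, not the actual heuristics): a genuine robots.txt should be plain text and should not look like an HTML catch-all page.

```typescript
// Heuristic sketch: treat a 200 as a real robots.txt only if the content type
// and body shape agree. Thresholds and checks here are illustrative.
function looksLikeRealRobotsTxt(
  status: number,
  contentType: string | null,
  body: string
): boolean {
  if (status !== 200) return false;
  if (contentType && !contentType.includes('text/plain')) return false;
  // HTML-shaped or empty bodies are usually a catch-all page, not robots.txt.
  return body.length > 0 && !/<html/i.test(body);
}
```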
CSP complexity: Content-Security-Policy headers can be extremely long. I check for presence and whether unsafe-inline or unsafe-eval are used, rather than trying to fully validate the policy.
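That presence-plus-red-flags check is small enough to show in full (a sketch; the warning wording is mine):

```typescript
// Flag the risky CSP source expressions without fully parsing the policy.
function cspWarnings(csp: string): string[] {
  const warnings: string[] = [];
  if (/unsafe-inline/i.test(csp)) warnings.push("'unsafe-inline' allows inline scripts/styles");
  if (/unsafe-eval/i.test(csp)) warnings.push("'unsafe-eval' allows eval()");
  return warnings;
}
```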
The Result
Scans complete in 2–4 seconds. The edge deployment means the Worker runs close to the target server geographically, which keeps fetch latency low.
Try it at webpulse-api.linkpeek.workers.dev — no signup required.
If you've built scraping or analysis tools on Workers, I'd be curious how you handled the CPU time constraint — there's probably a smarter approach than what I did.