How I Score Your Website's Security (And Why I Rebuilt It From Scratch)

Benjamin

AmIHackable?

I tested my scanner against Mozilla Observatory on 229 sites, found I was wrong on 40% of them, and rebuilt my entire approach. Here's everything I learned.


The problem: security scores that lie

I built AmIHackable to give developers a clear picture of their website's security. Paste your URL, get a score, fix what matters. Simple.

Except it wasn't simple. Users started telling me things like:

"My site is a React SPA on Netlify. Your scanner says I have WordPress, PHP, and an exposed .env file. None of that is true."

"You gave me 3/10 but Mozilla Observatory gives me B+. Your score is misleading for a site with TLS 1.3, solid auth, and zero XSS surface."

"The scanner flagged dangerouslySetInnerHTML as an XSS risk — but that string doesn't exist anywhere in my code. It's in React's own bundle."

These weren't edge cases. When I dug into the data, I found systematic problems.

I compared my scores against Mozilla Observatory on 229 real sites. The results were uncomfortable:

  • Sites that Observatory rated A+ were getting D from me
  • Sites that Observatory rated F were getting A+ from me
  • Overall correlation: 56% — barely better than flipping a coin between two adjacent grades

I had two opposite problems at the same time: too harsh on well-configured sites, too lenient on poorly-configured ones.


What went wrong

Problem 1: SPA false positives

Modern web apps use Single Page Application architecture — one HTML file serves all routes. When you request /actuator/env or /.env on a Netlify SPA, you get a 200 OK with the app's homepage. My scanner saw 200 OK and concluded the file was accessible.

Result: a React site with zero backend vulnerabilities gets flagged for Spring Boot actuator endpoints, PHP config files, and WordPress REST APIs. The score tanks to F.

One user scanned his SPA on Netlify, got 0/10 with 23 findings — 15 of which were phantom API endpoints that didn't exist. He tried again a minute later. Same result. He probably left thinking the tool was broken.

The fix: Before checking sensitive paths, I now probe a random nonsensical URL. If the site returns 200 OK with HTML (the SPA shell), I know it's a catch-all. Every subsequent check compares the response body against this fingerprint — if they match, it's the same SPA shell, not a real file.

That same Netlify site now scores 8.2/10 (grade B) with 3 real findings instead of 23 false ones.

Problem 2: severity inflation

When I first built the scanner, I made a deliberate choice: rate conservatively, alert too much rather than too little. I figured it was better to flag a missing CSP as High and have a user add it, than to call it Low and have them ignore it.

That logic felt responsible. But it backfired completely.

Missing CSP? High. Missing X-Frame-Options? Medium. Missing HSTS? High. Session cookie without HttpOnly? High. My scanner was screaming "danger" at sites that were fundamentally fine — just missing some defense-in-depth layers.

When I researched how the security industry actually rates these findings, I realized how far off I was:

| Finding | My initial rating | Industry consensus | Source |
|---|---|---|---|
| Missing CSP | High | Low (CVSS 2.1-3.1) | Tenable, Acunetix, Bugcrowd VRT |
| Missing HSTS | High | Medium (CVSS 4.8-6.5) | Tenable, Probely, OWASP |
| Missing X-Frame-Options | Medium | Low (CVSS 2.1-4.3) | Bugcrowd P4-P5 |
| Session cookie no HttpOnly | High | Low (CVSS 2.0-3.5) | Requires existing XSS to exploit |
| Missing Referrer-Policy | Low | Informational | Bugcrowd P5 |
| Source maps exposed | High | Medium (CVSS 3.5-5.3) | Info disclosure, not directly exploitable |

The key insight came from a user who put it perfectly: "The headers are nice-to-have, not vulnerabilities. A 3/10 score is misleading for a site with TLS 1.3, solid auth, and tested sanitization."

He was right. A missing security header is the absence of a mitigation, not the presence of a vulnerability. A missing CSP doesn't create XSS — it removes a layer of defense against XSS if one already exists. Those are two fundamentally different things.

Professional penetration testers and bug bounty platforms (Bugcrowd VRT, HackerOne) consistently rate missing headers as P4-P5 (Low/Informational) across millions of real submissions. I was rating them as active threats.

Problem 3: no detection context

My scanner treated every finding identically regardless of what the site actually does. A missing CSP on a static portfolio with zero JavaScript gets the same severity as a missing CSP on an e-commerce site loading 12 third-party scripts. Those aren't the same risk. But my score said they were.


What I rebuilt

Findings first, score second

The most important lesson from user feedback: nobody complained about the score formula. They complained about individual findings being wrong.

A perfectly calibrated scoring model applied to false findings produces false scores. I invested most of my effort into making every finding defensible before touching the scoring.

Changes deployed:

  • SPA catch-all detection eliminates false positives on Netlify, Vercel, and Cloudflare Pages
  • Bundle-aware XSS detection skips framework internals (React, Vite, Next.js bundles use dangerouslySetInnerHTML internally — flagging it was misleading)
  • Platform-aware email checks skip SPF/DMARC on *.netlify.app, *.vercel.app and similar — you don't control that DNS, so the finding isn't actionable
  • Industry-calibrated severities for all 20+ finding types, each grounded in CVSS v3.1, Bugcrowd VRT, or OWASP WSTG
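The bundle-aware XSS check amounts to asking "did the user plausibly write this file?" before flagging a pattern. A rough sketch, where both heuristics (hashed Vite/Next-style bundle filenames and very long minified lines) are illustrative, not the scanner's actual rules:

```python
import re

# Illustrative heuristic: hashed bundle filenames emitted by Vite / Next.js /
# webpack, e.g. /assets/index.a1b2c3d4e5.js or /_next/static/chunks/....js
BUNDLE_NAME = re.compile(r"(^|/)(assets|_next/static)/.*\.[0-9a-f]{8,}\.js$")

def looks_like_framework_bundle(path: str, source: str) -> bool:
    if BUNDLE_NAME.search(path):
        return True
    # Minified bundles tend to be one enormous line; hand-written code isn't.
    longest_line = max((len(line) for line in source.splitlines()), default=0)
    return longest_line > 5000

def flag_dangerous_html(path: str, source: str) -> bool:
    """Only report dangerouslySetInnerHTML in code the user plausibly wrote,
    not inside React/Vite/Next.js bundles that use it internally."""
    if "dangerouslySetInnerHTML" not in source:
        return False
    return not looks_like_framework_bundle(path, source)
```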

Technology detection

My tech detection is powered by Wappalyzer's open-source database (7,500+ technologies) combined with headless browser execution via Cloudflare Workers. I now pass JavaScript global variables detected through real browser execution and DNS records (MX, TXT, NS) to the detection engine — revealing hosting providers, email services, and CDN layers without additional requests.
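Conceptually, the detection engine matches observed signals against per-technology fingerprints. Here is a toy version using a hand-written subset in the spirit of Wappalyzer's `js` and `dns` fingerprint fields; the entries and matching logic are simplified for illustration:

```python
# Simplified, illustrative fingerprints: each technology can declare
# JS global variables ("js") and DNS record substrings ("dns").
FINGERPRINTS = {
    "React":      {"js": ["React", "__REACT_DEVTOOLS_GLOBAL_HOOK__"]},
    "Sentry":     {"js": ["Sentry", "__SENTRY__"]},
    "Cloudflare": {"dns": {"NS": ["cloudflare.com"]}},
    "Google Workspace": {"dns": {"MX": ["aspmx.l.google.com"]}},
}

def detect(js_globals: set, dns_records: dict) -> set:
    """Match JS globals (from real browser execution) and DNS records
    (MX, TXT, NS) against the fingerprint database."""
    found = set()
    for tech, fp in FINGERPRINTS.items():
        if any(g in js_globals for g in fp.get("js", [])):
            found.add(tech)
        for record_type, needles in fp.get("dns", {}).items():
            values = " ".join(dns_records.get(record_type, [])).lower()
            if any(needle in values for needle in needles):
                found.add(tech)
    return found
```

This is why DNS records reveal hosting and email providers "for free": the lookups were already made, so matching them costs no extra requests.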

I benchmarked my detection against the real Wappalyzer browser extension on 445 sites. On a real-world test like Doctolib, I match 5 out of 5 of Wappalyzer's key detections (Rails, Cloudflare, Sentry, Didomi, Bot Management) and catch 2 extras (Ruby runtime, Google Tag Manager). There are gaps in smaller libraries (Preact, PDF.js) that I'm closing.

Scoring methodology

Each finding's severity is now aligned with industry standards:

| Severity | What it means | Based on | Examples |
|---|---|---|---|
| Critical | Immediate risk, exploitable now | CVSS 9.0-10.0 | Exposed .env with credentials, SSL failure, database RLS bypass |
| High | Significant weakness | CVSS 7.0-8.9 | Secrets in JS, weak TLS protocol |
| Medium | Real but conditional risk | CVSS 4.0-6.9 | Missing HSTS, open redirect, session cookie without Secure flag |
| Low | Defense-in-depth gap | CVSS 0.1-3.9 | Missing CSP, missing X-Frame-Options, XSS code patterns |
| Info | Context, no action needed | CVSS 0 | SSL configured correctly, Observatory grade, detected tech stack |

Every severity decision is traceable to a published source: CVSS v3.1, Bugcrowd VRT, OWASP WSTG, or the CWE Top 25. I keep the full mapping documented internally — every finding has a CWE ID, a WSTG test reference, and a rationale.
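The internal mapping can be pictured as a record per finding plus a CVSS-to-bucket function matching the table above. The field names and the example entry's specific CWE/WSTG identifiers are illustrative, not a dump of the real mapping:

```python
from dataclasses import dataclass

def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.1 base score to the buckets in the table above."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    if score > 0.0:
        return "Low"
    return "Info"

@dataclass(frozen=True)
class Finding:
    id: str
    cvss: float       # CVSS v3.1 base score backing the severity
    cwe: str          # CWE weakness ID
    wstg: str         # OWASP WSTG test reference
    rationale: str

# Illustrative entry mirroring the calibration: missing CSP is an absent
# mitigation, so it sits at the top of the Low band.
MISSING_CSP = Finding(
    id="missing-csp", cvss=3.1,
    cwe="CWE-693", wstg="WSTG-CONF-12",
    rationale="Absence of a mitigation, not a vulnerability; Bugcrowd P4-P5.",
)
```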

What I detect that others can't

Adaptive Backend Probing. When my scanner detects backend credentials in your bundled JavaScript — Supabase URLs and anon keys, Firebase configs — it doesn't just flag "key exposed." It automatically tests whether that key can actually access your data. Are your database tables visible? Is Row Level Security properly configured? Can anonymous users read data they shouldn't?

This transforms a finding from "your key is visible in the JS" into "your key allows reading the users table without authentication."
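The probing itself is a GET against the project's REST endpoint with the exposed anon key; the interesting part is interpreting the response. A sketch of that interpretation step, assuming typical PostgREST-style responses from a Supabase project (the exact wording and edge cases here are illustrative):

```python
def classify_probe(status: int, rows: list) -> str:
    """Interpret a probe of {supabase_url}/rest/v1/{table}?select=*&limit=1
    made with the exposed anon key. Typical PostgREST responses assumed."""
    if status in (401, 403):
        return "key rejected - not usable as-is"
    if status == 404:
        return "table not exposed"
    if status == 200 and rows:
        return "CRITICAL: anon key reads table data - RLS missing or misconfigured"
    if status == 200:
        return "table visible but returns no rows - RLS likely filtering access"
    return "inconclusive"
```

Only the `200` + non-empty case escalates to Critical; a visible-but-empty table usually means Row Level Security is doing its job.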

SPA-Aware Scanning. I detect Single Page Application routing and adapt all checks accordingly. Sites on Netlify, Vercel, and Cloudflare Pages no longer get flagged for sensitive files and API endpoints that are actually just the SPA shell returning 200 OK for everything.


What I'm honest about

This score is not a risk prediction

My score measures your observable security posture from the outside. It doesn't predict whether you'll be breached. A site with a perfect score can have SQL injection in its login form — I can't see that without access to your code.

Think of it as a health checkup, not a diagnosis. It tells you what's visible and what to fix first.

I measure more than Observatory — and that creates divergence

Mozilla Observatory tests 10 things (all security headers). I test 50+. My scores won't always match Observatory — and that's intentional. A site with perfect headers but an exposed .env file gets A+ from Observatory and a much lower score from me. I think that's the right behavior.

Missing mitigations ≠ vulnerabilities

This is worth repeating: a missing CSP header does not make your site vulnerable to XSS. It removes a layer of defense. I score it accordingly — as Low, not Critical.

If you see a Low finding for a missing header, it means "this would strengthen your security posture" — not "you're being hacked right now."


What's next: contextual scoring

Right now, every missing CSP gets the same severity. But a missing CSP on a static portfolio with zero JavaScript is effectively Informational — there's nothing to protect against. The same missing CSP on a site loading Google Tag Manager, Stripe.js, and Intercom has real consequences — those third-party scripts are exactly the attack surface that CSP is designed to control.

I already detect your tech stack, your third-party scripts, your framework. The next step is using that context to adjust severity dynamically:

  • 0 third-party scripts, no inline JS → CSP missing is Info (nothing to protect)
  • 1-5 scripts, no inline → CSP missing is Low (limited surface)
  • 6+ third-party scripts or inline JS → CSP missing is Medium (real attack surface)
  • eval() or dangerouslySetInnerHTML with user input → CSP missing is High (active risk)
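The tiering above is simple enough to express directly. A minimal sketch of the planned logic, with parameter names invented for illustration:

```python
def csp_severity(third_party_scripts: int,
                 has_inline_js: bool,
                 risky_sink_with_user_input: bool) -> str:
    """Severity of a missing CSP, adjusted by the page's actual attack
    surface. Thresholds mirror the tiers listed above."""
    if risky_sink_with_user_input:
        # eval() or dangerouslySetInnerHTML fed user input: active risk
        return "High"
    if third_party_scripts >= 6 or has_inline_js:
        return "Medium"   # real attack surface
    if third_party_scripts >= 1:
        return "Low"      # limited surface
    return "Info"         # nothing to protect
```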

No scanner does this today. Observatory, SecurityHeaders, Qualys — they all score findings in isolation. I'm building toward a score that understands your actual attack surface.


Sources

My methodology is built on published standards, not arbitrary choices:

  • CVSS v3.1 — the base score ranges behind each severity bucket
  • Bugcrowd Vulnerability Rating Taxonomy (VRT) — P1-P5 priorities from real bug bounty submissions
  • OWASP Web Security Testing Guide (WSTG) — the test reference attached to each finding
  • CWE Top 25 — weakness classification
  • Mozilla Observatory — the comparison baseline used for calibration

AmIHackable scans your website from the outside — the same perspective an attacker has. It detects your technology stack, checks your security configuration, and gives you a prioritized list of what to fix. Scan your site now.
