Adam McClarin

I built a free audit tool that runs 12 checks in parallel against any domain. Here is the architecture.

I spent the past few months building Canopy Guard, a free website audit tool that combines SEO, AEO (answer engine optimization), and GEO (generative engine optimization) visibility scoring with a full security posture check. One scan, one report, about 15 seconds.
This is the technical breakdown of how it works.
The problem
I audit websites for clients as part of my regular work. Every engagement started with the same routine: run the site through an SEO checker, then a separate security header scanner, then manually check for structured data, then look at robots.txt. Four tools, four tabs, four different report formats, and none of them cross-referenced their findings.
I wanted a single scan that checked everything and surfaced the gaps between visibility and security.
Architecture
The backend is a Node.js Express server written in TypeScript, deployed on Railway. The frontend is a React app on Vercel.
When a user enters a domain, the frontend POSTs to /api/scan on the Railway backend. The backend runs 12 scan modules in parallel using Promise.all:
const [dns, tls, headers, htmlStructure, schema, qa, geo,
  crawlRisk, endpoints, links, vulns, bizLogic] =
  await Promise.all([
    checkDNS(domain),
    checkTLS(domain),
    checkSecurityHeaders(domain),
    checkHTMLStructure(domain),
    checkSchemaMarkup(domain),
    checkQADensity(domain),
    checkGEO(domain),
    checkAICrawlRisk(domain),
    checkExposedEndpoints(domain),
    checkInternalLinking(domain),
    checkVulnerabilities(domain),
    checkBusinessLogic(domain),
  ]);
Each module is an async function that fetches specific data from the target domain and returns structured results.
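One detail worth illustrating: Promise.all rejects as soon as any input promise rejects, so a single failing module would fail the whole scan unless each module handles its own errors. The post does not show how Canopy Guard deals with this. The sketch below is one possible approach, not the actual implementation; the ModuleResult type, the safeRun helper, and the 10-second default timeout are all hypothetical.

```typescript
// Hypothetical result envelope; the real module return types are not shown in the post.
type ModuleResult<T> =
  | { ok: true; data: T; ms: number }
  | { ok: false; error: string; ms: number };

// Wrap a module call so a thrown error or a timeout becomes a value
// instead of a rejection that would abort Promise.all.
async function safeRun<T>(
  run: () => Promise<T>,
  timeoutMs = 10_000, // assumed budget, chosen for illustration only
): Promise<ModuleResult<T>> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    const data = await Promise.race([run(), timeout]);
    return { ok: true, data, ms: Date.now() - started };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error, ms: Date.now() - started };
  } finally {
    clearTimeout(timer); // stop the timer so it cannot keep the process alive
  }
}
```

With a wrapper like this, each entry in the Promise.all array could become safeRun(() => checkDNS(domain)) and so on, and the report could mark a module as "not completed" rather than dropping the whole scan.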
The scan modules
DNS: Resolves the domain via Google's public DNS API (dns.google/resolve). Returns whether the domain resolves and the IP address.
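A rough sketch of what a lookup against dns.google/resolve can look like, assuming Node 18+ for the global fetch. The function and type names (resolveViaGoogleDoh, parseDohResponse, DnsResult) are made up for illustration, and the interfaces cover only the response fields used here.

```typescript
// Subset of the JSON body returned by https://dns.google/resolve
interface DohAnswer { name: string; type: number; TTL: number; data: string }
interface DohResponse { Status: number; Answer?: DohAnswer[] }

// Hypothetical result shape: does the domain resolve, and to which IPv4 address.
interface DnsResult { resolves: boolean; ip: string | null }

const RR_TYPE_A = 1; // DNS record type number for A records

function parseDohResponse(body: DohResponse): DnsResult {
  // Status 0 means NOERROR. CNAME chains list non-A records first,
  // so search for the first A record rather than taking Answer[0].
  const a = body.Status === 0
    ? body.Answer?.find((r) => r.type === RR_TYPE_A)
    : undefined;
  return { resolves: a !== undefined, ip: a?.data ?? null };
}

async function resolveViaGoogleDoh(domain: string): Promise<DnsResult> {
  const url = `https://dns.google/resolve?name=${encodeURIComponent(domain)}&type=A`;
  const res = await fetch(url);
  if (!res.ok) return { resolves: false, ip: null };
  return parseDohResponse((await res.json()) as DohResponse);
}
```

Keeping the parsing separate from the network call makes the logic easy to test with canned responses.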
TLS: Checks HTTPS reachability, HSTS header presence and max-age value, and whether HTTP redirects to HTTPS.
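The header side of this check can be sketched with two small pure functions. This is an illustration under assumptions, not the Canopy Guard code: parseHsts and isHttpsRedirect are hypothetical names, and the includeSubDomains flag is an extra the post does not mention.

```typescript
interface HstsPolicy {
  present: boolean;
  maxAge: number | null;      // seconds, or null if the directive is missing or malformed
  includeSubDomains: boolean; // not mentioned in the post; included for illustration
}

// Parse a Strict-Transport-Security header value such as
// "max-age=31536000; includeSubDomains".
function parseHsts(header: string | null): HstsPolicy {
  if (!header) return { present: false, maxAge: null, includeSubDomains: false };
  // Directive names are case-insensitive, so normalize before matching.
  const directives = header.split(";").map((d) => d.trim().toLowerCase());
  let maxAge: number | null = null;
  for (const d of directives) {
    const m = /^max-age\s*=\s*"?(\d+)"?$/.exec(d);
    if (m) maxAge = Number(m[1]);
  }
  return {
    present: true,
    maxAge,
    includeSubDomains: directives.includes("includesubdomains"),
  };
}

// Decide whether the response to a plain-HTTP request is a redirect to HTTPS,
// given its status code and Location header.
function isHttpsRedirect(status: number, location: string | null): boolean {
  return status >= 300 && status < 400 &&
    location !== null && location.toLowerCase().startsWith("https://");
}
```

The inputs would come from two requests: one to the https:// URL for the HSTS header, and one to the http:// URL with redirects not followed, so the status and Location header can be inspected.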
Security Headers: Checks for all six critical headers: Content-Security-Policy, Strict-Transport-Security, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, and Permissions-Policy.
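As a small illustration, the six-header check can be written as a lookup against a fetch Headers object. The name auditSecurityHeaders and the present/missing result shape are assumptions; the post does not say what the real module returns.

```typescript
// The six headers named above, lowercased (Headers lookups are case-insensitive).
const REQUIRED_HEADERS = [
  "content-security-policy",
  "strict-transport-security",
  "x-frame-options",
  "x-content-type-options",
  "referrer-policy",
  "permissions-policy",
] as const;

type HeaderName = (typeof REQUIRED_HEADERS)[number];

// Split the required headers into those the response sent and those it did not.
function auditSecurityHeaders(
  headers: Headers,
): { present: HeaderName[]; missing: HeaderName[] } {
  const present: HeaderName[] = [];
  const missing: HeaderName[] = [];
  for (const name of REQUIRED_HEADERS) {
    (headers.has(name) ? present : missing).push(name);
  }
  return { present, missing };
}
```

This only tests for presence. Judging whether a value is any good, for example a Content-Security-Policy that allows unsafe-inline, would need per-header rules on top.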
HTML Structure: Fetches the full page HTML and parses it for H1 count, meta description presence and length, canonical URL match, and page title.
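To make these checks concrete, here is a dependency-free sketch that pulls the same four signals out of an HTML string with regular expressions. The post does not say which parser Canopy Guard uses, so this is not its method; regex matching is a simplification, and a real HTML parser would cope better with unusual markup. summarizeHtml and normalizeUrl are hypothetical names.

```typescript
interface HtmlSummary {
  title: string | null;
  h1Count: number;
  metaDescription: string | null;
  metaDescriptionLength: number;
  canonical: string | null;
  canonicalMatches: boolean; // canonical URL equals the scanned URL after normalization
}

// Resolve against the page URL, drop the fragment and any trailing slash.
function normalizeUrl(u: string, base: string): string {
  const url = new URL(u, base);
  url.hash = "";
  return url.href.replace(/\/$/, "");
}

function summarizeHtml(html: string, pageUrl: string): HtmlSummary {
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1]?.trim() ?? null;
  // Count opening <h1> tags, with or without attributes.
  const h1Count = (html.match(/<h1[\s>]/gi) ?? []).length;

  // Find the tag first, then read its attribute; the backreference \1 keeps
  // an apostrophe inside a double-quoted value from ending the match early.
  const metaTag = /<meta\b[^>]*\bname=["']description["'][^>]*>/i.exec(html)?.[0];
  const metaDescription = metaTag
    ? (/\bcontent=(["'])(.*?)\1/i.exec(metaTag)?.[2] ?? null)
    : null;

  const linkTag = /<link\b[^>]*\brel=["']canonical["'][^>]*>/i.exec(html)?.[0];
  const canonical = linkTag
    ? (/\bhref=(["'])(.*?)\1/i.exec(linkTag)?.[2] ?? null)
    : null;

  return {
    title,
    h1Count,
    metaDescription,
    metaDescriptionLength: metaDescription?.length ?? 0,
    canonical,
    canonicalMatches: canonical !== null &&
      normalizeUrl(canonical, pageUrl) === normalizeUrl(pageUrl, pageUrl),
  };
}
```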
Schema Markup: Extracts all JSON-LD blocks (script tags of type application/ld+json), parses them, identifies FAQPage and Organization types, and flags structural errors like missing @context.
Q&A Density: Strips HTML tags, splits into sentences, and calculates the ratio of question-pattern sentences to total sentences. This measures how "answer engine ready" the content is.
GEO: Measures chunking efficiency (how well content divides into ~350-token blocks based on header/paragraph structure), citation precision (ratio of specific data points to generic text), and checks for llms.txt at the domain root.
AI Crawl Risk: Fetches robots.txt, classifies the policy as PERMISSIVE/BALANCED/RESTRICTIVE/NONE, checks for AI-specific bot blocks (GPTBot, Anthropic, Google-Extended, CCBot, ByteSpider), and looks for crawl-delay directives.
Exposed Endpoints: This one was interesting to build. It probes 12 common sensitive paths (/.env, /.git/config, /graphql, etc.). The tricky part: sites with catch-all redirects return 200 for every path. So the module first fetches a guaranteed-nonsense path to detect catch-all behavior. If detected, it compares each probe's response body length and content-type against the catch-all fingerprint to filter out false positives.
Internal Linking: Counts unique internal links on the homepage and samples a few to estimate link depth.
Vulnerabilities: Checks server headers for version disclosure and outdated software signatures.
Business Logic: Checks for author/publisher attribution markup and cross-references sitemap URLs against homepage links to find orphaned pages.
Scoring
Each module feeds into a scoring function that normalizes results to 0-1:
const seo_score = scoreSEO(htmlStructure, links);
const aeo_score = scoreAEO(schema, qa);
const geo_score = scoreGEO(geo);
const security_posture_score = scoreSecurity(
  tls, headers, crawlRisk, endpoints, vulns
);
The scoring weights are calibrated based on what actually impacts discoverability and security posture. For example, in SEO scoring, crawlability gets the highest weight (0.25) because nothing else matters if bots cannot reach your page. In security scoring, TLS validity (0.15) and security headers (0.25 distributed across 6 headers) carry the most weight.
Cross-Reference Intelligence
This is the differentiator. After scoring, the report engine maps findings across layers:
geo_branch.llms_txt_status vs ai_crawl_risk.robots_policy: If llms.txt is MISSING and robots is PERMISSIVE, flag as CRITICAL. AI scrapers have access with no citation guidance.
application_security.exposed_endpoints vs GEO context: If endpoints are exposed, AI RAG parsers can index internal routes from JavaScript bundles.
business_logic_gaps.data_provenance_leak vs overall visibility: If content has no attribution markup, AI training sets can ingest it without linking back.
Lead capture
When a user wants their PDF report, they enter their email. The frontend sends the lead data to the Railway backend, which writes it to a Notion database via the Notion API. Each record stores the name, email, domain, all four scores, the full report JSON, and a Status field (New/Reviewed/Booked/Closed).
The PDF generates entirely in-browser using a print-ready HTML template opened in a new window.
What I would do differently
If I were starting over, I would add a headless browser module (Playwright) for JavaScript-rendered sites. The current HTML parser uses server-side fetch, which misses content rendered client-side. That is the biggest gap in the current scan accuracy.
I would also add a competitor comparison feature: scan two domains side by side and diff the results.
Try it
Free, no signup: https://thecanopyguard.com
The code is not open source yet, but I am considering it. Would love feedback on the scoring methodology, especially the GEO layer.
Adam McClarin, CISSP
Meraki is Love Digital | Soulful Tech
