MikeL
Why Single-Layer Tech Detection Misses Half the Stack (And How to Fix It)

Most tech stack detection tools use a single method: they download a webpage's HTML and match it against a pattern library. Wappalyzer checks for wp-content in URLs to detect WordPress. BuiltWith looks for _next/ paths to identify Next.js.

This works well for front-end technologies. But it completely misses the infrastructure layer — CDNs, hosting providers, security services, and server-side platforms that never expose themselves in HTML.

In this post, I'll walk through why single-layer detection fails, what you're missing, and how combining multiple detection methods gives you the full picture.

The Problem: HTML-Only Detection Has Blind Spots

Here's what a typical Wappalyzer-style scan returns for a production website:

{
  "technologies": [
    {"name": "React", "confidence": 100},
    {"name": "Webpack", "confidence": 100},
    {"name": "Google Analytics", "confidence": 100}
  ]
}

Three technologies. Looks complete, right?

But this site is actually running behind Cloudflare's CDN, using Let's Encrypt certificates, hosted on Vercel, and serving responses through nginx. None of that shows up in the HTML.

That's because HTML-level detection only sees what the browser sees: script tags, meta elements, cookies, and URL patterns. Everything happening at the DNS, TLS, and HTTP layer is invisible to it.

Layer 1: Wappalyzer Fingerprinting (The Baseline)

HTML fingerprinting is still the foundation. Libraries like wappalyzergo maintain 3,500+ technology signatures that match against:

  • HTML content and DOM structure
  • JavaScript global variables
  • Cookie names and values
  • URL patterns (/wp-content/, /_next/, /assets/)
  • Meta tags and headers referenced in the page

This catches most front-end frameworks, CMS platforms, analytics tools, and JavaScript libraries. For many use cases, it's sufficient.

But if you're doing security audits, competitive analysis, or lead enrichment, you need the full stack — not just what's visible in the DOM.
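A stripped-down sketch of this kind of fingerprinting, with a handful of illustrative signatures (the real wappalyzergo ruleset is vastly larger and uses regexes, not plain substrings):

```go
package main

import (
	"fmt"
	"strings"
)

// A tiny illustrative signature set. Libraries like wappalyzergo ship
// thousands of patterns covering HTML, cookies, scripts, and headers.
var htmlSignatures = map[string]string{
	"/wp-content/":   "WordPress",
	"/_next/":        "Next.js",
	"data-reactroot": "React",
}

// detectFromHTML returns every technology whose pattern appears in the body.
func detectFromHTML(body string) []string {
	var found []string
	for pattern, tech := range htmlSignatures {
		if strings.Contains(body, pattern) {
			found = append(found, tech)
		}
	}
	return found
}

func main() {
	body := `<html><head><script src="/_next/static/app.js"></script></head></html>`
	fmt.Println(detectFromHTML(body)) // [Next.js]
}
```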

Layer 2: DNS CNAME Analysis

Every website's DNS records tell a story. When a site uses Cloudflare, its CNAME records typically point to something like cdn.cloudflare.net. Vercel uses cname.vercel-dns.com. AWS CloudFront uses cloudfront.net.

Here's how DNS detection works in practice:

// Simplified DNS CNAME lookup. Note that net.LookupCNAME returns a
// single canonical name (with a trailing dot), not a slice.
cname, err := net.LookupCNAME(domain)
if err != nil {
    // handle lookup failure
}
// Then match the result against known patterns:
// "cloudflare.net"  → Cloudflare
// "fastly.net"      → Fastly
// "vercel-dns.com"  → Vercel
// "amazonaws.com"   → AWS

This catches infrastructure that's completely invisible to HTML scanning:

CNAME Pattern          Technology
*.cloudflare.net       Cloudflare CDN
*.fastly.net           Fastly
*.vercel-dns.com       Vercel
*.amazonaws.com        AWS (S3, CloudFront, ELB)
*.azurewebsites.net    Microsoft Azure
*.herokuapp.com        Heroku
*.netlify.app          Netlify
*.shopify.com          Shopify

DNS lookups add roughly 20-50ms to a detection request, but the intelligence gained is significant. You now know the hosting and CDN layer — information that HTML detection simply cannot provide.
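The suffix matching behind that table can be tested without touching the network, since it operates on the string that net.LookupCNAME returns. A minimal sketch, with an illustrative subset of the suffix map:

```go
package main

import (
	"fmt"
	"strings"
)

// Known CNAME suffixes mapped to the provider they reveal (subset).
var cnameSuffixes = map[string]string{
	"cloudflare.net": "Cloudflare CDN",
	"fastly.net":     "Fastly",
	"vercel-dns.com": "Vercel",
	"amazonaws.com":  "AWS",
}

// detectFromCNAME maps a resolved CNAME target to a known provider.
// DNS answers carry a trailing dot, so strip it before matching.
func detectFromCNAME(cname string) (string, bool) {
	cname = strings.TrimSuffix(cname, ".")
	for suffix, tech := range cnameSuffixes {
		if cname == suffix || strings.HasSuffix(cname, "."+suffix) {
			return tech, true
		}
	}
	return "", false
}

func main() {
	// In production this value comes from net.LookupCNAME(domain).
	tech, ok := detectFromCNAME("cname.vercel-dns.com.")
	fmt.Println(tech, ok) // Vercel true
}
```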

Layer 3: TLS Certificate Inspection

TLS certificates reveal which certificate authority (CA) a site trusts, which often maps directly to their security infrastructure.

// Connect to the site and inspect the leaf TLS certificate
conn, err := tls.Dial("tcp", domain+":443", &tls.Config{})
if err != nil {
    // handle connection failure
}
defer conn.Close()
cert := conn.ConnectionState().PeerCertificates[0]
issuer := ""
if len(cert.Issuer.Organization) > 0 { // guard: some certs omit Organization
    issuer = cert.Issuer.Organization[0]
}
// Match against known issuers:
// "Let's Encrypt"           → Let's Encrypt
// "Cloudflare, Inc."        → Cloudflare SSL
// "Amazon"                  → AWS Certificate Manager
// "DigiCert Inc"            → DigiCert

This detection layer tells you:

  • Let's Encrypt — free automated certificates, common with smaller deployments
  • Cloudflare, Inc. — using Cloudflare's SSL proxy (even if DNS doesn't show it)
  • Amazon — AWS Certificate Manager, confirms AWS infrastructure
  • DigiCert / Sectigo — enterprise-grade certificates, signals larger organization

TLS inspection is especially useful for security audits. Knowing the certificate authority and expiration helps assess a site's security posture.

Layer 4: HTTP Header Matching

HTTP response headers are another goldmine. Many servers and platforms identify themselves through headers that never appear in the HTML:

Server: nginx/1.24.0
X-Powered-By: Express
X-Vercel-Id: iad1::abcde-123456
CF-Cache-Status: HIT
X-Cache: Hit from cloudfront

Custom header patterns I match against:

Header             Value Pattern    Technology
Server             nginx            nginx
Server             Apache           Apache
X-Powered-By       Express          Express.js
X-Powered-By       PHP              PHP
CF-Cache-Status    (any)            Cloudflare
X-Vercel-Id        (any)            Vercel
X-Amz-Cf-Id        (any)            AWS CloudFront

Headers are free — you already have the HTTP response from the HTML scan, so this adds zero latency.

Putting It All Together

When you run all four layers simultaneously, the detection picture changes dramatically. Here's a real comparison:

HTML-only detection:

  • React, Webpack, Google Analytics

Four-layer detection:

  • React, Webpack, Google Analytics (HTML)
  • Cloudflare CDN (DNS)
  • Cloudflare SSL (TLS)
  • nginx (HTTP headers)
  • Vercel (HTTP headers)

That's 7 technologies vs. 3 — more than double the coverage.

Adding CPE Identifiers for Security

Each detected technology gets a CPE (Common Platform Enumeration) identifier where available. CPE is a standardized naming scheme for software, maintained by NIST.

For example:

  • nginx → cpe:2.3:a:f5:nginx:*
  • Apache → cpe:2.3:a:apache:http_server:*
  • WordPress → cpe:2.3:a:wordpress:wordpress:*

With CPE identifiers, you can query the National Vulnerability Database to find known CVEs for every technology in a site's stack. This turns tech detection into a security audit tool.
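As a sketch of that lookup: the NVD CVE API 2.0 exposes a cpeName query parameter that takes a full CPE 2.3 name (all fields filled in, so a concrete version rather than the wildcard forms above). Building the query URL in Go might look like:

```go
package main

import (
	"fmt"
	"net/url"
)

// nvdQueryURL builds a query against the NVD CVE API 2.0 for a given
// CPE name. The cpeName parameter expects a complete CPE 2.3 string.
func nvdQueryURL(cpeName string) string {
	q := url.Values{}
	q.Set("cpeName", cpeName)
	return "https://services.nvd.nist.gov/rest/json/cves/2.0?" + q.Encode()
}

func main() {
	// nginx 1.24.0 as a concrete CPE 2.3 name (vendor "f5" post-acquisition).
	fmt.Println(nvdQueryURL("cpe:2.3:a:f5:nginx:1.24.0:*:*:*:*:*:*:*"))
}
```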

Performance: Why Caching Matters

Multi-layer detection is slower than HTML-only scanning. DNS lookups, TLS handshakes, and the HTTP request itself all add latency. A typical 4-layer scan takes 2-5 seconds.

That's why caching is critical. I use a 24-hour cache keyed on the post-redirect domain (not the input URL — important for sites that redirect). Cached responses return in under 5ms and don't count against API quotas.

The key insight: cache on the final domain after following redirects. If someone queries example.com and it redirects to www.example.com, the cache key should be www.example.com. Otherwise you'll get cache misses for the same site.
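A minimal sketch of that keying logic, assuming the post-redirect URL is available (with net/http's default client, resp.Request.URL holds the final URL after the redirect chain has been followed):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// cacheKey derives the cache key from the URL the scan ended up at
// after redirects, not the URL the user submitted.
func cacheKey(finalURL string) (string, error) {
	u, err := url.Parse(finalURL)
	if err != nil {
		return "", err
	}
	// Normalize: host only (no path, query, or port), lowercased.
	return strings.ToLower(u.Hostname()), nil
}

func main() {
	// e.g. example.com redirected here, so this becomes the key.
	key, _ := cacheKey("https://www.example.com/landing?ref=twitter")
	fmt.Println(key) // www.example.com
}
```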

The Architecture

The whole system is built in Go with:

  • chi router for HTTP routing
  • SQLite for caching, rate limiting, and usage tracking
  • wappalyzergo for HTML fingerprinting
  • net stdlib for DNS lookups
  • crypto/tls stdlib for certificate inspection

It runs on a single Fly.io machine. Total codebase is about 3,000 lines of Go.

I chose Go because:

  1. Excellent concurrency — all 4 detection layers run simultaneously using goroutines
  2. Fast TLS and DNS operations via the standard library
  3. Single binary deployment — no runtime dependencies
  4. Low memory footprint for a single-machine deployment
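As an illustration of point 1, the layers can be fanned out with goroutines and their results merged under a mutex. The layer functions below are stubs standing in for the real detectors:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runLayers executes every detection layer concurrently and merges
// their results. Each layer returns the technologies it found.
func runLayers(layers ...func() []string) []string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		found []string
	)
	for _, layer := range layers {
		wg.Add(1)
		go func(l func() []string) {
			defer wg.Done()
			techs := l() // real layers do DNS, TLS, HTTP work here
			mu.Lock()
			found = append(found, techs...)
			mu.Unlock()
		}(layer)
	}
	wg.Wait()
	sort.Strings(found) // deterministic order for display
	return found
}

func main() {
	results := runLayers(
		func() []string { return []string{"React"} },          // HTML layer stub
		func() []string { return []string{"Cloudflare CDN"} }, // DNS layer stub
		func() []string { return []string{"nginx"} },          // header layer stub
	)
	fmt.Println(results) // [Cloudflare CDN React nginx]
}
```

Total scan time is then bounded by the slowest layer rather than the sum of all four.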

Try It Yourself

I built DetectZeStack as a REST API that implements all four detection layers. You can try the live demo on the homepage — no signup required.

The API also supports:

  • Batch analysis: up to 10 URLs per request
  • Stack comparison: see shared vs. unique technologies across sites
  • Webhook alerts: get notified when a domain is analyzed, with HMAC-signed payloads

Free tier is 100 requests/month.


What detection methods have you found useful for identifying website technologies? I'd love to hear about approaches I haven't considered.
