MikeL
Why Single-Layer Tech Detection Misses Half the Stack (And How to Fix It)

Most tech stack detection tools use a single method: they download a webpage's HTML and match it against a pattern library. Wappalyzer checks for wp-content in URLs to detect WordPress. BuiltWith looks for _next/ paths to identify Next.js.

This works well for front-end technologies. But it completely misses the infrastructure layer — CDNs, hosting providers, security services, and server-side platforms that never expose themselves in HTML.

In this post, I'll walk through why single-layer detection fails, what you're missing, and how combining multiple detection methods gives you the full picture.

The Problem: HTML-Only Detection Has Blind Spots

Here's what a typical Wappalyzer-style scan returns for a production website:

{
  "technologies": [
    {"name": "React", "confidence": 100},
    {"name": "Webpack", "confidence": 100},
    {"name": "Google Analytics", "confidence": 100}
  ]
}

Three technologies. Looks complete, right?

But this site is actually running behind Cloudflare's CDN, using Let's Encrypt certificates, hosted on Vercel, and serving responses through nginx. None of that shows up in the HTML.

That's because HTML-level detection only sees what the browser sees: script tags, meta elements, cookies, and URL patterns. Everything happening at the DNS, TLS, and HTTP layer is invisible to it.

Layer 1: Wappalyzer Fingerprinting (The Baseline)

HTML fingerprinting is still the foundation. Libraries like wappalyzergo maintain 3,500+ technology signatures that match against:

  • HTML content and DOM structure
  • JavaScript global variables
  • Cookie names and values
  • URL patterns (/wp-content/, /_next/, /assets/)
  • Meta tags and headers referenced in the page

This catches most front-end frameworks, CMS platforms, analytics tools, and JavaScript libraries. For many use cases, it's sufficient.

But if you're doing security audits, competitive analysis, or lead enrichment, you need the full stack — not just what's visible in the DOM.
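A stripped-down sketch of this kind of fingerprinting, with a handful of illustrative signatures (the real wappalyzergo ruleset is vastly larger and uses regexes, not plain substrings):

```go
package main

import (
	"fmt"
	"strings"
)

// A tiny illustrative signature set. Libraries like wappalyzergo ship
// thousands of patterns covering HTML, cookies, scripts, and headers.
var htmlSignatures = map[string]string{
	"/wp-content/":   "WordPress",
	"/_next/":        "Next.js",
	"data-reactroot": "React",
}

// detectFromHTML returns every technology whose pattern appears in the body.
func detectFromHTML(body string) []string {
	var found []string
	for pattern, tech := range htmlSignatures {
		if strings.Contains(body, pattern) {
			found = append(found, tech)
		}
	}
	return found
}

func main() {
	body := `<html><head><script src="/_next/static/app.js"></script></head></html>`
	fmt.Println(detectFromHTML(body)) // [Next.js]
}
```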

Layer 2: DNS CNAME Analysis

Every website's DNS records tell a story. When a site uses Cloudflare, its CNAME records typically point to something like cdn.cloudflare.net. Vercel uses cname.vercel-dns.com. AWS CloudFront uses cloudfront.net.

Here's how DNS detection works in practice:

// Simplified DNS CNAME lookup. Note that net.LookupCNAME returns a
// single canonical name (with a trailing dot), not a slice.
cname, err := net.LookupCNAME(domain)
if err != nil {
    // handle lookup failure
}
// Then match the result against known patterns:
// "cloudflare.net"  → Cloudflare
// "fastly.net"      → Fastly
// "vercel-dns.com"  → Vercel
// "amazonaws.com"   → AWS

This catches infrastructure that's completely invisible to HTML scanning:

CNAME Pattern          Technology
*.cloudflare.net       Cloudflare CDN
*.fastly.net           Fastly
*.vercel-dns.com       Vercel
*.amazonaws.com        AWS (S3, CloudFront, ELB)
*.azurewebsites.net    Microsoft Azure
*.herokuapp.com        Heroku
*.netlify.app          Netlify
*.shopify.com          Shopify

DNS lookups add roughly 20-50ms to a detection request, but the intelligence gained is significant. You now know the hosting and CDN layer — information that HTML detection simply cannot provide.
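The suffix matching behind that table can be tested without touching the network, since it operates on the string that net.LookupCNAME returns. A minimal sketch, with an illustrative subset of the suffix map:

```go
package main

import (
	"fmt"
	"strings"
)

// Known CNAME suffixes mapped to the provider they reveal (subset).
var cnameSuffixes = map[string]string{
	"cloudflare.net": "Cloudflare CDN",
	"fastly.net":     "Fastly",
	"vercel-dns.com": "Vercel",
	"amazonaws.com":  "AWS",
}

// detectFromCNAME maps a resolved CNAME target to a known provider.
// DNS answers carry a trailing dot, so strip it before matching.
func detectFromCNAME(cname string) (string, bool) {
	cname = strings.TrimSuffix(cname, ".")
	for suffix, tech := range cnameSuffixes {
		if cname == suffix || strings.HasSuffix(cname, "."+suffix) {
			return tech, true
		}
	}
	return "", false
}

func main() {
	// In production this value comes from net.LookupCNAME(domain).
	tech, ok := detectFromCNAME("cname.vercel-dns.com.")
	fmt.Println(tech, ok) // Vercel true
}
```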

Layer 3: TLS Certificate Inspection

TLS certificates reveal which certificate authority (CA) a site trusts, which often maps directly to their security infrastructure.

// Connect to the site and inspect the leaf TLS certificate
conn, err := tls.Dial("tcp", domain+":443", &tls.Config{})
if err != nil {
    // handle connection failure
}
defer conn.Close()
cert := conn.ConnectionState().PeerCertificates[0]
issuer := ""
if len(cert.Issuer.Organization) > 0 { // guard: some certs omit Organization
    issuer = cert.Issuer.Organization[0]
}
// Match against known issuers:
// "Let's Encrypt"           → Let's Encrypt
// "Cloudflare, Inc."        → Cloudflare SSL
// "Amazon"                  → AWS Certificate Manager
// "DigiCert Inc"            → DigiCert

This detection layer tells you:

  • Let's Encrypt — free automated certificates, common with smaller deployments
  • Cloudflare, Inc. — using Cloudflare's SSL proxy (even if DNS doesn't show it)
  • Amazon — AWS Certificate Manager, confirms AWS infrastructure
  • DigiCert / Sectigo — enterprise-grade certificates, signals larger organization

TLS inspection is especially useful for security audits. Knowing the certificate authority and expiration helps assess a site's security posture.

Layer 4: HTTP Header Matching

HTTP response headers are another goldmine. Many servers and platforms identify themselves through headers that never appear in the HTML:

Server: nginx/1.24.0
X-Powered-By: Express
X-Vercel-Id: iad1::abcde-123456
CF-Cache-Status: HIT
X-Cache: Hit from cloudfront

Custom header patterns I match against:

Header             Value Pattern    Technology
Server             nginx            nginx
Server             Apache           Apache
X-Powered-By       Express          Express.js
X-Powered-By       PHP              PHP
CF-Cache-Status    (any)            Cloudflare
X-Vercel-Id        (any)            Vercel
X-Amz-Cf-Id        (any)            AWS CloudFront

Headers are free — you already have the HTTP response from the HTML scan, so this adds zero latency.

Putting It All Together

When you run all four layers simultaneously, the detection picture changes dramatically. Here's a real comparison:

HTML-only detection:

  • React, Webpack, Google Analytics

Four-layer detection:

  • React, Webpack, Google Analytics (HTML)
  • Cloudflare CDN (DNS)
  • Cloudflare SSL (TLS)
  • nginx (HTTP headers)
  • Vercel (HTTP headers)

That's 7 technologies vs. 3 — more than double the coverage.

Adding CPE Identifiers for Security

Each detected technology gets a CPE (Common Platform Enumeration) identifier where available. CPE is a standardized naming scheme for software, maintained by NIST.

For example:

  • nginx → cpe:2.3:a:f5:nginx:*
  • Apache → cpe:2.3:a:apache:http_server:*
  • WordPress → cpe:2.3:a:wordpress:wordpress:*

With CPE identifiers, you can query the National Vulnerability Database to find known CVEs for every technology in a site's stack. This turns tech detection into a security audit tool.
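As a sketch of that lookup: the NVD CVE API 2.0 exposes a cpeName query parameter that takes a full CPE 2.3 name (all fields filled in, so a concrete version rather than the wildcard forms above). Building the query URL in Go might look like:

```go
package main

import (
	"fmt"
	"net/url"
)

// nvdQueryURL builds a query against the NVD CVE API 2.0 for a given
// CPE name. The cpeName parameter expects a complete CPE 2.3 string.
func nvdQueryURL(cpeName string) string {
	q := url.Values{}
	q.Set("cpeName", cpeName)
	return "https://services.nvd.nist.gov/rest/json/cves/2.0?" + q.Encode()
}

func main() {
	// nginx 1.24.0 as a concrete CPE 2.3 name (vendor "f5" post-acquisition).
	fmt.Println(nvdQueryURL("cpe:2.3:a:f5:nginx:1.24.0:*:*:*:*:*:*:*"))
}
```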

Performance: Why Caching Matters

Multi-layer detection is slower than HTML-only scanning. DNS lookups, TLS handshakes, and the HTTP request itself all add latency. A typical 4-layer scan takes 2-5 seconds.

That's why caching is critical. I use a 24-hour cache keyed on the post-redirect domain (not the input URL — important for sites that redirect). Cached responses return in under 5ms and don't count against API quotas.

The key insight: cache on the final domain after following redirects. If someone queries example.com and it redirects to www.example.com, the cache key should be www.example.com. Otherwise you'll get cache misses for the same site.
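A minimal sketch of that keying logic, assuming the post-redirect URL is available (with net/http's default client, resp.Request.URL holds the final URL after the redirect chain has been followed):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// cacheKey derives the cache key from the URL the scan ended up at
// after redirects, not the URL the user submitted.
func cacheKey(finalURL string) (string, error) {
	u, err := url.Parse(finalURL)
	if err != nil {
		return "", err
	}
	// Normalize: host only (no path, query, or port), lowercased.
	return strings.ToLower(u.Hostname()), nil
}

func main() {
	// e.g. example.com redirected here, so this becomes the key.
	key, _ := cacheKey("https://www.example.com/landing?ref=twitter")
	fmt.Println(key) // www.example.com
}
```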

The Architecture

The whole system is built in Go with:

  • chi router for HTTP routing
  • SQLite for caching, rate limiting, and usage tracking
  • wappalyzergo for HTML fingerprinting
  • net stdlib for DNS lookups
  • crypto/tls stdlib for certificate inspection

It runs on a single Fly.io machine. Total codebase is about 3,000 lines of Go.

I chose Go because:

  1. Excellent concurrency — all 4 detection layers run simultaneously using goroutines
  2. Fast TLS and DNS operations via the standard library
  3. Single binary deployment — no runtime dependencies
  4. Low memory footprint for a single-machine deployment
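As an illustration of point 1, the layers can be fanned out with goroutines and their results merged under a mutex. The layer functions below are stubs standing in for the real detectors:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runLayers executes every detection layer concurrently and merges
// their results. Each layer returns the technologies it found.
func runLayers(layers ...func() []string) []string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		found []string
	)
	for _, layer := range layers {
		wg.Add(1)
		go func(l func() []string) {
			defer wg.Done()
			techs := l() // real layers do DNS, TLS, HTTP work here
			mu.Lock()
			found = append(found, techs...)
			mu.Unlock()
		}(layer)
	}
	wg.Wait()
	sort.Strings(found) // deterministic order for display
	return found
}

func main() {
	results := runLayers(
		func() []string { return []string{"React"} },          // HTML layer stub
		func() []string { return []string{"Cloudflare CDN"} }, // DNS layer stub
		func() []string { return []string{"nginx"} },          // header layer stub
	)
	fmt.Println(results) // [Cloudflare CDN React nginx]
}
```

Total scan time is then bounded by the slowest layer rather than the sum of all four.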

Try It Yourself

I built DetectZeStack as a REST API that implements all four detection layers. You can try the live demo on the homepage — no signup required.

The API also supports:

  • Batch analysis: up to 10 URLs per request
  • Stack comparison: see shared vs. unique technologies across sites
  • Webhook alerts: get notified when a domain is analyzed, with HMAC-signed payloads

Free tier is 100 requests/month.


What detection methods have you found useful for identifying website technologies? I'd love to hear about approaches I haven't considered.
