MikeL

Posted on • Originally published at detectzestack.com

I Built an API That Detects 7,200+ Technologies — Here's How It Works

You visit a website. Within seconds, you want to know: what's it built with? What CDN? What framework? What analytics?

I needed this for a project — bulk tech stack detection across thousands of domains. Wappalyzer's browser extension is great for one-off lookups, but I needed an API that could handle volume, return structured data, and catch things the browser extension misses.

So I built DetectZeStack, a tech stack detection API in Go. It scans 7,200+ technologies using four detection layers. Here's how it works under the hood.

The Problem With Single-Layer Detection

Most tech detection tools rely on one method: matching patterns in HTML, headers, and JavaScript. That's what Wappalyzer does, and it's genuinely good at it.

But it misses things:

  • DNS-level infrastructure (CDNs, hosting providers identified by CNAME records)
  • TLS certificate issuers (which show who issues the site's certificates: Cloudflare, AWS, Let's Encrypt)
  • Infrastructure headers that aren't in the fingerprint database

A site behind Cloudflare with a React frontend might only show "React" with single-layer detection. You'd miss the CDN, the certificate authority, and the hosting provider.

The Four Detection Layers

Layer 1: Wappalyzer Fingerprinting (7,200+ signatures)

The foundation. I use wappalyzergo, which ports Wappalyzer's fingerprint database to Go. It analyzes:

  • HTML content (meta tags, script sources, DOM patterns)
  • HTTP response headers (Server, X-Powered-By, etc.)
  • JavaScript variables and objects
  • Cookie names and patterns

This alone catches most frontend frameworks, CMS platforms, analytics tools, and e-commerce platforms.

Layer 2: DNS CNAME/NS Fingerprinting (111 signatures)

Here's where it gets interesting. When you resolve a domain's DNS, the CNAME chain reveals infrastructure:

stripe.com → stripe.com.cdn.cloudflare.net → ...

That CNAME tells you Cloudflare is involved, even if the HTTP headers are scrubbed clean.

I maintain 111 DNS signatures mapping CNAME patterns to technologies:

  • *.cloudfront.net → Amazon CloudFront
  • *.fastly.net → Fastly
  • *.netlify.app → Netlify
  • *.vercel-dns.com → Vercel
  • *.herokuapp.com → Heroku

The DNS lookup runs in parallel with the HTTP fetch, so it usually adds no latency to the scan. If DNS times out (2-second cap), the scan still returns HTTP-based results.
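The CNAME matching described above can be sketched roughly as follows. The signature map holds only the five example patterns from the list, and `matchCNAME` and `lookupTech` are hypothetical helper names, not the service's actual code. The 2-second cap is applied through a context timeout.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// cnameSignatures maps a CNAME suffix to a technology name.
// Only the five example patterns from the article are included.
var cnameSignatures = map[string]string{
	".cloudfront.net": "Amazon CloudFront",
	".fastly.net":     "Fastly",
	".netlify.app":    "Netlify",
	".vercel-dns.com": "Vercel",
	".herokuapp.com":  "Heroku",
}

// matchCNAME reports the technology whose suffix matches the CNAME target.
func matchCNAME(cname string) (string, bool) {
	// Resolvers return fully qualified names, so drop the trailing root dot.
	cname = strings.ToLower(strings.TrimSuffix(cname, "."))
	for suffix, tech := range cnameSignatures {
		if strings.HasSuffix(cname, suffix) {
			return tech, true
		}
	}
	return "", false
}

// lookupTech resolves the canonical name with a 2-second cap and matches it.
func lookupTech(domain string) (string, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	cname, err := net.DefaultResolver.LookupCNAME(ctx, domain)
	if err != nil {
		return "", false // DNS failed or timed out; the caller keeps its HTTP results
	}
	return matchCNAME(cname)
}

func main() {
	fmt.Println(matchCNAME("d111111abcdef8.cloudfront.net."))
	if tech, ok := lookupTech("example.com"); ok {
		fmt.Println("DNS layer:", tech)
	}
}
```

Note that `LookupCNAME` returns only the final canonical name. Inspecting every hop of a CNAME chain would need a lower-level DNS client, which this sketch does not attempt.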

Layer 3: TLS Certificate Analysis

Every HTTPS connection includes a TLS handshake with the server's certificate. The certificate issuer reveals the SSL/TLS provider:

| Certificate Issuer    | Technology              |
|-----------------------|-------------------------|
| Cloudflare, Inc.      | Cloudflare SSL          |
| Amazon                | AWS Certificate Manager |
| Let's Encrypt         | Let's Encrypt           |
| Google Trust Services | Google Cloud            |
| DigiCert Inc          | DigiCert                |

This is essentially free — the cert info is already in the TLS handshake, no extra request needed.

Layer 4: Custom Header Matching

Some infrastructure providers add unique headers that aren't in Wappalyzer's database:

  • X-Railway-Request-Id → Railway (PaaS)
  • X-Amz-Cf-Pop → Amazon CloudFront (edge location)
  • X-Nf-Request-Id → Netlify

These fill gaps where standard fingerprinting falls short.

Deduplication

When multiple layers detect the same technology, the API deduplicates by name. If Wappalyzer detects "Cloudflare" from headers AND DNS detects "Cloudflare" from CNAME, you get one entry — not two.

Higher-confidence detections take priority. Wappalyzer's pattern match at 100% confidence beats a DNS-only detection at 80%.
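One way to express this rule is a map keyed by technology name that keeps the highest-confidence entry. The `Detection` type and its `Source` field are hypothetical, and the tie-breaking choice (first seen wins) is an assumption, since the article does not say how ties are handled.

```go
package main

import (
	"fmt"
	"sort"
)

// Detection is a hypothetical record produced by any detection layer.
type Detection struct {
	Name       string
	Confidence int    // 0-100
	Source     string // which layer produced it (illustrative field)
}

// dedupe keeps one entry per technology name, preferring higher confidence.
// On equal confidence the earlier entry is kept.
func dedupe(all []Detection) []Detection {
	best := make(map[string]Detection)
	for _, d := range all {
		if cur, seen := best[d.Name]; !seen || d.Confidence > cur.Confidence {
			best[d.Name] = d
		}
	}
	out := make([]Detection, 0, len(best))
	for _, d := range best {
		out = append(out, d)
	}
	// Sort by name so the output order is stable.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	merged := dedupe([]Detection{
		{Name: "Cloudflare", Confidence: 80, Source: "dns"},
		{Name: "Cloudflare", Confidence: 100, Source: "wappalyzer"},
		{Name: "React", Confidence: 100, Source: "wappalyzer"},
	})
	for _, d := range merged {
		fmt.Println(d.Name, d.Confidence, d.Source)
	}
}
```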

What the Output Looks Like

Here's a real scan of stripe.com:

curl "https://detectzestack.com/demo?url=stripe.com"
{
  "url": "https://stripe.com",
  "domain": "stripe.com",
  "technologies": [
    {
      "name": "Amazon S3",
      "categories": ["CDN"],
      "confidence": 100,
      "description": "Amazon S3 or Amazon Simple Storage Service...",
      "website": "https://aws.amazon.com/s3/",
      "icon": "Amazon S3.svg"
    },
    {
      "name": "Amazon Web Services",
      "categories": ["PaaS"],
      "confidence": 100,
      "website": "https://aws.amazon.com/"
    },
    {
      "name": "DigiCert",
      "categories": ["SSL/TLS certificate authority"],
      "confidence": 70
    },
    {
      "name": "HSTS",
      "categories": ["Security"],
      "confidence": 100
    },
    {
      "name": "Nginx",
      "categories": ["Web servers", "Reverse proxies"],
      "confidence": 100,
      "cpe": "cpe:2.3:a:f5:nginx:*:*:*:*:*:*:*:*"
    }
  ],
  "categories": {
    "CDN": ["Amazon S3"],
    "PaaS": ["Amazon Web Services"],
    "Security": ["HSTS"],
    "SSL/TLS certificate authority": ["DigiCert"],
    "Web servers": ["Nginx"]
  },
  "meta": {
    "status_code": 200,
    "tech_count": 5,
    "scan_depth": "full"
  }
}

Notice the DigiCert entry with 70% confidence — that came from TLS certificate analysis (Layer 3), not HTML fingerprinting.
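If you consume this response from Go, structs along these lines should decode it. The field set is inferred from the sample output above, so treat it as a sketch rather than an official schema. The type names and the `parseScan` helper are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Technology mirrors one entry of the "technologies" array in the sample.
type Technology struct {
	Name        string   `json:"name"`
	Categories  []string `json:"categories"`
	Confidence  int      `json:"confidence"`
	Description string   `json:"description,omitempty"`
	Website     string   `json:"website,omitempty"`
	Icon        string   `json:"icon,omitempty"`
	CPE         string   `json:"cpe,omitempty"`
}

// ScanResult mirrors the top-level object in the sample response.
type ScanResult struct {
	URL          string              `json:"url"`
	Domain       string              `json:"domain"`
	Technologies []Technology        `json:"technologies"`
	Categories   map[string][]string `json:"categories"`
	Meta         struct {
		StatusCode int    `json:"status_code"`
		TechCount  int    `json:"tech_count"`
		ScanDepth  string `json:"scan_depth"`
	} `json:"meta"`
}

// parseScan decodes a raw JSON response body.
func parseScan(data []byte) (ScanResult, error) {
	var r ScanResult
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// A trimmed copy of the stripe.com sample, kept short for the example.
	sample := []byte(`{"domain":"stripe.com","technologies":[
		{"name":"DigiCert","categories":["SSL/TLS certificate authority"],"confidence":70}],
		"meta":{"status_code":200,"scan_depth":"full"}}`)
	r, err := parseScan(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, t := range r.Technologies {
		fmt.Printf("%s (%d%%)\n", t.Name, t.Confidence)
	}
}
```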

And here's github.com, which returns 8 technologies:

curl "https://detectzestack.com/demo?url=github.com"
{
  "technologies": [
    { "name": "Amazon S3", "categories": ["CDN"], "confidence": 100 },
    { "name": "Amazon Web Services", "categories": ["PaaS"], "confidence": 100 },
    { "name": "C3.js", "categories": ["JavaScript libraries"], "confidence": 100 },
    { "name": "Contentful", "categories": ["CMS"], "confidence": 100 },
    { "name": "GitHub Pages", "categories": ["PaaS"], "confidence": 100 },
    { "name": "HSTS", "categories": ["Security"], "confidence": 100 },
    { "name": "React", "categories": ["JavaScript frameworks"], "confidence": 100 },
    { "name": "Sectigo", "categories": ["SSL/TLS certificate authority"], "confidence": 70 }
  ]
}

Contentful (CMS), C3.js (charting), React (frontend), Sectigo (TLS) — all from a single API call.

Architecture Decisions

Why Go? Concurrency is first-class. The DNS lookup, HTTP fetch, and TLS extraction all run in parallel goroutines. A typical scan completes in 1-2 seconds.

Why not just wrap the Wappalyzer npm package? Performance and deployability. The Go binary is a single executable, ~15MB, runs on a $3/month Fly.io instance. No Node.js runtime, no headless browser, no Puppeteer.

Why SQLite for storage? The API caches scan results to avoid hammering target sites. SQLite is perfect for this — single-file database, zero configuration, handles thousands of concurrent reads. It runs alongside the API process on the same machine.

Why not headless browser rendering? Some JavaScript-heavy sites would benefit from it, but it would 10x the infrastructure cost and response time. Wappalyzer's static analysis catches the vast majority of technologies. If you need rendered-page analysis, tools like Wappalyzer's browser extension are the right choice.

Try It

The /demo endpoint is free, no signup needed:

# Try it right now
curl "https://detectzestack.com/demo?url=your-site.com"

For production use (higher rate limits, change tracking, history), it's on RapidAPI with a free tier — 100 requests/month, no credit card.

There are also alternatives worth considering: Wappalyzer's npm package if you want to self-host detection, and BuiltWith if you need historical data going back years. DetectZeStack's differentiator is the multi-layer detection approach and the structured API response with confidence scores.


If you're building anything that needs tech stack data — competitive analysis, security auditing, lead enrichment — I'd love to hear about your use case. Drop a comment or find me on Twitter/X.
