DEV Community

MikeL

I Built an API That Detects 7,200+ Technologies on Any Website — Here's How the Detection Works

Every developer has wondered "what is that site built with?" I built an API to answer that question programmatically, and the detection system behind it is more interesting than you'd think.

Here's how it works under the hood.

The Problem

Browser extensions like Wappalyzer can tell you what technologies a website uses. But if you want to check hundreds or thousands of sites programmatically — say, for market research, security audits, or competitive analysis — you need an API.

BuiltWith offers one at $995/month. Wappalyzer's (now owned by Hostinger) starts at $450/month. I wanted something affordable, so I built my own.

The result is DetectZeStack, a Go API that combines four different detection methods to identify technologies on any website.

The Four Detection Layers

A single HTTP request to a website gives you surprisingly little information. To get accurate results, you need to look at multiple signals.

Layer 1: Wappalyzer Fingerprinting (~7,200 signatures)

This is the heavy hitter. The open-source wappalyzergo library maintains thousands of fingerprint patterns that match against:

  • HTTP response headers (e.g., X-Powered-By: Express → Node.js + Express)
  • HTML body content (e.g., <meta name="generator" content="WordPress">)
  • JavaScript globals (e.g., window.Shopify → Shopify)
  • Cookie names (e.g., _ga → Google Analytics)

Each pattern has a confidence score. When a site's response matches a known pattern, the matching technology is reported along with that score.

// The core fingerprinting call
fingerprintsWithInfo := d.wap.FingerprintWithInfo(
    fetchResult.Headers,
    fetchResult.Body,
)

This single function call does most of the work — checking HTTP headers and HTML body against all ~7,200 patterns.

Layer 2: DNS CNAME Resolution (111 signatures)

Here's where it gets interesting. Many hosting platforms and CDNs require you to set a CNAME record pointing to their infrastructure. By resolving a domain's DNS records, you can detect technologies that leave zero traces in HTTP responses.

For example:

  • example.com → d1234.cloudfront.net → Amazon CloudFront
  • example.com → example.netlify.app → Netlify
  • example.com → example.ghost.io → Ghost CMS

// DNS lookup runs in parallel with HTTP fetch
go func() {
    techs := DetectFromDNS(ctx, domain, d.dnsResolver)
    dnsCh <- dnsResult{techs: techs}
}()

This runs concurrently with the HTTP request, so it adds latency only when the DNS lookup is slower than the fetch. I maintain 111 CNAME suffix-to-technology mappings covering CDNs, hosting platforms, email providers, and more.
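The suffix matching itself can be sketched in a few lines. This is an illustration under assumptions, not the body of `DetectFromDNS`: the `cnameSuffixes` table contains only the three examples above, and `matchCNAME` and `detectFromDNS` are hypothetical names. The only library call it relies on is the standard `net.Resolver.LookupCNAME`.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
)

// cnameSuffixes maps a CNAME suffix to a technology name. Only the three
// examples from the article are listed; the full table is not shown there.
var cnameSuffixes = map[string]string{
	".cloudfront.net": "Amazon CloudFront",
	".netlify.app":    "Netlify",
	".ghost.io":       "Ghost",
}

// matchCNAME returns the technology for a canonical name if a suffix matches.
func matchCNAME(cname string) (string, bool) {
	// DNS names are case-insensitive and usually come back with a trailing dot.
	name := strings.ToLower(strings.TrimSuffix(cname, "."))
	for suffix, tech := range cnameSuffixes {
		if strings.HasSuffix(name, suffix) {
			return tech, true
		}
	}
	return "", false
}

// detectFromDNS resolves the domain's canonical name and matches it.
// It needs network access, so main below does not call it.
func detectFromDNS(ctx context.Context, r *net.Resolver, domain string) (string, bool) {
	cname, err := r.LookupCNAME(ctx, domain)
	if err != nil {
		return "", false
	}
	return matchCNAME(cname)
}

func main() {
	fmt.Println(matchCNAME("d1234.cloudfront.net."))
	fmt.Println(matchCNAME("example.com."))
}
```

Note that `LookupCNAME` returns the final canonical name, so intermediate hops in a longer CNAME chain would need a lower-level DNS client to inspect.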

Layer 3: TLS Certificate Inspection (8 issuers)

The TLS handshake happens before any HTTP data is exchanged, and the certificate issuer tells you something about the infrastructure:

  • Cloudflare issues its own certs for proxied sites
  • Amazon certs suggest AWS infrastructure
  • Google Trust Services suggests Google Cloud
  • Let's Encrypt is common on self-hosted setups

var tlsSignatures = []TLSSignature{
    {"Cloudflare", "Cloudflare", []string{"CDN"}},
    {"Amazon", "Amazon Web Services", []string{"Cloud hosting"}},
    {"Let's Encrypt", "Let's Encrypt", []string{"SSL/TLS certificate authority"}},
    // ... 5 more
}

During the HTTP fetch, the TLS certificate issuer is extracted from the connection and matched against known issuers.

Layer 4: Custom HTTP Header Matching (3 signatures)

Some platforms add distinctive headers that wappalyzergo doesn't cover yet. Rather than waiting for upstream, I maintain a small set of custom signatures:

var customHeaderSignatures = []HeaderSignature{
    {"X-Railway-Request-Id", "exact", "Railway", []string{"PaaS"}},
    {"X-Amz-Cf-Pop", "exact", "Amazon CloudFront", []string{"CDN"}},
    {"X-Nf-", "prefix", "Netlify", []string{"PaaS", "CDN"}},
}

These catch services like Railway.app (no Wappalyzer entry) and Netlify edge headers.

How It All Fits Together

The detection flow runs in two parallel paths:

Request: "analyze stripe.com"
         |
    +----+----+
    |         |
  HTTP      DNS
  Fetch    Lookup
    |         |
    +- Wappalyzer fingerprinting (headers + body)
    +- Custom header matching
    +- TLS cert issuer extraction
    |         |
    +----+----+
         |
    Deduplicate & merge
         |
    Sort alphabetically
         |
    Return result

DNS and HTTP run concurrently, so total latency is max(http_time, dns_time) rather than the sum. For most sites, results come back in 1-3 seconds.

The deduplication step is important — if Wappalyzer detects "Cloudflare" from a cf-ray header AND the DNS CNAME points to Cloudflare AND the TLS cert is issued by Cloudflare, we only report it once. Multiple detection paths increase confidence without duplicating results.
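The merge step can be sketched like this. The `Technology` type and `merge` function are hypothetical, and keeping the highest-confidence entry when several layers report the same name is an assumption of this sketch; the article only states that duplicates are collapsed and results are sorted alphabetically.

```go
package main

import (
	"fmt"
	"sort"
)

// Technology is a hypothetical result type shared by all detection layers.
type Technology struct {
	Name       string
	Categories []string
	Confidence int
}

// merge combines results from several layers, keeps one entry per name
// (assumption: the one with the highest confidence), and sorts by name.
func merge(layers ...[]Technology) []Technology {
	byName := make(map[string]Technology)
	for _, layer := range layers {
		for _, t := range layer {
			if prev, ok := byName[t.Name]; !ok || t.Confidence > prev.Confidence {
				byName[t.Name] = t
			}
		}
	}
	out := make([]Technology, 0, len(byName))
	for _, t := range byName {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	httpLayer := []Technology{{Name: "Nginx", Confidence: 100}, {Name: "Cloudflare", Confidence: 100}}
	dnsLayer := []Technology{{Name: "Cloudflare", Confidence: 100}}
	tlsLayer := []Technology{{Name: "Cloudflare", Confidence: 70}}
	for _, t := range merge(httpLayer, dnsLayer, tlsLayer) {
		fmt.Println(t.Name, t.Confidence)
	}
}
```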

Real Output

Here's what the API returns for stripe.com:

{
  "domain": "stripe.com",
  "technologies": [
    {
      "name": "Amazon S3",
      "categories": ["CDN"],
      "confidence": 100
    },
    {
      "name": "Amazon Web Services",
      "categories": ["PaaS"],
      "confidence": 100
    },
    {
      "name": "DigiCert",
      "categories": ["SSL/TLS certificate authority"],
      "confidence": 70
    },
    {
      "name": "HSTS",
      "categories": ["Security"],
      "confidence": 100
    },
    {
      "name": "Nginx",
      "categories": ["Web servers", "Reverse proxies"],
      "confidence": 100,
      "cpe": "cpe:2.3:a:f5:nginx:*:*:*:*:*:*:*:*"
    }
  ],
  "meta": {
    "status_code": 200,
    "tech_count": 5
  }
}

Notice the cpe field on Nginx: that's a Common Platform Enumeration identifier. It lets you cross-reference detected technologies against the National Vulnerability Database (NVD). Not every technology has a CPE, but when one exists, the API includes it.
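For consumers of the API, a small sketch of decoding this response and pulling out the entries that carry a CPE might look like the following. The struct definitions are inferred from the sample JSON above rather than from a published schema, and `withCPE` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// technology and result mirror the fields visible in the sample response.
type technology struct {
	Name       string   `json:"name"`
	Categories []string `json:"categories"`
	Confidence int      `json:"confidence"`
	CPE        string   `json:"cpe,omitempty"`
}

type result struct {
	Domain       string       `json:"domain"`
	Technologies []technology `json:"technologies"`
}

// withCPE decodes a response and returns name -> CPE for every technology
// that has a CPE identifier.
func withCPE(raw []byte) (map[string]string, error) {
	var r result
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, t := range r.Technologies {
		if t.CPE != "" {
			out[t.Name] = t.CPE
		}
	}
	return out, nil
}

// sample is a shortened copy of the stripe.com response shown above.
const sample = `{"domain":"stripe.com","technologies":[
 {"name":"HSTS","categories":["Security"],"confidence":100},
 {"name":"Nginx","categories":["Web servers","Reverse proxies"],"confidence":100,
  "cpe":"cpe:2.3:a:f5:nginx:*:*:*:*:*:*:*:*"}]}`

func main() {
	cpes, err := withCPE([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(cpes)
}
```

The version field in this CPE is a wildcard, so a vulnerability lookup would still need a concrete version from another source to narrow the results.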

SSRF Protection

When you build an API that fetches arbitrary URLs, security is critical. Without protection, someone could pass http://169.254.169.254/latest/meta-data/ and read your cloud instance's metadata.

The fetcher validates every URL before making a request:

  1. Parse and normalize the URL
  2. Resolve the domain to IP addresses
  3. Block private/reserved IP ranges (127.x, 10.x, 169.254.x, etc.)
  4. Block non-HTTP(S) schemes
  5. Follow redirects (up to 10) but re-validate each redirect target

This is the boring-but-critical part that you skip at your peril.

Try It Yourself

The API has a free demo endpoint — no signup, no API key:

curl "https://detectzestack.com/demo?url=shopify.com"

This returns the full detection result. The endpoint is rate-limited to 20 requests per hour per IP, which is enough to play around with.

For production use, there's a free tier (100 requests/month) and a Pro tier ($9/month for 1,000 requests) on RapidAPI.

What I'd Do Differently

If I were starting over:

  1. Start with DNS detection earlier. I added it late and was surprised how much it catches that HTTP-only detection misses. Ghost blogs, Netlify sites, and many email providers are invisible to Wappalyzer but obvious from DNS.

  2. Build the TLS layer from day one. Same story — the certificate issuer is free information you already receive during the handshake. Not extracting it is leaving data on the table.

  3. Use wappalyzergo, not Wappalyzer's npm package. The Go port is maintained by ProjectDiscovery, compiles to a single binary, and runs faster than the Node.js version. If you're building a service, Go is a solid choice here.

The Stack

  • Language: Go 1.24
  • Router: chi
  • Database: SQLite (yes, in production — it works great for read-heavy workloads)
  • Detection: wappalyzergo + custom layers
  • Hosting: Fly.io (single region, single instance)
  • DNS: Cloudflare

Total monthly hosting cost: ~$5.


If you have questions about the detection approach or want to see how a specific site gets detected, drop a comment — I'll run it through the API and share the results.
