You're doing competitive analysis on a competitor's site. Or you're qualifying sales leads and need to know which prospects run Shopify. Or you're on a security audit and need to confirm which CMS versions are deployed across a client's domain portfolio.
In all of these cases, you need to answer the same question: what technology is this website actually running?
This post walks through how to detect website tech stacks — from manual approaches using raw HTTP inspection, to automated detection using an API. By the end you'll have working Python code for both approaches.
## The Manual Approach: What Browsers Know That You Don't See
Before reaching for any tool, it helps to understand what signals technology leaves behind. There are four main layers to inspect.
### 1. HTTP Response Headers

Servers often leak stack information directly in response headers. A quick `curl -I` can reveal a surprising amount:
```python
import httpx

def check_headers(url: str) -> dict:
    resp = httpx.get(url, follow_redirects=True, timeout=10)
    interesting = {}
    header_signals = {
        "x-powered-by": "runtime/framework",
        "server": "web server",
        "x-generator": "CMS",
        "x-drupal-cache": "Drupal",
        "x-shopify-stage": "Shopify",
        "x-wix-request-id": "Wix",
    }
    for header, label in header_signals.items():
        if header in resp.headers:
            interesting[label] = resp.headers[header]
    return interesting

print(check_headers("https://example.com"))
```
Common findings: `X-Powered-By: PHP/8.1.2`, `Server: nginx`, `X-Generator: WordPress 6.4`.
### 2. HTML Source Patterns
The page HTML itself is full of fingerprints — meta tags, script paths, CSS class names, and comment blocks:
```python
import re
import httpx

def check_html(url: str) -> list[str]:
    resp = httpx.get(url, follow_redirects=True, timeout=10)
    html = resp.text
    detected = []
    patterns = {
        "WordPress": [
            r'/wp-content/',
            r'/wp-includes/',
            r'<meta name="generator" content="WordPress',
        ],
        "Shopify": [
            r'cdn\.shopify\.com',
            r'Shopify\.theme',
        ],
        "Next.js": [
            r'__NEXT_DATA__',
            r'/_next/static/',
        ],
        "Nuxt.js": [
            r'__NUXT__',
            r'/_nuxt/',
        ],
        "Wix": [
            r'static\.parastorage\.com',
            r'X-Wix-Meta-Site-Id',
        ],
    }
    for tech, fingerprints in patterns.items():
        if any(re.search(p, html) for p in fingerprints):
            detected.append(tech)
    return detected
```
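To exercise those fingerprints without a live request, the same matching loop works on any HTML string. A minimal offline sketch, using a trimmed copy of the pattern table:

```python
import re

# Trimmed copy of the fingerprint table from check_html above.
patterns = {
    "WordPress": [r'/wp-content/', r'/wp-includes/'],
    "Next.js": [r'__NEXT_DATA__', r'/_next/static/'],
}

def detect_in_html(html: str) -> list[str]:
    # A technology counts as detected if any one fingerprint matches.
    return [
        tech for tech, fingerprints in patterns.items()
        if any(re.search(p, html) for p in fingerprints)
    ]

sample = '<script src="/wp-content/themes/foo/app.js"></script>'
print(detect_in_html(sample))  # ['WordPress']
```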
### 3. DNS Records

DNS records can reveal infrastructure choices — CDN providers, email providers, and hosting platforms often leave clear trails. The `dns.resolver` module below comes from the `dnspython` package (`pip install dnspython`):
```python
import dns.resolver

def check_dns(domain: str) -> dict:
    results = {}
    try:
        cname = dns.resolver.resolve(domain, 'CNAME')
        results['cname'] = [str(r) for r in cname]
    except Exception:
        pass
    try:
        mx = dns.resolver.resolve(domain, 'MX')
        mx_hosts = [str(r.exchange) for r in mx]
        if any('google' in h for h in mx_hosts):
            results['email'] = 'Google Workspace'
        elif any('outlook' in h or 'microsoft' in h for h in mx_hosts):
            results['email'] = 'Microsoft 365'
    except Exception:
        pass
    try:
        txt = dns.resolver.resolve(domain, 'TXT')
        for record in txt:
            record_str = str(record)
            if 'v=spf1' in record_str:
                results['spf'] = record_str
    except Exception:
        pass
    return results
```
### 4. TLS Certificate Details
The TLS certificate's Subject Alternative Names (SANs) can reveal CDN providers and related domains:
```python
import ssl
import socket

def check_tls(domain: str) -> dict:
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(socket.socket(), server_hostname=domain) as s:
        s.connect((domain, 443))
        cert = s.getpeercert()
    info = {
        "issuer": dict(x[0] for x in cert["issuer"]),
        "subject": dict(x[0] for x in cert["subject"]),
        "san": [v for _, v in cert.get("subjectAltName", [])],
    }
    return info
```
Cloudflare-issued certs, for example, are a dead giveaway that a site is behind Cloudflare's CDN.
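Building on `check_tls`, a small helper can turn issuer details into readable hints. The issuer strings matched here are assumptions based on common certificate authorities, not an exhaustive mapping:

```python
def issuer_hints(info: dict) -> list[str]:
    """Turn the dict returned by check_tls into human-readable hints."""
    org = info.get("issuer", {}).get("organizationName", "").lower()
    hints = []
    if "cloudflare" in org:
        hints.append("Behind Cloudflare (Cloudflare-issued cert)")
    if "let's encrypt" in org:
        hints.append("Let's Encrypt (automated issuance)")
    return hints

print(issuer_hints({"issuer": {"organizationName": "Cloudflare, Inc."}}))
# ['Behind Cloudflare (Cloudflare-issued cert)']
```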
## The Problem With DIY Detection
Writing these checks yourself is instructive, but there are serious maintenance problems with the DIY approach:
**Fingerprint databases go stale fast.** New frameworks ship constantly. WordPress updates its patterns. Next.js changes its build output. Your regex collection will bit-rot within months.

**Edge cases are everywhere.** Sites behind CDNs mask headers. Headless CMS setups don't emit typical CMS fingerprints. Some sites serve different content to bots vs. browsers.

**Coverage gaps.** The four layers above catch the obvious cases. But what about detecting that a site uses Segment vs. Heap for analytics? Or Intercom vs. Drift for live chat? Or that their checkout is actually Stripe.js hosted on their own domain? Each of those requires its own fingerprint logic.
You'd be rebuilding Wappalyzer — which, notably, was archived in 2023 after being acquired. The commercial alternative, BuiltWith, runs $295/month at minimum. Neither is practical for a side project or small team.
## Using an API Instead
The Technology Detection API on RapidAPI handles fingerprint maintenance, framework coverage, and edge cases for you. It checks headers, HTML content, DNS, scripts, cookies, and meta tags across hundreds of technologies in a single call.
Install the Python client:
```bash
pip install techdetect
```
Then detect the tech stack of any URL:
```python
from techdetect import TechDetectClient

client = TechDetectClient(api_key="your_rapidapi_key")
result = client.detect("https://shopify.com")

for tech in result.technologies:
    print(f"{tech.name} ({tech.category}): confidence {tech.confidence}%")
```
## Real Output on Real Sites
Here's what you actually get back for a few popular sites.
A WordPress site:
```json
{
  "url": "https://techcrunch.com",
  "technologies": [
    { "name": "WordPress", "category": "CMS", "confidence": 99 },
    { "name": "PHP", "category": "Programming Language", "confidence": 95 },
    { "name": "MySQL", "category": "Database", "confidence": 80 },
    { "name": "Cloudflare", "category": "CDN", "confidence": 97 },
    { "name": "Google Analytics 4", "category": "Analytics", "confidence": 91 },
    { "name": "Jetpack", "category": "WordPress Plugin", "confidence": 88 }
  ]
}
```
A Shopify store:
```json
{
  "url": "https://allbirds.com",
  "technologies": [
    { "name": "Shopify", "category": "Ecommerce", "confidence": 100 },
    { "name": "Shopify Plus", "category": "Ecommerce", "confidence": 85 },
    { "name": "Klaviyo", "category": "Email Marketing", "confidence": 92 },
    { "name": "Yotpo", "category": "Reviews", "confidence": 78 },
    { "name": "Cloudflare", "category": "CDN", "confidence": 97 }
  ]
}
```
A Next.js app:
```json
{
  "url": "https://vercel.com",
  "technologies": [
    { "name": "Next.js", "category": "JavaScript Framework", "confidence": 99 },
    { "name": "React", "category": "JavaScript Library", "confidence": 99 },
    { "name": "Vercel", "category": "PaaS", "confidence": 100 },
    { "name": "TypeScript", "category": "Programming Language", "confidence": 75 }
  ]
}
```
The confidence scores matter here. A score of 99 means the detection is based on a definitive fingerprint (like a meta tag that only WordPress emits). A score of 60-75 means the signal is suggestive but not conclusive.
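If you only want the sure bets, filtering on that score is trivial. A sketch using plain dicts in the same shape as the JSON responses above (the threshold of 90 is my choice, not an API recommendation):

```python
def high_confidence(technologies: list[dict], threshold: int = 90) -> list[str]:
    """Keep only detections at or above the confidence threshold."""
    return [t["name"] for t in technologies if t["confidence"] >= threshold]

detections = [
    {"name": "Next.js", "category": "JavaScript Framework", "confidence": 99},
    {"name": "TypeScript", "category": "Programming Language", "confidence": 75},
]
print(high_confidence(detections))  # ['Next.js']
```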
## Putting It Together: Bulk Detection
Here's a more practical pattern — detecting stacks across a list of URLs and filtering by technology:
```python
import csv
from techdetect import TechDetectClient

client = TechDetectClient(api_key="your_rapidapi_key")

urls = [
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
]

results = []
for url in urls:
    try:
        result = client.detect(url)
        tech_names = [t.name for t in result.technologies]
        results.append({
            "url": url,
            "cms": next((t.name for t in result.technologies if t.category == "CMS"), "Unknown"),
            "technologies": ", ".join(tech_names),
        })
    except Exception as e:
        results.append({"url": url, "cms": "Error", "technologies": str(e)})

# Write to CSV
with open("tech_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "cms", "technologies"])
    writer.writeheader()
    writer.writerows(results)

print("Done. Results written to tech_results.csv")
```
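Once the rows are collected, qualifying them by platform is a one-liner. A sketch against the same row shape used above:

```python
def rows_for_cms(rows: list[dict], cms: str) -> list[str]:
    """Return the URLs whose detected CMS matches `cms`."""
    return [row["url"] for row in rows if row["cms"] == cms]

sample_rows = [
    {"url": "https://a.com", "cms": "WordPress", "technologies": "WordPress, PHP"},
    {"url": "https://b.com", "cms": "Shopify", "technologies": "Shopify, Klaviyo"},
]
print(rows_for_cms(sample_rows, "Shopify"))  # ['https://b.com']
```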
## Pricing Reality Check
If you're doing this at any kind of scale, here's what the options actually cost:
| Tool | Price | Notes |
|---|---|---|
| BuiltWith | $295/month | Full database access; no API on lower tiers |
| Wappalyzer | ~$250/month | Archived in 2023; APIs shutting down |
| SimilarTech | $199/month | Bulk-only; no per-URL API |
| DIY (self-maintained) | Engineering time | Stale within months |
| Technology Detection API | $9/month | 2,000 lookups/month on Pro plan |
For a side project, internal tool, or early-stage startup, the math is straightforward.
## Source Code and Further Reading
The full Python client, including async support and rate limiting, is on GitHub: github.com/dapdevsoftware/techdetect-python
```bash
pip install techdetect
```
API docs and a free tier (no credit card required) are available at RapidAPI. The free plan covers enough requests to prototype and validate your use case before committing to anything.
If you're building something with this — a lead gen tool, a competitive intelligence dashboard, a CMS auditor — drop a comment below. Happy to talk through the architecture.