Fingerprint any website's tech stack from the command line — no Selenium, no headless Chrome

#api #javascript #tutorial #webdev

When you're qualifying leads, auditing competitors, or deciding whether an integration is worth building, the first question is usually "what is this site actually built on?" The browser-based way to answer that — spin up Puppeteer, load the page, sniff window globals and script tags — is slow, memory-hungry, and a pain to run across a list of domains. You just want the server-rendered HTML and headers analyzed, and a clean list back.

That's what /v1/analyze on SiteIntel does: you hand it a URL, it fetches the page server-side, and returns the detected technologies plus the page metadata as JSON.

One request

The API lives at https://siteintel.p.rapidapi.com and authenticates with RapidAPI headers. The url parameter is a full https:// URL, not a bare domain.

curl --request GET \
  --url 'https://siteintel.p.rapidapi.com/v1/analyze?url=https://stripe.com' \
  --header 'X-RapidAPI-Key: YOUR_KEY' \
  --header 'X-RapidAPI-Host: siteintel.p.rapidapi.com'

The same request in Node, using the global fetch that ships in Node 18+. The snippet uses top-level await, so run it as an ES module (for example, a .mjs file):

const params = new URLSearchParams({ url: "https://stripe.com" });

const res = await fetch(
  `https://siteintel.p.rapidapi.com/v1/analyze?${params}`,
  {
    headers: {
      "X-RapidAPI-Key": process.env.RAPIDAPI_KEY,
      "X-RapidAPI-Host": "siteintel.p.rapidapi.com",
    },
  }
);

if (!res.ok) throw new Error(`SiteIntel request failed: HTTP ${res.status}`);
const data = await res.json();
console.log(data.detected_tech); // e.g. ["Cloudflare", "React"]
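
Since the url parameter has to be a full https:// URL, a list of bare domains needs normalizing before it goes anywhere near the API. Here is a minimal sketch of that step. The helper name toTargetUrl is hypothetical (it is not part of SiteIntel), and upgrading http:// inputs to https:// is an assumption based on the "full https:// URL" requirement above, not documented API behavior.

```javascript
// Hypothetical helper, not part of SiteIntel: turn loose user input
// ("stripe.com", " http://example.com/pricing ") into a full https:// URL.
function toTargetUrl(input) {
  const trimmed = input.trim();

  // Prepend a scheme when the input looks like a bare domain.
  const withScheme = /^https?:\/\//i.test(trimmed)
    ? trimmed
    : `https://${trimmed}`;

  // new URL() throws a TypeError on malformed input, so bad rows fail early
  // instead of costing an API call.
  const parsed = new URL(withScheme);

  // Assumption: the API wants https, so upgrade plain http inputs.
  parsed.protocol = "https:";
  return parsed.href;
}

// Usage:
// toTargetUrl("stripe.com")                   -> "https://stripe.com/"
// toTargetUrl(" http://example.com/pricing ") -> "https://example.com/pricing"
```

Passing the result through URLSearchParams, as in the Node snippet above, takes care of the query-string encoding.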

What comes back

The response is a flat object. The fields you'll reach for most:

{
  "query": "https://stripe.com",
  "final_url": "https://stripe.com/",
  "status": 200,
  "fetched_at": "2026-06-30T14:02:11Z",
  "title": "Stripe | Payment Processing Platform",
  "description": "...",
  "canonical": "https://stripe.com/",
  "lang": "en",
  "favicon": "https://stripe.com/favicon.ico",
  "open_graph": {
    "title": "Stripe",
    "image": "https://...",
    "site_name": "Stripe",
    "type": "website"
  },
  "detected_tech": ["Cloudflare", "React"],
  "social_links": ["https://twitter.com/stripe"],
  "emails": [],
  "server": "nginx"
}

Two things worth noting because they save you a parsing step:

detected_tech is a flat array of strings: no nested objects, no confidence scores to dig through. You can call .includes("React") on it directly.
social_links and emails are also flat arrays, pulled from the same fetched page, so you get contact surface and tech fingerprint in one call.

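final_url reflects redirects, so if a site bounces http → https or apex → www, you see where it landed. Check status before trusting the rest: if the fetch did not come back as a 200, fields like title and detected_tech may describe an error page rather than the site itself.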
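
The checks described above (status first, then final_url, then detected_tech) can be bundled into one small function. This is a sketch under assumptions: inspectResult is a hypothetical name, and it treats status as the HTTP status of the fetched page, which is what the sample response suggests but is not confirmed here.

```javascript
// Hypothetical helper: summarize whether a /v1/analyze response is usable.
function inspectResult(data) {
  // Assumption: `status` is the HTTP status of the target page fetch.
  const ok =
    Number.isInteger(data.status) && data.status >= 200 && data.status < 300;

  // Fall back to an empty array so .includes() never hits undefined.
  const tech = Array.isArray(data.detected_tech) ? data.detected_tech : [];

  // Compare hostnames to spot redirects such as apex -> www.
  // A plain http -> https bounce keeps the hostname, so it is not flagged.
  let movedHost = false;
  if (data.query && data.final_url) {
    movedHost =
      new URL(data.query).hostname !== new URL(data.final_url).hostname;
  }

  return { ok, movedHost, tech };
}

// Usage with the fields from the sample response:
const sample = {
  query: "https://stripe.com",
  final_url: "https://stripe.com/",
  status: 200,
  detected_tech: ["Cloudflare", "React"],
};
const summary = inspectResult(sample);
// summary.ok is true, summary.movedHost is false,
// and summary.tech.includes("React") is true.
```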

Something to build with it

Point this at a list of domains and you've got a stack-segmented list with no browser in the loop. The loop below flags every prospect whose detected_tech includes "Shopify"; swap in whichever stack you have a migration offer for:

const domains = ["https://a.com", "https://b.com", "https://c.com"];

for (const url of domains) {
  const res = await fetch(
    `https://siteintel.p.rapidapi.com/v1/analyze?url=${encodeURIComponent(url)}`,
    {
      headers: {
        "X-RapidAPI-Key": process.env.RAPIDAPI_KEY,
        "X-RapidAPI-Host": "siteintel.p.rapidapi.com",
      },
    }
  );
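  if (!res.ok) {
    // Skip failed API calls instead of crashing on a missing field.
    console.error(`${url}: request failed with HTTP ${res.status}`);
    continue;
  }
  const { final_url, status, detected_tech = [], server } = await res.json();
  // Only trust the fingerprint when the target page itself loaded.
  if (status === 200 && detected_tech.includes("Shopify")) {
    console.log(`${final_url} → Shopify (${server})`);
  }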
}

Because each call is a plain HTTP fetch, the loop runs fine in a serverless function or a cron job, with none of the per-domain Chrome startup cost you'd pay with a headless browser.
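
If you want the full stack-segmented list rather than a single Shopify flag, one option is to collect the parsed responses and group them by technology afterwards. The sketch below assumes you have already gathered response objects into an array; segmentByTech is a hypothetical helper, and the domains in the usage example are made up for illustration.

```javascript
// Hypothetical helper: group analyzed sites by each detected technology.
// Input: array of /v1/analyze response objects.
// Output: Map of technology name -> array of final_url strings.
function segmentByTech(results) {
  const segments = new Map();
  for (const { final_url, detected_tech } of results) {
    // Tolerate responses where detected_tech is missing.
    for (const tech of detected_tech ?? []) {
      if (!segments.has(tech)) segments.set(tech, []);
      segments.get(tech).push(final_url);
    }
  }
  return segments;
}

// Usage with illustrative (made-up) results:
const segments = segmentByTech([
  { final_url: "https://shop.example/", detected_tech: ["Shopify", "Cloudflare"] },
  { final_url: "https://app.example/", detected_tech: ["React", "Cloudflare"] },
]);
// segments.get("Cloudflare") -> ["https://shop.example/", "https://app.example/"]
// segments.get("Shopify")    -> ["https://shop.example/"]
```

Because detected_tech is a flat array of strings, the grouping needs no schema knowledge beyond the two fields it reads.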

There are two sibling endpoints on the same base URL when you need more: GET /v1/seo-audit?url=... for an on-page SEO report and GET /v1/screenshot?url=... for a rendered capture.
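
Since all three endpoints share a base URL and take the same url parameter, a single URL builder can cover them. This is only a sketch: endpointUrl is a hypothetical helper, and it builds the request URL without assuming anything about what /v1/seo-audit or /v1/screenshot return, since their response shapes are not described here.

```javascript
const BASE_URL = "https://siteintel.p.rapidapi.com";
// The three paths named in this article.
const ENDPOINTS = ["/v1/analyze", "/v1/seo-audit", "/v1/screenshot"];

// Hypothetical helper: build a request URL for one of the endpoints,
// letting URL/searchParams handle the query-string encoding.
function endpointUrl(path, targetUrl) {
  if (!ENDPOINTS.includes(path)) {
    throw new Error(`Unknown endpoint: ${path}`);
  }
  const requestUrl = new URL(path, BASE_URL);
  requestUrl.searchParams.set("url", targetUrl);
  return requestUrl.href;
}

// Usage:
// endpointUrl("/v1/seo-audit", "https://stripe.com")
//   -> "https://siteintel.p.rapidapi.com/v1/seo-audit?url=https%3A%2F%2Fstripe.com"
```

The same X-RapidAPI-Key and X-RapidAPI-Host headers from the earlier snippets apply when you fetch the resulting URL.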

Working examples and the request snippets are in the repo: https://github.com/clause-netizen/siteintel-api — or grab a managed key on RapidAPI.