DEV Community

clause-netizen
clause-netizen

Posted on

Enrich a CRM lead from nothing but its domain

You get a signup. The form captured one usable field: a work email like dana@northwind.io. Now sales wants to know what this company runs, where to find them, and who to contact. Doing that by hand means opening BuiltWith, scrolling the footer for social links, and grepping the site for a contact address. SiteIntel does the three lookups in one request, so you can run it inside your signup webhook and write the result straight to the CRM.

One request, one URL

You pass a URL. You get back the tech stack, social profiles, and any public emails the site exposes. Here is the curl version against RapidAPI:

curl --request GET \
  --url 'https://siteintel.p.rapidapi.com/v1/analyze?url=https://stripe.com' \
  --header 'X-RapidAPI-Key: YOUR_RAPIDAPI_KEY' \
  --header 'X-RapidAPI-Host: siteintel.p.rapidapi.com'
Enter fullscreen mode Exit fullscreen mode

The two headers are the auth. X-RapidAPI-Key is your subscription key from the RapidAPI dashboard, and X-RapidAPI-Host has to match the API host exactly or RapidAPI rejects the call before SiteIntel ever sees it. Note the url query param takes a full https:// URL, not a bare domain.

Wiring it into a signup handler

Node 18+ ships fetch globally, so there is no dependency to install. You only have a domain off the email, so turn it into a URL before the call. This function returns a flat object you can map onto CRM fields:

async function enrich(domain) {
  const target = `https://${domain}`;
  const res = await fetch(
    `https://siteintel.p.rapidapi.com/v1/analyze?url=${encodeURIComponent(target)}`,
    {
      headers: {
        'X-RapidAPI-Key': process.env.RAPIDAPI_KEY,
        'X-RapidAPI-Host': 'siteintel.p.rapidapi.com',
      },
    }
  );

  if (!res.ok) {
    throw new Error(`SiteIntel ${res.status}: ${await res.text()}`);
  }

  const data = await res.json();

  return {
    technologies: data.detected_tech ?? [],
    socials: data.social_links ?? [],
    emails: data.emails ?? [],
  };
}

// inside your signup webhook
const domain = email.split('@')[1];
const profile = await enrich(domain);
console.log(profile);
Enter fullscreen mode Exit fullscreen mode

A couple of things worth doing in production: pull the domain with email.split('@')[1] and skip the call when it lands on a freemail host like gmail.com, since there is no company site to read. Check res.ok before parsing, because a 403 from RapidAPI returns a plain-text body, not JSON, and res.json() will throw on it.

Reading the response

The payload returns the page's metadata alongside the three lookups. The fields that matter for enrichment look like this:

{
  "query": "https://stripe.com",
  "final_url": "https://stripe.com/",
  "title": "Stripe | Financial Infrastructure...",
  "detected_tech": ["Cloudflare", "React", "HSTS"],
  "social_links": [
    "https://twitter.com/stripe",
    "https://www.linkedin.com/company/stripe"
  ],
  "emails": ["press@stripe.com"]
}
Enter fullscreen mode Exit fullscreen mode

detected_tech is the array I get the most mileage from. It is a flat list of tech names, so routing questions are a one-line membership check — no separate classifier needed: does the stack include Shopify or WooCommerce, is there a CDN like Cloudflare, did the page advertise a generator. social_links is a deduped list of the social URLs found in the markup, so you can filter it for the network you care about and write the LinkedIn URL straight to a column. emails is only what the site publishes publicly in markup, so treat it as a fallback contact, not a verified inbox.

The practical build: a lead-scoring rule in the signup webhook. If detected_tech includes a known commerce or analytics tool, the company is past the hobby stage, so I tag the lead "qualified" and push it to a sales queue:

const commerce = ['Shopify', 'WooCommerce', 'Magento'];
const qualified = profile.technologies.some(t => commerce.includes(t));
Enter fullscreen mode Exit fullscreen mode

Everything else lands in a nurture list. That decision used to wait on a human opening three tabs. Now it writes itself before the welcome email goes out.

One caveat from running this: a domain with an aggressive bot wall or a fully client-rendered site returns thinner detected_tech and an empty emails array. Default those fields with ?? [] like the code above and your handler keeps moving instead of throwing on a missing key.

Working examples and the webhook wiring are in the repo: https://github.com/clause-netizen/siteintel-api — and it is on RapidAPI if you want a managed key: https://rapidapi.com/hidanny0001/api/siteintel

Top comments (0)