DEV Community

DAPDEV

Build a Sales Lead Qualification Tool with Technology Detection

If you sell software, the technology your prospects use matters more than almost any other qualifying signal. A company running Shopify is a categorically different buyer than one running a custom Rails app. A team using HubSpot is further down the sales maturity curve than one using a spreadsheet.

This is called technographic data — knowing what technology stack a prospect uses — and it's how companies like Clearbit and ZoomInfo justify their price tags. The insight is simple: technology choices predict buying behavior, budget range, and fit for your product better than company size or industry alone.

This post shows you how to build a basic version of this yourself, using the Technology Detection API and about 100 lines of Python.


The Use Case: Qualifying Leads by Tech Stack

Let's make this concrete. Say you've built a Shopify plugin — maybe it handles subscription billing, or adds a custom reviews widget, or integrates with a specific 3PL. Your ideal customer is any store running Shopify.

The problem is that most lead lists don't come pre-tagged with "runs Shopify." You get a CSV of domains from a scrape, a trade show contact list, a LinkedIn export — and you have to figure out which ones are actually Shopify stores.

Manually checking 500 URLs is an afternoon of tedious work. Writing your own detector means maintaining fingerprint patterns as Shopify updates its platform. An API call per URL solves the problem cleanly.
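To see why the homegrown route gets tedious, a naive detector is just substring checks against fingerprints you have to keep current yourself. The patterns below are illustrative, not exhaustive, and they drift as Shopify changes its frontend:

```python
import re
import urllib.request

# Fingerprints that commonly indicate Shopify today. Keeping this list
# accurate over time is exactly the maintenance burden you'd be signing up for.
SHOPIFY_PATTERNS = [
    re.compile(r"cdn\.shopify\.com"),
    re.compile(r"Shopify\.theme", re.IGNORECASE),
    re.compile(r"myshopify\.com"),
]

def looks_like_shopify(html: str) -> bool:
    """Return True if any known Shopify fingerprint appears in the page HTML."""
    return any(p.search(html) for p in SHOPIFY_PATTERNS)

def fetch_html(url: str) -> str:
    """Fetch raw HTML for a URL (no JS rendering, so SPA sites may slip through)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Usage: looks_like_shopify(fetch_html("https://example-store.com"))
```

It works until it doesn't: no JS rendering, no confidence scores, and a pattern list you have to babysit.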

Here's the basic pattern:

from techdetect import TechDetectClient

client = TechDetectClient(api_key="your_rapidapi_key")

def is_shopify(url: str) -> bool:
    result = client.detect(url)
    return any(
        t.name == "Shopify" and t.confidence >= 80
        for t in result.technologies
    )

print(is_shopify("https://allbirds.com"))   # True
print(is_shopify("https://techcrunch.com")) # False

That's it. Now let's scale it up.


Bulk Scanning a Lead List

Assume you have a CSV of prospect domains — call it leads.csv — with a domain column. Here's a script that scans all of them, tags each with their detected tech stack, and writes results to a new CSV.

import csv
import time
from techdetect import TechDetectClient

client = TechDetectClient(api_key="your_rapidapi_key")

TARGET_TECH = "Shopify"
CONFIDENCE_THRESHOLD = 80

def detect_with_retry(url: str, retries: int = 2):
    for attempt in range(retries + 1):
        try:
            return client.detect(url)
        except Exception as e:
            if attempt < retries:
                time.sleep(2 ** attempt)
            else:
                raise

input_file = "leads.csv"
output_file = "leads_tagged.csv"

with open(input_file, newline="") as infile, open(output_file, "w", newline="") as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ["is_target", "detected_technologies", "confidence"]
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    for row in reader:
        domain = row["domain"].strip()
        url = domain if domain.startswith("http") else f"https://{domain}"

        try:
            result = detect_with_retry(url)
            target_tech = next(
                (t for t in result.technologies if t.name == TARGET_TECH),
                None
            )
            row["is_target"] = "yes" if (target_tech and target_tech.confidence >= CONFIDENCE_THRESHOLD) else "no"
            row["detected_technologies"] = ", ".join(t.name for t in result.technologies)
            row["confidence"] = target_tech.confidence if target_tech else 0
        except Exception as e:
            row["is_target"] = "error"
            row["detected_technologies"] = str(e)
            row["confidence"] = 0

        writer.writerow(row)
        print(f"{domain}: {row['is_target']}")

Run this against a 500-row lead list overnight, and you wake up to a pre-qualified CSV you can import directly into your CRM.
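If your CRM import should contain only the qualified rows, a small follow-up filter on the tagged file does it. Column and file names below match the script above; adjust them if yours differ:

```python
import csv

def filter_qualified(input_file: str, output_file: str) -> int:
    """Copy only rows tagged is_target == 'yes' to a new CSV; return the count kept."""
    with open(input_file, newline="") as infile, \
         open(output_file, "w", newline="") as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = 0
        for row in reader:
            if row["is_target"] == "yes":
                writer.writerow(row)
                kept += 1
    return kept

# Usage: kept = filter_qualified("leads_tagged.csv", "qualified_leads.csv")
```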


Building a Lead Scoring System

Binary filtering (Shopify / not Shopify) is just the start. The more interesting approach is lead scoring — ranking leads by how well their tech stack matches your ideal customer profile.

The intuition: a Shopify store using Stripe, Klaviyo, and Google Analytics is more sophisticated — and likely higher-revenue — than one using only basic Shopify with no third-party tooling.

Here's a simple scoring model:

from techdetect import TechDetectClient

client = TechDetectClient(api_key="your_rapidapi_key")

# Define scoring rules: (technology_name, points, reason)
SCORE_RULES = [
    ("Shopify",          30, "core platform match"),
    ("Shopify Plus",     25, "high-revenue indicator"),
    ("Stripe",           10, "payment sophistication"),
    ("Klaviyo",          10, "email marketing investment"),
    ("Google Analytics", 5,  "tracking maturity"),
    ("Hotjar",           8,  "CRO investment"),
    ("Yotpo",            8,  "reviews platform — growing brand"),
    ("Recharge",         15, "subscription billing — recurring revenue"),
    ("Gorgias",          10, "customer support investment"),
    ("Loop Returns",     10, "returns management — scale indicator"),
]

def score_lead(url: str) -> dict:
    result = client.detect(url)
    tech_names = {t.name for t in result.technologies}

    score = 0
    matched_signals = []

    for tech_name, points, reason in SCORE_RULES:
        if tech_name in tech_names:
            score += points
            matched_signals.append(f"+{points} {tech_name} ({reason})")

    tier = (
        "hot"  if score >= 60 else
        "warm" if score >= 35 else
        "cold" if score >= 15 else
        "disqualified"
    )

    return {
        "url": url,
        "score": score,
        "tier": tier,
        "signals": matched_signals,
        "all_technologies": [t.name for t in result.technologies],
    }

# Example
result = score_lead("https://somestore.com")
print(f"Score: {result['score']} ({result['tier'].upper()})")
for signal in result["signals"]:
    print(f"  {signal}")

A store scoring 60+ (Shopify Plus + Recharge + Klaviyo + Gorgias) is a high-intent, high-budget lead. One scoring 15 (bare Shopify, nothing else) might be too early-stage to convert.


Full Pipeline: Scan 100 URLs, Output a Prioritized CSV

Here's the complete script combining bulk scanning with scoring:

import csv
import time
from techdetect import TechDetectClient

client = TechDetectClient(api_key="your_rapidapi_key")

SCORE_RULES = [
    ("Shopify",          30, "core platform match"),
    ("Shopify Plus",     25, "high-revenue indicator"),
    ("Stripe",           10, "payment sophistication"),
    ("Klaviyo",          10, "email marketing investment"),
    ("Google Analytics", 5,  "tracking maturity"),
    ("Hotjar",           8,  "CRO investment"),
    ("Yotpo",            8,  "reviews platform"),
    ("Recharge",         15, "subscription billing"),
    ("Gorgias",          10, "customer support investment"),
    ("Loop Returns",     10, "returns management"),
]

def score_url(url: str) -> dict:
    try:
        result = client.detect(url)
        tech_names = {t.name for t in result.technologies}
        score = sum(pts for name, pts, _ in SCORE_RULES if name in tech_names)
        signals = [name for name, _, _ in SCORE_RULES if name in tech_names]
        tier = "hot" if score >= 60 else "warm" if score >= 35 else "cold" if score >= 15 else "miss"
        return {
            "url": url,
            "score": score,
            "tier": tier,
            "matched_signals": ", ".join(signals),
            "all_tech": ", ".join(t.name for t in result.technologies),
            "error": "",
        }
    except Exception as e:
        return {"url": url, "score": 0, "tier": "error", "matched_signals": "", "all_tech": "", "error": str(e)}

# Load URLs from input file
with open("prospects.csv", newline="") as f:
    domains = [row["domain"].strip() for row in csv.DictReader(f)]

results = []
for i, domain in enumerate(domains):
    url = domain if domain.startswith("http") else f"https://{domain}"
    print(f"[{i+1}/{len(domains)}] Scanning {url}...")
    results.append(score_url(url))
    time.sleep(0.5)  # stay within rate limits on free/basic plans

# Sort by score descending
results.sort(key=lambda r: r["score"], reverse=True)

# Write output
with open("scored_leads.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "score", "tier", "matched_signals", "all_tech", "error"])
    writer.writeheader()
    writer.writerows(results)

# Print summary
hot = sum(1 for r in results if r["tier"] == "hot")
warm = sum(1 for r in results if r["tier"] == "warm")
cold = sum(1 for r in results if r["tier"] == "cold")
miss = sum(1 for r in results if r["tier"] == "miss")
print(f"\nDone. {hot} hot / {warm} warm / {cold} cold / {miss} not a match")
print("Results written to scored_leads.csv")

Output will be a CSV sorted by score, so your highest-priority leads are at the top. Import it into your CRM, assign the "hot" tier to an SDR queue, and run the cold/warm tier through a lower-touch sequence.
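As a bridge toward that CRM import, one option is to map each scored row to the payload shape your CRM expects before POSTing it. The property names below are placeholders, not a real HubSpot or Pipedrive schema; align them with your CRM's actual custom fields:

```python
def to_crm_payload(row: dict) -> dict:
    """Map a scored-lead row (from the pipeline above) to a generic CRM
    'properties' payload. Property names (website, lead_score, lead_tier,
    tech_stack) are hypothetical: substitute your CRM's real field names."""
    return {
        "properties": {
            "website": row["url"],
            "lead_score": row["score"],
            "lead_tier": row["tier"],
            "tech_stack": row["all_tech"],
        }
    }

# Usage: payloads = [to_crm_payload(r) for r in results if r["tier"] == "hot"]
```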


The Cost Math

This kind of enrichment used to require either a Clearbit subscription ($X,XXX/month) or a ZoomInfo seat (similar ballpark). For a developer at a small SaaS company, those price points are hard to justify before you've validated the channel.

Here's what this approach costs with the Technology Detection API:

| Volume | Plan | Cost |
| --- | --- | --- |
| Up to 100 URLs/month | Free tier | $0 |
| Up to 2,000 URLs/month | Pro | $9/month |
| Up to 10,000 URLs/month | Ultra | ~$29/month |

Scanning 2,000 leads per month for $9 (under half a cent per lead) is a reasonable budget for almost any growth experiment. If you close even one deal from a batch of enriched leads, the ROI is obvious.


Next Steps

The script above is a starting point. A few directions worth extending:

  • Add CRM integration — POST scored leads directly to a HubSpot or Pipedrive custom property instead of writing a CSV
  • Schedule it as a cron job — Run the scan weekly against new leads as they enter your pipeline
  • Expand the scoring model — Add negative signals (e.g., "uses WooCommerce" scores negative if you only support Shopify)
  • Filter by geography — Combine with a WHOIS or IP geolocation lookup to target specific markets
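The negative-signal idea from the list above is a small change to the scoring loop: rules can carry negative points, and a floor at zero keeps the tiering sane. The weights here are illustrative:

```python
# Scoring rules may carry negative points for disqualifying signals.
SCORE_RULES = [
    ("Shopify",     30, "core platform match"),
    ("Klaviyo",     10, "email marketing investment"),
    ("WooCommerce", -40, "wrong platform: cannot install a Shopify plugin"),
]

def score_tech_stack(tech_names: set) -> int:
    """Sum matched rule points, floored at zero so tiers stay meaningful."""
    raw = sum(pts for name, pts, _ in SCORE_RULES if name in tech_names)
    return max(raw, 0)

print(score_tech_stack({"Shopify", "Klaviyo"}))      # 40
print(score_tech_stack({"WooCommerce", "Klaviyo"}))  # 0
```

A WooCommerce store with great marketing tooling still scores zero, which is exactly what you want if your product only runs on Shopify.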

The Python client and full source: github.com/dapdevsoftware/techdetect-python

Get an API key (free, no credit card required): Technology Detection API on RapidAPI

If you build something with this — especially if you adapt the scoring model for a specific niche — I'd be curious to hear what signals turned out to be most predictive.
