Donny
Website Tech Detector API — Complete Guide

Last year, I was working on a competitor analysis tool and found myself constantly trying to figure out what tech stack other websites were using. After building and rebuilding fragile scrapers that broke every other week, I decided to turn my frustration into a product. Here's how I built a production-ready Website Tech Detector API that now serves thousands of requests daily.

The Problem with DIY Scrapers

If you've ever tried to detect website technologies programmatically, you know the pain. I started with simple approaches like:

// My naive first attempt
const detectWordPress = (html) => {
  return html.includes('wp-content') || html.includes('wordpress');
};

This worked... until it didn't. Websites use CDNs, minify assets, implement CSP headers, and actively block scrapers. My scripts would work locally but fail in production. Rate limiting kicked in. Cloudflare started returning 403s. I was spending more time maintaining scrapers than building features.

The breaking point came when a client's competitor analysis dashboard showed 60% "unknown" results because half my detection logic had silently failed. That's when I realized this problem was worth solving properly.

Architecture: Express → Railway → RapidAPI

I chose a straightforward but robust stack:

  • Node.js + Express for the API server
  • Railway for hosting (amazing developer experience)
  • RapidAPI for distribution and monetization
  • Redis for caching and rate limiting
  • Puppeteer for JavaScript-heavy sites

The key insight was building detection logic that combines multiple signals rather than relying on single indicators. Instead of just checking for wp-content, I look for WordPress-specific patterns across HTML structure, HTTP headers, JavaScript variables, and CSS classes.

Here's my core architecture:

// detector.js
class TechDetector {
  constructor() {
    this.detectors = new Map();
    this.loadDetectionRules();
  }

  // `signals` is the merged output of the signal gatherers
  // (HTML, headers, DNS records, rendered DOM), collected by the caller
  async analyze(signals) {
    const technologies = [];

    for (const [tech, detector] of this.detectors) {
      const confidence = detector.check(signals);
      if (confidence > 0.7) {
        technologies.push({
          name: tech,
          confidence,
          category: detector.category
        });
      }
    }

    return technologies;
  }
}
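I've elided loadDetectionRules above; conceptually it just registers rule objects into the Map, keyed by technology name. Here's a minimal sketch with two toy rules (the real rule set is much larger):

```javascript
// Two toy rules in the shape the detector expects; the real set covers
// CMSs, frameworks, CDNs, analytics, and more
const rules = {
  jQuery: {
    category: 'JavaScript Library',
    check: (signals) => (signals.jsGlobals?.jQuery ? 0.9 : 0),
  },
  Nginx: {
    category: 'Web Server',
    check: (signals) => (/nginx/i.test(signals.headers?.server ?? '') ? 0.9 : 0),
  },
};

// loadDetectionRules just copies rules into the Map
const detectors = new Map(Object.entries(rules));

// Scoring pass, same shape as TechDetector.analyze
const signals = { headers: { server: 'nginx/1.25.3' } };
const technologies = [];
for (const [tech, detector] of detectors) {
  const confidence = detector.check(signals);
  if (confidence > 0.7) {
    technologies.push({ name: tech, confidence, category: detector.category });
  }
}
console.log(technologies); // only Nginx clears the 0.7 threshold
```

Keeping rules as plain data like this means adding a new technology is a config change, not a code change.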

One Real Endpoint Walkthrough

Let me walk you through my /analyze endpoint, the heart of the API:

app.post('/analyze', rateLimit, async (req, res) => {
  const { url } = req.body;

  // Input validation
  if (!isValidUrl(url)) {
    return res.status(400).json({
      error: 'Invalid URL provided'
    });
  }

  // Check cache first
  const cacheKey = `analysis:${url}`;
  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json(JSON.parse(cached));
  }

  const startTime = Date.now();

  try {
    // Gather multiple signal types in parallel; allSettled means one
    // failed gatherer doesn't sink the whole analysis
    const [html, headers, dns, rendered] = await Promise.allSettled([
      fetchHTML(url),
      analyzeHeaders(url),
      checkDNS(url),
      renderWithPuppeteer(url) // only for JavaScript-heavy sites
    ]);

    // Fold the settled results into a single signals object
    const signals = {
      html: html.status === 'fulfilled' ? html.value : null,
      headers: headers.status === 'fulfilled' ? headers.value : null,
      dns: dns.status === 'fulfilled' ? dns.value : null,
      rendered: rendered.status === 'fulfilled' ? rendered.value : null
    };

    const detector = new TechDetector();
    const technologies = await detector.analyze(signals);

    const result = {
      url,
      technologies,
      analyzedAt: new Date().toISOString(),
      processingTime: Date.now() - startTime
    };

    // Cache for 24 hours
    await redis.setex(cacheKey, 86400, JSON.stringify(result));

    res.json(result);

  } catch (error) {
    console.error('Analysis failed:', error);
    res.status(500).json({
      error: 'Analysis failed',
      message: process.env.NODE_ENV === 'development' ? error.message : undefined
    });
  }
});
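The endpoint leans on an isValidUrl helper I haven't shown. A minimal version built on the WHATWG URL class looks like this; the protocol allowlist is the important part, since you don't want the analyzer fetching file:// or other internal schemes:

```javascript
// Sketch of the isValidUrl helper used by the /analyze route.
// The URL constructor throws on malformed input, and the protocol
// check keeps the analyzer pointed at the public web.
const isValidUrl = (input) => {
  try {
    const parsed = new URL(input);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
};

console.log(isValidUrl('https://example.com')); // true
console.log(isValidUrl('ftp://example.com'));   // false
console.log(isValidUrl('not a url'));           // false
```

A production version would go further (blocking private IP ranges, for instance) to close off SSRF, but this is the skeleton.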

The magic happens in signal gathering. Here's a simplified version of how I detect React:

const reactDetector = {
  category: 'JavaScript Framework',
  check: (signals) => {
    let confidence = 0;

    // Check for React in global variables
    if (signals.jsGlobals?.React) confidence += 0.4;

    // Look for React DOM attributes
    if (signals.html?.includes('data-reactroot')) confidence += 0.3;

    // Check for typical React bundle patterns
    if (signals.html?.match(/react[\.-]\d+/)) confidence += 0.2;

    // Look for JSX-compiled patterns
    if (signals.html?.match(/React\.createElement/)) confidence += 0.1;

    return Math.min(confidence, 1.0);
  }
};

Pricing Strategy for API Monetization

Pricing APIs is tricky. I studied successful APIs like Clearbit and IPinfo and landed on a freemium model:

  • Free tier: 100 requests/month
  • Starter: $9/month for 1,000 requests
  • Professional: $29/month for 10,000 requests
  • Enterprise: Custom pricing for 100k+ requests

The key insight: price based on value, not cost. A fresh analysis might cost me $0.02 in compute, but it saves developers hours of work. With most requests served from cache, the blended cost per request is far lower, so even the starter tier's effective price of $0.009 per request ($9 / 1,000) leaves healthy margins while remaining accessible.
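The unit economics are easy to sanity-check. The $0.02 compute cost and the roughly 80% cache hit rate are the figures from this post; everything else is arithmetic:

```javascript
// Per-request price implied by each tier
const starterPrice = 9 / 1000;  // $0.009 per request
const proPrice = 29 / 10000;    // $0.0029 per request

// Blended compute cost per request: only cache misses
// pay the full fresh-analysis cost
const freshCost = 0.02; // $ per uncached analysis
const missRate = 0.2;   // ~80% of requests come from Redis
const blendedCost = freshCost * missRate; // ≈ $0.004

console.log(starterPrice > blendedCost); // true: starter clears blended compute cost
```

One wrinkle worth noticing: at full quota utilization the Professional tier's $0.0029/request sits below the blended compute cost, so margins there depend on typical under-utilization and on the cache hit rate holding up.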

RapidAPI handles billing and provides great analytics. I can see exactly which endpoints are popular and adjust pricing accordingly.

Lessons Learned

1. Reliability beats features early on. I spent way too much time adding edge case detections before nailing the basics. A 99% accurate WordPress detector is infinitely more valuable than 50% accurate detection for 100 frameworks.

2. Caching is everything. With 24-hour cache TTL, I serve 80% of requests from Redis. This keeps costs low and response times under 100ms.
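The pattern is simple enough to pull into a cache-aside helper. This is a sketch: the store is anything with get and setex (an ioredis client fits, mirroring the redis.get / redis.setex calls in the endpoint), stubbed here with a Map so it runs standalone:

```javascript
// Cache-aside helper: check the store, otherwise compute and cache.
const cacheWrap = async (store, key, ttlSeconds, compute) => {
  const cached = await store.get(key);
  if (cached) return JSON.parse(cached);

  const fresh = await compute();
  await store.setex(key, ttlSeconds, JSON.stringify(fresh));
  return fresh;
};

// In-memory stand-in for Redis (TTL ignored for brevity)
const makeStubStore = () => {
  const data = new Map();
  return {
    get: async (key) => data.get(key) ?? null,
    setex: async (key, _ttl, value) => { data.set(key, value); },
  };
};

// Usage: the expensive analysis runs once; the second call is a cache hit
(async () => {
  const store = makeStubStore();
  let analyses = 0;
  const analyze = async () => { analyses += 1; return { technologies: ['WordPress'] }; };

  await cacheWrap(store, 'analysis:https://example.com', 86400, analyze);
  await cacheWrap(store, 'analysis:https://example.com', 86400, analyze);
  console.log(analyses); // 1
})();
```

Factoring it out this way also makes the hit-rate math testable without a live Redis.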

3. Rate limiting saves your sanity. Implement it from day one. Even a basic windowed limiter stops abusive clients cold, and a sliding window is more forgiving at window boundaries than a fixed one:

const { rateLimit: createRateLimiter } = require('express-rate-limit');

// Note: express-rate-limit counts in a fixed window by default; for a
// true sliding window, back it with a Redis store (e.g. rate-limit-redis)
// or use a library like rate-limiter-flexible.
const rateLimit = createRateLimiter({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per window
  standardHeaders: true, // send RateLimit-* response headers
  legacyHeaders: false, // drop the deprecated X-RateLimit-* headers
});

4. Error handling is a feature. Good error messages with clear remediation steps reduce support tickets dramatically.

The API now processes 50,000+ requests monthly with 99.8% uptime. Not bad for a side project that started as a personal frustration!

Building developer tools taught me that the best products solve problems you've personally felt. If you're frustrated by existing solutions, there's probably a business opportunity waiting.

Ready to stop building fragile scrapers? Check out the Website Tech Detector API and get back to building features that matter.
