I've been running a WHOIS API since 2016. For a long time, most users were doing one-off lookups, maybe a few hundred a day. Then at some point customers started showing up with different needs entirely. Threat intel pipelines, domain portfolio audits, registrant enrichment at scale. One customer hit 30 million requests in a single month.
That's a different problem from building a WHOIS wrapper. And most of the articles I found while working through it were either too basic or skipped the parts that actually bite you in production.
So here's what I've learned.
Rate limits are the easy part
The obvious problem is staying under your API rate limit. 429, back off, retry. Everyone knows this. But with WHOIS specifically there's a second layer: the upstream registries themselves. Hit certain ccTLDs too aggressively and you get silent failures, not clean errors. No 429, just empty responses or stale data that looks valid.
I ran into this with .nu and .gr. Both would return what looked like a successful response, but the data was either missing fields or inconsistent between calls on the same domain. It took a while to realize the issue wasn't my code; the registry was quietly throttling me upstream.
The fix is to treat unexpectedly empty fields as a soft error worth retrying, instead of relying on HTTP status codes alone.
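Concretely, that means wrapping whatever lookup function you already have with a check on the parsed body, not just the status code. Here's a minimal sketch in Python; the required field names are illustrative, so swap in whichever fields your pipeline actually depends on:

import time

REQUIRED_FIELDS = ('registrar', 'created')  # illustrative; adjust to your schema

def lookup_with_soft_retry(lookup, domain, retries=3):
    # Retry when the response parses fine but required fields came back
    # empty, which is how some ccTLD registries signal throttling.
    data = None
    for attempt in range(1, retries + 1):
        data = lookup(domain)  # any callable returning a parsed WHOIS dict
        if all(data.get(field) for field in REQUIRED_FIELDS):
            return data
        time.sleep(2 ** attempt)  # back off before retrying the soft failure
    return data  # last attempt, possibly still incomplete; flag it downstream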
Node.js: paced sequential
For most jobs under 100k domains, sequential with proper pacing is boring and reliable:
const API_KEY = 'YOUR_API_KEY';
const RATE_LIMIT = 60; // req/min, adjust to your plan

const sleep = (ms) => new Promise(r => setTimeout(r, ms));

async function whoisLookup(domain, retries = 3) {
  const url = `https://whoisjson.com/api/v1/whois?domain=${encodeURIComponent(domain)}`;
  for (let attempt = 1; attempt <= retries; attempt++) {
    const res = await fetch(url, {
      headers: { Authorization: `TOKEN=${API_KEY}` }
    });
    if (res.status === 429) {
      await sleep(2 ** attempt * 1000); // exponential backoff: 2s, 4s, 8s
      continue;
    }
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${domain}`);
    return res.json();
  }
  throw new Error(`${domain}: exceeded retry limit`);
}

async function bulkWhois(domains) {
  const results = [];
  const interval = 60_000 / RATE_LIMIT; // ms between request starts
  for (const domain of domains) {
    const t0 = Date.now();
    try {
      const data = await whoisLookup(domain);
      results.push({ domain, ok: true, data });
    } catch (err) {
      results.push({ domain, ok: false, error: err.message });
    }
    // Sleep only the remainder of the interval, not the full interval.
    const elapsed = Date.now() - t0;
    if (elapsed < interval) await sleep(interval - elapsed);
  }
  return results;
}
One thing most examples get wrong: they sleep a fixed interval regardless of how long the request took. If your interval is 1000ms and the lookup took 800ms, sleep 200ms, not 1000ms. On a 50k domain list that difference is 30+ minutes of wasted time.
Node.js: parallel with p-limit
When you need more throughput, controlled parallelism with p-limit works well. I've found concurrency 5 to be a good default on most plans. Going higher adds queuing overhead without proportional speed gains.
import pLimit from 'p-limit';

const limit = pLimit(5);

async function bulkWhoisParallel(domains) {
  const tasks = domains.map(domain =>
    limit(() => whoisLookup(domain))
  );
  const results = await Promise.allSettled(tasks);
  return results.map((r, i) => ({
    domain: domains[i],
    ok: r.status === 'fulfilled',
    data: r.status === 'fulfilled' ? r.value : null,
    error: r.status === 'rejected' ? r.reason.message : null,
  }));
}
Python: ThreadPoolExecutor
import threading
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

API_KEY = 'YOUR_API_KEY'
MAX_WORKERS = 5
RATE_LIMIT = 60  # req/min, adjust to your plan

session = requests.Session()
session.headers['Authorization'] = f'TOKEN={API_KEY}'

# Workers share one lock so request starts stay evenly spaced. Sleeping in
# the results loop wouldn't throttle anything: the pool runs independently
# of whoever is consuming the futures.
pace_lock = threading.Lock()
next_slot = 0.0

def pace():
    global next_slot
    interval = 60.0 / RATE_LIMIT
    with pace_lock:
        now = time.monotonic()
        wait = next_slot - now
        next_slot = max(now, next_slot) + interval
    if wait > 0:
        time.sleep(wait)

def whois_lookup(domain, retries=3):
    for attempt in range(1, retries + 1):
        pace()
        r = session.get('https://whoisjson.com/api/v1/whois',
                        params={'domain': domain}, timeout=10)
        if r.status_code == 429:
            time.sleep(2 ** attempt)  # exponential backoff
            continue
        r.raise_for_status()
        return r.json()
    raise RuntimeError(f'{domain}: exceeded retry limit')

def bulk_whois(domains):
    results = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(whois_lookup, d): d for d in domains}
        for future in as_completed(futures):
            domain = futures[future]
            try:
                data = future.result()
                results.append({'domain': domain, 'ok': True, 'data': data})
            except Exception as exc:
                results.append({'domain': domain, 'ok': False, 'error': str(exc)})
    return results
Use requests.Session(). It reuses the TCP connection across requests and makes a noticeable difference at scale. On large lists I've measured 15-20% reduction in total runtime just from this.
Checkpointing for big runs
Anything over 100k domains should write results incrementally. If your process crashes at domain 80k, you don't want to start over.
import json, os

CHECKPOINT = 'results.jsonl'

def load_done():
    # Read the file once at startup; re-scanning it per domain would make
    # the whole run quadratic on large lists.
    done = set()
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            for line in f:
                done.add(json.loads(line)['domain'])
    return done

def save(result):
    with open(CHECKPOINT, 'a') as f:
        f.write(json.dumps(result) + '\n')
Then in your main loop, load the done set once at startup and skip anything already in it. JSONL works better than JSON here because you can append a record without reading or rewriting the whole file.
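Wired together, the driver is just a filter plus a save per result. A minimal sketch, assuming the load_done and save helpers above and the whois_lookup from the Python section:

def run(domains):
    done = load_done()
    for domain in domains:
        if domain in done:
            continue  # already processed on a previous run
        try:
            data = whois_lookup(domain)
            save({'domain': domain, 'ok': True, 'data': data})
        except Exception as exc:
            save({'domain': domain, 'ok': False, 'error': str(exc)})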
Picking a plan
If you're processing under 5k domains per day, Pro at $10/mo covers it. For daily sweeps in the 50k-500k range, Ultra or Scale. Above that, the unlimited plans make more sense financially, since you're paying for a rate limit rather than per request.
The full API is at whoisjson.com if you want to try it. Free tier is 1k requests/month, no card needed.
One thing I'd do differently if I started over: build the checkpointing in from day one instead of adding it after the first crash. It's 10 lines of code and it's saved me multiple times. WHOIS pipelines run long enough that something will always go wrong halfway through.