Bulk Domain WHOIS Lookup — API Alternatives to Whois.com (2026)
If you have ever tried to look up WHOIS info for more than 50 domains in a row, you know the drill: whois.com throws a captcha, the whois CLI stops resolving halfway through, and GoDaddy's WHOIS widget silently starts returning empty responses after some undocumented throttle. Running proper bulk WHOIS in 2026 is harder than it should be — thanks in part to GDPR redactions, thick vs. thin registrar models, and a fragmented TLD landscape.
A quick 2026 snapshot to set the stage: there are roughly 359 million registered domains globally (Verisign Domain Name Industry Brief, Q4 2025), spread across 1,500+ TLDs, with .com still dominant at ~157M and newer TLDs like .ai, .xyz, .io, and .dev growing 20-40% year-over-year. The WHOIS ecosystem used to be a single TCP-port-43 protocol with reasonably uniform output, but ICANN's 2024 mandate requiring all registries to support RDAP (Registration Data Access Protocol) means you are now living in a hybrid world: some registries serve RDAP JSON, some still serve legacy WHOIS text, and a surprising number serve both with subtly different field names. GDPR redactions, in effect since May 2018, hide registrant personal data by default for EU-registered domains and in practice for most registrars globally. The mental model of "WHOIS is a phonebook for domains" is outdated. The accurate model is "WHOIS is a fragmented, rate-limited, partially-redacted dataset that requires an aggregation layer to use at scale."
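To make the hybrid concrete, here is a minimal sketch that fetches the same record both ways: legacy WHOIS as free-form text over TCP port 43, and RDAP as JSON over HTTPS. The Verisign servers shown are the real ones for `.com`; the rest is illustrative:

```python
import json
import socket
import urllib.request

def whois_legacy(domain, server="whois.verisign-grs.com"):
    # Legacy WHOIS: unstructured text over TCP port 43 (Verisign serves .com).
    with socket.create_connection((server, 43), timeout=10) as s:
        s.sendall(f"{domain}\r\n".encode())
        data = b""
        while chunk := s.recv(4096):
            data += chunk
    return data.decode(errors="replace")

def rdap_lookup(domain):
    # RDAP: structured JSON over HTTPS (Verisign's registry RDAP base for .com).
    url = f"https://rdap.verisign.com/com/v1/domain/{domain}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

print(whois_legacy("example.com")[:200])     # raw text you must parse yourself
print(rdap_lookup("example.com")["events"])  # structured events: registration, expiration, ...
```

Same domain, two protocols, two completely different parsing problems. That is the state of things in 2026.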
This post compares the main options for bulk WHOIS in 2026, shows how to build a production pipeline in Python, and covers the use cases that actually justify the effort: domain monitoring, brand protection, cybersecurity triage, and M&A diligence. If you are doing any of those for more than a handful of domains, you need an automated pipeline, not a browser tab.
Why this is hard
WHOIS looks simple — one TCP connection to port 43, one record back. Reality:
- Every TLD has its own WHOIS server and response format. `.com` uses Verisign, `.io` uses NIC.IO, and country-code TLDs each run their own registry. Parsing is a mess.
- GDPR redactions hide most registrant data. Since 2018, EU-registered domains no longer expose owner names or contact emails. Some registrars redact all domains, not just EU ones.
- Rate limits vary wildly. Verisign thin WHOIS tolerates ~20 queries/sec, but query the same `.com` registrar's thick WHOIS and you will see 5 queries/sec before blocks.
- RDAP is slowly replacing WHOIS. Some registries return only RDAP JSON now. Your parser needs both code paths.
- Intermittent TLDs. Some ccTLDs (`.ke`, `.pk`, `.sa`) are only available via web-form WHOIS with no programmatic endpoint. Others require paid registry access.
- Stale cache semantics. Registry responses include a varying TTL. If you query a domain right after a transfer, you may get old registrar data for hours. Knowing when to retry vs. accept staleness is a heuristic, not a rule.

Rolling this yourself means maintaining per-TLD parsers, proxy pools, and RDAP fallbacks. Most teams burn a week on it and then the parser breaks again when `.ai` changes its format.
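To see the per-TLD routing problem first-hand: IANA's root WHOIS server tells you which server is authoritative for each TLD. A sketch, reusing the `whois_legacy` helper from the earlier snippet:

```python
def whois_server_for_tld(tld: str) -> str | None:
    # IANA's root WHOIS lists the authoritative server in a "whois:" line.
    for line in whois_legacy(tld, server="whois.iana.org").splitlines():
        if line.lower().startswith("whois:"):
            return line.split(":", 1)[1].strip()
    return None  # web-form-only TLDs have no such line

print(whois_server_for_tld("io"))   # whois.nic.io
print(whois_server_for_tld("com"))  # whois.verisign-grs.com
```

Multiply that lookup-then-route dance by 1,500+ TLDs, each with its own output format, and the appeal of an aggregation layer becomes obvious.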
The architecture
```
[Domain list (CSV, 10k rows)]
        |
        v
[Apify WHOIS actor]  -- rotating proxies, RDAP fallback, TLD-aware parsing
        |
        v
[Structured JSON dataset]
        |
        v
[DuckDB / Postgres]  -- dedupe + expiry alerting
        |
        v
[Slack / email]  -- "10 domains expire in 30 days"
```
The core building block is the domain-whois-lookup actor, which handles TLD routing, RDAP fallback, rate-limit evasion, and parsing into a normalized schema.
Step 1: One-shot bulk lookup
```python
import os

from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

# One domain per line; skip blanks.
with open("domains.csv") as f:
    domains = [line.strip() for line in f if line.strip()]

run = client.actor("nexgendata/domain-whois-lookup").call(run_input={
    "domains": domains,
    "include_raw": False,
    "rdap_fallback": True,
})

results = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Processed {len(results)} domains")
```
A typical result looks like:
```json
{
  "domain": "example.com",
  "registrar": "MarkMonitor Inc.",
  "registrar_iana_id": 292,
  "creation_date": "1995-08-14T04:00:00Z",
  "expiry_date": "2028-08-13T04:00:00Z",
  "last_updated": "2025-08-14T07:01:23Z",
  "nameservers": ["a.iana-servers.net", "b.iana-servers.net"],
  "dnssec": "signedDelegation",
  "status": ["clientDeleteProhibited", "clientTransferProhibited"],
  "registrant_country": "US",
  "redacted": true
}
```
Notice the `redacted: true` flag. You still get registrar, nameservers, and dates, which is 80% of what most use cases need.
Step 2: Watch for expiries
Once results land in your dataset, a short SQL query surfaces domains expiring soon:
```sql
SELECT domain, registrar, expiry_date,
       date_diff('day', current_date, expiry_date) AS days_left
FROM whois_results
WHERE expiry_date < current_date + INTERVAL 60 DAY
ORDER BY expiry_date ASC;
```
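The query assumes the actor output already sits in a `whois_results` table. A minimal load step for DuckDB, assuming the `results` list from Step 1 (the file and table names are arbitrary):

```python
import json
import duckdb

# Write the actor output as newline-delimited JSON, then let DuckDB infer the schema.
with open("whois_results.ndjson", "w") as f:
    for r in results:
        f.write(json.dumps(r) + "\n")

con = duckdb.connect("whois.duckdb")
con.execute("""
    CREATE OR REPLACE TABLE whois_results AS
    SELECT * REPLACE (CAST(expiry_date AS TIMESTAMP) AS expiry_date)
    FROM read_json_auto('whois_results.ndjson')
""")
```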
Wire this into a scheduled Apify run (daily), then a cron job that posts the result to Slack:
```python
import requests
from datetime import datetime, timezone

slack = "https://hooks.slack.com/services/..."

def days_until_expiry(r):
    # expiry_date is ISO-8601 with a trailing Z, e.g. "2028-08-13T04:00:00Z"
    exp = datetime.fromisoformat(r["expiry_date"].replace("Z", "+00:00"))
    return (exp - datetime.now(timezone.utc)).days

rows = [r for r in results if r.get("expiry_date") and days_until_expiry(r) < 60]
if rows:
    text = "Domains expiring soon:\n" + "\n".join(
        f"{r['domain']} -> {r['expiry_date']}" for r in rows)
    requests.post(slack, json={"text": text})
```
Step 3: Join with DNS
WHOIS alone is often not enough. Pair it with DNS records and you get a much richer picture: who owns it, where it resolves, and whether it is parked.
The dns-propagation-checker actor resolves A, AAAA, MX, and TXT across 15+ resolvers. Combined with WHOIS you can answer:
- Which competitor domains changed nameservers this week?
- Which newly-registered typosquats already have MX records (meaning they are phishing-ready)?
- Which domains in our portfolio are pointed at the wrong CDN?
Here is a reasonably complete typosquat-detection snippet that fans out permutations of a brand, runs WHOIS on each, and flags any that were registered in the last 30 days. This is the bones of a real brand-protection pipeline:
```python
import itertools
import os
from datetime import datetime, timedelta, timezone

from apify_client import ApifyClient

BRAND = "acmecorp"
TLDS = [".com", ".net", ".org", ".co", ".io", ".ai", ".app", ".dev"]

def permutations(brand):
    # Character-swap, leet-speak, and homoglyph basics.
    # Only LDH characters (letters, digits, hyphen) are valid in hostnames,
    # so symbols like "@" are deliberately excluded.
    swaps = {"a": ["4"], "e": ["3"], "o": ["0"], "i": ["1", "l"]}
    out = {brand, brand + "s", brand + "hq", brand + "-app", "get" + brand}
    for i, ch in enumerate(brand):
        for alt in swaps.get(ch, []):
            out.add(brand[:i] + alt + brand[i + 1:])
    return out

candidates = [b + t for b, t in itertools.product(permutations(BRAND), TLDS)]

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("nexgendata/domain-whois-lookup").call(run_input={
    "domains": candidates,
    "rdap_fallback": True,
})

cutoff = datetime.now(timezone.utc) - timedelta(days=30)
new_squats = []
for r in client.dataset(run["defaultDatasetId"]).iterate_items():
    if not r.get("creation_date"):
        continue
    created = datetime.fromisoformat(r["creation_date"].replace("Z", "+00:00"))
    if created > cutoff:
        new_squats.append(r)

print(f"Found {len(new_squats)} newly-registered typosquat candidates")
for r in new_squats:
    print(f"- {r['domain']} registered {r['creation_date']} via {r['registrar']}")
```
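To answer the MX question from the list above, you can chain the DNS actor on the typosquat hits. The input and output fields below are assumptions; check the `dns-propagation-checker` schema before relying on them:

```python
# Hypothetical input shape -- verify against the actor's README.
dns_run = client.actor("nexgendata/dns-propagation-checker").call(run_input={
    "domains": [r["domain"] for r in new_squats],
    "record_types": ["MX"],
})

# Hypothetical output shape: one item per domain with a "records" list.
has_mx = {
    item["domain"]
    for item in client.dataset(dns_run["defaultDatasetId"]).iterate_items()
    if item.get("records")
}

phishing_ready = [r for r in new_squats if r["domain"] in has_mx]
print(f"{len(phishing_ready)} typosquats already have MX records")
```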
A similar pipeline in Node.js for teams more comfortable in JavaScript:
```javascript
import { readFile } from 'node:fs/promises';
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const domains = (await readFile('portfolio.csv', 'utf8')).split('\n').filter(Boolean);

const run = await client.actor('nexgendata/domain-whois-lookup').call({
  domains,
  rdap_fallback: true,
});

const ALERT_DAYS = 45;
const now = Date.now();
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const expiring = [];
for (const item of items) {
  if (!item.expiry_date) continue;
  const days = (new Date(item.expiry_date).getTime() - now) / 86400000; // ms per day
  if (days < ALERT_DAYS) expiring.push({ ...item, days: Math.round(days) });
}
expiring.sort((a, b) => a.days - b.days);
console.table(expiring.map(({ domain, registrar, days }) => ({ domain, registrar, days })));
```
Use cases
1. Brand protection / typosquatting monitoring. A SaaS company monitors 300 domain permutations of their brand. Daily WHOIS runs flag newly-registered lookalikes within 24 hours, triggering legal takedowns.
2. M&A tech diligence. An acquirer runs WHOIS on every domain owned by the target. Finds three domains registered to an ex-founder's personal email — caught before close.
3. Cybersecurity triage. An incident responder receives 500 suspicious URLs. Bulk WHOIS reveals 240 were registered in the past 14 days at the same registrar — strong phishing-campaign signal.
4. Affiliate / SEO portfolio management. An affiliate marketer owns 120 domains. Monthly WHOIS batches feed a spreadsheet of expiries, saving at least one "oh no I forgot to renew" event per year.
5. Drop-catching research. A domain investor monitors expiring high-value domains by running WHOIS against a curated 10k watchlist weekly. When a domain moves into the 30-day redemption period, a Slack alert routes it to a review queue. In 2025 the same investor caught a 4-letter .com that would have gone to auction for low five figures.
6. Phishing kit attribution. A security vendor scrapes newly observed phishing URLs from public feeds (URLHaus, OpenPhish), runs WHOIS on the domains, and clusters by registrar + nameserver + creation date. Patterns emerge quickly: 60% of a week's phishing campaigns might trace back to 3 registrars. That kind of clustering is not possible without a WHOIS pipeline.
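The clustering in use case 6 needs only a few lines once you have normalized WHOIS rows. A sketch, assuming the Step 1 output schema (`registrar_iana_id`, `nameservers`, `creation_date`):

```python
from collections import defaultdict

# Group by registrar IANA ID + nameserver base domain + registration day.
clusters = defaultdict(list)
for r in results:
    ns = (r.get("nameservers") or ["unknown"])[0]
    ns_base = ".".join(ns.split(".")[-2:])     # crude eTLD+1 approximation
    day = (r.get("creation_date") or "")[:10]  # YYYY-MM-DD
    clusters[(r.get("registrar_iana_id"), ns_base, day)].append(r["domain"])

# Largest clusters first: same-day, same-registrar, same-NS bursts scream "campaign".
for key, members in sorted(clusters.items(), key=lambda kv: -len(kv[1]))[:5]:
    print(key, len(members), members[:3])
```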
Pricing comparison
| Service | 10k lookups cost | Historical? | RDAP? | Bulk CSV? |
|---|---|---|---|---|
| WhoisXML API | ~$100 | Paid add-on | Yes | Yes |
| DomainTools Iris | $395+/mo | Yes | Yes | Yes |
| JsonWhoisAPI | ~$80 | No | No | Limited |
| whois.com manual | Free (captcha) | No | No | No |
| NexGenData actor | ~$15 | No | Yes | Yes |
For the vast majority of use cases — bulk triage, expiry monitoring, diligence — you do not need $400/mo DomainTools. Pay-per-result at $0.0015/domain is comfortably within hobby-budget range.
Common pitfalls
WHOIS is one of those protocols where every shortcut has a hidden cost. These are the ones that catch people:
- `.ai` and `.io` change formats. Expect occasional parsing misses on exotic ccTLDs. The actor updates parsers monthly, but if you are parsing raw responses yourself, budget for format drift every quarter.
- Registrar WHOIS vs. registry WHOIS. Thick TLDs (`.org`, `.info`) return full data from the registry. Thin TLDs (`.com`) require a second hop to the registrar. The actor handles this transparently, but raw WHOIS CLIs do not. If you are using the `whois` CLI and seeing minimal data for `.com`, you are probably reading the Verisign thin response and missing the registrar follow-up.
- Queries are cached. Most registries serve cached data. Same-second repeated queries will not reflect updates in real time. For change detection, poll every 6-24 hours, not every minute.
- Premium WHOIS is its own thing. Some registrars charge extra for "WHOIS history": time-series data of how WHOIS records changed. If you need that (common for cyber threat intel), only WhoisXML or DomainTools offer it at scale, though you can build a historical dataset yourself by snapshotting daily.
- Punycode and IDN domains. A domain like `bücher.de` shows up as `xn--bcher-kva.de` in WHOIS. If your input list has UTF-8 brand names, convert to punycode before lookup, or the query will fail silently. Python has `idna.encode()` for this (see the sketch after this list).
- Glue records. WHOIS returns nameserver hostnames, but not their IPs. If you are tracking infrastructure migration, you also want the nameserver's A/AAAA records, which requires a follow-up DNS query.
- Contact email redactions. GDPR redacts the registrant email, but sometimes the registrar exposes a forwarding address like `contact@domainsbyproxy.com`. These are useless for direct contact but reveal the proxy service, which is an attribution signal in itself.
- Registrar name inconsistencies. "GoDaddy", "GoDaddy.com, LLC", and "GoDaddy Online Services Cayman Islands Ltd." are all the same company. Normalize on IANA ID (the `registrar_iana_id` field), not the string name, if you are clustering.
- Expiry vs. grace period vs. redemption. A domain past its `expiry_date` is not immediately available. It enters a 30-day grace period, then a 30-day redemption period, then a 5-day pending-delete window. Drop-catching pipelines need to model each of these phases correctly (a sketch follows below).
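For the punycode pitfall, a minimal conversion step using the third-party `idna` package (`pip install idna`):

```python
import idna

def to_ascii(domain: str) -> str:
    # Convert Unicode labels to their xn-- (punycode) form; ASCII passes through.
    return idna.encode(domain, uts46=True).decode("ascii")

print(to_ascii("bücher.de"))    # xn--bcher-kva.de
print(to_ascii("example.com"))  # example.com
```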
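And for the lifecycle pitfall, a rough phase classifier. The durations are the common gTLD defaults from the bullet above; individual registrars vary, so treat them as assumptions:

```python
from datetime import datetime, timezone

GRACE_DAYS = 30          # renewal grace (registrar-dependent, often 0-45 days)
REDEMPTION_DAYS = 30     # redemption grace period
PENDING_DELETE_DAYS = 5  # pending delete before the drop

def lifecycle_phase(expiry: datetime, now: datetime | None = None) -> str:
    now = now or datetime.now(timezone.utc)
    if now < expiry:
        return "active"
    age = (now - expiry).days
    if age < GRACE_DAYS:
        return "grace"
    if age < GRACE_DAYS + REDEMPTION_DAYS:
        return "redemption"
    if age < GRACE_DAYS + REDEMPTION_DAYS + PENDING_DELETE_DAYS:
        return "pending_delete"
    return "likely_dropped"
```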
How NexGenData handles this
The domain-whois-lookup actor was built after we watched too many teams reinvent the same broken parser. Specific design choices:
- Dual WHOIS + RDAP by default. Every lookup tries RDAP first (faster, structured JSON, consistent schema) and falls back to legacy WHOIS if RDAP is unavailable. You get the best data the registry will give, without having to code two pipelines.
- 1,500+ TLD parsers maintained. We monitor TLD response-format changes and push parser updates on a rolling basis. If `.ai` changes its format next month, you do not need to do anything.
- Automatic thick-TLD follow-through. For `.com`/`.net`, we follow Verisign's referral to the registrar and merge both responses into a single row. No double-query logic on your side.
- IANA-ID based registrar normalization. The output always includes both the raw string and the canonical IANA ID, so clustering and reporting work out of the box.
- Proxy rotation and polite rate-limiting. We fingerprint each TLD's tolerance and adjust concurrency automatically. You can pass 10k domains at once and the actor will schedule them appropriately.
- No API key, no seat fees, pay per result. $0.0015 per lookup. 10,000 domains cost $15. There is no monthly commitment.
Conclusion
Bulk WHOIS is a classic example of "seems trivial, actually has 15 edge cases." Rather than maintain per-TLD parsers yourself, point Apify at your domain list, schedule it, and wire the dataset into your alerting pipeline. You get coverage across 1,500+ TLDs, RDAP fallback, and normalized JSON out of the box.
Start here:
- Domain WHOIS Lookup — core WHOIS + RDAP with pay-per-result pricing.
- DNS Propagation Checker — resolve across 15 global DNS servers.
- Tech Stack Detector — fingerprint the websites behind those domains.
FAQ
Is bulk WHOIS legal?
Yes. WHOIS is a public protocol mandated by ICANN for most generic TLDs. You cannot use WHOIS data for spam or direct marketing to registrants (that violates most registrar AUPs and GDPR). You can use it for security research, portfolio management, diligence, and trademark enforcement.
Does GDPR prevent me from getting any useful data?
No. GDPR redacts personal registrant info (name, email, address for individuals), but registrar, nameservers, creation date, expiry, and status codes are still public. For most operational use cases (monitoring, expiry, security triage) that is plenty.
What if I need the redacted data — say, for a trademark claim?
Use the registrar's abuse or legal contact channel with a formal request. Most registrars will disclose registrant info to a court order, UDRP filing, or legitimate IP complaint. The WHOIS API cannot magic it through.
RDAP vs. WHOIS — which should I use?
RDAP is the future and should be your default. It returns structured JSON, supports HTTPS, includes authenticated access paths for law enforcement, and is what ICANN is pushing. Legacy WHOIS should be a fallback. The actor handles both automatically.
How often should I re-lookup a domain?
For expiry monitoring: weekly is plenty. For change detection (nameserver flips, registrar transfers): daily. For active incident response: on-demand is fine. There is rarely a reason to poll more than once per 24h per domain.
Can I bulk-register domains via this pipeline?
No. WHOIS is read-only. Registration happens at a registrar (GoDaddy, Namecheap, Porkbun, Squarespace) via their API. You can use the WHOIS pipeline to find available domains and then register them separately.
What about new gTLDs like .xyz, .app, .dev?
All supported. The newer gTLDs almost all serve RDAP, which actually makes them easier to handle than some legacy ccTLDs.