WHOIS Is Broken in 2026. Here's the RDAP-First Drop-In That Actually Returns JSON
WHOIS has been quietly dying for a decade, and most teams only noticed in the last eighteen months.
If you ran a domain-intelligence pipeline between 2015 and 2022, the story went like this. You shelled out to the whois binary, or hit a free public wrapper, or paid WhoisXML API $0.00099 per lookup. You wrote a parser full of regex special cases for Verisign vs. Afilias vs. Nominet vs. DENIC, caught the edge cases where .jp returned Shift-JIS, normalized the date formats, and shipped. It worked, barely.
Then two things happened at once. ICANN's RDAP mandate became compliance-enforced in August 2024, which broke the WHOIS TCP/43 endpoints for every gTLD registrar still pointing at them. And WhoisXML, along with most of its competitors, gated their free tiers into oblivion through 2024 and 2025 — first dropping the public 1,000-request-per-month plan, then requiring credit cards for evaluation, then pushing minimums into the low-four-figure range for any serious volume.
If your pipeline still calls whois example.com and greps the output, it is failing silently on somewhere between 30% and 60% of lookups right now. You haven't been paged because the failures are partial, the formats look superficially correct, and downstream consumers treat a missing expiration_date as "probably still valid."
This post is about what happened, why RDAP is a genuine improvement, and how we built whois-replacement — an Apify actor that speaks RDAP first, falls back to legacy WHOIS for TLDs that haven't migrated, and returns a unified JSON schema across any TLD.
Protocol compliance data cited here reflects ICANN's public RDAP deployment tracker as of Q1 2026.
WHOIS was broken before it was broken
The WHOIS protocol is older than HTTPS and slightly older than DNS itself. RFC 812 shipped in 1982. RFC 3912, the current spec, was published in 2004 and is roughly three pages long. It specifies a plain-text TCP connection on port 43: the client sends a query terminated with CRLF, the server replies with free-form text and closes the connection. That's the entire protocol.
Three pages is not a lot of spec. There is no response schema, no field encoding requirement, no authentication, no structured error model, no rate-limit signaling. Every registrar implemented the output format however they felt like on whatever Tuesday they first deployed, and the formats ossified.
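That entire exchange fits in a dozen lines of Python. Here's a sketch of the bare RFC 3912 protocol, not any particular client; note the encoding guess at the end, which is exactly the kind of unspecified behavior the examples below run into:

import socket

def whois_query(server: str, query: str) -> str:
    # RFC 3912 in full: connect to TCP/43, send the query plus CRLF,
    # read free-form text until the server closes the connection
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(f"{query}\r\n".encode("ascii"))
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    # No encoding is specified anywhere in the protocol; UTF-8 with
    # replacement characters is a pragmatic guess (.jp may send Shift-JIS)
    return b"".join(chunks).decode("utf-8", errors="replace")

print(whois_query("whois.verisign-grs.com", "example.com"))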
Here is a real-world whois google.com output, lightly redacted:
Domain Name: GOOGLE.COM
Registry Domain ID: 2138514_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2019-09-09T15:39:04Z
Creation Date: 1997-09-15T04:00:00Z
Registry Expiry Date: 2028-09-14T04:00:00Z
Here is whois amazon.co.uk:
Domain name:
amazon.co.uk
Registrant:
Amazon Europe Core S.a.r.l.
Registrar:
MarkMonitor Inc. t/a MarkMonitor [Tag = MARKMONITOR]
Here is whois sony.jp (before encoding normalization):
[ JPRS database provides information on network administration. ]
a. [Domain Name] SONY.JP
g. [Organization] ソニーグループ株式会社
k. [Organization Type] Corporation
These are not cosmetic differences. Labels differ, structure differs, dates are in three incompatible formats, one is in Japanese. Anyone who wrote a "universal" WHOIS parser wrote a 2,000-line regex library with a test matrix covering maybe 40 TLDs out of the 1,500 that exist. This is the baseline dysfunction every domain-intelligence team inherited.
RDAP: what ICANN actually mandated
RDAP (Registration Data Access Protocol) is the replacement. It was standardized in RFCs 7480-7484 in 2015, developed in the IETF's WEIRDS working group with ICANN participation, and has been displacing WHOIS ever since.
The protocol is dramatically saner. It's HTTPS-based. It returns JSON with a specified schema. Queries follow a REST path convention (https://rdap.verisign.com/com/v1/domain/example.com). Responses include standardized events, entities, status, and links arrays. A bootstrap registry at data.iana.org/rdap/dns.json tells you which RDAP server is authoritative for any given TLD. Errors are proper HTTP status codes with structured bodies.
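The bootstrap-then-query flow is simple enough to sketch in a few lines. This is a minimal illustration with no caching or error handling, relying only on the bootstrap file format defined in RFC 7484:

import requests

def rdap_lookup(domain: str) -> dict:
    # The bootstrap file is a list of ([tlds...], [base_urls...]) pairs
    bootstrap = requests.get("https://data.iana.org/rdap/dns.json",
                             timeout=10).json()
    tld = domain.rsplit(".", 1)[-1].lower()
    base = next(urls[0] for tlds, urls in bootstrap["services"] if tld in tlds)
    resp = requests.get(f"{base.rstrip('/')}/domain/{domain}", timeout=10)
    resp.raise_for_status()  # real HTTP errors, not free-form text
    return resp.json()       # events, entities, status, links, ...

print(rdap_lookup("example.com")["events"])

In production you'd cache the bootstrap file and handle TLDs that have no entry in it, which is exactly the fallback problem the rest of this post is about.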
The rollout timeline, for the record:
- March 2015: RFC 7480-7484 published, bootstrap registry launched.
- August 2019: ICANN requires all gTLD registries to implement RDAP.
- November 2023: Mandatory registrar-side RDAP support comes into force.
- August 2024: Compliance deadline for full RDAP deprecation of WHOIS for gTLDs. ICANN begins issuing compliance notices to non-compliant registrars.
- Q4 2024 through 2025: Major registrars start returning 451 Unavailable on port 43 or redirecting to RDAP-only endpoints.
- 2026: Most gTLD traffic is RDAP-native. ccTLDs remain a patchwork: some are fully RDAP (.de, .uk, .nl), some are partial (.fr, .jp), some haven't started (.ru, .cn, .tk).
If your code still assumes port 43 TCP, you are assuming a protocol ICANN has actively deprecated.
The WhoisXML squeeze and the market shift to paywalls
Parallel to the RDAP migration, the commercial WHOIS API market has spent 2024 and 2025 aggressively monetizing. This is the broader "free APIs are over" trend that hit weather, maps, and LLM inference. It landed hard in WHOIS.
The old WhoisXML free tier allowed 1,000 lookups per month without a credit card. That tier was restructured in early 2024 to a 100-lookup trial, then to credit-card-required evaluation, then in late 2025 to a "contact sales" gate for anything resembling production use. Their cheapest published commercial tier is now in the mid-three-figures per month. Competitors — DomainTools, IP2WHOIS, ViewDNS — moved the same direction.
There are legitimate reasons. Running a WHOIS aggregator means maintaining scrapers against 1,500+ registrar formats, caching under GDPR-compliant retention, and eating abuse from 10M-request burst traffic. The free tier was a loss leader that got harder to justify as the underlying data got harder to scrape. Teams who used to get by with a free hobby account or a homegrown whois scraper are suddenly staring at a $400/month bill or a broken pipeline.
What a unified replacement should look like
Any serious replacement needs to do five things:
- Speak RDAP natively, using the bootstrap registry to route queries.
- Fall back to WHOIS gracefully for TLDs that still don't have RDAP (.tk, .ml, .ga, .cn, parts of .ru, a long tail of ccTLDs).
- Return one schema, regardless of which protocol was used upstream.
- Handle GDPR redaction sanely. Since May 2018, registrant data for most gTLDs has been redacted under ICANN's Registration Data Consensus Policy. A modern parser needs to surface "this field was redacted" as a structured signal, not just return null (a sketch of the detection logic follows this list).
- Be priced like a commodity. Domain lookups aren't a differentiated product.
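On the redaction point, the detection itself is straightforward once you know where to look. A minimal sketch, assuming RFC 9537's top-level redacted member on the RDAP side and the most common placeholder string on the WHOIS side (real registrars use several variants):

def detect_redaction(rdap: dict | None, whois_text: str | None) -> dict:
    # RFC 9537 adds a "redacted" array listing which fields were removed
    if rdap and rdap.get("redacted"):
        return {"redacted": True, "redaction_reason": "gdpr"}
    # Legacy WHOIS signals redaction with literal placeholder text
    if whois_text and "REDACTED FOR PRIVACY" in whois_text.upper():
        return {"redacted": True, "redaction_reason": "privacy_service"}
    return {"redacted": False, "redaction_reason": None}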
The whois-replacement actor (ID U7mdAONVS7k478lDQ) does all five. Pricing is pay-per-event (PPE) at $0.005 per lookup. It uses RDAP as the primary protocol, falls back to WHOIS only when the TLD has no RDAP endpoint, and returns the same JSON shape across .com, .de, .jp, .io, and the long tail.
The unified response schema
Every lookup returns the same top-level keys:
{
"domain": "example.com",
"tld": "com",
"protocol_used": "rdap",
"registrar": {
"name": "RESERVED-Internet Assigned Numbers Authority",
"iana_id": 376,
"url": "https://www.iana.org"
},
"status": ["client transfer prohibited"],
"nameservers": ["A.IANA-SERVERS.NET", "B.IANA-SERVERS.NET"],
"dnssec": "signedDelegation",
"events": {
"registered": "1995-08-14T04:00:00Z",
"updated": "2024-08-14T07:01:34Z",
"expires": "2025-08-13T04:00:00Z"
},
"registrant": {
"redacted": true,
"redaction_reason": "gdpr",
"organization": null,
"country": "US",
"email": null
},
"abuse_contact": {
"email": "abuse@iana.org",
"phone": "+1.3105281212"
},
"raw_rdap": { "...": "passthrough" },
"raw_whois": null,
"fetched_at": "2026-04-17T12:00:00Z"
}
protocol_used is "rdap", "whois", or "hybrid" (the latter for ccTLDs where RDAP returns partial data and WHOIS fills the gaps); a fourth value, "failed", appears only when both protocols error (see the FAQ). raw_rdap and raw_whois are passthroughs for callers who need the original payload for audit or custom parsing.
The schema is stable across TLDs. A .jp lookup, even when it falls back to WHOIS and parses the Japanese-encoded output, returns the same keys with normalized UTF-8 values and ISO 8601 dates.
Old vs. new: the comparison table
| Feature | Legacy whois binary | whois-replacement | WhoisXML API Pro | IP2WHOIS | Namecheap API |
|---|---|---|---|---|---|
| RDAP native | no | yes | yes (2024+) | partial | no |
| WHOIS fallback for non-RDAP TLDs | yes (raw) | yes (parsed) | yes | yes | no |
| Unified JSON schema across TLDs | no | yes | yes | partial | no |
| GDPR redaction signaled as field | no | yes | yes | no | no |
| ccTLD coverage | ~95% (raw text) | ~98% | ~95% | ~80% | ~20% |
| Japanese / Cyrillic / CJK encoding handling | manual | automatic | automatic | manual | n/a |
| Free tier | yes (self-hosted) | $5/mo Apify credit | gone as of 2025 | 500/day | reseller-only |
| Per-lookup price at 50k volume | ~$0 + infra | $0.005 | $0.008 | $0.002 | reseller bundled |
| Rate-limit handling | client problem | actor-managed | managed | managed | managed |
| Raw payload passthrough | n/a | yes | yes (paid tier) | no | no |
| Ships a CLI | yes | no (use curl) | no | no | no |
IP2WHOIS is cheaper per lookup at bulk volumes but their schema isn't stable across TLDs and they don't signal GDPR redaction. WhoisXML's $0.008 is their published pro-tier rate. The whois-replacement actor's $0.005 covers the full unified-schema + RDAP-first + raw-passthrough bundle.
Migration: the two-line change
If your code currently looks like this:
import subprocess
output = subprocess.check_output(["whois", domain], text=True)
# 200 lines of regex ...
Or like this:
import requests
resp = requests.get(
f"https://www.whoisxmlapi.com/whoisserver/WhoisService?apiKey={KEY}"
f"&domainName={domain}&outputFormat=JSON"
)
data = resp.json()["WhoisRecord"]
The migration target is:
from apify_client import ApifyClient
client = ApifyClient("APIFY_TOKEN")
run = client.actor("nexgendata/whois-replacement").call(run_input={
"domains": [domain]
})
record = next(client.dataset(run["defaultDatasetId"]).iterate_items())
print(record["events"]["expires"])
The response shape is deterministic. No TLD-specific branches.
Code examples
Python: bulk expiry-risk scan for a security research team
This is the canonical use case. A security research team wants to identify the expiry-risk profile of 50,000 domains observed in a phishing campaign — which ones are close to expiration (and therefore likely to drop and get caught by a defensive registration program), which ones are freshly registered (strong phishing signal), which ones are parked behind privacy services.
from apify_client import ApifyClient
from datetime import datetime, timezone, timedelta
client = ApifyClient("APIFY_TOKEN")
with open("campaign_domains.txt") as f:
domains = [line.strip() for line in f if line.strip()]
run = client.actor("nexgendata/whois-replacement").call(run_input={
"domains": domains,
"include_raw": False,
"concurrency": 20,
})
now = datetime.now(timezone.utc)
expiry_risk = []
fresh_registrations = []
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
events = item.get("events") or {}
expires = events.get("expires")
registered = events.get("registered")
if expires:
exp_dt = datetime.fromisoformat(expires.replace("Z", "+00:00"))
if exp_dt - now < timedelta(days=30):
expiry_risk.append((item["domain"], expires))
if registered:
reg_dt = datetime.fromisoformat(registered.replace("Z", "+00:00"))
if now - reg_dt < timedelta(days=7):
fresh_registrations.append((item["domain"], registered))
print(f"{len(expiry_risk)} domains expiring within 30 days")
print(f"{len(fresh_registrations)} domains registered within last 7 days")
At 50,000 domains and $0.005 per lookup, this run costs $250. With concurrency: 20 it completes in roughly 15 minutes. The unified schema means there's no TLD branching in the consumer code — .com, .io, .ru, .jp, and .co.uk domains all parse the same way.
curl: single-domain lookup for a shell pipeline
curl -X POST "https://api.apify.com/v2/acts/nexgendata~whois-replacement/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"domains": ["anthropic.com"]
}' | jq '.[0] | {domain, registrar: .registrar.name, expires: .events.expires}'
Returns a clean JSON shape suitable for piping into further shell tools. Useful for ad-hoc "is this domain legit" investigations.
Node.js: webhook-driven fresh-registration monitor
Stand up a webhook that pulls a batch of newly registered domains from your CTI feed, looks them up, and flags the ones that match a phishing-lookalike pattern:
const { ApifyClient } = require('apify-client');
const express = require('express');
const apify = new ApifyClient({ token: process.env.APIFY_TOKEN });
const app = express();
app.use(express.json());
const TARGET_BRANDS = ['stripe', 'coinbase', 'binance', 'paypal'];
function looksLikePhish(domain) {
const lower = domain.toLowerCase();
return TARGET_BRANDS.some(brand =>
lower.includes(brand) && lower !== `${brand}.com`
);
}
app.post('/newly-registered', async (req, res) => {
const candidates = req.body.domains.filter(looksLikePhish);
if (candidates.length === 0) return res.json({ flagged: 0 });
const run = await apify.actor('nexgendata/whois-replacement').call({
domains: candidates,
});
const { items } = await apify.dataset(run.defaultDatasetId).listItems();
const flagged = items.filter(item => {
const reg = item.events?.registered;
if (!reg) return false;
const ageHours = (Date.now() - new Date(reg).getTime()) / 3.6e6;
return ageHours < 48;
});
res.json({ flagged: flagged.length, domains: flagged });
});
app.listen(3000);
The 48-hour freshness window catches most bulk-registration phishing patterns. Because the actor returns normalized ISO 8601 timestamps regardless of whether the source was RDAP JSON or a parsed .ru WHOIS blob, the age math works uniformly.
Python: RDAP-specific raw payload inspection
For teams that need the raw RDAP response — CT log correlation, DNSSEC chain verification against secureDNS blocks, evidentiary audit trails — pass include_raw: true:
# client is the ApifyClient instance from the earlier examples
run = client.actor("nexgendata/whois-replacement").call(run_input={
"domains": ["cloudflare.com"],
"include_raw": True,
})
record = next(client.dataset(run["defaultDatasetId"]).iterate_items())
raw_rdap = record["raw_rdap"]
for entity in raw_rdap.get("entities", []):
roles = entity.get("roles", [])
if "abuse" in roles:
print("Abuse entity:", entity.get("handle"))
secure_dns = raw_rdap.get("secureDNS")
if secure_dns and secure_dns.get("delegationSigned"):
print("DNSSEC signed delegation confirmed")
The raw payload adds ~5-20 KB per record, which matters at bulk scale. Leave it off unless you need it.
curl: checking an RDAP-only TLD vs. a WHOIS-only TLD in one call
curl -X POST "https://api.apify.com/v2/acts/nexgendata~whois-replacement/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"domains": ["example.com", "example.tk", "example.jp"]
}' | jq '.[] | {domain, protocol_used, expires: .events.expires}'
Response will show protocol_used: "rdap" for .com, protocol_used: "whois" for .tk (Freenom has no RDAP), and protocol_used: "hybrid" for .jp where RDAP is partial. The schema is identical across all three.
Worked example: 50,000-domain expiry audit for a CTI team
A reader's security-research team runs a nightly monitoring pipeline against the previous 24h of phishing-observed domains (typically 30k-60k). Their old stack was a self-hosted WHOIS server pool (10 VMs running the whois binary through a job queue) plus a 3,000-line Python parser with TLD-specific handlers. It cost roughly $600/month in VM spend plus an estimated 20% engineering time from one SRE keeping the parser up to date. Their internal dashboard showed parser coverage dropping from 94% in early 2024 to 71% in late 2025 as more registrars transitioned to RDAP-only and the TCP/43 scrapes returned truncated or 451-coded responses.
Migration took one engineer a day:
# load_observed_domains / upload_to_snowflake are the team's own helpers
domains = load_observed_domains(last_24h=True)  # 30k-60k
chunks = [domains[i:i+2000] for i in range(0, len(domains), 2000)]
for chunk in chunks:
run = client.actor("nexgendata/whois-replacement").call(run_input={
"domains": chunk,
"concurrency": 25,
})
upload_to_snowflake(client.dataset(run["defaultDatasetId"]).iterate_items())
At 45k domains/day, the bill runs $225/day, roughly $6,750/month. That's well above the old $600 VM bill in raw dollars, but the team reclaimed the 20% SRE time (worth ~$3,000/month in salary terms), eliminated the degrading parser, and raised coverage from 71% back to 97%. Net: about $3,150/month more in cash, traded for a material win on engineering velocity and coverage. At volume, a commodity-priced API almost always beats self-hosted scrapers once you include engineering time.
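The break-even arithmetic is worth making explicit, because the volume term dominates. A back-of-envelope sketch using the numbers above:

PER_LOOKUP = 0.005
DOMAINS_PER_DAY = 45_000
actor_monthly = DOMAINS_PER_DAY * PER_LOOKUP * 30   # $6,750
self_hosted_monthly = 600 + 3_000                   # VMs + ~20% of one SRE
breakeven = self_hosted_monthly / (PER_LOOKUP * 30) # 24,000 domains/day
print(f"actor ${actor_monthly:,.0f}/mo vs self-hosted ${self_hosted_monthly:,.0f}/mo")
print(f"cash break-even at {breakeven:,.0f} domains/day")

Below roughly 24k domains/day the actor is cheaper outright; above it, you're paying cash for coverage and reclaimed engineering time.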
Schema-level gotchas
- Date normalization. All timestamps are ISO 8601 UTC. .jp's JST-local, .cn's Beijing-local, and .ru's Moscow-local dates are converted; raw_whois preserves the original.
- Status codes. EPP defines 17 domain status values, which RFC 8056 maps onto RDAP's status vocabulary. The actor passes RDAP statuses through verbatim in status; WHOIS-sourced records map to the nearest equivalent (see the sketch after this list), and non-mappable statuses are kept with a raw_ prefix.
- Nameservers. Always uppercase, FQDN-normalized (trailing dot stripped).
- dnssec field. Values are signedDelegation, unsigned, or unknown.
- GDPR redaction. registrant.redacted is true when the source indicates redaction via RFC 9537 (RDAP's redaction extension) or the "REDACTED FOR PRIVACY" placeholder in WHOIS. redaction_reason is gdpr, privacy_service, or unknown.
- .tk, .ml, .ga, .cf, .gq. Freenom-operated free TLDs have no RDAP, inconsistent WHOIS, and 24-48h propagation lag on registration events.
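The status mapping in the second bullet is mostly mechanical, because RDAP status values are the EPP names in spaced lowercase per RFC 8056, with a handful of exceptions (EPP's ok is RDAP's active) that need an explicit table. A sketch of the idea, with a deliberately abridged value set:

import re

EXCEPTIONS = {"ok": "active", "linked": "associated"}  # per RFC 8056
RDAP_STATUSES = {"client transfer prohibited", "server hold", "pending delete"}  # abridged

def normalize_status(epp_status: str) -> str:
    if epp_status in EXCEPTIONS:
        return EXCEPTIONS[epp_status]
    # "clientTransferProhibited" -> "client transfer prohibited"
    candidate = re.sub(r"(?<!^)(?=[A-Z])", " ", epp_status).lower()
    return candidate if candidate in RDAP_STATUSES else f"raw_{epp_status}"

print(normalize_status("clientTransferProhibited"))  # client transfer prohibited
print(normalize_status("someRegistrarInvention"))    # raw_someRegistrarInvention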
When this is not the right answer
- You're doing <1,000 lookups per month. The whois binary still works for most gTLDs; below that volume the actor's pricing is overkill.
- You need reverse WHOIS. Post-GDPR, this is functionally impossible from public data. WhoisXML and DomainTools offer pre-2018 historical snapshots, which is a different product.
- You need sub-100ms latency. The actor's p50 is 400-900ms. For interactive UIs, cache aggressively or use a low-latency provider.
- You need privileged unredacted access. Law enforcement and CERTs want direct registrar contracts, not a public-API scraper.
- You need bulk zone-file access. That's ICANN's CZDS program, not WHOIS/RDAP.
FAQ
Does the actor handle GDPR-redacted records correctly?
Yes. Every record has a registrant.redacted boolean. When true, the individual fields (name, email, phone) are null and redaction_reason tells you why. We don't try to de-redact via backchannel data lookups; that would be a compliance risk for you and for us.
What's the rate limit?
Apify-side, the actor supports up to 50 concurrent lookups per run by default. Upstream RDAP servers have their own rate limits (Verisign's .com RDAP tolerates roughly 20 req/sec per source IP; IANA's rate limits are documented on their status page). The actor rotates through an egress IP pool and respects upstream 429s with exponential backoff, so you generally don't hit these limits in practice.
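If you hit upstream RDAP servers directly instead, the same discipline applies client-side. A generic sketch of 429 handling with Retry-After support and jittered exponential backoff; this is not the actor's internal code:

import random
import time
import requests

def get_with_backoff(url: str, max_tries: int = 5) -> requests.Response:
    for attempt in range(max_tries):
        resp = requests.get(url, timeout=10)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After when the server sends it, else 1s, 2s, 4s, ...
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay + random.random())  # jitter avoids thundering herds
    raise RuntimeError(f"still rate-limited after {max_tries} tries: {url}")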
Which TLDs don't support RDAP yet?
The Freenom family (.tk, .ml, .ga, .cf, .gq), .ru and .su (Russia hasn't transitioned), parts of .cn (partial RDAP, inconsistent data), and a long tail of small ccTLDs (.mz, .sd, .ve, a few dozen others). For all of these, the actor falls back to WHOIS parsing automatically. protocol_used tells you which path was taken.
How current is the data?
RDAP lookups are live — the actor hits the authoritative registry's RDAP server in real time. WHOIS lookups are similarly live. Cache TTL is 1 hour for unchanged records (keyed on domain + event hash); pass "no_cache": true to force a fresh fetch. At $0.005/lookup, the cache layer is the difference between a reasonable bill and a very unreasonable one.
Can I query IP addresses and ASNs, not just domains?
Yes. RDAP defines IP-range and ASN query paths. Pass "ips": ["8.8.8.8"] or "asns": [15169] in the input. Responses follow the same unified schema adapted for the IP/ASN object types.
What happens if an RDAP server is down?
The actor falls back to WHOIS automatically when RDAP returns a 5xx or times out. If both protocols fail, the record is returned with protocol_used: "failed" and an error field describing what went wrong, rather than being silently dropped.
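Reduced to its skeleton, the fallback order looks like this. It reuses the rdap_lookup and whois_query sketches from earlier; whois_server_for is a hypothetical helper that resolves the TLD's legacy port-43 server, and the whole thing is illustrative rather than the actor's source:

import requests

def resilient_lookup(domain: str):
    try:
        return rdap_lookup(domain), "rdap"
    except (requests.HTTPError, requests.Timeout, StopIteration):
        pass  # 5xx, timeout, or no RDAP server in the bootstrap file
    try:
        return whois_query(whois_server_for(domain), domain), "whois"
    except OSError:
        return None, "failed"  # surfaced to the caller, never dropped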
Does it work for internationalized domain names (IDN)?
Yes. Punycode (xn--) and native Unicode inputs both work; the response includes both forms. Some registries return only the ASCII form in RDAP; we normalize.
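Both forms are easy to produce locally if you want to pre-normalize your own input lists. Python's stdlib codec implements the older IDNA 2003 rules (the third-party idna package covers IDNA 2008), which is fine for a quick round-trip:

ascii_form = "bücher.de".encode("idna").decode("ascii")   # xn--bcher-kva.de
unicode_form = ascii_form.encode("ascii").decode("idna")  # bücher.de
print(ascii_form, unicode_form)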
How does this compare to just running whois from an EC2 box?
For ad-hoc use, whois is still fine. For production pipelines, three things are changing: RDAP compliance has broken whois for an increasing fraction of gTLDs, registrar rate-limiting is aggressive against raw TCP/43 scrapers, and the engineering cost of maintaining a TLD-aware parser is rising faster than compute costs are falling.
Can I run this on Apify's free tier?
Yes. Apify's free tier includes $5/month of compute credit, enough for roughly 1,000 lookups. Beyond that it's pay-as-you-go at $0.005/lookup. See the actor page for current pricing.
What's next
If you like this actor, two related ones from the same pipeline often show up in the same domain-intelligence workflows:
- company-data-aggregator — pulls corporate registration data (Companies House, SEC EDGAR, Handelsregister) and matches it against domain registrant data for entity-level graph building.
- tranco-rank-lookup — returns the current Tranco top-list rank for any domain, useful for prioritizing investigation queues by popularity-weighted risk.
Conclusion
WHOIS was a 1982 protocol with a three-page spec and no schema. It survived in production for forty years mostly by inertia. ICANN's RDAP mandate, enforced from August 2024, is finally decommissioning it for gTLDs — good for everyone downstream as long as you update your code. The WhoisXML free tier and most cheap alternatives have simultaneously been gated or priced out of hobby use, leaving teams staring at broken pipelines and four-figure quotes.
The whois-replacement actor is built for that gap. RDAP-first, WHOIS fallback, one JSON schema across every TLD, $0.005 per lookup, raw-payload passthrough when you need it. If you have a domain-intelligence pipeline that broke quietly in 2024 and hasn't been fixed, this is the minimum-surface-area fix.