Phantom Squatting: When AI Hallucinated Domains Become Attacker Infrastructure

#security #llm #appsec #cybersecurity

The Attack Is Simpler Than You Think

Researchers at Palo Alto Networks Unit 42 documented a technique they're calling phantom squatting: attackers register domain names that LLMs consistently hallucinate, then sit back and wait for the traffic.

No zero-days. No exotic exploit chains. Just a spreadsheet of domains that AI tools confidently recommend — domains that never existed legitimately — and a registrar account.

When your AI coding assistant suggests you visit some-plausible-sounding-docs-site.io to read the official documentation, and that domain belongs to an attacker, you're one click away from a phishing page or a malware download. The LLM delivered it with full confidence. You had no reason to doubt it.

This is the real-world consequence of a known LLM failure mode being weaponized at scale.

How Phantom Squatting Actually Works

LLMs hallucinate URLs the same way they hallucinate package names, citations, and API endpoints — they pattern-match against training data to generate plausible outputs, not verified ones. A model that has seen thousands of documentation sites will confidently produce docs.sometool.dev or api.someservice.io even when those domains don't exist.

Phantom squatting operationalizes this in three steps:

1. Discovery. Attackers (or researchers, in this case) probe LLMs with common questions: "Where do I find the docs for X?", "What's the official API endpoint for Y?" They catalog domains the model consistently invents.
2. Registration. The attacker registers those hallucinated domains. Real infrastructure behind a name the LLM already trusts.
3. Weaponization. The domain serves phishing pages, drive-by malware, or credential-harvesting forms. The attacker needs zero SEO, zero ad spend, zero social engineering. The LLM does the referring.

The attack exploits a fundamental property of how these models work: they have no ground truth about whether a URL they generate is real. They're not lying — they genuinely don't know.
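Defenders can run the same discovery step against their own assistant to see which hosts it keeps inventing. The sketch below is a minimal illustration, not the method used in the research: it assumes you have already collected several responses to the same question, and the function name, the regex, and the 50% threshold are all hypothetical choices.

```python
import re
from collections import Counter

# Loose hostname pattern: dot-separated labels ending in an alphabetic TLD.
# This is an illustrative regex, not a full RFC-compliant hostname parser.
HOST_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def consistent_hosts(responses, min_share=0.5):
    """Return hosts that appear in at least min_share of the responses."""
    counts = Counter()
    for text in responses:
        # Count each host once per response so one long answer can't dominate.
        counts.update({h.lower() for h in HOST_RE.findall(text)})
    threshold = min_share * len(responses)
    return sorted(h for h, n in counts.items() if n >= threshold)
```

Hosts that show up across most samples are the ones worth checking by hand: if they don't belong to the vendor, they are candidates for phantom squatting.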

What makes this particularly nasty is the trust transfer. When a human recommends a sketchy domain, users apply skepticism. When an AI assistant does it in the middle of a helpful, accurate response, that skepticism largely evaporates.

What Existing Defenses Miss

Standard browser-level protections (Safe Browsing, reputation filters) catch known-bad domains. They're reactive — a domain has to be reported and processed before it hits a blocklist.

Phantom squatted domains are purpose-built for freshness. A freshly registered domain with no prior malicious activity scores clean on reputation checks. There's no phishing report yet. There's no VirusTotal hit. The domain looks, to every automated scanner, like a legitimate new site.
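Because reputation has nothing to say about a brand-new domain, one complementary signal is the registration age itself. The sketch below assumes you already have the registration timestamp (for example from an RDAP or WHOIS lookup, which is not shown), and the 30-day cutoff is an arbitrary assumption rather than a recommended value.

```python
from datetime import datetime, timedelta, timezone


def is_freshly_registered(created_at, now=None, max_age_days=30):
    """True if the domain was registered within the last max_age_days.

    created_at must be a timezone-aware datetime. How it is obtained
    (RDAP, WHOIS, a commercial feed) is outside this sketch.
    """
    now = now or datetime.now(timezone.utc)
    return now - created_at < timedelta(days=max_age_days)
```

A young domain is not malicious by itself, so this works best as one input to a score rather than as a block rule.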

Network-level DLP and WAFs inspect traffic headers and payloads — they don't evaluate whether the recommendation that sent a user there was hallucinated.

And LLM output filtering, as typically implemented, looks for known-bad content in the response: PII, profanity, policy violations. It doesn't ask: does this URL actually exist, and is the package or domain real?
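The missing question can be approximated with a few lines of standard-library Python. This is a rough sketch under stated assumptions: it only looks at URLs that carry an http(s) scheme, the function names are hypothetical, and a DNS lookup only tells you whether a name resolves. A phantom domain that an attacker has already registered will resolve fine, so this catches the not-yet-registered case and nothing more.

```python
import re
import socket
from urllib.parse import urlparse

# Illustrative URL pattern; stops at whitespace and common closing punctuation.
URL_RE = re.compile(r"https?://[^\s)\"'>\]]+")


def dns_resolves(host):
    """Return True if the host currently resolves via the system resolver."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def unresolvable_hosts(text, resolves=dns_resolves):
    """Return hosts from URLs in text that fail the resolves() check."""
    hosts = {(urlparse(u).hostname or "").rstrip(".") for u in URL_RE.findall(text)}
    return sorted(h for h in hosts if h and not resolves(h))
```

The resolves parameter is injectable so the check can be swapped for a registry or reputation lookup, or stubbed out in tests.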

That's the gap.

Where Sentinel Catches This

Sentinel ships a dedicated package and domain hallucination detector called SlopScan. It runs as a separate service in the stack and covers exactly this attack surface. SlopScan is available in a public Git repository and is free to use.

For Pro+ tenants with slopscan_enabled=true, Sentinel extracts package and domain names from the LLM output on every scrub request and checks them against live registry data before the content reaches the user. The threat scoring pipeline and SlopScan run independently, so a response can pass the prompt injection check and still get flagged for a hallucinated domain.

When SlopScan flags a hit, the risk levels map directly to actions:

DANGEROUS → blocked — confirmed malicious or known typosquat
SUSPICIOUS → flagged — not in registry, or zero trust score (a freshly registered phantom domain lands here)
CAUTION → reported — exists but has warning signals

A freshly registered phantom squatted domain — clean reputation, no prior reports, but simply not what the LLM implied it was — would surface as SUSPICIOUS because the registry check fails or returns an unverifiable new registration with no history.
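On the client side, the mapping above is small enough to mirror in a lookup table. The sketch below is an illustration, not Sentinel's implementation: the three risk-to-action pairs come from the list above, while the "clean" fallback for no hits and the worst-hit-wins rule are assumptions.

```python
# Risk levels and actions as listed above.
RISK_TO_ACTION = {
    "DANGEROUS": "blocked",
    "SUSPICIOUS": "flagged",
    "CAUTION": "reported",
}

# Assumed ordering from least to most severe; "clean" is a hypothetical default.
SEVERITY = ["clean", "reported", "flagged", "blocked"]


def overall_action(hits):
    """Pick the most severe action implied by a list of hit dicts."""
    actions = [RISK_TO_ACTION.get(h["risk"], "clean") for h in hits]
    return max(actions, key=SEVERITY.index, default="clean")
```

With this rule, a response containing one CAUTION hit and one SUSPICIOUS hit is treated as flagged.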

What the Response Looks Like

Here's an illustrative example of what Sentinel returns when an LLM output recommends a domain that doesn't check out. The prompt and domain name are constructed for demonstration; the response shape is accurate to Sentinel's actual API.

Scenario: An AI assistant responds to "where do I find the API docs for DataFlowKit?" with a URL like docs.dataflowkit-api.io — a domain the model invented.

{
  "request_id": "f7e3a21b-...",
  "security": {
    "action_taken": "clean",
    "threat_score": 0.06,
    "package_scan": {
      "action": "flagged",
      "hits": [
        {
          "name": "dataflowkit-api.io",
          "ecosystem": "web",
          "trust_score": 0,
          "risk": "SUSPICIOUS",
          "flags": ["not_in_registry"]
        }
      ]
    }
  },
  "safe_payload": "Here is the API documentation for DataFlowKit..."
}

Notice: action_taken is "clean" — the response contained no prompt injection, no jailbreak, no credential leak. The threat pipeline saw nothing wrong. But package_scan.action is "flagged", and the hit surfaces with trust_score: 0 and flags: ["not_in_registry"].

Your application needs to check both fields. Here's the minimal integration pattern:

import httpx


def scrub_llm_output(llm_output: str):
    # serve_safe_payload and warn_user_about_unverified_domains are
    # application-defined helpers; they are not part of Sentinel.
    response = httpx.post(
        "https://sentinel.ircnet.us/v1/scrub",
        json={"content": llm_output, "tier": "standard"},
        headers={"X-Sentinel-Key": "sk_live_..."},
        timeout=10.0,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body

    result = response.json()
    security = result["security"]

    # Primary threat check: Sentinel already blocked or neutralized the content
    if security["action_taken"] in ("blocked", "neutralized"):
        return serve_safe_payload(result["safe_payload"])

    # Secondary: package/domain hallucination check
    pkg_scan = security.get("package_scan") or {}
    if pkg_scan.get("action") in ("flagged", "blocked"):
        hits = pkg_scan.get("hits", [])
        suspicious = [h for h in hits if h.get("risk") in ("SUSPICIOUS", "DANGEROUS")]
        if suspicious:
            # Don't serve the raw LLM output; surface a warning to the user
            return warn_user_about_unverified_domains(suspicious)

    return serve_safe_payload(result["safe_payload"])
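The integration pattern calls warn_user_about_unverified_domains, which your application has to supply. One possible shape is sketched below; the return structure and the message wording are assumptions for illustration, and the only fields read from each hit are the ones shown in the example response (name, risk, flags).

```python
def warn_user_about_unverified_domains(hits):
    """Build a warning payload instead of returning the raw LLM output."""
    names = ", ".join(sorted(h["name"] for h in hits))
    return {
        "status": "warning",
        "message": (
            "The assistant referenced domains or packages that could not be "
            f"verified: {names}. Confirm them independently before visiting "
            "or installing anything."
        ),
        # Keep the details so the UI or logs can show why each item was flagged.
        "unverified": [
            {"name": h["name"], "risk": h["risk"], "flags": h.get("flags", [])}
            for h in hits
        ],
    }
```

Returning structured data rather than a plain string leaves the UI free to decide whether to hide the answer entirely or show it with the flagged links disabled.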

Enable SlopScan in the dashboard: Settings → toggle "SlopScan Package Scanning" (Pro+ and above).

The Takeaway

Phantom squatting is a reminder that LLM output filtering isn't just about what the model was told to do — it's about what the model confidently got wrong.

Reputation-based defenses are blind to it. Network filters don't see it coming. And standard LLM output guardrails weren't built for it.

One thing you can do today: if you're serving LLM-generated content that might contain URLs, package names, or tool recommendations, turn on SlopScan. It checks LLM outputs against live registries before they reach your users. A freshly registered phantom domain with zero trust history gets flagged before your user clicks it — not three days later when it hits a blocklist.

The LLM doesn't know it invented the domain. Your infrastructure should.

Sentinel is an AI firewall for LLMs and agentic systems. Get started at sentinel-proxy.skyblue-soft.com — the Starter tier is free, no credit card required.