Andre 'Dezo' Queiroz

Posted on Jun 10

I parsed my own firewall logs and found which AI tools my org was really talking to — including one routing data to China

#ai #security #devops #typescript

I pointed a scanner I wrote at my own network traffic for one afternoon. It came back with eight AI services I'd never sanctioned, running quietly in the background. One of them was DeepSeek, which routes data to servers in China.

No alert fired. No DLP rule tripped. Nothing in the stack had flagged any of it — because the traffic looked exactly like what it was: ordinary HTTPS to legitimate-looking domains.

That gap, between "we have an AI policy" and "we can prove what's actually running", is the whole problem with Shadow AI. Below: how I closed part of it using logs I already had, the one matching detail most detectors get wrong, and the blind spot I won't pretend my tool doesn't have.

Why your DLP probably can't see this

The uncomfortable consensus in 2026 security circles: pattern-based DLP and CASB are structurally blind to Shadow AI. An employee pasting Q3 financials into a chatbot exposes sensitive data as a natural-language conversation over HTTPS to a legitimate host. There's no signature to match, no file to quarantine, no rule to trip.

And the cost isn't theoretical. From IBM's Cost of a Data Breach 2025:

Breaches involving Shadow AI cost ~$670,000 more on average.
20% of organizations reported a breach because of Shadow AI.
97% of orgs breached through AI lacked proper AI access controls.
63% had no AI governance policy at all.

Gartner puts a date on it: by 2030, more than 40% of organizations will hit a security or compliance incident caused by unauthorized AI.

So the problem isn't awareness anymore. It's instrumentation. You can't govern what you can't enumerate — and most teams genuinely cannot list which AI tools touch their network.

What you can see: egress logs

Here's the part that's easy to miss: you don't need to MITM anyone's TLS to know which domains a machine talked to. Your firewall, proxy, and DNS logs typically already record the destination host for each connection, whether as the DNS query name, the proxy's CONNECT target, or the TLS server name (SNI) a firewall logs. That's enough to build an inventory: which AI services, used by how many people, how often.
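To make the adapter step concrete, here is a minimal sketch that turns one line of a Squid proxy log into the canonical event shape. It assumes Squid's default native `access.log` format; `parseSquidLine` and `extractHost` are hypothetical names for illustration, not the scanner's source.

```typescript
type CanonicalEvent = {
  timestamp: string
  actor: string        // user or source IP
  destinationDomain: string
}

// Hypothetical adapter for one line of Squid's native access.log format:
// time elapsed client code/status bytes method URL user hierarchy/peer type
function parseSquidLine(line: string): CanonicalEvent | null {
  const fields = line.trim().split(/\s+/)
  if (fields.length < 10) return null // not a native-format line

  const [epoch, , clientIp, , , method, url, user] = fields
  const seconds = Number(epoch)
  if (!Number.isFinite(seconds)) return null

  const host = extractHost(method, url)
  if (host === null) return null

  return {
    // Squid logs Unix seconds with millisecond decimals; round to whole milliseconds.
    timestamp: new Date(Math.round(seconds * 1000)).toISOString(),
    // Use the logged user name when present; Squid writes "-" when there is none.
    actor: user !== "-" ? user : clientIp,
    destinationDomain: host.toLowerCase(),
  }
}

// HTTPS through a proxy shows up as CONNECT "host:port"; plain HTTP carries a full URL.
function extractHost(method: string, url: string): string | null {
  if (method === "CONNECT") {
    const colon = url.lastIndexOf(":")
    return colon > 0 ? url.slice(0, colon) : url
  }
  try {
    return new URL(url).hostname
  } catch {
    return null
  }
}
```

Other sources (firewall syslog, resolver query logs) would each get their own small adapter returning the same shape, so nothing downstream cares which vendor produced the line.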

The shape of the thing is a pipeline of pure functions (simplified, illustrative — this isn't the real source):

```typescript
// adapters parse vendor log formats into one canonical shape
type CanonicalEvent = {
  timestamp: string
  actor: string        // user or source IP
  destinationDomain: string
}

// classify against a catalog of known AI providers
type Finding = {
  tool: string         // "deepseek", "openai-chatgpt", ...
  riskTier: "high" | "medium" | "low"
  users: number
  events: number
}

const inventory = render(
  attachCompliance(
    aggregate(
      classify(events, catalog) // events: CanonicalEvent[]
    )
  )
)
```

No cloud, no telemetry, the log file never leaves the host. The interesting parts aren't the plumbing — they're two decisions that separate a credible inventory from a wall of noise.
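For a concrete picture of the middle of that pipeline, here is a minimal sketch of the `classify` and `aggregate` stages. It is not the scanner's source: the `CatalogEntry` shape and the `hostMatches` helper are assumptions for illustration. To stay dependency-free it matches hosts on whole DNS labels instead of calling a Public Suffix List library, which agrees with the eTLD+1 lookup described in the next section for ordinary registrable domains.

```typescript
type CanonicalEvent = { timestamp: string; actor: string; destinationDomain: string }
type RiskTier = "high" | "medium" | "low"
type Finding = { tool: string; riskTier: RiskTier; users: number; events: number }

// Hypothetical catalog entry: a tool, its risk tier, and every domain it is known to use.
type CatalogEntry = { tool: string; riskTier: RiskTier; domains: string[] }
type Match = { entry: CatalogEntry; event: CanonicalEvent }

// Match only on whole DNS labels, so "openai.com.phishy.ru" and
// "notopenai.com" never match a catalog domain of "openai.com".
function hostMatches(host: string, domain: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, "")
  return h === domain || h.endsWith("." + domain)
}

// classify: keep only events whose destination belongs to a cataloged AI tool.
function classify(events: CanonicalEvent[], catalog: CatalogEntry[]): Match[] {
  const matches: Match[] = []
  for (const event of events) {
    const entry = catalog.find((c) => c.domains.some((d) => hostMatches(event.destinationDomain, d)))
    if (entry !== undefined) matches.push({ entry, event })
  }
  return matches
}

// aggregate: one Finding per tool, counting distinct actors and total events.
function aggregate(matches: Match[]): Finding[] {
  const byTool = new Map<string, { entry: CatalogEntry; actors: Set<string>; events: number }>()
  for (const { entry, event } of matches) {
    const bucket = byTool.get(entry.tool) ?? { entry, actors: new Set<string>(), events: 0 }
    bucket.actors.add(event.actor)
    bucket.events += 1
    byTool.set(entry.tool, bucket)
  }
  return Array.from(byTool.values()).map((b) => ({
    tool: b.entry.tool,
    riskTier: b.entry.riskTier,
    users: b.actors.size,
    events: b.events,
  }))
}
```

Because each stage is a pure function over plain data, `aggregate(classify(events, catalog))` can be tested with a handful of hand-written events, no network or log file required.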

The one matching detail that breaks naive detectors: eTLD+1

The naive way to classify a domain is a substring or regex match: does the hostname contain openai.com? It fails in both directions.

False positives: openai.com.phishy.ru and notopenai.com both contain the string openai.com.
False negatives: real OpenAI traffic also hits chatgpt.com, cdn.oaistatic.com, and oai.azure.com, none of which contain openai.com. A check tuned to one string misses the rest.

The fix is to resolve every hostname down to its registrable domain (the eTLD+1) using the Public Suffix List, then match that against a catalog that lists every registrable domain each provider uses:

```typescript
import { getDomain } from "tldts"

getDomain("chat.openai.com")        // "openai.com"     matches
getDomain("api.openai.com")         // "openai.com"     matches
getDomain("openai.com.phishy.ru")   // "phishy.ru"      does NOT match
getDomain("notopenai.com")          // "notopenai.com"  does NOT match
```

chat. and api. collapse to the same registrable domain; the lookalike collapses to someone else's. It's boring, and it's the difference between an inventory an auditor trusts and one they laugh out of the room. No fuzzy matching, no regex zoo — just the PSL doing exactly one honest job.
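To show what the Public Suffix List lookup does underneath `getDomain`, here is a dependency-free sketch using a toy, hand-picked suffix set. It illustrates the eTLD+1 rule only and is not a replacement for tldts: the real list has thousands of entries, and this sketch skips its wildcard and exception rules. The `CATALOG` contents and the `registrableDomain` / `classifyHost` names are hypothetical.

```typescript
// Toy subset of the Public Suffix List, for illustration only. The real list
// has ICANN and private entries plus wildcard ("*.ck") and exception ("!www.ck")
// rules, which this sketch ignores.
const TOY_PUBLIC_SUFFIXES = new Set(["com", "ru", "uk", "co.uk", "github.io"])

// eTLD+1: the longest matching public suffix plus one more label to its left.
function registrableDomain(hostname: string): string | null {
  const labels = hostname.toLowerCase().replace(/\.$/, "").split(".")
  // Scanning from the left means the first hit is the longest matching suffix.
  for (let i = 0; i < labels.length; i++) {
    if (TOY_PUBLIC_SUFFIXES.has(labels.slice(i).join("."))) {
      // A bare public suffix (e.g. "co.uk") has no registrable domain.
      return i === 0 ? null : labels.slice(i - 1).join(".")
    }
  }
  // No rule matched: the PSL's default "*" rule makes the last label the suffix.
  return labels.length >= 2 ? labels.slice(-2).join(".") : null
}

// Hypothetical catalog keyed by registrable domain. Listing every domain a
// provider uses is what closes the false-negative gap.
const CATALOG = new Map<string, string>([
  ["openai.com", "openai-chatgpt"],
  ["chatgpt.com", "openai-chatgpt"],
  ["oaistatic.com", "openai-chatgpt"],
  ["deepseek.com", "deepseek"],
])

// Exact lookup on the registrable domain: no substrings, no regex.
function classifyHost(hostname: string): string | null {
  const domain = registrableDomain(hostname)
  return domain === null ? null : CATALOG.get(domain) ?? null
}
```

In practice you'd keep tldts for the resolution step; the point of the sketch is that matching becomes a plain map lookup once every host is reduced to its registrable domain.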

Never report a compliance fact you didn't observe

Once the scanner knows DeepSeek is in use, the temptation is to dump its entire catalog profile: no BAA, trains on input, data stored in China.

I don't let it. The rule I enforce is narrow on purpose: the scanner only asserts a compliance fact scoped to the exact tier of hostname it actually saw. If it observed a consumer host, it reports the consumer posture — it will not invent an enterprise claim it never witnessed, and it will print unknown long before it prints compliant.

```typescript
// scoped to what was observed — never inferred upward
attachCompliance(finding, observedHost)
//  saw chatgpt.com (consumer)  -> report consumer posture
//  never assert an enterprise "no-BAA" fact you didn't see
//  "unknown" is a valid answer; "compliant" is not a guess
```

This sounds pedantic until you remember IBM's 97%: the access-control gap is real, but a tool that over-claims to dramatize it is just FUD — and security people smell FUD instantly. A scanner that under-claims on purpose is the one you can hand to an auditor without flinching.

It's also why the "data stored in China" line for DeepSeek in my report isn't my editorializing: it's pulled from DeepSeek's own published privacy policy, with the source attached to the finding. In Germany, Berlin's data protection commissioner reached the same conclusion and in June 2025 declared the transfer unlawful.
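Here is one way the "scoped to what you observed" rule could be encoded, as a minimal sketch under assumptions: the `ToolProfile` shape, the tier names, and `scopeComplianceFacts` are hypothetical, and the example hosts and claims are placeholders, not real vendor facts. The design choice worth copying is that the output type has no "compliant" value at all, and facts are looked up only for the exact tier of the host that was seen.

```typescript
type Tier = "consumer" | "enterprise" | "api"

// Hypothetical catalog shape: compliance facts are stored per hostname tier,
// and every fact carries the public source it was taken from.
type ComplianceFact = { claim: string; source: string }
type ToolProfile = {
  tierByHost: ReadonlyMap<string, Tier>
  factsByTier: Partial<Record<Tier, ComplianceFact[]>>
}

// Deliberately no "compliant" status: the sketch can only say what is documented.
type ScopedCompliance = {
  observedHost: string
  tier: Tier | "unrecognized"
  facts: ComplianceFact[]
  status: "documented" | "unknown"
}

function scopeComplianceFacts(profile: ToolProfile, observedHost: string): ScopedCompliance {
  const host = observedHost.toLowerCase()
  const tier = profile.tierByHost.get(host)
  if (tier === undefined) {
    // A host the catalog can't place gets no facts, not the "closest" tier's facts.
    return { observedHost: host, tier: "unrecognized", facts: [], status: "unknown" }
  }
  // Only facts for the exact tier observed; never borrow another tier's facts.
  const facts = profile.factsByTier[tier] ?? []
  return { observedHost: host, tier, facts, status: facts.length > 0 ? "documented" : "unknown" }
}

// Example profile with placeholder hosts and claims (not real vendor facts).
const EXAMPLE_PROFILE: ToolProfile = {
  tierByHost: new Map<string, Tier>([
    ["chat.example.com", "consumer"],
    ["api.example.com", "api"],
    ["enterprise.example.com", "enterprise"],
  ]),
  factsByTier: {
    consumer: [{ claim: "placeholder consumer-tier claim", source: "https://example.com/consumer-terms" }],
    enterprise: [{ claim: "placeholder enterprise-tier claim", source: "https://example.com/enterprise-terms" }],
  },
}
```

Seeing `chat.example.com` yields only the consumer-tier fact with its source; seeing `api.example.com`, which has no documented facts, yields "unknown" rather than inheriting the enterprise entry.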

The blind spot I won't pretend isn't there

Network-log detection has a hard boundary, and the vendor blogs that promise "100% discovery" quietly skip it: if someone runs a local model — Ollama on their laptop talking to localhost — there is no egress, no domain, nothing for my scanner to see. And local models aren't a corner case: researchers found 175,000 exposed Ollama servers across 130 countries in early 2026.

Two more honest limits:

DoH/DoT erode DNS visibility — if resolution is encrypted to a third party, your DNS logs go dark.
Seeing payloads (not just destinations) needs TLS inspection — which is a MITM with its own failure modes that plenty of teams refuse, for good reasons.
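One cheap, partial signal for the DoH gap, sketched here under stated assumptions: even when DNS logs go dark, the proxy or firewall usually still logs the HTTPS connection to the resolver itself. Counting connections to well-known public DoH hosts tells you which machines your DNS logs are probably missing. The host list and `dohActors` are illustrative assumptions, the list is incomplete, and this does nothing for DoT (port 853) or for resolvers not on the list.

```typescript
type CanonicalEvent = { timestamp: string; actor: string; destinationDomain: string }

// Illustrative, deliberately incomplete list of public DNS-over-HTTPS hosts.
// Treat it as an assumption to maintain, not an authoritative inventory.
const KNOWN_DOH_HOSTS = new Set([
  "dns.google",
  "cloudflare-dns.com",
  "mozilla.cloudflare-dns.com",
  "dns.quad9.net",
])

// Per actor, how many connections went to a known DoH resolver. A non-zero
// count marks a machine whose lookups your resolver logs likely never saw.
function dohActors(events: CanonicalEvent[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const e of events) {
    const host = e.destinationDomain.toLowerCase().replace(/\.$/, "")
    if (KNOWN_DOH_HOSTS.has(host)) {
      counts.set(e.actor, (counts.get(e.actor) ?? 0) + 1)
    }
  }
  return counts
}
```

This doesn't restore visibility; it only tells you where your DNS-based view has holes, which is the honest version of the claim.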

So this is one layer. It catches the cloud-egress cases — which is most Shadow AI today — and it's honest about the rest. Anyone selling you total coverage is selling you the exact thing security folks trust least.

The takeaway

Shadow AI isn't dangerous because the tools exist. It's dangerous because when someone asks "what's running, who approved it, and where does the data go?", most orgs can't answer. The fix isn't another block list — it's enumeration you can defend.

You can build the cloud-egress half of that today, offline, from logs you already collect. The hard parts aren't the pipeline — they're the matching (resolve to eTLD+1, refuse fuzzy guesses) and the honesty (report only what you observed). Get those two right and you have an inventory worth handing to an auditor.

If someone asked you for that inventory this afternoon, how long would it take you to produce it?

I'm André Queiroz, building dezotech — tooling that helps regulated teams prove their AI is governed. The scanner above runs fully offline; the findings in this post are from a real run on my own traffic.
