We Captured the Network Traffic of ChatGPT, Gemini, and DeepSeek to Find Out Where Their "Sources" Come From

#ai #webdev #reverseengineering #seo

When an AI assistant answers a question and shows a block of "sources," it looks like the same thing everywhere: a list of links the model relied on. In reality, each system implements that block differently, with its own transport, its own response format, and its own fields from which the interface reads citations. We captured the network traffic of three web clients (ChatGPT, Gemini, and DeepSeek) and, in parallel, ran an identical set of queries through each of them 10 times, to understand both the technical anatomy of a citation and what these systems actually cite.

Disclosure first: I'm the founder of RankCaster AI, a platform that manages brand visibility in AI answers. We study the category we operate in. To avoid grading our own homework, we excluded our own domain from every table before counting, and the limitations of the method are described in the full study. This article is the technical part: how citation actually works on the wire.

Why look at network traffic at all

The original question was a marketing one: if a site ranks in Google's or Bing's top-10, will it appear among the sources ChatGPT cites for the same query?

Short answer — almost never. Across 4 queries × 2 search engines × 3 AI systems (120 top-10 positions), we found 4 URL matches. That's 3.3%. The picture is bimodal: 8 of the 12 engine×AI pairs produced zero matches, and the remaining four produced exactly one URL each. All four matches were on Bing's side; Google's side had zero. ChatGPT had zero matches with either engine.

But to count matches correctly, you first have to understand what a "source" is in each system at the wire level. Otherwise you risk comparing entities that aren't comparable. That's how a marketing question turned into a devtools session across three platforms.
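Counting a "match" also requires deciding when two URL strings name the same page. The article doesn't spell out the study's exact normalization rules, so the sketch below assumes one plausible set: lowercase the host, strip a leading `www.`, and ignore scheme, query string, fragment, and trailing slash. `normalize_url` and `overlap` are hypothetical helper names, not part of any tool described here.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key (hypothetical rules, see above)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Ignore scheme, query string, and fragment: tracking parameters and
    # #anchors should not turn one page into two "sources".
    # The path keeps its case, since paths are case-sensitive.
    path = parts.path.rstrip("/") or "/"
    return host + path


def overlap(serp_urls: list[str], ai_urls: list[str]) -> set[str]:
    """Comparison keys present in both a search top-10 and an AI source list."""
    serp = {normalize_url(u) for u in serp_urls}
    ai = {normalize_url(u) for u in ai_urls}
    return serp & ai


serp_top10 = ["https://www.example.com/blog/geo/", "https://tool.example.org/"]
ai_sources = ["http://example.com/blog/geo?utm_source=chat", "https://arxiv.org/abs/2311.09735"]
print(overlap(serp_top10, ai_sources))
print(f"{4 / 120:.1%}")  # the headline rate: 4 matches out of 120 positions
```

Rules that keep more of the URL (the query string, for example) can only lower the match count, so the normalization should be fixed before anything is counted.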

Method in brief: 4 English-language B2B queries about AI-mention monitoring tools, each run 10 times per system (web search enabled, logged-out sessions, all measurements on a single day). Citation stability was measured with APR (Answer Presence Rate) — in how many runs out of ten a source made it into the answer. Sources with APR ≥ 20% went into the tables. At N=10, the confidence interval for any point is roughly ±15–20 percentage points, so we rely on the qualitative shape of the picture, not point values.
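APR itself is a simple proportion, and because N=10 is small, it helps to attach an interval to each value rather than read it as a point. The sketch below is an illustration, not the study's code: it computes APR from per-run presence flags, applies the ≥ 20% table threshold, and uses a 95% Wilson score interval (our choice; the article doesn't say which interval method it used).

```python
import math


def apr(presence: list[bool]) -> float:
    """Answer Presence Rate: share of runs in which a source appeared."""
    return sum(presence) / len(presence)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for k appearances in n runs (z=1.96 -> ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)


# One source across 10 runs: present in runs 1 and 3 only.
runs = [True, False, True, False, False, False, False, False, False, False]
rate = apr(runs)
kept = rate >= 0.20            # table inclusion threshold from the method
low, high = wilson_interval(sum(runs), len(runs))
```

Wilson is used here instead of the textbook normal approximation because the latter collapses to zero width at 0/10 and 10/10, which would make a 10/10 citation look perfectly certain. With either method, the interval width depends on the observed rate and is largest around 5/10.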

One more caveat: none of the three systems publishes its internal response schemas. Everything below is observation of public network exchange and JSON structure. Decodings of obfuscated field names are hypotheses built on client behavior, not official documentation. Endpoint and header names were recorded in our sessions and may differ across builds, regions, and accounts.

ChatGPT: a citation is bound to a text fragment

Transport. The web client sends JSON POSTs to endpoints like /conversation and receives the answer as a Server-Sent Events stream. Everything lives on chatgpt.com, with three path prefixes: /backend-api (primary), /backend-alt, and /backend-anon. The last one serves logged-out usage — but "logged out" does not mean "unidentified": every request still carries a device identifier and Cloudflare/Sentinel tokens that tie the exchange to a specific device and session. The mode hides who you are as a user; it does not make your client indistinguishable to the platform.
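The stream format itself is standard, even if the payloads are not documented. A minimal Python reader for Server-Sent Events following the WHATWG wire format (events separated by blank lines, multiple `data:` lines joined with newlines, lines starting with `:` treated as comments) looks like this. The `[DONE]` end marker is an assumption borrowed from OpenAI's public streaming API, not something recorded in our sessions, and the JSON payload shape is left open.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                 # blank line ends the current event
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):       # comment / keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):      # one leading space is not part of the value
            value = value[1:]
        if field == "data":
            buf.append(value)
    # Per the spec, an event not terminated by a blank line is discarded.


def iter_json_events(lines: Iterable[str]) -> Iterator[Any]:
    """Decode each event's data as JSON, stopping at an assumed [DONE] marker."""
    for data in iter_sse_data(lines):
        if data == "[DONE]":           # assumption: OpenAI-style end sentinel
            return
        try:
            yield json.loads(data)
        except json.JSONDecodeError:   # skip non-JSON payloads
            continue


stream = ['data: {"text": "Hel"}\n', "\n", ": keep-alive\n", "\n", "data: [DONE]\n", "\n"]
for event in iter_json_events(stream):
    print(event)
```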

Request flow. Before the main exchange, the client makes a preparatory request and receives a token (our working name: conduit_token). The message then goes to /conversation carrying two non-standard headers. Their behavior resembles the Sentinel family of mechanisms that other researchers have described: the preparatory request issues a session-bound proof of client work, and without it the main call fails.

In some sessions, the same preparatory request additionally demanded a Cloudflare Turnstile token — the anti-bot check arrived not as a separate challenge page but as a condition of the very request that issues conduit_token. Two defenses merged into one step.

Citations. Sources live in the annotations[] array, inside url_citation objects with url, title, start_ix, end_ix. The last two are offsets into the generated text — the boundaries of the answer fragment the source is attached to. By analogy with the public Responses API, where start_index/end_index are documented as UTF-16 code units, and since JavaScript strings are UTF-16 indexed, these offsets are almost certainly UTF-16 as well: text.slice(start_ix, end_ix) in a browser returns exactly the cited fragment. Practical consequence for anyone parsing this: emoji and some CJK characters occupy two units (surrogate pairs), and if you count bytes or code points, the citations drift.
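To make the offset pitfall concrete: Python strings index by code point, so slicing the answer text directly with `start_ix`/`end_ix` drifts as soon as an emoji precedes the citation. The sketch below assumes, as argued above, that the offsets are UTF-16 code units with an exclusive end, and that each annotation is a dict carrying the `url`, `title`, `start_ix`, and `end_ix` fields; the helper names and the sample text are hypothetical.

```python
def utf16_slice(text: str, start: int, end: int) -> str:
    """Slice text by UTF-16 code-unit offsets, like JavaScript's String.slice."""
    units = text.encode("utf-16-le")   # 2 bytes per code unit, no BOM
    return units[2 * start : 2 * end].decode("utf-16-le")


def cited_fragments(text: str, annotations: list[dict]) -> list[tuple[str, str]]:
    """Pair each url_citation-like annotation with the fragment it covers."""
    pairs = []
    for ann in annotations:
        if all(k in ann for k in ("url", "start_ix", "end_ix")):
            pairs.append((ann["url"], utf16_slice(text, ann["start_ix"], ann["end_ix"])))
    return pairs


answer = "🚀 GEO is new."   # the emoji is one code point but two UTF-16 units
annotations = [
    {"url": "https://arxiv.org/abs/2311.09735", "title": "GEO", "start_ix": 3, "end_ix": 6},
]
print(cited_fragments(answer, annotations))
```

`utf16_slice` raises `UnicodeDecodeError` if an offset lands between the two halves of a surrogate pair, which is a useful signal that the offsets are not UTF-16 after all.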

The key takeaway on ChatGPT: a source is attached to a specific fragment of the answer, not to the answer as a whole. For a brand to land in url_citation, the model must use content associated with it while generating that particular fragment.

What it cites. On the conceptual query "What is GEO?", ChatGPT cited the arXiv paper 2311.09735 (Princeton/Columbia — the paper that introduced the term GEO) in all 10 runs. APR 100% — more stable than any marketing blog in our sample. Plus Wikipedia and narrowly specialized blogs. Overlap with the SEO top-10: zero URLs.

Gemini: a source catalog in arrays with obfuscated field names

Transport. The answer arrives as a streaming response inside Wiz, Google's internal JavaScript framework that also powers Docs, Maps, and Photos. The batchexecute endpoint is the standard Wiz mechanism for batching remote calls. Base: https://gemini.google.com/_/BardChatUi/data/. For message sending we observed rpcid hNvQHb (rpcid values rotate between builds), and the server-side handlers belong to the BardFrontendService family.

Body format. The outer layer is application/x-www-form-urlencoded with two meaningful fields: f.req (the payload) and at=<SNlM0e> (the CSRF token). The f.req field holds a JSON envelope [null,"<envelope-string>"], and the string inside it is a JSPB/PBLite payload: a Protobuf message serialized as a JSON array, where each field is identified by its position rather than its name. There are no field names on the wire at all; the obfuscation lies precisely in the index layout. No public .proto definitions exist for this endpoint, so all field correspondences were derived empirically.
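The layers of encoding can be reproduced with the standard library alone. In this sketch the inner array's contents (a prompt at position 0, a conversation ID at position 2) are invented placeholders, since the article doesn't publish the observed layout, and `build_body`/`parse_body` are hypothetical names; only the outer shape (`f.req` holding `[null,"<envelope-string>"]`, plus `at`) follows the description above.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_body(inner: list, at_token: str) -> str:
    """Wrap a positional (JSPB-style) message the way the f.req field carries it."""
    # Layer 1: the message itself, a JSON array whose positions carry meaning.
    envelope_string = json.dumps(inner, separators=(",", ":"))
    # Layer 2: the outer envelope [null, "<envelope-string>"].
    f_req = json.dumps([None, envelope_string], separators=(",", ":"))
    # Layer 3: ordinary form encoding, alongside the CSRF token.
    return urlencode({"f.req": f_req, "at": at_token})


def parse_body(body: str) -> tuple[list, str]:
    """Undo the layers and return (inner message, at token)."""
    form = parse_qs(body)
    outer = json.loads(form["f.req"][0])
    inner = json.loads(outer[1])       # the envelope string is JSON inside JSON
    return inner, form["at"][0]


# Hypothetical layout: position 0 = prompt, position 2 = conversation ID.
message = [["What is GEO?"], None, ["c_123"]]
body = build_body(message, "PLACEHOLDER_AT_TOKEN")
decoded, at = parse_body(body)
prompt = decoded[0][0]  # access by index, since there are no field names
```

The double `json.loads` is the practical consequence of the envelope: a parser that decodes only once ends up holding a string, not an array.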

Citations. Sources are described by a set of short masked field names. Two layers need separating here: the presence of the fields in the stream is an observed fact; the meaning of each name is a hypothesis. Our working decodings:

sourceUrl — the source URL (the URL string itself is directly visible)
Mf — presumably the source title
SR — presumably a short summary
rs — presumably reliability_score, an internal domain-trust estimate
ls — presumably last_seen_date
y6 — presumably the quoted fragment itself
K1b — presumably a URL-validity flag
GK — a character range in the answer (functional analog of ChatGPT's start_ix/end_ix)
tM — merge type (values like MERGED appear on the wire)

The short names admit other plausible readings (is rs a ranking_signal? a retrieval_score?), and without Google's internal documentation you can't choose definitively. But the qualitative conclusion doesn't depend on decoding accuracy: alongside every source, Gemini ships a family of internal signals that appear to relate to authority. The existence of these fields matters more than the exact meaning of each abbreviation.
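One way to keep the observed/hypothesized split explicit in a parser is to carry the guesses in the key names themselves. The sketch assumes a source has already been surfaced as an object keyed by the short names from the list above (how the client maps positional JSPB fields to those names isn't covered here). The decoded names are our working hypotheses, marked with a trailing `?`, and unrecognized keys are kept aside rather than dropped.

```python
from dataclasses import dataclass
from typing import Any

# Working decodings from the list above. Only sourceUrl is directly readable;
# every other name is a hypothesis and is tagged with a trailing "?".
HYPOTHESES: dict[str, str] = {
    "Mf": "title",
    "SR": "summary",
    "rs": "reliability_score",
    "ls": "last_seen_date",
    "y6": "quoted_fragment",
    "K1b": "url_is_valid",
    "GK": "answer_char_range",
    "tM": "merge_type",
}


@dataclass
class GeminiSource:
    url: str
    guessed: dict[str, Any]   # hypothesized name -> raw value
    unknown: dict[str, Any]   # short names with no working decoding yet


def decode_source(raw: dict[str, Any]) -> GeminiSource:
    guessed: dict[str, Any] = {}
    unknown: dict[str, Any] = {}
    for key, value in raw.items():
        if key == "sourceUrl":
            continue
        if key in HYPOTHESES:
            guessed[HYPOTHESES[key] + "?"] = value
        else:
            unknown[key] = value
    return GeminiSource(url=raw["sourceUrl"], guessed=guessed, unknown=unknown)


# "Zq" is a made-up key standing in for any field not yet decoded.
src = decode_source({"sourceUrl": "https://example.com/post", "Mf": "Post title",
                     "tM": "MERGED", "Zq": 7})
```

Keeping the `?` in the key means a guess can't silently harden into a schema: code that reads `reliability_score?` makes its dependence on a hypothesis visible.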

What it cites. Gemini systematically surfaces large marketing and SaaS domains (Semrush, HubSpot, Zapier) and products from its own category; on one query, four different URLs from a single competitor domain made it into the top sources. A curious detail: across all runs, not a single Google property appeared among Gemini's top sources. And the Bing × Gemini pair accounted for a noticeable share of all matches with the SEO top-10.

DeepSeek: sources as an appendix to sub-queries

DeepSeek is the most transparent of the three: the web client returns a search_results[] array bound to the sub-queries into which the system decomposes your question. There is no offset arithmetic with surrogate pairs and there are no masked abbreviations, but its source selection is the most distinctive of the three.
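Because DeepSeek binds results to sub-queries, the natural unit of analysis is the sub-query rather than the whole answer. The field names below (`query` on each result, `url` for the link) are hypothetical stand-ins, since the article names only the `search_results[]` array; the sketch groups result URLs by sub-query and drops duplicates while keeping first-seen order.

```python
from collections import defaultdict


def sources_by_subquery(search_results: list[dict]) -> dict[str, list[str]]:
    """Group result URLs by the sub-query they were retrieved for.

    Assumes (hypothetically) that each result dict has a "url" key and a
    "query" key naming its sub-query; results without one go under "<none>".
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in search_results:
        urls = grouped[item.get("query", "<none>")]
        if item["url"] not in urls:  # de-duplicate, keep first-seen order
            urls.append(item["url"])
    return dict(grouped)


results = [
    {"query": "AI mention monitoring tools", "url": "https://a.example/x"},
    {"query": "GEO software pricing", "url": "https://b.example/y"},
    {"query": "AI mention monitoring tools", "url": "https://a.example/x"},
    {"query": "AI mention monitoring tools", "url": "https://c.example/z"},
]
by_subquery = sources_by_subquery(results)
```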

What it cites. DeepSeek leans on news outlets and press releases: TMCnet, MarketScreener, GlobeNewswire, and B2B news networks, i.e. the layer generated by press-release distribution services. It was the only system that consistently cited Chinese-language sources (BusinessNext, Alibaba Cloud). It also produced three of the most stable citations in our whole sample: one documentation subdomain and two tool sites, each cited in 10 of 10 runs, none of which appears in the SEO top-10 or in the other systems' answers.

Three systems — three different models of a "source"

| | ChatGPT | Gemini | DeepSeek |
|---|---|---|---|
| Transport | JSON + Server-Sent Events | Wiz / batchexecute, JSPB | JSON, `search_results[]` |
| Citation binding | to a text fragment (`start_ix`/`end_ix`) | to a range (`GK`) + field catalog | to a sub-query |
| Internal signals | not visible | field family (`rs`, `ls`, `K1b`…) | not visible |
| Favorite source types | academia, Wikipedia, niche blogs | large SaaS and marketing domains | press releases, news wires, docs |
| URL overlap with SEO top-10 | 0 | spot matches (Bing only) | spot matches (Bing only) |

Practical consequences:

"Optimize for Google and the AI will follow" does not work in the category we studied. 3.3% overlap, zero on Google's side. Each system selects sources by its own rules, which match neither SEO ranking nor each other.

For ChatGPT, what works is content the model wants to use in a specific fragment of its answer — fragment-level binding makes "general brand awareness" useless.

For Gemini, the fields suggest an internal domain evaluation exists. If the reliability_score hypothesis is right, a domain with a trust history beats an isolated lucky article.

For DeepSeek, the distribution channel is press releases and news wires — which SEO folks have long written off as a junk channel.

Limitations

Stated plainly: all 4 queries come from a single product category (our own, and we declare this sampling bias); we wrote the query phrasings ourselves rather than taking them from an external registry; N=10 runs gives ±15–20 pp per point; and the measurement is a single-day snapshot. All three web clients update constantly, so the specific field and endpoint names here are a snapshot, not canon. Don't transfer the conclusions automatically to other categories.

The full study — all tables for the 4 queries, methodology, source typology — is here: Source Overlap Between Search Engines and AI Recommendations.

If your devtools sessions show different field names, or you have data from other query categories — I'd love to compare notes in the comments.