Jan-Willem Bobbink
LLM trackers are quietly breaking their users' own analytics

And nobody in SEO is talking about it yet

There is a measurement problem sitting at the centre of the AI visibility industry, and the tools sold to measure AI search are the ones causing it. The pollution is silent, it is structural, and it is showing up in exactly the dataset that GEO practitioners now rely on most.

This post lays out the mechanism, why it matters more than the rank tracker problem that came before it, and the simple product fix that would clear it up.

The mechanism

When you run daily prompt tracking across ChatGPT, Perplexity, Google AI Mode, Claude and the other LLM surfaces, the model sometimes decides it needs fresh information to answer the prompt. It fires off a retrieval-augmented generation request, often through a search index, and your pages get fetched. RAG systems work by injecting retrieved content into the LLM's context window before generation, anchoring the answer to external sources rather than relying purely on training data [1]. Not every tracked prompt triggers retrieval. Models cache, they reuse prior context, and they only ground when they judge it necessary. But when you are tracking hundreds or thousands of prompts per day across multiple engines, the cumulative volume of triggered fetches is significant.

Those fetches land in your server logs as bot hits. They inflate your crawl data. They dirty your content performance analysis. And they are caused by the tool you bought to measure your AI visibility in the first place.

The scale of legitimate AI crawler activity already makes this hard to disentangle. Cloudflare's 2025 Year in Review reports that AI "user action" crawling, the category that includes pages fetched in response to user prompts, grew more than 15 times across 2025 [2]. Botify's analysis of more than 7 billion log files found that OpenAI's combined crawl of the web tripled between August 2025 and March 2026, with OAI-SearchBot and GPTBot both at all-time highs [3]. Single Grain reported GPTBot traffic growing 305% between May 2024 and May 2025 [4]. Tracker-induced fetches are sitting on top of an already noisy baseline.

Why this matters more than it did two years ago

GA4 automatically excludes traffic from known bots and spiders, and according to Google's own documentation you cannot disable this filter or see how much was excluded [5]. If your AI visibility analysis lives in GA4, the pollution does not show up there in any obvious way, which is part of why the issue has stayed invisible.

But GA4 is not where the real SEO work (or GEO work, or whichever new abbreviation you prefer) is happening anymore. Log file analysis is. As Search Engine Land notes, while crawl tools like SEMrush or Screaming Frog simulate bot behaviour, log files capture what crawlers actually do in real time, including bots that GSC and GA4 will never report on [6]. That is the only honest record of what AI systems are doing on your site.

Tools like botsanalyser.com have made server log parsing accessible to any SEO without needing to set up a data pipeline from scratch. Practitioners are increasingly using logs to answer questions that GA4 cannot: which AI crawlers visit, how often, which pages they fetch, how deep they go, and how that behaviour correlates with citations and visibility in AI answers. Search Engine Land's recent coverage of log file analysis for AI crawlers explicitly frames logs as the closest substitute for the missing feedback loop in AI search, where impressions, clicks, and indexing data simply do not exist the way they do in traditional SEO [7].

This is the dataset where the signal lives for AI search optimisation. And this is exactly the dataset that LLM tracker traffic is contaminating.

A familiar pattern, with a worse outcome

This is not the first time the SEO industry has bought a measurement tool that quietly polluted its own data. Rank trackers did the same thing to Google Search Console for years. Every time a rank tracker checked position 37 for a keyword, Google counted an impression. The more keywords you tracked, the noisier your GSC impression data became.

The proof showed up clearly when Google stopped supporting the &num=100 parameter on 12 September 2025. Within days, GSC impressions dropped sharply across the industry, with some sites reporting declines of 20 to 50 percent. The "alligator effect" graphs that many SEOs had attributed to AI Overviews snapped shut almost overnight. Search Engine Land's analysis concluded that automated crawlers had been inflating impression counts, and that the post-change baseline reflected real user activity rather than scraper noise [8]. Smith Digital framed those vanished impressions as "ghost impressions" generated by machine activity that never represented a real human seeing a result [9].

Google itself later confirmed a separate logging error that had been over-reporting GSC impressions from 13 May 2025 onwards. The fix was rolled out in April 2026, almost a full year after the bug began [10]. Between rank tracker pollution and Google's own logging bug, GSC impression data was structurally unreliable for most of 2025.

The log file version of this same problem is worse for three reasons.

First, the noise is harder to identify. Rank tracker traffic in GSC was at least bundled into a single metric you could mentally discount. LLM tracker traffic in your logs arrives with rotating user agents, sometimes through Bing's infrastructure, sometimes through Google, sometimes direct from OpenAI or Anthropic. Seer Interactive has documented how stealth AI crawling, where bots reappear under generic browser headers and unrelated IPs, makes traditional bot detection unreliable [11]. There is no clean way to label this traffic after the fact.

Second, the noise is harder to filter. You cannot simply exclude a known IP range or user agent string. The same fetches that come from real LLM grounding for real user prompts arrive through the same infrastructure as the fetches caused by your tracker. They are mechanically identical from the server's perspective. Passion Digital flagged this exact problem when it noted that misidentifying bot traffic is one of the most common errors in LLM bot tracking, particularly because not all bots clearly identify themselves and user agent strings can be spoofed [12].

Third, the dataset is being used to drive decisions, not just reporting. Log data is feeding content prioritisation, internal linking strategy, technical SEO fixes for AI crawlers, and conversations with leadership about which AI surfaces are sending qualified bot traffic. Every one of those decisions is being made on top of polluted data.

What "polluted" actually looks like

Imagine a mid-sized site running daily prompt tracking on 500 prompts across five LLM engines. Even if only a fraction of those prompt executions trigger retrieval, you are looking at potentially hundreds of additional fetches per day attributable to the tracker, on top of organic AI crawler activity.
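As a back-of-envelope sketch of that arithmetic (the retrieval rate here is an illustrative assumption, not a measured figure):

```python
def tracker_induced_fetches(prompts_per_day, engines, retrieval_rate,
                            urls_per_retrieval=1):
    """Estimate daily page fetches attributable to a prompt tracker.

    retrieval_rate: fraction of prompt executions that trigger RAG
    grounding (illustrative assumption; real rates vary by engine
    and prompt). urls_per_retrieval: pages fetched per grounding call.
    """
    executions = prompts_per_day * engines
    return executions * retrieval_rate * urls_per_retrieval

# 500 prompts across 5 engines, assuming 1 in 5 executions grounds:
print(tracker_induced_fetches(500, 5, 0.2))  # 500.0 extra fetches per day
```

Even at a conservative grounding rate, the tracker alone adds hundreds of fetches a day, before counting multi-URL retrievals.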

Those fetches will tend to cluster around the pages your tracker considers most relevant to the prompts you set up, which are the same pages you are trying to evaluate. So the pollution is not evenly distributed. It is concentrated on exactly the URLs you most want clean data for.

The result is that pages with strong tracked-prompt coverage look healthier in your log analysis than they actually are, and pages outside your tracked prompt set look quieter than they actually are. The measurement is structurally biased toward the prompts you chose to monitor.

This bias matters more given how skewed the underlying crawl-to-referral economics already are. Cloudflare's crawl-to-refer ratio metric, which compares how much a platform crawls versus how much referral traffic it sends back, showed Anthropic peaking at roughly 500,000 to 1 and OpenAI peaking at around 3,700 to 1 during 2025 [13]. Practitioners are already trying to read meaningful signal out of fetch volumes that dwarf any human traffic those platforms send back. Adding tracker noise on top of that makes the signal even harder to extract.

The fix is a product decision, but it requires more than one party

There is a path out of this for vendors who care about giving practitioners clean data, and it is worth being precise about who needs to do what. The architectural reality is that the page fetches that pollute server logs are not made by the tracker itself in the most common case. They are made by the LLM provider's own crawler in response to a prompt the tracker submitted to the API. The tracker can attach any header it likes to its API call, but that header does not propagate down into the RAG fetch the LLM subsequently fires off. So a header-only fix solves the wrong half of the problem.

A clean solution stacks three mechanisms.

The first is scheduling. Trackers should run prompts in a declared time window, ideally outside peak hours for their target audience, and publish that schedule. Practitioners can then filter logs by removing crawler hits during the declared window. This works without any LLM cooperation at all and is the easiest mitigation to deploy. It is not perfect because real users prompt at all hours, but it produces a meaningful baseline correction at very low cost.
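A minimal sketch of that baseline correction, assuming a vendor published a 02:00 to 04:00 UTC run window (the window, user-agent tokens, and log-entry shape here are all illustrative):

```python
from datetime import datetime, time

# Illustrative declared run window; a real vendor would publish their own.
TRACKER_WINDOW = (time(2, 0), time(4, 0))
AI_CRAWLER_TOKENS = ("OAI-SearchBot", "GPTBot", "ClaudeBot", "PerplexityBot")

def is_window_hit(entry):
    """True for AI-crawler hits inside the declared tracker window."""
    in_window = TRACKER_WINDOW[0] <= entry["ts"].time() < TRACKER_WINDOW[1]
    is_ai = any(tok in entry["ua"] for tok in AI_CRAWLER_TOKENS)
    return in_window and is_ai

def baseline_filter(entries):
    """Drop crawler hits that fall inside the tracker's declared window.

    Imperfect by design: real users prompt at all hours, so some
    organic grounding fetches get discarded along with tracker noise.
    """
    return [e for e in entries if not is_window_hit(e)]
```

The filter deliberately over-removes: it trades a known slice of organic grounding traffic for a baseline free of tracker noise.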

The second is the tracker activity feed. LLM tracking platforms know exactly when each prompt was executed, against which engine, and in many cases they can infer or directly observe whether a retrieval call was triggered. A timestamped export of that activity, ideally as an API endpoint, with at minimum the timestamp, engine, prompt identifier, and where possible the URLs that were fetched as part of grounding, lets practitioners reconcile log entries against tracker activity with more precision than the time window alone allows.
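If such a feed existed, the reconciliation could look roughly like this; the export fields are the ones proposed above, and the matching tolerance is an assumption about clock skew, not a spec:

```python
from datetime import datetime, timedelta

TOLERANCE = timedelta(minutes=5)  # assumed clock skew between tracker and server

def attribute_hits(log_entries, tracker_events):
    """Split crawler hits into tracker-attributed vs organic.

    log_entries:    dicts with "ts" (datetime) and "url".
    tracker_events: dicts with "ts", "engine", "prompt_id", "url" --
                    the minimum fields a vendor activity feed would carry.
    A hit is attributed when some tracker event fetched the same URL
    within TOLERANCE of the hit's timestamp.
    """
    attributed, organic = [], []
    for hit in log_entries:
        matched = any(
            ev["url"] == hit["url"] and abs(ev["ts"] - hit["ts"]) <= TOLERANCE
            for ev in tracker_events
        )
        (attributed if matched else organic).append(hit)
    return attributed, organic
```

The URL-plus-timestamp join is what the time window alone cannot give you: it attributes individual hits rather than discarding whole hours.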

The third is LLM cooperation, and this is the one that closes the loop. When a tracker calls the API, the LLM provider should mark the resulting RAG crawler fetches in a way that downstream log analysis can identify. This could be a custom User-Agent suffix on the OAI-SearchBot or ClaudeBot or PerplexityBot request, an extra HTTP header passed through from the originating API call, or a published list of IP ranges used specifically for API-originated retrieval. Without this, no amount of tracker discipline cleans up the actual problematic fetches, because the tracker is not the party making them.
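No provider ships such a marker today, so the token below is purely hypothetical, but it shows how cheaply the downstream side would work: filtering would reduce to a string check in log analysis.

```python
# Hypothetical marker; no LLM provider currently emits anything like it.
API_ORIGIN_TOKEN = "origin=api"

def classify_crawler_hit(user_agent):
    """Classify an AI-crawler hit under a hypothetical provider marker.

    If providers appended a token to RAG fetches triggered by API calls
    (e.g. "OAI-SearchBot/1.0; origin=api"), log analysis could separate
    API-driven grounding from end-user grounding in a single pass.
    """
    if API_ORIGIN_TOKEN in user_agent:
        return "api-originated"
    return "user-or-unknown"
```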

LLM tracker solution diagram

The combination is what gets you clean data. The time window is the easy first cut, the activity feed is the cross-reference, and LLM cooperation is what makes the filtering precise. None of the three is technically difficult. All three are product decisions about whether to give practitioners visibility into a measurement system that currently obscures itself.

The harder question is whether LLM providers will cooperate. They have less commercial incentive than tracker vendors do, and they are the bottleneck on the cleanest part of the fix.

Why no vendor has shipped this yet

Probably because the easy parts make the product look smaller and the hard part requires someone else's cooperation. A tracker that shows you "your site was fetched 40,000 times by AI crawlers last month" reads differently than a tracker that shows you "your site was fetched 40,000 times, of which 12,000 were caused by us, leaving 28,000 organic AI crawler hits." The honest version is more useful. It is also less impressive on a dashboard.

There is also a competitive logic. The first tracker vendor to publish a schedule and an activity feed effectively concedes that their measurement creates noise. No vendor wants to be the first to admit that, even though every vendor in the category has the same problem. And LLM providers, who hold the cleanest part of the fix, have even less incentive: they get the value of training data and answer grounding from those crawls, and the cost of the noise lands on publishers and SEO practitioners, not on them.

What practitioners can do in the meantime

Until vendors ship a clean activity feed, there are partial mitigations worth considering. Time-of-day patterns can sometimes isolate tracker traffic if your tracker runs on a fixed schedule. Cross-referencing fetch volumes against your tracked prompt list can flag suspiciously consistent crawl patterns on covered URLs. Comparing log data from before and after enabling a tracker, on the same site, gives you a rough estimate of the baseline shift.
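The before-and-after comparison is the crudest of the three. A sketch, assuming organic AI crawl volume held roughly steady across the two windows, which is a strong assumption given how fast AI crawling is growing:

```python
def estimated_tracker_share(before_daily, after_daily):
    """Rough tracker-induced share of current AI-crawler hits.

    before_daily / after_daily: daily AI-crawler hit counts for the
    same site, pre- and post-enabling the tracker. Assumes organic
    volume held steady between the windows, which the growth figures
    above suggest it rarely does -- treat the result as a hint at the
    baseline shift, not a measurement.
    """
    before_avg = sum(before_daily) / len(before_daily)
    after_avg = sum(after_daily) / len(after_daily)
    shift = max(after_avg - before_avg, 0.0)
    return shift / after_avg if after_avg else 0.0
```

If daily hits averaged 100 before and 150 after, the sketch attributes about a third of current crawler traffic to the tracker, with all the caveats above.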

None of these are substitutes for vendor-provided data. They are workarounds for a problem that should not be the customer's to solve.

The bigger point

Every AI visibility report being published right now, including the ones used to set strategy at large brands, is sitting on top of log data that has been contaminated by the measurement tools themselves. The industry is making decisions on a dataset it has not properly cleaned, because the only people who can clean it have a commercial reason not to.

That is the real story. Not that LLM trackers are bad, they are useful, but that the standard practice of evaluating AI search performance from log data is currently broken in a way that vendors could fix tomorrow and have chosen not to. The easy parts sit with tracker vendors. The hardest and most important part sits with the LLM providers themselves.

The first tracker vendor to publish a schedule and an activity feed will, briefly, look like the one with the noisier product. They will also be the only one giving practitioners data they can actually trust at the tracker layer. The first LLM provider to mark API-originated RAG fetches will give the entire industry the missing piece. None of this is hard. All of it is overdue.

Not going to start the debate about involving LLMs themselves :)

Sources

[1] Firecrawl, "What is RAG grounding?" (accessed 29-04-2026), Firecrawl Glossary.
https://www.firecrawl.dev/glossary/web-search-apis/rag-grounding

[2] David Belson, "The 2025 Cloudflare Radar Year in Review: The rise of AI, post-quantum, and record-breaking DDoS attacks" (29-01-2026), Cloudflare Blog.
https://blog.cloudflare.com/radar-2025-year-in-review/

[3] Chris Long, "OpenAI Has Tripled Their Crawl of the Web: An Analysis of 7B+ Log Files" (23-04-2026), Botify Blog.
https://www.botify.com/blog/openai-tripled-web-crawl

[4] Single Grain, "Log File Analysis for Understanding AI Crawling Behavior" (28-12-2025), Single Grain Blog.
https://www.singlegrain.com/blog-posts/analytics/log-file-analysis-for-understanding-ai-crawling-behavior/

[5] Google, "[GA4] Filter incoming data: Known bot-traffic exclusion" (accessed 29-04-2026), Google Analytics Help.
https://support.google.com/analytics/answer/9888366

[6] Search Engine Land, "Log file analysis for SEO: Find crawl issues & fix them fast" (27-11-2025), Search Engine Land.
https://searchengineland.com/guide/log-file-analysis

[7] Lauren Busby, "Why log file analysis matters for AI crawlers and search visibility" (16-04-2026), Search Engine Land.
https://searchengineland.com/log-file-analysis-ai-crawlers-search-visibility-474428

[8] Search Engine Land, "Why Google Search Console impressions fell (and why that's good)" (23-10-2025), Search Engine Land.
https://searchengineland.com/why-google-search-console-impressions-dropped-interpret-data-463677

[9] Smith Digital, "Why Google Search Console Impressions Dropped in Sept 2025" (16-12-2025), Smith Digital Blog.
https://smithdigital.io/blog/google-search-console-impression-drop-sept-2025

[10] Danny Goodwin, "Google is fixing a Search Console bug that inflated impression counts" (03-04-2026), Search Engine Land.
https://searchengineland.com/google-search-console-bug-inflated-impression-counts-473530

[11] Seer Interactive, "Perplexity, Stealth AI Crawling, and the Impacts on GEO and Log File Analysis" (30-10-2025), Seer Interactive Insights.
https://www.seerinteractive.com/insights/perplexity-stealth-ai-crawling-and-the-impacts-on-geo-and-log-file-analysis

[12] Passion Digital, "Tracking LLMs Bots on Your Site using Log File Analysis" (15-07-2025), Passion Digital Blog.
https://passion.digital/blog/tracking-llms-bots-on-your-site-using-log-file-analysis/

[13] Cloudflare, "The crawl before the fall... of referrals: understanding AI's impact on content providers" (01-07-2025), Cloudflare Blog.
https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/
