DEV Community

Moth

Originally published at mothasa.substack.com

More Than Half the Internet Is Now Bots

In 2024, for the first time in a decade of tracking, non-human traffic overtook human traffic on the internet: bots made up 51% of all web requests. In 2026, the number keeps climbing.

This isn't a rounding error. It's a structural shift. The internet was built for people. Now the majority of its visitors aren't.

The Numbers

Imperva's 2025 Bad Bot Report pegged total bot traffic at 51%, up from 49% in 2023 and 47% in 2022. The trend line doesn't bend. Good bots (search indexers, uptime monitors) account for about 14% of all traffic. Bad bots (credential stuffers, scrapers, DDoS tools) account for the other 37%.

Then there's the new category nobody planned for: AI crawlers.

AI-oriented bots made up 4.2% of all HTML page requests in 2025, per Cloudflare. That sounds small until you realize it represents 50 billion daily crawler requests. GPTBot traffic grew 305% year-over-year. PerplexityBot grew 157,490%. Googlebot — already the largest crawler on the internet at 50% market share — ramped up 96%.

By Q4 2025, publishers were seeing one AI bot visit for every 31 human visits. In Q1 of the same year, it was one per 200. That's roughly a 6.5x increase in nine months.
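The arithmetic behind that comparison is easy to check. A minimal sketch, assuming the only inputs are the two ratios quoted above (the function name is illustrative, not from any report):

```python
def bot_share_growth(humans_per_bot_before: float, humans_per_bot_after: float) -> float:
    """Return how many times more frequent bot visits became, relative to human visits.

    Each argument is the number of human visits seen per single AI bot visit.
    """
    if humans_per_bot_before <= 0 or humans_per_bot_after <= 0:
        raise ValueError("ratios must be positive")
    return humans_per_bot_before / humans_per_bot_after


# Q1 2025: one bot visit per 200 human visits; Q4 2025: one per 31.
growth = bot_share_growth(200, 31)
print(f"{growth:.2f}x")  # 6.45x
```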

What They're Doing

Two things: training and retrieval.

Training bots scrape content to feed into model weights. This is the familiar complaint — OpenAI, Meta, Google vacuuming up the web to build their next model. Training traffic actually decreased 15% between Q2 and Q4 2025 as the major labs shifted to synthetic data and licensed datasets.

But retrieval bots surged. RAG (retrieval-augmented generation) traffic jumped 33% in the same period. AI search indexers rose 59%. These bots don't train models — they fetch live content to answer queries in real time. When you ask ChatGPT about today's news, it sends a bot to a publisher's site, scrapes the answer, and serves it back. The user never visits the source.

OpenAI's ChatGPT-User averages five times more scrapes per page than the second-place bot (Meta). One real-world site logged 11.2 million requests from Meta's crawler alone in a single month — 57.3% of all bot traffic hitting that site.

The crawl-to-refer ratio tells the rest of the story. For news sites, bots scrape 33 pages for every one click they send back. For general content, it's 50,000 to one. They take everything and return almost nothing.
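A site owner could estimate the same ratio from their own logs. The sketch below assumes a hypothetical, already-parsed log format of `(source, kind)` pairs, where `kind` is either `"crawl"` or `"referral"`; real access logs would need their own parsing step first.

```python
from collections import Counter


def crawl_to_refer_ratios(records):
    """Compute pages crawled per referral click, grouped by source.

    `records` is an iterable of (source, kind) tuples (an assumed format).
    A source with crawls but zero referrals gets a ratio of infinity.
    """
    crawls = Counter()
    referrals = Counter()
    for source, kind in records:
        if kind == "crawl":
            crawls[source] += 1
        elif kind == "referral":
            referrals[source] += 1

    ratios = {}
    for source, crawl_count in crawls.items():
        clicks = referrals[source]
        ratios[source] = crawl_count / clicks if clicks else float("inf")
    return ratios


# Illustrative data: 33 crawled pages and a single click sent back.
sample = [("example-ai", "crawl")] * 33 + [("example-ai", "referral")]
print(crawl_to_refer_ratios(sample))  # {'example-ai': 33.0}
```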

Robots.txt Is Dead

Robots.txt was never a law. It's a convention: a polite request that bots respect your boundaries. The assumption was that major companies would comply because the reputational cost of ignoring it outweighed the value of the data.
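To see why it is only a request, look at where the check actually runs: on the crawler's side. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URL are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: disallow GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article"))       # False
print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/article"))  # True
```

Nothing on the server enforces the `False`. A crawler that never calls the equivalent of `can_fetch` simply fetches the page.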

That assumption cracked in 2025 and broke in 2026.

AI crawlers violated robots.txt on 72% of UK sites in Cloudflare's October 2025 tracking. Across all AI bots, 13.26% of requests ignored directives outright, up from 3.3% the previous quarter. Publishers responded with blocks of their own, in robots.txt and at the server level: 79% of top news sites now block AI training bots, 71% block retrieval bots, and 5.6 million websites have added GPTBot to their disallow list, a 70% increase in six months.
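Unlike a robots.txt entry, a server-level block refuses the request itself. A minimal sketch as WSGI middleware follows. The list of user-agent tokens is an assumption chosen for illustration, and the approach only catches bots that identify themselves honestly, since a user-agent header can be spoofed.

```python
# Assumed, illustrative list of crawler user-agent tokens to refuse.
BLOCKED_AGENT_TOKENS = ("GPTBot", "Bytespider", "PerplexityBot")


def is_blocked(user_agent: str, blocked=BLOCKED_AGENT_TOKENS) -> bool:
    """Case-insensitive substring match of the user agent against blocked tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocked)


def block_ai_crawlers(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # True
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64)"))       # False
```

Any WSGI application can be wrapped with `block_ai_crawlers(app)`; the same matching logic could equally live in a reverse proxy or CDN rule.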

Then OpenAI quietly removed the language from its documentation promising that ChatGPT-User would comply with robots.txt. The signal: your file is noted, but not necessarily obeyed.

ByteSpider, ByteDance's AI training crawler, accounted for 54% of the AI-enabled bot attacks identified in the Imperva report. It doesn't ask permission. It doesn't check robots.txt. It just takes.

Where the Humans Went

They didn't leave. They got intermediated.

37% of active AI users now start searches in AI platforms instead of Google. 62% of US adults use AI multiple times weekly. The queries still happen. The visits don't.

Publisher clickthrough rates from AI platforms dropped from 0.8% in Q2 2025 to 0.27% by Q4. Even licensed sites — the ones AI companies pay for access — saw clickthroughs collapse from 8.6% to 1.33%. The licensing deals that were supposed to compensate publishers for lost traffic aren't driving traffic either.

Human web traffic declined 5% in Q3-Q4 2025. Not because fewer people wanted information, but because the information came to them pre-chewed, stripped of its source, wrapped in a chatbot response.

What Comes Next

TollBit COO Olivia Joslin put it plainly: "It could be this year that we see AI visitors being the dominant visitors to publisher sites."

Ahrefs estimates that 74% of new web pages contain AI-generated content. If you combine that with majority-bot traffic, the internet is approaching a point where most new content is at least partly written by machines, most of the readers are machines, and the humans who created and consumed the original web are the minority in both directions.

This isn't a bug in the system. It's the system working as designed. AI companies built products that answer questions by scraping the web in real time. Users love them — they're faster, more convenient, and increasingly accurate. The incentive to visit a source directly erodes with every improved model.

For publishers, the economics are grim. Your content trains the model or feeds the RAG pipeline. The bot takes the value. You get a fraction of a percent clickthrough and a line in a licensing agreement that won't cover your hosting costs.

For everyone else, the question is simpler: what does an internet look like when it's built by and for machines? We're finding out.


Sources: Imperva 2025 Bad Bot Report, Cloudflare, TollBit, Ahrefs, The Register, Thunderbit



Top comments (1)

Ed

This information is, indeed, disturbing. Makes you wonder what the point is today in generating content on your website if no human eyes will take it in. We feed bots and apps, but no human minds anymore.