<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tommy</title>
    <description>The latest articles on DEV Community by Tommy (@tommy2970).</description>
    <link>https://dev.to/tommy2970</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4002867%2F9e394501-bf03-4d43-983f-1f5e1f9c94a0.png</url>
      <title>DEV Community: Tommy</title>
      <link>https://dev.to/tommy2970</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tommy2970"/>
    <language>en</language>
    <item>
      <title>What actually visits a self-hosted website in 2026? Humans, AI crawlers, and 6,400 automated attacks</title>
      <dc:creator>Tommy</dc:creator>
      <pubDate>Thu, 25 Jun 2026 18:52:00 +0000</pubDate>
      <link>https://dev.to/tommy2970/what-actually-visits-a-self-hosted-website-in-2026humans-ai-crawlers-and-6400-automated-attacks-d6p</link>
      <guid>https://dev.to/tommy2970/what-actually-visits-a-self-hosted-website-in-2026humans-ai-crawlers-and-6400-automated-attacks-d6p</guid>
      <description>&lt;p&gt;I run a small self-hosted website on a Raspberry Pi 4B at home.&lt;br&gt;
A few weeks ago I started wondering: who actually visits a website in 2026?&lt;br&gt;
Not just humans. Everything.&lt;br&gt;
So I built a public observability dashboard on top of GoAccess that separates traffic into four categories: human visitors, search engine crawlers, AI retrieval agents, and automated attacks.&lt;br&gt;
The numbers from the last 17 days surprised me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4,523 human visits&lt;br&gt;
6,409 automated attack attempts&lt;br&gt;
Thousands of crawler requests from search engines and AI systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The attacks aren't sophisticated. They're mostly automated scanners probing for .env files, WordPress admin panels, and cloud credentials. These scanners hit every public IP on the internet regardless of what's actually running there.&lt;br&gt;
What I found more interesting was the AI agent behavior.&lt;br&gt;
AI retrieval agents (GPTBot, ClaudeBot, PerplexityBot, Amazonbot) behave differently from traditional search crawlers. They request machine-readable resources aggressively (llms.txt, sitemap.xml, JSON-LD structured data) and seem to index the knowledge graph structure of a site rather than individual pages. Within hours of my publishing new content, multiple AI crawlers had already visited, apparently triggered by the sitemap update rather than by any external link.&lt;br&gt;
A few observations I didn't expect:&lt;/p&gt;

&lt;p&gt;Combined machine traffic consistently exceeds human traffic&lt;br&gt;
AI agents discovered new content faster than Google did&lt;br&gt;
The semantic structure exposed by the site seems almost as important as the content itself&lt;br&gt;
Even a Pi on a residential ISP receives constant automated scans (about 377 attempts per day on average: 6,409 over 17 days)&lt;/p&gt;
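
&lt;p&gt;As a rough illustration of the four-way split described above, here is a minimal Python sketch that buckets a request by its path and user agent. It is not the dashboard's actual code (that sits on top of GoAccess). The function names, the search-crawler names, and the path patterns are assumptions made for this example, and anything unmatched is simply counted as human.&lt;/p&gt;

```python
import re
from collections import Counter

# AI agent names are the ones mentioned in the post.
AI_AGENTS = ("gptbot", "claudebot", "perplexitybot", "amazonbot")
# Assumed examples of traditional search crawlers (not listed in the post).
SEARCH_CRAWLERS = ("googlebot", "bingbot", "duckduckbot")

# Assumed patterns for the probes the post describes:
# .env files and WordPress admin or login pages.
ATTACK_PATH = re.compile(r"(^|/)\.env|/wp-admin|/wp-login\.php", re.IGNORECASE)


def classify(path, user_agent):
    """Return one of: attack, ai_agent, search_crawler, human."""
    ua = user_agent.lower()
    # Check the path first: a probe is an attack whatever it calls itself.
    if ATTACK_PATH.search(path):
        return "attack"
    if any(name in ua for name in AI_AGENTS):
        return "ai_agent"
    if any(name in ua for name in SEARCH_CRAWLERS):
        return "search_crawler"
    # Fallback: unmatched requests are counted as human in this sketch.
    return "human"


def tally(requests):
    """Count (path, user_agent) pairs per category."""
    return Counter(classify(path, ua) for path, ua in requests)
```

&lt;p&gt;Calling tally() on a list of (path, user agent) pairs parsed from an access log gives per-category counts. A real setup would likely need a longer pattern list and some handling for bots that spoof browser user agents, which this sketch does not attempt.&lt;/p&gt;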

&lt;p&gt;I made the dashboard public because I think the machine side of the web is underobserved.&lt;br&gt;
The modern web feels less like "users visiting pages" and more like a parallel ecosystem of crawlers, AI agents, and automated systems running continuously alongside human visitors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stats.lake8.dev/geo.html"&gt;stats.lake8.dev/geo.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two questions:&lt;br&gt;
Are others tracking AI agents separately from traditional search crawlers?&lt;br&gt;
Has anyone else noticed AI retrieval systems indexing semantic structure (JSON-LD, llms.txt) faster than they index page content?&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>webdev</category>
      <category>raspberrypi</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
