<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kishan Savaliya</title>
    <description>The latest articles on DEV Community by Kishan Savaliya (@kishansavaliya).</description>
    <link>https://dev.to/kishansavaliya</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F540703%2Ffbf65baa-efd0-4769-bf7f-39d48163876a.png</url>
      <title>DEV Community: Kishan Savaliya</title>
      <link>https://dev.to/kishansavaliya</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kishansavaliya"/>
    <language>en</language>
    <item>
      <title>How 77% of a Magento Store's Traffic Turned Out to Be Bots — and the nginx Fix That Stopped It</title>
      <dc:creator>Kishan Savaliya</dc:creator>
      <pubDate>Wed, 27 May 2026 13:54:23 +0000</pubDate>
      <link>https://dev.to/kishansavaliya/how-77-of-a-magento-stores-traffic-turned-out-to-be-bots-and-the-nginx-fix-that-stopped-it-17g1</link>
      <guid>https://dev.to/kishansavaliya/how-77-of-a-magento-stores-traffic-turned-out-to-be-bots-and-the-nginx-fix-that-stopped-it-17g1</guid>
      <description>&lt;p&gt;A store owner pinged me with a worrying screenshot: Google Analytics showed a sudden spike of "active users," almost all from a single country. Their first guess was a viral moment. It wasn't. It was a bot flood, and once I pulled the server logs, the numbers were brutal: &lt;strong&gt;77% of all incoming requests were automated traffic&lt;/strong&gt; hammering the site.&lt;/p&gt;

&lt;p&gt;Here's exactly how I diagnosed it and shut it down, with the config you can reuse.&lt;/p&gt;

&lt;h2&gt;Step 1: Read the actual logs, not the dashboard&lt;/h2&gt;

&lt;p&gt;Analytics dashboards lie to you about bots, because many bots &lt;em&gt;execute JavaScript&lt;/em&gt; and show up as real "users." The truth is in the web server logs. I pulled the last 15 minutes of nginx access logs and aggregated the requests by client IP and user-agent.&lt;/p&gt;

&lt;p&gt;The pattern was instantly obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~77% of requests&lt;/strong&gt; came from one cloud-hosting IP range (a data-center, not real humans), rotating &lt;strong&gt;fake Chrome user-agents&lt;/strong&gt; — including Chrome
version numbers that &lt;em&gt;don't exist yet&lt;/em&gt;. That's a dead giveaway.&lt;/li&gt;
&lt;li&gt;The rest were crawlers stuck in &lt;strong&gt;infinite pagination crawl-traps&lt;/strong&gt; like &lt;code&gt;/blog/tag/x/page/11410&lt;/code&gt; and &lt;code&gt;?p=6095&lt;/code&gt; — URLs that should never have been
generated.&lt;/li&gt;
&lt;/ul&gt;
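
&lt;p&gt;One caveat for the aggregation above: behind a CDN, nginx's stock &lt;code&gt;combined&lt;/code&gt; log format records &lt;code&gt;$remote_addr&lt;/code&gt;, which is the CDN's address, not the visitor's. A minimal sketch of a format that also logs the forwarded address, assuming you control the &lt;code&gt;http&lt;/code&gt; block. The name &lt;code&gt;cdn_combined&lt;/code&gt; and the log path are placeholders, not taken from the store's config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  # http context. "cdn_combined" is a hypothetical format name.
  # It is the stock "combined" layout plus one extra field at the end.
  log_format cdn_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" '
                          'xff="$http_x_forwarded_for"';

  # server context. The path is an assumption; adjust to your layout.
  access_log /var/log/nginx/access.log cdn_combined;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the extra field in place, grouping log lines by the &lt;code&gt;xff&lt;/code&gt; value and the user-agent shows which networks are actually behind the requests.&lt;/p&gt;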

&lt;h2&gt;Step 2: Understand &lt;em&gt;why&lt;/em&gt; a firewall won't help (the part people get wrong)&lt;/h2&gt;

&lt;p&gt;The owner's instinct was "just block the IP with a firewall." But the site sits behind &lt;strong&gt;Cloudflare&lt;/strong&gt;, and that changes everything:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you're behind a CDN/proxy, your origin server only sees the &lt;strong&gt;CDN's&lt;/strong&gt; IP addresses at the network layer. The real visitor IP lives in the &lt;code&gt;X-Forwarded-For&lt;/code&gt; HTTP header, which &lt;code&gt;iptables&lt;/code&gt; (a layer-3/4 firewall) &lt;strong&gt;cannot read&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So &lt;code&gt;iptables -A INPUT -s &amp;lt;botIP&amp;gt; -j DROP&lt;/code&gt; blocks &lt;em&gt;nothing&lt;/em&gt;, because every packet that reaches the origin carries a Cloudflare source address. The block has to happen where the real IP is visible: at the CDN, or in &lt;strong&gt;nginx&lt;/strong&gt;, which &lt;em&gt;can&lt;/em&gt; parse the header. nginx is also the perfect place because a &lt;code&gt;403&lt;/code&gt; there is served instantly, &lt;strong&gt;before&lt;/strong&gt; the heavy application (Magento/PHP) ever boots.&lt;/p&gt;
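
&lt;p&gt;A related option, separate from the rules used in this article: nginx's &lt;code&gt;ngx_http_realip_module&lt;/code&gt; can replace &lt;code&gt;$remote_addr&lt;/code&gt; with the address from a trusted header, so that logs and &lt;code&gt;deny&lt;/code&gt; rules see the visitor instead of the CDN. A hedged sketch, assuming the module is compiled into your build. Both addresses are documentation-range placeholders (RFC 5737), so substitute Cloudflare's published ranges and the real offending range:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  # http or server context
  # Placeholder range: trust the forwarded header only from the CDN's own addresses.
  set_real_ip_from 192.0.2.0/24;
  # Cloudflare sends the visitor address in this header.
  real_ip_header   CF-Connecting-IP;

  # With $remote_addr restored, the standard access module can block a range.
  deny 203.0.113.0/24;   # placeholder bot range
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The important part is &lt;code&gt;set_real_ip_from&lt;/code&gt;: without it, nginx ignores the header, and with too broad a range anyone could forge their address.&lt;/p&gt;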

&lt;h2&gt;Step 3: The nginx rules&lt;/h2&gt;

&lt;p&gt;Two &lt;code&gt;map&lt;/code&gt; blocks (&lt;code&gt;http&lt;/code&gt; context) flag bad traffic: one reads the real client IP from &lt;code&gt;X-Forwarded-For&lt;/code&gt;, the other checks the user-agent:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  &lt;span class="c1"&gt;# Real client IP behind a CDN is the FIRST token of X-Forwarded-For&lt;/span&gt;
  &lt;span class="k"&gt;map&lt;/span&gt; &lt;span class="nv"&gt;$http_x_forwarded_for&lt;/span&gt; &lt;span class="nv"&gt;$bad_ip&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kn"&gt;default&lt;/span&gt;                       &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Non-compliant / spoofed crawlers&lt;/span&gt;
  &lt;span class="k"&gt;map&lt;/span&gt; &lt;span class="nv"&gt;$http_user_agent&lt;/span&gt; &lt;span class="nv"&gt;$bad_ua&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kn"&gt;default&lt;/span&gt;                       &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kn"&gt;"~*&lt;/span&gt;&lt;span class="s"&gt;(Bytespider|PetalBot|MJ12bot|DotBot)"&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;   
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
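
&lt;p&gt;The first map above is shown with only its &lt;code&gt;default&lt;/code&gt; line. A filled-in version might look like the sketch below, which is meant to replace that block, not sit beside it. The range is hypothetical (203.0.113.0/24 is reserved for documentation). Keep in mind that a client can send its own &lt;code&gt;X-Forwarded-For&lt;/code&gt; value, which the CDN then appends to, so the first token is only as trustworthy as your CDN setup makes it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  map $http_x_forwarded_for $bad_ip {
      default                          0;
      # Hypothetical range. "^" anchors the match to the first token,
      # and "(,|$)" stops it at the end of that address.
      "~^203\.0\.113\.[0-9]+(,|$)"     1;
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If spoofed headers are a concern, the same map can be keyed on &lt;code&gt;$http_cf_connecting_ip&lt;/code&gt;, since Cloudflare sets &lt;code&gt;CF-Connecting-IP&lt;/code&gt; itself.&lt;/p&gt;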



&lt;p&gt;Then, inside the &lt;code&gt;server&lt;/code&gt; block, drop them before PHP and kill the pagination crawl-trap:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$bad_ip&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$bad_ua&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# No listing has thousands of pages — anything past page 9 is junk.&lt;/span&gt;
  &lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="p"&gt;~&lt;/span&gt; &lt;span class="sr"&gt;"/page/[1-9][0-9]+/?$"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;410&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
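
&lt;p&gt;The &lt;code&gt;location&lt;/code&gt; above only sees the path: nginx does not match &lt;code&gt;location&lt;/code&gt; patterns against the query string, so a trap like &lt;code&gt;?p=6095&lt;/code&gt; needs its own check. A sketch using the built-in &lt;code&gt;$arg_p&lt;/code&gt; variable. &lt;code&gt;$deep_page&lt;/code&gt; is a made-up name, and the two-digit threshold simply mirrors the "past page 9" assumption, which a store with long category listings would need to raise:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  # http context
  map $arg_p $deep_page {
      default             0;
      "~^[1-9][0-9]+$"    1;   # p=10 and above
  }

  # server context, next to the other checks
  if ($deep_page) { return 410; }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;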



&lt;p&gt;A few things worth calling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;410 Gone&lt;/code&gt; beats &lt;code&gt;404&lt;/code&gt;&lt;/strong&gt; for crawl-traps — it tells well-behaved crawlers (Googlebot, GPTBot) to &lt;em&gt;permanently&lt;/em&gt; drop the URL.&lt;/li&gt;
&lt;li&gt;Don't block the good bots. I left &lt;strong&gt;Googlebot and ClaudeBot untouched&lt;/strong&gt; (the site's &lt;code&gt;robots.txt&lt;/code&gt; allows them) and only blocked the spoofed/abusive
ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate before reloading.&lt;/strong&gt; Always &lt;code&gt;nginx -t&lt;/code&gt; first, then &lt;code&gt;nginx -s reload&lt;/code&gt; (zero downtime). One unquoted &lt;code&gt;{2,}&lt;/code&gt; regex in a &lt;code&gt;location&lt;/code&gt; is a syntax error, and restarting nginx with a broken config will take
your whole site down.&lt;/li&gt;
&lt;/ul&gt;
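
&lt;p&gt;To make the quoting point concrete, here is the same crawl-trap rule written with a brace quantifier. This is an illustrative variant, not a rule from the store's config; &lt;code&gt;{1,}&lt;/code&gt; is equivalent to the &lt;code&gt;+&lt;/code&gt; used earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;  # Rejected by "nginx -t": outside quotes, "{" starts a block.
  # location ~ /page/[1-9][0-9]{1,}/?$ { return 410; }

  # Accepted: the quotes keep the braces inside the regex.
  location ~ "/page/[1-9][0-9]{1,}/?$" { return 410; }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;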

&lt;h2&gt;Step 4: The result&lt;/h2&gt;

&lt;p&gt;After the reload, I sampled live traffic across a 10-minute window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~95% of requests&lt;/strong&gt; were now served as cheap &lt;code&gt;403&lt;/code&gt;/&lt;code&gt;410&lt;/code&gt; responses &lt;em&gt;by nginx&lt;/em&gt;, never reaching Magento.&lt;/li&gt;
&lt;li&gt;PHP/database load from the flood &lt;strong&gt;dropped to zero&lt;/strong&gt; — the server breathed again.&lt;/li&gt;
&lt;li&gt;Real users, Googlebot, and legitimate crawlers were completely unaffected.&lt;/li&gt;
&lt;li&gt;Within minutes the attacker's volume began to taper (bots back off once they keep hitting walls).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The durable follow-up is to push the same block up to &lt;strong&gt;Cloudflare's WAF&lt;/strong&gt; (a rule on the hosting ASN, plus Bot Fight Mode), so the junk is dropped at the edge and never reaches the origin at all.&lt;/p&gt;

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;If your store's analytics suddenly balloon with traffic from one country or one network and your bounce rate looks weird, &lt;strong&gt;check your server logs before you celebrate.&lt;/strong&gt; A surprising amount of "traffic" is bots that inflate your numbers, burn your server resources, and, on Magento, can flood your customer table with fake registrations. The fix is usually cheap and fast &lt;em&gt;if&lt;/em&gt; you block at the right layer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Kishan Savaliya, an Adobe-Certified Magento &amp;amp; Hyvä developer. I help store owners with exactly this kind of thing: performance, security, and clean code. If your store feels slow or you're seeing strange traffic, you can find me and what I do at &lt;a href="https://kishansavaliya.com" rel="noopener noreferrer"&gt;kishansavaliya.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>magento</category>
      <category>webdev</category>
      <category>nginx</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
