Benyamin Khalife

Posted on Jun 8

How I Stopped Counting Bots as Visitors

#php #security #webdev #opensource

A few months ago I was looking at the analytics on one of my projects. The numbers looked decent: hundreds of daily visits, healthy traffic from search. But something felt off. The server logs told a completely different story.

Half of those "visitors" were scanners probing for .env files. A quarter were bots hammering /wp-login.php. Maybe ten percent were actual humans.

Google Analytics had no idea. It was counting everything.

That's the problem I wanted to fix.

The gap nobody talks about

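Most analytics tools I know of work the same way: a JavaScript snippet fires when a page loads, and the visit gets counted. Simple scanners and scrapers never run that JavaScript, yet they still hit your server, and anything that counts server-side still records them. Bots that drive a real or headless browser do run it, and the snippet counts them like any other visitor. Either way, the tool doing the counting has no idea which requests are hostile.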

Some tools try to filter bot traffic after the fact, using lists of known bot user-agents or behavioral heuristics. But these lists are always behind, always incomplete, and never aware of the specific threats targeting your application.

I already had a firewall, xZeroProtect, running on my projects. It was blocking scanners, rate-limiting aggressive IPs, and verifying crawlers via double-DNS (a reverse DNS lookup on the IP, confirmed by a forward lookup on the resulting hostname). It knew, with high confidence, which requests came from real humans.

The insight was simple: if the firewall already knows who's a real visitor, why not record that?

How it works

In xZeroProtect, every request passes through a chain of checks before it reaches your application:

Incoming request
       │
  Whitelisted? ──────────────────────────► Pass through
  Verified crawler (Googlebot etc.)? ────► Pass through  
  Banned IP? ────────────────────────────► Block
  Rate limit exceeded? ──────────────────► Block
  Suspicious path? ──────────────────────► Block
  Bad User-Agent? ────────────────────────► Block
  Payload attack (SQLi, XSS...)? ─────────► Block
       │
  All checks passed ─────────────────────► Real visit ✓

Any request that reaches the bottom has survived every check. That's the right moment to record a visit — not before, not after.
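A minimal sketch of that flow, under stated assumptions: this is not xZeroProtect's internal code. The function name runChecks, the array-shaped request, and the two sample checks are hypothetical, and the whitelist and verified-crawler pass-through cases are left out. Each check returns a block reason or null, and the visit callback only fires when every check returns null.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a check chain (not the library's real code).
 * Each check returns a string reason to block, or null to let the request continue.
 *
 * @param array<string, string> $request
 * @param list<callable(array<string, string>): ?string> $checks
 * @return ?string the block reason, or null if the request counts as a real visit
 */
function runChecks(array $request, array $checks, callable $onRealVisit): ?string
{
    foreach ($checks as $check) {
        $reason = $check($request);
        if ($reason !== null) {
            return $reason; // blocked: the visit callback never fires
        }
    }

    $onRealVisit($request); // survived every check
    return null;
}

$checks = [
    // Suspicious path: probing for secrets or a WordPress login
    fn(array $r): ?string => preg_match('~/\.env$|/wp-login\.php$~', $r['path']) === 1
        ? 'suspicious path' : null,
    // Bad User-Agent: empty, or an obvious scripting client
    fn(array $r): ?string => $r['userAgent'] === '' || stripos($r['userAgent'], 'curl') !== false
        ? 'bad user-agent' : null,
];

$visits = [];
$record = function (array $r) use (&$visits): void {
    $visits[] = $r['path'];
};

$blocked = runChecks(['path' => '/.env', 'userAgent' => 'Mozilla/5.0'], $checks, $record);
$passed  = runChecks(['path' => '/blog/my-post', 'userAgent' => 'Mozilla/5.0'], $checks, $record);
```

Only the second request reaches the callback, so $visits ends up holding a single path.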

The API is intentionally simple. You pass a closure to enableTracking(), and it fires for every verified real visit:

use Webrium\XZeroProtect\XZeroProtect;
use Webrium\XZeroProtect\VisitInfo;

$firewall = XZeroProtect::init();

$firewall->enableTracking(function (VisitInfo $visit) use ($pdo) {
    // store however you like — the library doesn't care
    $pdo->prepare("INSERT INTO visits ...")
        ->execute($visit->toArray());
});

$firewall->run();

The library never touches your database. It hands you a VisitInfo object and gets out of the way.

What VisitInfo gives you

The $visit object carries everything you need, parsed and ready:

$visit->ip              // '94.182.11.42'
$visit->path            // '/blog/my-post'
$visit->method          // 'GET'
$visit->referer         // 'https://google.com'
$visit->timestamp       // 1749388800
$visit->date()          // '2025-06-08 13:20:00' (UTC, matching the timestamp above)

// Device info — parsed from User-Agent, no external service
$visit->device->browser         // 'Chrome'
$visit->device->browserVersion  // '124.0'
$visit->device->os              // 'Windows'
$visit->device->osVersion       // '10/11'
$visit->device->type            // 'desktop' | 'mobile' | 'tablet'
$visit->device->isMobile        // false

// Unique visitor fingerprint
$visit->fingerprint     // 'a3f8c2...' (64-char SHA-256 hash)

// Flat array — ready for a direct DB insert
$visit->toArray()

The device detection is built in — no third-party service, no API call, just a User-Agent parser that covers Chrome, Firefox, Safari, Edge, Opera, Samsung Internet, IE, and all major operating systems.
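To give a sense of what User-Agent parsing involves, here is a deliberately tiny sketch. It is not the library's parser: the function name detectBrowser and the rule list are assumptions made for illustration, and it covers only a handful of browsers. The order of the rules matters, because Edge and Opera user agents also contain "Chrome", and Chrome's contains "Safari".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, minimal browser detection from a User-Agent string.
 * Most specific tokens come first: Edge and Opera also advertise "Chrome",
 * and Chrome also advertises "Safari".
 *
 * @return array{browser: string, version: string}
 */
function detectBrowser(string $userAgent): array
{
    $rules = [
        'Edge'    => '~Edg/(\d+\.\d+)~',
        'Opera'   => '~OPR/(\d+\.\d+)~',
        'Firefox' => '~Firefox/(\d+\.\d+)~',
        'Chrome'  => '~Chrome/(\d+\.\d+)~',
        'Safari'  => '~Version/(\d+\.\d+).*Safari/~',
    ];

    foreach ($rules as $browser => $pattern) {
        if (preg_match($pattern, $userAgent, $m) === 1) {
            return ['browser' => $browser, 'version' => $m[1]];
        }
    }

    return ['browser' => 'Unknown', 'version' => ''];
}

$chromeUa = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    . '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36';
$edgeUa = $chromeUa . ' Edg/124.0.2478.51';

$chrome = detectBrowser($chromeUa); // browser 'Chrome', version '124.0'
$edge   = detectBrowser($edgeUa);   // browser 'Edge', version '124.0'
```

A real parser needs far more rules than this, plus operating system and device-type detection.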

The fingerprint

This is the part I'm most happy with.

Traditional unique visitor tracking either uses cookies (which require consent banners and get cleared) or stores raw IPs (which is a privacy problem). I wanted something in between.

The fingerprint is a SHA-256 hash of three things: the visitor's IP address, their User-Agent string, and today's date.

$raw = implode('|', [
    $request->ip,
    $request->userAgent,
    date('Y-m-d'),   // resets daily
]);

$fingerprint = hash('sha256', $raw);

This means:

The same person visiting twice today gets the same fingerprint — you can deduplicate
Tomorrow their fingerprint is different — no persistent cross-session tracking
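The raw IP is not stored in the fingerprint, only a hash that includes it (an unsalted hash of an IPv4 address can still be brute-forced, so treat the fingerprint as pseudonymous rather than anonymous)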
No cookies, no JavaScript, and nothing stored on the visitor's device

It's not perfect — two people on the same NAT with the same browser will collide — but for the purpose of counting unique daily visitors it's good enough, and it respects privacy by design.
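A self-contained sketch of the same idea, with one assumed variation: it uses hash_hmac() with a server-side secret instead of a bare hash(), which makes it impractical to work back from a stored fingerprint to an IP. The function name dailyFingerprint, the $secret parameter, and the injectable clock are hypothetical and not part of xZeroProtect.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical daily fingerprint: the same inputs on the same day give the same hash,
 * and the date component makes it change at midnight.
 */
function dailyFingerprint(
    string $ip,
    string $userAgent,
    string $secret,
    ?DateTimeImmutable $now = null
): string {
    $now ??= new DateTimeImmutable('now');
    $raw = implode('|', [$ip, $userAgent, $now->format('Y-m-d')]);

    // HMAC-SHA-256 yields 64 hex characters, the same length as hash('sha256', ...)
    return hash_hmac('sha256', $raw, $secret);
}

$secret = 'replace-with-a-long-random-value'; // assumption: loaded from config in practice
$ua     = 'Mozilla/5.0 (X11; Linux x86_64)';

$morning = dailyFingerprint('94.182.11.42', $ua, $secret, new DateTimeImmutable('2025-06-08 09:00:00'));
$evening = dailyFingerprint('94.182.11.42', $ua, $secret, new DateTimeImmutable('2025-06-08 21:00:00'));
$nextDay = dailyFingerprint('94.182.11.42', $ua, $secret, new DateTimeImmutable('2025-06-09 09:00:00'));

// $morning and $evening are identical (same day); $nextDay differs (date changed)
```

Passing the clock in as a parameter is only there to make the daily reset easy to demonstrate and test.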

Counting unique visitors becomes a simple query:

$firewall->enableTracking(function (VisitInfo $visit) use ($pdo) {
    // Only record the first visit of the day for each fingerprint.
    // execute() returns a bool, so keep the statement in a variable before fetching.
    $check = $pdo->prepare(
        "SELECT 1 FROM visits
         WHERE fingerprint = ? AND DATE(visited_at) = CURDATE()"
    );
    $check->execute([$visit->fingerprint]);
    $seen = $check->fetchColumn();

    if (!$seen) {
        $pdo->prepare("INSERT INTO visits ...")
            ->execute($visit->toArray());
    }
});
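A check-then-insert pattern like this can let two simultaneous requests both pass the SELECT before either inserts. One alternative, sketched here under assumptions, is to let the database enforce uniqueness. The example uses an in-memory SQLite database so it runs anywhere pdo_sqlite is available; the table layout, column names, and the function recordUniqueVisit are all hypothetical. On MySQL the equivalent statement is INSERT IGNORE.

```php
<?php
declare(strict_types=1);

// Assumed schema: one row per fingerprint per day, enforced by the primary key.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE visits (
        fingerprint TEXT NOT NULL,
        visit_date  TEXT NOT NULL,
        path        TEXT NOT NULL,
        PRIMARY KEY (fingerprint, visit_date)
    )'
);

/** Returns true if this was the first visit of the day for the fingerprint. */
function recordUniqueVisit(PDO $pdo, string $fingerprint, string $date, string $path): bool
{
    // SQLite syntax: a duplicate (fingerprint, visit_date) pair is silently skipped
    $stmt = $pdo->prepare(
        'INSERT OR IGNORE INTO visits (fingerprint, visit_date, path) VALUES (?, ?, ?)'
    );
    $stmt->execute([$fingerprint, $date, $path]);

    // One affected row means the insert happened; zero means it was a repeat visit
    return $stmt->rowCount() === 1;
}

$first  = recordUniqueVisit($pdo, 'a3f8c2', '2025-06-08', '/blog/my-post');
$second = recordUniqueVisit($pdo, 'a3f8c2', '2025-06-08', '/about');

// Daily unique visitors is then a plain COUNT(*)
$uniqueToday = (int) $pdo->query(
    "SELECT COUNT(*) FROM visits WHERE visit_date = '2025-06-08'"
)->fetchColumn();
```

The trade-off is that only the first page of the day is kept per visitor; if you also want page views, store those in a separate table.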

Why opt-in, and why a closure?

Two deliberate design decisions worth explaining.

Opt-in: Tracking is disabled by default. You call enableTracking() to turn it on. This keeps the library's core purpose — protecting your application — separate from the analytics concern. If you don't need tracking, you pay zero cost for it.

Closure instead of configuration: I could have designed this as a config option with a built-in storage backend. But that would mean the library needs to know about your database, your schema, your connection. Instead, you own the storage completely. Want to write to MySQL? Redis? A log file? A third-party analytics API? The library doesn't care.

// Write to database
$firewall->enableTracking(fn(VisitInfo $v) => $db->insert('visits', $v->toArray()));

// Write to a log file
$firewall->enableTracking(fn(VisitInfo $v) => 
    file_put_contents('/var/log/visits.log', json_encode($v->toArray()) . "\n", FILE_APPEND)
);

// Send to an external service
$firewall->enableTracking(fn(VisitInfo $v) => 
    Http::post('https://my-analytics.example.com/ingest', $v->toArray())
);

Same API, any storage.

Errors never reach your visitors

One more thing: the callback runs inside a try/catch.

private function recordVisit(Request $request): void
{
    if (!$this->trackingEnabled || $this->visitorCallback === null) {
        return;
    }

    try {
        ($this->visitorCallback)(new VisitInfo($request));
    } catch (\Throwable) {
        // Tracking must never crash the application
    }
}

If your database is down, if your callback throws, if anything goes wrong — the visitor still sees your page. Tracking is infrastructure, and infrastructure fails. The firewall's job is to protect your application; it shouldn't become a new point of failure.
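One caveat with an empty catch block is that a broken callback can fail silently for a long time. A hypothetical variation, not the library's code, hands the exception to an optional error handler while still shielding the visitor. The class name SafeTracker and its constructor arguments are invented for illustration.

```php
<?php
declare(strict_types=1);

/** Hypothetical wrapper: runs a tracking callback and never lets it throw. */
final class SafeTracker
{
    /** @var callable */
    private $callback;

    /** @var callable|null */
    private $onError;

    public function __construct(callable $callback, ?callable $onError = null)
    {
        $this->callback = $callback;
        $this->onError  = $onError;
    }

    /** @param array<string, mixed> $visit */
    public function record(array $visit): void
    {
        try {
            ($this->callback)($visit);
        } catch (\Throwable $e) {
            // Tracking must never crash the application, but the failure is still reported
            if ($this->onError !== null) {
                try {
                    ($this->onError)($e);
                } catch (\Throwable $ignored) {
                    // Even the error handler is not allowed to break the request
                }
            }
        }
    }
}

$errors  = [];
$tracker = new SafeTracker(
    function (array $visit): void {
        throw new RuntimeException('database is down');
    },
    function (\Throwable $e) use (&$errors): void {
        $errors[] = $e->getMessage(); // in practice this might call error_log()
    }
);

$tracker->record(['path' => '/blog/my-post']); // does not throw
$pageRendered = true; // execution continues as normal
```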

The result

After running this for a while, the difference is striking. My "real" visitor count is about 40% of what Google Analytics was reporting. The other 60% was noise — bots, scanners, crawlers, and monitoring tools that JavaScript analytics was happily counting as humans.

The data is smaller, but it's accurate. And because the firewall is already running, the only extra cost is whatever your callback does: the tracking happens as a side effect of protection that was already in place.

If you want to try it:

composer require webrium/xzeroprotect

The full API reference and configuration docs are on GitHub. There's also a WordPress plugin if you want the dashboard out of the box.

DEV Community