Oleksii Antoniuk

Posted on Jun 3 • Originally published at oleant.dev

A Lie Detector for HTTP Requests: Analytics Through Time

#laravel #webdev #analytics #security

Remember that guy from the last article who visited my site using Firefox 140.0 while the rest of the world was still stuck in the 130s? Back then, it felt like an exotic anomaly, a visit from a "guest from the future." But when these "time travelers" start showing up in formation, you realize: it's time to stop being surprised and start classifying them.

(Read the details in the article on my blog, "Through the Looking Glass of Logs: Karachi Police, DuckDuckGo, and IPv6 Magic.")

If the first version of my Laravel analytics package was just a "peep-hole" in the door, version 1.3.0 has evolved into a full-scale digital customs checkpoint—complete with X-rays, biometrics, and an elephant's memory. Let’s look under the hood and break down how to distinguish a real human from a script that desperately wants to be your friend.

Chapter 1: The Evolution of Paranoia (Scoring System)

In the beginning, I was naive. I believed the internet was a world of black and white: if the User-Agent said "Googlebot," it was a robot—we’d shake its hand and show it to the indexing section. If it said "Mozilla/5.0," it was a human—we’d pour them a coffee and log them into the "Visitors" table.

But the reality of 2026 quickly shattered those rose-colored glasses. Today, even the laziest spam-bot, whipped up by a high schooler on a weekend, can mimic the latest Chrome build so skillfully that standard verification methods just give up. The bot no longer screams "I am a robot!". It enters quietly, subtly, rubbing its virtual palms together.

Realizing this, I turned my package into a digital investigator that no longer makes snap judgments. Now, it uses a weighting system (Scoring). The package doesn't issue a verdict immediately—it observes, cross-references facts, and builds a "suspicion file."

Here is what this real-time interrogation protocol looks like:

"Appeared out of nowhere" (+35 points): Imagine someone materializing right in the middle of your kitchen, bypassing the front door. If a request knocks on an internal page (e.g., straight to /checkout or deep into the blog) without a Referer header, my inner detective narrows his eyes. Regular people rarely teleport—they follow links.

"Digital Paleozoic" (+60 points): Suddenly, a guest on Windows XP pops up in the logs. In 2026! It’s like a gentleman in rusty knight’s armor showing up at a high-tech gala. Most likely, we’re looking at an old botnet using ancient libraries for requests. This is a major red flag, almost a conviction.

"Cloud Residency" (+100 points): If an IP check shows the guest lives in an Amazon, Hetzner, or DigitalOcean data center—game over. Normal people don't browse the web while physically sitting in a server rack in Frankfurt. This is an instant bot status, with no right to appeal.

When the total score in this "suspect card" breaks the 70-point threshold, the visit is officially flagged as "non-human."

This approach allows us to stay polite to real people who might just have a weird browser config or a paranoid antivirus stripping referers, while ruthlessly filtering out the "stealth bots" trying to leak into your clean analytics.
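
To make the arithmetic concrete, here is a minimal sketch of how such a "suspicion file" could be tallied. It is not the package's actual code: the constant names, function names, and signal keys are hypothetical, and reading "breaks the threshold" as "70 or more" is an assumption. Only the weights and the threshold come from the list above.

```php
<?php
declare(strict_types=1);

// Weights and threshold taken from the chapter text; the signal keys
// 'no_referer', 'obsolete_os', and 'datacenter' are illustrative names.
const SIGNAL_WEIGHTS = [
    'no_referer'  => 35,
    'obsolete_os' => 60,
    'datacenter'  => 100,
];
const BOT_THRESHOLD = 70;

/**
 * Sum the weights of every distinct signal that fired for a request.
 *
 * @param string[] $signals
 */
function suspicionScore(array $signals): int
{
    $score = 0;
    foreach (array_unique($signals) as $signal) {
        $score += SIGNAL_WEIGHTS[$signal] ?? 0; // unknown signals add nothing
    }
    return $score;
}

/** Assumption: a score of 70 or more counts as "non-human". */
function isBot(array $signals): bool
{
    return suspicionScore($signals) >= BOT_THRESHOLD;
}
```

Under these numbers a missing Referer alone (35) stays below the line, a missing Referer plus an obsolete OS (95) crosses it, and a data-center IP (100) crosses it by itself.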

Chapter 2: Snitch Ports and the Referer Loop

Sometimes bots mess up on things so absurd that it feels like the script developer decided to wink at me from the shadows. In version 1.3.0, I implemented two traps that have become my favorites in the hunt for automated "guests."

1. Port Leak: The Spy Who Came from the Control Panel

Imagine a customer walking into your store, but their badge says "Junior Hacking Intern at cPanel." You’d be on guard, right? That’s exactly how it looks in the logs.

A real user comes to you from Google, social media, or types the address manually. But vulnerability scanners often work "in tandem" with hijacked hosting accounts or control panels. As a result, a header hits my analytics: Referer: https://some-shadow-site.com:2083 ...

Port :2083 is the classic entry point for cPanel. A real person cannot "accidentally" follow a link from another server's admin panel to your site. It’s a dead giveaway that someone just finished dissecting a neighboring host and is now targeting your project. In my config, these "snitch ports" (including Plesk :8443 and Webmin :10000) have a weight of 100.

One hit like that, and the bot is instantly sent to the digital ban list before it can even say "Hello World."

2. Referer Loop: Failing the Turing Test at the Starting Line

Bots try their hardest to look like "one of us." They know that a missing referer is suspicious (we already assigned +35 points for that in Chapter 1), so they try to simulate it. But sometimes they do it with the grace of a bull in a china shop.

I call this the Referer Loop. A bot lands on your page and sets the Referer header to... that exact same page.

Bot Logic: "If I say I came from here, the server will think I just clicked an internal link."

My Analytics Logic: "Buddy, this is your first visit. You haven't been inside yet to navigate from anywhere to anywhere."

If the system sees a page referencing itself during the very first appearance of that IP in a session—the masks are off. Turing test failed, and the bot score gets another 50 points. A living human must first arrive at the site before they can start moving between its pages.

How it looks in code

For those who like to "get their hands on" the implementation, here is the config snippet responsible for this "interrogation" stage:

/** 
* 4. SUSPICIOUS REFERER PORTS 
* Technical ports in the Referer header (cPanel, Plesk, etc.) 
* Real users almost never arrive from these ports. 
*/
'port_leak' => [
    2082, 2083, // cPanel
    2086, 2087, // WHM
    8443, 8880, // Plesk
    2222,       // DirectAdmin
    10000,      // Webmin
],
'weights' => [
    // Traffic arriving from technical control panels
    'port_leak'    => 100,
    // Referer loop detection (URL == Referer on first hit)
    'referer_loop' => 50,
],
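
For illustration, both traps can be expressed as small checks on the Referer header. This is a sketch under stated assumptions, not the package's implementation: the function names are hypothetical, the `$isFirstHit` flag is assumed to come from whatever session tracking is in place, and ignoring a trailing slash when comparing URLs is my own choice. The port list is copied from the config above, and `parse_url` is standard PHP.

```php
<?php
declare(strict_types=1);

// Port list copied from the 'port_leak' config above.
const PORT_LEAK_PORTS = [2082, 2083, 2086, 2087, 8443, 8880, 2222, 10000];

/** True when the Referer URL carries one of the control-panel ports. */
function hasPortLeak(?string $referer): bool
{
    if ($referer === null || $referer === '') {
        return false;
    }
    // parse_url returns an int port, null when no port is present,
    // or false when the URL is seriously malformed.
    $port = parse_url($referer, PHP_URL_PORT);

    return is_int($port) && in_array($port, PORT_LEAK_PORTS, true);
}

/**
 * True when the first hit of a session names the requested page as its own Referer.
 * Hypothetical signature: $isFirstHit must be supplied by the caller.
 */
function isRefererLoop(string $currentUrl, ?string $referer, bool $isFirstHit): bool
{
    if (!$isFirstHit || $referer === null || $referer === '') {
        return false;
    }
    // Ignore a trailing slash so /blog and /blog/ compare equal.
    return rtrim($referer, '/') === rtrim($currentUrl, '/');
}
```

A Referer of `https://some-shadow-site.com:2083/` trips the first check, while a Referer without an explicit port never does, because `parse_url` returns `null` for the port in that case.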

Chapter 3: Snowball Effect — The Magic of Retroactive Retribution

If the previous methods were the "border patrol" at the gates, then the Snowball Effect is the work of internal security with access to the archives. This is the most powerful and, I admit, my favorite feature of the 1.3.0 release.

Imagine this: a bot visits you. Not a clunky Python script, but an elite "stealth operative." For the first three pages, it behaves perfectly:

It maintains pauses, mimicking human reading patterns.
It provides flawless headers.
It even "scrolls" (simulates activity).

My system looks at it and says: "Okay, buddy, you look human. Go ahead." We log it as a "clean" visitor, and it ends up in your beautiful statistics. But on the fourth page, the bot slips up.

Professional curiosity takes over, and it wanders into a "honeypot"—trying to read the /.env file or peek into /wp-admin.

In older versions of analytics, we would have simply flagged that fourth visit as "bot activity." But this is absurd! If it tried to steal your access keys at 2:05 PM, it means that at 2:00 PM it wasn't a fan reading your Laravel articles either. It was an enemy lying in wait.

In version 1.3.0, the system switches to Retroactive Retribution mode: "You sneaky piece of hardware!" says the package. "If you're a bot now, you've been a bot all along."

How it works technically: As soon as an IP address hits a critical weight (threshold) or lands in a Honeypot, the package triggers a background task (via Laravel Queue or Command). It pulls up the history of that IP for the last 60 days and "re-colors" all its past, seemingly "clean" visits with bot status.

/** 
* Snowball Effect (Retroactive Cleanup) 
* Automatically flags historical sessions of a newly identified bot. 
*/
'cumulative' => [
    'enabled' => true,
    'history_window_days' => 60, // How far back we are willing to "take revenge"
],

Your statistics are cleaned up retroactively. You open the dashboard and see that the "garbage" hits that managed to slip through disguised as real people have simply vanished. This isn't just analytics; it's a self-cleaning ecosystem.
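
The re-coloring step itself is simple enough to sketch in plain PHP. The example below works on an in-memory array so it can run on its own; in a real Laravel app the same rule would more likely be a single database update inside a queued job. The function name and the row shape (`ip`, `visited_at`, `is_bot`) are assumptions for illustration, not the package's schema.

```php
<?php
declare(strict_types=1);

/**
 * Flag every stored visit from $ip that falls inside the history window.
 *
 * @param array<int, array{ip: string, visited_at: string, is_bot: bool}> $visits
 * @return array the same rows, with is_bot set to true where the rule applies
 */
function applySnowball(
    array $visits,
    string $ip,
    DateTimeImmutable $now,
    int $windowDays = 60
): array {
    // Anything older than this moment is left untouched.
    $cutoff = $now->sub(new DateInterval('P' . $windowDays . 'D'));

    foreach ($visits as $index => $visit) {
        $visitedAt = new DateTimeImmutable($visit['visited_at']);
        if ($visit['ip'] === $ip && $visitedAt >= $cutoff) {
            $visits[$index]['is_bot'] = true;
        }
    }

    return $visits;
}
```

Visits from other IP addresses, and visits from the same IP that are older than the window, keep their original status.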

Chapter 4: UA Detector or "Your Mustache is Falling Off"

To understand how my config works, you need to look at what bots are actually sending in their headers. Here are two classic examples from my logs that are guaranteed to fail the check:

"The Humble Automator"

python-requests/2.31.0 or GuzzleHttp/7

Why it's a bot: No guesswork needed here. These request libraries honestly admit what they are. In the config, they live in the suspicious_ua section:

/**
* 2. SUSPICIOUS USER-AGENTS (Common fragments)
* Common library or tool strings used by scrapers and automated tools.
*/
'suspicious_ua' => [
  'python-requests',
  'guzzlehttp',
  'go-http-client',
  'curl',
  ...
],

"The Forgetful Mimic"

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

Why it's a bot: Windows NT 6.1 (Windows 7) and IE 9 in 2026? This is either a very sad computer in a rural library or (far more likely) an old botnet using decade-old presets. This gets penalized via obsolete_os:

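/**
* Obsolete OS versions (Windows 2000/XP via NT 5; Vista, 7, and 8.x via NT 6; older macOS)
*/
'obsolete_os' => [
  'Windows NT 5',
  'Windows NT 6',
  // Caution: current macOS browsers report a frozen "Mac OS X 10_15_7" token,
  // so this fragment also matches up-to-date Macs.
  'Mac OS X 10',
],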
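
Here is a minimal sketch of how fragment lists like these could be applied to a User-Agent string. The helper name `matchFragment` is hypothetical and the package may match differently. The fragments are copied from the two config excerpts above, leaving out the elided entries.

```php
<?php
declare(strict_types=1);

// Fragments copied from the config excerpts above (elided entries left out).
const SUSPICIOUS_UA = ['python-requests', 'guzzlehttp', 'go-http-client', 'curl'];
const OBSOLETE_OS   = ['Windows NT 5', 'Windows NT 6', 'Mac OS X 10'];

/**
 * Return the first fragment found in the User-Agent, or null if none match.
 *
 * @param string[] $fragments
 */
function matchFragment(string $userAgent, array $fragments): ?string
{
    foreach ($fragments as $fragment) {
        // stripos is case-insensitive, so 'guzzlehttp' matches 'GuzzleHttp/7'.
        if (stripos($userAgent, $fragment) !== false) {
            return $fragment;
        }
    }

    return null;
}
```

The case-insensitive comparison matters here: the config stores `guzzlehttp` in lowercase, while the library announces itself as `GuzzleHttp/7`.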

Code Implementation: Setting the Weights

And here is the heart of the system: the weight matrix in config/visit-analytics.php. This is where we define how strict our "border control" will be.

/* Scoring Weights (Suspicion Matrix) */
'weights' => [
    // Direct entry to internal page without Referer (bot behavior)
    'no_referer'     => 35,
    // Attempting to trick the system by setting Referer to current URL
    'referer_loop'   => 50,
    // If IP belongs to a data center (AWS, Hetzner, etc.) — instant 100% bot
    'datacenter'     => 100,
    // Click speed exceeding human capabilities (< 2 seconds between pages)
    'speed_anomaly'  => 50,
    // Visiting a "honeypot" (e.g., /.env or /wp-admin)
    'honeypot'       => 100,
    // Those weird ports in the referer (cPanel, Plesk)
    'port_leak'      => 100,
],
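
Of the signals in this matrix, speed_anomaly is the one that needs more than a single request to evaluate. A rough sketch of the check, assuming the visitor's page-view timestamps are already collected somewhere: the function and constant names are hypothetical, and only the 2-second limit comes from the config comment above.

```php
<?php
declare(strict_types=1);

// Minimum plausible gap between page views, from the 'speed_anomaly' comment.
const MIN_HUMAN_GAP_SECONDS = 2.0;

/**
 * True when any two consecutive page views are closer together than the minimum gap.
 *
 * @param float[] $timestamps Unix timestamps of one visitor's page views, in any order
 */
function hasSpeedAnomaly(array $timestamps): bool
{
    sort($timestamps); // sorts the local copy; the caller's array is unchanged

    for ($i = 1, $n = count($timestamps); $i < $n; $i++) {
        if ($timestamps[$i] - $timestamps[$i - 1] < MIN_HUMAN_GAP_SECONDS) {
            return true;
        }
    }

    return false;
}
```

A gap of exactly 2 seconds passes, since the config comment describes the anomaly as strictly less than 2 seconds between pages.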

How the Honeypot Works

This is the most effective and fastest way to purge your logs. We add a list of paths to the config that a normal user of your Laravel app would never visit:

'honeypot_paths' => [
    '/.env',
    '/wp-admin',
    '/.git',
    '/bitrix',
    '/config.php',
    '/phpinfo.php',
],

The moment a script knocks on /.env, it gets is_bot = true and a maximum bot_score. And thanks to the Snowball Effect, all its previous attempts to "act human" on the home page are instantly nullified.
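
A sketch of the path check, under a few assumptions of my own: the function name is hypothetical, the comparison is case-insensitive, and a request counts as a hit when its path equals a trap or sits underneath one (so /wp-admin/install.php matches, but /.environment does not). It uses `str_starts_with`, which requires PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Paths copied from the 'honeypot_paths' config above.
const HONEYPOT_PATHS = ['/.env', '/wp-admin', '/.git', '/bitrix', '/config.php', '/phpinfo.php'];

/** True when the request path equals a honeypot path or sits underneath one. */
function isHoneypotHit(string $requestUri): bool
{
    // Strip the query string; parse_url returns null or false if there is no usable path.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    $path = strtolower($path);
    foreach (HONEYPOT_PATHS as $trap) {
        if ($path === $trap || str_starts_with($path, $trap . '/')) {
            return true;
        }
    }

    return false;
}
```

Matching on the parsed path rather than the raw URI means a scanner cannot dodge the trap by appending a query string such as /.env?x=1.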

Chapter 5: What’s Next? (Filament Announcement)

Collecting data and filtering it with virtuosity is only half the battle. The real "dopamine hit" comes when you see the results of your work not in raw database tables, but in beautiful, intuitive graphics. There is nothing more satisfying than watching the gray "bot curve" decline while the green graph of real, live humans grows steadily.

Right now, I’m working on bringing all this "looking-glass magic" into the visual plane. In the next major release, I’m planning full integration with Filament. Why dig through configs via the terminal when you can manage your digital security through a stylish UI?

Here’s what’s on the horizon:

Real-time Dashboard: A set of widgets showing your site’s "pulse" in real time. You’ll literally see who the system has "neutralized" just now and for what reason.

Interactive Bot Map: Visualization of attacks and visits. You’ll be able to see clearly where the Palo Alto scanners are coming from and where the "honest" search engines are lurking. It will look like a cyber-command headquarters.

Admin Panel Weight Management: No more editing .php files just to change a single sensitivity threshold. You’ll be able to fine-tune scoring weights, add new suspicious ports, or update your honeypot list with a few clicks without leaving the comfort of the Filament panel.

But first things first. The path to perfect analytics is a marathon, not a sprint. So stay tuned: I promise the updates will be "tasty," technically elegant, and incredibly useful for those who value the purity of their data.

Epilogue: Purity as a Religion

Ultimately, after implementing behavioral analysis and port checks, my traffic charts have "slimmed down" significantly. But this is exactly the kind of diet that does you good. Now I know for sure: if I see a visit from Linux via IPv6, it's either my old friend, the "digital detective," or a genuinely interested pro, not just another Palo Alto Networks script that decided to read my Terms of Service.

Logs are not just text. They are a signature. And now, my Laravel package can distinguish the calligraphy of a living person from the mechanical stamps of a typewriter.
