<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: BotConductStandard </title>
    <description>The latest articles on DEV Community by BotConductStandard  (@botconductstandard).</description>
    <link>https://dev.to/botconductstandard</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879750%2F8ae493fd-627f-4382-9be6-7ce6d3fbbab4.jpeg</url>
      <title>DEV Community: BotConductStandard </title>
      <link>https://dev.to/botconductstandard</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/botconductstandard"/>
    <language>en</language>
    <item>
      <title>45% of Hostile Bot Traffic Passes Your WAF. Here's Why. What behavioral detection reveals when you cross-reference hostile actors against AbuseIPDB</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Thu, 30 Apr 2026 01:14:03 +0000</pubDate>
      <link>https://dev.to/botconductstandard/45-of-hostile-bot-traffic-passes-your-waf-heres-why-what-behavioral-detection-reveals-when-you-12mh</link>
      <guid>https://dev.to/botconductstandard/45-of-hostile-bot-traffic-passes-your-waf-heres-why-what-behavioral-detection-reveals-when-you-12mh</guid>
      <description>&lt;p&gt;Most enterprise WAFs are configured to block IPs above a certain abuse confidence threshold. AbuseIPDB threshold 50 is a common SOC default. The assumption is that hostile traffic gets caught at the gate.&lt;/p&gt;

&lt;p&gt;We tested that assumption.&lt;/p&gt;

&lt;p&gt;Of 240 hostile actors detected behaviorally on our infrastructure over 19 days, operating from 380 distinct IPs, 45% have AbuseIPDB scores below 50. They pass standard WAF configurations because their IPs haven't been reported often enough to trigger blocking. Their behavior is hostile, but their reputation doesn't yet match.&lt;/p&gt;

&lt;p&gt;Here's what we found, and what it means.&lt;/p&gt;




&lt;h2&gt;The data&lt;/h2&gt;

&lt;p&gt;We cross-referenced a sample of 100 hostile actors detected by behavioral analysis against two public threat intelligence sources: GreyNoise Community API and AbuseIPDB.&lt;/p&gt;

&lt;p&gt;The methodology was simple. For each IP we asked: does any public threat feed know this is hostile?&lt;/p&gt;
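&lt;p&gt;That per-IP lookup is easy to script. A minimal sketch of the AbuseIPDB side, using only the standard library (the &lt;code&gt;v2/check&lt;/code&gt; endpoint and headers follow AbuseIPDB's public API documentation; the key and IP below are placeholders):&lt;/p&gt;

```python
# Build an AbuseIPDB "check" lookup for a single IP.
# API_KEY and the IP address are placeholders, not values from this study.
import urllib.parse
import urllib.request

def build_check_request(ip, api_key, max_age_days=90):
    params = urllib.parse.urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    url = "https://api.abuseipdb.com/api/v2/check?" + params
    return urllib.request.Request(url, headers={"Key": api_key, "Accept": "application/json"})

req = build_check_request("203.0.113.7", "API_KEY")
print(req.full_url)
# A real run would call urllib.request.urlopen(req) and read
# data["data"]["abuseConfidenceScore"] from the JSON response;
# the GreyNoise Community API offers an analogous per-IP lookup.
```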

&lt;p&gt;The results, ordered by threshold:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;% of hostile actors that pass&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0 reports (completely unknown)&lt;/td&gt;
&lt;td&gt;18%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0-2 reports (noise level)&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0-5 reports (under the radar)&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Score below 25 (typical SOC threshold)&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Score below 50 (typical WAF threshold)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;45%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 45% figure is operational. It's not "completely invisible to threat intel." It's "low enough confidence that automated systems leave them alone."&lt;/p&gt;
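&lt;p&gt;The threshold arithmetic in the table is straightforward to reproduce from a list of per-IP confidence scores. A minimal sketch (the scores below are illustrative placeholders, not our dataset):&lt;/p&gt;

```python
# Fraction of hostile IPs whose AbuseIPDB confidence score falls below
# a blocking threshold. Scores are illustrative placeholders.
scores = [0, 0, 3, 12, 27, 44, 55, 70, 85, 100]

def pass_rate(scores, threshold):
    # range(threshold) covers 0..threshold-1, i.e. "score below threshold"
    below = sum(1 for s in scores if s in range(threshold))
    return below / len(scores)

for t in (25, 50):
    print("score below", t, ":", pass_rate(scores, t))
```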




&lt;h2&gt;Why this happens&lt;/h2&gt;

&lt;p&gt;Public threat intelligence works by aggregation. Someone has to report an IP. Multiple reports increase confidence. Eventually the IP crosses thresholds and gets blocked.&lt;/p&gt;

&lt;p&gt;That model breaks against actors who do three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One: rotate infrastructure aggressively.&lt;/strong&gt; A single hostile actor using residential proxies through providers like Chiron Software LLC operates from IPs that look like home internet connections. Those IPs cycle out before they accumulate reports.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two: stay below volume thresholds.&lt;/strong&gt; An actor making 5-15 requests per IP, then rotating, never triggers per-IP detection. The aggregate behavior is hostile. The per-IP behavior looks like noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three: target sites that don't report.&lt;/strong&gt; Most websites block hostile traffic silently. They don't submit IPs to public databases. The hostile activity happens but never enters the threat feed loop.&lt;/p&gt;

&lt;p&gt;The result is a class of actors that behave hostilely, persist for weeks, and remain effectively invisible to reputation-based defenses.&lt;/p&gt;
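&lt;p&gt;The rotation pattern suggests a countermeasure: key your counters on something stabler than the IP. A minimal sketch, assuming a JA4-style TLS fingerprint is logged with each request (the records below are illustrative):&lt;/p&gt;

```python
# Aggregate requests by TLS fingerprint instead of by IP. Per-IP the
# traffic looks like noise; per-fingerprint one actor spans many IPs.
from collections import defaultdict

log = [  # (ip, ja4) pairs, illustrative
    ("198.51.100.1", "t13d3111_aaaa"),
    ("198.51.100.2", "t13d3111_aaaa"),
    ("198.51.100.3", "t13d3111_aaaa"),
    ("203.0.113.9",  "t13d4012_bbbb"),
]

per_ip = defaultdict(int)
ips_per_fp = defaultdict(set)
for ip, ja4 in log:
    per_ip[ip] += 1
    ips_per_fp[ja4].add(ip)

print(max(per_ip.values()))              # every IP: 1 request (looks like noise)
print(len(ips_per_fp["t13d3111_aaaa"]))  # one fingerprint: 3 IPs (one actor)
```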




&lt;h2&gt;What the completely invisible 18% looks like&lt;/h2&gt;

&lt;p&gt;The cleanest data point is the 18% who have zero reports anywhere. We checked the profile of those 18 IPs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12 of 18 (67%) belong to Chiron Software LLC, a US residential proxy provider&lt;/li&gt;
&lt;li&gt;14 of 18 (78%) are categorized as "Fixed Line ISP"&lt;/li&gt;
&lt;li&gt;13 of 18 (72%) geolocate to United States&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Translation: hostile actors are running through US residential proxy networks and getting traffic that looks like home internet users. There's nothing in the IP metadata that triggers suspicion. The only way to identify them is to look at what they do, not who they are.&lt;/p&gt;




&lt;h2&gt;What this means operationally&lt;/h2&gt;

&lt;p&gt;If you depend on IP reputation to filter traffic, you're catching the actors who already burned their cover. The careful operators slip through.&lt;/p&gt;

&lt;p&gt;Three concrete implications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For SOC teams:&lt;/strong&gt; AbuseIPDB threshold 50 catches the loud actors but misses 45% of the careful ones. Lowering the threshold catches more but generates more noise. The structural problem is that reputation-based detection has a built-in delay. By the time an IP earns a reputation, the actor has rotated to a new one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For compliance and audit:&lt;/strong&gt; "We block known malicious IPs" is a defensible technical statement that doesn't reflect reality on the ground. The hostile traffic on your infrastructure isn't all coming from known-bad addresses. A meaningful portion is coming from addresses that no public source has flagged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For procurement of security tools:&lt;/strong&gt; Vendors that price by IP reputation feeds are pricing the easier 55%. The harder 45% requires behavioral measurement that most current tooling doesn't do.&lt;/p&gt;




&lt;h2&gt;How we detected what threat feeds missed&lt;/h2&gt;

&lt;p&gt;The actors that pass WAFs aren't invisible to behavioral observation. We detected them through behavioral trajectory analysis -- patterns in how they navigate, what they request first, how their sessions evolve over days, and inconsistencies between their declared identity and their technical fingerprint.&lt;/p&gt;

&lt;p&gt;None of these signals require knowing who the actor is. All of them produce evidence that holds up under audit.&lt;/p&gt;

&lt;p&gt;The structural difference between behavioral detection and reputation-based detection is timing. Reputation tells you what an IP did somewhere else, after someone reported it. Behavior tells you what an actor is doing on your infrastructure, right now, before anyone else sees it.&lt;/p&gt;




&lt;h2&gt;What we're publishing next&lt;/h2&gt;

&lt;p&gt;The full Bot Conduct Report 2026 will cover all 421 actors observed across 19 days, with behavioral profiles, infrastructure mapping, and the methodology in detail.&lt;/p&gt;

&lt;p&gt;For now, the practical takeaway is narrow and verifiable: if your defense depends on IP reputation, 45% of hostile traffic is configured to walk past it.&lt;/p&gt;

&lt;p&gt;If you want to see what hostile traffic looks like on your specific infrastructure, our Site Risk Assessment produces an independent forensic report.&lt;/p&gt;

&lt;p&gt;Full write-up: &lt;a href="https://botconduct.org/blog/waf-bypass-45-percent/" rel="noopener noreferrer"&gt;https://botconduct.org/blog/waf-bypass-45-percent/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Methodology details available on request. Data from BotConduct Observatory, April 2026.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>Alibaba Cloud and AWS host the anonymous bot harvesting our site. Yours could be next.</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Sat, 25 Apr 2026 01:52:11 +0000</pubDate>
      <link>https://dev.to/botconductstandard/alibaba-cloud-and-aws-host-the-anonymous-bot-harvesting-our-site-yours-could-be-next-5gbi</link>
      <guid>https://dev.to/botconductstandard/alibaba-cloud-and-aws-host-the-anonymous-bot-harvesting-our-site-yours-could-be-next-5gbi</guid>
      <description>&lt;p&gt;We run an independent observatory that measures how bots and AI agents behave on the open web. Last week we caught something that's worth writing about.&lt;/p&gt;

&lt;h2&gt;The pattern&lt;/h2&gt;

&lt;p&gt;It started with a TLS fingerprint that kept showing up across different IP addresses. Same handshake, same parameters, same JA4 hash: &lt;code&gt;t13d311100_e8f1e7e78f70_d41ae481755e&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That fingerprint is interesting on its own. It tells you the client uses TLS 1.3, with 31 cipher suites and 11 extensions. But the part that matters is the ALPN field. It's empty.&lt;/p&gt;

&lt;p&gt;Real browsers always advertise ALPN. Chrome sends &lt;code&gt;h2&lt;/code&gt;. Firefox sends &lt;code&gt;h2&lt;/code&gt;. Safari sends &lt;code&gt;h2&lt;/code&gt;. They negotiate HTTP/2 because every modern browser uses HTTP/2. A client that connects with TLS 1.3 in 2026 and announces no ALPN is not a browser. It's an HTTP library — Go's net/http, Python's requests with custom TLS, something in that family.&lt;/p&gt;
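&lt;p&gt;The check is cheap to express. A minimal sketch, assuming the ALPN list offered in each ClientHello is available from your TLS terminator's logs (the records below are illustrative):&lt;/p&gt;

```python
# Flag TLS 1.3 clients that offer no ALPN protocols: modern browsers
# offer h2, so an empty ALPN list is a strong non-browser signal.
def offers_browser_alpn(alpn):
    return "h2" in alpn

hellos = [  # illustrative ClientHello summaries
    {"ja4": "t13d1516_browser", "alpn": ["h2", "http/1.1"]},
    {"ja4": "t13d311100_e8f1e7e78f70_d41ae481755e", "alpn": []},
]
suspects = [h["ja4"] for h in hellos if not offers_browser_alpn(h["alpn"])]
print(suspects)  # only the empty-ALPN client is flagged
```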

&lt;p&gt;So we already knew: not a browser. Whatever was visiting us was pretending to be one.&lt;/p&gt;

&lt;h2&gt;What it was pretending to be&lt;/h2&gt;

&lt;p&gt;The user agents told the rest of the story. The same JA4 fingerprint cycled through 13 different browser identities: Chrome 135 on Windows, Chrome 135 with Edge, Chrome 134 on Mac, Firefox 137, Safari 18.3, Safari 18.2, Chrome with Adguard, Chrome 131, Chrome 130, Chrome 116, ChromeOS, and a few others.&lt;/p&gt;

&lt;p&gt;Thirteen browsers. One TLS handshake. The math doesn't work. Real users don't have thirteen browsers. Real browsers don't share TLS fingerprints. Someone built a list of common user agents and rotated through them on every request, while the underlying software stayed the same. That's deliberate. That's evasion.&lt;/p&gt;
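&lt;p&gt;Detecting this rotation is a one-liner once requests are grouped by fingerprint. A minimal sketch with illustrative records:&lt;/p&gt;

```python
# Count distinct User-Agent strings per TLS fingerprint. One handshake
# claiming many browser identities indicates deliberate rotation.
from collections import defaultdict

uas_per_fp = defaultdict(set)
observations = [  # (ja4, user_agent) pairs, illustrative
    ("t13d311100_e8f1", "Chrome/135 (Windows)"),
    ("t13d311100_e8f1", "Firefox/137"),
    ("t13d311100_e8f1", "Safari/18.3"),
    ("t13d1516_aaaa",   "Chrome/135 (Windows)"),
]
for ja4, ua in observations:
    uas_per_fp[ja4].add(ua)

rotators = [ja4 for ja4, uas in uas_per_fp.items() if len(uas) not in (0, 1)]
print(rotators)  # fingerprints claiming multiple browser identities
```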

&lt;h2&gt;Where it was coming from&lt;/h2&gt;

&lt;p&gt;We pulled the IPs and ran them through ARIN. The allocation 47.74.0.0–47.87.255.255 is assigned to Alibaba Cloud LLC (AL-3). All 107 connections from this fingerprint to our site originated from rented infrastructure inside that allocation.&lt;/p&gt;

&lt;p&gt;So we knew where the rental came from. We didn't know who rented it. Alibaba Cloud doesn't publish customer information. The trail stops at the cloud provider's perimeter.&lt;/p&gt;

&lt;h2&gt;The detail that made it worse&lt;/h2&gt;

&lt;p&gt;While we were looking at the Alibaba traffic, the same JA4 fingerprint appeared once on a different IP: &lt;code&gt;3.91.x.x&lt;/code&gt;. That block belongs to Amazon Web Services, us-east-1.&lt;/p&gt;

&lt;p&gt;One hit. Same fingerprint. Different cloud.&lt;/p&gt;

&lt;p&gt;That changes the picture. It's not a bot operating from Alibaba Cloud. It's a bot whose operator runs the same software across multiple cloud providers. Multi-cloud isn't a coincidence. It's how you build infrastructure that's hard to take down and hard to attribute.&lt;/p&gt;

&lt;h2&gt;What it was doing&lt;/h2&gt;

&lt;p&gt;The behavior on our site was consistent with content harvesting. The bot consistently accessed paths that no organic visitor would reach. It never requested robots.txt. Not once across 107 connections. It never identified itself as a bot in any user agent. It hardcoded a referer header pointing to our home page on every request, regardless of where it actually came from.&lt;/p&gt;

&lt;p&gt;There's also a small technical tell. One of the first paths it visited was a malformed URL: it had tried to follow a link to a Twitter profile from our home page, and it didn't resolve the URL escapes correctly. Browsers don't do that. HTML parsers built into scraping libraries do.&lt;/p&gt;

&lt;h2&gt;What we can prove and what we can't&lt;/h2&gt;

&lt;p&gt;We can prove the TLS fingerprint. We can prove the IP ranges. We can prove the user agent rotation. We can prove the never-read-robots-txt. We can prove the multi-cloud appearance of the same software. All of this is independently verifiable: ARIN for IP attribution, the JA4 spec for fingerprint interpretation, our cryptographically signed observation chain for the request data.&lt;/p&gt;

&lt;p&gt;We can't prove who runs it. We can't prove what they do with the harvested content. We can't prove which other sites they're hitting. We can guess based on behavior — content harvesting at this scale, with this level of evasion, is consistent with AI training data collection or competitive scraping operations. But guessing isn't proof.&lt;/p&gt;

&lt;h2&gt;The part that should bother you&lt;/h2&gt;

&lt;p&gt;Both Alibaba Cloud and AWS prohibit exactly this kind of activity in their Acceptable Use Policies. AWS explicitly forbids "scraping" and "unauthorized data collection." Alibaba Cloud's terms forbid using their infrastructure for "activities that violate the legitimate rights and interests of others." Both providers wrote those rules. Neither enforces them in any way that would prevent what we're describing.&lt;/p&gt;

&lt;p&gt;The infrastructure is rented. The policies are written. The enforcement is absent.&lt;/p&gt;

&lt;p&gt;If you run a website, this matters to you. The bot we measured is one operator using one software stack. If our small observatory caught it in a few days of operation, the actual scale of this activity across the web is much larger. The same anonymous infrastructure is available to anyone with a credit card. The same lack of enforcement applies to everyone using it.&lt;/p&gt;

&lt;p&gt;You probably won't see this kind of traffic in your standard analytics. Your CDN might rate-limit it, but it won't tell you what it was. Your WAF might block some of it, but it won't attribute it. The systems we built to defend the web were built when bots had names and IP reputation meant something. Anonymous operators rotating across cloud providers don't fit that model.&lt;/p&gt;

&lt;h2&gt;What we're doing about it&lt;/h2&gt;

&lt;p&gt;We're publishing what we measure. The data behind this post is part of a larger registry of observed bot behavior, classified by what bots actually do on the open web rather than what they claim. We can't identify the operators. We can identify the patterns. We think that's worth making public.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Think this bot might be hitting your site?&lt;/strong&gt; We'll run a free vulnerability report for you. Send us your domain to &lt;strong&gt;&lt;a href="mailto:hello@botconduct.org"&gt;hello@botconduct.org&lt;/a&gt;&lt;/strong&gt; with subject "Vulnerability Report" and we'll tell you what we see.&lt;/p&gt;

&lt;p&gt;The full methodology, registry, and cryptographically signed evidence chain: &lt;a href="https://botconduct.org" rel="noopener noreferrer"&gt;botconduct.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're going to keep publishing cases like this. There will be more.&lt;/p&gt;

&lt;p&gt;— BotConduct&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>BotConduct Training Center: free adversarial evaluation for your AI agent</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Tue, 21 Apr 2026 15:18:49 +0000</pubDate>
      <link>https://dev.to/botconductstandard/botconduct-training-center-free-adversarial-evaluation-for-your-ai-agent-2f78</link>
      <guid>https://dev.to/botconductstandard/botconduct-training-center-free-adversarial-evaluation-for-your-ai-agent-2f78</guid>
      <description>&lt;p&gt;We just launched the free tier of BotConduct Training Center — an adversarial evaluation platform for AI agents.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;You built an AI agent. It works great in testing. But what happens when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user tries to extract its system prompt?&lt;/li&gt;
&lt;li&gt;A caller impersonates authority to bypass restrictions?&lt;/li&gt;
&lt;li&gt;Contradictory information gets planted across a conversation?&lt;/li&gt;
&lt;li&gt;Adversarial patterns emerge across multiple interactions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't know until production. Now you can find out before.&lt;/p&gt;

&lt;h2&gt;What Training Center does&lt;/h2&gt;

&lt;p&gt;You point your agent at our API. We play an adversarial customer who progressively escalates pressure over multiple turns. Your agent responds naturally. We evaluate every response and tell you exactly where it breaks.&lt;/p&gt;

&lt;p&gt;Two evaluation paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat/API&lt;/strong&gt; — for chatbots, voice agents, SDR agents, customer service bots&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web crawl&lt;/strong&gt; — for crawlers, scrapers, search agents (evolving signals, contradicting directives mid-session)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Free tier&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;3 evaluations&lt;/li&gt;
&lt;li&gt;2 adversarial scenarios&lt;/li&gt;
&lt;li&gt;Detailed violation report&lt;/li&gt;
&lt;li&gt;Ed25519 signed certificate&lt;/li&gt;
&lt;li&gt;Badge for your README&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No signup. No API key.&lt;/p&gt;

&lt;h2&gt;Quick start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://botconduct.org/api/v3/training-center/start &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"bot_name":"MyAgent","operator":"me","scenarios":["C1","C3"]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full examples in Python, Node.js, and cURL:&lt;br&gt;
&lt;a href="https://github.com/alemizrahi1/agent-stress-test" rel="noopener noreferrer"&gt;https://github.com/alemizrahi1/agent-stress-test&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Interactive playground:&lt;br&gt;
&lt;a href="https://botconduct.org/playground/" rel="noopener noreferrer"&gt;https://botconduct.org/playground/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Professional tiers&lt;/h2&gt;

&lt;p&gt;Need more? Level 1 Basic ($500), Professional ($3,500), and Full Certification ($12,000) add more adversarial scenarios, longer sessions, forensic reports, and certificates citable in enterprise procurement and regulatory filings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://botconduct.org/training-center/" rel="noopener noreferrer"&gt;https://botconduct.org/training-center/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What are you building?&lt;/h2&gt;

&lt;p&gt;Curious what kind of agents people are working on and how they handle adversarial inputs. If you run the free test, share your results — especially the failures. That's where it gets interesting.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>testing</category>
      <category>security</category>
      <category>agents</category>
    </item>
    <item>
      <title>Static compliance checklists can't measure AI agent behavior. Here's what does.</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:33:53 +0000</pubDate>
      <link>https://dev.to/botconductstandard/static-compliance-checklists-cant-measure-ai-agent-behavior-heres-what-does-1g4o</link>
      <guid>https://dev.to/botconductstandard/static-compliance-checklists-cant-measure-ai-agent-behavior-heres-what-does-1g4o</guid>
      <description>&lt;p&gt;Agent-evaluation products in 2026 fall into two generations. First-generation: static pass/fail checklists. Second-generation: evaluation under changing conditions, where behavior trajectory is measured rather than endpoint state. The first generation can't answer the questions CTOs and CISOs actually ask. The second generation can — and it works the same way across every platform.&lt;/p&gt;

&lt;h2&gt;The problem with ten checks&lt;/h2&gt;

&lt;p&gt;Most agent-readiness products shipping today work the same way. Define N rules. Test whether the bot passes each. Aggregate into a score. Ship a certificate.&lt;/p&gt;

&lt;p&gt;The appeal is obvious. It's auditable. It maps to how SOC 2 reports look. A CISO understands it without training.&lt;/p&gt;

&lt;p&gt;The problem is also obvious once you think about production incidents. The evaluation measures &lt;strong&gt;observable state at a single point in time&lt;/strong&gt;. It tells you nothing about how the agent behaves when conditions around it change — when signals evolve, when server state shifts, when adversarial inputs arrive. These are the situations that cause real production incidents, and they are precisely what static evaluation cannot measure.&lt;/p&gt;

&lt;h2&gt;The community already said this&lt;/h2&gt;

&lt;p&gt;On recent threads about agent-readiness tooling, the paraphrased reaction from sophisticated technical commenters has been: &lt;em&gt;"Reducing agent readiness to 10 static checks is like reducing SEO to 10 static checks. It misses the point."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That critique is correct. The market is already splitting into two camps, and first-generation tools are being read as legacy.&lt;/p&gt;

&lt;h2&gt;What second-generation looks like&lt;/h2&gt;

&lt;p&gt;Instead of testing compliance with fixed rules, second-generation evaluation measures &lt;strong&gt;behavior trajectory&lt;/strong&gt; under evolving conditions. The agent is placed in environments where directives can change during the session, where signals can contradict, where adversarial inputs test discipline.&lt;/p&gt;

&lt;p&gt;What gets measured is not a state at a single point in time, but the decision trajectory across the scenario — what the agent chose when forced to interpret ambiguous inputs, how it recovered from errors, whether it held scope under pressure.&lt;/p&gt;
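&lt;p&gt;The shape of a trajectory record is simple even if the scenarios are not. A minimal sketch of recording decision points over time (the events and the pass rule below are illustrative, not the actual methodology):&lt;/p&gt;

```python
# Record a decision trajectory instead of a single endpoint state.
# Events and the verdict rule are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    events: list = field(default_factory=list)

    def record(self, t, decision, compliant):
        self.events.append({"t": t, "decision": decision, "compliant": compliant})

    def verdict(self):
        # Illustrative rule: every decision point must be compliant.
        return all(e["compliant"] for e in self.events)

traj = Trajectory()
traj.record(0, "fetched initial directives", True)
traj.record(42, "directive changed mid-session; agent re-fetched", True)
traj.record(77, "continued crawling a newly disallowed path", False)
print(traj.verdict())  # False: one bad decision fails the scenario
```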

&lt;p&gt;The specific scenarios, thresholds, and evaluation criteria are not disclosed publicly. This is deliberate: revealing the mechanism would let operators tune agents to pass without demonstrating genuine compliance. The methodology is a closed oracle — reproducible internally, verifiable externally through cryptographically signed observation records, but not publicly described.&lt;/p&gt;

&lt;h2&gt;What the report looks like&lt;/h2&gt;

&lt;p&gt;First-generation reports produce checkmarks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[✓] Identifies as bot
[✓] Respects standard directives
[✗] Publishes declaration URL
Score: 87/100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second-generation reports produce trajectories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T+0s   | Session initialized, agent fetched initial directives
...    | Scenario-specific events recorded with timestamps
T+N    | Agent made decision in response to changing conditions
...    | Multiple decision points across the session

Verdict: [PASS|FAIL] per scenario
Reason: Specific agent behaviors in context,
        with cryptographically signed observation IDs
        for each event.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first shows the state. The second shows the decision. In a production incident, only the decision matters.&lt;/p&gt;

&lt;h2&gt;Cross-platform by design&lt;/h2&gt;

&lt;p&gt;The certification is infrastructure-neutral. An agent certified by the methodology is recognized the same way by a site behind Cloudflare, one running DataDome, one with in-house infrastructure, and one with nothing at all. It doesn't compete with bot-management vendors — it's the independent layer they can cite. Like a passport for AI agents: issued once, honored everywhere.&lt;/p&gt;

&lt;p&gt;The same principle applies to the regulatory plane. One certification bundles compliance evidence against multiple frameworks simultaneously -- EU AI Act, GDPR, California SB 1001, RFC 9309, W3C TDMRep, EU DSM Directive. Instead of demonstrating compliance six separate times against six separate auditors, the operator is evaluated once and the result can be cited in any jurisdiction.&lt;/p&gt;

&lt;h2&gt;Why this distinction is urgent now&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Regulatory pressure is specific about conduct.&lt;/strong&gt; EU AI Act Article 50 requires disclosure &lt;em&gt;during interaction&lt;/em&gt;, not at deployment. GDPR rights apply per-request. California SB 1001 demands honest identification in the context of a conversation. These are dynamic obligations, not static attestations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise buyers ask operational questions.&lt;/strong&gt; A CTO doesn't ask "does it pass a 10-check list." They ask how the agent behaves when conditions in the real deployment environment change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incidents are documented.&lt;/strong&gt; Recent disclosures in the infrastructure-vendor space have confirmed AI-accelerated attacks exploiting agent platforms. The evaluation framework appropriate to this threat model is not a checklist.&lt;/p&gt;

&lt;h2&gt;What BotConduct is building&lt;/h2&gt;

&lt;p&gt;BotConduct Training Center is designed second-generation from day one. Level 1 is static hygiene (basic sanity is the floor). Level 2 measures behavior under evolving conditions. Level 3 measures conduct integrity under adversarial probing. Each evaluation produces a cryptographically signed trajectory, not a checklist.&lt;/p&gt;

&lt;p&gt;Each observation is signed with Ed25519 and recorded in an append-only chain. Public key at botconduct.org/.well-known/bcs-public-key.pem. Anyone can verify any observation via botconduct.org/api/verify-observation/{id} without trusting us.&lt;/p&gt;
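&lt;p&gt;The append-only property is independent of the signature scheme. A minimal sketch of hash-chained records using only the standard library (real records would additionally carry the Ed25519 signature, which is omitted here):&lt;/p&gt;

```python
# Append-only observation chain: each record hashes its predecessor,
# so any retroactive edit is detectable. Ed25519 signing not shown.
import hashlib
import json

def append(chain, observation):
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev, "obs": observation}, sort_keys=True)
    chain.append({"prev": prev, "obs": observation,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(chain):
    for i, rec in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else "0" * 64
        body = json.dumps({"prev": prev, "obs": rec["obs"]}, sort_keys=True)
        ok = rec["prev"] == prev and rec["hash"] == hashlib.sha256(body.encode()).hexdigest()
        if not ok:
            return False
    return True

chain = []
append(chain, {"event": "session_start"})
append(chain, {"event": "directive_change"})
print(verify(chain))  # True; editing any past record flips this to False
```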

&lt;p&gt;If Moody's rates bonds and FICO rates people, BotConduct rates how an AI agent behaves when nobody is watching — and the certificate works across every platform.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Landing + pricing: &lt;a href="https://botconduct.org/training-center" rel="noopener noreferrer"&gt;botconduct.org/training-center&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Regulatory foundation: RFC 9309, EU AI Act Art. 50, EU DSM Directive Art. 4, California SB 1001, W3C TDMRep, GDPR.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Discussion welcomed.&lt;/strong&gt; What scenarios would you want to see in a second-generation evaluation of your own agents? What does your team currently use to measure agent behavior under change?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>194 IP Addresses. One Fake iPhone. Six Days Undetected.</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Sat, 18 Apr 2026 14:28:43 +0000</pubDate>
      <link>https://dev.to/botconductstandard/194-ip-addresses-one-fake-iphone-six-days-undetectedpublished-true-1ofe</link>
      <guid>https://dev.to/botconductstandard/194-ip-addresses-one-fake-iphone-six-days-undetectedpublished-true-1ofe</guid>
      <description>&lt;p&gt;A scraper ran on our network for 6 days using 194 different Tencent Cloud IPs. Every request carried a fake iPhone User-Agent (iOS 13.2.3 from 2019). It never read robots.txt. It never identified itself. It averaged 1.8 requests per IP -- staying below every rate limiter, every WAF rule, every IP-based detection system.&lt;/p&gt;

&lt;p&gt;In your analytics, this looks like 194 different people casually browsing on iPhones. No alert. No anomaly. Nothing to investigate.&lt;/p&gt;

&lt;p&gt;The numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;194 unique IPs (all ASN 132203, Tencent Cloud)&lt;/li&gt;
&lt;li&gt;362 requests over 6 days&lt;/li&gt;
&lt;li&gt;Fake iPhone UA (iOS 13.2.3 -- released November 2019)&lt;/li&gt;
&lt;li&gt;1.8 hits per IP average (evades all IP-based detection)&lt;/li&gt;
&lt;li&gt;Never read robots.txt&lt;/li&gt;
&lt;li&gt;Hit paths across entire site including /es/, /de/, /fr/, /no/, /zh/&lt;/li&gt;
&lt;li&gt;All datacenter IPs -- no real iPhone connects from a datacenter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this means:&lt;br&gt;
If you run e-commerce, it has your prices. If you run media, it has your content. If you run SaaS, it mapped your app. And you never saw it because every request looked like a real user.&lt;/p&gt;

&lt;p&gt;We caught it by measuring behavioral conduct -- not counting IPs.&lt;/p&gt;
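&lt;p&gt;One of the cheapest conduct checks is internal consistency: does the claimed device match the network it arrives from? A minimal sketch (the ASN set is illustrative, seeded with the one from this post):&lt;/p&gt;

```python
# Flag sessions whose User-Agent claims a phone while the source ASN
# belongs to a datacenter. Real iPhones don't connect from the cloud.
DATACENTER_ASNS = {132203}  # Tencent Cloud, per this post; extend as needed

def device_network_contradiction(user_agent, asn):
    claims_phone = "iPhone" in user_agent
    return claims_phone and asn in DATACENTER_ASNS

print(device_network_contradiction(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X)", 132203))  # True
```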

&lt;p&gt;Full forensic breakdown: &lt;a href="https://botconduct.org/report/april-2026/part-2/" rel="noopener noreferrer"&gt;https://botconduct.org/report/april-2026/part-2/&lt;/a&gt;&lt;br&gt;
Part 2 of the State of Bot Conduct series. Part 1: &lt;a href="https://botconduct.org/report/april-2026/part-1/" rel="noopener noreferrer"&gt;https://botconduct.org/report/april-2026/part-1/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;BotConduct.org -- Behavioral scoring for bots and AI agents.&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>GPTBot follows content invisible to humans. TwitterBot and ClaudeBot don't.</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Fri, 17 Apr 2026 19:27:17 +0000</pubDate>
      <link>https://dev.to/botconductstandard/we-scored-172-bots-on-behavioral-conduct-openai-came-in-last-4bpd</link>
      <guid>https://dev.to/botconductstandard/we-scored-172-bots-on-behavioral-conduct-openai-came-in-last-4bpd</guid>
      <description>&lt;p&gt;We run a behavioral observation network that scores how bots and AI agents conduct themselves when they visit websites. We scored 172+ operators. The results were eye-opening.&lt;/p&gt;

&lt;h2&gt;GPTBot: 8 content requests in 14 seconds&lt;/h2&gt;

&lt;p&gt;On April 17, 2026, OpenAI's GPTBot visited our network from IP 74.7.241.33 -- verified against OpenAI's own published ranges at openai.com/gptbot.json.&lt;/p&gt;

&lt;p&gt;In a single session of 51 seconds, it made 39 requests. &lt;strong&gt;8 of those went to content not visible to human visitors.&lt;/strong&gt; All 8 in a 14-second burst.&lt;/p&gt;

&lt;p&gt;GPTBot does not render CSS. It parses raw HTML and follows every anchor tag it finds -- visible or not. It cannot tell the difference between content meant for users and content that is hidden from the rendered page.&lt;/p&gt;
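&lt;p&gt;The difference is easy to see in miniature. A sketch contrasting a CSS-blind crawler with a render-aware one ("hidden" stands in for whatever CSS would suppress; the anchors are illustrative):&lt;/p&gt;

```python
# A raw-HTML crawler follows every anchor it parses; a render-aware
# crawler skips anchors the rendered page would never display.
anchors = [
    {"href": "/pricing", "hidden": False},
    {"href": "/honeypot-trap", "hidden": True},
]

def blind_crawl(anchors):
    return [a["href"] for a in anchors]

def render_aware_crawl(anchors):
    return [a["href"] for a in anchors if not a["hidden"]]

print(blind_crawl(anchors))         # includes /honeypot-trap
print(render_aware_crawl(anchors))  # only /pricing
```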

&lt;p&gt;A multi-hundred-billion-dollar company's flagship crawler, navigating the web blind.&lt;/p&gt;

&lt;h2&gt;
  
  
  TwitterBot and ClaudeBot: zero
&lt;/h2&gt;

&lt;p&gt;X Corp's TwitterBot and Anthropic's ClaudeBot visited the same pages. Same HTML. Same content -- visible and hidden.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neither followed any hidden content.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three crawlers. Three of the biggest tech companies in the world. Same test. Two understood what humans can see. One didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The full leaderboard
&lt;/h2&gt;

&lt;p&gt;This is not a cherry-picked comparison. We scored 172+ bot operators on behavioral conduct; the full ranking of named operators is in the report linked at the end of this post.&lt;/p&gt;

&lt;p&gt;The pattern: the biggest name does not mean the best behavior. Some of the most well-funded AI companies run crawlers less sophisticated than open-source projects with zero budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happens when a crawler can't see
&lt;/h2&gt;

&lt;p&gt;Hidden content exists everywhere on the web: honeypots, bot detection systems, anti-scraping layers, admin panels, internal tooling. A crawler that follows everything blindly will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trigger every honeypot it encounters&lt;/li&gt;
&lt;li&gt;Get flagged by every bot detection system&lt;/li&gt;
&lt;li&gt;Scrape content it was never meant to access&lt;/li&gt;
&lt;li&gt;Get blocked, rate-limited, and blacklisted&lt;/li&gt;
&lt;/ul&gt;
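&lt;p&gt;A honeypot built on exactly this weakness is a few lines of server-side code. An illustrative sketch -- the paths and client IDs are made up, and a real deployment would use unguessable paths:&lt;/p&gt;

```python
# Sketch: flag any client that requests a honeypot path -- a path linked
# only from anchors hidden to CSS-rendering visitors. Paths and client
# IDs here are hypothetical.

HONEYPOT_PATHS = {"/internal/trap-7f3a", "/archive/do-not-follow"}

flagged = set()

def observe(client_id, path):
    """Record one request; flag the client if it hit a honeypot."""
    if path in HONEYPOT_PATHS:
        flagged.add(client_id)

# A CSS-aware crawler never sees the hidden anchors:
observe("css-aware-bot", "/blog/post-1")
# A raw-HTML crawler follows every anchor, hidden or not:
observe("raw-html-bot", "/blog/post-1")
observe("raw-html-bot", "/internal/trap-7f3a")

print(sorted(flagged))    # ['raw-html-bot']
```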

&lt;p&gt;This is not about ethics. This is about engineering. Rendering CSS is a solved problem. Google's crawler does it. Anthropic's does it. X's does it. OpenAI's does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  We contacted OpenAI
&lt;/h2&gt;

&lt;p&gt;We emailed &lt;a href="mailto:opt-out@openai.com"&gt;opt-out@openai.com&lt;/a&gt; on April 17, 2026, with 48 hours' notice before publication. No response as of this writing. If they respond, we will update this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is Part 1 of 5
&lt;/h2&gt;

&lt;p&gt;We are publishing one finding per day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1 (today):&lt;/strong&gt; GPTBot and hidden content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; 194 rotating IPs with a fake iPhone User-Agent. Six days. One cloud provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; The crawler that ignored its own standard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4:&lt;/strong&gt; What bot traffic actually costs you&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 5:&lt;/strong&gt; A free tool to see what is hitting YOUR site right now&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full report with research disclaimer: &lt;a href="https://botconduct.org/report/april-2026/part-1" rel="noopener noreferrer"&gt;botconduct.org/report/april-2026/part-1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to see what bots do on your site?&lt;/strong&gt; Free sensor, 30 seconds, one line of code: &lt;a href="https://botconduct.org/sensor.html" rel="noopener noreferrer"&gt;botconduct.org/sensor.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I watched 145 bots visit my site for two weeks. Here is what I learned.</title>
      <dc:creator>BotConductStandard </dc:creator>
      <pubDate>Thu, 16 Apr 2026 04:13:40 +0000</pubDate>
      <link>https://dev.to/botconductstandard/i-watched-145-bots-visit-my-site-for-two-weeks-here-is-what-i-learned-1e3</link>
      <guid>https://dev.to/botconductstandard/i-watched-145-bots-visit-my-site-for-two-weeks-here-is-what-i-learned-1e3</guid>
      <description>&lt;p&gt;Two weeks ago I put a fresh site online and started logging every request. I wanted to answer a simple question: how much of my traffic is actually human?&lt;/p&gt;

&lt;p&gt;Turns out, barely any.&lt;/p&gt;

&lt;h2&gt;
  
  
  The raw numbers
&lt;/h2&gt;

&lt;p&gt;Across those two weeks I observed &lt;strong&gt;145 distinct bots&lt;/strong&gt; hitting the site. Some declared themselves honestly. Some pretended to be iPhones from 2019. Some came in through Cloudflare. Some came in through rotating AWS IPs and never stopped.&lt;/p&gt;

&lt;p&gt;I was interested in more than just counting them. I wanted to know &lt;strong&gt;how each one behaved&lt;/strong&gt; — not the identity, the conduct. Did it read &lt;code&gt;robots.txt&lt;/code&gt;? Did it respect rate limits? Did it avoid obviously private paths? Did it keep a stable user-agent across requests?&lt;/p&gt;

&lt;p&gt;I ended up with a scoring system. Each bot got a number between 0 and 100 based on observable behavior.&lt;/p&gt;
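&lt;p&gt;The real rubric has more signals than this, but a toy version makes the idea concrete. These signal names and weights are illustrative, not the actual model:&lt;/p&gt;

```python
# Toy conduct rubric -- illustrative signals and weights only.

SIGNALS = {
    "fetched_robots_txt": 25,     # read robots.txt before crawling
    "respected_rate_limit": 25,   # stayed under a sane request rate
    "avoided_private_paths": 25,  # never touched /wp-admin, /.env, etc.
    "stable_user_agent": 25,      # same UA string across the session
}

def conduct_score(observed):
    """Sum the weights of the signals this bot satisfied (0-100)."""
    return sum(w for name, w in SIGNALS.items() if observed.get(name))

polite_crawler = dict.fromkeys(SIGNALS, True)
wp_scanner = {"stable_user_agent": True}   # everything else violated

print(conduct_score(polite_crawler))   # 100
print(conduct_score(wp_scanner))       # 25
```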

&lt;p&gt;The distribution was surprising.&lt;/p&gt;

&lt;h2&gt;
  
  
  The well-behaved majority
&lt;/h2&gt;

&lt;p&gt;The bots at the top of the ranking are exactly the ones you would expect. Major search engines. AI crawlers from the big labs. A few SEO tools. Social preview bots.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbotconduct.org%2Fassets%2Fleaderboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbotconduct.org%2Fassets%2Fleaderboard.png" alt="Top rated bots in the registry" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GPTBot (OpenAI), ClaudeBot (Anthropic), Bingbot (Microsoft), Bytespider (ByteDance), Baiduspider, YandexBot, Meta's scraper, redditbot — all landing at 100 out of 100.&lt;/p&gt;

&lt;p&gt;It makes sense once you think about it. These companies operate massive crawling infrastructure. They know every site they hit is watching. They have compliance teams. Their crawlers are boring in the best way — they announce themselves, stay within limits, and leave.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hostile minority
&lt;/h2&gt;

&lt;p&gt;The bottom of the ranking was where it got interesting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbotconduct.org%2Fassets%2Fhostile.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbotconduct.org%2Fassets%2Fhostile.png" alt="Hostile bots in the registry" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;About &lt;strong&gt;27% of bots scored below 50&lt;/strong&gt;. A few of them were recognizable — L9Explore, the crawler operated by LeakIX, probing sensitive paths aggressively. Keydrop Scanner doing credential probing. A stream of anonymous WordPress scanners hammering &lt;code&gt;/wp-admin&lt;/code&gt; on every domain they find.&lt;/p&gt;

&lt;p&gt;The worst offender was a single IP on AWS that sent &lt;strong&gt;2,562 requests in one day&lt;/strong&gt;. No user-agent. No interest in &lt;code&gt;robots.txt&lt;/code&gt;. Just walking through every endpoint it could find.&lt;/p&gt;

&lt;p&gt;Another favorite: a bot presenting itself as &lt;code&gt;iPhone; iPhone OS 13_2_3&lt;/code&gt; — an iOS version from late 2019. Nobody real is running that in 2026. The user-agent is a lie and the behavior matches. Distributed across dozens of residential IPs.&lt;/p&gt;
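&lt;p&gt;That particular lie is cheap to catch. A sketch, with an arbitrary staleness cutoff (an assumption for illustration, not a maintained version database):&lt;/p&gt;

```python
import re

# Sketch: flag implausibly old iOS versions in a User-Agent string.
# The cutoff is an assumption -- no meaningful number of real iPhones
# runs a major version this old in 2026.

STALE_IOS_MAJOR = 15   # treat anything below iOS 15 as suspect

def stale_iphone_ua(user_agent):
    match = re.search(r"iPhone OS (\d+)_", user_agent)
    if match is None:
        return False   # not claiming to be an iPhone at all
    return int(match.group(1)) in range(STALE_IOS_MAJOR)

print(stale_iphone_ua("Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X)"))  # True
print(stale_iphone_ua("Mozilla/5.0 (iPhone; CPU iPhone OS 18_1 like Mac OS X)"))    # False
```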

&lt;h2&gt;
  
  
  The middle is the interesting part
&lt;/h2&gt;

&lt;p&gt;The polar ends of the distribution are easy. Known good bots are good. Obvious scanners are obviously malicious.&lt;/p&gt;

&lt;p&gt;The middle third is where real decisions live. Crawlers from cloud providers like Tencent sat around 36. Not malicious per se, but also not identifying themselves well and using rotating IPs. If I were running a site that mattered, would I let those through? Block? Rate-limit?&lt;/p&gt;

&lt;p&gt;This is the category where &lt;code&gt;block everything automated&lt;/code&gt; destroys legitimate use cases (partners, vendors, research tools) and &lt;code&gt;allow everything&lt;/code&gt; destroys your servers. It's where the real work is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;I stopped logging and started building. The passive observations became an API — you send it a suspicious request, it sends you back a score and a recommended action.&lt;/p&gt;

&lt;p&gt;The action is one of four: &lt;code&gt;allow&lt;/code&gt;, &lt;code&gt;throttle&lt;/code&gt;, &lt;code&gt;challenge&lt;/code&gt;, &lt;code&gt;block&lt;/code&gt; -- each something my middleware can handle in three lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bcs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;block&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
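&lt;p&gt;Dispatching the verdict is the easy half. A sketch of the receiving side -- the status codes and throttle delay here are my choices, not part of the API:&lt;/p&gt;

```python
import time

# Sketch: map the four verdict actions to HTTP responses. The verdict
# shape ({"action": ...}) matches the snippet above; status codes and
# the throttle delay are illustrative choices.

STATUS = {
    "allow": 200,       # serve normally
    "challenge": 429,   # ask the client to prove itself first
    "block": 403,       # refuse outright
}

def apply_verdict(verdict):
    action = verdict["action"]
    if action == "throttle":
        time.sleep(0.5)    # slow the client down, then serve
        return 200
    return STATUS.get(action, 200)   # unknown actions fail open

print(apply_verdict({"action": "block"}))   # 403
```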



&lt;p&gt;The rubric that produces the score is proprietary, but the verdicts are public. Every bot I scored shows up in a public registry with its current rating. Operators can claim their entries and upgrade to a cryptographically signed identity if they want higher trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it changed for me
&lt;/h2&gt;

&lt;p&gt;Before this experiment, I treated automated traffic as a nuisance. Something to filter, block, ignore.&lt;/p&gt;

&lt;p&gt;After two weeks of looking closely, I think about it differently. The web is becoming a conversation between automated agents — and most of them are trying to do their jobs well. The bad ones are loud, and they get all the attention, but they are the minority.&lt;/p&gt;

&lt;p&gt;Giving the well-behaved agents a way to prove it — and the sites a way to verify it — seems like a better answer than the status quo of blocking everything automated.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you want to try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If you run a bot or agent&lt;/strong&gt;: there is a public certification flow. It takes 30 seconds for basic certification, a few minutes for something more serious.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you run a site&lt;/strong&gt;: the API has a free tier (5,000 scores per month) if you want to experiment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is at &lt;a href="https://botconduct.org" rel="noopener noreferrer"&gt;botconduct.org&lt;/a&gt;. The first production site running this end-to-end is &lt;a href="https://importsignals.com" rel="noopener noreferrer"&gt;importsignals.com&lt;/a&gt; — their &lt;a href="https://importsignals.com/security" rel="noopener noreferrer"&gt;bot policy page&lt;/a&gt; is a reasonable reference if you want to see what it looks like in the wild.&lt;/p&gt;

&lt;p&gt;Would love to hear from other people who have measured their bot traffic seriously. I suspect the 27% hostile number is conservative.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow-up thread and registry updates at &lt;a href="https://twitter.com/botconduct" rel="noopener noreferrer"&gt;@botconduct&lt;/a&gt;.&lt;/em&gt;&lt;br&gt;
Rafa Mizrahi&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>security</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
