HTTP status code 402 has existed since 1997. The original HTTP/1.1 specification reserved it "for future use." For nearly three decades, no one found a use for it. Last week, Cloudflare and Stack Overflow changed that.
The system is called pay-per-crawl. When an AI crawler requests a page from a participating website, the server responds with HTTP 402 — Payment Required — along with a price header. The crawler either pays or leaves. No negotiation. No "we'll give you exposure." Cash or nothing.
Cloudflare handles roughly one in five websites on the internet. That's not a niche experiment. That's infrastructure-level monetization of AI training data, deployed at the same scale the crawlers themselves operate at.
How the Tollbooth Works
The technical implementation is straightforward. Publishers set a flat per-request price for their domain. Cloudflare's Web Application Firewall identifies AI crawlers, categorizes them, and serves the 402 response instead of the content. Crawlers authenticate using Ed25519 key pairs and HTTP Message Signatures — cryptographic proof that the bot is who it claims to be, not a spoofed user agent.
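What such a signed request looks like can be sketched concretely. Under HTTP Message Signatures (RFC 9421), a signed request carries a Signature-Input header naming the covered components and the key, plus a Signature header holding the base64-encoded signature bytes. The header values below are hypothetical, and this sketch only parses the headers; the actual Ed25519 verification step against the crawler's published public key is omitted:

```python
# Minimal sketch of parsing RFC 9421-style signature headers.
# Header values are illustrative, not real Cloudflare traffic.
import base64
import re

def parse_signature_headers(headers: dict) -> dict:
    """Extract the signature label, covered components, and raw
    signature bytes. Returns {} if the headers are absent or malformed."""
    sig_input = headers.get("Signature-Input", "")
    signature = headers.get("Signature", "")
    if not sig_input or not signature:
        return {}
    # e.g. Signature-Input: sig1=("@authority" "@path");created=...;keyid="..."
    m = re.match(r'(\w+)=\(([^)]*)\)(.*)', sig_input)
    if not m:
        return {}
    label, components, params = m.groups()
    # e.g. Signature: sig1=:BASE64BYTES:
    sig_m = re.match(rf'{label}=:([A-Za-z0-9+/=]+):', signature)
    if not sig_m:
        return {}
    return {
        "label": label,
        "components": [c.strip('"') for c in components.split()],
        "params": params.lstrip(";"),
        "signature": base64.b64decode(sig_m.group(1)),
    }
```

A spoofed user agent fails this check trivially: without the private key, it cannot produce a signature that verifies against the registered public key.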
Two payment flows exist. In the reactive model, the crawler hits the wall, sees the price, and retries with a payment header. In the proactive model, the crawler sends a maximum price upfront and gets waved through if the publisher's rate is at or below that threshold. Cloudflare acts as merchant of record — it aggregates billing, charges crawlers, and distributes earnings.
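Both flows reduce to a single server-side decision. The sketch below is a toy, and the header names (crawler-price, crawler-max-price, crawler-exact-price, crawler-charged) are modeled on Cloudflare's announcement rather than taken from a protocol spec:

```python
# Toy model of the two pay-per-crawl flows. Header names are
# illustrative; amounts are in USD.
PRICE_PER_REQUEST = 0.05  # the publisher's configured flat rate

def handle_crawl(headers: dict) -> tuple[int, dict]:
    """Return (status_code, response_headers) for an authenticated crawler."""
    # Proactive flow: crawler declares the most it is willing to pay.
    max_price = headers.get("crawler-max-price")
    if max_price is not None and float(max_price) >= PRICE_PER_REQUEST:
        return 200, {"crawler-charged": f"{PRICE_PER_REQUEST:.2f}"}
    # Reactive flow: crawler retries, accepting the quoted price.
    exact = headers.get("crawler-exact-price")
    if exact is not None and float(exact) == PRICE_PER_REQUEST:
        return 200, {"crawler-charged": f"{PRICE_PER_REQUEST:.2f}"}
    # Otherwise: quote the price and demand payment.
    return 402, {"crawler-price": f"{PRICE_PER_REQUEST:.2f}"}
```

In the reactive flow, a first request with no payment headers gets the 402 and the quoted price; the crawler echoes that price back on retry and gets the content.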
Publishers get three options per crawler: allow free access, charge the configured rate, or block entirely. They can exempt specific crawlers for partnerships.
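That three-way choice amounts to a per-crawler policy lookup. The crawler names and policy table below are invented for illustration:

```python
# Sketch of per-crawler policy resolution. Crawler IDs and the
# policy table are hypothetical.
from enum import Enum

class Policy(Enum):
    ALLOW = "allow"    # free access, e.g. an exempted partner
    CHARGE = "charge"  # serve 402 until payment is presented
    BLOCK = "block"    # serve 403 unconditionally

POLICIES = {
    "partner-bot": Policy.ALLOW,
    "big-lab-crawler": Policy.CHARGE,
    "scraper-x": Policy.BLOCK,
}
DEFAULT_POLICY = Policy.CHARGE  # unknown crawlers pay like everyone else

def status_for(crawler_id: str, paid: bool) -> int:
    """Map a crawler's policy (and payment state) to an HTTP status."""
    policy = POLICIES.get(crawler_id, DEFAULT_POLICY)
    if policy is Policy.BLOCK:
        return 403
    if policy is Policy.CHARGE and not paid:
        return 402
    return 200
```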
Matthew Prince, Cloudflare's CEO, put it plainly: "AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators."
The Internet Is Becoming a Paid API
Stack Overflow is the first major partner, and the reasoning is transparent. Fifteen years of developer questions and answers — the largest structured knowledge base for software engineering on Earth — have been ingested by every major language model. Stack Overflow wants payment for future access, and it wants to distinguish between companies that negotiate bulk licensing deals (like its existing Frontier AI Labs partnerships) and those that send bots to scrape without asking.
Josh Zhang from Stack Overflow noted something telling: when they started serving 402 responses, some crawlers that had previously been getting 403 blocks simply stopped trying. The bots weren't sophisticated enough to handle a payment negotiation. They were built to scrape or give up. Nothing in between.
The publisher coalition backing this is not small. The Atlantic, BuzzFeed, Time, O'Reilly Media, Quora, Ziff Davis, Gannett, Internet Brands, and ADWEEK have all signed on. Renn Turiano at Gannett called blocking unauthorized scraping "critically important."
The Numbers Behind the Scraping
The Wikimedia Foundation, which runs Wikipedia, reported that 65 percent of its most expensive traffic comes from bots, and that bandwidth consumed downloading multimedia content has grown 50 percent since January 2024. Automated requests, the foundation said, "have grown exponentially" alongside AI development.

That's a nonprofit encyclopedia built by volunteers. It doesn't charge. It doesn't serve ads. And bots account for nearly two-thirds of its costliest traffic.
Cloudflare processes trillions of requests daily. It can see what publishers individually cannot: the aggregate scale of AI data extraction across the entire web. The company built pay-per-crawl after "hundreds of conversations with news organizations, publishers, and large-scale social media platforms." The pattern was consistent — web scraping for AI training had reached "potentially unsustainable levels," causing slower site loads and service disruptions.
What This Actually Changes
For AI companies, the calculus shifts. Training data has been treated as a natural resource — abundant, free, and inexhaustible. Pay-per-crawl introduces a cost. Not a lawsuit-in-three-years cost. A cost-per-request cost. The kind of cost that shows up in a quarterly budget and gets scrutinized by a CFO.
The system is also designed to support the emerging x402 payment protocol for machine-to-machine transactions — automated billing between bots and servers with no human in the loop. The infrastructure for an AI data marketplace is being built inside HTTP itself.
For publishers, this is the first credible alternative to the binary choice between open access and total blocking. The previous options were: let crawlers take everything for free, or deploy robots.txt and hope they respect it. Most didn't.
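The gap between advisory and enforced is easy to demonstrate with Python's standard-library robots.txt parser: the file can only answer a crawler that bothers to ask. The bot name and rules here are made up:

```python
# robots.txt is purely advisory: a polite crawler checks it, an
# impolite one never does. Rules below are hypothetical.
import urllib.robotparser

rules = """\
User-agent: ExampleAIBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The parser only answers questions; nothing stops a crawler
# that never asks them.
blocked = not rp.can_fetch("ExampleAIBot", "https://example.com/article")
allowed = rp.can_fetch("SomeOtherBot", "https://example.com/article")
```

A 402 response inverts that: the enforcement lives on the server, so compliance is no longer the crawler's choice.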
For everyone else, this is the end of a specific era. The internet was built on a handshake: content is free, ads pay for it, and search engines send traffic back. AI crawlers broke every part of that deal. They take the content. They don't show ads. They don't send traffic — they synthesize answers from your work and present them as their own.
HTTP 402 was reserved for future use in 1997. The future arrived. It looks like a toll road.
If you found this useful, check out my AI prompt packs on Polar.sh — battle-tested prompts for developers.