Tom Tokita

Originally published at tokita.online

Tokenmaxxing Is a Symptom. Here's the Disease Every Enterprise Is Ignoring.

NVIDIA's vice president of applied deep learning, Bryan Catanzaro, said something in an Axios interview in April 2026 that should have stopped every enterprise AI roadmap cold:

"For my team, the cost of compute is far beyond the costs of the employees."

That is not a critic talking. That is a vice president at the company selling the chips that power every AI datacenter on the planet. When NVIDIA's own leadership admits compute outweighs payroll, the "AI will save you money" narrative has a problem.

But most companies missed the signal. They were too busy tokenmaxxing.

Microsoft Pulled the Plug on Claude Code

In May 2026, Microsoft began cancelling the majority of its internal Claude Code licenses, redirecting thousands of engineers to GitHub Copilot CLI instead. The reversal came six months after the company opened broad access to Claude Code across its Experiences + Devices division, the group responsible for Windows, Microsoft 365, Outlook, Teams, and Surface.

Adoption was fast. Engineers, project managers, and designers embraced it for prototyping and development. The problem wasn't the tool. It was token-based pricing at enterprise scale with no consumption governance. Monthly bills became unpredictable and high enough to trigger a fiscal-year-end pullback.

Microsoft's $5 billion Foundry deal with Anthropic and Anthropic's $30 billion Azure compute commitment both remain intact. Not a relationship break. A cost-control correction.

A company with functionally unlimited resources still could not absorb uncapped AI token spend across thousands of users. That should tell you something.

Uber Burned Its Entire 2026 AI Budget by April

Uber's CTO, Praveen Neppalli Naga, confirmed to The Information in April 2026 that the company had exhausted its entire annual AI coding tools budget in four months. Claude Code was rolled out in December 2025. Adoption climbed from 32% of engineers in February to 84% classified as agentic coding users by March. By spring, 95% were using AI tools monthly, roughly 70% of committed code originated from those tools, and 11% of live backend updates were written by agents with no human in the loop.

The per-engineer cost: $150 to $250 per month on average, with power users running between $500 and $2,000. Naga himself reported spending $1,200 in a two-hour demo session. The tool didn't fail. Engineers didn't misuse it. They used it for exactly the workloads it was designed to handle. From a productivity standpoint the rollout was a success. From a finance standpoint it was a runaway.

Uber compounded the dynamic by ranking engineers on internal leaderboards based on Claude Code usage. That created a cultural incentive to consume more tokens. The teams driving adoption were not the same teams managing the spend.

They measured who was using AI. They never measured what it cost per unit of output.
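To make the difference concrete, here is a minimal sketch of the metric that was missing: spend divided by shipped output rather than a headcount of active users. The team names, dollar figures, and the choice of "merged changes" as the output unit are all hypothetical; substitute whatever deliverable your organization actually tracks.

```python
from dataclasses import dataclass


@dataclass
class TeamMonth:
    team: str
    ai_spend_usd: float   # total AI tooling spend for the month
    active_users: int     # what an adoption dashboard counts
    merged_changes: int   # hypothetical output unit; pick one that fits your work


def cost_per_output(m: TeamMonth) -> float:
    """Dollars of AI spend per unit of shipped output."""
    if m.merged_changes == 0:
        return float("inf")  # money went out, nothing shipped
    return m.ai_spend_usd / m.merged_changes


# Illustrative numbers only: same spend, same adoption, different output.
months = [
    TeamMonth("payments", ai_spend_usd=12_000.0, active_users=40, merged_changes=600),
    TeamMonth("growth", ai_spend_usd=12_000.0, active_users=40, merged_changes=150),
]

report = {m.team: cost_per_output(m) for m in months}
```

In this made-up example both teams look identical on an adoption leaderboard, yet one pays four times as much per shipped change. That gap is invisible if usage is the only thing you measure.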

Tokenmaxxing: When the Metric Becomes the Game

The term "tokenmaxxing" describes employees running trivial or unnecessary tasks through AI tools to inflate their usage numbers. Amazon employees admitted to the practice in May 2026 after the company set internal AI usage targets and tracked consumption through leaderboards. Workers reported feeling pressure to hit token quotas, even though Amazon publicly stated the numbers would not factor into performance reviews.

At Meta, the same dynamic played out through an internal tracking tool called "Claudeonomics," which ranked employees by their AI token consumption. The leaderboard reportedly showed 60 trillion tokens consumed in a 30-day period before Meta killed it after media coverage.

This is Goodhart's Law in real time. The moment token consumption became a tracked metric, it stopped being a useful measure of anything. Employees optimized for the number, not for the work the number was supposed to represent.

Tokenmaxxing isn't an employee behavior problem. It is a governance design failure. If you measure consumption without measuring value, you get consumption without value.

The Goldman Sachs Math That Should Scare Every CFO

Goldman Sachs published a research report forecasting that agentic AI will drive a 24-fold increase in global token consumption by 2030, reaching 120 quadrillion tokens per month. Their breakdown: a standard chatbot consumes roughly 1,000 tokens per session. An embedded copilot uses over 5,000 tokens per day. A continuously active autonomous agent burns through 100,000 or more tokens per day.

NVIDIA CEO Jensen Huang has said he expects 100 AI agents working alongside every human employee at NVIDIA by 2036.

Do the multiplication. 100 agents per employee, at 100,000 tokens per day per agent, is 10 million tokens per employee per day. Multiply that by any mid-size engineering team and the numbers become absurd before you even discuss pricing.
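The multiplication above, written out as a small sketch. The two constants come from the figures quoted in this section. The team size of 200 and the price of $5 per million tokens are assumptions chosen only to show the shape of the math, not real pricing from any provider.

```python
AGENTS_PER_EMPLOYEE = 100            # Huang's figure quoted above
TOKENS_PER_AGENT_PER_DAY = 100_000   # Goldman Sachs's floor for an autonomous agent


def daily_tokens(employees: int) -> int:
    """Total tokens consumed per day by all agents attached to a team."""
    return employees * AGENTS_PER_EMPLOYEE * TOKENS_PER_AGENT_PER_DAY


def daily_cost_usd(employees: int, usd_per_million_tokens: float) -> float:
    """Daily spend at a flat blended price per million tokens (an assumption)."""
    return daily_tokens(employees) / 1_000_000 * usd_per_million_tokens


per_employee = daily_tokens(1)          # 10 million tokens per employee per day
team_tokens = daily_tokens(200)         # hypothetical 200-person team
team_cost = daily_cost_usd(200, 5.0)    # hypothetical $5 per million tokens
```

Under those assumptions a 200-person team consumes 2 billion tokens and roughly $10,000 every day. Change the price or the headcount and the total moves, but the order of magnitude is the point.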

Gartner projects that by 2030, inference costs on a one-trillion-parameter model will be over 90% cheaper than in 2025. But their own analyst, Will Sommer, cautioned: "Chief Product Officers should not confuse the deflation of commodity tokens with the democratization of frontier reasoning." Agentic models require 5 to 30 times more tokens per task than standard models. Consumption growth will outpace falling unit costs. And AI providers are not going to pass through the full savings.

Cheaper tokens, more tokens per task, exploding number of tasks. The bill goes up.
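A rough sketch of why the bill can rise while the unit price falls. It takes Gartner's "90% cheaper" at face value (price at 10% of today) and applies the 5x to 30x token multiplier for agentic work. Treating those two figures as directly combinable is a simplification, and the 4x growth in task count is a hypothetical number, not a forecast.

```python
def relative_bill(price_pct_of_today: int,
                  tokens_per_task_x: int,
                  task_count_x: int = 1) -> float:
    """Future bill as a multiple of today's bill.

    price_pct_of_today: future unit price as a percentage of today's price
    tokens_per_task_x:  how many times more tokens each task consumes
    task_count_x:       how many times more tasks are run (assumed, not sourced)
    """
    return price_pct_of_today * tokens_per_task_x * task_count_x / 100


low_end = relative_bill(10, 5)        # 0.5x: per-task cost halves at best
high_end = relative_bill(10, 30)      # 3.0x: per-task cost triples at worst
with_growth = relative_bill(10, 5, 4) # 2.0x: best case, but four times the tasks
```

Even in the most favorable per-task case, a modest increase in the number of tasks erases the savings. And that assumes providers pass through the full price drop, which the analysts quoted above do not expect.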

The Pattern Is Obvious. The Fix Is Not Complicated.

Microsoft, Uber, Amazon, Meta. Four of the most technically sophisticated companies on earth. All hit the same wall. The pattern:

  1. Executive mandate pushes broad AI adoption
  2. Leaderboards or usage metrics track consumption volume
  3. No mechanism ties consumption to business value
  4. Token-based pricing creates unpredictable, escalating costs
  5. Budget blowout triggers reactive pullback or cancellation

The disease is not AI. The disease is adoption without governance. No consumption gates, no cost ceilings, and no way to tie a token to a deliverable.

I wrote about pre-action gates and agent production safety months before these headlines. The principle is the same whether you are running 100 Codex agents, as in OpenClaw's $1.3 million month, or deploying Claude Code across 10,000 engineers. If there is no gate between the request and the spend, the spend wins.

The companies that will survive the agentic era are not the ones that adopt fastest. They are the ones that build harnesses before they build agents. Measure output, not tokens. Set cost ceilings per user, per team, per task category. Attribute consumption to deliverables, not leaderboard positions.
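What a harness like that might look like in its simplest form: a gate that checks every applicable ceiling before a request is allowed to spend, and books approved spend against a deliverable. This is a sketch under stated assumptions, not a production design. The `BudgetGate` class, the scope names, the ticket IDs, and the dollar limits are all hypothetical, and a scope with no configured ceiling is treated as unlimited here, which you may not want.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BudgetGate:
    # Monthly ceilings in USD, keyed by (scope, name), e.g. ("user", "alice").
    ceilings: dict[tuple[str, str], float]
    spent: dict[tuple[str, str], float] = field(
        default_factory=lambda: defaultdict(float))
    by_deliverable: dict[str, float] = field(
        default_factory=lambda: defaultdict(float))

    def request(self, user: str, team: str, category: str,
                deliverable: str, est_cost_usd: float) -> bool:
        """Approve and record the spend only if no ceiling would be exceeded."""
        scopes = [("user", user), ("team", team), ("category", category)]
        # Check every ceiling first so a denied request records nothing.
        for key in scopes:
            limit = self.ceilings.get(key)
            if limit is not None and self.spent[key] + est_cost_usd > limit:
                return False
        for key in scopes:
            self.spent[key] += est_cost_usd
        # Attribute the spend to the thing being delivered, not the person.
        self.by_deliverable[deliverable] += est_cost_usd
        return True


gate = BudgetGate(ceilings={
    ("user", "alice"): 250.0,
    ("team", "platform"): 5000.0,
    ("category", "prototype"): 100.0,
})

first = gate.request("alice", "platform", "refactor", "TICKET-101", 200.0)
second = gate.request("alice", "platform", "refactor", "TICKET-102", 100.0)
third = gate.request("bob", "platform", "prototype", "TICKET-103", 150.0)
```

The first request is approved. The second is denied because it would push the user past her ceiling, and the third is denied by the task-category ceiling. Only the approved spend shows up against a deliverable, which is the number a finance team can actually reason about.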

Tokenmaxxing is what happens when you skip that step.
