Coinbase Cut Its AI Spend in Half Without Throttling Engineers - Here's the Playbook

#ai #llm #cloud #devops

Coinbase halved its AI spend while token usage kept growing exponentially. CEO Brian Armstrong posted the breakdown on X this week: five concrete levers, no access caps, and 91% of engineers never hitting the old usage limits.

That last point matters. This isn't a story about restricting developers. It's a story about routing smarter.

"We're experimenting with defaulting to open weight GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task."

— Brian Armstrong, CEO Coinbase

What actually changed

Armstrong outlined five levers Coinbase pulled:

Gateway defaults — Engineers now default to GLM 5.2 (Zhipu AI) and Kimi 2.7 (Moonshot AI), both open-weight models. They can override, but the default does the heavy lifting.
Task-based routing — Prompts are automatically matched to the best model by difficulty and price. Not every task needs Opus.
Caching — Hit rate went from 5% to 60%. That's a 12x improvement and the single highest-leverage change in the whole list.
Lean context — Start fresh sessions when switching tasks. Don't drag stale context around.
Spend visibility — Per-engineer token usage is visible, with an explicit expectation attached: "The more you spend on AI, the more impact we expect." No hard caps, just accountability.
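
To make the caching lever concrete, here is a minimal sketch of how a hit rate and its effect on input cost could be measured. It is not Coinbase's method: the `Usage` record, the token-level definition of hit rate, and the `cached_price_factor` (the fraction of list price charged for cached tokens) are all assumptions for illustration, and Armstrong's post does not say how the 5% and 60% figures were computed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Usage:
    """One request's prompt usage (hypothetical log record)."""
    input_tokens: int    # total prompt tokens sent
    cached_tokens: int   # prompt tokens served from the provider cache


def cache_hit_rate(records: Iterable[Usage]) -> float:
    """Token-level hit rate: cached prompt tokens / all prompt tokens."""
    records = list(records)
    total = sum(r.input_tokens for r in records)
    cached = sum(r.cached_tokens for r in records)
    return cached / total if total else 0.0


def effective_input_price(base_price: float, hit_rate: float,
                          cached_price_factor: float) -> float:
    """Blended input price when cached tokens are billed at a discount.

    cached_price_factor is an assumption: 0.1 means cached tokens
    cost 10% of the list price. Check your provider's actual terms.
    """
    uncached_share = 1.0 - hit_rate
    return base_price * (uncached_share + hit_rate * cached_price_factor)


if __name__ == "__main__":
    logs = [Usage(1000, 600), Usage(2000, 1200)]
    print(f"hit rate: {cache_hit_rate(logs):.0%}")
    for rate in (0.05, 0.60):
        print(rate, effective_input_price(1.0, rate, 0.1))
```

Under the assumed factor of 0.1, moving the hit rate from 5% to 60% takes the blended input price from about 0.955x to 0.46x of list price. The real saving depends on the provider's cache pricing and on how much of total spend is input tokens.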

Why this is bigger than one company's infra post

GLM 5.2 runs at roughly $1.40/$4.40 per million input/output tokens. Anthropic Opus 4.8 is $5/$25, which is about 3.6x more on input and 5.7x more on output. That differential compounds fast at Coinbase-scale token volumes.
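
The arithmetic behind that differential can be checked with a few lines. The prices are the ones quoted above; the model keys and the monthly workload (1 billion input tokens, 200 million output tokens) are made-up numbers for illustration, not Coinbase figures.

```python
# USD per million tokens, as quoted in the article.
PRICES = {
    "glm-5.2": {"input": 1.40, "output": 4.40},
    "opus-4.8": {"input": 5.00, "output": 25.00},
}


def price_ratio(expensive: str, cheap: str, kind: str) -> float:
    """How many times more the expensive model costs per token."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token volume at list price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    print(f"input ratio:  {price_ratio('opus-4.8', 'glm-5.2', 'input'):.2f}x")
    print(f"output ratio: {price_ratio('opus-4.8', 'glm-5.2', 'output'):.2f}x")

    # Hypothetical monthly volume, for illustration only.
    in_tok, out_tok = 1_000_000_000, 200_000_000
    for model in PRICES:
        print(model, round(workload_cost(model, in_tok, out_tok), 2))
```

For that hypothetical volume the list-price bill is about $2,280 on GLM 5.2 against $10,000 on Opus 4.8. The blended ratio lands between the input and output ratios and shifts toward the output ratio as workloads become more output-heavy.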

Coinbase isn't alone. Snowflake's CEO found GLM 5.2 competitive with Opus 4.7 at a fraction of the cost. Lindy, an AI startup, moved off Claude entirely to DeepSeek v4. These aren't one-off experiments — they're signals that enterprise budget pressure is shifting real workloads to cheaper open-weight models.

That's direct revenue pressure on Anthropic and OpenAI, both of which are approaching or actively building toward IPOs that will require durable enterprise revenue growth.

What to do

If you're running AI infra at any scale, three of Coinbase's five tactics are independently implementable right now:

Audit your caching hit rate. If it's under 20%, you're leaving money on the table. Prompt structure often drives this more than infrastructure.
Route by task complexity. Not everything needs your smartest (most expensive) model. Classify tasks and route accordingly — even a basic "simple / complex" split moves the needle.
Default down, opt up. Flip the gateway default to a cheaper model. Let engineers escalate when they need to. Coinbase's experience suggests most won't need to.
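
The routing and default-down tactics above can be sketched as a single gateway function. This is an illustrative sketch, not Coinbase's gateway: the model identifiers, the keyword list, and the length threshold are all hypothetical, and a production classifier would likely be a small model or a rules engine tuned on real traffic.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical gateway model IDs.
CHEAP_MODEL = "glm-5.2"      # default path
PREMIUM_MODEL = "opus-4.8"   # escalation path

# Crude, assumed signals that a task is "complex".
COMPLEX_WORDS = {"refactor", "architecture", "debug", "migrate", "concurrency"}
MAX_SIMPLE_CHARS = 4000


@dataclass
class Request:
    prompt: str
    model_override: Optional[str] = None  # explicit engineer choice, if any


def classify(prompt: str) -> str:
    """Basic simple/complex split on prompt length and keywords."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if len(prompt) > MAX_SIMPLE_CHARS or words & COMPLEX_WORDS:
        return "complex"
    return "simple"


def route(req: Request) -> str:
    """Default down, opt up: an explicit override always wins."""
    if req.model_override:
        return req.model_override
    if classify(req.prompt) == "complex":
        return PREMIUM_MODEL
    return CHEAP_MODEL


if __name__ == "__main__":
    print(route(Request("Rename this variable to snake_case")))
    print(route(Request("Debug the race in our concurrency layer")))
    print(route(Request("Summarize this doc", model_override=PREMIUM_MODEL)))
```

Keeping the override as an explicit field means escalations show up in gateway logs, which makes it possible to see how often engineers actually opt up.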

Open-weight Chinese models (GLM, Kimi, DeepSeek) carry licensing and data residency considerations worth checking against your compliance requirements — especially in regulated industries. Routing policies can also introduce silent quality degradation at edge cases, which Armstrong's post doesn't address. Test before you trust.
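
One way to act on "test before you trust" is to run both models over a fixed set of prompts with known checks before changing a routing rule. The sketch below assumes each model is wrapped as a plain function from prompt to text; the function names, the stub models, and the 2% tolerance are hypothetical, and real evaluations usually need far more cases per task category.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]                 # prompt -> completion
Case = Tuple[str, Callable[[str], bool]]     # (prompt, pass/fail check)


def pass_rate(model: Model, cases: List[Case]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def safe_to_route(cheap: Model, premium: Model, cases: List[Case],
                  max_drop: float = 0.02) -> bool:
    """True if the cheap model is within max_drop of the premium one."""
    return pass_rate(premium, cases) - pass_rate(cheap, cases) <= max_drop


if __name__ == "__main__":
    # Stub models standing in for real gateway calls.
    premium = {"2+2": "4", "capital of France": "Paris"}.__getitem__
    cheap = {"2+2": "4", "capital of France": "Lyon"}.__getitem__
    cases = [
        ("2+2", lambda out: "4" in out),
        ("capital of France", lambda out: "Paris" in out),
    ]
    print(pass_rate(cheap, cases), safe_to_route(cheap, premium, cases))
```

Running the check per task category rather than over one pooled set helps surface the edge cases where a cheaper default degrades quietly.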

Source: Let's Data Science · Armstrong's X post (June 28, 2026)

✏️ Drafted with KewBot (AI), edited and approved by Drew.

Top comments (1)

Alex Shev • Jun 30

The playbook only works if the savings show up as routing policy, not as a memo to engineers to be careful. Once the cheap/default path is encoded in the gateway and exceptions are explicit, cost control stops fighting developer flow.