DEV Community

HybridTechie
HybridTechie

Posted on

GitHub Copilot switched to usage-based billing

On June 1, GitHub Copilot switched to usage-based billing, and Claude Opus 4.6 went from a 7.5× premium-request multiplier to 27×. If you've been running a frontier model in agentic mode all day like me, your bill didn't nudge. It multiplied.

None of this was a surprise. GitHub announced it on April 27. They shipped a preview bill in May so you could watch the number climb before it was real. They even told us why - in their words, a quick chat and a multi-hour autonomous session "can cost the user the same amount." That was never going to last, and they said so out loud.

We were warned. We just didn't look until the quota drained.

The reaction now will be predictable: cap spend, ration the expensive models, "right model for the right job.", emails about being token responsible.

All of it is true. None of it is quite the point.

Here's what people miss: tokens are still cheap. One request costs almost nothing. That's exactly the trap. A tiny cost, times a volume nobody is watching, is how a quick chat and a six-hour agent run quietly end up costing the same. You don't notice - until the multiplier jumps and the bill makes it obvious.

Usage-based billing didn't create that waste. It just made the price finally tell the truth about the work. So the problem was never the price. It was that we couldn't see it - and now we can. What that's worth doing something about, concretely:

Measure before you ration. Make "why this model, for this task" answerable with a number, not a shrug. You can't right-size what you've never sized.

Forecast it like any other cost. Token spend is a line item now - treat it like cloud. Model it, attribute it, review it monthly. The next price change shouldn't be the thing that informs you.

Make the right choice the lazy choice. People reach for the biggest model because choosing is friction. Fix that with defaults and routing, not willpower - discipline that depends on remembering doesn't scale.

So why not just leave it on Auto all the time? Because vendor Auto optimises for the vendor, not for you. It blends their reliability and capacity needs with cost - "route to whatever's healthy right now" is a load-balancing goal as much as a quality one. And it's capped where it matters: in Copilot Chat and the IDE, task-based Auto routing is currently prioritises to lower-multiplier models - it won't reach for a 27× Opus on your behalf. Great for cost. But it also means Auto isn't what you lean on when you genuinely need the frontier model. Some vendors even hide which model they picked, so you can't tell what you're getting.

The teams that come out fine won't be the ones reacting to the June invoice. They'll be the ones who treated the warning as the last free one.

Top comments (0)