Peak-hour throttling, shared subscription pools, a March promotion rollback, and a separate wave of "this feels broken" reports Anthropic says it's investigating. A breakdown of what's confirmed, what's not, and the highest-value tactics to stretch your usage.
If you've been using Claude Code heavily in the last few weeks and feel like your quota is evaporating faster than it used to, you're not imagining it. But you're probably conflating at least two separate things — and possibly three.
I dug through Anthropic's docs, help center, official posts, GitHub issues, Reddit threads, and recent coverage. The clearest picture as of April 8, 2026: Claude and Claude Code usage is constrained by a mix of normal token economics, shared subscription limits, deliberate peak-hour throttling, and a separate wave of complaints about abnormally fast quota drain that Anthropic has said it is investigating.
Here's what's confirmed, what's not, and what you can actually do about it.
What is confirmed right now
Usage limits are shared across all Claude surfaces. Claude.ai, Claude Code, and Claude Desktop all count toward the same pool. For paid plans, the key meter is your five-hour session limit, plus weekly limits for some models. Anthropic's help center explicitly says all those surfaces share the same quota.
Peak-hour throttling is real and intentional. Anthropic officially posted that during weekday peak hours, your five-hour session drains faster than before, while weekly limits stay the same. The official peak window is 5 AM to 11 AM PT (8 AM to 2 PM ET). Their own post says token-intensive background jobs should be shifted off-peak to stretch session limits, and estimates about 7% of users would newly hit session limits because of this change.
The March promotion ended. From March 13 through March 28, 2026, Anthropic ran a temporary promotion that doubled five-hour usage outside peak hours on weekdays. That promotion has ended. Anyone comparing early or mid-March behavior to late March or April behavior may be misreading a promotion rollback as a sudden regression. It's not a bug — it's the baseline returning to normal.
Anthropic acknowledged abnormal Claude Code drain. Separately from the peak-hour policy, Anthropic acknowledged that people were hitting Claude Code usage limits "way faster than expected" and said it was actively investigating. That acknowledgement came after many users reported unusually steep drain beyond what the documented peak-hour policy would explain.
What users are complaining about
Recent complaints are unusually consistent. Public GitHub issues and Reddit threads report:
- Single prompts consuming 3% to 7% of a session
- Five-hour windows being depleted in 20 minutes to 2 hours
- Usage meters jumping while idle
- Mismatches between the web usage meter and CLI behavior
These are user reports, and Anthropic has not verified all of them, but they are widespread and recent.
The precise takeaway: some faster drain is intentional during peak hours. Some additional "this feels broken" behavior has been widely reported and partly acknowledged as under investigation. Treat those as two separate phenomena.
The 10 most reliable ways to avoid running out
Ranked by impact, based on what Anthropic's own documentation and current policy directly support.
1. Move heavy Claude Code work outside 8 AM–2 PM ET on weekdays
This is the single most reliable subscription-saving tactic right now because it directly matches Anthropic's current peak-hour policy. Large refactors, repo-wide scans, long planning sessions, background jobs — do them before 8 AM ET, after 2 PM ET, or on weekends.
If you're on the US East Coast, your morning coding session is the most expensive time to use Claude Code. Shift heavy work to afternoons or evenings.
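If you want to automate the off-peak shift, one approach is a scheduled batch job. A minimal sketch, assuming a machine whose clock is set to Eastern time and using `claude -p` (Claude Code's one-shot print mode); the repo path, prompt, and log file are hypothetical placeholders:

```shell
# Write a weekday 3 PM ET cron entry to a local file (not the live crontab),
# so heavy Claude Code work runs after the 8 AM-2 PM ET peak window.
# Paths and the prompt text are placeholders, not Anthropic recommendations.
cat > offpeak.cron <<'EOF'
# m  h  dom mon dow  command
0 15 *   *   1-5  cd $HOME/repo && claude -p "apply the refactor plan in PLAN.md" >> $HOME/claude-batch.log 2>&1
EOF
cat offpeak.cron
```

Install it with `crontab offpeak.cron` once the paths match your setup.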
2. Use Sonnet as your default, reserve Opus for the hardest steps only
Anthropic's Claude Code docs explicitly say Sonnet handles most coding tasks well and costs less than Opus. Switch to Opus only for architecture decisions, complex debugging, or multi-step reasoning that Sonnet can't handle.
In Claude Code, use `/model` to switch mid-session. For simple subagent work, Anthropic recommends configuring Haiku as the subagent model.
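As a concrete illustration: Claude Code subagents are Markdown files with YAML frontmatter under `.claude/agents/`, and the `model` field can pin a cheaper model. The agent name, description, and prompt body below are made up for this sketch; only the file layout and frontmatter fields follow the documented agent format:

```shell
# Create a read-only search subagent pinned to Haiku.
# Name, description, and instructions are hypothetical examples.
mkdir -p .claude/agents
cat > .claude/agents/repo-searcher.md <<'EOF'
---
name: repo-searcher
description: Fast read-only codebase searches
model: haiku
---
Search the repository and return only file paths and matching lines.
EOF
```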
3. Lower or disable extended thinking unless the task truly needs it
Extended thinking is on by default. Thinking tokens are billed as output tokens. The default budget can be tens of thousands of tokens per request depending on the model.
Anthropic's own cost guidance suggests:
- Use `/effort` to lower reasoning effort
- Disable thinking in `/config`
- Set `MAX_THINKING_TOKENS=8000` for cheaper runs
This is one of the highest-leverage cost controls available. Most routine coding tasks don't need deep reasoning chains.
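For scripted or batch runs, the thinking cap can also be set as an environment variable before launching Claude Code; the 8000 figure is the value from Anthropic's guidance above:

```shell
# Cap extended-thinking tokens for every request started from this shell.
export MAX_THINKING_TOKENS=8000
```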
4. Reset context aggressively between unrelated tasks
Token costs scale with context size. Anthropic recommends `/clear` between unrelated work. Their docs also suggest:
- `/rename` before clearing so you can later `/resume` the session
- `/compact` with custom preservation instructions when you want a smaller summary instead of a full history
A session that has accumulated 50,000 tokens of context from a previous task is spending those tokens on every subsequent API call — even if the new task has nothing to do with the old one.
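The arithmetic makes the waste concrete. With illustrative numbers (30 calls per session is an assumption, not a billed rate):

```shell
# 50,000 tokens of leftover context get resent as input on every call.
# Over 30 calls that is 1.5 million input tokens spent on dead weight.
stale_tokens=50000
calls=30
echo "$((stale_tokens * calls)) stale input tokens over the session"
```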
5. Make prompts narrower, earlier
Anthropic's docs are explicit: vague prompts like "improve this codebase" trigger broad scanning, while targeted requests like "add input validation to the login function in auth.ts" reduce file reads and token spend.
In practice, this is a direct token-saving trick because it reduces search breadth, tool calls, and follow-up correction loops. The agent doesn't need to explore if you tell it where to look.
6. Keep CLAUDE.md short and move specialized instructions into skills
CLAUDE.md is loaded into context at session start. Anthropic recommends keeping it under 200 lines. Workflow-specific material should move into skills because skills load on demand.
If your CLAUDE.md is 500 lines of coding conventions, deployment procedures, and project context, you're paying for all of that on every single API call — even when you're just asking Claude to fix a typo.
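A quick way to check whether you're in that situation. The demo writes an oversized `CLAUDE.md` first so the check has something to flag; replace that step with your real file:

```shell
# Hypothetical oversized CLAUDE.md for the demo (250 lines of filler).
seq 250 | sed 's/^/convention line /' > CLAUDE.md
# Flag anything past Anthropic's ~200-line guidance.
lines=$(grep -c '' CLAUDE.md)
if [ "$lines" -gt 200 ]; then
  echo "CLAUDE.md is $lines lines; consider moving detail into on-demand skills"
fi
```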
7. Offload verbose data before Claude sees it
Anthropic recommends hooks and skills for preprocessing. Their example: filtering a huge test or log output down to just error lines before Claude reads it. This can cut context from tens of thousands of tokens to hundreds.
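A minimal sketch of that preprocessing idea, using a three-line sample in place of real test output; the filename and match pattern are assumptions to adapt to your test runner:

```shell
# Shrink noisy test output to just the failure lines before Claude reads it.
printf 'PASS test_login\nFAIL test_signup: expected 200, got 500\nPASS test_logout\n' > test.log
grep -E '^(FAIL|ERROR)' test.log > failures.log
cat failures.log
```

Only the `FAIL` line survives the filter, which is all Claude needs to start debugging.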
For typed languages, they also recommend language-server-based code intelligence plugins. "Go to definition" is cheaper than grep plus opening multiple candidate files.
8. Use subagents carefully, and avoid agent teams when credit is tight
Subagents are useful because only the summary comes back to the main conversation. But agent teams are much more expensive. Anthropic's docs say agent teams create separate Claude instances with separate contexts and can use about 7x more tokens than standard sessions when teammates run in plan mode.
Good for autonomy. Bad for budget.
9. Use plan mode before implementation on expensive tasks
Anthropic recommends plan mode for complex work so Claude explores the codebase and proposes an approach before making changes. This is a subtle cost saver: it prevents expensive wrong turns and rewrites.
They also recommend stopping bad runs early with Escape and using `/rewind` to back up to a previous state instead of starting over.
10. Inspect overhead directly
- `/stats` on Pro or Max to inspect usage patterns
- `/cost` for API billing
- `/context` to see what's consuming space
- Configure the status line for continuous visibility
MCP tool definitions are deferred by default (which helps), but `/context` can reveal when tools or instructions are still bloating the session.
The easiest mistakes that secretly burn credit
ANTHROPIC_API_KEY in your shell environment. If this is set, Claude Code will use that API key instead of your Pro or Max subscription — creating direct API charges instead of consuming included subscription usage. Anthropic calls this out very clearly. If your bill looks wrong, check environment variables first.
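A one-line sanity check before you start a session; nothing here is Anthropic tooling, just a plain environment-variable test:

```shell
# Warn if ANTHROPIC_API_KEY is set, since Claude Code would then bill the API
# directly instead of drawing on your Pro/Max subscription.
if [ -n "${ANTHROPIC_API_KEY}" ]; then
  echo "WARNING: ANTHROPIC_API_KEY is set; runs will bill the API directly"
else
  echo "OK: no API key in the environment; subscription usage applies"
fi
```

Clear it for the current shell with `unset ANTHROPIC_API_KEY`.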
Mixing chat and coding in the same usage window. Because Claude app usage and Claude Code share the same limit pool, spending a lot of tokens in the web app before opening your terminal can make Claude Code feel "mysteriously" constrained. Your five-hour window is already partially drained before you start coding.
Leaving extra usage enabled without a cap. Anthropic's help center says extra usage switches you to standard API pricing after you hit your plan limit. You can set a monthly spending cap — or leave it unlimited. It also notes you can slightly exceed your chosen cap on the final allowed request because the system checks limits before the request and computes exact token consumption after.
The workflow that actually works
If you want the best chance of not running out, here's the workflow that matches Anthropic's own recommendations:
- Start a fresh session for each distinct task
- Keep the ask narrow — file path, function name, failing test, stack trace
- Run Sonnet first — escalate to Opus only if Sonnet can't handle it
- Keep effort low until Claude proves it needs more reasoning
- Stop any bad trajectory quickly — Escape, then `/rewind`
- Schedule heavy work off-peak — before 8 AM ET or after 2 PM ET
For big repositories, do not ask Claude to "understand the whole codebase" unless that's really the task. Give it the exact subsystem, file path, or function name. Anthropic explicitly says vague prompts cause broad scanning and higher token use.
For logs and test output, never paste raw giant blobs if you can filter first. Pre-filter to failures, errors, stack traces, changed files, and affected modules only.
For repetitive workflows, prefer reusable skills over re-explaining your conventions every session. Skills load on demand. CLAUDE.md loads on every call.
What I would not trust without caution
Claims that a specific Claude Code version causes "10x" or "100x" token inflation, or that all idle drain is a bug, are not fully confirmed in official docs. Anthropic says there is a small amount of background token usage for summarization and command processing — typically under $0.04 per session — so some idle consumption is normal. The larger idle-drain complaints remain user reports and investigation threads rather than a published root-cause analysis.
The Reddit and GitHub communities have theories about multiple overlapping causes for March's usage crisis. Only two parts are clearly confirmed: peak-hour tighter session pacing, and Anthropic's statement that some users were hitting limits faster than expected in Claude Code.
One important change if you use third-party agent tools
As of April 4, 2026, standard Claude subscriptions no longer cover third-party tools like OpenClaw. Continued use requires pay-as-you-go or usage bundles. If part of your "Claude Code credit drain" is actually coming from external agent tooling, that's now a separate cost path.
Bottom line
The highest-confidence, highest-value tactics: work off-peak, use Sonnet first, cut thinking budget, keep sessions narrow and short-lived, move specialized instructions into skills, preprocess logs, and verify you are not accidentally billing through ANTHROPIC_API_KEY.
Those are the tips most directly supported by Anthropic's own documentation and current policy. Everything else is informed speculation until Anthropic publishes the results of its investigation.
Follow me on X — I post as @oldeucryptoboi.