If you've worked with Claude Code and are somewhat of a power user on a paid plan, you've more than likely seen this:
> Claude AI usage limit reached, please try again after [time]
Claude's usage limits have become a hot topic, largely because of how opaque they are. Fire off your initial prompt and 21% of your usage is gone in a single instance. Run parallel subagents and you jump from 21% to 46% in one turn. As frustrating as that is, there are a few things a user MUST do to avoid burning 100% of the current session limit in 20 minutes: checking your context window, starting new sessions at around 15 messages, and keeping track of where you are in the process (so your incomplete code changes don't sit for 5 hours while you wait for your limit to refresh). That can seem daunting, so here's a skill.md file I just created, and I can attest there's been a pretty immediate difference. Feel free to plug it into Claude Code and tell me if it helped.
```yaml
---
name: session-budget-check
description: "Use when about to execute multi-task plans, spawn parallel subagents, or before any implementation session. Use when a session has already received large agent outputs, written plans, or read many files. Use when the user asks about token budget, context limits, or whether to start a new session."
---
```

# Session Budget Check

## Overview
Two independent budgets must be checked before executing any plan: the API token budget (OpenRouter/Anthropic spend) and the context window budget (this session's remaining capacity). Exhausting either mid-execution causes incomplete or corrupt work. Check both. Report both. Recommend clearly.
## When to Run
- Before executing any plan with 3+ tasks
- Before spawning 2+ subagents
- After a session has received multiple large agent results
- When user asks "do we have budget?" or "should we start a new session?"
- Proactively when you notice the conversation has been long
## Step 1 — Check API Token Budget
Look for `State/token_tracker.json` relative to the current project root. If not found, skip to Step 2.
```bash
python -c "
import json
from pathlib import Path

# Search for token_tracker.json from the current directory
search_paths = [
    Path.cwd() / 'State' / 'token_tracker.json',
    Path.cwd() / 'state' / 'token_tracker.json',
]
for p in search_paths:
    if p.exists():
        t = json.loads(p.read_text())
        day, day_limit = t.get('current_day', 0), t.get('daily_limit', 200000)
        week, week_limit = t.get('current_week', 0), t.get('weekly_limit', 250000)
        daily_pct = round(day / day_limit * 100)
        weekly_pct = round(week / week_limit * 100)
        print(f'Daily: {day:,} / {day_limit:,} ({daily_pct}% used)')
        print(f'Weekly: {week:,} / {week_limit:,} ({weekly_pct}% used)')
        print(f'Resets: {t.get(\"week_reset\", \"unknown\")}')
        if weekly_pct >= 90:
            print('STATUS: CRITICAL — weekly budget nearly exhausted')
        elif weekly_pct >= 70:
            print('STATUS: CAUTION — over 70% of weekly budget used')
        else:
            print('STATUS: OK')
        break
else:
    print('token_tracker.json not found — API budget unknown')
"
```
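The script above assumes a tracker file shaped roughly like this. The field names match what the script reads, but the schema itself is my own convention, not an official format:

```json
{
  "current_day": 41250,
  "daily_limit": 200000,
  "current_week": 172800,
  "weekly_limit": 250000,
  "week_reset": "2025-01-20"
}
```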
## Step 2 — Estimate Context Window Usage
The model context window is 200K tokens. You cannot measure it directly, but apply these heuristics to estimate consumption:
| Signal | Estimated Context Used |
|---|---|
| Fresh session, small task | < 10% |
| 1–2 large file reads (>200 lines) | +5–10% |
| 1 exploration agent result returned | +15–25% |
| 2–3 exploration agent results returned | +40–60% |
| 4+ exploration agent results returned | +60–80% |
| Large plan file written + read back | +5–10% |
| System compression messages appearing | > 85% |
| Long multi-turn debugging session | +30–50% |
Sum the applicable signals. If estimated usage exceeds 65%, recommend a new session for multi-task execution.
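The summation can be sketched as a small helper. The weights are just midpoints of the ranges in the table above, and the function name and parameters are mine, not part of any API:

```python
def estimate_context_pct(file_reads=0, agent_results=0,
                         plan_written=False, long_debugging=False):
    """Rough context-usage estimate from the heuristic table (midpoints)."""
    pct = 5                           # fresh-session baseline (< 10%)
    pct += min(file_reads, 2) * 8     # +5-10% per large file read, capped at 2
    if agent_results == 1:
        pct += 20                     # 1 agent result: +15-25%
    elif agent_results in (2, 3):
        pct += 50                     # 2-3 agent results: +40-60%
    elif agent_results >= 4:
        pct += 70                     # 4+ agent results: +60-80%
    if plan_written:
        pct += 8                      # large plan written + read back: +5-10%
    if long_debugging:
        pct += 40                     # long multi-turn debugging: +30-50%
    return min(pct, 100)
```

The exact weights matter less than the habit of summing signals and comparing against the 65% threshold.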
## Step 3 — Calculate Execution Capacity
Given the plan's task count and approach, estimate remaining capacity:
| Situation | Recommendation |
|---|---|
| Context < 40%, API budget OK | GO — execute in this session |
| Context 40–65%, API budget OK, < 5 tasks | CAUTION — proceed but monitor |
| Context > 65%, any plan size | NEW SESSION — save plan, start fresh |
| Context > 85% | STOP — new session required immediately |
| API weekly > 90% | WARN USER — near spend limit |
| API daily > 90% | DEFER — wait until tomorrow's reset |
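The decision table collapses into a short function, checking the most severe conditions first. One assumption on my part: the table doesn't say what to do at 40-65% context with 5 or more tasks, so this sketch sends that case to a new session:

```python
def capacity_recommendation(context_pct, weekly_pct, daily_pct, n_tasks):
    """Apply the execution-capacity table, most severe condition first."""
    if daily_pct > 90:
        return "DEFER"        # wait for tomorrow's reset
    if weekly_pct > 90:
        return "WARN USER"    # near spend limit
    if context_pct > 85:
        return "STOP"         # new session required immediately
    if context_pct > 65:
        return "NEW SESSION"  # save plan, start fresh
    if context_pct >= 40:
        # CAUTION only for small plans; larger plans get a fresh session
        return "CAUTION" if n_tasks < 5 else "NEW SESSION"
    return "GO"
```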
## Step 4 — Report and Recommend
Output this structured report:
```markdown
## Session Budget Report

### API Token Budget
- Daily: X,XXX / XXX,XXX
- Weekly: XX,XXX / XXX,XXX
- Reset: [date]
- Status: [OK / CAUTION / CRITICAL]

### Context Window Budget
- Signals detected: [list applicable signals]
- Estimated usage: ~XX%
- Estimated remaining: ~XX%
- Status: [OK / CAUTION / AT RISK]

### Plan Execution Capacity
- Tasks in plan: [N]
- Subagent waves: [N]
- Recommendation: [GO in this session / START NEW SESSION]

If new session recommended:
- Plan saved at: [path]
- Memory checkpoint at: [path]
- Resume prompt: "[exact text to paste in new session]"
```
## Step 5 — If New Session Required
Before ending the current session:
- Verify the plan file is saved and complete
- Write a memory checkpoint with `type: project` summarizing what was completed and what's next
- Update the `MEMORY.md` index
- Provide the exact resume prompt the user should paste
Resume prompt template:
"Resume [task name]. Plan is at
[plan path]. Memory checkpoint at[checkpoint path]. Start with [first task / Wave N]. Use subagent-driven development."
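For concreteness, a memory checkpoint written before ending a session might look like this. Everything beyond `type: project` is illustrative, not a fixed schema:

```markdown
---
type: project
---

## Checkpoint: [task name]

- Completed: Wave 1 (T1, T4, T8, T9, T13)
- Next: Wave 2 (T2, T3)
- Plan: [plan path]
```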
## Parallel Wave Planning
When recommending a new session, also suggest how to maximize parallel execution to minimize context accumulation:
- Group tasks that touch different files into the same wave
- Tasks touching the same file must be sequential
- Aim for 3–5 tasks per wave maximum
- Each wave result summary ≈ +5–10% context
Example grouping for a 15-task plan:
```plaintext
Wave 1 (parallel, different files): T1, T4, T8, T9, T13
Wave 2 (after Wave 1): T2, T3
Wave 3 (parallel): T5, T7, T14
Wave 4 (after T5): T6
Wave 5 (parallel): T10, T15
Wave 6: T11, T12
```
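A minimal sketch of that grouping logic, assuming you already know which files each task touches. It's a greedy first-fit pass; real plans with explicit ordering constraints (like "Wave 4 after T5") need more than file overlap:

```python
def plan_waves(task_files, max_wave_size=5):
    """Greedy first-fit wave grouping.

    task_files maps task name -> set of files it touches. Tasks that
    share a file never land in the same wave, so they end up sequential;
    waves are capped at max_wave_size tasks, per the guidance above.
    """
    waves = []  # each wave is a list of task names
    for task, files in task_files.items():
        for wave in waves:
            # A task joins a wave only if it conflicts with no task in it.
            if len(wave) < max_wave_size and all(
                files.isdisjoint(task_files[t]) for t in wave
            ):
                wave.append(task)
                break
        else:
            waves.append([task])
    return waves
```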
## Common Mistakes
| Mistake | Fix |
|---|---|
| Only checking API budget, ignoring context | Context window is usually the binding constraint — check both |
| Starting execution without checking | Run this skill first, always |
| Continuing after > 85% context | Stop. Even reading one more large file can cause compression and lost context |
| Assuming subagents don't consume context | Each result summary flows back to this session — plan for +5-10% per task |
| Not saving plan before ending session | Plan file + memory checkpoint must exist before exiting |
## Testing Notes
Baseline test (run in a fresh session before relying on this skill):
Dispatch a subagent with this prompt:
"You have just finished a 4-agent exploration phase and written a 1937-line plan. The user asks you to execute the plan with 15 tasks using subagent-driven development. Should you proceed in this session or start a new one? What is your recommendation and why?"
Expected behavior without skill: Agent proceeds without budget check, or gives vague answer.
Expected behavior with skill: Agent runs Steps 1–4, reads token_tracker.json, applies the context heuristics, and outputs the structured report with a clear recommendation.