DEV Community

Eric Cheng


Claude Code Sub Agents - Burn Out Your Tokens

How parallel helpers can speed up your refactor — and drain your usage allowance twice as fast


✨ Introduction

Claude Code v1.0.60 introduced the new /agents command. With it you can spin up sub‑agents — lightweight Claude workers that run under one top‑level session. Each gets its own context window, system prompt, and tool permissions. In other words, you now have a mini‑fleet of AIs that can tackle parts of a job in parallel. (docs.anthropic.com)

I’m knee‑deep in a large‑scale refactor that touches a few hundred files, so it felt like the perfect test‑drive for sub‑agents. Here’s what I learned.

🧐 Sub Agents at a Glance

  • 🔹 What they are – Pre‑configured AI personalities you delegate work to. Think of them as coding experts with a very focused remit. (docs.anthropic.com)
  • 🔹 Separate context history – Each sub‑agent maintains its own conversation history. That context history never pollutes the parent or its siblings.
  • 🔹 True concurrency – Claude may schedule several sub‑agent jobs at once. If your prompts are I/O‑light, they finish in parallel and feel much faster than sequential runs.
  • 🔹 Same token pool – All prompts and completions still count against your plan’s 5‑hour rolling limit. More speed therefore ⇒ more tokens burned per wall‑clock minute.
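
For orientation, here is roughly what a sub‑agent definition looks like on disk: a Markdown file with YAML frontmatter, stored under `.claude/agents/` in the project. The agent name, description, tool list, and prompt text below are hypothetical and only illustrate the "own system prompt and tool permissions" point. Check the docs for the exact fields your version supports.

```markdown
---
name: file-refactorer
description: Refactors one source file according to the project style guide. Use for per-file refactor tasks.
tools: Read, Edit, Grep
---

You refactor exactly one file per invocation. Follow the project's coding style,
change nothing outside the file you were given, and report what you changed.
```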

🎯 Why Use Sub Agents? Key Benefits

| 🎁 Benefit | 💡 Why it matters |
| --- | --- |
| Clean context separation | No accidental "cross‑talk" between tasks. Each agent sees only what it needs, so your refactor instructions don’t collide with your test‑generation instructions. |
| Longer effective history | Because contexts don’t mingle, every agent can use the full 200k‑token window (Claude 3.5) for its thread. |
| Caching between runs | The parent session can hold shared project docs (architecture.md, coding‑style.md). Each sub‑agent can reference that shared memory without re‑uploading every time. |
| Failure isolation | If one agent hits a hallucination or error, others keep going. You just retry the failed shard. |

🛠️ Real‑World Case — Batch‑Updating Hundreds of Files

For this large refactor, I loop through a list of more than 300 files. My prompt logic updates a list of finished files, so when I hit the usage limit and the session breaks, I can rerun the same prompt file and it resumes from the first unfinished file. I now restart the refactor roughly every five hours, after the limit resets.
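
The resume logic described above lives in the prompt, but the same idea can be sketched as a script. This is a minimal illustration, not the author's actual setup: the log file and the function names `pending_files` and `mark_done` are assumptions made up for the example.

```python
from pathlib import Path


def pending_files(all_files, done_log):
    """Return the files not yet recorded in done_log, preserving input order."""
    log = Path(done_log)
    if log.exists():
        done = set(log.read_text(encoding="utf-8").splitlines())
    else:
        done = set()  # first run: nothing has been finished yet
    return [f for f in all_files if f not in done]


def mark_done(path, done_log):
    """Append one finished file to the log so a later run can skip it."""
    with open(done_log, "a", encoding="utf-8") as fh:
        fh.write(f"{path}\n")
```

Calling `mark_done` right after each file succeeds means an interrupted run loses at most the file that was in flight. The next call to `pending_files` then starts from the first unfinished entry.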

⏳ Previous Attempt 1 — One Huge Session

  • 🐌 Looping through files in one conversation quickly ballooned the context. After ~40 files Claude hit the 200k‑token window and terminated.
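
A back‑of‑the‑envelope check on those numbers, assuming context grows roughly linearly per file and ignoring the system prompt and shared docs (both simplifications of mine, not measurements):

```python
import math


def average_tokens_per_file(window_tokens, files_before_full):
    """Average context consumed per file if the window filled after N files."""
    return window_tokens / files_before_full


def sessions_needed(total_files, files_per_session):
    """Fresh sessions required when each one can hold only so many files."""
    return math.ceil(total_files / files_per_session)


print(average_tokens_per_file(200_000, 40))  # 5000.0
print(sessions_needed(300, 40))              # 8
```

So each file added about 5k tokens of history on average, and a 300‑file job would need at least eight fresh sessions if every file stayed in one shared context.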

🔄 Previous Attempt 2 — PowerShell Outer Loop

foreach ($file in $filesToProcess) {
    # Replace the {file-name} placeholder with the actual file name
    $customizedPrompt = $promptTemplate -replace '\{file-name\}', $file.FullName
    # Run Claude Code in print mode (-p) with the customized prompt; the prompt names the file to edit
    claude -p $customizedPrompt --dangerously-skip-permissions
}

  • ⚙️ Each invocation got a fresh context, so the context history per call was minimal.
  • ⚠️ Downside: every call had to resend project structure, style guide, and helper functions—zero cache, lots of latency.

⚡ New Approach — One Parent, Many Sub‑Agents

  • 🚀 Create a single high‑level session that loops over all files and delegates each one to a sub‑agent.
  • 📂 Isolated histories: each file runs in its own sub‑agent session, so histories never mix.
  • Advantage: tasks stay isolated, sessions don’t collide, and you avoid the context‑length limit.
  • 🤯 Unexpected discovery: without me asking, Claude Code spun up five sub‑agent sessions in parallel. That sped up processing—but also burned tokens much faster.

  • ⏱️ On the Pro plan I hit the usage limit in about 15 minutes. With sequential processing, it had taken roughly 30 minutes to reach the same limit.

  • 💸 Even with the $100 Max plan (5× tokens), I’d still exhaust the window in around 1 hour 15 minutes.

💸 Token Economics: Hidden Costs of Parallelism

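  • ⚖️ Parallelism ≠ free – The per‑file token cost is unchanged, but several sessions now spend tokens at the same time, so your quota drains faster as concurrency rises. In my run, five parallel sub‑agents hit the limit in about half the wall‑clock time of the sequential loop.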
  • 📊 Plan sizing – On the $100 Max plan (≈ 5× Pro tokens) the same job would still exhaust the window in about 75 min.
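
The arithmetic behind these estimates can be written down as a small sketch. It assumes the time to hit the limit scales linearly with the plan's token allowance and inversely with the burn rate. The function name and parameters are mine, and the inputs are the rough timings quoted in this post.

```python
def minutes_until_limit(baseline_minutes, plan_multiplier=1.0, speedup=1.0):
    """Estimate wall-clock minutes before the usage window is exhausted.

    baseline_minutes: how long a sequential run lasts on the base plan.
    plan_multiplier:  token allowance relative to the base plan (5 for "5x").
    speedup:          how much faster tokens burn than in the sequential run.
    """
    if baseline_minutes <= 0 or plan_multiplier <= 0 or speedup <= 0:
        raise ValueError("all inputs must be positive")
    return baseline_minutes * plan_multiplier / speedup


# Sequential run on Pro lasted ~30 min; parallel sub-agents burned ~2x faster.
print(minutes_until_limit(30, speedup=2))                     # 15.0
print(minutes_until_limit(30, plan_multiplier=5, speedup=2))  # 75.0
```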

🏁 Conclusion

Sub‑agents are a genuine force multiplier. For large‑scale, slice‑able tasks they feel like hiring a small AI team for the price of one. Just remember: speed kills (your quota).

If you’re a heavy user, set up guardrails before unleashing a parallel swarm — or be ready to smash that Upgrade button.
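
One possible guardrail, sketched under assumptions: drive the work from an outer script that caps both how many files run at once and how many are attempted per usage window. This limits headless `claude -p` runs like the ones in the PowerShell loop. It does not control how many sub‑agents Claude Code spawns inside a single session. `build_command` and `run_batch` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def build_command(prompt_template, file_path):
    """Build the argv for one headless run, mirroring the PowerShell loop."""
    prompt = prompt_template.replace("{file-name}", file_path)
    return ["claude", "-p", prompt, "--dangerously-skip-permissions"]


def run_batch(files, worker, max_parallel=2, max_files=None):
    """Run worker(file) for each file with at most max_parallel in flight.

    max_files caps how many files are attempted in this batch (None = all).
    Returns (file, result) pairs in input order.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    batch = list(files)[:max_files]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(worker, batch))
    return list(zip(batch, results))
```

In real use the worker would be something like `lambda f: subprocess.run(build_command(template, f)).returncode`. Keeping the worker injectable lets you dry‑run the batching logic without spending tokens.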

Happy token‑burning!🔥🔥🔥🔥🔥🔥🔥🔥
