I Found Claude Fable 5's Best Use Case. It Cost 75% of My Monthly Quota to Get There.

#ai #programming #aitools #claude

Claude Fable 5 came back swinging on July 1st, after a 3-week suspension over US export controls. Within 48 hours, the reaction split into 2 clear camps.

Camp 1 is grown adults who never quite left the LAN party phase of their lives, one-shotting Rocket League clones and flying through a rebuilt Hogwarts. One demo reportedly needed 4 prompts and $173 in tokens to produce a playable prototype.

Camp 2 is trying to figure out how to turn the same model into actual revenue, without burning a month of credits before lunch. I'd like to say I landed straight in camp 2. I didn't. I turned on Ultracode to see what the fuss was about, gave it a task with basically no scope, and watched my monthly quota melt in a few minutes flat. No dramatic You Died screen, just a quiet 0% sitting on the usage dashboard.

So here's how you actually get the best out of Claude Fab(ulous?).

What Ultracode Actually Does

Ultracode isn't an effort level from the API. It's a Claude Code specific setting, and mixing up the 2 is exactly how I ended up here.

The prompt that did it: "clean up the checkout flow, it's a mess." Nothing about which files. Nothing about what "clean up" meant. Ultracode read that as an invitation, forked subagents into the payment logic, the shipping calculator, and a settings page I hadn't touched since a year, and started "cleaning" all 3 at once.

Flip Ultracode on and 2 things happen. Every request in the session gets sent to the model at xhigh, the highest reasoning depth available. And Claude Code gets standing permission to spin up Dynamic Workflows on anything that looks substantial (Anthropic's name for automatic subagent orchestration). The main session forks off however many subagents it decides a task needs, each one burning its own share of tokens, without asking you first. It's less a single character casting a spell and more the game deciding your solo quest actually needed a full raid party.

On the API, Fable 5 runs $10 per million input tokens and $50 per million output, roughly 2x what Opus 4.8 costs. That premium is fine when 1 model is doing the work. It stops being fine when Ultracode quietly turns your 1 request into 12 parallel ones because it decided your task warranted a small army.

The exact mechanism that torched my quota (xhigh reasoning plus unsupervised subagent spawning) is the same mechanism that makes Ultracode worth the premium on a real, already-scoped job. Think a messy refactor across 40 files, or a full codebase audit. Ultracode didn't fail by delegating. I failed by never telling it what it was allowed to delegate to.

I learned a version of that lesson the expensive way before, with a Claude Code session I let run unsupervised for 4 hours. The 3 brief components that prevent scope drift turned out to matter just as much with 1 loop as it does with 12 subagents running in parallel.

I Wasn't the Only One

API Quota Burns: Three Plans Hit Different Limits

Scrolling Hacker News and r/ClaudeAI in that same 48-hour window turned up plenty of company.

A Max subscriber (the $200 a month tier) reported burning something like 20% of a week's quota in a single day. A Pro user said they hit their cap in about 10 minutes. 1 builder launched 62 Opus 4.8 subagents on a single task and hit the 5-hour usage cap in 18 minutes flat, which is less Skynet waking up and more a swarm of caffeinated interns handed the company credit card. Someone else fired off roughly 90 agents just to review a small package. 1 Hacker News comment summed it up better than I could: "tokenmaxxing disguised as a product."

None of this is a measured average, just scattered anecdotes from forum threads, not a controlled benchmark. But the pattern held across every account. The complaint was never about output quality. It was about the meter still spinning long after the task was already done.

Still, quota burn on this scale isn't a rounding error. On the Max plan, 20% of a week's allocation gone in a day means running dry well before the next reset. On Pro, hitting the cap in 10 minutes means the rest of the day happens without the model at all.

The Other Camp: One-Shotting Video Games

While my quota bar dropped in real time, the rest of the internet was having a much better afternoon.

The Rocket League clone got praise even from people who normally roll their eyes at AI demos. So did the reconstructed slice of Hogwarts, and the game built from 4 prompts and $173 in tokens I mentioned earlier. 1 widely watched test pitted Fable 5 against Opus 4.8 head to head on the exact same prompt. Another burned $100 in a single Ultracode session just to see where the ceiling was.

(My own Rocket League era peaked somewhere around rank Diamond, back before deploys started eating the hours I used to spend queuing ranked. I don't miss the reflexes. I do miss having 40 uninterrupted minutes for anything.)

Some of these demos are genuinely impressive, the kind of thing that would have taken a small team a sprint to prototype 2 years ago. My issue was never camp 1's taste. The thing is, a working game prototype and a paying customer are 2 very different deliverables, and only 1 of them shows up on next month's invoice as revenue instead of cost.

The Instruction Paradox

The first thing I noticed once the panic wore off: Fable 5 wants less from you, not more.

Every previous Claude model rewarded a longer, more detailed prompt. More context, more constraints, better output (that was the rule of thumb for 2 years running). Fable 5 breaks it. Load a prompt with hyper-specific instructions, step-by-step constraints, edge cases spelled out in advance, and the output often gets worse, not better. The model starts optimizing for satisfying every clause you wrote instead of solving the actual problem in front of it.

This is a day-to-day thing, how you phrase a single instruction mid-session, not the bigger call on whether to turn Ultracode on for an entire project. It also lines up with what Anthropic's own prompting documentation says about the model: shorter, clearer instructions tend to outperform exhaustive ones. Maybe I'm reading too much into 3 weeks of sessions, but it feels like the model rewards trust more than it rewards detail.

What Anthropic Says (And What I Skipped)

Anthropic's own documentation on effort levels is blunt about this: Fable 5 running at low or medium effort frequently beats older models running at xhigh. The official advice is to start at "high" (which is already the default) and only escalate if a specific task measurably needs it, not just because it feels important.

I skipped straight past that advice and went for the setting with the scariest name. Which tracks, because I've done this exact thing before. A structured plan beating raw effort settings, every time was basically the whole argument of a piece I wrote back in March, about how max effort on a single model doesn't fix a bad plan. That lesson still holds. But it doesn't fully cover what happened this time, because Ultracode isn't just a deeper reasoning setting on 1 model. It's a permission to spawn 12 models at once. The old lesson was about depth. This one is about headcount.

The Anthropic guidance still applies inside Ultracode sessions too. Each subagent it spins up inherits whatever effort level you set. Crank the whole session to xhigh by default and you're not paying the Fable 5 premium once. You're paying it once per subagent, in parallel, for however many Ultracode decides to launch.

Where Fable 5 Actually Earns Its Price

The 1 job where Ultracode paid for itself outright: auditing a partner API integration that had sat untouched for 2 years. Dozens of endpoints, no consistent error handling, half the retry logic duplicated across 3 different modules. I scoped it first. Every file in play, every constraint, don't touch the webhook signatures, don't change the retry timeouts, flag anything touching payment data instead of fixing it directly. Then I turned Ultracode loose on it.

15 minutes later I had 3 subagents running in parallel. 1 mapping every endpoint against the actual partner spec, 1 rewriting the duplicated retry logic into 1 shared module, 1 flagging every place a request touched payment data without touching the code itself. It burned through tokens fast, same as before. But this time the burn bought something real: a full map of every inconsistency across the integration, a prioritized list of fixes, and 3 of those fixes already shipped by the time I'd finished my coffee.

Ultracode isn't a setting to run on everything, and it's not one to avoid either. It earns its price on exactly 1 category of work: a project that's already scoped and big enough that 12 parallel subagents beat 1 sequential model chewing through it file by file.

For anything smaller, the day-to-day stuff, high effort and a short prompt is still plenty. Save Ultracode for the 1 job it can't do for you: deciding what's actually in scope.

Credits to Spare

The real test was never the model. It's the discipline of whoever's sitting at the keyboard when Ultracode is 1 click away.

Set the scope before you touch it, and the difference between a 75% quota burn and a project that pays for itself comes down to that 1 decision. Get it right and you close out the month with credits left over, something camp 1 rarely manages.

So with what's left of mine, I'm building a small app to drill my Italian Game opening.

Turns out no effort setting fixes a bad chess opening. 🤓

Sources

Fable 5 pricing and export control timeline: Claude Fast, Fable 5 usage credits guide
Official effort level documentation: Claude Platform Docs, Effort
Ultracode mechanics: Claude Code Docs, model configuration
Reported quota burn cases from Hacker News and Reddit: Claude Fast, Ultracode in Claude Code
Anthropic's guidance on effort defaults: Developers Digest, Fable 5 effort levels explained

This post may contain affiliate links. If you click them, I might earn a small commission (costs you nothing, and helps me keep shipping quality articles every day for your reading pleasure).