Yuiko Koyanagi

Posted on Jul 2 • Edited on Jul 9

Use Fable 5 where it pays for itself

#ai #claudecode #agents #productivity

Claude Fable 5 is the model I reach for in Claude Code when a task is ambiguous, long-running, or full of tradeoffs — planning across steps, untangling a messy goal, weighing a design choice, keeping a long session pointed in the right direction. It's genuinely good at the hard parts.

Which is exactly why I stopped letting it do the easy ones. At $10 per million input tokens and $50 per million output — double Opus 4.8's $5 / $25 — spending it on boilerplate and lint fixes is like paying a principal engineer to reformat your imports. The work still needs doing. It just doesn't need that person doing it.
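To make the 2x concrete, here's a back-of-the-envelope sketch that uses only the prices quoted above. The token counts are made up for illustration, so plug in numbers from one of your own sessions.

```typescript
// Per-million-token prices quoted in this post (USD).
interface Price {
  inputPerM: number;
  outputPerM: number;
}

const FABLE_5: Price = { inputPerM: 10, outputPerM: 50 };
const OPUS_4_8: Price = { inputPerM: 5, outputPerM: 25 };

// Cost of one run, given raw token counts.
function costUSD(inputTokens: number, outputTokens: number, p: Price): number {
  return (inputTokens / 1_000_000) * p.inputPerM + (outputTokens / 1_000_000) * p.outputPerM;
}

// Hypothetical chore: 2M input tokens, 500k output tokens.
const fable = costUSD(2_000_000, 500_000, FABLE_5); // 20 + 25 = 45
const opus = costUSD(2_000_000, 500_000, OPUS_4_8); // 10 + 12.5 = 22.5
console.log(`Fable 5: $${fable}, Opus 4.8: $${opus}`); // Fable 5: $45, Opus 4.8: $22.5
```

The ratio doesn't depend on the task: every token of boilerplate costs exactly twice as much on Fable 5, so the waste scales linearly with how much mechanical work you let it do.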

A Claude Code task is rarely one task. Take a one-liner like:

Add rate limiting to our API.

Sounds small. But it unfolds into work of very different weights:

deciding the algorithm — token bucket, sliding window, fixed window
choosing where the counter lives — in-memory, or Redis because we run multiple instances
thinking through what happens under a burst, and what a client sees on a 429
writing the middleware
wiring it into every route
writing the tests
fixing the lint and type errors
checking the edge cases — clock skew, the limiter itself failing, keys never expiring

The first three are real design decisions — the kind where a wrong call is expensive and annoying to unwind six months later. The rest is execution: careful work, but not work that rewards a frontier model over a merely-very-good one.
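The first bullet is less abstract than it sounds. Here's a minimal token bucket sketch in TypeScript: in-memory, single-process, with an injectable clock so it can be tested. It illustrates the algorithm only. It isn't what ccteams or any agent produces, and it deliberately ignores the second bullet (sharing state across instances).

```typescript
// Minimal in-memory token bucket. Assumption: single process only; with
// several instances behind a load balancer the state would have to live in
// a shared store such as Redis.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number, // largest burst allowed
    private readonly refillPerSec: number, // sustained requests per second
    private readonly now: () => number = () => Date.now(), // injectable clock
  ) {
    this.tokens = capacity;
    this.lastRefillMs = this.now();
  }

  // true = let the request through; false = respond with 429.
  tryTake(): boolean {
    const t = this.now();
    // Never refill a negative amount if the clock steps backwards.
    const elapsedSec = Math.max(0, (t - this.lastRefillMs) / 1000);
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

let fakeMs = 0;
const bucket = new TokenBucket(2, 1, () => fakeMs);
console.log(bucket.tryTake(), bucket.tryTake(), bucket.tryTake()); // true true false
fakeMs = 1000; // one second later, one token has been refilled
console.log(bucket.tryTake()); // true
```

Most of the real design sits in two numbers: capacity (how big a burst you tolerate) and refill rate (the sustained limit). Choosing them is the judgment part; typing out the class is not.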

So I don't try to replace Fable 5. I just keep it on the decisions and hand everything else to something cheaper.

Make Fable 5 the lead, not the labor

Concretely, that means treating Fable 5 like a tech lead: it plans and delegates, and other models do the actual building.

Fable 5 = orchestrator — plans, decomposes, synthesizes. Writes no code.
Opus = deep reasoning subagent — architecture, complex debugging, review
Sonnet = mechanical work subagent — implementation, boilerplate, tests, chores
(Optionally) Codex = peer senior engineer — an independent second opinion from a different model family

Fable 5 thinks, delegates, and stitches the results together; the cheaper models handle work they're already good at. So the pricey model is only on the clock for the steps that actually reward its judgment.
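Here's a toy sketch of that split, built on hypothetical `Kind` tags and a lookup table. In Claude Code the lead model routes by reading agent descriptions, not by consulting a table. The point is only that once each subtask is labeled by the kind of work it is, the model follows from the label.

```typescript
// Hypothetical sketch: each subtask is tagged by the kind of work it is,
// and the tag alone decides which tier handles it.
type Model = "fable-5" | "opus" | "sonnet";
type Kind = "plan" | "reason" | "mechanical";

interface Subtask {
  title: string;
  kind: Kind;
}

const TIER: Record<Kind, Model> = {
  plan: "fable-5", // orchestrator: plans, decomposes, synthesizes
  reason: "opus", // architecture, complex debugging, review
  mechanical: "sonnet", // implementation, boilerplate, tests, chores
};

// Group subtask titles by the model that should run them.
function route(tasks: Subtask[]): Map<Model, string[]> {
  const byModel = new Map<Model, string[]>();
  for (const t of tasks) {
    const model = TIER[t.kind];
    const list = byModel.get(model) ?? [];
    list.push(t.title);
    byModel.set(model, list);
  }
  return byModel;
}

// The rate-limiting breakdown from the top of the post, tagged by hand.
const plan = route([
  { title: "decompose 'add rate limiting'", kind: "plan" },
  { title: "choose algorithm and counter store", kind: "reason" },
  { title: "burst and 429 behaviour", kind: "reason" },
  { title: "write middleware", kind: "mechanical" },
  { title: "wire into routes", kind: "mechanical" },
  { title: "write tests", kind: "mechanical" },
  { title: "fix lint and type errors", kind: "mechanical" },
  { title: "check edge cases", kind: "reason" },
]);
console.log(plan.get("sonnet")?.length); // 4
```

Half of the list lands on the cheapest tier, and the priciest model handles a single item.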

The token savings were what I was after. What I didn't expect was the second effect: because Fable 5 never touches raw file contents or stack traces, its context stays clean, and it's still making sharp calls six hours into a session instead of drowning in half-finished edits.

I figured that was the whole setup — three quick steps. It wasn't.

The hard part is the subagents

On paper, it's three steps:

/model → Fable 5
Create subagents with /agents, pinned to Opus and Sonnet
Write routing rules in your CLAUDE.md — which work goes to whom

Step 1 takes ten seconds.

Steps 2 and 3 are where the actual work is. Creating a subagent is fast. Creating one that pulls its weight is not.

Three things I learned the hard way:

The description field decides who gets the work. The orchestrator picks a subagent by reading its description. "Use for reasoning-heavy tasks" is vague enough that the routing quietly drifts, and you don't notice until an answer comes back wrong.
The system prompt decides what comes back. Subagents can't see your conversation. If the prompt doesn't say what to read before acting and what to return — a conclusion, a diff, file:line findings — the orchestrator ends up cleaning up after them. Which is the one thing you were trying to avoid.
"Thinker" and "doer" is too few roles. It's a reasonable start. But real work splits into scoping, design, implementation, verification, and shipping. Collapse verification into the builder, and you have an agent reviewing its own code.

The decision that actually takes time, then, isn't picking models — it's writing a team of subagents whose roles are sharp enough that delegation just works. And that's a surprising amount of careful prose.

ccteams: install the team design in one command

This is the part I got tired of rewriting by hand, so I built a tool for it: ccteams, a package manager for Claude Code agent teams.

```shell
npm install -g ccteams
ccteams list           # see the teams
ccteams use generalist # apply one to the current project
```

That single command drops a full set of role-specialized subagents into .claude/ — descriptions, system prompts, tool restrictions, and a model: per agent already written — plus the orchestration rules that tell the lead session how to run them. Step 3 above (hand-writing routing rules in CLAUDE.md) ships in the box. So does the model assignment — more on that below.

Teams come prebuilt, most of them stack-specific:

Team	What it's for
`generalist`	Stack-agnostic feature team: scope → design → build → QA → ship
`next-ts`	Next.js (App Router) + TypeScript + Tailwind
`go-api`	Go HTTP API backends
`python-fastapi`	FastAPI + Pydantic v2
`rails` / `django`	Rails / Django + DRF
`debug`	Reproduce → root-cause → minimal fix → regression test
`research`	Technical research that writes no code

Prefer staying inside Claude Code? There's a plugin:

```
/plugin marketplace add toffyui/ccteams
/plugin install ccteams@ccteams
```

Then run /ccteams:choose-team with a plain-English description (for example, /ccteams:choose-team something for backend API work), and it picks and applies the right team.

The models are already assigned

Here's the part I only got right recently: every bundled agent already ships with a model: in its frontmatter, assigned by how much reasoning the role needs. You don't add anything. ccteams use generalist lands this:

Agent	Role	Model (preset)
(main session)	Orchestrator	Fable 5 — you pick this with `/model`
`scope-planner`	Cut scope, pin down requirements	`opus`
`architect`	Design decisions	`opus`
`builder`	Implementation, boilerplate, tests	`sonnet`
`qa-reviewer`	Verification, edge-case hunting	`opus`
`shipper`	Commits, CI, chores	`sonnet`

Open any agent file and the line is right there:

```yaml
---
name: architect
description: Technical design specialist. ...
tools: Read, Glob, Grep, WebSearch, WebFetch
model: opus        # ← already set by ccteams
---
```

The one model ccteams doesn't touch is the lead session's — that's yours to pick with /model (Fable 5). Everything below it is pinned.
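To confirm the split after applying a team, a few lines are enough to pull `name` and `model` out of each file under `.claude/agents/`. This is a hypothetical helper, not part of ccteams, and it's deliberately naive: it reads simple `key: value` lines only and is not a YAML parser.

```typescript
// Hypothetical helper: read name and model from an agent file's frontmatter.
// Naive on purpose: only "key: value" lines between the first pair of "---"
// fences, and a trailing "# comment" is dropped (so values can't contain "#").
function readAgentMeta(source: string): { name?: string; model?: string } {
  const lines = source.split("\n");
  if (lines[0]?.trim() !== "---") return {}; // no frontmatter at all
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // end of frontmatter
    const m = /^(\w+):\s*(.*?)\s*(#.*)?$/.exec(line);
    if (m) meta[m[1]] = m[2];
  }
  return { name: meta.name, model: meta.model };
}

const architect = `---
name: architect
description: Technical design specialist. ...
tools: Read, Glob, Grep, WebSearch, WebFetch
model: opus        # ← already set by ccteams
---
`;
console.log(readAgentMeta(architect)); // { name: 'architect', model: 'opus' }
```

Feed it the contents of each agent file (for example via Node's `fs.readFileSync`) to print the whole team's model assignment at a glance.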

The split is similar to the usual “thinker / doer” setup, but the extra roles matter. Once the work is separated into scoping, design, building, review, and shipping, delegation gets much more precise — and the model choice becomes part of the role instead of a judgment the orchestrator has to remake every time.

The important detail is that I do not assign Fable 5 to roles like architect or qa-reviewer.

That can look backwards at first. If planning and review are the important parts, why not put the best model there too? Because by the time a task reaches those agents, Fable 5 has already done the hardest part: turning an ambiguous user request into a clear, bounded problem that another model can solve.

That is where Fable 5 earns its keep — ambiguous, long-running work full of tradeoffs. Once the question is clearly framed, Opus is usually more than enough to propose a design or review a fix. And those agents still do not get the final say. They propose; Fable 5 decides what to accept.

So Fable 5 keeps control of the core judgment without spending tokens on every intermediate role. The setup does require good orchestration rules and agent prompts, but that is exactly what ccteams installs for you.

Adding Codex as the peer engineer

OpenAI's official Codex plugin lets you call Codex from inside Claude Code (install the Codex CLI on your machine first):

```
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex
/reload-plugins
/codex:setup
```

Then add one paragraph to your project's CLAUDE.md (below the @.claude/active-team.md import that ccteams creates):

```markdown
## Peer engineer
Codex (/codex:rescue --background) is a peer senior engineer with a different
perspective. For high-stakes decisions, task the architect agent and Codex on
the same problem in parallel and synthesize — without showing either the
other's answer.
```

The point is to use Codex as a peer rather than a rubber-stamp reviewer. On a high-stakes call, I give the same problem to the Opus-pinned architect and to Codex at the same time, neither seeing the other's answer, and let Fable 5 reconcile the two. You get the most out of this when the two models come from different lineages — Opus and Codex were trained differently enough that they tend to miss different things, so one often catches what the other glossed over.
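The "neither seeing the other's answer" part is the whole trick, and it's easy to break by accident: run the two one after the other and paste the first result into the second prompt, and you're back to a rubber stamp. Here's a sketch of the shape in TypeScript. `Advisor` and the stubs are hypothetical stand-ins for the architect subagent and `/codex:rescue`, not a real API for either.

```typescript
// Hypothetical shape of a blind second opinion: both advisors get the same
// brief, and neither receives the other's output.
type Advisor = (brief: string) => Promise<string>;

interface Synthesis {
  answers: [string, string];
  agree: boolean;
}

async function blindSecondOpinion(brief: string, a: Advisor, b: Advisor): Promise<Synthesis> {
  // Both calls are issued before either is awaited, and each is passed only
  // the brief, so neither prompt can contain the other's answer.
  const [first, second] = await Promise.all([a(brief), b(brief)]);
  // Placeholder comparison; in the real setup the lead reads both answers
  // and decides, which no string check can stand in for.
  const agree = first.trim().toLowerCase() === second.trim().toLowerCase();
  return { answers: [first, second], agree };
}

const architectStub: Advisor = async () => "sliding window in Redis";
const codexStub: Advisor = async () => "token bucket in Redis";
blindSecondOpinion("Pick a rate-limiting algorithm for 3 instances", architectStub, codexStub)
  .then((s) => console.log(s.agree ? "aligned" : "disagree: escalate to the lead"));
```

The string comparison is only a placeholder. In the actual setup, reconciling the two answers is exactly the judgment call Fable 5 is there to make.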

Back to "add rate limiting"

Remember that one-liner from the top? Here's the whole prompt I use:

Add rate limiting to our API.
Context: Express + Redis, deployed as 3 instances behind a load balancer.

I just state the goal. Claude Code routes the rest on its own — it matches the task against each agent's description and delegates automatically, so Fable 5 sends the algorithm-and-storage question to architect, the implementation to builder, the edge cases to qa-reviewer, without me naming any of them.

And because each agent's model is pinned in its frontmatter, that routing decision is the model decision. The moment architect is chosen, the design call, where being wrong is expensive, runs on Opus; builder's middleware and tests run on Sonnet. Fable 5 plans and synthesizes on a clean context and writes no code itself, so the priciest model in the room is billed only for the judgment that actually needed it.

That was the goal all along: not to spend less on Fable 5, but to spend it only where it pays for itself.

In short

Fable 5 is worth its price on judgment (planning, framing the problem, deciding which designs and review findings to accept) and wasteful on everything else
The hard part of this setup isn't picking models, it's designing the team that surrounds them
ccteams installs a considered team in one command — orchestration rules and a per-agent model: preset included, so the Fable-lead / Opus-reason / Sonnet-build split is live the moment you apply it
Pick your lead model with /model, restart Claude Code, and you're done — repin any agent's model: line if you want a different split

If this saves you some setup, a star on the repo is the nicest way to say so — and if the presets don't match how you'd split the work, open an issue and tell me 🙌