Neilos

Posted on Mar 31

CC 20x max is not enough? This is what I'm doing to fix that

#vibecoding #programming #agents #ai

There's a 200-comment Reddit thread right now of people watching their Claude Max plan vanish in minutes. One word — "Morning" — took 15% of someone's 5-hour limit. A fresh session, two messages, weekly quota wiped.

It's not just power users. Normal usage is hitting the wall.

The cap is real. But the opacity is worse — you can't see what's eating your budget, so you can't optimize around it. People are scared to use Opus, losing productivity not just when they hit the wall, but in constant anticipation of it.

The community is already finding the direction. One commenter: "best workflow is Opus high, then everything with Sonnet subagents." Right idea — but it stops at Sonnet, and it stays inside Anthropic's billing.

The pattern I've landed on: Claude Code (CC) stays in charge, a cheap model does the work. Here's how it's structured.

ttal, logos, and MiniMax M2.7

Quick context for anyone not familiar:

ttal is an agent orchestration CLI — it manages tasks, spawns workers, runs pipelines, and routes work between agents.

logos is ttal's bash-only agent loop. Text in, text out. The model writes prose and shell commands, the sandbox executes them. No tool schemas, no JSON, no provider-specific plumbing — just a simple text convention any model can follow.

MiniMax M2.7 is a reasoning model released in March 2026. It costs $0.30 per 1M input tokens and $1.20 per 1M output tokens, about 10× cheaper than Sonnet. On Terminal Bench 2, the only direct benchmark with both models, it scores 57% vs Sonnet's 59%. In Kilo Code's head-to-head detection tests against Opus, it found every bug and every security vulnerability: same result, at a fraction of the cost. The gap to frontier models shows up in architectural depth and complex multi-file reasoning, not in focused, scoped tasks.
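To make the price gap concrete, here is a rough back-of-the-envelope sketch. The M2.7 prices are the ones quoted above; the Sonnet prices ($3 input, $15 output per 1M tokens) and the job size are assumptions for illustration, not figures from this post.

```python
# Prices in USD per 1M tokens. The M2.7 figures are quoted in the post;
# the Sonnet figures are an ASSUMPTION (list price, may be out of date).
PRICES = {
    "minimax-m2.7": {"input": 0.30, "output": 1.20},
    "sonnet": {"input": 3.00, "output": 15.00},  # assumed, not from this post
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one job on the given model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical review job: 2M tokens read, 200k tokens written.
job = {"input_tokens": 2_000_000, "output_tokens": 200_000}
m27 = cost_usd("minimax-m2.7", **job)
sonnet = cost_usd("sonnet", **job)
print(f"M2.7: ${m27:.2f}  Sonnet: ${sonnet:.2f}  ratio: {sonnet / m27:.1f}x")
```

Under those assumed numbers the job comes out around $0.84 on M2.7 versus $9.00 on Sonnet. The exact ratio shifts with the input/output mix, since the output price gap is wider than the input one.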

CC leads, MiniMax works

Every pipeline stage in ttal has a CC lead orchestrator. There are three: plan-review-lead, pr-review-lead, and code-lead. Each runs on Claude, holds context, makes decisions, and controls the flow. When focused work needs to happen, the lead delegates via ttal subagent run.

ttal subagent run is a CLI command — leads call it internally, but you can run it manually too:

# plan-review-lead delegates to review specialists
ttal subagent run gap-finder
ttal subagent run security-reviewer
ttal subagent run test-reviewer

# code-lead delegates focused single-file edits
ttal subagent run coder

# research via ttal ask — explore any codebase, URL, or repo
ttal ask "how does auth work?" --project backend

These run in parallel under the logos loop on M2.7. The lead picks up results, synthesizes, decides what's next. MiniMax never touches the orchestration. But the detection, the single-file edits, and the exploration all run on a model that costs about 10× less per token, entirely outside Claude's usage meter.
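The fan-out step can be pictured roughly like this. This is a sketch, not how ttal's leads are actually implemented: it assumes `ttal subagent run <name>` writes its report to stdout, and `fan_out` and `run_subagent` are hypothetical helper names. The demo uses a stub runner so it works without ttal installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_subagent(name: str) -> str:
    """Invoke `ttal subagent run <name>`; ASSUMES the report goes to stdout."""
    done = subprocess.run(
        ["ttal", "subagent", "run", name],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def fan_out(names: List[str],
            runner: Callable[[str], str] = run_subagent) -> Dict[str, str]:
    """Run every named subagent concurrently and collect reports by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # pool.map preserves input order, so zip pairs each name with its report.
        return dict(zip(names, pool.map(runner, names)))


# Stub runner stands in for the real CLI call in this demo.
reports = fan_out(
    ["gap-finder", "security-reviewer", "test-reviewer"],
    runner=lambda name: f"[{name}] no findings (stub)",
)
for name, report in reports.items():
    print(name, "->", report)
```

The lead only sees the collected reports, which is the point: the token-heavy reading happens in the workers, and the orchestrator pays for a short summary per specialist.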

Why this works without quality loss

Most subagent work doesn't need frontier intelligence. Detection, review, single-file edits, exploration — bounded scope, clear criteria. The question isn't "is M2.7 as smart as Sonnet?" It's "is M2.7 good enough for this specific task?"

For review and detection, the Kilo Code comparison above suggests it matches frontier quality. For single-file edits, the scope is tight enough that it doesn't need to reason about the whole system. For exploration, it just needs to read and report accurately. None of these require the architectural judgment and multi-file reasoning where frontier models genuinely earn their cost, and that judgment stays with the CC lead.
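Put as a routing rule, the split looks something like the sketch below. The task labels and the `route` function are made up for illustration; ttal's real routing may work differently.

```python
from enum import Enum


class Tier(Enum):
    LEAD = "cc-lead"            # frontier model: orchestration and judgment
    WORKER = "cheap-worker"     # e.g. M2.7: bounded, scoped work


# Hypothetical labels for the bounded task types named above.
BOUNDED_TASKS = {"detection", "review", "single-file-edit", "exploration"}


def route(task_kind: str) -> Tier:
    """Send bounded tasks to the cheap worker; everything else stays with the lead."""
    return Tier.WORKER if task_kind in BOUNDED_TASKS else Tier.LEAD


for kind in ["review", "exploration", "architecture", "multi-file-refactor"]:
    print(f"{kind} -> {route(kind).value}")
```

Defaulting unknown task kinds to the lead is a deliberate choice in this sketch: when scope is unclear, the safe fallback is the model with the better judgment.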

Why logos makes this possible

Most agent frameworks couple to the provider's tool-call API. M2.7 sometimes hallucinates tool calls — the model behaves as if it has tools it doesn't actually have. In a standard framework that's hard to recover from.

Logos handles this case. It detects hallucinated tool-call formats mid-stream, suppresses them, and injects corrective directives, so the loop keeps running cleanly. And because logos uses no tool schemas, the surface area for these issues is minimal to begin with.
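A minimal sketch of the detect-suppress-correct idea, assuming line-by-line streaming. The patterns and the directive text are guesses at what hallucinated tool calls might look like, not the formats logos actually matches.

```python
import re
from typing import Iterable, List, Tuple

# HYPOTHETICAL signatures of tool-call syntax a model might emit by mistake.
TOOL_CALL_PATTERNS = [
    re.compile(r"^\s*</?(tool_call|function_call|invoke)\b"),
    re.compile(r'^\s*\{\s*"(name|tool|function)"\s*:'),
]
# Hypothetical corrective text to append to the next prompt.
DIRECTIVE = "No tools are available. Write shell commands as plain text."


def scrub(lines: Iterable[str]) -> Tuple[List[str], bool]:
    """Drop lines that look like tool calls; report whether any were seen."""
    kept: List[str] = []
    hallucinated = False
    for line in lines:  # works on a live stream as well as a list
        if any(p.search(line) for p in TOOL_CALL_PATTERNS):
            hallucinated = True
            continue
        kept.append(line)
    return kept, hallucinated


reply = [
    "Checking the file first.",
    "<tool_call>",
    '{"name": "read_file", "arguments": {"path": "a.py"}}',
    "</tool_call>",
    "$ cat a.py",
]
clean, needs_correction = scrub(reply)
next_prompt_suffix = DIRECTIVE if needs_correction else ""
print(clean, needs_correction)
```

The useful property is that a bad turn degrades into a nudge rather than a crash: the fake call never reaches the executor, and the model gets told what to do instead on the next turn.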

The other benefit: logos doesn't care which model you use. M2.7 today, whatever's cheapest next month. No rebuilding, no schema migration. Any model that can follow a simple text convention works.
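For a feel of what a text-in, text-out loop looks like, here is a toy version. It assumes a made-up convention where lines starting with `$ ` are commands; logos' real convention and sandboxing are not shown in this post. The model is just a callable, which is why swapping providers needs no other changes.

```python
import subprocess
from typing import Callable, List

CMD_PREFIX = "$ "  # HYPOTHETICAL convention; logos' real format may differ


def run_loop(model: Callable[[str], str], task: str, max_turns: int = 5) -> str:
    """Alternate model replies and shell execution until a reply has no commands."""
    transcript = task
    for _ in range(max_turns):
        reply = model(transcript)
        commands = [line[len(CMD_PREFIX):] for line in reply.splitlines()
                    if line.startswith(CMD_PREFIX)]
        if not commands:
            return reply  # pure prose: treat it as the final answer
        outputs: List[str] = []
        for cmd in commands:
            # A real sandbox would isolate this; here it runs in the local shell.
            done = subprocess.run(cmd, shell=True, capture_output=True,
                                  text=True, timeout=30)
            outputs.append(f"{CMD_PREFIX}{cmd}\n{done.stdout}{done.stderr}")
        transcript += "\n" + reply + "\n" + "\n".join(outputs)
    return "stopped: turn limit reached"


def fake_model(transcript: str) -> str:
    """Stand-in for an LLM call: requests one command, then answers."""
    if "hello-from-shell" in transcript.splitlines():
        return "The shell said: hello-from-shell"
    return "I will run a command.\n$ echo hello-from-shell"


result = run_loop(fake_model, "Say hello via the shell.")
print(result)
```

Replacing `fake_model` with a function that calls any chat API is the whole migration story in this sketch. Nothing else in the loop knows or cares which provider is behind it.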

What this actually changes

You stop self-censoring. Stop being scared to kick off a review because it might eat 15% of your session. The review runs on M2.7 under a CC lead that costs almost nothing to orchestrate.

The cap isn't going away. Anthropic is tightening, not loosening. A bigger plan isn't the answer — changing what the plan is used for is. CC for orchestration, decisions, and reasoning. Focused work on cheap models that don't touch your Claude budget at all.

ttal and logos are open source at github.com/tta-lab.
