Jangwook Kim

Posted on • Originally published at jangwook.net

Why Anthropic Cut Off OpenClaw — The Claude Subscription Policy Shift and What It Costs You

April 4th. Anthropic quietly updated their terms. The change is straightforward: Claude Pro and Max subscribers can no longer use their plan limits to power third-party agent frameworks like OpenClaw. From that date on, it's pay-as-you-go API charges.

I spotted this first in the OpenClaw community. My initial reaction was "another terms cleanup." Then I actually ran the numbers, and this turned out to be a bigger deal than it looked. I wrote a Python cost calculation script and ran it through several realistic scenarios. The short version: without any optimization, you're looking at up to 10x cost increases. With proper design, you can end up paying less than before.

This post is a numbers-first analysis. Less about whether Anthropic's decision is morally correct, more about what it means in dollars, how much the numbers shift under different optimization strategies, and what I'd do if I were in this situation today.

Why Anthropic Cut the Subscription Access

Anthropic's official position: "capacity is a resource we manage thoughtfully." Honestly, that reasoning holds up.

Here's what happens when someone runs a $200 Claude Max subscription through OpenClaw to power autonomous agents: agents aren't people. They don't take breaks. A single autonomous workflow can fire API calls dozens of times per minute, 24 hours a day. A human using Opus for a couple of hours of conversation each day is a very different usage pattern from a server calling Opus continuously. According to reporting, Anthropic's internal math showed these users consuming $1,000 to $5,000 worth of compute per month on a $200 subscription. I ran my own calculation on this, and the numbers check out.

500 sessions per day, 2,000 input + 500 output tokens each, Opus 4.7, 30 days: $337.50. That's already 1.7x over the $200 Max plan. If sessions run longer or fire more frequently, the ratio climbs fast.
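
Here's the back-of-the-envelope version of that check, using the standard Opus 4.7 rates quoted in the Fast Mode section below:

# Opus 4.7 standard rates, $ per million tokens
INPUT_RATE, OUTPUT_RATE = 5.00, 25.00

sessions = 500 * 30                      # 500 sessions/day over 30 days
input_m = sessions * 2_000 / 1_000_000   # 30M input tokens
output_m = sessions * 500 / 1_000_000    # 7.5M output tokens

monthly = input_m * INPUT_RATE + output_m * OUTPUT_RATE
print(f"${monthly:,.2f}/month")          # $337.50, i.e. 1.7x the $200 Max plan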

OpenClaw creator Peter Steinberger pushed back immediately. He stated that switching to API rates would be "prohibitively expensive" and raised public concerns about Anthropic internalizing OpenClaw features before cutting external access. Some developers started openly discussing switching to DeepSeek or other models.

I'll be direct: I find it hard to call Anthropic's decision completely wrong. Flat-rate subscriptions applied to AI agents without limit are structurally unsustainable. That's accurate. What I do think was handled poorly: not giving existing users adequate transition time, and launching Fast Mode — a dramatically more expensive option — in the same announcement window. The combination of "we're protecting capacity" and "here's a 6x premium tier" happening simultaneously is a difficult look to defend.

Fast Mode: The 6x Multiplier and Its Hidden Mechanics

Fast Mode deserves its own section because it's more complicated than a simple speed upgrade.

Standard Claude Opus 4.7: $5 input, $25 output per million tokens. Fast Mode: $30 input, $150 output — 6x. Above 200K input tokens, the input price climbs to $60. Anthropic calls this "premium pricing for a premium speed feature." The math doesn't need much editorializing.

The part that's more dangerous in practice: retroactive context repricing. If you enable Fast Mode mid-conversation, the entire accumulated context is repriced at Fast Mode rates. Enable it after a 10,000-token exchange and you're paying Fast Mode prices on the whole conversation, not just the next message. This creates an unpredictable billing pattern that's genuinely hard to forecast — and one analyst described it as "hidden mechanics that turn forecast accuracy into fantasy," which is a fair characterization.
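
A rough illustration of the cliff, assuming (as described above) that the full accumulated input context gets repriced:

STD_IN, FAST_IN = 5.00, 30.00   # Opus 4.7 input rates, $ per million tokens

context = 10_000   # tokens accumulated before enabling Fast Mode
new_msg = 500      # the one message you actually wanted sped up

# What you might expect: Fast Mode pricing on the new message only
expected = (context * STD_IN + new_msg * FAST_IN) / 1_000_000
# What retroactive repricing does: the whole context at Fast Mode rates
actual = (context + new_msg) * FAST_IN / 1_000_000

print(f"expected ${expected:.4f}, billed ${actual:.4f}")  # ~4.8x on a single turn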

Also important: Fast Mode is not available on AWS Bedrock, Google Vertex AI, or Microsoft Azure Foundry. It's exclusive to the Anthropic direct API. Organizations running Claude through cloud platform integrations can't use this feature at all.

My take: Fast Mode is a legitimate tool for interactive UIs where latency is directly tied to user experience metrics and business KPIs. For automated agents, running it by default is financially irresponsible. It belongs in a circuit breaker pattern: disabled by default, enabled only when specific conditions warrant it.
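
A minimal sketch of that circuit breaker. The gating logic is the point here; how Fast Mode is actually toggled on a request is up to your client code, so treat the thresholds and budget as placeholders:

import time

FAST_MODE_DAILY_BUDGET = 5.00   # assumed hard ceiling, $ per day
_spent_today = 0.0
_window_start = time.monotonic()

def should_enable_fast_mode(interactive: bool, p95_latency_s: float,
                            est_cost_usd: float) -> bool:
    """Fast Mode stays off unless a human is waiting, latency is actually
    bad, and today's Fast Mode budget still has headroom."""
    global _spent_today, _window_start
    if time.monotonic() - _window_start > 86_400:   # roll the daily window
        _spent_today, _window_start = 0.0, time.monotonic()
    if not interactive:              # automated agents never get Fast Mode
        return False
    if p95_latency_s < 3.0:          # placeholder threshold: fast enough already
        return False
    if _spent_today + est_cost_usd > FAST_MODE_DAILY_BUDGET:
        return False                 # circuit open: budget exhausted
    _spent_today += est_cost_usd
    return True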

Running the Numbers: Real Cost Scenarios

I verified these numbers with a Python script using official pricing from platform.claude.com. Here's what different usage levels actually cost under the new structure.

Case 1: Heavy OpenClaw User (500 sessions/day, formerly on $200 Max)

Assumptions: 2,000 input + 500 output tokens per session, 30 days

Opus 4.7 standard (no optimization):    $337.50/month
Opus 4.7 Fast Mode (6x):              $2,025.00/month
Opus 4.7 + 70% cache hit rate:          $243.00/month
Sonnet 4.6 + 70% cache hit rate:        $145.80/month
Haiku 4.5 + 70% cache hit rate:          $48.60/month
Multiplier vs $200 Max plan:            1.7x to 10.1x

The worst case (Fast Mode, no caching) is a 10x increase. But drop to Sonnet 4.6 and hit 70% cache rate, and you're at $145.80 — cheaper than the old $200 Max plan. Haiku with the same optimization: $48.60.
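
The cache-adjusted rows come from the same script. Here's a trimmed version; it prices cache reads at the 90% discount, ignores the cache-write premium for simplicity, and uses $3/$15 (Sonnet 4.6) and $1/$5 (Haiku 4.5) per million tokens, which is what the table's figures imply:

def monthly_cost(sessions_per_day, in_tok, out_tok, in_rate, out_rate,
                 cache_hit=0.0, batch=False, days=30):
    """Monthly USD cost; cached input reads bill at 10% of the input rate."""
    m_in = sessions_per_day * days * in_tok / 1_000_000
    m_out = sessions_per_day * days * out_tok / 1_000_000
    eff_in = in_rate * (1 - cache_hit) + in_rate * 0.10 * cache_hit
    return (m_in * eff_in + m_out * out_rate) * (0.5 if batch else 1.0)

print(monthly_cost(500, 2000, 500, 5, 25))                 # 337.50  Opus
print(monthly_cost(500, 2000, 500, 5, 25, cache_hit=0.7))  # 243.00  Opus + cache
print(monthly_cost(500, 2000, 500, 3, 15, cache_hit=0.7))  # 145.80  Sonnet + cache
print(monthly_cost(500, 2000, 500, 1, 5, cache_hit=0.7))   #  48.60  Haiku + cache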

Case 2: Mid-scale developer (50 sessions/day — blog automation, document analysis)

Sonnet 4.6, no optimization:    $20.25/month
With 80% cache hit rate:        $13.77/month
Cache + Batch API:               $6.88/month

At this scale, losing subscription access actually reduces costs. If you were using Claude Max mostly for blog automation, direct API billing is significantly cheaper.

Case 3: Cost-optimized estimates by daily session volume (Sonnet 4.6, 75% cache, Batch API)

   10 sessions/day →   $1.42/month
   50 sessions/day →   $7.09/month
  100 sessions/day →  $14.18/month
  500 sessions/day →  $70.88/month
 1,000 sessions/day → $141.75/month

This table makes one thing clear: if you don't know your actual daily session volume, you need to instrument that before anything else. Guessing at cost based on a vague sense of "we have a lot of requests" doesn't work anymore.
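
For reference, the Case 3 figures are the monthly_cost function from the sketch above, swept across session volumes:

for spd in (10, 50, 100, 500, 1000):
    c = monthly_cost(spd, 2000, 500, 3, 15, cache_hit=0.75, batch=True)
    print(f"{spd:>5} sessions/day -> ${c:>7,.2f}/month")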

My post on Claude API prompt caching optimization patterns covers the implementation in detail. Hitting a 70-80% cache rate is the highest-leverage change you can make right now.

Three Strategies That Actually Reduce Your Bill

Strategy 1: Model tier routing

The single most impactful change: stop sending every task to Opus 4.7. A routing layer that selects the right model for each task type can cut costs by 40-60% without meaningfully impacting output quality for most workflows.

def route_to_model(task_type: str) -> str:
    """Route requests to appropriate model tier based on complexity."""
    routing = {
        # Complex reasoning, code generation — worth the Opus price
        "code_generation":    "claude-opus-4-7",
        "complex_reasoning":  "claude-opus-4-7",
        "multi_step_agent":   "claude-opus-4-7",
        # Moderate complexity — Sonnet is the right call
        "summarization":      "claude-sonnet-4-6",
        "translation":        "claude-sonnet-4-6",
        "draft_review":       "claude-sonnet-4-6",
        # Simple classification and routing — Haiku handles this fine
        "classification":     "claude-haiku-4-5-20251001",
        "routing":            "claude-haiku-4-5-20251001",
        "simple_extraction":  "claude-haiku-4-5-20251001",
    }
    return routing.get(task_type, "claude-sonnet-4-6")

Strategy 2: Prompt caching on repeated context

System prompts, RAG documents, and other repetitive context elements qualify for cache markers that reduce input token costs by 90%. The 5-minute TTL means this works even in high-frequency workflows.

import anthropic
client = anthropic.Anthropic()

# Example input; in practice, whatever the user actually submitted
user_request = "Review this function for correctness and style."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code review specialist...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": user_request}]
)

# Track cache hit rate from usage data
usage = response.usage
hit_rate = usage.cache_read_input_tokens / max(
    usage.cache_creation_input_tokens + usage.cache_read_input_tokens, 1
)
print(f"Cache hit rate: {hit_rate:.1%}")

As I covered in AI agent cost reality analysis, the point where agent costs overtake human labor costs is most often reached by teams running Opus everywhere with no caching. Monitoring cache hit rate is cost visibility step one.

Strategy 3: Batch API for non-realtime workloads

Batch API gives you a flat 50% discount on all costs. The trade-off is results arrive within 24 hours rather than immediately. For document analysis, translations, overnight reports, and classification runs, this is straightforward to adopt.

# documents_to_process: any iterable of document strings queued for analysis
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"task-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": doc}]
            }
        }
        for i, doc in enumerate(documents_to_process)
    ]
)
# Poll the batch status, then stream results with
# client.messages.batches.results(batch.id)

All three strategies combined — model routing, 70-80% cache rate, batch processing for non-realtime tasks — bring most automated agent workloads under the cost of the old $200 flat plan.

Enterprise Billing Got More Complicated

The enterprise account structure changed at the same time. Previously, enterprise contracts allowed some flexibility on usage caps. As of 2026, "you cannot disable billing for usage" is the official policy. The base fee is $20 per seat, with all usage charged at standard API rates on top.

What this means practically: large teams can no longer operate with predictable fixed monthly Claude costs without real-time usage monitoring. Anyone running agents without per-team, per-project usage visibility is flying blind. There are documented cases of agents entering error retry loops that burned through millions of tokens before anyone noticed.

OpenTelemetry-based LLM pipeline observability has moved from nice-to-have to necessary. Token consumption, per-model costs, and cache hit rates need to be instrumented, not guessed at from the console after the month ends.
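
A minimal sketch of that instrumentation with the OpenTelemetry Python SDK. The metric names and attributes are my own conventions, not a standard:

from opentelemetry import metrics

meter = metrics.get_meter("llm.pipeline")
in_tokens = meter.create_counter("llm.input_tokens", unit="token")
out_tokens = meter.create_counter("llm.output_tokens", unit="token")
cache_reads = meter.create_counter("llm.cache_read_tokens", unit="token")

def record_usage(response, model: str, team: str, project: str) -> None:
    """Record per-team, per-project token counts after every API call."""
    attrs = {"model": model, "team": team, "project": project}
    u = response.usage
    in_tokens.add(u.input_tokens, attrs)
    out_tokens.add(u.output_tokens, attrs)
    cache_reads.add(getattr(u, "cache_read_input_tokens", 0) or 0, attrs)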

Is It Time to Switch to a Competing Model?

This is the most practical question, and I'll address it directly.

DeepSeek V4-Flash is priced at roughly one-tenth or less of Claude Sonnet 4.6 on input tokens. MIT licensed, open source. Shows competitive coding benchmark performance. The downsides: data sovereignty concerns given the Chinese company origin, SLA levels not comparable to Anthropic, and potentially less consistent latency.

GPT-5.5 is priced comparably to Claude Opus 4.7 but underperforms on SWE-bench coding benchmarks. There's no strong economic case for switching.

Gemini 3.1 Flash is cost-competitive with solid Google infrastructure reliability. Korean/Japanese language processing has improved but lags Opus 4.7 on complex code reasoning.

My recommendation: keep core agents on Claude API, use Haiku or DeepSeek for high-volume simple tasks. I've already run detailed scenario comparisons in the LLM API pricing comparison post — worth reading before making any switching decisions.

Are Managed Agents Worth Considering?

Anthropic's Managed Agents launched in public beta around the same time, at $0.08/session-hour plus token costs. Let me run those numbers.

Average session of 20 minutes: $0.027 per session. 500 sessions/day × 30 days = 15,000 sessions = $400/month in session fees alone, then token charges on top. In exchange, Anthropic handles state management, tool calls, error recovery, and session restarts.
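
The session-fee arithmetic, for the record:

rate_per_hour = 0.08
per_session = rate_per_hour * 20 / 60   # 20-minute session: ~$0.027
monthly_fees = per_session * 500 * 30   # 15,000 sessions: $400.00
print(f"${per_session:.3f}/session, ${monthly_fees:,.0f}/month before tokens")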

The hidden cost of running your own agent infrastructure — DevOps time, monitoring tooling, error alerting, session state storage — when converted to hourly rates, can make $0.08/hour look reasonable. If your team spends 20-30 hours per month maintaining agent infrastructure, Managed Agents might be cheaper in total cost of ownership.

The limitation is code-level control. You can't replicate OpenClaw's flexibility. I see Managed Agents as the right fit for teams validating agent workflows quickly, not for teams already running complex custom agents. A hybrid approach — Managed Agents for standardized tasks, direct API for business-critical custom agents — is probably where most teams will land.

What to Do Now

Anthropic's decision isn't straightforwardly wrong. Applying a flat subscription to unlimited AI agent usage is structurally unsustainable. That's a fair position.

The handling left a lot to be desired. Insufficient transition time for existing users, and launching the Fast Mode premium tier simultaneously with the subscription restriction is a combination that erodes trust regardless of the underlying economics.

Immediate (this week):

  • If you're running third-party agent frameworks on Claude Max, switch to direct API billing now
  • Audit your codebase for any Fast Mode defaults and turn them off
  • Check actual monthly token consumption in the Anthropic console Usage tab

Short-term (this month):

  • Add a model routing layer: separate Opus/Sonnet/Haiku by task complexity
  • Add cache_control markers to system prompts and repeated context
  • Set up cache hit rate logging, targeting 70%+

Medium-term (next quarter):

  • Migrate non-realtime batch work to the Batch API
  • Test Managed Agents beta against your own infrastructure costs
  • Run competing models (DeepSeek, Gemini Flash) against your actual workloads for simple tasks

The policy shift is disruptive. But it also forces the question that should have been asked earlier: why is every agent task going to Opus, and is that actually the right call? That's not a bad question to answer. The developers who end up in the best position here are the ones who treat this as an infrastructure audit moment rather than a grievance.
