DEV Community

Alex Cloudstar

Posted on • Originally published at alexcloudstar.com

Claude's June 15 Pricing Split: What Indie Devs Actually Need to Do Before the Meter Starts

The first time I noticed how much programmatic Claude usage was hiding inside my $20 Pro subscription was when I added a GitHub Action that ran claude -p on every pull request. The action did three things: summarised the diff, drafted a release note, and flagged any new env vars touched. It worked. I shipped it. I forgot about it.

Three weeks later I had pushed forty PRs and the action had run about a hundred and ten times after retries and reruns. I never saw a bill. Not because the work was free, but because Anthropic was quietly eating the cost. I was paying $20 a month and consuming what would have been, by my own back-of-envelope math, somewhere between $80 and $180 of API usage in that window. The subscription was doing the heavy lifting, and the subscription was a great deal.

That deal ends on June 15, 2026.

Anthropic announced on May 13 that starting on that date, Claude subscriptions split into two billing pools. Interactive use (claude.ai, the terminal Claude Code session you sit in front of, Cowork) stays exactly where it is. Programmatic use (Claude Agent SDK, claude -p, Claude Code GitHub Actions, every third-party harness that talks to your subscription) moves to a new monthly Agent SDK credit pool, denominated in dollars, billed at full API rates. The headline subscription prices do not move. Your $20 plan is still $20. But the bucket your scripts have been drinking from quietly stops being all-you-can-eat.

This post is the practical version of that announcement. What the new pools actually look like, the cost math nobody is publishing, who wins, who loses, and the concrete things to change in your CI and your scripts before the meter flips on.


What Actually Changes On June 15

The shape of the change is simple. Today, when you call Claude from a script, a CI job, or any tool that authenticates through your subscription, that call hits the same rate-limited subscription bucket as a chat in claude.ai. Heavy users hit rate limits, light users never notice them, and nobody pays per token. After June 15, the same call hits a separate dollar-denominated pool that resets monthly and bills every token at standard API rates.

Anthropic's pricing page sketches the new pools roughly like this:

| Plan | Subscription price | New monthly Agent SDK credit |
| --- | --- | --- |
| Free | $0 | none |
| Pro | $20 | $20 |
| Max 5x | $100 | $100 |
| Max 20x | $200 | $200 |
| Team Standard | $25 per seat | about $20 per seat |
| Team Premium | $125 per seat | about $100 per seat |
| Enterprise | custom | custom |

The credit is per user. It does not pool across a team. It does not roll over. Once it is gone, your programmatic calls either stop, or fall through to pay-as-you-go API billing if you have explicitly opted into "extra usage" on the account.

The list of things that draw from the new pool is wider than most people realise. It covers Claude Agent SDK calls, every claude -p invocation, Claude Code running inside GitHub Actions, and third-party harnesses like OpenClaw, Conductor, Jean, Hermes, and Zed's ACP integration. What stays on the existing subscription rate limits is shorter: the claude.ai web, desktop, and mobile apps, the interactive Claude Code terminal session you launch by typing claude and waiting for the prompt, and Cowork.

The dividing line is one question: is there a human reading the response in real time? If yes, subscription. If no, meter.

For the math that follows, the API rates are the same ones the Anthropic console shows: Sonnet 4.6 is $3 per million input tokens and $15 per million output tokens. Opus is roughly five times that. Haiku is roughly five times cheaper.


The Cost Math Nobody Is Showing You

The friendly way to read the new pool is "you get $20 of free API a month bundled with Pro." The honest way to read it is "your $20 used to subsidise way more than $20 of API, and now it does not." Both are true. Which one applies to you depends entirely on how much programmatic usage you actually do.

Three concrete examples make the difference visible.

One claude -p invocation. Suppose you run claude -p "summarise this diff" against a medium PR. The diff is around 8k tokens. The system prompt and tool definitions add another 4k. Claude writes a 600-token summary. That is 12k input and 600 output, so 0.012 * $3 + 0.0006 * $15 = $0.045. Roughly four and a half cents. Cheap. You could run that 440 times before exhausting a $20 credit.
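That arithmetic generalises to any workflow. A small helper (hypothetical, using the Sonnet rates quoted above) makes it easy to sanity-check your own numbers before the meter flips on:

```typescript
// Sonnet 4.6 rates quoted above, in dollars per million tokens
const INPUT_PER_M = 3;
const OUTPUT_PER_M = 15;

// Estimated dollar cost of a single call
export function costOfCall(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PER_M
  );
}

// How many calls of that shape fit inside a monthly credit pool
export function callsPerCredit(
  credit: number,
  inputTokens: number,
  outputTokens: number,
): number {
  return Math.floor(credit / costOfCall(inputTokens, outputTokens));
}
```

Running `costOfCall(12_000, 600)` reproduces the four-and-a-half-cent figure, and `callsPerCredit(20, 12_000, 600)` lands in the mid-400s, matching the back-of-envelope math above.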

That sounds like a lot. It is not. Watch what happens when you put it in CI.

A Claude Code GitHub Action on every PR. Same diff, but the action also calls Claude twice more: once to draft a release note, once to look for risky changes. Three calls per PR, about $0.13 per PR on average once you account for tool use and a longer context window the second time around. Ten PRs a month: $1.30, basically free. Fifty PRs a month, which is normal for a small team that ships daily: $6.50, still fine. Two hundred PRs across a small org or any team that uses PR-per-commit conventions: $26. You blew through your $20 credit on PR review alone and you have not run a single claude -p of your own yet.

A background agent loop. This is where it gets ugly. A loop that runs every ten minutes, reads recent logs, and decides if anything is worth paging you about. Each iteration is roughly 6k input and 400 output, so $0.024 a call. Six calls an hour, 144 a day, around $3.40 a day, $100 a month per loop. You hit the $100 Max credit in 29 days and the $200 Max credit in 58. If the loop crashes and restarts more aggressively, or if you have two of them, or if a single iteration accidentally pulls in 60k of log context because someone's stack trace was long, the numbers move fast.

This is also where the surprise charges live. The story making the rounds in May is the developer who racked up $200.98 in API charges because a commit message contained the string "HERMES.md", got auto-flagged as third-party tool use, and was billed as programmatic instead of interactive. Anthropic eventually reversed it after the screenshot went viral, but the lesson stuck. The classifier is doing more work than people assume, and the line between "I am chatting with Claude" and "Claude is acting on my behalf" is fuzzier than you would hope.

The single most useful thing you can do this week is run the actual numbers for your actual usage. Not what you think you do. What you actually do.


Who Wins, Who Loses

The plain way to read June 15 is that it sorts users into two buckets that used to be the same bucket.

The winners are light scripters. If you run claude -p two or three times a week, you have never come close to your subscription rate limit, and you have never seen the meter because there was no meter. Starting June 15 you get a $20 (or $100, or $200) bucket that you will not exhaust, ever. The pricing change is, for you, a free upgrade. You get explicit limits and a transparent budget you previously did not have. If that is your usage pattern, you can mostly stop reading here.

The losers are anyone running 24/7 automation or shared CI on a single seat. Zed's blog put a hard number on the implicit subsidy that is going away. They wrote that Claude subscriptions "previously subsidised agent usage at roughly 15 to 30 times compared to API pricing." Translate that and you get the real story of June 15: programmatic users were getting a 15x to 30x discount. That discount is gone. The credit pool just makes the new rack rate look a little nicer.

The other group of losers is teams. Credits do not pool across users. If you have three engineers on a Team plan and one of them runs all the CI automation, the other two seats' $20 credits sit unused while seat one runs out on day eleven. The fix is either to refactor automation to use a dedicated API key with its own billing, or to spread the work across seats in a way that makes no engineering sense but makes accounting sense. Both are awkward.

And then there is the truly heavy use case. Ben Hylak, CTO of Raindrop.ai, called the change "either really silly, or shows how bad of a spot Anthropic is in re: GPUs." That second reading is the interesting one. If Anthropic is rationing compute by raising the price of background agent loops while keeping interactive chat unchanged, they are signalling that the long tail of always-on automation has become economically painful to subsidise. That signal matters when you are deciding whether to bet a startup on always-on Claude.


What To Change Before June 15: A Checklist

Most of the work you can do in the next month is small. It is also the difference between waking up on June 16 to a clean monthly bill and waking up to a Slack message from your co-founder asking why the OpenAI invoice has a Claude line on it.

Audit every place you call Claude programmatically. This is the work nobody wants to do. It is also the only one that matters. A rough first pass:

rg -n "anthropic|claude" \
  .github/workflows \
  scripts \
  apps \
  packages

You are looking for every cron, every action, every script, every server route, every background worker. Write the list down on actual paper. For each one, answer two questions: how often does it run, and roughly how many tokens per run. If you do not know, instrument it (see below) and come back next week.

Add token-level logging to every claude -p and Agent SDK call. The Anthropic SDK returns usage in the response. Wrap every call in a tiny logger and dump the result to a file or to your existing observability stack. Concrete pattern:


import Anthropic from '@anthropic-ai/sdk';

// Sonnet 4.6 rates, in dollars per million tokens
const SONNET_INPUT_PER_M = 3;
const SONNET_OUTPUT_PER_M = 15;

// Swap for your real logger (pino, winston, whatever you already run)
const logger = {
  info: (entry: Record<string, unknown>) => console.log(JSON.stringify(entry)),
};

const client = new Anthropic();

type ClaudeCallContext = {
  caller: string;
  workflow?: string;
};

export async function runClaude(
  args: Anthropic.MessageCreateParams,
  context: ClaudeCallContext,
) {
  const start = Date.now();
  const response = await client.messages.create(args);
  const inputTokens = response.usage.input_tokens;
  const outputTokens = response.usage.output_tokens;
  const cost =
    (inputTokens / 1_000_000) * SONNET_INPUT_PER_M +
    (outputTokens / 1_000_000) * SONNET_OUTPUT_PER_M;

  logger.info({
    event: 'claude_call',
    caller: context.caller,
    workflow: context.workflow,
    inputTokens,
    outputTokens,
    cost,
    durationMs: Date.now() - start,
  });

  return response;
}

A week of this data tells you which workflows are cheap, which are scary, and which are blowing your future credit pool in three days. The number that matters is dollars per day per workflow, not tokens per call.
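The rollup itself is a few lines. A hypothetical aggregation over those log entries (assuming you also log a timestamp alongside the fields shown above) gets you to dollars per day per workflow:

```typescript
// Shape of one logged call; timestamp is an assumption on top of the
// fields in the logging example (an ISO date-time string per call)
type ClaudeLogEntry = {
  workflow?: string;
  cost: number;
  timestamp: string;
};

// Dollars per day per workflow, keyed as "<workflow> <YYYY-MM-DD>"
export function dailySpend(entries: ClaudeLogEntry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of entries) {
    const day = e.timestamp.slice(0, 10); // YYYY-MM-DD prefix of the ISO string
    const key = `${e.workflow ?? 'unknown'} ${day}`;
    totals.set(key, (totals.get(key) ?? 0) + e.cost);
  }
  return totals;
}
```

Sort the map by value descending and the scary workflows float straight to the top.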

Set explicit max_tokens everywhere. The single most common way to nuke a credit pool is a runaway response. Set max_tokens to something realistic for the task. A diff summary does not need 4096 tokens. 600 is plenty. A release note generator does not need 8000. 1500 is plenty. The model will respect the cap. Your wallet will thank you.
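One way to enforce this without scattering magic numbers is a per-task cap table. The task names here are made up; the budgets are the ones argued for above:

```typescript
// Output budgets per task; task names are illustrative, adjust for yours
const MAX_TOKENS: Record<string, number> = {
  diff_summary: 600,   // a diff summary does not need 4096
  release_note: 1500,  // a release note does not need 8000
};

// Conservative fallback for tasks nobody has budgeted yet
const DEFAULT_MAX_TOKENS = 1024;

export function maxTokensFor(task: string): number {
  return MAX_TOKENS[task] ?? DEFAULT_MAX_TOKENS;
}
```

Then every call site reads `max_tokens: maxTokensFor('diff_summary')` and a budget change is a one-line diff instead of a grep hunt.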

Cap GitHub Actions runs with concurrency and if guards. The fan-out pattern is the cost-equivalent of leaving a tap running. If your action runs on push, every commit triggers it. If you also run on pull_request, the same commit triggers it twice. Add a concurrency block that cancels in-progress runs for the same PR, and add an if guard that skips drafts:

concurrency:
  group: claude-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  review:
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - uses: anthropic/claude-code-action@v1

That single change cut my own action cost in half because I had been leaving runs queued on every push to a draft branch.

Decide per-workflow: subscription credit, direct API, or off. This is the strategic choice. For each programmatic Claude call you found in step one, pick one of three lanes. Subscription credit is for things that are predictable, run a known number of times per month, and fit comfortably inside your plan's allocation. Direct API is for things that need real budget alerts, team-wide pooling, or volume that exceeds the credit. Off means you switch that workflow to a cheaper model (Haiku, or a non-Claude alternative) or you decide it does not earn its place.

Wire spending alerts on the API key your overflow falls through to. If you enable extra usage so jobs do not just hard-stop, set a budget alert at $50 and another at $200. The Anthropic console supports this. Without it, your June bill arrives as a surprise. With it, you get a Slack message at 50 percent of expected and you can decide whether to keep going or pull the cord.

Pre-cache deterministic prompts. If the same system prompt is going into every call, and it is 4k tokens long, you are paying for that prompt every single time. Anthropic's prompt caching cuts the cost of repeated prefix tokens by roughly 90 percent on hit. Configure it for any workflow where the prefix is stable. A linter that always sends the same rule set is the canonical case.
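As a sketch, the request shape for marking a stable prefix cacheable looks roughly like this. The types below are local stand-ins mirroring the Anthropic Messages API's cache_control block; check the current prompt-caching docs before relying on the exact field names:

```typescript
// Local stand-in for the SDK's system content block shape
type SystemBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};

// Wrap the stable system prompt (the 4k-token rule set that never changes)
// in a block flagged for caching; later calls reusing the same prefix
// should pay the discounted cache-hit rate instead of full input price
export function cachedSystem(stablePrompt: string): SystemBlock[] {
  return [
    {
      type: 'text',
      text: stablePrompt,
      cache_control: { type: 'ephemeral' },
    },
  ];
}
```

The linter case from above would pass `system: cachedSystem(ruleSet)` on every call and only pay full price for the rules once per cache window.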

A lot of this is the same instinct that drove feature flags for solo developers: wrap the expensive thing in a switch you can flip when reality disagrees with your assumptions. The flag does not save you money. The fact that you can turn the workflow off without a deploy does.


Should You Just Skip The Credit Pool And Go Direct API?

For some workflows, yes. The credit pool is convenient. It is also limited in ways that matter at any real scale.

The case for staying on subscription credit is genuine. The credit is bundled with a plan you are already paying for, so the first $20 or $200 of API is sunk cost. The auth flow is the one you already have. You do not need a separate billing relationship. For light, sporadic, per-developer use, it is the simplest option and it just works.

The case for moving a workflow to a direct API key starts when any of three things are true. First, the workflow runs unattended and you need predictable budget alerts. The credit pool does not give you per-workflow caps. A dedicated API key with a budget on the Anthropic console does. Second, the workflow is shared across a team. A single API key with billing on the company card scales better than asking every engineer's $20 personal credit to subsidise CI. Third, the workflow is hot enough to blow through the credit halfway through the month. Once you are out of credit, the difference between "fall through to API" and "use a dedicated API key from the start" is mostly bookkeeping, except the dedicated key is on the company card and has alerts.

If you are debating between credit and direct API and you cannot decide, the tiebreaker is: who pays the bill, and do they want to see the line item? If the answer is "the company does, and accounting wants a single line that says Anthropic API," go direct. If the answer is "I do, and it goes on my personal card alongside the subscription," credit is fine.

There is also the option that nobody likes to talk about, which is dropping to Haiku for any workflow where the quality difference does not matter. Sonnet is the right default for a chat. For a script that classifies log lines or extracts JSON from a fixed template, Haiku is five times cheaper and indistinguishable. Audit which workflows actually need Sonnet output and which inherited Sonnet because that was the default.
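That audit can end in something as dumb and effective as a lookup table. The workflow names and model strings here are placeholders, not real model IDs:

```typescript
// Per-workflow model choice; every entry is a deliberate decision,
// not an inherited default. Names and model strings are illustrative.
const MODEL_FOR: Record<string, string> = {
  log_classifier: 'haiku',  // fixed-template extraction: cheap model is fine
  pr_review: 'sonnet',      // nuanced code judgment: keep the good model
};

// Default cheap; a workflow has to earn its way onto Sonnet
export function modelFor(workflow: string): string {
  return MODEL_FOR[workflow] ?? 'haiku';
}
```

The useful property is the default: new workflows start on the cheap model and have to be explicitly promoted, which is the opposite of how most codebases accrete Sonnet calls.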


A Brief Word On Trust

The June 15 change is the third pricing reversal in three months. On April 4 Anthropic cut off OAuth access for third-party harnesses without warning, breaking every OpenClaw and Conductor setup. On April 22, Claude Code briefly disappeared from the Pro plan for a "test affecting about 2 percent of new prosumer signups," then came back inside 24 hours after Simon Willison surfaced it. And now, on May 13, this. Anthropic is not being malicious. They are figuring out the economics of an explicitly subsidised product in public, in real time, and it shows.

The practical takeaway is not "Claude is bad now." Sonnet is still the model I reach for first when I want code that works. The takeaway is that planning a business around the assumption that today's pricing will be next year's pricing is naive. If a single workflow in your product depends entirely on Claude being cheap, you have a single point of failure. Building a thin abstraction over the model call so you can swap providers (Sonnet, Codex, a local model, whatever) without rewriting business logic is no longer a "nice to have." It is part of indie-dev hygiene now, the same way idempotency in your Stripe webhooks is part of any real billing setup.

The thin abstraction is not exotic. A function called runModel(prompt, options) that picks an SDK based on an env var is enough. The point is that the day Anthropic announces the next change, you can flip a flag and route traffic somewhere else for a week while you decide what to do. That option is the actual safety net. The $20 credit is not.
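A minimal sketch of that switch, with provider handlers injected so nothing here depends on any one SDK (the env var name and registry are assumptions, not an established convention):

```typescript
// One signature for every provider; each handler wraps one SDK
type ModelHandler = (
  prompt: string,
  options?: Record<string, unknown>,
) => Promise<string>;

// Provider registry; populated at startup with whichever SDKs you run
const handlers: Record<string, ModelHandler> = {};

export function registerProvider(name: string, handler: ModelHandler): void {
  handlers[name] = handler;
}

// The thin abstraction: route by env var, so swapping providers is a
// config flip, not a deploy
export async function runModel(
  prompt: string,
  options?: Record<string, unknown>,
): Promise<string> {
  const provider = process.env.MODEL_PROVIDER ?? 'anthropic';
  const handler = handlers[provider];
  if (!handler) {
    throw new Error(`no handler registered for provider "${provider}"`);
  }
  return handler(prompt, options);
}
```

The day the next pricing announcement lands, `MODEL_PROVIDER=fallback` in your environment reroutes every call while you think.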


The Real Lesson

Programmatic Claude was subsidised. That is the truth almost nobody is saying out loud. The subscription price was set when Anthropic wanted users on the platform and was willing to absorb compute cost to get them. That is also a normal stage of any platform. It does not last forever and it never did. The surprise is not that the meter is starting. The surprise is that we treated the un-metered version as the long-term default.

The people who will be fine on June 16 are the people who instrumented their spend before they had to, who know which workflows earn their place, who set caps on the runaway ones, and who have a working fallback if Claude prices double again next quarter. The people who will not be fine are the ones who find out about the change from a billing email.

You have about a month. Run the audit. Add the logging. Cap the workflows. Decide which lane each one belongs in. Wire the alerts. Pre-cache the stable prompts. None of it is hard. All of it is the difference between a clean June and a noisy one.

If you only do one thing this week, do the audit. Everything else flows from knowing where Claude actually lives in your stack.
