fujibee

Posted on Jun 12 • Edited on Jun 15

The claude -p playbook for June 15: rebuilding your AI workflows inside interactive sessions

#ai #claude #agmsg #opensource

On June 15, claude -p (headless mode) and the Agent SDK stop drawing from your Claude subscription and move onto a separate metered credit. I've written a lot of claude -p scripts over the past few months, so my honest first thought was that a chunk of my setup had just moved onto a meter.

But it's not really one company's pricing call. The whole industry has been drifting the same way. Copilot moved to AI Credits on June 1 (completions free; Chat, CLI and agents burn credits). Codex pairs a seat with credits plus API usage. And now Claude splits headless and the SDK off on the 15th. The pattern is the same each time: you stay flat-rate while you're driving, and it meters once you step away.

So for me the question stopped being how to dodge the bill and became which work actually belongs where. A few months of running several agents inside interactive sessions has changed my answer, and the short version is that a lot of my headless scripts never needed to be headless.

One caveat before the recipes, since someone always asks: yes, several of them also keep this work inside the flat-rate plan. Pricing changes, so check the current terms yourself. My case for doing it is that the in-session setup is a better way to work. Cheaper is a side effect.

Why the line landed where it did

If you want the full cost mechanics, The flat-fee era is over and The Tokenpocalypse both cover it. They land on the same point. Old completion-style usage was short output, so flat-rate worked. Agentic tools broke that. One request now hides a flood of tokens (read the repo, grep around, run the tests, patch, and repeat), and the gap between a light user and a heavy one got wide enough, maybe 10x on real token cost, that unlimited-for-everyone stopped penciling out. It's the same shift cloud made going from renting a server to paying for what you use.

Look at where each vendor drew its line and you see the same split. Usage with a human at the screen stays flat. Usage where the human stepped away (headless, SDK, CI) gets metered.

Which makes sense if you think about throughput. A person reads and thinks and types, so one interactive session caps out at human speed. A headless call can run in a loop with nothing to slow it down. So the pricing is partly about what a flat rate can sustain, and partly a signal about which surface the platform wants people on. Either way it points the same way. The interactive session is the stable place to be.

Was your claude -p actually unattended?

Looking back, a lot of what I pushed through claude -p never needed to run on its own. I wanted a second agent's read while I worked, or a review from a different model, or a refactor running in parallel on something else, and in every case I was sitting right there. There just wasn't a channel between sessions, so I had two options: play copy-paste courier myself, or wrap claude -p in a script to bridge them.

This happens a lot while a stack is still coming together: when one piece is missing, whatever sits next to it covers for it. With no way for agents to talk directly, headless calls and glue scripts carried that load. Headless wasn't the wrong tool, just the only channel I had.

Once that piece exists, you can redraw who does what. June 15 isn't only about budgeting for headless credits (some jobs really do need them, more on that below). It's also a chance to move the borrowed work back where it belongs and leave claude -p the jobs that actually have to run headless. If agents can talk inside interactive sessions, you don't need the courier or the bridge script.

The channel I built for that is agmsg. It's bash plus SQLite and nothing else. Claude Code, Codex, Gemini CLI and Copilot CLI sessions join a team and message each other, with no daemon, no network, and no MCP. Sending and receiving both happen inside your normal interactive sessions through a hook, so there's no claude -p or SDK anywhere in it.

https://github.com/fujibee/agmsg

Here's how the common claude -p patterns look once you rebuild them.

Recipe 1: a script that asks an AI → a resident buddy session

A code review or a quick consult, fired off to headless from inside a script. The script isn't interactive, but the AI work inside it is really just a conversation.

#!/usr/bin/env bash
# review-diff.sh: runs in the dev loop
set -euo pipefail
diff=$(git diff --staged)
# headless call, metered after June 15
review=$(claude -p \
  --print "Review this diff for race conditions and SQL injection. Return JSON \
  {\"verdict\":\"ok|block\",\"notes\":\"...\"}." \
  <<<"$diff")
echo "$review" | jq -e '.verdict == "ok"' >/dev/null || {
  echo "$review" | jq -r '.notes' >&2
  exit 1
}

Instead, open a second Claude Code in another terminal and keep it on the team in real-time monitor mode. The reviewer is just a regular interactive session, covered by your subscription.

One-time setup in terminal 2 (the reviewer), just this inside the session:

/agmsg
# → it asks for team + agent name: answer team: dev / agent: alice
# → pick monitor (real-time) when asked how to receive

# from now on, anything addressed to alice streams into this window live

The script side, terminal 1, where claude -p used to sit:

#!/usr/bin/env bash
# review-diff.sh: agmsg version
set -euo pipefail
TEAM=dev
FROM=worker     # this script's identity
TO=alice        # the resident reviewer session
# hand the review to the live session; body is plain text,
# pass a reference and a short ask, not raw context
~/.agents/skills/agmsg/scripts/send.sh "$TEAM" "$FROM" "$TO" \
  "Please review the staged diff at $(pwd). Reply with verdict (ok|block) + notes."
echo "Waiting for $TO's verdict..."
while true; do
  reply=$(~/.agents/skills/agmsg/scripts/inbox.sh "$TEAM" "$FROM" 2>/dev/null)
  case "$reply" in
    *"verdict: ok"*)    echo "ok"; exit 0 ;;
    *"verdict: block"*) echo "$reply" >&2; exit 1 ;;
  esac
  sleep 5
done

The reviewer reads the diff, replies with /agmsg send worker "verdict: ok ...", and the script picks up where it left off. Nothing called claude -p.

The win is that the buddy stays open instead of being a headless process that vanishes after each call. Context sticks around, so by the second ask, telling it to use the same approach as before just works. It beats a headless call you had to brief from scratch every time.

It also differs from subagents in a way worth knowing. A subagent inherits the parent's context, so even when you ask it to review something independently, the answer tends to land inside the parent's framing. A separate session starts clean, so the review is genuinely independent. One person moved to agmsg for exactly this, in a data-analysis setup (I'm collecting use cases, there's more there).

Recipe 2: an orchestration script → a director agent

A homegrown orchestration script that fans out to a few models in parallel and merges what comes back. Credits drain by the number of workers.

#!/usr/bin/env bash
# judge-panel.sh: three independent reviews → synthesis
set -euo pipefail
problem="$1"
# three parallel headless calls, each one metered
analysis_a=$(claude -p "$problem" --print) &
analysis_b=$(claude -p "$problem (focus on security)" --print) &
analysis_c=$(codex exec --headless --prompt "$problem (focus on perf)" --json) &
wait
# synthesis, yet another call
claude -p --print "Synthesize these three reviews into one verdict: ..."

Instead, stand up one director session and drop the workers (mix models and tools freely) onto the same team. The orchestration logic turns into the director's instructions; fan-out, collection and synthesis all ride on agmsg messages.

Setup is one /agmsg per session: the three workers (each in its own terminal, window or IDE pane) and the director (the one you actually work in):

# inside each session
/agmsg
# → answer team: panel / agent: reviewer-a (b, c, director), pick monitor

Then you just talk to the director:

You (in the director session):
  "Send reviewer-a a correctness review, reviewer-b a security review,
   reviewer-c a performance review, of the staged diff.
   Wait for all three, then synthesize."

director: [sends three agmsg messages, polls inbox, and once all three
           land, writes the synthesis in this same session]

The control flow you used to script is now plain English. No claude -p.

A real one: on Fable 5's release day someone ran a refactor with Fable 5 as the commander, Opus 4.8 as deputy and Sonnet 4.6 as scouts. The day a model ships you can drop it straight into the lineup you already run. That's the point of the transport not caring which model is behind each seat.

Recipe 3: a background job → a dedicated worker session

Something you want chugging along next to your own work (test scaffolding, a doc update, some research) that you used to kick off with cron or a script plus claude -p.

Instead, in the morning open one more terminal, start a worker session and leave it on the team (inside the session: /agmsg → team: dev / agent: worker / pick monitor). When something comes up, message it from your main session. It picks up each inbound and reports back when it's done.

Kickoff, from your main session:

You: send worker: "Analyze /tmp/deploy.log for slow queries. Write findings
to /tmp/deploy-findings.md. Ping me when done."

It starts on its next monitor event, does the work, writes the file, and comes back:

worker → me: "deploy-findings.md ready. 3 queries > 500ms, all on the
users table. Top one is the email_lower lookup, index missing."

No orphaned processes the way a backgrounded headless job leaves them. Close the worker terminal and the work stops, which is the upside of running it where you can see it. Dispatch and progress stay automatic; only the execution moved back into the same room as you. When something breaks it's right there in the next terminal, not a cron job that failed quietly at 3am.

Recipe 4: a tool-to-tool pipe → a message

A pipe or a temp file plus glue to hand Claude's output to Codex, or the other way around.

Instead, put both sessions on the same team and let them talk.

This one runs here every day, so here's a real case. Last week agmsg got a Linux-only bug report (issue #95). The Claude Code that wrote the fix and a separate session that verified it in a Debian container ran the whole exchange over agmsg messages: report, environment details, verification, merge note. It went from report to merge in six hours (PR #97). Both were ordinary interactive sessions, with no glue anywhere.

Minimum setup:

# Terminal A: Claude Code, joined as "analyst"
#   /agmsg → team: dev / agent: analyst
# Terminal B: Codex, joined as "implementer"
#   $agmsg → team: dev / agent: implementer

And then it's just talking:

# Terminal A (analyst):
You: "send implementer: the settings writer embeds settings.local.json into
     the sqlite3 argv 6x (issue #95). readfile() in the SQL should keep it
     off argv entirely. Patch + tests please."

# Terminal B (implementer): receives the message, writes patch + tests, runs bats:
You: "send analyst: branch fix/delivery-e2big pushed. 3 bats cases at a 25KB
     fixture, all green locally. Ready for review."

# Terminal A: reads the PR, approves inline, then:
You: "send implementer: approving. one ask: verify on Linux too; the pre-fix
     code should fail there with E2BIG."

Each hand-off is a sentence and a reference. The receiver does the heavy lifting in its own session against the live filesystem. Nothing gets piped.

One exception: running the same prompt across several models in parallel for independent takes is still better as a headless pipe. That's a spend-the-credits call, covered in the next section.

When to keep it headless

None of this means replacing everything. Some work is genuinely right for headless, and worth budgeting the new credits for:

CI pipelines, running where no human is by definition. The GitHub Actions integration is on the new credit too.
Nightly batches and scheduled jobs that are genuinely meant to run unattended.
SDK usage baked into a product, which should be costed as metered from the start.
One-shot massive parallelism, like the same input across 50 calls, which is a credit-budget decision rather than a workflow one.

The test I use is just: when it's running, are you there? If yes, it can move into an interactive session. If no, budget the headless credits and move on.

Why I'd do it this way even without the price change

I read June 15 as the platform saying development with a human in the loop gets the structural advantage now.

But the reason I'd keep the interaction and automate only the coordination isn't really the flat-rate bucket. After a few months of it, what I keep noticing is that work where the judgment stays with a human is faster in the end. The coordination runs itself, and you get pulled in at the branch points and the final call. No copy-paste relay, no 3am cron log to read backward.

The pricing change just happened to point the same way.

agmsg is open source, and setup runs about 30 seconds if you've got bash and sqlite3.
https://github.com/fujibee/agmsg

DEV Community