DEV Community

Atlas Whoff

The `claude -p` Subprocess Pattern: How To Drive Claude From Any Python Pipeline Without An API Key

If you've paid for Claude Max or Claude Pro and then asked yourself "wait, do I ALSO have to pay for API credits to use Claude in my own Python scripts?", the answer is no. The claude CLI that ships with Claude Code has a -p (print) flag that lets you drive Claude from any subprocess. It uses your Max subscription. It doesn't need ANTHROPIC_API_KEY. It can return structured output. And it's the single most useful piece of infrastructure I've wired into my agent stack this year.

But there are three gotchas that will make you waste hours if you don't know them upfront. Here's the pattern, end to end.

The basic pattern

import subprocess
import json

result = subprocess.run(
    [
        "claude", "-p",
        "--model", "sonnet",
        "--output-format", "json",
        "--no-session-persistence",
        "--dangerously-skip-permissions",
        "Your prompt here",
    ],
    capture_output=True,
    text=True,
    timeout=180,
)

outer = json.loads(result.stdout)
inner_text = outer.get("result", "")

That's it. The --output-format json flag wraps Claude's response in an envelope that looks like this:

{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "duration_ms": 6862,
  "num_turns": 2,
  "result": "...Claude's actual response here...",
  "total_cost_usd": 0.114,
  "usage": { "input_tokens": 6, "output_tokens": 206, ... }
}

Your actual answer lives in outer["result"]. If Claude returned JSON (because you asked for it in the prompt), you parse that string as a second JSON layer.
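The two-layer parse looks like this in isolation. The envelope string here is a hand-built stand-in with abbreviated fields, assuming Claude returned JSON in `result`:

```python
import json

# Hand-built stand-in for a real claude -p envelope (fields abbreviated)
envelope = '{"type": "result", "is_error": false, "result": "{\\"score\\": 4, \\"topic\\": \\"testing\\"}"}'

outer = json.loads(envelope)         # first layer: the CLI's envelope
inner = json.loads(outer["result"])  # second layer: Claude's own JSON, shipped as a string
```

Forgetting the second `json.loads` is the classic mistake: `outer["result"]` is always a string, even when its contents are JSON.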

Why this matters

  • No separate billing. If you already pay for Claude Max, subprocess calls count against your subscription quota, not a separate API balance.
  • Vision works. Reference image paths directly in the prompt and Claude will Read them via its tool access. No base64 encoding, no multipart uploads.
  • Tool use works. Claude in -p mode still has access to its full tool set (Read, Bash, Grep, etc.) unless you restrict with --disallowed-tools. That means you can point it at a local directory and ask it to analyze files.
  • Structured output works. You can force JSON output via the prompt alone (or via --json-schema when the schema is small).
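To make the vision point concrete, here's a minimal sketch that builds the argv for an image-analysis call. The frame filenames and prompt wording are illustrative; the flags are the same ones used throughout this post:

```python
from pathlib import Path

def vision_argv(image_paths: list[Path], question: str) -> list[str]:
    """Build a claude -p argv; Claude Reads the image files itself via tool access."""
    paths = ", ".join(str(p) for p in image_paths)
    return [
        "claude", "-p",
        "--model", "sonnet",
        "--output-format", "json",
        "--no-session-persistence",
        "--dangerously-skip-permissions",
        f"Read these images: {paths}. {question}",
    ]
```

No base64, no upload step: the paths go straight into the prompt text and Claude's Read tool does the rest.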

Gotcha #1: --json-schema is a trap for complex schemas

My first instinct was to pass a full JSON schema via --json-schema '{...}'. For simple schemas this works fine. But for anything with more than ~5 required fields, Claude's response empties out — you get "result": "" in the envelope and no error. It's spending all its tool-use turns trying to validate against the schema and running out of output budget before it can respond.

The fix: drop --json-schema and put the schema INLINE in the prompt as a template. Claude is dramatically better at filling in a visible template than matching an external schema silently.

prompt = (
    "Analyze these images and return ONLY this JSON (integers for scores 1-5): "
    '{"topic":"...","core_idea":"one sentence",'
    '"dashboard_relevance":3,"virality_score":3,"claude_code_score":3,'
    '"monetization_score":3,"takeaway":"...","priority_dimension":"..."}'
)

This is more reliable than --json-schema for any schema with 10+ fields. Claude fills the template because it can see it.
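Because the template is visible JSON, you can validate the filled-in copy on your side before trusting it. A sketch, using the field names from the template above; the coercion rules (accepting "3" as well as 3) are my own assumption:

```python
import json

REQUIRED = {"topic", "core_idea", "dashboard_relevance", "virality_score",
            "claude_code_score", "monetization_score", "takeaway", "priority_dimension"}
SCORE_FIELDS = {"dashboard_relevance", "virality_score",
                "claude_code_score", "monetization_score"}

def validate_filled_template(raw: str) -> dict:
    """Parse Claude's filled template and enforce the 1-5 integer scores."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field in SCORE_FIELDS:
        score = int(data[field])  # coerce "4" -> 4 if Claude returned a string
        if not 1 <= score <= 5:
            raise ValueError(f"{field} out of range: {score}")
        data[field] = score
    return data
```

A failed validation here is a prompt problem, not a transport problem, which is exactly the distinction you want when debugging a batch run.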

Gotcha #2: --no-session-persistence is not optional at scale

Without --no-session-persistence, every claude -p call saves a session to disk. If you're running 63 calls in a row (like I did for analyzing reel frames), you end up with 63 stray sessions cluttering your ~/.claude/ directory and each one burns extra tokens on session restoration.

Always include --no-session-persistence when you're treating -p as a one-shot subprocess. Sessions are for interactive use, not batch work.

Gotcha #3: Cache creation dominates your cost, not output tokens

Each claude -p invocation spawns a fresh Claude Code session that loads the full system prompt + tool definitions + any CLAUDE.md in the current directory + MCP configs. That's ~27,000 tokens of cache creation EVERY CALL. At Sonnet 4.6 prices (~$3.75 per million cache creation tokens), that's $0.10 of pure session overhead per call — before you've even generated any real output.

A 63-call batch I ran yesterday cost $7.50 total. Here's the approximate cost breakdown:

Cache creation (63 × 27k × $3.75/Mtok)  = $6.40
Image processing (4 frames × 63 calls)  = $0.80
Output tokens (800 × 63 × $15/Mtok)     = $0.76
Total                                    ≈ $7.96

80% of your cost is session startup overhead. This has two implications:

  1. Batch aggressively when possible. If you can process 10 items in one claude -p call instead of 10 separate calls, you pay cache creation once instead of ten times — roughly $0.90 saved per 10 items at the ~$0.10/call overhead above.
  2. Use --bare mode for pure API-like calls. --bare skips hook loading, LSP, plugin sync, and CLAUDE.md auto-discovery, so cache creation drops dramatically. BUT: --bare requires ANTHROPIC_API_KEY (it bypasses the Max subscription auth flow), so it only helps if you have both.
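As a sanity check on the batching math, here's the overhead arithmetic from the breakdown above; the token count and price are the post's own estimates:

```python
CACHE_TOKENS_PER_CALL = 27_000  # fresh-session load per invocation
CACHE_PRICE_PER_MTOK = 3.75     # Sonnet cache-creation $/million tokens

def startup_cost(n_calls: int) -> float:
    """Dollars spent on session startup alone, before any useful output."""
    return n_calls * CACHE_TOKENS_PER_CALL * CACHE_PRICE_PER_MTOK / 1_000_000

ten_separate = startup_cost(10)  # ten one-item calls
one_batched = startup_cost(1)    # one ten-item call
savings = ten_separate - one_batched
```

Running `startup_cost(63)` reproduces the ~$6.40 cache-creation line from the 63-call batch.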

A complete batch pattern

Here's the pattern I use for analyzing a folder of items (files, images, frames, whatever):

import subprocess
import json
import re
import time
from pathlib import Path

def analyze_item(item_paths: list[Path]) -> dict:
    paths_str = ", ".join(str(p) for p in item_paths)
    prompt = (
        f"Read these files: {paths_str}\n\n"
        "Return ONLY this JSON (no preamble, no markdown fences):\n"
        '{"topic":"string","score":3,"takeaway":"one sentence"}'
    )

    try:
        result = subprocess.run(
            [
                "claude", "-p",
                "--model", "sonnet",
                "--output-format", "json",
                "--no-session-persistence",
                "--dangerously-skip-permissions",
                prompt,
            ],
            capture_output=True,
            text=True,
            timeout=180,
        )
    except subprocess.TimeoutExpired:
        return {"error": "timeout"}

    if result.returncode != 0:
        return {"error": f"exit {result.returncode}: {result.stderr[:300]}"}

    try:
        outer = json.loads(result.stdout)
        inner_text = outer.get("result", "")
        inner_text = re.sub(r'^```(?:json)?\s*|\s*```$', '', inner_text.strip(), flags=re.MULTILINE)
        return json.loads(inner_text)
    except Exception as e:
        return {"error": f"parse: {e}", "raw": result.stdout[:500]}


for item_paths in items_to_process:  # each item is a list of Path objects
    result = analyze_item(item_paths)
    if "error" in result:
        print(f"FAIL: {result['error']}")
    else:
        process_result(result)

Key points:

  • Always catch TimeoutExpired. claude -p calls can hang if the model gets stuck in a tool-use loop. A 180s timeout is generous.
  • Strip code fences defensively. Even when you say "no markdown fences" in the prompt, Claude sometimes wraps the JSON in backticks anyway. The regex strips them before parsing.
  • Check returncode separately from parsing. The exit code tells you if the subprocess itself failed (auth, timeout, crash). The JSON parse tells you if Claude's response was malformed.
  • Don't retry on parse errors. Retries waste more cache creation tokens. If Claude gave you malformed output once, fix the prompt, don't retry.
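The defensive fence-stripping is worth seeing on its own. Even with "no markdown fences" in the prompt, responses sometimes arrive fenced, so the strip runs unconditionally:

```python
import re

def strip_fences(text: str) -> str:
    """Remove leading/trailing markdown code fences before JSON parsing."""
    return re.sub(r'^```(?:json)?\s*|\s*```$', '', text.strip(), flags=re.MULTILINE)
```

It's a no-op on clean JSON, so there's no need to detect fences first.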

When to use this vs. the API

Use claude -p subprocess when:

  • You already pay for Max/Pro and want subscription-backed compute
  • You're building personal automation that runs on your own machine
  • You need Claude's full tool set (Read, Bash, Grep) in-process
  • You want to experiment without provisioning API credits

Use the API when:

  • You're deploying to a server where the Claude CLI isn't installed
  • You're building a multi-tenant service where each user pays for their own usage
  • You need fine-grained rate control that goes beyond what the CLI exposes
  • You want to use --bare mode for minimum cache overhead

For me, subprocess is the right call 90% of the time. My agent stack runs on a Mac mini that's always on, claude -p is installed, and everything I build is personal automation for my own business. The subscription-backed path lets me iterate without watching a meter tick.

TL;DR

  • claude -p --output-format json gives you structured output without an API key
  • Drop --json-schema for schemas bigger than ~5 fields — use inline templates instead
  • Always include --no-session-persistence for batch work
  • 80% of your cost is session startup; batch when you can
  • Parse the outer envelope, then parse the inner result as a second JSON layer

This pattern has become the default in my agent infrastructure. If you're sitting on an unused Claude Max subscription and building your own automation on the side, you're leaving a lot of leverage on the table by not using it.
