Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude Batch API: 50% Discount for Async Workloads (2026 Guide)

Originally published at claudeguide.io/claude-batch-api-guide

The Claude Message Batches API gives you a flat **50% discount on input and output tokens for any request that can wait up to 24 hours for a response.** You don't switch models: you run the same claude-3-5-haiku or claude-3-5-sonnet and submit work in bulk instead of waiting for each response inline.

This is one of three levers in the Claude API cost reduction stack. Prompt caching handles repeated context, model tiering handles request routing, and the Batch API handles the async workloads you're probably already running synchronously.

What qualifies for Batch API

The core constraint: you submit requests, then poll for results. There is no streaming response and no latency guarantee, so nothing that needs an answer within seconds belongs here.

Workloads that fit naturally:

  • Nightly report generation — summarize usage logs, generate weekly digests, create reports across user accounts
  • Bulk document processing — extract structure from uploaded PDFs, classify support tickets, tag content
  • Bulk classification: categorize a product catalog, run sentiment analysis on customer feedback (the Batch API wraps the Messages API, so it does not produce embeddings)
  • Evaluation runs — test prompt changes against a benchmark dataset
  • Data transformation pipelines — reformat, clean, or enrich data at scale

Workloads that don't fit:

  • Anything where the user is waiting for a response (chat, autocomplete, search)
  • Real-time classification in a request/response flow
  • Any pipeline where step N depends on step N-1's result in under a minute
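
The two lists above boil down to a simple test: is anyone blocked on the answer, and can the caller tolerate the full 24-hour window? A minimal sketch of that check follows. The `Workload` type, its field names, and `fits_batch` are hypothetical helpers for illustration, not part of any SDK.

```python
from dataclasses import dataclass

# The batch window described in this guide: results may take up to 24 hours.
BATCH_WINDOW_SECONDS = 24 * 60 * 60


@dataclass
class Workload:
    """Hypothetical description of a job you are considering for batching."""
    name: str
    user_is_waiting: bool   # a person or request handler is blocked on the result
    deadline_seconds: int   # how long the caller can wait for the result


def fits_batch(workload: Workload) -> bool:
    """Return True when the workload can tolerate the full batch window."""
    if workload.user_is_waiting:
        return False
    return workload.deadline_seconds >= BATCH_WINDOW_SECONDS


nightly = Workload("nightly-report", user_is_waiting=False, deadline_seconds=36 * 60 * 60)
chat = Workload("support-chat", user_is_waiting=True, deadline_seconds=5)
print(fits_batch(nightly), fits_batch(chat))
```

Many batches finish far sooner than the window, but planning around the worst case keeps a slow batch from breaking a downstream step.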

Pricing: the actual numbers

For claude-3-5-sonnet as of April 2026:

| Mode | Input | Output |
|---|---|---|
| Standard | $3.00 / 1M tokens | $15.00 / 1M tokens |
| Batch API | $1.50 / 1M tokens | $7.50 / 1M tokens |

The 50% discount applies equally to input and output tokens. Prompt caching discounts stack with it: if a batch request reuses a cached prefix, the cache pricing and the batch discount combine. Cache hits inside a batch are best-effort, because requests are processed concurrently and in no guaranteed order.
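
To see what the discount means for a real job, here is a rough cost estimate using the claude-3-5-sonnet prices from the table above. The workload size (10,000 requests at 2,000 input and 500 output tokens each) is a made-up example, and `estimate_cost` is a hypothetical helper. Prompt caching is ignored.

```python
# Per-million-token prices for claude-3-5-sonnet, from the pricing table above.
STANDARD_PRICE = {"input": 3.00, "output": 15.00}
BATCH_DISCOUNT = 0.50  # flat 50% off both input and output tokens


def estimate_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Return the estimated cost in USD for a workload, rounded to cents."""
    multiplier = (1 - BATCH_DISCOUNT) if batch else 1.0
    input_cost = input_tokens / 1_000_000 * STANDARD_PRICE["input"]
    output_cost = output_tokens / 1_000_000 * STANDARD_PRICE["output"]
    return round((input_cost + output_cost) * multiplier, 2)


# Hypothetical nightly job: 10,000 requests, 2,000 input + 500 output tokens each.
standard = estimate_cost(10_000 * 2_000, 10_000 * 500)
batched = estimate_cost(10_000 * 2_000, 10_000 * 500, batch=True)
print(f"standard=${standard:.2f} batch=${batched:.2f}")
# prints: standard=$135.00 batch=$67.50
```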

Implementation

The Batch API is a separate endpoint. You submit a list of requests as a single batch, get a batch ID back, and poll until all requests complete.

Submit a batch

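```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder input: maps each user ID to that user's activity text.
# Replace this with your own data source.
user_activity_map = {
    "u-1001": "12 logins, 3 reports exported",
    "u-1002": "2 logins, 1 support ticket opened",
}

# Prepare your requests; each custom_id must be unique within the batch
requests = [
    {
        "custom_id": f"report-{user_id}",
        "params": {
            "model": "claude-3-5-haiku-20241022",
            "max_tokens": 1024,
            "messages": [
                {
                    "role": "user",
                    "content": f"Summarize this week's activity for user {user_id}: {activity_data}"
                }
            ]
        }
    }
    for user_id, activity_data in user_activity_map.items()
]

# Submit the batch
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Request count: {batch.request_counts.processing}")
```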
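
A single batch has a size cap. The sketch below assumes a limit of 100,000 requests per batch (there is also a total payload size limit, which this does not check, so verify both against the current Message Batches documentation). `chunk_requests` and `submit_in_chunks` are hypothetical helpers. `client` is expected to be an `anthropic.Anthropic()` instance.

```python
from typing import Iterator

# Assumed per-batch request limit; confirm against the current documentation.
MAX_REQUESTS_PER_BATCH = 100_000


def chunk_requests(requests: list[dict], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive slices of `requests`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


def submit_in_chunks(client, requests: list[dict], size: int = MAX_REQUESTS_PER_BATCH) -> list[str]:
    """Submit one batch per chunk and return the batch IDs in submission order."""
    ids = [request["custom_id"] for request in requests]
    if len(ids) != len(set(ids)):
        # Unique IDs let you match results back to inputs across all chunks.
        raise ValueError("custom_id values must be unique")
    return [
        client.messages.batches.create(requests=chunk).id
        for chunk in chunk_requests(requests, size)
    ]
```

Store the returned batch IDs somewhere durable, since you need them to poll and fetch results later.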

Poll for completion

```python
import time

def wait_for_batch(client, batch_id, poll_interval=60):
    """Poll until the batch finishes processing. Returns the batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)

        # "ended" means every request has succeeded, errored, been canceled, or expired
        if batch.processing_status == "ended":
            return batch

        counts = batch.request_counts
        # Count every terminal state, not just succeeded and errored
        done = counts.succeeded + counts.errored + counts.canceled + counts.expired
        total = done + counts.processing
        print(f"Progress: {done}/{total}")
        time.sleep(poll_interval)

batch = wait_for_batch(client, batch.id)
```
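
A `while True` loop with no exit can hang a cron job forever if something goes wrong upstream. One possible variant adds a deadline. This is a sketch, not the SDK's own helper: `wait_for_batch_with_deadline` and its `sleep` and `clock` parameters are hypothetical, and the 25-hour default is an arbitrary margin over the 24-hour window.

```python
import time


def wait_for_batch_with_deadline(client, batch_id, poll_interval=60.0,
                                 timeout=25 * 60 * 60,
                                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the batch ends, or raise TimeoutError after `timeout` seconds.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"Batch {batch_id} still processing after {timeout}s")
        sleep(poll_interval)
```

Using `time.monotonic` rather than wall-clock time keeps the deadline stable if the system clock is adjusted mid-run.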

Retrieve results

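```python
# Stream results; don't load everything into memory at once.
# Results can arrive in any order, so match them by custom_id.
for result in client.messages.batches.results(batch.id):
    custom_id = result.custom_id

    if result.result.type == "succeeded":
        message = result.result.message
        content = message.content[0].text
        # Process content...
    elif result.result.type == "errored":
        error = result.result.error
        print(f"Request {custom_id} failed: {error}")
    else:
        # "canceled" or "expired": the request never ran to completion
        print(f"Request {custom_id} did not complete: {result.result.type}")
```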
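
Because results are not guaranteed to come back in submission order, it helps to key everything by `custom_id` and keep the failures separate so you can resubmit them. A minimal sketch, assuming each result entry has the shape used in the loop above. `collect_results` is a hypothetical helper, and it only joins text blocks.

```python
def collect_results(results):
    """Split batch results into successful text and failures, keyed by custom_id.

    Returns (succeeded, failed), where `succeeded` maps custom_id to the response
    text and `failed` maps custom_id to the result type ("errored", "canceled",
    or "expired").
    """
    succeeded = {}
    failed = {}
    for entry in results:
        outcome = entry.result
        if outcome.type == "succeeded":
            # A message can hold several content blocks; keep only the text ones.
            succeeded[entry.custom_id] = "".join(
                block.text for block in outcome.message.content if block.type == "text"
            )
        else:
            failed[entry.custom_id] = outcome.type
    return succeeded, failed


# Usage, with `client` and `batch` as in the snippets above:
# succeeded, failed = collect_results(client.messages.batches.results(batch.id))
```

This version holds all successful text in memory, which trades away the streaming benefit. For very large batches, write each result to storage inside the loop instead.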

TypeScript/Node.js version


```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Submit
const batch = await client.messages.batches.create({
  requests: userIds.map((userId) =
```

