Claude API Concurrent Requests and Rate Limit Handling Guide (2026)

#throughput #performance #python #node

Originally published at claudeguide.io/claude-api-concurrent-requests

Claude API Concurrent Requests and Rate Limit Handling Guide (2026)

The Claude API enforces rate limits by requests per minute (RPM) and tokens per minute (TPM) per tier — running concurrent requests with exponential backoff and a semaphore-based concurrency limit is the correct approach for high-throughput workloads. This guide covers rate limit tiers, parallel request patterns in Python and Node.js, queuing strategies, and the cost math for optimizing throughput without triggering 429 errors.

Understanding Claude API Rate Limits

Anthropic enforces limits at two levels:

Limit Type	What It Controls
Requests per minute (RPM)	How many API calls you can make per minute
Tokens per minute (TPM)	Total input + output tokens per minute
Tokens per day (TPD)	Daily token ceiling (lower tiers)

Rate limit tiers by usage level (approximate — check Anthropic console for your current tier):

Tier	RPM (Sonnet)	TPM (Sonnet)
Free	5	25,000
Build (Tier 1)	50	40,000
Scale (Tier 2)	1,000	80,000
Scale (Tier 3)	2,000	160,000
Scale (Tier 4)	4,000	400,000

Rate limit errors return HTTP 429 with a Retry-After header indicating when to retry.

The Core Pattern: Semaphore + Exponential Backoff

The standard approach for concurrent requests:

Semaphore: Limit max concurrent in-flight requests to stay under RPM
Exponential backoff: When you hit 429, wait and retry with increasing delays
Token budget tracking: Monitor TPM to avoid token-based rate limits

Python Implementation

import anthropic
import asyncio
import time
from typing import Optional

client = anthropic.AsyncAnthropic()

async def call_with_backoff(
    prompt: str,
    semaphore: asyncio.Semaphore,
    max_retries: int = 5
) -

---

## Token-Per-Minute (TPM) Tracking

RPM limits are easy to visualize, but TPM limits often bite first with large prompts. Track token usage per window:

python
import time
from collections import deque

class TokenBudgetTracker:
"""Track tokens used in a rolling 60-second window."""

def __init__(self, tpm_limit: int):
    self.tpm_limit = tpm_limit
    self.usage_window: deque = deque()  # (timestamp, tokens) pairs

def record_usage(self, tokens: int):
    now = time.time()
    self.usage_window.append((now, tokens))
    # Remove entries older than 60 seconds
    while self.usage_window and self.usage_window[0][0] < now - 60:
        self.usage_window.popleft()

def tokens_used_last_minute(self) -