Originally published at claudeguide.io/claude-api-concurrent-requests
# Claude API Concurrent Requests and Rate Limit Handling Guide (2026)
The Claude API enforces rate limits per usage tier, measured in requests per minute (RPM) and tokens per minute (TPM). For high-throughput workloads, a reliable approach is to run requests concurrently behind a semaphore-based concurrency cap and to retry 429 responses with exponential backoff. This guide covers rate limit tiers, parallel request patterns in Python and Node.js, queuing strategies, and the cost math for optimizing throughput without triggering 429 errors.
## Understanding Claude API Rate Limits
Anthropic enforces limits along three dimensions:
| Limit Type | What It Controls |
|---|---|
| Requests per minute (RPM) | How many API calls you can make per minute |
| Tokens per minute (TPM) | Total input + output tokens per minute |
| Tokens per day (TPD) | Daily token ceiling (lower tiers) |
Rate limits by usage tier (approximate; check the Anthropic Console for the limits that currently apply to your account):
| Tier | RPM (Sonnet) | TPM (Sonnet) |
|---|---|---|
| Free | 5 | 25,000 |
| Build (Tier 1) | 50 | 40,000 |
| Scale (Tier 2) | 1,000 | 80,000 |
| Scale (Tier 3) | 2,000 | 160,000 |
| Scale (Tier 4) | 4,000 | 400,000 |
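To see which limit binds first, divide the TPM limit by the average tokens per request and compare the result with the RPM limit. The sketch below does this for the Tier 1 row of the table. The function names and the per-request token figures are illustrative assumptions, not measurements.

```python
def effective_requests_per_minute(
    rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int
) -> float:
    """Request rate allowed once both the RPM and TPM limits are applied."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # The token budget supports at most this many requests per minute.
    token_bound = tpm_limit / avg_tokens_per_request
    return float(min(rpm_limit, token_bound))


def binding_limit(rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int) -> str:
    """Name the limit that is reached first at a steady request rate."""
    return "TPM" if tpm_limit / avg_tokens_per_request < rpm_limit else "RPM"


# Tier 1 row from the table: 50 RPM, 40,000 TPM
print(effective_requests_per_minute(50, 40_000, 500))    # 50.0 (RPM-bound)
print(effective_requests_per_minute(50, 40_000, 2_000))  # 20.0 (TPM-bound)
```

With 2,000-token requests, the token budget allows only 20 requests per minute, well below the 50 RPM ceiling.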
Rate limit errors return HTTP 429 with a Retry-After header indicating when to retry.
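A minimal sketch of turning a Retry-After value into a delay in seconds, assuming the header may carry either a number of seconds or an HTTP date (the two forms the HTTP specification allows). The helper name `retry_after_seconds` is hypothetical, and the code uses only the standard library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Convert a Retry-After value into a non-negative delay in seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if header_value is None:
        return None
    value = header_value.strip()
    try:
        # Delay form, e.g. "30"
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat dates without an offset as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```

When the function returns None, falling back to a computed backoff delay is a reasonable default.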
## The Core Pattern: Semaphore + Exponential Backoff
The standard approach for concurrent requests:
- Semaphore: Limit max concurrent in-flight requests to stay under RPM
- Exponential backoff: When you hit 429, wait and retry with increasing delays
- Token budget tracking: Monitor TPM to avoid token-based rate limits
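The "increasing delays" in the backoff step can be sketched as a small function. This version doubles the delay on each attempt, caps it, and optionally applies full jitter so that clients that failed together do not all retry at the same moment. The name `backoff_delay` and the default base and cap values are assumptions chosen for illustration.

```python
import random


def backoff_delay(
    attempt: int, base: float = 1.0, cap: float = 60.0, jitter: bool = True
) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Double the delay on each attempt, but never exceed the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random delay between 0 and the ceiling.
    return random.uniform(0.0, ceiling) if jitter else ceiling


print([backoff_delay(n, jitter=False) for n in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```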
### Python Implementation
```python
import anthropic
import asyncio
import time
from typing import Optional

client = anthropic.AsyncAnthropic()

async def call_with_backoff(
    prompt: str,
    semaphore: asyncio.Semaphore,
    max_retries: int = 5
) -
```
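The listing above stops at the function signature. The following is one way such a function could look, written as a sketch and not as the original implementation. To keep it runnable without an API key, the request itself is passed in as a hypothetical `send` coroutine, and `RateLimited` is a stand-in exception. With the Python SDK, `send` would wrap `client.messages.create(...)` and the exception to catch would be `anthropic.RateLimitError`. The `run_all` helper and the `base_delay` parameter are also assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's HTTP 429 error."""


async def call_with_backoff(
    prompt: str,
    send: Callable[[str], Awaitable[str]],
    semaphore: asyncio.Semaphore,
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> str:
    """Send one prompt, retrying rate-limited attempts with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            # Hold a semaphore slot only while the request is in flight.
            async with semaphore:
                return await send(prompt)
        except RateLimited:
            if attempt == max_retries:
                raise
            # Sleep outside the semaphore so other requests can use the slot.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


async def run_all(
    prompts: List[str],
    send: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
    base_delay: float = 1.0,
) -> List[str]:
    """Run all prompts concurrently with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [
        call_with_backoff(p, send, semaphore, base_delay=base_delay)
        for p in prompts
    ]
    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*tasks)
```

Releasing the semaphore before sleeping is a deliberate choice: a request that is waiting out a backoff does not block a slot that another request could use.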
---
## Token-Per-Minute (TPM) Tracking
RPM limits are easy to visualize, but TPM limits often bite first with large prompts. Track token usage per window:
```python
import time
from collections import deque

class TokenBudgetTracker:
    """Track tokens used in a rolling 60-second window."""

    def __init__(self, tpm_limit: int):
        self.tpm_limit = tpm_limit
        self.usage_window: deque = deque()  # (timestamp, tokens) pairs

    def record_usage(self, tokens: int):
        now = time.time()
        self.usage_window.append((now, tokens))
        # Remove entries older than 60 seconds
        while self.usage_window and self.usage_window[0][0] < now - 60:
            self.usage_window.popleft()

    def tokens_used_last_minute(self) -> int:
        # Assumed body: sum the tokens recorded within the last 60 seconds
        cutoff = time.time() - 60
        return sum(tokens for ts, tokens in self.usage_window if ts >= cutoff)
```
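A tracker like this is most useful when it can also say how long to wait before the next request fits. The sketch below is a self-contained variant with that helper. The class name `RollingTokenBudget`, its methods, and the injectable `clock` parameter (there so the logic can be exercised without real waiting) are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RollingTokenBudget:
    """Rolling 60-second token budget with a wait-time helper."""

    def __init__(self, tpm_limit: int, clock: Callable[[], float] = time.monotonic):
        self.tpm_limit = tpm_limit
        self.clock = clock
        self.window: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Drop entries that are 60 seconds old or older.
        while self.window and self.window[0][0] <= now - 60:
            self.window.popleft()

    def record(self, tokens: int) -> None:
        self.window.append((self.clock(), tokens))

    def used(self) -> int:
        self._prune(self.clock())
        return sum(tokens for _, tokens in self.window)

    def seconds_until_fits(self, tokens: int) -> float:
        """Seconds to wait until `tokens` more fit under the limit (0.0 if now)."""
        if tokens > self.tpm_limit:
            raise ValueError("request exceeds the whole per-minute budget")
        now = self.clock()
        self._prune(now)
        excess = sum(t for _, t in self.window) + tokens - self.tpm_limit
        if excess <= 0:
            return 0.0
        freed = 0
        # Walk the oldest entries until enough tokens have expired.
        for timestamp, spent in self.window:
            freed += spent
            if freed >= excess:
                return max(0.0, timestamp + 60 - now)
        return 0.0
```

An async caller could `await asyncio.sleep(budget.seconds_until_fits(estimate))` before sending, then call `record` with the token counts reported in the response's usage data.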