DEV Community

Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude API Rate Limits: Tiers, Headers, and Retry Strategies (2026)

Originally published at claudeguide.io/claude-api-rate-limits-2026

The Claude API enforces rate limits on requests per minute (RPM) and tokens per minute (TPM). Limits apply at the organization level rather than per API key, so every key in the organization draws from the same budget. Understanding the limit structure, and what the response headers tell you, is the difference between a production system that degrades gracefully and one that goes down under load.

Rate limit tiers

Anthropic uses a tier system based on account age and spend history. Limits increase automatically as you meet the thresholds.

Usage Tier 1 (new accounts)

Granted on signup with a valid credit card.

| Model | RPM | TPM | TPD |
|---|---|---|---|
| claude-3-5-haiku | 50 | 50,000 | 5M |
| claude-3-5-sonnet | 50 | 40,000 | 4M |
| claude-3-7-sonnet | 50 | 40,000 | 4M |
| claude-opus-4 | 50 | 20,000 | 2M |

Usage Tier 2

Requires $0–$100 spend, account age ≥7 days.

| Model | RPM | TPM |
|---|---|---|
| claude-3-5-haiku | 1,000 | 200,000 |
| claude-3-5-sonnet | 1,000 | 160,000 |
| claude-3-7-sonnet | 1,000 | 160,000 |

Usage Tier 3

Requires $100+ spend, account age ≥14 days.

| Model | RPM | TPM |
|---|---|---|
| claude-3-5-haiku | 2,000 | 400,000 |
| claude-3-5-sonnet | 2,000 | 320,000 |

Tier 4 and above

Requires $500–$1,000+ spend. Contact Anthropic for enterprise limits beyond Tier 4.

Check your current tier at console.anthropic.com → Settings → Limits.
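Whether RPM or TPM runs out first depends on how large your requests are. Dividing TPM by RPM gives the break-even request size: for Tier 1 claude-3-5-sonnet that is 40,000 / 50 = 800 tokens per request. The sketch below works through that arithmetic. The function name is illustrative, and it assumes every token in a request counts against the single TPM figure in the table.

```python
def binding_limit(rpm: int, tpm: int, avg_tokens_per_request: int) -> tuple[str, float]:
    """Return which limit is hit first and the sustainable requests per minute."""
    # Requests per minute that the token budget alone would allow
    by_tokens = tpm / avg_tokens_per_request
    if by_tokens < rpm:
        return "TPM", by_tokens
    return "RPM", float(rpm)


# Tier 1 claude-3-5-sonnet from the table: 50 RPM, 40,000 TPM
print(binding_limit(50, 40_000, 500))    # small requests: the request cap binds
print(binding_limit(50, 40_000, 2_000))  # large requests: the token cap binds
```

With 2,000-token requests the token budget allows only 20 requests per minute, well under the 50 RPM cap, so token usage is what you would need to watch.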


Rate limit response headers

Every API response includes headers that tell you your current rate limit state. Read these before you need them:

anthropic-ratelimit-requests-limit: 1000
anthropic-ratelimit-requests-remaining: 999
anthropic-ratelimit-requests-reset: 2026-04-26T12:01:00Z

anthropic-ratelimit-tokens-limit: 200000
anthropic-ratelimit-tokens-remaining: 187432
anthropic-ratelimit-tokens-reset: 2026-04-26T12:01:00Z

anthropic-ratelimit-input-tokens-limit: 150000
anthropic-ratelimit-input-tokens-remaining: 143000
anthropic-ratelimit-input-tokens-reset: 2026-04-26T12:01:00Z

anthropic-ratelimit-output-tokens-limit: 50000
anthropic-ratelimit-output-tokens-remaining: 44432
anthropic-ratelimit-output-tokens-reset: 2026-04-26T12:01:00Z

retry-after: 5

The retry-after header appears only on 429 responses. It tells you exactly how many seconds to wait before retrying.
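On successful responses there is no retry-after, but the reset headers carry a timestamp you can turn into a wait time yourself. A minimal sketch, assuming the timestamps use the RFC 3339 format shown in the listing above; the helper name seconds_until_reset is hypothetical and not part of the SDK.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(reset_header: str, now: Optional[datetime] = None) -> float:
    """Seconds from `now` until an RFC 3339 reset timestamp (never negative)."""
    # Older Python versions reject a trailing "Z", so normalise it to an explicit offset.
    reset_at = datetime.fromisoformat(reset_header.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


# 30 seconds before the reset time used in the example headers
now = datetime(2026, 4, 26, 12, 0, 30, tzinfo=timezone.utc)
print(seconds_until_reset("2026-04-26T12:01:00Z", now))  # 30.0
```

Passing `now` explicitly keeps the function easy to test; in production you would leave it out and let it read the current UTC time.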

Reading headers with the SDK

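The Python SDK exposes headers through with_raw_response, which returns the raw HTTP response; call parse() on it to get the message:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.with_raw_response.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)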

headers = response.headers
remaining_requests = int(headers.get("anthropic-ratelimit-requests-remaining", 0))
remaining_tokens = int(headers.get("anthropic-ratelimit-tokens-remaining", 0))

print(f"Requests remaining: {remaining_requests}")
print(f"Tokens remaining: {remaining_tokens}")

message = response.parse()
print(message.content[0].text)

When you hit a rate limit: the error

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please try again in 30 seconds."
  }
}

HTTP status: 429 Too Many Requests

This is recoverable. The correct response is to wait and retry — not to fail permanently.
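If you call the HTTP API directly instead of going through the SDK, you need to make the retry decision yourself. A rough sketch of that check, using only the status code and the error body shape shown above. The is_retryable name is illustrative, and treating 529 with an overloaded_error type as retryable is an assumption carried over from the overload handling later in this article.

```python
import json

RETRYABLE_STATUS = {429, 529}  # rate limited, overloaded


def is_retryable(status_code: int, body: str) -> bool:
    """True when the status and the error type both indicate a transient failure."""
    if status_code not in RETRYABLE_STATUS:
        return False
    try:
        error_type = json.loads(body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        return True  # unparseable body: fall back to trusting the status code
    return error_type in {"rate_limit_error", "overloaded_error"}


body = '{"type": "error", "error": {"type": "rate_limit_error", "message": "Rate limit exceeded."}}'
print(is_retryable(429, body))  # True
print(is_retryable(400, body))  # False
```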


Retry patterns

Exponential backoff with jitter (the standard)

import anthropic
import time
import random

client = anthropic.Anthropic()

def call_with_backoff(messages, model="claude-3-5-sonnet-20241022", max_retries=5):
    """Retry with exponential backoff + jitter on rate limit errors."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages,
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise  # Give up after max_retries

            # Exponential backoff: 2^attempt seconds + jitter
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            # 529 means the API is overloaded; anything else is not retried here.
            # Re-raise on the last attempt so the function never returns None silently.
            if e.status_code != 529 or attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
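With max_retries=5, the function above sleeps at most four times on rate limit errors, for 1, 2, 4, and 8 seconds plus up to one second of jitter each, so it gives up after roughly 15 to 19 seconds of waiting. If you raise max_retries, the delay doubles each time and grows quickly. A small sketch of the same formula with an upper bound added; the cap parameter is my addition, not something the code above has.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retrying after 0-based `attempt`: capped exponential plus up to 1s of jitter."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, 1)


# Print the deterministic part of the schedule for the first eight attempts
for attempt in range(8):
    print(f"attempt {attempt}: {min(30.0, 2.0 ** attempt):.0f}s + jitter")
```

The jitter matters when many workers hit the limit at the same moment: without it they would all retry in lockstep and collide again.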

Using retry-after header

When you have the exact wait time from the header:

import anthropic
import time
import httpx

def call_with_retry_after(messages, max_attempts=5):
    """Use the retry-after header for precise wait time."""
    # Reuses the client created in the backoff example above
    for attempt in range(max_attempts):
        try:
            response = client.messages.with_raw_response.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                messages=messages,
            )
            return response.parse()
        except anthropic.RateLimitError as e:
            if attempt == max_attempts - 1:
                raise  # Give up instead of looping forever
            # The SDK attaches the HTTP response to the exception
            retry_after = float(e.response.headers.get("retry-after", 30))  # 30s fallback
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after + 1)  # +1 for safety

The SDK's built-in retry

The official SDK retries automatically on 429s and 529s with exponential backoff. The default is 2 retries:

import anthropic
import httpx

# Increase built-in retries
client = anthropic.Anthropic(
    max_retries=4,
    timeout=httpx.Timeout(60.0, connect=5.0)
)

The SDK's built-in retry is sufficient for most cases. Add your own layer only when you need custom logic (logging, fallback models, circuit breaking).
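As one example of such a custom layer, here is a minimal circuit breaker sketch. It is a generic pattern rather than anything the SDK provides: the class name, the thresholds, and the injectable clock are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a trial call after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None while the breaker is closed

    def allow(self) -> bool:
        """True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        """Reset the failure count and close the breaker."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open the breaker once the threshold is reached."""
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

In use, you would check allow() before calling client.messages.create, call record_failure() in the except anthropic.RateLimitError branch, and call record_success() after a normal response. While the breaker is open, the caller can fail fast or route to a fallback instead of queuing more requests against an exhausted limit.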


Staying under limits proactively

Monitor remaining tokens before large requests


def safe_large_request(large_prompt, threshold=0.2):
    """Only send if 
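The listing above is cut off in the source, so the author's implementation is not recoverable. As a rough sketch of the idea in the heading, the check below decides whether a large request fits the remaining token budget. It assumes that threshold means the minimum fraction of the token limit that should stay unused after the request is sent, which is an interpretation the source does not confirm. The has_headroom name is hypothetical.

```python
def has_headroom(remaining_tokens: int, token_limit: int,
                 estimated_tokens: int, threshold: float = 0.2) -> bool:
    """True if sending `estimated_tokens` still leaves `threshold` of the budget unused."""
    if token_limit <= 0:
        return False
    left_after = remaining_tokens - estimated_tokens
    return left_after / token_limit >= threshold


# Values from the example headers earlier: 187,432 of 200,000 tokens remaining
print(has_headroom(187_432, 200_000, estimated_tokens=100_000))  # True
print(has_headroom(187_432, 200_000, estimated_tokens=160_000))  # False
```

The remaining_tokens and token_limit arguments would come from the anthropic-ratelimit-tokens-remaining and anthropic-ratelimit-tokens-limit headers of the most recent response. When the check fails, waiting until the tokens-reset time is usually cheaper than sending the request and absorbing a 429.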

