pip install anthropic openai — then I started pulling them apart side by side. anthropic 0.100.0, openai 2.36.0. The version numbers alone tell a story: Anthropic is still in 0.x territory while OpenAI is already on a 2.x branch. The numbers matter less than what they signal about design philosophy.
I ran both SDKs in a temporary sandbox, inspecting type hierarchies, error classes, streaming source code, and tool call formats directly. Here's what I found.
First Impressions: What Version Numbers Actually Tell You
anthropic 0.100.0 is its 100th minor release without ever shipping 1.0. That's either deliberate API stability caution or a marketing choice. openai 2.36.0 has already gone through one major version bump.
Both SDKs are built on httpx internally, and both stream over SSE (Server-Sent Events). The philosophical differences show up as soon as you look at top-level client initialization.
# anthropic.Anthropic() exclusive params
client = anthropic.Anthropic(
    api_key=None,
    auth_token=None,
    credentials=None,        # enterprise credential objects
    config=None,             # profile-based configuration
    profile=None,            # named profile
    webhook_key=None,
    _token_cache=NOT_GIVEN,
)
# openai.OpenAI() exclusive params
client = openai.OpenAI(
    api_key=None,
    admin_api_key=None,
    workload_identity=None,      # IAM-based auth
    organization=None,
    project=None,
    webhook_secret=None,
    websocket_base_url=None,     # for Realtime API
    _enforce_credentials=True,
)
Anthropic leans into enterprise credential management — multiple accounts via config files and named profiles. OpenAI focuses on team-level billing and IAM-based authentication through organization, project, and workload_identity.
Shared params include max_retries=2, timeout, default_headers, http_client — identical defaults on both.
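Those shared knobs compose the same way on both sides. A minimal sketch, assuming you want one retry/timeout/transport policy applied to both clients (the httpx.Client is optional, and the header is just an arbitrary example):
import httpx
import anthropic
import openai

# One retry/timeout/transport policy, reused for both clients
shared = dict(
    max_retries=2,                          # the default on both SDKs
    timeout=30.0,                           # seconds
    default_headers={"x-app": "sdk-compare"},
    http_client=httpx.Client(),             # bring your own transport
)

claude = anthropic.Anthropic(**shared)
gpt = openai.OpenAI(**shared)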
Type System: 408 vs 230
This was the most surprising number from the sandbox.
anthropic.types exported types: 408
openai.types exported types: 230
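For what it's worth, the counts are easy to reproduce. A quick sketch, assuming both types modules export __all__:
import anthropic.types
import openai.types

# Count the names each SDK re-exports from its types package
print("anthropic.types:", len(anthropic.types.__all__))
print("openai.types:", len(openai.types.__all__))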
The gap is significant. Claude's API returns richer response structures — ToolUseBlock, ThinkingBlock, CitationContentBlockLocation, BashCodeExecutionOutputBlock are all valid response content types, each with a matching TypedDict param. OpenAI centers around ChatCompletion with a simpler, more uniform response format.
Anthropic-exclusive types worth knowing:
| Type | Feature |
|---|---|
| ThinkingBlock / ThinkingConfigParam | Extended Thinking |
| CacheControlEphemeralParam | Prompt caching (TTL: '5m' / '1h') |
| CitationContentBlockLocation | Citation location tracking |
| BashCodeExecutionOutputBlock | Code execution tool results |
| MemoryTool20250818Param | Agent memory tool |
| ServerToolCaller20260120Param | Server-side tool executor |
| AnthropicBetaParam | Beta feature header control |
OpenAI-exclusive:
| Feature | Description |
|---|---|
| AssistantEventHandler | Assistants API event streaming |
| Realtime API | WebSocket-based bidirectional streaming |
| Fine-tuning module | fine_tuning module built in |
| OAuthError | Under AuthenticationError |
More types isn't strictly better. Having BashCodeExecutionToolResultBlockParam and BashCodeExecutionToolResultErrorParam as separate types gives you precise autocomplete but steepens the learning curve. OpenAI's simpler shape makes onboarding faster.
Tool Call Format: input_schema vs function.parameters
The biggest API design divergence between the two.
# Anthropic tool definition
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather",
    "input_schema": {            # JSON Schema is the root
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}
# OpenAI tool definition
openai_tool = {
    "type": "function",          # extra wrapper layer
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {          # nested inside function
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}
Tool results also differ:
# Anthropic: tool result as content block in a user message
messages.append({
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01A...",
            "content": "15°C sunny"
        }
    ]
})
# OpenAI: tool result as a distinct role message
messages.append({
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "15°C sunny"
})
Anthropic unifies everything under content blocks — messages are just arrays of typed blocks. OpenAI uses a dedicated tool role. Anthropic's approach is arguably more consistent; OpenAI's feels more familiar to developers who've used chat APIs with role-based message history.
If you're building a multi-model router that needs to handle both, this format difference is where most of the adapter complexity lands.
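Converting between the two definition formats is mechanical, which is the good news for router authors. A sketch, with hypothetical helper names:
def anthropic_tool_to_openai(tool: dict) -> dict:
    # Wrap the flat Anthropic definition in OpenAI's function envelope
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],   # same JSON Schema, different key
        },
    }

def openai_tool_to_anthropic(tool: dict) -> dict:
    # Unwrap the envelope and rename parameters to input_schema
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }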
Error Handling Architecture: What's Shared and What Isn't
I printed both SDK error hierarchies from the sandbox directly.
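If you want to reproduce the dump, walking __subclasses__() down from each SDK's APIError is enough. A minimal sketch:
import anthropic

def print_tree(exc_class, depth=0):
    # Recursively print an exception class and its subclasses
    print("  " * depth + exc_class.__name__)
    for sub in exc_class.__subclasses__():
        print_tree(sub, depth + 1)

print_tree(anthropic.APIError)   # swap in openai.APIError for the other tree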
# Anthropic error hierarchy (0.100.0)
APIError
├─ APIResponseValidationError
├─ APIWebhookValidationError
├─ APIStatusError
│ ├─ BadRequestError
│ ├─ AuthenticationError
│ ├─ PermissionDeniedError
│ ├─ NotFoundError
│ ├─ ConflictError
│ ├─ RequestTooLargeError ← Anthropic-only
│ ├─ RateLimitError
│ ├─ ServiceUnavailableError ← Anthropic-only
│ ├─ OverloadedError ← Anthropic-only (HTTP 529)
│ ├─ DeadlineExceededError ← Anthropic-only
│ └─ InternalServerError
└─ APIConnectionError
└─ APITimeoutError
# OpenAI error hierarchy (2.36.0)
APIError
├─ APIResponseValidationError
├─ APIStatusError
│ ├─ BadRequestError
│ ├─ AuthenticationError
│ │ └─ OAuthError ← OpenAI-only
│ ├─ PermissionDeniedError
│ ├─ NotFoundError
│ ├─ ConflictError
│ ├─ UnprocessableEntityError
│ ├─ RateLimitError
│ └─ InternalServerError
└─ APIConnectionError
└─ APITimeoutError
The Anthropic-exclusive errors matter in production. OverloadedError is HTTP 529, Claude's server-traffic-overload response. Catching it separately from RateLimitError lets you apply different backoff strategies. DeadlineExceededError is distinct from APITimeoutError: the request exceeded its processing-time budget rather than failing to connect, so the fix is different. RequestTooLargeError is about the raw size of the request payload, a different failure mode from running past the context window.
from anthropic import OverloadedError, RateLimitError, DeadlineExceededError
import time

def safe_call(client, **kwargs):
    try:
        return client.messages.create(**kwargs)
    except OverloadedError:
        # Server overloaded — back off longer
        time.sleep(10)
        return client.messages.create(**kwargs)
    except RateLimitError as e:
        wait = int(e.response.headers.get('retry-after', 60))
        time.sleep(wait)
        return client.messages.create(**kwargs)
    except DeadlineExceededError:
        # Processing time exceeded — split into smaller requests
        raise
Both SDKs default to max_retries=2 and auto-retry on 429 and 5xx errors.
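Both clients also expose with_options() for per-call overrides, so a single heavyweight request can get more retry headroom without reconfiguring the whole client. A sketch:
# Raise the retry ceiling and timeout for one heavyweight call only
response = client.with_options(max_retries=5, timeout=120.0).messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Long-running request"}],
)
# The OpenAI client has the same helper:
# client.with_options(max_retries=5).chat.completions.create(...)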
Streaming Pattern: Identical Core, Different Surface
I read both streaming implementations from source files in the sandbox. The Stream.__iter__ implementations are nearly identical.
# anthropic._streaming.Stream (actual source)
class Stream(Generic[_T], metaclass=_SyncStreamMeta):
    response: httpx.Response
    _decoder: SSEBytesDecoder

    def __iter__(self) -> Iterator[_T]:
        for item in self._iterator:
            yield item

    def _iter_events(self) -> Iterator[ServerSentEvent]:
        yield from self._decoder.iter_bytes(self.response.iter_bytes())
# openai._streaming.Stream (nearly identical)
class Stream(Generic[_T]):  # no metaclass
    response: httpx.Response
    _decoder: SSEBytesDecoder

    def __iter__(self) -> Iterator[_T]:
        for item in self._iterator:
            yield item

    def _iter_events(self) -> Iterator[ServerSentEvent]:
        yield from self._decoder.iter_bytes(self.response.iter_bytes())
The only structural difference is that Anthropic uses a _SyncStreamMeta metaclass. The usage-level APIs differ more meaningfully:
# Anthropic streaming
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for text in stream.text_stream:  # clean text extraction
        print(text, end="", flush=True)
    final = stream.get_final_message()
# OpenAI streaming
with client.chat.completions.stream(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:  # more path traversal required
            print(delta.content, end="", flush=True)
    final = stream.get_final_completion()
stream.text_stream on the Anthropic side is ergonomically nicer for pure text output. The Vercel AI SDK approach to Claude streaming agents builds on a similar pattern and is worth reading if you're taking streaming to production.
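Both SDKs also ship async clients with the same surface. The Anthropic side as a sketch, using AsyncAnthropic, which mirrors the sync helper:
import asyncio
import anthropic

async def main() -> None:
    client = anthropic.AsyncAnthropic()
    # Same stream helper: async context manager plus async iterator
    async with client.messages.stream(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)

asyncio.run(main())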
Anthropic-Exclusive: Prompt Caching, Extended Thinking, Citations
Prompt Caching
CacheControlEphemeralParam has a ttl field that accepts '5m' or '1h'. This was new to me: previously the ephemeral cache lived for a fixed five minutes, and now you can choose the expiry.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Very long system document... (tens of thousands of tokens)",
            "cache_control": {"type": "ephemeral", "ttl": "1h"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize"}]
)
The practical impact is real — repeated API calls against the same large document see significant token cost reductions on cached portions. The Claude API Prompt Caching guide covers the four patterns for applying this in production.
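Whether a call wrote to or read from the cache shows up on the response's usage object. A sketch, assuming the cache accounting fields the Messages API documents:
# First call writes the prefix to the cache; repeat calls read it back
print("written to cache:", response.usage.cache_creation_input_tokens)
print("read from cache:", response.usage.cache_read_input_tokens)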
Extended Thinking
ThinkingConfigParam and ThinkingBlock let Claude expose its reasoning chain as structured output. OpenAI has no equivalent.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Complex math problem"}]
)

for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)  # structured, typed
    elif block.type == "text":
        print("Answer:", block.text)
Citations
CitationContentBlockLocation tracks which document fragment Claude drew from. Useful for RAG systems where you need to surface source attribution alongside answers.
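A sketch of what that looks like in practice, based on the documented citations feature: a document content block with citations enabled in the request, and citation objects attached to the response's text blocks.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain",
                           "data": "The grass is green. The sky is blue."},
                "citations": {"enabled": True},
            },
            {"type": "text", "text": "What color is the grass?"},
        ],
    }],
)

for block in response.content:
    if block.type == "text" and block.citations:
        for citation in block.citations:
            print(citation.cited_text)   # the fragment the claim was drawn from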
OpenAI-Exclusive: Assistants API, Realtime, Fine-Tuning
Assistants API — AssistantEventHandler enables stateful agents combining file search, code interpreter, and custom functions. Anthropic has no SDK-level abstraction equivalent.
Realtime API — that websocket_base_url parameter exists for a reason. WebSocket-based bidirectional streaming is supported directly by the SDK. Useful for voice agents and live interactive experiences.
Fine-tuning — OpenAI's fine_tuning module lets you manage tuning jobs from within the SDK. Anthropic's fine-tuning is a separate enterprise contract path with no public SDK interface.
OAuthError — a subtype of AuthenticationError, useful when your auth flow uses OAuth and you need to distinguish OAuth failures from standard key-based auth failures.
Which SDK to Pick
The honest answer: it depends on which model you're using, and which features you actually need.
| | Anthropic SDK 0.100.0 | OpenAI SDK 2.36.0 |
|---|---|---|
| Exported types | 408 | 230 |
| Error classes | 16 (incl. OverloadedError for HTTP 529) | 13 |
| Default max_retries | 2 | 2 |
| Streaming core | httpx + SSE | httpx + SSE |
| Prompt caching | ✓ SDK-level | ✗ |
| Extended thinking | ✓ | ✗ |
| Realtime API | ✗ | ✓ |
| Assistants API | ✗ | ✓ |
| Fine-tuning | ✗ | ✓ |
| Citations | ✓ | ✗ |
| Tool result format | content block | tool role message |
Pick Anthropic SDK when: your workload is long-context document processing (use prompt caching to cut costs), you need reasoning transparency (Extended Thinking), you're building document QA with citation tracking, or your team prioritizes type safety.
Pick OpenAI SDK when: you need voice interfaces or live interaction (Realtime API), you want the Assistants API's file search + code interpreter combination, you need org/project-level billing separation, or you're fine-tuning and managing custom models.
Running both? Put an abstraction layer in front. PydanticAI handles multi-provider routing at the agent level, which lets you avoid propagating tool format differences throughout your codebase.
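If you'd rather keep the layer in-house, even a thin Protocol is enough to keep the format differences out of business logic. A hypothetical sketch:
from typing import Protocol

class ChatBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class ClaudeBackend:
    def __init__(self, client):            # anthropic.Anthropic()
        self.client = client

    def complete(self, prompt: str) -> str:
        msg = self.client.messages.create(
            model="claude-opus-4-7", max_tokens=1024,
            messages=[{"role": "user", "content": prompt}])
        return msg.content[0].text

class GPTBackend:
    def __init__(self, client):            # openai.OpenAI()
        self.client = client

    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content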
What This Comparison Actually Reveals
An SDK isn't just a wrapper around an HTTP API. It's a design document that shows what the company thinks developers will do with the model.
Anthropic's 408 types tell you: we expect you to care about caching tokens, tracking citations, and introspecting reasoning chains. OpenAI's Realtime API and Assistants tell you: we expect you to build live user-facing experiences and stateful conversational systems.
I use the Anthropic SDK for Claude-based projects and pull in OpenAI only where I specifically need Realtime or fine-tuned models. When I need both in the same codebase, I isolate the SDK calls behind an interface layer so the tool format differences don't leak into business logic.
The SDK you pick should follow from the model choice, not lead it.
