DEV Community

Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude Python SDK Tutorial: Complete Setup and Usage Guide (2026)

Originally published at claudeguide.io/claude-python-sdk-guide

The Claude Python SDK (anthropic) lets you call Claude's API in Python with a clean, typed interface — install with pip install anthropic, set your ANTHROPIC_API_KEY, and you're sending messages in under 10 lines of code. This tutorial covers everything from installation to advanced features: streaming, tool use, prompt caching, and multi-turn conversations. All code examples are tested against the current API.


Installation and Authentication

pip install anthropic

Set your API key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

Or load it in Python via .env:

pip install python-dotenv

from dotenv import load_dotenv
import os

load_dotenv()  # Reads .env and copies its entries into os.environ
api_key = os.getenv("ANTHROPIC_API_KEY")  # Optional: anthropic.Anthropic() reads this variable itself

Your First API Call

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain prompt caching in one paragraph."}
    ]
)

print(message.content[0].text)

This is the minimal working example. The client reads ANTHROPIC_API_KEY from your environment automatically. No additional configuration needed.
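
Indexing content[0] assumes the first content block is text, which may not hold once tools are involved. A slightly more defensive sketch joins every text block instead. The helper name response_text is hypothetical, and the stand-in objects below only mimic the type and text attributes used in this guide so the snippet runs without an API key.

```python
from types import SimpleNamespace


def response_text(message) -> str:
    """Join the text of every text block in a Messages API response."""
    return "".join(
        block.text for block in message.content
        if getattr(block, "type", None) == "text"
    )


# Stand-in for a real response (assumption: blocks expose .type and .text).
fake_message = SimpleNamespace(content=[
    SimpleNamespace(type="text", text="Hello, "),
    SimpleNamespace(type="tool_use", name="lookup", input={}),
    SimpleNamespace(type="text", text="world."),
])
print(response_text(fake_message))
```

With a real client, response_text(message) can replace message.content[0].text wherever the response might contain non-text blocks.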


Understanding the Response Object

print(message.id)           # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model)        # claude-sonnet-4-5
print(message.stop_reason) # end_turn
print(message.usage.input_tokens)   # 23
print(message.usage.output_tokens)  # 120
print(message.content[0].text)      # The actual response text

message.usage is critical for cost tracking. Input and output tokens are priced separately, so cost per call = input tokens × input price + output tokens × output price. See Claude API Cost and Prompt Caching Break-Even for the full pricing breakdown.
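
As a rough illustration of that arithmetic, the sketch below computes the cost of a single call. The function name estimate_cost_usd is hypothetical, and the two prices are placeholder values chosen for the example, not quoted list prices; substitute the current per-million-token prices for your model.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float,
                      output_price_per_mtok: float) -> float:
    """Cost of one call, with prices given in USD per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Token counts from the sample output above; prices are assumed placeholders.
cost = estimate_cost_usd(23, 120, input_price_per_mtok=3.0,
                         output_price_per_mtok=15.0)
print(f"${cost:.6f}")
```

In real code the two token counts would come from message.usage.input_tokens and message.usage.output_tokens.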


System Prompts

System prompts set the persona and behavior for the entire conversation:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a senior Python developer. Answer concisely with code examples.",
    messages=[
        {"role": "user", "content": "How do I handle rate limit errors?"}
    ]
)

Multi-Turn Conversations

Build conversation history by appending messages:

conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=conversation
    )

    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Multi-turn exchange
print(chat("What is the difference between a list and a tuple in Python?"))
print(chat("When would you choose a tuple over a list?"))
print(chat("Give me a code example with both."))

The conversation list acts as the full history. Claude has no persistent memory — you manage state client-side.
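
A module-level list works for a demo, but it makes it hard to run two conversations side by side and it grows without bound. One possible sketch wraps the history in a small class and drops the oldest turns past a limit. The Conversation class, the max_messages parameter, and the send callable are all hypothetical; in real use send would wrap client.messages.create and return the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps one conversation's history and replays it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str], max_messages: int = 20):
        self.send = send
        self.max_messages = max_messages
        self.history: List[Message] = []

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        reply = self.send(self.history)
        self.history.append({"role": "assistant", "content": reply})
        # Drop the oldest user/assistant pairs so the history cannot grow forever.
        while len(self.history) > self.max_messages:
            del self.history[:2]
        return reply


# Fake backend for illustration: it reports how many messages it received.
convo = Conversation(send=lambda msgs: f"received {len(msgs)} messages", max_messages=4)
for question in ["one", "two", "three"]:
    print(convo.chat(question))
```

Removing messages in pairs keeps the history starting with a user turn. Trimming by message count is the simplest policy; trimming by token count would track cost more closely.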


Streaming Responses

For long outputs or interactive UIs, use streaming to display text as it arrives:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of async/await in Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming benchmark: First token typically arrives in 300-500ms for Sonnet. Without streaming, you wait for the full response (can be 5-15 seconds for long outputs). For any user-facing application, streaming is strongly recommended.
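
Latency figures like these vary by prompt, region, and load, so it can be worth measuring them yourself. The sketch below times any iterator of text chunks; measure_stream and fake_stream are hypothetical names, and the fake stream stands in for stream.text_stream. Note that it only times the iteration: with a real request, the timer would need to start before the with block is entered to include connection setup.

```python
import time
from typing import Iterable, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, str]:
    """Return (seconds to first chunk, total seconds, full text) for a text stream."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        parts.append(chunk)
    end = time.perf_counter()
    # An empty stream has no first chunk, so fall back to the total time.
    ttft = (first_chunk_at if first_chunk_at is not None else end) - start
    return ttft, end - start, "".join(parts)


def fake_stream():
    # Stand-in for stream.text_stream; the sleeps simulate network latency.
    for piece in ["async ", "and ", "await"]:
        time.sleep(0.01)
        yield piece


ttft, total, text = measure_stream(fake_stream())
print(f"first chunk after {ttft:.3f}s, done after {total:.3f}s: {text!r}")
```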



Tool Use (Function Calling)

Tool use lets Claude request a call to one of your Python functions; your code runs the function and sends the result back:

tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a given ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL"
                }
            },
            "required": ["ticker"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the current price of Apple stock?"}]
)

# Check if Claude wants to call a tool
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(f"Claude wants to call: {tool_call.name}")
    print(f"With inputs: {tool_call.input}")

    # Call your actual function
    result = get_stock_price(tool_call.input["ticker"])  # your function

    # Return the result to Claude
    final_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the current price of Apple stock?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": str(result)
                }]
            }
        ]
    )
    print(final_response.content[0].text)

For a deeper dive on tool use patterns, see Claude Agent SDK Guide.
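
Once there is more than one tool, a lookup table is usually tidier than an if/else chain. The sketch below maps tool names to functions and builds the tool_result block, flagging failures with is_error so Claude can react instead of the script crashing. TOOL_REGISTRY and run_tool are hypothetical names, the price data is a placeholder, and the SimpleNamespace object stands in for a real tool_use block.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def get_stock_price(ticker: str) -> float:
    # Placeholder data; a real implementation would query a market data source.
    return {"AAPL": 123.45}.get(ticker.upper(), 0.0)


TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_stock_price": get_stock_price}


def run_tool(tool_call) -> Dict[str, Any]:
    """Execute one tool_use block and build the matching tool_result block."""
    handler = TOOL_REGISTRY.get(tool_call.name)
    if handler is None:
        content, is_error = f"Unknown tool: {tool_call.name}", True
    else:
        try:
            content, is_error = str(handler(**tool_call.input)), False
        except Exception as exc:  # Report the failure to Claude instead of crashing
            content, is_error = f"{type(exc).__name__}: {exc}", True
    return {"type": "tool_result", "tool_use_id": tool_call.id,
            "content": content, "is_error": is_error}


# Stand-in for a tool_use block taken from response.content.
call = SimpleNamespace(type="tool_use", id="toolu_demo", name="get_stock_price",
                       input={"ticker": "AAPL"})
result_block = run_tool(call)
print(result_block)
```

The returned dict goes into the content list of the follow-up user message, in the same position as the hand-written tool_result in the example above.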


Prompt Caching

Prompt caching dramatically reduces cost when you reuse large system prompts or document context:

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code review assistant. [... large instructions ...]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Review this function: [code]"}]
)

# Check cache status
print(response.usage.cache_creation_input_tokens)  # First call: tokens written to cache
print(response.usage.cache_read_input_tokens)       # Subsequent calls: tokens read from cache

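Cache reads are billed at roughly 10% of the standard input price (90% less), while writing to the cache costs about 25% more than standard input with the default 5-minute lifetime. Caching a large system prompt therefore pays for itself on the first cache hit, that is, on the second call made while the cache entry is still alive. Prompts shorter than the model's minimum cacheable length (1,024 tokens or more, depending on the model) are not cached at all.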
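
To see where caching starts to pay off, the sketch below compares the cost of a reused prefix with and without caching, measured in units of one standard input token. The multipliers are assumptions for illustration (1.25x for a cache write, 0.10x for a cache read), and the model assumes every call after the first lands while the cache entry is still alive; check the current pricing page before relying on the numbers.

```python
def cached_cost(prefix_tokens: int, calls: int, write_mult: float = 1.25,
                read_mult: float = 0.10) -> float:
    """Cost of the cached prefix over `calls` requests, in base-input-token units."""
    if calls == 0:
        return 0.0
    # One cache write on the first call, then a cache read on each later call.
    return prefix_tokens * (write_mult + read_mult * (calls - 1))


def uncached_cost(prefix_tokens: int, calls: int) -> float:
    """Cost of resending the same prefix at the full input price every time."""
    return float(prefix_tokens * calls)


for n in (1, 2, 5):
    print(n, uncached_cost(2000, n), round(cached_cost(2000, n), 2))
```

Under these assumptions a single call is slightly more expensive with caching, and the saving appears from the second call onward.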


Error Handling

import time

import anthropic
from anthropic import APIConnectionError, APIStatusError, RateLimitError

client = anthropic.Anthropic()

def call_claude_safely(prompt: str, retries: int = 3):
    """Call Claude, retrying rate-limit errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError:
            if attempt == retries - 1:
                raise  # Out of retries: surface the error to the caller
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
        except APIConnectionError as e:
            # Network-level failure: no HTTP response was received
            print(f"Connection error: {e}")
            raise
        except APIStatusError as e:
            # Any other non-2xx response; status_code is defined on APIStatusError
            print(f"API error {e.status_code}: {e.message}")
            raise
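
Fixed delays of 1, 2, and 4 seconds can cause many clients that were rate-limited together to retry together. A common refinement is to add random jitter. The sketch below shows one variant ("full jitter"); backoff_delay and its parameters are hypothetical names, not part of the SDK. The SDK client also has its own max_retries setting, so a hand-written loop like the one above stacks on top of any retries the client already performs.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


rng = random.Random(0)  # Seeded only to make the demo repeatable
delays = [backoff_delay(attempt, rng=rng) for attempt in range(6)]
print([round(d, 2) for d in delays])
```

To use it, replace time.sleep(2 ** attempt) with time.sleep(backoff_delay(attempt)).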

Async Usage

For async applications (FastAPI, async scripts):

import asyncio
import anthropic

async def main():
    client = anthropic.AsyncAnthropic()

    message = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude!"}]
    )

    print(message.content[0].text)

asyncio.run(main())

Use AsyncAnthropic for any async context. The interface is identical to the sync client, with await added.
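
The main payoff of the async client is running several requests at once. The sketch below fans out a list of prompts while capping how many are in flight, which helps stay under rate limits. ask_many, fake_ask, and max_concurrent are hypothetical names; fake_ask stands in for a coroutine that awaits client.messages.create and returns the reply text.

```python
import asyncio
from typing import Awaitable, Callable, List


async def ask_many(prompts: List[str], ask: Callable[[str], Awaitable[str]],
                   max_concurrent: int = 3) -> List[str]:
    """Run `ask` for every prompt, with at most `max_concurrent` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        async with semaphore:
            return await ask(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(limited(p) for p in prompts))


async def fake_ask(prompt: str) -> str:
    # Stand-in for a real API call; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return prompt.upper()


answers = asyncio.run(ask_many(["a", "b", "c", "d"], fake_ask))
print(answers)
```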


Counting Tokens Before Calling

Estimate cost before sending a large prompt:

response = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Your large prompt here..."}]
)

print(f"Input tokens: {response.input_tokens}")

Token counting is designed to reflect what the API will bill, but treat the result as an estimate: the actual input token count of the request can differ by a small amount.


Model Selection

# For most tasks: best balance of capability and cost
model = "claude-sonnet-4-5"

# For simple, high-volume tasks (roughly 3x cheaper than Sonnet at list prices)
model = "claude-haiku-4-5"

# For complex reasoning (roughly 1.7x the price of Sonnet at list prices)
model = "claude-opus-4-5"

See Claude Haiku vs Sonnet vs Opus: Which Model to Use for the full comparison with cost benchmarks.


Frequently Asked Questions

How do I install the Claude Python SDK?

Run pip install anthropic. Then set your API key: export ANTHROPIC_API_KEY="sk-ant-...". You can also pass the key directly: client = anthropic.Anthropic(api_key="sk-ant-..."), but environment variables are preferred for security.

What Python version does the Anthropic SDK support?

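Early releases of the anthropic package supported Python 3.7, but recent releases require a newer interpreter; check the Requires-Python field on the package's PyPI page for the current minimum. Python 3.9+ is recommended for full type hint support and best performance.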

How do I handle long documents that exceed the context window?

For documents that exceed the standard 200K-token context window, split them into chunks and process the chunks sequentially. If many requests share the same context, mark it with the cache_control parameter so it is read from the cache instead of being billed at full price each time. A 1M token context window has been offered in beta on selected Sonnet 4-series models; see the Claude 1M Context Window guide for details.
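
One simple way to do the splitting is to pack whole paragraphs greedily until a token budget is reached. The sketch below is a minimal version under stated assumptions: chunk_by_tokens and rough_count are hypothetical names, and the word-based counter is a crude placeholder for a real token counter.

```python
from typing import Callable, List


def chunk_by_tokens(paragraphs: List[str], count_tokens: Callable[[str], int],
                    max_tokens: int) -> List[str]:
    """Greedily pack paragraphs so each chunk's paragraph counts sum to <= max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for para in paragraphs:
        n = count_tokens(para)
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)  # An oversized paragraph becomes its own chunk
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def rough_count(text: str) -> int:
    # Crude placeholder (one token per word); swap in a real token counter.
    return len(text.split())


doc = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = chunk_by_tokens(doc, rough_count, max_tokens=5)
print(chunks)
```

Splitting on paragraph boundaries keeps each chunk readable on its own. A paragraph larger than the budget is passed through unsplit, so a production version would need a fallback for that case.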

Is the Python SDK thread-safe?

The synchronous Anthropic client is thread-safe. You can share a single client instance across threads. For async applications, use AsyncAnthropic and await each call.

How do I track API costs in Python?

Use response.usage.input_tokens and response.usage.output_tokens from each response. Multiply by the per-token price for your model. Log these to a database or monitoring service. The Claude API Cost guide has current per-token prices.
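
A minimal sketch of that bookkeeping is shown below: it accumulates token counts per model in memory, and the totals can then be multiplied by your prices or written to a log. UsageTracker and fake_response are hypothetical names, and the fake responses carry only the attributes the tracker reads.

```python
from collections import defaultdict
from types import SimpleNamespace


class UsageTracker:
    """Accumulates input and output token counts per model."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0})

    def record(self, response) -> None:
        entry = self.totals[response.model]
        entry["input"] += response.usage.input_tokens
        entry["output"] += response.usage.output_tokens
        entry["calls"] += 1


def fake_response(model, input_tokens, output_tokens):
    # Stand-in with only the attributes the tracker reads.
    usage = SimpleNamespace(input_tokens=input_tokens, output_tokens=output_tokens)
    return SimpleNamespace(model=model, usage=usage)


tracker = UsageTracker()
tracker.record(fake_response("claude-sonnet-4-5", 23, 120))
tracker.record(fake_response("claude-sonnet-4-5", 40, 200))
print(dict(tracker.totals))
```

Calling tracker.record(response) after each client.messages.create call is enough to keep the totals current.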

What's the difference between stream=True and client.messages.stream()?

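client.messages.stream() is the recommended streaming interface: it returns a context manager that exposes a text_stream iterator and accumulates the final message for you. The lower-level stream=True parameter on create() returns an iterator of the individual stream events (parsed from Server-Sent Events), which you handle yourself. Use client.messages.stream() for most use cases.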
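
To make the difference concrete, the sketch below shows the kind of filtering you take on with the lower-level interface: picking text deltas out of a mixed stream of events. text_from_events is a hypothetical helper, and the fake events model only an assumed subset of the fields on real events (type, delta.type, delta.text) so the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def text_from_events(events: Iterable) -> Iterator[str]:
    """Yield text fragments from raw streaming events, skipping everything else."""
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            yield event.delta.text


# Stand-ins shaped like the events a stream=True call yields (assumed subset).
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Hel")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="lo")),
    SimpleNamespace(type="message_stop"),
]
streamed_text = "".join(text_from_events(fake_events))
print(streamed_text)
```

With client.messages.stream(), the text_stream iterator does this filtering for you.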

