DEV Community

Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude API Python SDK: Complete Quickstart (2026)

Originally published at claudeguide.io/claude-api-python-sdk-quickstart

The Anthropic Python SDK is the fastest way to get Claude running in your code. This guide covers everything from installation to the patterns you'll use in every real project in 2026.

Install

pip install anthropic

Or with uv (faster):

uv add anthropic

Authentication

Get your API key from console.anthropic.com → API Keys → Create Key.

The SDK reads the key from the ANTHROPIC_API_KEY environment variable automatically:

export ANTHROPIC_API_KEY="sk-ant-..."

Or pass it explicitly:

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

For production, use environment variables. Never hardcode keys.
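
Because the client picks up the key implicitly, a missing variable may only show up later as an authentication error. One option is to check for it at startup and fail with a readable message. The sketch below uses only the standard library; `require_api_key` is a hypothetical helper name, not part of the SDK.

```python
import os


def require_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error.

    Hypothetical helper, not part of the anthropic package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or load it "
            "from your secret manager before starting the app."
        )
    return key
```

Calling `require_api_key()` once at startup, before constructing `anthropic.Anthropic()`, keeps the failure close to its cause.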

Your first message

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(message.content[0].text)
# Paris

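The response is a Message object whose content field is a list of content blocks. For a plain text reply the first block is a text block, so the text lives at message.content[0].text.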
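
Indexing `content[0]` is fine for plain text replies. When a response can carry several blocks (for example with tools or extended thinking), a slightly more defensive sketch joins every text block instead. `response_text` is a hypothetical helper name; it relies only on each block exposing `type` and, for text blocks, `text`.

```python
from types import SimpleNamespace


def response_text(message) -> str:
    """Concatenate the text of all text blocks in a Message-like object."""
    return "".join(
        block.text for block in message.content if block.type == "text"
    )


# Stand-in object with the same shape as a Message, so this runs offline
fake = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="The capital of France "),
        SimpleNamespace(type="tool_use", name="lookup"),
        SimpleNamespace(type="text", text="is Paris."),
    ]
)
print(response_text(fake))  # The capital of France is Paris.
```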

The response object

The fields you will read most often (the values in the comments are examples):

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

print(message.id)              # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model)           # claude-3-5-sonnet-20241022
print(message.stop_reason)     # end_turn
print(message.usage.input_tokens)   # 10
print(message.usage.output_tokens)  # 25
print(message.content[0].type)      # text
print(message.content[0].text)      # Hello! How can I help you today?

System prompts

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful assistant that responds only in haiku.",
    messages=[
        {"role": "user", "content": "What is Python?"}
    ]
)

The system parameter takes a string, or a list of text blocks as in the prompt caching example later in this guide. It sets Claude's persona, task context, and constraints for the entire conversation.

Multi-turn conversations

messages = []

# Turn 1
messages.append({"role": "user", "content": "My name is Alex."})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
messages.append({"role": "assistant", "content": response.content[0].text})

# Turn 2
messages.append({"role": "user", "content": "What's my name?"})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
print(response.content[0].text)
# Your name is Alex.

The API is stateless — you send the full conversation history each time. Build a list, append user and assistant turns, send the list.
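
The append-and-resend pattern can be wrapped in a small class. This is a minimal sketch: `Conversation` and its `send` parameter are hypothetical names, and `send` is injected so the example runs without network access. In real use it would wrap a call to `client.messages.create(...)` and return the reply text.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]


class Conversation:
    """Keeps the running message list and resends it on every turn.

    `send` is any callable that takes the full message list and returns
    the assistant's reply text.
    """

    def __init__(self, send: Callable[[List[Turn]], str]) -> None:
        self._send = send
        self.messages: List[Turn] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Offline stand-in for the API call: reports how many turns it received
def fake_send(messages: List[Turn]) -> str:
    return f"received {len(messages)} message(s)"


chat = Conversation(fake_send)
print(chat.ask("My name is Alex."))  # received 1 message(s)
print(chat.ask("What's my name?"))   # received 3 message(s)
```

The second call receives three messages because the first user turn and the assistant reply are both replayed along with the new question.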

Streaming

For long responses, stream tokens as they arrive:

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a short story about a robot."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Get the full message after streaming completes
final = stream.get_final_message()
print(f"\nTokens used: {final.usage.input_tokens + final.usage.output_tokens}")

Async support

For async applications (FastAPI, async scripts):

import asyncio
import anthropic

async def main():
    client = anthropic.AsyncAnthropic()

    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)

asyncio.run(main())

For async streaming:

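import asyncio
import anthropic

async def stream_response():
    # Must be the async client: async with / async for need AsyncAnthropic
    client = anthropic.AsyncAnthropic()

    async with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Count to 10"}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)

asyncio.run(stream_response())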
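
The async client pays off mainly when several requests run concurrently. A common pattern is `asyncio.gather` with a semaphore that caps the number of in-flight requests. The sketch below takes the request coroutine as a parameter (`ask`, a hypothetical name) so it can run without network access; in practice it would wrap `await client.messages.create(...)` on an `AsyncAnthropic` client.

```python
import asyncio
from typing import Awaitable, Callable, List


async def ask_many(
    prompts: List[str],
    ask: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
) -> List[str]:
    """Run `ask` for every prompt, at most `max_concurrent` at a time.

    Results come back in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await ask(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))


# Offline stand-in for an API call
async def fake_ask(prompt: str) -> str:
    await asyncio.sleep(0)
    return prompt.upper()


print(asyncio.run(ask_many(["a", "b", "c"], fake_ask, max_concurrent=2)))
# ['A', 'B', 'C']
```

Capping concurrency is a design choice: an uncapped `gather` over hundreds of prompts is more likely to run into rate limits.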

Choosing the right model

| Model | Use case | Cost |
| --- | --- | --- |
| claude-3-5-haiku-20241022 | High-volume, speed-sensitive tasks | Cheapest |
| claude-3-5-sonnet-20241022 | Most tasks (best balance) | Mid |
| claude-3-7-sonnet-20250219 | Extended thinking, complex reasoning | Higher |
| claude-opus-4-20250514 | Hardest tasks, highest quality | Most expensive |

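Start with Haiku for anything repetitive. Use Sonnet as the default. Only reach for Opus or extended thinking when Sonnet falls short. Model IDs are dated snapshots that get retired over time, so confirm an ID is still listed in Anthropic's model documentation before depending on it.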
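
One way to check which IDs an API key can currently use is the SDK's models endpoint. The sketch below assumes `client.models.list()` is available in your installed SDK version and that each entry exposes an `id`. The helper names `available_model_ids` and `pick_model` are hypothetical, and the client is passed in so the functions can be tried with a stand-in.

```python
from types import SimpleNamespace
from typing import List


def available_model_ids(client) -> List[str]:
    """Return the model IDs reported by client.models.list()."""
    return [model.id for model in client.models.list()]


def pick_model(preferred: List[str], available: List[str]) -> str:
    """Return the first preferred ID that is actually available."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise LookupError(f"None of {preferred} are available")


# Stand-in client so the example runs without an API key
fake_client = SimpleNamespace(
    models=SimpleNamespace(
        list=lambda: [
            SimpleNamespace(id="claude-3-5-haiku-20241022"),
            SimpleNamespace(id="claude-opus-4-20250514"),
        ]
    )
)
ids = available_model_ids(fake_client)
print(pick_model(["claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"], ids))
# claude-3-5-haiku-20241022
```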

Error handling

import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError:
    print("Network error — check your connection")
except anthropic.RateLimitError:
    print("Rate limit hit — back off and retry")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")

The SDK has built-in retry logic for transient errors: connection failures, 408, 409, 429, and 5xx responses (including 529 overloaded). By default it retries 2 times with exponential backoff:

# Configure retries
client = anthropic.Anthropic(
    max_retries=3,          # default is 2
    timeout=60.0,           # seconds (the default is 10 minutes)
)
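
To get a feel for what exponential backoff means for wait times, here is a small standalone sketch. It is not the SDK's internal implementation; the base delay, cap, and jitter range are illustrative assumptions, and `backoff_delays` is a hypothetical name.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Delay in seconds before each retry: base * 2**attempt, capped.

    If `rng` is given, each delay is scaled by a random factor in
    [0.75, 1.0] so that many clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if rng is not None:
            delay *= rng.uniform(0.75, 1.0)
        delays.append(delay)
    return delays


print(backoff_delays(3))  # [0.5, 1.0, 2.0]
print(backoff_delays(6))  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

Under these assumed values, raising `max_retries` adds progressively longer waits until the cap is reached, which is worth keeping in mind alongside the request timeout.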

Tracking token usage and cost

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this article..."}]
)

input_tokens = message.usage.input_tokens
output_tokens = message.usage.output_tokens

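# Haiku pricing as quoted for April 2026. Rates differ between Haiku
# versions, so check the pricing page for the model ID you actually call.
HAIKU_INPUT = 1.00 / 1_000_000   # $1.00 per 1M input tokens
HAIKU_OUTPUT = 5.00 / 1_000_000  # $5.00 per 1M output tokens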

cost = (input_tokens * HAIKU_INPUT) + (output_tokens * HAIKU_OUTPUT)
print(f"This call cost: ${cost:.6f}")
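
The same arithmetic generalizes to a small price table keyed by model ID. In the sketch below the Haiku rates are the ones quoted above; the table layout and the `estimate_cost` helper are hypothetical, and rates for any other model would need to be filled in from the pricing page.

```python
from typing import Dict, Tuple

# (input, output) price in USD per 1M tokens.
# Only the Haiku rates come from this article; add others from the pricing page.
PRICES_PER_MTOK: Dict[str, Tuple[float, float]] = {
    "claude-3-5-haiku-20241022": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call."""
    try:
        input_price, output_price = PRICES_PER_MTOK[model]
    except KeyError:
        raise ValueError(f"No price configured for model {model!r}") from None
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = estimate_cost("claude-3-5-haiku-20241022", input_tokens=2_000, output_tokens=500)
print(f"${cost:.6f}")  # $0.004500
```

Raising on an unknown model, rather than silently returning zero, keeps a missing price from hiding in aggregated cost reports.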

Prompt caching (50-90% cost reduction)

For repeated system prompts or document analysis, add cache_control to cache the expensive prefix:

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # Placeholder for your document string. Cache hits are billed
            # at a reduced rate, not free; a miss pays the cache-write rate.
            "text": very_long_document,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}]
)

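The first call writes the cache, and those tokens cost more than normal input (a 25% surcharge for the default 5-minute cache). Subsequent calls with the same prefix, made while the cache entry is still alive, read those tokens at a 90% discount. See the prompt caching cost analysis for the break-even math.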
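
As a rough illustration of that break-even math, the sketch below compares the total input cost of a shared prefix with and without caching. It assumes a 1.25x price multiplier for cache writes, 0.10x for cache reads, that every call after the first hits the cache, and an example base price of $3 per 1M input tokens. All of these are assumptions to adjust against the current pricing page, and `prefix_cost` is a hypothetical name.

```python
def prefix_cost(
    prefix_tokens: int,
    calls: int,
    base_price_per_mtok: float,
    cached: bool,
    write_multiplier: float = 1.25,  # assumed surcharge on the first (write) call
    read_multiplier: float = 0.10,   # assumed rate for later (read) calls
) -> float:
    """USD cost of sending the same prefix `calls` times."""
    per_token = base_price_per_mtok / 1_000_000
    if not cached or calls == 0:
        return prefix_tokens * per_token * calls
    write = prefix_tokens * per_token * write_multiplier
    reads = prefix_tokens * per_token * read_multiplier * (calls - 1)
    return write + reads


for n in (1, 2, 10):
    plain = prefix_cost(50_000, n, 3.00, cached=False)
    cached = prefix_cost(50_000, n, 3.00, cached=True)
    print(f"{n:>2} calls: uncached ${plain:.4f}, cached ${cached:.4f}")
#  1 calls: uncached $0.1500, cached $0.1875
#  2 calls: uncached $0.3000, cached $0.2025
# 10 calls: uncached $1.5000, cached $0.3225
```

Under these assumptions caching is a net loss for a single call and starts paying off from the second call that lands within the cache lifetime.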

A minimal production wrapper


import anthropic
from typing import Optional

client = anthropic.Anthropic()

def ask_claude(
    prompt: str,
    system: Optional[str] = None,
    model: str = "claude-3-5-haiku-20241022",
    max_tokens: int = 1024,
) -
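
The listing above is cut off in the source after the parameter list. As a rough idea of how a wrapper with that signature could be finished, here is a sketch. The body is an assumption, not the author's code, and it takes the client as an explicit first argument (a departure from the module-level client above) so it can be tried with a stand-in object.

```python
from types import SimpleNamespace
from typing import Optional


def ask_claude(
    client,
    prompt: str,
    system: Optional[str] = None,
    model: str = "claude-3-5-haiku-20241022",
    max_tokens: int = 1024,
) -> str:
    """Send a single user prompt and return the reply text.

    Assumed body: the original listing ends before this point.
    """
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only pass `system` when one was provided
    if system is not None:
        kwargs["system"] = system
    message = client.messages.create(**kwargs)
    return "".join(b.text for b in message.content if b.type == "text")


# Stand-in client that echoes the prompt, so the sketch runs offline
def _fake_create(**kwargs):
    text = "echo: " + kwargs["messages"][0]["content"]
    return SimpleNamespace(content=[SimpleNamespace(type="text", text=text)])


fake_client = SimpleNamespace(messages=SimpleNamespace(create=_fake_create))
print(ask_claude(fake_client, "Hello"))  # echo: Hello
```

With the real SDK the first argument would be an `anthropic.Anthropic()` instance, ideally configured with the retry and timeout settings from the error handling section.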
