Sangmin Lee

Posted on May 27 • Originally published at claudeguide.io

Claude API Python SDK: Complete Quickstart (2026)

#python #sdk #quickstart

Originally published at claudeguide.io/claude-api-python-sdk-quickstart

The Anthropic Python SDK is the fastest way to get Claude running in your code. This guide covers everything from installation to the patterns you'll use in every real project in 2026.

Install

pip install anthropic

Or with uv (faster):

uv add anthropic

Authentication

Get your API key from console.anthropic.com → API Keys → Create Key.

The SDK reads the key from the ANTHROPIC_API_KEY environment variable automatically:

export ANTHROPIC_API_KEY="sk-ant-..."

Or pass it explicitly:

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

For production, use environment variables. Never hardcode keys.
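
To make the environment-variable advice concrete, here is a minimal sketch of a fail-fast check you could run at startup. The helper name `require_api_key` is hypothetical and not part of the SDK; it only reads the same `ANTHROPIC_API_KEY` variable mentioned above.

```python
import os


def require_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper for illustration; not part of the anthropic package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or load it from "
            "your secrets manager before creating the client."
        )
    return key
```

Calling it once before you build the client turns a missing key into an immediate, readable error at startup.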

Your first message

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(message.content[0].text)
# Paris

The response is a Message object. Its content field is a list of content blocks; for a plain text reply, the text lives at message.content[0].text.

The response object

Understanding the full response structure:

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

print(message.id)              # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model)           # claude-3-5-sonnet-20241022
print(message.stop_reason)     # end_turn
print(message.usage.input_tokens)   # 10
print(message.usage.output_tokens)  # 25
print(message.content[0].type)      # text
print(message.content[0].text)      # Hello! How can I help you today?
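
Because content is a list of blocks, indexing `[0]` only reads the first one. The sketch below shows one way to join every text block and to detect a reply that was cut off by `max_tokens`. The helper names are hypothetical, and the `SimpleNamespace` object is a stand-in with the same attribute names as the fields printed above, so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(message) -> str:
    """Join the text of every text block in a Message-like object."""
    return "".join(
        block.text
        for block in message.content
        if getattr(block, "type", None) == "text"
    )


def was_truncated(message) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return message.stop_reason == "max_tokens"


# Stand-in for a real response (assumption: same attribute names as Message).
fake = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="Hello! "),
        SimpleNamespace(type="text", text="How can I help?"),
    ],
    stop_reason="end_turn",
)

print(extract_text(fake))
print(was_truncated(fake))
```

With a real response you would pass `message` directly in place of `fake`.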

System prompts

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful assistant that responds only in haiku.",
    messages=[
        {"role": "user", "content": "What is Python?"}
    ]
)

The system parameter takes a string, or a list of text blocks as in the prompt caching example later in this guide. It sets Claude's persona, task context, and constraints for the entire conversation.

Multi-turn conversations

messages = []

# Turn 1
messages.append({"role": "user", "content": "My name is Alex."})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
messages.append({"role": "assistant", "content": response.content[0].text})

# Turn 2
messages.append({"role": "user", "content": "What's my name?"})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
print(response.content[0].text)
# Your name is Alex.

The API is stateless — you send the full conversation history each time. Build a list, append user and assistant turns, send the list.
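
The append-and-resend pattern can be wrapped in a small class. This is a sketch, not an SDK feature: `Conversation` and its `send` callable are hypothetical names. In real use `send` would wrap `client.messages.create` and return `response.content[0].text`; here a fake stands in so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message list and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # `send` receives the full history and returns the assistant's reply text.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))  # pass a copy of the history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Offline stand-in for the API call: reports how many messages it was sent.
def fake_send(history: List[Message]) -> str:
    return f"received {len(history)} message(s)"


chat = Conversation(fake_send)
first = chat.ask("My name is Alex.")
second = chat.ask("What's my name?")
print(first)
print(second)
```

The second call receives three messages (user, assistant, user), which is exactly the growing history the stateless API expects.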

Streaming

For long responses, stream tokens as they arrive:

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a short story about a robot."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the full message once streaming completes (still inside the with block)
    final = stream.get_final_message()

print(f"\nTokens used: {final.usage.input_tokens + final.usage.output_tokens}")

Async support

For async applications (FastAPI, async scripts):

import asyncio
import anthropic

async def main():
    client = anthropic.AsyncAnthropic()

    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)

asyncio.run(main())

For async streaming:

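async def stream_response():
    # Create the async client here; it was previously only defined inside main()
    client = anthropic.AsyncAnthropic()

    async with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Count to 10"}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)

asyncio.run(stream_response())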

Choosing the right model

| Model | Use case | Cost |
| --- | --- | --- |
| `claude-3-5-haiku-20241022` | High-volume, speed-sensitive tasks | Cheapest |
| `claude-3-5-sonnet-20241022` | Most tasks, best balance | Mid |
| `claude-3-7-sonnet-20250219` | Extended thinking, complex reasoning | Higher |
| `claude-opus-4` | Hardest tasks, highest quality | Most expensive |

Start with Haiku for anything repetitive. Use Sonnet as default. Only reach for Opus or extended thinking when Sonnet falls short.
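
One way to keep that choice in a single place is a small lookup, sketched below. The model IDs are copied from the table above; the tier names and the `pick_model` function are hypothetical and only illustrate the idea.

```python
# Model IDs come from the table above; the tier names are made up for this sketch.
MODEL_BY_TIER = {
    "bulk": "claude-3-5-haiku-20241022",
    "default": "claude-3-5-sonnet-20241022",
    "reasoning": "claude-3-7-sonnet-20250219",
    "hardest": "claude-opus-4",
}


def pick_model(tier: str = "default") -> str:
    """Map a task tier to a model ID, rejecting unknown tiers loudly."""
    try:
        return MODEL_BY_TIER[tier]
    except KeyError:
        valid = ", ".join(sorted(MODEL_BY_TIER))
        raise ValueError(f"Unknown tier {tier!r}; expected one of: {valid}") from None


print(pick_model())
print(pick_model("bulk"))
```

Swapping a model later then means editing one dictionary entry instead of every call site.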

Error handling

import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError:
    print("Network error — check your connection")
except anthropic.RateLimitError:
    print("Rate limit hit — back off and retry")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")

The SDK has built-in retry logic for transient failures: connection errors and 408, 409, 429, and 5xx responses (including 529 overloaded). By default it retries 2 times with exponential backoff:

# Configure retries
client = anthropic.Anthropic(
    max_retries=3,          # default is 2
    timeout=60.0,           # seconds
)
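
To show what exponential backoff means in practice, here is an illustrative delay schedule. It is not the SDK's internal algorithm: the `base`, `cap`, and `jitter` values are assumptions chosen for the example, and `backoff_delays` is a hypothetical function.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 2,
    base: float = 0.5,
    cap: float = 8.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Illustrative schedule: the delay doubles per retry, up to `cap` seconds.

    base, cap and jitter are assumed values, not taken from the SDK.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Shave off up to `jitter` (a fraction) so clients do not retry in lockstep.
        delay *= 1 - jitter * rng.random()
        delays.append(delay)
    return delays


print(backoff_delays(max_retries=3, jitter=0.0))
# [0.5, 1.0, 2.0]
```

Raising `max_retries` adds more attempts at the long end of the schedule, so pair it with a `timeout` that matches how long your caller can wait.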

Tracking token usage and cost

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this article..."}]
)

input_tokens = message.usage.input_tokens
output_tokens = message.usage.output_tokens

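# Haiku pricing as listed in April 2026; rates differ between Haiku versions,
# so confirm the figures for the exact model ID you call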
HAIKU_INPUT = 1.00 / 1_000_000   # $1.00 per 1M input tokens
HAIKU_OUTPUT = 5.00 / 1_000_000  # $5.00 per 1M output tokens

cost = (input_tokens * HAIKU_INPUT) + (output_tokens * HAIKU_OUTPUT)
print(f"This call cost: ${cost:.6f}")
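
The same arithmetic can live in a reusable function with a per-model price table. This is a sketch: `PRICES_PER_MTOK` and `estimate_cost` are hypothetical names, and the single Haiku row simply repeats the figures quoted above, so treat the numbers as placeholders to verify against the current pricing page.

```python
from typing import Dict, Tuple

# USD per 1M tokens as (input, output). The Haiku row repeats the figures
# quoted above; add other models from the current pricing page.
PRICES_PER_MTOK: Dict[str, Tuple[float, float]] = {
    "claude-3-5-haiku-20241022": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call from its token counts."""
    if model not in PRICES_PER_MTOK:
        raise KeyError(f"No price configured for {model!r}")
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


print(f"${estimate_cost('claude-3-5-haiku-20241022', 1200, 300):.6f}")
# $0.002700
```

In a real call you would pass `message.model`, `message.usage.input_tokens`, and `message.usage.output_tokens`.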

Prompt caching (50-90% cost reduction)

For repeated system prompts or document analysis, add cache_control to cache the expensive prefix:

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": very_long_document,  # Written to the cache on a miss, read at a discount on a hit
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}]
)

First call writes the cache. Subsequent calls with the same prefix get a 90% discount on those tokens. See the prompt caching cost analysis for the break-even math.
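
A rough version of that break-even math can be sketched in a few lines. The 0.10 read multiplier is the 90% discount described above; the 1.25 write multiplier is an assumption about the cache-write premium, and the sketch also assumes every call after the first is a cache hit. The function names and the 100,000-token, $3.00 figures are placeholders.

```python
def prefix_cost(
    prefix_tokens: int,
    calls: int,
    price_per_mtok: float,
    cached: bool,
    write_multiplier: float = 1.25,  # assumed cache-write premium
    read_multiplier: float = 0.10,   # the 90% discount described above
) -> float:
    """USD spent on the shared prefix across `calls` requests.

    Assumes every call after the first is a cache hit.
    """
    base = prefix_tokens * price_per_mtok / 1_000_000
    if not cached or calls == 0:
        return base * calls
    return base * write_multiplier + base * read_multiplier * (calls - 1)


def savings_fraction(calls: int) -> float:
    """Share of prefix cost saved by caching (independent of size and price)."""
    plain = prefix_cost(100_000, calls, 3.00, cached=False)
    cached = prefix_cost(100_000, calls, 3.00, cached=True)
    return 1 - cached / plain


for n in (1, 2, 10):
    print(n, f"{savings_fraction(n):+.1%}")
```

Under these assumptions a single call costs 25% more with caching, two calls already save about a third, and ten calls save roughly 78% on the prefix tokens.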

A minimal production wrapper


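import anthropic
from typing import Optional

client = anthropic.Anthropic()

def ask_claude(
    prompt: str,
    system: Optional[str] = None,
    model: str = "claude-3-5-haiku-20241022",
    max_tokens: int = 1024,
) -> str:
    # NOTE: the original listing is cut off after the signature. The return
    # type and the body below are an assumed minimal completion.
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system is not None:
        kwargs["system"] = system  # omit the parameter entirely when unset
    message = client.messages.create(**kwargs)
    return message.content[0].text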
