Originally published at claudeguide.io/claude-vision-multimodal-guide
# Claude Vision and Multimodal Guide: Images, PDFs, and Documents (2026)
Claude's vision capabilities let you send images, screenshots, PDFs, and documents directly to the API. Pass them as base64-encoded content or as URLs in the messages array, and Claude analyzes them alongside text; as of 2026, no additional configuration is required. Claude 3.5 models support vision natively with a 200K token context window. This guide covers every input type with working Python examples and production patterns.
## What Claude Can Analyze
Claude's multimodal input handles:
- Images: JPEG, PNG, GIF, WebP
- PDFs: Native PDF support in Claude 3.5+ models
- Screenshots: UI analysis, bug reports, test verification
- Charts and graphs: Data extraction and interpretation
- Diagrams: Architecture diagrams, flowcharts, ERDs
- Handwriting: Reasonably accurate OCR of handwritten text
- Code in images: Extract and explain code from screenshots
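To make the image formats listed above concrete, here is a minimal sketch that maps file extensions to the media types used when sending images. The names `SUPPORTED_IMAGE_TYPES` and `image_media_type` are illustrative and are not part of the Anthropic SDK.

```python
from pathlib import Path

# Media types for the image formats listed above (JPEG, PNG, GIF, WebP).
# This mapping is an illustrative assumption, not an SDK constant.
SUPPORTED_IMAGE_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_media_type(path: str) -> str:
    """Return the media type for a supported image file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_IMAGE_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image format: {suffix or '(none)'}") from None
```

Checking the extension before encoding lets you reject an unsupported file locally instead of waiting for the API to refuse it.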
## Sending Images via URL
The simplest approach, when your image is publicly accessible, is to reference it by URL:
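```python
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Image block: the API fetches the image from the URL.
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/chart.png"
                    }
                },
                # Text block: the question about the image.
                {
                    "type": "text",
                    "text": "What does this chart show? Extract the key data points."
                }
            ]
        }
    ]
)

print(response.content[0].text)
```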
## Sending Images via Base64
For local files or private images:
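```python
import base64
import anthropic

def encode_image(image_path: str) -> str:
    """Read a local image and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")
```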
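As a rough sketch of how a local file might end up in a request, the helpers below wrap base64-encoded bytes in an image content block and pair it with a text prompt. The names `image_block_from_file` and `build_user_message` are hypothetical, and the caller is assumed to know the file's media type.

```python
import base64
from pathlib import Path


def image_block_from_file(image_path: str, media_type: str) -> dict:
    """Build a base64 image content block from a local file (hypothetical helper)."""
    raw = Path(image_path).read_bytes()
    data = base64.standard_b64encode(raw).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,  # e.g. "image/png"; supplied by the caller
            "data": data,
        },
    }


def build_user_message(image_path: str, media_type: str, prompt: str) -> dict:
    """Combine the image block and a text prompt into one user message."""
    return {
        "role": "user",
        "content": [
            image_block_from_file(image_path, media_type),
            {"type": "text", "text": prompt},
        ],
    }
```

The resulting dictionary can be passed as `messages=[build_user_message(...)]` to `client.messages.create`, in the same position as the URL-based message shown earlier.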
---
## PDF Analysis
Claude 3.5 models support native PDF input:
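```python
import base64

def encode_pdf(pdf_path: str) -> str:
    """Read a local PDF and return its contents as a base64 string."""
    with open(pdf_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")
```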
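One plausible way to package a PDF for the request is a `document` content block with a base64 source, sketched below. The helper name `pdf_block_from_file` is hypothetical, and the `%PDF-` header check is only a light sanity test added for illustration, not a full validation.

```python
import base64
from pathlib import Path


def pdf_block_from_file(pdf_path: str) -> dict:
    """Build a base64 document content block from a local PDF (hypothetical helper)."""
    raw = Path(pdf_path).read_bytes()
    # Light sanity check (an assumption of this sketch): PDFs start with "%PDF-".
    if not raw.startswith(b"%PDF-"):
        raise ValueError(f"{pdf_path} does not look like a PDF file")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(raw).decode("utf-8"),
        },
    }
```

The block goes in the `content` list of a user message, next to a text block containing your question about the document.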