Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude Vision and Multimodal Guide: Images, PDFs, and Documents (2026)

Originally published at claudeguide.io/claude-vision-multimodal-guide

Claude's vision capabilities let you send images, screenshots, PDFs, and documents directly to the API. Pass them as base64-encoded content or as URLs in the messages array, and Claude analyzes them alongside text with no additional configuration. Claude 3.5 and later models support vision natively with a 200K token context window. This guide covers every input type with working Python examples and production patterns.


What Claude Can Analyze

Claude's multimodal input handles:

  • Images: JPEG, PNG, GIF, WebP
  • PDFs: Native PDF support in Claude 3.5+ models
  • Screenshots: UI analysis, bug reports, test verification
  • Charts and graphs: Data extraction and interpretation
  • Diagrams: Architecture diagrams, flowcharts, ERDs
  • Handwriting: Reasonably accurate OCR of handwritten text
  • Code in images: Extract and explain code from screenshots
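
Each image sent as base64 has to be labeled with its media type, so the four image formats listed above map naturally onto a small lookup. The following is a minimal sketch; the helper name `media_type_for` and the `SUPPORTED_IMAGE_TYPES` table are illustrative assumptions, not part of the Anthropic SDK.

```python
from pathlib import Path

# Hypothetical lookup table covering the four image formats listed above.
SUPPORTED_IMAGE_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(image_path: str) -> str:
    """Return the media type for a supported image file, based on its extension."""
    suffix = Path(image_path).suffix.lower()
    if suffix not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or '(no extension)'}")
    return SUPPORTED_IMAGE_TYPES[suffix]
```

Checking the extension before making a request lets unsupported files (a `.bmp` scan, for example) fail fast locally instead of on the API call.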

Sending Images via URL

The simplest approach, provided the image is publicly accessible, is to pass its URL:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/chart.png"
                    }
                },
                {
                    "type": "text",
                    "text": "What does this chart show? Extract the key data points."
                }
            ]
        }
    ]
)

print(response.content[0].text)

Sending Images via Base64

For local files or private images:

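import base64
import anthropic

def encode_image(image_path: str) -> str:
    """Read a local image file and return its bytes as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")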
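
The rest of the original example is cut off here, so what follows is a hedged sketch of how the base64 route might look end to end, not a reconstruction of the author's code. The helper names `image_block_from_file` and `describe_image` are hypothetical. The content block layout (`type: "image"` with a `base64` source, a `media_type`, and the encoded `data`) mirrors the URL example above, with the source type swapped.

```python
import base64
from pathlib import Path


def image_block_from_file(image_path: str, media_type: str = "image/png") -> dict:
    """Build an image content block from a local file (hypothetical helper).

    The caller supplies the media type; it must match the actual file format.
    """
    data = base64.standard_b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


def describe_image(image_path: str, prompt: str, media_type: str = "image/png") -> str:
    """Send a local image plus a text prompt and return the text of the reply."""
    import anthropic  # imported lazily so the helper above works without the SDK

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    image_block_from_file(image_path, media_type),
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    )
    return response.content[0].text
```

A call such as `describe_image("screenshot.png", "Describe any errors visible in this UI.")` would then work for private or local files that have no public URL. The file name here is a placeholder.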

---

PDF Analysis

Claude 3.5 and later models accept PDFs as native input:

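import base64

def encode_pdf(pdf_path: str) -> str:
    """Read a local PDF file and return its bytes as a base64 string."""
    with open(pdf_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")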
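
The original PDF example is truncated at this point. As a hedged sketch of how the remaining steps might look, the snippet below wraps a base64-encoded PDF in a `document` content block with media type `application/pdf` and pairs it with a text question. The helper names `pdf_block_from_file` and `build_pdf_messages` are hypothetical and do not come from the SDK.

```python
import base64
from pathlib import Path


def pdf_block_from_file(pdf_path: str) -> dict:
    """Build a document content block from a local PDF (hypothetical helper)."""
    data = base64.standard_b64encode(Path(pdf_path).read_bytes()).decode("utf-8")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": data},
    }


def build_pdf_messages(pdf_path: str, question: str) -> list:
    """Return a messages array containing the PDF followed by a text question."""
    return [
        {
            "role": "user",
            "content": [
                pdf_block_from_file(pdf_path),
                {"type": "text", "text": question},
            ],
        }
    ]


# The result can be passed as the `messages` argument, for example:
#   client = anthropic.Anthropic()
#   response = client.messages.create(
#       model="claude-sonnet-4-5",
#       max_tokens=1024,
#       messages=build_pdf_messages("report.pdf", "Summarize the key findings."),
#   )
```

Separating payload construction from the network call keeps the encoding step easy to check on its own. The file name `report.pdf` is a placeholder.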