Mukunda Rao Katta

Posted on May 25

Stop Guessing Anthropic's Dict Shape for Image Blocks

#hermeschallenge #ai #python #agents

Twenty minutes I should not have spent

The task was simple. Add a screenshot to a tool result so the model could see what the UI looked like after running a click action.

I was already using httpx directly, no Anthropic SDK, posting raw JSON. I had the tool_result block working. The text part worked fine. Adding an image felt like a five-minute job.

It was not.

First attempt:

{
    "type": "tool_result",
    "tool_use_id": "toolu_01abc",
    "content": "image/png base64 data here"
}

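That sent a plain string, not a list of blocks. A string is treated as text content on a tool_result, so there was no way for the model to receive an image from it.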

Second attempt:

{
    "type": "tool_result",
    "tool_use_id": "toolu_01abc",
    "content": [
        {"type": "image", "data": base64_data, "media_type": "image/png"}
    ]
}

Wrong again. The data and media_type fields are not top-level on an image block. They live inside a source object.

Third attempt. I opened the API reference, searched for "image block", found the correct shape, typed it out.

{
    "type": "tool_result",
    "tool_use_id": "toolu_01abc",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64_data
            }
        }
    ]
}

That worked. Twenty minutes gone. And I had just memorized something I would forget the next time I needed it.
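The base64_data placeholder in those attempts has to come from somewhere. A minimal sketch, assuming the screenshot is available as raw PNG bytes; the helper name encode_image_block is hypothetical and not part of any library:

```python
import base64


def encode_image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Build an image block by hand from raw image bytes (illustrative helper)."""
    # The JSON payload needs base64 as a str, so decode the encoded bytes to ASCII.
    data = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


# Every PNG file starts with this 8-byte signature. A real screenshot would be
# read with pathlib.Path("screenshot.png").read_bytes() instead.
block = encode_image_block(b"\x89PNG\r\n\x1a\n")
print(block["source"]["data"])  # iVBORw0KGgo=
```

The printed prefix is why base64-encoded PNG data always begins with iVBORw0KGgo.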

The shape of the fix

llm-content-blocks is a builder for those dicts. Nothing more.

from llm_content_blocks import image_block, tool_result_block, text_block

result = tool_result_block(
    tool_use_id="toolu_01abc",
    content=[
        text_block("Action completed successfully."),
        image_block(base64_data="iVBORw0KGgo...", media_type="image/png"),
    ]
)

The output is the exact dict the API expects. No SDK. No magic. You pass it directly to your HTTP client.

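import os

import httpx

# Assumed environment variable name; load the key however your app does.
api_key = os.environ["ANTHROPIC_API_KEY"]

response = httpx.post(
    "https://api.anthropic.com/v1/messages",
    headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
    json={
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [
            # Earlier turns go here. The assistant message holding the
            # matching tool_use block must come right before this one.
            {
                "role": "user",
                "content": [result]
            }
        ]
    }
)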
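The request above shows only the final user turn. The Messages API expects a tool_result to answer a tool_use block in the assistant turn immediately before it, with matching ids. A minimal sketch of that ordering using plain dicts; the tool name click and its input are made up for illustration, and the rest of the request body (model, max_tokens, tool definitions) is omitted:

```python
import json

# Hypothetical tool call, as the model would have sent it in its previous turn.
tool_use = {
    "type": "tool_use",
    "id": "toolu_01abc",
    "name": "click",  # made-up tool name
    "input": {"selector": "#submit"},  # made-up input
}

tool_result = {
    "type": "tool_result",
    "tool_use_id": tool_use["id"],  # must match the id of the tool_use above
    "content": [{"type": "text", "text": "Action completed successfully."}],
}

# Order matters: user request, assistant tool call, then the user turn
# that carries the tool result.
messages = [
    {"role": "user", "content": "Click the submit button."},
    {"role": "assistant", "content": [tool_use]},
    {"role": "user", "content": [tool_result]},
]

print(json.dumps(messages, indent=2))
```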

All five block types are covered by six builders. Images get two: one for base64 data and one for URLs.

from llm_content_blocks import (
    text_block,
    image_block,
    image_url_block,
    tool_use_block,
    tool_result_block,
    document_block,
)

# Text
text_block("Hello, model.")

# Image from base64
image_block(base64_data="...", media_type="image/jpeg")

# Image from URL
image_url_block(url="https://example.com/chart.png")

# Tool use (what the model sends when calling a tool)
tool_use_block(
    id="toolu_01xyz",
    name="get_weather",
    input={"city": "Austin"}
)

# Tool result (what you send back)
tool_result_block(
    tool_use_id="toolu_01xyz",
    content=[text_block("72F, sunny")]
)

# Document (PDF or plain text)
document_block(base64_data="...", media_type="application/pdf")

Each function returns a plain Python dict. You can inspect it, serialize it, log it, or pass it straight to json.dumps.
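Because the result is a plain dict, the standard json module is all you need to log or snapshot a payload. A small sketch, using a hand-written dict in place of a builder call so that it runs without the library installed:

```python
import json

# Hand-written stand-in for the dict that tool_result_block(...) is described
# as returning in the examples above.
block = {
    "type": "tool_result",
    "tool_use_id": "toolu_01xyz",
    "content": [{"type": "text", "text": "72F, sunny"}],
}

# sort_keys gives stable output, which is handy for log diffs and snapshot tests.
wire = json.dumps(block, sort_keys=True)
print(wire)

# A round trip through JSON returns an equal dict.
restored = json.loads(wire)
print(restored == block)  # True
```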

What it does NOT do

It does not make HTTP requests. There is no client.messages.create() here.
It does not validate your API key or handle auth.
It does not parse model responses. It only builds request payloads.
It does not depend on the Anthropic SDK. Zero dependencies, zero transitive installs.

If you need the full SDK, use the full SDK. This library is for the cases where you do not want it.
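Response parsing stays on your side of that boundary. A minimal sketch of pulling tool calls out of a decoded response body by hand; the helper extract_tool_uses and the trimmed sample response are illustrative and not part of the library:

```python
def extract_tool_uses(response_body: dict) -> list:
    """Return the tool_use blocks from a decoded Messages API response body."""
    return [
        block
        for block in response_body.get("content", [])
        if block.get("type") == "tool_use"
    ]


# Trimmed, hand-written sample in the shape of a Messages API response.
sample = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Checking the weather."},
        {
            "type": "tool_use",
            "id": "toolu_01xyz",
            "name": "get_weather",
            "input": {"city": "Austin"},
        },
    ],
}

calls = extract_tool_uses(sample)
print([(call["name"], call["input"]) for call in calls])
```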

Inside the lib: SDK-free design

The Anthropic SDK is useful. It handles auth, retries, streaming, type-safe response parsing. But it also pulls in httpx, pydantic, typing-extensions, and a handful of other packages. On a standard Lambda, that is a meaningful cold start hit.

llm-content-blocks is a single module with no imports beyond the standard library, backed by 26 tests. The install is about 8KB. It fits in an Edge function, a Lambda Layer, or any environment where you want to keep dependencies minimal.

The design tradeoff is explicit. The library knows one thing: what dicts the Anthropic wire format expects. It does not know how to send them. That boundary is intentional.

The source of truth is the Anthropic Messages API reference. The library encodes that reference as builders so you do not have to re-read it every time.

# What image_block actually does under the hood
def image_block(base64_data: str, media_type: str) -> dict:
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64_data,
        }
    }

No magic. The value is that you do not have to write this yourself, and the next engineer reading the code does not have to wonder if data belongs at the top level or inside source.
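The other builders presumably follow the same pattern. A sketch of what the URL image and tool result builders could look like, written from the Messages API shapes and not from the library source, so treat both bodies as assumptions:

```python
def image_url_block(url: str) -> dict:
    # Assumed implementation: same "image" type, but with a "url" source.
    return {"type": "image", "source": {"type": "url", "url": url}}


def tool_result_block(tool_use_id: str, content: list) -> dict:
    # Assumed implementation: content is a list of already-built blocks.
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}


url_image = image_url_block(url="https://example.com/chart.png")
result = tool_result_block(
    tool_use_id="toolu_abc",
    content=[{"type": "text", "text": "Screenshot taken."}],
)

print(url_image["source"])
print(result["tool_use_id"], len(result["content"]))
```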

The 26 tests cover every block type and the common edge cases: empty content lists on tool results, URL vs. base64 image sources, nested content arrays. The test file doubles as the best documentation for expected output shapes.

# From the test suite
def test_tool_result_with_image():
    result = tool_result_block(
        tool_use_id="toolu_abc",
        content=[
            text_block("Screenshot taken."),
            image_block(base64_data="abc123", media_type="image/png"),
        ]
    )
    assert result["type"] == "tool_result"
    assert result["content"][1]["source"]["type"] == "base64"
    assert result["content"][1]["source"]["data"] == "abc123"

Reading the tests tells you the exact output shape for every case. No API docs required.

When this is useful

You are posting raw JSON with requests or httpx. You do not want the SDK weight but you still want correct dict shapes.

Lambda cold starts are a constraint. Every extra package adds import time. This library pulls in nothing beyond itself.

Edge functions or WASM environments. Minimal environments where native extensions or heavy dependency trees are not an option.

Prototyping. You want to try the API quickly without setting up the full SDK.

Testing. Your test fixtures need realistic content block structures. Build them with the same functions your production code uses.

Multi-provider codebases. You are already using a generic HTTP client for OpenAI and want to keep a consistent pattern for Anthropic calls.
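If cold starts are the deciding factor, it is worth measuring import cost and not guessing. A rough sketch that times the first import of a module in the current process; the helper name import_seconds is made up, and colorsys is only a stand-in for whatever package you are evaluating. For cleaner numbers, running a fresh interpreter with python -X importtime is usually the better tool.

```python
import importlib
import time


def import_seconds(module_name: str) -> float:
    """Time one import of module_name in this process.

    If the module is already loaded, this returns a near-zero value because
    Python serves it from the sys.modules cache.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# colorsys is a small standard-library module used here as a stand-in.
elapsed = import_seconds("colorsys")
print(f"import took {elapsed * 1000:.3f} ms")
```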

When NOT to use this

If you are using the Anthropic SDK and happy with it, keep using it. Its type annotations, response parsing, and streaming support are genuinely good, and this library adds nothing on top of them.

If you need response parsing, token counting, streaming, or retry logic, this library does not provide those. Look at the siblings table below or use the SDK.

Install

pip install llm-content-blocks

Zero dependencies. Works on Python 3.8 and above.

# verify
python -c "from llm_content_blocks import text_block; print(text_block('ok'))"
# {'type': 'text', 'text': 'ok'}

Siblings

These libraries pair well with llm-content-blocks at adjacent boundaries.

Lib	Boundary	Repo
agent-message-window	Manages the message list these blocks go into, with paired-tool-call protection	MukundaKatta/agent-message-window
tool-output-format	Formats tool output as LLM-friendly markdown before you wrap it in a `tool_result` block	MukundaKatta/tool-output-format
llm-content-blocks-rs	Rust port with a matching test suite and serde-derive serialization to the Anthropic shape	MukundaKatta/llm-content-blocks-rs
agenttap	Wire-level prompt introspection, lets you see the exact JSON these blocks produce on the wire	MukundaKatta/agenttap

What's next

The current builders cover the core request block types in the Messages API as of this writing: text, image, document, tool use, and tool result. A few things would extend the library without breaking the zero-dependency contract:

A validation mode that checks required fields before you send, catching mistakes like a missing tool_use_id on a tool_result at build time instead of getting a 400 back from the API.

A from_sdk_response adapter that takes an Anthropic SDK response object and extracts the content blocks as plain dicts, useful if you want to log or transform model outputs without keeping an SDK dependency in your logging pipeline.
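To make the validation idea concrete, here is a minimal sketch of what such a check could look like. Nothing here exists in the library today: validate_block and the REQUIRED_FIELDS table are hypothetical, and the field lists are inferred from the block shapes shown earlier in this post.

```python
# Hypothetical table of required top-level keys per block type.
REQUIRED_FIELDS = {
    "text": ("text",),
    "image": ("source",),
    "document": ("source",),
    "tool_use": ("id", "name", "input"),
    "tool_result": ("tool_use_id",),
}


def validate_block(block: dict) -> list:
    """Return a list of problems; an empty list means nothing was flagged."""
    block_type = block.get("type")
    if block_type not in REQUIRED_FIELDS:
        return [f"unknown block type: {block_type!r}"]
    return [
        f"{block_type} block is missing {field!r}"
        for field in REQUIRED_FIELDS[block_type]
        if field not in block
    ]


bad = validate_block({"type": "tool_result", "content": []})
good = validate_block({"type": "text", "text": "ok"})
print(bad)   # ["tool_result block is missing 'tool_use_id'"]
print(good)  # []
```

Returning a list of problems, not raising on the first one, lets a caller report every mistake in a payload at once. Whether a real validation mode would raise or return is a design choice left open here.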

The library is at MukundaKatta/llm-content-blocks. Issues and pull requests are open.

DEV Community