DEV Community

Mukunda Rao Katta
Mukunda Rao Katta

Posted on

Sanitize Your LLM Message Lists Before Every API Call

LLM message lists are fragile. Providers are strict about what they accept. Two consecutive assistant messages cause an error. A tool_use block without a matching tool_result causes an error. A message with an empty content list causes an error. An alternating user/assistant pattern violated by a missing role causes an error.

These errors happen more than you think, especially in agents that build message lists dynamically.

agent-message-sanitize validates and fixes message lists before they go to the API.


The Shape of the Fix

from agent_message_sanitize import sanitize, SanitizeResult

raw_messages = load_messages_from_history()
result: SanitizeResult = sanitize(raw_messages, provider="anthropic")

if result.warnings:
    for w in result.warnings:
        logger.warning("message_sanitize", warning=w)

# Use the clean messages
response = client.messages.create(
    model="claude-sonnet-4-6",
    messages=result.messages,
    max_tokens=1024,
)
Enter fullscreen mode Exit fullscreen mode

SanitizeResult.messages is the fixed message list. SanitizeResult.warnings lists what was changed and why.


What It Does NOT Do

agent-message-sanitize does not validate message content. It validates message structure. A message with valid structure but nonsensical content passes sanitization.

It does not fix all possible issues. If your message list is so corrupted that the correct structure is ambiguous, the sanitizer raises SanitizeError rather than guessing.

It does not modify the original list in place. result.messages is a new list. The original is not modified.


Inside the Library

The sanitizer runs a sequence of fixes in order:

  1. Remove consecutive same-role messages: merge two consecutive user messages into one; merge two consecutive assistant messages into one.

  2. Fix empty content: a message with content: [] or content: "" is removed. A message with content: [{}] (empty block) has the empty block removed.

  3. Pair tool_use and tool_result: ensure every tool_use in an assistant message has a corresponding tool_result in the next user message. If unpaired, the tool_use block is removed (the result is already missing, so the pair is unrecoverable).

  4. Enforce alternation: after all other fixes, the message list must alternate user/assistant starting with user. Violations are reported as warnings; automatic fix merges or removes the offending message.

  5. Remove trailing assistant messages: a message list ending with an assistant message is valid for the API but may cause confusion. Optional: strip it with strip_trailing_assistant=True.

FIXES = [
    merge_consecutive_same_role,
    remove_empty_content,
    pair_tool_use_tool_result,
    enforce_alternation,
]
Enter fullscreen mode Exit fullscreen mode

Provider-specific: provider="anthropic" applies Anthropic-specific rules (tool pairing requirements, content block format). provider="openai" applies OpenAI-specific rules.


When to Use It

Use it as a pre-flight check before every API call that uses a dynamically constructed message list. Agent loops, conversation replay, context window management that trims messages.

Context window management is where sanitization matters most. When you trim old messages to fit the context window, you may cut a tool_use block whose tool_result is in the trimmed range. The sanitizer catches this and removes the now-unpaired tool_use.

Skip it for simple conversations with one user message and one assistant response. Sanitization adds a pass over the message list; it is not free. For critical paths with millions of calls, profile whether it matters.


Install

pip install git+https://github.com/MukundaKatta/agent-message-sanitize
Enter fullscreen mode Exit fullscreen mode
from agent_message_sanitize import sanitize, SanitizeError
from agent_message_window import MessageWindow

window = MessageWindow(max_tokens=180_000, model="claude-sonnet-4-6")

def prepare_messages(full_history: list[dict]) -> list[dict]:
    # Trim to context window
    windowed = window.fit(full_history)

    # Sanitize after trimming (trimming may break pairs)
    try:
        result = sanitize(windowed, provider="anthropic")
        if result.warnings:
            logger.info("sanitize_warnings", warnings=result.warnings)
        return result.messages
    except SanitizeError as e:
        logger.error("sanitize_failed", error=str(e))
        # Fallback: use only the last user message
        user_msgs = [m for m in windowed if m["role"] == "user"]
        return user_msgs[-1:] if user_msgs else []
Enter fullscreen mode Exit fullscreen mode

Sibling Libraries

Library What it solves
agent-message-window Sliding context window with tool pair preservation
agentfit Fit message history into token budget
conversation-codec Persist and load conversation history
llm-output-validator Validate LLM output after the call
agentvet Validate tool call arguments before execution

The natural pipeline: conversation-codec loads history, agent-message-window trims to context window, agent-message-sanitize fixes structural issues, then the API call. Three composable steps, each handling one concern.


What's Next

Schema validation for message content: validate that content blocks have required fields (type, text for text blocks, type/name/input for tool_use, etc.). Right now the sanitizer only checks structure, not content shape.

Provider-specific rule plugins: a plugin interface that lets teams add their own provider rules without patching the library. Useful for enterprise teams that use fine-tuned or hosted models with stricter message format requirements.

A diff output: alongside warnings, a changes list that specifies exactly what was removed, merged, or reordered. This is more actionable than a warning string for automated monitoring.


Built as part of the agent-stack family: composable Python primitives for production LLM agents.

Top comments (0)