Give Your Agent a Scratchpad: Keyed In-Memory Notes Between Tool Calls

#hermeschallenge #ai #python #agents

Complex agent tasks span multiple tool calls. The first call retrieves a list. The second call processes each item. The third call aggregates results. Somewhere in between, the agent needs to remember intermediate state.

The naive approach is to put everything into the conversation history as tool results. This works until the context window fills up, or until you need to access state from three tool calls ago without passing it through every intermediate step.

agent-scratchpad is a keyed in-memory notepad that persists for the duration of an agent session.

The Shape of the Fix

from agent_scratchpad import Scratchpad

pad = Scratchpad()

# Tool call 1: retrieve customer list (db is a placeholder for your own data layer)
def get_customers(segment: str) -> list:
    customers = db.query(segment=segment)
    pad.set("customer_count", len(customers))
    pad.set("customer_ids", [c.id for c in customers])
    return customers

# Tool call 3: generate report (doesn't need customers passed explicitly)
def generate_summary() -> str:
    count = pad.get("customer_count", default=0)
    ids = pad.get("customer_ids", default=[])
    return f"Processed {count} customers: {ids[:5]}..."

# Check what's in the pad (output assumes get_customers() has already run and found 42 customers)
print(pad.keys())   # ["customer_count", "customer_ids"]
print(pad.all())    # {"customer_count": 42, "customer_ids": [...]}

State is stored by key, not passed through every function signature. Tools that produce intermediate state can set it. Tools that consume it can get it. The conversation history stays clean.

What It Does NOT Do

agent-scratchpad does not persist across sessions. When the agent process ends, the scratchpad is gone. For cross-session state, use conversation-codec or agent-state-checkpoint.

It does not synchronize across multiple concurrent agents. Each Scratchpad instance is independent. If you have multiple agents running simultaneously and need shared state, you need a shared store.

It does not have a maximum size by default. If tools keep writing new keys without clearing old ones, the scratchpad grows without bound. Add a limit explicitly with Scratchpad(max_keys=100). Note that max_keys caps the number of keys, not the size of the values stored under them.
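
To make the limit concrete, here is a minimal sketch of how the cap behaves. It uses a small stand-in class that mirrors the set() logic shown under Inside the Library rather than importing the real package, and it assumes ScratchpadFullError is a plain Exception subclass.

```python
class ScratchpadFullError(Exception):
    """Stand-in for the library's error type (assumed to be an Exception subclass)."""


class Scratchpad:
    """Stand-in mirroring the set() logic from the article, not the real package."""

    def __init__(self, max_keys=None):
        self._data = {}
        self._max_keys = max_keys

    def set(self, key, value):
        # Only a new key can trip the limit; overwriting is always allowed.
        if self._max_keys and len(self._data) >= self._max_keys and key not in self._data:
            raise ScratchpadFullError(f"Max keys ({self._max_keys}) reached")
        self._data[key] = value

    def keys(self):
        return list(self._data.keys())


pad = Scratchpad(max_keys=2)
pad.set("a", 1)
pad.set("b", 2)
pad.set("a", [0] * 10_000)  # overwrite: allowed, and the value can be large

try:
    pad.set("c", 3)  # third distinct key: rejected
    rejected = False
except ScratchpadFullError:
    rejected = True

print(pad.keys(), rejected)  # ['a', 'b'] True
```

The overwrite of "a" shows why a key cap alone does not bound memory: one key can still hold a very large value.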

Inside the Library

The implementation is a dict with a few helpers:

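# Raised by set() when the key limit is hit. Its definition is not shown in
# this excerpt; a plain Exception subclass is assumed.
class ScratchpadFullError(Exception):
    pass


class Scratchpad:
    def __init__(self, max_keys: int | None = None):
        self._data: dict = {}
        self._max_keys = max_keys

    def set(self, key: str, value) -> None:
        # Overwriting an existing key never fails; only a new key can hit the limit.
        # A falsy max_keys (None or 0) means no limit.
        if self._max_keys and len(self._data) >= self._max_keys and key not in self._data:
            raise ScratchpadFullError(f"Max keys ({self._max_keys}) reached")
        self._data[key] = value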

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def pop(self, key: str, default=None):
        return self._data.pop(key, default)

    def keys(self) -> list[str]:
        return list(self._data.keys())

    def all(self) -> dict:
        return dict(self._data)

    def clear(self) -> None:
        self._data.clear()

pop() is important for ephemeral state: after consuming a value, remove it so the scratchpad does not accumulate stale data.
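
A minimal sketch of the consume-once pattern follows. The tool names fetch_page and summarize_page are hypothetical, and the class is a condensed stand-in for the listing above so the example runs on its own.

```python
class Scratchpad:
    """Condensed stand-in for the class above: just set, pop, and keys."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def pop(self, key, default=None):
        return self._data.pop(key, default)

    def keys(self):
        return list(self._data.keys())


pad = Scratchpad()


def fetch_page(url: str) -> int:
    # Hypothetical producer: stash a payload for the next tool.
    body = f"<html>contents of {url}</html>"
    pad.set("raw_page", body)
    return len(body)


def summarize_page() -> str:
    # Hypothetical consumer: pop() reads the value and removes it in one step.
    body = pad.pop("raw_page", default="")
    return body[:20]


fetch_page("https://example.com")
first = summarize_page()
second = summarize_page()  # nothing left, so the default comes back

print(repr(first), repr(second), pad.keys())
```

After the first summarize_page() call the pad is empty again, so a later tool cannot pick up a stale page by accident.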

Namespaces: pad.namespace("retrieval") returns a NamespacedScratchpad that prefixes all keys with retrieval:. This is useful for tools that belong to one logical phase of the agent loop, because they can clear their own namespace without affecting other tools' state.
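
The namespace implementation is not shown here, so the following is only a guess at one way it could work: a thin view over the parent pad that adds and strips the prefix. Everything inside NamespacedScratchpad below is an assumption, and the Scratchpad class is a condensed stand-in.

```python
class Scratchpad:
    """Condensed stand-in for the class above, plus a namespace() helper."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def pop(self, key, default=None):
        return self._data.pop(key, default)

    def keys(self):
        return list(self._data.keys())

    def namespace(self, name):
        return NamespacedScratchpad(self, name)


class NamespacedScratchpad:
    """Assumed design: a view that prefixes every key with '<name>:'."""

    def __init__(self, parent, name):
        self._parent = parent
        self._prefix = f"{name}:"

    def set(self, key, value):
        self._parent.set(self._prefix + key, value)

    def get(self, key, default=None):
        return self._parent.get(self._prefix + key, default)

    def keys(self):
        # Report keys without the prefix, as the tool wrote them.
        n = len(self._prefix)
        return [k[n:] for k in self._parent.keys() if k.startswith(self._prefix)]

    def clear(self):
        # Remove only this namespace's keys; other tools' state is untouched.
        for key in self.keys():
            self._parent.pop(self._prefix + key)


pad = Scratchpad()
retrieval = pad.namespace("retrieval")
retrieval.set("docs", ["a.md", "b.md"])
pad.set("report_title", "Q3 summary")

before = pad.keys()
retrieval.clear()
after = pad.keys()

print(before)  # ['retrieval:docs', 'report_title']
print(after)   # ['report_title']
```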

When to Use It

Use it when your agent loop has distinct phases that pass data forward: retrieval phases that collect data, processing phases that transform it, and aggregation phases that summarize it. The scratchpad is the coordination point between phases.

It is also useful for accumulating partial results. If your agent processes 20 items in separate tool calls and needs to aggregate them at the end, appending to a scratchpad list is cleaner than passing a growing result list through every tool call.
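
As a rough sketch of that accumulation pattern, the example below collects one score per item and aggregates at the end. The helper append_result and the tools score_item and aggregate are hypothetical names, and the class is a condensed stand-in with only get() and set().

```python
class Scratchpad:
    """Condensed stand-in with just get() and set()."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def append_result(pad, key, item):
    # Hypothetical helper: read the current list, then store a new list.
    # Calling pad.get(key, default=[]).append(item) on its own would silently
    # drop the item when the key is missing, because the default list is
    # never stored in the pad.
    items = pad.get(key, default=[])
    pad.set(key, items + [item])


pad = Scratchpad()


def score_item(item_id: int) -> float:
    # Hypothetical per-item tool; the real work would go here.
    score = item_id * 0.5
    append_result(pad, "scores", score)
    return score


def aggregate() -> dict:
    scores = pad.get("scores", default=[])
    return {"count": len(scores), "total": sum(scores)}


for item_id in range(20):  # stands in for 20 separate tool calls
    score_item(item_id)

summary = aggregate()
print(summary)  # {'count': 20, 'total': 95.0}
```

Copying the list on every append keeps the helper simple; for very long lists, mutating the stored list in place would avoid the repeated copies.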

Skip it for simple agents with few tool calls and no inter-tool state. If each tool call is independent and the LLM context history is sufficient for coordination, the scratchpad adds structure without value.

Install

pip install git+https://github.com/MukundaKatta/agent-scratchpad

from agent_scratchpad import Scratchpad

# Inject into your tool definitions
# (web_search, extract_and_record, and format_results are placeholders for your own code)
pad = Scratchpad()

tools = {
    "search_web": lambda q: search_and_record(q, pad),
    "extract_facts": lambda text: extract_and_record(text, pad),
    "write_report": lambda: write_from_scratchpad(pad),
}

def search_and_record(query: str, pad: Scratchpad) -> list[dict]:
    results = web_search(query)
    pad.set("last_search_query", query)
    pad.set("search_result_count", len(results))
    # Accumulate across multiple searches
    all_results = pad.get("all_results", default=[])
    pad.set("all_results", all_results + results)
    return results

def write_from_scratchpad(pad: Scratchpad) -> str:
    all_results = pad.get("all_results", default=[])
    query = pad.get("last_search_query", default="unknown")
    # Count the accumulated list: search_result_count only covers the last search.
    count = len(all_results)
    return f"Based on {count} results (latest query: '{query}'):\n" + format_results(all_results)

Sibling Libraries

Library	What it solves
`agent-state-checkpoint`	Durable JSON checkpoint that persists across restarts
`conversation-codec`	Persist full conversation history between sessions
`agent-message-window`	Manage conversation history for context window
`agent-fn-registry`	Central registry for tool functions and metadata
`agent-context-builder`	Compose system prompts from named sections

The scratchpad is ephemeral by design. For durable state, agent-state-checkpoint saves a full state dict to disk. The two are complementary: use the scratchpad for intra-session coordination, checkpoints for cross-session persistence.
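
To illustrate the hand-off between the two, here is a sketch that writes a scratchpad-style dict to disk and reads it back. It uses only the standard library and is not the agent-state-checkpoint API; save_state and load_state are hypothetical helpers, and the dict stands in for what pad.all() would return.

```python
import json
import tempfile
from pathlib import Path


def save_state(state: dict, path: Path) -> None:
    # Write to a temporary file first, then rename it into place, so a crash
    # mid-write does not leave a truncated checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def load_state(path: Path) -> dict:
    # A missing file means there is no earlier session to restore.
    return json.loads(path.read_text()) if path.exists() else {}


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = Path(workdir) / "pad.json"

    # End of session one: snapshot the scratchpad contents.
    session_one = {"customer_count": 42, "customer_ids": [1, 2, 3]}
    save_state(session_one, checkpoint)

    # Start of session two: restore and seed a fresh scratchpad from it.
    restored = load_state(checkpoint)

print(restored)  # {'customer_count': 42, 'customer_ids': [1, 2, 3]}
```

This only works for values that JSON can represent, which is one reason to keep durable state separate from whatever tools put in the scratchpad during a session.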

What's Next

Typed access would help with safety. A TypedScratchpad[T] that validates values on set() and coerces on get() would prevent silent type errors when a downstream tool expects a list and finds a string.
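
One possible shape for this, sketched under assumptions: the class below does not exist in the library, and it covers only the validation half by rejecting wrong types on set(). Coercion on get() is left out.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class TypedScratchpad(Generic[T]):
    """Hypothetical sketch: every value must be an instance of value_type."""

    def __init__(self, value_type: type):
        self._value_type = value_type
        self._data: dict = {}

    def set(self, key: str, value: T) -> None:
        # Fail at write time, where the bug is, rather than in a later tool.
        if not isinstance(value, self._value_type):
            raise TypeError(
                f"{key!r} expects {self._value_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self._data[key] = value

    def get(self, key: str, default: T) -> T:
        return self._data.get(key, default)


id_lists = TypedScratchpad(list)
id_lists.set("customer_ids", [1, 2, 3])

try:
    id_lists.set("customer_ids", "1,2,3")  # a string where a list is expected
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # 'customer_ids' expects list, got str
```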

Expiring keys are useful for state that becomes stale after a few tool calls. A set_with_ttl(key, value, ttl_calls=3) that auto-removes the key after N subsequent set() calls would handle the common case of ephemeral intermediate state.
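
A sketch of how call-counted expiry might behave is shown below. The class is hypothetical and makes two design assumptions: every write (set() or set_with_ttl()) ages all expiring keys by one, and a plain set() on an expiring key makes it permanent.

```python
class ExpiringScratchpad:
    """Hypothetical sketch of call-counted expiry; not part of the library."""

    def __init__(self):
        self._data = {}
        self._remaining = {}  # key -> writes left before the key is dropped

    def _tick(self):
        # Every write ages all expiring keys by one call.
        for key in list(self._remaining):
            self._remaining[key] -= 1
            if self._remaining[key] <= 0:
                del self._remaining[key]
                self._data.pop(key, None)

    def set(self, key, value):
        self._tick()
        self._data[key] = value
        self._remaining.pop(key, None)  # a plain set() makes the key permanent

    def set_with_ttl(self, key, value, ttl_calls=3):
        self._tick()
        self._data[key] = value
        self._remaining[key] = ttl_calls

    def get(self, key, default=None):
        return self._data.get(key, default)


pad = ExpiringScratchpad()
pad.set_with_ttl("page_token", "abc123", ttl_calls=2)

pad.set("step", 1)
after_one = pad.get("page_token")  # still present: one write left

pad.set("step", 2)
after_two = pad.get("page_token")  # dropped on the second subsequent write

print(after_one, after_two)  # abc123 None
```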

A diff method would also help: pad.diff(snapshot) would return the keys added, changed, and removed since the snapshot was taken. That is useful for debugging complex agent loops where you want to track how the scratchpad evolves across tool calls.
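
The comparison itself is simple enough to sketch as a standalone function over two dicts. The function name and return shape are assumptions. One detail matters in practice: all() as listed earlier returns a shallow copy, so a snapshot meant to catch in-place changes to lists or dicts needs copy.deepcopy.

```python
import copy


def diff(snapshot: dict, current: dict) -> dict:
    """Hypothetical helper: compare two scratchpad states by key and value."""
    added = sorted(k for k in current if k not in snapshot)
    removed = sorted(k for k in snapshot if k not in current)
    changed = sorted(
        k for k in current if k in snapshot and current[k] != snapshot[k]
    )
    return {"added": added, "changed": changed, "removed": removed}


# state stands in for the pad's contents.
state = {"customer_count": 42, "customer_ids": [1, 2]}
snapshot = copy.deepcopy(state)  # deep copy so in-place edits are detectable

state["customer_ids"].append(3)  # changed in place
state["report"] = "draft"        # added
del state["customer_count"]      # removed

result = diff(snapshot, state)
print(result)
# {'added': ['report'], 'changed': ['customer_ids'], 'removed': ['customer_count']}
```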

Built as part of the agent-stack family: composable Python primitives for production LLM agents.